This topic is loosely mirrored on the LessWrong forum on the AI developer side. A notable post to the AI Alignment Forum by researchers from Anthropic, Conjecture, and others covers "induction heads": circuit-like structures that emerge in large neural nets as they train, and that are hoped to provide a sound basis for "mechanistic interpretability". We may anticipate growing induction heads within the OODA framework presented here, interfacing them to ROS objects; a rough sketch of that interface follows.
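As a loose illustration only (not a working design), the sketch below imagines one way an OODA loop could sit on a ROS graph: a subscriber callback covers Observe, a publisher covers Act, and Orient/Decide are left as stubs where learned structures such as induction heads might eventually do the work. The topic names, message types, and stub logic are all assumptions of mine, not anything from the linked post.

```python
# Hypothetical sketch: an OODA loop wrapped around ROS pub/sub.
# Topic names and message types are placeholders.
import rospy
from std_msgs.msg import String

class OodaNode:
    def __init__(self):
        rospy.init_node("ooda_loop")
        self.latest_obs = None
        # Observe: subscribe to whatever state stream the robot exposes.
        rospy.Subscriber("/sensor/state", String, self.observe)
        # Act: publish decisions back into the ROS graph.
        self.act_pub = rospy.Publisher("/ooda/command", String, queue_size=1)

    def observe(self, msg):
        self.latest_obs = msg.data

    def orient(self, obs):
        # Stub: where learned circuits (e.g. induction heads) would
        # summarize context into a situation estimate.
        return obs

    def decide(self, situation):
        # Stub: map the situation estimate to a command.
        return "act-on:" + situation

    def spin(self, rate_hz=10):
        rate = rospy.Rate(rate_hz)
        while not rospy.is_shutdown():
            if self.latest_obs is not None:
                self.act_pub.publish(self.decide(self.orient(self.latest_obs)))
            rate.sleep()

if __name__ == "__main__":
    OodaNode().spin()
```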
Especially relevant here is Conjecture's independent, parallel identification of the OODA loop paradigm:
" Increasing automation elevates the importance of thinking about the ‘automated interpretability [OODA] in which we use models to help us interpret networks and decide which experiments or interventions to perform on them."
Current themes in mechanistic interpretability research — AI Alignment Forum
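To make the quoted loop concrete, here is a minimal, hypothetical sketch of such an automated interpretability cycle: a helper model interprets evidence gathered from a target network and decides which intervention to try next. Every name here (observe, interpret, propose_intervention, run_experiment) is a stand-in assumption of mine, not a published API from the linked post.

```python
# Hypothetical sketch of an automated interpretability OODA loop.
# All four phase functions are stubs standing in for real tooling.

def observe(network):
    # Observe: collect activations or weights worth explaining (stubbed).
    return {"head": "L2H3", "pattern": "prefix-matching"}

def interpret(evidence):
    # Orient: an assistant model turns raw evidence into a hypothesis.
    return evidence["head"] + " may be an induction head (" + evidence["pattern"] + ")"

def propose_intervention(hypothesis):
    # Decide: pick the experiment that best tests the hypothesis.
    return {"type": "ablate", "target": "L2H3"}

def run_experiment(network, experiment):
    # Act: apply the intervention and measure the effect (stubbed).
    return {"loss_delta": 0.42}

def automated_interpretability_loop(network, budget=3):
    findings = []
    for _ in range(budget):
        hypothesis = interpret(observe(network))
        result = run_experiment(network, propose_intervention(hypothesis))
        findings.append((hypothesis, result))
    return findings

print(automated_interpretability_loop(network=None))
```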