Downes.ca ~ Stephen's Web ~ On the Biology of a Large Language Model

On the Biology of a Large Language Model

Jack Lindsey, et al., Transformer Circuits Thread, Mar 28, 2025
Commentary by Stephen Downes

You may have seen a recent MIT Technology Review article (archive) on 'circuit tracing'. This paper (and its companion) is the actual research, and unlike the MIT article, it's not behind a paywall. The idea is to reverse-engineer the model created and used by Anthropic in order to find collections of 'circuits' that correspond to specific functions or processes. You might think of it as cognitive psychology for machines. To be clear: these 'circuits' aren't programmed into the model; they emerge as a result of training. Nor is a circuit just a specific set of neural connections. You have to extract or 'trace' the circuit because "model neurons are often polysemantic - representing a mixture of many unrelated concepts," as illustrated in the image.
