Downes.ca ~ Stephen's Web ~ On the Biology of a Large Language Model

Stephen Downes

Knowledge, Learning, Community

You may have seen a recent MIT Technology Review article (archive) on 'circuit tracing'. This paper, along with its companion, is the actual research and, unlike the MIT Technology Review article, it's not behind a paywall. The idea is to reverse engineer a model built and used by Anthropic to find collections of 'circuits' that correspond to specific functions or processes. You might think of it as cognitive psychology for machines. To be clear: these 'circuits' aren't programmed into the model; they emerge as a result of training. Nor is a circuit just a specific set of neural connections. You have to extract or 'trace' it because "model neurons are often polysemantic - representing a mixture of many unrelated concepts," as illustrated in the image.

Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2025
Last Updated: Mar 31, 2025 6:57 p.m.

Creative Commons License.
