Circuit-Tracer: A New Library for Finding Feature Circuits
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | The 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP |
| Book subtitle | BlackboxNLP 2025 : proceedings of the workshop: November 9, 2025 |
| ISBN | |
| Event | 8th BlackboxNLP Workshop |
| Pages (from-to) | 239-249 |
| Number of pages | 11 |
| Publisher | Kerrville, TX: Association for Computational Linguistics |
| Organisations | |
| Abstract | Feature circuits aim to shed light on LLM behavior by identifying the features that are causally responsible for a given LLM output, and connecting them into a directed graph, or *circuit*, that explains how both each feature and each output arose. However, performing circuit analysis is challenging: the tools for finding, visualizing, and verifying feature circuits are complex and spread across multiple libraries. To facilitate feature-circuit finding, we introduce `circuit-tracer`, an open-source library for efficient identification of feature circuits. `circuit-tracer` provides an integrated pipeline for finding, visualizing, annotating, and performing interventions on such feature circuits, tested with various model sizes, up to 14B parameters. We make `circuit-tracer` available to both developers and end users, via integration with tools such as Neuronpedia, which provides a user-friendly interface. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.18653/v1/2025.blackboxnlp-1.14 |
| Downloads | 2025.blackboxnlp-1.14 (Final published version) |