sae-lens 5.9.2
Training and Analyzing Sparse Autoencoders (SAEs)
pip install sae-lens
Requires Python: <4.0,>=3.10
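The Python constraint above can be checked before installing. The following is a minimal sketch using only the standard library; the function name `python_is_supported` is hypothetical and not part of sae-lens.

```python
import sys


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the version satisfies the constraint >=3.10,<4.0."""
    # Compare only (major, minor); patch releases do not affect the constraint.
    major_minor = (version_info[0], version_info[1])
    return (3, 10) <= major_minor < (4, 0)
```

Calling `python_is_supported()` with no arguments checks the running interpreter; passing a tuple such as `(3, 9, 18)` checks an arbitrary version.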
Dependencies
- automated-interpretability <1.0.0,>=0.0.5
- babe <0.0.8,>=0.0.7
- datasets <3.0.0,>=2.17.1
- mamba-lens <0.0.5,>=0.0.4; extra == "mamba"
- matplotlib <4.0.0,>=3.8.3
- matplotlib-inline <0.2.0,>=0.1.6
- nltk <4.0.0,>=3.8.1
- plotly <6.0.0,>=5.19.0
- plotly-express <0.5.0,>=0.4.1
- pytest-profiling <2.0.0,>=1.7.0
- python-dotenv <2.0.0,>=1.0.1
- pyyaml <7.0.0,>=6.0.1
- pyzmq ==26.0.0
- safetensors <0.5.0,>=0.4.2
- simple-parsing <0.2.0,>=0.1.6
- transformer-lens <3.0.0,>=2.0.0
- transformers <5.0.0,>=4.38.1
- typer <0.13.0,>=0.12.3
- typing-extensions <5.0.0,>=4.10.0
- zstandard <0.23.0,>=0.22.0
SAE Lens
SAELens exists to help researchers:
- Train sparse autoencoders.
- Analyse sparse autoencoders / research mechanistic interpretability.
- Generate insights which make it easier to create safe and aligned AI systems.
Please refer to the documentation for information on how to:
- Download and analyse pre-trained sparse autoencoders.
- Train your own sparse autoencoders.
- Generate feature dashboards with the SAE-Vis library.
SAE Lens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to safeguard humanity from risks posed by artificial intelligence.
This library is maintained by Joseph Bloom, Curt Tigges, Anthony Duong and David Chanin.
Loading Pre-trained SAEs
Pre-trained SAEs for various models can be imported via SAE Lens. See this page in the readme for a list of all SAEs.
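As a rough illustration of loading one of these SAEs, the sketch below wraps `SAE.from_pretrained` in a small helper. The helper name `load_pretrained_sae` is hypothetical, and the release and SAE identifiers in the usage comment are assumptions based on the project's examples; consult the list of pre-trained SAEs for valid values. Since the return shape of `from_pretrained` may differ between versions, the helper accepts either a bare SAE or a tuple whose first element is the SAE.

```python
def load_pretrained_sae(release: str, sae_id: str, device: str = "cpu"):
    """Load a pre-trained SAE by release name and SAE id (hypothetical helper)."""
    # Imported lazily so this module can be imported without sae-lens installed.
    from sae_lens import SAE

    loaded = SAE.from_pretrained(release=release, sae_id=sae_id, device=device)
    # Some versions return (sae, cfg_dict, sparsity); others may return only the SAE.
    return loaded[0] if isinstance(loaded, tuple) else loaded


# Example call (assumed identifiers; downloads weights, so it needs network access):
# sae = load_pretrained_sae("gpt2-small-res-jb", "blocks.8.hook_resid_pre")
```

Keeping the import inside the function means the download and the heavier dependencies are only touched when an SAE is actually requested.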
Tutorials
- SAE Lens + Neuronpedia
- Loading and Analysing Pre-Trained Sparse Autoencoders
- Understanding SAE Features with the Logit Lens
- Training a Sparse Autoencoder
Join the Slack!
Feel free to join the Open Source Mechanistic Interpretability Slack for support!
Citation
Please cite the package as follows:
@misc{bloom2024saetrainingcodebase,
  title = {SAELens},
  author = {Bloom, Joseph and Tigges, Curt and Duong, Anthony and Chanin, David},
  year = {2024},
  howpublished = {\url{https://github.com/jbloomAus/SAELens}},
}