A curated list of resources for activation engineering
control concept transparent ai-safety interpretability large-language-models llm llm-aligment activation-engineering concept-rep concept-activation-vector
Updated Apr 7, 2025