assistant-axis

The Assistant Axis is a direction in activation space that captures how "Assistant-like" a model's behavior is. Models can drift away from the Assistant during conversations—sometimes toward bizarre or harmful personas. This repo contains a pipeline for generating the Assistant Axis and notebooks for monitoring and steering with it.
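To make "a direction in activation space" concrete, here is a minimal NumPy sketch of what monitoring and steering along such a direction can look like. It is illustrative only and does not reproduce the repository's pipeline: the difference-of-means construction, the function names (`axis_from_means`, `project`, `steer`), and the synthetic activations are all assumptions made for this example.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

def axis_from_means(assistant_acts, other_acts):
    # ASSUMPTION: the axis is taken as the normalized difference of mean
    # activations. The repository may build its axis differently.
    return unit(assistant_acts.mean(axis=0) - other_acts.mean(axis=0))

def project(acts, axis):
    """Monitoring: scalar projection of each activation row onto the axis."""
    return acts @ axis

def steer(acts, axis, alpha):
    """Steering: shift every activation by alpha along the axis."""
    return acts + alpha * axis

# Synthetic stand-ins for hidden states (hypothetical data, not model output).
rng = np.random.default_rng(0)
d = 8
true_dir = unit(rng.normal(size=d))
assistant = rng.normal(size=(100, d)) + 2.0 * true_dir
other = rng.normal(size=(100, d)) - 2.0 * true_dir

axis = axis_from_means(assistant, other)
scores = project(other, axis)                     # how "Assistant-like" each row is
steered = project(steer(other, axis, 3.0), axis)  # same rows after steering
```

Because `axis` has unit length, steering by `alpha` raises every projection by exactly `alpha`, so one scalar serves both as the monitoring signal and as the steering knob.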

Pre-install review · source, risk, and alternatives

safety-research · Author unclaimed · Clear source

C · Review first

Trust level

85 · High trust

Strong recovered source and maintenance signals.

Risk decision

Review required

metadata-only

Install readiness

script-backed · copy-only command

SkillTrust only shows install guidance and copy actions; it never executes installs.

Install guidance

Review before install

Supported tools can change install steps; Universal entries need source review.

Copy-only command

Universal

git clone https://github.com/safety-research/assistant-axis.git

Risk warning

metadata-only
