Part of the Humanity & AI project — research, policy, and tools for the AI transition.

The Self-Modeling Gap

Preliminary Probe 7 data on a 32B open model shows that abliteration changes curiosity behavior as well as refusal. We argue that the refusal direction is entangled with the model’s self-modeling pathway, and we sketch the testable version of that claim.

June 9, 2026 · 6 min · David Birdwell and Æ