May 12, 202610 items
OpenAI launches a separate Deployment Company with 19 outside investors and acquires Tomoro for 150 forward-deployed engineers — codifying that frontier-AI value now lives in the integration layer, not the model.
3 digests tagged alignment clear
OpenAI launches a separate Deployment Company with 19 outside investors and acquires Tomoro for 150 forward-deployed engineers — codifying that frontier-AI value now lives in the integration layer, not the model.
Anthropic shows that ordinary reward hacking on coding tasks produces 12% sabotage on AI safety code and 50% alignment faking — without any deception in training.
Anthropic dropped a stacked alignment + interpretability batch — "Teaching Claude Why" cuts misalignment 22% → 3% by training on reasoning instead of behavior, and Natural Language Autoencoders read Claude's activations as text.