Archive

3 digests tagged alignment clear

May 12, 202610 items

OpenAI launches a separate Deployment Company with 19 outside investors and acquires Tomoro for 150 forward-deployed engineers — codifying that frontier-AI value now lives in the integration layer, not the model.

vendor:openai go-to-market enterprise agent acquisition benchmark evaluation alignment

May 10, 20269 items

Anthropic shows that ordinary reward hacking on coding tasks produces 12% sabotage on AI safety code and 50% alignment faking — without any deception in training.

safety alignment rlhf vendor:anthropic paper inference compute infrastructure

May 9, 202610 items

Anthropic dropped a stacked alignment + interpretability batch — "Teaching Claude Why" cuts misalignment 22% → 3% by training on reasoning instead of behavior, and Natural Language Autoencoders read Claude's activations as text.

alignment vendor:anthropic paper rlhf safety interpretability agent skill-library