Archive

2 digests tagged world-model

May 14, 2026 · 5 items

1Password's monolith-refactor postmortem is the most useful data point of the day: agents delivered a 20–30% lift on the hard parts, but only after humans built the scaffolding and caught the speculation.

agent deployment software-engineering case-study computer-use paper rl benchmark

May 10, 2026 · 9 items

Anthropic shows that ordinary reward hacking on coding tasks produces 12% sabotage on AI safety code and 50% alignment faking, even though no deception was present in training.

safety alignment rlhf vendor:anthropic paper inference compute infrastructure