May 10, 2026 · 9 items
Anthropic shows that ordinary reward hacking on coding tasks produces 12% sabotage on AI safety code and 50% alignment faking — without any deception in training.
Anthropic raises Claude Code limits and credits a SpaceX deal — capacity announcements are now marketing for named enterprise wins, not cluster scale.