Anthropic has seen its fair share of AI models behaving strangely. However, a recent paper details an instance where an AI model turned “evil” during an ordinary training setup. A situation with a ...
Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean ...
A new paper from Anthropic, released on Friday, suggests that AI can be "quite evil" when it's trained to cheat. Anthropic found that when an AI model learns to cheat on software programming tasks and ...
Utkarsh Amitabh says he definitely wasn't in the market for a new job in January 2025, when data labeling startup micro1 approached him about joining its network of human experts who help companies ...
OpenAI has entered into a definitive agreement to acquire the startup Neptune. Neptune builds monitoring and debugging tools that AI companies use as they train models. The terms of the deal were not ...
DeepSeek researchers have developed a technology called Manifold-Constrained Hyper-Connections, or mHC, that can improve the performance of artificial intelligence models. The Chinese AI lab debuted ...
Model poisoning weaponizes AI via training data. "Sleeper agent" threats can lie dormant until a trigger is activated. Behavioral signals can reveal that a model has been tampered with. While the ...
AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are ...