ai benchmark - Search News

Geekwire on MSN · 14h

Allen Institute for AI challenges DeepSeek on key benchmarks with big new open-source AI model

Amid the industry fervor over DeepSeek, the Seattle-based Allen Institute for AI (Ai2) released a significantly larger version of its Tülu 3 AI model, aiming to further advance the field of open-source artificial intelligence and demonstrate its own techniques for enhancing the capabilities of AI models.

Leave Deepseek, China’s new AI model Kimi k1.5 also surpasses ChatGPT in key benchmarks

Moonshot AI's Kimi k1.5 outperforms OpenAI's GPT-4o and Claude 3.5 Sonnet in key areas, showcasing superior multimodal abilities.

TechCrunch on MSN · 3d

DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks

Chinese AI lab DeepSeek has released an open version of DeepSeek-R1, its so-called reasoning model, that it claims performs as well as OpenAI’s o1 on certain AI benchmarks. R1 is available from the AI dev platform Hugging Face under an MIT license,

12hon MSN

Ai2 says its new AI model beats one of DeepSeek’s best

Move over, DeepSeek. Seattle-based nonprofit AI lab Ai2 has released a benchmark-topping model called Tulu3-405B.

3d

'Humanity's Last Exam' benchmark is stumping top AI models - can you do any better?

A new academic benchmark aims to 'test the limits of AI knowledge at the frontiers of human expertise.' So far, these LLMs ...

3don MSN

How DeepSeek achieved its AI breakthrough, Benchmark partner Chetan Puttagunta explains

Chinese AI startup DeepSeek is sending tech stocks plunging as the market digests what its cheaper and more efficient model ...

1d

Alibaba releases AI model it says surpasses DeepSeek

Max's release points to the pressure DeepSeek's meteoric rise in the past three weeks has placed on overseas rivals and ...

2d

Alibaba’s Qwen2.5-Max challenges U.S. tech giants, reshapes enterprise AI

Alibaba's Qwen2.5-Max AI model sets new performance benchmarks in enterprise-ready artificial intelligence, promising reduced ...

7don MSN

Even some of the best AI can’t beat this new benchmark

The nonprofit Center for AI Safety and Scale AI have released a challenging new benchmark for frontier AI systems.

1don MSN

AI battle heats up: Alibaba claims Qwen 2.5 Max outperforms Meta and DeepSeek in benchmarks

Alibaba claims that its new AI model, Qwen 2.5 Max, demonstrates superior performance over competitors like Meta's Llama and ...

1don MSN

DeepSeek: everything you need to know about the AI that dethroned ChatGPT

Chinese startup DeepSeek has been taking the AI industry by storm with a new chatbot rivaling ChatGPT and Gemini that uses a ...

Hosted on MSN6d

OpenAI Accused of Manipulating Benchmark Results as Chinese Models Close AI Performance Gap

It was recently revealed that OpenAI secretly funded and accessed data related to the FrontierMath AI benchmark. The ...

Results that may be inaccessible to you are currently showing.

Hide inaccessible results