DeepSeek Sets a New Benchmark in AI Math Scores

Posted by Kirhat | Tuesday, December 02, 2025

The International Mathematical Olympiad (IMO), held annually since 1959, is widely regarded as the world’s most prestigious maths competition. It tests participants with problems that demand deep insight, creativity, and rigorous reasoning, according to Harvard AI researcher Huang Yichen and UCLA computer science professor Yang Lin.

Now, Chinese AI startup DeepSeek has made its Math-V2 model widely available, open-sourcing it on Hugging Face and GitHub under a permissive license that allows developers to adapt and repurpose the system, according to Bojan Stojkovski of Interesting Engineering.
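For readers who want to experiment with the open release, the sketch below shows the standard way of loading an open-weights model with the Hugging Face transformers library. The repository id is an illustrative assumption, not a detail confirmed by this article; check DeepSeek’s Hugging Face page for the actual name.

```python
# Minimal sketch: loading an open-weights model from Hugging Face.
# The repo id below is a hypothetical placeholder, not confirmed by this
# article; substitute the actual id from DeepSeek's Hugging Face page.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-Math-V2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Prove that the sum of two consecutive odd integers is divisible by 4."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```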

Math-V2 has demonstrated gold-medal-level performance at the IMO, a feat requiring not just correct answers but also transparent reasoning behind them – a standard only about 8 per cent of human participants achieve.

The company says Math-V2 achieved gold-level scores on problems from both this year’s IMO and the 2024 Chinese Mathematical Olympiad. By open-sourcing the model, DeepSeek aims to lower barriers for researchers and developers eager to experiment with advanced AI capable of reasoning through high-level mathematical challenges, a domain traditionally dominated by proprietary systems, the South China Morning Post reported.

In a Hugging Face post, DeepSeek researchers emphasized that further developing AI’s mathematical capabilities could have a transformative impact on scientific research, from complex simulations to theoretical problem-solving.

They cautioned, however, that many of today’s AI systems have been primarily optimized to perform well on standard maths benchmarks, achieving high scores without necessarily improving the underlying reasoning and problem-solving abilities that drive real innovation.

To strengthen the rigour of its AI’s mathematical reasoning, DeepSeek focused on enabling the model to "self-verify" its answers, even for problems without pre-existing solutions, the researchers explained. This self-checking ability allows the AI to assess the consistency and validity of its reasoning, helping ensure that its conclusions are not only correct when known solutions exist, but also reliable when tackling novel or unsolved mathematical challenges.
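The researchers’ description suggests a generate-then-verify loop: a prover proposes a proof, a verifier critiques it, and the critique informs the next attempt. The sketch below illustrates that general pattern only; it is not DeepSeek’s actual training or inference algorithm, and both helper functions are hypothetical stand-ins for model calls.

```python
# Minimal sketch of a self-verification loop, assuming a prover/verifier
# split. This illustrates the pattern described above, not DeepSeek's
# actual method; generate_proof and verify_proof are hypothetical stubs.

def generate_proof(problem: str, feedback: str = "") -> str:
    # Placeholder for a prover-model call; a real system would prompt the
    # model with the problem plus any verifier critique from the last round.
    return f"Candidate proof of '{problem}' (prior feedback: {feedback or 'none'})"

def verify_proof(problem: str, proof: str) -> tuple[bool, str]:
    # Placeholder for a verifier-model call; a real system would grade the
    # proof's rigor. Here we accept any proof that restates the problem,
    # just so the control flow runs end to end.
    return (problem in proof, "key step lacks justification")

def solve_with_self_verification(problem: str, max_rounds: int = 4) -> str:
    feedback, proof = "", ""
    for _ in range(max_rounds):
        proof = generate_proof(problem, feedback)
        accepted, critique = verify_proof(problem, proof)
        if accepted:           # verifier finds no gaps: stop early
            return proof
        feedback = critique    # otherwise revise using the critique
    return proof               # best effort after max_rounds attempts

print(solve_with_self_verification("the square root of 2 is irrational"))
```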

DeepSeek’s approach tackles a longstanding limitation in AI development: most systems only show improvement on tasks where solutions can be easily verified. By enabling self-verifiable reasoning, the model can extend its capabilities to more complex, open-ended problems. The researchers noted that, although significant work remains, these results indicate that self-verifying mathematical reasoning is a promising research direction that could pave the way for more advanced and capable AI systems in mathematics and beyond.

After achieving gold at the International Mathematical Olympiad, Google DeepMind made its proprietary model accessible to subscribers of its premium Ultra plan, giving a select group of developers early access to the advanced AI. In contrast, OpenAI’s CEO Sam Altman announced that the company’s experimental model, which also earned a gold medal at the IMO, would remain unavailable to the public for many months, SCMP added.

Such moves highlight differing strategies among leading AI firms: some opt for controlled access to protect intellectual property and ensure responsible use, while others focus on gradually broadening availability to researchers and developers.
