Medical Market Report

Today’s Top AI Went Up Against Expert Mathematicians. It Lost Badly.

November 23, 2024 by Deborah Bloomfield

While AI may be more commonly used for stealing art and hallucinating bullshit – that’s a technical term, by the way – the last couple of years have also seen what seem to be some genuinely extraordinary feats from the nascent technology. And that’s particularly true in the field of math: where computers were once confined to the category of blunt-force instruments, today they can apparently not just solve complex problems but also come up with novel proof strategies all of their own.

But just how smart are they, really? In a new paper, expert mathematicians set out a fresh challenge for today’s top-level AI programs. The result? Abject failure.

“Recent AI systems have demonstrated remarkable proficiency in tackling challenging mathematical tasks, from achieving olympiad-level performance in geometry to improving upon existing research results in combinatorics,” begins the paper, currently published on the arXiv preprint server. “However, existing benchmarks face some limitations.”

For example, the authors write, while it’s certainly impressive that AI systems can tackle challenges like the GSM8K problem set or the International Mathematical Olympiad, neither of those is exactly cutting-edge math – they’re more like “advanced high school” level than “limit of human invention”.

On top of that – and also reminiscent of high school math – we’re running out of things to ask our various AI programs. “A significant challenge in evaluating large language models (LLMs) is data contamination,” the authors explain – in other words, “the inadvertent inclusion of benchmark problems in training data.”

Like a student acing a test they already saw the answer key to, “this issue leads to artificially inflated performance metrics that mask models’ true reasoning capabilities,” they write.

The solution: FrontierMath – described by the team as “a benchmark of original, exceptionally challenging mathematical problems created in collaboration with over 60 mathematicians from leading institutions.” It’s no empty boast: there are multiple Fields Medal winners involved in the project, including one who contributed problems to the dataset; other problems came from mathematicians of graduate level and up, from universities across the world.

Problems submitted had to meet four criteria: they had to be original – to “[ensure] that solving them requires genuine mathematical insight rather than pattern matching against known problems,” the paper explains; they had to be guessproof; they had to be “computationally tractable” – that is, they had to be relatively straightforward if you know what you’re doing; and they had to be quickly and automatically verifiable. Once all these boxes were checked, the questions were even peer-reviewed, rated for difficulty, and handled securely to prevent dataset contamination.

It was, in other words, no small feat. But could today’s AI programs beat it?

Well… no. “Current state-of-the-art AI models solve[d] under 2 percent of problems,” the authors write, “revealing a vast gap between AI capabilities and the prowess of the mathematical community.”

Now, AI shouldn’t take this too hard – the problems were very difficult. “[They] are extremely challenging,” Fields Medal winner Terence Tao said, requiring extensive training data that is, in practice, “almost nonexistent.” 

But it does mean that, for now at least, the FrontierMath dataset is kind of hoist by its own petard. “Current AI models cannot solve even a small fraction of the problems in our benchmark,” the authors write. “While this demonstrates the high difficulty level of our problems, it temporarily limits FrontierMath’s usefulness in evaluating relative performance of models.”

“However, we expect this limitation to resolve as AI systems improve,” they add.

The paper – which includes sample problems and solutions from the dataset – is published on arXiv.
