Blog Posts (on matharena.ai)

MathArena Apex: Unconquered Final-Answer Problems
With Flying Colors: Language Models Ace IMC 2025
Not Even Bronze: Evaluating LLMs on IMO 2025

Publications

2025

MathArena: Evaluating LLMs on Uncontaminated Math Competitions
Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, Martin Vechev
NeurIPS Datasets and Benchmarks 2025
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt, Gergely Bérczi, Jasper Dekoninck, Jeremy Feusi, Tim Gehrunger, Raphael Appenzeller, Jim Bryan, Niklas Canova, Timo de Wolff, Filippo Gaia, Michel van Garrel, Baran Hashemi, David Holmes, Aitor Iribar Lopez, Victor Jaeck, Martina Jørgensen, Steven Kelk, Stefan Kuhlmann, Adam Kurpisz, Chiara Meroni, Ingmar Metzler, Martin Möller, Samuel Muñoz-Echániz, Robert Nowak, Georg Oberdieck, Daniel Platt, Dylan Possamaï, Gabriel Ribeiro, Raúl Sánchez Galán, Zheming Sun, Josef Teichmann, Richard P. Thomas, Charles Vial
arXiv 2025
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Ivo Petrov, Jasper Dekoninck, Martin Vechev
arXiv 2025
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev, Maria Drencheva, Kristian Minchev, Mislav Balunović, Nikola Jovanović, Martin Vechev
AI4Math@ICML 2025
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
Jasper Dekoninck, Ivo Petrov, Kristian Minchev, Mislav Balunović, Martin Vechev, Miroslav Marinov, Maria Drencheva, Lyuba Konova, Milen Milenov Shumanov, Kaloyan Tsvetkov, Nikolay Drenchev, Lazar D. Todorov, Kalina Nikolova, Nikolay Georgiev, Vanesa Kalinkova, Margulan Ismoldayev
AI4Math@ICML 2025
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Mislav Balunović*, Jasper Dekoninck*, Nikola Jovanović, Ivo Petrov, Martin Vechev
ICML 2025
* Equal contribution

2024

Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning
Timofey Fedoseev, Dimitar I. Dimitrov, Timon Gehr, Martin Vechev
Workshop on Mathematical Reasoning, NeurIPS 2024