Publications
2025
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, Martin Vechev
NeurIPS Datasets and Benchmarks
2025
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt, Gergely Bérczi, Jasper Dekoninck, Jeremy Feusi, Tim Gehrunger, Raphael Appenzeller, Jim Bryan, Niklas Canova, Timo de Wolff, Filippo Gaia, Michel van Garrel, Baran Hashemi, David Holmes, Aitor Iribar Lopez, Victor Jaeck, Martina Jørgensen, Steven Kelk, Stefan Kuhlmann, Adam Kurpisz, Chiara Meroni, Ingmar Metzler, Martin Möller, Samuel Muñoz-Echániz, Robert Nowak, Georg Oberdieck, Daniel Platt, Dylan Possamaï, Gabriel Ribeiro, Raúl Sánchez Galán, Zheming Sun, Josef Teichmann, Richard P. Thomas, Charles Vial
arXiv
2025
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Ivo Petrov, Jasper Dekoninck, Martin Vechev
arXiv
2025
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev, Maria Drencheva, Kristian Minchev, Mislav Balunović, Nikola Jovanović, Martin Vechev
AI4Math@ICML
2025
The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs
Jasper Dekoninck, Ivo Petrov, Kristian Minchev, Mislav Balunović, Martin Vechev, Miroslav Marinov, Maria Drencheva, Lyuba Konova, Milen Milenov Shumanov, Kaloyan Tsvetkov, Nikolay Drenchev, Lazar D. Todorov, Kalina Nikolova, Nikolay Georgiev, Vanesa Kalinkova, Margulan Ismoldayev
AI4Math@ICML
2025
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Mislav Balunović*, Jasper Dekoninck*, Nikola Jovanović, Ivo Petrov, Martin Vechev
ICML
2025
* Equal contribution
2024
Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning
Timofey Fedoseev, Dimitar I. Dimitrov, Timon Gehr, Martin Vechev
Workshop on Mathematical Reasoning, NeurIPS
2024