Discover and Prove: an open-source agentic framework pushes theorem proving into "Hard Mode"
What the paper introduces
A new arXiv preprint, "Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4" (arXiv:2604.15839), proposes an open-source agentic system for a stricter and more realistic version of automated theorem proving (ATP) that the authors call "Hard Mode." In the prevailing convention, which the paper dubs "Easy Mode," the final result is already embedded in the statement to be proved. Hard Mode instead requires the system to discover the target statement on its own and then construct a formal proof in Lean 4, the current major version of the Lean proof assistant, which is used for interactive theorem proving and formal verification.
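To make the distinction concrete, here is a toy illustration in Lean 4. It is not taken from the paper: the problem, the theorem names, and the `discovered` definition are all hypothetical, chosen only to show where the answer lives in each mode.

```lean
-- Easy Mode (toy example): the answer 7 is already written into the
-- statement, so the prover only has to check it.
theorem easy_mode : 7 * 7 = 49 := rfl

-- Hard Mode (toy example): the statement refers to a value the system
-- must first come up with itself. `discovered` is a hypothetical name
-- standing in for that value.
def discovered : Nat := 7

-- Only once a candidate has been proposed can the proof be attempted.
theorem hard_mode : discovered * discovered = 49 := rfl
```

In the second half, choosing `7` is the discovery step and closing the goal is the proof step. A wrong guess such as `6` would leave a statement that no proof can close.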
How it works — and why it matters
The framework combines agentic planning and search with Lean 4's tactic and proof infrastructure, so the system explores and synthesizes proofs autonomously instead of only refining a goal it was handed. Why is that important? Most contemporary ATP benchmarks give the solver a strong hint about what to prove, which produces optimistic estimates of capability. Hard Mode more closely mirrors the human task of discovering a result and then proving it, and so provides a sterner, more informative test of reasoning systems.
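The discover-then-prove idea can be sketched as a simple propose-and-check loop. This is a minimal sketch and not the paper's algorithm: `discover_and_prove` and `toy_verifier` are hypothetical names, and the verifier is a plain Python predicate standing in for a call to the Lean 4 proof checker.

```python
from typing import Callable, Iterable, Optional


def discover_and_prove(
    candidates: Iterable[int],
    verify: Callable[[int], bool],
) -> Optional[int]:
    """Return the first candidate the verifier accepts, or None.

    Hypothetical helper: 'discovery' is modeled as enumerating
    candidates, and 'proof' as an external check on each one.
    """
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


def toy_verifier(x: int) -> bool:
    """Stand-in for a proof checker (assumption, not a Lean call).

    Accepts x only if it is the positive solution of x * x = 49.
    """
    return x > 0 and x * x == 49


# Discovery proposes integers from -10 to 10; the checker filters them.
answer = discover_and_prove(range(-10, 11), toy_verifier)
print(answer)
```

In an Easy Mode setting the loop would be unnecessary, because the candidate is supplied with the problem and only the check remains.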
Results and provenance
The paper is a new arXiv submission and has not been peer reviewed. According to the authors, their agentic approach can discover and prove theorems in Lean 4 that standard Easy Mode pipelines struggle with, and the code is released under an open-source license so that others can reproduce and extend the work. The project is also reported to take an arXivLabs-style open approach intended to encourage community contributions and rapid iteration.
Broader implications
Automating discovery as well as verification could accelerate the formalization of mathematics and improve software and hardware assurance workflows. Could this shift change how research teams and industry approach verification? Possibly. There are strategic angles too: more capable formal-verification tools matter for critical infrastructure and national tech stacks, so advances here will attract attention beyond academia. For now, the paper offers a concrete baseline and an open codebase for researchers who want to push ATP out of Easy Mode and into the harder, more realistic territory of autonomous proof discovery.
