An AI auditor in the top 6 at Sherlock: what it means for smart contract security

September 2025 · Alexandra Gulamova

In September 2025, Savant.chat ranked in the top 6 of a public Sherlock DeFi audit contest, competing directly against dozens of expert human auditors. To our knowledge, it was the first time an AI system publicly performed on par with senior human auditors in a fiercely competitive environment.

Why a contest, not a benchmark

Benchmarks are useful, and we track them closely — Savant scores 82% on EVMbench, the best result we know of on the market. But benchmarks are built from known vulnerabilities. A live audit contest is different: fresh code, real economic incentives, and human competitors who have every reason to find what you missed. There is no way to overfit to it.

What actually made the difference

Multi-model orchestration. Savant is not a wrapper around a single LLM. It combines several Tier-1 models to generate vulnerability hypotheses, trace code execution and filter out false positives.
Proprietary training data. The system is built on a dataset of 20,000+ analyzed vulnerabilities, from which we extract optimized security skills for our agents.
False-positive discipline. The single biggest reason teams abandon AI security tools is noise. Ranking in a contest requires findings that are real, reproducible and clearly explained — exactly what we optimize for.
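
To make the orchestration and false-positive points concrete, here is a minimal illustrative sketch in Python of a generate, trace, filter pipeline. It is not Savant's implementation: every name in it (Finding, run_pipeline, model_a, model_b, toy_tracer, SAMPLE) is hypothetical, and the "models" are simple string checks standing in for real LLM calls. It only shows the general shape: several generators propose hypotheses, and a separate tracing step must confirm each one before it is reported.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    title: str
    location: str            # hypothetical label, e.g. "Vault.withdraw"
    confirmed: bool = False  # set to True only after the tracing stage


# Each stage is a plain callable so different models could be swapped in.
Generator = Callable[[str], List[Finding]]
Tracer = Callable[[str, Finding], bool]


def run_pipeline(source: str, generators: List[Generator], tracer: Tracer,
                 min_votes: int = 1) -> List[Finding]:
    # Stage 1: collect hypotheses and count how many generators proposed each.
    votes: Dict[Finding, int] = {}
    for generate in generators:
        for finding in set(generate(source)):
            votes[finding] = votes.get(finding, 0) + 1

    # Stages 2 and 3: keep only hypotheses with enough votes that the
    # tracer confirms; everything else is treated as a false positive.
    confirmed = [
        Finding(f.title, f.location, confirmed=True)
        for f, n in votes.items()
        if n >= min_votes and tracer(source, f)
    ]
    return sorted(confirmed, key=lambda f: (f.location, f.title))


# --- Toy stand-ins for real models (assumptions, for illustration only) ---

def model_a(source: str) -> List[Finding]:
    if "call{value" in source:
        return [Finding("Possible reentrancy", "Vault.withdraw")]
    return []


def model_b(source: str) -> List[Finding]:
    # Deliberately noisy: always raises one unsupported hypothesis.
    out = [Finding("Unchecked return value", "Vault.deposit")]
    if "call{value" in source:
        out.append(Finding("Possible reentrancy", "Vault.withdraw"))
    return out


def toy_tracer(source: str, finding: Finding) -> bool:
    # Confirms reentrancy only if the external call precedes the balance reset.
    if finding.title == "Possible reentrancy":
        call = source.find("call{value")
        reset = source.find("balances[msg.sender] = 0")
        return call != -1 and reset != -1 and call < reset
    return False


SAMPLE = '''
function withdraw() external {
    (bool ok, ) = msg.sender.call{value: balances[msg.sender]}("");
    require(ok);
    balances[msg.sender] = 0;
}
'''

results = run_pipeline(SAMPLE, [model_a, model_b], toy_tracer)
for finding in results:
    print(finding.location, "-", finding.title)
```

In this toy run, the noisy hypothesis from model_b is dropped because the tracer cannot confirm it, and only the reentrancy finding survives. Raising min_votes would additionally require agreement between generators before tracing.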

What it does not mean

It does not mean human auditors are obsolete. Our own recommendation to clients is a layered stack: AI in CI/CD during development, an AI deep audit as preparation, and a human audit before mainnet for high-value code. What the result does mean is that the AI layer is no longer optional — it now catches, at a fraction of the cost and time, a large share of what only humans could catch before.

The people who built DeFi — 1inch, Lido, BGD Labs, TON — already trust Savant with their own code. The Sherlock result is public evidence of why.