How Accurate Is Lease Abstraction?

  • Date: March 17, 2026
  • by Eldar Gizzatov

When evaluating AI lease abstraction tools, you’ll see vendors quoting “99% accuracy.” That number, on its own, tells you almost nothing. A 120-page commercial lease with dozens of amendments is a bespoke legal document where a single missed clause can create real operational exposure.

At Basking, we build AI specifically for commercial real estate. We know where these systems break, because we’ve had to fix them. Here’s the framework we actually use to measure extraction quality, and what you should be asking any vendor you’re evaluating.


Document-level accuracy is the wrong metric

The most common question buyers ask is: “What percentage of the document did the AI get right?” This is the wrong question.

If a lease has 100 extractable fields and the AI nails 99 of them but misses the early termination right, that 1% gap is a missed liability. Accuracy has to be measured at the individual field level, and each field needs to be evaluated on two dimensions:

  • Value accuracy: Is the extracted data semantically correct?
  • Location accuracy: Was it pulled from the right clause?

These are independent failure modes.

Consider a renewal option with a 12-month notice period. The AI extracts “12 months,” which is the correct value. But it sourced it from a landlord relocation clause on page 40, not the option to extend on page 85. The output looks correct, but the extraction is wrong. Without location accuracy, you’re relying on coincidence.
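The renewal example can be sketched as a small scoring function. This is a minimal illustration, not a description of Basking’s implementation: the `FieldValue` type, the clause labels, and the use of exact string matching in place of a semantic comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldValue:
    value: str   # normalized value, e.g. "12 months"
    clause: str  # label of the source clause (hypothetical labels)


def score_field(extracted: FieldValue, truth: FieldValue) -> dict:
    """Score one extracted field on the two independent dimensions."""
    # Assumption: exact match stands in for a real semantic comparison.
    value_ok = extracted.value == truth.value
    location_ok = extracted.clause == truth.clause
    return {
        "value_correct": value_ok,
        "location_correct": location_ok,
        # The field counts as correct only when both dimensions pass.
        "correct": value_ok and location_ok,
    }


# The renewal-notice example: right value, wrong clause.
truth = FieldValue(value="12 months", clause="option_to_extend_p85")
extracted = FieldValue(value="12 months", clause="landlord_relocation_p40")
result = score_field(extracted, truth)
print(result)
```

A value-only check would mark this field as correct. Scoring both dimensions flags it as a failure.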

Precision and recall expose what a headline number hides

A system can report high accuracy simply by attempting only the easy fields and skipping the complex ones. To understand what an AI is actually doing, you need precision and recall.

Precision (measures false positives):

When the AI says a field exists and returns a value, how often is it actually correct? Low precision means the system is hallucinating data: for example, fabricating a rent escalation that doesn’t exist.

Recall (measures false negatives):

When a field genuinely exists in the document, how often does the AI find it? Low recall means the system is silently missing critical information: a termination right buried in a scanned amendment, a co-tenancy clause nested in a rider.

A vendor quoting a single accuracy figure is almost certainly blending these together, which obscures exactly the failure modes you need to understand.
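To make the two metrics concrete, here is a minimal sketch of how they could be computed from field-level results. The function name, the convention that a wrong value counts as both a false positive and a false negative, and the sample values are assumptions for illustration only.

```python
from typing import Optional


def precision_recall(
    pairs: list[tuple[Optional[str], Optional[str]]]
) -> tuple[float, float]:
    """pairs holds (extracted, truth) per field; None means 'absent'."""
    tp = fp = fn = 0
    for extracted, truth in pairs:
        if extracted is not None and extracted == truth:
            tp += 1  # field exists and was returned correctly
        else:
            if extracted is not None:
                fp += 1  # returned a value that is wrong or does not exist
            if truth is not None:
                fn += 1  # a real field was missed or returned incorrectly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up results for a system that attempts only the easy fields.
pairs = [
    ("2031-06-30", "2031-06-30"),         # expiry date: found
    ("12 months", "12 months"),           # renewal notice: found
    ("5 years", "5 years"),               # renewal term: found
    (None, "termination right, year 5"),  # skipped
    (None, "co-tenancy clause"),          # skipped
    (None, None),                         # truly absent, correctly left blank
]
precision, recall = precision_recall(pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case every returned value is correct, so precision is perfect, yet two of the five fields that really exist were never found. A single blended figure would hide that gap.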


Why human-in-the-loop isn’t optional

Commercial leases contain bespoke negotiations, legal ambiguity, and interconnected amendments that can contradict each other across documents written years apart. AI handles the heavy lifting, reading and structuring dozens of pages in seconds. But purely automated extraction shifts the burden of catching false positives entirely onto your team, after the fact.

Human-in-the-loop (HITL) verification, done well, serves as the final quality gate.

At Basking, our AI engine processes the raw documents, and specialist reviewers validate outputs through structured review tasks before the data goes live. The AI does what it’s good at (speed, scale, consistency); humans do what they’re good at (judgment on edge cases and legal nuance).


What to ask your vendor

If you’re evaluating a platform, ask how they measure their own accuracy internally. Three questions will tell you a lot:

How do you benchmark accuracy?

At Basking, we evaluate against a golden dataset of manually verified lease extractions. Each field is scored by an LLM judge against this ground truth, checking both value correctness and source clause. This gives us a repeatable, scalable way to measure precision and recall across thousands of fields without bottlenecking on manual review cycles.

What are you testing against?

Our golden dataset includes base leases, amendments, renewals, and terminations across different asset types and document quality levels. A model that scores well on clean, single-document leases can fall apart on a scanned third amendment from 1974. We test against the messy stuff because that’s what the system needs to handle in production.
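One way to see why the mix of test documents matters is to break a metric down by document segment. The sketch below is illustrative only: the segment names, the `recall_by_segment` helper, and the sample records are assumptions, not Basking’s benchmark data.

```python
from collections import defaultdict


def recall_by_segment(records):
    """records: (segment, found) pairs for fields present in the ground truth."""
    found = defaultdict(int)
    total = defaultdict(int)
    for segment, was_found in records:
        total[segment] += 1
        found[segment] += int(was_found)
    return {segment: found[segment] / total[segment] for segment in total}


# Made-up results: every field found on clean leases, most missed on scans.
records = [
    ("clean base lease", True),
    ("clean base lease", True),
    ("clean base lease", True),
    ("clean base lease", True),
    ("scanned amendment", True),
    ("scanned amendment", False),
    ("scanned amendment", False),
    ("scanned amendment", False),
]
by_segment = recall_by_segment(records)
overall = sum(was_found for _, was_found in records) / len(records)
print(by_segment, overall)
```

The overall figure sits between the two segments and describes neither of them. Reporting per segment shows where the misses concentrate.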

Where does human review fit in?

At Basking, expert verification is built directly into the extraction workflow through structured Flow Tasks. Specialist reviewers validate the AI’s output before it goes live. This isn’t a separate QA step your team has to manage on their own.

A single accuracy percentage tells you very little. The vendors worth evaluating will show you how they break that number down, because they understand that the detail is what matters.

Talk to us about your data integrity

At Basking, we don’t do black-box automation. We’re happy to walk you through the specific metrics and the human-in-the-loop methodology behind our extraction pipeline. If you want to see how that works on your leases, reach out and we’ll show you.
