Validation Layer: How Mage Ensures Precision
The hardest problem in AI contract review is not extraction. It is knowing whether the extraction is correct. At Mage, we built a dedicated validation layer that sits between raw AI output and the results you see, ensuring that what reaches your review queue is trustworthy.
The Challenge
Large language models are powerful at understanding contract language, but they are not infallible. A model might extract a clause that looks like a change-of-control provision but is actually a standard successors-and-assigns clause. It might identify the right provision but misattribute it to the wrong party. Or it might hallucinate a provision that does not exist in the document at all.
In legal work, these errors are not acceptable. An attorney who relies on an incorrect extraction could miss a critical risk or report a finding to a client that does not hold up under scrutiny. The standard for legal AI is not "usually right" but "reliably right, with clear indicators when it is uncertain."
Why single-pass extraction is not enough
A single model pass might achieve 90% accuracy on contract extraction. That sounds high until you realize that 10% of 300 contracts means 30 incorrect results mixed into your review queue with no way to distinguish them from correct ones. At deal speed, that is a significant reliability problem.
Our Approach: Multi-Stage Validation
Instead of relying on a single extraction pass, Mage uses a pipeline of specialized stages. Each stage has a distinct responsibility, and each acts as a quality gate for the next. The result is a system where errors from one stage are caught by the next.
Stage 1: Document Parsing
The document is parsed into structured sections. Tables, headings, defined terms, and cross-references are identified and preserved. This structured representation ensures that downstream models have clean input rather than raw text.
Stage 2: Primary Extraction
A specialized extraction model reads the structured document and identifies candidate provisions matching the query. This stage is tuned for high recall: it would rather flag a borderline clause than miss a genuine one.
Stage 3: Validation
A separate validation model reviews each candidate extraction against the source text. It asks: does the extracted provision genuinely answer the original question? Is the attribution correct? Is the text complete? Candidates that fail validation are filtered out.
Stage 4: Confidence Scoring
Validated results receive a confidence score based on multiple signals: extraction clarity, provision specificity, source text quality, and cross-reference consistency. This score determines how the result is presented to the reviewing attorney.
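The four stages above can be sketched as a chain of functions, where each stage consumes the previous stage's output and validation acts as a filter before scoring. This is a minimal illustration, not Mage's implementation: the function names, the keyword-match extractor, the length-based validator, and the two scoring signals are all stand-in assumptions for what would be model-driven components in the real pipeline.

```python
# Illustrative four-stage pipeline: parse -> extract -> validate -> score.
# All names and heuristics here are assumptions, not Mage's actual API.

def parse(document: str) -> list[str]:
    """Stage 1: split raw text into structured sections (naive paragraph split)."""
    return [s.strip() for s in document.split("\n\n") if s.strip()]

def extract(sections: list[str], query: str) -> list[str]:
    """Stage 2: high-recall candidate selection (keyword match as a stand-in
    for a specialized extraction model)."""
    return [s for s in sections if query.lower() in s.lower()]

def validate(candidates: list[str]) -> list[str]:
    """Stage 3: filter candidates that cannot genuinely answer the query.
    A real validator would be a separate model; here, a length heuristic."""
    return [c for c in candidates if len(c.split()) >= 5]

def score(candidate: str) -> float:
    """Stage 4: combine illustrative signals into a 0-1 confidence score."""
    clarity = 1.0 if "shall" in candidate.lower() else 0.6
    specificity = min(len(candidate.split()) / 40, 1.0)
    return 0.5 * clarity + 0.5 * specificity

def run_pipeline(document: str, query: str) -> list[tuple[str, float]]:
    validated = validate(extract(parse(document), query))
    return [(c, score(c)) for c in validated]
```

The point of the structure is that each stage is a quality gate: a hallucinated or misattributed candidate from the high-recall extraction step never reaches scoring unless it survives validation.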
Source Verification
Every extraction in Mage links to the exact source text in the original document. This is not a summary or paraphrase; it is the verbatim text from the contract that the extraction was derived from, with page numbers and section references.
One-click verification
Click any cell in the extraction matrix to see the source passage highlighted in the original document. No searching, no scrolling. If the extraction does not match the source, you know immediately.
Cross-reference tracking
When a provision references another section, defined term, or exhibit, Mage follows those references and presents the full context. You see not just the clause, but everything it depends on.
Amendment awareness
If a provision has been amended, Mage surfaces both the original language and the amendment, flagging the discrepancy so you review the most current version.
Audit trail
Every extraction is logged with its source document, page reference, extraction timestamp, and confidence score. This audit trail supports the defensibility of your review process.
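An audit-trail entry of the kind described above might be shaped like the record below. The field names and schema are assumptions based on the signals listed (source document, page reference, timestamp, confidence score), not Mage's actual data model.

```python
# Illustrative audit-trail record; field names are assumptions drawn from
# the signals described in the text, not Mage's actual schema.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEntry:
    source_document: str
    page: int
    section: str
    extracted_text: str
    confidence: float
    timestamp: str  # ISO 8601, UTC

def log_extraction(document: str, page: int, section: str,
                   text: str, confidence: float) -> AuditEntry:
    """Create an immutable audit record for one extraction."""
    return AuditEntry(
        source_document=document,
        page=page,
        section=section,
        extracted_text=text,
        confidence=confidence,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

Making the record frozen and timestamped at creation is what gives the trail its defensibility: an entry cannot be silently altered after the fact.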
Confidence Scoring
Not all extractions are equally certain. A clearly stated change-of-control clause with explicit trigger language is a high-confidence extraction. A vague reference to "any transfer of interests" buried in a successors-and-assigns section is a lower-confidence match. Mage quantifies this distinction.
High confidence
The provision clearly and unambiguously matches the query. The source text is clean and the extraction is complete. These results appear at the top of the matrix and typically require no additional verification.
Medium confidence
The provision likely matches but the language is ambiguous, the source text quality is lower, or the provision is unusually structured. These results are worth reviewing but may not require action.
Low confidence
The system detected a possible match but is not confident. These results are grouped separately and presented as candidates for optional review. They are included for completeness rather than certainty.
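The three tiers above amount to banding a numeric score and grouping results into the review queue accordingly. A minimal sketch, assuming illustrative 0.85 and 0.60 thresholds (the actual cutoffs are not specified in this document):

```python
# Sketch of tiering a 0-1 confidence score into the three bands above.
# The 0.85 / 0.60 thresholds are illustrative assumptions, not Mage's values.

def tier(confidence: float) -> str:
    if confidence >= 0.85:
        return "high"
    if confidence >= 0.60:
        return "medium"
    return "low"

def group_results(results: list[tuple[str, float]]) -> dict[str, list[str]]:
    """Group (extraction, score) pairs into a tiered review queue,
    highest-confidence results first within each band."""
    queue: dict[str, list[str]] = {"high": [], "medium": [], "low": []}
    for text, score in sorted(results, key=lambda r: r[1], reverse=True):
        queue[tier(score)].append(text)
    return queue
```

This grouping is also what drives the tiered review queue described later: high-certainty findings surface first, low-certainty candidates last.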
Why this matters: Confidence scoring turns AI output from a binary (found / not found) into a spectrum. Attorneys can quickly review high-confidence results, selectively check medium-confidence results, and skip low-confidence results unless they have time for a thorough review.
How Mage Keeps You Organized
Tiered review queue
Results are organized by confidence level so you address high-certainty findings first and low-certainty candidates last.
Inline source links
Every extraction links to the exact passage in the original document, making verification immediate.
Exportable audit trail
Download a complete record of all extractions, confidence scores, and source references for your deal file.