Validation Layer: How Mage Ensures Precision
The hardest problem in AI contract review is not extraction. It is knowing whether the extraction is correct. At Mage, we built a dedicated validation layer that sits between raw AI output and the results you see, ensuring that what reaches your review queue is trustworthy.
The Challenge
Large language models are powerful at understanding contract language, but they are not infallible. A model might extract a clause that looks like a change-of-control provision but is actually a standard successors-and-assigns clause. It might identify the right provision but misattribute it to the wrong party. Or it might hallucinate a provision that does not exist in the document at all.
In legal work, these errors are not acceptable. An attorney who relies on an incorrect extraction could miss a critical risk or report a finding to a client that does not hold up under scrutiny. The standard for legal AI is not "usually right" but "reliably right, with clear indicators when it is uncertain."
Why single-pass extraction is not enough
A single model pass might achieve 90% accuracy on contract extraction. That sounds high until you realize that 10% of 300 contracts means 30 incorrect results mixed into your review queue with no way to distinguish them from correct ones. At deal speed, that is a significant reliability problem.
Our Approach: Multi-Stage Validation
Instead of relying on a single extraction pass, Mage uses a pipeline of specialized stages. Each stage has a distinct responsibility, and each acts as a quality gate for the next. The result is a system where errors from one stage are caught by the next.
Stage 1: Document Parsing
The document is parsed into structured sections. Tables, headings, defined terms, and cross-references are identified and preserved. This structured representation ensures that downstream models have clean input rather than raw text.
Stage 2: Primary Extraction
A specialized extraction model reads the structured document and identifies candidate provisions matching the query. This stage is tuned for high recall: it would rather flag a borderline clause than miss a genuine one.
Stage 3: Validation
A separate validation model reviews each candidate extraction against the source text. It asks: does the extracted provision genuinely answer the original question? Is the attribution correct? Is the text complete? Candidates that fail validation are filtered out.
Stage 4: Confidence Scoring
Validated results receive a confidence score based on multiple signals: extraction clarity, provision specificity, source text quality, and cross-reference consistency. This score determines how the result is presented to the reviewing attorney.
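The four stages above can be sketched as a chain of functions, where each stage consumes the previous stage's output and validation acts as a filter before scoring. This is a minimal illustration, not Mage's implementation: the function names, the keyword-match extractor, the length-based validator, and the two scoring signals are all stand-in assumptions for what would be model-driven components in the real pipeline.

```python
# Illustrative four-stage pipeline: parse -> extract -> validate -> score.
# All names and heuristics here are assumptions, not Mage's actual API.

def parse(document: str) -> list[str]:
    """Stage 1: split raw text into structured sections (naive paragraph split)."""
    return [s.strip() for s in document.split("\n\n") if s.strip()]

def extract(sections: list[str], query: str) -> list[str]:
    """Stage 2: high-recall candidate selection (keyword match as a stand-in
    for a specialized extraction model)."""
    return [s for s in sections if query.lower() in s.lower()]

def validate(candidates: list[str]) -> list[str]:
    """Stage 3: filter candidates that cannot genuinely answer the query.
    A real validator would be a separate model; here, a length heuristic."""
    return [c for c in candidates if len(c.split()) >= 5]

def score(candidate: str) -> float:
    """Stage 4: combine illustrative signals into a 0-1 confidence score."""
    clarity = 1.0 if "shall" in candidate.lower() else 0.6
    specificity = min(len(candidate.split()) / 40, 1.0)
    return 0.5 * clarity + 0.5 * specificity

def run_pipeline(document: str, query: str) -> list[tuple[str, float]]:
    validated = validate(extract(parse(document), query))
    return [(c, score(c)) for c in validated]
```

The point of the structure is that each stage is a quality gate: a hallucinated or misattributed candidate from the high-recall extraction step never reaches scoring unless it survives validation.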
Source Verification
Every extraction in Mage links to the exact source text in the original document. This is not a summary or paraphrase; it is the verbatim text from the contract that the extraction was derived from, with page numbers and section references.
One-click verification
Click any cell in the extraction matrix to see the source passage highlighted in the original document. No searching, no scrolling. If the extraction does not match the source, you know immediately.
Cross-reference tracking
When a provision references another section, defined term, or exhibit, Mage follows those references and presents the full context. You see not just the clause, but everything it depends on.
Amendment awareness
If a provision has been amended, Mage surfaces both the original language and the amendment, flagging the discrepancy so you review the most current version.
Audit trail
Every extraction is logged with its source document, page reference, extraction timestamp, and confidence score. This audit trail supports the defensibility of your review process.
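An audit-trail entry of the kind described above might be shaped like the record below. The field names and schema are assumptions based on the signals listed (source document, page reference, timestamp, confidence score), not Mage's actual data model.

```python
# Illustrative audit-trail record; field names are assumptions drawn from
# the signals described in the text, not Mage's actual schema.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEntry:
    source_document: str
    page: int
    section: str
    extracted_text: str
    confidence: float
    timestamp: str  # ISO 8601, UTC

def log_extraction(document: str, page: int, section: str,
                   text: str, confidence: float) -> AuditEntry:
    """Create an immutable audit record for one extraction."""
    return AuditEntry(
        source_document=document,
        page=page,
        section=section,
        extracted_text=text,
        confidence=confidence,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
```

Making the record frozen and timestamped at creation is what gives the trail its defensibility: an entry cannot be silently altered after the fact.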
Confidence Scoring
Not all extractions are equally certain. A clearly stated change-of-control clause with explicit trigger language is a high-confidence extraction. A vague reference to "any transfer of interests" buried in a successors-and-assigns section is a lower-confidence match. Mage quantifies this distinction.
High confidence
The provision clearly and unambiguously matches the query. The source text is clean and the extraction is complete. These results appear at the top of the matrix and typically require no additional verification.
Medium confidence
The provision likely matches but the language is ambiguous, the source text quality is lower, or the provision is unusually structured. These results are worth reviewing but may not require action.
Low confidence
The system detected a possible match but is not confident. These results are grouped separately and presented as candidates for optional review. They are included for completeness rather than certainty.
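The three tiers above amount to banding a numeric score and grouping results into the review queue accordingly. A minimal sketch, assuming illustrative 0.85 and 0.60 thresholds (the actual cutoffs are not specified in this document):

```python
# Sketch of tiering a 0-1 confidence score into the three bands above.
# The 0.85 / 0.60 thresholds are illustrative assumptions, not Mage's values.

def tier(confidence: float) -> str:
    if confidence >= 0.85:
        return "high"
    if confidence >= 0.60:
        return "medium"
    return "low"

def group_results(results: list[tuple[str, float]]) -> dict[str, list[str]]:
    """Group (extraction, score) pairs into a tiered review queue,
    highest-confidence results first within each band."""
    queue: dict[str, list[str]] = {"high": [], "medium": [], "low": []}
    for text, score in sorted(results, key=lambda r: r[1], reverse=True):
        queue[tier(score)].append(text)
    return queue
```

This grouping is also what drives the tiered review queue described later: high-certainty findings surface first, low-certainty candidates last.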
Why this matters: Confidence scoring turns AI output from a binary (found / not found) into a spectrum. Attorneys can quickly review high-confidence results, selectively check medium-confidence results, and skip low-confidence results unless they have time for a thorough review.
How Mage Keeps You Organized
Tiered review queue
Results are organized by confidence level so you address high-certainty findings first and low-certainty candidates last.
Inline source links
Every extraction links to the exact passage in the original document, making verification immediate.
Exportable audit trail
Download a complete record of all extractions, confidence scores, and source references for your deal file.