
Why We Built a Legal Document Classifier First

Raffi Isanians, CEO & Co-founder
February 17, 2026 · 7 min read

Key Takeaways

  • Document classification is the foundation of every downstream analysis: the provisions you extract, the risks you flag, and the benchmarks you compare against all depend on correctly identifying what type of document you are looking at
  • A single extraction template applied uniformly across all document types produces noise because the material provisions in an employment agreement are fundamentally different from those in a supply agreement, a lease, or an IP license
  • Classification accuracy compounds through the analysis pipeline: a misclassified document receives the wrong extraction template entirely, so a 5% classification error rate means 5% of documents get fundamentally wrong analysis on top of whatever errors extraction makes on the rest
  • Mage classifies documents into 30+ M&A-specific categories because granular classification drives more precise extraction, and the categories map directly to how deal attorneys organize their diligence workstreams

Legal document classification is the process of automatically identifying the type of a legal document (for example, distinguishing an employment agreement from a customer contract, a lease, or an IP assignment) so that downstream analysis can be tailored to the specific document category. At Mage, it was the first capability we built, not because it is the most impressive technology, but because everything else depends on it.

The Problem Classification Solves

Imagine receiving a data room with 500 documents. Your job is to extract the key provisions from every agreement, flag risks, and organize findings for the deal team.

The first question you face is not "what provisions should I extract?" It is "what kind of document am I looking at?"

This question determines everything. The material provisions in an employment agreement (compensation, non-compete, termination triggers, change-of-control benefits) are completely different from the material provisions in a supply agreement (pricing, volume commitments, force majeure, change-of-control termination). A lease has different material provisions than either. An IP assignment has different provisions still.

If you apply the same extraction template to every document, regardless of type, you get one of two outcomes:

If the template is narrow, it misses provisions that are material in some document types but not included in the template. An extraction template focused on indemnification and limitation of liability will miss compensation structures in employment agreements and pricing mechanisms in supply contracts.

If the template is broad, it generates noise. Searching every document for every possible provision produces results that are mostly empty fields or irrelevant findings. The signal-to-noise ratio makes the output less useful, not more.

Classification solves this by routing each document to the extraction template designed for its specific type. An employment agreement gets the employment extraction template. A supply agreement gets the supply extraction template. Each document receives precisely the analysis that matters for its category.
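The routing step can be sketched as a lookup from document type to extraction template. This is a minimal illustration, not Mage's implementation: the `EXTRACTION_TEMPLATES` table, the `Document` class, the type labels, and the `route` function are hypothetical, and the provision names are taken from the examples in this article. The classifier that produces `doc_type` is assumed to run upstream.

```python
from dataclasses import dataclass

# Hypothetical extraction templates keyed by document type. Provision names
# follow the examples in this article; they are not Mage's actual templates.
EXTRACTION_TEMPLATES: dict[str, list[str]] = {
    "employment_agreement": [
        "compensation",
        "non_compete",
        "termination_triggers",
        "change_of_control_benefits",
    ],
    "supply_agreement": [
        "pricing",
        "volume_commitments",
        "force_majeure",
        "change_of_control_termination",
    ],
    "lease": ["premises", "rent_schedule", "term"],
}


@dataclass
class Document:
    name: str
    doc_type: str  # label produced by an upstream classifier (assumed)


def route(doc: Document) -> list[str]:
    """Return the provisions to extract for an already-classified document."""
    if doc.doc_type not in EXTRACTION_TEMPLATES:
        # Surface unknown types for human review instead of guessing a template.
        raise ValueError(f"no extraction template for type {doc.doc_type!r}")
    # Return a copy so callers cannot mutate the shared template.
    return list(EXTRACTION_TEMPLATES[doc.doc_type])
```

For example, `route(Document("Smith Employment Agreement.pdf", "employment_agreement"))` returns only the four employment provisions, never pricing or volume commitments. Raising on an unknown type, rather than falling back to a catch-all template, is a deliberate choice in this sketch: a catch-all template would reintroduce the narrow-or-broad problem described above.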

Why Classification First

When we started building Mage, we faced a sequencing decision. We could build extraction first and add classification later. Or we could build classification first and then build extraction on top of it.

We chose classification first for a structural reason: classification errors compound through the pipeline.

The Compounding Problem

Consider a classification system with 95% accuracy. That sounds good. But what does it mean for the deal team?

In a data room with 500 documents, a 5% error rate means 25 documents are misclassified. Those 25 documents receive the wrong extraction template. The wrong provisions are extracted. The wrong risk flags are applied. The wrong benchmarks are used for comparison.

The extraction system might be 95% accurate on correctly classified documents. But on misclassified documents, the extraction accuracy effectively drops to near zero for the provisions that matter, because the system is looking for the wrong things.

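On average, the arithmetic still works out to roughly 95% x 95% = 90.25%. But that average hides how the errors are distributed. Extraction errors on correctly classified documents are scattered: a missed field here, a misread date there. Classification errors are concentrated: 25 documents receive fundamentally wrong analysis from start to finish. In a diligence context, those 25 misclassified documents could include material contracts where the deal team is relying on the system's analysis.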

This is why we invested heavily in classification accuracy before building extraction. A 99%+ classification rate means the extraction system operates on a correctly classified foundation for virtually every document. The improvement in classification accuracy has an outsized effect on overall system reliability.
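The arithmetic behind this argument can be made explicit by splitting expected errors by where they land. The sketch below is illustrative, and its central assumption is a simplification: every field on a misclassified document is counted as wrong, matching the "near zero" description above. The `estimate` function and `PipelineEstimate` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PipelineEstimate:
    misclassified_docs: float       # expected documents given the wrong template
    scattered_error_rate: float     # share of fields missed on correctly classified docs
    concentrated_error_rate: float  # share of fields lost to misclassification


def estimate(n_docs: int, cls_acc: float, ext_acc: float) -> PipelineEstimate:
    """Expected error split for a classify-then-extract pipeline.

    Simplifying assumption: every field on a misclassified document is wrong,
    because the wrong extraction template was applied.
    """
    return PipelineEstimate(
        misclassified_docs=n_docs * (1 - cls_acc),
        # Extraction misses, spread thinly across the correctly classified documents.
        scattered_error_rate=cls_acc * (1 - ext_acc),
        # Whole-document failures, packed into the few misclassified documents.
        concentrated_error_rate=1 - cls_acc,
    )


if __name__ == "__main__":
    for cls_acc in (0.95, 0.99):
        e = estimate(500, cls_acc, 0.95)
        print(cls_acc, round(e.misclassified_docs),
              round(e.scattered_error_rate, 4), round(e.concentrated_error_rate, 4))
```

At 95% classification accuracy, `estimate(500, 0.95, 0.95)` puts about 25 documents on the wrong template, and misclassification (0.05 of all fields) loses slightly more fields than extraction misses do (0.0475), all packed into those 25 documents. At 99%, the misclassified count falls to about 5 and the concentrated share to 0.01, while the scattered share barely moves.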

Classification as Architecture

Classification is not just a feature. It is an architectural decision that shapes the entire system.

When classification is accurate, extraction models can be specialized. Instead of building a single extraction model that handles every document type (and makes trade-offs that reduce accuracy on each type), we build extraction models optimized for specific document categories. The employment extraction model is trained on employment agreements. The customer agreement extraction model is trained on customer contracts. Each model can specialize deeply because it knows exactly what type of document it is processing.

This is the same principle behind model fusion: specialized models outperform general-purpose ones when the task allows specialization. Classification is what enables that specialization for extraction.

How Classification Works

Signal Layers

Document classification uses multiple signal layers, not just the text content.

Structural signals. Employment agreements have a different section structure than supply contracts. Leases have specific structural patterns (premises description, rent schedule, term provisions) that distinguish them from other contract types. The system analyzes document structure independently from content.

Language patterns. Each document type has characteristic vocabulary and phrasing. Employment agreements reference "at-will employment," "base salary," and "severance." Supply agreements reference "purchase orders," "delivery schedules," and "acceptance testing." These patterns are statistical rather than keyword matches: the system learns how language is distributed across document types.

Provision signatures. The combination of provisions present in a document is itself a classification signal. A document with indemnification, change-of-control, and assignment provisions could be many types. A document with those provisions plus compensation, non-compete, and COBRA references is almost certainly an employment agreement.

Party relationship signals. The way parties are described (employer/employee, licensor/licensee, landlord/tenant, buyer/seller) provides strong classification evidence.
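Combining signal layers can be illustrated with simple late fusion: score each category within each layer, take a weighted sum, and pick the highest-scoring category. This toy sketch is not how Mage's models work. The hand-written cue sets stand in for learned statistical models, only the party-relationship and provision-signature layers are modeled (structure and language are omitted), and the weights, labels, and function names are assumptions.

```python
# Hypothetical cue sets standing in for learned per-layer models. Labels and
# cues are drawn from the examples in this article; real models are statistical.
PARTY_CUES: dict[str, set[str]] = {
    "employment_agreement": {"employer", "employee"},
    "ip_license": {"licensor", "licensee"},
    "lease": {"landlord", "tenant"},
    "supply_agreement": {"buyer", "seller"},
}
PROVISION_CUES: dict[str, set[str]] = {
    "employment_agreement": {"compensation", "non_compete", "cobra"},
    "lease": {"premises", "rent_schedule"},
}


def layer_scores(observed: set[str], cues: dict[str, set[str]]) -> dict[str, float]:
    """Score each category by the fraction of its cues present in this layer."""
    return {cat: len(observed & cat_cues) / len(cat_cues) for cat, cat_cues in cues.items()}


def classify(parties: set[str], provisions: set[str],
             weights: tuple[float, float] = (0.5, 0.5)) -> str:
    """Late fusion: weighted sum of per-layer scores, then take the best category."""
    layers = [layer_scores(parties, PARTY_CUES), layer_scores(provisions, PROVISION_CUES)]
    categories = sorted(set(PARTY_CUES) | set(PROVISION_CUES))
    fused = {
        cat: sum(w * layer.get(cat, 0.0) for w, layer in zip(weights, layers))
        for cat in categories
    }
    # Ties resolve to the alphabetically first category (sorted order above).
    return max(categories, key=lambda cat: fused[cat])
```

Shared provisions such as indemnification, change of control, and assignment appear in no cue set, so they add nothing to any score, mirroring the point above that they fit many document types. When nothing matches, every category scores zero and the alphabetical tie-break decides; a production system would need a confidence threshold instead.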

Granular Categories

Mage classifies documents into more than 30 categories, not just the broad types. Within "contracts," we distinguish customer agreements, supply agreements, distribution agreements, consulting agreements, intercompany agreements, and others. Within "employment," we distinguish executive employment agreements, offer letters, independent contractor agreements, and separation agreements.

This granularity matters because the extraction requirements differ at each level. An executive employment agreement requires extraction of change-of-control benefits and golden parachute provisions that a standard offer letter does not contain. A distribution agreement requires extraction of territory exclusivity and minimum sales requirements that a general customer agreement does not.
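One way to make granular categories drive extraction is a two-level taxonomy in which each fine-grained category inherits its parent's base template and adds its own provisions. This is a hypothetical sketch: the `PARENT`, `BASE_TEMPLATE`, and `EXTRA_TEMPLATE` tables, the category labels, and the `template_for` function are illustrative, and only the category-specific provisions (change-of-control benefits, golden parachutes, territory exclusivity, minimum sales requirements) come from the article.

```python
# Hypothetical two-level taxonomy: fine-grained category -> broad parent.
PARENT: dict[str, str] = {
    "executive_employment_agreement": "employment",
    "offer_letter": "employment",
    "customer_agreement": "contracts",
    "distribution_agreement": "contracts",
}

# Illustrative provisions extracted for every document under a broad parent.
BASE_TEMPLATE: dict[str, list[str]] = {
    "employment": ["compensation", "termination_triggers"],
    "contracts": ["term", "assignment", "change_of_control"],
}

# Provisions only the fine-grained category needs (examples from the article).
EXTRA_TEMPLATE: dict[str, list[str]] = {
    "executive_employment_agreement": ["change_of_control_benefits", "golden_parachute"],
    "distribution_agreement": ["territory_exclusivity", "minimum_sales_requirements"],
}


def template_for(category: str) -> list[str]:
    """Compose the parent's base template with the category-specific additions."""
    return BASE_TEMPLATE[PARENT[category]] + EXTRA_TEMPLATE.get(category, [])
```

With this structure, an offer letter gets only the shared employment provisions, while an executive employment agreement also gets change-of-control benefits and golden parachute provisions; adding a new fine-grained category means adding one parent link and, optionally, its extras.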

The categories are designed to map directly to how deal attorneys organize their due diligence workstreams. When the classification output aligns with the diligence structure, the deal team can immediately allocate documents to the right reviewers without an intermediate sorting step.

What Classification Enables

Intelligent Extraction

With accurate classification, each document receives an extraction template calibrated to its type. The system knows which provisions matter for each category, which provisions are standard versus non-standard, and what format the extracted output should take.

For clause extraction, this means the system does not waste compute or generate noise searching for provisions that do not apply. It focuses on exactly the provisions that the deal team needs for each document type.

Risk Calibration

Risk assessment depends on context. An uncapped indemnification obligation in a $50,000 consulting agreement has different risk significance than one in a $50 million customer contract. A non-compete with a two-year duration is standard in an employment agreement and unusual in a vendor agreement.

Classification provides the context for calibrated risk assessment. The system evaluates each provision against the norms for that specific document type, producing risk flags that reflect whether terms are actually unusual, not just whether they exist.
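Type-aware risk flagging can be sketched as a comparison against per-type norms. The thresholds below are assumptions for illustration: the article says only that a two-year non-compete is standard in employment agreements and unusual in vendor agreements, so `NON_COMPETE_NORM_MONTHS` encodes that and nothing more. The table and the `flag_non_compete` function are hypothetical.

```python
# Illustrative norms for non-compete duration, in months. The article says only
# that two years is standard in employment agreements and unusual in vendor
# agreements; these exact thresholds are assumptions.
NON_COMPETE_NORM_MONTHS: dict[str, int] = {
    "employment_agreement": 24,
    "vendor_agreement": 0,  # treat any vendor non-compete as unusual
}


def flag_non_compete(doc_type: str, duration_months: int) -> bool:
    """Return True if the non-compete is longer than the norm for this document type."""
    norm = NON_COMPETE_NORM_MONTHS.get(doc_type)
    if norm is None:
        # Without a calibrated norm, refuse to guess rather than emit a misleading flag.
        raise ValueError(f"no calibrated norm for type {doc_type!r}")
    return duration_months > norm
```

The same 24-month clause produces opposite results depending solely on the classified type: `flag_non_compete("employment_agreement", 24)` is not flagged, while `flag_non_compete("vendor_agreement", 24)` is.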

Cross-Document Analysis

When documents are accurately classified, cross-document analysis becomes meaningful. The deal team can compare all customer agreements to identify the most favorable and least favorable terms across the portfolio. They can compare all employment agreements to identify inconsistencies in non-compete provisions. They can assess whether the target's lease terms are consistent across locations.

These comparisons only work when the comparison set is correctly defined. Comparing an employment agreement against customer agreement norms produces meaningless results. Classification ensures that comparisons happen within the right categories.
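Within-category comparison can be sketched by grouping documents on their classified type before summarizing a provision. The document shape (a dict with `name`, `type`, and `terms` keys) and the `compare_within_category` function are hypothetical; the point of the sketch is that grouping happens before any comparison, so an employment agreement is never measured against customer agreement norms.

```python
from collections import defaultdict


def compare_within_category(docs: list[dict], provision: str) -> dict[str, dict]:
    """Group documents by classified type, then summarize one provision per group."""
    groups: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for doc in docs:
        if provision in doc["terms"]:
            groups[doc["type"]].append((doc["name"], doc["terms"][provision]))

    summary: dict[str, dict] = {}
    for doc_type, values in groups.items():
        summary[doc_type] = {
            # (document name, value) pairs at each end of the range.
            "min": min(values, key=lambda pair: pair[1]),
            "max": max(values, key=lambda pair: pair[1]),
            # A single distinct value means the terms are consistent across the group.
            "consistent": len({value for _, value in values}) == 1,
        }
    return summary
```

Given two employment agreements with 24- and 12-month non-competes, the employment group reports the range and `consistent: False`, which is exactly the kind of inconsistency a deal team would want surfaced.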

Work Allocation

A partner managing a diligence team needs to allocate documents to the right reviewers. The corporate associate reviews the organizational documents. The employment associate reviews the employment agreements. The real property associate reviews the leases. The IP associate reviews the IP agreements.

When the system classifies documents accurately at upload, work allocation can begin immediately. The deal team does not spend hours sorting documents into categories before substantive review can start.
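Work allocation then reduces to mapping each classified type to a workstream queue. The type labels, the `WORKSTREAM` table, and the `allocate` function below are hypothetical, following the reviewer split described in this section; the `triage` queue for unmapped types is an assumption of the sketch, not a described Mage feature.

```python
from collections import defaultdict

# Hypothetical mapping from classified document type to diligence workstream,
# following the reviewer split described above.
WORKSTREAM: dict[str, str] = {
    "certificate_of_incorporation": "corporate",
    "bylaws": "corporate",
    "employment_agreement": "employment",
    "offer_letter": "employment",
    "real_property_lease": "real_property",
    "ip_assignment": "ip",
    "ip_license": "ip",
}


def allocate(classified: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Bucket (document name, document type) pairs into reviewer queues."""
    queues: dict[str, list[str]] = defaultdict(list)
    for name, doc_type in classified:
        # Types without a mapping go to a triage queue instead of being dropped.
        queues[WORKSTREAM.get(doc_type, "triage")].append(name)
    return dict(queues)
```

Because the queues are built directly from classification output, each associate's list exists as soon as upload finishes; the only manual step left is the triage queue.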

The Lesson

The temptation in building AI products is to start with the most impressive capability. For legal AI, that would be extraction or risk analysis: the technology that attorneys see and evaluate.

We started with classification because it is the prerequisite for everything else. An extraction system built on inaccurate classification will always underperform. A risk assessment system that does not know what type of document it is evaluating will always miscalibrate. A comparison system that groups the wrong documents together will always produce misleading results.

Classification is not glamorous. But it is the foundation. And getting the foundation right is what determines whether everything built on top of it is reliable enough for attorneys to trust on live deals.


Frequently Asked Questions

Why does document classification matter for legal AI?

Document classification determines everything that follows in legal AI analysis. The provisions you extract from an employment agreement are different from those you extract from a supply contract or a lease. The risk flags for a non-compete clause mean something different in an employment context than in a joint venture context. The benchmarks you compare against depend on the document category. Without accurate classification, extraction produces noise, risk flags are miscalibrated, and cross-document analysis is unreliable.

How does Mage classify legal documents?

Mage uses specialized classification models trained on M&A data room documents to identify 30+ document categories including customer agreements, employment agreements, IP assignments, real property leases, equity documents, NDAs, and more. The classification considers document structure, language patterns, provision types, and party relationships. Classification happens automatically at upload, before any extraction begins, ensuring every document receives the right analysis template from the start.

What happens if a document is misclassified?

Misclassification cascades through the analysis pipeline. A supply agreement misclassified as a customer agreement would be searched for customer-specific provisions like SLA terms and data processing obligations rather than supply-specific provisions like take-or-pay commitments and force majeure. The extraction would miss material terms and flag irrelevant ones. This is why classification accuracy matters far more than its apparent simplicity suggests: errors compound rather than average out.

How many document categories does Mage support?

Mage classifies documents into more than 30 M&A-specific categories, ranging from standard contract types like customer agreements, employment agreements, and real property leases to specialized categories like equity incentive plans, intercompany agreements, and regulatory filings. These categories map directly to how deal attorneys organize their diligence workstreams, making the classified output immediately useful for work allocation and reporting without requiring additional sorting or organization.
