
Research vs. Extraction: Two Paradigms for Contract Review Software

Mage Team | February 17, 2026 · 7 min read

Key Takeaways

  • The research paradigm (ask questions, get answers) excels at ad hoc analysis of individual documents but breaks down at scale: 50 questions across 300 contracts means 15,000 queries with compounding error rates
  • The extraction paradigm (structured data from every document) produces a complete analytical layer across the data room, enabling cross-contract comparison, pattern detection, and direct deliverable generation
  • Research output is narrative (paragraphs of text). Extraction output is structured (typed fields with values and sources). The difference determines whether the output feeds into deliverables directly or requires manual reformatting
  • Most deal teams need both paradigms for different phases of a transaction. Research for legal questions and strategy. Extraction for systematic contract review and deliverable generation

Contract review software is technology that assists attorneys in analyzing legal agreements, extracting relevant provisions, and identifying risks. The market has grown rapidly, but beneath the feature comparisons and vendor claims, there are two fundamentally different paradigms at work. Understanding which paradigm a tool uses tells you more about its real-world utility than any feature list.

The research paradigm treats document analysis as a question-answering problem: you ask questions, the tool searches your documents, and it generates answers. The extraction paradigm treats document analysis as a data structuring problem: the tool processes every document, extracts relevant provisions into a structured schema, and presents a complete analytical layer across your document set.

The Research Paradigm

Research-paradigm tools are built on a conversational model. You upload documents and ask questions in natural language. The tool retrieves relevant content from your documents and generates an answer.

Example interaction:

  • "Does Contract #47 contain a change of control provision?"
  • "What is the indemnification cap in the ABC Corp. MSA?"
  • "Are there any non-compete restrictions in the employment agreements?"

Each question produces a narrative answer, typically a paragraph summarizing the relevant finding with varying degrees of specificity.

Strengths of the research paradigm:

  • Flexible. You can ask any question about any document.
  • Intuitive. The conversational interface requires no training.
  • Fast for single questions. Getting one answer about one document is quick.
  • Good for ad hoc analysis. When you need to check a specific provision in a specific document, research tools are efficient.

Limitations for systematic contract review:

  • Scale. Systematic diligence requires extracting dozens of data points from each of hundreds of contracts. Running 50 queries per contract across 300 contracts means 15,000 individual query-response cycles. Each cycle has a small probability of hallucination or missed context. At 15,000 cycles, even a 2% error rate produces 300 errors.

  • Coverage gaps. Research tools find what you ask about. They do not surface provisions you did not think to query. If you do not ask about audit rights in vendor agreements, you will not learn that three vendors have unusual audit provisions. The value of diligence is comprehensiveness: finding what you do not know to look for.

  • Narrative output. Research tools produce paragraphs. Diligence deliverables require structured data: specific provision types with specific values and specific sources. Converting 15,000 narrative answers into structured memo entries requires significant manual work.

  • No cross-contract analysis. Each query-response cycle is independent. The tool answers one question about one (or several) documents at a time. Comparing indemnification caps across 200 customer agreements requires either a carefully crafted comparative query (which RAG pipelines handle poorly) or manual compilation from individual answers.
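The scale arithmetic in the first limitation can be checked in a few lines. The 2% per-query error rate is illustrative, not a measured benchmark:

```python
# Sketch of the scale arithmetic above; the error rate is an assumption.
contracts = 300
questions_per_contract = 50
error_rate = 0.02  # assumed probability of a bad answer per query

total_queries = contracts * questions_per_contract
expected_errors = total_queries * error_rate

# Probability that all 50 answers for a single contract are correct
clean_contract = (1 - error_rate) ** questions_per_contract

print(total_queries)           # 15000
print(round(expected_errors))  # 300
print(round(clean_contract, 2))  # 0.36: only about 1 in 3 contracts escapes error-free
```

The last figure is the compounding problem in miniature: even with a high per-query success rate, independent errors across many queries leave most contracts with at least one bad answer.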

The Extraction Paradigm

Extraction-paradigm tools are built on a data structuring model. You upload documents, and the tool processes every document through a structured extraction pipeline, producing a complete dataset of provisions across the entire document set.

Example output: A structured table showing every customer agreement with columns for: contract name, counterparty, effective date, term, auto-renewal, change of control, assignment restrictions, indemnification cap, basket type, survival period, termination triggers, and non-standard terms. Each cell contains the extracted value with a link to the source clause.
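The table described above can be modeled as a typed record in which every cell carries both an extracted value and its source citation. The field names below mirror the example columns and are illustrative, not a real product schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Extraction:
    """One extracted provision: its value plus a pointer to the source clause."""
    value: str
    source_clause: str  # e.g. "Section 9.2" of the agreement

@dataclass
class CustomerAgreementRow:
    """One row of the cross-contract table; fields mirror the example columns.
    Provisions default to None when absent from the agreement."""
    contract_name: str
    counterparty: str
    effective_date: Optional[Extraction] = None
    term: Optional[Extraction] = None
    auto_renewal: Optional[Extraction] = None
    change_of_control: Optional[Extraction] = None
    assignment_restrictions: Optional[Extraction] = None
    indemnification_cap: Optional[Extraction] = None
    basket_type: Optional[Extraction] = None
    survival_period: Optional[Extraction] = None
    termination_triggers: Optional[Extraction] = None
    non_standard_terms: Optional[Extraction] = None

# Illustrative row: each populated cell pairs the value with its citation.
row = CustomerAgreementRow(
    contract_name="Acme Corp. MSA",
    counterparty="Acme Corp.",
    indemnification_cap=Extraction("12 months of fees", "Section 9.2"),
)
```

Pairing every value with its source clause is what distinguishes extraction output from a narrative answer: the citation travels with the data into the deliverable.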

Strengths of the extraction paradigm:

  • Comprehensive. Every document is processed against the full extraction schema. No provisions are missed because no one asked about them. If the schema includes audit rights, every audit right across every agreement is extracted, whether or not the attorney thought to look for it.

  • Structured output. The output is a dataset, not a collection of paragraphs. Provision types, extracted values, and source citations are organized in a structured format that maps directly to diligence deliverables.

  • Cross-contract analysis. Because every provision is extracted into a structured schema, comparing any provision type across the entire document set is a table operation, not a research project. How many customer agreements have change of control clauses? Which employment agreements have non-competes exceeding 12 months? Where do indemnification caps deviate from the standard form? These questions are answered by the existing dataset.

  • Deliverable-ready. Structured extraction output maps directly to diligence memos, disclosure schedules, and exception lists. The data populates the deliverable template without manual reformatting.
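The cross-contract questions above reduce to filters over the extracted table. A minimal sketch, using invented sample rows as plain dictionaries:

```python
# Invented sample data standing in for the extracted dataset.
rows = [
    {"contract": "Acme MSA",  "change_of_control": True,  "noncompete_months": None, "indemnity_cap": "12 months fees"},
    {"contract": "Beta MSA",  "change_of_control": False, "noncompete_months": 24,   "indemnity_cap": "12 months fees"},
    {"contract": "Gamma MSA", "change_of_control": True,  "noncompete_months": 6,    "indemnity_cap": "uncapped"},
]

# How many agreements have change of control clauses?
coc_count = sum(1 for r in rows if r["change_of_control"])

# Which agreements have non-competes exceeding 12 months?
long_noncompetes = [r["contract"] for r in rows
                    if r["noncompete_months"] and r["noncompete_months"] > 12]

# Where do indemnification caps deviate from the standard form?
STANDARD_CAP = "12 months fees"
deviations = [r["contract"] for r in rows if r["indemnity_cap"] != STANDARD_CAP]

print(coc_count)         # 2
print(long_noncompetes)  # ['Beta MSA']
print(deviations)        # ['Gamma MSA']
```

Each question is a one-line filter against data that already exists, rather than a new round of retrieval and generation.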

Limitations:

  • Less flexible for ad hoc questions. If you need to know something outside the extraction schema, the tool may not be able to answer it.
  • Requires domain-specific design. The extraction schema must be built for the specific use case (e.g., M&A diligence). A generic extraction schema produces generic output.
  • Structured interfaces may feel less intuitive than a conversational interface for users accustomed to search-and-ask workflows.

Why the Paradigm Matters

The paradigm choice determines three downstream consequences:

1. Deliverable Quality

Research output requires manual assembly. An attorney using a research tool collects individual answers, organizes them by document and provision type, extracts specific data points from narrative paragraphs, and formats everything into the memo template. This assembly work takes significant time and introduces transcription errors.

Extraction output is pre-assembled. The structured data maps to deliverable fields directly. The attorney reviews and edits, not compiles and reformats.

2. Error Patterns

Research tools produce errors through hallucination and missed retrieval across individual queries. Each error is independent and unpredictable. Quality control requires checking every answer against the source document.

Extraction tools produce errors through boundary detection and parameter extraction failures. These errors are systematic and predictable: if the system struggles with a particular document format, the pattern is visible across multiple documents. Systematic errors are easier to identify and correct than random ones.

3. Total Time to Deliverable

For a 300-contract data room, the total time from upload to deliverable-quality output differs significantly:

Research paradigm: Processing time (1-2 hours of querying) + manual compilation (40-60 hours of organizing narrative answers into structured findings) + formatting (10-20 hours of populating deliverables). Total: roughly 50-80 hours of attorney time.

Extraction paradigm: Processing time (minutes) + review time (20-30 hours of reviewing structured findings against sources) + deliverable generation (2-3 hours of configuring and exporting). Total: roughly 25-35 hours of attorney time.

The extraction paradigm is not faster because it processes documents faster. It is faster because it eliminates the compilation and formatting steps that dominate the research-paradigm workflow.

When to Use Which

The paradigms are complementary, not competitive.

Use research for:

  • Legal research questions during deal strategy and negotiation
  • Ad hoc analysis of specific documents or provisions
  • Drafting assistance for memos and correspondence
  • Quick checks on individual document terms

Use extraction for:

  • Systematic contract review across large data rooms
  • Cross-contract comparison and pattern detection
  • Generating diligence memos, disclosure schedules, and exception lists
  • Comprehensive provision coverage, including terms no one thought to ask about

The evaluation framework for choosing between specific tools within each paradigm involves testing on real deal data, measuring output quality against existing work product, and assessing total time to deliverable. But the first decision, before evaluating any specific tool, is identifying which paradigm matches the problem you are solving.

For M&A deal teams, the contract review phase is an extraction problem. The legal research phase is a research problem. Using the right tool for each produces better results than forcing either paradigm to serve both purposes.


Frequently Asked Questions

What is the difference between research and extraction in contract review software?

Research-paradigm tools let you ask natural language questions about documents and get narrative answers. Extraction-paradigm tools systematically process every document to produce structured data: provision types, extracted values, and source citations organized by contract and category. Research gives you answers to specific questions. Extraction gives you a complete dataset of every relevant provision across every document, structured for cross-contract analysis and deliverable generation.

Which is better for M&A due diligence: research or extraction?

For systematic contract review in M&A, extraction is better because diligence requires comprehensive coverage across hundreds of contracts with structured output for deliverables. Research tools are better for ad hoc legal questions during the deal. Most deal teams benefit from both: extraction for the systematic contract review phase, and research for legal questions that arise during negotiation and strategy. The key is not choosing one over the other but using each for the phase it serves best.

Can research-based AI tools handle large-scale contract review?

Research tools can technically analyze large document sets, but the research paradigm creates practical problems at scale. Reviewing 300 contracts with 50 questions each means 15,000 individual queries, each with potential for hallucination, missed context, or incomplete retrieval. Error rates compound across queries. The output is 15,000 narrative answers that must be manually organized into structured findings. For systematic review at scale, extraction produces structured data more efficiently and reliably than repeated research queries.

What contract review software works best for law firms?

The best contract review software for law firms depends on the workflow. For M&A due diligence with large data rooms, extraction tools like Mage that produce structured, deliverable-ready output are most effective. For legal research, memo drafting, and ad hoc document analysis, research tools like Harvey serve different needs. For firms doing both transactional and advisory work, having specialized tools for each workflow outperforms using a single general-purpose tool for everything.

Tags: contract-review, comparison, legal-ai, extraction, research

Ready to transform your M&A due diligence?

See how Mage can help your legal team work faster and more accurately.

Request a Demo
