
How to Roll Out Legal AI at a Law Firm

Raffi Isanians, CEO & Co-founder · 13 min read

Key Takeaways

  • A 90-day rollout, structured around two real deals, beats a six-month committee process every time.
  • The deal team sets the bar, not the innovation officer. The partner who closed the most deals last year decides whether the tool stays.
  • Junior associate buy-in is load-bearing. The pitch is leverage, not headcount reduction.
  • Trust posture (SOC 2, no-training, isolated infrastructure) gets locked down in week one or the rollout dies in legal review.
  • Measure what changes, not what's promised. Time-to-memo, accuracy against ground truth, output rewrite percentage.

This is a practitioner's playbook for deploying legal AI inside an M&A practice. Written from the inside: what works, what reliably fails, and what to insist on regardless of which vendor you pick.

I (Raffi) ran M&A at Kirkland and have now spent two years on the building side at Mage. Both vantage points say the same thing: legal AI rollouts succeed or fail on the people side, not the tooling side. Pick a partner-sponsor who actually closes deals, run two real deals through the candidate tool, and decide in 90 days. Six-month evaluations produce six-month evaluation results.

The shape of the playbook below is the shape we use when we sit on the customer side of a Mage deployment. The order matters. Skipping a step is what produces the failure modes you're trying to avoid.

Day 0: frame the buy

Before the first vendor call, the firm should be clear internally on three things.

Scope. What workstreams do you want the tool to own? End-to-end M&A diligence is a different scope than firm-wide assistance. Many firms run both — a generalist (Harvey, Legora, similar) plus a specialist for M&A specifically. Be explicit about which slot you're filling. The shortlist is different.

Volume. How many deals per year, what mix, what size? A firm doing 30 mid-market deals a year wants different pricing and different customization than a firm doing 3 mega-deals.

Quality bar. What does "partner-grade output" mean in this firm? "Memo we'd send to a sophisticated client without rewriting" is one bar; "issues list we'd discuss internally before drafting" is another. Both are valid; the tools are different.

We covered the framing in more detail in Evaluating Legal AI Tools: A Buyer's Guide for M&A Counsel. The TL;DR is that vendor demos are designed to win and real deals are designed to ship; the only meaningful evaluation runs real deals.

Days 1-7: lock down trust posture in parallel

The single biggest cause of delayed legal-AI rollouts is GC review of the vendor's data-handling posture started too late. Most firms run procurement sequentially: pick a vendor, then start security review. That sequencing alone adds two to four months.

The fix: run both in parallel. On day one, send your top three candidate vendors' SOC 2 Type II reports, DPAs, and a written security questionnaire to GC and privacy review. By the time you're picking between the finalists in week three, the trust review is done and you're not waiting on it.

The minimum bar (we document Mage's own posture in detail on our security page):

  • SOC 2 Type II report, available on request
  • Written no-training-on-customer-data clause in the DPA, with penalty
  • Minimum-required retention, days not years
  • Single-tenant or strongly isolated infrastructure
  • AES-256 at rest, TLS 1.3 in transit
  • MFA, SSO (Okta/Azure AD/Google), SAML 2.0
  • Audit logging
  • Documented incident response with notification timelines

A vendor that can't answer these in week one isn't ready for production deployment. Scope down or reconsider.
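One way to keep the parallel review honest is to track the bar as structured data rather than an email thread, so the week-three gate is a pass/fail read-off. A minimal sketch, assuming hypothetical vendors and answers; this is a tracking convention, not a standard, and the names are placeholders:

```python
# Sketch: week-one trust gate across candidate vendors.
# Vendor names and answers are hypothetical placeholders.

MINIMUM_BAR = [
    "soc2_type2_report", "no_training_dpa_clause", "retention_in_days",
    "isolated_infrastructure", "aes256_at_rest_tls13_in_transit",
    "mfa_sso_saml", "audit_logging", "incident_response_timelines",
]

answers = {
    "vendor_a": dict.fromkeys(MINIMUM_BAR, True),
    "vendor_b": {**dict.fromkeys(MINIMUM_BAR, True),
                 "no_training_dpa_clause": False},  # hard fail
}

for vendor, sheet in answers.items():
    gaps = [item for item in MINIMUM_BAR if not sheet.get(item)]
    print(vendor, "->", "ready for pilot" if not gaps else f"gaps: {', '.join(gaps)}")
```

The point is that "not ready" arrives with a named gap, which is exactly what the scope-down conversation needs.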

Days 7-21: pilot setup

By the end of week three, the decisions are made and the pilot is configured.

Partner-sponsor identified. A respected M&A partner who agrees to sponsor the pilot. This person makes the buy/no-buy call. The innovation officer or COO is the project owner; the partner is the bar-setter.

Two pilot deals committed. One buy-side, one sell-side, ideally different complexity profiles. Real deals, not historical reruns. (Historical reruns are useful for accuracy comparison but tell you nothing about workflow fit.)

Senior associate as daily user. This person spends most of the pilot inside the tool. They evaluate workflow fit, output quality, and whether the team would adopt unprompted. Don't let an innovation officer or IT person be the daily user — they aren't representative of the actual user base.

Success metrics in writing. Before any deal data touches the platform:

  • Time-to-issues-list: target 50%+ reduction
  • Time-to-memo: target 50%+ reduction
  • Recall vs. manual ground truth: target ≥ associate baseline
  • Precision: target ≥70% (so the partner doesn't have to re-read everything)
  • Memo rewrite percentage: target <30%
  • Subjective: would the team adopt this tool unprompted?

If the firm can't agree on these targets in advance, the rollout will fail at the decision gate. Lock them in.
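The accuracy and rewrite targets above reduce to standard precision/recall arithmetic against the manual ground truth, so they can be computed rather than argued about. A minimal sketch of that scoring, assuming a simple set-of-issues representation; the function and issue names are illustrative, not any vendor's API:

```python
# Sketch: scoring one pilot deal against the pre-agreed targets.
# "Ground truth" is the issues list from the parallel manual review.
# All names below are illustrative, not a real vendor API.

def score_pilot(tool_issues: set[str], ground_truth: set[str],
                words_rewritten: int, words_total: int) -> dict[str, float]:
    hits = len(tool_issues & ground_truth)
    recall = hits / len(ground_truth) if ground_truth else 0.0   # target: >= associate baseline
    precision = hits / len(tool_issues) if tool_issues else 0.0  # target: >= 0.70
    rewrite_pct = 100.0 * words_rewritten / words_total          # target: < 30
    return {"recall": recall, "precision": precision, "rewrite_pct": rewrite_pct}

# Hypothetical deal one: tool flagged 40 issues; 32 overlap the 38-issue manual list.
print(score_pilot(
    tool_issues={f"issue-{i}" for i in range(40)},
    ground_truth={f"issue-{i}" for i in range(8, 46)},
    words_rewritten=1_100, words_total=5_000,
))
# -> recall 0.84, precision 0.80, rewrite_pct 22.0: clears the bar above.
```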

Days 21-42: pilot deal one

The first real deal runs through the tool in parallel with the manual workflow. The associate uses the tool; the partner reviews both outputs.

The team should expect surprises. The first deal is when the gap between vendor demo and reality becomes visible. Common patterns:

  • The tool is faster but less accurate on the firm's specific contract patterns. Resolution: configurable risk lists, partner-tuned thresholds.
  • The tool's output voice doesn't match the firm's house style. Resolution: customizable templates, firm-branded deliverables.
  • The tool surfaces too many low-severity issues. Resolution: severity tuning per deal, or escalation-only mode (sketched below).
  • The tool misses an obvious issue everyone expected it to catch. Resolution: investigate why, document, discuss with vendor.

A serious vendor responds to deal-one feedback within days, not weeks. Watch the response cadence; it predicts post-deployment support.
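On the third pattern: in practice, "severity tuning per deal" and "escalation-only mode" usually reduce to a per-deal threshold over whatever severity scale the tool emits. A hypothetical sketch of that filter; the 1-5 scale and the field names are our assumptions, not any specific vendor's schema:

```python
from dataclasses import dataclass

@dataclass
class Issue:
    contract: str
    description: str
    severity: int  # assumption: 1 = informational ... 5 = deal-blocking

def surfaced(issues: list[Issue], min_severity: int) -> list[Issue]:
    """Keep only issues at or above the deal's partner-tuned threshold."""
    return [i for i in issues if i.severity >= min_severity]

issues = [
    Issue("MSA-014", "Assignment requires counterparty consent", 4),
    Issue("NDA-203", "Non-standard governing law, common venue", 2),
]

print(surfaced(issues, min_severity=3))  # normal diligence: surfaces MSA-014 only
print(surfaced(issues, min_severity=5))  # escalation-only mode: surfaces nothing here
```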

Days 42-63: pilot deal two

Different complexity profile. If deal one was straightforward, deal two should have multi-jurisdiction issues, a custom indemnity package, an amendment-chain-heavy commercial portfolio, or contracts in non-English languages. Different complexity surfaces different failure modes.

The bar at the end of deal two: did the tool make the team faster, more accurate, and more confident? Or did it create a parallel workflow that competes with the existing one without replacing it?

We've written specifically about the technical hard cases in Amendment Chain Resolution: The Hardest Problem in Legal AI and LLM Hallucination in Contract Analysis. Both should be tested explicitly during deal two.

Day 70: the decision gate

The partner-sponsor reviews the metrics and the team's qualitative feedback, then decides. The decision needs to be one of three:

  1. Yes, expand to standardized M&A rollout. Move to weeks 10-13.
  2. No, decline this vendor. Document why, in writing, so the next evaluation has the institutional memory.
  3. Yes with caveats. A scope-down (e.g., "use it for diligence but not for memo drafting") with a specific 30-day re-evaluation. Don't let "yes with caveats" turn into perpetual evaluation; set the re-eval date in writing.

The mistake to avoid: the half-yes that becomes "we'll deploy it but only if the team wants to." The team won't deploy it on their own. Adoption is a leadership decision.

Days 70-90: standardized rollout

The pilot's two-person user base expands to the M&A group's full headcount. Three things have to happen:

Training. Every associate gets hands-on, deal-shaped training. Not a slide deck. They run a simulated deal end-to-end on the tool, with a senior associate or partner as the reviewer. This is the one-time investment that lets the team operate independently.

Integration. DMS connection (iManage, NetDocuments). Data-room provider connections (Datasite, Intralinks, ShareFile). Firm-branded templates (memo voice, schedule format, redline conventions) finalized with the partner-sponsor. Access controls scoped per deal team (see the sketch below).

Feedback loop. Weekly 30-minute check-in for the first 60 days. What's working, what isn't, what's the team asking for that isn't there. Push the feedback to the vendor on a regular cadence. Vendors that ship against feedback prove themselves; vendors that don't are a future migration cost.
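On access controls, the per-deal model is worth stating precisely: a user sees a deal's documents and outputs only if they're on that deal team, and membership is scoped to the deal, not to the tool. A minimal sketch of the check, with hypothetical user and deal names; this illustrates the model, not any DMS's or vendor's actual implementation:

```python
# Sketch: per-deal, per-team access scoping. All names are hypothetical.

deal_teams: dict[str, set[str]] = {
    "project-falcon": {"partner.a", "assoc.b", "assoc.c"},
    "project-osprey": {"partner.d", "assoc.b"},
}

def can_access(user: str, deal: str) -> bool:
    """Membership is granted per deal, not per tool."""
    return user in deal_teams.get(deal, set())

assert can_access("assoc.b", "project-falcon")       # staffed on both deals
assert not can_access("assoc.c", "project-osprey")   # walled off from this one
```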

Post-90: firm-wide expansion or stay focused

If the M&A pilot succeeded, the firm now has a question: expand to adjacent practices, or keep the tool focused on M&A?

Our view: expand only when the adjacent practice has its own partner-sponsor and pilot. Tools deployed across practices on the strength of a single practice's success rarely succeed in the new practice. Each practice has its own workflow shape, its own quality bar, and its own taste in output. The tool that wins for M&A may not win for litigation, regulatory, or financing.

The exception is a generalist assistant that is genuinely shaped for firm-wide use (Harvey, Legora). For specialist tools like Mage, the discipline is to expand when the adjacent practice is genuinely M&A-adjacent (PE, financings) and pause when it isn't.

What we wish someone had told us

A few things, written from the customer side:

  • Junior associate adoption is load-bearing. The associates use the tool for hours a day; the partners use it for minutes. If the associates love it, the firm gets the leverage. If they don't, the tool becomes shelfware regardless of partner enthusiasm. Pitch leverage, not headcount.
  • Workflow fit beats accuracy. A 90% accurate tool that fits the workflow beats a 95% accurate tool that doesn't. Accuracy is a metric; adoption is the outcome.
  • Output voice matters more than feature count. A tool whose first-draft memo requires more rewriting than starting from scratch has not earned its keep, no matter how many features it has.
  • Vendor responsiveness predicts post-deployment quality. The vendor's response cadence on pilot feedback is the leading indicator of how the partnership goes once the deal is live. Watch it.

For the operational mechanics of running deals on the tool once it's deployed, see AI Due Diligence: An Operational Playbook. For the master view of where legal AI fits in M&A, see Legal AI for M&A.

If you want to run a structured 90-day pilot on real deals, request a demo. We'll bring the framework above and a concrete pilot plan. The decision should be obvious by day 70 in either direction.

Frequently Asked Questions

Why 90 days specifically?

Because legal AI evaluations stretch indefinitely without a forcing function. Ninety days is enough to run two real deals end-to-end; six months is long enough that the original sponsors lose interest. The decision quality is the same; the velocity is much better.

Who should own the pilot?

The partner-sponsor owns the buy/no-buy decision. The innovation officer or COO runs the project. The senior associate is the daily user and gives ground-truth feedback. IT supports security and integration. Don't let any one of these own the others' roles.

What does 'standardized rollout' actually require?

Trained users (every M&A associate uses the tool on every deal), firm-branded templates (memo voice, schedule format, redline conventions), DMS integration (deliverables flow back to iManage or NetDocuments), access controls (per-deal, per-team), and a feedback loop (what's working, what isn't, what to ask the vendor for).

What's the most common rollout failure?

Picking the wrong tool because the demo was good. The pilot phase exists to surface this. The second-most-common failure is partner indifference: the tool gets bought, the associates use it occasionally, the partners ignore the output. Without partner adoption the leverage doesn't compound.

How do you handle junior associate buy-in?

Pitch the tool as eliminating the worst part of the job (reading 1,200 contracts for the same fifty issues), not as a headcount play. The associates who get freed from grunt diligence get pushed into negotiation prep, structuring, and client work earlier — which is the career trajectory they want anyway. Frame it that way, in writing, in the first all-hands.

What if the firm has already deployed Harvey or another firm-wide tool?

Then the question is whether to add a specialist on top of the generalist for the M&A workstream specifically. Many firms run both. The pilot framework above still applies — run two real deals through the specialist tool, compare output quality on M&A workflows specifically, and decide based on the comparison.

