SOC 2 and Legal AI: What M&A Lawyers Should Demand
Key Takeaways
- SOC 2 Type II is necessary but not sufficient. The substance of how the vendor handles privileged data matters more than the audit certificate.
- The DPA's training and retention provisions are the load-bearing legal documents. Get them reviewed by privacy counsel; don't accept the vendor's template.
- Single-tenant infrastructure, encryption at rest and in transit, MFA/SSO are table stakes. A vendor without these is not viable for privileged work.
- Vendor responses to security questions in week one of evaluation predict post-deployment posture.
This is the security-and-privilege checklist for M&A counsel evaluating legal AI vendors. SOC 2 Type II is the floor; the questions beyond it are where the real evaluation happens.
The core idea: legal AI vendors handle privileged content. The same level of diligence applied to outside counsel's information security applies to the vendor.
SOC 2 Type II: the floor
Get the report. Under NDA, on request. Type II only.
What SOC 2 Type II tells you:
- The vendor has documented security controls.
- An auditor verified those controls operated effectively over a defined period.
- The controls cover the AICPA Trust Services Criteria categories the vendor scoped (Security is always in scope; Availability, Confidentiality, Processing Integrity, and Privacy are optional additions).
What SOC 2 Type II does not tell you:
- Whether the controls are appropriate for privileged legal work specifically.
- How the vendor handles your specific data.
- Whether the vendor will train on your inputs.
- Whether your data is co-mingled with other customers' data.
SOC 2 is necessary. The substance is in the answers to the questions below.
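A first pass over the report's cover pages can be turned into a small screening routine. The sketch below is illustrative only: the `Soc2Report` record, the `screen_report` function, the one-year staleness threshold, and the choice to flag a missing Confidentiality category are all assumptions made for this example, not AICPA requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Soc2Report:
    """Hypothetical record of what a reviewer reads off the report's cover pages."""
    report_type: str      # "Type I" or "Type II"
    period_start: date    # first day of the audit period
    period_end: date      # last day of the audit period
    categories: set[str]  # Trust Services Criteria categories in scope


def screen_report(report: Soc2Report, today: date, max_age_days: int = 365) -> list[str]:
    """Return follow-up flags; an empty list means nothing obvious to chase."""
    flags = []
    if report.report_type != "Type II":
        flags.append("not a Type II report")
    # Assumption: for privileged work, the reviewer wants Confidentiality in scope.
    if "Confidentiality" not in report.categories:
        flags.append("Confidentiality not in scope")
    # Assumption: a report whose period ended over a year ago is treated as stale.
    if (today - report.period_end).days > max_age_days:
        flags.append("audit period ended more than max_age_days ago")
    return flags
```

A clean result here only means the report is worth reading in full; it says nothing about the questions SOC 2 does not answer.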
The DPA: the load-bearing document
The Data Processing Agreement is the legal document that governs how the vendor handles your data. Generic vendor DPAs are written for SaaS, not for legal AI. Privacy counsel should review and negotiate the specifics.
Provisions to nail down:
Training. Written commitment that the vendor does not train any model — proprietary or third-party — on customer data, ever. The clause should cover fine-tuning, RAG indexing, embedding generation, and sub-processor training. Penalty for breach should be specified (typical range: 100x annual contract value, or higher).
Retention. Maximum retention of customer data, with deletion timeline and method. Days, not years. The vendor should be able to articulate exactly what's kept, for how long, and why.
Sub-processors. List of all third parties that touch customer data (cloud providers, model vendors, monitoring). Notification rights for additions. Verification that each sub-processor has comparable controls.
Data residency. Where data sits geographically. For some firms (especially those serving regulated clients), single-region or in-jurisdiction residency is required.
Access. Who at the vendor can access customer data, under what conditions. Audit logging of all access. Access requests by the vendor's employees should require customer-side approval, not be silently logged.
Incident response. Notification timeline (48-72 hours typical), what's notified, what investigation rights the customer has.
Termination. What happens to data on contract end. Export format, deletion timeline, certification of deletion.
These provisions in the DPA are what make the relationship enforceable. Generic vendor templates rarely have all of them; getting them in is the work.
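One way to keep the negotiation honest is to track the seven provisions above as a checklist and list what a given draft still lacks. This is a minimal sketch: the provision names mirror the list above, while the dictionary layout and the `missing_provisions` helper are hypothetical and would be adapted to the firm's own review workflow.

```python
# Provision names mirror the list above; the data layout is an illustrative assumption.
REQUIRED_PROVISIONS = [
    "training",
    "retention",
    "sub-processors",
    "data residency",
    "access",
    "incident response",
    "termination",
]


def missing_provisions(dpa_review: dict[str, bool]) -> list[str]:
    """Return required provisions the reviewed DPA draft does not yet cover."""
    # A provision absent from the review is treated as not covered.
    return [p for p in REQUIRED_PROVISIONS if not dpa_review.get(p, False)]


# Hypothetical review result for a vendor's template DPA.
vendor_template = {"training": True, "retention": False, "incident response": True}
print(missing_provisions(vendor_template))
```

Marking a provision as covered should mean privacy counsel has accepted the wording, not merely that the topic appears somewhere in the document.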
Infrastructure: table stakes
What every legal AI vendor should be doing on infrastructure:
- AES-256 encryption at rest.
- TLS 1.3 in transit.
- Single-tenant or strongly isolated multi-tenant infrastructure. For sensitive deals, single-tenant should be available.
- MFA on all access.
- SSO support over SAML 2.0 with identity providers such as Okta, Microsoft Entra ID (formerly Azure AD), and Google Workspace.
- Audit logging of all data access and modifications.
- Network isolation for processing environments.
- Hardware security modules for key management.
- Regular penetration testing (annually minimum) by external parties.
A vendor missing items on this list is not viable for privileged work. Don't proceed.
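Some items on this list can be spot-checked from the outside. As a sketch, the Python standard library's `ssl` module can confirm that a vendor endpoint completes a handshake when the client refuses anything below TLS 1.3. The function names here are hypothetical, and the hostname is whatever endpoint the vendor documents. The check shows that TLS 1.3 is accepted; it does not show that older protocol versions are refused, and it says nothing about encryption at rest.

```python
import socket
import ssl
from typing import Optional


def tls13_only_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host and return the negotiated protocol version string.

    Raises ssl.SSLError (or another OSError) if the handshake cannot be completed.
    """
    ctx = tls13_only_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

A failed handshake is a prompt for a question to the vendor rather than a verdict, since the endpoint tested may not be the one that handles customer data.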
Architecture: how data flows
Beyond the DPA and infrastructure: how does the vendor's architecture handle your data day-to-day?
Questions:
- Is data co-mingled with other customers'? Multi-tenant SaaS without strong isolation carries cross-tenant leakage risks (vector embeddings leaking between tenants, prompt-injection attacks across tenants).
- What happens to data on each request? Is it stored? Logged? Cached? For how long?
- What models process the data? First-party? OpenAI? Anthropic? Each one has its own data-handling commitments to negotiate against.
- Are model providers sub-processors of the vendor? They should be listed in the sub-processor schedule.
- What happens in a disaster? Backup retention, recovery procedures.
The right vendor can answer all of these in writing within a week of the question being asked. Vendors that deflect, defer, or "need to check with engineering" repeatedly are vendors whose post-deployment security posture will frustrate the firm's GC.
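The answers to the model-provider and sub-processor questions can be reconciled mechanically: every party named in the vendor's data-flow answers should also appear in the sub-processor schedule. The sketch below assumes both lists have been transcribed by hand; the `unlisted_processors` helper and the party names are hypothetical.

```python
def unlisted_processors(data_flow: set[str], schedule: set[str]) -> set[str]:
    """Parties that touch customer data but are missing from the sub-processor schedule."""

    def normalize(names: set[str]) -> set[str]:
        # Compare case-insensitively and ignore stray whitespace from transcription.
        return {name.strip().lower() for name in names}

    return normalize(data_flow) - normalize(schedule)


# Hypothetical names taken from a vendor's written answers and its DPA schedule.
in_data_flow = {"ExampleCloud", "ExampleModelCo", "ExampleMonitoring"}
in_schedule = {"examplecloud", "ExampleMonitoring "}
print(unlisted_processors(in_data_flow, in_schedule))
```

Any name this returns is a question for the vendor: either the schedule is incomplete or the data-flow answer was wrong.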
The deflections to watch for
A few patterns that indicate vendor security posture is weaker than the marketing suggests:
- "We're SOC 2 compliant" but won't share the report. Compliant on paper doesn't mean operationally compliant. Reports speak for themselves.
- "We don't train on customer data" but won't put it in the DPA with penalty. A no-training position that isn't legally enforceable is a marketing position.
- "We use industry-standard security" without specifics. The specifics matter.
- "We're working on SOC 2" without a Type II date. Working on it is fine; deploying without it on privileged work is not.
- "We can't share that for proprietary reasons." Some specifics genuinely are sensitive (specific architecture diagrams). The bar should be: share enough that the customer's privacy counsel can do their job.
What we publish
We document Mage's specific posture on the security page. The level of detail there is what we think a serious vendor should be able to publish: SOC 2 Type II report (under NDA), explicit no-training clause (in the DPA, with penalty), single-tenant infrastructure, AES-256 + TLS 1.3, MFA/SSO, comprehensive audit logging, documented incident response, clean termination provisions.
The bar should be that any vendor under consideration can answer with the same level of specificity. Vendors who can't are not yet ready for production deployment on privileged content.
Companion reading
- Mage Security & Compliance — the full posture
- Evaluating Legal AI Tools — the broader buyer's framework
- How to Roll Out Legal AI at a Law Firm — running the trust review in parallel with vendor selection
If you are running a security review on Mage for a current evaluation: request a demo and we'll start the security questionnaire process the same week. The procurement timeline gets faster when the security review is done up front.
Frequently Asked Questions
Is SOC 2 enough?
It's necessary but not sufficient. SOC 2 Type II tells you the vendor has documented controls and an auditor verified those controls operated effectively over a period of time. It doesn't tell you whether the vendor's controls are appropriate for privileged legal work specifically. The questions beyond SOC 2 are about the substance of how the vendor handles privilege, training, retention, and access.
Type I vs. Type II?
Insist on Type II. Type I is a point-in-time design review. Type II is an audit of whether controls operated effectively over a period (typically 6-12 months). The difference matters. A vendor without a Type II report has not yet demonstrated that its controls hold over time; a vendor unwilling to share its Type II report should be reconsidered.
What about data training?
Get a written no-training clause in the DPA, with penalties for breach specified. Most vendors will write some version of this; the question is whether it's enforceable, what the penalty is, and whether it covers all variants of training (model fine-tuning, RAG indexing, sub-processor training). Privacy counsel review is non-negotiable here.
What about data retention?
Minimum-required retention. Days, not years. The vendor should articulate exactly what's kept, for how long, and why. Default consumer SaaS retention (30-90 days at provider discretion) is not appropriate for privileged work.
What about sub-processors?
Get the list. Get notification rights for additions. Verify that each sub-processor's posture is comparable. A vendor that uses sub-processors with weaker controls exposes your data to those weaknesses.