Understanding Technology-Assisted Review: A Practitioner's Guide

Machine learning has quietly rewritten the rules of document review. Here's what every legal professional needs to know before their next large-scale matter.

Not long ago, responding to a major document request meant assembling a room full of attorneys and setting them loose on millions of emails, memos, and attachments - one document at a time. The process was expensive, slow, and prone to the kind of fatigue-driven errors that only compound across weeks of linear review.
Technology-Assisted Review changed that equation fundamentally. Today, the most defensible, cost-effective, and accurate large-scale reviews aren't purely human - they're human-machine partnerships. And yet many legal professionals still approach TAR as a black box: something the vendor handles while they handle the law.
That gap is increasingly costly. Understanding TAR at a working level - not just its marketing promises, but its mechanics, its failure modes, and its appropriate applications - has become a core competency for litigators, eDiscovery counsel, and review managers alike.
"What once required armies of contract attorneys reviewing documents one by one can now be accomplished with machine learning algorithms that learn from human decisions and apply those patterns across millions of documents."

What is TAR, exactly?

Technology-Assisted Review (TAR) is a document review methodology in which human reviewers work alongside software to train machine learning algorithms to identify relevant documents. The system learns from human coding decisions - relevant or not relevant, responsive or non-responsive - and applies those learned patterns across the remaining document population.
The critical word is train. TAR is not autonomous AI making independent legal judgments. It is a supervised learning process where the quality of the model is a direct function of the quality of the human decisions fed into it. Garbage in, garbage out - the aphorism applies with particular force here.
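To make "supervised" concrete, here is a minimal sketch of the train-and-score step using scikit-learn. The documents, labels, and model choice are purely illustrative - commercial TAR platforms use their own feature pipelines and classifiers - but the shape of the process is the same: humans label examples, the model fits a decision boundary, and every unreviewed document gets a relevance score.

```python
# Minimal sketch of the supervised-learning step behind TAR.
# Everything here is illustrative; real platforms use their own pipelines and models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Human coding decisions on a small training set: 1 = responsive, 0 = non-responsive.
train_texts = [
    "Re: Q3 pricing agreement with Acme - see attached term sheet",
    "Lunch menu for the office holiday party",
    "Draft amendment to the Acme supply contract",
    "IT notice: scheduled server maintenance this weekend",
]
train_labels = [1, 0, 1, 0]

# Learn a model from the labeled examples.
vectorizer = TfidfVectorizer(stop_words="english")
model = LogisticRegression().fit(vectorizer.fit_transform(train_texts), train_labels)

# Apply the learned pattern to unreviewed documents and rank by predicted relevance.
unreviewed = [
    "Acme pricing call notes and proposed volume discounts",
    "Parking garage closure on Friday",
]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```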

Also Known As

Predictive Coding

Emphasizes the system's ability to predict relevance - the most common term in legal contexts.

Computer-Assisted Review

A broader umbrella including non-ML assistance tools. Often used interchangeably with TAR.

Supervised Machine Learning

The technical description of what's actually happening under the hood - the model learns from labeled examples.

CAL

Continuous Active Learning - a modern TAR variant that learns throughout the review, not just upfront.

The core promise - and the fine print

TAR's value proposition is well-established in the literature and the case law: when applied correctly, it produces results that are at least as accurate as traditional linear review, in a fraction of the time, at a fraction of the cost. Courts have accepted TAR as a defensible methodology in numerous landmark decisions, from Da Silva Moore to Rio Tinto.
But "when applied correctly" is doing a lot of work in that sentence. The four pillars of TAR's promise each carry implicit conditions.

Accuracy

At least as accurate as manual review - provided the training set is representative and seed reviewers are consistent.

Speed

Significantly faster than linear review - the speed advantage compounds with collection size.

Cost

Far more cost-effective for large collections - ROI typically materializes at 50,000+ documents (see the back-of-the-envelope sketch below).

Consistency

More consistent than human review alone - the model applies the same criteria uniformly across the corpus.
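To see why the speed and cost advantages compound with collection size, here is a rough back-of-the-envelope sketch. Every number in it - reviewer throughput, hourly rate, and the fraction of the collection a TAR workflow actually puts in front of reviewers - is an illustrative assumption, not a benchmark; substitute your own matter's figures before relying on it.

```python
# Back-of-the-envelope comparison. All inputs are illustrative assumptions.
def review_cost(n_docs, reviewed_fraction=1.0, docs_per_hour=50, rate_per_hour=60):
    """Hours and cost to review reviewed_fraction of an n_docs collection."""
    hours = (n_docs * reviewed_fraction) / docs_per_hour
    return hours, hours * rate_per_hour

n = 500_000
linear_hours, linear_cost = review_cost(n)                    # linear: review everything
tar_hours, tar_cost = review_cost(n, reviewed_fraction=0.25)  # TAR: assume ~25% reviewed

print(f"Linear review: {linear_hours:,.0f} hours, ${linear_cost:,.0f}")
print(f"TAR workflow:  {tar_hours:,.0f} hours, ${tar_cost:,.0f}")
```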

Why consistency matters more than people think

Of the four pillars above, consistency is the one that surprises practitioners the most when they first dig into the data. Human reviewers - even experienced ones working from the same protocol - show substantial inter-rater disagreement. Studies consistently find that two attorneys reviewing the same document will code it differently 20-30% of the time on borderline calls.
TAR doesn't eliminate that problem - it quarantines it. The inconsistency happens once, during training, and then the model applies the resulting decision boundary uniformly. Whether that's better than distributed human inconsistency depends on your matter and your protocol, but for high-volume collections it often is.
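For teams that want to measure this on their own matters, the standard tools are raw percent agreement and Cohen's kappa, computed on an overlap set coded independently by two reviewers. A minimal sketch, with made-up coding vectors:

```python
# Quantifying inter-reviewer consistency on the same documents.
# The coding vectors below are made-up examples, not study data.

def agreement_and_kappa(coder_a, coder_b):
    """Raw percent agreement and Cohen's kappa for two binary coders."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each reviewer's marginal rate of coding "responsive".
    p_a, p_b = sum(coder_a) / n, sum(coder_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return observed, (observed - expected) / (1 - expected)

# Ten documents coded by two attorneys: 1 = responsive, 0 = non-responsive.
reviewer_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reviewer_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

observed, kappa = agreement_and_kappa(reviewer_a, reviewer_b)
print(f"Raw agreement: {observed:.0%}, Cohen's kappa: {kappa:.2f}")
```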

TAR 1.0 vs. TAR 2.0: a critical distinction

Early TAR workflows - often called TAR 1.0, typically built on Simple Passive Learning or Simple Active Learning protocols - involved a discrete training phase followed by bulk classification of the remaining documents. You trained, you stopped, you reviewed the results. The model was static after training concluded.
Modern platforms have largely moved to Continuous Active Learning (CAL), sometimes called TAR 2.0. In CAL workflows, the model updates continuously as reviewers code documents throughout the review - not just in a defined training phase. Every coding decision is a new training signal. The result is a system that gets smarter as the review progresses, and that can surface the highest-priority documents earlier in the workflow.
The practical implications are significant: CAL workflows tend to require less rigid training discipline upfront, are more forgiving of reviewer inconsistency during early rounds, and produce better recall curves on collections with low richness. Platforms like Relativity's Active Learning and Epiq's AIDA implement CAL-based approaches.
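Conceptually, the CAL loop is simple: retrain on everything coded so far, rank what remains, put the top of the ranking in front of reviewers, and repeat. The sketch below captures that shape only - the document structure, the human_code placeholder for reviewer decisions, and the fixed round cap are illustrative assumptions, and no vendor's production implementation looks like this.

```python
# Schematic Continuous Active Learning loop (illustrative, not any vendor's implementation).
# documents: list of (doc_id, text) pairs; human_code(doc_id): stands in for a reviewer's
# coding decision (1 = responsive, 0 = non-responsive).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def cal_review(documents, human_code, seed_ids, batch_size=50, max_rounds=100):
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(text for _, text in documents)
    ids = [doc_id for doc_id, _ in documents]
    row_of = {doc_id: i for i, doc_id in enumerate(ids)}

    # Seed coding; the seeds should include both responsive and non-responsive examples.
    labeled = {doc_id: human_code(doc_id) for doc_id in seed_ids}

    for _ in range(max_rounds):  # a real workflow stops on a documented criterion, not a count
        # 1. Retrain on every coding decision made so far.
        rows = [row_of[d] for d in labeled]
        model = LogisticRegression().fit(X[rows], list(labeled.values()))

        # 2. Rank the unreviewed documents by predicted relevance.
        unreviewed = [i for d, i in row_of.items() if d not in labeled]
        if not unreviewed:
            break
        scores = model.predict_proba(X[unreviewed])[:, 1]
        batch = sorted(zip(unreviewed, scores), key=lambda pair: -pair[1])[:batch_size]

        # 3. Reviewers code the highest-ranked batch; those decisions become
        #    new training signal for the next round.
        for row, _ in batch:
            labeled[ids[row]] = human_code(ids[row])

    return labeled
```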
"TAR is not autonomous AI making independent legal judgments. It is a supervised learning process where the quality of the model is a direct function of the quality of the human decisions fed into it."

When TAR is the right tool - and when it isn't

TAR is not universally appropriate. The methodology delivers its best returns on large, homogeneous collections with a clear and stable relevance definition. If you're reviewing 2 million email threads from a single custodian on a single topic, TAR is probably the right call. If you're reviewing 8,000 documents across a dozen custodians with a shifting privilege analysis, the overhead of training may not be worth it.
Collection size is the most commonly cited factor, but richness matters just as much. Very low-richness collections - where relevant documents make up less than 1-2% of the total - can present challenges for certain TAR implementations, because the model has relatively few positive examples to learn from. In those cases, additional sampling strategies, targeted keyword runs to increase richness before TAR begins, or hybrid approaches may be warranted.
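Richness itself is usually estimated from a simple random sample coded before the TAR workflow begins. A minimal sketch with made-up counts, using a normal-approximation confidence interval (at very low richness, an exact or Wilson interval is the more defensible choice):

```python
# Estimating collection richness from a simple random sample. Counts are made up.
import math

def richness_estimate(sample_size, responsive_in_sample, z=1.96):
    """Point estimate and ~95% normal-approximation confidence interval."""
    p = responsive_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# e.g., 2,000 randomly sampled documents, 22 coded responsive.
p, low, high = richness_estimate(2000, 22)
print(f"Estimated richness: {p:.2%} (95% CI roughly {low:.2%} to {high:.2%})")
```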

The defensibility question

TAR's legal defensibility is now well-established, but it doesn't come automatically. The core principle from the case law is transparency and process documentation. Courts don't require a specific TAR methodology - they require that whatever methodology is used be applied consistently, documented thoroughly, and disclosed appropriately.
This means maintaining detailed logs of training decisions, documenting the criteria applied by seed reviewers, preserving quality control metrics, and being prepared to explain your process if challenged. The burden is not onerous, but it is real - and teams that treat TAR as a set-and-forget automation rather than a documented process tend to run into trouble during discovery disputes.
The practical upshot: treat your TAR workflow the same way you'd treat any other methodological choice that might be subject to a Rule 26(f) conference or a motion to compel. Build the paper trail as you go, not after the fact.
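Two of the quality control metrics worth preserving in that paper trail are a recall estimate from a validation sample and an elusion rate from a sample of the unproduced (discard) set. A minimal sketch with made-up counts:

```python
# Illustrative QC metrics; all counts are made up for the example.

def recall(responsive_in_sample, responsive_in_sample_produced):
    """Share of sampled responsive documents the workflow marked for production."""
    return responsive_in_sample_produced / responsive_in_sample

def elusion(discard_sample_size, responsive_found_in_discard):
    """Rate at which responsive documents turn up in a sample of the discard set."""
    return responsive_found_in_discard / discard_sample_size

# e.g., a population-wide validation sample surfaced 120 responsive documents,
# 103 of which the workflow had classified as responsive; a 1,500-document
# sample of the discard set turned up 4 responsive documents.
print(f"Estimated recall:  {recall(120, 103):.0%}")
print(f"Estimated elusion: {elusion(1500, 4):.2%}")
```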

This post covers the foundational layer of TAR - terminology, the key promises, the major workflow variants, and the threshold defensibility considerations. Future posts in this series will go deeper on training protocols, validation methodology, quality control in CAL workflows, and how to read the analytics dashboards that platforms like Relativity and Epiq surface during active review.
If you're preparing for a TAR engagement or studying for a certification in eDiscovery technology, the concepts above form the scaffolding everything else hangs on. Get these right, and the rest becomes much more legible.