Glossary of Key Terms

Plain-English terms for TAR, validation, AI-assisted review, and defensibility discussions.

Core TAR and AI Terms

Technology-Assisted Review (TAR)

A review process that uses machine learning or analytics to prioritize, classify, or help identify documents for legal review.

Why it matters: TAR is a process, not just a button in a platform; defensibility depends on workflow and validation.

Continuous Active Learning (CAL)

A TAR approach where the model keeps learning from reviewer coding throughout the review.

Why it matters: CAL is common because it can surface likely relevant documents quickly while adapting as reviewers code more documents.

Seed Set

An initial set of coded documents used to begin training or orienting a model.

Why it matters: Seed-set design can affect early model behavior and bias.

Control Set

A sample of documents coded independently and held aside to evaluate model performance.

Why it matters: Control sets support measurement but must be large and representative enough to be meaningful.
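"Large enough" can be made concrete with standard sampling arithmetic. The sketch below assumes a simple random sample and the normal-approximation (Wald) interval for a proportion at 95% confidence. That is one common textbook choice, not a requirement of any TAR protocol, and exact or Wilson intervals behave better when richness is low. The function names are illustrative.

```python
import math

# Assumption: normal-approximation (Wald) interval for a proportion.
# Exact or Wilson intervals are often preferred when richness is low.
Z_95 = 1.96  # z-value for a two-sided 95% confidence level

def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Half-width of the confidence interval around a sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def required_sample_size(margin: float, p: float = 0.5, z: float = Z_95) -> int:
    """Smallest sample size whose margin of error is at most `margin`.
    p = 0.5 is the conservative (worst-case) assumption."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

n = required_sample_size(0.05)
print(n)                                   # 385
print(round(margin_of_error(0.5, n), 4))   # 0.0499
```

Sample size is not the whole story: at 1% richness, a 385-document control set would be expected to contain only about 4 relevant documents, which is too few to measure recall meaningfully.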

Elusion Test

A review of a random sample drawn from the documents the model or workflow left behind (set aside as likely non-relevant), used to estimate how much relevant material was missed.

Why it matters: Elusion testing helps justify stopping decisions and production completeness.
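The arithmetic behind an elusion test can be sketched as follows. This is a minimal illustration, assuming a simple random sample from the discard pile and a known count of relevant documents the workflow found; it gives point estimates only, with no confidence interval, and all names and numbers are hypothetical.

```python
def estimate_from_elusion(sample_relevant: int, sample_size: int,
                          discard_size: int, found_relevant: int):
    """Point estimates from an elusion sample (no confidence interval)."""
    # Share of the sampled discard pile that turned out to be relevant.
    elusion_rate = sample_relevant / sample_size
    # Projected number of relevant documents left behind in the whole discard pile.
    est_missed = elusion_rate * discard_size
    # Estimated recall: found relevant / (found relevant + estimated missed).
    est_recall = found_relevant / (found_relevant + est_missed)
    return elusion_rate, est_missed, est_recall

rate, missed, rec = estimate_from_elusion(
    sample_relevant=5, sample_size=500, discard_size=200_000, found_relevant=18_000)
print(f"elusion {rate:.1%}, ~{missed:,.0f} missed, estimated recall {rec:.0%}")
# elusion 1.0%, ~2,000 missed, estimated recall 90%
```

A small elusion rate can still mean many missed documents when the discard pile is large, which is why the rate is usually read alongside the projected count.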

Recall

The share of all relevant documents that the workflow found.

Why it matters: Recall is central to defensibility because it speaks to completeness.
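As a formula, recall is true positives divided by all relevant documents (true positives plus false negatives). A minimal sketch with illustrative counts:

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of all relevant documents that the workflow found."""
    total_relevant = true_positives + false_negatives
    if total_relevant == 0:
        raise ValueError("recall is undefined when there are no relevant documents")
    return true_positives / total_relevant

# 900 relevant documents found, 100 relevant documents missed.
print(recall(900, 100))  # 0.9
```

In practice the number of missed documents is unknown and has to be estimated by sampling.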

Precision

The share of documents marked relevant that are actually relevant.

Why it matters: Precision affects review efficiency and cost.
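Precision divides true positives by everything marked relevant (true positives plus false positives). A minimal sketch with illustrative counts:

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of documents marked relevant that are actually relevant."""
    marked_relevant = true_positives + false_positives
    if marked_relevant == 0:
        raise ValueError("precision is undefined when nothing was marked relevant")
    return true_positives / marked_relevant

# 900 correctly marked relevant, 300 marked relevant in error.
print(precision(900, 300))  # 0.75
```

The same 900 true positives can sit alongside high recall and modest precision; the two measures answer different questions.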

Richness

The percentage of relevant documents in the overall population.

Why it matters: Low richness can make relevant documents harder to find and affects sampling strategy.
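Richness is usually estimated from a random sample and then projected onto the population. The sketch below assumes a simple random sample; the counts are hypothetical.

```python
def estimate_richness(sample_relevant: int, sample_size: int) -> float:
    """Estimated share of relevant documents in the population."""
    return sample_relevant / sample_size

population = 1_000_000
r = estimate_richness(30, 1_500)   # 2% of the sample was relevant
est_relevant = r * population      # projected relevant documents in the population
docs_per_hit = 1 / r               # random documents reviewed per relevant one found
print(f"richness {r:.0%}, ~{est_relevant:,.0f} relevant, ~{docs_per_hit:.0f} docs per hit")
# richness 2%, ~20,000 relevant, ~50 docs per hit
```

The last figure shows why low richness is costly: random review at 2% richness surfaces roughly one relevant document per 50 reviewed.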

F1 Score

A single measure combining precision and recall, calculated as their harmonic mean.

Why it matters: Useful for comparing models, but legal defensibility usually requires more context than one score.
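F1 is 2 × precision × recall / (precision + recall). The sketch below, using illustrative values, shows one reason a single score needs context: F1 is symmetric, so a workflow with high precision and lower recall gets the same score as the reverse, even though recall usually matters more for completeness.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.75, 0.90), 3))  # 0.818 (recall 90%)
print(round(f1_score(0.90, 0.75), 3))  # 0.818 (recall only 75%)
```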

Overturn

A coding decision changed during quality control, second-level review, or expert review.

Why it matters: High overturn rates can signal unclear guidance or unstable training data.
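Overturn rates are most informative when broken down, for example by reviewer or by issue, so that a cluster of changes points to where guidance is unclear. A minimal sketch, assuming a hypothetical QC log of (group, first-pass code, QC code) tuples:

```python
from collections import defaultdict

def overturn_rates(decisions):
    """decisions: iterable of (group, first_pass_code, qc_code) tuples.
    Returns the share of QC-checked decisions that were changed, per group."""
    checked = defaultdict(int)
    overturned = defaultdict(int)
    for group, first_pass, qc in decisions:
        checked[group] += 1
        if first_pass != qc:
            overturned[group] += 1
    return {group: overturned[group] / checked[group] for group in checked}

# Hypothetical QC log; group labels could equally be issue tags or batches.
qc_log = [
    ("reviewer_a", "Responsive", "Responsive"),
    ("reviewer_a", "Responsive", "Not Responsive"),
    ("reviewer_a", "Not Responsive", "Not Responsive"),
    ("reviewer_a", "Not Responsive", "Not Responsive"),
    ("reviewer_b", "Responsive", "Responsive"),
    ("reviewer_b", "Not Responsive", "Not Responsive"),
]
print(overturn_rates(qc_log))  # {'reviewer_a': 0.25, 'reviewer_b': 0.0}
```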

Conceptual Clustering

Grouping documents by conceptual similarity rather than exact keywords.

Why it matters: Clustering can help explore unknown collections and find related themes.

Near-Duplicate Detection

Identifying documents that are substantially similar but not exact duplicates.

Why it matters: It can improve consistency and reduce redundant review.
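One common way to measure "substantially similar" is Jaccard similarity over word shingles (overlapping runs of words). Review platforms use their own algorithms and thresholds, so treat this as an illustrative sketch only; the shingle size of 3 is an assumption.

```python
import re

def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences, lowercased."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc1 = "Please review the attached draft agreement before Friday."
doc2 = "Please review the attached draft agreement before Monday."
print(round(jaccard(shingles(doc1), shingles(doc2)), 3))  # 0.714
```

A one-word change in a short message still lowers the score noticeably, which is one reason similarity thresholds are tuned rather than fixed.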

Family Documents

Related documents such as an email and its attachments.

Why it matters: Family handling affects responsiveness, privilege, and production completeness.

Generative AI Review Assistance

Use of large language models to summarize, classify, extract issues, draft explanations, or support review decisions.

Why it matters: Helpful but risky; outputs need human review, validation, and privilege/data controls.

Hallucination

An AI-generated statement that sounds plausible but is false, unsupported, or not grounded in the source material.

Why it matters: Legal workflows require source-grounded outputs and human verification.
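One narrow, automatable grounding check is confirming that any passage an AI output presents as a quotation actually appears in the source document. The sketch below assumes quotations are marked with straight double quotes and matches them verbatim after collapsing whitespace; the function name is hypothetical. Passing this check does not make a summary accurate, so human verification is still required.

```python
import re

def unsupported_quotes(ai_output: str, source_text: str) -> list:
    """Return quoted passages in ai_output that do not appear verbatim in source_text."""
    quotes = re.findall(r'"([^"]+)"', ai_output)
    # Collapse runs of whitespace so line breaks in the source do not cause false alarms.
    normalized_source = " ".join(source_text.split())
    return [q for q in quotes if " ".join(q.split()) not in normalized_source]

source = "The shipment was delayed until March 3 due to a customs hold."
output = 'The email says "delayed until March 3" and "the contract was terminated".'
print(unsupported_quotes(output, source))  # ['the contract was terminated']
```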

Model Drift

A change in model behavior over time as inputs, coding decisions, or review priorities shift.

Why it matters: Review teams should monitor consistency and revalidate when scope changes.

Defensibility

The ability to explain and justify the review process as reasonable, proportionate, and reliable under the circumstances.

Why it matters: Defensibility comes from a documented process, not from using a well-known tool.
