The use of generative AI in document review provokes two familiar questions, echoing the early debates around technology-assisted review (TAR):
- Is it accurate?
- Is it defensible?
TAR earned its judicial acceptance through transparent validation metrics, and many of those same principles apply to generative AI. Accuracy can be measured using established metrics:
- Recall: the percentage of relevant documents correctly identified
- Precision: the percentage of AI-predicted relevant documents that are truly relevant
- Elusion: the percentage of documents the AI tagged as not relevant (the null set) that are actually relevant
- Richness: the share of all documents that are relevant, which influences sample size and margin of error
These metrics provide a quantitative foundation for assessing an AI model’s reliability. What’s new with generative AI is when validation occurs. Instead of validating after a full review, legal teams can now test and refine prompts on smaller datasets upfront—measuring recall, precision, and elusion before scaling to the full document set.
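To make the arithmetic concrete, here is a minimal sketch, in Python, of how those four metrics might be computed once a human-labeled validation sample has been compared against the AI's relevance calls. The function and field names are illustrative assumptions, not the API of any particular review platform.

```python
from dataclasses import dataclass

@dataclass
class ValidationMetrics:
    recall: float      # relevant documents the AI correctly identified
    precision: float   # AI-flagged documents that are truly relevant
    elusion: float     # share of the AI's "not relevant" pile that is actually relevant
    richness: float    # share of the sample that is relevant overall

def score_sample(human_labels: list[bool], ai_predictions: list[bool]) -> ValidationMetrics:
    """Compare AI relevance calls against human ground truth on a validation sample.

    Each list holds one boolean per sampled document: True means relevant.
    """
    pairs = list(zip(human_labels, ai_predictions))
    true_pos = sum(1 for human, ai in pairs if human and ai)
    false_pos = sum(1 for human, ai in pairs if not human and ai)
    false_neg = sum(1 for human, ai in pairs if human and not ai)
    true_neg = sum(1 for human, ai in pairs if not human and not ai)

    relevant = true_pos + false_neg   # all truly relevant documents in the sample
    flagged = true_pos + false_pos    # all documents the AI called relevant
    null_set = false_neg + true_neg   # all documents the AI called not relevant

    return ValidationMetrics(
        recall=true_pos / relevant if relevant else 0.0,
        precision=true_pos / flagged if flagged else 0.0,
        elusion=false_neg / null_set if null_set else 0.0,
        richness=relevant / len(pairs) if pairs else 0.0,
    )
```

Note that each proportion uses its own denominator: recall divides by the truly relevant documents, precision by the AI's relevant calls, and elusion by the AI's null set, which is why the same sample yields four distinct rates that can be re-measured after every prompt revision.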
By validating early, teams can optimize prompt design, build defensibility into their processes, and set realistic expectations for AI-assisted review. As Ben Sexton, SVP of Innovation & Strategy at JND eDiscovery, explains:
“Being able to assess recall and precision before committing to a full review run is a major advantage. If results meet expectations, we can proceed with confidence; if not, we can iterate and re-validate without costly rework.”
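Building on the sketch above, the "proceed or iterate" decision Sexton describes can be framed as a simple threshold check on the sample estimates, with a margin of error attached. The 95% normal-approximation interval, the 80% recall target, and the 5% elusion ceiling below are illustrative assumptions, not requirements of any protocol.

```python
import math

Z_95 = 1.96  # z multiplier for a 95% confidence level

def margin_of_error(rate: float, n: int, z: float = Z_95) -> float:
    """Normal-approximation margin of error for a proportion.

    n is the number of documents the rate was measured over: the truly
    relevant documents for recall, the AI's null set for elusion, and so on.
    """
    return z * math.sqrt(rate * (1.0 - rate) / n)

def ready_to_scale(recall: float, n_relevant: int,
                   elusion: float, n_null_set: int,
                   min_recall: float = 0.80, max_elusion: float = 0.05) -> bool:
    """Proceed to the full corpus only if the targets still hold at the
    unfavorable edge of each metric's confidence interval."""
    worst_case_recall = recall - margin_of_error(recall, n_relevant)
    worst_case_elusion = elusion + margin_of_error(elusion, n_null_set)
    return worst_case_recall >= min_recall and worst_case_elusion <= max_elusion
```

Because recall's margin of error depends on how many truly relevant documents the sample contains, low richness pushes the required sample size up, which is why richness belongs in the validation plan. And when the check fails, the remedy is to revise the prompt and re-score the sample, not to redo a full review.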