Data Analytics and the Law: Analysis

For data-based evidence, the analysis is the heart of the content: the output of the data compiled for a case. In most instances, the analytics do not need to be complex. Indeed, powerful results can be derived by simply calculating summary statistics (mean, median, standard deviation). More complicated techniques, like regressions, time-series models, and pattern analyses, do require a background in statistics and programming. But even the most robust results are ineffective if an opposing witness successfully argues that they are immaterial to the case. Whether the analysis is simple or complex, litigants and expert witnesses should ensure that it is both relevant and robust against criticism.
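
As a minimal sketch of how little code summary statistics require, the Python snippet below computes the three measures named above using only the standard library. The wage figures and the `summarize` helper are hypothetical, invented purely for illustration and not drawn from any case.

```python
import statistics


def summarize(values):
    """Return the mean, median, and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
    }


# Hypothetical hourly wages; the last value is a deliberate outlier.
wages = [18.0, 19.5, 20.0, 21.0, 45.0]

summary = summarize(wages)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the single outlier pulls the mean well above the median, which is the kind of gap an opposing expert is likely to probe. Reporting both figures, and saying which one supports the assertion and why, is usually safer than reporting one alone.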

What type of result would provide evidence of a party’s assertion? The admissibility and validity of statistical evidence vary by jurisdiction. In general, data-based evidence should be as straightforward as possible; more complex models should be used only when necessary. Superfluous analytics are a distraction and lead expert witnesses to “boil the ocean” in search of additional evidence. Additionally, courts still approach statistical techniques with some skepticism, despite their acceptance in other fields.

If more complex techniques, such as regressions, are necessary, litigants must be confident in their methods. For example, what kind of regression will be used? Which variables are “relevant” as inputs? What is the output, and how does it relate to a party’s assertion of fact? Parties need to link outputs, big or small, to a “therefore” moment: “the analysis gave us a result, therefore it is proof of our assertion in the following ways.” Importantly, this refocuses the judge’s or jury’s attention on the relevance of the output, rather than on its complex derivation.
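
To make the questions above concrete, here is a hedged sketch of the simplest regression, ordinary least squares with one input variable, written in plain Python. The `fit_line` helper, the variable names, and the tenure and pay figures are all hypothetical; a real matter would involve more variables, diagnostics, and a statistical library.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical inputs: years of tenure and annual pay in thousands of dollars.
tenure = [1, 2, 3, 4, 5]
pay = [42, 44, 46, 48, 50]

slope, intercept = fit_line(tenure, pay)

# The "therefore" statement: translate the coefficient into plain language.
print(
    f"In this sample, each additional year of tenure is associated with "
    f"about {slope:.1f} thousand dollars more in annual pay."
)
```

The final print statement is the part that matters in front of a judge or jury: the coefficient is restated as a plain-language association that can be tied to the assertion of fact, while the arithmetic stays in the background.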

Does the analysis match the scope of the complaint or a fact in dispute? Is the certified class all employees, or just a subset of employees in a company? Is the location a state, or a county within a state? If the defendant is accused of committing fraud, for how many years? Generalizing from a smaller or tangential analysis is inherently risky, and an easy target for opposing witnesses. If given a choice, avoid conjecture. Do not assume that an analysis in one area, for one class, or for one time period automatically applies to another.
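
One way to keep an analysis within scope is to state the class, location, and period explicitly and filter the data before computing anything. The sketch below assumes a hypothetical record layout (`employee`, `state`, `role`, `paid_on`) and a hypothetical class definition; none of these come from an actual case.

```python
from datetime import date

# Hypothetical payroll records, invented for illustration.
records = [
    {"employee": "A", "state": "CA", "role": "hourly", "paid_on": date(2019, 3, 1)},
    {"employee": "B", "state": "CA", "role": "salaried", "paid_on": date(2019, 3, 1)},
    {"employee": "C", "state": "NV", "role": "hourly", "paid_on": date(2019, 3, 1)},
    {"employee": "D", "state": "CA", "role": "hourly", "paid_on": date(2016, 7, 15)},
]


def in_scope(record, state, role, start, end):
    """True only if the record matches the class, location, and period at issue."""
    return (
        record["state"] == state
        and record["role"] == role
        and start <= record["paid_on"] <= end
    )


# Assumed scope: hourly employees in California, 2018 through 2020.
scoped = [
    r for r in records
    if in_scope(r, "CA", "hourly", date(2018, 1, 1), date(2020, 12, 31))
]

print(f"{len(scoped)} of {len(records)} records fall within the stated scope")
```

Writing the scope down as explicit parameters also makes it easy to show exactly which records were excluded and why, should the other side challenge the selection.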

A key component of analytical and statistical work is replicability. In fields such as finance, insurance, or large-scale employment cases, the analysis of both parties should be replicable. Outside parties should be able to analyze the same data and obtain the same results. In addition, replicability can expose errors, sleights of hand, or outright manipulation.
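
A simple, hedged illustration of replicability is to report a fingerprint of the input data alongside the result, so that anyone rerunning the analysis can confirm they started from the same data. The `fingerprint` and `run_analysis` helpers and the sample rows below are hypothetical; the approach shown (a SHA-256 hash of a canonical JSON encoding) is one reasonable choice among several.

```python
import hashlib
import json
import statistics


def fingerprint(rows):
    """SHA-256 hex digest of a canonical JSON encoding of the rows."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_analysis(rows):
    """Return the result together with a fingerprint of the data it was computed from."""
    amounts = [row["amount"] for row in rows]
    return {
        "data_sha256": fingerprint(rows),
        "n": len(amounts),
        "median": statistics.median(amounts),
    }


# Hypothetical transaction data, invented for illustration.
rows = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": 80.0},
    {"id": 3, "amount": 100.0},
]

first = run_analysis(rows)
second = run_analysis([dict(row) for row in rows])  # independent copy of the same data
print("Results match:", first == second)

# Changing a single value changes the fingerprint, exposing the alteration.
altered = [dict(row) for row in rows]
altered[0]["amount"] = 121.0
print("Altered data detected:", fingerprint(altered) != first["data_sha256"])
```

If two parties report different results, comparing fingerprints first shows whether the disagreement lies in the data or in the method.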

Data-based evidence requires focus, clarity, and appropriate analytical techniques; otherwise, an output is just another number.