Big Data |

26

APR 2022

Gathering Data for Labor Market and Mitigation Studies

Posted by Emma Dooley | Big Data, Data Analytics, Economics, Employment, Labor data, Statistical Analysis

Performing labor market and mitigation studies requires gathering and using specific information. Often, this information pertains to the plaintiff’s job search efforts within the labor market. For example, did the individual apply for jobs that matched their expertise and education level? Additionally, if the individual applied for a different kind of job, we must determine whether its qualifications are similar to those of their previous position.

Labor market data sources, such as U.S. Bureau of Labor Statistics (BLS) labor market surveys and the U.S. Department of Labor’s O*NET, are often used to analyze an individual’s potential job matches. Through this type of research, we can build an accurate picture of the plaintiff’s job search efforts and supply the information these labor market and mitigation studies need.

For more information visit Employstats or contact us at info@employstats.com.

9

MAR 2022

EmployStats Advises the EEOC on Collection of Survey Data

Posted by Emma Dooley | Big Data, Data Analytics, Economics, Employment, Labor data, Statistical Analysis

A few years ago, the EEOC added a component to its Equal Employment Opportunity (EEO) survey of employers in the United States, known as Component 2 (EEOC-2 / EEO-2). This addition asked employers about the compensation and hours worked of their employees, organized by job category, gender, race, ethnicity, and certain pay bands. After collecting these data, the EEOC was interested in analyzing them and determining how they could best be used by both the commission and the public at large. Partnering with the National Academy of Sciences (NAS), the EEOC formed a panel to closely examine the compensation data and collect input on their use. EmployStats was able to collaborate with several well-known professionals, including William Rogers, Elizabeth Hirsh, Jenifer Park, and Claudia Goldin.

EmployStats was brought on to provide feedback on the best uses of this EEOC-2 data. In these panel meetings, we testified about our industry-level experience in using available pay data to analyze claims of disparate pay and employment discrimination. We described to the EEOC how companies like EmployStats, research institutions, and public users use federally maintained datasets in practice, comparing the survey data the EEOC collected to other federal data sources, such as those of the Bureau of Labor Statistics (BLS).

We explained the benefits of current benchmark pay data from different public and private sources, and what additional value the EEO-2 survey data could bring. We also gave the EEOC recommendations on best practices for formatting and publishing its data, so that this survey data can be of maximum utility to researchers and the general public.

To discuss a potential case or to answer any questions, you can email info@employstats.com or contact us at 1-866-629-0011.

17

DEC 2021

Research Associate Proma Paromita Participates in Stata Online Training Course

Posted by Emma Dooley | Big Data, Data Analytics, Economics, Statistical Analysis

Our very own Research Associate Proma Paromita recently participated in an online training course for Stata. Stata is an integrated statistical software package that provides services for data manipulation, visualization, statistics, and automated reporting.

Here at EmployStats, our researchers are always working with huge data sets that need to be analyzed and formatted. This course gave Proma a better understanding of Stata programming and insight into how to dissect large quantities of data more efficiently.

In a sit-down interview, Proma discussed her experience and the general layout of the course. She explained that the course content was scaffolded, beginning with the basics and increasing in complexity as the course continued. Proma described the course as concise, with practical examples, making it a perfect reference for future Stata programming. Stata also offers many resources on its website and YouTube channel to help users work through challenges.

As a takeaway from the course, Proma believes that understanding the syntax is more beneficial than memorizing it. Additionally, she stressed that it is essential to understand the data you are working with in order to produce tangible results.

29

MAR 2021

Data Mining and Litigation (Part 1)

Posted by Carl McClain | Big Data, Data Analytics, Economics, Statistical Analysis

Data Mining is one of the many buzzwords floating about in the data science ether, a noun high on enthusiasm, but typically low on specifics. It is often described as a cross between statistics, analytics, and machine learning (yet another buzzword). Data mining is not, as is often believed, a process that extracts data. It is more accurate to say that data mining is a process of extracting unobserved patterns from data. Such patterns and information can represent real value in unlikely circumstances.

Those who work in economics and the law may find themselves confused by, and suspicious of, the latest fads in computer science and analytics. Indeed, concepts in econometrics and statistics are already difficult to convey to judges, juries, and the general public. Expecting a jury composed entirely of mathematics professors is fanciful, so the average economist and lawyer must find a way to convincingly say that X output from Y method is reliable, and presents an accurate account of the facts. In that instance, why make a courtroom analysis even more remote with “data mining” or “machine learning”? Why risk bamboozling a jury, especially with concepts that even the expert witness struggles to understand? The answer is that data mining and machine learning open up new possibilities for economists in the courtroom, if used for the right reasons and articulated in the right manner.

Consider the following case study:

A class action lawsuit is filed against a major Fortune 500 company, alleging gender discrimination. In the complaint, the plaintiffs allege that female executives are, on average, paid less than men. One of the allegations is that starting salaries for women are lower than for men, and that this bias against women persists as they continue working and advancing at the company. After constructing several different statistical models, the plaintiffs’ expert witness economist confirms that the starting salaries for women are, on average, several percentage points lower than those for men. This pay gap is statistically significant, the findings are robust, and the regressions control for a variety of employment factors, such as the employee’s department, age, education, and salary grade.

However, the defense now raises an objection in the following vein: “Of course men and women at our firm have different starting salaries. The men we hire tend to have more relevant prior job experience than women.” An employee with more relevant experience would (one would suspect) be paid more than an employee with less relevant prior experience. In that case, the perceived pay gap would not be discriminatory, but the result of an as-yet unaccounted-for variable. So, how can the expert economist quantify relevant prior job experience?

For larger firms, one source could be the employees’ job applications. In this case, each job application was filed electronically and can be read into a data analytics program. These job applications list the last dozen job titles the employee held prior to their position at this company. Now the expert economist lets out a small groan. In all, there are tens of thousands of unique job titles. It would be difficult (or if not difficult, silly) to add every single prior job title as a control in the model. So, it would make sense to organize these prior job titles into defined categories. But how?

This is one instance where new techniques in data science come into play.
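As a rough illustration of what "organizing job titles into categories" can look like in code, the sketch below groups titles that share most of their words. This is only one simple possibility, not the method used in the case study: the word-overlap (Jaccard) measure, the 0.5 threshold, the function names, and the sample titles are all assumptions made for this example.

```python
import re

def tokens(title):
    """Lowercase a job title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", title.lower()))

def jaccard(a, b):
    """Share of words two titles have in common (0 = none, 1 = identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_titles(titles, threshold=0.5):
    """Greedily place each title in the first group whose seed title is similar enough.

    The threshold is an illustrative assumption, not a recommended value.
    """
    groups = []  # each entry: (tokens of the seed title, list of member titles)
    for title in titles:
        t = tokens(title)
        for seed, members in groups:
            if jaccard(t, seed) >= threshold:
                members.append(title)
                break
        else:
            # No existing group was close enough, so this title seeds a new one.
            groups.append((t, [title]))
    return [members for _, members in groups]

# Hypothetical sample titles, for illustration only.
titles = [
    "Senior Software Engineer",
    "Software Engineer",
    "Staff Accountant",
    "Accountant",
    "Software Engineer II",
]

for group in group_titles(titles):
    print(group)
```

With tens of thousands of real titles, an analyst would still need to review the resulting groups by hand, since word overlap alone cannot tell whether two differently worded titles describe comparable experience.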

28

OCT 2020

Upcoming EmployStats Seminar for State Auditors

Posted by Carl McClain | Big Data, Data Analytics, Statistical Analysis

EmployStats is honored to announce that it will be teaching a course on statistical sampling for the Texas State Auditor’s Office (SAO) this winter. The course, titled Statistical Sampling for Large Audits, will take place online on December 14 and 15, 2020.

The State Auditor’s Office (SAO) is the independent auditor for Texas state government. The SAO performs audits, reviews, and investigations of any entity receiving state funds. EmployStats’ principal economist, Dwight Steward, Ph.D., along with Matt Rigling, MA and Carl McClain, MA, will be instructing this course for auditors from state and local government.

Over this two-day, all-online course, the EmployStats team will give participants a crash course in the uses of statistical sampling, how statistical samples are conducted, and when statistical samples are legally and scientifically valid in performing audits.

To find out more about the seminar and the Texas State Auditor’s Office, please visit the SAO Website. For more on EmployStats, visit our website: Employstats.com!

8

JAN 2020

EmployStats Publishes Big Data Book

Posted by Carl McClain | Big Data, Data Analytics, Statistical Analysis

Big Data permeates our society, but how will it affect U.S. courts? In civil litigation, attorneys and experts are increasingly reliant on analyzing large volumes of electronic data, which provide information and insight into legal disputes that could not be obtained through traditional sources. There are limitless sources of Big Data: time and payroll records, medical reimbursements, stock prices, GPS histories, job openings, credit data, sales receipts, and social media posts, just to name a few. Experts must navigate complex databases and often messy data to generate reliable quantitative results. Attorneys must always keep an eye on how such evidence is used at trial. Big Data analyses also present new legal and public policy challenges in areas like privacy and cybersecurity, while advances continue in artificial intelligence and algorithmic design. For these and many other topics, EmployStats has a roadmap on the past, present, and future of Big Data in our legal system.

Order your copy of Dr. Dwight Steward and Dr. Roberto Cavazos’ book on Big Data Analytics in U.S. Courts!

6

JAN 2020

Economics and Statistics Experts in Wage and Hour Litigation

Posted by Matt Rigling | Big Data, Data Analytics, Economics, Employment, Labor data, Wage and hour cases

Complex wage and hour litigation often involves significant data management and sophisticated analyses in order to assess potential liability and damages. This article highlights common wage and hour data management issues, discusses sampling and surveying, and provides a case study on the use of sampling in an overtime misclassification case.

Download Dr. Dwight Steward and Matt Rigling’s paper on wage and hour expert economists here!


20

MAY 2019

Benford’s Law and Fraud Detection

Posted by Carl McClain | Big Data, Data Analytics

Civil fraud cases hinge on litigants proving where specific fraudulent activity occurred. Tax returns, sales records, expense reports, or any other large financial data set can be manipulated. In many instances of fraud, the accused party diverts funds or creates transactions, intending to make their fraud appear as ordinary or random entries. More clever fraudsters ensure no values are duplicated or input highly specific dollar and cent amounts. Such ‘random’ numbers, to them, may appear normal, but few understand or replicate the natural distribution of numbers known as Benford’s Law.

A staple of forensic accounting, Benford’s Law is a useful tool for litigants in establishing patterns of fraudulent activity.

Benford’s Law states that, in many naturally occurring data sets, the number 1 will be the leading numeral about 30% of the time, the number 2 will be the leading numeral about 18% of the time, and each subsequent number (3-9) will be a leading number with decreasing frequency. This decreasing frequency of numbers, from 1 through 9, can be represented by a curve that looks like this:

Frequency of each leading digit predicted by Benford’s Law.
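The percentages above come from a simple formula. In its standard form (not spelled out in this post), Benford’s Law puts the probability that the leading digit is d at log10(1 + 1/d). A minimal Python sketch, with an illustrative function name, computes the expected share for each digit:

```python
import math

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

expected = benford_expected()
for digit, share in expected.items():
    print(f"{digit}: {share:.1%}")
```

The nine shares sum to 1, and the first two round to the roughly 30% and 18% figures quoted above.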

For example, according to Benford’s Law, one would expect that more street addresses start with a 1 than with an 8 or a 3; such a hypothesis can be tested. The same pattern holds for many phenomena, such as country populations or the volumes of trades. This predicted distribution permeates many aspects of numbers and big data sets. But Benford’s Law is not absolute: it requires a large data set whose values span several orders of magnitude, and every digit from 1 to 9 must be able to occur as the leading digit. Benford’s Law, for example, would not apply to a data set where only 4s or 9s are the leading number, nor to assigned or tightly bounded numbers such as telephone numbers. Many financial data sets do comport with a Benford distribution.

In accounting and financial auditing, Benford’s Law is used to test a data set’s authenticity. False transaction data is typically created by changing existing values or adding fake entries. The test, therefore, is an early indicator of whether a data set has been altered or artificially created. Uniformly distributed computer-generated random numbers will tend to show an equal distribution of leading digits. Even manually created false entries will tend to have some sort of underlying pattern. A person may, for example, input more fake leading digits based on numbers closer to their typing fingers (5 and 6).

An examiner would compare the distribution of leading digits in the data set with the Benford distribution. Then, the examiner would statistically test whether the proportion of each leading digit in the data set matches a Benford distribution. The resulting “Z-scores” give a measure of how far the two distributions diverge, with larger “Z-scores” (in absolute value) implying a more distorted data set, which may indicate artificially created data.
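The comparison described above can be sketched in a few lines of Python. This sketch assumes an ordinary one-sample proportion Z-test for each digit; forensic accounting practice often adds a continuity correction, which is omitted here for brevity. The function names and the two sample data sets are illustrative assumptions, not data from any case.

```python
import math
from collections import Counter

def leading_digit(x):
    """First non-zero digit of a number, read from its scientific notation."""
    return int(f"{abs(x):.15e}"[0])

def benford_z_scores(values):
    """Z-score per leading digit: observed share versus the Benford share."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    scores = {}
    for d in range(1, 10):
        p_exp = math.log10(1 + 1 / d)        # Benford expected share
        p_obs = counts.get(d, 0) / n         # observed share in the data
        std_err = math.sqrt(p_exp * (1 - p_exp) / n)
        scores[d] = (p_obs - p_exp) / std_err
    return scores

# Illustrative inputs: powers of 2 are known to track Benford's Law closely,
# while the integers 100-999 have perfectly flat leading digits.
benford_like = [2 ** k for k in range(100)]
flat = list(range(100, 1000))

z_benford = benford_z_scores(benford_like)
z_flat = benford_z_scores(flat)
print("digit 1, Benford-like data:", round(z_benford[1], 2))
print("digit 1, flat data:", round(z_flat[1], 2))
```

A common rule of thumb treats an absolute Z-score above 1.96 as significant at the 5% level. As with any test run on nine digits at once, a single large value is a reason to look closer, not a finding in itself.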

If a data set violates Benford’s Law, that alone does not prove that the transactions are fraudulent. But a violation does give auditors, economists, and fact finders an additional reason to scrutinize individual transactions.

2

APR 2019

Big Data CLE in Baltimore, MD

Posted by Matt Rigling | Big Data, Data Analytics

On April 5, 2019, Dwight Steward, Ph.D., will be speaking alongside Roberto Cavazos, Ph.D., Kyle Cheek, Ph.D., and Vince McKnight. The experts and attorney will present together on a panel at the EmployStats-sponsored CLE seminar, titled Data Analytics in Complex Litigation. The seminar will take place at the University of Baltimore in the Merrick School of Business, and will run from 9:30 AM to 1:30 PM.

The speakers will cover a spectrum of issues on Big Data Analytics, and its use in legal applications. Specifically, the general session of the CLE will provide an overview of data analytics in a legal context, discussing the various aspects of how to manage large data sets in complex litigation settings. Attendees will then be able to choose between two breakout sessions, Data Analytics in Litigation and Healthcare Litigation. Lunch will be included.

To find out more on the upcoming CLE, visit: www.bigdatacleseminar.com

Also, make sure to follow our blog and stay up to date with Employstats news and sponsored events! www.EmployStats.com

15

MAR 2019

Data Analytics and the Law: Putting it Together

Posted by Carl McClain | Big Data, Data Analytics, Economics

This series on data analytics in litigation emphasized how best practices help secure reliable, valid, and defensible results based on “Big Data.” Whether it is inter-corporate litigation, class actions, or whistleblower cases, electronic data is a source of key insights. Courts hold wide discretion in admitting statistical evidence, which is why opposing expert witnesses scrutinize or defend results so rigorously. There is generally accepted knowledge on the techniques, models, and coding languages for generating analytical results from “Big Data.” However, the underlying assumptions of a data analysis can be biased. These assumptions are the largest potential source of error, leading parties to confuse, generalize, or even misrepresent their results. Litigants need to be aware of and challenge such underlying assumptions, especially in their own data-driven evidence.

When it comes to big data cases, the parties and their expert witnesses should be prepared with continuous probing questions. Where (and on what program) the data are stored, how they are interconnected, and how “clean” they are all directly impact the final analysis. These stages can be overlooked, leading parties to miss key variables or spend additional time cleaning up fragmented data sets. When the data are available, litigants should not miss out on opportunities due to lack of preparation or foresight. When data do not exist or do not support a given assertion, a party should readily examine its next best alternative.

When the proper analysis is compiled and presented, the litigating parties must remind the court of the big picture: how the analysis directly relates to the case. Do the results prove a consistent pattern of “deviation” from a given norm? In other instances, an analysis referencing monetary values can serve as a party’s anchor for calculating damages.

In Big Data cases, the data should be used to reveal facts, rather than be molded to fit assertions.