That’s a question that comes up a lot in wage and hour and employment lawsuits. Typically, the question is: how many employees do I need to look at to have a statistically significant sample?
In some instances, it’s not feasible to collect data or get all the records for all the employees of a particular company. Sometimes the data is kept in such a way that it takes a lot of effort to get that information. In other instances, it is a matter of limitations imposed by the court.
In any event, that’s a question that comes up a number of times in wage and hour lawsuits, particularly ones involving class or collective actions. So what’s the answer?
Generally, the sample needs to be large enough to be representative of the entire employee population. That number could be relatively small, say 40 employees, or relatively large, say 200 employees, depending on the number of employees at the company and the characteristics of the employee universe that is being analyzed.
For example, if there are no meaningful distinctions between the employees in the universe, that is, if it is generally accepted that all the employees are pretty much similarly situated, then a simple random sample could be appropriate.
That is, you could simply draw names from a hat, essentially. A simple random sample typically requires the smallest number of employees.
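The "names from a hat" idea can be sketched in a few lines of Python. This is only an illustration: the roster of 1,000 employee IDs, the sample size of 40, and the helper name `simple_random_sample` are all assumptions made for the example, not figures from any actual case.

```python
import random

def simple_random_sample(employee_ids, sample_size, seed=None):
    """Draw sample_size employees without replacement, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(employee_ids), sample_size)

# Hypothetical roster of 1,000 employee IDs (illustrative only).
roster = [f"EMP{i:04d}" for i in range(1, 1001)]

# Draw 40 names "from the hat".
sample = simple_random_sample(roster, 40, seed=2024)
print(len(sample), "employees drawn; first five:", sample[:5])
```

Recording the seed is a design choice worth considering in litigation, since it lets the other side reproduce exactly the same draw from the same roster.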
If there are distinctions between employees that need to be accounted for, then
either a larger sample or some type of stratified sampling could be appropriate.
Even if there are distinctions between employees, if the sample is sufficiently large then distinctions between the employees in the data could take care of themselves.
For instance, assume that you have a population of 10,000 employees and they are
divided into four different groups that need to be looked at differently.
One way to do a sample in this setting is to sample each of the different groups of employees separately. The main purpose of the individual samples is to make sure that you have the appropriate number of employees in each of the different groups. That is, to make sure that the number of employees in each of the samples reflects that group's share of the overall employee population.
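A minimal sketch of sampling each group separately is shown here, assuming each group's sample is sized in proportion to its share of the population (proportional allocation, which is one common choice and not the only one). The four group labels, their sizes (5,000, 3,000, 1,500 and 500), and the total sample of 200 are hypothetical numbers chosen to match the 10,000-employee example.

```python
import random

def proportional_allocation(group_sizes, total_sample):
    """Split total_sample across groups in proportion to group size."""
    total = sum(group_sizes.values())
    exact = {g: total_sample * n / total for g, n in group_sizes.items()}
    alloc = {g: int(x) for g, x in exact.items()}  # round down first
    leftover = total_sample - sum(alloc.values())
    # Give any leftover slots to the groups with the largest remainders.
    by_remainder = sorted(exact, key=lambda g: exact[g] - alloc[g], reverse=True)
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc

def stratified_sample(employees_by_group, total_sample, seed=None):
    """Draw a separate simple random sample within each group."""
    rng = random.Random(seed)
    sizes = {g: len(members) for g, members in employees_by_group.items()}
    alloc = proportional_allocation(sizes, total_sample)
    return {g: rng.sample(employees_by_group[g], alloc[g])
            for g in employees_by_group}

# Hypothetical population: 10,000 employees in four groups (assumed sizes).
group_sizes = {"A": 5000, "B": 3000, "C": 1500, "D": 500}
employees_by_group = {
    g: [f"{g}-{i:04d}" for i in range(n)] for g, n in group_sizes.items()
}

allocation = proportional_allocation(group_sizes, 200)
strata = stratified_sample(employees_by_group, 200, seed=1)
print(allocation)
```

With these assumed sizes, the smallest group is guaranteed its proportional share of the sample instead of being left to chance.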
Another way to do this is simply to take a large enough sample so that the distinctions take care of themselves. If the sample is sufficiently large, then the distribution of the different groups of employees in the sample should be representative of the employee population as a whole.
So in this example, if there is a sufficiently large sample, it could be okay to use a simple random sample, and you would get to the same point as with a more advanced stratified approach.
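One rough way to see this is to draw simple random samples of different sizes from a made-up population and compare each group's share in the sample with its share in the population. The sketch below assumes the same hypothetical group sizes of 5,000, 3,000, 1,500 and 500; the sample sizes and the seed are arbitrary, and a single draw per size is only suggestive.

```python
import random
from collections import Counter

def group_shares(labels):
    """Return each group's fraction of the given list of group labels."""
    counts = Counter(labels)
    n = len(labels)
    return {g: counts[g] / n for g in sorted(counts)}

def max_share_gap(population, sample_size, rng):
    """Largest absolute gap between sample and population group shares."""
    pop = group_shares(population)
    samp = group_shares(rng.sample(population, sample_size))
    return max(abs(samp.get(g, 0.0) - pop[g]) for g in pop)

# Hypothetical population of 10,000 employees, one group label per employee.
population = ["A"] * 5000 + ["B"] * 3000 + ["C"] * 1500 + ["D"] * 500

rng = random.Random(7)  # arbitrary seed so the run is reproducible
gaps = {n: max_share_gap(population, n, rng) for n in (40, 200, 2000, 10000)}

for n, gap in gaps.items():
    print(f"sample of {n:>6}: largest gap in group share = {gap:.3f}")
```

Note that under these assumptions a sample of 40 would be expected to contain only about 2 employees from the smallest group (5 percent of 40), which is the kind of situation where sampling each group separately may be the safer choice.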
The key, however, is to make sure that the sample is sufficiently large, which of course depends on the overall population and the number of groups of employees being studied.