**28 APR 2017**

Posted by Matt Rigling and Susie Wirtanen | Big Data, Statistical Analysis

Due to the massive computational requirements of analyzing big data, finding the best approach to a big data project can be a daunting task. At EmployStats, our team of experts utilizes top-of-the-line data systems and software to analyze big data seamlessly and provide our clients with high-quality analysis as efficiently as possible.

- The general approach to big data analytics begins with fully understanding the data as a whole. Not only must the variable fields in the data be identified; one must also understand what each variable represents and determine what values are reasonable for it.
- Next, the data must be cleaned and reorganized into the clearest possible format, ensuring that values are not missing and fall within reasonable ranges. As the size of the data grows, so does the work required to clean it. Larger datasets contain more individual components, which are typically dependent on one another, so it is necessary to write computer programs to evaluate the accuracy of the data.
- Once the entire dataset has been cleaned and properly formatted, one needs to define the question the data will answer and examine how the data relates to that question. Questions in big data projects may concern frequencies, probabilities, economic models, or any number of other statistical properties. Whatever the question, the data must then be processed in its context.
- Once an answer has been obtained, one must determine whether it is a strong answer. A delicate answer, one that would change significantly if the analysis technique were altered, is not ideal. The goal of big data analytics is a robust answer, so one should attack the same question in a number of different ways to build confidence in the result.
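The cleaning step above can be sketched in a few lines of code. This is a minimal illustration, not an actual EmployStats program: the wage dataset, field names, and "reasonable ranges" below are all hypothetical, chosen only to show how a program can flag missing values and out-of-range entries automatically.

```python
# Hypothetical wage records; record 2 has a missing value, record 3 an
# implausible one. In practice these would be read from a data file.
records = [
    {"employee_id": 1, "hourly_wage": 18.50, "hours_per_week": 40},
    {"employee_id": 2, "hourly_wage": None,  "hours_per_week": 38},
    {"employee_id": 3, "hourly_wage": 2500,  "hours_per_week": 45},
]

# Assumed reasonable range for each variable, for illustration only.
RANGES = {"hourly_wage": (7.25, 500.0), "hours_per_week": (0, 80)}

def validate(record):
    """Return a list of problems found in one record."""
    problems = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field} is missing")
        elif not (low <= value <= high):
            problems.append(f"{field}={value} outside [{low}, {high}]")
    return problems

# Keep the clean records; flag the rest for review rather than
# silently dropping them.
clean = [r for r in records if not validate(r)]
flagged = {r["employee_id"]: validate(r) for r in records if validate(r)}
```

On this toy input, only record 1 survives, while records 2 and 3 are flagged with a description of what is wrong, which is the kind of output an analyst would review before rerunning the check on the corrected data.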
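The robustness check in the last step can also be made concrete. The sketch below, using made-up wage figures, answers the same question ("what is the typical wage?") three different ways; when one technique disagrees sharply with the others, the answer is delicate in the sense described above.

```python
import statistics

# Hypothetical hourly wages; the last value is an extreme outlier.
wages = [12.0, 15.5, 18.0, 21.25, 19.0, 14.75, 16.5, 250.0]

# Three techniques for the same question.
mean = statistics.mean(wages)
median = statistics.median(wages)

# Trimmed mean: drop the lowest and highest value before averaging.
trimmed_mean = statistics.mean(sorted(wages)[1:-1])

# The median and trimmed mean agree closely, while the plain mean is
# pulled far away by the outlier -- a sign the plain mean is a
# delicate answer for this data, and the outlier deserves scrutiny.
print(mean, median, trimmed_mean)
```

Here the median (17.25) and trimmed mean (17.5) land close together while the plain mean (45.875) does not, so confidence should rest on the outlier-resistant estimates, or on re-examining whether the 250.0 entry survived the cleaning step in error.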