When manual entry of financial or wage data that is not in an analyzable format is not an option, OCR software combined with custom-written data cleaning routines is a good alternative.
For example, in our approach we use a number of OCR programs, including ABBYY FineReader, to first translate the data into a format that can be read by statistical programs such as STATA and scripting languages such as VBA.
Once the data is converted, we write specialized routines to extract the relevant data from the converted file. The code, which is written in STATA, VBA, or another scripting language, puts the extracted data into a format that can be analyzed by statistical and spreadsheet programs.
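The extraction step described above can be sketched in a few lines. The example below is only an illustration in Python, not the STATA or VBA code used in practice. It assumes a hypothetical OCR output in which each line holds an employee ID, a pay date, hours worked, and an hourly rate. The field names, the line layout, and the list of OCR character confusions are all assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical layout: employee id, pay date, hours, hourly rate (optional "$").
LINE_PATTERN = re.compile(
    r"^(?P<emp_id>\S+)\s+(?P<pay_date>\S+)\s+(?P<hours>\S+)\s+\$?(?P<rate>\S+)$"
)

# Assumed OCR confusions inside numeric fields; adjust for the actual source.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def clean_number(token):
    """Return a float from an OCR'd numeric token, or None if it cannot be parsed."""
    fixed = token.translate(OCR_DIGIT_FIXES).replace(",", "")
    try:
        return float(fixed)
    except ValueError:
        return None


def extract_records(ocr_text):
    """Split OCR text into parsed records and rejected lines kept for review."""
    records, rejects = [], []
    for line_no, raw in enumerate(ocr_text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines silently
        match = LINE_PATTERN.match(line)
        if match is None:
            rejects.append((line_no, raw))
            continue
        hours = clean_number(match["hours"])
        rate = clean_number(match["rate"])
        if hours is None or rate is None:
            rejects.append((line_no, raw))
            continue
        records.append({
            "emp_id": match["emp_id"],
            "pay_date": match["pay_date"],
            "hours": hours,
            "rate": rate,
        })
    return records, rejects


def to_csv(records):
    """Write the records as CSV text that a statistical or spreadsheet program can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["emp_id", "pay_date", "hours", "rate"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Keeping the rejected lines, with their line numbers, rather than discarding them leaves a record of what the script could not convert, so the conversion can be checked and rerun.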
This approach to converting wage, business, employment, or other types of data has the advantage that it can be reproduced by either party if required.
Having both the data cleaning and the statistical and economic analysis performed by the same team is desirable. Data cleaning is not performed in a vacuum; that is, the very definition of ‘dirty data’ depends on what the data is to be used for. Some data items may not be converted well by the OCR software and the extraction code, but those items may be of little value to the economic and statistical analysis in the first place.
One advantage of using the same research team for both the data cleaning and the economic and statistical analysis is that the distinction between data items that matter to the analysis and those that do not gets made early in the process.
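The idea that ‘dirty’ depends on the intended use can be illustrated with a short Python sketch. The records, field names, and the two analyses below are hypothetical. The point is only that the same converted file can be clean for one purpose and dirty for another.

```python
# Hypothetical extracted records; None marks a value the OCR step could not recover.
RECORDS = [
    {"emp_id": "E1001", "hours": 40.0, "rate": 15.5, "supervisor": None},
    {"emp_id": "E1002", "hours": None, "rate": 17.25, "supervisor": "J. Doe"},
]


def dirty_records(records, required_fields):
    """Return (emp_id, missing_fields) for each record missing a required field."""
    flagged = []
    for record in records:
        missing = [name for name in required_fields if record.get(name) is None]
        if missing:
            flagged.append((record["emp_id"], missing))
    return flagged


# An assumed wage analysis needs hours and rate, not supervisor names.
wage_issues = dirty_records(RECORDS, ["hours", "rate"])

# An assumed supervisor-level analysis needs the supervisor field instead.
supervisor_issues = dirty_records(RECORDS, ["supervisor"])
```

Under these assumptions only the second record needs attention for the wage analysis, so no cleaning effort is spent on the unreadable supervisor name in the first.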