The new data files released by the CMS regarding the payments made to U.S. medical doctors by drug and medical device manufacturers contains a treasure trove of information. However, the large size of the data will limit the use and the nuggets that can mined for some.
Using the statistical program STATA, which is generally one of the fastest and most efficient ways to handle large data sets, required an allocation of 6G of RAM memory to just read in the program. STATA is efficient at handling large wage and hour, employment, and business data sets (like ones with many daily prices)
The table below shows what STATA required in terms of memory to be able to read the data:
Current memory allocation
current memory usage
settable value description (1M = 1024k)
——————————————————————–
set maxvar 5000 max. variables allowed 1.947M
set memory 6144M max. data space 6,144.000M
set matsize 400 max. RHS vars in models 1.254M
———–
6,147.201M