Working with large data sets: The new CMS medical records files

The new data files released by the CMS regarding payments made to U.S. medical doctors by drug and medical device manufacturers contain a treasure trove of information. However, the sheer size of the data will limit its use, and the nuggets that can be mined from it, for some users.

Using the statistical program STATA, which is generally one of the fastest and most efficient ways to handle large data sets, required an allocation of 6 GB of RAM just to read in the data. STATA is efficient at handling large wage and hour, employment, and business data sets (like ones with many daily prices).

The table below shows the memory allocation STATA required to read in the data:
Current memory allocation

                 current                                memory usage
    settable       value    description                 (1M = 1024k)
    -----------------------------------------------------------------
    set maxvar      5000    max. variables allowed           1.947M
    set memory     6144M    max. data space              6,144.000M
    set matsize      400    max. RHS vars in models          1.254M
    -----------------------------------------------------------------
                                                         6,147.201M
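
In older versions of STATA (version 11 and earlier), this memory has to be allocated by hand before the file is read in. The sketch below shows one way to do so, using the same settings reported in the table above; the file name is illustrative only and should be replaced with the actual CMS payments extract, and in STATA 12 and later memory is managed automatically, so the set memory line is unnecessary there.

    * Allocate memory before reading the CMS payments file (Stata 11 or earlier;
    * Stata 12 and later manage memory automatically, so "set memory" is not needed)
    clear
    set maxvar 5000
    set memory 6144m
    set matsize 400

    * File name is illustrative -- substitute the actual CMS extract
    insheet using "cms_payments.csv", comma clear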
Published by

Dwight Steward, Ph.D.

Dr. Steward regularly writes and speaks on topics involving business and individual economic damages, employment audits, and the analysis of payroll and time data in wage and hour investigations. Dr. Steward has also held teaching positions at The University of Texas-Austin in the Department of Economics and in the Red McCombs School of Business, The College of Business at Sam Houston State University, and at The University of Iowa. He has taught numerous courses in statistics, corporate finance, labor economics, business policies, managerial economics, and microeconomics.