Working with large data sets: The new CMS medical records files

The new data files released by the CMS regarding payments made to U.S. medical doctors by drug and medical device manufacturers contain a treasure trove of information. However, the sheer size of the data will limit its use, and the nuggets that can be mined from it, for some users.

Using the statistical program STATA, which is generally one of the fastest and most efficient ways to handle large data sets, required an allocation of 6 GB of RAM just to read in the data. STATA is efficient at handling large wage and hour, employment, and business data sets (like ones with many daily prices).

The table below shows the memory allocation STATA required to read in the data:
Current memory allocation

                 current                                memory usage
    settable       value    description                 (1M = 1024k)
    -----------------------------------------------------------------
    set maxvar      5000    max. variables allowed           1.947M
    set memory     6144M    max. data space              6,144.000M
    set matsize      400    max. RHS vars in models          1.254M
    -----------------------------------------------------------------
                                                         6,147.201M
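
In older versions of STATA (version 11 and earlier), this memory has to be allocated by hand before the file is read in. The sketch below shows one way to do so, using the same settings reported in the table above; the file name is illustrative only and should be replaced with the actual CMS payments extract, and in STATA 12 and later memory is managed automatically, so the set memory line is unnecessary there.

    * Allocate memory before reading the CMS payments file (Stata 11 or earlier;
    * Stata 12 and later manage memory automatically, so "set memory" is not needed)
    clear
    set maxvar 5000
    set memory 6144m
    set matsize 400

    * File name is illustrative -- substitute the actual CMS extract
    insheet using "cms_payments.csv", comma clear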
Published by

Dwight Steward, Ph.D.

Dr. Steward regularly writes and speaks on topics involving business and individual economic damages, employment audits, and the analysis of payroll and time data in wage and hour investigations. Dr. Steward has also held teaching positions at The University of Texas-Austin in the Department of Economics and in the Red McCombs School of Business, The College of Business at Sam Houston State University, and at The University of Iowa. He has taught numerous courses in statistics, corporate finance, labor economics, business policies, managerial economics, and microeconomics.