EmployStats Research Associate Susan Wirtanen recently visited New York, NY to attend a course in Stata. Stata is a statistical software package used by EmployStats analysts in almost all of our case work, especially wage and hour and employment litigation. The training covered data management, data manipulation, and tools for complex analyses. These skills will allow Susan to work quickly and efficiently through the large data sets our clients provide for analysis.

Susan Wirtanen was hired at EmployStats in June 2016 as an intern after graduating from the University of Texas at Austin with a Bachelor’s degree in Economics. She began working full-time as a Research Associate at the beginning of 2017. Outside of work, Susan coaches club volleyball here in Austin and recently finished her first season of coaching.

This past week, EmployStats associate Matt Rigling visited Washington, D.C. for a training course led by StataCorp experts. The course, titled Using Stata Effectively: Data Management, Analysis, and Graphics Fundamentals, was taught by instructor Bill Rising at the MicroTek Training Solutions facility, just a few blocks from the White House.

Here at EmployStats, our analysts use the statistical software package Stata for data management and data analysis in all types of wage & hour, economic, and employment analyses. With Stata, all analyses can be reproduced and documented for publication and review.
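As a minimal sketch of that workflow (the file and variable names here are hypothetical), an analysis can be wrapped in a do-file that logs every command and result:

```stata
* analysis.do - a minimal reproducible-session sketch (hypothetical names)
log using analysis_log, replace text    // record all commands and output
use casedata, clear                     // load the hypothetical case data set
summarize hourly_wage, detail           // summary statistics, incl. percentiles
log close                               // close the log for the case file
```

Re-running the do-file regenerates the same log, which is what makes the results documentable for publication and review.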

The training course covered topics ranging from Stata’s syntax to data validation and generation, and even topics such as estimation and post-estimation. “I took away a lot of useful techniques from the Stata course, and I learned about some new features of Stata 14, such as tab auto-complete and the command to turn Stata Markup files into reproducible do-files. Most importantly, I learned data manipulation skills that will help me work more efficiently and accurately,” said associate Matt Rigling.

The Stata code for estimating the Millimet et al. (2002) econometric worklife model can be found below. The code will need to be adjusted to fit your purposes, but the basic portions are here.

use 1992-2013, clear

drop if A_W==0
keep if A_A>=16 & A_A<86

*drop if A_MJO==0
*drop if A_MJO==14 | A_MJO==15

gen curr_wkstate = A_W>1
lab var curr_wkstate "1 = active in current period"
gen prev_wkstate = prev_W>1
lab var prev_wkstate "1 = active in previous period"
gen age = A_A
gen age2 = age*age
gen married = A_MA<4
gen white = A_R==1
gen male = A_SE==1

gen mang_occ = A_MJO<3
gen tech_occ = A_MJO>2 & A_MJO<7
gen serv_occ = A_MJO>6 & A_MJO<9
gen oper_occ = A_MJO>8

gen occlevel = 0
replace occlevel = 1 if mang_occ==1
replace occlevel = 2 if tech_occ==1
replace occlevel = 3 if serv_occ==1
replace occlevel = 4 if oper_occ ==1

gen lessHS = A_HGA<=38
gen HS = A_HGA==39
gen Coll = A_HGA>42
gen someColl = A_HGA>39 & A_HGA<43

gen white_age = white*age
gen white_age2 = white*age2
gen married_age = married*age

gen child_age = HH5T*age

/*
gen mang_occ_age = mang_occ*age
gen tech_occ_age = tech_occ*age
gen serv_occ_age = serv_occ*age
gen oper_occ_age = oper_occ*age
*/

merge m:1 age using mortalityrates

keep if _m==3
drop _m

gen edlevel = 1*lessHS + 2*HS + 3*someColl + 4*Coll

save anbasemodel, replace
* Active to Active and Active to Inactive probabilities

local g = 0
local e = 1

forvalues g = 0/1 {

forvalues e = 1/4 {

use anbasemodel, clear

xi: logit curr_wkstate age age2 white white_age white_age2 married married_age HH5T i.year_out if prev_wk==1 & male==`g' & HS==1
*Gives you conditional probability
*summing these figures gives the average predicted probabilities

predict AAprob

keep if occlevel==`e'
*collapse (mean) AAprob mortality, by(age)

collapse (mean) AAprob mortality (rawsum) MARS [aweight=MARS], by(age)

gen AIprob = 1-AAprob

replace AAprob = AAprob*(1-mortality)
replace AIprob = AIprob*(1-mortality)

save Active_probs, replace

*Calculates Inactive first period probabilities

use anbasemodel, clear

xi: logit curr_wkstate age age2 white white_age white_age2 married married_age HH5T i.year_out if prev_wk==0 & male==`g' & HS==1

predict IAprob

keep if occlevel==`e'

*collapse (mean) IAprob mortality , by(age)
collapse (mean) IAprob mortality (rawsum) MARS [aweight=MARS], by(age)

gen IIprob = 1-IAprob
save Inactive_probs, replace

*Calculates WLE for Active and Inactive

merge 1:1 age using Active_probs

drop _m

order AAprob AIprob IAprob IIprob
*Set the probabilities for end period T+1

*Note the top age changes to 80 in the later data sets
gen WLE_Active = 0
replace WLE_Active = AAprob[_n-1]*(1+AAprob) + AIprob[_n-1]*(0.5 + IAprob)
gen WLE_Inactive = 0
replace WLE_Inactive = IAprob[_n-1]*(0.5+AAprob) + IIprob[_n-1]*IAprob

gen WLE_Active_2 = 0
replace WLE_Active_2 = WLE_Active if age==85

gen WLE_Inactive_2 = 0
replace WLE_Inactive_2 = WLE_Inactive if age==85
local x = 1
local y = 80 - `x'

forvalues x = 1/63 {

replace WLE_Active_2 = AAprob*(1+WLE_Active_2[_n+1]) + AIprob*(0.5 + WLE_Inactive_2[_n+1]) if age==`y'
replace WLE_Inactive_2 = IAprob*(0.5 + WLE_Active_2[_n+1]) + IIprob*WLE_Inactive_2[_n+1] if age==`y'

local x = `x' + 1
local y = 80 - `x'

}

keep age WLE_Active_2 WLE_Inactive_2
rename WLE_Active_2 WLE_Active_`g'_`e'
rename WLE_Inactive_2 WLE_Inactive_`g'_`e'

save WLE_`g'_`e', replace

keep age WLE_Active_`g'_`e'
save WLE_Active_`g'_`e', replace

use WLE_`g'_`e', clear
keep age WLE_Inactive_`g'_`e'
save WLE_Inactive_`g'_`e', replace

di `e'
/**End of Active to Active and Active to Inactive probabilities*/

local e = `e' + 1
}

local g = `g' + 1

}
local g = 0
local e = 1

forvalues g = 0/1 {

forvalues e = 1/4 {

if `e' == 1 {
use WLE_Active_`g'_`e', clear
save WLE_Active_`g'_AllOccLevels, replace

use WLE_Inactive_`g'_`e', clear
save WLE_Inactive_`g'_AllOccLevels, replace

}

if `e' > 1 {

use WLE_Active_`g'_AllOccLevels, clear
merge 1:1 age using WLE_Active_`g'_`e'
drop _m
save WLE_Active_`g'_AllOccLevels, replace

use WLE_Inactive_`g'_AllOccLevels, clear
merge 1:1 age using WLE_Inactive_`g'_`e'
drop _m
save WLE_Inactive_`g'_AllOccLevels, replace

}

local e = `e' + 1
}

if `g' == 1 {
use WLE_Active_0_AllOccLevels, clear
merge 1:1 age using WLE_Active_1_AllOccLevels
drop _m
save WLE_Active_BothGenders_AllOccLevels, replace
use WLE_Inactive_0_AllOccLevels, clear
merge 1:1 age using WLE_Inactive_1_AllOccLevels
drop _m
save WLE_Inactive_BothGenders_AllOccLevels, replace
}

local g = `g' + 1

}

!del anbasemodel.dta

In the stats world there is an ongoing debate over which statistical analysis programs are “better”. Of course, the answer always depends on what you use them for. Some like the open-source, evolving nature of R, while others prefer the established and tried Stata.

In the world of labor and employment economics, and in litigation matters that require analysis of large data sets, Stata wins hands down. The open-source nature of R is appealing in some settings, but many decades of pre-written (and debugged) programs make Stata the best choice in most employment and wage and hour cases that require analysis of large data sets. Performing basic tabulations and data manipulations in R requires many lines of code, while Stata often has the command built in.
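To illustrate the point (the data set and variable names below are hypothetical), a grouped tabulation that can take a dozen lines of R is a single built-in command in Stata:

```stata
* Hypothetical wage-and-hour example: built-in tabulation commands
use timecards, clear                       // hypothetical time card data set
tabulate department overtime_flag, row     // cross-tab with row percentages
tabstat hours_worked, by(department) statistics(mean p50 n)  // group summaries
```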

Here are some interesting snippets from the web on the R vs. Stata debate:

http://www.researchgate.net/post/What_is_the_difference_between_SPSS_R_and_STATA_software

The main drawback of R is the learning curve: you need a few weeks just to be able to import data and create a simple plot, and you will not cease learning basic operations (e.g. for plotting) for many years. You will stumble upon weirdest problems all the time because you have missed the comma or because your data frame collapses to a vector if only one row is selected.

However, once you mastered this, you will have the full arsenal of modern cutting-edge statistical techniques at your disposal, along with in-depth manuals, references, specialized packages, graphical interface, a helpful community — and all at no cost. Also, you will be able to do stunning graphics.


http://forum.thegradcafe.com/topic/44595-stata-or-r-for-statistics-software/

http://www.econjobrumors.com/topic/r-vs-stata-is-like-a-mercedes-vs-a-bus


The new data files released by the CMS regarding the payments made to U.S. medical doctors by drug and medical device manufacturers contain a treasure trove of information. However, for some users the sheer size of the data will limit its use and the nuggets that can be mined from it.

Using the statistical program Stata, which is generally one of the fastest and most efficient ways to handle large data sets, required an allocation of 6 GB of RAM just to read in the data. Stata is efficient at handling large wage and hour, employment, and business data sets (like ones with many daily prices).

The table below shows what Stata required in terms of memory to be able to read the data:

Current memory allocation

current                                 memory usage
settable          value     description                 (1M = 1024k)
--------------------------------------------------------------------
set maxvar         5000     max. variables allowed           1.947M
set memory         6144M    max. data space              6,144.000M
set matsize         400     max. RHS vars in models          1.254M
--------------------------------------------------------------------
                                                         6,147.201M
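For reference, limits like these are raised with the settings below before reading the file (a sketch for older versions of Stata; since Stata 12, data memory is managed automatically and `set memory` is obsolete):

```stata
* Raise limits before loading a very large data set (Stata 11 and earlier)
set maxvar 5000        // allow up to 5,000 variables
set memory 6144m       // allocate 6 GB of data space (not needed in Stata 12+)
set matsize 400        // maximum right-hand-side variables in models
```

Note that `set maxvar` is only available in Stata/SE and Stata/MP.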


The Centers for Medicare and Medicaid Services’ (CMS) new Open Payments database shows the consulting fees, research grants, travel, and other reimbursements made to physicians by the medical industry in 2013.

There are 2,619,700 payments in the CMS data, made to 356,190 physicians. The average payment made to physicians was $255.22. The median payment was $15.52.

Table 1: Summary of Payments – Stata output

The physicians received an average total of $1,877.11 in payments. The median total payment for the 356,190 physicians in the data was $94.15.

Table 2: Summary of Payments – Stata output

Below is the Stata code for the results:

* Index payments within each physician and summarize
bysort physician_p: gen hj = _n                  // running payment count within physician
sum hj, det                                      // distribution of the running index
count if hj==1                                   // number of unique physicians
sum total_am, det                                // mean and median of individual payments
bysort physician_p: egen hj2 = total(total_am)   // total payments per physician
sum hj2 if hj==1, det                            // mean and median of per-physician totals