Perhaps one of the most useful STATA commands out there…

Working with wage and hour data and employment data on a daily basis, like we do, involves the analysis of very large data sets. Big data and employment data are often one and the same. STATA has a very useful command that allows you to load large Excel 2007/2010 spreadsheet files. It is:

set excelxlsxlargefile on

This simple command allows the user to bypass the pre-set limit on spreadsheet size.

Just remember, STATA and your computer will be unresponsive during the load.  So be patient and let it all load up.

 

Calculating business interruption damages of a cattle operation 

There are a number of different types of economic damages that arise in business interruption cases. One type of damage is the lost profits associated with the business interruption. Another type has to do with the loss of asset and/or property values. Out-of-pocket expenses associated with the incident that caused the business interruption are another common loss in business litigation.

The example below involves business interruption litigation involving a midsize cattle operation. The impacted cattle operation is based on a 1,000-acre ranch in Central Texas. The cattle company earned its revenue from the sale of registered cows and calves, commercial steers, hay, and other agricultural products related to cattle production.
The company’s ranch operation was damaged by a welding fire that was sparked in the main section of the property. The fire damaged a number of buildings. Several livestock areas suffered damage that resulted in the death of livestock and the damage and destruction of machinery and equipment.
The company had business interruption insurance. The company’s insurance policy worked off a formula that reimbursed the company for a set period of lost profits, out-of-pocket expenses, and expenses to get the business back to where it was prior to the incident.

After the fire, the ranch management filed a claim with the insurance company. The insurance company did its analysis and reimbursed the ranch for its losses according to the policy. The ranch owners did not agree with the amount paid by the insurance company. They calculated that both their out-of-pocket expenses and their lost profits were higher than the insurance company's figures. The ranch also calculated that it would take a longer period of time to get back up to speed.
Ultimately, after several months of dealing with the insurance company, the ranch owners sued the insurance company and other parties, including the company that started the fire. In this case the ranch alleges that several types of economic loss associated with the business interruption have occurred.

The ranch first alleges it incurred out-of-pocket expenses associated with the cleanup of the damage and destruction of its property and assets. In this case these expenses include removing machinery, replacing machinery, and disposing of livestock. In addition, the company has incurred a loss associated with the amount that will be required to get the company back to where it was before the fire. In this instance, these expenses include fixing and replacing the machinery, fixing and replacing buildings, and purchasing new livestock. Beyond the out-of-pocket expenses associated with the remediation of the damage and the purchase of new equipment and livestock, the business operation also alleges lost profits during the time period in which the remediation occurred. In this instance the company lost out on sales of calves, hay, and other agricultural products that were damaged or destroyed by the fire. The ranch operation also was not able to conduct the auction of its crop because of the fire.

Big data question: How big of a random sample is big enough in a wage and hour case?

That’s a question that comes up a lot in wage and hour and employment lawsuits. Typically the question is: how many employees do I need to look at to have a statistically significant sample?

In some instances it’s not feasible to collect data or get all the records for all the employees of a particular company. Sometimes the data is kept in such a way that it takes a lot of effort to get that information. In other instances it is a matter of the limitations imposed by the court.

In any event, that’s a question that comes up a number of times in wage and hour lawsuits, particularly ones involving class or collective actions. So what’s the answer?

Generally, the size of the sample needs to be sufficiently large so that it is representative of the entire employee population. That number could be relatively small, say 40 employees, or relatively large, say 200 employees, depending on the number of employees at the company and the characteristics of the employee universe that is being analyzed.

For example, if there are no meaningful distinctions between the employees in the universe, that is, it is generally accepted that the employees are all pretty much similarly situated, then a simple random sample could be appropriate.

That is, you could simply draw names from a hat, essentially. A simple random sample typically requires the smallest number of employees.
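That "names from a hat" draw can be sketched in a few lines of code (Python here for illustration; the roster and the sample size of 40 are hypothetical):

```python
import random

# Hypothetical roster of 10,000 employees (names are made up for illustration).
employees = [f"employee_{i}" for i in range(1, 10001)]

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(employees, k=40)  # simple random sample, without replacement

print(len(sample))  # 40 distinct employees drawn "from a hat"
```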

If there are distinctions between employees that need to be accounted for, then
either a larger sample or some type of stratified sampling could be appropriate.
Even if there are distinctions between employees, if the sample is sufficiently large then distinctions between the employees in the data could take care of themselves.

For instance, assume that you have a population of 10,000 employees and they are divided into four different groups that need to be looked at differently.

One way to do a sample in this setting is to sample over each of the different groups of employees separately. The main purpose of the individual samples is to make sure that you have the appropriate number of employees in each of the different groups. That is, to make sure that the number of employees in the different samples are sufficiently representative of the distribution of the different groups of employees in the overall population.
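That group-by-group approach can be sketched as a stratified sample with proportional allocation (a Python illustration; the four group sizes below are hypothetical assumptions, not from the example):

```python
import random

# Hypothetical population of 10,000 employees split into four groups.
groups = {
    "group_A": [f"A_{i}" for i in range(4000)],
    "group_B": [f"B_{i}" for i in range(3000)],
    "group_C": [f"C_{i}" for i in range(2000)],
    "group_D": [f"D_{i}" for i in range(1000)],
}

def stratified_sample(groups, total_n, seed=0):
    """Sample each group separately, in proportion to its share of the
    overall population (proportional allocation)."""
    random.seed(seed)
    population = sum(len(g) for g in groups.values())
    sample = {}
    for name, members in groups.items():
        n = round(total_n * len(members) / population)
        sample[name] = random.sample(members, n)
    return sample

s = stratified_sample(groups, total_n=200)
print({name: len(v) for name, v in s.items()})
# group shares 40%/30%/20%/10% -> 80, 60, 40, 20 sampled employees
```

The per-group sample sizes mirror each group's share of the population, which is exactly the "sufficiently representative" condition described above.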

Another way to do this is simply to take a large enough sample so that the distinctions take care of themselves. If the sample is sufficiently large then the distribution of the different groups of employees in the sample should be representative of the employee population as a whole.

So in this example, if there is a sufficiently large sample it could be okay to use a simple random sample and you would get to the same point as a more advanced stratified type of approach.

The key, however, is to make sure that the sample is sufficiently large; that, of course, depends on the overall population and the number of groups of employees being studied.

STATA statistical code for estimation of Millimet et al. (2002) econometric worklife model

The STATA code for estimating the Millimet et al. (2002) econometric worklife model can be found below. The code will need to be adjusted to fit your purposes. However, the basic portions are here.

use 1992-2013, clear

drop if A_W==0
keep if A_A>=16 & A_A<86

*drop if A_MJO==0
*drop if A_MJO==14 | A_MJO==15

gen curr_wkstate = A_W>1
lab var curr_wkstate "1 = active in current period"
gen prev_wkstate = prev_W>1
lab var prev_wkstate "1 = active in previous period"
gen age = A_A
gen age2 = age*age
gen married = A_MA<4
gen white = A_R==1
gen male = A_SE==1

gen mang_occ = A_MJO<3
gen tech_occ = A_MJO>2 & A_MJO<7
gen serv_occ = A_MJO>6 & A_MJO<9
gen oper_occ = A_MJO>8

gen occlevel = 0
replace occlevel = 1 if mang_occ==1
replace occlevel = 2 if tech_occ==1
replace occlevel = 3 if serv_occ==1
replace occlevel = 4 if oper_occ ==1

gen lessHS = A_HGA<=38
gen HS = A_HGA==39
gen Coll = A_HGA>42
gen someColl = A_HGA>39 & A_HGA<43

gen white_age = white*age
gen white_age2 = white*age2
gen married_age = married*age

gen child_age = HH5T*age

/*
gen mang_occ_age = mang_occ*age
gen tech_occ_age = tech_occ*age
gen serv_occ_age = serv_occ*age
gen oper_occ_age = oper_occ*age
*/

merge m:1 age using mortalityrates

keep if _m==3
drop _m

gen edlevel = 1*lessHS + 2*HS + 3*someColl + 4*Coll

save anbasemodel, replace
* Active to Active and Active to Inactive probabilities

local g = 0
local e = 1

forvalues g = 0/1 {

forvalues e = 1/4 {

use anbasemodel, clear

xi: logit curr_wkstate age age2 white white_age white_age2 married married_age HH5T i.year_out if prev_wk==1 & male==`g' & HS==1
*Gives you the conditional probability
*Averaging these figures gives the average predicted probabilities

predict AAprob

keep if occlevel==`e'
*collapse (mean) AAprob mortality, by(age)

collapse (mean) AAprob mortality (rawsum) MARS [aweight=MARS], by(age)

gen AIprob = 1-AAprob

replace AAprob = AAprob*(1-mortality)
replace AIprob = AIprob*(1-mortality)

save Active_probs, replace

*Calculates Inactive first period probabilities

use anbasemodel, clear

xi: logit curr_wkstate age age2 white white_age white_age2 married married_age HH5T i.year_out if prev_wk==0 & male==`g' & HS==1

predict IAprob

keep if occlevel==`e'

*collapse (mean) IAprob mortality , by(age)
collapse (mean) IAprob mortality (rawsum) MARS [aweight=MARS], by(age)

gen IIprob = 1-IAprob
save Inactive_probs, replace

*Calculates WLE for Active and Inactive

merge 1:1 age using Active_probs

drop _m

order AAprob AIprob IAprob IIprob
*Set the probabilities for end period T+1

*Note the top age changes to 80 in the later data sets
gen WLE_Active = 0
replace WLE_Active = AAprob[_n-1]*(1+AAprob) + AIprob[_n-1]*(0.5 + IAprob)
gen WLE_Inactive = 0
replace WLE_Inactive = IAprob[_n-1]*(0.5+AAprob) + IIprob[_n-1]*IAprob

gen WLE_Active_2 = 0
replace WLE_Active_2 = WLE_Active if age==85

gen WLE_Inactive_2 = 0
replace WLE_Inactive_2 = WLE_Inactive if age==85
local x = 1
local y = 80 - `x'

forvalues x = 1/63 {

replace WLE_Active_2 = AAprob*(1+WLE_Active_2[_n+1]) + AIprob*(0.5 + WLE_Inactive_2[_n+1]) if age==`y'
replace WLE_Inactive_2 = IAprob*(0.5 + WLE_Active_2[_n+1]) + IIprob*WLE_Inactive_2[_n+1] if age==`y'

local x = `x' + 1
local y = 80 - `x'

}

keep age WLE_Active_2 WLE_Inactive_2
rename WLE_Active_2 WLE_Active_`g'_`e'
rename WLE_Inactive_2 WLE_Inactive_`g'_`e'

save WLE_`g'_`e', replace

keep age WLE_Active_`g'_`e'
save WLE_Active_`g'_`e', replace

use WLE_`g'_`e', clear
keep age WLE_Inactive_`g'_`e'
save WLE_Inactive_`g'_`e', replace

di `e'
/* End of Active to Active and Active to Inactive probabilities */

local e = `e' + 1
}

local g = `g' + 1

}
local g = 0
local e = 1

forvalues g = 0/1 {

forvalues e = 1/4 {

if `e' == 1 {
use WLE_Active_`g'_`e', clear
save WLE_Active_`g'_AllOccLevels, replace

use WLE_Inactive_`g'_`e', clear
save WLE_Inactive_`g'_AllOccLevels, replace

}

if `e' > 1 {

use WLE_Active_`g'_AllOccLevels, clear
merge 1:1 age using WLE_Active_`g'_`e'
drop _m
save WLE_Active_`g'_AllOccLevels, replace

use WLE_Inactive_`g'_AllOccLevels, clear
merge 1:1 age using WLE_Inactive_`g'_`e'
drop _m
save WLE_Inactive_`g'_AllOccLevels, replace

}

local e = `e' + 1
}

if `g' == 1 {
use WLE_Active_0_AllOccLevels, clear
merge 1:1 age using WLE_Active_1_AllOccLevels
drop _m
save WLE_Active_BothGenders_AllOccLevels, replace
use WLE_Inactive_0_AllOccLevels, clear
merge 1:1 age using WLE_Inactive_1_AllOccLevels
drop _m
save WLE_Inactive_BothGenders_AllOccLevels, replace
}

local g = `g' + 1

}

!del anbasemodel.dta
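For reference, the backward recursion that the age loop above implements can be sketched outside of STATA. This is a simplified Python illustration: the transition probabilities are made-up constants, and a zero terminal value stands in for the model's actual boundary condition at the top age.

```python
# WLE_A(a) = AA(a)*(1 + WLE_A(a+1)) + AI(a)*(0.5 + WLE_I(a+1))
# WLE_I(a) = IA(a)*(0.5 + WLE_A(a+1)) + II(a)*WLE_I(a+1)
def wle(ages, AA, AI, IA, II):
    """Backward-solve worklife expectancy for the active/inactive states."""
    last = ages[-1]
    wle_a = {last: 0.0}  # simplified terminal condition (zero WLE at top age)
    wle_i = {last: 0.0}
    for a in reversed(ages[:-1]):
        wle_a[a] = AA[a] * (1 + wle_a[a + 1]) + AI[a] * (0.5 + wle_i[a + 1])
        wle_i[a] = IA[a] * (0.5 + wle_a[a + 1]) + II[a] * wle_i[a + 1]
    return wle_a, wle_i

ages = list(range(60, 66))  # small illustrative age range: 60..65
AA = {a: 0.9 for a in ages}  # P(active next year | active), made up
AI = {a: 0.1 for a in ages}  # P(inactive next year | active), made up
IA = {a: 0.2 for a in ages}  # P(active next year | inactive), made up
II = {a: 0.8 for a in ages}  # P(inactive next year | inactive), made up
wle_a, wle_i = wle(ages, AA, AI, IA, II)
print(round(wle_a[60], 3))
```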

Younger workers today have slightly less attachment to the workforce than younger workers in the past

Big Data. Bureau of Labor Statistics. Survey data. Employment Big Data. Those are all things that calculating worklife expectancy for U.S. workers requires. Worklife expectancy is similar to life expectancy and indicates how long a person can be expected to be active in the workforce over their working life. The worklife expectancy figure takes into account the anticipated time out of the labor market due to unemployment, voluntary leaves, attrition, etc.

The goal of our recent work is to update the Millimet et al. (2002) worklife expectancy paper to account for more recent CPS data. Their paper uses data from the 1992 to 2000 time period. Our goal is to update that paper using data from 2000 to 2013 and see if estimating the Millimet et al. (2002) econometric worklife models with more recent data changes the results in the 2002 paper in any substantive way.

Finding: Overall, the worklife expectancy estimated using the more recent 2000-2013 data is shorter than in the earlier (1992-2000) data set. This is true for younger workers (18 to early 40s); younger workers from the more recent cohorts have a shorter expected worklife than younger workers in the earlier cohorts. Conversely, older workers in their 40s and 50s have a slightly longer worklife expectancy in the later time period data set. We are in the process of determining the statistical significance of these differences.

Table 4. Comparison of Worklife Expectancy for 1992-2000 and 2001-2013 Time Periods
1992-2000 2001-2013
Age Less than High School High School Less than High School High School
18 31.469 38.410 30.569 37.314
19 30.926 37.846 30.128 36.833
20 30.306 37.180 29.603 36.237
21 29.670 36.493 29.021 35.590
22 29.027 35.787 28.419 34.917
23 28.365 35.054 27.809 34.231
24 27.685 34.293 27.205 33.539
25 27.007 33.518 26.588 32.830
26 26.319 32.728 25.964 32.108
27 25.643 31.939 25.357 31.387
28 24.958 31.123 24.736 30.646
29 24.271 30.304 24.110 29.892
30 23.590 29.481 23.491 29.136
31 22.892 28.640 22.866 28.371
32 22.191 27.796 22.237 27.599
33 21.487 26.944 21.606 26.819
34 20.783 26.097 20.970 26.034
35 20.095 25.254 20.327 25.239
36 19.400 24.408 19.685 24.446
37 18.707 23.560 19.039 23.648
38 18.018 22.714 18.392 22.850
39 17.324 21.864 17.737 22.044
40 16.627 21.014 17.085 21.242
41 15.944 20.169 16.421 20.432
42 15.264 19.328 15.764 19.627
43 14.595 18.494 15.110 18.825
44 13.931 17.664 14.456 18.024
45 13.272 16.840 13.798 17.220
46 12.616 16.018 13.154 16.429
47 11.972 15.204 12.520 15.641
48 11.328 14.398 11.886 14.859
49 10.682 13.593 11.259 14.081
50 10.053 12.803 10.642 13.311
51 9.432 12.020 10.030 12.550
52 8.802 11.239 9.429 11.798
53 8.199 10.477 8.843 11.057
54 7.593 9.723 8.270 10.333
55 6.996 8.980 7.709 9.618
56 6.422 8.263 7.152 8.912
57 5.872 7.564 6.618 8.230
58 5.339 6.883 6.095 7.560
59 4.812 6.216 5.587 6.908
60 4.307 5.578 5.097 6.280
61 3.840 4.979 4.624 5.677
62 3.400 4.415 4.181 5.112
63 3.024 3.918 3.782 4.593
64 2.708 3.485 3.428 4.128
65 2.422 3.093 3.109 3.700
66 2.180 2.756 2.819 3.312
67 1.970 2.461 2.556 2.960
68 1.787 2.200 2.323 2.646
69 1.624 1.967 2.102 2.359
70 1.471 1.756 1.905 2.101
71 1.348 1.584 1.728 1.869
72 1.238 1.430 1.577 1.670
73 1.134 1.289 1.427 1.484
74 1.042 1.167 1.296 1.322
75 0.965 1.065 1.184 1.181
76 0.904 0.983 1.077 1.054
77 0.834 0.899 0.980 0.942
78 0.784 0.836 0.894 0.843
79 0.735 0.778 0.807 0.750
80 0.694 0.735 0.675 0.636

Notes:

The econometric model described by Millimet et al. (2002) and logistic regression equations by gender and education are used to calculate the worklife expectancy estimates. The worklife model in the left panel of the table is estimated using matched CPS cohorts from the 1992-2000 time period as described in the Millimet et al. (2002) paper. The model in the right panel is estimated using data from 2001-2013.

The logistic equation includes independent variables for age, age squared, race, race by age interaction, race by age interaction squared, marital status, marital status by age, occupation dummies, and year dummies.

The model is first estimated separately for each gender and education level combination for active persons. The model is then estimated again for inactive persons. The educational attainment variables used to estimate our model differ from those of Millimet et al. (2002). In our model, only individuals whose highest level of attainment is high school are included in the high school category. Millimet et al. (2002) include individuals with some college in the high school category.

Replication of the Millimet et al. (2002) work was successful and yielded similar results

Big Data. Bureau of Labor Statistics. Survey data. Employment Big Data. Those are all things that calculating worklife expectancy for U.S. workers requires. Worklife expectancy is similar to life expectancy and indicates how long a person can be expected to be active in the workforce over their working life. The worklife expectancy figure takes into account the anticipated time out of the labor market due to unemployment, voluntary leaves, attrition, etc.

Overall, the goal of our recent work is to update the Millimet et al. (2002) worklife expectancy paper to account for more recent CPS data. Their paper uses data from the 1992 to 2000 time period. Our goal is to update that paper using data from 2000 to 2013. The main goal of the paper is to see if estimating the Millimet et al. (2002) econometric worklife models with more recent data changes the results in the 2002 paper in any substantive way.

As for the results, overall there are several findings. First, we were able to create a matched CPS data set of 201,797 individuals, whereas Millimet et al. (2002) found 200,916 matched individuals.

Overall, we match their results very closely as well. For example, Millimet et al. (2002) found that a male who was 26 years old with less than a high school education had 27.27 years of WLE remaining, while we found that person had 26.319 years remaining based on our replication of their work. They found that the same age person with a high school education had 32.89 years remaining, while we found 32.728 years remaining. The replication was particularly good for both the less than high school and high school levels of educational attainment.

The WLE numbers are close but not quite as close for college and some college. This is primarily due to the fact that we use different definitions of some college and college than Millimet et al. (2002) did in their 2002 paper.

Table 3. Comparison of Millimet et al. (2002) and Steward and Gaylor (2015) Active to Active Worklife Expectancy Probabilities
Millimet et al (2002) Steward and Gaylor (2015) Replication
Age Less than High School High School Less than High School High School
18 32.331 38.944 31.469 38.410
19 31.801 38.239 30.926 37.846
20 31.247 37.522 30.306 37.180
21 30.684 36.794 29.670 36.493
22 30.080 36.058 29.027 35.787
23 29.450 35.294 28.365 35.054
24 28.766 34.513 27.685 34.293
25 28.035 33.711 27.007 33.518
26 27.270 32.890 26.319 32.728
27 26.495 32.052 25.643 31.939
28 25.710 31.201 24.958 31.123
29 24.923 30.341 24.271 30.304
30 24.131 29.477 23.590 29.481
31 23.345 28.606 22.892 28.640
32 22.556 27.735 22.191 27.796
33 21.775 26.862 21.487 26.944
34 21.006 25.989 20.783 26.097
35 20.233 25.112 20.095 25.254
36 19.452 24.240 19.400 24.408
37 18.681 23.370 18.707 23.560
38 17.921 22.504 18.018 22.714
39 17.178 21.641 17.324 21.864
40 16.459 20.782 16.627 21.014
41 15.734 19.928 15.944 20.169
42 15.031 19.081 15.264 19.328
43 14.333 18.242 14.595 18.494
44 13.669 17.410 13.931 17.664
45 13.020 16.588 13.272 16.840
46 12.381 15.775 12.616 16.018
47 11.758 14.974 11.972 15.204
48 11.144 14.185 11.328 14.398
49 10.538 13.409 10.682 13.593
50 9.952 12.646 10.053 12.803
51 9.379 11.898 9.432 12.020
52 8.836 11.167 8.802 11.239
53 8.299 10.459 8.199 10.477
54 7.775 9.772 7.593 9.723
55 7.265 9.107 6.996 8.980
56 6.767 8.456 6.422 8.263
57 6.261 7.829 5.872 7.564
58 5.800 7.236 5.339 6.883
59 5.397 6.678 4.812 6.216
60 5.016 6.153 4.307 5.578
61 4.678 5.672 3.840 4.979
62 4.350 5.225 3.400 4.415
63 4.060 4.815 3.024 3.918
64 3.797 4.420 2.708 3.485
65 3.574 4.061 2.422 3.093
66 3.395 3.741 2.180 2.756
67 3.224 3.445 1.970 2.461
68 3.047 3.162 1.787 2.200
69 2.873 2.886 1.624 1.967
70 2.691 2.621 1.471 1.756
71 2.528 2.401 1.348 1.584
72 2.362 2.196 1.238 1.430
73 2.170 1.999 1.134 1.289
74 2.002 1.829 1.042 1.167
75 1.898 1.672 0.965 1.065
76 1.743 1.533 0.904 0.983
77 1.592 1.449 0.834 0.899
78 1.514 1.339 0.784 0.836
79 1.461 1.274 0.735 0.778
80 1.374 1.172 0.694 0.735
81 1.273 1.046 0.661 0.687
82 1.222 0.993 0.631 0.656
83 1.121 0.912 0.604 0.623
84 0.874 0.755 0.569 0.585
85 0.433 0.355 0.522 0.532

Notes:

The econometric model described by Millimet et al. (2002) and logistic regression equations by gender and education are used to calculate the worklife expectancy estimates. The model is estimated using matched CPS cohorts from the 1992-2000 time period as described in the Millimet et al. (2002) paper. The logistic equation includes independent variables for age, age squared, race, race by age interaction, race by age interaction squared, marital status, marital status by age, occupation dummies, and year dummies. The model is first estimated separately for each gender and education level combination for active persons. The model is then estimated again for inactive persons.

 

Comparison of CPS matched data sets – Millimet et al. (2002) to Steward and Gaylor (2015)

Big Data. Bureau of Labor Statistics. Survey data. Employment Big Data. Those are all things that calculating worklife expectancy for U.S. workers requires. Worklife expectancy is similar to life expectancy and indicates how long a person can be expected to be active in the workforce over their working life. The worklife expectancy figure takes into account the anticipated time out of the labor market due to unemployment, voluntary leaves, attrition, etc.

Overall, the goal of our recent work is to update the Millimet et al. (2002) worklife expectancy paper to account for more recent CPS data. Their paper uses data from the 1992 to 2000 time period. Our goal is to update that paper using data from 2000 to 2013. The main goal of the paper is to see if estimating the Millimet et al. (2002) econometric worklife models with more recent data changes the results in the 2002 paper in any substantive way.

 

Our approach is twofold. First, we matched the BLS data cohorts based on the Millimet et al. (2002) and Peracchi and Welch (1995) papers. In a nutshell, the CPS matching routine involves matching incoming and outgoing cohorts across a given year. Once the data is matched, we then look at the work status of the individuals to determine if they were active or inactive across the year in which they were interviewed by the BLS. We were able to create a matched CPS data set of 201,797 individuals, whereas Millimet et al. (2002) found 200,916 matched individuals.

Table 1. Comparison of CPS cohort matched data sets
Year Millimet et al.  (2002) Steward and Gaylor (2015)
1992/93 37,709 36,652
1994/95 34,418 33,377
1996/97 31,691 32,739
1997/98 32,276 32,972
1998/99 32,083 32,893
1999/2000 32,739 33,164
Total 200,916 201,797

Notes:

The CPS data was matched using an algorithm similar to that of Millimet et al. (2002) and Peracchi and Welch (1995). Households in rotations 1-4 were matched using the household identifier number to the same household in rotations 5-8 of the following year. Individuals had to have the same sex and race and be a year older in rotations 5-8 to be determined a match.
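The matching rule described in the notes can be sketched as follows. This is a simplified Python illustration; the field names and sample records are hypothetical stand-ins, not the actual CPS variables.

```python
# A rotation 1-4 record in year t matches a rotation 5-8 record in year t+1
# if the household ID is the same and the person has the same sex and race
# and is exactly one year older.
def is_match(early, late):
    return (early["hh_id"] == late["hh_id"]
            and early["sex"] == late["sex"]
            and early["race"] == late["race"]
            and late["age"] == early["age"] + 1)

def match_cohorts(year_t, year_t1):
    """Pair each rotation 1-4 record with at most one rotation 5-8 record."""
    matched = []
    for early in year_t:
        for late in year_t1:
            if is_match(early, late):
                matched.append((early, late))
                break
    return matched

# Hypothetical records for illustration.
t = [{"hh_id": 101, "sex": "M", "race": "W", "age": 34}]
t1 = [{"hh_id": 101, "sex": "M", "race": "W", "age": 35},
      {"hh_id": 101, "sex": "F", "race": "W", "age": 33}]
print(len(match_cohorts(t, t1)))  # 1 match found
```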

 

New FLSA Rule due in Q1 2015 will address white collar OT exemptions @SHRM reports

This month’s HR magazine makes some educated guesses regarding the potential changes to the FLSA that can be expected to occur in Q1 2015.  According to SHRM’s sources, the new FLSA rule will:

* Increase the minimum salary threshold used in determining FLSA exemption status

* More tightly define the percentage of a person's work time that needs to be engaged in exempt duties to be exempt from FLSA OT – SHRM suggests that the new threshold will be at least 50%, similar to California's OT standards

* Narrow the executive FLSA exemption by modifying or eliminating the primary duty standard.

Distribution of grants, consulting fees, and other payments to medical professionals by specialty

The Centers for Medicare and Medicaid Services' (CMS) new Open Payments database shows the consulting fees, research grants, travel, and other reimbursements made to the medical industry in 2013.

There are 2,619,700 payments in the CMS data made to 356,190 physicians. The average payment made to physicians was $255.22. The median payment was $15.52.

Overall, the Clinical Pharmacology and Orthopaedic Surgery professions received the most grants and other payments. The table below shows each specialty's average payment as a percentage of the average overall payment to all medical specialties. The STATA code is listed below.

Medical Specialty % of average payment to all specialties
Clinical Pharmacology 1516%
Orthopaedic Surgery 473%
Group: Multi-Specialty 436%
Nutritionist 328%
Medical Genetics 284%
Surgery 282%
Transplant Surgery 268%
Neurological Surgery 266%
Pathology 203%
Pediatrics 197%
Oral & Maxillofacial Surgery 160%
Laboratories 154%
Preventive Medicine 150%
Nuclear Medicine 135%
Neuromusculoskeletal Medicine 132%
Phlebology 130%
Radiology 123%
Thoracic Surgery 121%
Colon & Rectal Surgery 121%
Internal Medicine 116%
Anesthesiology 116%
Pharmacy Technician 115%
Dentist 108%
Group: Single Specialty 107%
Otolaryngology 97%
Other Service Providers 91%
Chiropractor 86%
Plastic Surgery 86%
Ophthalmology 84%
Allergy & Immunology 84%
Registered Nurse 83%
Agencies 81%
Technologists, Technicians & Other Technical Service Providers 79%
Physician Assistant 77%
Obstetrics & Gynecology 77%
Dermatology 74%
Podiatrist 73%
Psychiatry & Neurology 70%
Pain Medicine 64%
Urology 64%
Physical Medicine & Rehabilitation 57%
General Acute Care Hospital 57%
Counselor 56%
General Practice 52%
Dental Hygienist 43%
Student, Health Care 43%
Clinical Neuropsychologist 43%
Assistant, Podiatric 41%
Clinic/Center 40%
Optometrist 38%
Military Hospital 37%
Long Term Care Hospital 37%
Emergency Medicine 36%
Personal Emergency Response Attendant 36%
Respiratory, Developmental, Rehabilitative & Restorative Service Providers 35%
Dietary Manager 35%
Family Medicine 33%
Hospitalist 33%
Speech, Language & Hearing Service Providers 32%
Managed Care Organizations 31%
Electrodiagnostic Medicine 31%
Pharmacist 28%
Legal Medicine 27%
Nursing & Custodial Care Facilities 26%
Psychologist 24%
Hospital Units (Psychiatric and Rehabilitation) 22%
Licensed Practical or Vocational Nurse 20%
Suppliers 17%
Psychoanalyst 17%
Special Hospital 16%
Denturist 15%
Dental Laboratory Technician 14%
Social Worker 14%
Dietitian, Registered 13%
Residential Treatment Facilities: Mental Illness, Retardation, and/or Developmental Disabilities 11%
Psychiatric Hospital 10%
Eye & Vision Technician: Technologist 10%
Behavioral Analyst 9%
Emergency Medical Technician 9%
Dental Assistant 9%
Marriage & Family Therapist 9%
Chronic Disease Hospital 8%
Independent Medical Examiner 7%
Nursing Home Administrator 5%
Nurse’s Aide 5%

STATA Code

use "dataopenrecords-small.dta", clear

rename recipient_state State
drop if State=="" | State=="AE" | State=="AA" | State=="AP" | State=="GU" | State=="ON" | State=="VI" | State=="PR"

keep physician_spec total number
rename physician_spec Specialty
sort Specialty total number
drop if Specialty==""
destring number, replace
collapse (mean) total number, by(Specialty)
rename total Average_Payment_Amount
rename number Average_Number_of_Payments
outsheet Specialty Average_Payment_Amount Average_Number_of_Payments using "P:Business Dev ProjectsEmployStats9074 – OpenRecordsTablesspecialty_payments.csv", comma nolabel replace
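The percentage column in the table above is just each specialty's average payment divided by the overall average payment. A minimal sketch of that computation outside of STATA (Python; the payment records below are made up for illustration, not from the CMS data):

```python
from statistics import mean

# Hypothetical (specialty, payment) records for illustration only.
payments = [
    ("Clinical Pharmacology", 5000.00),
    ("Clinical Pharmacology", 2500.00),
    ("Family Medicine", 20.00),
    ("Family Medicine", 80.00),
    ("Radiology", 300.00),
]

overall_avg = mean(p for _, p in payments)

# Group the payment amounts by specialty.
by_specialty = {}
for spec, amount in payments:
    by_specialty.setdefault(spec, []).append(amount)

# Each specialty's average payment as a percentage of the overall average.
pct_of_overall = {
    spec: round(100 * mean(vals) / overall_avg)
    for spec, vals in by_specialty.items()
}
print(pct_of_overall)
```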

 

STATA or R for data analysis in wage and hour cases?

In the stats world there is somewhat of a debate going on regarding which statistical analysis programs are “better”. Of course, the answer always depends on what you use it for. Some like the open-source, evolving nature of R, while others like the established and tried STATA.

In the world of labor and employment economics, and in litigation matters that require analysis of large data sets, STATA wins hands down. The open-source nature of R is appealing in some settings, but the many decades of pre-written (and debugged) programs make STATA the best choice in most employment and wage and hour cases that require analysis of large data sets. Performing basic tabulations and data manipulations in R requires many lines of code, while STATA often has the command built in.

Here are some interesting snippets from the web on the R v STATA debate:

http://www.researchgate.net/post/What_is_the_difference_between_SPSS_R_and_STATA_software

The main drawback of R is the learning curve: you need a few weeks just to be able to import data and create a simple plot, and you will not cease learning basic operations (e.g. for plotting) for many years. You will stumble upon weirdest problems all the time because you have missed the comma or because your data frame collapses to a vector if only one row is selected.

However, once you mastered this, you will have the full arsenal of modern cutting-edge statistical techniques at your disposal, along with in-depth manuals, references, specialized packages, graphical interface, a helpful community — and all at no cost. Also, you will be able to do stunning graphics.

 

http://forum.thegradcafe.com/topic/44595-stata-or-r-for-statistics-software/

http://www.econjobrumors.com/topic/r-vs-stata-is-like-a-mercedes-vs-a-bus