All data projects can benefit from building a Data Management Plan (“DMP”) before the project begins.  Typically a DMP is a formal document that describes your data and what your team will do with it during and after the data project.

There is no cookie-cutter DMP that is right for every project, but in most cases the following questions should be addressed in your DMP:

  1. What kind of data will your project analyze?  What file formats and software packages will you use?  What will your data output be?  How will you collect and process the data?
  2. How will you document and organize your data?  What metadata will you collect?  What standards and formats will you use?
  3. What are your plans for data access within your team?  What are the roles that the individuals in your team will play in the data analysis process?  How will you address any privacy or ethical issues, if applicable?
  4. What are your plans for long-term archiving?  What file formats will you archive the data in?  Who will be responsible for the data after the project is complete?  Where will you save the files?
  5. What outside resources do you need for your project?  How much time will the project take your team to complete and audit?  How much will it cost?

When working on any type of data project, planning ahead is a crucial step.  Before starting a project, it’s important to think through as many of the details as possible so you can budget enough time and resources to accomplish all of the objectives.  In fact, some organizations and government entities require a Data Management Plan (“DMP”) for all of their projects.

A DMP is a formal document that describes the data and what your team will do with it during and after the data project.  Many organizations and agencies require one, and each has its own specific requirements for DMPs.

A DMP can be as simple as a readme.txt file, or as detailed as a plan tailored to a specific discipline and built from an online template, such as those offered at DMPTool.org.  The DMPTool is designed to help create ready-to-use data management plans.
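
For the simple readme.txt approach, a short script can generate a skeleton that your team then fills in.  The sketch below is only an illustration in Python: the section headings, the questions (paraphrased from the list of common DMP questions), and the function names build_dmp_readme and write_dmp_readme are all hypothetical and not part of any DMP standard or of the DMPTool.

```python
from pathlib import Path

# Hypothetical headings and prompts, paraphrased from common DMP questions.
# Adjust them to whatever your organization or funding agency requires.
DMP_SECTIONS = {
    "1. Data description": [
        "What kind of data will the project analyze?",
        "What file formats and software packages will be used?",
    ],
    "2. Documentation and organization": [
        "What metadata will be collected?",
        "What standards and formats will be used?",
    ],
    "3. Access, roles, and privacy": [
        "Who on the team can access the data, and in what role?",
        "How will privacy or ethical issues be addressed?",
    ],
    "4. Long-term archiving": [
        "What file formats will the data be archived in?",
        "Who is responsible for the data after the project ends?",
    ],
    "5. Resources": [
        "What outside resources are needed?",
        "How much time and money will the project take?",
    ],
}


def build_dmp_readme(project_name: str) -> str:
    """Return the text of a plain-text DMP skeleton for the given project."""
    lines = [f"Data Management Plan: {project_name}", ""]
    for heading, questions in DMP_SECTIONS.items():
        lines.append(heading)
        for question in questions:
            lines.append(f"  - {question}")
            lines.append("    Answer: TODO")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def write_dmp_readme(project_name: str, directory: str = ".") -> Path:
    """Write the skeleton to readme.txt in the given directory and return its path."""
    path = Path(directory) / "readme.txt"
    path.write_text(build_dmp_readme(project_name), encoding="utf-8")
    return path
```

Calling write_dmp_readme("Wage and hour analysis", "project_folder") on an existing folder would create project_folder/readme.txt with every answer marked TODO, which makes it easy to see which questions the team still needs to address before the project begins.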

This past week, Employstats associate Matt Rigling visited Washington, D.C. for a training course led by StataCorp experts. The course, titled Using Stata Effectively: Data Management, Analysis, and Graphics Fundamentals, was taught by instructor Bill Rising at the MicroTek Training Solutions facility, just a few blocks from the White House.

Here at Employstats, our analysts use the statistical software package Stata for data management and for data analysis in all types of wage & hour, economic, and employment analyses. With Stata, all analyses can be reproduced and documented for publication and review.

The training course covered topics ranging from Stata’s syntax to data validation and generation, and even topics such as estimation and post-estimation. “I took away a lot of useful techniques from the Stata course, and I learned about some new features of Stata 14, such as tab auto-complete and the command to turn Stata Markup files into reproducible do-files. Most importantly, I learned data manipulation skills that will help me work more efficiently and accurately,” said associate Matt Rigling.