Data Management Plan FAQs
What is a “data management plan”?
A data management plan is a document describing data (and/or digital materials) that have been or will be gathered in a study or project. It often includes details about how these materials will be organized, preserved, and shared and which procedures are needed to access and use them.
A data management plan is a required component of funding proposals to several federal agencies:
- National Science Foundation: Chapter II of NSF’s Grant Proposal Guide addresses data management in sections C.2.d(i) and C.2.j.
- National Institutes of Health: NIH’s Data Sharing Policy and Implementation Guidance contains specific guidelines.
A data management plan can help you map out how data will be maintained and what resources will be needed to preserve it over a given time period.
A data management plan facilitates re-use of data sets by describing the data and metadata, how to use and interpret the data, and policies governing access, use, and attribution/copyright.
Do I always follow the general NSF policy on the dissemination and sharing of research results outlined in Chapter II.C.2.j?
In some cases, the Directorate, Office, Division, Program, or other NSF Unit has data management requirements and plans that are specific to the Unit. See http://www.nsf.gov/bfa/dias/policy/dmp.jsp for details. If there are no Unit-specific guidelines, follow the general NSF policy outlined in the Grant Proposal Guide, Chapter II.C.2.j.
Why should I manage my data?
Effective data management provides a number of benefits to both the researcher and the institution:
- Increased visibility: By making data available after publication, you allow other researchers to build on your work, thereby increasing your own prominence.
- Increased efficiency: You save time and energy by having a systematic process for accessing and sharing data as needed, allowing you to focus on the research itself.
- Improved documentation and preservation: Managed data is more easily authenticated and preserved for later use, particularly in the case of electronic data.
Data management increases researchers’ competitive advantage by allowing them to use data more efficiently and effectively, which in turn supports gains in both the quality and the quantity of their research.
What data actually needs to be managed?
The U.S. Office of Management and Budget (OMB) defines data as “the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.” Data produced through NSF-funded projects generally takes one of four forms (from the MIT Libraries Data Management and Publishing guide):
- Observational: data captured in real time, usually irreplaceable.
Examples: sensor data, telemetry, survey data, sample data, neuroimages.
- Experimental: data from lab equipment, often reproducible, but can be expensive.
Examples: gene sequences, chromatograms, toroid magnetic field data.
- Simulation: data generated from test models where model and metadata (inputs) are more important than output data.
Examples: climate models, economic models.
- Derived or compiled: data that is reproducible (but very expensive).
Examples: text and data mining, compiled database, 3D models, data gathered from public documents.
Note that, per a 1999 OMB report, these requirements exclude “preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues.” For the purposes of a data management plan, raw data is considered “preliminary analyses.” Physical objects, such as gel samples, are also excluded from this requirement.
What information should I include in a data management plan?
Your data management plan should include, at minimum, information on:
- The type of data you expect to produce from your research.
- The standards to be used for data and metadata format and content.
- Policies and procedures for accessing and/or sharing your data.
- Policies and provisions for re-use, redistribution, and the production of derivatives.
- Plans for archiving data, samples, and other research products, and for preservation of access.
NSF requires that a 2-page data management plan be submitted as part of any grant proposal from January 2011 onwards. If you are submitting multiple collaborative proposals or proposals that include sub-awards, NSF will consider all such proposals to be one project and only one data management plan is required. Proposers who feel that the plan cannot fit within the supplement limit of two pages may use part of the 15-page Project Description for additional data management information. We expect to have templates for these plans available for use and reference before these new requirements take effect. See the NSF Grant Proposal Guide (GPG) Chapter II.C.2.j for more information.
Do I need to save anything other than the data itself?
You should be saving:
- The research data itself.
- Metadata, or data about the data, including title, creator, subject, identifiers, etc. See the Metadata Page for more information.
- Information about any programs used to compile or derive the analyzed data. If the program is particularly specialized, it may also be helpful to save a copy of the program in question.
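As a rough illustration of the items above, the sketch below builds a small metadata record and serializes it as JSON so it can be stored next to the data file. The field names follow the examples mentioned here (title, creator, subject, identifier); the "software" field, the function names, and the sample values are hypothetical and not part of any NSF or repository requirement.

```python
import json


def build_metadata(title, creator, subject, identifier, software=None):
    """Return a metadata record as a plain dictionary.

    The 'software' entry is an assumed convention for noting the
    programs used to compile or derive the analyzed data.
    """
    record = {
        "title": title,
        "creator": creator,
        "subject": subject,
        "identifier": identifier,
    }
    if software:
        record["software"] = software
    return record


def to_sidecar_json(record):
    """Serialize the record as human-readable JSON text."""
    return json.dumps(record, indent=2, sort_keys=True)


# Hypothetical example values, for illustration only.
example = build_metadata(
    title="Example sensor readings",
    creator="A. Researcher",
    subject="hydrology",
    identifier="example-0001",
    software=[{"name": "hypothetical-analysis-tool", "version": "1.0"}],
)
sidecar_text = to_sidecar_json(example)
```

Saving the resulting text in a file alongside the data set keeps the description and the data together; the repository you deposit with may prescribe its own metadata schema instead.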
How do I prepare my data for upload and management?
As much as is possible, data sets should be standardized before being uploaded into a repository. This process involves two main steps:
- Attaching metadata, which will be used to identify the data and will assist in searches for related information. See the Metadata Page for more information.
- Reformatting, which involves changing data from proprietary or unstable formats (Excel, Word, GIF) to non-proprietary formats (tab-delimited text, RTF, PDF/A, TIFF). See the Formats Page for more information.
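The reformatting step can be sketched for tabular data as follows. This is a minimal example, assuming the spreadsheet has already been exported to comma-separated text (reading Excel files directly would need a third-party library); the function name and sample data are hypothetical.

```python
import csv
import io


def csv_to_tab_delimited(csv_text):
    """Convert comma-separated text to tab-delimited text.

    The csv module handles quoted fields, so a value such as
    "Lake, north" is kept as a single column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: a header row and one data row.
sample = 'site,reading\n"Lake, north",4.2\n'
converted = csv_to_tab_delimited(sample)
```

Values that themselves contain tab characters will be quoted by the writer, so it is worth checking the converted file before depositing it.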
When should I make my data accessible?
The National Institutes of Health data sharing guidelines and NSF Engineering Section guidelines both suggest releasing research data “no later than the acceptance for publication of the main findings of the final data set.” It is probable that the all-NSF data management guidelines will be similar. It may be possible to be exempted from this disclosure requirement in the case of patented or highly sensitive data; in such a case, a reason for exemption must be included in the submitted data management plan.
Who is responsible for managing this data?
Ultimately, Principal Investigators are responsible for making sure that data can be produced, either through disclosure via a data repository or in response to a request made under the Shelby Amendment provision of the Freedom of Information Act (FOIA). See the NSF FOIA page for more information.
How long do I need to retain data?
The NSF Engineering Section guidelines recommend a minimum retention period of three years after either completion of the grant project or public release of research data, whichever is later. Certain types of data, such as data that supports patents or that is part of a longitudinal data set of broad interest to other researchers, may need to be kept available beyond this three-year minimum. Consult your program officer for more information.
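The "whichever is later" rule can be worked through with a short calculation. The sketch below is only an illustration of the three-year minimum described above; the function name and dates are hypothetical, and it does not account for data types that must be kept longer.

```python
from datetime import date


def minimum_retention_end(grant_completed, data_released, years=3):
    """Return the earliest date on which the minimum retention period ends.

    The period starts at the later of grant completion and public
    release of the data, then runs for the given number of years.
    """
    start = max(grant_completed, data_released)
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Start date is Feb 29 and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)


# Hypothetical dates: the data was released after the grant ended,
# so the release date starts the clock.
end = minimum_retention_end(date(2012, 6, 30), date(2013, 1, 15))
```

In this example the grant ended on 30 June 2012 but the data was released on 15 January 2013, so the minimum retention period runs until 15 January 2016.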
What are my copyright and intellectual property rights under data management?
Per OMB Circular A-110, “A grantee may copyright any work that is subject to copyright that was developed under a grant. It reserves for the federal awarding agency a royalty-free, nonexclusive, and irrevocable right to reproduce, publish, or otherwise use the work for federal purposes and to authorize others to do so…. The government reserves the right to obtain, reproduce, publish, or otherwise use the data first produced under an award and to authorize others to receive, reproduce, publish, or otherwise use such data for federal purposes.”
Additionally, under the Shelby Amendment (1999), if the federal government has received a Freedom of Information Act request for data underlying a research article or report, an awardee is required to provide it. Grantees are allowed to charge fees for costs incurred in obtaining the research data.
Research data requiring additional proprietary or personal information protection may be temporarily or permanently exempt from the requirement to disclose. Watch this page for more information or contact your program officer.
What if I am co-administering a grant with a colleague from an outside institution?
In cases of collaborative research, your team should designate one of its member institutions to hold the “official copy” of research data. This designee will carry primary responsibility for ensuring that the data is formatted correctly, that it has appropriate metadata attached, and that it is being stored and shared appropriately per the data management plan. Please note that this designation does not abrogate the shared responsibility of the other institutions to manage the data.
What happens if I leave UWM while my grant is active?
In most cases, your grant funding and responsibility to manage and share data will follow you to your new institutional home. The funding agency, UWM’s Office of Sponsored Programs, and the grant making office(s) of the accepting institution will need to create a contract to formally transfer the funding and project infrastructure to the accepting institution.
What happens if I leave UWM after my grant has been completed?
In most cases, your grant funding and responsibility to manage and share data will follow you to your new institutional home. The funding agency, UWM's Office of Sponsored Programs, and the grant making office(s) of the accepting institution will need to create a contract to formally transfer the funding and project infrastructure to the accepting institution.