Metadata is information about the context, content, quality, provenance, and/or accessibility of a set of data.
Metadata may be:
- Required for depositing a data set in disciplinary repositories or for publishing it in research journals.
- Critical documentation for the longevity and reproducibility of research data.
- Useful for visualizing or analyzing the data in data files.
What should I include in my metadata?
A given project may include some or all of the following metadata (list from MIT Data Management and Publishing):
|Title||Name of the dataset or research project that produced it.|
|Creator||Names and addresses of the organization or people who created the data.|
|Identifier||Number used to identify the data, even if it is just an internal project reference number.|
|Subject||Keywords or phrases describing the subject or content of the data.|
|Funders||Organizations or agencies that funded the research.|
|Rights||Any known intellectual property rights held for the data.|
|Access information||Where and how your data can be accessed by other researchers.|
|Language||Language(s) of the intellectual content of the resource, when applicable.|
|Dates||Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule.|
|Location||Where the data relates to a physical location, record information about its spatial coverage.|
|Methodology||How the data was generated, including equipment or software used, experimental protocol, and other details one might record in a lab notebook.|
|Data processing||Record any information on how the data has been altered or processed during the project.|
|Sources||Citations to material for data derived from other sources, including details of where the source data is held and how it was accessed.|
|List of file names||List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov').|
|File Formats||Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data.|
|File structure||Organization of the data file(s) and the layout of the variables, when applicable.|
|Variable list||List of variables in the data files, when applicable.|
|Code lists||Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data').|
|Versions||Record a date/time stamp for each file, and use a separate ID for each version.|
|Checksums||Values used to test whether files have changed over time.|
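As an illustration of the checksum entry above, here is a minimal Python sketch that records a checksum for every file in a folder and later reports which files have changed. The choice of SHA-256 and the function names (`sha256_of_file`, `build_manifest`, `changed_files`) are assumptions made for this example; a repository or funder may require a different algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file name directly inside data_dir to its checksum."""
    return {
        p.name: sha256_of_file(p)
        for p in sorted(Path(data_dir).iterdir())
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List names whose checksum differs, or that were added or removed."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

A typical use would be to save the result of `build_manifest` alongside the metadata when the data is deposited, then rebuild it later and pass both manifests to `changed_files`.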
How do data dictionaries or codebooks relate to metadata?
Data dictionaries and codebooks make it possible for an outside researcher to read and correctly interpret your data, so they are as important as metadata for proper dissemination. If your data cannot be read without a data dictionary, codebook, or program, include those supplementary documents along with the metadata in whatever storage or sharing medium you use.
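To show how a codebook makes coded data interpretable, the following Python sketch applies a small data dictionary to a few rows of CSV. Everything in it is hypothetical and invented for illustration: the variable names, the site codes, the sample values, and the `decode_row` helper. Only the convention that 999 marks a missing value comes from the code list example earlier on this page.

```python
import csv
import io

# Hypothetical codebook: one entry per variable in the data file.
CODEBOOK = {
    "site": {
        "description": "Collection site",
        "codes": {"A": "Site A", "B": "Site B"},
    },
    "depth_cm": {
        "description": "Measured depth in centimeters",
        "missing": "999",  # 999 indicates a missing value
    },
}

# Made-up sample data, not taken from any real data set.
SAMPLE_CSV = "site,depth_cm\nA,42\nB,999\n"


def decode_row(row, codebook):
    """Replace coded values with labels and missing-value codes with None."""
    decoded = {}
    for name, raw in row.items():
        entry = codebook.get(name, {})
        if raw == entry.get("missing"):
            decoded[name] = None
        else:
            # Fall back to the raw value when no label is defined for it.
            decoded[name] = entry.get("codes", {}).get(raw, raw)
    return decoded


rows = [decode_row(r, CODEBOOK) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Without the codebook, a reader of the raw file could easily treat 999 as a real measurement, which is why the codebook needs to travel with the data.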
What are metadata standards and why/how should I use them?
Metadata standards specify what information should be used in compiling metadata and how that information is expressed in digital files. Standards are useful because they provide a predictable array of metadata, allowing researchers and search engines to more easily find relevant articles.
There are a number of metadata standards available, and researchers should consult their program officers for the standard most appropriate for their discipline. UW-Madison has created a partial list of potentially relevant standards.
What file format should I use for my metadata?
Your goal when selecting a metadata file format should be to ensure that the file retains its readability for as long as possible (i.e. it is not rendered obsolete by new software or hardware). Some of the most common options are below:
|Format||Example of this type|
|A text or html document||Metadata includes authors, dates, location, etc. This metadata accompanies data on Seasonal Frost Depths, Midwestern USA (1971-1981) that is archived in the National Snow and Ice Data Center.|
|An XML document linked to data files||Metadata includes authors, locations, dates, etc. This metadata is linked to TIGER/Line Shapefile data on Wisconsin Congressional Districts, 2009, provided on Data.gov. Follows the FGDC (Federal Geographic Data Committee) digital geospatial metadata standard. Note: You may need to select “View page source” in your browser to see the XML format.|
|Information embedded in an XML data file||Metadata includes authors, dates, organism, publication, instrument, etc. It is kept within the X-ray diffraction data file for UDP-galactopyranose mutase in the Protein Data Bank repository. Follows the PDBML (Protein Data Bank Markup Language) specification. Note: You may need to select “View page source” in your browser to see the XML format.|
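For a sense of what a stand-alone XML metadata document might look like, the Python sketch below turns a few metadata fields into XML text using the standard library. The element names (`metadata`, `title`, `creator`, `language`), the sample values, and the `metadata_to_xml` helper are all invented for this example and do not follow FGDC, PDBML, or any other standard; a real file should use the element names your chosen standard defines.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(fields):
    """Build a flat XML document with one element per metadata value."""
    root = ET.Element("metadata")  # hypothetical root element name
    for name, value in fields.items():
        # A field with several values (e.g. multiple creators) is repeated.
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, name)
            child.text = str(item)
    return ET.tostring(root, encoding="unicode")


# Made-up example values for illustration only.
example_fields = {
    "title": "Example dataset",
    "creator": ["A. Researcher", "B. Researcher"],
    "language": "en",
}

xml_text = metadata_to_xml(example_fields)
```

The resulting text can be saved next to the data files. Because XML is plain text, it stays readable in any text editor even if the software that produced it is no longer available.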