This aspect of data management concerns how data is organised and documented so that it can be easily understood, and so that minimal quality standards appropriate to the data and research methodology are met.
Apart from facilitating efficient use of the data throughout the active research phase, this is also a critical step in ensuring that datasets selected for sharing and long-term preservation meet the minimal degree of interoperability outlined in the FAIR Data Principles.
Data discovery, re-use and reproducibility are dependent on high-quality documentation and metadata. Collecting supporting documentation and metadata during active research is what allows FAIRification and data sharing further down the line. For a quick overview of what to consider when developing documentation and metadata for your research, please watch the video below.
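One lightweight way to collect metadata alongside a dataset is a small machine-readable record saved next to the data files. The sketch below is illustrative, assuming informal field names rather than any formal metadata standard (a real project might follow a discipline-specific schema instead):

```python
import json

# Hypothetical minimal metadata record for a dataset; the field names
# here are illustrative assumptions, not a formal metadata standard.
metadata = {
    "title": "Field survey measurements, site A",
    "creator": "J. Smith",
    "date_collected": "2024-05-01",
    "methodology": "Handheld GPS, calibrated before each session",
    "variables": {"temp_c": "air temperature in degrees Celsius"},
    "licence": "CC-BY-4.0",
}

# Store the record next to the data so it travels with the dataset.
with open("dataset_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Keeping such a record in a plain, open format like JSON means it stays readable long after the software used to create the data is gone.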
A data management plan should include details of the data quality measures used; these details give an indication of the quality and consistency of the data. Depending on the discipline, these may include processes such as calibration, repeated samples or measurements, standardised data capture, data entry validation, peer review of data, or representation with controlled vocabularies. You should also be aware of common data quality mistakes within your discipline, such as gene names being converted to dates by Excel.
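Data entry validation and controlled vocabularies can be sketched as simple checks applied to each record before it is accepted. The rules, field names and vocabulary below are illustrative assumptions, not a standard:

```python
# A small controlled vocabulary: only these values are accepted for
# the (hypothetical) "species" field.
ALLOWED_SPECIES = {"oak", "ash", "beech"}

def validate_record(record):
    """Return a list of quality problems found in one data record."""
    errors = []
    # Range check: assumed plausible tree heights of 0-60 metres.
    if not 0 <= record.get("height_m", -1) <= 60:
        errors.append("height_m out of plausible range (0-60 m)")
    # Vocabulary check: catches typos at the point of entry.
    if record.get("species") not in ALLOWED_SPECIES:
        errors.append("species not in controlled vocabulary")
    return errors

records = [
    {"species": "oak", "height_m": 12.5},
    {"species": "okay", "height_m": 120},  # typo and implausible height
]
for r in records:
    print(r, validate_record(r))
```

Running checks like these at entry time, rather than at analysis time, means errors are caught while the original observation can still be re-checked.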
Version control, or data versioning, refers to the maintenance of different versions or drafts of a document, file or dataset. Each version may differ only slightly from the last, but maintaining multiple versions provides an audit trail of changes and updates, and a way back if mistakes are uncovered. Version control can be especially useful for quality control in research groups or teams where there are multiple contributors or authors of a document. Some software programs provide version control, but it can also be done using naming conventions for documents, files or datasets.
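A naming-convention approach to versioning can be made consistent with a small helper. The pattern below (`name_vMAJOR.MINOR_YYYY-MM-DD.ext`) is one common convention used as an illustration, not a universal standard:

```python
import re
from datetime import date

def versioned_name(base, ext, major, minor, day=None):
    """Build a filename following name_vMAJOR.MINOR_YYYY-MM-DD.ext."""
    day = day or date.today()
    return f"{base}_v{major}.{minor}_{day.isoformat()}.{ext}"

def next_minor(filename):
    """Read the version out of such a name and bump the minor number."""
    m = re.search(r"_v(\d+)\.(\d+)_", filename)
    major, minor = int(m.group(1)), int(m.group(2))
    return major, minor + 1

name = versioned_name("survey-data", "csv", 1, 0, date(2024, 5, 1))
print(name)             # survey-data_v1.0_2024-05-01.csv
print(next_minor(name)) # (1, 2) for the next draft
```

Encoding the version and date in the filename itself keeps the audit trail visible even when files are copied outside any version-control software.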