Processing
Data processing involves preparing your raw data for analysis. This includes tasks such as data cleaning (e.g., handling missing, inaccurate, or inconsistent values), standardizing formats, and labeling and organizing files. Processing ensures your data is accurate, reliable, interoperable, and ready for use.
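As an illustration of the cleaning tasks described above, here is a minimal Python sketch that trims whitespace, maps common missing-value markers to an explicit empty value, and standardizes dates to ISO 8601. The marker list, the date layouts, and the column name collected_on are assumptions made for this example, not part of any particular tool. Adjust them to match your own data.

```python
from datetime import datetime

# Hypothetical markers that stand in for "missing" in the raw data.
MISSING_MARKERS = {"", "na", "n/a", "null", "-999"}

# Hypothetical date layouts found in the raw files; extend as needed.
# Note: "%m/%d/%Y" assumes month-first dates. Confirm this for your source.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def clean_value(value):
    """Trim whitespace and map missing-value markers to None."""
    text = value.strip()
    return None if text.lower() in MISSING_MARKERS else text


def standardize_date(value):
    """Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    if value is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    # Unparseable dates become None so they can be reviewed by hand.
    return None


def clean_rows(rows, date_column="collected_on"):
    """Clean every value in each row and standardize the date column."""
    cleaned = []
    for row in rows:
        new_row = {key: clean_value(val) for key, val in row.items()}
        new_row[date_column] = standardize_date(new_row.get(date_column))
        cleaned.append(new_row)
    return cleaned
```

Rows can be read with csv.DictReader from the standard library and passed to clean_rows. Keep the raw file unchanged and write the cleaned rows to a new file so the processing step can be repeated.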
To ensure your data can be interpreted and used by others, it is also important to create documentation describing the provenance of your data, the methods by which it was collected, and the processes used to organize and describe it. Common types of documentation include:
- Lab notebooks: Structured environments for recording and sharing your research data, notes, and observations.
- README files: Simple text files that can be used to provide information about the organization and content of your data files.
- Data dictionaries/codebooks: Structured documents detailing variable names, data types, units of measurement, and potential values.
- Protocols: Documents describing the procedures or methods used in the implementation of a research project or experiment.
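To show what a data dictionary might contain, the following Python sketch drafts one entry per variable from a list of rows. The function names, the type-guessing rules, and the output fields are assumptions chosen for illustration. Descriptions and units still need to be supplied by someone who knows the data.

```python
def infer_type(values):
    """Guess a simple data type from the non-missing values."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "unknown"
    # Try the strictest type first; fall back to string.
    for label, caster in (("integer", int), ("float", float)):
        try:
            for v in present:
                caster(v)
            return label
        except ValueError:
            continue
    return "string"


def build_data_dictionary(rows, descriptions=None, units=None):
    """Draft one dictionary entry per column found in the first row."""
    descriptions = descriptions or {}
    units = units or {}
    columns = list(rows[0].keys()) if rows else []
    entries = []
    for name in columns:
        values = [row.get(name) for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(values),
            "units": units.get(name, ""),
            "description": descriptions.get(name, ""),
            "missing_count": sum(v in (None, "") for v in values),
        })
    return entries
```

The resulting entries can be written to a CSV file with csv.DictWriter and stored alongside the data. Treat the inferred types as a first draft to review, since a column of numeric codes, for example, may really be categorical.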
Metadata standards have been developed for many disciplines to support compatibility and uniformity of data documentation by providing element definitions and usage guidance. Proper metadata and documentation provide the information necessary to understand and interpret your research data, now and in the future.
Considerations
- Use tools like OpenRefine or Open Data Editor to clean and format your data
- Pseudonymize/anonymize any sensitive data
- Use version control to track and manage changes to your data
- Implement quality control checks to confirm data integrity after processing
- Determine what types of documentation are needed to allow other researchers to understand and reuse your data
- Consult a librarian for guidance on selecting appropriate metadata standards for documenting your data
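Two of the considerations above, pseudonymizing sensitive data and checking integrity after processing, can be sketched with Python's standard library. This is one possible approach under stated assumptions, not a complete de-identification method: the function names are hypothetical, the keyed hash only replaces direct identifiers, and the secret key must be stored securely and separately from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so
    records can still be linked. The 16-character truncation is an
    assumption made for readability.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def sha256_of_file(path, chunk_size=65536):
    """Compute a SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording a checksum for each processed file, for example in the README, lets you or others confirm later that a file has not changed or been corrupted. Pseudonymized data may still be re-identifiable through other variables, so review indirect identifiers as well.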
Resources
- OpenRefine
Free, open-source tool for cleaning, organizing, and reformatting messy data.
- Open Data Editor
Open-source tool for nontechnical data practitioners to explore and detect errors in tables.
- Protocols.io
Open access platform for developing and sharing research methods and protocols to facilitate collaboration, documentation, and reproducibility. Includes free and premium plans.
- Disciplinary Metadata
Information about metadata standards by discipline including profiles, tools to implement the standards, and use cases.
- Metadata Standards Catalog
Open directory of metadata standards applicable to research data.
- NIH Common Data Elements
Repository of standardized definitions to describe and define data to improve discoverability.
- Guide to Writing “Readme” Style Metadata
Guidance on creating readme files, which provide information about a data file to ensure that it can be correctly interpreted and reused in the future.
- How to Make a Data Dictionary
Guidance on creating a data dictionary to explain the names of variables used in your data.
- Codebook Cookbook
Guidance on creating a codebook describing the contents, structure, and layout of your data files.