In short, data management can be defined as the creation, storage, maintenance, disclosure, archiving and sustainable preservation of research data. Increasingly, the so-called FAIR principles are referred to as the final goal: data should be made 'Findable, Accessible, Interoperable and Re-usable'.
Good data management is important for:
- Ensuring the quality, findability and accessibility of research data;
- Increasing the visibility and impact of research;
- Compliance with requirements from the University and research funders.
Effective data management practices start with thorough preparation. Already at the stage of writing a research proposal, you will make decisions that have implications for the long-term reusability of research data for others. For this reason, many research funders request a data management paragraph in a research proposal. At the actual start of the new project, you will have to write a full data management plan (DMP).
Leiden University has its own DMP template, which we published in the data archive Zenodo. Alongside it, you will also find a useful document with tips & tricks for filling out the template.
Policies and requirements
Leiden University has adopted a regulation for Data Management.
The most important requirements are:
- A data management plan should be written at the start of each research project.
- During the research, data must be stored securely.
- Data must be archived for at least 10 years according to international guidelines for FAIR data.
The regulation complies with the Nederlandse gedragscode Wetenschappelijke integriteit (Netherlands Code of Conduct for Research Integrity) and also with most research funders’ requirements.
For most projects, the capacity of the university network storage will suffice: no extra costs will be charged for the use of the network, unless the project produces enormous amounts of data.
You can include the costs of data management during a project in the budget of your project proposal. This may apply to costs for temporary, external storage during field work, extra storage space for big data, or support during the anonymization of datasets or the curation and documentation of data to facilitate sustainable archiving. The website of the national coordination point for Research Data Management (LCRDM) provides a useful document for calculating these costs.
The Centre for Digital Scholarship (CDS) can provide support, firstly, during the initial planning of the research project. The Data Management Team can offer advice on research proposals, the writing of a DMP, or the budgeting of costs. In cooperation with support staff at the faculties, such as the privacy officers, we can also advise you on safe storage facilities during your research and on the archives that can secure the long-term preservation of data.
Good organisation of your research data is a time investment that will surely pay for itself by the time you finish your research. With the right measures, you can make sure that your data remain findable, accessible and re-usable.
When you choose your storage facility, the deciding factors include the size of the data, the partners you cooperate with, the sensitivity of the data, the location of data collections and the instruments that you may use. We recommend that you use the University network as much as possible: access restrictions protect the files on the network against unauthorized access, and the ISSC makes a back-up every night.
Occasionally, it may prove difficult or even impossible to use the Leiden network, for instance during field work, or in the case of a cooperation with external partners who require access to the data. The Research Data Services Catalogue provides an up-to-date overview of the facilities available at Leiden.
Logical and unambiguous file naming ensures that everybody can easily find their way when working in shared folders. Do take care to follow the standard procedures and workflows in your specific field of research. This applies not only to file names: the structure of your folders and the fields in your spreadsheets also deserve clear, consistent names.
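As an illustration of a file naming convention, the following minimal Python sketch builds and checks names of the form `project_description_date_version.extension`. The pattern itself, the function names and the example values are assumptions made for this sketch; they are not a Leiden University standard, and your own field may prescribe a different scheme.

```python
import re
from datetime import date

# Hypothetical convention: project_description_YYYY-MM-DD_vNN.ext
NAME_PATTERN = re.compile(
    r"^[a-z0-9-]+_[a-z0-9-]+_\d{4}-\d{2}-\d{2}_v\d{2}\.[a-z0-9]+$"
)


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project: str, description: str, day: date,
                    version: int, extension: str) -> str:
    """Compose a file name that follows the hypothetical convention above."""
    ext = extension.lower().lstrip(".")
    return f"{slug(project)}_{slug(description)}_{day.isoformat()}_v{version:02d}.{ext}"


def follows_convention(file_name: str) -> bool:
    """Return True if the file name matches the hypothetical convention."""
    return NAME_PATTERN.fullmatch(file_name) is not None
```

A check such as `follows_convention` could be run over a shared folder from time to time to spot files that drifted away from the agreed scheme.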
By using version control you will prevent unnecessary doubling or overwriting of data.
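The idea of never overwriting an earlier state can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `save_new_version` and the `_vNN` suffix are invented for this sketch, and for code or text files a dedicated version control system is usually the better choice.

```python
import shutil
from pathlib import Path


def save_new_version(source: Path, archive_dir: Path) -> Path:
    """Copy `source` into `archive_dir` under the next free version number.

    Existing versions are never overwritten: the copy is stored as
    <stem>_v01<suffix>, <stem>_v02<suffix>, and so on.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    version = 1
    while True:
        target = archive_dir / f"{source.stem}_v{version:02d}{source.suffix}"
        if not target.exists():
            break
        version += 1
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Calling `save_new_version(Path("survey.csv"), Path("versions"))` after each editing session would leave a numbered trail of earlier states next to the working file.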
Metadata are the data that describe your data: it is this documentation that will make your data findable and intelligible for other people. The way in which metadata are linked to a dataset may vary: sometimes the documentation is integrated in the data files, but you can also add your metadata in a separate database, spreadsheet or readme file that you add to the same folder in which you store the data. Many subject areas have developed their own standards for uniform documentation. If there is no metadata standard available in your field as yet, you can also make use of a general archival standard such as Dublin Core, which is used by many data repositories.
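As a rough sketch of a separate metadata file, the Python example below builds a record that uses only the fifteen element names of the Dublin Core Metadata Element Set and serialises it as JSON, so that it could be saved next to the data. The function name and all example values are hypothetical, and the choice of JSON is an assumption of this sketch; repositories often prescribe their own format.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_metadata_record(**fields: str) -> dict:
    """Return a metadata record, rejecting names that are not Dublin Core elements."""
    unknown = set(fields) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


# Hypothetical example values, for illustration only.
record = make_metadata_record(
    title="Example survey dataset",
    creator="Doe, Jane",
    date="2024-03-01",
    format="text/csv",
    language="en",
    rights="CC BY 4.0",
)

# Text that could be written to a file in the same folder as the data.
metadata_json = json.dumps(record, indent=2, sort_keys=True)
```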
Access to data
In projects that involve cooperation with third parties from outside the university, it is important to make an agreement on who can have access to the data. In the case of privacy-sensitive data or pending patents, you will apply access restrictions. You will need to take these restrictions into account when you choose your place for storage, and devise a sound plan to determine the conditions for ownership and sharing. Below, you can read more on working with personal data.
After the research project has finished, the main results can generally be disseminated. Research outputs often consist of articles or monographs, but they may also include data sets. According to Leiden University’s data management regulation, the research data that have emerged from research projects must be preserved for a period of at least 10 years. However, which data precisely need to be preserved?
Selection criteria for research data
Data can often be highly valuable for other researchers, or for society at large. It is not always necessary, however, to preserve all data. In the case of large data sets, preservation may be very costly. In some cases, the costs for replication (if possible) may be lower than the costs of preservation. Alternatively, it may also be the case that the models or the algorithms which have produced the data sets are ultimately more important than the data sets by themselves. While assessing the need to preserve data, the following criteria may be used:
- Are the data unique? It is often impossible to replicate observational data, for instance.
- Are the costs of replication disproportionately high?
- Is there a formal obligation to preserve data for the longer term? Such requirements may have been stipulated by funders or publishers.
See also the hand-out ‘Selection of Data for Archiving’.
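The three criteria above can be read as a simple checklist. The Python sketch below is one possible encoding and not an official decision rule: the class and function names are invented for this example, and comparing replication costs directly with preservation costs is a simplifying assumption standing in for "disproportionately high".

```python
from dataclasses import dataclass


@dataclass
class DatasetAssessment:
    """Answers to the selection questions for one data set (hypothetical structure)."""
    is_unique: bool            # e.g. observational data that cannot be replicated
    formal_obligation: bool    # required by a funder or publisher
    replication_cost: float    # estimated cost of producing the data again
    preservation_cost: float   # estimated cost of long-term preservation


def should_preserve(assessment: DatasetAssessment) -> bool:
    """Suggest preservation if any criterion applies.

    Assumption: replication counts as disproportionately costly when it is
    more expensive than preserving the data.
    """
    return (
        assessment.is_unique
        or assessment.formal_obligation
        or assessment.replication_cost > assessment.preservation_cost
    )
```

The outcome is only a starting point for discussion; the final selection remains a judgment made by the researcher.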
Journals and funders increasingly encourage researchers to provide open access to research data. Such forms of openness can produce a variety of benefits. When data are publicly accessible, they can be cited, and such citations can result in credit for the research that has been done. Open access to research data also encourages the reuse of these resources by other researchers. It enables peers to replicate specific analyses, or to validate the claims that have been made about these data in publications.
A growing number of journals ask their authors to provide access to the data that underlie a publication, either as supplementary materials or via a data repository. This is the case, for instance, for Science and Nature. The peer review processes that are organised by journals may occasionally include an inspection of the data. To avoid unpleasant surprises during the publication phase, it is very important to bear in mind throughout the entire research process that some publishers may demand that research data are openly accessible. Data can be published most efficiently if they are stored in the correct format and have been documented well, using appropriate metadata formats.
Where to publish data?
To ensure that data can be reused responsibly and productively, it is best to store these data in a trusted data repository. The option to archive data as supplementary materials, attached to a publication, ought to be avoided whenever possible. Data repositories have taken various measures to make sure that data remain findable and accessible. The data which are stored in data repositories can in most cases be cited through a persistent identifier (such as the DOI). Such archives consequently enhance your visibility as a researcher. Various studies have shown that data which are archived in data repositories receive more citations. The Research Data Services catalogue provides an overview of the most relevant data management facilities. The site also indicates whether or not these various services adhere to the requirements that are formulated in Leiden University’s data management policy. A similar catalogue of data management services can be found at www.re3data.org.
Data as a publication in itself
Even when a study produces data which do not support the overall objectives of this study, it can still be useful to publish these research data separately. In this way, it becomes possible to avoid duplication of research efforts. Alternatively, these data may effectively be reused in another study.
You can draw attention to specific data sets by describing these in a data journal such as GigaScience or Scientific Data. These journals essentially publish metadata about data sets. Among other aspects, these metadata describe the way in which the data were produced, and they suggest potential applications of these data. Such descriptions of potential uses heighten the chance that these data sets are reused.
Leiden University and most research funders require a data management plan (DMP) before the start of a new research project.
In your DMP you gather all the information about the data in the project. You are asked to provide information on the type of data, the method of collection, the format and the documentation of the data. The plan also includes sections on the facilities that are used, legal or ethical reasons (not) to share data, and the way data are shared and preserved in the long term.
You may make use of several templates when writing a DMP.
When you work with personal data, the General Data Protection Regulation (GDPR) requires you to record what happens to your data. In the data processing register you explain which personal data you collect, who will have access, how you will protect the data, and how long you are planning to store the data. The university will support you in working in a privacy-proof way: on the staff website you will find all the information you need to comply with the GDPR.
Tools & tips for working securely online
The staff website also provides this useful overview to help you work securely.
We have set up a catalogue of data management facilities for researchers.
Research Data Services
This site aims to help researchers make a reasoned choice when planning for the management and the storage of their data. Additionally, the information that was accumulated should help to identify potential gaps or other shortcomings within the facilities which have been described.
You can find the catalogue at: https://digitalscholarship.nl/rds/
In our training sessions we often refer to the following handouts and best practices:
- Back-up strategies
- File naming and folder structure
- Versioning and authenticity
- Anonymisation
- Metadata
- Selection of research data
- Sensitive data protection
- FAIR data
Other useful references are:
- The MRI Data Sharing Guide is a useful flowchart showing researchers what they can share and where, and where they can find information or support.
- Expert tour guide on data management by CESSDA ERIC (Consortium of European Social Science Data Archives European Infrastructure Consortium).
- What is pseudonymous, de-identified or anonymous data? See A Visual Guide to Practical Data De-Identification (from the Future of Privacy Forum)
- Publishing and Sharing Sensitive Data decision tree by the Australian National Data Service