After your research
After the research project has finished, the main results can generally be disseminated. Research outputs often consist of articles or monographs, but they may also include data sets. According to Leiden University’s data management regulation, the research data that have emerged from research projects must be preserved for a period of at least 10 years. But which data, precisely, need to be preserved?
Selection criteria for research data
Data can often be highly valuable for other researchers, or for society at large. It is not always necessary, however, to preserve all data. In the case of large data sets, preservation may be very costly. In some cases, the costs of replication (if replication is possible) may be lower than the costs of preservation. It may also be the case that the models or algorithms which produced the data sets are ultimately more important than the data sets themselves. When assessing the need to preserve data, the following criteria may be used:
- Are the data unique? It is often impossible to replicate observational data, for instance.
- Are the costs of replication disproportionately high?
- Is there a formal obligation to preserve the data for the longer term? Such requirements may have been stipulated by funders or publishers.
See also the hand-out ‘Selection of Data for Archiving’.
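The criteria above can be read as a simple checklist. The following Python sketch is only an illustration of that reading, not an official decision procedure: the `DataSet` class, its field names, and the rule that replication is "disproportionately" costly when it exceeds the cost of preservation are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical description of a data set under review."""
    name: str
    is_unique: bool              # e.g. observational data that cannot be re-collected
    replication_cost: float      # estimated cost of producing the data again
    preservation_cost: float     # estimated cost of long-term storage
    has_formal_obligation: bool  # required by a funder or publisher


def should_preserve(ds: DataSet) -> bool:
    """Return True if any of the three selection criteria applies."""
    if ds.has_formal_obligation:
        return True
    if ds.is_unique:
        return True
    # Assumption: "disproportionately high" is simplified to
    # "more expensive than preserving the data".
    return ds.replication_cost > ds.preservation_cost


if __name__ == "__main__":
    # Illustrative values only; these are not real cost figures.
    observations = DataSet("field observations", True, 0.0, 1000.0, False)
    simulation = DataSet("simulation output", False, 50.0, 5000.0, False)
    print(observations.name, should_preserve(observations))
    print(simulation.name, should_preserve(simulation))
```

In practice, such a checklist only supports the assessment. The final decision also depends on disciplinary norms and on the hand-out mentioned above.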
Journals and funders increasingly encourage researchers to provide open access to research data. Such openness can produce a variety of benefits. When data are publicly accessible, they can be cited, and such citations give credit for the research that has been done. Open access to research data also encourages the reuse of these resources by other researchers. It enables peers to replicate specific analyses, or to validate the claims that have been made about these data in publications.
A growing number of journals ask their authors to provide access to the data that underlie a publication, either as supplementary materials or via a data repository. This is the case, for instance, for Science and Nature. The peer review processes organised by journals may occasionally include an inspection of the data. To avoid unpleasant surprises during the publication phase, it is very important to bear in mind throughout the entire research process that some publishers may demand that research data be openly accessible. Data can be published most efficiently if they are stored in the correct format and have been documented well, using appropriate metadata formats.
Where to publish data?
To ensure that data can be reused responsibly and productively, it is best to store these data in a trusted data repository. The option to archive data as supplementary materials, attached to a publication, ought to be avoided whenever possible. Data repositories have taken various measures to make sure that data remain findable and accessible. The data stored in data repositories can in most cases be cited through a persistent identifier (such as a DOI). Such archives consequently enhance your visibility as a researcher. Various studies have shown that data archived in data repositories receive more citations. The Leiden Research Data Information Sheets provide an overview of the most relevant data management facilities. The site also indicates whether or not these services adhere to the requirements that are formulated in Leiden University’s data management policy. A similar catalogue of data management services can be found at www.re3data.org.
Data as a publication in itself
Even when a study produces data which do not support the overall objectives of that study, it can still be useful to publish these research data separately. In this way, duplication of research effort can be avoided. Alternatively, these data may be reused effectively in another study.
You can draw attention to specific data sets by describing them in a data journal such as GigaScience or Scientific Data. These journals essentially publish metadata about data sets. Among other aspects, these metadata describe the way in which the data were produced, and they suggest potential applications of these data. Such descriptions of potential uses heighten the chance that these data sets