Digital preservation for research datasets – WDPD2020

LIBNOVA joins World Digital Preservation Day 2020 celebrations by writing a guest post on the DPC blog about Digital Preservation for Research Datasets.

One of the most overlooked areas of digital preservation is the scientific world. In this post, we would like to share some thoughts on digital preservation for research datasets.

Digital preservation for research datasets

Last year, in our guest blog post for the DPC, we wrote about “Augmenting the community, lowering the risk internationally” and noted that individual digital preservation problems can often be solved by looking at the experience of the community. This year the theme for World Digital Preservation Day is ‘Digits: For Good’, and we want to focus on digital preservation of research datasets.

Let’s look back. From the beginning, LIBNOVA’s promise has been to provide the most advanced digital preservation platform to the community, and we are achieving it step by step.

A few years ago, we created LIBNOVA RESEARCH LABS to coordinate the company’s lines of research in technological innovation. At the same time, we have been doing market research to understand the needs of each sector and the differences between them (e.g., cultural heritage vs research). Finally, last year, the confluence of these two paths led us to develop and launch a ground-breaking research data management and preservation tool.

But what have we learned along the way?

Research data challenges

During our market study, we gathered feedback from more than 50 research organizations. These are the most widespread reasons why they do not properly preserve research data:

  1. There is no unified view of research data, as it resides on many dispersed platforms during its lifecycle because of differing functionality, protocol and feature needs.
  2. Digital preservation is addressed (if at all) at the end of the project, when the next project is on everybody’s mind, the resources are scarce, and the data is dispersed over a myriad of platforms.
  3. Because data structures and software are cutting edge in many projects, and no two projects are the same, there is no effective way to standardize formats and data structures.
  4. Often the code and data are not kept together, so representation information is lost.

Our own challenges as researchers

As a research organization we also have our own concerns. The main ones are the following:

  1. We need to be confident about how research data is managed and protected for the whole data lifecycle.
  2. We need to provide the best available tools to our researchers, carefully balancing resources across research projects, while also asking: how much is this going to cost?
  3. We are concerned about data volumes and platform scalability.

Thoughts on digital preservation for research datasets

These are the main insights we have reached in these early years of research and feedback on digital preservation for research datasets:

1. If we focus on archiving at the end of the project, most of the data is already lost:

  • Preservation should start BEFORE the project starts, providing a platform that researchers can use during the whole project lifecycle, as the “only” place to keep things.
  • Researchers work together (even from different institutions), so they need a place to share content.
  • All of the above must be done while also taking care of the “integrity chain”.

2. For research data, code is usually the data’s representation information:

  • It is important to preserve the code, together with the data. Reproducibility is usually needed in the long term.
  • ISO 16363 and OAIS alignment is crucial.

3. Create easy ways to provide metadata, including the possibility to create the Representation Information Network for the content:

  • Flexible to accommodate different disciplines.
  • But standards-based to improve accessibility.
  • How much metadata? At least consider FAIR and TRUST principles.
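
As a rough illustration of the three insights above (and not a description of LIBNOVA’s tool or method), the following Python sketch builds a checksum manifest for a project folder that holds both data and code, stores some descriptive metadata alongside it, and later checks that nothing has changed. The function names, the folder layout and the metadata fields are all hypothetical; a real deployment would follow the metadata standards of its own discipline.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(project_dir: Path, metadata: dict) -> dict:
    """Record a checksum for every file (data and code alike) plus metadata."""
    files = {
        p.relative_to(project_dir).as_posix(): sha256_of(p)
        for p in sorted(project_dir.rglob("*"))
        if p.is_file()
    }
    return {"metadata": metadata, "files": files}


def verify_manifest(project_dir: Path, manifest: dict) -> list:
    """Return the relative paths whose content is missing or has changed."""
    problems = []
    for rel_path, expected in manifest["files"].items():
        target = project_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems


if __name__ == "__main__":
    # Hypothetical project layout: data and the code that interprets it,
    # kept side by side so the representation information is not lost.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "data").mkdir()
        (project / "code").mkdir()
        (project / "data" / "readings.csv").write_text("t,value\n0,1.5\n")
        (project / "code" / "analyse.py").write_text("print('analysis')\n")

        # Illustrative metadata fields only, not a complete FAIR record.
        manifest = build_manifest(
            project, {"title": "Example project", "license": "CC-BY-4.0"}
        )
        print(json.dumps(manifest, indent=2))
        print(verify_manifest(project, manifest))  # nothing has changed yet
```

Creating the manifest when the project starts and re-verifying it at each hand-over is one simple way to keep an integrity chain unbroken; the manifest itself would also need to be stored somewhere it cannot be silently altered.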

When thinking about digital preservation of research data, we have to remember that we are not only preserving digits: these datasets may be the key to much future research. That is why we have to take all the necessary precautions during the process.

Antonio Guillermo Martinez, CEO and founder of LIBNOVA.