Enhancing data accessibility with our next-generation NIRD Research Data Archive

10.12.2024

Early next year, we are set to launch our much-anticipated new NIRD Research Data Archive.

This new archive solution is designed to meet the evolving needs of researchers, particularly those utilising the national e-infrastructure services in Norway, as well as any researcher who needs a robust data archiving solution or easy access to archived data.

Screenshot from a research archive website.

Why a new Research Data Archive?

The research landscape is changing rapidly, and the volume of data being produced is growing exponentially. As research becomes more data-driven, researchers generate vast amounts of data that can be invaluable to peers in the same field or in other disciplines.

However, because project funding is time-limited, datasets are often at risk of becoming inaccessible once a project ends. The new NIRD Research Data Archive aims to close this gap by ensuring that valuable data remains available and reusable for future research.

– My primary role and interest were to ensure that the development and deployment carried out by Datopian supported the long-term evolution of the NIRD archive, aligning it with FAIR principles and Open Science standards.

Anne Fouilloux, Senior Research Engineer at Simula.

Meeting new needs for researchers

The new archive solution is built with the future in mind, incorporating feedback from researchers and leveraging cutting-edge technologies to enhance usability and functionality.

Some key features and improvements

User-centred design

The initial design phase prioritised a user-centred methodology, incorporating end-user feedback early on. This approach led to the creation of mock-ups and prototypes that provided valuable insights into the researchers' needs and challenges.

Integration of emerging technologies

One of the standout features is the integration of Research Object Crate (RO-Crate), a community specification for packaging research data together with machine-readable metadata in JSON-LD, which makes datasets machine-interpretable. This supports the FAIR (Findable, Accessible, Interoperable, Reusable) principles, significantly enhancing the discoverability, interoperability and reusability of the data stored in the archive.
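To illustrate what "machine-interpretable" means in practice, the sketch below builds a minimal `ro-crate-metadata.json` document following the public RO-Crate 1.1 specification: a metadata descriptor, a root `Dataset` entity, and one `File` entity per data file, all expressed as JSON-LD. The dataset name, file path and licence are hypothetical examples, and this is not a description of how the NIRD archive itself generates or reads RO-Crates.

```python
import json


def make_ro_crate_metadata(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate 1.1 metadata document as a Python dict.

    `files` is a list of (relative_path, media_type) tuples describing the payload.
    """
    graph = [
        # Metadata descriptor: identifies this file and the spec version it conforms to
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root data entity: the dataset as a whole, linking to each of its files
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": path} for path, _ in files],
        },
    ]
    # One File data entity per payload file, recording its media type
    for path, media_type in files:
        graph.append({"@id": path, "@type": "File", "encodingFormat": media_type})
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical example dataset, for illustration only
crate = make_ro_crate_metadata(
    name="Example sea-ice observations",
    description="Illustrative dataset used only to demonstrate the RO-Crate layout.",
    date_published="2024-12-10",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=[("data/observations.csv", "text/csv")],
)
print(json.dumps(crate, indent=2))
```

Because the metadata is plain JSON-LD using shared vocabulary, any tool that understands RO-Crate can discover what the dataset contains and under which licence it may be reused, without knowing anything about the archive that holds it.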

Flexible and long-term storage

All data will be accessible via the S3 protocol, which allows flexible access to stored data, programmatic interaction with the long-term archive through APIs, and the creation of Analysis-Ready, Cloud-Optimised (ARCO) datasets. This marks a shift from simply storing data to depositing it for future reuse, which is particularly beneficial for fields such as bio-imaging and Earth observation, where researchers need continuous access to up-to-date data.

Siri Kallhovd from NRIS has mainly been involved with the integrations with the NIRD service platform and has assisted in the technical discussions. She says:
– I think the new API and S3 access will be nice tools for automating interactions with the archive, both for integrating the ingestion of new research data from other systems and for fetching data for reuse in other projects.
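As a rough illustration of the kind of automation Siri describes, the sketch below lists every object under a prefix using the standard S3 `ListObjectsV2` REST call, following continuation tokens page by page. It uses only the Python standard library. The endpoint, bucket and prefix are hypothetical, and the sketch assumes you supply a `fetch` function that returns the raw XML response; it does not sign requests, so for access-controlled data a client library such as boto3 would normally be used instead.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# XML namespace used in S3 ListObjectsV2 responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_objects_url(endpoint, bucket, prefix, continuation_token=None):
    """Build a path-style ListObjectsV2 request URL (unsigned)."""
    params = {"list-type": "2", "prefix": prefix}
    if continuation_token:
        params["continuation-token"] = continuation_token
    return f"{endpoint.rstrip('/')}/{bucket}?{urllib.parse.urlencode(params)}"


def parse_list_objects(xml_text):
    """Return ([(key, size), ...], next_token) from one ListObjectsV2 response page."""
    root = ET.fromstring(xml_text)
    objects = [
        (item.findtext("s3:Key", namespaces=S3_NS), int(item.findtext("s3:Size", namespaces=S3_NS)))
        for item in root.findall("s3:Contents", S3_NS)
    ]
    # A truncated response carries a token for requesting the next page
    truncated = root.findtext("s3:IsTruncated", namespaces=S3_NS) == "true"
    next_token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS) if truncated else None
    return objects, next_token


def list_all(fetch, endpoint, bucket, prefix):
    """Follow continuation tokens until every object under `prefix` is listed.

    `fetch` is any callable that takes a URL and returns the response body as text.
    """
    results, token = [], None
    while True:
        objects, token = parse_list_objects(fetch(list_objects_url(endpoint, bucket, prefix, token)))
        results.extend(objects)
        if token is None:
            return results
```

For a bucket that allows anonymous reads, `fetch` could be as simple as `lambda url: urllib.request.urlopen(url).read().decode()`. Keeping the transport separate from the parsing makes the pagination logic easy to test without network access.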

Enhanced performance and reliability

The new archive solution addresses several performance issues identified in the current system. The web interface for uploading data has been optimised for large datasets, and the system now supports fault-tolerant uploads, allowing researchers to resume uploads from the last successful file.
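The fault-tolerant behaviour described above, resuming from the last successful file, can be pictured as a per-file checkpoint. The following is a minimal sketch under stated assumptions, not the archive's actual upload code: `upload_one` is a hypothetical callable standing in for whatever transfers a single file, and progress is recorded in a small JSON state file so that a rerun skips files that already completed.

```python
import json
from pathlib import Path


def resumable_upload(files, upload_one, state_path):
    """Upload `files` in order, checkpointing after each success.

    `upload_one` is a hypothetical callable that uploads one file and raises on failure.
    Returns the files uploaded during this call.
    """
    state_file = Path(state_path)
    # Load the set of files completed in earlier attempts, if any
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    uploaded_now = []
    for path in files:
        if path in done:
            continue  # already uploaded in a previous attempt
        upload_one(path)  # if this raises, the state file still lists only completed files
        done.add(path)
        state_file.write_text(json.dumps(sorted(done)))
        uploaded_now.append(path)
    return uploaded_now
```

If a transfer fails halfway through a large dataset, calling the function again with the same file list and state path picks up at the first file that did not complete, instead of starting over.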

Improved metadata management

– We have tried to create a metadata model that can be interoperable with international community standards. This will make it easier for other infrastructures to interface with the new archive and for the community to find and access data. I hope this model can be further developed to support a larger community of users.

Lara Ferrighi, Senior Research Scientist at the Norwegian Meteorological Institute.

Modular and composable service

The archive service has been restructured into smaller, composable components, allowing for greater flexibility and easier updates. This modular approach ensures that changes to one part of the system do not impact the entire service, making it more adaptable to future needs.

About Datopian and the technology

The new Research Data Archive is powered by CKAN, an open-source platform renowned for its robust capabilities in managing, sharing, and discovering datasets of various sizes and formats. This cutting-edge system supports open access and aligns with the principles of FAIR data, advancing scientific research and collaboration in Norway.
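CKAN exposes its functionality through the CKAN Action API, where a dataset search is a GET request to `/api/3/action/package_search` with parameters such as `q`, `rows` and `start`, and the response is JSON with a `success` flag and a `result` object. The sketch below builds such a request and extracts dataset names and titles from the response. The base URL is a placeholder, and whether the new NIRD archive exposes the standard CKAN search endpoint publicly is an assumption, not something stated in this article.

```python
import json
import urllib.parse


def package_search_url(base_url, query, rows=10, start=0):
    """Build a CKAN Action API package_search request URL."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_body):
    """Extract (name, title) pairs from a package_search JSON response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        # CKAN reports failures with success=false and an "error" object
        raise RuntimeError(payload.get("error"))
    return [(d["name"], d.get("title", "")) for d in payload["result"]["results"]]
```

Against a reachable CKAN instance (hypothetical URL), a search could look like `dataset_titles(urllib.request.urlopen(package_search_url("https://archive.example.org", "sea ice")).read())`; `json.loads` accepts the raw bytes returned by `read()`.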

Datopian, as creators, co-stewards, and core developers of CKAN, brings unparalleled expertise to organisations globally. They specialise in customisations, hosting, consulting, and training around CKAN to meet diverse data management needs.

Uniquely, Datopian operates as a fully distributed team across the globe, embracing a Holacracy-based structure that decentralises authority and empowers decision-making at all levels.

Reflecting on the collaboration, Daniela Popova, Senior Project Manager at Datopian, shares:
– Partnering with Sigma2 AS on this transformative project has been both challenging and immensely rewarding. It’s a privilege to contribute to a solution that enables cutting-edge scientific discovery while reinforcing our shared commitment to open data and innovation.

Final words from the project team

Project Manager Tore Aalberg at Sigma2 states:
– Datopian has promised to have the solution ready for acceptance testing by New Year’s Eve. In large development projects like this, surprises, errors, and bugs always tend to surface towards the end, so this testing phase may take a few weeks into the new year. Regardless, we are working hard to roll out the solution early in 2025, as promised.

– We have addressed the limitations of the current NIRD Research Data Archive and incorporated advanced technologies to provide researchers with a robust, flexible and future-proof platform for data archiving. The launch of the new NIRD Research Data Archive will mark a significant step in our commitment to supporting the data-driven research community. The initiative not only enhances the reusability of research data but also aligns with the broader goals of FAIR and Open Science, ultimately enabling better and faster scientific discoveries, says Adil Hasan, architect and data manager for the NIRD Archive at Sigma2.

Now, as our dedicated project team commences the final rounds of testing before launch, Norwegian researchers can prepare to experience a new era of data archiving.

About the project

Key team members in the project since its inception in June 2022 have been:

  • Anne Fouilloux, Simula
  • Lara Ferrighi, the Norwegian Meteorological Institute
  • Siri Kallhovd, NRIS
  • Adil Hasan, Sigma2
  • Tore Aalberg, Sigma2 (Project Manager since May 2024)

Other researchers from the community, as well as other employees at Sigma2, NRIS and SIKT, have also contributed.

The project has worked in an agile way, using the Scrum methodology. In close collaboration with Datopian’s team, we have further developed and customised the CKAN-based solution they provided, to the benefit of both parties.

The new archive is defined by nine use cases, which are now close to completion, and stress testing has begun.