Harnessing today’s data for tomorrow’s discoveries.
NIRD Data Lake is a cloud-compatible storage service designed for storing and sharing data during and beyond a project’s lifetime. Offering unified file and object storage, NIRD Data Lake balances performance, functionality, and cost.
Data integrity is ensured by daily snapshots, built-in redundancy, and error-correcting mechanisms. This makes the service well suited to large datasets that need efficient, reliable, and cost-effective long-term storage without separate backup copies.
Service description
Datasets that were produced during a research project but are no longer actively used are labeled “cold” and should in principle be either deleted or archived and made openly accessible. However, there may be reasons for not doing so, such as business confidentiality or an embargo pending publication.
NIRD Data Lake is specifically designed for storing large amounts of both structured and unstructured data, and is especially suited to less active datasets (cold data). With NIRD Data Lake you can store your data for longer periods, share selected datasets with co-workers and external collaborators, and consume the data with the services of your choice.
Please refer to our Data Policy for more information on data classification.
NIRD Data Lake is particularly well suited if you need:
- Long-term storage of non-persistent data
- Storage for any type of inactive/cold data
- Storage for structured or unstructured data of any volume larger than 1 TiB
- Sharing datasets and libraries for collaboration across projects and institutions in the sector
- Interfacing with sensors or third-party storage systems
- Object storage
Features
NIRD Data Lake supports multitenancy, offers unified file and object storage, and can dynamically scale to accommodate multiple petabyte-sized volumes as needed. You can therefore invite your co-workers to store data and work in the same shared area. It is also possible to run your services on the NIRD Service Platform, so that you can access the data, and grant others access to it, through your own services.
Interfaces
Datasets residing on the NIRD Data Lake can be accessed simultaneously through different protocols (e.g. POSIX, NFS and S3). Data is accessible directly from the NIRD login nodes and Sigma2’s national HPC systems, and can be made directly accessible to third-party storage systems, computing facilities, or even desktops.
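As a rough illustration of what multi-protocol access means in practice, the sketch below shows how one and the same object could be addressed either as a file path (POSIX/NFS) or as an S3-style URL. The mount point, endpoint host, project name, and the mapping of a project to a bucket are all hypothetical placeholders chosen for this example; they are not taken from the NIRD documentation, and the actual layout may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlunsplit

# Hypothetical placeholders, for illustration only. The real mount point,
# S3 endpoint and bucket naming are not specified in this text.
POSIX_ROOT = PurePosixPath("/example/datalake")
S3_ENDPOINT = "s3.example.org"


def posix_path(project: str, key: str) -> str:
    """File-system view: <root>/<project>/<key>."""
    return str(POSIX_ROOT / project / key)


def s3_url(project: str, key: str) -> str:
    """Object view, path-style addressing: https://<endpoint>/<bucket>/<key>.

    Assumes (hypothetically) that each project maps to one bucket.
    """
    return urlunsplit(("https", S3_ENDPOINT, f"/{project}/{key}", "", ""))


if __name__ == "__main__":
    # The same dataset, seen through two different interfaces.
    print(posix_path("proj1", "runs/2024/out.nc"))
    print(s3_url("proj1", "runs/2024/out.nc"))
```

The point of the sketch is only that a single stored dataset has more than one address, so a job on an HPC system can read it as a file while an external service fetches it as an object.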
Data integrity
Data stored on the NIRD Data Lake is protected by daily snapshots, built-in redundancy, and error-correcting mechanisms, which ensure the integrity of the data and protect it from human mistakes.
However, traditional backup to a secondary storage system is not currently offered for the NIRD Data Lake service.
How to get access
The most convenient way to obtain access and resources on NIRD is by applying through our regular calls, which are published twice a year.
However, it is also possible to apply outside the regular calls at any time of the year.
Your application is evaluated on scientific merit by a resource allocation committee, which grants your project access to resources if the application is accepted.
Alternatively, you can reserve resources by procurement for your group and your institution. You are welcome to get in touch with us if you want to explore this model.
You might also need
We offer many services that you may need in addition to NIRD Data Lake. Under the National Infrastructure for Research Data (NIRD) umbrella, we offer a range of storage services designed to support scientific research at every step of the research data lifecycle. A few selected services are shown below; our services overview lists everything we offer.
The NIRD Data Lake service going forward
Take a look at the roadmap for the development of the NIRD storage services to follow the progress of the service: