Frequently Asked Questions

You can contact zdv-forschungsdaten@uni-mainz.de to arrange a meeting with our technical support team.
They can answer any remaining questions and guide you through the process.
Archiving is meant for research data that is part of a published work. Archived data is meant to be used to replicate research results.
It can also be made public for reuse by others, or kept private simply to conform with grant requirements.
If you wish to back up a laboratory PC, see www.zdv.uni-mainz.de/datensicherung instead.
To archive research data you do not need to change the way you conduct your research or any of your existing data workflows.
The data can always be archived afterwards by producing the relevant metadata and technical metadata and then uploading it into the Archive.
The metadata summarizes essential information about the research, e.g. Title, Description, Type, Format, License, Keywords, Contributor, Reference, etc.
Technical metadata describes what the contents of the files represent, how they are used in the research field, how they relate to each other and how they can be processed.
All of this information helps others find and understand the research data.
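As a purely illustrative example, such a metadata record could look like this:

    Title:        Measurement series X under varying temperature
    Description:  Raw and processed measurements underlying publication Y
    Type:         Dataset
    Format:       CSV, TXT
    License:      CC BY 4.0
    Keywords:     temperature, measurement, replication
    Contributor:  Jane Doe (JGU Mainz)
    Reference:    DOI of the associated publication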
All the research data of a single publication that is meant to be archived should be gathered within a single parent directory.
The data within the parent directory can then be sorted into different directories based on some useful logical hierarchy chosen by the authors.
Metadata for the gathered data should then be defined, e.g. Title, Description, Type, Format, License, Keywords, Contributor, Reference, etc.
Technical descriptions of the research data (technical metadata) should also be prepared and included in the parent directory as human-readable files such as plain ".txt" files.
These technical metadata files should describe what the contents of the files represent, how they are used in the research field, how they relate to each other and how they can be processed.
The parent directory should be compressed into an open format if the storage space requirements allow it.
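A possible layout for such a parent directory could look like this (all names are illustrative):

    my-publication-2024/
    ├── raw-data/
    ├── processed-data/
    ├── scripts/
    └── README.txt   (technical metadata)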

Once your research data has been gathered and prepared with metadata, these steps [link] can be followed to archive it.
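The linked steps are the authoritative instructions; as a rough command line sketch, assuming the iRODS icommands are installed and configured and using illustrative names, the upload could look like this:

    # create a collection in the Archive and upload the prepared, compressed data
    imkdir archive-2024
    iput -K my-publication-2024.tar.gz archive-2024

    # verify that the upload arrived
    ils -l archive-2024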

Multiple large files can be archived individually, while large numbers of small files that belong to the same dataset should be compressed into a single archive file (zip, tar, rar).
If possible, all files of the same dataset should be gathered together in the same archived (compressed) file.
This makes it easier to retrieve complete datasets that are meant to be distributed together.
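As a sketch with illustrative names, using standard command line tools:

    # bundle the small files of one dataset into a single open-format archive
    tar -czf my-publication-2024.tar.gz my-publication-2024/

    # alternatively, bundle without compression if the data is already compressed
    tar -cf my-publication-2024.tar my-publication-2024/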

The data should be available to be copied from one of the following:

  • a Linux computer installation provided by the ZDV [link]
  • the Mogon HPC system of the JGU [link]
  • any computer with mounted group storage network drives of the JGU [link]
  • a personal computer with the Kerberos Client authentication packages installed
  • any computer with SSH access to linux.zdv.uni-mainz.de [link] or the MOGON HPC system [link] (see the example below)
  • a personal Windows computer using the Windows Subsystem for Linux or a Linux virtual machine
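For example, the SSH option above could be used like this (replace the account name with your own JGU account):

    # log in to the ZDV Linux login host and copy the data from there
    ssh your-jgu-account@linux.zdv.uni-mainz.de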

The source code needed to process, generate or understand your research data should be included with the data itself whenever available and possible.
The Archive itself is meant for long-term, non-changing data, and it can hold the data for as long as the user requests.
The Archive is meant to be able to store data for decades in order to help researchers comply with funding requirements and to be able to reproduce results in the future.

For archiving datasets over 1TB it is recommended to contact us at zdv-forschungsdaten@uni-mainz.de so that we can assist you and ensure smooth performance and stability during the archival process.
There is currently no limit on the total amount of data that can be archived, but individual files or compressed datasets should not exceed ~8TB.
If the archived datasets are meant to be retrieved through HTTP interfaces, a maximum size of 5GB per file is recommended.
Capacity limitations are otherwise only imposed by the storage location of the original data to be copied, e.g. 100GB in the Home directory on the MOGON systems.
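If a compressed dataset would exceed these limits, it can be split into chunks before archiving; a sketch with illustrative names and sizes:

    # split a large archive into 4GB chunks (e.g. to stay below the 5GB HTTP limit)
    split -b 4G my-publication-2024.tar.gz my-publication-2024.tar.gz.part-

    # reassemble the original file after retrieval
    cat my-publication-2024.tar.gz.part-* > my-publication-2024.tar.gz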

Currently there is no limitation on the number of files that can be archived.
If possible, all files of the same dataset should be gathered together in the same compressed file.
This makes it easier to retrieve a complete dataset that belongs together.

To give other users access to your archived research data, you must follow these steps [link].
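The linked steps are authoritative; as a hypothetical sketch using the iRODS icommands, granting read access could look like this:

    # grant another user read access to an archived collection (names are illustrative)
    ichmod -r read colleague-account archive-2024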

Your research data can be published publicly during the archival process or afterwards by following these steps [link].

Metadata first needs to be gathered, e.g. Title, Description, Type, Format, License, Keywords, Contributor, Reference, etc.
After defining these and any other helpful descriptive fields, you can follow these steps [link] to add the metadata.
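If you prefer the command line, such fields can also be attached with the iRODS imeta icommand; a sketch with illustrative values:

    # attach descriptive metadata to an archived collection
    imeta add -C archive-2024 Title "Measurement series X"
    imeta add -C archive-2024 License "CC BY 4.0"

    # list the metadata attached to the collection
    imeta ls -C archive-2024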
Technical metadata needs to be generated by the owner of the research data.
Technical descriptions of the research data should then be included as human-readable files such as plain ".txt" files alongside the normal data.
These technical metadata files should describe what the contents of the files represent, how they are used in the research field, how they relate to each other and how they can be processed.
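A minimal technical metadata file could be created alongside the data like this (contents are purely illustrative):

    # write a human-readable technical description next to the data
    cat > my-publication-2024/README.txt <<'EOF'
    raw-data/        unprocessed instrument output, one CSV file per measurement run
    processed-data/  cleaned tables derived from raw-data/ using scripts/clean.py
    scripts/         the Python scripts used to process and plot the data
    EOF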
Under normal circumstances the Archive is meant to hold non-changing data for decades.
The archived research data can nevertheless be updated to add or change existing files in certain special cases.
This can happen when wrong data is accidentally uploaded or when the underlying published research changes.
Research data that has been archived long term to comply with government or funding requirements cannot and should not be deleted.
iRODS uses 'Resources' to archive the collections (directories) and data objects (files). The resources are organized hierarchically. The iRODS Archive at the ZDV currently has a compound resource consisting of a cache (unix filesystem) and a tape archive (universal mass storage system). The cache has a size of 8TB and, once it fills up, the oldest data objects are deleted from the cache. If required, they are fetched back from the tape archive.

replResc:replication
├── cephfsResc:unixfilesystem
└── compResc:compound
    ├── netappResc:unixfilesystem
    └── tsmResc:univmss
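
This hierarchy can be inspected from the command line with the ilsresc icommand, which lists the resources of the connected iRODS zone:

    ilsresc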