The work undertaken in the REPLIX project provided a solid basis for our participation in the Safe Replication service of the EUDAT project. Whereas REPLIX aimed at a replication solution for LAT archives, the EUDAT Safe Replication service aims to provide a more abstract replication solution that is independent of repository technology.
As shown in Figure 1 (Safe Replication), the service is set up around community repositories and data center stores. A community repository is a center hosting the repository and services for a specific community. A community manager specifies replication policies in the form of statements like:
“replicate my collection X to two data centers and store the collection safely for 10 years”
Based on these statements the data is replicated to different EUDAT data center stores. These data center stores will safely store the replicated data for the agreed period of time.
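To make the policy statement above more concrete, the following is a minimal sketch of how such a statement could be represented as structured data. It is not the EUDAT policy format: the class `ReplicationPolicy`, its fields, and the methods `targets` and `retain_until` are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical representation of one policy statement (not an EUDAT format)."""
    collection: str
    copies: int
    retention_years: int

    def targets(self, available_centers):
        # Pick the first `copies` data centers from the list of available ones.
        if self.copies > len(available_centers):
            raise ValueError("not enough data centers for the requested copies")
        return list(available_centers[: self.copies])

    def retain_until(self, start):
        # End of the agreed storage period, counted in calendar years.
        try:
            return start.replace(year=start.year + self.retention_years)
        except ValueError:
            # A start date of 29 February has no match in a non-leap target year.
            return start.replace(year=start.year + self.retention_years, day=28)


# "replicate my collection X to two data centers and store it safely for 10 years"
policy = ReplicationPolicy(collection="X", copies=2, retention_years=10)
targets = policy.targets(["RZG", "SARA"])
expiry = policy.retain_until(date(2013, 1, 1))  # start date chosen arbitrarily
```

Here the example statement becomes a policy with two copies and a ten-year retention period; a real deployment would express this in the rule language of the underlying replication software.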
A key element in this process is the management of the persistent identifier (PID) records, which identify the location of every replica of a digital object and thereby provide a way to access each replica.
Registered data is promoted within the EUDAT data domain. Registered data are digital objects with associated metadata and a PID. The metadata provides additional information about the digital object, and the PID ensures stable identification of the object and its replicas over time. The PID record also stores checksum information, which can be used to validate the different replicas.
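The checksum-based validation of replicas can be sketched as follows. This is an illustration under stated assumptions, not the EUDAT implementation: the layout of `pid_record`, the location strings, the in-memory `stores` dictionary, and the choice of SHA-256 are all hypothetical, since the checksum algorithm and record structure are not specified here.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    # Hex digest of the object content; the algorithm is an assumption.
    return hashlib.new(algorithm, data).hexdigest()


def validate_replicas(pid_record, fetch):
    """Compare every replica listed in the PID record with the stored checksum.

    `fetch` is any callable that returns the bytes stored at a location.
    Returns a dict mapping each location to True (intact) or False (mismatch).
    """
    expected = pid_record["checksum"]
    return {
        location: checksum(fetch(location)) == expected
        for location in pid_record["locations"]
    }


# Hypothetical in-memory stand-ins for two data center stores.
original = b"example digital object"
stores = {
    "rzg://collection-x/object-1": original,
    "sara://collection-x/object-1": b"example digital 0bject",  # simulated corruption
}

# Hypothetical PID record: identifier, checksum, and replica locations.
pid_record = {
    "pid": "hypothetical/0001",
    "checksum": checksum(original),
    "locations": list(stores),
}

result = validate_replicas(pid_record, stores.__getitem__)
```

In this sketch the first replica matches the recorded checksum and the second does not, so the damaged copy could be restored from an intact one.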
Together, these features provide a service fulfilling the following goals:
- guard against data loss in long-term archiving and preservation,
- optimize access for users from different regions, in particular when replication is combined with the hosting of community-specific data services, and
- bring data closer to powerful computers for compute-intensive analysis.
The Language Archive is currently using this service to replicate a small set of collections to the RZG and SARA data center stores. The number of collections and the total volume of data will be increased continuously until the full archive is replicated through the EUDAT Safe Replication service.