IGSN IDs for material samples are registered with metadata encoded in the DataCite Metadata Schema. The following are recommendations and best practices for IGSN ID metadata.
Please also refer to the IGSN-DataCite Crosswalk Recommendation and the DataCite IGSN support pages.
The archaeology sample metadata profile and the specific crosswalk to DataCite metadata were developed by the IGSN Archaeology Community of Practice.
Edmunds, R., Klump, J., Bartholdy, B. P., Crook, P., Corns, A., Hsu, Y.-K., Keller, C., Novák, D., Plomp, E., Rose, T., Ross, S., & Sessing, J. (2025). Archaeological Sample Metadata Profile. IGSN e.V. https://doi.org/10.5281/zenodo.17254974
Edmunds, R., Klump, J., Bartholdy, B. P., Crook, P., Corns, A., Keller, C., Rose, T., Ross, S., & Sessing, J. (2025). IGSN–DataCite Archaeology CoP Crosswalk Recommendation. IGSN e.V. https://doi.org/10.5281/zenodo.17446557
Citation: Wyborn, L., Powers, L., Edmunds, R., Elger, K., Kohlmann, F., Habermann, T., Richard, S., Ross, C., & Wimalaratne, S. (2026). IGSN–DataCite Crosswalk Recommendation. IGSN e.V. https://doi.org/10.5281/zenodo.20453782
An IGSN–DataCite Partnership Working Group (WG) was formed with the goal of developing a consensus crosswalk to support the translation of current IGSN metadata into the DataCite Schema V4.4. The WG first focused on populating the six mandatory DataCite elements (identifier, creator, title, publisher, publicationYear, and resourceType); it then discussed populating the six recommended DataCite elements (subject, contributor, date, relatedIdentifier, description, and geoLocation). The entire WG agreed on the recommendation described here.
An IGSN ID can be applied to an individual sample, an aggregation of samples, or to a feature-of-interest (the real-world feature that the sample is taken from). An IGSN ID cannot be used for an image of a sample or for digital data.
The above usage contrasts with the Global Biodiversity Information Facility (GBIF)[1] occurrenceID[2], where the ID ‘records the evidence of the occurrence[3] of a species (or other taxon) at a particular place on a specified date’. Note: the Distributed System of Scientific Collections (DiSSCo) identifier[4] can be used to identify digital representations of physical specimens in natural science collections (see Hardisty et al. 2021[5]).
How IGSN IDs are used is described in Klump et al. (2021)[6]. It is based on the modelling of a ‘FeatureOfInterest’ and ‘Specimen’, as per ISO 19156:2011 (Observations and Measurements, O&M[7]) and the Sensor, Observation, Sample, and Actuator (SOSA) Ontology (Cox 2020[8], Haller et al. 2019[9]). Both ISO 19156:2011 and the SOSA Ontology include a common concept whereby a Specimen (i.e., sample) is a specialization of a larger FeatureOfInterest (e.g., lake, tree, cross-section, transect, borehole, etc.). This Specimen concept also aligns with the definition of ‘MaterialSample’[10] emerging from discussions by a Biodiversity Information Standards (TDWG) task group.
In the DataCite Schema, the most appropriate resourceTypeGeneral for both samples and features-of-interest was agreed to be ‘PhysicalObject’, which is defined as ‘an inanimate, three-dimensional object or substance’ and with the examples given of ‘artifacts, specimens’. See the resourceType recommendation below.
Mappings for the DataCite mandatory and recommended properties are given in Sections 2 and 3, respectively. The WG found that these properties could not be populated using just content from the IGSN Registration Metadata (IGSNRegistration[11]), and so some recommendations also reference IGSN Descriptive Metadata (IGSNDescriptive[12]). The WG used the element names in IGSN Descriptive Metadata Version 1.0 when referring to the descriptive metadata (with a ‘namespace’ of IGSNDescriptive). In practice, these descriptive metadata are less consistent than the registration metadata, varying across IGSN Allocating Agents.
For convenience and consistency, Section 4 gives recommendations for mandatory IGSNRegistration elements that are consistent with the other mappings.
Section 5 outlines several possible approaches to supporting sample searches in DataCite services as they are populated with sample metadata following these recommendations.
In the case of both newly registered and re-registered IGSN IDs, DataCite:Identifier will be automatically populated with a DOI upon the creation of an IGSN ID metadata record within DataCite services. See Section 2.7 for discussion of other identifiers.
The DataCite:Creator property contains a list of ‘the main researcher(s) involved...in priority order’. For IGSN IDs, this could be the sample collector/creator, chief scientist, curator, or even the person who deposited the sample into a repository. In the case of re-registered IGSN IDs, as no equivalent information is available in IGSNRegistration, DataCite:Creator will be filled with appropriate content from IGSNDescriptive at the discretion of each IGSN Allocating Agent. If no appropriate name is available, the property will be populated with the name of the IGSN Allocating Agent or an appropriate standard value for unknown information from Table 11 of the DataCite Schema[13] (page 45).
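The fallback order described above can be sketched in Python. This is only an illustration: the function name, the input shape (a list of candidate names already chosen from IGSNDescriptive) and the output keys, which loosely follow DataCite's JSON representation, are assumptions. `:unkn` is used here as one of the DataCite standard values for unknown information; an Allocating Agent may prefer another value from that table.

```python
from typing import Dict, List, Optional

UNKNOWN = ":unkn"  # assumed choice among the DataCite standard "unknown" values


def build_creators(candidate_names: List[str],
                   allocating_agent: Optional[str] = None) -> List[Dict[str, str]]:
    """Return creators in priority order, falling back as the text describes."""
    # Keep only non-empty names, preserving the caller's priority order.
    names = [n.strip() for n in candidate_names if n and n.strip()]
    if names:
        return [{"name": n} for n in names]
    # No usable name: fall back to the Allocating Agent, if known.
    if allocating_agent:
        return [{"name": allocating_agent, "nameType": "Organizational"}]
    # Last resort: a standard value for unknown information.
    return [{"name": UNKNOWN}]
```

For example, `build_creators([], "Example Agent")` yields a single organizational creator, while `build_creators([])` yields the unknown placeholder.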
For re-registered IGSN IDs, IGSNRegistration:sampleNumber and/or IGSNDescriptive:identifier—also referred to as the ‘local sample number’—will be used as the DataCite:Title of a resource. Because DataCite:Title is highly important for discovery of a metadata record, other appropriate elements that would help find and distinguish a sample should also be included from IGSNDescriptive at an IGSN Allocating Agent’s discretion (e.g., name, description, resourceTypes, material). If no title information is available, there is the option to fill this property with an appropriate standard value for unknown information from Table 11 of the DataCite Schema[13] (page 45).
For newly created IGSN IDs in DataCite services, there is no need to add the sample identifier to the title if it is the same as the DOI suffix; however, other local identifier codes should be incorporated.
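As a rough sketch of the title rules above, the following Python function joins the local sample number with any extra distinguishing elements and omits the sample number when it only repeats the DOI suffix. The function name, the comma separator and the `:unkn` fallback are assumptions made for illustration, not part of the recommendation.

```python
from typing import Iterable, Optional

UNKNOWN = ":unkn"  # assumed choice among the DataCite standard "unknown" values


def build_title(local_number: Optional[str] = None,
                extras: Iterable[Optional[str]] = (),
                doi_suffix: Optional[str] = None) -> str:
    """Assemble a discoverable title from the sample number and extra elements."""
    parts = []
    # DOIs are case-insensitive, so compare the suffix without regard to case.
    same_as_suffix = bool(local_number and doi_suffix
                          and local_number.lower() == doi_suffix.lower())
    if local_number and not same_as_suffix:
        parts.append(local_number)
    # Extra elements might be name, description, resource type or material.
    parts.extend(e for e in extras if e)
    return ", ".join(parts) if parts else UNKNOWN
```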
DataCite:Publisher contains ‘the name of the entity that holds, archives, publishes prints, distributes, releases, issues, or produces the resource’. It directly maps to IGSNRegistration:registrant; namely, the organization registering the IGSN ID for the physical sample.
DataCite:PublicationYear is the year when the sample was first made available to the research community. For existing IGSN IDs that are re-registered in DataCite services, the year of the IGSNRegistration:log.logElement.timeStamp value associated with the first ‘registered’ or ‘submitted’ IGSNRegistration:log.logElement.event (these terms are used interchangeably during registration of IGSN IDs) is mapped to DataCite:PublicationYear.
For new IGSN IDs, it is likely to be the year in which the physical sample was registered, unless the sample was released before registration of its metadata record.
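The rule for re-registered IGSN IDs can be illustrated with a short Python sketch. It assumes that the log has already been parsed into a list of dictionaries with `event` and `timeStamp` keys holding ISO 8601 strings, and it interprets "first" as the earliest timestamp; both points are assumptions, as is the function name.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Event names treated as interchangeable during IGSN ID registration.
REGISTRATION_EVENTS = {"registered", "submitted"}


def publication_year(log_elements: List[Dict[str, str]]) -> Optional[int]:
    """Return the year of the earliest 'registered' or 'submitted' log event."""
    stamps = [
        # fromisoformat() in older Python versions rejects a trailing 'Z'.
        datetime.fromisoformat(e["timeStamp"].replace("Z", "+00:00"))
        for e in log_elements
        if e.get("event") in REGISTRATION_EVENTS and e.get("timeStamp")
    ]
    # Assumes all timestamps are consistently timezone-aware (or all naive).
    return min(stamps).year if stamps else None
```

When no matching event exists, the function returns `None` so that the caller can decide how to fill the property.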
DataCite:resourceTypeGeneral will be ‘PhysicalObject’ for all (re-)registered IGSN IDs. More specific types from IGSNDescriptive or other sources (e.g., ontologies or shared vocabularies) may be used in the optional, free-text resourceType property. This will be negotiated with each IGSN Allocating Agent. However, in the absence of an agreed vocabulary, the WG strongly recommends that the terms ‘material sample’ or ‘feature-of-interest’ be used in DataCite:resourceType to at least distinguish between these sampling concepts.
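A minimal Python sketch of this recommendation follows. The function name and the `types` dictionary layout (modelled loosely on DataCite's JSON representation) are assumptions; only the value ‘PhysicalObject’ and the two fallback terms come from the text above.

```python
from typing import Dict


def build_types(is_feature_of_interest: bool = False,
                specific_type: str = "") -> Dict[str, str]:
    """Return resourceTypeGeneral plus a free-text resourceType."""
    # Prefer an agreed, more specific term when one is supplied.
    fallback = "feature-of-interest" if is_feature_of_interest else "material sample"
    return {
        "resourceTypeGeneral": "PhysicalObject",
        "resourceType": specific_type or fallback,
    }
```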
DataCite:Subject is a free-text property that contains the ‘subject, keyword, classification code, or key phrase describing the resource’. For IGSN IDs, the subjects are the materials that compose the sample. In particular, for re-registered IGSN IDs, DataCite:Subject is equivalent to IGSNDescriptive:materials.
Since materials may be categorized under different schemata, the subproperties DataCite:Subject.subjectScheme and DataCite:Subject.schemeURI should also be included whenever possible.
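The mapping of materials to subjects, including the optional scheme subproperties, might look like the following Python sketch. The function name, the key spelling `schemeUri` (taken from DataCite's JSON form and worth checking against the schema) and the example URI are assumptions.

```python
from typing import Dict, Iterable, List, Optional


def build_subjects(materials: Iterable[str],
                   scheme: Optional[str] = None,
                   scheme_uri: Optional[str] = None) -> List[Dict[str, str]]:
    """Map each material to a subject, adding scheme details when known."""
    subjects = []
    for material in materials:
        entry = {"subject": material}
        if scheme:
            entry["subjectScheme"] = scheme
        if scheme_uri:
            entry["schemeUri"] = scheme_uri
        subjects.append(entry)
    return subjects


# Hypothetical vocabulary name and URI, for illustration only.
example = build_subjects(["basalt"], "Example Materials Vocabulary",
                         "https://example.org/materials")
```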
All institutions and people involved in a sample’s workflow—from collection to archival (or discarding/destruction)—are captured in the free-text property DataCite:Contributor. If DataCite:Contributor is used, then the subproperties DataCite:Contributor.contributorName and DataCite:Contributor.contributorType are mandatory. For re-registered IGSN IDs, DataCite:Contributor.contributorName is equivalent to IGSNDescriptive:contributors.contributor.name and DataCite:Contributor.contributorType is equivalent to IGSNDescriptive:contributors.contributor.contributorType.
Note here that to be included in the reference to a resource, a person or organization must be listed in DataCite:Creator (and can then be additionally listed in DataCite:Contributor). People and organizations listed only in DataCite:Contributor are not included in the resource reference.
Other available DataCite:Contributor properties that may be applicable to IGSN IDs are:
Contributor.contributorName.nameType – Takes the values ‘Personal’ (default) or ‘Organizational’.
Contributor.nameIdentifier – For example, the contributor’s ORCID ID. The name identifier scheme (in this case, ‘ORCID’) is then expressed in Contributor.nameIdentifier.nameIdentifierScheme.
Contributor.affiliation – Free text. For an organizational contributor, this is the name of the formal institution to which it belongs. Should ideally be used alongside Contributor.affiliation.affiliationIdentifier to uniquely identify the organizational affiliation according to a scheme (e.g., ROR).
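To show how these contributor subproperties fit together, here is a small Python sketch. The function name and dictionary keys are assumptions based loosely on DataCite's JSON representation; the contributor type must come from the DataCite controlled list, and the ROR value in the usage note is a placeholder.

```python
from typing import Any, Dict, Optional


def build_contributor(name: str, contributor_type: str,
                      name_type: str = "Personal",
                      orcid: Optional[str] = None,
                      affiliation: Optional[str] = None,
                      ror: Optional[str] = None) -> Dict[str, Any]:
    """Build one contributor entry; name and type are mandatory."""
    if not name or not contributor_type:
        raise ValueError("contributorName and contributorType are mandatory")
    contributor: Dict[str, Any] = {
        "name": name,
        "nameType": name_type,  # 'Personal' (default) or 'Organizational'
        "contributorType": contributor_type,
    }
    if orcid:
        contributor["nameIdentifiers"] = [
            {"nameIdentifier": orcid, "nameIdentifierScheme": "ORCID"}
        ]
    if affiliation:
        aff = {"name": affiliation}
        if ror:
            aff["affiliationIdentifier"] = ror
            aff["affiliationIdentifierScheme"] = "ROR"
        contributor["affiliation"] = [aff]
    return contributor
```

A call such as `build_contributor("Doe, Jane", "DataCollector", affiliation="Example Institute", ror="https://ror.org/EXAMPLE")` would attach a ROR-identified affiliation to a personal contributor.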
DataCite:Date records all dates relevant to the material sample. If DataCite:Date is used, then DataCite:Date.dateType (controlled list) is mandatory. For re-registered IGSN IDs, DataCite:Date = IGSNRegistration:log.timeStamp with the appropriate dateType included (see 4.3 IGSNRegistration:log for more details). Furthermore, if the collection time of the sample has been captured in IGSNDescriptive:collectionTime, then it is mapped to DataCite:Date[dateType=Collected].
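The date mapping can be sketched in Python as below. The event-to-dateType table is deliberately passed in by the caller, because the authoritative mapping is defined in Section 4.3 and is not reproduced here; the function name and the input shape (dictionaries with `event` and `timeStamp` keys) are assumptions.

```python
from typing import Dict, List, Optional


def build_dates(log_elements: List[Dict[str, str]],
                event_to_date_type: Dict[str, str],
                collection_time: Optional[str] = None) -> List[Dict[str, str]]:
    """Map log timestamps and the collection time to dated, typed entries."""
    dates = []
    for element in log_elements:
        # Skip events for which the caller supplied no dateType.
        date_type = event_to_date_type.get(element.get("event", ""))
        if date_type and element.get("timeStamp"):
            dates.append({"date": element["timeStamp"], "dateType": date_type})
    if collection_time:
        dates.append({"date": collection_time, "dateType": "Collected"})
    return dates
```

The mapping passed in might pair a log event such as ‘updated’ with a dateType such as ‘Updated’, but those pairings should be taken from Section 4.3 rather than from this sketch.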
IGSN IDs registered in the IGSN central registry are expected to have at least two identifiers: IGSNRegistration:sampleNumber and one or more identifiers assigned by a researcher or project (IGSNDescriptive:identifier). When an already existing IGSN ID is re-registered in DataCite services, IGSNRegistration:sampleNumber should be included in the DataCite Schema as follows to maximize discoverability:
DataCite:alternateIdentifier = IGSNRegistration:sampleNumber (including the Handle prefix 10273), and with DataCite:alternateIdentifierType = ‘IGSN’. (Note: both are free text.)
DataCite:relatedIdentifier = IGSNRegistration:sampleNumber (free text), DataCite:relatedIdentifierType = ‘IGSN’ (controlled list), and DataCite:relationType = ‘IsIdenticalTo’ (controlled list).
Researcher/project identifiers are included using the free-text DataCite:alternateIdentifier property with the (free-text) value of the DataCite:alternateIdentifierType property chosen by the IGSN Allocating Agent. The WG suggests using ‘local’ as the default value for the latter.
For newly created IGSN IDs, there is no requirement to use the DataCite:alternateIdentifier property. But, if any other identifiers exist, then this is where they should be placed with an appropriate DataCite:alternateIdentifierType.
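For a re-registered IGSN ID, the identifier mappings in this section might be assembled as in the following Python sketch. The function name, the `10273/<sampleNumber>` form of the Handle and the output keys are assumptions; whether the related identifier should also carry the prefix is not stated above, so the value is passed through unchanged.

```python
from typing import Dict, Iterable, List

HANDLE_PREFIX = "10273"  # Handle prefix named in the recommendation


def build_identifiers(sample_number: str,
                      local_ids: Iterable[str] = (),
                      local_type: str = "local") -> Dict[str, List[Dict[str, str]]]:
    """Return alternate and related identifiers for a re-registered IGSN ID."""
    # Assumed form: prefix and sample number joined by a slash.
    prefixed = (sample_number if sample_number.startswith(HANDLE_PREFIX + "/")
                else HANDLE_PREFIX + "/" + sample_number)
    alternate = [{"alternateIdentifier": prefixed, "alternateIdentifierType": "IGSN"}]
    # Researcher or project identifiers use the agent's chosen type ('local' by default).
    alternate += [{"alternateIdentifier": i, "alternateIdentifierType": local_type}
                  for i in local_ids]
    related = [{"relatedIdentifier": sample_number,
                "relatedIdentifierType": "IGSN",
                "relationType": "IsIdenticalTo"}]
    return {"alternateIdentifiers": alternate, "relatedIdentifiers": related}
```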
Connecting samples to one another, and to research based on them, is a primary goal of the IGSN ID. Such connections are captured through DataCite:RelatedIdentifier, which lists the globally unique identifiers assigned to related resources. It is therefore recommended that DataCite:RelatedIdentifier is used to the maximum extent possible and is updated on a regular basis.
For re-registered IGSN IDs, all connection information contained in both IGSNRegistration:relatedResourceIdentifier and IGSNDescriptive:relatedIdentifiers.relatedIdentifier should be mapped to DataCite:RelatedIdentifier.
If DataCite:RelatedIdentifier is used, then its subproperty relatedIdentifierType is mandatory, with values selected from a controlled list that includes ‘IGSN’. Again, for re-registered IGSN IDs, any relatedIdentifierType in IGSNRegistration and IGSNDescriptive should be mapped:
DataCite:RelatedIdentifier.RelatedIdentifierType = IGSNRegistration:relatedResourceIdentifier.relatedIdentifierType or IGSNDescriptive:relatedIdentifiers.relatedIdentifier.identifierType
The RelationType subproperty of DataCite:RelatedIdentifier is also mandatory. Taking values from a controlled list, RelationType describes relationships between the material sample for which the IGSN ID is being registered and related resources (features-of-interest, parent samples, subsamples, datasets, publications, etc.). DataCite:RelatedIdentifier should be used for any relationType listed in IGSNRegistration Table 3.2.
For re-registered IGSN IDs:
DataCite:RelatedIdentifier.RelationType = IGSNRegistration:relatedResourceIdentifier.relationType or IGSNDescriptive:relatedIdentifiers.relatedIdentifier.relationType
It is important to note here that DataCite:RelatedIdentifier is used for making connections that mirror sample hierarchies. Because the parent IGSN ID is a key element in IGSN ID metadata, for new IGSN IDs created under DataCite services, it is recommended that a child (sub)sample identifies its parent using the relationType ‘IsPartOf’ or ‘IsDerivedFrom’. Conversely, a parent sample can identify its children using ‘HasPart’ or ‘IsSourceOf’.
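The reciprocal parent and child links can be illustrated with a short Python sketch. The function name, the `derived` switch and the default identifier type are assumptions; the four relation types are those named in the paragraph above.

```python
from typing import Dict, Tuple


def link_parent_child(parent_id: str, child_id: str,
                      derived: bool = False,
                      id_type: str = "IGSN") -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (entry for the child's record, entry for the parent's record)."""
    # 'IsPartOf'/'HasPart' for physical subsamples,
    # 'IsDerivedFrom'/'IsSourceOf' for derived material.
    child_rel, parent_rel = (("IsDerivedFrom", "IsSourceOf") if derived
                             else ("IsPartOf", "HasPart"))
    child_entry = {"relatedIdentifier": parent_id,
                   "relatedIdentifierType": id_type,
                   "relationType": child_rel}
    parent_entry = {"relatedIdentifier": child_id,
                    "relatedIdentifierType": id_type,
                    "relationType": parent_rel}
    return child_entry, parent_entry
```

Generating both entries together is one way to keep the two metadata records consistent when a subsample is registered.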
It is valuable to include additional information about a sample, particularly about its ‘birth’, in DataCite:Description (free text). If DataCite:Description is used, then DataCite:Description.descriptionType is mandatory. Values for the latter are selected from a controlled list, with the most relevant for IGSN IDs being:
Abstract – Brief description of the resource and the context in which it was created.
Methods – The methodology employed for the study or research.
Both of these are important for discovery purposes.
For re-registered IGSN IDs:
DataCite:Description[descriptionType=Abstract] = IGSNDescriptive:description
DataCite:Description[descriptionType=Methods] = IGSNDescriptive:collectionMethods.collectionMethod
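The two description mappings above could be expressed as in this Python sketch, where the function name, the input arguments and the output keys (loosely following DataCite's JSON representation) are assumptions.

```python
from typing import Dict, Iterable, List, Optional


def build_descriptions(description: Optional[str] = None,
                       collection_methods: Iterable[str] = ()) -> List[Dict[str, str]]:
    """Map a free-text description and collection methods to typed descriptions."""
    descriptions = []
    if description:
        descriptions.append({"description": description,
                             "descriptionType": "Abstract"})
    # Each collection method becomes its own 'Methods' description.
    for method in collection_methods:
        if method:
            descriptions.append({"description": method,
                                 "descriptionType": "Methods"})
    return descriptions
```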
DataCite:GeoLocation is used to encode information on the ‘spatial region or named place where the data was gathered or about which the data is focused’. The property can be repeated to indicate a number of different locations, and can express a location as a point, bounding box or polygon, or simply as a free-text description through its (respective) subproperties: geoLocationPoint, geoLocationBox, geoLocationPolygon, and geoLocationPlace.
For IGSN IDs, this property will contain where a sample was acquired relative to the Earth or another astronomical object. Note that it may not be relevant for samples that are ‘non-geographic’ (e.g., a synthetic material).
In general for re-registered IGSN IDs, DataCite:GeoLocation is equivalent to IGSNDescriptive:geoLocations. More specifically, the following relationships hold:
DataCite:GeoLocation.geoLocationPoint = IGSNDescriptive:geoLocations.geoLocation.geometry[geometryType=Point]
DataCite:GeoLocation.geoLocationPoint[1..n] = IGSNDescriptive:geoLocations.geoLocation.geometry[geometryType=MultiPoint]
DataCite:GeoLocation.geoLocationPolygon = IGSNDescriptive:geoLocations.geoLocation.geometry[geometryType=Polygon]
DataCite:GeoLocation.geoLocationPolygon[1..n] = IGSNDescriptive:geoLocations.geoLocation.geometry[geometryType=MultiPolygon]
DataCite:GeoLocation.geoLocationPlace = IGSNDescriptive:geoLocations.geoLocation.toponym
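The geometry mappings above can be sketched in Python as follows. The sketch assumes that each IGSNDescriptive geoLocation has already been parsed into a dictionary with `geometryType`, `coordinates` (longitude and latitude pairs) and `toponym` keys, which is an illustrative shape rather than the actual encoding. The output keys loosely follow DataCite's JSON representation, and emitting one geoLocation entry per point or polygon of a multi-geometry is one possible reading of the `[1..n]` notation.

```python
from typing import Any, Dict, List


def _point(lon: float, lat: float) -> Dict[str, float]:
    return {"pointLongitude": lon, "pointLatitude": lat}


def map_geolocation(geo: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map one parsed IGSNDescriptive geoLocation to DataCite-style entries."""
    gtype = geo.get("geometryType")
    coords = geo.get("coordinates")
    place = geo.get("toponym")
    if gtype == "Point":
        entries = [{"geoLocationPoint": _point(*coords)}]
    elif gtype == "MultiPoint":
        entries = [{"geoLocationPoint": _point(*c)} for c in coords]
    elif gtype == "Polygon":
        entries = [{"geoLocationPolygon": [{"polygonPoint": _point(*c)} for c in coords]}]
    elif gtype == "MultiPolygon":
        entries = [{"geoLocationPolygon": [{"polygonPoint": _point(*c)} for c in ring]}
                   for ring in coords]
    else:
        # No usable geometry: keep the place name alone, if there is one.
        entries = [{}] if place else []
    if place:
        for entry in entries:
            entry["geoLocationPlace"] = place
    return entries
```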