This project provides the JSON schemas for defining metadata fields and their requirements for The National Archives. It aims to standardize the structure of metadata across teams/catalogues, facilitating interoperability and consistency in data representation.
The National Archives (TNA) collects and preserves digital records from various sources, including government departments, courts and public enquiries. To manage these records effectively, TNA requires a consistent and structured approach to metadata creation and management.
While JSON is probably the most popular format for exchanging data, JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability at scale.
This project provides a set of JSON schemas that define the structure and requirements of metadata fields used in TNA Digital Archiving. The schemas are designed to conform to the JSON Schema specification, extended with custom keywords and properties specific to TNA Digital Archiving metadata requirements.
- Flexibility: The schema supports a wide range of metadata fields commonly found in catalogues, including textual descriptions, numerical values, dates, and more.
- Validation: Ensures that metadata entries adhere to a predefined structure and meet specified requirements, reducing errors and inconsistencies.
- Extensibility: Easily extend the schema to accommodate additional metadata fields or custom requirements specific to different teams or use cases.
JSON Schema defines keywords that are used to describe data.
- The National Archives schemas are based on the JSON Schema draft-07 specification, with extensions for domain-specific requirements, including:
  - daBeforeToday indicates a supplied date must be before now
  - alternateKeys allows for alternate names for metadata fields
Metadata standards are defined against specific field names. Each team may use alternate names for these fields but still require the same validation. The mapping is defined in alternateKeys.
CSV files are the standard way to upload metadata to TDR, and the type of each data value needs to be evaluated.
- CSV file headers are defined in alternateKeys -> tdrFileHeader.
- Values are converted from the CSV string to the type required for validation.
The base schema is used for the configurations.
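As a rough illustration of the two steps above, the sketch below maps a CSV header to its property key and converts the cell text to a typed value. The CsvRowConversion object, the Column case class and the hard-coded table are hypothetical. Only the "Date of the record" to end_date mapping comes from the end_date example in this README; the "File size" header is invented for the example. A real implementation would presumably read alternateKeys and type from the base schema rather than hard-code them.

```scala
object CsvRowConversion {
  // Hypothetical extract of the base schema: property key and JSON type per CSV header.
  final case class Column(key: String, jsonType: String)

  val columnsByHeader: Map[String, Column] = Map(
    "Date of the record" -> Column("end_date", "string"), // mapping taken from the end_date example
    "File size" -> Column("file_size", "integer") // header name is an assumption
  )

  // Convert one CSV cell to the value handed to the validator.
  // Assumption: an empty cell becomes None (JSON null); text that cannot be
  // converted is kept as a string so that schema validation reports the type error.
  def convert(raw: String, jsonType: String): Option[Any] = {
    val text = raw.trim
    if (text.isEmpty) None
    else jsonType match {
      case "integer" => Some(text.toLongOption.getOrElse(text))
      case "boolean" => Some(text.toBooleanOption.getOrElse(text))
      case _         => Some(text)
    }
  }

  // Rename headers to property keys and convert values; unknown headers are dropped.
  def convertRow(row: Map[String, String]): Map[String, Option[Any]] =
    row.collect {
      case (header, raw) if columnsByHeader.contains(header) =>
        val column = columnsByHeader(header)
        column.key -> convert(raw, column.jsonType)
    }
}
```

Leaving unconvertible text as a string, rather than failing during conversion, is a design choice in this sketch: it lets the schema validator report the problem alongside any other errors for the row.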
These schemas are used by the JSON schema validator to validate metadata.
- base schema
- definitions schema
- closure schema open
- closure schema closed
- relationship schema
- required schema
The base schema defines the metadata fields that are supported within Digital Archiving.
- Example definition for end_date:
{
  "end_date": {
    "description": "The date the record ends",
    "type": [
      "string",
      "null"
    ],
    "format": "date",
    "propertyType": "Supplied",
    "alternateKeys": [
      {
        "tdrFileHeader": "Date of the record",
        "tdrDataLoadHeader": "end_date"
      }
    ],
    "daBeforeToday": "Validates that end date is earlier than today's date"
  }
}
- end_date field key
- type - value can be a string or null (undefined)
- format - date - the string will be in the format 2018-11-13 (This date format is specific for TDR)
- alternateKeys -> tdrFileHeader - Date of the record (The column header in the TDR metadata file)
- daBeforeToday - the date must be before today
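To make the type, format and daBeforeToday constraints on end_date concrete, here is a minimal sketch of the check they describe. It is not the validator's implementation: BeforeTodayCheck and isValid are hypothetical names, and treating null as valid (because the type allows null) is an assumption.

```scala
import java.time.LocalDate
import scala.util.Try

object BeforeTodayCheck {
  // None stands for JSON null, which the end_date type allows (assumed to pass).
  // A supplied value must parse as an ISO date such as 2018-11-13 and be
  // strictly earlier than `today`.
  def isValid(value: Option[String], today: LocalDate = LocalDate.now()): Boolean =
    value match {
      case None       => true
      case Some(text) => Try(LocalDate.parse(text)).toOption.exists(_.isBefore(today))
    }
}
```

Passing today as a parameter keeps the check deterministic in tests; callers can rely on the default for normal use.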
The definitions schema defines allowed values for fields (such as FOI exemption codes). These can then be referenced in other schemas.
The closure schema open defines the schema for Open records.
- If closure_type is Open, then:
  - no closure_period
  - no closure_start_date
  - no description_closed
  - no foi_exemption_asserted
  - no foi_exemption_code
  - no title_alternate
  - no description_alternate
  - title_closed is false
  - description_closed is false
The closure schema closed defines the schema for Closed records.
- If closure_type is Closed, then the following fields are required:
  - closure_period
  - closure_start_date
  - description_closed
  - foi_exemption_asserted
  - foi_exemption_code
  - title_alternate
  - description_alternate
  - title_closed
- If title_closed is true then title_alternate is required
- If description_closed is true then description_alternate is required
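The rules for Closed records can be read as a small conditional check. The sketch below restates them in plain Scala for illustration only; in this project the rules are expressed in the closure schema closed and enforced by the JSON schema validator. ClosedRecordCheck and its members are hypothetical names, "required" is taken to mean that the key is present (as in JSON Schema), and the definition of a "provided" value is an assumption.

```scala
object ClosedRecordCheck {
  // Fields listed above as required when closure_type is Closed.
  val requiredWhenClosed: Seq[String] = Seq(
    "closure_period", "closure_start_date", "description_closed",
    "foi_exemption_asserted", "foi_exemption_code",
    "title_alternate", "description_alternate", "title_closed"
  )

  // Assumption: a value counts as provided when it is neither null nor an empty string.
  private def provided(row: Map[String, Any], field: String): Boolean =
    row.get(field).exists(value => value != null && value.toString.nonEmpty)

  // Implements "if <flag> is true then <alternate> is required".
  private def alternateError(row: Map[String, Any], flag: String, alternate: String): Seq[String] =
    if (row.get(flag).contains(true) && !provided(row, alternate))
      Seq(s"$alternate is required when $flag is true")
    else Seq.empty

  // Returns one message per broken rule; rows that are not Closed are not checked here.
  def errors(row: Map[String, Any]): Seq[String] =
    if (!row.get("closure_type").contains("Closed")) Seq.empty
    else {
      val missing = requiredWhenClosed.filterNot(row.contains).map(field => s"$field is required")
      missing ++
        alternateError(row, "title_closed", "title_alternate") ++
        alternateError(row, "description_closed", "description_alternate")
    }
}
```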
This schema is used to enforce cross attribute relationships.
{
  "if": {
    "properties": {
      "file_name_translation": {
        "type": "string",
        "minLength": 1
      }
    }
  },
  "then": {
    "properties": {
      "file_name_translation_language": {
        "type": "string",
        "minLength": 1
      }
    },
    "required": ["file_name_translation_language"]
  }
}
If there is a file_name_translation then there must be a file_name_translation_language.
The required schema defines the required fields in a metadata file uploaded to TDR:
{
  "$id": "/schema/required",
  "type": "object",
  "required": [
    "file_path",
    "end_date",
    "description",
    "closure_type",
    "closure_period",
    "closure_start_date",
    "description_closed",
    "foi_exemption_asserted",
    "foi_exemption_code",
    "title_closed",
    "title_alternate",
    "description_alternate"
  ]
}
The data load SharePoint schema defines what properties are permitted when loading metadata directly from SharePoint.
It is a sub-set of the Base Schema properties.
Example data:
{
  "date_last_modified": "2001-12-12",
  "client_side_checksum": "8b9118183f01b3df0fc5073feb68f0ecd5a7f85a88ed63ac7d0d242dc2aba2ea",
  "file_size": 26,
  "file_path": "a/filepath/filename1.docx",
  "UUID": "b8b624e4-ec68-4e08-b5db-dfdc9ec84fea"
}
The output of a metadata file validation should be a JSON file conforming to the error file schema.
An example of an error file conforming to this schema:
{
  "consignmentId" : "5049c395-6124-40fd-bffa-3fe44223bbd0",
  "date" : "2025-01-22",
  "fileError" : "SCHEMA_VALIDATION",
  "validationErrors" : [
    {
      "assetId" : "test/hi.txt",
      "errors" : [
        {
          "validationProcess" : "SCHEMA_CLOSURE_CLOSED",
          "property" : "Closure Start Date",
          "errorKey" : "type",
          "message" : "Must be provided for a closed record"
        }
      ],
      "data" : [
        {
          "name" : "Closure Start Date",
          "value" : ""
        }
      ]
    }
  ]
}
- consignmentId is the unique identifier for the consignment
- date is the date the error file was created
- fileError is the type of error. SCHEMA_VALIDATION indicates a schema validation error
- assetId for TDR is the file path, as this should be unique within the consignment and the error can be traced back to the original
- validationProcess indicates the schema used for validation that produced the error
- errorKey is the keyword returned from the JSON Schema validation. 'type' indicates that the value was null and not a string
- property is the input key (TDR metadata file column header)
- message is the user-friendly message for the error obtained as below, using the property name as used in validation
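If the error file is produced or consumed from Scala, its shape could be modelled with case classes along these lines. The class names are hypothetical; the field names mirror the JSON example above. Serialisation is left out because this README does not say which JSON library is used for it.

```scala
import java.util.UUID

// Hypothetical model of the error file; field names follow the JSON example.
final case class Metadata(name: String, value: String)

final case class ValidationError(
  validationProcess: String, // schema that produced the error, e.g. SCHEMA_CLOSURE_CLOSED
  property: String,          // input key (TDR metadata file column header)
  errorKey: String,          // JSON Schema keyword that failed, e.g. type
  message: String            // user-friendly message
)

final case class AssetErrors(assetId: String, errors: List[ValidationError], data: List[Metadata])

final case class ErrorFile(
  consignmentId: UUID,
  date: String,      // ISO date, e.g. 2025-01-22
  fileError: String, // e.g. SCHEMA_VALIDATION
  validationErrors: List[AssetErrors]
)

object ErrorFileExample {
  // The example error file shown above, rebuilt as values.
  val example: ErrorFile = ErrorFile(
    consignmentId = UUID.fromString("5049c395-6124-40fd-bffa-3fe44223bbd0"),
    date = "2025-01-22",
    fileError = "SCHEMA_VALIDATION",
    validationErrors = List(
      AssetErrors(
        assetId = "test/hi.txt",
        errors = List(
          ValidationError("SCHEMA_CLOSURE_CLOSED", "Closure Start Date", "type", "Must be provided for a closed record")
        ),
        data = List(Metadata("Closure Start Date", ""))
      )
    )
  )
}
```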
The tdr-metadata-validation produces errors with the following properties:
- validationProcess
- property
- errorKey
For user-friendly messages, see the Validation-message.properties file.
The format is {validationProcess}.{property}.{errorKey}={User friendly message}
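The key format lends itself to a simple lookup. The sketch below builds the key from the three error properties and reads the message with java.util.Properties. ValidationMessageLookup and its methods are hypothetical names, and the sketch parses properties content from a string so that it stays self-contained; how the project itself loads Validation-message.properties is not shown here.

```scala
import java.io.StringReader
import java.util.Properties

object ValidationMessageLookup {
  // Build the key in the documented format {validationProcess}.{property}.{errorKey}.
  def messageKey(validationProcess: String, property: String, errorKey: String): String =
    s"$validationProcess.$property.$errorKey"

  // Parse properties-file content held in a string (a real caller would read the file).
  def parse(content: String): Properties = {
    val properties = new Properties()
    properties.load(new StringReader(content))
    properties
  }

  // Returns None when no message is defined for the key.
  def message(
    properties: Properties,
    validationProcess: String,
    property: String,
    errorKey: String
  ): Option[String] =
    Option(properties.getProperty(messageKey(validationProcess, property, errorKey)))
}
```

For example, given an illustrative entry SCHEMA_CLOSURE_CLOSED.closure_start_date.type=Must be provided for a closed record (the property name closure_start_date is a guess at the name used in validation), calling message with those three parts would return that text, and an unknown key would return None so the caller can fall back to a default.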
To use the JSON schema in your project, follow these steps:
- Download: Clone this repository or download the *schema.json files directly.
- Integration: Integrate the *schema.json files into your project where metadata validation is required.
- Validation: Use JSON schema validation libraries in your preferred programming language to validate metadata objects against the provided schema.
An example using Scala and the networknt json-schema-validator library is shown in BaseSpec.scala.
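import java.io.{File, InputStream}
import java.nio.file.Files
import java.util
import com.networknt.schema.{InputFormat, ValidationMessage}

// load schema (getJsonSchemaFromStreamContentV7 is a helper not shown here; see BaseSpec.scala)
val schemaPath = "metadata-schema/relationshipSchema.schema.json"
val schemaInputStream: InputStream = Files.newInputStream(new File(schemaPath).toPath)
val schema = getJsonSchemaFromStreamContentV7(schemaInputStream)
// load data from the classpath (getJsonNodeFromStreamContent is likewise a helper not shown here)
val dataPath = "/data/relationship.json"
val dataInputStream = getClass.getResourceAsStream(dataPath)
val node = getJsonNodeFromStreamContent(dataInputStream)
// validate data; an empty set means the data conforms to the schema
val errors: util.Set[ValidationMessage] = schema.validate(node.toPrettyString, InputFormat.JSON)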
To publish the schemas locally, run the following commands from the repository directory:
$ sbt copySchema copyValidationMessageProperties package publishLocal
Other sbt projects that have this project as a dependency can access the local snapshot version by changing the version number in their build.sbt or dependencies file, for example:
... other dependencies...
"uk.gov.nationalarchives" % "da-metadata-schema_3" % "[version number]-SNAPSHOT"
... other dependencies...