1. Abstract

This Model - the IDN Supermodel - is the Indigenous Data Network’s overarching data model that provides modelling patterns and integration logic for data elements in several specialised domains. With this model, you can test whether an area of concern for the IDN is formally modelled using Semantic Web technologies and, if it is, whether or not you can relate model elements to other IDN information.

This Supermodel is formulated similarly to other Supermodels in other, semi-related, scenarios such as Geoscience Australia's Supermodel for the Location Index initiative.

2. Metadata

This metadata is for this document.

IRI

https://linked.data.gov.au/def/idn-supermodel

Note the published IRI for this data model will be subject to approval of IDN-ARDC IIRC Project governance processes and will reflect the long term sustainability model for the specification. Nothing normative should be inferred from the initial IRI being in a gov.au domain.

Editor’s Draft

http://idnau.org/def/sm

The "under development" and latest version of the IDN Supermodel

Title

Indigenous Data Network Australia Supermodel Specification

Description

This Model - the IDN Supermodel - is the Indigenous Data Network’s overarching data model that provides modelling patterns and integration logic for data elements in several specialised domains.

Created

2022-03-02

Modified

2022-05-16

Issued

0000-00-00

Creator

Indigenous Data Network

Publisher

Indigenous Data Network

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Machine-readable form

supermodel.ttl

3. Preamble

3.1. Namespaces

This model is built on a "baseline" of Semantic Web models which use a variety of namespaces. Prefixes for these namespaces, used throughout this document, are listed below.

Table 1. Namespaces
Prefix | Namespace | Description
super: | https://linked.data.gov.au/def/supermodel/ | the generic Supermodel model
td: | https://linked.data.gov.au/def/supermodel/terms/ | the Terms & Definitions vocabulary within the Supermodel model
dcterms: | http://purl.org/dc/terms/ | Dublin Core Terms vocabulary namespace
ex: | http://example.com/ | Generic examples namespace
owl: | http://www.w3.org/2002/07/owl# | Web Ontology Language ontology namespace
rdfs: | http://www.w3.org/2000/01/rdf-schema# | RDF Schema ontology namespace
sosa: | http://www.w3.org/ns/sosa/ | Sensor, Observation, Sample, and Actuator ontology namespace
skos: | http://www.w3.org/2004/02/skos/core# | Simple Knowledge Organization System (SKOS) ontology namespace
time: | http://www.w3.org/2006/time# | Time Ontology in OWL namespace
void: | http://rdfs.org/ns/void# | Vocabulary of Interlinked Data (VoID) ontology namespace
xsd: | http://www.w3.org/2001/XMLSchema# | XML Schema Definitions ontology namespace
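As a convenience, the prefixes in Table 1 can be written as Turtle prefix declarations. This is only a transcription of the table into Turtle syntax; it adds no namespaces beyond those listed.

```turtle
# Prefix declarations transcribed from Table 1
@prefix super:   <https://linked.data.gov.au/def/supermodel/> .
@prefix td:      <https://linked.data.gov.au/def/supermodel/terms/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sosa:    <http://www.w3.org/ns/sosa/> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix time:    <http://www.w3.org/2006/time#> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
```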

3.2. Terms & Definitions

The following terms appear in this document and, when they do, the definitions in this section apply to them. This section’s content is also presented online in a formal vocabulary at:

Term IRI Definition Source

Central Class

td:central-class

Central Classes are the generic data classes at the centre of Data Domains with high-level relationships between them defined in this supermodel.

These classes are taken from general standards - usually well-known international standards - and the Indigenous Data Network specialises and extends them to make specific, custom, classes for its needs.

Supermodel model

Component Data Model

td:component-data-model

A data model for a particular component of a Supermodel. The Component Data Model may have been designed for the particular Supermodel that uses it, but it may also pre-exist and just be indicated for use within the Supermodel.

A Supermodel will always need to provide mappings from classes within a Component Data Model to other Supermodel elements for interoperability.

Supermodel model

Data Domain

td:data-domain

High-level conceptual areas within which Geoscience Australia has data.

These Data Domains are not themed scientifically - 'geology', 'hydrogeology', etc. - but are instead based on parts of the Observations & Measurements [ISO19156] standard, realised in Semantic Web form in the SOSA Ontology, part of the Semantic Sensor Network Ontology [SSN].

Current Data Domains are shown in Figure 4.

Supermodel model

Knowledge Graph

td:knowledge-graph

A Knowledge Graph is a dataset that uses a graph data structure - nodes and edges - with strongly-defined elements.

Common use, e.g. https://en.wikipedia.org/wiki/Knowledge_graph

Linked Data

td:linked-data

A set of technologies and conventions defined by the World Wide Web Consortium that aim to present data in both human- and machine-readable form over the Internet.

Linked Data is strongly-defined with each element having either a local definition or a link to an available definition on the Internet.

Linked Data is graph-based in nature, that is, it consists of nodes and edges that can forever be linked to further concepts with defined relationships.

https://www.w3.org/standards/semanticweb/data

Semantic Web

td:semantic-web

The World Wide Web Consortium's vision of an Internet-based web of Linked Data.

Semantic Web is used to refer to something more than just the technologies and conventions of Linked Data; the term also encompasses a specific set of interoperable data models - often called ontologies - published by the W3C, other standards bodies and some well-known companies.

The 'semantic' refers to the strongly-defined nature of the elements in the Semantic Web: the meaning of Semantic Web data is as precisely defined as any data can be.

https://www.w3.org/standards/semanticweb/
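The sketch below shows how one of the terms above might look as an entry in a formal vocabulary. It assumes the vocabulary is encoded in SKOS, which this section does not state; the label, definition and source are copied from the Knowledge Graph entry above.

```turtle
@prefix td:      <https://linked.data.gov.au/def/supermodel/terms/> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Assumption: terms are modelled as skos:Concept instances
td:knowledge-graph
    a skos:Concept ;
    skos:prefLabel "Knowledge Graph"@en ;
    skos:definition "A Knowledge Graph is a dataset that uses a graph data structure - nodes and edges - with strongly-defined elements."@en ;
    dcterms:source <https://en.wikipedia.org/wiki/Knowledge_graph> .
```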

3.3. Conventions

All model diagrams use elements introduced in Figure 1. These elements are defined in the RDF, RDFS and OWL ontologies; see [OWL] for more details.

All code snippets in this document, used to show formal and machine-readable versions of concepts, are expressed using the Turtle RDF syntax [TTL].

4. Introduction

4.1. Supermodel Structure

Supermodels, such as this one, consist of a set of Component Data Model instances within the Data Domains needed for various purposes. This Supermodel identifies the Central Classes within each Data Domain and associates them with one another - across Data Domains - and with other classes within Data Domains using Linked Data principles.

Altogether, the various Component Data Models and Central Classes form a Knowledge Graph of data for the Indigenous Data Network (IDN) that participates in the wider, international, Semantic Web.

All Component Data Models and this Supermodel itself are modelled using the Web Ontology Language [OWL] and specialisations of it, such as the Simple Knowledge Organization System [SKOS], which is used for modelling taxonomies of concepts. As well as the textual and image descriptions of this model here, machine-readable versions of this model and all Component Data Models are available in the Turtle [TTL] RDF format. See the Metadata section for the Supermodel Turtle file and each Component Data Model’s metadata.

4.2. IDN Data Domains

This IDN Supermodel is predicated on the assumption that the IDN is a data aggregation organisation and that data cataloguing is therefore its major concern. At the centre of this model, then, is a domain of Data Cataloguing, the main elements of which are taken from the Data Catalog Vocabulary [DCAT]. Many Supermodels have this as one of their Data Domains.

The IDN is expected to contain datasets with information about Indigenous Australian people sampled, or "observed", from census and other population data, so the domain of Observations is provided, with the Data Cube Vocabulary [QB] as its core. QB is a model of statistical/sampling slices and dimensions. Other Supermodels use variations on this domain, such as "Sampling", where the sampling/observation is physical, not statistical.

Some of the IDN’s data is spatial - the location of Indigenous Australian peoples, landforms etc. - hence a Data Domain of Spatiality is provided, for which GeoSPARQL [GEO] is the modelling core. Spatiality is the predominant Data Domain of some other Supermodels, such as the Loc-I Supermodel.

As in all Supermodels, the IDN’s data is categorised in various ways and for this the Data Domain of Theming is provided. Within it, taxonomy representation using [SKOS] is paramount.

Finally, and again as in all Supermodels, the IDN must relate organisations and people to its contents (data) and maintain knowledge of organisations and the relationships between them, so the Data Domain of Organisations & People is provided. It is modelled using a number of well-known Semantic Web models, in particular schema.org [SDO].

4.3. Uses

This Supermodel can be used as a technical reference for all elements of the IDN’s modelling. Using this document, an IDN technical modeller or data user should be able to discover how each component domain of the IDN has been modelled and how it relates to all other component domains.

Data from any IDN component domain, and the model elements of the Supermodel also, can be placed into a single RDF database for querying.

5. Model

This model, the actual IDN Supermodel itself, is a profile of - a specialised reuse of - multiple, well-known Semantic Web models. It is organised into a series of Levels which serve different purposes. All elements of the model are only defined once and the various Levels simply present views of the model at different levels of abstraction to serve their various intended purposes.

5.1. Level 0: Model Background

This view of the model is a backgrounding one which describes the underpinning model mechanics that it uses. The object modelling used is based on the Web Ontology Language [OWL] and its own underlying use of RDF & RDFS [1]. The Provenance Ontology [PROV] is used to model real-world causal dependencies - provenance.

5.1.1. Diagram Key

The figure below is a key for the elements in all of the model diagrams in this document.

Figure 1. Diagram elements key

5.1.2. Object Modelling

The elements from the above subsection are shown in relation to one another in the figure below.

Figure 2. OWL objects and their relations

The elements shown above are identified with prefixed IRIs that correspond to entries in the Namespace Table. A short explanation of the diagram key elements is:

  • owl:Class - represents any conceptual class of objects. Classes are expected to contain individuals - instances of the class - and the class, as a whole, may have relations to other classes

  • owl:NamedIndividual - an individual of an owl:Class. For example, for the class ships, an individual might be Titanic

  • rdf:Property - a relationship between classes, individuals, or any objects and Literals

  • rdfs:subClassOf - an rdf:Property indicating that the domain (from object) is a subclass of the range (to object). An example is the class student which is a subclass of person: all students are clearly persons but not vice versa

  • rdf:type - the property that relates an owl:NamedIndividual to the owl:Class that it’s a member of

  • Literal - a simple literal data value, e.g. the string "Nicholas", or the number 42. Specific literal types are usually indicated when used

The remaining diagrams in this document use extensions to this basic model, for example Figure 3 uses colour-coded specialised forms of owl:Class (subclasses of it) and the relations in Figure 5 are specialised forms of rdf:Property.
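The elements listed above can be illustrated with a small Turtle sketch that reuses the ship, student and literal examples from the list. All ex: names are hypothetical, and the rdf: prefix, which is not listed in Table 1, is the standard RDF namespace.

```turtle
@prefix ex:   <http://example.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Classes, one of which is a subclass of another
ex:Ship    a owl:Class .
ex:Person  a owl:Class .
ex:Student a owl:Class ;
    rdfs:subClassOf ex:Person .

# Hypothetical properties relating objects to Literals
ex:name a rdf:Property .
ex:age  a rdf:Property .

# Individuals: "a" is Turtle shorthand for rdf:type
ex:Titanic a owl:NamedIndividual , ex:Ship ;
    ex:name "Titanic" .

ex:nicholas a owl:NamedIndividual , ex:Student ;
    ex:name "Nicholas" ;   # a string Literal
    ex:age  42 .           # an integer Literal
```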

5.1.3. Provenance

General provenance/lineage information about anything - a rock sample, a dataset, a term in a vocabulary etc. - is described using the Provenance Ontology [PROV], which views everything in the world as being of one or more of the types shown in Figure 3.

Figure 3. PROV main classes and main relations

According to PROV, all things are either a:

  • prov:Entity - a physical, digital, conceptual, or other kind of thing with some fixed aspects

  • prov:Agent - something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent’s activity

  • prov:Activity - something that occurs over a period of time and acts upon or with entities

While not often front of mind for objects in any Data Domain, provenance relations always apply. For example, a sosa:Sample within the Sampling domain is a prov:Entity and will necessarily have been created via a sosa:Sampling, which is a prov:Activity. Another example: an sdo:Person related to a dcat:Dataset via the property dcterms:creator in the Data Cataloguing domain is a specialised form of a prov:Agent related to a prov:Entity via prov:wasAttributedTo.
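The two examples in the paragraph above can be sketched in Turtle as follows. The ex: instances are hypothetical, the domain-specific and PROV statements are written side by side only to make the correspondence visible, and the prov:, dcat: and sdo: prefixes are the standard namespaces for those models rather than entries from Table 1.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix sdo:     <https://schema.org/> .
@prefix sosa:    <http://www.w3.org/ns/sosa/> .

# A sample is an Entity created by a sampling Activity
ex:sample-1 a sosa:Sample , prov:Entity ;
    sosa:isResultOf     ex:sampling-1 ;
    prov:wasGeneratedBy ex:sampling-1 .

ex:sampling-1 a sosa:Sampling , prov:Activity .

# A dataset is an Entity attributed to a person, who is an Agent
ex:dataset-1 a dcat:Dataset , prov:Entity ;
    dcterms:creator      ex:person-1 ;
    prov:wasAttributedTo ex:person-1 .

ex:person-1 a sdo:Person , prov:Agent .
```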

5.2. Level 1: Data Domains

Data Domains are a Supermodel’s major areas of concern and Level 1 just presents them with no further details. The Data Domains defined for the IDN so far are:

  • Data Cataloguing

  • Observations

  • Organisations & People

  • Spatiality

  • Theming

These are shown in Figure 4 below.

Figure 4. Top-level view of the IDN Supermodel showing Data Domains

5.3. Level 2: Central Classes

Central Classes are the major model classes within each Data Domain and Level 2 shows the Data Domains with these classes, and the main relationships between them, indicated. These relationships are used to traverse across Data Domains.

These are shown in Figure 5 below.

Figure 5. Central Classes of the Data Domains and their major relationships. Relationships that are still undefined are indicated as ---???---

Not all of the relationships between Central Classes have been formalised yet. As they are formalised, they will be indicated in Figure 5 above.

Note that the relationship between Dataset in the Data Cataloguing Domain and Agent in the Organisations & People Domain is indicated at this level as using a prov:qualifiedAttribution. While other Supermodels indicate simple attribution for this relationship, the IDN modelling uses this qualified pattern to allow for role-based data/agent relations.
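A minimal sketch of the qualified attribution pattern described above is given here. It assumes the role is attached to the attribution with dcat:hadRole, as DCAT does; the dataset, agent and role IRIs are hypothetical and the role would in practice come from a role vocabulary.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.com/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

ex:dataset-1 a dcat:Dataset ;
    # The attribution is a node of its own, so it can carry a role
    prov:qualifiedAttribution [
        a prov:Attribution ;
        prov:agent   ex:org-1 ;
        dcat:hadRole ex:custodian   # hypothetical role IRI
    ] .

ex:org-1 a prov:Agent .
```

A simple attribution (ex:dataset-1 prov:wasAttributedTo ex:org-1) would record only that the agent is related to the dataset, not in what capacity.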

5.4. Level 3: Domain Main Modelling

Top-level modelling within Data Domains - representations of their major concepts and relationships - is given at Level 3 of the Supermodel. The following subsections indicate such modelling for each Data Domain.

These Domain Main Models can be used to ascertain whether or not a Supermodel covers particular aspects of a Data Domain. They also indicate which Component Data Models are used in each Data Domain, since the classes shown for a domain may come from several Component Data Models.

5.4.1. Data Cataloguing Domain

Figure 6. Central Classes of the Data Cataloguing Domain

The agent/role Attribution modelling within this Domain, shown in Figure 6, is handled in the standard way given by [PROV] and [DCAT]. It is brought to the fore in this Supermodel, whereas it isn’t in others such as the GA Supermodel, because attribution of data to groups and persons with defined roles is a very important concern in the IDN.

5.4.2. Observations Domain

Figure 7. Central Classes of the Observations Domain

5.4.3. Organisations & People

Figure 8. Central Classes of the Organisations & People Domain

5.4.4. Spatiality

Figure 9. Central Classes of the Spatiality Domain

5.4.5. Theming

Figure 10. Central Classes of the Theming Domain

6. Data Domains Details

The Data Domains described above are implemented using multiple models and other resources. The following subsections describe the Domains' details and link to all their resources.

6.1. Data Cataloguing Domain

This model is based on the Data Catalog Vocabulary [DCAT] and The Provenance Ontology [PROV] with extensions to cater for mappings to FAIR [FAIR] and CARE [CARE] models. The essential model is shown in Figure 11.

Figure 11. Data Cataloguing Model, based on DCAT & PROV

6.2. Observations Domain

This domain is essentially the core elements of the Data Cube Vocabulary [QB], with dataset metadata replaced with elements from DCAT [DCAT]. The essential model is shown in Figure 12.

Figure 12. Observations Model, based on Data Cube Vocabulary overview
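As a rough illustration of the combination described above, the sketch below types one dataset as both a Data Cube dataset and a DCAT dataset and attaches a single observation to it. The dimension, measure, title and values are invented for illustration, and the data structure definition that a complete Data Cube would declare is omitted for brevity.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.com/> .
@prefix qb:      <http://purl.org/linked-data/cube#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Dataset-level metadata uses DCAT / Dublin Core terms
ex:dataset-1 a qb:DataSet , dcat:Dataset ;
    dcterms:title "Example population counts"@en .

# Hypothetical dimension and measure properties
ex:refPeriod  a qb:DimensionProperty .
ex:population a qb:MeasureProperty .

# One observation belonging to the dataset (values are illustrative only)
ex:obs-1 a qb:Observation ;
    qb:dataSet    ex:dataset-1 ;
    ex:refPeriod  "2021"^^xsd:gYear ;
    ex:population 120 .
```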

6.3. Spatiality Domain

Spatiality is a concern for the IDN but not a primary concern as it is in other Supermodels, such as the Loc-I Supermodel. The IDN is aiming to be fully compatible with the Loc-I Supermodel such that any IDN spatial datasets can be understood as Loc-I datasets and thus will be interoperable with other Loc-I datasets. This will enable their use with the wider Loc-I spatial data.
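Since GeoSPARQL is named earlier as the modelling core of this domain, a spatial object might be expressed along the following lines. This is a generic GeoSPARQL sketch, not a statement of what the Loc-I Supermodel requires; the feature IRI and coordinates are hypothetical.

```turtle
@prefix ex:  <http://example.com/> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

# A spatial feature with one point geometry (illustrative coordinates,
# longitude then latitude in the default WKT coordinate reference system)
ex:place-1 a geo:Feature ;
    geo:hasGeometry [
        a geo:Geometry ;
        geo:asWKT "POINT (133.0 -25.0)"^^geo:wktLiteral
    ] .
```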

6.4. Theming Domain

The purpose of characterising theming of data as a whole domain, rather than just applying it within Data Cataloguing, is to call it out as a major concern for IDN data beyond just cataloguing and to allow for the full expression of sophisticated classification mechanisms.

The general mechanisms for theming within this domain are standard, that is, they follow patterns indicated in [DCAT] and [SKOS] that are used in many other models.

The DCAT pattern is, for whole catalogues, to identify Knowledge Organization Systems (KOSes) and then, for individual datasets, to categorise (theme) them using controlled themes from those KOSes. DCAT recommends that KOSes be expressed using [SKOS].

The general pattern for theming (classifying or categorising) datasets or other IDN objects is as per Figure 13, where the [DCAT] theme property is used.

Figure 13. The pattern of how themes (classifications / categories) are applied to Datasets and other IDN objects
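The DCAT theming pattern described in this section can be sketched in Turtle as below. The catalogue, vocabulary and theme are hypothetical; only the dcat: and skos: terms are taken from the respective standards.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# The catalogue identifies the KOS it draws themes from
ex:catalogue-1 a dcat:Catalog ;
    dcat:themeTaxonomy ex:themes ;
    dcat:dataset       ex:dataset-1 .

# The KOS, expressed as a SKOS vocabulary with one concept
ex:themes a skos:ConceptScheme ;
    skos:prefLabel "Example themes vocabulary"@en .

ex:theme-a a skos:Concept ;
    skos:inScheme  ex:themes ;
    skos:prefLabel "Example theme"@en .

# The dataset is themed with a concept from that KOS
ex:dataset-1 a dcat:Dataset ;
    dcat:theme ex:theme-a .
```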

A variation on this pattern is to use the Linked Data qualified relations pattern, which is shown in Figure 14.

Figure 14. The Linked Data qualified relations pattern (A.) and an implementation of it for dataset attribution (B.)

6.4.1. Specific Theming Resources

This Supermodel specifies the use of several specific SKOS vocabularies for data theming that express core classifications of interest to the IDN. Some of these vocabularies are widely used and the IDN has adopted them, others the IDN has created for its specific needs. The vocabularies are listed here but detailed in Annex A: Vocabularies. They are:

  1. [ISO19115-1]'s Role Codes

    • for Agent instances' roles in relation to Dataset instances

  2. Indigenous Data Governance Roles

    • for Indigenous Agent roles in relation to Dataset instances

    • inspired by the Role Codes above

Use of th

6.5. Organisations & People Domain

This domain is essentially the organization modelling element of schema.org with a few additional properties to track some IDN-relevant aspects of an organisation. The essential model is shown in Figure 15.

Figure 15. IDN Organisation Model, based on schema.org
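A minimal sketch of the schema.org part of this model follows. It uses only standard schema.org terms; the additional IDN-specific properties mentioned above are not named in this section, so they are not shown. All ex: instances and names are hypothetical.

```turtle
@prefix ex:  <http://example.com/> .
@prefix sdo: <https://schema.org/> .

# An organisation and the organisation it sits within
ex:org-1 a sdo:Organization ;
    sdo:name "Example Organisation" ;
    sdo:parentOrganization ex:org-0 .

ex:org-0 a sdo:Organization ;
    sdo:name "Example Parent Organisation" .

# A person related to the organisation
ex:person-1 a sdo:Person ;
    sdo:name "Example Person" ;
    sdo:memberOf ex:org-1 .
```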

Annex A: Vocabularies

ISO19115-1 Role Codes

This vocabulary is a well-known and widely used codelist of the roles that Agent instances play in relation to Dataset instances. The codelist was originally presented in the first version of ISO19115 (2003) as a table in a PDF document and has been delivered online in multiple forms but always unofficially: the ISO has not yet published an authoritative form.

A current ISO test publication of this codelist as a SKOS vocabulary is online at http://115.146.86.155/vocab/CI_RoleCode, taken from the 2018 form of ISO19115-1.

The roles and their descriptions from this codelist are given in the table below for quick reference.

Table 2. [ISO19115]'s Role Code vocabulary
Concept | Definition
author | party who authored the resource
co-author | party who jointly authors the resource
collaborator | party who assists with the generation of the resource other than the principal investigator
contributor | party contributing to the resource
custodian | party that accepts accountability and responsibility for the resource and ensures appropriate care and maintenance of the resource
distributor | party who distributes the resource
editor | party who reviewed or modified the resource to improve the content
funder | party providing monetary support for the resource
mediator | a class of entity that mediates access to the resource and for whom the resource is intended or useful
originator | party who created the resource
owner | party that owns the resource
point of contact | party who can be contacted for acquiring knowledge about or acquisition of the resource
principal investigator | key party responsible for gathering information and conducting research
processor | party who has processed the data in a manner such that the resource has been modified
publisher | party who published the resource
resource provider | party that supplies the resource
rights holder | party owning or managing rights over the resource
sponsor | party who speaks for the resource
stakeholder | party who has an interest in the resource or the use of the resource
user | party who uses the resource

Indigenous Data Governance Roles

TODO: Establish this vocab in RVA

This vocabulary is core to the description of Indigenous governance of IDN data. It specifies a series of Indigenous roles that are partly derived from the roles in the Role Codes vocabulary above but use definitions established by the IDN.

Note
This vocabulary is in development within the IDN so, until this message is removed, this is not a complete or final vocabulary.

Table 3. Indigenous Data Governance Roles vocabulary
Concept | Definition
indigenous custodian | Indigenous party that accepts accountability and responsibility for the resource from an Indigenous perspective. A non-Indigenous Custodian might act on behalf of an Indigenous Custodian
subject group | Indigenous party that the resource contains information about
indigenous subject group point of contact | Point of Contact for an Indigenous Subject Group
indigenous point of contact | An Indigenous party who can be contacted for acquiring knowledge about or acquisition of the resource
indigenous rights holder | Party owning or managing Indigenous rights over the resource
indigenous stakeholder | Indigenous party who has an interest in the resource or the use of the resource
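Because this Supermodel specifies SKOS vocabularies for theming, two of the roles above might be encoded as follows once the vocabulary is established. The IRIs are placeholders, since no published IRIs for this vocabulary are given here; the labels and definitions are copied from the table above.

```turtle
@prefix ex:   <http://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Placeholder IRI for the vocabulary itself
ex:idg-roles a skos:ConceptScheme ;
    skos:prefLabel "Indigenous Data Governance Roles"@en .

ex:subject-group a skos:Concept ;
    skos:inScheme   ex:idg-roles ;
    skos:prefLabel  "subject group"@en ;
    skos:definition "Indigenous party that the resource contains information about"@en .

ex:indigenous-stakeholder a skos:Concept ;
    skos:inScheme   ex:idg-roles ;
    skos:prefLabel  "indigenous stakeholder"@en ;
    skos:definition "Indigenous party who has an interest in the resource or the use of the resource"@en .
```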

References


1. RDF: https://www.w3.org/RDF/, RDFS: https://www.w3.org/TR/rdf-schema/. These references generally need not be followed as descriptions of the use of OWL will cover their relevant concepts.