Description
The term "provenance" here means what is sometimes also referred to as data lineage. Specifically, approaches for keeping track of the origin and changes of data, in a structured, machine-processable form.
Introduction
Many application areas of 3D Tiles involve planning and decision making. Examples are architecture, engineering, construction, and mission support. The decision making processes here crucially depend on knowledge about the reliability of the data. This includes information about the origin of the data (e.g. whether it comes from a CAD application or a drone scan), possible preprocessing steps (like simplification or optimizations for visualization), or manual modifications (like annotations that have been added and stored as metadata). The concept of data provenance could therefore be applied to 3D Tiles on many different levels.
Scope
For now, this issue focuses on a very narrow subset of the data, namely on metadata. On some level, there isn't even a clear technical distinction between "metadata" and "geometry" - namely, when metadata is represented in binary form with EXT_structural_metadata. So for now, the focus is narrowly on the JSON-structured metadata. And in this narrow context, the goal of provenance could be compared to that of a version control system - namely, to know which modifications have been applied to the metadata.
Goals
The goals of preserving provenance information in 3D Tiles, in the given scope, would be (a possible record structure is sketched after this list):
- knowing the original data (including information about its origin)
- for each modification:
  - when was the modification made?
  - who made the modification?
  - maybe 'metadata' (e.g. reasons for the modification - similar to a commit message)
  - what was the old state, and what is the new state?
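A minimal sketch of what a single record covering these points could look like, assuming a plain JSON/TypeScript representation - none of these names are defined by 3D Tiles, they only mirror the bullet points above:

```ts
// Illustrative only: one record describing one modification of the
// JSON-structured metadata. All field names are assumptions.
interface ModificationRecord {
  // when was the modification made?
  timestamp: string;      // e.g. an ISO 8601 date-time string
  // who made the modification?
  author: string;         // a user name, tool name, or service ID
  // optional 'metadata' about the modification itself
  message?: string;       // similar to a commit message
  // what was the old state, and what is the new state?
  propertyPath: string;   // e.g. "tileset.metadata.properties.lastModificationDate"
  oldValue?: unknown;     // absent if the property did not exist before
  newValue?: unknown;     // absent if the property was removed
}

// The provenance information would then be the original state plus
// an ordered list of such records.
interface ProvenanceLog {
  originalState: Record<string, unknown>;
  modifications: ModificationRecord[];
}
```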
Representation and Storage
The provenance information could either be tracked and stored externally, or be part of the tileset itself (as meta-metadata...). In both cases, one could very broadly categorize two representations:
- Storing the new state (and deriving the difference by comparing the new state to the old state)
- Storing the difference (and deriving the new state by applying the 'change' to the old state)
The best choice here will depend on the granularity and modification types (see below). Common 'event databases' store the initial state, and all modifications as a sequence of 'transactions/events' that modify the data. If the goal is to store the provenance information in the tileset itself, then the size of that data may grow considerably (particularly, when certain "bulk operations" are applied to binary metadata). Further options could be considered - like only storing the "initial" and the "latest" state, or only storing the "previous" and "current" state.
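To illustrate the 'store the difference' option in combination with such an event sequence, here is a small sketch (assuming a simplified record structure; the property path is treated as a flat key, which is an assumption for illustration only):

```ts
// Illustrative only: one stored modification, persisting just the difference.
interface StoredModification {
  propertyPath: string;   // which metadata property was affected
  newValue?: unknown;     // undefined means the property was removed
}

// Derives the latest metadata state by applying the stored modifications,
// in order, to the initial state.
function replay(
  initialState: Record<string, unknown>,
  modifications: StoredModification[]
): Record<string, unknown> {
  const state: Record<string, unknown> = { ...initialState };
  for (const m of modifications) {
    if (m.newValue === undefined) {
      delete state[m.propertyPath];
    } else {
      state[m.propertyPath] = m.newValue;
    }
  }
  return state;
}

// Example: the initial state plus two modifications yields the latest state.
const latest = replay(
  { lastModificationDate: "2023-11-05" },
  [
    { propertyPath: "lastModificationDate", newValue: "2024-01-23" },
    { propertyPath: "reviewedBy", newValue: "surveyor-42" },
  ]
);
```

The inverse option ('store the new state') would instead persist full snapshots, and derive such modification records by comparing consecutive snapshots.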
Granularity
A first differentiation could be the granularity level on which the modifications take place:
- coarse-grained: on the level of tileset-, tile-, and content metadata
- fine-grained: in tile content - i.e. in EXT_structural_metadata in glTF assets (see the sketch after this list)
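To make the difference concrete, here is a hypothetical pair of modification targets, one on each granularity level (the addressing scheme is made up for illustration and not part of any specification):

```ts
// Coarse-grained: a property in the tileset (or tile/content) metadata,
// addressable directly within the tileset JSON.
const coarseGrainedTarget = {
  propertyPath: "tileset.metadata.properties.lastModificationDate",
  oldValue: "2023-11-05",
  newValue: "2024-01-23",
};

// Fine-grained: a single value in an EXT_structural_metadata property
// table inside a glTF tile content. Addressing it requires the content
// URI, the property table index, the property name, and the element index.
const fineGrainedTarget = {
  contentUri: "tiles/0/0/0.glb",   // hypothetical content URI
  propertyTableIndex: 0,
  propertyName: "lastModificationDate",
  elementIndex: 42,
  oldValue: "2023-11-05",
  newValue: "2024-01-23",
};
```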
Modification Types
Another differentiation could be whether only values are modified, or whether there can also be structural modifications (both cases are sketched after the list below).
- values: Setting a new value for an existing metadata property (like `tileset.metadata.properties["lastModificationDate"] = "2024-01-23"`)
- structure: Adding a new property to an existing metadata class (like `schema.classes["..."].properties.add("lastModificationDate", "STRING")`)
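A sketch of how these two modification types could be distinguished in a provenance record, for example as a discriminated union (all names are illustrative assumptions, not part of any specification):

```ts
// A value modification: a new value for an existing metadata property.
interface ValueModification {
  kind: "value";
  propertyName: string;   // e.g. "lastModificationDate"
  oldValue?: unknown;
  newValue?: unknown;     // e.g. "2024-01-23"
}

// A structural modification: a new property added to an existing
// metadata class in the schema.
interface StructureModification {
  kind: "structure";
  className: string;      // key into schema.classes
  propertyName: string;   // e.g. "lastModificationDate"
  propertyType: string;   // the property type in the metadata schema, e.g. "STRING"
}

type MetadataModification = ValueModification | StructureModification;
```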
Maybe this issue can be used to
- gather more specific requirements
- think about approaches that could cover "as many of them as reasonably(!) possible"
- identify the limits of these approaches
- draft out possible implementations (for data producers and consumers)