
Knowledge Graphs

arXiv:2003.02320v6 [cs.AI] 11 Sep 2021

AIDAN HOGAN, IMFD, DCC, Universidad de Chile, Chile
EVA BLOMQVIST, Linköping University, Sweden
MICHAEL COCHEZ, Vrije Universiteit and Discovery Lab, Elsevier, The Netherlands
CLAUDIA D’AMATO, University of Bari, Italy
GERARD DE MELO, HPI, Germany and Rutgers University, USA
CLAUDIO GUTIERREZ, IMFD, DCC, Universidad de Chile, Chile
JOSÉ EMILIO LABRA GAYO, Universidad de Oviedo, Spain
SABRINA KIRRANE, SEBASTIAN NEUMAIER, and AXEL POLLERES, WU Vienna, Austria
ROBERTO NAVIGLI, Sapienza University of Rome, Italy
AXEL-CYRILLE NGONGA NGOMO, DICE, Universität Paderborn, Germany
SABBIR M. RASHID, Tetherless World Constellation, Rensselaer Polytechnic Institute, USA
ANISA RULA, University of Milano–Bicocca, Italy and University of Bonn, Germany
LUKAS SCHMELZEISEN, Universität Stuttgart, Germany
JUAN SEQUEDA, data.world, USA
STEFFEN STAAB, Universität Stuttgart, Germany and University of Southampton, UK
ANTOINE ZIMMERMANN, École des mines de Saint-Étienne, France
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered
significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic,
large-scale collections of data. After some opening remarks, we motivate and contrast various graph-based data
models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and
context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination
of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment,
refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge
graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques.
We conclude with high-level future research directions for knowledge graphs.
CCS Concepts: • Information systems → Graph-based database models; Information integration;
Additional Key Words and Phrases: knowledge graph

1 INTRODUCTION
Though the phrase “knowledge graph” has been used in the literature since at least 1972 [465], the
modern incarnation of the phrase stems from the 2012 announcement of the Google Knowledge
Graph [484], followed by further announcements of the development of knowledge graphs by
Airbnb [87], Amazon [298], eBay [417], Facebook [387], IBM [128], LinkedIn [224], Microsoft [482],
Uber [214], and more besides. The growing industrial uptake of the concept proved difficult for
academia to ignore: more and more scientific literature is being published on knowledge graphs,
which includes books (e.g. [425]), as well as papers outlining definitions (e.g., [141]), novel techniques
(e.g., [318, 424, 553]), and surveys of specific aspects of knowledge graphs (e.g., [400, 549]).
Underlying all such developments is the core idea of using graphs to represent data, often
enhanced with some way to explicitly represent knowledge [387]. The result is most often used in
application scenarios that involve integrating, managing and extracting value from diverse sources
of data at large scale [387]. Employing a graph-based abstraction of knowledge has numerous
benefits in such settings when compared with, for example, a relational model or NoSQL alternatives.
Graphs provide a concise and intuitive abstraction for a variety of domains, where edges capture the
(potentially cyclical) relations between the entities inherent in social data, biological interactions,
bibliographical citations and co-authorships, transport networks, and so forth [17]. Graphs allow
maintainers to postpone the definition of a schema, allowing the data – and its scope – to evolve in
a more flexible manner than typically possible in a relational setting, particularly for capturing
incomplete knowledge [3]. Unlike (other) NoSQL models, specialised graph query languages support
not only standard relational operators (joins, unions, projections, etc.), but also navigational
operators for recursively finding entities connected through arbitrary-length paths [16]. Standard
knowledge representation formalisms – such as ontologies [70, 239, 366] and rules [254, 288] –
can be employed to define and reason about the semantics of the terms used to label and describe
the nodes and edges in the graph. Scalable frameworks for graph analytics [335, 503, 563] can
be leveraged for computing centrality, clustering, summarisation, etc., in order to gain insights
about the domain being described. Various representations have also been developed that support
applying machine learning techniques directly over graphs [549, 559].
In summary, the decision to build and use a knowledge graph opens up a range of techniques that
can be brought to bear for integrating and extracting value from diverse sources of data. However,
we have yet to see a general unifying summary that describes how knowledge graphs are being
used, what techniques they employ, and how they relate to existing data management topics.
The goal of this tutorial paper is to motivate and give a comprehensive introduction to knowl-
edge graphs: to describe their foundational data models and how they can be queried; to discuss
representations relating to schema, identity, and context; to discuss deductive and inductive ways
to make knowledge explicit; to present a variety of techniques that can be used for the creation
and enrichment of graph-structured data; to describe how the quality of knowledge graphs can be
discerned and how they can be refined; to discuss standards and best practices by which knowledge
graphs can be published; and to provide an overview of existing knowledge graphs found in practice.
Our intended audience includes researchers and practitioners who are new to knowledge graphs.
As such, we do not assume that readers have specific expertise on knowledge graphs.
Knowledge graph. The definition of a “knowledge graph” remains contentious [40, 57, 141], where
a number of (sometimes conflicting) definitions have emerged, varying from specific technical
proposals to more inclusive general proposals; we address these prior definitions in Appendix A.
Herein we adopt an inclusive definition, where we view a knowledge graph as a graph of data
intended to accumulate and convey knowledge of the real world, whose nodes represent entities of
interest and whose edges represent relations between these entities. The graph of data (aka data graph)
conforms to a graph-based data model, which may be a directed edge-labelled graph, a property
graph, etc. (we discuss concrete alternatives in Section 2). By knowledge, we refer to something
that is known.1 Such knowledge may be accumulated from external sources, or extracted from the
knowledge graph itself. Knowledge may be composed of simple statements, such as “Santiago is
the capital of Chile”, or quantified statements, such as “all capitals are cities”. Simple statements
can be accumulated as edges in the data graph. If the knowledge graph intends to accumulate
quantified statements, a more expressive way to represent knowledge – such as ontologies or rules
– is required. Deductive methods can then be used to entail and accumulate further knowledge (e.g.,
“Santiago is a city”). Additional knowledge – based on simple or quantified statements – can also be
extracted from and accumulated by the knowledge graph using inductive methods.
Knowledge graphs are often assembled from numerous sources, and as a result, can be highly
diverse in terms of structure and granularity. To address this diversity, representations of schema,
identity, and context often play a key role, where a schema defines a high-level structure for the
knowledge graph, identity denotes which nodes in the graph (or in external sources) refer to
the same real-world entity, while context may indicate a specific setting in which some unit of
1 A number of specific definitions for knowledge have been proposed in the literature on epistemology.

knowledge is held true. As aforementioned, effective methods for extraction, enrichment, quality
assessment, and refinement are required for a knowledge graph to grow and improve over time.
In practice. Knowledge graphs aim to serve as an ever-evolving shared substrate of knowledge
within an organisation or community [387]. We distinguish two types of knowledge graphs in
practice: open knowledge graphs and enterprise knowledge graphs. Open knowledge graphs are
published online, making their content accessible for the public good. The most prominent exam-
ples – DBpedia [311], Freebase [55], Wikidata [543], YAGO [243], etc. – cover many domains and
are either extracted from Wikipedia [243, 311], or built by communities of volunteers [55, 543].
Open knowledge graphs have also been published within specific domains, such as media [431],
government [233, 475], geography [497], tourism [13, 279, 328, 577], life sciences [82], and more
besides. Enterprise knowledge graphs are typically internal to a company and applied for com-
mercial use-cases [387]. Prominent industries using enterprise knowledge graphs include Web
search (e.g., Bing [482], Google [484]), commerce (e.g., Airbnb [87], Amazon [132, 298], eBay [417],
Uber [214]), social networks (e.g., Facebook [387], LinkedIn [224]), finance (e.g., Accenture [390],
Banca d’Italia [35], Bloomberg [347], Capital One [69], Wells Fargo [377]), among others. Ap-
plications include search [482, 484], recommendations [87, 214, 224, 387], personal agents [417],
advertising [224], business analytics [224], risk assessment [112, 522], automation [234], and more
besides. We will provide more details on the use of knowledge graphs in practice in Section 10.
Running example. To keep the discussion accessible, throughout the paper, we present concrete
examples in the context of a hypothetical knowledge graph relating to tourism in Chile (loosely
inspired by, e.g., [279, 328]). The knowledge graph is managed by a tourism board that aims to
increase tourism in the country and promote new attractions in strategic areas. The knowledge
graph itself will eventually describe tourist attractions, cultural events, services, and businesses, as
well as cities and inter-city travel routes. Some applications the organisation envisages are to:
• create a tourism portal that allows visitors to search for attractions, upcoming events, and
other related services (in multiple languages);
• gain insights into tourism demographics in terms of season, nationalities, etc.;
• analyse sentiment about various attractions and events, including positive reviews, summaries
of complaints about events and services, reports of crime, etc.;
• understand tourism trajectories: the sequence of attractions, events, etc., that tourists visit;
• cross-reference trajectories with available flights/buses to suggest new strategic routes;
• offer personalised recommendations of places to visit;
• and so forth.

Related Literature. A number of related surveys, books, etc., have been published relating to
knowledge graphs. In Table 1, we provide an overview of the tertiary literature – surveys, books,
tutorials, etc. – relating to knowledge graphs, comparing the topics covered to those covered herein.
We see that the existing literature tends to focus on specific aspects of knowledge graphs. Unlike
the various surveys that have been published, our goal as a tutorial paper is to provide a broad
and accessible introduction to knowledge graphs. Some of the surveys (in particular) provide more
in-depth technical details on their chosen topic than this paper; throughout the discussion, where
appropriate, we will refer to these surveys for further reading.
Structure. The remainder of the paper is structured as follows:
Section 2 outlines graph data models and the languages that can be used to query them.
Section 3 describes representations of schema, identity, and context in knowledge graphs.
Section 4 presents deductive formalisms by which knowledge can be represented and entailed.

Table 1. Related tertiary literature on knowledge graphs; * denotes informal publication (arXiv). The topics compared against those covered herein are: History, Prior Definitions, Models, Querying, Identity, Context, Ontologies, DLs, Rules, Shapes, Entailment, Embeddings, GNNs, Symbolic Learning, Construction, Quality, Refinement, Publication, Open KGs, Enterprise KGs, Applications, and Analytics.

Publication, Year, Type:
Pan et al. [392], 2017, Book
Paulheim [400], 2017, Survey
Wang et al. [549], 2017, Survey
Yan et al. [567], 2018, Survey
Gesese et al. [183], 2019, Survey
Kazemi et al. [282], 2019, Survey*
Kejriwal [286], 2019, Book
Xiao et al. [562], 2019, Survey
Wang and Yang [552], 2019, Survey
Al-Moslmi et al. [8], 2020, Survey
Fensel et al. [154], 2020, Book
Heist et al. [229], 2020, Survey*
Ji et al. [272], 2020, Survey*
Hogan et al. (this paper), 2020, Tutorial* (covers all of the listed topics)

Section 5 describes inductive techniques by which additional knowledge can be extracted.


Section 6 discusses the creation and enrichment of knowledge graphs from external sources.
Section 7 enumerates quality dimensions by which a knowledge graph can be assessed.
Section 8 discusses various techniques for knowledge graph refinement.
Section 9 discusses principles and protocols for publishing knowledge graphs.
Section 10 surveys some prominent knowledge graphs and their applications.
Section 11 concludes with a summary and future research directions for knowledge graphs.
Appendix A provides historical background and previous definitions for knowledge graphs.
Appendix B enumerates formal definitions that will be referred to from the body of the paper.

2 DATA GRAPHS
At the foundation of any knowledge graph is the principle of first applying a graph abstraction to
data, resulting in an initial data graph. We now discuss a selection of graph-structured data models
that are commonly used in practice to represent data graphs. We then discuss the primitives that
form the basis of graph query languages used to interrogate such data graphs.

2.1 Models
Leaving aside graphs, let us assume that the tourism board from our running example has not
yet decided how to model relevant data about attractions, events, services, etc. The board first
considers using a tabular structure – in particular, relational databases – to represent the required
data, and though they do not know precisely what data they will need to capture, they start to
design an initial relational schema. They begin with an Event table with five columns:
Event(name, venue, type, start, end)
where name and start together form the primary key of the table in order to uniquely identify
recurring events. But as they start to populate the data, they encounter various issues: events may
have multiple names (e.g., in different languages), events may have multiple venues, they may
not yet know the start and end date-times for future events, events may have multiple types, and
so forth. Incrementally addressing these modelling issues as the data become more diverse, they
generate internal identifiers for events and adapt their relational schema until they have:

EventName(id, name), EventVenue(id, venue), EventType(id, type),
EventStart(id, start), EventEnd(id, end)    (1)
With the above schema, the organisation can now model events with 0–𝑛 names, venues, and types,
and 0–1 start dates and end dates (without needing relational nulls/blank cells in tables).
Along the way, the board has to incrementally change the schema several times in order to support
new sources of data. Each such change requires a costly remodelling, reloading, and reindexing of
data; here we only considered one table. The tourism board struggles with the relational model
because they do not know, a priori, what data will need to be modelled or what sources they will
use. But once they reach the latter relational schema, the board finds that they can integrate further
sources without more changes: with minimal assumptions on multiplicities (1–1, 1–𝑛, etc.) this
schema offers a lot of flexibility for integrating incomplete and diverse data.
In fact, the refined, flexible schema that the board ends up with – shown in (1) – is modelling
a set of binary relations between entities, which indeed can be viewed as modelling a graph. By
instead adopting a graph data model from the outset, the board could forgo the need for an upfront
schema, and could define any (binary) relation between any pair of entities at any time.
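The correspondence can be seen with a minimal Python sketch (the relation names come from (1); the row values are illustrative samples from the running example): each row (id, value) of a binary table becomes one labelled edge.

```python
# Each row (id, value) of a binary relation such as EventName(id, name)
# becomes one labelled edge (id, label, value) of a directed edge-labelled graph.
tables = {
    "name":  [("EID15", "Ñam")],
    "venue": [("EID15", "Santa Lucía")],
    "type":  [("EID15", "Food Festival")],
    "start": [("EID15", "2018-03-22 12:00")],
}

edges = {(eid, label, value)
         for label, rows in tables.items()
         for eid, value in rows}

# With a graph model, any new (binary) relation between any pair of
# entities can be added at any time, without changing an upfront schema:
edges.add(("EID15", "type", "Drinks Festival"))
```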
We now introduce graph data models commonly used in practice [16].
2.1.1 Directed edge-labelled graphs. A directed edge-labelled graph (del graph for short; also known
as a multi-relational graph [29, 63, 386]) is defined as a set of nodes – like Santiago , Arica , EID16 ,
2018-03-22 12:00 – and a set of directed labelled edges between those nodes, like Santa Lucía –city→ Santiago . In the case of
knowledge graphs, nodes are used to represent entities and edges are used to represent (binary)
relations between those entities. Figure 1 provides an example of how the tourism board could
model some relevant event data as a directed edge-labelled graph (for a formal definition of a
directed edge-labelled graph see Definition B.1 in Appendix B). The graph includes data about
the names, types, start and end date-times, and venues for events.2 Adding information to such
a graph typically involves adding new nodes and edges (with some exceptions discussed later).
Representing incomplete information requires simply omitting a particular edge; for example, the
graph does not yet define a start/end date-time for the Food Truck festival.
Modelling data as a graph in this way offers more flexibility for integrating new sources of data,
compared to the standard relational model, where a schema must be defined upfront and followed
at each step. While other structured data models such as trees (XML, JSON, etc.) would offer similar
flexibility, graphs do not require organising the data hierarchically (should venue be a parent, child,
or sibling of type for example?). They also allow cycles to be represented and queried (e.g., note
the directed cycle in the routes between Santiago, Arica, and Viña del Mar).
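As a minimal sketch, such a graph can be represented as a plain set of (source, label, target) triples, where incomplete information is simply an absent triple; the route edges below follow Figure 1, and the helper function is illustrative:

```python
# A directed edge-labelled graph as a set of (source, label, target) triples.
graph = {
    ("Santiago", "bus", "Viña del Mar"),
    ("Viña del Mar", "bus", "Arica"),
    ("Arica", "bus", "Viña del Mar"),
    ("Santiago", "flight", "Arica"),
    ("Arica", "flight", "Santiago"),
}

def targets(graph, source, label):
    """Nodes reachable from `source` via a single edge with the given label."""
    return {t for s, l, t in graph if s == source and l == label}

# The directed cycle poses no problem for the model:
# Santiago -bus-> Viña del Mar -bus-> Arica -flight-> Santiago.
```

Adding information simply adds triples to the set; omitting a start date-time for an event, say, requires no null values.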
A standardised data model based on directed edge-labelled graphs is the Resource Description
Framework (RDF) [111], which has been recommended by the W3C. The RDF model defines
different types of nodes, including Internationalized Resource Identifiers (IRIs) [134] which allow for
global identification of entities on the Web; literals, which allow for representing strings (with or
2 We represent bidirectional edges as Viña del Mar ↔bus↔ Arica , which more concisely depicts two directed edges:
Viña del Mar –bus→ Arica and Viña del Mar ←bus– Arica . Also while some naming conventions recommend more
complete edge labels that include a verb, such as has venue or is valid from, in this paper, for presentation purposes, we will
omit the “has” and “is” verbs from such labels, using simply venue or valid from.

[Figure: the event EID15 has name “Ñam”; types Food Festival, Drinks Festival, and Open Market; start 2018-03-22 12:00; end 2018-03-29 20:00; and venue Santa Lucía. The event EID16 has name “Food Truck”, type Food Festival, and venues Sotomayor and Piscina Olímpica. The venues are linked by city edges to Santiago, Viña del Mar, and Arica respectively, and the cities are connected by the routes Santiago –bus→ Viña del Mar, Viña del Mar ↔bus↔ Arica, and Santiago ↔flight↔ Arica.]

Fig. 1. Directed edge-labelled graph describing events and their venues.

[Figure: (a) a directed edge-labelled graph in which Santiago has a type edge to City, Chile and Perú have type edges to Country, Santiago and Chile are connected by a capital edge, and Chile ↔borders↔ Perú; (b) the same data as a heterogeneous graph, in which each node carries exactly one type – Santiago : City, Chile : Country, Perú : Country – again with the capital and borders edges.]

Fig. 2. Data about capitals and countries in a directed edge-labelled graph and a heterogeneous graph

without language tags) and other datatype values (integers, dates, etc.); and blank nodes, which
are anonymous nodes that are not assigned an identifier (for example, rather than create internal
identifiers like EID15, EID16, in RDF, we have the option to use blank nodes). We will discuss these
different types of nodes further in Section 3.2 when we speak about issues relating to identity.
2.1.2 Heterogeneous graphs. A heterogeneous graph [257, 551, 570] (or heterogeneous information
network [509, 510]) is a graph where each node and edge is assigned one type. Heterogeneous
graphs are thus akin to del graphs – with edge labels corresponding to edge types – but where the
type of node forms part of the graph model itself, rather than being expressed as a special relation,
as illustrated in Figure 2. An edge is called homogeneous if it is between two nodes of the same type
(e.g., borders); otherwise it is called heterogeneous (e.g., capital). A benefit of heterogeneous graphs
is that they allow for partitioning nodes according to their type, for example, for the purposes
of machine learning tasks [257, 551, 570]. Conversely, they typically only support a one-to-one
relation between nodes and types, which is not the case for del graphs (see, for example, the node
Santiago with zero types and EID15 with multiple types in Figure 1).
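The one-type-per-node restriction, and the homogeneous/heterogeneous distinction for edges, can be sketched as follows (the data follows Figure 2; the helper function is illustrative):

```python
# Heterogeneous graph: each node is assigned exactly one type.
node_type = {"Santiago": "City", "Chile": "Country", "Perú": "Country"}
edges = {("Santiago", "capital", "Chile"), ("Chile", "borders", "Perú")}

def edge_kind(source, target):
    """An edge is homogeneous if both endpoints share a type."""
    if node_type[source] == node_type[target]:
        return "homogeneous"
    return "heterogeneous"
```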

2.1.3 Property graphs. Property graphs were introduced to provide additional flexibility when
modelling more complex relations. Consider integrating incoming data that provides information
on which companies offer fares on which flights, allowing the board to better understand available

[Figure: two flight nodes LA380 and LA381 each have from/to edges connecting Santiago and Arica (in opposite directions), a mode edge to Flight, and a company edge to LATAM; Santiago has type Capital City, lat −33.45, and long −70.66, while Arica has type Port City, lat −18.48, and long −70.33.]

Fig. 3. Directed edge-labelled graph with companies offering flights between Santiago and Arica

[Figure: a property graph in which the node Santiago has label Capital City and properties lat = −33.45, long = −70.66; the node Arica has label Port City and properties lat = −18.48, long = −70.33; and two edges LA380 and LA381 between them carry the label flight and the property company = LATAM.]

Fig. 4. Property graph with companies offering flights between Santiago and Arica

routes between cities (for example, on national airlines). In the case of directed edge-labelled graphs,
we cannot directly annotate an edge like Santiago –flight→ Arica with the company (or companies)
offering that route. But we could add a new node denoting a flight, connect it with the source,
destination, companies, and mode, as shown in Figure 3. Applying this modelling to all routes in
Figure 1 would, however, involve a significant change to the graph. Another option might be to put
the flights of different companies in different named graphs, but if named graphs are already being
used to track the source of graphs (for example), this could become cumbersome.
The property graph model was thus proposed to offer additional flexibility when modelling
data as a graph [16, 354]. A property graph allows a set of property–value pairs and a label to be
associated with both nodes and edges. Figure 4 shows a concise example of a property graph with
data analogous to Figure 3 (for a formal definition of a property graph, we refer to Definition B.5 in
Appendix B). This time we use property–value pairs on edges to model the companies3 . The type
of relation is captured by the label flight. We further use node labels to indicate the types of the
two nodes, and use property–value pairs to indicate their latitude and longitude.
Property graphs are most prominently used in popular graph databases, such as Neo4j [16, 354].
In choosing between graph models, it is important to note that property graphs can be translated
to/from directed edge-labelled graphs without loss of information [18, 235] (per, e.g., Figure 4). In
summary, directed edge-labelled graphs offer a more minimal model, while property graphs offer a
more flexible one. Often the choice of model will be secondary to other practical factors, such as
the implementations available for different models, etc.
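One direction of such a translation can be sketched as follows: each property-graph edge id is promoted to a node of a directed edge-labelled graph, to which the source, target, label, and property–value pairs can then attach (mirroring the move from Figure 4 back to Figure 3). The scheme shown is one simple possibility; [18, 235] define lossless translations in detail.

```python
# A property-graph edge: (edge id, source, label, target, {property: value}).
pg_edges = [
    ("LA380", "Santiago", "flight", "Arica", {"company": "LATAM"}),
    ("LA381", "Arica", "flight", "Santiago", {"company": "LATAM"}),
]

def to_del(pg_edges):
    """Translate property-graph edges into directed edge-labelled triples
    by promoting each edge id to a node (one simple reification scheme)."""
    triples = set()
    for eid, source, label, target, props in pg_edges:
        triples.add((eid, "from", source))
        triples.add((eid, "to", target))
        triples.add((eid, "mode", label))
        for prop, value in props.items():
            triples.add((eid, prop, value))
    return triples
```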
2.1.4 Graph dataset. Although multiple directed edge-labelled graphs can be merged by taking
their union, it is often desirable to manage several graphs rather than one monolithic graph; for
example, it may be beneficial to manage multiple graphs from different sources, making it possible to
update or refine data from one source, to distinguish untrustworthy sources from more trustworthy
ones, and so forth. A graph dataset then consists of a set of named graphs and a default graph. Each
3 In practical implementations of property graphs, properties with multiple values may be expressed, for example, as a single
array value. Such issues do not, however, affect expressivity, nor our discussion.

[Figure: the named graph Events contains the event data of Figure 1; the named graph Routes contains the bus and flight edges between Santiago, Viña del Mar, and Arica; and the default graph contains the edges Routes –updated→ 2018-04-03 and Events –updated→ 2018-06-14, using the graph names as nodes.]

Fig. 5. Graph dataset with two named graphs and a default graph describing events and routes

named graph is a pair of a graph ID and a graph. The default graph is a graph without an ID, and is
referenced “by default” if a graph ID is not specified. Figure 5 provides an example where events
and routes are stored in two named graphs, and the default graph manages meta-data about the
named graphs (for a formal definition of a graph dataset, see Definition B.7 in Appendix B). Graph
names can also be used as nodes in a graph. Furthermore, nodes and edges can be repeated across
graphs, where the same node in different graphs will typically refer to the same entity, allowing
data on that entity to be integrated when merging graphs. Though the example illustrates a dataset
of directed edge-labelled graphs, the concept generalises straightforwardly to other types of graphs.
A prominent use-case for graph datasets is to manage and query Linked Data composed of
interlinked documents of RDF graphs spanning the Web. When dealing with Web data, tracking
the source of data becomes of key importance [58, 130, 583]. We will discuss Linked Data later in
Section 3.2 and further discuss provenance in Section 3.3.
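A graph dataset can be sketched as a mapping from graph names to graphs, plus a default graph; the names and update dates follow Figure 5, and the merge function is illustrative:

```python
# Named graphs (name -> set of triples) and a default graph holding
# meta-data, in which the graph names themselves appear as nodes.
named = {
    "Events": {("EID15", "venue", "Santa Lucía")},
    "Routes": {("Santiago", "flight", "Arica")},
}
default = {
    ("Routes", "updated", "2018-04-03"),
    ("Events", "updated", "2018-06-14"),
}

def merge(named):
    """Union of all named graphs into one monolithic graph."""
    return set().union(*named.values())
```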

2.1.5 Other graph data models. The previous models are popular examples of graph representations.
Other graph data models exist with complex nodes that may contain individual edges [17, 222] or
nested graphs [17, 42] (sometimes called hypernodes [315]). Likewise the mathematical notion of
a hypergraph defines complex edges that connect sets rather than pairs of nodes. In our view, a
knowledge graph can adopt any such graph data model based on nodes and edges: often data can be
converted from one model to another (see Figure 3 vs. Figure 4). In the rest of the paper, we prefer
discussing directed edge-labelled graphs given their relative succinctness, but most discussion
extends naturally to other models.

2.1.6 Graph stores. A variety of techniques have been proposed for storing and indexing graphs,
facilitating the efficient evaluation of queries (as discussed next). Directed edge-labelled graphs can
be stored in relational databases either as a single relation of arity three (triple table), as a binary
relation for each property (vertical partitioning), or as 𝑛-ary relations for entities of a given type
(property tables) [560]. Custom storage techniques have also been developed for a variety of graph
models, providing efficient access for finding nodes, edges and their adjacent elements [17, 354, 560].
A number of systems further allow for distributing graphs over multiple machines based on popular
NoSQL stores or custom partitioning schemes [267, 560]. For further details we refer to the book
chapter by Janke and Staab [267] and the survey by Wylot et al. [560] dedicated to this topic.
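The first two relational storage layouts can be sketched as follows (the data values are illustrative): a triple table holds all edges in one relation of arity three, while vertical partitioning derives one binary relation per property.

```python
from collections import defaultdict

# Triple table: a single relation of arity three holding every edge.
triple_table = [
    ("EID15", "type", "Food Festival"),
    ("EID15", "venue", "Santa Lucía"),
    ("EID16", "venue", "Sotomayor"),
]

# Vertical partitioning: one binary relation per property, so that a
# lookup by property only touches the relevant partition.
partitions = defaultdict(list)
for s, p, o in triple_table:
    partitions[p].append((s, o))
```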

2.2 Querying
A number of practical languages have been proposed for querying graphs [16], including the
SPARQL query language for RDF graphs [217]; and Cypher [165], Gremlin [445], and G-CORE [15]
for querying property graphs.4 Underlying these query languages are some common primitives,
including (basic) graph patterns, relational operators, path expressions, and more besides [16]. We
now describe these core features for querying graphs in turn, starting with graph patterns.
2.2.1 Graph patterns. At the core of every structured query language for graphs are (basic) graph
patterns [16, 100], which follow the same model as the data graph being queried (see Section 2.1),
additionally allowing variables as terms.5 Terms in graph patterns are thus divided into constants,
such as Arica or venue, and variables, which we prefix with question marks, such as ?event or ?rel. A
graph pattern is then evaluated against the data graph by generating mappings from the variables
of the graph pattern to constants in the data graph such that the image of the graph pattern under
the mapping (replacing variables with the assigned constants) is contained within the data graph.
In Figure 6, we provide an example of a graph pattern looking for the venues of Food Festivals,
along with the possible mappings generated by the graph pattern against the data graph of Figure 1.
In some of the presented mappings (the last two listed), multiple variables are mapped to the
same term, which may or may not be desirable depending on the application. Hence a number
of semantics have been proposed for evaluating graph patterns [16], amongst which the most
important are: homomorphism-based semantics, which allows multiple variables to be mapped to the
same term such that all mappings shown in Figure 6 would be considered results; and isomorphism-
based semantics, which requires variables on nodes and/or edges to be mapped to unique terms,
thus excluding the latter three mappings of Figure 6 from the results. Different practical languages
adopt different semantics for evaluating graph patterns where, for example, SPARQL adopts a
homomorphism-based semantics, while Cypher adopts an isomorphism-based semantics on edges.
As we will see in later examples (particularly Figure 8), graph patterns may also form cycles
(be they directed or undirected), and may replace edge labels with variables. Graph patterns in
the context of other models – such as property graphs – can be defined analogously by allowing
variables to replace terms in any position of the model. We provide a formalisation of graph patterns
and their evaluation for both directed edge-labelled graphs and property graphs in Appendix B.2.1.
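Evaluation under homomorphism-based semantics can be sketched as follows; the pattern and the fragment of the data graph follow Figure 6, while the function itself is an illustrative (naive) implementation:

```python
def evaluate(pattern, graph):
    """All mappings of variables (terms prefixed '?') to constants such that
    the image of every pattern edge appears in the graph (homomorphism-based)."""
    mappings = [{}]
    for edge in pattern:
        extended = []
        for mu in mappings:
            for triple in graph:
                nu, ok = dict(mu), True
                for p_term, d_term in zip(edge, triple):
                    if p_term.startswith("?"):        # variable: bind or check
                        if nu.setdefault(p_term, d_term) != d_term:
                            ok = False
                            break
                    elif p_term != d_term:            # constant: must match
                        ok = False
                        break
                if ok:
                    extended.append(nu)
        mappings = extended
    return mappings

graph = {("EID15", "type", "Food Festival"), ("EID15", "venue", "Santa Lucía"),
         ("EID16", "type", "Food Festival"), ("EID16", "venue", "Sotomayor"),
         ("EID16", "venue", "Piscina Olímpica")}
pattern = [("?ev", "type", "Food Festival"),
           ("?ev", "venue", "?vn1"), ("?ev", "venue", "?vn2")]

mappings = evaluate(pattern, graph)  # the five mappings of Figure 6
# Isomorphism-based semantics on nodes: require pairwise-distinct assignments.
distinct = [m for m in mappings if len(set(m.values())) == len(m)]
```

The final filter keeps only the first two mappings of Figure 6, approximating an isomorphism-based semantics on nodes.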
2.2.2 Complex graph patterns. A graph pattern transforms an input graph into a table of results (as
shown in Figure 6). We may then consider using the relational algebra to combine and/or transform
such tables, thus forming more complex queries from one or more graph patterns. Recall that the
relational algebra consists of unary operators that accept one input table, and binary operators

4 The popularity of these languages is investigated by Seifer et al. [470].
5 The terms of a directed edge-labelled graph are its nodes and edge-labels. The terms of a property graph are its ids, labels, properties, and values (as used on either edges or nodes).
Graph pattern: ?ev type Food Festival ; ?ev venue ?vn1 ; ?ev venue ?vn2

  ?ev     ?vn1              ?vn2
  EID16   Piscina Olímpica  Sotomayor
  EID16   Sotomayor         Piscina Olímpica
  EID16   Piscina Olímpica  Piscina Olímpica
  EID16   Sotomayor         Sotomayor
  EID15   Santa Lucía       Santa Lucía

Fig. 6. Graph pattern (left) with mappings generated over the graph of Figure 1 (right)

that accept two input tables. Unary operators include projection (𝜋) to output a subset of columns,
selection (𝜎) to output a subset of rows matching a given condition, and renaming of columns (𝜌).
Binary operators include union (∪) to merge the rows of two tables into one table, difference (−)
to remove the rows from the first table present in the second table, and joins (⋈) to extend the
rows of one table with rows from the other table that satisfy a join condition. Selection and join
conditions typically include equalities (=), inequalities (≤), negation (¬), disjunction (∨), etc. From
these operators, we can further define other (syntactic) operators, such as intersection (∩) to output
rows in both tables, anti-join (▷, aka not exists) to output rows from the first table for which there
are no join-compatible rows in the second table, left-join (⟕, aka optional) to perform a join but
keeping rows from the first table without a compatible row in the second table, etc.
Graph patterns can then be expressed in a subset of relational algebra (namely π, σ, ρ, ⋈).
Assuming, for example, a single ternary relation 𝐺 (𝑠, 𝑝, 𝑜) representing a graph – i.e., a table 𝐺
with three columns 𝑠, 𝑝, 𝑜 – the query of Figure 6 can be expressed in relational algebra as:

  π_{ev,vn1,vn2}(σ_{p=type ∧ o=Food Festival ∧ p1=p2=venue}(ρ_{s/ev}(G ⋈ ρ_{p/p1,o/vn1}(G) ⋈ ρ_{p/p2,o/vn2}(G))))

where ⋈ denotes a natural join, meaning that equality is checked across pairs of columns with the
same name in both tables (here, the join is thus performed on the subject column 𝑠). The result of
this query is a table with a column for each variable: 𝑒𝑣, 𝑣𝑛1, 𝑣𝑛2. However, not all queries using 𝜋,
𝜎, 𝜌 and Z on 𝐺 can be expressed as graph patterns; for example, we cannot choose which variables
to project in a graph pattern, but rather must project all variables not fixed to a constant.
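As a concrete rendering of this translation, the following Python sketch (illustrative only) evaluates the query of Figure 6 over a small sample of Figure 1's triples, with comprehension clauses playing the roles of renaming, natural join on the subject column, selection, and projection.

```python
# Sample of the ternary relation G(s, p, o), taken from Figure 1.
G = [("EID16", "type", "Food Festival"),
     ("EID16", "venue", "Sotomayor"),
     ("EID16", "venue", "Piscina Olímpica"),
     ("EID15", "type", "Food Festival"),
     ("EID15", "venue", "Santa Lucía")]

result = [(ev, vn1, vn2)                            # projection on ev, vn1, vn2
          for (ev, p, o) in G                       # G with s renamed to ev
          if p == "type" and o == "Food Festival"   # selection on p and o
          for (s1, p1, vn1) in G                    # G with p, o renamed to p1, vn1
          if s1 == ev and p1 == "venue"             # natural join on s; selection on p1
          for (s2, p2, vn2) in G                    # G with p, o renamed to p2, vn2
          if s2 == ev and p2 == "venue"]            # natural join on s; selection on p2
```

Over this sample the query yields five rows: four for EID16 (its two venues in either order, or repeated) and one for EID15.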
Graph query languages such as SPARQL [217] and Cypher [165] allow the full use of relational
operators over the results of graph patterns, giving rise to complex graph patterns [16]. Figure 7
presents an example of a complex graph pattern with projected variables in bold, choosing particular
variables to appear in the final results. In terms of expressivity, graph patterns with (unrestricted)
projection of this form equate to conjunctive queries on graphs. In Figure 8, we give another example
of a complex graph pattern looking for food festivals or drinks festivals not held in Santiago,
optionally returning their start date and name (where available). Such queries – allowing the full
use of relational operators on top of graph patterns – equate to first-order queries on graphs. In
Appendix B.2.2, we formalise complex graph patterns and their evaluation over data graphs.
Complex graph patterns can give rise to duplicate results; for example, the first result in Figure 7
appears twice since ?city1 matches Arica and ?city2 matches Viña del Mar in one result,
and vice-versa in the other. Query languages then offer two semantics: bag semantics preserves
duplicates according to the multiplicity of the underlying mappings, while set semantics (typically
invoked with a DISTINCT keyword) removes duplicates from the results.
2.2.3 Navigational graph patterns. A key feature that distinguishes graph query languages is the
ability to include path expressions in queries. A path expression 𝑟 is a regular expression that
allows matching arbitrary-length paths between two nodes, which is expressed as a regular path
query (𝑥, 𝑟, 𝑦), where 𝑥 and 𝑦 can be variables or constants (or even the same term). The base path

Conjunctive query: ?event1 type Food Festival ; ?event1 name ?name1 ; ?event1 venue ?ven1 ;
?ven1 city ?city1 ; ?city1 ?con ?city2 ; ?ven2 city ?city2 ; ?event2 venue ?ven2 ;
?event2 type Food Festival ; ?event2 name ?name2 ; projected variables: ?name1, ?con, ?name2

  ?name1      ?con    ?name2
  Food Truck  bus     Food Truck
  Food Truck  bus     Food Truck
  Food Truck  bus     Ñam
  Food Truck  flight  Ñam
  Food Truck  flight  Ñam
  Ñam         bus     Food Truck
  Ñam         flight  Food Truck
  Ñam         flight  Food Truck

Fig. 7. Conjunctive query (left) with mappings generated over the graph of Figure 1 (right)

𝑄1: ?event type Food Festival        𝑄2: ?event type Drinks Festival
𝑄3: ?event venue ?ven ; ?ven city Santiago
𝑄4: ?event start ?start              𝑄5: ?event name ?name

𝑄 := ((((𝑄1 ∪ 𝑄2) ▷ 𝑄3) ⟕ 𝑄4) ⟕ 𝑄5)

           ?event  ?start  ?name
𝑄(𝐺) =     EID16           Food Truck

Fig. 8. Complex graph pattern (𝑄) with mappings generated (𝑄 (𝐺)) over the graph of Figure 1 (𝐺)

1. Arica
2. Arica bus Viña del Mar
3. Arica bus Viña del Mar bus Arica
4. Arica bus Viña del Mar bus Arica bus Viña del Mar
...

Fig. 9. Some possible paths matching (Arica, bus*, ?city) over the graph of Figure 1

expression is where 𝑟 is a constant (an edge label). Furthermore, if 𝑟 is a path expression, then 𝑟⁻
(inverse)6 and 𝑟* (Kleene star: zero-or-more) are also path expressions. Finally, if 𝑟1 and 𝑟2 are path
expressions, then 𝑟1 | 𝑟2 (disjunction) and 𝑟1 · 𝑟2 (concatenation) are also path expressions.
Regular path queries can then be evaluated under a number of different semantics. For example,
(Arica, bus*, ?city) evaluated against the graph of Figure 1 may match the paths in Figure 9. In
fact, since a cycle is present, an infinite number of paths are potentially matched. For this reason,
restricted semantics are often applied, returning only the shortest paths, or paths without repeated
nodes or edges (as in the case of Cypher).7 Rather than returning paths, another option is to instead
return the (finite) set of pairs of nodes connected by a matching path (as in the case of SPARQL 1.1).
Regular path queries can then be used in graph patterns to express navigational graph pat-
terns [16], as shown in Figure 10, which illustrates a query searching for food festivals in cities
reachable (recursively) from Arica by bus or flight. Furthermore, when regular path queries and
graph patterns are combined with operators such as projection, selection, union, difference, and
optional, the result is known as complex navigational graph patterns [16]. Appendix B.2.3 provides
definitions for (complex) navigational graph patterns and their evaluation.
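The pair-returning semantics can be computed without materialising paths. The sketch below (function name and data are illustrative) evaluates a query of the form (x, (l1|...|ln)*, ?y) as a reachability fixpoint, which terminates even on cyclic graphs.

```python
# Toy evaluation of a regular path query under SPARQL 1.1-style semantics,
# returning the finite set of reachable nodes rather than the (possibly
# infinitely many) matching paths. Data loosely follows Figure 1.
def eval_kleene_star(graph, labels, source):
    reachable = {source}          # the zero-length path matches r*
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for (s, p, o) in graph:
            if s == node and p in labels and o not in reachable:
                reachable.add(o)
                frontier.append(o)
    return reachable

graph = {("Arica", "bus", "Viña del Mar"),
         ("Viña del Mar", "bus", "Arica"),
         ("Santiago", "flight", "Arica")}
```

For instance, although the bus cycle between Arica and Viña del Mar admits infinitely many paths, the set of nodes reachable from Arica by bus* is just the two cities.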

6 Some authors distinguish 2-way regular path queries from regular path queries, where only the former supports inverses.
7 Mapping variables to paths requires special treatment [16]. Cypher [165] returns a string that encodes a path, upon which
certain functions such as length(·) can be applied. G-CORE [15], on the other hand, allows for returning paths, and
supports additional operators on them, including projecting them as graphs, applying cost functions, and more besides.

Navigational graph pattern: ?event type Food Festival ; ?event name ?name ;
?city (venue · city)⁻ ?event ; Arica (bus | flight)* ?city

  ?event  ?name       ?city
  EID15   Ñam         Santiago
  EID16   Food Truck  Arica
  EID16   Food Truck  Viña del Mar

Fig. 10. Navigational graph pattern (left) with mappings generated over the graph of Figure 1 (right)

2.2.4 Other features. Thus far we have discussed features that form the practical and theoretical
foundation of any query language for graphs [16]. However, specific query languages for graphs
may support other practical features, such as aggregation (GROUP BY, COUNT, etc.), more complex
filters and datatype operators (e.g., range queries on years extracted from a date), federation for
querying remotely hosted graphs over the Web, languages for updating graphs, support for semantic
entailment regimes, etc. For more information, we refer to the documentation of the respective
query languages (e.g., [15, 217]) and to the survey by Angles et al. [16].

3 SCHEMA, IDENTITY, CONTEXT


In this section we describe various enhancements and extensions of the data graph – relating to
schema, identity and context – that provide additional structures for accumulating knowledge.
Henceforth, we refer to a data graph as a collection of data represented as nodes and edges using
one of the models discussed in Section 2. We refer to a knowledge graph as a data graph potentially
enhanced with representations of schema, identity, context, ontologies and/or rules. These additional
representations may be embedded in the data graph, or layered above it. Representations for schema,
identity and context are discussed herein, while ontologies and rules will be discussed in Section 4.

3.1 Schema
One of the benefits of modelling data as graphs – versus, for example, the relational model – is the
option to forgo or postpone the definition of a schema. However, when modelling data as graphs,
schemata can be used to prescribe a high-level structure and/or semantics that the graph follows or
should follow. We discuss three types of graph schemata: semantic, validating, and emergent.
3.1.1 Semantic schema. A semantic schema allows for defining the meaning of high-level terms
(aka vocabulary or terminology) used in the graph, which facilitates reasoning over graphs using
those terms. Looking at Figure 1, for example, we may notice some natural groupings of nodes
based on the types of entities to which they refer. We may thus decide to define classes to denote
these groupings, such as Event, City, etc. In fact, Figure 1 already illustrates three low-level classes
– Open Market, Food Market, Drinks Festival – grouping similar entities with an edge labelled
type. We may subsequently observe some natural relations between some of these classes that
we would like to capture. In Figure 11, we present a class hierarchy for events where children are
defined to be subclasses of their parents such that if we find an edge EID15 type Food Festival in
our graph, we may also infer that EID15 type Festival and EID15 type Event .
Aside from classes, we may also wish to define the semantics of edge labels, aka properties.
Returning to Figure 1, we may consider that the properties city and venue are sub-properties of a
more general property location, such that given an edge Santa Lucía city Santiago , for example,
we may also infer that Santa Lucía location Santiago . We may also consider, for example, that bus and
flight are both sub-properties of a more general property connects to. As such, properties may
also form a hierarchy. We may further define the domain of properties, indicating the class(es) of
entities for nodes from which edges with that property extend; for example, we may define that

Event
  Festival
    Food Festival ; Drinks Festival ; Music Festival ; ...
  Periodic Market
    Open Market ; Closed Market
  ...

Fig. 11. Example class hierarchy for Event

Table 2. Definitions for sub-class, sub-property, domain and range features in semantic schemata

  Feature      Definition     Condition                   Example
  Subclass     c subc. of d   x type c implies x type d   City subc. of Place
  Subproperty  p subp. of q   x p y implies x q y         venue subp. of location
  Domain       p domain c     x p y implies x type c      venue domain Event
  Range        p range c      x p y implies y type c      venue range Venue

the domain of connects to is a class Place, such that given the previous sub-property relations,
we could conclude that Arica type Place . Conversely, we may define the range of properties,
indicating the class(es) of entities for nodes to which edges with that property extend; for example,
we may define that the range of city is a class City, inferring that Arica type City .
A prominent standard for defining a semantic schema for (RDF) graphs is the RDF Schema (RDFS)
standard [70], which allows for defining subclasses, subproperties, domains, and ranges amongst
the classes and properties used in an RDF graph, where such definitions can be serialised as a
graph. We illustrate the semantics of these features in Table 2 and provide a concrete example of
definitions in Figure 12 for a sample of terms used in the running example. These definitions can
then be embedded into a data graph. More generally, the semantics of terms used in a graph can be
defined in much more depth than seen here, as is supported by the Web Ontology Language (OWL)
standard [239] for RDF graphs. We will return to such semantics later in Section 4.
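The conditions of Table 2 can be read as forward-chaining rules. The following Python sketch (a toy materialisation loop, not an RDFS reasoner implementation; names and data are illustrative) applies them to a fixpoint over a fragment of the running example.

```python
def infer(graph, schema):
    """Materialise the four rules of Table 2 until a fixpoint is reached."""
    triples = set(graph)
    while True:
        new = set()
        for (s, p, o) in triples:
            for (a, feature, b) in schema:
                if feature == "subc. of" and p == "type" and o == a:
                    new.add((s, "type", b))      # x type c  =>  x type d
                elif feature == "subp. of" and p == a:
                    new.add((s, b, o))           # x p y  =>  x q y
                elif feature == "domain" and p == a:
                    new.add((s, "type", b))      # x p y  =>  x type c
                elif feature == "range" and p == a:
                    new.add((o, "type", b))      # x p y  =>  y type c
        if new <= triples:
            return triples
        triples |= new

schema = {("Food Festival", "subc. of", "Festival"),
          ("Festival", "subc. of", "Event"),
          ("venue", "subp. of", "location"),
          ("venue", "domain", "Event"),
          ("city", "range", "City")}
graph = {("EID15", "type", "Food Festival"),
         ("EID15", "venue", "Santa Lucía"),
         ("Santa Lucía", "city", "Santiago")}
```

From these inputs the loop derives, among others, that EID15 is an Event (twice over: via the class hierarchy and via the domain of venue) and that Santiago is a City.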
Semantic schemata are typically defined for incomplete graph data, where the absence of an edge
between two nodes, such as Viña del Mar flight Arica , does not mean that the relation does not
hold in the real world. Therefore, from the graph of Figure 1, we cannot assume that there is no
flight between Viña del Mar and Arica. In contrast, if the Closed World Assumption (CWA) were
adopted – as is the case in many classical database systems – it would be assumed that the data
graph is a complete description of the world, thus allowing to assert with certainty that no flight
exists between the two cities. Systems that do not adopt the CWA are said to adopt the Open World
Assumption (OWA). A consequence of CWA is that the addition of an edge to the data graph may
contradict what was previously assumed to be false (due to missing information), whereas with
OWA, a statement that is proven false continues to be false with the addition of more edges.
Considering our running example, it would be unreasonable to assume that the tourism organi-
sation has complete knowledge of everything describable in its knowledge graph. However, it is
inconvenient if a system is unable to definitely answer “yes” or “no” to questions such as “is there
a flight between Arica and Viña del Mar?”, especially when the organisation is certain that it has
complete knowledge of the flights. A compromise between OWA and CWA is the Local Closed
World Assumption (LCWA), where portions of the data graph are assumed to be complete.
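A toy question-answering routine (entirely illustrative) makes the contrast concrete: under LCWA, only the predicates declared complete behave in a closed-world manner.

```python
def ask(graph, triple, assumption="OWA", complete_predicates=()):
    """Answer a yes/no question about an edge under CWA, OWA, or LCWA
    (where only the listed predicates are assumed complete)."""
    if triple in graph:
        return "yes"
    _, p, _ = triple
    if assumption == "CWA" or (assumption == "LCWA" and p in complete_predicates):
        return "no"
    return "unknown"

graph = {("Arica", "flight", "Santiago")}
```

Asking about a missing flight edge thus yields "unknown" under OWA, but a definite "no" under CWA, or under LCWA when flight is declared complete.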

13
Festival –subc. of→ Event          Periodic Market –subc. of→ Event
City –subc. of→ Place              Venue –subc. of→ Place
venue –subp. of→ location          city –subp. of→ location
venue –domain→ Event               location –domain→ Event
venue –range→ Venue                city –range→ City          location –range→ Place

Fig. 12. Example schema graph describing sub-classes, sub-properties, domains, and ranges

3.1.2 Validating schema. When graphs are used to represent diverse, incomplete data at large-scale,
the OWA is the most appropriate choice for a default semantics. But in some scenarios, we may
wish to guarantee that our data graph – or specific parts thereof – are in some sense “complete”.
Returning to Figure 1, for example, we may wish to ensure that all events have at least a name, a
venue, a start date, and an end date, such that applications using the data – e.g., one that sends event
notifications to users – can ensure that they have the minimal information required. Furthermore,
we may wish to ensure that the city of an event is stated to be a city (rather than inferring that it is
a city). We can define such constraints in a validating schema and validate the data graph with
respect to the resulting schema, listing constraint violations (if any). Thus while semantic schemata
allow for inferring new graph data, validating schemata allow for validating existing graph data.
A standard way to define a validating schema for graphs is using shapes [296, 306, 423]. A shape
targets a set of nodes in a data graph and specifies constraints on those nodes. The shape’s target
can be defined in many ways, such as targeting all instances of a class, the domain or range of a
property, the result of a query, nodes connected to the target of another shape by a given property,
etc. Constraints can then be defined on the targeted nodes, such as to restrict the number or types
of values taken on a given property. A shapes graph is formed from a set of interrelated shapes.
Shapes graphs can be depicted as UML-like class diagrams, where Figure 13 illustrates an example
of a shapes graph based on Figure 1, defining constraints on four interrelated shapes. Each shape –
denoted with a box like Place , Event , etc. – is associated with a set of constraints. Nodes conform
to a shape if and only if they satisfy all constraints defined on the shape. Inside each shape box
are placed constraints on the number (e.g., [1..*] denotes one-to-many, [1..1] denotes precisely
one, etc.) and types (e.g., string, dateTime, etc.) of nodes that conforming nodes can relate to with
a property (e.g., name, start, etc.). Another option is to place constraints on the number of nodes
conforming to a particular shape that the conforming node can relate to with a property (thus
generating edges between shapes); for example, an edge labelled venue with cardinality 1..* from
Event to Venue denotes that conforming nodes for Event must relate to at least one node with
the property venue that conforms to the
Venue shape. Shapes can inherit the constraints of parent shapes – denoted with an △ connector –
as in the case of City and Venue , whose conforming nodes must also conform to the Place shape.
Given a shape and a targeted node, it is possible to check if the node conforms to that shape
or not, which may require checking conformance of other nodes; for example, the node EID15
conforms to the Event shape not only based on its local properties, but also based on conformance
of Santa Lucía to Venue and Santiago to City . Conformance dependencies may also be recursive,
where the conformance of Santiago to City requires that it conforms to Place , which requires that
Viña del Mar and Arica conform to Place , and so on. Conversely, EID16 does not conform to Event , as
it does not have the start and end properties required by the example shapes graph.
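Checking cardinality constraints of this kind can be sketched as follows (a toy checker, not ShEx or SHACL; it ignores datatypes and inter-shape references, and the shape encoding is our own).

```python
def conforms(node, graph, constraints, closed=False):
    """Check cardinality constraints, given as property -> (min, max) with
    max=None meaning unbounded; a closed shape forbids unlisted properties."""
    values = {}
    for (s, p, o) in graph:
        if s == node:
            values.setdefault(p, []).append(o)
    for prop, (lo, hi) in constraints.items():
        n = len(values.get(prop, []))
        if n < lo or (hi is not None and n > hi):
            return False
    if closed and any(p not in constraints for p in values):
        return False
    return True

# Constraints loosely mirroring the Event shape of Figure 13.
event_shape = {"name": (1, None), "start": (1, 1),
               "end": (1, 1), "venue": (1, None)}

graph = {("EID15", "name", "Ñam"),
         ("EID15", "start", "2020-03-22 12:00"),
         ("EID15", "end", "2020-03-29 20:00"),
         ("EID15", "venue", "Santa Lucía"),
         ("EID16", "name", "Food Truck"),
         ("EID16", "venue", "Sotomayor")}
```

Mirroring the discussion above, EID15 satisfies the Event constraints while EID16 does not (it lacks start and end); the closed flag further illustrates the open/closed distinction discussed next.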
When declaring shapes, the data modeller may not know in advance the entire set of prop-
erties that some nodes can have. An open shape allows the node to have additional proper-
ties not specified by the shape, while a closed shape does not. For example, if we add the edge

Place
  lat: float [0..1] ; long: float [0..1]
  flight 0..* → Place ; bus 0..* → Place

Event
  name: string [1..*] ; start: dateTime [1..1] ; end: dateTime [1..1] ; type: any [1..*]
  venue 1..* → Venue

Venue (inherits Place)
  indoor: boolean [0..1]
  city 0..1 → City

City (inherits Place)
  population: int ∧ >5000 [0..1]

Fig. 13. Example shapes graph depicted as a UML-like diagram

Santiago founder Pedro de Valdivia to the graph represented in Figure 1, then Santiago only conforms to
the City shape if that shape is defined as open (since the shape does not mention founder).
Practical languages for shapes often support additional boolean features, such as conjunction
(and), disjunction (or), and negation (not) of shapes; for example, we may say that all the values
of venue should conform to the shape Venue and (not City) , making explicit that venues in the data
graph should not be directly given as cities. However, shapes languages that freely combine recur-
sion and negation may lead to semantic problems, depending on how their semantics are defined.
To illustrate, consider the following case inspired by the barber paradox [306], involving a shape
Barber whose conforming nodes shave at least one node conforming to Person and (not Barber) .
Now, given (only) Bob shave Bob with Bob conforming to Person , does Bob conform to Barber ? If
yes – if Bob conforms to Barber – then Bob violates the constraint by not shaving at least one node
conforming to Person and (not Barber) . If no – if Bob does not conform to Barber – then Bob satisfies
the Barber constraint by shaving such a node. Semantics to avoid such paradoxical situations have
been proposed based on stratification [61], partial assignments [104], and stable models [177].
Although validating schemata and semantic schemata serve different purposes, they can comple-
ment each other. In particular, a validating schema can take into consideration a semantic schema,
such that, for example, validation is applied on the data graph including inferences. Taking the class
hierarchy of Figure 11 and the shapes graph of Figure 13, for example, we may define the target of
the Event shape as the nodes that are of type Event (the class). If we first apply inferencing with
respect to the class hierarchy of the semantic schema, the Event shape would now target EID15 and
EID16 . The presence of a semantic schema may, however, require adapting the validating schema.
Taking into account, for example, the aforementioned class hierarchy would require defining a
relaxed cardinality on the type property. Open shapes may also be preferred in such cases rather
than enumerating constraints on all possible properties that may be inferred on a node.
We provide high-level definitions for shapes and related concepts in Appendix B.3.2. Two shapes
languages have recently emerged for RDF graphs: Shape Expressions (ShEx), published as a W3C
Community Group Report [423]; and SHACL (Shapes Constraint Language), published as a W3C
Recommendation [296]. These languages support the discussed features (and more) and have been
adopted for validating graphs in a number of domains relating to health-care [521], scientific
literature [216], spatial data [83], amongst others. More details about ShEx and SHACL can be
found in the book by Labra Gayo et al. [306]. A recently proposed language that can be used as a
common basis for both ShEx and SHACL reveals their similarities and differences [305]. A similar
notion of schema has been proposed by Angles [14] for property graphs.

Node partitions: { EID15, EID16 } ; { Ñam, Food Truck } ; { Food Festival, Open Market,
Drinks Festival } ; { Santa Lucía, Sotomayor, Piscina Olímpica } ; { Santiago, Viña del Mar,
Arica } ; { 2020-03-22 12:00, 2020-03-29 20:00 }

Edges: events –type→ classes ; events –name→ names ; events –venue→ venues ;
events –start→ date-times ; events –end→ date-times ; venues –city→ cities ;
cities –bus→ cities ; cities –flight→ cities

Fig. 14. Example quotient graph simulating the data graph in Figure 1

As in Figure 14, except that the event partition is split in two: { EID15 }, with outgoing
edges type, name, venue, start and end; and { EID16 }, with outgoing edges type, name and
venue only.

Fig. 15. Example quotient graph bisimilar with the data graph in Figure 1

3.1.3 Emergent schema. Both semantic and validating schemata require a domain expert to explic-
itly specify definitions and constraints. However, a data graph will often exhibit latent structures
that can be automatically extracted as an emergent schema [413] (aka graph summary [84, 322, 492]).
A framework often used for defining emergent schema is that of quotient graphs, which partition
groups of nodes in the data graph according to some equivalence relation while preserving some
structural properties of the graph. Taking Figure 1, we can intuitively distinguish different types
of nodes based on their context, such as event nodes, which link to venue nodes, which in turn
link to city nodes, and so forth. In order to describe the structure of the graph, we could consider
six partitions of nodes: event, name, venue, class, date-time, city. In practice, these partitions may
be computed based on the class or shape of the node. Merging the nodes of each partition into
one node while preserving edges leads to the quotient graph shown in Figure 14: the nodes of this
quotient graph are the partitions of nodes from the data graph and the edge 𝑋 𝑦 𝑍 is in the
quotient graph if and only if there exists 𝑥 ∈ 𝑋 and 𝑧 ∈ 𝑍 such that 𝑥 𝑦 𝑧 is in the data graph.
There are many ways in which quotient graphs may be defined, depending not only on how nodes
are partitioned, but also how the edges are defined. Different quotient graphs may provide different
guarantees with respect to the structure they preserve. Formally, we can say that every quotient
graph simulates its input graph (based on the simulation relation of set membership between data
nodes and quotient nodes), meaning that for all 𝑥 ∈ 𝑋 with 𝑥 an input node and 𝑋 a quotient node,
if 𝑥 𝑦 𝑧 is an edge in the data graph, then there must exist an edge 𝑋 𝑦 𝑍 in the quotient
graph such that 𝑧 ∈ 𝑍 ; for example, the quotient graph of Figure 14 simulates the data graph of
Figure 1. However, this quotient graph seems to suggest (for instance) that EID16 would have a
start and end date in the data graph when this is not the case. A stronger notion of structural
preservation is given by bisimilarity, which in this case would further require that if 𝑋 𝑦 𝑍 is
an edge in the quotient graph, then for all 𝑥 ∈ 𝑋 , there must exist a 𝑧 ∈ 𝑍 such that 𝑥 𝑦 𝑧 is in
the data graph; this is not satisfied by EID16 in the quotient graph of Figure 14, which does not have

an outgoing edge labelled start or end in the original data graph. Figure 15 illustrates a bisimilar
version of the quotient graph, splitting the event partition into two nodes reflecting their different
outgoing edges. An interesting property of bisimilarity is that it preserves forward-directed paths:
given a path expression 𝑟 without inverses and two bisimilar graphs, 𝑟 will match a path in one
graph if and only if it matches a corresponding path in the other bisimilar graph. One can verify,
for example, that a path matches 𝑥 city·(flight|bus)* 𝑧 in Figure 1 if and only if there is a path
matching 𝑋 city·(flight|bus)* 𝑍 in Figure 15 such that 𝑥 ∈ 𝑋 and 𝑧 ∈ 𝑍 .
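The quotient construction and the additional bisimulation condition can be sketched as follows (illustrative code over a small fragment of the running example; formal definitions are given in Appendix B.3.3).

```python
def quotient(graph, partition):
    """partition maps each node to its block; the quotient has an edge
    (X, y, Z) iff some (x, y, z) in the graph has x in X and z in Z."""
    return {(partition[s], p, partition[o]) for (s, p, o) in graph}

def is_bisimilar(graph, partition):
    """Extra condition for bisimilarity: for every quotient edge (X, y, Z),
    every x in X must have some z in Z with (x, y, z) in the graph."""
    blocks = {}
    for node, block in partition.items():
        blocks.setdefault(block, set()).add(node)
    for (X, y, Z) in quotient(graph, partition):
        for x in blocks[X]:
            if not any((x, y, z) in graph for z in blocks[Z]):
                return False
    return True

graph = {("EID15", "venue", "Santa Lucía"),
         ("EID16", "venue", "Sotomayor"),
         ("EID15", "start", "2020-03-22 12:00")}
coarse = {"EID15": "event", "EID16": "event",
          "Santa Lucía": "venue", "Sotomayor": "venue",
          "2020-03-22 12:00": "datetime"}
fine = dict(coarse, EID16="event2")   # split the event partition
```

The coarse partition simulates the data graph but is not bisimilar (EID16 lacks a start edge), whereas splitting the event partition, as in Figure 15, restores bisimilarity.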
There are many ways in which quotient graphs may be defined, depending on the equivalence
relation that partitions nodes. Furthermore, there are many ways in which other similar or bisimilar
graphs can be defined, depending on the (bi)simulation relation that preserves the data graph’s
structure [84]. We provide formal definitions for the notions of quotient graphs, simulation and
bisimulation in Appendix B.3.3. Such techniques aim to summarise the data graph into a higher-level
topology. In order to reduce the memory overhead of the quotient graph, in practice, nodes may
rather be labelled with the cardinality of the partition and/or a high-level label (e.g., event, city) for
the partition rather than storing the labels of all nodes in the partition.
Various other forms of emergent schema not based on a quotient graph framework have also been
proposed; examples include emergent schemata based on relational tables [413], formal concept
analysis [196], and so forth. Emergent schemata may be used to provide a human-understandable
overview of the data graph, to aid with the definition of a semantic or validating schema, to optimise
the indexing and querying of the graph, to guide the integration of data graphs, and so forth. We
refer to the survey by Čebirić et al. [84] for further details.

3.2 Identity
In Figure 1, we use nodes like Santiago , but to which Santiago does this node refer? Do we refer to
Santiago de Chile, Santiago de Cuba, Santiago de Compostela, or do we perhaps refer to the indie
rock band Santiago? Based on edges such as Santa Lucía city Santiago , we may deduce that it is
one of the three cities mentioned (not the rock band), and based on the fact that the graph describes
tourist attractions in Chile, we may further deduce that it refers to Santiago de Chile. Without
further details, however, disambiguating nodes of this form may rely on heuristics prone to error
in more difficult cases. To help avoid such ambiguity, first we may use globally-unique identifiers
to avoid naming clashes when the knowledge graph is extended with external data, and second we
may add external identity links to disambiguate a node with respect to an external source.

3.2.1 Persistent identifiers. Assume we wished to compare tourism in Chile and Cuba, and we have
acquired an appropriate knowledge graph for Cuba. Part of the benefit of using graphs to model
data is that we can merge two graphs by taking their union. However, as shown in Figure 16, using
an ambiguous node like Santiago may result in a naming clash: the node is referring to two different
real-world cities in both graphs, where the merged graph indicates that Santiago is a city in both
Chile and Cuba (rather than two different cities).8 To avoid such clashes, long-lasting persistent
identifiers (PIDs) [213] can be created in order to uniquely identify an entity. Prominent examples of
PID schemes include Digital Object Identifiers (DOIs) for papers, ORCID iDs for authors, International
Standard Book Numbers (ISBNs) for books, Alpha-2 codes for countries, and more besides.
In the context of the Semantic Web, the RDF data model goes one step further and recommends
that global Web identifiers be used for nodes and edge labels. However, rather than adopt the
Uniform Resource Locators (URLs) used to identify the location of information resources such as

8 Such a naming clash is not unique to graphs, but could also occur if merging tables, trees, etc.

Chile graph:  Santa Lucía –city→ Santiago –country→ Chile
Cuba graph:   Santa Ifigenia –city→ Santiago –country→ Cuba
Chile ∪ Cuba: both city edges point to a single Santiago node, which now has country
              edges to both Chile and Cuba

Fig. 16. Result of merging two graphs with ambiguous local identifiers

webpages, RDF 1.1 proposes to use Internationalised Resource Identifiers (IRIs) to identify non-
information resources such as cities or events.9 Hence, for example, in the RDF representation of the
Wikidata [543] – a knowledge graph proposed to complement Wikipedia, discussed in more detail in
Section 10 – while the URL https://www.wikidata.org/wiki/Q2887 refers to a webpage that can be loaded in a
browser providing human-readable meta-data about Santiago, the IRI http://www.wikidata.org/entity/Q2887
refers to the city itself. Distinguishing the identifiers for both resources (the webpage and the city
itself) avoids naming clashes; for example, if we use the URL to identify both the webpage and the
city, we may end up with an edge in our graph, such as (with readable labels below the edge):
http://www.wikidata.org/wiki/Q2887 http://www.wikidata.org/wiki/Property:P112 https://www.wikidata.org/wiki/Q203534
[Santiago (URL)] [founded by (URL)] [Pedro de Valdivia (URL)]

Such an edge leaves ambiguity: was Pedro de Valdivia the founder of the webpage, or the city?
Using IRIs for entities distinct from the URLs for the webpages that describe them avoids such
ambiguous cases, where Wikidata thus rather defines the previous edge as follows:
http://www.wikidata.org/entity/Q2887 http://www.wikidata.org/prop/direct/P112 http://www.wikidata.org/entity/Q203534
[Santiago (IRI)] [founded by (IRI)] [Pedro de Valdivia (IRI)]

using IRIs for the city, person, and founder of, distinct from the webpages describing them. These
Wikidata identifiers use the prefix http://www.wikidata.org/entity/ for entities and the prefix
http://www.wikidata.org/prop/direct/ for relations. Such prefixes are known as namespaces, and
are often abbreviated with prefix strings, such as wd: or wdt:, where the latter triple can then be
written more concisely using such abbreviations as wd:Q2887 wdt:P112 wd:Q203534 .
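Expanding such abbreviations is purely syntactic; a minimal sketch (illustrative) follows, using the two Wikidata namespaces just mentioned.

```python
prefixes = {"wd": "http://www.wikidata.org/entity/",
            "wdt": "http://www.wikidata.org/prop/direct/"}

def expand(name, prefixes):
    """Expand a prefixed name like wd:Q2887 to a full IRI; names whose
    prefix is unknown are returned unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name
```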
If HTTP IRIs are used to identify the graph’s entities, when the IRI is looked up (via HTTP),
the web-server can return (or redirect to) a description of that entity in formats such as RDF. This
further enables RDF graphs to link to related entities described in external RDF graphs over the Web,
giving rise to Linked Data [41, 226] (discussed in Section 9). Though HTTP IRIs offer a flexible and
powerful mechanism for issuing global identifiers on the Web, they are not necessarily persistent:
websites may go offline, the resources described at a given location may change, etc. In order to
enhance the persistence of such identifiers, Persistent URL (PURL) services offer redirects from
a central server to a particular location, where the PURL can be redirected to a new location if
necessary, changing the address of a document without changing its identifier. The persistence of
HTTP IRIs can then be improved by using namespaces defined through PURL services.
9 Uniform Resource Identifiers (URIs) can be Uniform Resource Locators (URLs), used to locate information resources, and
Uniform Resource Names (URNs), used to name non-information resources. Internationalised Resource Identifiers (IRIs) are
URIs that allow Unicode. For example, http://example.com/Ñam is an IRI, but not a URI, due to the use of “Ñ”. Percentage
encoding – http://example.com/%C3%91am – can encode an IRI as a URI (but reduces readability).

3.2.2 External identity links. Assume that the tourist board opts to define the chile: namespace
with an IRI such as http://turismo.cl/entity/ on a web-server that they control, allowing
nodes such as chile:Santiago – a shortcut for the IRI http://turismo.cl/entity/Santiago – to be looked up over
the Web. While using such a naming scheme helps to avoid naming clashes, the use of IRIs does not
necessarily help ground the identity of a resource. For example, an external geographic knowledge
graph may assign the same city the IRI geo:SantiagoDeChile in their own namespace, where we have no
direct way of knowing that the two identifiers refer to the same city. If we merge the two knowledge
graphs, we will end up with two distinct nodes for the same city.
There are a number of ways to ground the identity of an entity. The first is to associate the entity
with uniquely-identifying information in the graph, such as its geo-coordinates, its postal code, the
year it was founded, etc. Each additional piece of information removes ambiguity as to which city
is being referred to, providing (for example) more options for matching the city with its analogue in
external sources. A second option is to use identity links to state that a local entity has the same
identity as another coreferent entity found in an external source; an instantiation of this concept
can be found in the OWL standard, which defines the owl:sameAs property relating coreferent
entities. Using this property, we could state the edge chile:Santiago owl:sameAs geo:SantiagoDeChile in
our RDF graph, thus establishing an identity link between the corresponding nodes in both graphs.
The semantics of owl:sameAs defined by the OWL standard then allow us to combine the data for
both nodes. Such semantics will be discussed later in Section 4. Ways in which identity links can
be computed will also be discussed later in Section 8.
3.2.3 Datatypes. Consider the two date-times on the left of Figure 1: how should we assign these
nodes persistent/global identifiers? Intuitively it would not make sense, for example, to assign IRIs
to these nodes since their syntactic form tells us what they refer to: specific dates and times in
March 2020. This syntactic form is further recognisable by machine, meaning that with appropriate
software, we could order such values in ascending or descending order, extract the year, etc.
Most practical data models for graphs allow for defining nodes that are datatype values. RDF
utilises XML Schema Datatypes (XSD) [411], amongst others, where a datatype node is given as a
pair (𝑙, 𝑑) where 𝑙 is a lexical string, such as "2020-03-29T20:00:00", and 𝑑 is an IRI denoting the
datatype, such as xsd:dateTime. The node is then denoted "2020-03-29T20:00:00"^^xsd:dateTime . Datatype
nodes in RDF are called literals and are not allowed to have outgoing edges. Other datatypes
commonly used in RDF data include xsd:string, xsd:integer, xsd:decimal, xsd:boolean, etc.
If the datatype is omitted, the value is assumed to be of type xsd:string. Applications
built on top of RDF can then recognise these datatypes, parse them into datatype objects, and apply
equality checks, normalisation, ordering, transformations, casting, according to their standard
definition. In the context of property graphs, Neo4j [354] also defines a set of internal datatypes on
property values that includes numbers, strings, booleans, spatial points, and temporal values.
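The recognition of datatype literals can be sketched as follows, where a hypothetical parse_literal function maps a (lexical, datatype) pair to a native value; real RDF toolkits implement the full XSD datatype map:

```python
# Sketch: parsing RDF datatype literals (lexical string + datatype IRI)
# into native Python values so they can be compared and ordered.
from datetime import datetime

def parse_literal(lexical: str, datatype: str):
    if datatype == "xsd:dateTime":
        return datetime.fromisoformat(lexical)
    if datatype == "xsd:integer":
        return int(lexical)
    if datatype == "xsd:decimal":
        return float(lexical)
    if datatype == "xsd:boolean":
        return lexical == "true"
    return lexical  # default: treat as xsd:string

start = parse_literal("2020-03-29T20:00:00", "xsd:dateTime")
end = parse_literal("2020-03-29T22:00:00", "xsd:dateTime")
# With native values, ordering and extraction (e.g., the year) become trivial.
```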
3.2.4 Lexicalisation. Global identifiers for entities will sometimes have a human-interpretable
form, such as chile:Santiago , but the identifier strings themselves do not carry any formal semantic
significance. In other cases, the identifiers used may not be human-interpretable by design. In
Wikidata, for instance, Santiago de Chile is identified as wd:Q2887 , where such a scheme has the
advantage of providing better persistence and of not being biased to a particular human language.
For example, the Wikidata identifier for Eswatini ( wd:Q1050 ) was not affected when the country
changed its name from Swaziland, and does not necessitate choosing between languages for creating
(more readable) IRIs such as wd:Eswatini (English), wd:eSwatini (Swazi), wd:Esuatini (Spanish), etc.
Since identifiers can be arbitrary, it is common to add edges that provide a human-interpretable
label for nodes, such as wd:Q2887 rdfs:label “Santiago” , indicating how people may refer to the subject
node linguistically. Linguistic information of this form plays an important role in grounding

Fig. 17. RDF list representing the three largest peaks of Chile, in order: chile:Chile –chile:peaks→ 𝑏1 , where blank nodes 𝑏1, 𝑏2, 𝑏3 are chained by rdf:rest (terminated by rdf:nil ), and rdf:first points from 𝑏1, 𝑏2, 𝑏3 to chile:OjosDelSalado , chile:NevadoTresCruces , chile:Llullaillaco , respectively.


knowledge such that users can more clearly identify which real-world entity a particular node in a
knowledge graph actually references [120]; it further permits cross-referencing entity labels with
text corpora to find, for example, documents that potentially speak of a given entity [338]. Labels
can be complemented with aliases (e.g., wd:Q2887 skos:altLabel “Santiago de Chile” ) or comments (e.g.
wd:Q2887 rdfs:comment “Santiago is the capital of Chile” ) to further help ground the node’s identity.
Nodes such as “Santiago” denote string literals, rather than an identifier. Depending on the
specific graph model, such literal nodes may also be defined as a pair (𝑠, 𝑙), where 𝑠 denotes
the string and 𝑙 a language code; in RDF, for example we may state chile:City rdfs:label "City"@en ,
chile:City rdfs:label "Ciudad"@es , etc., indicating labels for the node in different languages. In other
models, the pertinent language can rather be specified, e.g., via metadata on the edge. Knowl-
edge graphs with human-interpretable labels, aliases, comments, etc., (in various languages) are
sometimes called (multilingual) lexicalised knowledge graphs [57].
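A minimal sketch of selecting a human-readable label by language tag, assuming labels are stored as (string, language-code) pairs per node with a fallback order:

```python
# Sketch: language-tagged labels per node, as (string, language-code) pairs,
# mirroring the chile:City example in the text.
labels = {
    "chile:City": [("City", "en"), ("Ciudad", "es")],
}

def label_for(node, preferred=("es", "en")):
    """Return a label in the first preferred language available."""
    by_lang = {lang: s for s, lang in labels.get(node, [])}
    for lang in preferred:
        if lang in by_lang:
            return by_lang[lang]
    return node  # fall back to the raw identifier
```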
3.2.5 Existential nodes. When modelling incomplete information, we may in some cases know
that there must exist a particular node in the graph with particular relationships to other nodes,
but without being able to identify the node in question. For example, we may have two co-located
events chile:EID42 and chile:EID43 whose venue has yet to be announced. One option is to simply omit
the venue edges, in which case we lose the information that these events have a venue and that
both events have the same venue. Another option might be to create a fresh IRI representing the
venue, but semantically this becomes indistinguishable from there being a known venue. Hence
some graph models permit the use of existential nodes, represented here as a blank circle:
chile:EID42 –chile:venue→ ◯ ←chile:venue– chile:EID43

These edges denote that there exists a common venue for chile:EID42 and chile:EID43 without identifying
it. Existential nodes are supported in RDF as blank nodes [111], which are also commonly used to
support modelling complex elements in graphs, such as RDF lists [111, 247]. Figure 17 exemplifies
an RDF list, which uses blank nodes in a linked-list structure to encode order. Though existential
nodes can be convenient, their presence can complicate operations on graphs, such as deciding
if two data graphs have the same structure modulo existential nodes [111, 245]. Hence methods
for skolemising existential nodes in graphs – replacing them with canonical labels – have been
proposed [245, 325]. Other authors rather call to minimise the use of such nodes in graph data [226].

3.3 Context
Many (arguably all) facts presented in the data graph of Figure 1 can be considered true with
respect to a certain context. With respect to temporal context, Santiago has only existed as a city
since 1541, flights from Arica to Santiago began in 1956, etc. With respect to geographic context, the
graph describes events in Chile. With respect to provenance, data relating to EID15 were taken from
– and are thus said to be true with respect to – the Ñam webpage on January 4th , 2020. Other forms
of context may also be used. We may further combine contexts, such as to indicate that Arica is a
Chilean city (geographic) since 1883 (temporal) according to the Treaty of Ancón (provenance).

By context we herein refer to the scope of truth, and thus talk about the context in which some
data are held to be true [205, 345]. The graph of Figure 1 leaves much of its context implicit. However,
making context explicit can allow for interpreting the data from different perspectives, such as
to understand what held true in 2016, what holds true excluding webpages later found to have
spurious data, etc. As seen in the previous examples, context for graph data may be considered
at different levels: on individual nodes, individual edges, or sets of edges (sub-graphs). We now
discuss various representations by which context can be made explicit at different levels.
3.3.1 Direct representation. The first way to represent context is to consider it as data no dif-
ferent from other data. For example, the dates for the event EID15 in Figure 1 can be seen as
representing a form of temporal context, indicating the temporal scope within which edges such as
EID15 venue Santa Lucía are held true. Another option is to change a relation represented as an edge,
such as Santiago flight Arica , into a node, such as seen in Figure 3, allowing to assign additional
context to the relation. While in these examples context is represented in an ad hoc manner, a
number of specifications have been proposed to represent context as data in a more standard way.
One example is the Time Ontology [107], which specifies how temporal entities, intervals, time
instants, etc. – and relations between them such as before, overlaps, etc. – can be described in RDF
graphs in an interoperable manner. Another example is the PROV Data Model [188], which specifies
how provenance can be described in RDF graphs, where entities (e.g., graphs, nodes, physical
document) are derived from other entities, are generated and/or used by activities (e.g., extraction,
authorship), and are attributed to agents (e.g., people, software, organisations).
3.3.2 Reification. Often we may wish to directly define the context of edges themselves; for
example, we may wish to state that the edge Santiago flight Arica is valid from 1956. While we
could use the pattern of turning the edge into a node – as illustrated in Figure 3 – to directly
represent such context, another option is to use reification, which allows for making statements
about statements in a generic manner (or in the case of a graph, for defining edges about edges). In
Figure 18 we present three forms of reification that can be used for modelling temporal context on
the aforementioned edge within a directed edge-labelled graph [235]. We use 𝑒 to denote an arbitrary
identifier representing the edge itself to which the contextual information can be associated. Unlike
in a direct representation, 𝑒 represents an edge, not a flight. RDF reification [111] (Figure 18a) defines
a new node 𝑒 to represent the edge and connects it to the source node (via subject), target node
(via object), and edge label (via predicate) of the edge. In contrast, 𝑛-ary relations [111] (Figure 18b)
connect the source node of the edge directly to the edge node 𝑒 with the label of the edge; the
target node of the edge is then connected to 𝑒 (via value). Finally, singleton properties [383]
(Figure 18c) rather use 𝑒 as an edge label, connecting it to a node indicating the original edge
label (via singleton). Other forms of reification have been proposed in the literature, including, for
example, NdFluents [190]. In general, a reified edge does not assert the edge it reifies; for example,
we may reify an edge to state that it is no longer valid. We refer to the work of Hernández et al. [235]
for further comparison of reification alternatives and their relative strengths and weaknesses.
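To make the three patterns of Figure 18 concrete, the following sketch generates each reification as a list of triples for a given edge and edge identifier, with abbreviated term names standing in for the vocabulary of the figure:

```python
# Sketch: the three reification patterns of Figure 18 for the edge
# (Santiago, flight, Arica), using an edge identifier "e".
edge = ("Santiago", "flight", "Arica")
e = "e"

def rdf_reification(s, p, o, e):
    return [(e, "subject", s), (e, "predicate", p), (e, "object", o)]

def nary_relation(s, p, o, e):
    return [(s, p, e), (e, "value", o)]

def singleton_property(s, p, o, e):
    return [(s, e, o), (e, "singleton", p)]

# In each case, contextual edges can then be attached to e, e.g.:
context = (e, "valid from", "1956")
```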
3.3.3 Higher-arity representation. As an alternative to reification, we can rather use higher-arity
representations for modelling context. Taking again the edge Santiago flight Arica , Figure 19
illustrates three higher-arity representations of temporal context. First, we can use a named graph
(Figure 19a) to contain the edge and then define the temporal context on the graph name. Second, we
can use a property graph (Figure 19b) where the temporal context is defined as an attribute on the
edge. Third, we can use RDF* [218] (Figure 19c): an extension of RDF that allows edges to be defined
as nodes. Amongst these options, the most flexible is the named graph representation, where we
can assign context to multiple edges at once by placing them in one named graph; for example, we

Fig. 18. Three representations of temporal context on an edge in a directed-edge labelled graph: for the edge Santiago –flight→ Arica with context 𝑒 –valid from→ 1956 , (a) RDF Reification connects the edge node 𝑒 to Santiago (via subject), flight (via predicate), and Arica (via object); (b) 𝑛-ary Relations use Santiago –flight→ 𝑒 –value→ Arica ; (c) Singleton properties use Santiago –𝑒→ Arica with 𝑒 –singleton→ flight .

Fig. 19. Three higher-arity representations of temporal context on an edge: (a) a named graph containing Santiago –flight→ Arica , with valid from 1956 stated on the graph name; (b) a property graph with the attribute valid from = 1956 on the edge 𝑒: flight; (c) RDF*, where the edge Santiago –flight→ Arica itself becomes the subject of a valid from 1956 edge.

can add more edges to the named graph of Figure 19a that are also valid from 1956. The least flexible
option is RDF*, which, in the absence of an edge id, does not permit different groups of contextual
values to be assigned to an edge; for example, considering the edge Chile president M. Bachelet , if
we add four contextual values to this edge to state that it was valid from 2006 until 2010 and valid
from 2014 until 2018, we cannot pair the values, but may rather have to create a node to represent
different presidencies (in the other models, we could have used two named graphs or edge ids).
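A minimal sketch of the named graph option represents statements as quads whose fourth element is the graph name (with None standing in for the default graph, an assumption of this sketch):

```python
# Sketch: quads (s, p, o, g), where g names the graph containing the edge.
# The second flight edge is hypothetical, added to show that one named
# graph can carry the same context for several edges at once.
quads = [
    ("Santiago", "flight", "Arica", "g1"),
    ("Santiago", "flight", "Concepción", "g1"),
    ("g1", "valid from", "1956", None),  # context stated on the graph name
]

def edges_in(graph_name, quads):
    """Return the (s, p, o) edges contained in the given named graph."""
    return [(s, p, o) for s, p, o, g in quads if g == graph_name]
```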
3.3.4 Annotations. Thus far we have discussed representing context in a graph, but we have not
spoken about automated mechanisms for reasoning about context; for example, if there are only
seasonal summer flights from Santiago to Arica , we may wish to find other routes from Santiago
for winter events taking place in Arica . While the dates for buses, flights, etc., can be represented
directly in the graph, or using reification, writing a query to manually intersect the corresponding
temporal contexts will be tedious – or may not even be possible at all. Another alternative is
to consider annotations that provide mathematical definitions of a contextual domain and key
operations possible within that domain that can then be applied automatically.
Some annotations model a particular contextual domain; for example, Temporal RDF [210]
allows for annotating edges with time intervals, such as Chile –president [2006, 2010]→ M. Bachelet , while
Fuzzy RDF [502] allows for annotating edges with a degree of truth, such as Santiago –climate 0.8→ Semi-Arid ,
indicating that it is more-or-less true – with a degree of 0.8 – that Santiago has a semi-arid climate.
Other forms of annotation are domain-independent; for example, Annotated RDF [130, 530, 583]
allows for representing various forms of context modelled as semi-rings: algebraic structures
consisting of domain values (e.g., temporal intervals, fuzzy values, etc.) and two main operators to
combine domain values: meet and join.10 We provide an example in Figure 20, where 𝐺 is annotated
with values from a simplified temporal domain consisting of sets of integers (1–365) representing
days of the year. For brevity we use an interval notation, where, for example, {[150, 152]} indicates
the set {150, 151, 152}. Query 𝑄 then asks for flights from Santiago to cities with events; this query
10 The join operator for annotations is different from the join operator for relational algebra.

Fig. 20. Example query on a temporally annotated graph: the graph 𝐺 annotates the city edges of events EID16 {[123, 130]} and EID17 {[276, 279]} (both in Arica ) and EID18 {[150, 152]} (in Punta Arenas ), along with the flight edges Santiago → Arica {[1, 125], [200, 365]} and Santiago → Punta Arenas {[1, 120], [220, 365]}. The query 𝑄 matches ?event –city→ ?city and Santiago –flight→ ?city ; the result 𝑄 (𝐺) is ?city = Arica with context {[123, 125], [276, 279]}.

will check and return an annotation reflecting the temporal validity of each answer. To derive
these answers, we first require applying a conjunction of annotations on compatible flight and
city edges, applying the meet operator to compute the annotation for which both edges hold. The
natural way to define meet in our scenario is as the intersection of sets of days, where, for example,
applying meet on the event annotation {[150, 152]} and the flight annotation {[1, 120], [220, 365]}
for Punta Arenas leads to the empty time interval {}, which may thus lead to the city being filtered
from the results (depending on the query evaluation semantics). However, for Arica , we find two
different non-empty intersections: {[123, 125]} for EID16 and {[276, 279]} for EID17 . Given that we
are interested in the city (a projected variable), rather than the event, we can thus combine these
two annotations for Arica using the join operator, returning the annotation in which either result
holds true. In our scenario, the natural way to define join is as the union of the sets of days, giving
{[123, 125], [276, 279]}. We provide formal definitions in Appendix B.4.1 based on the general
framework proposed by Zimmermann et al. [583] for annotations on graphs.
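The meet and join of this simplified temporal domain can be sketched directly, reproducing the example of Figure 20 with sets of days, intersection as meet, and union as join:

```python
# Sketch: evaluating the annotated query of Figure 20, where annotations
# are sets of days (1-365), meet is intersection, and join is union.
def days(*intervals):
    """Expand interval pairs like (123, 130) into a set of days."""
    return {d for lo, hi in intervals for d in range(lo, hi + 1)}

city_edges = {  # event -> (city, annotation)
    "EID16": ("Arica", days((123, 130))),
    "EID17": ("Arica", days((276, 279))),
    "EID18": ("Punta Arenas", days((150, 152))),
}
flight_edges = {  # city -> annotation of the flight from Santiago
    "Arica": days((1, 125), (200, 365)),
    "Punta Arenas": days((1, 120), (220, 365)),
}

results = {}
for event, (city, ann) in city_edges.items():
    meet = ann & flight_edges[city]      # meet: both edges must hold
    if meet:                             # empty context: answer filtered out
        results[city] = results.get(city, set()) | meet  # join over answers
```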

3.3.5 Other contextual frameworks. Other frameworks have been proposed for modelling and rea-
soning about context in graphs. A notable example is that of contextual knowledge repositories [252],
which allow for assigning individual (sub-)graphs to their own context. Unlike in the case of named
graphs, context is explicitly modelled along one or more dimensions, where each (sub-)graph must
take a value for each dimension. Each dimension is further associated with a partial order over its
values – e.g., 2020-03-22 ⪯ 2020-03 ⪯ 2020 – allowing to select and combine sub-graphs that are valid
within contexts at different levels of granularity. Schuetz et al. [467] similarly propose a form of
contextual OnLine Analytic Processing (OLAP), based on a data cube formed by dimensions where
individual cells contain knowledge graphs. Operations such as “slice-and-dice” (selecting knowledge
according to given dimensions), as well as “roll-up” (aggregating knowledge at a higher level) can
then be supported. We refer the reader to the respective papers for more details [252, 467].

4 DEDUCTIVE KNOWLEDGE
As humans, we can deduce more from the data graph of Figure 1 than what the edges explicitly
indicate. We may deduce, for example, that the Ñam festival ( EID15 ) will be located in Santiago,
even though the graph does not contain an edge EID15 location Santiago . We may further deduce
that the cities connected by flights must have some airport nearby, even though the graph does
not contain nodes referring to these airports. In these cases, given the data as premises, and some
general rules about the world that we may know a priori, we can use a deductive process to derive
new data, allowing us to know more than what is explicitly given by the data. These types of general
premises and rules, when shared by many people, form part of “commonsense knowledge” [344];
conversely, when rather shared by a few experts in an area, they form part of “domain knowledge”,

Fig. 21. Graph pattern querying for names of festivals in Santiago: 𝑄 matches ?festival –type→ Festival , ?festival –location→ Santiago , and ?festival –name→ ?name .

where, for example, an expert in biology may know that hemocyanin is a protein containing copper
that carries oxygen in the blood of some species of Mollusca and Arthropoda.
Machines, in contrast, do not have a priori access to such deductive faculties; rather they need to
be given formal instructions, in terms of premises and entailment regimes, in order to make similar
deductions to what a human can make. These entailment regimes formalise the conclusions that
logically follow as a consequence of a given set of premises. Once instructed in this manner, machines
can (often) apply deductions with a precision, efficiency, and scale beyond human performance.
These deductions may serve a range of applications, such as improving query answering, (deductive)
classification, finding inconsistencies, etc. As a concrete example involving query answering, assume
we are interested in knowing the festivals located in Santiago; we may straightforwardly express
such a query as per the graph pattern shown in Figure 21. This query returns no results for the
graph in Figure 1: there is no node named Festival , and nothing has (directly) the location Santiago .
However, an answer ( Ñam ) could be automatically entailed were we to state that 𝑥 being a Food
Festival entails that 𝑥 is a Festival, or that 𝑥 having venue 𝑦 in city 𝑧 entails that 𝑥 has location
𝑧. How, then, should such entailments be captured? In Section 3.1.1 we already discussed how
the former entailment can be captured with sub-class relations in a semantic schema; the second
entailment, however, requires a more expressive entailment regime than seen thus far.
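As a preview, the two entailments above can be sketched as simple rules applied to a fixpoint over a set of triples, with abbreviated node names standing in for the IRIs of the data graph:

```python
# Sketch: applying the two entailments discussed above as rules over a
# simplified triple set, so that the query of Figure 21 would find Ñam.
graph = {
    ("Ñam", "type", "Food Festival"),
    ("Ñam", "venue", "Santa Lucía"),
    ("Santa Lucía", "city", "Santiago"),
    ("Food Festival", "subclass of", "Festival"),
}

def entail(g):
    """Compute a fixpoint of the two rules over the set of triples g."""
    g = set(g)
    while True:
        new = set()
        for s, p, o in g:
            if p == "type":
                # sub-class rule: x type c, c subclass of d => x type d
                for c, p2, d in g:
                    if p2 == "subclass of" and c == o:
                        new.add((s, "type", d))
            elif p == "venue":
                # venue/city rule: x venue y, y city z => x location z
                for v, p2, z in g:
                    if p2 == "city" and v == o:
                        new.add((s, "location", z))
        if new <= g:
            return g
        g |= new

closed = entail(graph)
```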
In this section, we discuss ways in which more complex entailments can be expressed and
automated. Though we could leverage a number of logical frameworks for these purposes – such as
First-Order Logic, Datalog, Prolog, Answer Set Programming, etc. – we focus on ontologies, which
constitute a formal representation of knowledge that, importantly for us, can be represented as a
graph. We then discuss how these ontologies can be formally defined, how they relate to existing
logical frameworks, and how reasoning can be conducted with respect to such ontologies.

4.1 Ontologies
To enable entailment, we must be precise about what the terms we use mean. Returning to Figure 1,
for example, and examining the node EID16 more closely, we may begin to question how it is
modelled, particularly in comparison with EID15 . Both nodes – according to the class hierarchy
of Figure 11 – are considered to be events. But what if, for example, we wish to define two pairs
of start and end dates for EID16 corresponding to the different venues? Should we rather consider
what takes place in each venue as a different event? What then if an event has various start and end
dates in a single venue: would these also be considered as one (recurring) event, or many events?
These questions are facets of a more general question: what precisely do we mean by an “event”?
Does it happen in one contiguous time interval or can it happen many times? Does it happen in
one place or can it happen in multiple? There are no “correct” answers to such questions – we may
understand the term “event” in a variety of ways, and thus the answers are a matter of convention.
In the context of computing, an ontology 11 is then a concrete, formal representation of what
terms mean within the scope in which they are used (e.g., a given domain). For example, one event
ontology may formally define that if an entity is an “event”, then it has precisely one venue and
11 The term stems from the philosophical study of ontology, concerned with the different kinds of entities that exist, the
nature of their existence, what kinds of properties they have, and how they may be identified and categorised.

precisely one time instant in which it begins. Conversely, a different event ontology may define
that an “event” can have multiple venues and multiple start times, etc. Each such ontology formally
captures a particular perspective – a particular convention. Under the first ontology, for example,
we could not call the Olympics an “event”, while under the second ontology we could. Likewise
ontologies can guide how graph data are modelled. Under the first ontology we may split EID16 into
two events. Under the second, we may elect to keep EID16 as one event with two venues. Ultimately,
given that ontologies are formal representations, they can be used to automate entailment.
Like all conventions, the usefulness of an ontology depends on the level of agreement on what
that ontology defines, how detailed it is, and how broadly and consistently it is adopted. Adoption
of an ontology by the parties involved in one knowledge graph may lead to a consistent use of
terms and consistent modelling in that knowledge graph. Agreement over multiple knowledge
graphs will, in turn, enhance the interoperability of those knowledge graphs.
Amongst the most popular ontology languages used in practice are the Web Ontology Language
(OWL) [239]12 , recommended by the W3C and compatible with RDF graphs; and the Open Biomedical
Ontologies Format (OBOF) [366], used mostly in the biomedical domain. Since OWL is the more
widely adopted, we focus on its features, though many similar features are found in both [366].
Before introducing such features, however, we must discuss how graphs are to be interpreted.
4.1.1 Interpretations. We as humans may interpret the node Santiago in the data graph of Figure 1
as referring to the real-world city that is the capital of Chile. We may further interpret an edge
Arica flight Santiago as stating that there are flights from the city of Arica to this city. We thus
interpret the data graph as another graph – what we here call the domain graph – composed of
real-world entities connected by real-world relations. The process of interpretation, here, involves
mapping the nodes and edges in the data graph to nodes and edges of the domain graph.
Along these lines, we can abstractly define an interpretation of a data graph as being composed
of two elements: a domain graph, and a mapping from the terms (nodes and edge-labels) of the data
graph to those of the domain graph. The domain graph follows the same model as the data graph;
for example, if the data graph is a directed edge-labelled graph, then so too will be the domain
graph. For simplicity, we will speak of directed edge-labelled graphs and refer to the nodes of the
domain graph as entities, and the edges of the domain graph as relations. Given a data graph and an
interpretation, while we denote nodes in the data graph by Santiago , we will denote the entity it refers
to in the domain graph by Santiago (per the mapping of the given interpretation). Likewise, while
we denote an edge by Arica flight Santiago , we will denote the relation by Arica flight Santiago
(again, per the mapping of the given interpretation). In this abstract notion of an interpretation, we
do not require that Santiago nor Arica be the real-world cities, nor even that the domain graph contain
real-world entities and relations: an interpretation can have any domain graph and mapping.
Why is such an abstract notion of interpretation useful? The distinction between nodes/edges
and entities/relations becomes important when we define the meaning of ontology features and
entailment. To illustrate this distinction, if we ask whether there is an edge labelled flight between
Arica and Viña del Mar for the data graph in Figure 1, the answer is no. However, if we ask if the
entities Arica and Viña del Mar are connected by the relation flight, then the answer depends on what
assumptions we make when interpreting the graph. Under the Closed World Assumption (CWA), if
we do not have additional knowledge, then the answer is a definite no – since what is not known is
assumed to be false. Conversely, under the Open World Assumption (OWA), we cannot be certain
that this relation does not exist as this could be part of some knowledge not (yet) described by
the graph. Likewise under the Unique Name Assumption (UNA), the data graph describes at least

12 We could include RDF Schema (RDFS) in this list, but it is largely subsumed by OWL, which builds upon its core.

two flights to Santiago (since Viña del Mar and Arica are assumed to be different entities and
therefore, Arica –flight→ Santiago and Viña del Mar –flight→ Santiago must be different edges). Conversely, under
No Unique Name Assumption (NUNA), we can only say that there is at least one such flight since
Viña del Mar and Arica may be the same entity with two “names”.
These assumptions (or lack thereof) define which interpretations are valid, and which interpreta-
tions satisfy which data graphs. The UNA forbids interpretations that map two data terms to the
same domain term. The NUNA allows such interpretations. Under CWA, an interpretation that
contains an edge x p y in its domain graph can only satisfy a data graph from which we can
entail x p y . Under OWA, an interpretation containing the edge x p y can satisfy a data graph
not entailing x p y so long it does not contradict that edge.13 In the case of OWL, the NUNA and
OWA are adopted, thus representing the most general case, whereby multiple nodes/edge-labels in
the graph may refer to the same entity/relation-type (NUNA), and where anything not entailed by
the data graph is not assumed to be false as a consequence (OWA).
Beyond our base assumptions, we can associate certain patterns in the data graph with semantic
conditions that define which interpretations satisfy it; for example, we can add a semantic condition
to enforce that if our data graph contains the edge p subp. of q , then any edge x p y in the
domain graph of the interpretation must also have a corresponding edge x q y to satisfy the data
graph. These semantic conditions then form the features of an ontology language. In what follows,
to aid readability, we will introduce the features of OWL using an abstract graphical notation
with abbreviated terms. For details of concrete syntaxes, we rather refer to the OWL and OBOF
standards [239, 366]. Likewise we present semantic conditions for interpretations associated with
each feature in the same graphical format;14 further details of these conditions will be described
later in Section 4.2, with formal definitions rather provided in Appendix B.5.
4.1.2 Individuals. In Table 3, we list the main features supported by OWL for describing indi-
viduals (e.g., Santiago, EID16), sometimes distinguished from classes and properties. First, we
can assert (binary) relations between individuals using edges such as Santa Lucía city Santiago . In
the condition column, when we write 𝑥 𝑦 𝑧 , for example, we refer to the condition that the
given relation holds in the interpretation; if so, the interpretation satisfies the axiom. OWL further
allows for defining relations to explicitly state that two terms refer to the same entity, where, e.g.,
Región V same as Región de Valparaíso states that both refer to the same region (per Section 3.2); or that
two terms refer to different entities, where, e.g., Valparaíso diff. from Región de Valparaíso distinguishes
the city from the region of the same name. We may also state that a relation does not hold using
negation, which can be serialised as a graph using a form of reification (see Figure 18a).
4.1.3 Properties. In Section 3.1.1, we already discussed how subproperties, domains and ranges
may be defined for properties. OWL allows such definitions, and further includes other features,
as listed in Table 4. We may define a pair of properties to be equivalent, inverses, or disjoint. We
can further define a particular property to denote a transitive, symmetric, asymmetric, reflexive, or
irreflexive relation. We can also define the multiplicity of the relation denoted by properties, based
on being functional (many-to-one) or inverse-functional (one-to-many). We may further define a
key for a class, denoting the set of properties whose values uniquely identify the entities of that
class. Without adopting a Unique Name Assumption (UNA), from these latter three features we
may conclude that two or more terms refer to the same entity. Finally, we can relate a property to
a chain (a path expression only allowing concatenation of properties) such that pairs of entities
13 Variations of the CWA can provide a middle ground between a completely open world that makes no assumption about
completeness, falsehood of unknown statements, or unicity of names. One example of such a variation is the Local Closed
World Assumption, already mentioned in Section 3.1.1.
14 We use “iff” as an abbreviation for “if and only if” whereby “𝜙 iff 𝜓 ” can be read as “if 𝜙 then 𝜓 ” and “if 𝜓 then 𝜙”.
Table 3. Ontology features for individuals

Feature | Axiom | Condition | Example
Assertion | x –y→ z | x –y→ z | Chile –capital→ Santiago
Negation | n –type→ Negation, n –sub→ x, n –pre→ y, n –obj→ z | not x –y→ z | n –type→ Negation, n –sub→ Chile, n –pre→ capital, n –obj→ Arica
Same As | x1 –same as→ x2 | x1 = x2 | Región V –same as→ Región de Valparaíso
Different From | x1 –diff. from→ x2 | x1 ≠ x2 | Valparaíso –diff. from→ Región de Valparaíso
Table 4. Ontology features for property axioms

Feature | Axiom | Condition (for all x*, y*, z*) | Example
Subproperty | p –subp. of→ q | x –p→ y implies x –q→ y | venue –subp. of→ location
Domain | p –domain→ c | x –p→ y implies x –type→ c | venue –domain→ Event
Range | p –range→ c | x –p→ y implies y –type→ c | venue –range→ Venue
Equivalence | p –equiv. p.→ q | x –p→ y iff x –q→ y | start –equiv. p.→ begins
Inverse | p –inv. of→ q | x –p→ y iff y –q→ x | venue –inv. of→ hosts
Disjoint | p –disj. p.→ q | not (x –p→ y and x –q→ y) | venue –disj. p.→ hosts
Transitive | p –type→ Transitive | x –p→ y –p→ z implies x –p→ z | part of –type→ Transitive
Symmetric | p –type→ Symmetric | x –p→ y iff y –p→ x | nearby –type→ Symmetric
Asymmetric | p –type→ Asymmetric | not (x –p→ y and y –p→ x) | capital –type→ Asymmetric
Reflexive | p –type→ Reflexive | x –p→ x | part of –type→ Reflexive
Irreflexive | p –type→ Irreflexive | not x –p→ x | flight –type→ Irreflexive
Functional | p –type→ Functional | x –p→ y1 and x –p→ y2 implies y1 = y2 | population –type→ Functional
Inv. Functional | p –type→ Inv. Functional | x1 –p→ y and x2 –p→ y implies x1 = x2 | capital –type→ Inv. Functional
Key | c –key→ (p1, ..., pn) | x1 –type→ c, x2 –type→ c, x1 –p1→ y1, x2 –p1→ y1, ..., x1 –pn→ yn, x2 –pn→ yn implies x1 = x2 | City –key→ (lat, long)
Chain | p –chain→ (q1, ..., qn) | x –q1→ y1 ... yn−1 –qn→ z implies x –p→ z | location –chain→ (location, part of)
related by the chain are also related by the given property. Note that the latter two features in
Table 4 (Key and Chain) require representing a list of terms; while such a list may be
serialised as a graph in a number of concrete ways, OWL uses RDF lists (see Figure 17).
4.1.4 Classes. In Section 3.1.1, we discussed how class hierarchies can be modelled using a sub-class
relation. OWL supports sub-classes, and many additional features, for defining and making claims
about classes; these additional features are summarised in Table 5. Given a pair of classes, OWL
allows for defining that they are equivalent, or disjoint. Thereafter, OWL provides a variety of
features for defining novel classes by applying set operators on other classes, or based on conditions
that the properties of its instances satisfy. First, using set operators, one can define a novel class
as the complement of another class, the union or intersection of a list (of arbitrary length) of other
classes, or as an enumeration of all of its instances. Second, by placing restrictions on a particular
property 𝑝, one can define classes whose instances are all of the entities that have: some value from
a given class on 𝑝; all values from a given class on 𝑝;15 have a specific individual as a value on 𝑝 (has
value); have themselves as a reflexive value on 𝑝 (has self ); have at least, at most or exactly some
number of values on 𝑝 (cardinality); and have at least, at most or exactly some number of values on
𝑝 from a given class (qualified cardinality). For the latter two cases, in Table 5, we use the notation
“#{ a | 𝜙 }” to count distinct entities satisfying 𝜙 in the interpretation. These features can then be
combined to create more complex classes, where combining the examples for Intersection and
Has Self in Table 5 gives the definition: self-driving taxis are taxis having themselves as a driver.
4.1.5 Other features. OWL supports other language features not previously discussed, including:
annotation properties, which provide metadata about ontologies, such as versioning info; datatype
vs. object properties, which distinguish properties that take datatype values from those that do not;
and datatype facets, which allow for defining new datatypes by applying restrictions to existing
datatypes, such as to define that places in Chile must have a float between -66.0 and -110.0 as their
value for the (datatype) property longitude. For more details we refer to the OWL 2 standard [239].
We will further discuss methodologies for the creation of ontologies in Section 6.5.
4.2 Semantics and Entailment
The conditions listed in the previous tables indicate how each feature should be interpreted. These
conditions give rise to entailments, where, for example, in reference to the Symmetric feature
of Table 4, the definition nearby –type→ Symmetric and the edge Santiago –nearby→ Santiago Airport entail
the edge Santiago Airport –nearby→ Santiago according to the condition given for that feature. We now
describe how these conditions lead to entailments.
4.2.1 Model-theoretic semantics. Each axiom described by the previous tables, when added to a
graph, enforces some condition(s) on the interpretations that satisfy the graph. The interpretations
that satisfy a graph are called models of the graph. Were we to consider only the base condition of the
Assertion feature in Table 3, for example, then the models of a graph would be any interpretation
such that for every edge x –y→ z in the graph, there exists a relation x –y→ z in the model. Given
that there may be other relations in the model (under the OWA), the number of models of any such
graph is infinite. Furthermore, given that we can map multiple nodes in the graph to one entity
in the model (under the NUNA), any interpretation with (for example) the relation a –a→ a is
a model of any graph so long as for every edge x –y→ z in the graph, it holds that x = y = z
= a in the interpretation (in other words, the interpretation maps everything to a). As we add
axioms with their associated conditions to the graph, we restrict models for the graph; for example,
considering a graph with two edges – x –y→ z and y –type→ Irreflexive – the interpretation with
a –a→ a, x = y = ... = a is no longer a model as it breaks the condition for the irreflexive axiom.
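To make such model-level conditions concrete, the following minimal Python sketch (our own illustration, not part of the OWL standard) treats a graph as a set of (subject, predicate, object) triples and reports edges that violate the conditions for properties declared Irreflexive or Asymmetric; if any are reported, the graph can have no model:

```python
# A graph as a set of (subject, predicate, object) triples; axiom names
# follow Table 4 (the helper function is our own, for illustration).
graph = {
    ("x", "y", "z"),
    ("y", "type", "Irreflexive"),
}

def inconsistencies(graph):
    """Return edges that violate Irreflexive/Asymmetric property axioms;
    a graph with such violations has no model."""
    irreflexive = {s for (s, p, o) in graph if p == "type" and o == "Irreflexive"}
    asymmetric = {s for (s, p, o) in graph if p == "type" and o == "Asymmetric"}
    violations = []
    for (s, p, o) in graph:
        if p in irreflexive and s == o:               # no x -p-> x allowed
            violations.append((s, p, o))
        if p in asymmetric and s != o and (o, p, s) in graph:
            violations.append((s, p, o))              # no symmetric pair allowed
    return violations

# The graph above still has models (e.g., mapping x and z to distinct entities):
assert inconsistencies(graph) == []

# Mapping everything to one entity would add a reflexive edge, which the
# Irreflexive axiom forbids; a graph containing such an edge has no model:
assert inconsistencies(graph | {("a", "y", "a")}) == [("a", "y", "a")]
```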

15 While something like DomesticAirport –prop→ flight, DomesticAirport –all→ NationalFlight might appear to be a more natural example
for All Values, this would be a modelling mistake, as the corresponding for all condition is satisfied when no such node exists.
In other words, with this example definition, we could infer anything known not to have any flights to be a domestic airport.
(We could, however, define the intersection of this class and airport as being a domestic airport.)
Table 5. Ontology features for class axioms and definitions

Feature | Axiom | Condition (for all x*, y*, z*) | Example
Subclass | c –subc. of→ d | x –type→ c implies x –type→ d | City –subc. of→ Place
Equivalence | c –equiv. c.→ d | x –type→ c iff x –type→ d | Human –equiv. c.→ Person
Disjoint | c –disj. c.→ d | not (x –type→ c and x –type→ d) | City –disj. c.→ Region
Complement | c –comp.→ d | x –type→ c iff not x –type→ d | Dead –comp.→ Alive
Union | c –union→ (d1, ..., dn) | x –type→ c iff x –type→ d1 or ... or x –type→ dn | Flight –union→ (DomesticFlight, InternationalFlight)
Intersection | c –inter.→ (d1, ..., dn) | x –type→ c iff x –type→ d1 and ... and x –type→ dn | SelfDrivingTaxi –inter.→ (Taxi, SelfDriving)
Enumeration | c –one of→ (x1, ..., xn) | x –type→ c iff x ∈ { x1, ..., xn } | EUState –one of→ (Austria, ..., Sweden)
Some Values | c –prop→ p, c –some→ d | x –type→ c iff there exists a such that x –p→ a –type→ d | EUCitizen –prop→ nationality, –some→ EUState
All Values | c –prop→ p, c –all→ d | x –type→ c iff for all a with x –p→ a it holds that a –type→ d | Weightless –prop→ has part, –all→ Weightless
Has Value | c –prop→ p, c –value→ y | x –type→ c iff x –p→ y | ChileanCitizen –prop→ nationality, –value→ Chile
Has Self | c –prop→ p, c –self→ true | x –type→ c iff x –p→ x | SelfDriving –prop→ driver, –self→ true
Cardinality | c –prop→ p, c –★→ n (★ ∈ {=, ≤, ≥}) | x –type→ c iff #{ a | x –p→ a } ★ n | Polyglot –prop→ fluent, –≥→ 2
Qualified Cardinality | c –prop→ p, c –class→ d, c –★→ n (★ ∈ {=, ≤, ≥}) | x –type→ c iff #{ a | x –p→ a –type→ d } ★ n | BinaryStarSystem –prop→ body, –class→ Star, –=→ 2
4.2.2 Entailment. We say that one graph entails another if and only if any model of the former
graph is also a model of the latter graph. Intuitively this means that the latter graph says nothing
new over the former graph and thus holds as a logical consequence of the former graph. For example,
consider the graph Santiago –type→ City –subc. of→ Place and the graph Santiago –type→ Place . All models
of the latter must have that Santiago –type→ Place , but so must all models of the former, which must
have Santiago –type→ City –subc. of→ Place and further must satisfy the condition for Subclass, which
requires that Santiago –type→ Place also hold. Hence we conclude that any model of the former graph
must be a model of the latter graph, or, in other words, the former graph entails the latter graph.
4.2.3 If–then vs. if-and-only-if semantics. Consider the graph nearby –type→ Symmetric and the graph
nearby –inv. of→ nearby . They result in the same semantic conditions being applied in the domain
graph, but does one entail the other? The answer depends on the semantics applied. Considering
the axioms and conditions of Tables 3–5, we can consider two semantics. Under if–then semantics –
if Axiom matches data graph then Condition holds in domain graph – the graphs do not entail
each other: though both graphs give rise to the same condition, this condition is not translated back
into the axioms that describe it.16 Conversely, under if-and-only-if semantics – Axiom matches
data graph if-and-only-if Condition holds in domain graph – the graphs entail each other: both
graphs give rise to the same condition, which is translated back into all possible axioms that describe
it. Hence if-and-only-if semantics allows for entailing more axioms in the ontology language than
if–then semantics. OWL generally applies an if-and-only-if semantics [239].
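The claim that both axioms give rise to the same condition can be illustrated with a small Python sketch (hypothetical helper functions of our own) that applies each condition as a one-step closure over a graph of triples:

```python
def apply_symmetric(graph, p):
    """One-step closure for the condition of 'p type Symmetric':
    x -p-> y iff y -p-> x."""
    return graph | {(o, p, s) for (s, p2, o) in graph if p2 == p}

def apply_inverse(graph, p, q):
    """One-step closure for the condition of 'p inv. of q':
    x -p-> y iff y -q-> x."""
    extra = {(o, q, s) for (s, p2, o) in graph if p2 == p}
    extra |= {(o, p, s) for (s, q2, o) in graph if q2 == q}
    return graph | extra

g = {("Santiago", "nearby", "Santiago Airport")}

# "nearby type Symmetric" and "nearby inv. of nearby" impose the same
# condition, so they entail the same data edges:
assert apply_symmetric(g, "nearby") == apply_inverse(g, "nearby", "nearby")
assert ("Santiago Airport", "nearby", "Santiago") in apply_symmetric(g, "nearby")
```

Under if–then semantics the two axioms nonetheless remain distinct at the axiom level: the shared condition is never translated back into the other axiom.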

4.3 Reasoning
Unfortunately, given two graphs, deciding if the first entails the second – per the notion of entailment
we have defined and for all of the ontological features listed in Tables 3–5 – is undecidable: no
(finite) algorithm for such entailment can exist that halts on all inputs with the correct true/false
answer [240]. However, we can provide practical reasoning algorithms for ontologies that (1) halt
on any input ontology but may miss entailments, returning false instead of true, (2) always halt
with the correct answer but only accept input ontologies with restricted features, or (3) only return
correct answers for any input ontology but may never halt on certain inputs. Though option (3) has
been explored using, e.g., theorem provers for First Order Logic [466], options (1) and (2) are more
commonly pursued using rules and/or Description Logics. Option (1) generally allows for more
efficient and scalable reasoning algorithms and is useful where data are incomplete and having
some entailments is valuable. Option (2) may be a better choice in domains – such as medical
ontologies – where missing entailments may have undesirable outcomes.
4.3.1 Rules. One of the most straightforward ways to provide automated access to deductive
knowledge is through inference rules (or simply rules) encoding if–then-style consequences. A rule
is composed of a body (if) and a head (then). Both the body and head are given as graph patterns.
A rule indicates that if we can replace the variables of the body with terms from the data graph and
form a subgraph of a given data graph, then using the same replacement of variables in the head
will yield a valid entailment. The head must typically use a subset of the variables appearing in the
body to ensure that the conclusion leaves no variables unreplaced. Rules of this form correspond to
(positive) Datalog [85] in databases, Horn clauses [323] in logic programming, etc.
Rules can be used to capture entailments under ontological conditions. In Table 6, we list
some example rules for sub-class, sub-property, domain and range features [368]; these rules
may be considered incomplete, not capturing, for example, that every class is a sub-class of itself,
that every property is a sub-property of itself, etc. A more comprehensive set of rules for the
OWL features of Tables 3–5 have been defined as OWL 2 RL/RDF [363]; these rules are likewise
incomplete as such rules cannot fully capture negation (e.g., Complement), existentials (e.g., Some
Values), universals (e.g., All Values), or counting (e.g., Cardinality and Qualified Cardinality).
Other rule languages have, however, been proposed to support additional such features, including
existentials (see, e.g., Datalog± [36]), disjunction (see, e.g., Disjunctive Datalog [449]), etc.
Rules can be leveraged for reasoning in a number of ways. Materialisation refers to the idea of
applying rules recursively to a graph, adding the conclusions generated back to the graph until a
fixpoint is reached and nothing more can be added. The materialised graph can then be treated as
any other graph. Although the efficiency and scalability of materialisation can be enhanced through
optimisations like Rete networks [164], or using distributed frameworks like MapReduce [531],
depending on the rules and the data, the materialised graph may become unfeasibly large to manage.
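A naive materialisation procedure for the rules of Table 6 can be sketched as follows in Python (a simplified illustration of our own; production reasoners use optimisations such as Rete networks rather than this brute-force fixpoint loop):

```python
def materialise(graph):
    """Naively apply the rules of Table 6 to a set of (s, p, o) triples
    until a fixpoint is reached (nothing new can be derived)."""
    graph = set(graph)
    while True:
        derived = set()
        for (s, p, o) in graph:
            if p == "type":       # Subclass (I)
                derived |= {(s, "type", d) for (c, r, d) in graph
                            if r == "subc. of" and c == o}
            if p == "subc. of":   # Subclass (II)
                derived |= {(s, "subc. of", e) for (c, r, e) in graph
                            if r == "subc. of" and c == o}
            if p == "subp. of":   # Subproperty (II)
                derived |= {(s, "subp. of", r2) for (q, r, r2) in graph
                            if r == "subp. of" and q == o}
            # Subproperty (I), Domain and Range fire on any edge with label p:
            derived |= {(s, q, o) for (p2, r, q) in graph
                        if r == "subp. of" and p2 == p}
            derived |= {(s, "type", c) for (p2, r, c) in graph
                        if r == "domain" and p2 == p}
            derived |= {(o, "type", c) for (p2, r, c) in graph
                        if r == "range" and p2 == p}
        if derived <= graph:      # fixpoint: nothing new to add
            return graph
        graph |= derived

# Assumed sample data for illustration:
data = {
    ("Santiago", "type", "City"), ("City", "subc. of", "Place"),
    ("EID15", "venue", "Santa Lucía"), ("venue", "subp. of", "location"),
    ("venue", "domain", "Event"),
}
closed = materialise(data)
assert ("Santiago", "type", "Place") in closed
assert ("EID15", "location", "Santa Lucía") in closed
assert ("EID15", "type", "Event") in closed
```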
16 Observe that nearby –type→ Symmetric is a model of the first graph but not the second, while nearby –inv. of→ nearby
is a model of the second graph but not the first. Hence neither graph entails the other.
𝑂: location –chain→ (venue, city) ; Food Festival –subc. of→ Festival ; Drinks Festival –subc. of→ Festival

𝑂 (𝑄) : ( ?festival –type→ Festival ∪ ?festival –type→ Food Festival ∪ ?festival –type→ Drinks Festival )
        ⋈ ( ?festival –location→ Santiago ∪ ?festival –venue→ ?x –city→ Santiago )
        ⋈ ?festival –name→ ?name

Fig. 22. Query rewriting example for the query 𝑄 of Figure 21
Table 6. Example rules for sub-class, sub-property, domain, and range features

Feature | Body | Head
Subclass (I) | ?x –type→ ?c –subc. of→ ?d | ?x –type→ ?d
Subclass (II) | ?c –subc. of→ ?d –subc. of→ ?e | ?c –subc. of→ ?e
Subproperty (I) | ?x –?p→ ?y , ?p –subp. of→ ?q | ?x –?q→ ?y
Subproperty (II) | ?p –subp. of→ ?q –subp. of→ ?r | ?p –subp. of→ ?r
Domain | ?x –?p→ ?y , ?p –domain→ ?c | ?x –type→ ?c
Range | ?x –?p→ ?y , ?p –range→ ?c | ?y –type→ ?c
Another strategy is to use rules for query rewriting, which, given a query, will automatically extend
the query in order to find solutions entailed by a set of rules; for example, taking the schema graph
in Figure 12 and the rules in Table 6, the (sub-)pattern ?x –type→ Event in a given input query would
be rewritten to the following disjunctive pattern evaluated on the original graph:
?x –type→ Event ∪ ?x –type→ Festival ∪ ?x –type→ Periodic Market ∪ ?x –venue→ ?y

Figure 22 provides a more complete example of an ontology that is used to rewrite the query of
Figure 21; if evaluated over the graph of Figure 1, Ñam will be returned as a solution. However,
not all of the aforementioned features of OWL can be supported in this manner. The OWL 2 QL
profile [363] is a subset of OWL designed specifically for query rewriting of this form [21].
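The sub-class part of such a rewriting can be sketched in Python as follows (a minimal illustration over an assumed schema; real rewriters, e.g., for OWL 2 QL, support many more features):

```python
# Schema: direct sub-class edges (child, parent), assumed for illustration.
subclass = {
    ("Festival", "Event"),
    ("Food Festival", "Festival"),
    ("Drinks Festival", "Festival"),
    ("Periodic Market", "Event"),
}

def rewrite_type_pattern(var, cls):
    """Rewrite a pattern ?var type cls into a union (disjunction) of
    patterns over cls and all of its (transitive) sub-classes."""
    classes, frontier = {cls}, {cls}
    while frontier:  # transitive closure over sub-class edges
        frontier = {c for (c, d) in subclass if d in frontier} - classes
        classes |= frontier
    return [(var, "type", c) for c in sorted(classes)]

union = rewrite_type_pattern("?festival", "Festival")
assert union == [("?festival", "type", "Drinks Festival"),
                 ("?festival", "type", "Festival"),
                 ("?festival", "type", "Food Festival")]
```

Evaluating the resulting union over the original (unmaterialised) graph then returns the entailed solutions.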
While rules can be used to (partially) capture ontological entailments, they can also be defined
independently of an ontology language, capturing entailments for a given domain. In fact, some
rules – such as the following – cannot be captured by the ontology features previously seen, as they
do not support ways to infer relations from cyclical graph patterns (for computability reasons):
?x –flight→ ?y , ?x –country→ ?z , ?y –country→ ?z ⇒ ?x –domestic flight→ ?y
Various languages allow for expressing rules over graphs – independently or alongside of an
ontology language – including: Notation3 (N3) [42], Rule Interchange Format (RIF) [288], Semantic
Web Rule Language (SWRL) [254], and SPARQL Inferencing Notation (SPIN) [295].
4.3.2 Description Logics. Description Logics (DLs) were initially introduced as a way to formalise
the meaning of frames [355] and semantic networks [426]. Considering that semantic networks
are an early version of knowledge graphs, and the fact that DLs have heavily influenced the Web
Ontology Language, DLs thus hold an important place in the logical formalisation of knowledge
graphs. DLs form a family of logics rather than a particular logic. Initially, DLs were restricted
fragments of First Order Logic (FOL) that permit decidable reasoning tasks, such as entailment
checking [23]. Different DLs strike different balances between expressive power and computational
complexity of reasoning. DLs would later be extended with features that go beyond FOL but are
useful in the context of modelling graph data, such as transitive closure, datatypes, etc.
Description Logics are based on three types of elements: individuals, such as Santiago; classes
(aka concepts) such as City; and properties (aka roles) such as flight. DLs then allow for making
claims, known as axioms, about these elements. Assertional axioms can be either unary class
relations on individuals, such as City(Santiago), or binary property relations on individuals,
such as flight(Santiago,Arica). Such axioms form the Assertional Box (A-Box). DLs further
introduce logical symbols to allow for defining class axioms (forming the Terminology Box, or
T-Box for short), and property axioms (forming the Role Box, R-Box); for example, the class axiom
City ⊑ Place states that the former class is a subclass of the latter one, while the property axiom
flight ⊑ connectsTo states that the former property is a subproperty of the latter one. DLs may
then introduce a rich set of logical symbols, not only for defining class and property axioms, but
also defining new classes based on existing terms; as an example of the latter, we can define a
class ∃nearby.Airport as the class of individuals that have some airport nearby. Noting that the
symbol ⊤ is used in DLs to denote the class of all individuals, we can then add a class axiom
∃flight.⊤ ⊑ ∃nearby.Airport to state that individuals with an outgoing flight must have some
airport nearby. Noting that the symbol ⊔ can be used in DL to define that a class is the union of
other classes, we can further define that Airport ⊑ DomesticAirport ⊔ InternationalAirport,
i.e., that an airport is either a domestic airport or an international airport (or both).
The similarities between these DL features and the OWL features previously outlined in Tables 3–5
are not coincidental: the OWL standard was heavily influenced by DLs, where, for example, the
OWL 2 DL language is a fragment of OWL restricted so that entailment becomes decidable. As an
example of a restriction, with DomesticAirport ⊑ = 1 destination ◦ country.⊤, we can define
in DL syntax that domestic airports have flights destined to precisely one country (where p ◦ q
denotes a chain of properties). However, counting chains is often disallowed in DLs to ensure
decidability. In Appendix B.5.3, we present formal definitions for DL syntax and semantics, as well
as notions of entailment. For further reading, we also refer to the textbook by Baader et al. [23].
Expressive DLs support complex entailments involving existentials, universals, counting, etc.
A common strategy for deciding such entailments is to reduce entailment to satisfiability, which
decides if an ontology is consistent or not [253].17 Thereafter methods such as tableau can be used
to check satisfiability, cautiously constructing models by completing them along similar lines to
the materialisation strategy previously described, but additionally branching models in the case of
disjunction, introducing new elements to represent existentials, etc. If any model is successfully
“completed”, the process concludes that the original definitions are satisfiable (see, e.g., [364]). Due
to their prohibitive computational complexity [363] – where for example, disjunction may lead to an
exponential number of branching possibilities – such reasoning strategies are not typically applied
in the case of large-scale data, though they may be useful when modelling complex domains.

5 INDUCTIVE KNOWLEDGE
While deductive knowledge is characterised by precise logical consequences, inductively acquiring
knowledge involves generalising patterns from a given set of input observations, which can then be
used to generate novel but potentially imprecise predictions. For example, from a large data graph
with geographical and flight information, we may observe the pattern that almost all capital cities
17 𝐺 entails 𝐺 ′ if and only if 𝐺 ∪ not(𝐺 ′ ) is not satisfiable.
Inductive Knowledge
• Numeric: Graph Analytics (unsupervised); Embeddings (self-supervised); GNNs (supervised)
• Symbolic: Rule Mining and Axiom Mining (self-supervised)

Fig. 23. Conceptual overview of popular inductive techniques for knowledge graphs in terms of
type of representation generated (Numeric/Symbolic) and type of paradigm used (Unsupervised/Self-
supervised/Supervised).

[Figure: a directed edge-labelled graph of bus and flight routes connecting Moon Valley, San Pedro,
Calama, Los Flamencos, Piedras Rojas, Arica, Iquique, Santiago, Easter Island, Puerto Montt,
Puerto Varas, Osorno Volcano, Punta Arenas, Torres del Paine and Grey Glacier.]

Fig. 24. Data graph representing transport routes in Chile

of countries have international airports serving them, and hence predict that if Santiago is a capital
city, it likely has an international airport serving it; however, the predictions drawn from this pattern
do not hold for certain, where (e.g.) Vaduz, the capital city of Liechtenstein, has no (international)
airport serving it. Hence predictions will often be associated with a level of confidence; for example,
we may say that a capital has an international airport in 187/195 of cases, offering a confidence of
0.959 for predictions made with that pattern. We then refer to knowledge acquired inductively
as inductive knowledge, which includes both the models used to encode patterns, as well as the
predictions made by those models. Though fallible, inductive knowledge can be highly valuable.
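Such a confidence score is simply the fraction of observed cases in which the pattern holds. A minimal Python sketch, over a toy graph of assumed facts:

```python
def rule_confidence(graph, body, head):
    """Confidence of an if-body-then-head pattern over a set of (s, p, o)
    facts: |entities matching body and head| / |entities matching body|."""
    body_matches = {s for (s, p, o) in graph if (p, o) == body}
    head_matches = {s for (s, p, o) in graph if (p, o) == head}
    return len(body_matches & head_matches) / len(body_matches)

# Toy graph of hypothetical facts for illustration:
graph = {
    ("Santiago", "type", "Capital"), ("Santiago", "serves", "Intl. Airport"),
    ("Vaduz", "type", "Capital"),            # no airport serves Vaduz
    ("Arica", "type", "City"), ("Arica", "serves", "Intl. Airport"),
}
conf = rule_confidence(graph,
                       body=("type", "Capital"),
                       head=("serves", "Intl. Airport"))
assert conf == 0.5  # one of the two capitals has an international airport
```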
In Figure 23 we provide an overview of the inductive techniques typically applied to knowledge
graphs. In the case of unsupervised methods, there is a rich body of work on graph analytics, which
uses well-known functions/algorithms to detect communities or clusters, find central nodes and
edges, etc., in a graph. Alternatively, knowledge graph embeddings can use self-supervision to learn
a low-dimensional numeric model of a knowledge graph that (typically) maps input edges to an
output plausibility score indicating the likelihood of the edge being true. The structure of graphs
can also be directly leveraged for supervised learning, as explored in the context of graph neural
networks. Finally, while the aforementioned techniques learn numerical models, symbolic learning
can learn symbolic models – i.e., logical formulae in the form of rules or axioms – from a graph in
a self-supervised manner. We now discuss each of the aforementioned techniques in turn.
5.1 Graph Analytics
Analytics is the process of discovering, interpreting, and communicating meaningful patterns
inherent to (typically large) data collections. Graph analytics is then the application of analytical
processes to (typically large) graph data. The nature of graphs naturally lends itself to certain
types of analytics that derive conclusions about nodes and edges based on the topology of the
graph, i.e., how the nodes of the graph are connected. Graph analytics hence draws many of its
techniques from related areas such as graph theory and network analysis, which have been used to
study graphs that represent social networks, the Web, internet routing, transportation networks,
ecosystems, protein–protein interactions, linguistic cooccurrences, and more besides [147].
Returning to the domain of our running example, the tourism board could use graph analytics
to extract knowledge about, for instance: key transport hubs that serve many tourist attractions
(centrality); groupings of attractions visited by the same tourists (community detection); attractions
that may become unreachable in the event of strikes or other route failures (connectivity), or
pairs of attractions that are similar to each other (node similarity). Given that such analytics will
require a complex, large-scale graph, for the purposes of illustration, in Figure 24 we present a more
concise example of some transportation connections in Chile directed towards popular touristic
destinations. We first introduce a selection of key techniques that can be applied for graph analytics.
We then discuss frameworks and languages that can be used to compute such analytics in practice.
Given that many traditional graph algorithms are defined for unlabelled graphs, we then describe
ways in which analytics can be applied over directed edge-labelled graphs. Finally we discuss the
potential connections between graph analytics and querying and reasoning.

5.1.1 Techniques. A wide variety of techniques can be applied for graph analytics. In the following
we will enumerate some of the main techniques – as recognised, for example, by the survey of Iosup
et al. [264] – that can be invoked in this setting.
• Centrality: aims to identify the most important (aka central) nodes or edges of a graph. Specific
node centrality measures include degree, betweenness, closeness, Eigenvector, PageRank, HITS,
Katz, among others. Betweenness centrality can also be applied to edges. A node centrality
measure would allow us, e.g., to predict the transport hubs in Figure 24, while edge centrality
would allow us to find the edges on which many shortest routes depend for predicting traffic.
• Community detection: aims to identify communities in a graph, i.e., sub-graphs that are
more densely connected internally than to the rest of the graph. Community detection
algorithms, such as minimum-cut algorithms, label propagation, Louvain modularity, etc.
enable discovering such communities. Community detection applied to Figure 24 may, for
example, detect a community to the left (referring to the north of Chile), to the right (referring
to the south of Chile), and perhaps also the centre (referring to cities with airports).
• Connectivity: aims to estimate how well-connected the graph is, revealing, for instance, the
resilience and (un)reachability of elements of the graph. Specific techniques include measuring
graph density or 𝑘-connectivity, detecting strongly connected components and weakly connected
components, computing spanning trees or minimum cuts, etc. In the context of Figure 24,
such analysis may tell us that routes to Grey Glacier, Osorno Volcano and Piedras Rojas are the most
“brittle”, becoming disconnected if one of two bus routes fail.
• Node similarity: aims to find nodes that are similar to other nodes by virtue of how they
are connected within their neighbourhood. Node similarity metrics may be computed using
structural equivalence, random walks, diffusion kernels, etc. These methods provide an
understanding of what connects nodes, and, thereafter, in what ways they are similar. In the
context of Figure 24, such analysis may tell us that Calama and Arica are similar nodes based
on both having return flights to Santiago and return buses to San Pedro.
While the previous techniques accept a graph alone as input,18 other forms of graph analytics may
further accept a node, a pair of nodes, etc., along with the graph.

18 Node similarity can be run over an entire graph to find the 𝑘 most similar nodes for each node, or can also be run
for a specific node to find its most similar nodes. There are also measures for graph similarity (based on, e.g., frequent
itemsets [334]) that accept multiple graphs as input.
• Path finding: aims to find paths in a graph, typically between pairs of nodes given as input.
Various technical definitions exist that restrict the set of valid paths between such nodes,
including simple paths that do not visit the same node twice, shortest paths that visit the
fewest number of edges, or – as previously discussed in Section 2.2 – regular path queries that
restrict the labels of edges that can be traversed by the path [16]. We could use such algorithms
to find, for example, the shortest path(s) in Figure 24 from Torres del Paine to Moon Valley.
Most such techniques have been proposed and studied for simple graphs or directed graphs without
edge labels. We will discuss their application to more complex graph models – and how they can
be combined with other techniques such as reasoning and querying – later in Section 5.1.3.
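As an illustration of path finding, an unweighted shortest path can be computed with a standard breadth-first search; the Python sketch below runs over a small assumed fragment of the routes in Figure 24 (the edge list is abridged and hypothetical):

```python
from collections import deque

# An assumed fragment of the transport graph, as labelled triples:
edges = {
    ("Torres del Paine", "bus", "Punta Arenas"),
    ("Punta Arenas", "flight", "Santiago"),
    ("Santiago", "flight", "Calama"),
    ("Calama", "bus", "San Pedro"),
    ("San Pedro", "bus", "Moon Valley"),
}

def shortest_path(edges, source, target):
    """Breadth-first search returning a path with the fewest edges,
    or None if the target is unreachable."""
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for (s, _label, o) in edges:
            if s == path[-1] and o not in seen:
                seen.add(o)
                queue.append(path + [o])
    return None

path = shortest_path(edges, "Torres del Paine", "Moon Valley")
assert path == ["Torres del Paine", "Punta Arenas", "Santiago",
                "Calama", "San Pedro", "Moon Valley"]
```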

5.1.2 Frameworks. Various frameworks have been proposed for large-scale graph analytics, often
in a distributed (cluster) setting. Amongst these we can mention Apache Spark (GraphX) [119, 563],
GraphLab [326], Pregel [335], Signal–Collect [503], Shark [564], etc. These graph parallel frameworks
apply a systolic abstraction [304] based on a directed graph, where nodes are processors that can
send messages to other nodes along edges. Computation is then iterative, where in each iteration,
each node reads messages received through inward edges (and possibly its own previous state),
performs a computation, and then sends messages through outward edges based on the result.
These frameworks then define the systolic computational abstraction on top of the data graph
being processed: nodes and edges in the data graph become nodes and edges in the systolic graph.
We refer to Appendix B.6.1 for more formal details on graph parallel frameworks.
To take an example, assume we wish to compute the places that are most (or least) easily reached
by the routes shown in the graph of Figure 24. A good way to measure this is using centrality,
where we choose PageRank [391], which computes the probability of a tourist randomly following
the routes shown in the graph being at a particular place after a given number of “hops”. We can
implement PageRank on large graphs using a graph parallel framework. In Figure 25, we provide
an example of an iteration of PageRank for an illustrative sub-graph of Figure 24. The nodes are
initialised with a score of 1/|V| = 1/6, where we assume the tourist to have an equal chance of starting
at any point. In the message phase (Msg), each node v passes a score of d·R_i(v)/|E(v)| on each of its outgoing
edges, where we denote by d a constant damping factor used to ensure convergence (typically
d = 0.85, indicating the probability that a tourist randomly “jumps” to any place), by R_i(v) the score
of node v in iteration i (the probability of the tourist being at node v after i hops), and by |E(v)|
the number of outgoing edges of v. The aggregation phase (Agg) for v then sums all incoming
messages received along with its constant share of the damping factor, (1−d)/|V|, to compute R_{i+1}(v).
We then proceed to the message phase of the next iteration, continuing until some termination
criterion is reached (e.g., iteration count or residual threshold) and final scores are output.
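The message and aggregation phases described above can be sketched as follows in Python (a single-machine simplification of the systolic computation, over an assumed projection of the routes; a real framework would distribute this across a cluster):

```python
def pagerank(nodes, edges, d=0.85, iters=50):
    """Repeated systolic iteration: each node v sends d * R(v)/|E(v)|
    along each outgoing edge (Msg); each node then sums its incoming
    messages plus (1 - d)/|V| (Agg)."""
    rank = {v: 1 / len(nodes) for v in nodes}
    out = {v: [o for (s, o) in edges if s == v] for v in nodes}
    for _ in range(iters):
        msgs = {v: 0.0 for v in nodes}
        for v in nodes:                                  # message phase
            for o in out[v]:
                msgs[o] += d * rank[v] / len(out[v])
        rank = {v: (1 - d) / len(nodes) + msgs[v] for v in nodes}  # aggregation
    return rank

# An assumed projection of return routes between six destinations:
pairs = [("Moon Valley", "San Pedro"), ("San Pedro", "Calama"),
         ("Calama", "Arica"), ("San Pedro", "Los Flamencos"),
         ("San Pedro", "Piedras Rojas")]
edges = {e for (a, b) in pairs for e in [(a, b), (b, a)]}
nodes = {v for e in edges for v in e}
rank = pagerank(nodes, edges)

# San Pedro has the most connections, so it accumulates the highest score:
assert max(rank, key=rank.get) == "San Pedro"
```

Since every node here has at least one outgoing edge, the scores remain a probability distribution (they sum to 1) across iterations.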
While the given example is for PageRank, the systolic abstraction is general enough to support
a wide variety of graph analytics, including those previously mentioned. An algorithm in this
framework consists of the functions to compute message values in the message phase (Msg), and
to accumulate the messages in the aggregation phase (Agg). The framework will take care of
distribution, message passing, fault tolerance, etc. However, such frameworks – based on message
passing between neighbours – have limitations: not all types of analytics can be expressed in such
frameworks [565].19 Hence frameworks may allow additional features, such as a global step that
performs a global computation on all nodes, making the result available to each node [335]; or a
mutation step that allows for adding or removing nodes and edges during processing [335].
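To make the two phases concrete, the iteration above can be sketched in Python as follows; the three-place cycle, damping factor, and iteration count are illustrative assumptions, and nodes without outgoing edges simply send no messages in this sketch:

```python
def pagerank(edges, d=0.85, num_iters=20):
    # Single-machine sketch of the systolic PageRank computation: each
    # iteration runs a message phase (Msg) then an aggregation phase (Agg).
    nodes = {v for edge in edges for v in edge}
    n = len(nodes)
    out_degree = {v: sum(1 for s, _ in edges if s == v) for v in nodes}
    rank = {v: 1 / n for v in nodes}  # R_0(v) = 1/|V|
    for _ in range(num_iters):
        inbox = {v: [] for v in nodes}
        # Msg: v sends d * R_i(v) / |E(v)| along each outgoing edge
        for s, o in edges:
            inbox[o].append(d * rank[s] / out_degree[s])
        # Agg: sum incoming messages plus the constant share (1 - d) / |V|
        rank = {v: (1 - d) / n + sum(inbox[v]) for v in nodes}
    return rank

# On a three-place cycle, every place is equally central (score 1/3 each)
scores = pagerank([("Arica", "Calama"), ("Calama", "San Pedro"), ("San Pedro", "Arica")])
```

In an actual graph parallel framework, only the two commented phases would be supplied by the user; the framework would handle distribution and message passing.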

19 Formally, Xu et al. [565] have shown that such frameworks are as powerful as the (incomplete) Weisfeiler–Lehman (WL) graph isomorphism test – based on recursively hashing neighbouring hashes – for distinguishing graph structures.

Msg (iter = 1) Agg (iter = 1)

Fig. 25. Example of a systolic iteration of PageRank for a sample sub-graph of Figure 24

5.1.3 Analytics on data graphs. As aforementioned, most analytics presented thus far are, in their
“native” form, applicable for undirected or directed graphs without the edge meta-data – i.e., edge
labels or property–value pairs – typical of graph data models.20 A number of strategies can be
applied to make data graphs subject to analytics of this form:
• Projection involves simply “projecting” an undirected or directed graph by optionally selecting
a sub-graph from the data graph from which all edge meta-data are dropped; for example,
Figure 25 may be the result of extracting the sub-graph induced by the edge labels bus and
flight from a larger data graph, where the labels are then dropped to create a directed graph.
• Weighting involves converting edge meta-data into numerical values according to some
function. Many of the aforementioned techniques are easily adapted to the case of weighted
(directed) graphs; for example, we could consider weights on the graph of Figure 25 denoting
trip duration (or price, traffic, etc.), and then compute the shortest paths adding the duration
of each leg.21 In the absence of external weights, we may rather map edge labels to weights,
assigning the same weight to all flight edges, to all bus edges, etc., based on some criteria.
• Transformation involves transforming the graph to a lower arity model. A transformation may
be lossy, meaning that the original graph cannot be recovered; or lossless, meaning that the
original graph can be recovered. Figure 26 provides an example of a lossy and lossless trans-
formation from a directed edge-labelled graph to directed graphs. In the lossy transformation,
we cannot tell, for example, if the original graph contained the edge Iquique flight Santiago or
Iquique flight Arica , etc. The lossless transformation must introduce new nodes (similar to
reification) to maintain information about directed labelled edges. Both transformed graphs
further attempt to preserve the directionality of the original graph.
• Customisation involves changing the analytical procedure to incorporate edge meta-data,
such as was the case for path finding based on path expressions. Other examples might
include structural measures for node similarity that not only consider common neighbours,
but also common neighbours connected by edges with the same label, or aggregate centrality
measures that capture the importance of edges grouped by label, etc.
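The first two strategies can be sketched as follows; the data graph, label selection, and weights are illustrative assumptions:

```python
# A data graph as labelled edges (subject, label, object)
data_graph = [
    ("San Pedro", "bus", "Calama"),
    ("San Pedro", "bus", "Moon Valley"),
    ("Calama", "flight", "Santiago"),
]

def project(edges, labels):
    # Projection: select edges by label, then drop the labels
    return [(s, o) for s, p, o in edges if p in labels]

def weight(edges, label_weights):
    # Weighting: map each edge label to a numeric value (e.g., rough hours)
    return [(s, o, label_weights[p]) for s, p, o in edges if p in label_weights]

directed = project(data_graph, {"bus", "flight"})            # plain directed graph
weighted = weight(data_graph, {"bus": 2.0, "flight": 1.5})   # weighted directed graph
```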
The results of an analytical process may change drastically depending on which of the previous
strategies are chosen to prepare the data for analysis. This choice may be a non-trivial one to make
20 We remark that in the case of property graphs, property–value pairs on nodes can be converted by mapping values to
nodes and properties to edges with the corresponding label.
21 Other forms of analytics are possible if we assume the graph is weighted; for example, if we annotated the graph of Figure 25 with probabilities of tourists moving from one place to the next, we could leverage Markov processes to understand features such as reducibility, periodicity, transience, recurrence, ergodicity, steady-states, etc., of the routes [138].

(a) Original graph (b) Lossy transformation (c) Lossless transformation

Fig. 26. Transformations from a directed edge-labelled graph to a directed graph

a priori and may require empirical validation. More study is required to more generally understand
the effects of such strategies on the results of different analytical techniques.
5.1.4 Analytics with queries. As discussed in Section 2.2, various languages for querying graphs
have been proposed [16]. One may consider a variety of ways in which query languages and
analytics can complement each other. First, we may consider using query languages to project or
transform a graph suitable for a particular analytical task, such as to extract the graph of Figure 24
from a larger data graph. Query languages such as SPARQL [217], Cypher [165], and G-CORE [15]
allow for outputting graphs, where such queries can be used to select sub-graphs for analysis. These
languages can also express some limited (non-recursive) analytics, where aggregations can be used
to compute degree centrality, for example; they may also have some built-in analytical support,
where, for example, Cypher [165] allows for finding shortest paths. In the other direction, analytics
can contribute to the querying process in terms of optimisations, where, for example, analysis of
connectivity may suggest how to better distribute a large data graph over multiple machines for
querying using, e.g., minimum cuts [7, 268]. Analytics have also been used to rank query results
over large graphs [151, 544], selecting the most important results for presentation to the user.
In some use-cases we may further wish to interleave querying and analytical processes. For
example, from the full data graph collected by the tourist board, consider an upcoming airline strike
where the board wishes to find the events during the strike with venues in cities unreachable from
Santiago by public transport due to the strike. Hypothetically, we could use a query to extract the
transport network excluding the airline’s routes (assuming, per Figure 3 that the airline information
is available), use analytics to extract the strongly connected component containing Santiago, and
finally use a query to find events in cities not in the Santiago component on the given dates.22 While
one could solve this task using an imperative language such as Gremlin [445], GraphX [563], or
R [518], more declarative languages are also being explored to more easily express such tasks, with
proposals including the extension of graph query languages with recursive capabilities [47, 439],23
combining linear algebra with relational (query) algebra [258], and so forth.
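A sketch of this interleaving under simplifying assumptions: the toy transport network below stands in for the result of the hypothetical "query" step (the striking airline's flights already removed), and the strongly connected component is computed as the intersection of forward and backward reachability.

```python
def forward_reach(edges, start):
    # Nodes reachable from start following directed edges
    adj = {}
    for s, o in edges:
        adj.setdefault(s, []).append(o)
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def scc_of(edges, node):
    # Strongly connected component of node: forward ∩ backward reachability
    return forward_reach(edges, node) & forward_reach([(o, s) for s, o in edges], node)

# (1) "query": transport edges with the striking airline's routes filtered out
transport = [("Santiago", "Arica"), ("Arica", "Santiago"),
             ("Santiago", "Calama"), ("Punta Arenas", "Santiago")]
# (2) analytics: cities mutually reachable with Santiago
component = scc_of(transport, "Santiago")
# (3) "query": events whose (hypothetical) venue city falls outside the component
events = {"EID15": "Arica", "EID16": "Calama"}
unreachable_events = {e for e, city in events.items() if city not in component}
```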
5.1.5 Analytics with entailment. Knowledge graphs are often associated with a semantic schema
or ontology that defines the semantics of domain terms, giving rise to entailments (per Section 4).
Applying analytics with or without such entailments – e.g., before or after materialisation – may
yield radically different results. For example, observe that an edge Santa Lucía hosts EID15 is seman-
tically equivalent to an edge EID15 venue Santa Lucía once the inverse axiom hosts inv. of venue is
22 Such a task could not be solved in a single query using regular path queries as such expressions would not be capable of
filtering edges representing flights of a particular airline.
23 Recursive query languages become Turing complete assuming one can also express operations on binary arrays.

invoked; however, these edges are far from equivalent from the perspective of analytical techniques
that consider edge direction, for which including one type of edge, or the other, or both, may have
a major bearing on the final results. To the best of our knowledge, the combination of analytics
and entailment has not been well-explored, leaving open interesting research questions. Along
these lines, it may be of interest to explore semantically-invariant analytics that yield the same
results over semantically-equivalent graphs (i.e., graphs that entail one another), thus analysing
the semantic content of the knowledge graph rather than simply the topological features of the
data graph; for example, semantically-invariant analytics would yield the same results over a graph
containing the inverse axiom hosts inv. of venue and a number of hosts edges, the same graph but
where every hosts edge is replaced by an inverse venue edge, and the union of both graphs.

5.2 Knowledge Graph Embeddings


Methods for machine learning have gained significant attention in recent years. In the context
of knowledge graphs, machine learning can either be used for directly refining a knowledge
graph [400] (discussed further in Section 8); or for downstream tasks using the knowledge graph,
such as recommendation [575], information extraction [533], question answering [255], query
relaxation [548], query approximation [215], etc. (discussed further in Section 10). However, many
traditional machine learning techniques assume dense numeric input representations in the form
of vectors, which is quite distinct from how graphs are usually expressed. So how can graphs – or
nodes, edges, etc., thereof – be encoded as numeric vectors?
A first attempt to represent a graph using vectors would be to use a one-hot encoding, generating
a vector for each node of length |𝐿| · |𝑉 | – with |𝑉 | the number of nodes in the input graph and |𝐿|
the number of edge labels – placing a one at the corresponding index to indicate the existence of
the respective edge in the graph, or zero otherwise. Such a representation will, however, typically
result in large and sparse vectors, which will be detrimental for most machine learning models.
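For illustration, a sketch of this one-hot encoding over a tiny hypothetical graph:

```python
nodes = ["Santiago", "Arica", "Calama"]
labels = ["bus", "flight"]
edges = [("Santiago", "flight", "Arica"), ("Arica", "bus", "Calama")]

def one_hot(node):
    # Vector of length |L| * |V| with a 1 per outgoing (label, target) pair
    vec = [0] * (len(labels) * len(nodes))
    for s, p, o in edges:
        if s == node:
            vec[labels.index(p) * len(nodes) + nodes.index(o)] = 1
    return vec

# Even here each vector has |L| * |V| = 6 entries but at most one is set;
# the sparsity worsens rapidly as the graph grows
santiago_vec = one_hot("Santiago")
```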
The main goal of knowledge graph embedding techniques is to create a dense representation of
the graph (i.e., embed the graph) in a continuous, low-dimensional vector space that can then be
used for machine learning tasks. The dimensionality 𝑑 of the embedding is fixed and typically low
(often, e.g., 50 ≤ 𝑑 ≤ 1000). Typically the graph embedding is composed of an entity embedding
for each node: a vector with 𝑑 dimensions that we denote by e; and a relation embedding for each
edge label: (typically) a vector with 𝑑 dimensions that we denote by r. The overall goal of these
vectors is to abstract and preserve latent structures in the graph. There are many ways in which
this notion of an embedding can be instantiated. Most commonly, given an edge s p o , a specific
embedding approach defines a scoring function that accepts es (the entity embedding of node s ), rp
(the relation embedding of edge label p) and eo (the entity embedding of node o ) and computes the
plausibility of the edge: how likely it is to be true. Given a data graph, the goal is then to compute
the embeddings of dimension 𝑑 that maximise the plausibility of positive edges (typically edges in
the graph) and minimise the plausibility of negative examples (typically edges in the graph with a
node or edge label changed such that they are no longer in the graph) according to the given scoring
function. The resulting embeddings can then be seen as models learnt through self-supervision
that encode (latent) features of the graph, mapping input edges to output plausibility scores.
Embeddings can then be used for a number of low-level tasks involving the nodes and edge-labels
of the graph from which they were computed. First, we can use the plausibility scoring function to
assign a confidence to edges that may, for example, have been extracted from an external source
(discussed later in Section 6). Second, the plausibility scoring function can be used to complete edges
with missing nodes/edge labels for the purposes of link prediction (discussed later in Section 8);
for example, in Figure 24, we might ask which nodes in the graph are likely to complete the edge

Grey Glacier bus ? , where – aside from Punta Arenas , which is already given – we might intuitively
expect Torres del Paine to be a plausible candidate. Third, embedding models will typically assign
similar vectors to similar nodes and similar edge-labels, and thus they can be used as the basis of
similarity measures, which may be useful for finding duplicate nodes that refer to the same entity,
or for the purposes of providing recommendations (discussed later in Section 10).

Fig. 27. Toy example of two-dimensional relation and entity embeddings learnt by TransE; the entity embeddings use abbreviations and include an example of vector addition to predict what is west of Antofagasta
A wide range of knowledge graph embedding techniques have been proposed [549]. Our goal here
is to provide a high-level introduction to some of the most popular techniques proposed thus far. First
we discuss translational models that adopt a geometric perspective whereby relation embeddings
translate subject entities to object entities in the low-dimensional space. We then describe tensor
decomposition models that extract latent factors approximating the graph’s structure. Thereafter
we discuss neural models that use neural networks to train embeddings that provide accurate
plausibility scores. Finally, we discuss language models that leverage existing word embedding
techniques, proposing ways of generating graph-like analogues for their expected (textual) inputs.
A more formal treatment of these models is provided in Appendix B.6.2.
5.2.1 Translational models. Translational models interpret edge labels as transformations from
subject nodes (aka the source or head) to object nodes (aka the target or tail); for example, in the
edge San Pedro bus Moon Valley , the edge label bus is seen as transforming San Pedro to Moon Valley ,
and likewise for other bus edges. The most elementary approach in this family is TransE [63].
Over all positive edges s p o , TransE learns vectors es , rp , and eo aiming to make es + rp as
close as possible to eo . Conversely, if the edge is a negative example, TransE attempts to learn a
representation that keeps es + rp away from eo . To illustrate, Figure 27 provides a toy example
of two-dimensional (𝑑 = 2) entity and relation embeddings computed by TransE. We keep the
orientation of the vectors similar to the original graph for clarity. For any edge s p o in the
original graph, adding the vectors es + rp should approximate eo . In this toy example, the vectors
correspond precisely where, for instance, adding the vectors for Licantén (eL. ) and west of (rwo. )
gives a vector corresponding to Curico (eC. ). We can use these embeddings to predict edges (among
other tasks); for example, in order to predict which node in the graph is most likely to be west of
Antofagasta (A.), by computing eA. + rwo. we find that the resulting vector (dotted in Figure 27c) is
closest to eT. , thus predicting Toconao (T.) to be the most plausible such node.
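A minimal sketch of TransE's scoring function, with hand-set two-dimensional embeddings mirroring the toy example (these are illustrative values, not trained ones):

```python
import numpy as np

def transe_score(e_s, r_p, e_o):
    # TransE plausibility: negative distance between e_s + r_p and e_o
    # (closer to zero means more plausible)
    return -float(np.linalg.norm(e_s + r_p - e_o))

e = {"Licantén": np.array([0.0, 0.0]), "Curico": np.array([1.0, 0.0]),
     "Antofagasta": np.array([0.0, 2.0]), "Toconao": np.array([1.0, 2.0])}
r = {"west of": np.array([1.0, 0.0])}

# Licantén + west-of lands exactly on Curico, so that edge scores highest (0)
best = transe_score(e["Licantén"], r["west of"], e["Curico"])
# To predict what is west of Antofagasta, compare candidate objects:
toconao = transe_score(e["Antofagasta"], r["west of"], e["Toconao"])
curico = transe_score(e["Antofagasta"], r["west of"], e["Curico"])
```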
Aside from this toy example, TransE can be too simplistic; for example, in Figure 24, bus not
only transforms San Pedro to Moon Valley , but also to Arica , Calama , and so forth. TransE will, in this
case, aim to give similar vectors to all such target locations, which may not be feasible given other

edges. TransE will also tend to assign cyclical relations a zero vector, as the directional components
will tend to cancel each other out. To resolve such issues, many variants of TransE have been
investigated. Amongst these, for example, TransH [553] represents different relations using distinct
hyperplanes, where for the edge s p o , s is first projected onto the hyperplane of p before the
translation to o is learnt (uninfluenced by edges with other labels for s and for o ). TransR [318]
generalises this approach by projecting s and o into a vector space specific to p, which involves
multiplying the entity embeddings for s and o by a projection matrix specific to p. TransD [271]
simplifies TransR by associating entities and relations with a second vector, where these secondary
vectors are used to project the entity into a relation-specific vector space. Recently, RotatE [511]
proposes translational embeddings in complex space, which allows for capturing more characteristics
of relations, such as direction, symmetry, inversion, antisymmetry, and composition. Embeddings
have also been proposed in non-Euclidean space, e.g., MuRP [29] uses relation embeddings that
transform entity embeddings in the hyperbolic space of the Poincaré ball model, whose curvature
provides more “space” to separate entities with respect to the dimensionality. For discussion of
other translational models, we refer to the survey by Wang et al. [549].
5.2.2 Tensor decomposition models. A second approach to derive graph embeddings is to apply
methods based on tensor decomposition. A tensor is a multidimensional numeric field that generalises
scalars (0-order tensors), vectors (1-order tensors) and matrices (2-order tensors) towards arbitrary
dimension/order. Tensors have become a widely used abstraction for machine learning [427].
Tensor decomposition involves decomposing a tensor into more “elemental” tensors (e.g., of lower
order) from which the original tensor can be recomposed (or approximated) by a fixed sequence of
basic operations. These elemental tensors can be viewed as capturing latent factors underlying the
information contained in the original tensor. There are many approaches to tensor decomposition,
where we will now briefly introduce the main ideas behind rank decompositions [427].
Leaving aside graphs momentarily, consider an (𝑎, 𝑏)-matrix (i.e., a 2-order tensor) C, where 𝑎 is
the number of cities in Chile, 𝑏 is the number of months in a year, and each element (C)𝑖 𝑗 denotes
the average temperature of the 𝑖 th city in the 𝑗 th month. Noting that Chile is a long, thin country
– ranging from subpolar climates in the south, to a desert climate in the north – we may find a
decomposition of C into two vectors representing latent factors – specifically x (with 𝑎 elements)
giving lower values for cities with lower latitude, and y (with 𝑏 elements), giving lower values for
months with lower temperatures – such that computing the outer product24 of the two vectors
approximates C reasonably well: x ⊗ y ≈ C. In the (unlikely) case that there exist vectors x and y
such that C is precisely the outer product of two vectors (x ⊗ y = C) we call C a rank-1 matrix;
we can then precisely encode C using 𝑎 + 𝑏 values rather than 𝑎 × 𝑏 values. Most times, however,
to get precisely C, we will need to sum multiple rank-1 matrices, where the rank 𝑟 of C is the
minimum number of rank-1 matrices that need to be summed to derive precisely C, such that
x1 ⊗ y1 + . . . + x𝑟 ⊗ y𝑟 = C. In the temperature example, x2 ⊗ y2 might correspond to a correction for
altitude, x3 ⊗ y3 for higher temperature variance further south, etc. A (low) rank decomposition
of a matrix then sets a limit 𝑑 on the rank and computes the vectors (x1, y1, . . . , x𝑑 , y𝑑 ) such that
x1 ⊗ y1 + . . . + x𝑑 ⊗ y𝑑 gives the best 𝑑-rank approximation of C. Noting that to generate 𝑛-order
tensors we need to compute the outer product of 𝑛 vectors, we can generalise this idea towards low
rank decomposition of tensors; this method is called Canonical Polyadic (CP) decomposition [236].
For example, we might have a 3-order tensor C containing monthly temperatures for Chilean cities
at four different times of day, which could be approximated with x1 ⊗ y1 ⊗ z1 + . . . + x𝑑 ⊗ y𝑑 ⊗ z𝑑
(e.g., x1 might be a latitude factor, y1 a monthly variation factor, and z1 a daily variation factor,
and so on). Various algorithms then exist to compute (approximate) CP decompositions, including
Alternating Least Squares, Jennrich’s Algorithm, and the Tensor Power method [427].

24 The outer product of two (column) vectors x of length 𝑎 and y of length 𝑏, denoted x ⊗ y, is defined as xyT , yielding an (𝑎, 𝑏)-matrix M such that (M)𝑖 𝑗 = (x)𝑖 · (y) 𝑗 . Analogously, the outer product of 𝑘 vectors is a 𝑘-order tensor.

Fig. 28. Abstract illustration of a CP 𝑑-rank decomposition of a tensor representing the graph of Figure 27a
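The rank-1 intuition can be checked directly in code; the factor values are arbitrary illustrations of the latitude/season example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # e.g., a per-city latitude factor
y = np.array([0.5, 1.0])       # e.g., a per-month seasonal factor
C = np.outer(x, y)             # x ⊗ y: a (3, 2) rank-1 matrix

# A rank-1 (a, b)-matrix can be stored with a + b values instead of a * b
rank = np.linalg.matrix_rank(C)
```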
Returning to graphs, similar principles can be used to decompose a graph into vectors, thus
yielding embeddings. In particular, a graph can be encoded as a one-hot 3-order tensor G with
|𝑉 | × |𝐿| × |𝑉 | elements, where the element (G)𝑖 𝑗𝑘 is set to one if the 𝑖 th node links to the 𝑘 th
node with an edge having the 𝑗 th label, or zero otherwise. As previously mentioned, such a tensor
will typically be very large and sparse, where rank decompositions are thus applicable. A CP
decomposition [236] would compute a sequence of vectors (x1, y1, z1, . . . , x𝑑 , y𝑑 , z𝑑 ) such that
x1 ⊗ y1 ⊗ z1 + . . . + x𝑑 ⊗ y𝑑 ⊗ z𝑑 ≈ G.
We illustrate this scheme in Figure 28. Letting X, Y, Z denote the matrices formed by
(x1 · · · x𝑑 ), (y1 · · · y𝑑 ), (z1 · · · z𝑑 ), respectively, with each vector forming
a column of the corresponding matrix, we could then extract the 𝑖 th row of Y as an embedding
for the 𝑖 th relation, and the 𝑗 th rows of X and Z as two embeddings for the 𝑗 th entity. However,
knowledge graph embeddings typically aim to assign one vector to each entity.
DistMult [568] is a seminal method for computing knowledge graph embeddings based on rank
decompositions, where each entity and relation is associated with a vector of dimension 𝑑, such
that for an edge s p o , a plausibility scoring function ∑ᵢ₌₁ᵈ (es )𝑖 (rp )𝑖 (eo )𝑖 is defined, where (es )𝑖 ,
(rp )𝑖 and (eo )𝑖 denote the 𝑖 th elements of vectors es , rp , eo , respectively. The goal, then, is to learn
vectors for each node and edge label that maximise the plausibility of positive edges and minimise
the plausibility of negative edges. This approach equates to a CP decomposition of the graph tensor
G, but where entities have one vector that is used twice: x1 ⊗ y1 ⊗ x1 + . . . + x𝑑 ⊗ y𝑑 ⊗ x𝑑 ≈ G. A
weakness of this approach is that per the scoring function, the plausibility of s p o will always
be equal to that of o p s ; in other words, DistMult does not consider edge direction.
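This direction-blindness can be seen directly from a sketch of the scoring function (the embedding values are arbitrary illustrations):

```python
import numpy as np

def distmult_score(e_s, r_p, e_o):
    # DistMult plausibility: sum over i of (e_s)_i * (r_p)_i * (e_o)_i
    return float(np.sum(e_s * r_p * e_o))

e_s, r_p, e_o = np.array([0.5, 1.0]), np.array([1.0, -1.0]), np.array([2.0, 0.3])
# Elementwise multiplication is commutative, so swapping s and o changes nothing
forward = distmult_score(e_s, r_p, e_o)
backward = distmult_score(e_o, r_p, e_s)
```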
Rather than use a vector as a relation embedding, RESCAL [386] uses a matrix, which allows for
combining values from es and eo across all dimensions, and thus can capture (e.g.) edge direction.
However, RESCAL incurs a higher cost in terms of space and time than DistMult. HolE [385] uses
vectors for relation and entity embeddings, but proposes to use the circular correlation operator –
which takes sums along the diagonals of the outer product of two vectors – to combine them. This
operator is not commutative, and can thus consider edge direction. ComplEx [526], on the other
hand, uses a complex vector (i.e., a vector containing complex numbers) as a relational embedding,
which similarly allows for breaking the aforementioned symmetry of DistMult’s scoring function
while keeping the number of parameters low. SimplE [283] rather proposes to compute a standard
CP decomposition computing two initial vectors for entities from X and Z and then averaging
terms across X, Y, Z to compute the final plausibility scores. TuckER [30] employs a different
type of decomposition – called a Tucker Decomposition [528], which computes a smaller “core”
tensor T and a sequence of three matrices A, B and C, such that G ≈ T ⊗ A ⊗ B ⊗ C – where

entity embeddings are taken from A and C, while relation embeddings are taken from B. Of these
approaches, TuckER [30] currently provides state-of-the-art results on standard benchmarks.
5.2.3 Neural models. A limitation of the previously discussed approaches is that they assume
either linear (preserving addition and scalar multiplication) or bilinear (e.g., matrix multiplication)
operations over embeddings to compute plausibility scores. A number of approaches rather use
neural networks to learn embeddings with non-linear scoring functions for plausibility.
One of the earliest proposals of a neural model was Semantic Matching Energy (SME) [192],
which learns parameters (aka weights: w, w′) for two functions – 𝑓w (es, rp ) and 𝑔w′ (eo, rp ) – such
that the dot product of the result of both functions – 𝑓w (es, rp ) · 𝑔w′ (eo, rp ) – gives the plausibility
score. Both linear and bilinear variants of 𝑓w and 𝑔w′ are proposed. Another early proposal was
Neural Tensor Networks (NTN) [488], which rather proposes to maintain a tensor W of internal
weights, such that the plausibility score is computed by a complex function that combines the outer
product es ⊗ W ⊗ eo with a standard neural layer over es and eo , which in turn is combined with
rp , to produce a plausibility score. The use of the tensor W results in a high number of parameters,
which limits scalability [549]. Multi Layer Perceptron (MLP) [131] is a simpler model, where es , rp
and eo are concatenated and fed into a hidden layer to compute the plausibility score.
A number of more recent approaches have proposed using convolutional kernels in their models.
ConvE [127] proposes to generate a matrix from es and rp by “wrapping” each vector over several
rows and concatenating both matrices. The concatenated matrix serves as the input for a set of
(2D) convolutional layers, which returns a feature map tensor. The feature map tensor is vectorised
and projected into 𝑑 dimensions using a parametrised linear transformation. The plausibility score
is then computed based on the dot product of this vector and eo . A disadvantage of ConvE is
that by wrapping vectors into matrices, it imposes an artificial two-dimensional structure on the
embeddings. HypER [28] is a similar model using convolutions, but avoids the need to wrap vectors
into matrices. Instead, a fully connected layer (called the “hypernetwork”) is applied to rp and used
to generate a matrix of relation-specific convolutional filters. These filters are applied directly to
es to give a feature map, which is vectorised. The same process is then applied as in ConvE: the
resulting vector is projected into 𝑑 dimensions, and a dot product applied with eo to produce the
plausibility score. The resulting model is shown to outperform ConvE on standard benchmarks [28].
The presented approaches strike different balances in terms of expressivity and the number of
parameters that need to be trained. While more expressive models, such as NTN, may better fit
more complex plausibility functions over lower dimensional embeddings by using more hidden
parameters, simpler models, such as that proposed by Dong et al. [131], and convolutional net-
works [28, 127] that enable parameter sharing by applying the same (typically small) kernels over
different regions of a matrix, require handling fewer parameters overall and are more scalable.
5.2.4 Language models. Embedding techniques were first explored as a way to represent natural
language within machine learning frameworks, with word2vec [352] and GloVe [408] being two
seminal approaches. Both approaches compute embeddings for words based on large corpora of text
such that words used in similar contexts (e.g., “frog”, “toad”) have similar vectors. Word2vec uses
neural networks trained either to predict the current word from surrounding words (continuous
bag of words), or to predict the surrounding words given the current word (continuous skip-gram).
GloVe rather applies a regression model over a matrix of co-occurrence probabilities of word pairs.
Embeddings generated by both approaches are widely used in natural language processing tasks.
Another approach for graph embeddings is thus to leverage proven approaches for language
embeddings. However, while a graph consists of an unordered set of sequences of three terms
(i.e., a set of edges), text in natural language consists of arbitrary-length sequences of terms (i.e.,
sentences of words). Along these lines, RDF2Vec [441] performs (biased [95]) random walks on

the graph and records the paths (the sequence of nodes and edge labels traversed) as “sentences”,
which are then fed as input into the word2vec [352] model. An example of such a path extracted
from Figure 24 might be, for example, San Pedro bus Calama flight Iquique flight Santiago , where
the paper experiments with 500 paths of length 8 per entity. RDF2Vec also proposes a second mode
where sequences are generated for nodes from canonically-labelled sub-trees of which they are a
root node, where the paper experiments with sub-trees of depth 1 and 2. Conversely, KGloVe [96]
is based on the GloVe model. Much like how the original GloVe model [408] considers words that
co-occur frequently in windows of text to be more related, KGloVe uses personalised PageRank25 to
determine the most related nodes to a given node, whose results are then fed into the GloVe model.
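A sketch of the walk-to-"sentence" step of RDF2Vec; the toy graph and walk length are illustrative, and since each node here has at most one outgoing edge, the walk is deterministic:

```python
import random

def random_walk(edges, start, hops, rng):
    # Record a random walk as a "sentence" of node and edge-label tokens
    out = {}
    for s, p, o in edges:
        out.setdefault(s, []).append((p, o))
    walk, node = [start], start
    for _ in range(hops):
        if node not in out:
            break
        p, node = rng.choice(out[node])
        walk += [p, node]
    return walk

edges = [("San Pedro", "bus", "Calama"),
         ("Calama", "flight", "Iquique"),
         ("Iquique", "flight", "Santiago")]
sentence = random_walk(edges, "San Pedro", 3, random.Random(0))
# sentence: ['San Pedro', 'bus', 'Calama', 'flight', 'Iquique', 'flight', 'Santiago']
```

Many such sentences, sampled per entity, would then be fed in bulk to a word2vec implementation.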
5.2.5 Entailment-aware models. The embeddings thus far consider the data graph alone. But what
if an ontology or set of rules is provided? Such deductive knowledge could be used to improve the
embeddings. One approach is to use constraint rules to refine the predictions made by embeddings;
for example, Wang et al. [550] use functional and inverse-functional definitions as constraints
(under UNA) such that, for example, if we define that an event can have at most one value for
venue, this is used to lower the plausibility of edges that would assign multiple venues to an event.
More recent approaches rather propose joint embeddings that consider both the data graph and
rules when computing embeddings. KALE [207] computes entity and relation embeddings using a
translational model (specifically TransE) that is adapted to further consider rules using t-norm fuzzy
logics. With reference to Figure 24, consider a simple rule ?x bus ?y ⇒ ?x connects to ?y . We can
use embeddings to assign plausibility scores to new edges, such as 𝑒 1 : Piedras Rojas bus Moon Valley .
We can further apply the previous rule to generate a new edge 𝑒 2 : Piedras Rojas connects to Moon Valley
from the predicted edge 𝑒 1 . But what plausibility should we assign to this second edge? Letting 𝑝 1
and 𝑝 2 be the current plausibility scores of 𝑒 1 and 𝑒 2 (initialised using the standard embedding), then
t-norm fuzzy logics suggests that the plausibility be updated as 𝑝 1𝑝 2 − 𝑝 1 + 1. Embeddings are then
trained to jointly assign larger plausibility scores to positive examples versus negative examples
of both edges and ground rules. An example of a positive ground rule based on Figure 24 would
be Arica bus San Pedro ⇒ Arica connects to San Pedro . Negative ground rules randomly replace the
relation in the head of the rule; for example, Arica bus San Pedro ⇏ Arica flight San Pedro . Guo
et al. [208] later propose RUGE, which uses a joint model over ground rules (possibly soft rules
with confidence scores) and plausibility scores to align both forms of scoring for unseen edges.
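The t-norm update above can be sketched as a small function over the plausibilities of the body edge (𝑝1) and head edge (𝑝2) of a ground rule; this is only the rule-scoring step, not KALE's full joint training objective:

```python
def rule_plausibility(p_body, p_head):
    # t-norm truth degree of a ground rule body ⇒ head: p1 * p2 - p1 + 1
    return p_body * p_head - p_body + 1

# A plausible body with a plausible head satisfies the rule...
satisfied = rule_plausibility(1.0, 1.0)
# ...a plausible body with an implausible head violates it...
violated = rule_plausibility(1.0, 0.0)
# ...and an implausible body satisfies it vacuously
vacuous = rule_plausibility(0.0, 0.0)
```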
Generating ground rules can be costly. An alternative approach, called FSL [125], observes that
in the case of a simple rule, such as ?x bus ?y ⇒ ?x connects to ?y , the relation embedding
bus should always return a lower plausibility than connects to. Thus, for all such rules, FSL
proposes to train relation embeddings while avoiding violations of such inequalities. While relatively
straightforward, FSL only supports simple rules, while KALE also supports more complex rules.
These works are interesting examples of how deductive and inductive forms of knowledge – in
this case rules and embeddings – can interplay and complement each other.

5.3 Graph Neural Networks


While embeddings aim to provide a dense numerical representation of graphs suitable for use within
existing machine learning models, another approach is to build custom machine learning models
adapted for graph-structured data. Most custom learning models for graphs are based on (artificial)
neural networks [559], exploiting a natural correspondence between both: a neural network already
corresponds to a weighted, directed graph, where nodes serve as artificial neurons, and edges serve
25 Intuitively speaking, personalised PageRank starts at a given node and then determines the probability of a random walk
being at a particular node after a given number of steps. A higher number of steps converges towards standard PageRank
emphasising global node centrality, while a lower number emphasises proximity/relatedness to the starting node.
