Long-form triples are tidy datasets with an explicit row (observation) identifier among the columns.
JSON | object | property | value |
spreadsheet | row id | column name | cell |
data.frame | key | variable | measurement |
data.frame | key | attribute | value |
RDF | subject | predicate | object |
Table source: rdflib
Let’s take a small subset of iris_dataset, which is the semantically enriched version of the base R iris dataset. Limiting the dataset to the top 2 rows, we have exactly 2 x 5 = 10 data cells.
head(iris_dataset, 2)
#> Anderson E (1935). "Iris Dataset [subset]."
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> iris:o1 5.1 3.5 1.4 0.2 setosa
#> iris:o2 4.9 3.0 1.4 0.2 setosa
#> Further metadata: describe(x)
xsd_convert(head(iris_dataset, 2))
#> Anderson E (1935). "Iris Dataset [subset]."
#> Sepal.Length Sepal.Width Petal.Length
#> iris:o1 "5.1"^^<xs:decimal> "3.5"^^<xs:decimal> "1.4"^^<xs:decimal>
#> iris:o2 "4.9"^^<xs:decimal> "3"^^<xs:decimal> "1.4"^^<xs:decimal>
#> Petal.Width Species
#> iris:o1 "0.2"^^<xs:decimal> "setosa"^^<xs:string>
#> iris:o2 "0.2"^^<xs:decimal> "setosa"^^<xs:string>
#> Further metadata: describe(x)
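The typed literals in the output above can be imitated in base R. The following is only a sketch of the idea, not the implementation of xsd_convert(): the helper name to_xsd_literal is hypothetical, and the mapping (numeric to xs:decimal, everything else to xs:string) is an assumption read off the printed output.

```r
# Hypothetical helper: wrap values in typed-literal notation.
# Assumption: numeric vectors become xs:decimal, all other types xs:string.
to_xsd_literal <- function(x) {
  type <- if (is.numeric(x)) "xs:decimal" else "xs:string"
  paste0('"', as.character(x), '"^^<', type, '>')
}

numeric_literals <- to_xsd_literal(c(5.1, 3))
species_literal  <- to_xsd_literal(factor("setosa"))
print(numeric_literals)
print(species_literal)
```

Because as.character() drops trailing zeros, the value 3.0 is written as "3", which is consistent with the Sepal.Width of iris:o2 shown above.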
Let us arrange this to subject-predicate-object triples.
iris_triples <- dataset_to_triples(xsd_convert(head(iris_dataset,2)))
iris_triples
#> s p o
#> 1 iris:o1 Sepal.Length "5.1"^^<xs:decimal>
#> 2 iris:o2 Sepal.Length "4.9"^^<xs:decimal>
#> 3 iris:o1 Sepal.Width "3.5"^^<xs:decimal>
#> 4 iris:o2 Sepal.Width "3"^^<xs:decimal>
#> 5 iris:o1 Petal.Length "1.4"^^<xs:decimal>
#> 6 iris:o2 Petal.Length "1.4"^^<xs:decimal>
#> 7 iris:o1 Petal.Width "0.2"^^<xs:decimal>
#> 8 iris:o2 Petal.Width "0.2"^^<xs:decimal>
#> 9 iris:o1 Species "setosa"^^<xs:string>
#> 10 iris:o2 Species "setosa"^^<xs:string>
We receive 2 x 5 = 10 rows, each with an identifier. The identifiers are made from row.names(), and we have exactly 5 statements about the first observation (iris:o1) and 5 statements about the second (iris:o2). Each statement simply states the observed value. So that the predicates are unambiguous too, we prefix the variable names with iris: as well:
iris_triples$p <- paste0("iris:", iris_triples$p)
iris_triples
#> s p o
#> 1 iris:o1 iris:Sepal.Length "5.1"^^<xs:decimal>
#> 2 iris:o2 iris:Sepal.Length "4.9"^^<xs:decimal>
#> 3 iris:o1 iris:Sepal.Width "3.5"^^<xs:decimal>
#> 4 iris:o2 iris:Sepal.Width "3"^^<xs:decimal>
#> 5 iris:o1 iris:Petal.Length "1.4"^^<xs:decimal>
#> 6 iris:o2 iris:Petal.Length "1.4"^^<xs:decimal>
#> 7 iris:o1 iris:Petal.Width "0.2"^^<xs:decimal>
#> 8 iris:o2 iris:Petal.Width "0.2"^^<xs:decimal>
#> 9 iris:o1 iris:Species "setosa"^^<xs:string>
#> 10 iris:o2 iris:Species "setosa"^^<xs:string>
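The reshaping shown above can be sketched with base R alone. This is an illustration under stated assumptions, not the code behind dataset_to_triples(): the function name to_triples is hypothetical, it works on the plain built-in iris data frame, and it stores the objects as untyped character strings.

```r
# Take two rows of the built-in iris data and give them prefixed row names.
small <- head(iris, 2)
rownames(small) <- paste0("iris:o", seq_len(nrow(small)))

# Hypothetical helper: one row per data cell, in column-by-column order.
to_triples <- function(df) {
  data.frame(
    s = rep(rownames(df), times = ncol(df)),              # subject: row identifier
    p = rep(names(df), each = nrow(df)),                  # predicate: variable name
    o = unlist(lapply(df, as.character), use.names = FALSE),  # object: cell value
    stringsAsFactors = FALSE
  )
}

triples <- to_triples(small)
triples$p <- paste0("iris:", triples$p)  # namespace the predicates
print(triples)
```

Two rows and five columns give ten triples, with the subjects alternating between iris:o1 and iris:o2 in the same order as in the listing above.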
vignette_temp_file <- file.path(tempdir(), "example_ttl.ttl")
dataset_ttl_write(dataset_to_triples(iris_triples),
file_path = vignette_temp_file)
We see a standard metadata file expressed in the Turtle language. The definitions are separated with a # -- Observations ------ comment from the actual statements about the dataset.
# Only first 23 lines are read and printed:
readLines(vignette_temp_file, n = 23)
#> [1] "@prefix owl: <http://www.w3.org/2002/07/owl#> ."
#> [2] "@prefix qb: <http://purl.org/linked-data/cube#> ."
#> [3] "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ."
#> [4] "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."
#> [5] "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> ."
#> [6] ""
#> [7] "# -- Observations -----------------------------------------"
#> [8] ""
#> [9] "1 a qb:Observation ;"
#> [10] " s iris:o1 ;"
#> [11] " p iris:Sepal.Length ;"
#> [12] " o \"5.1\"^^<xs:decimal> ;"
#> [13] " ."
#> [14] "2 a qb:Observation ;"
#> [15] " s iris:o2 ;"
#> [16] " p iris:Sepal.Length ;"
#> [17] " o \"4.9\"^^<xs:decimal> ;"
#> [18] " ."
#> [19] "3 a qb:Observation ;"
#> [20] " s iris:o1 ;"
#> [21] " p iris:Sepal.Width ;"
#> [22] " o \"3.5\"^^<xs:decimal> ;"
#> [23] " ."
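If we tried to parse this file with a ttl reader, we would get an error message, because not all statements are well-defined: the subjects 1, 2, 3 and the terms s, p and o carry no prefix, and the iris: prefix itself is not declared.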
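A quick way to see which terms would trip up a parser is to compare the prefixes in use with the declared ones. The base R sketch below is not a Turtle validator; the helper prefix_of and the small terms vector are hypothetical, chosen to mirror the file printed above.

```r
# Prefixes declared in the header of the file above.
declared <- c("owl:", "qb:", "rdf:", "rdfs:", "xsd:")

# A few terms taken from the statements of the file (illustrative selection).
terms <- c("iris:o1", "iris:Sepal.Length", "qb:Observation", "s", "p", "o")

# Hypothetical helper: return the prefix of a term, or NA if it has none.
prefix_of <- function(term) {
  ifelse(grepl(":", term, fixed = TRUE), sub(":.*$", ":", term), NA_character_)
}

used <- prefix_of(terms)
undeclared <- unique(terms[is.na(used) | !(used %in% declared)])
print(undeclared)
```

Only qb:Observation passes the check; every other term either has no prefix or uses the undeclared iris: prefix.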
The Turtle prefix statements define the abbreviations of the following namespaces:
readLines(vignette_temp_file, n = 5)
#> [1] "@prefix owl: <http://www.w3.org/2002/07/owl#> ."
#> [2] "@prefix qb: <http://purl.org/linked-data/cube#> ."
#> [3] "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ."
#> [4] "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."
#> [5] "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> ."
owl, rdf and rdfs: the Web Ontology Language, RDF, and RDF Schema vocabularies.
xsd: the XML Schema datatypes, which ensure that types such as character, integer, or Date are correctly represented in web documents.
The prefix makes the Turtle (ttl) file future-proof: before explaining the semantics of the data, it contains all the definitions that are needed to understand the explanation. It is a dictionary; every element of the vocabulary that is needed to explain the iris dataset should be here. This means that we must define the iris prefix, too.
These definitions can be found in the data("dataset_namespace") dataset. We only need to add the definitions that are unique to our own dataset, in this case the definitions of the variables of the iris dataset, i.e., the iris namespace.
The dataset_namespace data file contains some often used vocabularies and their prefixes. Let us select owl:, rdf:, rdfs:, qb: and xsd:, and add iris: as <www.example.com/iris#> (the example.com domain is reserved by IANA for documentation and tutorial examples).
used_prefixes <- which(dataset_namespace$prefix %in% c(
  "owl:", "rdf:", "rdfs:", "qb:", "xsd:")
)
vignette_namespace <- rbind(
  dataset_namespace[used_prefixes, ],
  data.frame(prefix = "iris:",
             uri = '<www.example.com/iris#>')
)
vignette_namespace
#> prefix uri
#> 6 owl: <http://www.w3.org/2002/07/owl#>
#> 7 qb: <http://purl.org/linked-data/cube#>
#> 8 rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#> 9 rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#> 20 xsd: <http://www.w3.org/2001/XMLSchema#>
#> 1 iris: <www.example.com/iris#>
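A namespace table of this shape maps directly onto Turtle @prefix lines. The base R sketch below only illustrates that mapping; it is not how dataset_ttl_write() is implemented, and the two-row table ns is a hypothetical stand-in for vignette_namespace.

```r
# Hypothetical stand-in for the namespace table shown above.
ns <- data.frame(
  prefix = c("qb:", "iris:"),
  uri    = c("<http://purl.org/linked-data/cube#>", "<www.example.com/iris#>"),
  stringsAsFactors = FALSE
)

# One "@prefix <prefix> <uri> ." line per row of the table.
prefix_lines <- paste("@prefix", ns$prefix, ns$uri, ".")
print(prefix_lines)
```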
Let us overwrite the earlier ttl file, but this time defining the
variables and observations with the iris:
prefix:
dataset_ttl_write(
iris_triples,
ttl_namespace = vignette_namespace,
file_path = vignette_temp_file,
overwrite = TRUE)
readLines(vignette_temp_file, n = 23)
#> [1] "@prefix owl: <http://www.w3.org/2002/07/owl#> ."
#> [2] "@prefix qb: <http://purl.org/linked-data/cube#> ."
#> [3] "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ."
#> [4] "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."
#> [5] "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> ."
#> [6] "@prefix iris: <www.example.com/iris#> ."
#> [7] ""
#> [8] "# -- Observations -----------------------------------------"
#> [9] ""
#> [10] "iris:o1 a qb:Observation ;"
#> [11] " iris:Sepal.Length \"5.1\"^^<xs:decimal> ;"
#> [12] " iris:Sepal.Width \"3.5\"^^<xs:decimal> ;"
#> [13] " iris:Petal.Length \"1.4\"^^<xs:decimal> ;"
#> [14] " iris:Petal.Width \"0.2\"^^<xs:decimal> ;"
#> [15] " iris:Species \"setosa\"^^<xs:string> ;"
#> [16] " ."
#> [17] "iris:o2 a qb:Observation ;"
#> [18] " iris:Sepal.Length \"4.9\"^^<xs:decimal> ;"
#> [19] " iris:Sepal.Width \"3\"^^<xs:decimal> ;"
#> [20] " iris:Petal.Length \"1.4\"^^<xs:decimal> ;"
#> [21] " iris:Petal.Width \"0.2\"^^<xs:decimal> ;"
#> [22] " iris:Species \"setosa\"^^<xs:string> ;"
#> [23] " ."
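The file now has one block per observation, with one line for each predicate and object pair. A base R sketch of this grouping might look as follows; the function render_subject is hypothetical, the three-row triples table is illustrative, and the layout is copied from the printed output rather than from the internals of dataset_ttl_write().

```r
# Illustrative triples; the objects are already typed literals.
triples <- data.frame(
  s = c("iris:o1", "iris:o1", "iris:o2"),
  p = c("iris:Sepal.Length", "iris:Species", "iris:Sepal.Length"),
  o = c('"5.1"^^<xs:decimal>', '"setosa"^^<xs:string>', '"4.9"^^<xs:decimal>'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: render all statements about one subject as a block.
render_subject <- function(subject, rows) {
  statements <- paste0("    ", rows$p, " ", rows$o, " ;")
  c(paste(subject, "a qb:Observation ;"), statements, "    .")
}

# Group the triples by subject and render each group in turn.
by_subject <- split(triples, triples$s)
ttl_body <- unlist(Map(render_subject, names(by_subject), by_subject),
                   use.names = FALSE)
writeLines(ttl_body)
```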
RDFLib is a pure Python package for working with RDF; it provides serialisation parsers, store implementations, a graph interface, and a SPARQL query and update implementation. R users can do similar work with the rdflib package [1].
In this section we show how to work further with our future-proof datasets. We parse the ttl file created with the dataset package into a triplestore:
require(rdflib)
example_rdf <- rdf_parse(vignette_temp_file, format = "turtle")
example_rdf
#> Total of 12 triples, stored in hashes
#> -------------------------------
#> <file:///www.example.com/iris#o2> <file:///www.example.com/iris#Species> "setosa"^^<xs:string> .
#> <file:///www.example.com/iris#o2> <file:///www.example.com/iris#Petal.Length> "1.4"^^<xs:decimal> .
#> <file:///www.example.com/iris#o1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#Observation> .
#> <file:///www.example.com/iris#o2> <file:///www.example.com/iris#Petal.Width> "0.2"^^<xs:decimal> .
#> <file:///www.example.com/iris#o2> <file:///www.example.com/iris#Sepal.Length> "4.9"^^<xs:decimal> .
#> <file:///www.example.com/iris#o1> <file:///www.example.com/iris#Sepal.Length> "5.1"^^<xs:decimal> .
#> <file:///www.example.com/iris#o1> <file:///www.example.com/iris#Petal.Width> "0.2"^^<xs:decimal> .
#> <file:///www.example.com/iris#o1> <file:///www.example.com/iris#Petal.Length> "1.4"^^<xs:decimal> .
#> <file:///www.example.com/iris#o2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#Observation> .
#> <file:///www.example.com/iris#o1> <file:///www.example.com/iris#Sepal.Width> "3.5"^^<xs:decimal> .
#>
#> ... with 2 more triples
And define a simple SPARQL query on the data:
sparql <-
'PREFIX iris: <www.example.com/iris#>
SELECT ?observation ?value
WHERE { ?observation iris:Sepal.Length ?value . }'
rdf_query(example_rdf, sparql)
#> # A tibble: 2 × 2
#> observation value
#> <chr> <dbl>
#> 1 file:///www.example.com/iris#o2 4.9
#> 2 file:///www.example.com/iris#o1 5.1
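The triple pattern in the WHERE clause keeps every statement whose predicate is iris:Sepal.Length and binds its subject and object to the two variables. For readers who think in data frames, the same selection can be sketched in base R on a long-form table; the small triples table is illustrative, and this is an analogy rather than a SPARQL engine.

```r
# Illustrative long-form triples with plain (untyped) object values.
triples <- data.frame(
  s = c("iris:o1", "iris:o2", "iris:o1", "iris:o2"),
  p = c("iris:Sepal.Length", "iris:Sepal.Length", "iris:Species", "iris:Species"),
  o = c("5.1", "4.9", "setosa", "setosa"),
  stringsAsFactors = FALSE
)

# Keep the rows matching the predicate, then name the columns after the variables.
matches <- triples[triples$p == "iris:Sepal.Length", c("s", "o")]
names(matches) <- c("observation", "value")
matches$value <- as.numeric(matches$value)
print(matches)
```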
Convert, for example, to JSON-LD format…:
temp_jsonld_file <- file.path(tempdir(), "example_jsonld.json")
rdf_serialize(rdf=example_rdf, doc = temp_jsonld_file, format = "jsonld")
… and read in the first 12 lines:
readLines(temp_jsonld_file, 12)
#> [1] "{"
#> [2] " \"@graph\": ["
#> [3] " {"
#> [4] " \"@id\": \"file:///www.example.com/iris#o1\","
#> [5] " \"@type\": \"http://purl.org/linked-data/cube#Observation\","
#> [6] " \"file:///www.example.com/iris#Petal.Length\": {"
#> [7] " \"@type\": \"xs:decimal\","
#> [8] " \"@value\": \"1.4\""
#> [9] " },"
#> [10] " \"file:///www.example.com/iris#Petal.Width\": {"
#> [11] " \"@type\": \"xs:decimal\","
#> [12] " \"@value\": \"0.2\""
1. Carl Boettiger: A tidyverse lover’s intro to RDF