Interacting with RDF data – Linked Data for Professional Education

0 to 60 on SPARQL Queries in 50 Minutes

Sun, 06 May 2018 02:22:48 +0000

This webinar provides an introduction to SPARQL, a query language for RDF. Users will gain hands-on experience crafting queries, starting simply and growing in complexity. These queries focus on coinage data in the SPARQL endpoint hosted by http://nomisma.org with numismatic concepts defined in a SKOS-based thesaurus and physical specimens from three major museum collections (American Numismatic Society, British Museum, and Münzkabinett of the Staatliche Museen zu Berlin) linked to these concepts. Results generated from these queries in CSV form may be imported directly into Google Fusion Tables for immediate visualization as charts and maps. Links to all resources necessary to practice the queries covered are available online and referenced from the webinar presentation.

The webinar uses the following SPARQL constructs: SELECT, WHERE, LIMIT, ORDER BY (ASC), FILTER, DISTINCT, regex(), count(), lang(), langMatches().

All SPARQL queries used in the webinar are available at http://numishare.blogspot.com/2015_05_01_archive.
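
As a rough illustration of how several of the constructs listed for this webinar fit together, the sketch below builds one query and encodes it as a GET request URL. It is not one of the webinar's queries: the endpoint URL is a hypothetical placeholder, the regex pattern is arbitrary, and only the SKOS namespace and the `query` request parameter of the SPARQL protocol are taken as given.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL, used only to show how a request URL is formed.
ENDPOINT = "http://example.org/sparql"

# SELECT, WHERE, FILTER, DISTINCT, regex(), lang(), langMatches(),
# ORDER BY (ASC) and LIMIT combined in a single illustrative query.
QUERY = """\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?concept ?label
WHERE {
  ?concept skos:prefLabel ?label .
  FILTER ( langMatches(lang(?label), "en") )
  FILTER ( regex(str(?label), "^A", "i") )
}
ORDER BY ASC(?label)
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query text as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The sketch only constructs the URL and sends no request; which result formats (such as CSV) a given endpoint returns depends on that endpoint.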

URL: https://www.asist.org/events/webinars/dcmi-webinar-from-0-to-60-on-sparql-queries-in-50-minutes/
Keywords: visualization (map geolocations), Google Fusion Tables, visualization (statistics), geographic queries, UNION, OPTIONAL, sorting, filtering, SPARQL syntax
Author: Ethan Gruber
Publisher: DCMI
Date created: 2014-01-01 07:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P1H00M
Educational use: instruction
Educational audience: student
Interactivity type: mixed

Eurostat Linked Data

Fri, 18 Aug 2017 08:22:56 +0000

This is a Linked Data version of the Eurostat data, with the goal of providing 5-star Linked Open Data at the European level in a contextually rich and up-to-date manner, useful for ETL-style business analysis or data warehousing. Benefits include, but are not limited to: straightforward comparison of statistical indicators across EU countries; easier interpretation of statistics through added context; and fine-grained reuse of observations. A SPARQL endpoint allows the user to query the entire metadata, including DSDs and dictionaries. Contains SPARQL queries kindly provided by Søren Roug from the European Environment Agency (EEA).

URL: http://eurostat.linked-statistics.org/
Keywords: Extract, Transform, Load (ETL), 5-Star Linked Open Data, SPARQL endpoint, Dataset, RDF
Author: Roug, Søren
Publisher: DERI
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P1H
Educational use: professionalDevelopment
Educational audience: professional
Interactivity type: mixed

URI design for RDF conversion of CSV-based data

Fri, 18 Aug 2017 08:22:56 +0000

The sharing of comma-separated files is very common, and a common set of problems arises upon the receipt and processing of data in this format. The RDF vocabulary described in this document provides interpretation instructions to a tool that converts tabular data into RDF appropriate for linked data publishing. Links to a page that describes a Java-based tool that follows the interpretation vocabulary described here and provides a set of shell script tools to facilitate automatic conversion.

URL: https://data-gov.tw.rpi.edu/wiki/URI_design_for_RDF_conversion_of_CSV-based_data
Keywords: CSV (Comma Separated Values), HTTP URIs, VoID (Vocabulary of Interlinked Datasets)
Author: Williams, Gregory Todd
Publisher: Tetherless World Constellation
Date created: 2011-01-21 05:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P45M

Tarql: SPARQL for Tables

Fri, 18 Aug 2017 08:22:56 +0000

Tarql is a command-line tool for converting CSV files to RDF using SPARQL 1.1 syntax. In short, a CSV file’s contents are input into a SPARQL query as a table of bindings. This allows manipulation of CSV data using the full power of SPARQL 1.1 syntax, and in particular the generation of RDF using CONSTRUCT queries. Includes design patterns and examples. Discusses how to deal with variations in header rows, delimiters, quotes and character encoding encountered in CSV/TSV files.
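
The idea of treating CSV rows as a table of bindings can be sketched in plain Python. This is not Tarql and does not use its syntax: the sample data, the `example.org` namespace, and the helper names are hypothetical, and a fixed triple template stands in for a CONSTRUCT query.

```python
import csv
import io

# Hypothetical sample data and subject namespace, not taken from the Tarql docs.
CSV_TEXT = "id,name\n1,Alice\n2,Bob\n"
BASE = "http://example.org/person/"
NAME = "http://xmlns.com/foaf/0.1/name"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Treat each CSV row as one set of bindings and fill a triple template."""
    triples = []
    for binding in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + binding["id"]
        name = escape_literal(binding["name"])
        triples.append(f'<{subject}> <{NAME}> "{name}" .')
    return triples


for line in rows_to_ntriples(CSV_TEXT):
    print(line)
```

Here the header row supplies the variable names (`id`, `name`) and each later row supplies one solution. The sketch ignores delimiter, quoting, and encoding variations.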

URL: http://tarql.github.io/
Keywords: CSV (Comma Separated Values), SPARQL CONSTRUCT, SPARQL OFFSET, Apache Jena, Java (programming language)
Author: Cyganiak, Richard
Date created: 2017-06-22 04:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P45M

Using the Convert CSV to RDF ingest tool

Fri, 18 Aug 2017 08:22:56 +0000

This guide will walk through the use of the Convert CSV to RDF tool, a semi-automated method of converting comma-separated or tab-separated text files into RDF that can be displayed in VIVO. These files should include one row of data per record (e.g., a person or publication) and represent the fields or properties associated with each record in separate columns within the row, much as the values appear in a spreadsheet. The most common pattern of loading CSV files involves one CSV file per type of data to be loaded. Note that the current ingest tools involve working through a number of steps from original source data files to the appearance of new data in VIVO. The process requires some understanding of semantic web data modeling and some training.

URL: https://wiki.duraspace.org/display/VIVODOC19x/Using+the+Convert+CSV+to+RDF+ingest+tool
Keywords: CSV (Comma Separated Values), VIVO Ontology for Researcher Discovery, SPARQL
Author: Gross, Benjamin
Publisher: Duraspace
Date created: 2017-03-16 04:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P30M
Educational use: instruction
Educational audience: teacher-educationSpecialist
Interactivity type: mixed

Linked Statistical Data Analysis

Sun, 13 Aug 2017 08:18:11 +0000

Linked Data design principles are increasingly employed to publish and consume high-fidelity, heterogeneous statistical datasets in a distributed fashion. While vast amounts of linked statistics are available, access to and reuse of the data are subject to expertise in the corresponding technologies. There exist no user-centered interfaces for researchers, journalists and interested people to compare statistical data retrieved from different sources on the Web. Given that the RDF Data Cube vocabulary is used to describe statistical data, its use makes it possible to discover and identify statistical data artifacts in a uniform way. In this article, the design and implementation of a user-centric application and service is presented. Behind the scenes, the platform utilizes federated SPARQL queries to gather statistical data from distributed data stores. The R language for statistical computing is employed to perform statistical analyses and visualizations. The Shiny application and server bridges the front-end Web user interface with R on the server side in order to compare statistical macrodata, and stores analysis results in RDF for future research. As a result, distributed linked statistics with accompanying provenance data can be more easily explored and analysed by interested parties.

URL: http://csarven.ca/linked-statistical-data-analysis
Keywords: Data analysis, R (programming language), Shiny server, Apache Jena, Federated queries
Author: Riedl, Reinhard
Publisher: CEUR (Central Europe Workshop Proceedings)
Date created: 2013-07-07 04:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P1H
Educational use: professionalDevelopment
Educational audience: professional
Interactivity type: expositive

Jena Full Text Search

Sun, 13 Aug 2017 08:18:11 +0000

This documentation explains how to configure and use the Full Text extension to Apache Jena's ARQ (the module is included in Fuseki). The extension combines SPARQL and full-text search via Lucene or ElasticSearch (built on Lucene). It gives applications the ability to perform indexed full-text searches within SPARQL queries. Although SPARQL allows the use of regular expressions in FILTER, this is a test on a value retrieved earlier in the query and its use is not indexed. In other words, if you're searching for occurrences of a specific term in the rdfs:label of a bunch of products, then the search will need to examine all selected rdfs:label statements and apply the regular expression to each label in turn. If there are many such statements and many such uses of regex, then it may be appropriate to consider using this extension to take advantage of the performance potential of full text indexing.
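
The difference between an unindexed regex test and an indexed lookup can be sketched in plain Python. This is only an analogy and uses neither Jena nor Lucene: the product labels and function names are hypothetical, and the small word-level inverted index stands in for a real full-text index.

```python
import re
from collections import defaultdict

# Hypothetical product labels keyed by made-up identifiers.
LABELS = {
    "p1": "Red garden chair",
    "p2": "Garden hose",
    "p3": "Office chair",
}


def regex_scan(labels: dict[str, str], term: str) -> set[str]:
    """Unindexed approach: apply the pattern to every label in turn."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return {key for key, label in labels.items() if pattern.search(label)}


def build_index(labels: dict[str, str]) -> dict[str, set[str]]:
    """Indexed approach: map each lower-cased word to the keys containing it."""
    index = defaultdict(set)
    for key, label in labels.items():
        for word in label.lower().split():
            index[word].add(key)
    return dict(index)


index = build_index(LABELS)
print(sorted(regex_scan(LABELS, "chair")))      # examines all three labels
print(sorted(index.get("chair", set())))        # one dictionary lookup
```

The scan does work proportional to the number of labels on every search, while the index is built once and then consulted directly. The two are not equivalent in general: the scan matches substrings, while this toy index matches whole words only.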

URL: http://jena.apache.org/documentation/query/text-query.html
Keywords: SPARQL, REGEX, FILTER, Apache Jena
Publisher: Apache Jena
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P2H
Educational use: professionalDevelopment

Don’t Use a Hammer to Screw in a Nail: Alternatives to REGEX in SPARQL

Sun, 13 Aug 2017 08:18:11 +0000

This brief blog post explains that regular expressions are expensive to evaluate regardless of what language you are using them in. The author suggests that if you can avoid using a regular expression in favor of a simpler string computation, then you can likely get much better performance out of your SPARQL engine. Alternative strategies include using CONTAINS, LCASE, UCASE, STRSTARTS, and STRENDS. If more complex string operations are required, full-text extensions to the SPARQL engine may be an option.
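
To make the substitution concrete, the sketch below pairs two regex tests with simpler string operations that select the same items, using Python analogues of the SPARQL functions named above. The labels are hypothetical, and the sketch shows only that the results agree; it says nothing about the performance of any particular SPARQL engine.

```python
import re

# Hypothetical labels used only for illustration.
labels = ["Denarius of Augustus", "Aureus of Nero", "denarius (fragment)"]

# Analogue of regex(?label, "^denarius", "i")
by_regex = [s for s in labels if re.search(r"^denarius", s, re.IGNORECASE)]
# Analogue of STRSTARTS(LCASE(?label), "denarius")
by_prefix = [s for s in labels if s.lower().startswith("denarius")]

# Analogue of regex(?label, "of")
contains_regex = [s for s in labels if re.search("of", s)]
# Analogue of CONTAINS(?label, "of")
contains_plain = [s for s in labels if "of" in s]

print(by_prefix)
print(contains_plain)
```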

URL: http://www.cray.com/blog/dont-use-hammer-screw-nail-alternatives-regex-sparql/
Keywords: SPARQL, CONTAINS, String operations, REGEX, Jena Text
Author: Vesse, Rob
Publisher: Cray
Date created: 2014-06-03 04:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P10M
Educational use: instruction
Educational audience: student
Interactivity type: expositive

Equality and Inequality in SPARQL

Sun, 13 Aug 2017 08:18:11 +0000

This brief blog post discusses issues surrounding expression semantics in SPARQL. Practices that people are used to from other languages (like the use of "=" and "!=") can often be confusing to new, and even not-so-new, SPARQL developers. The author suggests that an awareness of type errors will allow the user to understand what is occurring. Example queries show how type errors are treated when using FILTER. An alternative method involving project expressions or BIND is proposed.
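
A toy model in Python can show why both "=" and "!=" may discard the same solution. It is not the blog post's example and not an implementation of SPARQL: the rule that values of different Python types are incomparable is a simplifying assumption, and all names are hypothetical. What it does follow is the behavior that FILTER removes a solution when its expression raises an error, while BIND keeps the solution and leaves the variable unbound.

```python
class SparqlTypeError(Exception):
    """Stands in for an expression error raised during evaluation."""


def equals(a, b):
    """Toy equality: values of different Python types count as incomparable."""
    if type(a) is not type(b):
        raise SparqlTypeError(f"cannot compare {a!r} and {b!r}")
    return a == b


def filter_rows(rows, predicate):
    """FILTER-like: keep a row only if the predicate is true; errors drop it."""
    kept = []
    for row in rows:
        try:
            if predicate(row):
                kept.append(row)
        except SparqlTypeError:
            pass  # an error behaves like "not true", so the row is removed
    return kept


def bind_rows(rows, name, expression):
    """BIND-like: on error the new variable stays unbound and the row is kept."""
    out = []
    for row in rows:
        new_row = dict(row)
        try:
            new_row[name] = expression(row)
        except SparqlTypeError:
            pass  # leave `name` unbound for this row
        out.append(new_row)
    return out


rows = [{"x": 1}, {"x": 2}, {"x": "one"}]
eq = filter_rows(rows, lambda r: equals(r["x"], 1))
ne = filter_rows(rows, lambda r: not equals(r["x"], 1))
bound = bind_rows(rows, "isOne", lambda r: equals(r["x"], 1))
print(eq, ne, bound)
```

In this model the row with "one" appears in neither filtered result, because negating an error is still an error. The BIND-like version keeps all three rows, so the unbound case stays visible. Which value pairs actually raise a type error is defined by the SPARQL specification and is not reproduced here.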

URL: http://www.cray.com/blog/equality-inequality-sparql/
Keywords: SPARQL, BIND, FILTER, Type errors, Project expressions
Author: Vesse, Rob
Publisher: Cray
Date created: 2013-04-23 04:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P10M
Educational use: instruction
Educational audience: student
Interactivity type: expositive

Desiderata for an authoritative Representation of MeSH in RDF

Tue, 08 Aug 2017 08:07:57 +0000

Although the Semantic Web provides a framework for the integration of resources on the web, datasets are not always made available in RDF by their producers, and the Semantic Web community has had to convert some of these datasets to RDF so that they can participate in the LOD cloud. As a result, the LOD cloud sometimes contains outdated, partial and even inaccurate RDF datasets. The authors of this article review the LOD landscape for one of these resources, MeSH, and analyze the characteristics of six existing representations in order to identify desirable features for an authoritative version, for which they have created a prototype. They then illustrate the suitability of this prototype on three common use cases. NLM released an authoritative representation of MeSH in RDF (beta version) in the fall of 2014.

URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4419968/
Keywords: Medical Subject Headings (MeSH), Linked Open Data (LOD) Cloud, Linked Open Data (LOD), Semantic Web, eXtensible Markup Language (XML)
Author: Bodenreider, Olivier
Publisher: American Medical Informatics Association (AMIA)
Date created: 2014-11-14 05:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P20M