The final version of the Inventory of Learning Topics, with modifications in light of comments received, is posted here. This document presented a draft inventory of “learning topics” to be covered by learners who want to understand, process, and create Linked Data. Read more about the project history. This inventory became the basis for the Competency Index for Linked Data that underlies functionality on this site.
Understanding Linked Data

Linked Data is data that fits into a “cloud” of diverse data sources — whether those sources are published world-readably on the Web (Linked Open Data) or behind a corporate or institutional firewall (Linked Enterprise Data). For the purposes of the Learning Linked Data Project, Linked Data is data published in a form compatible with the Resource Description Framework (RDF) — a Semantic Web standard of the World Wide Web Consortium (W3C). Inasmuch as RDF is a language designed for processing by machines, learners must acquire competence in the use of software tools for ingesting, visualizing, transforming, and interpreting its URI-based statements. Prerequisite to using any sort of tool, however, a learner must grasp underlying concepts such as:
- The rationale for, and application of, Linked Data
- The elements of RDF Data: properties, classes, instances
- The node-arc model of RDF Graphs, and Named Graphs
- Reading and understanding RDF Triples
- The use of Uniform Resource Identifiers (URIs) as globally unique identifiers
- How data is linked using URIs
- The Open World Assumption versus Closed World Assumption
- How triples are merged to create new RDF Graphs
- Principles of inferencing (reasoning) – see Ontologies
- Looking up RDF Vocabularies (“following one’s nose”)
- Principles of publishing RDF data as Linked Data (e.g., content negotiation)
Just as a language learner must learn to ask questions and converse with native speakers, learners of Linked Data must learn how to query datasets and explore their characteristics.
- Formulating structured queries, e.g., using the SPARQL query language
- Structured query tools
- Search engines (e.g., Sindice, Semantic Web Search Engine)
- Assessing data and checking consistency
- Reasoners
- RDF validators (e.g., W3C RDF Validation Service)
- Discovering vocabularies
- Vocabulary discovery tools
In the Resource Description Framework (RDF), “everything is data”: descriptions of things (whether things in the world or, more specifically, information resources) and descriptions of the Languages of Description used to describe those things (see RDF Data). Instances, attribute spaces, and value spaces are all expressed in the same RDF language. The category “creating and manipulating data,” therefore, encompasses a broad range of topics which, in other fields, might be considered quite separate from each other:
- Creating RDF Vocabularies and minting URIs for their properties and classes
- RDF vocabulary editors
- Creating property-to-property and class-to-class links across RDF Vocabularies
- Mapping tools
- Creating SKOS Concept Schemes
- SKOS editors
- Creating a Domain Model enumerating the things to be described in a dataset
- Diagramming tools (e.g., UML and mind maps)
- Creating other types of datasets
- Data editors
- Converting triples among alternative RDF syntaxes (e.g., RDF/XML, Turtle, N-Triples, RDFa)
- Triple converters (e.g., Rapper)
- Generating RDF Triples from the content analysis of unstructured text data
- Triplifiers for full text (e.g., Calais)
- Extracting RDF Triples embedded in Web pages
- Linters
- Distillers (e.g., Microdata to RDFa)
- Deriving RDF triples from non-RDF data
- Triplifiers for XML (e.g., GRDDL)
- Data cleaners (e.g., Google Refine)
Visualization plays a unique role in understanding RDF Data because RDF Graphs are conceptually diagrammatic in nature. Because in RDF, “everything is data,” some of the tools usable for visualizing instance data may be used to visualize ontologies, while other tools may be used to explore the statistical, spatial, or temporal characteristics of datasets:
- Generating node-and-arc diagrams
- Generating a Linked Data cloud diagram
- LOD cloud generators (e.g., CKAN)
- Visually exploring statistical characteristics of large data sets
- Statistical visualization tools (e.g., Spotfire)
- Generating different visual views of data (e.g., on timelines or maps)
- Visualization tools (e.g., Simile)
Simply learning how to interpret and manipulate Linked Data could stop with the topics outlined above. The extent to which a language-lab-like platform for learning Linked Data should encompass tools for implementing Linked Data applications is an open question. Whether as part of a tool platform or merely as topics of study, however, the learner should acquire knowledge of the following:
- Publishing RDF-compatible data on the Web
- Web Frameworks (e.g., Ruby on Rails)
- Content Management Systems (e.g., Drupal)
- Storing RDF Data
- RDF Triple stores (e.g., Virtuoso)
- Relational databases and other RDF-compatible backend storage options
- Integrated tool platforms
In addition to comments on the Learning Inventory, the project invites contributions of use cases outlining possible instructional scenarios related to learning Linked Data concepts, technologies, and tools. Such practical applications can help to discover and prioritize tools to be highlighted in implementing a coherent package of instructional resources.
Stuart Sutton, CEO of the Dublin Core Metadata Initiative, has provided a use case for an introductory university course. Others may be posted as comments on this page.
SCENARIO: Introductory university course in semantic metadata
–Prerequisite knowledge and skills:
Learners should have a basic understanding of knowledge organization systems, XML, and database management. No prior knowledge of RDF or Semantic Web is assumed.
–Education or training context:
Learners should be motivated to learn the technology. Appropriate for advanced undergraduate informatics majors. Instruction, consisting of lectures and discussion, is entirely online and asynchronous. The course runs ten weeks, with roughly 20 hours of lectures and presentations and 70 hours of reading, assignments, and other activities.
–Student deliverables:
By the end of the course, students will produce serializations of RDF graphs in several syntaxes; design a domain model (class diagram); create RDF vocabularies and SKOS concept schemes; and produce RDF instance data for a student project.
–Expected learning outcomes:
Students should demonstrate a grasp of basic Linked Data and Semantic Web tools and concepts, including the principles and mechanisms for merging graphs. They should understand how to use RDF serialization syntaxes; manually draw RDF graphs; serialize frequently used N-ary triple patterns; “webify” existing controlled vocabularies. They should be able to explain the difference between the XML information set and the RDF abstract data model and demonstrate modeling skills in mapping between the two.
–Required use of tools:
Students should be able to use tools for graphically depicting domain models (class diagrams); for editing and validating RDF data; for transforming data among different RDF syntaxes; and for generating visual depictions of RDF graphs.
The inventory of topics developed during the planning grant timeframe is available on the project legacy site.