Sunday, June 23, 2013

RDF Investigations

First in a series of articles exploring the True Nature(s) of RDF.

"Why, oh why?" you cry.  Mainly because it's fun; it inevitably leads to investigation of lots of very interesting topics not only in logic but in philosophy, computer science, and linguistics.

But also because there is at least a slim chance that it may prove useful.  Actually I think there's a very good chance.  It may not prove useful for implementers who have already mastered the official definition(s); but for the newcomer, the official docs are dauntingly complex, obscure, quasi-incoherent, and generally hard to read.  On top of that, the reigning idiom in which the so-called "semantic web" is discussed is largely, by turns, vague, counter-intuitive, sloppy, propagandistic, ... [your adjective here].  A careful investigation of RDF might well turn up language that eliminates such problems for the general reader.

The goal here is not to provide yet another Guide to RDF.  On the contrary: the first question to ask is whether the official definition of RDF, not to mention the prevalent informal ways of talking about it, is adequate to the task.  At the center of RDF is a collection of concepts - triples, graphs, statements, inference, etc. - which the official definition regiments into a quasi-formal system.  One question is whether this official definition adequately captures the intuitive understandings that led to RDF in the first place.  It would be a mistake to take the official definition of this collection of ideas as the only or even the best way of thinking about them, just as it would be a mistake to think that one particular logical formalism (e.g. classical, intuitionistic, game-theoretical, quantum, etc.) is the only or best way of capturing the essential properties of logical consequence or logical truth.  There are always multiple ways to think about things; and sometimes, adoption of one way over another has real consequences.

In fact, careful investigation of RDF will show that the official definition is inadequate in two fundamental ways.  First, it restricts interpretation to a single semantic domain.  This is unnecessary; there are many different ways of thinking about RDF - concrete "theories" of RDF - that can serve as the basis of models of RDF, just as there are many different concrete groups that can serve as models of formalized group theory.  Furthermore, restricting interpretation to a single specific theory of RDF obscures the point of model theory, which is fundamentally concerned with how a fixed language can be used to generalize about aspects of structure common to a variety of concrete mathematical domains.

The second point of inadequacy is the lack of a complete RDF calculus.  For model theory to work, you need three separate calculi, which together make up the calculus of the language, that is, the rules for constructing terms, formulae, and formal proofs (deductions):

  • A term calculus specifies how terms are to be constructed from elementary symbols.  For RDF, this is provided by the rules of IRI syntax plus RDF-specific rules for the syntax of literals.
  • A formula calculus specifies how formulae are to be constructed from terms.  In RDF, a formula is called a statement, and a set of formulae is called a graph.  The so-called abstract syntax described in RDF Concepts and Abstract Syntax serves as the formula calculus, but it is incomplete.  It specifies that a triple (statement) "contains" three terms (nodes), and that an RDF graph is "a set of triples".  But these are not rules of a calculus; they do not tell us how to construct statements in a formal language.
  • An inferential calculus specifies how proofs or deductions are to be constructed from formulae and symbols.  The official definition of RDF does not specify such a calculus.

Lack of a complete calculus means that the model-theoretic interpretation of RDF in RDF Semantics is essentially incomplete.  It does provide an account of semantic entailment (not to be confused with logical entailment), but in the absence of a calculus we can use to construct formal statements and inferences (deductions), we have no means of making use of such entailments.  The business of model theory is to build a bridge between formal calculi and (informal) semantic domains.  You don't need a formal representation of the semantic domain, but you do need a formal calculus.  Viewed from the computational rather than mathematical perspective, this is the point of model theory, since it makes automated proof a legitimate idea.  No model without a calculus.
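To make the three-layer picture a little more concrete, here is a rough Python sketch of what a term calculus, a formula calculus, and an inferential calculus might look like when written down as explicit construction rules.  Everything in it is my own illustration, not part of any official RDF document: the class names, the toy IRI check (a crude stand-in for real IRI syntax), the omission of blank nodes, and the single inference rule (a subproperty rule chosen only as an example) are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


# --- Term calculus: how terms are built from elementary symbols. ---
@dataclass(frozen=True)
class IRI:
    value: str

    def __post_init__(self):
        # Toy well-formedness check; real IRI syntax is far stricter.
        if ":" not in self.value or " " in self.value:
            raise ValueError(f"not a well-formed IRI (toy check): {self.value!r}")


@dataclass(frozen=True)
class Literal:
    lexical: str


Term = Union[IRI, Literal]


# --- Formula calculus: how a statement is built from terms. ---
@dataclass(frozen=True)
class Statement:
    s: IRI
    p: IRI
    o: Term

    def __post_init__(self):
        # Construction rule (assumed): subject and predicate are IRIs,
        # the object is any term.  Blank nodes are left out of this sketch.
        if not isinstance(self.s, IRI) or not isinstance(self.p, IRI):
            raise TypeError("subject and predicate must be IRIs")
        if not isinstance(self.o, (IRI, Literal)):
            raise TypeError("object must be an IRI or a Literal")


# --- Inferential calculus: how new statements are derived from old ones. ---
SUB_PROPERTY_OF = IRI("http://www.w3.org/2000/01/rdf-schema#subPropertyOf")


def step(premises: FrozenSet[Statement]) -> FrozenSet[Statement]:
    """Apply one illustrative rule: from (p subPropertyOf q) and (x p y), derive (x q y)."""
    derived = set()
    for a in premises:
        if a.p != SUB_PROPERTY_OF or not isinstance(a.o, IRI):
            continue
        for b in premises:
            if b.p == a.s:
                derived.add(Statement(b.s, a.o, b.o))
    return frozenset(derived) - premises


def deductive_closure(premises: FrozenSet[Statement]) -> FrozenSet[Statement]:
    """Repeat the rule until nothing new can be derived."""
    closed = frozenset(premises)
    while True:
        new = step(closed)
        if not new:
            return closed
        closed = closed | new
```

The point of the sketch is only that each layer is a set of purely syntactic construction rules: nothing above mentions resources, relations, or graphs, and a deduction is just a sequence of applications of `step`.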

This analysis suggests a radically different strategy for defining RDF: start with a syntactic calculus, and demonstrate several different models for the sentences generated by the calculus.  The result is a reconceptualization of RDF as a structure common to several concrete mathematical theories.  The practical consequence is that implementers are freed from the need to select a particular model; they can concentrate solely on the syntax.  Users too can dispense with the need to understand a theory of RDF, and can focus on understanding the deductions licensed by the calculus.  In particular, it is not necessary to conceptualize RDF as a graph, nor is it necessary to think of the middle term of a triple as denoting a relation between the first and third terms.

That's just the formal side of RDF; a distinct but related investigation will address the notion that RDF "statements" can be construed as assertions, and that IRIs denote real-world entities.

Program

  1. Formalize the official definition of RDF.  This means both defining a calculus (syntax), and describing the semantic aspect of RDF as a mathematical domain.
  2. Investigate alternative concrete RDF theories.  Some possibilities:
    1. Set theoretic with binary relations.  This is the domain used in the official RDF semantics, which treats the first and third elements of an RDF statement as members of a set of Resources, the second term as a member of a set of Properties, together with a mapping from the Property to a relation containing the pair of Resources.  Entailment comes automatically, from the axioms of set theory.  No 3-tuples are defined as part of the semantic domain.
    2. Set theoretic with cartesian 3-tuples and entailment axioms.  This would dispense with binary relations and account for entailments axiomatically.
    3. Category theoretic, with triples represented as product constructions in the category RDF whose objects are Nodes (or Resources, or whatever).  Here a triple is a triple of arrows rather than objects.  Entailments are expressed as axiomatic commutative diagrams.
  3. Investigate alternative calculi.  Investigate ways to formalize the rdf: and rdfs: constants defined in RDF Schema; what axioms are needed, and what form should they take?
  4. Other?
  5. Investigate what it means to say that IRIs and RDF statements have any sort of meaning beyond the mathematical meanings investigated above.  What is it for an IRI to denote anything at all, let alone a real-world entity?  What is the relation between the idea that an RDF triple amounts to a statement and the idea that it is asserted?  How are such meanings related to the pragmatics of how the web is actually used?  Etc.  This is a very big can of worms.
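
As a small preview of item 2 of the program, here is a hedged Python sketch of how the same set of triples might be checked against two different concrete theories: one built on binary relations (in the spirit of option 2.1) and one built on plain 3-tuples (option 2.2).  The names `DENOTE`, `EXTENSION`, `FACTS` and the `ex:` identifiers are hypothetical, invented for illustration, and entailment axioms are left out entirely; this only shows satisfaction of individual triples.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]

# The syntax under interpretation: two triples written as plain strings.
GRAPH: Set[Triple] = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
}

# A denotation maps each name to some element of the semantic domain.
DENOTE: Dict[str, object] = {
    "ex:alice": 1, "ex:bob": 2, "ex:carol": 3, "ex:knows": "K",
}


def relational_satisfies(denote, extension, triple: Triple) -> bool:
    """Theory 1: the property denotes something with a binary-relation extension."""
    if any(name not in denote for name in triple):
        return False
    s, p, o = (denote[name] for name in triple)
    return (s, o) in extension.get(p, set())


def tuple_satisfies(denote, facts, triple: Triple) -> bool:
    """Theory 2: the domain simply contains a set of 3-tuples; no relations."""
    if any(name not in denote for name in triple):
        return False
    return tuple(denote[name] for name in triple) in facts


def is_model(satisfies: Callable[[Triple], bool], graph: Set[Triple]) -> bool:
    """A structure models a graph if it satisfies every triple in it."""
    return all(satisfies(t) for t in graph)


# Two different semantic domains for the same syntax.
EXTENSION = {"K": {(1, 2), (2, 3)}}    # property element -> set of pairs
FACTS = {(1, "K", 2), (2, "K", 3)}     # set of 3-tuples

relational_ok = is_model(lambda t: relational_satisfies(DENOTE, EXTENSION, t), GRAPH)
tuple_ok = is_model(lambda t: tuple_satisfies(DENOTE, FACTS, t), GRAPH)
```

Both `relational_ok` and `tuple_ok` come out true, which is the whole point: the triples themselves do not care which of the two domains is used to interpret them.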



