Tuesday, June 25, 2013

A Simplest Triad Calculus - DRAFT

Series:  RDF Investigations

Overview

For the moment, please treat this as a kind of proof-of-concept.  It's intended to have enough detail to make the design concept clear, without bogging down too much in details at this point.  For example, I'm disregarding literals for now, because they are inessential to the basic deductive system.  I guess the basic idea is that we ought to be able to use some variation of Natural Deduction to express just about everything needed to define RDF and RDFS clearly and concisely.  Among other things this should allow us to eliminate stuff like "instance of a graph" and "lean graph" (described in RDF Semantics); the former is just not needed (being handled by derivation rules involving variables), while the latter seems to correspond to the concept of normalization, in the sense of a set (sequence) of independent triads, no one of which can be derived from the others.  To get started I'm just sketching out a basic calculus to suggest what shape the final thing would have.

Also, the notes amount to a sort of tutorial/cry-for-help in that I try to explain what I'm doing and why, rather than just stating definitions as one might do in a formal definition.

Meta-language

• Constants:
• All constants in the object language are also constants in the meta-language.
• $$\exists$$:  existential quantifier
• $$\land$$: logical and
• $$\phi$$: a formula
• $$\Phi$$: set of formulae
• $$\vdash$$:  derivation relation
• $$\models$$:  entailment (logical consequence) relation
• $$\leadsto$$;  informal justification
• $$\Rightarrow$$:  rewrite relation; like derivation, only from object language to meta-language.
• Formulae
• Quantification
• $$\exists x.\!x\phi$$ has its usual FOL meaning.
• Inference.  We treat inference as a genus with three species:
• $$\Phi \vdash \phi$$:  the formulae $$\Phi$$ derives the formula $$\phi$$; $$\phi$$ is derivable or deducible from $$\Phi$$
• $$\Phi \models \phi$$: the formulae $$\Phi$$ entail the formula $$\phi$$; $$\phi$$ is a (logical) consequence of $$\Phi$$.
• $$\Phi \leadsto \phi$$: the meanings of $$\Phi$$ justify or lead to the meaning $$\phi$$; $$\phi$$ follows from of $$\Phi$$ (semantically).  (NB: this is an informal notion we will use to explicate the consequence relation.)
Note that the meta-language contains a few constants from First-Order Logic (FOL): $$\exists, \land, \vdash, \models$$.  They allow us to say useful things about object languages and how they work.  As it stands it is not complete; this is just enough to get the ideas across.

$$\displaystyle\mathbb{T}_\varnothing$$  (T-null)

• Symbols
• Constants:   $$\{A\ldots Z\} \cup \{a,\ldots z\}\cup \{., ,"\}$$
• Term calculus:  every symbol is a term.
• Formula calculus:
• If $$A$$, $$B$$, and $$C$$ are terms, then $$ABC$$ is a formula.
• If $$A$$, $$B$$, and $$C$$ are terms, then each of the following is a formula:
• $$xBC$$, $$AxC$$, $$ABx$$
• $$xyC$$, $$xBy$$, $$Axy$$
• $$xyz$$
• If $$ABC$$ and $$DEF$$ are formulae, then $$ABC, DEF.$$ is a formula.
So far, $$\mathbb{T}_\varnothing$$ - call it T-null - is a minimal language to say the least.  It has a very limited number of forms, and it has no semantics whatsoever.  In other words, it's not really a language at all; we could call it an uninterpreted calculus, except that you can't do any calculating in it.  However we care to classify it, we can attach any semantics we care to it by assigning meanings to its symbols, but since no such assignments are fixed the result has no stability - you can always change any assignment, so there isn't much point in making one in the first place.

We can express this in terms of expressive strength in a way that will turn out to be useful as we proceed.  $$\mathbb{T}_0$$ is the weakest possible language, because it has no expressive stability.  You can't say anything in it that is guaranteed to mean the same thing under different interpretations.

$$\mathbb{T}_\varnothing$$ corresponds to RDF without blank nodes and literals.
Formulae like $$ABC$$ obviously correspond structurally to RDF triples; to maintain a clean separation between this language and RDF languages, we will call such forms triads.   We might occasionally be tempted to call the constants trions, but we hope not.

$$\displaystyle\mathbb{T}_0$$  (T-basic)

What can we do to remedy this sorry state of affairs?  Let's start by giving the speakers of this little language the ability to express things like "somebody killed Caesar" and "Cicero said something".  While we're at it, let's allow them to say "Somebody killed Caesar and Cicero said something".

As to "somebody" ("something", etc.): the usual way this is done is by extending the language.  One adds some symbols and the axioms that fix their meanings.  In this case, we could add the existential quantifier $$\exists$$ to the object language and, if we wanted to be fastidious, define rules in the meta-language to determine its meaning ("there exists").  This would allow users of the language to "say" things of the form $$\exists x.\!xBC$$; if you set B = killed and C = Caesar, and this expresses "somebody killed Caesar".

But since we want to track RDF, whose symbol set does not include anything corresponding to $$\exists$$, we will take a different tack, one that corresponds to the way things are done in the RDF ecology: instead of extending the language, we strengthen it.

Note: there's more to say about extending/strengthening; in particular such changes must be conservative, that is, they cannot break what is already there.  More generally: we have a language, a world, and a stable but partial interpretation.  Fix any two and vary the third.

There are two ways to do this.  One is to pick some constants that are already in the language and give them a fixed meaning.  This is the technique we will use later on when we accomodate the various "entailment regimes" of RDF.  This kind of strengthening is equivalent to extending the set of constants in the language with fixed meanings (call them the fixed constants for short).  Having more fixed constants means more expressive power; it allows the speaker to say more (kinds of) things.

The other way to strengthen a language, which we will now use, is to reclassify some subset of its constants.  Notice that $$\mathbb{T}_\varnothing$$ contains only one kind of symbol.  We will change the language not by adding symbols, but by reclassifying the symbols $$x, y,$$ and $$z$$ into a new class of  Variables.  Unfortunately, that alone would have no effect; something additional needs to be introduced in order to make such reclassification meaningful.  In this case, that means adding rules to the meta-language that fix the meanings of the formulae in which they appear.  That last clause - "meanings of the formulae" - is critical, since with this technique we do not fix the meanings of the variable symbols themselves.  On the contrary, quantified variables of this sort will never be assigned a determinate meaning.  Their meaning is essentially tied up with the meaning of quantification, which in turn is essentially connected with the composite forms in which the variables appear.

The new meta-language rules fixing the meanings of formulae containing variables will be expressed as rewrite rules.  A rewrite rule is essentially the same as a syntactic derivation or transformation rule; it says how one form can be mapped to another.  The difference is that derivation rules work within the object language (i.e. they're closed), whereas rewrite rules take forms from the object language to forms in the metalanguage.

• Symbols
• Constants:   $$\{A\ldots Z\} \cup \{a,\ldots z\}\!\setminus\!\{x, y, z\} \cup \{., ,"\}$$
• Variables: $$\{x, y, z\}$$
• Meta-language Rewrite Sublimation Rules
1. $$ABC\Rightarrow ABC$$
2. $$, \Rightarrow\land$$
3. $$xBC \Rightarrow \exists x.\!xBC$$ and similarly for AxC and ABx.  We treat any formula in the object language that contains variables as an existentially quantified formula.
4. $$xyC \Rightarrow \exists x\exists y.\!xyC$$.
5. $$xyz \Rightarrow \exists x\exists y\exists z.\!xyz$$.
6. $$xBC, xEF \Rightarrow \exists x.\!(xBC\land xEF)$$ and similar for two and three variables.
7. $$xBC, yEF \Rightarrow \exists x\exists y.\!(xBC\land yEF)$$ and similar for two and three variables.
8. etc. (existential rewrites shown by example rather than general rule)
CAVEAT: it may be tempting to view the rewrite rules as a kind of semantic interpretation.  That would be a tragic mistake.  They are purely syntactic rules.
For example, if we have $$ABC, xEF$$, rewrite rules 1-3 sublimate'' this to $$ABC\land\exists x.\!xEF$$.
Note: as it stands this calculus is not complete, because it does not handle the scope of quantification for all cases.  For example it does not tell us how to get from $$xBC, DEF, GHy$$ to $$\exists x\exists y.\!(xBC\land~DEF\land~GHy)$$.  But I hope what's there is sufficient to get the idea across.

This is a stronger language, but it still lacks a highly desirable bit of logic: it does not yet support any form of inference.  We cannot go from a premise to a conclusion.  In particular, even though we know how to construct (legal) formulae containing ',' and $$\land$$, nothing tells us when it is appropriate to do so.  For example, if you know that grass is green, and you also know (separately, as it were) that snow is white, then you ought to be able to package those two bits of knowledge into a single whole and say "grass is green and snow is white".  Intuitively that's obviously what we do; but for a calculus, we need an explicit rule.  Similarly for $$\exists$$: if we know the specific person (thing) that exists, then we can say that some person or thing exists.  But we need a formal rule for this.

We could strengthen $$\mathbb{T}_0$$ and call it a new language, but in the interest of conciseness let's just define $$\mathbb{T}_0$$ as including support for a basic form of inference.  It is intuitively obvious that if "Brutus killed Caesar" is true, then so is "somebody killed Caesar".  Going from the former to the latter is a basic species of inference.  Similarly, if you know "Somebody killed Caesar", and you also know (perhaps you were told) that Brutus killed Caesar, then you no longer need "Somebody killed Caesar".  Since Brutus is somebody, you lose nothing if you discard "Somebody killed Caesar" and retain "Brutus killed Caesar".  This too is a form of inference.

To support this in $$\mathbb{T}_0$$ we add inference rules governing the introduction and elimination of terms like "somebody".  To start, we have only two such terms: $$\exists$$ ("there exists") corresponding to "somebody" (or "something", etc.) and $$\land$$ ("and").  [In the case of $$\exists$$ we're abusing the terminology a little bit, since $$\exists$$ is not part of the object language.  Our introduction rule will actually license introduction of variables in triad constructions, which in turn are mapped to $$\exists$$ expressions in the meta-language.
• $$\land$$-Introduction: $$\displaystyle{\Phi\quad \phi\over \Phi,\ \phi}$$
• $$\land$$-Elimination:
• $$\displaystyle{\Phi,\ \phi\over\Phi}$$
• $$\displaystyle{\Phi,\ \phi\over\phi}$$
• $$\exists$$-Introduction
• $$ABC\vdash xBC$$ ...similar for $$AxB, ABx$$ etc.
• $$ABC\vdash xyC$$ ...
• $$ABC\vdash xyz$$
• $$\exists$$-Elimination
• $${ABC\quad xBC\over ABC}\qquad or\qquad {\Phi\quad \Phi x\over\Phi}$$

RDF:  $$\exists$$-Elimination is basically the counterpart to "instancing" in RDF-Semantics.

• NOTE: I'm not sure how to write $$\exists$$-Elimination yet.  Maybe something like one of these?   TODO: explicate significance of elimination rules for readers unfamiliar with this style of logic.

[NB: if we were being really fastidious we would give each of  these rules a distinct name, but that seems like too much bother for now.  And why not just call them $$\exists$$\!I rules, since that's what they mean?]
CAVEAT: this works fine for one triad, but does not work for a sequent of triads.  The single triad rule would allow e.g. $$ABC,ADE\vdash xBC,yDE$$.  We need a rule to express the requirement of uniform substitution, e.g. $$ABC,ADE\vdash xBC,xDE$$.
Alternative format:

 $$ABC\over xBC$$ $$ABC\over AxC$$ $$ABC\over ABx$$ $$ABC\over xyC$$ $$ABC\over Axy$$ $$ABC\over xBy$$ $$ABC\over xyz$$
[NB: can we generalize? E.g.  $$\Phi a\over\Phi x$$ , reading $$\Phi a$$ to mean a collection of triads containing one or more constants a so the rule says replace 'em all with a variable.  For a single triad, $$\phi a\over\phi x$$.]

The intuitive way to interpret such rules is simple:  if you have $$ABC$$, then you are licensed to introduce a var in place of any of the constants.  So "introduce a var" is the formal counterpart of "say 'somebody' (instead of 'Brutus')".  [NB:  "Var-introduction" actually doesn't quite work, since it involves substitution, but it's close enough.]
[TODO:  explicate: are the inference rules part of the language definition, or are they in the meta-language?  Intuitively they seem to be like the rules of the formula calculus; but they use the symbol $$\vdash$$, which we do not want to admit to the language.  So they must be part of the meta-language.  Remember, the meta-language "includes" the object language.]

Historical note:  this rule is cribbed directly from the great Gerhard Gentzen, who first came up with the idea of introduction and elimination rules as determining the meanings of the logical constants.  But his $$\exists$$-Introduction rule is different than our Var-Introduction rule, since his object language included they symbol $$\exists$$ and ours does not.  His looked like something this: ${Pa\over\exists x.Px}\ {\scriptstyle\!\exists\!-\!I}$

where the $$\exists\!-\!I$$ on the right names the rule.

Normal Form
Informally, a collection of triads is in normal form if the triads are independent; that is, if none of them can be derived from the others.
RDF:  as far as I can tell, the concept of "leanness" introduced in RDF-Semantics boils down to normal form.  It is expressed in terms of graphs, instances, and proper subgraphs, but what a "lean" graph amounts to is one whose triples are independent - the semantic domain counterpart of normal form.

Literals

If we want our little Triad calculus to reflect RDF, we obviously need to support literals.  I'll get to it eventually.

...todo

Commentary:

So far, this has nothing to do with RDF per se, except insofar as it reflects its structure.  But the symbol set has been chosen with an eye toward the RDF family of languages we will define using this calculus.  Informally:
• Our set of constant symbols $$\{A, B,\ldots\}\cup \{a, b,\ldots\}$$  corresponds to the set of IRIs
• To define a logical language corresponding to RDFS, OWL, etc. we will fix the meanings of subsets of the constants, e.g. $$t$$ corresponds to $$\mathrm{rdf:type}$$
• Variable symbols will be used to capture the RDF concept of blank node''.
We intentionally limit the symbol set so that we can write triads concisely.  If we our term calculus allow e.g. "Foo" as a term, we would need to be able to write "Foo Bar Baz".  Which isn't all that bad, but ABC is a whole lot more concise, and since we're only concerned with inference we don't need to be able to construct human-readable terms like "Brutus".  This is supposed to be a meta-language, after all.

For example, to accomodate RDFS, we can define inference schemata such as:
• $$AtB, BkC\vdash AtC$$.
This is a deduction rule, not an entailment rule, although it corresponds to RDFS entailment rule rdfs9 in RDF Semantics.  The critical point here is thatthe complete set of such inference schemata will serve to fix the meanings of the constants involved in them - in this case, rdf:type (t) and rdfs:subClassOf (k).  For such constants, at least, no model theoretic semantics is needed.