Saturday, March 19, 2016

Coinduction Resources

Here are some materials I've found useful in trying to understand coinduction and related notions. Most of them involve some pretty heavy mathematics, but their introductions usually provide an overview and some examples that will help you develop your intuitions even if the math is over your head.

Some major figures, all of whom have made many publications available on their homepages:

A good place to start is An introduction to (co)algebra and (co)induction (PDF) by Rutten and Jacobs. (in Advanced Topics in Bisimulation and Coinduction).  The first part gives several concrete examples that will help you understand coinduction.

Coinductive Models of Finite Computing Agents by Wegner and Goldin.  Free version available on their homepages.

Communicating and Mobile Systems: the Pi Calculus, Milner.  Not easy to read, but well worth the effort.

An interesting paper showing how the concepts of bisimulation and bisimilarity emerged independently in computer science, philosophical logic, and set theory:  On the Origins of Bisimulation and Coinduction, Sangiorgi.  Free version available on author's homepage.

Universal coalgebra: a theory of systems, Rutten.  A landmark paper, I gather.  You can find a free version on Rutten's homepage.

Coalgebraically Thinking, David Corfield (blog post, n-Category Cafe, 2008)

This seems to be a growth industry; a number of introductory texts have been published in the past few years:

Friday, March 11, 2016

The Problem with Inductive Definition - and why it needs coinductive codefinition

There's a subtle problem with inductive definitions.  Consider the cannonical example of Nat, usually defined using something like the following pseudo-code:

Nat: Type
Z : Nat
S : Nat -> Nat

The problem is that given only this definition we cannot answer a simple question:  is every n:Nat of the form S...Z?  In other words, do the cannonical constructors of an inductively defined type exhaust the membership of that type?

The mathematician Andrej Bauer wrote an interesting blog post on this topic (The elements of an inductive type), but he comes at it from a very different perspective than I do.  His focus is on models of Nat, and he shows that there are various "exotic" models, which he describes thusly:  "Let us call a model exotic if not every element of (the interpretation of) 𝚗𝚊𝚝 is obtained by finitely many uses of the successor constructor, starting from zero."  It's an interesting post (as are the comments), but I think it misses the key point, which is that the problem is inherent in the structure of inductive definition; models are beside the point.  In this article I take a stab at an alternative characterization of the problem and a possible solution.

The reason we cannot answer the question of whether an inductively defined type contains only those elements built up using the constructors, given only the inductive definition, is that the inductive part of the definition (here, S) makes the assumption that its argument is of the appropriate type, but that's the only assumption it makes.  In particular it makes no assumption as to the structure of that argument. We can make this a bit more explicit using informal English:

Z is a natural number
Assuming that n is a natural number, infer that Sn is a natural number.

So we've assumed that n:Nat, but we have not assumed that it has cannonical form.  Maybe it was constructed solely using Z and S, maybe it wasn't - all we know (or rather assume) is that it is a Nat. For the purposes of inductive definition, it does not matter: S is a rule. If n is in fact of type Nat (no assumption here), then, no matter what its form or how we got it, Sn is of type Nat, end of story.  So for all we know, Nat could be chock-full of non-canonical members.

That's not very satisfactory, since intuition tells us that of course there is nothing in Nat that is not constructed from Z and S.  But our inductive definition does not say this, and there is no way to prove it from the definition alone.

Then how can we vindicate our intuitions, and show that every Nat is built up using nothing but S, starting from Z?  We want something like the following proposition:

For all n:Nat, either n=Z or n=Sm for some m:Nat   ;; where n=Sm means Sm is the canonical form of n)

Note that the inductive definition of Nat does not justify the second part of this proposition.  It tells us that every n:Nat has a successor Sn:Nat, but it does not tell us that every n:Nat is the successor to some other Nat (that is, has a predecessor "some m" such that n=Sm).  Even worse, it leaves open the possibility that every n:Nat could have any number of predecessors.  It may seem intuitively obvious that every Nat has a unique successor and a unique predecessor, but again, the definition does not say this; in fact it does not even imply it if your intuition hasn't already been trained to expect it.  Nor can it be used to prove it.

We cannot justify this inductively, since induction will always involve assumptions as above, and if we try to build canonicity into our assumptions we just move the problem.  For example it won't work to say "Assuming that n is either Z or a Nat of form Sm, then Sn is a Nat", since the same problem arises with respect to m.  And we cannot go further and say "Assuming that n is a cannonical Nat..." because that is obviously circular - you only get canonical form after you've defined S.

The only way I can see to do it is to use the dual notion of coinduction.  The idea is that we can introduce another operator P (Predecessor) and use it to show that every n:Nat can be expressed as (or reduced to, or equivalent to,  choose your terminology) a canonical form.  But not: defined as a canonical form!  Coinduction does not define; it does not say what things are (no metaphysics here); rather, it characterizes how they behave, which is to say it codefines them.  (See my previous post, Definition and Co-definition, for more on the distinction.)  The fact that every n:Nat has a predecessor (with PZ = Z) does not define Nat, it just says what you can do with a Nat once you have one.  By the same token it is not quite true to say that Z and S define Nat; they define those elements of Nat that have canonical form, but they do not "close" Nat, in that they do not and cannot exclude the possibility of (undefined) non-canonical Nat forms.

So to get a complete and intuitively satisfying "definition" of Nat we need both an inductive definition of canonical forms and a coinductive codefinition of non-canonical forms.  Here's one way to get started that I don't think works:

P :  Nat -> Nat

Leaving aside the fact that this just looks like an alias for S (I'll address that problem in a minute), the form of this expression obscures the difference between inductive constructor and coinductive coconstructor.  Informally, an inductive constructor of a type X takes things "into" X; a coinductive coconstructore takes things "from" X.  We can therefore make this structure more conspicuous as follows:

S : Nat -> X
P : Nat <- X

and setting X == Nat.  We could also have written P : X -> Nat, but that would make P look like a constructor for Nat rather than a coconstructor for X.  It's really the reversal of the arrow that exposes the duality, rather than the swapping of the type variables.

So now we have P: Nat <- X, but that just looks like a funny way to write S.  All it says is that if you have n:Nat, then you have Pn:Nat.  Now the question is how to express the idea that P functions as a predecessor operation.  The only way to do that is in terms of S.

PZ = Z
SPn = Sn ;; n = Z
SPn = n   ;; n != Z

Note that we are not using S to define P.  That is, we do not want to define P by saying P(Sn) = n; we want instead to start with the same minimal assumption we used to specify S, namely, "Assuming n:Nat, infer Pn...".  In fact we're not really defining P at all!  Instead we're specifying the relation between S and the result of using P. There's really nothing in this "definition" that characterizes P as a predecessor operation, except for the fact that the successor operation S cancels it out.  In a sense the operation P performs is completely hidden, and could be anything; all that matters is that once you use P, you can use S to cancel it out.

(Note that S : Nat -> X is also not a definition of S.  It does not define any mechanism, operation, algorithm, etc. that S performs.  Constructors like S, and coconstructors like P, are not functions.)

This definition/codefinition answers our original question directly.  Every n:Nat is equal to S(Pn), which is in canonical form.

I haven't said much about what coinduction really is, or why we should think of P as a coconstructor, etc.  I'll address that in another post.

(Aside:  some writers define "canonical form" as a form containing only canonical constructors, e.g. SSSZ; others defined it as one whose outermost form is a canonical constructor, e.g. Sn is in canonical form.)

Monday, March 7, 2016

Workspaces: Boot's Missing Abstraction

Boot is a thing of beauty, but it does have a minor wart or two.  One of them is the fact that implementation details are exposed by the API.  For example, to add files to a Fileset you need to obtain a temporary directory using boot.core/tmp-dir!, add your file to that directory using standard Java filesystem APIs, and then add the temp dir contents to the fileset using something like boot.core/add-resource.  The latter does not really add anything to the Fileset; Filesets are immutable, so add-resource really constructs a new Fileset consisting of the old Fileset plus the contents of the temp dir. (NB: add-resource and other add-* functions can work with any dir, but the Boot Way is to use tmp-dir!)  The concept of a Fileset is an abstraction over storage mechanisms; the fact that Boot maps Filesets to the filesystem is an implementation detail.  A filesystem is really just a kind of database, so an alternative implementation could map Filesets to some other data storage mechanism.

The problem is that API functions like tmp-dir! inevitably suggest that we're working with the filesystem, but this too is an implementation detail.  Boot's current implementation of tmp-dir! happens to create a hidden temporary directory, but an alternative could use some other mechanism, such as an in-memory database like Datascript.  Furthermore, the fact that one uses Java filesystem APIs to work with the temp dir is also an implementation detail.  An alternative could offer its own more abstract API, with functions like add-item or similar.  What's important is that in order to alter the Fileset the developer must obtain a workspace, then add, change, and delete items in the workspace, and finally merge the workspace with the incoming Fileset to create a new Fileset to be passed on to subsequent tasks in the pipeline.  The implementation mechanism is not relevant to the semantics, so ideally, the API would be expressed solely in terms of implementation-independent abstractions.

When I say that the concept of a workspace is Boot's missing abstraction I do not mean that Boot lacks some critical bit of functionality that workspaces would provide.  I mean just that promoting the concept of a workspace to first-class status would make it easier for developers and users to understand and use Boot.

For example, my recommendation is that people new to Boot think of boot.core/tmp-dir! as boot.core/workspace, and should avoid thinking of workspaces as filesystem directories and their contents as files.  A workspace is just a space, never mind how the implementation handles the mapping of that space to a storage mechanism.  The concept is very similar to the notion of a namespace in Clojure.  The Java implementation of Clojure maps namespace symbols to file names, but that too is an implementation detail; when you work with namespaces and their contents in Clojure you never think about filesystem locations, you just think about names in namespaces. Similarly for Boot workspaces; when you add something to a workspace and then merge it with a Fileset you don't care how Boot goes about making this happen behind the scenes, you just want the stuff you put in the workspace to end up in the Fileset.

Unfortunately it's a little harder to ignore the filesystem-based implementation when you go to add something to a workspace.  A common pattern looks something like this:

(let [tmp-dir (boot/tmp-dir!)
      out-file (io/file tmp-dir "hello.txt")]
  (spit out-file "hello world")
  (-> fileset (boot/add-asset tmp-dir) boot/commit!))

Obviously it's hard to think of io/file as anything other than a filesystem operation.  But it's easy to imagine a few macros that would provide a more abstract interface.  So the above might look something like:

(let [ws (-> (boot/workspace) (boot/add-item "hello.txt" "hello world"))]
  (-> fileset (boot/add-asset ws) boot/commit!))

To really make the concept of a workspace first-class would require a lot more analysis; for example, you'd have to really think through the semantics and syntax of something like add-item here.  You'd have to decide what to do if the item already existed and so forth. But it also opens up some interesting possibilities.  Workspaces could be immutable, for example. The concept of a workspace could also be construed as an extension of Clojure's concept of a namespace. You could allow the user to specify a workspace name symbol, e.g. (boot/workspace, which would allow you to treat items in the workspace as analogous to Clojure vars.  You would then have to decide on a mapping from vars to location names for the backing storage mechanism, but that doesn't seem an insurmountable design problem.  The example above might become something like:

(let [ws (boot/workspace foo)
      wss (add-item ws hello.txt "hello world")]
  (-> fileset (boot/add-asset wss) boot/commit!))

Or you might be even more Clojure-like, and support workspace operations like find-ws, so instead of the preceding you could write something like

(boot/workspace foo)
(boot/workspace bar)]
(let [ws (boot/find-ws foo)]
  (add-item ws hello.txt "hello world")
  (-> fileset ws boot/commit!))

You could even think of Fileset as a species of Workspace, a kind of lambda workspace.