Monday, March 7, 2016

Workspaces: Boot's Missing Abstraction

Boot is a thing of beauty, but it does have a minor wart or two.  One of them is that the API exposes implementation details.  For example, to add files to a Fileset you obtain a temporary directory using boot.core/tmp-dir!, write your file into that directory using standard Java filesystem APIs, and then add the temp dir contents to the Fileset using something like boot.core/add-resource.  That last step does not really add anything to the Fileset; Filesets are immutable, so add-resource actually constructs a new Fileset consisting of the old Fileset plus the contents of the temp dir. (NB: add-resource and the other add-* functions can work with any java.io.File directory, but the Boot Way is to use tmp-dir!.)  The concept of a Fileset is an abstraction over storage mechanisms; the fact that Boot maps Filesets to the filesystem is an implementation detail.  A filesystem is really just a kind of database, so an alternative implementation could map Filesets to some other data storage mechanism.

The problem is that API functions like tmp-dir! inevitably suggest that we're working with the filesystem, but this too is an implementation detail.  Boot's current implementation of tmp-dir! happens to create a hidden temporary directory, but an alternative could use some other mechanism, such as an in-memory database like DataScript.  Furthermore, the fact that one uses Java filesystem APIs to work with the temp dir is also an implementation detail.  An alternative could offer its own more abstract API, with functions like add-item or similar.  What's important is the sequence of steps: to alter the Fileset, the developer obtains a workspace, then adds, changes, and deletes items in the workspace, and finally merges the workspace with the incoming Fileset to create a new Fileset that is passed on to subsequent tasks in the pipeline.  The implementation mechanism is not relevant to the semantics, so ideally the API would be expressed solely in terms of implementation-independent abstractions.
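To make the point concrete, here is a minimal sketch of what a storage-independent workspace might look like.  Everything in it is hypothetical: the Workspace protocol, add-item, items, and merge-workspace are names invented for illustration, none of them is part of Boot, and the Fileset is modeled as a plain map rather than Boot's real Fileset.  The only claim is that a task written against the protocol cannot tell which storage mechanism is underneath.

```clojure
(require '[clojure.java.io :as io])
(import '(java.nio.file Files)
        '(java.nio.file.attribute FileAttribute))

;; Hypothetical protocol; nothing here is part of Boot's API.
(defprotocol Workspace
  (add-item [ws path content] "Return a workspace that also holds content at path.")
  (items [ws] "Return a map of path -> content."))

;; In-memory implementation: the workspace is just a persistent map.
(defrecord MemWorkspace [entries]
  Workspace
  (add-item [ws path content] (update ws :entries assoc path content))
  (items [_] entries))

;; Directory-backed implementation, roughly what tmp-dir! plus spit does today.
;; Unlike MemWorkspace, add-item here writes to disk and returns the same workspace.
(defrecord DirWorkspace [dir]
  Workspace
  (add-item [ws path content]
    (let [f (io/file dir path)]
      (io/make-parents f)
      (spit f content)
      ws))
  (items [_]
    (into {}
          (for [f (file-seq dir) :when (.isFile f)]
            [(str (.relativize (.toPath dir) (.toPath f))) (slurp f)]))))

;; The "Fileset" is modeled as a plain map; merging yields a new map.
(defn merge-workspace [fileset ws]
  (merge fileset (items ws)))

;; The same calls work against either backing store.
(def fileset {"README" "read me"})

(def mem
  (add-item (->MemWorkspace {}) "hello.txt" "hello world"))

(def on-disk
  (let [dir (.toFile (Files/createTempDirectory "ws-sketch" (make-array FileAttribute 0)))]
    (add-item (->DirWorkspace dir) "hello.txt" "hello world")))

(def merged-mem  (merge-workspace fileset mem))
(def merged-disk (merge-workspace fileset on-disk))
```

Both merged-mem and merged-disk should hold the same two items, which is the whole idea: the obtain, modify, merge sequence is the semantics, and the directory is incidental.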

When I say that the concept of a workspace is Boot's missing abstraction I do not mean that Boot lacks some critical bit of functionality that workspaces would provide.  I mean just that promoting the concept of a workspace to first-class status would make it easier for developers and users to understand and use Boot.

For example, my recommendation is that people new to Boot think of boot.core/tmp-dir! as boot.core/workspace, and avoid thinking of workspaces as filesystem directories and of their contents as files.  A workspace is just a space; never mind how the implementation maps that space to a storage mechanism.  The concept is very similar to the notion of a namespace in Clojure.  The Java implementation of Clojure maps namespace symbols to file names, but that too is an implementation detail: when you work with namespaces and their contents in Clojure you never think about filesystem locations, you just think about names in namespaces.  Similarly for Boot workspaces: when you add something to a workspace and then merge it with a Fileset, you don't care how Boot makes this happen behind the scenes; you just want the stuff you put in the workspace to end up in the Fileset.
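The namespace half of the analogy is easy to see at a REPL.  The short example below uses only standard clojure.core functions; the namespace name demo.space and the var name greeting are made up for illustration.  It creates a namespace, puts a var in it, and looks the var up by name, and at no point does a file or directory enter the picture.

```clojure
;; Create a namespace and a var in it without touching any source file.
(create-ns 'demo.space)
(intern 'demo.space 'greeting "hello world")

;; Look the var up purely by name.
(def v (ns-resolve (find-ns 'demo.space) 'greeting))

(println @v) ; prints "hello world"
```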

Unfortunately it's a little harder to ignore the filesystem-based implementation when you go to add something to a workspace.  A common pattern looks something like this:

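;; Assumes boot.core is aliased as boot and clojure.java.io as io.
(let [tmp-dir  (boot/tmp-dir!)                  ; obtain the "workspace"
      out-file (io/file tmp-dir "hello.txt")]
  (spit out-file "hello world")                 ; put something in it
  ;; merge the workspace into the incoming Fileset, yielding a new Fileset
  (-> fileset (boot/add-asset tmp-dir) boot/commit!))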

Obviously it's hard to think of io/file as anything other than a filesystem operation.  But it's easy to imagine a few macros that would provide a more abstract interface.  So the above might look something like:

(let [ws (-> (boot/workspace) (boot/add-item "hello.txt" "hello world"))]
  (-> fileset (boot/add-asset ws) boot/commit!))


To really make the concept of a workspace first-class would require a lot more analysis; for example, you'd have to really think through the semantics and syntax of something like add-item here.  You'd have to decide what to do if the item already existed and so forth. But it also opens up some interesting possibilities.  Workspaces could be immutable, for example. The concept of a workspace could also be construed as an extension of Clojure's concept of a namespace. You could allow the user to specify a workspace name symbol, e.g. (boot/workspace foo.bar), which would allow you to treat items in the workspace as analogous to Clojure vars.  You would then have to decide on a mapping from vars to location names for the backing storage mechanism, but that doesn't seem an insurmountable design problem.  The example above might become something like:

(let [ws  (boot/workspace foo)
      wss (boot/add-item ws hello.txt "hello world")]
  (-> fileset (boot/add-asset wss) boot/commit!))

Or you might be even more Clojure-like, and support workspace operations like find-ws, so instead of the preceding you could write something like

(boot/workspace foo)
(boot/workspace bar)
...
(let [ws (boot/find-ws foo)]
  (boot/add-item ws hello.txt "hello world")
  (-> fileset (boot/add-asset ws) boot/commit!))
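As a rough prototype of the named-workspace idea, here is one way it could be sketched in plain Clojure.  Again, all of it is hypothetical: workspace, find-ws, add-item!, the workspaces registry, and the :overwrite? option are invented names, not Boot functions.  They are ordinary functions rather than macros, so names are passed as quoted symbols.  The sketch also picks one possible answer to the "what if the item already exists" question: refuse unless the caller explicitly asks to overwrite.

```clojure
;; Hypothetical registry of named workspaces; not part of Boot.
;; Each workspace is an immutable map; only the registry itself changes.
(defonce workspaces (atom {}))

(defn workspace
  "Register an empty workspace under ws-name (a symbol) unless one exists."
  [ws-name]
  (swap! workspaces update ws-name #(or % {}))
  ws-name)

(defn find-ws
  "Return the current contents of the named workspace, or nil if there is none."
  [ws-name]
  (get @workspaces ws-name))

(defn add-item!
  "Add content under item in the named workspace.
  Throws if the item already exists, unless :overwrite? is true."
  [ws-name item content & {:keys [overwrite?]}]
  (swap! workspaces update ws-name
         (fn [ws]
           (when (and (contains? ws item) (not overwrite?))
             (throw (ex-info "item already exists"
                             {:workspace ws-name :item item})))
           (assoc ws item content)))
  ws-name)

;; Usage: names, not directories.
(workspace 'foo)
(workspace 'bar)
(add-item! 'foo 'hello.txt "hello world")
(def foo-items (find-ws 'foo))
```

This mirrors how Clojure treats namespaces: a global, mutable registry of names whose contents are looked up by symbol.  Whether a real design should follow that model or keep workspaces as purely immutable values is exactly the kind of question that would need more analysis.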

You could even think of Fileset as a species of Workspace, a kind of lambda workspace.


