Tuesday, September 12, 2017

Bazel: Building a JNI Lib

Here's how I managed to use Bazel to compile a JNI wrapper for a C library.  To understand it you'll need to understand the basics of Bazel; you should go through Java and C++ tutorials on the Bazel site, and understand workspaces, packages, targets, and labels. You will also need to understand Working with external dependencies.  That documentation is a little thin, but in this article I'll try to explain how it works, at least as I understand it.

To start, a JNI wrapper for a C/C++ library - let's call it libfoo - will involve three things (native file extensions omitted since they are platform-dependent):
  1. The Java jar that exposes the API - we'll call it libfooapi.jar in this case
  2. The JNI layer (written in C in this case, C++ also works) that wraps your C library (translates between it and the API jar) - we'll call this libjnifoo
  3. Your original C library - libfoo
So the task of your build is to build the first two.

Code Organization

My C library (libfoo) is built (using Bazel) as a separate project on the local filesystem. This makes it an "external Bazel dependency"; we'll see below how to express this in our JNI workspace/build files.

The source for the JNI lib project looks something like this (on OS X):
$ tree foo
foo
├── BUILD
├── README.adoc
├── WORKSPACE
├── src
│   ├── c
...
│   │   ├── jni_init.c
│   │   ├── jni_init.h
... other jni layer sources ...
│   │   └── z.h
│   └── main
│       ├── java
│       │   └── org
│       │       └── foo
│       │           ├── A.java
...
│       │           ├── FOO.java
...
Note that WORKSPACE establishes the project root; you will execute Bazel commands within the foo directory. Note also that we only have one "package" (i.e. directory containing a BUILD file). We're going to build the jar and the jni lib as targets in the same package.

The third thing we need is the JDK, since our JNI code has a compile-time dependency on "jni.h"; this is a little bit tricky, since this too is an external dependency, and you cannot brute-force it by giving an absolute path to the JDK include directories - Bazel rejects such paths. We'll see how to deal with such "external non-Bazel dependencies" below.

WORKSPACE

My WORKSPACE file defines my libfoo library as a local, external, Bazel project:

local_repository(
    name = "libfoo",
    path = "/path/to/libfoo/bazel/project",
)

In this example, /path/to/libfoo/bazel/project will be a Bazel project - it will contain a WORKSPACE file and one or more BUILD files.  Defining a local "repository" like this in the workspace puts a (local) name on the "project" referenced by the path.  Note that projects do not have proper names; a "project" is really just a repository of workspaces, packages, and targets defined by WORKSPACE and BUILD files - hence the name "local_repository".

The "external" output dirs

When you define an external repo like this, Bazel will create (or link) the necessary resources in subdirectories of the output directories, named appropriately; in this case, "external/libfoo".

In other words, if you define @foo_bar, you will get "external/foo_bar" in the output dirs.

TODO: explain - does Bazel just create soft links to the referenced repo?
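One way to look into this is simply to list the directory and see what is there. The snippet below is a minimal sketch, not something taken from the Bazel docs: the helper names external_repo_dir and show_external_repo are made up for illustration, and it assumes bazel is on your PATH and that show_external_repo is run from inside the workspace.

```shell
# Map a repository name such as "@foo_bar" to its directory under an output base.
# (Hypothetical helper; the "external/<name>" layout is the one described above.)
external_repo_dir() {
    base="$1"
    repo="${2#@}"          # drop a leading "@" if present
    printf '%s/external/%s\n' "$base" "$repo"
}

# List what Bazel placed there. "bazel info output_base" prints the output base.
show_external_repo() {
    ls -l "$(external_repo_dir "$(bazel info output_base)" "$1")"
}
```

Running show_external_repo libfoo lists external/libfoo under the output base; any symlinked entries show up with an arrow in the ls -l output.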

JDK Dependency

Our JNI library will have a compile-time dependency on the header files of the local JDK. Many more traditional build systems would allow you to express this by adding the absolute file path to those include directories; Bazel disallows this.  Instead, we need to define such resources as an external repository; in this case, a non-Bazel external repository.

You could do this yourself for JDK resources, at least in principle (I tried and failed).  Fortunately Bazel predefines the repository, packages and targets you need.  The external repo you need is named "@local_jdk" (note: underscore, not dash); the targets are defined in https://github.com/bazelbuild/bazel/blob/117da7a947b4f497dffd6859b9769d7c8765443d/src/main/java/com/google/devtools/build/lib/bazel/rules/java/jdk.WORKSPACE.  Frankly, I do not completely understand how the local_jdk definitions and the like work, but they do. In this example, we will use:
  • "@local_jdk//:jni_header"
  • "@local_jdk//:jni_md_header-darwin"
(If you look at the source, you will see that these are labels for "filegroup" targets - a convenient way to reference a group of files.)

(See also https://github.com/bazelbuild/bazel/blob/master/src/main/tools/jdk.BUILD - I don't know how this is related to the other file.)

BUILD

The local name for the external repository (project) specified in your WORKSPACE makes it possible to refer to it using Bazel labels.  In your BUILD files you use the @ prefix to refer to such external repositories; in this case, "@libfoo//src:bar" would refer to the bar target of the src package of the libfoo repository, as defined in the WORKSPACE above. Note that because of the way Bazel labels are defined, we can expect to find target "bar" defined in file "src/BUILD" in the libfoo repo.
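To make the label-to-file mapping concrete, here is a small sketch that splits an absolute label into repository and package and prints where the corresponding BUILD file would be expected. The function name label_build_file is hypothetical, and the sketch ignores a lot of real label syntax (relative labels, BUILD.bazel files, and so on); it only illustrates the convention described above.

```shell
# Print the expected BUILD file for an absolute label, relative to the
# directory that contains "external/". Illustration only.
label_build_file() {
    label="$1"
    case "$label" in
        *//*) ;;                 # only absolute labels are handled here
        *) return 1 ;;
    esac
    repo="${label%%//*}"         # "@libfoo", or empty for the main repo
    repo="${repo#@}"
    rest="${label#*//}"          # "src:bar" or "src/ocf"
    pkg="${rest%%:*}"            # package path; empty for the root package
    prefix=""
    if [ -n "$repo" ]; then
        prefix="external/$repo/"
    fi
    if [ -n "$pkg" ]; then
        printf '%s%s/BUILD\n' "$prefix" "$pkg"
    else
        printf '%sBUILD\n' "$prefix"
    fi
}
```

For example, label_build_file "@libfoo//src:bar" prints external/libfoo/src/BUILD, and label_build_file "//:fooapi" prints BUILD.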

My buildfile has two targets, one for the Java, one for the C.  The Java target is easy:

java_library(
    name = "fooapi",  # will produce libfooapi.jar
    srcs = glob(["src/main/**/*.java"]))

The C target is a little more complicated:

cc_library(
    name = "jni",
    srcs = glob(["src/c/*.c"]) + glob(["src/c/*.h"])
    + ["@local_jdk//:jni_header",
       "@local_jdk//:jni_md_header-darwin"],
    deps = ["@libfoo//src/ocf"],
    includes = ["external/libfoo/src",
                "external/libfoo/src/bar",
                "external/local_jdk/include",
                "external/local_jdk/include/darwin"],
)

Important: note that we put the @local_jdk labels in srcs, not deps.  That's because (I think) they are filegroups, and the labels you put in deps should be "rule" targets rather than file targets.

Exposing the headers

Note that it is not sufficient to express the dependencies on local_jdk; you must also specify the include directories within that repo in order to expose jni.h etc.  That's what the "includes" attribute is for.  You must list all the (external) directories needed by your sources.

Saturday, January 21, 2017

boot-gae: Interactive Clojure Development on Google App Engine

[Third in a series of articles on boot-gae]

boot-gae supports interactive development of Clojure applications on GAE. It does not have a true REPL, but it's pretty close: edit the source, save your edits, refresh the page. Your changes will be loaded by the Clojure runtime on page refresh.

This is mildly tricky on GAE. GAE security constraints prevent the Clojure runtime from accessing the source tree, since it is not on the classpath. Nothing outside of the webapp's root directory tree can be on the classpath.

Part of the solution is obvious: monitor the source tree, and whenever it changes, copy the changed files to the output directory. The built-in watch task makes this easy; gae/monitor composes that task with some other logic to make it work.

The tricky bit here is to make sure the changed files get copied to the appropriate place in the output directory; for Clojure source files, that means

target/WEB-INF/classes    ;; for servlet apps

target/<servicename>/WEB-INF/classes    ;; for service apps

boot-gae tasks use configuration parameters to construct the path. The gae/monitor (and the gae/build) task uses the built-in sift task to move input from the source tree to the right place.
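The path logic is simple enough to sketch in a few lines of shell. This is only an illustration of the two layouts shown above, not boot-gae's actual code (which is Clojure); the function name classes_dir and its arguments are made up.

```shell
# Print the directory where Clojure sources should land.
#   $1 - output directory (defaults to "target")
#   $2 - service name; leave empty for a plain servlet app
classes_dir() {
    target="${1:-target}"
    service="${2:-}"
    if [ -n "$service" ]; then
        printf '%s/%s/WEB-INF/classes\n' "$target" "$service"
    else
        printf '%s/WEB-INF/classes\n' "$target"
    fi
}
```

So classes_dir target prints target/WEB-INF/classes, while classes_dir target greeter prints target/greeter/WEB-INF/classes.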

That's half a solution; we still need to get Clojure to reload the changed files. The trick here is to use a  Java filter to monitor the files in the webapp and reload them on change, just as the gae/monitor does with source files. A filter in a Java Servlet app dynamically intercepts requests and responses; by installing a filter, we can ensure that changed Clojure code can be reloaded whenever any page is loaded.  See The Essentials of Filters for more information.

The gae/reloader task generates and installs the appropriate filter. No configuration is necessary; the whole process is hidden, so all the programmer need do is run the gae/reloader task.

The reloader task generates a reloader "generator" file (named using gensym) whose contents look like this:

;; TRANSIENT FILTER GENERATOR
;; DO NOT EDIT - GENERATED BY reloader TASK
(ns reloadergen2244)

(gen-class :name reloader
           :implements [javax.servlet.Filter]
           :impl-ns reloader)

It saves this to a hidden location (this is easily done, since doing so is one of the core features of boot) and then AOT-compiles it to produce the reloader.class file.

It also generates the Clojure file that implements the filter's doFilter method. Here's the content of that file:

;; RELOADER IMPLEMENTATION NS
;; DO NOT EDIT - GENERATED BY reloader TASK
(ns reloader
  (:import (javax.servlet Filter FilterChain FilterConfig
                          ServletRequest ServletResponse))
  (:require [ns-tracker.core :refer :all]))
(defn -init [^Filter this ^FilterConfig cfg])
(defn -destroy [^Filter this])
(def modified-namespaces (ns-tracker ["./"]))
(defn -doFilter
  [^Filter this
   ^ServletRequest rqst
   ^ServletResponse resp
   ^FilterChain chain]
  (doseq [ns-sym (modified-namespaces)]
    (println (str "reloading " ns-sym))
    (require ns-sym :reload))
  (.doFilter chain rqst resp))

This process results in some transient files, which are filtered out of the final result. The only files we need are reloader.class (which the servlet container needs) and reloader.clj (to which reloader.class will delegate calls to the filter methods, like doFilter).

If you want to inspect the transient files, you can retain them by passing the --keep (short: -k) flag to the gae/build task. Here's an example of what you will find in WEB-INF/classes in that case (other files omitted):

reloader.class
reloader.clj
reloadergen2244$fn__35.class
reloadergen2244$loading__5569__auto____33.class
reloadergen2244.clj
reloadergen2244__init.class

Since the reloadergen* files are not needed by the app, they are removed by default.
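If you have kept the transient files and want a quick look at which ones they are, something like the following works. It is just a sketch: the function name transient_reloader_files is hypothetical, and it relies only on the reloadergen prefix seen in the listing above.

```shell
# List the transient generator files under a classes directory,
# e.g. target/WEB-INF/classes. reloader.class and reloader.clj are not matched.
transient_reloader_files() {
    find "$1" -type f -name 'reloadergen*' | sort
}
```

For example: transient_reloader_files target/WEB-INF/classes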

Deployment


This works fine for local development; however, it's just a waste in an application deployed to the cloud. Before deploying (gae/deploy), be sure to omit the gae/reloader task from your build pipeline; if you're using gae/build, use the --prod (short: -p) flag.





boot-gae: building and assembling service-based apps

[Second in a series of articles on using boot-gae to develop Clojure apps on GAE]

Google App Engine supports two kinds of application. The traditional kind is what I'll call a servlet app - a standard, Java Servlet application. It may contain multiple servlets and filters, but everything is in one WAR directory. Servlets can communicate with each other using several techniques, including direct method invocation, or using System properties to pass information, etc. The key point is that they need not send each other HTTP messages in order to cooperate.

The other kind of application, which I will call a service-based, or just services app, assembles one or more servlet apps into an application. Each servlet app is called a service (formerly: module), and functions as a micro-service in the assembled application. Such microservices collaborate via HTTP.

See Microservices Architecture on Google App Engine, Service: The building blocks of App Engine, and Configuration Files for more information.

boot-gae makes it easy to develop service-based applications, using the same code as for servlet applications. To build a service, do this (from the root of the service project):

$ boot gae/build -s

The -s (--service) switch tells boot-gae to build a service; the result will be placed in target/<servicename>. Building a service, unlike building a servlet app, will generate a jar file for the service. Install this:

$ boot install -f target/<servicename>/<service-jar-file-name>.jar

Do this for each service. Then, from the root directory of the service-based app, run the assemble task:

$ boot gae/assemble

To run the assembled app, use gae/run. The two commands can be combined:

$ boot gae/assemble gae/run

To interactively develop a service running in a services app, change to the service's root directory and run

$ boot gae/monitor -s

Now when you edit your service's code, the changes will be propagated to the assembled service-based app, where they will be loaded on page refresh.
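If you have several services, the build-and-install steps are easy to wrap in a small script. What follows is a sketch under assumptions: the helper names are made up, and it assumes exactly one jar ends up in target/<servicename>. The boot commands themselves are the ones shown above.

```shell
# Print the jar built for a service.
#   $1 - service project root, $2 - service name
service_jar() {
    for jar in "$1/target/$2"/*.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1        # no jar found (the glob did not match a file)
}

# Build one service and install its jar. Needs boot on the PATH.
#   $1 - service project root, $2 - service name
build_and_install_service() {
    ( cd "$1" && boot gae/build -s && boot install -f "$(service_jar . "$2")" )
}
```

Call build_and_install_service once per service, then run boot gae/assemble gae/run from the root of the services app.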

How It Works

The service components and the services app must be correctly configured for this to work, of course. Each service component must include a :gae map in its build.boot file; it looks like this:


(set-env!
 :gae {:app-id "microservices-app"
       :version "v1"
       :module {:name "greeter"
                :app-dir (str (System/getProperty "user.home")
                              "/boot/boot-gae-examples/standard-env/microservices-app")}}
...)

The :version string must conform to the GAE rules: The version identifier can contain lowercase letters, digits, and hyphens. It cannot begin with the prefix "ah-" and the names "default" and "latest" are reserved and cannot be used...Version names should begin with a letter, to distinguish them from numeric instances which are always specified by a number (see appengine-web.xml Reference).
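Those rules are easy to check mechanically before you deploy. Here is a minimal sketch; the function name valid_gae_version is made up, and it treats the "should begin with a letter" advice as a hard requirement, which is stricter than the rule as quoted.

```shell
# Return 0 if the string looks like an acceptable version identifier.
valid_gae_version() {
    case "$1" in
        ""|default|latest) return 1 ;;   # empty, or a reserved name
        ah-*)              return 1 ;;   # reserved prefix
        *[!a-z0-9-]*)      return 1 ;;   # something other than lowercase letters, digits, hyphens
        [a-z]*)            return 0 ;;   # begins with a letter
        *)                 return 1 ;;   # begins with a digit or hyphen
    esac
}
```

For example, valid_gae_version v1 succeeds, while valid_gae_version latest and valid_gae_version ah-test both fail.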

The :app-dir string must be the path of the service-based app's root directory.

The :name string will be used (by gae/monitor -s) to construct the path of the service in its WAR directory in the services app; in this case, the result will be

$HOME/boot/boot-gae-examples/standard-env/microservices-app/target/greeter

The gae/monitor -s task will copy source changes to this directory.

The services app must also include the :gae map in its build.boot file, but without the :module entry. In addition, the component services must be included in the :checkouts vector; for example:

:checkouts '[[tmp.services/main "0.2.0-SNAPSHOT" :module "default" :port 8083]
            [tmp/greeter "0.1.0-SNAPSHOT" :module "greeter" :port 8088]
            [tmp/uploader "0.1.0-SNAPSHOT" :module "uploader" :port 8089]]

The first service listed will be the default service; it must be named "default".  The :module string here must match the :module :name string of the service's build.boot.

WARNING: this will change, so that service components will be listed in :dependencies.

Finally, the services app must contain a services.edn file, which looks like this:

{:app-id "boot-gae-greetings"
 ;; first service listed is default service
 :services [{:service "default"}
            {:service "greeter"}
            {:service "uploader"}]}

WARNING: this will change. We have all the information needed to assemble the app in build.boot, so this edn file is not needed.

See standard environment examples for working demos.

Previous article: Building Clojure Apps on Google App Engine with boot-gae
Next article: boot-gae: Interactive Clojure Development on Google App Engine


Friday, January 20, 2017

Building Clojure Apps on Google App Engine with boot-gae

It's relatively easy to get a Clojure application running on GAE's devserver; you just need to use gen-class to AOT compile a servlet. See for example Clojure in the cloud. Part 1: Google App Engine. The problem is that you then need to restart the devserver whenever you want to exercise code changes, which is way too slow.

One way around this limitation is to run Jetty or some other Java servlet container rather than the devserver.
The problem with this strategy is exactly that it does not use the official development server from Google. That server is a modified version of Jetty, with strict security constraints, providing a near-exact emulation of the production environment (which also runs a version of Jetty). If you develop with some other servlet container, you won't know if your code is going to run in production until you actually deploy to the cloud.

So there are two problems to be addressed if we want to use the official devserver.  One is that Java servlets must be compiled, since the servlet container will search for byte-code on disk when it comes time to load a servlet; most solutions I've seen end up AOT-compiling the entire app. The other problem is that GAE's security constraints will prevent your app from accessing anything outside of the webapp's directories. That means, for example, that any jar dependencies should be installed in WEB-INF/lib. If you want to load Clojure source files at runtime, they must be on the classpath, e.g. in WEB-INF/classes.

boot-gae is a new set of tools that solves these problems. Using it, you can easily develop Clojure apps with REPL-like interactivity in the devserver environment. It automates just about everything, so building and running an application is as simple as:

$ boot gae/build gae/run

To develop interactively, switch to another terminal session and run

$ boot gae/monitor

It's that simple. Now changes in your source tree will be propagated to the output tree, where they will be reloaded on page refresh.

The gae/build task is a convenience task that composes a number of core tasks that take care of everything:

  • installing jar dependencies in WEB-INF/lib
  • generating the config files WEB-INF/appengine-web.xml and WEB-INF/web.xml
  • generating one stub .class file for each servlet and filter
  • copying Clojure source files from the source tree to WEB-INF/classes for runtime reloading
  • copying static web assets (html, js, css, jpeg, etc.) from the source tree to the appropriate output directory
  • generating and installing a reloader filter, which will be used to detect and reload changed namespaces at runtime

The process is controlled via simple *.edn files. For example, servlets are specified in servlets.edn, which looks like this:

{:servlets [{:ns greetings.hello
             :name "hello-servlet"
             :display {:name "Awesome Hello Servlet"}
             :desc {:text "blah blah"}
             :urls ["/hello/*" "/foo/*"]
             :params [{:name "greeting" :val "Hello"}]
             :load-on-startup {:order 3}}
            {:ns greetings.goodbye
             :name "goodbye-servlet"
             :urls ["/goodbye/*" "/bar/*"]
             :params [{:name "op" :val "+"}
                      {:name "arg1" :val 3}
                      {:name "arg2" :val 2}]}]}

Here two servlets are specified. One task - gae/servlets - will use this data to generate a "servlets generator" source file that looks like this:

;; TRANSIENT SERVLET GENERATOR
;; DO NOT EDIT - GENERATED BY servlets TASK
(ns servletsgen2258)

(gen-class :name greetings.hello
           :extends javax.servlet.http.HttpServlet
           :impl-ns greetings.hello)

(gen-class :name greetings.goodbye
           :extends javax.servlet.http.HttpServlet
           :impl-ns greetings.goodbye)


This file is then AOT-compiled to produce the two class files, WEB-INF/classes/greetings/hello.class and WEB-INF/classes/greetings/goodbye.class. The programmer then need only supply an implementation for the service method of HttpServlet, in an appropriately named Clojure file - in this case, in the source tree, greetings/hello.clj and greetings/goodbye.clj will both contain something like (defn -service ...) or (ring/defservice ...).

Another task, gae/webxml, will use the same information to generate WEB-INF/web.xml.

Thus with boot-gae, only minimal servlet and filter stubs are AOT-compiled. The gen-class source code is itself automatically generated, then AOT-compiled to produce the corresponding class files, and discarded. The programmer never even sees this code (but can keep it for inspection by passing a -k parameter).

boot-gae is available at https://github.com/migae/boot-gae.  A companion repository,  https://github.com/migae/boot-gae-examples contains sample code with commentary.

Thursday, July 7, 2016

Java cacerts on Intel Edisons and Gateways

The Intel Edison ships with loads of ca certificates in /etc/ssl/certs, and it also ships with Java 8 (OpenJDK), but it does not ship with a pre-configured cacerts file for Java. Ditto for the Wind River Linux installation that is preinstalled on the Dell 3290 gateway device.

So if you try to use Java to access something over HTTPS you'll get a security exception. For example, if you want to run Clojure using the excellent boot build tool, the first time you try "$ boot repl" on either WRLinux/Gateway or Yocto/Edison, you’re going to get an exception along the lines of:


Exception in thread "main" javax.net.ssl.SSLException:
java.lang.RuntimeException: Unexpected error:
java.security.InvalidAlgorithmParameterException:
the trustAnchors parameter must be non-empty

The problem is that although both systems ship with a large set of ca certificates, as well as preinstalled OpenJDK 1.8, the latter is not configured to use the former out of the box.  Since I virtually never have to deal with this sort of thing it took me a couple of hours to figure out what the problem was and how to fix it.  All you have to do is create a Java “trust store” by running a Java utility called keytool once for each certificate you want to add to the trust store. (A trust store is analogous to a key store; the latter is where you keep your keys, the former is where you keep public certificates of trust; if you’re a glutton for punishment take a look at Configuring Java CAPS for SSL Support.)
Here’s what you need to know to make sense of that:
  • Java is installed at /usr/lib64/jvm (WRLinux on the Gateway), or /usr/lib/jvm (Yocto on Edison)
  • The standard place for the Java truststore is $JAVA_HOME/jre/lib/security/cacerts. Note that cacerts is a file, not a directory. You create it with keytool.
  • keytool is located in $JAVA_HOME/jre/bin
  • The certificates are in /etc/ssl/certs, which is a directory containing soft links to /usr/share/ca-certificates
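To give an idea of what a single import looks like, here is a minimal sketch of adding one certificate to a truststore. The helper names and the alias convention are my own inventions for illustration; the keytool options are standard ones, and "changeit" is the conventional default password for cacerts. It assumes keytool is on your PATH.

```shell
# Derive an alias from the certificate file name: drop the extension,
# lowercase it, and replace anything unusual with an underscore.
cert_alias() {
    name="$(basename "$1")"
    name="${name%.*}"
    printf '%s\n' "$name" | tr 'A-Z' 'a-z' | tr -c 'a-z0-9\n' '_'
}

# Import one certificate file into a truststore.
#   $1 - certificate file, $2 - truststore (cacerts) file
import_cert() {
    keytool -importcert -noprompt -trustcacerts \
        -alias "$(cert_alias "$1")" \
        -file "$1" \
        -keystore "$2" \
        -storepass changeit
}
```

Afterwards, keytool -list -keystore <cacerts file> -storepass changeit shows what ended up in the store.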

Now you just need something to save you the drudgery of installing everything in /etc/ssl/certs into the Java trust store, since keytool must be given an individual file rather than a directory of files as input. Fortunately, somebody already wrote that script for us. You can find it at Introduction to OpenJDK in the subsection Install or update the JRE Certificate Authority Certificates (cacerts) file. I just copied that to ~/bin/mkcacerts, chmodded it to make it executable, and then created a one-time helper:

#!/bin/sh
# certs.sh

# use this for Yocto/Edison:
# LIB=lib
# use this for WRLinux/Gateway
LIB=lib64
if [ -f /usr/$LIB/jvm/java-8-openjdk/jre/lib/security/cacerts ]; then
  mv /usr/$LIB/jvm/java-8-openjdk/jre/lib/security/cacerts \
     /usr/$LIB/jvm/java-8-openjdk/jre/lib/security/cacerts.bak
fi
# if you have a ca-certificates.crt file, use this:
# -f "/etc/ssl/certs/ca-certificates.crt"
# otherwise use
# -d "/etc/ssl/certs/"
./mkcacerts                 \
        -d "/etc/ssl/certs/"           \
        -k "/usr/$LIB/jvm/java-8-openjdk/bin/keytool"      \
        -s "/usr/bin/openssl"          \
        -o "/usr/$LIB/jvm/java-8-openjdk/jre/lib/security/cacerts"

There is one tricky bit: notice the "-d" parameter in the script above. Use that if /etc/ssl/certs contains certificates but no "ca-certificates.crt" - this was the case in earlier versions of the Edison. But if you find /etc/ssl/certs/ca-certificates.crt, then replace that line as noted in the comment.

Now cd to ~/bin and run $ ./certs.sh.  You'll see a bunch of alarming-looking output as it churns through the certificates, adding them to the Java truststore (cacerts file). It takes a while, but once it's done you should be good to go with Java and HTTPS.

Tuesday, July 5, 2016

Why I Still Hate Hate Hate Build Systems

You might think that by 2016 software and systems developers would have figured out build systems. But no.  Build systems, along with package management and configuration, remain the bane of the developer's, not to mention the user's, existence.

I've worked with many such systems, from the venerable make/autotools system, to CMake, to SCons, to Ant, Maven, Gradle (Boo!) and who knows what else. I hate them all.  They are all misbegotten monstrosities that only their mothers could love.

Case in point: I want to use the tinyb Bluetooth LE library on the Intel Edison and on a Dell 3290 gateway device.  You'd think that would be easy enough - download, configure, make, install, use. The project uses CMake, which ought to be even better than Make.

But like all build systems, CMake works just great - until it doesn't.  In this case:


  • It started out by offering an inscrutable error message:  "A required package was not found". I'm not making this up, that is the message.  Which package?  Why so coy, CMake?  Cantcha just say???
  • So you look in the error log, and you see that a check for Pthreads failed. But you know your system has pthreads, so you flail around for a while, then notice it failed because the test command specified not "-lpthread", but "-lpthreads".  Whaa? A bug in CMake? An hour and a half later you finally realize that was a spurious report - the error message was not caused by Pthreads testing but by the check that occurred just before the error message. You proceed to feel both very stupid, because you should have thought of that immediately, and very, very pissed that the morons in charge of CMake could have saved you the trouble by simply saying what the problem is.  I dunno, maybe something like "A required package was not found: gio-unix-2.0>=2.40"?  Izzat so hard?
  • And the way you actually figure out that gio-unix-2.0 was the problem is by creating a test project and painstakingly going through the CMakeLists.txt file line by line, trial-and-error. You begin to fear you might have an aneurysm.  You are glad the CMake devs are not within throttling distance, lest you squeeze their throats very hard.
  • But, but, bu bu - my system does so have gio-unix-2.0!  It's right there! (Of course, you waste a good 20-30 minutes googling around trying to figure out what exactly gio-unix is).
  • More trial and error.  Maybe it's the version number.  You try a bunch, nothing makes a difference, the test always fails.
  • Finally, you inspect the pkg-config file for gio-unix-2.0. Everything looks hunky dory. Except for that @-delimited placeholder thingie there, it looks a little fishy, but you are not a pkg-config expert, you assume that it must be there for a reason.  But then you look at the related gio-2.0 file, and there in the version field is an actual number!  2.48.1, to be exact!
  • So finally it dawns on you: the person whose neck you need to squeeze, hard, mercilessly, is the person who installed the broken pkg-config file for gio-unix-2.0, the one that didn't get the version number so it fails all checks.
  • But the CMake devs do not get away so easily!  It's still a flaw in their product - when they test for a library with a version >=2.40, and the version they get is obviously broken - not a version number at all - they should throw a "broken config file" exception, not a "package not found" error.  It's just false to say the package was not found! And it would also be false to say that the package found had too low a version number! It did not fail the version test, it was incapable of even taking the version test, and that is the exception.
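In hindsight, a couple of quick checks would have shortened the hunt. The following is a sketch, with made-up function names: one function asks pkg-config directly whether a package satisfies a minimum version (printing whatever error it has), and the other scans a .pc file for placeholders that were never substituted. It assumes pkg-config is installed and that you know where the .pc file lives.

```shell
# Ask pkg-config whether a package meets a minimum version,
# letting it print its own error message if not.
#   $1 - package name, $2 - minimum version
check_pkg() {
    pkg-config --exists --print-errors "$1 >= $2"
}

# Show lines of a .pc file that still contain an @NAME@-style placeholder.
# Prints "line-number:line" for each hit; returns non-zero if there are none.
unexpanded_placeholders() {
    grep -n '@[A-Za-z_][A-Za-z0-9_]*@' "$1"
}
```

For example, check_pkg gio-unix-2.0 2.40 reproduces the test by hand, and pkg-config --modversion gio-unix-2.0 shows what pkg-config thinks the version is.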

The real problem with build systems is that they do not treat the build engineering problem as a software development problem. To manage builds you need a real programming language, and you need to write your build scripts with all the care you would use in writing critical C code.  There is actually one project out there that takes this approach, the only one I know of: http://boot-clj.com/. Whereas previous build systems are of the "build a better mousetrap" variety, so they're all pretty much the same thing, boot represents a fairly radical re-conceptualization of what building software is all about.

opkg

Both the Intel Edison and the Dell 3290 (Intel IoT Gateway) with Wind River Linux, being embedded systems, run a Yocto-based OS.  The package manager for Yocto is opkg, not apt nor rpm nor any of the other PMs one typically encounters in full-featured Linux environments.  Unfortunately, good information on how to correctly use opkg is not easy to find.  Here's what worked for me:

Here "Edison" means an Intel Edison module updated to the latest image as of June 2016, and "Gateway" means a Dell/Wyse 3290 running Wind River Linux version 6.  Note that the Intel IoT Gateway platform also supports Ubuntu and Windows OSs.

opkg homepage:  https://wiki.openwrt.org/doc/techref/opkg.

Repositories:

https://downloads.openwrt.org/snapshots/trunk/

Config file:

  • Edison:
  • Gateway:  /etc/opkg/opkg.conf
Here's what I used on the Gateway:

# opkg.conf

dest root /
dest ram /tmp
lists_dir ext /var/opkg-lists
option overlay_root /overlay
src/gz x86_base http://downloads.openwrt.org/snapshots/trunk/x86/64/packages/base/
src/gz x86_packages http://downloads.openwrt.org/snapshots/trunk/x86/64/packages/packages/
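As a sanity check before updating, you can list the feeds a config file actually declares. This is a small sketch; the function name list_feeds is made up, and it simply prints the name and URL of every src/gz line.

```shell
# Print "name url" for each src/gz feed declared in an opkg config file.
#   $1 - path to the config file
list_feeds() {
    awk '$1 == "src/gz" { print $2 " " $3 }' "$1"
}
```

Once the feeds look right, opkg update refreshes the package lists and opkg install <package> installs from them.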