## Monday, May 14, 2018

### Bazel: genrule patching an external repo

Just for fun I decided to try a quick-and-dirty Bazel configuration for Iotivity (GitHub mirror).  It turned out to be much easier than I had expected. Over the space of a weekend I was able to enable Bazel builds for the core C/C++ API and also the Java and Android APIs. These should be considered proofs of concept for the moment, since they need to be refined a bit (compiler options, platform-specific configuration, etc.).  I have only tested them on Mac OS X, but they should be easy to adapt to Linux and Windows.

I did come across one hairball that took the better part of a day to figure out.  It involves patching an external package. Since fixing it involved various troubleshooting techniques that are not documented, this article will describe the problem, the solution, and some of the ways that I figured out what was going on. I'll also try to explain how external repos and genrules work.

Iotivity uses several external packages (libcoap, mbedtls, tinycbor).  The Scons-based build system looks for them, and if it does not find them, displays a message instructing the user to download the package and exits. When Scons is rerun, it builds the packages along with Iotivity.

Bazel offers a much better solution. You just set such packages up as external repositories and Bazel will download and build them as needed, without user intervention. It's embarrassingly simple. Here's how tinycbor is handled:

In file WORKSPACE:

    new_http_archive(
        name = "tinycbor",
        urls = ["https://github.com/intel/tinycbor/archive/v0.5.1.zip"],
        sha256 = "48e664e10acec590795614ecec1a71be7263a04053acb9ee81f7085fb9116369",
        strip_prefix = "tinycbor-0.5.1",
        build_file = "config/tinycbor.BUILD",
    )

This defines an external repository, whose label is @tinycbor.  In file config/tinycbor.BUILD you specify the build the same way you would if it were a local repo:

    cc_library(
        name = "tinycbor-lib",
        copts = ["-Iexternal/tinycbor/src"],
        srcs = ["src/cborparser.c",
                "src/cborparser_dup_string.c",
                "src/cborencoder.c",
                "src/cborerrorstrings.c"],
        hdrs = glob(["src/*.h"]),
        visibility = ["//visibility:public"]
    )

That's it! Test it by running this from the command line:

    $ bazel build @tinycbor//:tinycbor-lib

Bazel will download the library to a hidden directory, unzip it, compile it, and make it available to other tasks under the @<repo>//<target> label, i.e. @tinycbor//:tinycbor-lib. Add it as a dependency for a cc_* build like so:

    deps = ["@tinycbor//:tinycbor-lib"]

The problem is that Iotivity patches the mbedtls library. Furthermore, it provides a customized config.h intended to replace the one that comes with the library (using copy rather than patch). It took a considerable amount of trial and error to figure out how to do this with Bazel. So we have four tasks:

1. Configure the external repo as a new_http_archive rule in WORKSPACE
2. Define a genrule to patch the library
3. Arrange for the custom config.h to replace the default version
4. Define a cc_library rule to compile the patched code

Here's how I set things up:

WORKSPACE:

    new_http_archive(
        name = "mbedtls",
        urls = ["https://github.com/ARMmbed/mbedtls/archive/mbedtls-2.4.2.zip"],
        sha256 = "dacb9f5dd438c456b9ef6627637f46e16fd41e86d828217ec9f8547d3d22a338",
        strip_prefix = "mbedtls-mbedtls-2.4.2",
        build_file = "config/mbedtls/BUILD",
    )

In config/mbedtls I have the following files: BUILD, ocf.patch, and config.h.

A Bazel genrule allows you to run Bash commands from Bazel. It must list all inputs and all outputs, so that Bazel can guarantee that you indeed output exactly what you promised, no more and no less. It writes output to ./bazel-genfiles/ which it then makes available to other tasks. Unfortunately the documentation is a little weak, so I had to discover the hard way just what Bazel considers an output.

#### Exploring external repos using genrule

First let's take a look at what happens when Bazel downloads and unzips an external repo. We can do this using a simple genrule in config/mbedtls/BUILD:

    genrule(
        name = "gentest",
        srcs = glob(["**/*"]),
        outs = ["genrule.log"],
        cmd = "pwd > $@"
    )

Run "$ bazel build @mbedtls//:gentest". You should get a message like the following:

Target @mbedtls//:gentest up-to-date:
bazel-genfiles/external/mbedtls/genrule.log

Browse the genrule.log file and you'll see it contains the working directory of the genrule cmd, something like:

/private/var/tmp/_bazel_gar/a2778d8bc5379ccd6c684731e73b4da6/sandbox/4850556017797389628/execroot/__main__

The first lesson here is that Bazel sandboxes execution for this external repo.

The second lesson is that you must write outputs to the appropriate Bazel-defined directory. That's what the $@ is for: it's the name of the real output file. If you use "pwd > genrule.log", you'll get an error: "declared output 'external/mbedtls/genrule.log' was not created...".  That does not mean that genrule.log was not written; rather, it means that it was written in the wrong place.

You can see what $@ is by using "echo $@ > $@"; the log will then contain:

bazel-out/darwin-fastbuild/genfiles/external/mbedtls/genrule.log

Now try changing the cmd to "ls > $@".  Then genrule.log should contain:

bazel-out
external

Now try "ls -R > $@" to get a recursive listing of the tree; examine it and you will see that Bazel has unzipped the mbedtls library in ./external/mbedtls.

Finally, try this cmd: "\n".join(["ls > genrule.log", "ls > $@"])

This will show you that ls > genrule.log gets written to the execroot, whereas ls > $@ gets written to the right place.
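
The "declared output was not created" behaviour can be mimicked outside Bazel with a toy model. The sketch below is plain Python, not Bazel code, and it is not how Bazel is implemented: the function `run_toy_genrule` is a hypothetical stand-in that creates a scratch execroot, writes one file, and then checks whether the declared output path (the value substituted for $@, copied from the log above) exists. It also assumes that the output directory already exists when the command runs.

```python
import tempfile
from pathlib import Path

# Path copied from the log output above; this is what $@ expanded to.
DECLARED_OUT = Path("bazel-out/darwin-fastbuild/genfiles/external/mbedtls/genrule.log")


def run_toy_genrule(write_to):
    """Toy stand-in for a single genrule action (hypothetical helper).

    Writes a file at `write_to`, relative to a scratch execroot, then
    reports whether the declared output exists afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp:
        execroot = Path(tmp)
        # Assumption: the output directory is created before cmd runs.
        (execroot / DECLARED_OUT.parent).mkdir(parents=True)
        (execroot / write_to).write_text("bazel-out\nexternal\n")
        return (execroot / DECLARED_OUT).exists()


wrong = run_toy_genrule("genrule.log")  # like "ls > genrule.log"
right = run_toy_genrule(DECLARED_OUT)   # like "ls > $@"

for label, ok in [("ls > genrule.log", wrong), ("ls > $@", right)]:
    print(label, "->", "ok" if ok else "declared output was not created")
```

In both runs a file really is written; only the second one lands at the path that was declared as an output.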

#### Applying a patch

Now let's write a genrule to apply the patch. This is a little trickier, since it has multiple outputs. If you try to use $@ Bazel will complain. Furthermore, since patch updates files in place, we need to copy the library to a new directory and apply the patch there. Finally, we need to make the patch file available - since our genrule will execute in the sandboxed execroot, we do not automatically have access to config/mbedtls/ocf.patch.

First let's expose ocf.patch. This is simple but involves an obscure function. Put the following at the top of config/mbedtls/BUILD:

    exports_files(["config.h", "ocf.patch"])

This will make config/mbedtls/ocf.patch available under a Bazel label: "@//config/mbedtls:ocf.patch".

Our genrule starts out like this:

    genrule(
        name = "patch",
        srcs = glob(["**/*.c"])
             + glob(["**/*.h"])
             + glob(["**/*.data"])
             + glob(["**/*.function"])
             + glob(["**/*.sh"])
             + ["@//config/mbedtls:ocf.patch"],
        ...)

The globs pick up all the files that are listed in the patch file and thus required as input (plus others, but that's ok). It also must list the patch file, since that is an input. All inputs must be explicitly listed.

Our command will look like this:

    cmd  = "\n".join([
        "cp -R external/mbedtls/ patched",
        "patch -dpatched -p1 -l -f < $(location @//config/mbedtls:ocf.patch)" ....])

We first copy the entire tree to a new directory "patched" (e.g. external/mbedtls/include -> patched/include, etc). We then need to add -dpatched to the patch command, so it runs from the correct subdir. To access the patch file we use $(location @//config/mbedtls:ocf.patch); this is a Bazel feature that returns the correct (Bazel-controlled) path for ocf.patch.

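
Since the language of BUILD files is a Python variant, the string handed to cmd can be previewed in a plain Python interpreter. The following is a minimal sketch: it joins the two command lines shown above and then substitutes a path for the $(location ...) expression. The substituted path `config/mbedtls/ocf.patch` is an assumption made for illustration; Bazel computes the real value.

```python
# The two command lines from the genrule, joined as in the BUILD file.
cmd = "\n".join([
    "cp -R external/mbedtls/ patched",
    "patch -dpatched -p1 -l -f < $(location @//config/mbedtls:ocf.patch)",
])

# Assumed expansion, for illustration only: Bazel decides the actual path.
assumed_patch_path = "config/mbedtls/ocf.patch"
expanded = cmd.replace(
    "$(location @//config/mbedtls:ocf.patch)", assumed_patch_path
)

# What the shell would see: a two-line script.
print(expanded)
```

The point is that the genrule ends up running an ordinary multi-line shell script, one list element per line.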
This will apply the patch, but it will not produce the output required by genrule. It's just like "ls > genrule.log" above: the output gets written, but not in the right place.

Where is the right place? That's what $(@D) is for. It's a so-called "Makefile variable"; see Other Variables available to the cmd attribute of a genrule. It resolves to the Bazel-defined output directory when you have multiple outputs. In this case: bazel-out/darwin-fastbuild/genfiles/external/mbedtls. (Compare this to the value of $@.)

So now we need to copy the files we care about to $(@D). Fortunately this is easy; everything we need is already under patched/ so we just add "cp -R patched $(@D)" to our cmd.

Finally we need to specify the outputs. Note that we only need source files for the library, even though the patchfile applies to additional files (e.g. some programs and test files). So we can limit our output to those files:

    outs = ["patched/" + x for x in glob(["**/library/*.c"])]
         + ["patched/" + x for x in glob(["**/*.h"], exclude=["**/config.h"])],

Here we use a Python facility, the list comprehension (the language of Bazel is a Python variant). We are only interested in the library files so we do not output any of the other stuff. We also exclude config.h since we are supplying a custom version.

NOTE: through trial and error, I have discovered that genrule will allow you to output files that are not listed in the outs array, but it will not emit them. In this case, our command copies the entire source tree to $(@D), but our outs array only contains c files and h files.  The resulting genfiles tree contains only those files, to the exclusion of various other files in the source (e.g. *.data). So evidently Bazel is smart enough to eliminate files not listed in outs from $(@D).

Here's the final genrule:

    genrule(
        name = "patch",
        srcs = glob(["**/*.c"])
             + glob(["**/*.h"])
             + glob(["**/*.data"])
             + glob(["**/*.function"])
             + glob(["**/*.sh"])
             + ["@//config/mbedtls:ocf.patch"],
        outs = ["patched/" + x for x in glob(["**/library/*.c"])]
             + ["patched/" + x for x in glob(["**/include/**/*.h"], exclude=["**/config.h"])],
        cmd = "\n".join([
            "cp -R external/mbedtls/ patched",
            "patch -dpatched -p1 -l -f < $(location @//config/mbedtls:ocf.patch)",
            "cp -R patched $(@D)",
        ])
    )
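
The outs expression is an ordinary list comprehension, so its effect can be checked in plain Python. In the sketch below a small hand-picked list of paths stands in for the result of glob(), and simple string tests stand in for the glob patterns. Both are simplifications for illustration and do not reproduce Bazel's glob semantics; the names `files`, `library_sources`, `public_headers`, and `undeclared` are made up for this example.

```python
# Hand-picked sample paths standing in for what glob() would return
# inside the mbedtls repo; the real tree contains many more files.
files = [
    "library/aes.c",
    "programs/aes/aescrypt2.c",
    "include/mbedtls/aes.h",
    "include/mbedtls/config.h",
    "tests/suites/test_suite_aes.function",
]

# Rough stand-ins for glob(["**/library/*.c"]) and
# glob(["**/include/**/*.h"], exclude=["**/config.h"]).
library_sources = [
    x for x in files if x.startswith("library/") and x.endswith(".c")
]
public_headers = [
    x for x in files
    if x.startswith("include/")
    and x.endswith(".h")
    and not x.endswith("/config.h")
]

# Same shape as the outs attribute: prefix every path with "patched/".
outs = (["patched/" + x for x in library_sources]
        + ["patched/" + x for x in public_headers])

# Files that cmd would copy but that are never declared as outputs.
undeclared = [x for x in files if "patched/" + x not in outs]

print(outs)
print(undeclared)
```

With this sample, only the library source and the non-config header end up in `outs`; the program, the test file, and config.h are copied by the command but never declared.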

#### Build the library from the patched sources

First off, get the vanilla build working. This is pretty easy; it looks similar to the tinycbor example above.

Unfortunately, getting the lib to build using the patches turned out to be quite difficult. What I came up with is the following (which I do not entirely understand).

First, I had a devil of a time getting the header paths right. In the end the only thing I found that works is to list them all explicitly; globbing does not work.  So I have:

    mbedtls_hdrs = ["patched/include/mbedtls/aes.h",
                    "patched/include/mbedtls/aesni.h",
                    "patched/include/mbedtls/arc4.h",
                    ...
    ]

Then I have:

    cc_library(
        name = "mbedtls-lib",
        copts = ["-Ipatched/include",
                 "-Ipatched/include/mbedtls",
                 "-Iconfig",
                 "-Iconfig/mbedtls"],
        data = [":patch"],
        srcs = [":patch"],
        hdrs = mbedtls_hdrs + ["@//config/mbedtls:config.h"],
        includes = ["patched/include", "patched/include/mbedtls", "x"],
        visibility = ["//visibility:public"]
    )

Omitting either hdrs or includes causes breakage, dunno why.

For that matter, to be honest, I don't yet know if the build is good, because I have not used the lib with a running app yet.  But it builds!