Bazel! Correct, reproducible, fast builds
Bazel is a build tool from Google that I used a lot during my time there. Happily, it’s been open sourced, and so I can look into using it for my own Python projects.
Let me tell you why I’m excited about it, and then follow up with some of the details of my initial exploration.
Tests are great
I love tests. I think that for software projects in general, and Python projects in particular, it’s very important to run the full test suite before every change. It helps make sure your code is always releasable, it prevents regressions, and it means that anyone can start working on your project with confidence.
However, running a full test suite can sometimes take a very long time. One project I worked on had a test suite that took four hours, making it nearly impossible to get anything done.
Worse, the slow test suite actively discouraged small changes. Who is going to fix a typo when it might take more than two days to get merged?
What you want in that case is to run only the tests for the code that changed. That way, the cost of a change becomes proportional to the size of the change.
The problem is: how do you do that? The answer: Bazel.
Correct, reproducible, fast tests
Here’s what happens when you run Bazel on the experimental project I set up:
$ bazel test --test_output=errors spike:test_spike
INFO: Found 1 test target...
Target //spike:test_spike up-to-date:
bazel-bin/spike/test_spike
INFO: Elapsed time: 0.273s, Critical Path: 0.05s
//spike:test_spike PASSED
Executed 1 out of 1 tests: 1 test passes.
It ran 1 test, and that test passed. Here’s what happens when you run it again:
$ bazel test --test_output=errors spike:test_spike
INFO: Found 1 test target...
Target //spike:test_spike up-to-date:
bazel-bin/spike/test_spike
INFO: Elapsed time: 0.119s, Critical Path: 0.00s
//spike:test_spike (1/0 cached) PASSED
Executed 0 out of 1 tests: 1 test passes.
It didn’t run the test, because it knew it was going to pass.
Here’s what happens when you change the underlying code and then run the tests:
$ bazel test --test_output=errors spike:all
INFO: Found 1 target and 1 test target...
INFO: Elapsed time: 0.145s, Critical Path: 0.03s
//spike:test_spike PASSED
Executed 1 out of 1 tests: 1 test passes.
It ran the test, because it saw that the code the test depended on had changed.
Not much of a speed-up on a small project like this, but imagine the difference on a much bigger one.
How does it work?
BUILD files
The secret sauce is in the BUILD files that tell Bazel how to build a project. Here’s the BUILD file from the spike:
py_library(
    name = 'spike',
    srcs = [
        '_spike.py',
    ],
)

py_test(
    name = 'test_spike',
    srcs = [
        'tests/test_spike.py',
    ],
    deps = [
        ':spike',
    ],
    size = 'small',
)
This defines two targets: :spike and :test_spike. In Bazel, each target must specify all of its dependencies. Here, :spike depends on nothing, and has a single source code file.
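For concreteness, here’s a minimal sketch of what the two source files might look like. All we really know from the traceback below is that the library exports a square function, so treat the rest as illustrative:

# _spike.py
def square(x):
    # The simplest implementation consistent with the test below.
    return x * x

# tests/test_spike.py
import unittest

from spike._spike import square


class TestSquare(unittest.TestCase):
    def test_square(self):
        self.assertEqual(square(3), 9)


if __name__ == '__main__':
    unittest.main()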
:test_spike depends on :spike. If we remove that dependency, then the tests fail:
Traceback (most recent call last):
File "/home/jml/.cache/bazel/_bazel_jml/5349438f91eafd39f2b56a30e3eeae42/bazel-python-spike/bazel-out/local_linux-fastbuild/bin/spike/test_spike.runfiles/spike/tests/test_spike.py", line 4, in <module>
from spike._spike import square
ImportError: No module named _spike
How it achieves this isolation is beyond the scope of this post. You can see some hints in my spike’s README or in the Bazel documentation.
This means that our BUILD file is guaranteed to have the full list of inputs for any build target, so Bazel can tell whether it’s possible for a change to break our tests.
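You can ask Bazel for this dependency information directly with bazel query. For example, a query along these lines (the rdeps function is real; the paths are from my spike) lists every target that a change to :spike could affect:

$ bazel query 'rdeps(//..., //spike:spike)'

On this spike, the answer should include //spike:test_spike, which is exactly how Bazel knows to re-run that test when _spike.py changes.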
Functional programming wonks will recognize this: making the test a pure function of its inputs means that re-running it with unchanged inputs must produce the same result, so Bazel can safely cache the outcome rather than run the test again.
Of course, once you put it like that, it becomes clear that a test might not actually be a pure function. It might rely on the network, the filesystem, an environment variable, or something like that. The test encyclopedia outlines the conditions that are necessary for Bazel to run tests with proper isolation.
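For example, a test like this hypothetical one is not a pure function of its declared inputs, so a cached result for it could go stale without any source file changing:

import os
import unittest


class TestConfig(unittest.TestCase):
    def test_reads_environment(self):
        # This depends on state that no BUILD file declares: if DEPLOY_ENV
        # changes, a cached PASSED result is no longer trustworthy.
        self.assertEqual(os.environ.get('DEPLOY_ENV'), 'production')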
If a test fails, then running bazel test will actually run the test again: failures are not cached. This is rather sensible, since flaky tests are an actual thing.
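Bazel even has first-class support for coping with flakiness: test rules accept a flaky attribute, which tells Bazel to retry a failing test a few times before reporting it as failed. A sketch (the attribute is real; marking the spike’s test flaky is purely illustrative):

py_test(
    name = 'test_spike',
    srcs = ['tests/test_spike.py'],
    deps = [':spike'],
    size = 'small',
    # Retry on failure before declaring the test failed.
    flaky = True,
)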
Actually using Bazel for serious work
That’s all well and good, but can we use Bazel for our day jobs?
Well, maybe. I haven’t done a full experiment on a serious code base, but there are some open questions and issues:
First, all of the dependencies for your project must themselves have a BUILD file. Bazel doesn’t know about pip or requirements.txt, so you have to write and maintain those BUILD files yourself (see the sketch after the next point).
Second, some projects need to run tests on multiple platforms (e.g. Twisted has to pass tests on Windows). I’ve got no idea how to set this up with Bazel, or if it’s even possible.
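On the first point, here’s a hypothetical sketch of what wiring up a vendored third-party package might look like. The package and layout are my invention, but py_library and visibility are standard:

# third_party/six/BUILD
py_library(
    name = 'six',
    srcs = ['six.py'],
    # Let any target in the repository depend on this library.
    visibility = ['//visibility:public'],
)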
Together, these mean that I might not (yet!) try to use Bazel for released software like Twisted, but would seriously consider using it for deployed software like Launchpad.
For better or worse, I’m not actually working on any deployed software in Python at the moment, so that means my exploration stops here.
However, I think this is one of the most exciting recent software releases, so I encourage you to pick up where I’ve left off. Try it out, and let me know how it goes.