To start understanding what Bazel is and does, we need to know what build systems are. This, and a broad comparison of characteristics of build systems, will be explained in the first chapter. Next, the most used build tool at this time, namely Make, is discussed in more detail, together with CMake. Finally, Bazel is discussed in detail and compared to other build tools.
What is a build system
A build system is a program that knows how to build other programs; in other words, it is a tool that automates building. Building an application includes compiling, linking and packaging the code into a usable or executable form.
The tasks that are often included in an automated build are:
- Downloading dependencies
- Compiling source code into binary code
- Packaging that binary code
- Running tests
- Deployment to production systems
Global comparison of multiple build systems
Currently there are a lot of build systems in use, too many to cover all. The list includes: Make, Ninja, Gradle, Maven, SCons, Bazel, Shake, …
Build systems have a number of features on which they can be compared:
- Static and/or dynamic dependencies
- Minimal builds
- Cloud builds
- Early cutoff
Static and/or dynamic dependencies
Build systems define a set of targets that need to be created. These targets can have dependencies that need to be available before the end target can be created.
These dependencies can be static or dynamic. Static dependencies need to be known upfront, before the build is started. Dynamic dependencies may not be known upfront and are discovered after the build has started.
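Because static dependencies are known before the build starts, a complete build order can be computed up front. The following is a minimal sketch in Python, assuming a hypothetical dependency table (the target and file names are illustrative only); it uses `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical static dependency table: target -> set of prerequisites.
# Everything is known before the build starts, so the order can be planned.
deps = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c", "util.h"},
    "util.o": {"util.c", "util.h"},
}

# static_order() yields every node after all of its prerequisites.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

With dynamic dependencies such a plan cannot be fixed in advance, because some entries of the table only become known while tasks are running.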
Minimal builds
A minimal build system executes each task at most once per build, and only if it transitively depends on inputs that changed since the previous build.
Cloud builds
When build systems are used in large teams, you often end up executing the same tasks on different machines. This can be optimised by storing the intermediate results in the cloud and re-using them instead of re-executing the tasks.
Early cutoff
Build systems that have an early cutoff feature can stop the build early when a re-executed task produces the same result as before, because nothing that depends on that result needs to be rebuilt.
Self-tracking
Most build systems only track changes of inputs and intermediate results. Self-tracking build systems also detect changes to the build tasks themselves, for example because compile flags were changed.
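The early cutoff idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `compile_step` and `link_step` are hypothetical stand-ins for a compiler and a linker, and the toy compiler drops comment lines so that a comment-only edit produces an identical intermediate result.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compile_step(source: bytes) -> bytes:
    # Toy "compiler" (hypothetical): drops comment lines, so a comment-only
    # edit yields exactly the same intermediate result.
    lines = [line for line in source.splitlines()
             if not line.strip().startswith(b"#")]
    return b"\n".join(lines)


def link_step(obj: bytes) -> bytes:
    # Toy "linker" (hypothetical).
    return b"BIN:" + obj


def build(source: bytes, state: dict) -> list:
    """Run source -> object -> binary and return the names of executed steps."""
    executed = []
    if state.get("source_hash") != digest(source):
        obj = compile_step(source)
        executed.append("compile")
        state["source_hash"] = digest(source)
        # Early cutoff: relink only if the intermediate result really changed.
        if state.get("object_hash") != digest(obj):
            state["binary"] = link_step(obj)
            state["object_hash"] = digest(obj)
            executed.append("link")
    return executed


state = {}
first = build(b"# v1\nprint(1)", state)                  # fresh build
second = build(b"# comment changed\nprint(1)", state)    # comment-only edit
third = build(b"# comment changed\nprint(2)", state)     # real change
print(first, second, third)
```

In the second build the compile step runs again, but its output hash is unchanged, so the link step is skipped.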
How it works
Make is a tool that was created in 1976 and is one of the most widely used build tools in the industry. It is a cross-platform tool that looks for Makefiles containing a set of rules. Such a rule, or task, describes how to make a target and consists of 3 parts:
target: prerequisites
	recipe
If a prerequisite doesn’t exist yet, the make tool first executes the recipe of the rule that generates that prerequisite. This is repeated until all prerequisites have been built.
hello_world: util.o main.o
	gcc util.o main.o -o hello_world

util.o: util.h util.c
	gcc -c util.c

main.o: util.h main.c
	gcc -c main.c
The above Makefile lists three tasks:
- Compile a utility library comprising files util.h and util.c into util.o by executing the command gcc -c util.c
- Compile the main source file main.c into main.o
- Link object files util.o and main.o into the executable hello_world
If the Makefiles are configured to keep the build artefacts, such as the object files, the make tool is intelligent enough to rebuild only those parts of a program that have been changed since the previous build. Whether or not a target needs to be rebuilt depends on the timestamps of the target’s dependencies. This avoids rebuilding the entire program for every small change.
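The timestamp rule described above can be approximated as follows. This is a simplified Python sketch, not Make's actual implementation; `needs_rebuild` is a hypothetical helper name, and the timestamps are set explicitly with `os.utime` so that the example is deterministic.

```python
import os
import tempfile


def needs_rebuild(target: str, prerequisites: list) -> bool:
    """Timestamp rule: rebuild if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "util.c")
    obj = os.path.join(tmp, "util.o")
    with open(src, "w") as f:
        f.write("int add(int a, int b) { return a + b; }\n")

    missing_target = needs_rebuild(obj, [src])   # util.o does not exist yet

    with open(obj, "w") as f:
        f.write("object code placeholder\n")
    os.utime(src, (1000, 1000))                  # source is older ...
    os.utime(obj, (2000, 2000))                  # ... than the object file
    up_to_date = needs_rebuild(obj, [src])

    os.utime(src, (3000, 3000))                  # source edited afterwards
    stale = needs_rebuild(obj, [src])

print(missing_target, up_to_date, stale)
```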
One main disadvantage of Make is that the targets in the Makefile don’t depend on the Makefile itself. This can be overcome by adding the Makefile to the prerequisites list of a target.
target: Makefile prerequisites
	recipe
By doing this the target depends also on the Makefile. Thus if the Makefile was changed, all targets that depend on the Makefile are going to be rebuilt. Even if the Makefile changes don’t have an impact on the target, the target is going to be rebuilt.
If Make is used on its own, the developer needs to write the Makefiles themselves. For big projects, defining all the targets and dependencies can be a very complex task. The Makefiles are source code in their own right and need to be maintained. A very common tool used to solve this issue is CMake, which is discussed in the next section.
Some key points from Make:
- Uses Makefiles
- Makefiles are source code
- For big projects, Makefiles can get very complex
- The timestamp of dependencies is used to check if a part needs to be rebuilt
- Only static dependencies
CMake was released in 2000 and is a cross-platform tool which is used to generate build files for different kinds of build tools, such as Make and Ninja.
A CMake-based build system is organised as a set of high-level logical targets: executable, library or custom target. When using plain Make, the developer just needs to create their Makefiles and run make. With CMake, this process takes 2 steps. The first step is setting up the build environment through a CMakeLists.txt file, from which the build files, such as Makefiles, are generated for the chosen environment. After this first step the build files are available, and the actual build can be performed using the corresponding build tool, such as Make.
The CMakeLists.txt file is the main entry point for CMake and is written in the CMake language, which is very declarative. This file is positioned in the top-level source directory. In it, you can define your complete build specification. Alternatively, the add_subdirectory() command can be used to add subdirectories, with more build specifications, to the build. If subdirectories are used then each subdirectory needs to contain a CMakeLists.txt file as well.
Some key points from CMake:
- Very declarative
- Generates build files; CMake doesn’t build on its own
- All usual compiler and linker flags, library commands, … are replaced with platform and build system independent commands
- Easier than Make to compile your files into a shared library
- Automatic discovery and configuration of the toolchain (such as the compiler)
To learn more about using CMake you can click on the link in the references.
Bazel is an open source build tool that grew out of Google’s build system Blaze. Bazel was open-sourced in 2015 and has a large community that works on its further development. Version 1.0 of Bazel was released at the end of 2019.
The syntax of Bazel is very similar to that of CMake: it’s very declarative and readable. Like CMake, it has built-in rules to build e.g. C, C++ or Java libraries and/or executables. As with CMake, you have to define your target name and declare your sources and dependencies. The biggest difference is that Bazel is much more explicit. You have to define every dependency, including the header files. Bazel throws an error if it detects an included header file that’s not declared in the dependencies list. That’s due to the fact that Bazel creates a sandboxed environment for each binary or library it needs to build. Because of this, Bazel can guarantee the correctness and reproducibility of a build.
To achieve this sandboxed environment, Bazel will run processes in a working directory which only contains known inputs. Because of this the compiler, and other tools, can only see source files they should be able to access. If a source file isn’t mentioned in Bazel’s BUILD file, this file isn’t going to be present in the sandboxed environment, and the compiler isn’t going to find the given source file.
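The effect of such a sandbox can be illustrated with a small Python sketch. It is a deliberate simplification: Bazel's real sandbox is more involved, whereas this version simply copies the declared inputs into a fresh directory. The helper `make_sandbox` and the file `undeclared.h` are hypothetical.

```python
import os
import shutil
import tempfile


def make_sandbox(workspace: str, declared_inputs: list) -> str:
    """Create a fresh directory that contains only the declared input files."""
    sandbox = tempfile.mkdtemp(prefix="sandbox-")
    for rel_path in declared_inputs:
        dest = os.path.join(sandbox, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(workspace, rel_path), dest)
    return sandbox


# Hypothetical workspace with one file that no target declares.
workspace = tempfile.mkdtemp(prefix="workspace-")
for name in ("hello-greet.cc", "hello-greet.h", "undeclared.h"):
    with open(os.path.join(workspace, name), "w") as f:
        f.write("// " + name + "\n")

sandbox = make_sandbox(workspace, ["hello-greet.cc", "hello-greet.h"])
visible = sorted(os.listdir(sandbox))
print(visible)  # undeclared.h is not among the visible files

shutil.rmtree(workspace)
shutil.rmtree(sandbox)
```

A compiler started inside the sandbox directory would fail on an include of undeclared.h, which is how a missing dependency declaration surfaces as a build error.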
An example Bazel BUILD file looks like this:
cc_library(
    name = "hello-greet",
    srcs = ["hello-greet.cc"],
    hdrs = ["hello-greet.h"],
)

cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
    deps = [
        ":hello-greet",
    ],
)
Bazel can use this BUILD file to create a dependency graph. When a file is changed, this graph is used to determine which dependencies need to be rebuilt. The main focus of Bazel is correctness of a build, paired with a very efficient build performance.
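How a dependency graph determines what to rebuild can be sketched in Python. The `graph` dictionary below is a hand-written, hypothetical mirror of the two targets in the example BUILD file, and `affected_targets` is an illustrative helper, not part of Bazel.

```python
# Hypothetical mirror of the example BUILD file: target -> direct inputs.
graph = {
    ":hello-greet": ["hello-greet.cc", "hello-greet.h"],
    ":hello-world": ["hello-world.cc", ":hello-greet"],
}


def affected_targets(graph: dict, changed: str) -> set:
    """Return every target that transitively depends on the changed node."""
    # Invert the graph: input -> targets that use it directly.
    reverse = {}
    for target, inputs in graph.items():
        for inp in inputs:
            reverse.setdefault(inp, set()).add(target)

    affected, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for dependent in reverse.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                stack.append(dependent)
    return affected


header_change = affected_targets(graph, "hello-greet.h")
main_change = affected_targets(graph, "hello-world.cc")
print(sorted(header_change), sorted(main_change))
```

A change to the header affects both targets, while a change to hello-world.cc only affects the binary.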
Some characteristics of Bazel are:
- Self-tracking: The Bazel BUILD files are implicit dependencies of the targets they define.
- Cloud building: To support cloud builds, Bazel maintains:
  - a content-addressable remote cache that can be used to download a previously built file given the hash of its content.
  - a history of all executed build commands, annotated with observed file hashes.
- Sandboxed builds: To guarantee the correctness and reproducibility of a build, Bazel uses sandboxed builds.
The BUILD file is enough for Bazel to know how to build the target: from it, Bazel can create the complete dependency graph and the sandboxed environment. It doesn’t need to walk the complete code base to know what it needs to build and what not. With Make, merely reading the Makefiles can take minutes, especially when you have a very large code base. This is because Make needs to re-evaluate every rule and create a proper rule tree, after which every dependency needs to be checked.
With Make, a changed timestamp of a dependency means that the target needs to be rebuilt. Bazel uses a hash of the dependency’s content instead. The advantage is that it can determine more precisely whether a dependency really changed: the timestamp of a file can change while its content does not. The hash is used to determine whether a target needs to be rebuilt or not.
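The difference between the two checks can be shown with a short Python sketch. SHA-256 is used here as an assumption for illustration, and `file_digest` is a hypothetical helper. The file is "touched": its timestamp moves forward while its content stays the same.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """Hash the content of a file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "util.h")
    with open(path, "w") as f:
        f.write("int add(int a, int b);\n")
    os.utime(path, (1000, 1000))
    old_mtime, old_hash = os.path.getmtime(path), file_digest(path)

    # "Touch" the file: newer timestamp, identical content.
    os.utime(path, (2000, 2000))

    timestamp_changed = os.path.getmtime(path) != old_mtime
    content_changed = file_digest(path) != old_hash

print(timestamp_changed, content_changed)
```

A timestamp-based check would rebuild everything that depends on util.h here, whereas a hash-based check sees that nothing changed.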
Bazel can use a remote cache to store intermediate build results. This can be useful when many developers work on the same code base. When a new Bazel build is performed, Bazel can check with its remote cache if a hash is already available for each dependency. If that’s the case, the dependency doesn’t need to be rebuilt and can simply be pulled from the remote cache.
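The lookup described above can be sketched as follows. This is a simplified illustration, not Bazel's actual cache protocol: `RemoteCache` is an in-memory stand-in for a remote service, and `action_key`, `build_with_cache` and `fake_compile` are hypothetical names. The key combines the build command with the hashes of its inputs.

```python
import hashlib


class RemoteCache:
    """In-memory stand-in for a remote cache (hypothetical)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value


def action_key(command: str, input_contents: list) -> str:
    # Key = hash of the command plus the hash of every input's content.
    h = hashlib.sha256(command.encode())
    for content in input_contents:
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()


def build_with_cache(cache, command, inputs, run):
    """Return (output, cache_hit); only run the task on a cache miss."""
    key = action_key(command, inputs)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    output = run(inputs)
    cache.put(key, output)
    return output, False


calls = []


def fake_compile(inputs):
    # Hypothetical stand-in for an expensive compile step.
    calls.append(1)
    return b"OBJ:" + b"".join(inputs)


cache = RemoteCache()
out1, hit1 = build_with_cache(cache, "gcc -c util.c", [b"int x;"], fake_compile)
out2, hit2 = build_with_cache(cache, "gcc -c util.c", [b"int x;"], fake_compile)
out3, hit3 = build_with_cache(cache, "gcc -c util.c", [b"int y;"], fake_compile)
print(hit1, hit2, hit3, len(calls))
```

The second request has the same command and the same input hashes, so the result is taken from the cache and the compile step is not executed again.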
The sandboxed environment is very important to guarantee the correctness of a build and to ensure that every build results in the same output.
Bazel can be used for almost all programming languages. Not all are supported by default, but Bazel has an extension language that makes it possible to use Bazel for almost every programming language and for a combination of languages.
The main advantage of Bazel is its dependency graph and the efficiency that comes with it. Together with a remote cache, sandboxed actions, early cutoff and hashes of intermediate results, this generally results in a much faster build than with Make.
Currently, CMake is the industry standard build tool, in combination with Make. Switching to a new build system takes some effort.
The documentation of Bazel is not as extensive as that of Make.
Bazel is not a minimal build system in the sense that it may restart a task multiple times as new dependencies are discovered and rebuilt. But it supports early cutoff, ensuring that a restart of an unnecessary task aborts quickly.