Prologue

MetaNotes originally started as a create-react-app template. CRA is the simplest way to get a React application up and running, after all. The original prototype also included a ReactNative version, and to maintain both in the same repo I opted for a monorepo approach with yarn & lerna, keeping the shared code in its own subpackages.

Boy, was I in for a world of pain.

Working with several local packages was never easy with node, and yarn workspaces only partially simplified the issues. One problem I faced was my api—built on grpc & protobufs—depending on @grpc/grpc-js for node and on grpc-web for the frontend. Merely having @grpc/grpc-js in the dependencies of a package caused webpack to barf as it tried to pack the frontend. In the end my solution was to make two packages with the same proto: one would build the grpc version, the other only the grpc-web one. This worked well, apart from the fact that I now had two api.protos to keep in sync.

Moving on, I learned about the suffering that is ReactNative and expo and their handling of the package root. It's insane that we have build tools that can only work from the root of the package, but here we are: welcome to the immovable node_modules!

After much hoisting, I dropped the RN version for good, deciding it was too much pain to prototype both, and focused solely on the CRA web frontend. Development picked up the pace and I got to implement the "scribbles", the metanotes version of dynamic notes. In a nutshell, scribbles are plain jsx files, but they aren't compiled in by the packer; instead their source is inlined into the frontend and evaluated at runtime, allowing users to override them and thus modify the UI and the core metanotes behaviour.

To process scribbles, I had to write a webpack loader. I didn't want to eject, so I opted for react-app-rewired instead. My loader was trivial: it parsed a magical metadata header from the source js and emitted JSON with the header and the body as a string.

It worked until I needed to evaluate some scribbles as part of my tests. Of course Jest had no idea how to load them, and I couldn't use the webpack loader. After some research I came up with a babel plugin instead. It was pretty ugly: it looked up the require()s for any .metanotes.js and inlined the processed JSONs (the same ones the webpack loader would generate). Now I could access my scribbles in the tests, but the devserver couldn't watch for changes in them, and caching wouldn't let me see any updates to the scribbles until I restarted the devserver. Eventually, I just turned the caching off.

The pile of complexity kept rolling, and at some point I realised I had too many sticks propping up my CRA build system. At the time I was working on a markdown parser, a component tangential to the CRA web app itself, but one that was too painful to extract into a separate package.

Then I remembered Bazel

I had worked with Google's internal version of Bazel for many years, I was pretty good at understanding BUILD files, and I thought Bazel could be a good choice to alleviate the complexities of managing a nodejs monorepo. Bazel was surprisingly easy to install: yarn add @bazel/bazelisk brings in Bazelisk, a Bazel version manager packaged for node. Bazelisk downloads the required Bazel version, so you don't need to fiddle with Bazel releases and java runtimes.

Hello world

Building TypeScript with Bazel isn’t as straightforward as the tutorials say, but it’s manageable. Here’s how my backend build rule could look:

load("@npm//@bazel/typescript:index.bzl", "ts_project")

package(default_visibility = ["//visibility:public"])

ts_project(
    name = "backend_lib",
    srcs = glob(["*.ts"]),
    deps = [
        "@npm//:node_modules",
    ],
)

The ts_project rule tells Bazel to run the TypeScript compiler tsc in project mode; it's the closest Bazel gets to the common nodejs opensource tooling. There's also a "legacy" rule called ts_library. It seems more flexible at first glance, but it's very non-trivial to integrate into a wider ecosystem. For one, it generates weird AMD/UMD bundles for running under Bazel's concat_devserver—the local version of an interactive webserver with hot reload.

What Bazel gets right

ts_project is extremely flexible. You can define your sources with a glob so that you don't need to list them one by one. At the same time you don't need to spell out your node_modules: a single dependency on all the modules is enough. Bazel offloads the version tracking of your modules to npm or yarn, using the package.json and the lockfile as the source of truth, and then manages the actual module sources itself. This allows for some amazing flexibility in monorepos. I have 3 node_modules trees in my monorepo: @npm is the root one for the build tools, @npm_backend manages the backend dependencies, and @npm_frontend the frontend ones. Further down, I depend on the individual packages in my rules, so the example ts_project above looks like this in reality:

deps = [
    "//src/common/api:api_grpc_lib",
    "@npm_backend//@grpc/grpc-js",
    "@npm_backend//@types",
    "@npm_backend//google-protobuf",
    "@npm_backend//sqlite",
    "@npm_backend//sqlite3",
],

api_grpc_lib is an internal dependency on another package in my monorepo (the pb & grpc api); all the node packages, apart from everything in @types, are spelled out individually. This allows isolating the dependencies for parts of your project.
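
The split node_modules trees come from separate yarn_install calls in the WORKSPACE. Here's a minimal sketch of such a setup; the repository names mirror mine, but the package.json and yarn.lock paths are assumptions:

load("@build_bazel_rules_nodejs//:index.bzl", "yarn_install")

# the root node_modules, used by the build tooling
yarn_install(
    name = "npm",
    package_json = "//:package.json",
    yarn_lock = "//:yarn.lock",
)

# backend-only dependencies
yarn_install(
    name = "npm_backend",
    package_json = "//src/backend:package.json",
    yarn_lock = "//src/backend:yarn.lock",
)

# frontend-only dependencies
yarn_install(
    name = "npm_frontend",
    package_json = "//src/frontend:package.json",
    yarn_lock = "//src/frontend:yarn.lock",
)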

Why would you want to complicate your dependencies?

Bazel is great at dependency solving and visibility. Imagine you have an npm package emphasis in your package.json that depends on strong and italic. Generally, there's nothing stopping you from doing something like this in your JS code:

import Italic from 'italic';

because italic is present in your node_modules as a transitive dependency. It's not a direct dependency, but you get it through emphasis, so it will always be present when you build your app. This causes plenty of issues, such as the inability to properly track the dependency's version. Bazel prevents this: unless you have italic spelled out in your package.json it won't be "visible" to your ts_project rule, and what's not visible you can't depend on! Furthermore, if you spell out every dependency per part of your app, you can prevent the misuse of legitimate dependencies from package.json, e.g. only the storage part of your backend can depend on sqlite. This allows amazing granularity and extreme visibility into how dependencies propagate through your app. Bazel has a special "query" interface to get details on the dependency graph, and it's so much more powerful than yarn why.
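
For instance, here's roughly how that interrogation looks; the target names follow my examples above and are otherwise assumptions:

# every target under //src that transitively depends on sqlite3
bazel query 'rdeps(//src/..., @npm_backend//sqlite3)'

# all dependency paths from the backend to google-protobuf
bazel query 'allpaths(//src/backend:backend_lib, @npm_backend//google-protobuf)'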

Back to the protobufs

Bazel has several rules to build protobufs, but none seemed to fit my use case exactly, so I came up with a solution of my own. Because Bazel can effortlessly run tools from your node packages, I opted for the same toolchain I used with CRA:

npm_package_bin(
    name = "api_pb",
    outs = [
        "api_pb.d.ts",
        "api_pb.js",
    ],
    args = [
        "-I./" + package_name(),
        "$(execpath api.proto)",
        "--plugin=protoc-gen-ts=$(execpath @npm//:node_modules/grpc_tools_node_protoc_ts/bin/protoc-gen-ts)",
        "--plugin=protoc-gen-grpc-web=$(execpath @com_github_grpc_grpc_web//javascript/net/grpc/web:protoc-gen-grpc-web)",
        "--ts_out=$(@D)",
        "--js_out=import_style=commonjs:$(@D)",
    ],
    data = [
        "api.proto",
        "@com_github_grpc_grpc_web//javascript/net/grpc/web:protoc-gen-grpc-web",
        "@npm//:node_modules/grpc_tools_node_protoc_ts/bin/protoc-gen-ts",
        "@npm//grpc_tools_node_protoc_ts",
    ],
    package = "grpc-tools",
    package_bin = "grpc_tools_node_protoc",
)

npm_package_bin(
    name = "api_grpc",
    outs = [
        "api_grpc_pb.d.ts",
        "api_grpc_pb.js",
    ],
    ...
)

npm_package_bin(
    name = "api_grpc_web",
    outs = [
        "api_grpc_web_pb.d.ts",
        "api_grpc_web_pb.js",
    ],
    ...
)

js_library(
    name = "api_grpc_lib",
    srcs = [
        ":api_grpc",
        ":api_pb",
    ],
)

js_library(
    name = "api_grpc_web_lib",
    srcs = [
        ":api_grpc_web",
        ":api_pb",
    ],
)

I call into grpc-tools 3 times, each time requesting different outputs: the plain protobuf messages, the grpc bindings and the grpc-web bindings, then combine them into the frontend and backend parts.

Replacing the scribbles parser

Now that I had neither webpack nor babel, I opted for a straightforward approach: I rewrote my generator plugin into a simple cli tool that takes a source scribble and the output file name and processes it. You can easily extend Bazel with your own build rules if you know basic python (Starlark, Bazel's scripting language, is a subset of python). My original attempt at building scribbles looked like this:

def _scribble_gen_ts(ctx):
    outputs = []
    # ctx.files.$ATTR maps to a list of all the files that were passed
    # through the relevant attribute.
    for f in ctx.files.srcs:
        if not (f.basename.endswith(".metanotes.js") or f.basename.endswith(".metanotes.jsx")):
            fail("'%s' is not a scribble" % f)
        # we replace the .metanotes.js[x] with .generated.ts and
        # declare the new file the output of the rule
        sfile = f.basename.rsplit(".", 2)[0]
        dfile = sfile + ".generated.ts"
        out_file = ctx.actions.declare_file(dfile, sibling = f)
        outputs.append(out_file)

    for i, o in zip(ctx.files.srcs, outputs):
        # and then we declare the action that takes a source file
        # and runs the tool to generate the output
        ctx.actions.run(
            inputs = [i],
            outputs = [o],
            progress_message = "Generating scribble %s" % i.short_path,
            executable = ctx.executable._scribblegen,
            arguments = [i.path, o.path],
        )
    return [DefaultInfo(files = depset(outputs))]

scribble_gen_ts = rule(
    # the rule definition tells Bazel about its inputs and outputs
    implementation = _scribble_gen_ts,
    attrs = {
        # in this case we take several "labels" which point at
        # files or otherwise generated data
        "srcs": attr.label_list(mandatory = True, allow_files = True),
        # and we use a converter tool, for which there's a default
        # value so you don't need to spell it out in every rule call
        "_scribblegen": attr.label(
            default = Label("//tools:scribblegen"),
            executable = True,
            cfg = "exec",
        ),
    },
)

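Using the rule is then a one-liner per package, and the generated .ts files can be fed straight into a regular ts_project. A sketch, assuming the rule lives in //tools:scribble.bzl and the dependency names are placeholders:

load("//tools:scribble.bzl", "scribble_gen_ts")
load("@npm//@bazel/typescript:index.bzl", "ts_project")

scribble_gen_ts(
    name = "scribbles_gen",
    srcs = glob(["*.metanotes.jsx"]),
)

# typecheck and compile the generated sources like any other TS code
ts_project(
    name = "scribbles",
    srcs = [":scribbles_gen"],
    deps = ["@npm_frontend//:node_modules"],
)
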
It might seem slow to call the same tool over dozens of input files, but because Bazel runs everything it can in parallel, the scribble generator runs faster than the webpack loader processed the same files! And in the end you get output that's an order of magnitude more introspectable, as you can open any generated file and check its contents.

By converting scribbles into .ts files I can then run them through tsc and get actual typings; it's amazing, and something I couldn't even imagine achieving with a webpack loader.

Splitting the app into packages

The rest of the application was a mixture of .ts and .tsx code, the actual React part of it. ts_project coped with building it really well, revealing some misused dependencies in the process. I could split the frontend into several such projects: the parsers, the redux store, the helper code and the juicy business logic. All these components can now be built in isolation, making tests faster and easier. No longer do you need to wait on rebuilding your whole CRA world to get just the parser part. No longer do you need to split the code along node package boundaries: your source code can live in a logical hierarchy intermixed with BUILD files.

One issue I faced was importing files across those boundaries. The suggested approach is the traditional ../../../../let/me/find/that/file syntax, but you can also use WORKSPACENAME/src/i/am/here for "absolute imports" (make sure you specify link_workspace_root for any rule depending on the latter). Alternatively, your JS rules can have a module_name attribute that is roughly equal to the npm package name for that rule's generated code. It allows you to import the code via short names, e.g. @metanotes/filter, but I never managed to make VSCode work with those reliably. They also don't work with ts_project rules anyway (although you can wrap your ts_project in a js_library and make the module name work that way, as sketched below).
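
A sketch of that wrapper for a hypothetical filter package; note that newer rules_nodejs versions rename module_name to package_name:

load("@npm//@bazel/typescript:index.bzl", "ts_project")
load("@build_bazel_rules_nodejs//:index.bzl", "js_library")

ts_project(
    name = "filter_ts",
    srcs = glob(["*.ts"]),
)

js_library(
    name = "filter",
    # the short import name; newer rules_nodejs calls this package_name
    module_name = "@metanotes/filter",
    deps = [":filter_ts"],
)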

I never even wanted CRA!

Bazel comes with rollup by default, and rollup did a decent job of packing my app. Remember that at this stage Bazel offers you several dozen individual .js files, and it's up to you to turn those into a coherent application. Unlike with webpack loaders, it's up to you to convert your images, css, et al. into something your JS bundler can handle, but the result is significantly more obvious and easier to debug. Ever wished to see the output of one specific webpack rule run on one specific file? With Bazel you can do that!

Rollup was easy to understand once I realised I need plugins for everything. After that, it crunched through my sources and spat out a 45MB unuglified development bundle (yeah, metanotes' runtime is huge). Terser got that down to 7MB, comparable to my CRA webpack runs.
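
Both stages are ordinary Bazel rules from the @bazel/rollup and @bazel/terser npm packages. A rough sketch of such a bundling setup, where the entry point, the config file and the deps are assumptions:

load("@npm//@bazel/rollup:index.bzl", "rollup_bundle")
load("@npm//@bazel/terser:index.bzl", "terser_minified")

rollup_bundle(
    name = "bundle",
    entry_point = "index.js",
    # the plugins (node-resolve, commonjs, etc.) live in the config
    config_file = "rollup.config.js",
    deps = [":frontend_lib"],
)

# the production bundle, minified by terser
terser_minified(
    name = "bundle.min",
    src = ":bundle",
)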

Now, the bad stuff

The development story isn't exactly clear. Bazel's concat devserver doesn't work easily with ts_project, and the efforts to run webpack in dev mode are still underway. For me it's not much of a bother, because I need full page reloads anyway, so I can opt for a simpler http server for my code. The compilation time is blazing fast (and tsc can be set up as a daemon, making it even faster), but the bundle generation is pretty slow. I couldn't figure out how to propagate the .ts sourcemaps to either rollup or webpack.

Conclusion

Still, I'm staying with Bazel, because for all the downsides my builds became significantly less fragile: I don't get obscure errors because of an oddly cached file, and my CI always builds exactly what I expect. I'm in awe of how well-organised the source code is now, and I no longer have to hunt for the right package.json file. I understand my dependencies better, and I can finally work on reducing that 45MB to something smaller. And I see a reasonable way forward for reintroducing ReactNative.

I'd like to thank everyone on the Bazel Slack for helping me out, and especially Alex Eagle for bearing with my questions and going through my issue reports on rules_nodejs.