Is NixOS truly reproducible?

Build reproducibility is often assumed to be a de facto feature of functional package managers like Nix. Although the functional package manager model has important assets in the quest for build reproducibility (such as the reproducibility of build environments¹), practitioners know that Nix does not guarantee bitwise reproducibility for all of its builds. In fact, it is easy to write a Nix package that builds an artifact non-deterministically:

let
  pkgs = import <nixpkgs> { };
in
pkgs.runCommand "random" { } ''
  echo $RANDOM > $out
''

Despite this, build reproducibility has historically been used as a marketing argument by the NixOS community, with the catchphrase “Reproducible builds and deployments” appearing as a headline of the nixos.org page until 2023². This situation has occasionally created tensions with members of the reproducible-builds group, who dedicate a lot of time to contributing patches to compilers and downstream projects to make them bitwise reproducible for everyone, and it prompted blog posts such as “NixOS is not reproducible” by Foxboron.

Furthermore, an objective answer to the question “How good is NixOS for bitwise reproducibility?” is difficult to give, as there is no reproducibility monitoring at the scale of the Nix package set (nixpkgs), unlike in other Linux distributions such as Debian. One reason is that nixpkgs is such a big package set (about 100k packages at the time of writing) that systematically testing it for bitwise reproducibility demands huge resources³.

Why is build reproducibility important?

One direct application of reproducible builds is increasing trust in the software supply chain by allowing users to independently verify the trustworthiness of the binaries they download. Indeed, in most typical scenarios, users of Linux distributions do not compile their software on their own machine but rather download a pre-compiled version supplied by their distribution. The problem here is that the user has to trust that the artifacts they acquire have not been tampered with (for example, if the compilation server is compromised).

When a piece of software is reproducible, on the other hand, anyone can compile it locally and verify that the exact same artifacts are obtained, which builds trust in the artifacts distributed by the Linux distribution. It is also possible to delegate this verification to one or several third parties, hence “distributing” the trust one has in a given artifact.

Caption: *Leveraging reproducible-builds to increase trust in distributed artifacts.*
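
To make the verification idea concrete, here is a minimal Python sketch of what "rebuild and compare" could look like, including the delegated variant where several independent rebuilders report their result. The names `sha256_hex` and `is_trusted` and the `quorum` parameter are assumptions made for illustration; they are not part of Nix or of any existing tool, and a real setup would compare hashes of whole store paths rather than raw byte strings.

```python
import hashlib


def sha256_hex(artifact: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(artifact).hexdigest()


def is_trusted(distributed_hash: str, rebuilder_hashes: list[str], quorum: int) -> bool:
    """Accept a distributed artifact if at least `quorum` independent
    rebuilders (hypothetical third parties) obtained the same hash."""
    matches = sum(1 for h in rebuilder_hashes if h == distributed_hash)
    return matches >= quorum


# Hypothetical example: the cache serves one artifact, three parties rebuilt it.
served = sha256_hex(b"hello world\n")
rebuilds = [
    sha256_hex(b"hello world\n"),      # bit-for-bit identical rebuild
    sha256_hex(b"hello world\n"),      # another identical rebuild
    sha256_hex(b"hello world 2023\n"), # a rebuild that embedded something else
]
print(is_trusted(served, rebuilds, quorum=2))
```

Note that a mismatch does not by itself prove tampering: it may simply mean the build is not reproducible, which is why the reproducibility rate of the package set matters for this kind of scheme.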

Science to the rescue!

As part of my PhD, and under the supervision of Théo Zimmermann and Stefano Zacchiroli, I have empirically studied bitwise build reproducibility in nixpkgs over a period of six years. I warmly encourage the reader to have a look at “Does Functional Package Management Enable Reproducible Builds at Scale? Yes.”, our research article – to be published at MSR’25 – which reports all our findings on the matter, but I’ll try to summarize our takeaways in this blog post.

Research methodology

We selected 17 nixpkgs revisions, regularly spaced between 2017 and 2023, and locally rebuilt all of the packages from these revisions. We then compared each output with the ground truth (the historical builds from Hydra, the nixpkgs continuous integration) to determine whether the package is bitwise reproducible or not.

Note that even projects that do monitor build reproducibility like Debian or Arch Linux only do it at a given point in time (when the package gets built) and not backward in time like we did in our experiment!
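
The comparison against the historical builds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three categories are the ones used in our figures (reproducible, rebuildable but unreproducible, non-rebuildable), but the `Status`, `classify` and `tally` names, and the idea of representing a failed rebuild as `None`, are hypothetical and not taken from our actual pipeline.

```python
from collections import Counter
from enum import Enum
from typing import Optional


class Status(Enum):
    REPRODUCIBLE = "reproducible"
    UNREPRODUCIBLE = "rebuildable but unreproducible"
    NON_REBUILDABLE = "non-rebuildable"


def classify(reference_hash: str, rebuilt_hash: Optional[str]) -> Status:
    """Compare a local rebuild with the historical (reference) build.

    `rebuilt_hash` is None when the local rebuild failed.
    """
    if rebuilt_hash is None:
        return Status.NON_REBUILDABLE
    if rebuilt_hash == reference_hash:
        return Status.REPRODUCIBLE
    return Status.UNREPRODUCIBLE


def tally(results: dict[str, tuple[str, Optional[str]]]) -> Counter:
    """Count packages per status; `results` maps a package name to
    (reference hash, rebuilt hash or None)."""
    return Counter(classify(ref, rebuilt) for ref, rebuilt in results.values())


# Hypothetical data for three packages of a single revision.
revision = {
    "pkg-a": ("abc", "abc"),
    "pkg-b": ("abc", "def"),
    "pkg-c": ("abc", None),
}
for status, count in tally(revision).items():
    print(f"{status.value}: {count}")
```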

For every non-reproducible build, we then went further and tried to infer the reasons for non-reproducibility:

- We generated the diffoscope between both builds in order to understand how the artifacts differ;
- When a package becomes reproducible at a later point in time in our dataset, we identified the commit that fixed the reproducibility issue using a bisection of the nixpkgs repository, to see whether the reproducibility fixes give us insights into the reproducibility issues.

Caption: *Description of our build and analysis pipeline.*
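
The bisection step mentioned above boils down to a binary search over the commit history. The sketch below is a simplified illustration, not our actual tooling: `first_reproducible_commit` and the `is_reproducible` callback are hypothetical names, and the code assumes a linear list of commits in which the package is unreproducible at the first commit, reproducible at the last one, and switches exactly once in between.

```python
from typing import Callable, Sequence


def first_reproducible_commit(
    commits: Sequence[str],
    is_reproducible: Callable[[str], bool],
) -> str:
    """Return the first commit at which the package builds reproducibly.

    Assumes commits[0] is unreproducible, commits[-1] is reproducible,
    and there is a single transition between the two.
    """
    if len(commits) < 2:
        raise ValueError("need at least one bad and one good commit")
    low, high = 0, len(commits) - 1  # invariant: low is bad, high is good
    while high - low > 1:
        mid = (low + high) // 2
        if is_reproducible(commits[mid]):
            high = mid
        else:
            low = mid
    return commits[high]


# Hypothetical history of ten commits where the fix landed in "c6".
history = [f"c{i}" for i in range(10)]
fixed_at = first_reproducible_commit(history, lambda c: int(c[1:]) >= 6)
print(fixed_at)
```

In practice each call to the predicate means rebuilding the package at that commit and comparing outputs, so the logarithmic number of steps is what makes this affordable.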

Key findings

Reproducibility rate in nixpkgs

Our most important finding is that the reproducibility rate in nixpkgs has increased steadily from 69% in 2017 to about 91% in April 2023. The high reproducibility rate in our most recent revision is quite impressive, given both the size of the package set and the absence of systematic monitoring in nixpkgs. We knew that it was possible to achieve very good reproducibility rates in smaller package sets like Debian, but this shows that achieving very high bitwise reproducibility is possible at scale, something that was believed impossible by practitioners⁴.

Caption: *Absolute numbers of reproducible, rebuildable (but unreproducible) and non-rebuildable packages over time.*

As the figure shows, we can also spot a reproducibility regression around June 2020, which we traced back to a pip regression that happened at the time and that was tracked by the NixOS community.

Reasons for unreproducibility

By manually analyzing the diffoscopes from our unreproducible packages, we were also able to identify the most prevalent causes of non-reproducibility and devise heuristics to detect them. With these heuristics, we were able to identify at least one non-reproducibility reason for about 20% of the diffoscopes.

Our derived heuristics are:
- embedded dates (the date of the build appearing in the artifacts), accounting for 14.8% of the unreproducible packages (despite being one of the main recommendations from the reproducible-builds group to improve reproducibility!);
- embedded uname outputs (containing impure information about the host running the build), accounting for 1.3% of the unreproducible packages;
- embedded environment variables, accounting for 2.2% of the unreproducible packages;
- embedded build IDs (some ecosystems embed a unique – but not deterministic – build ID into the artifacts), also accounting for 2.2% of the unreproducible packages.

Caption: *Evolution of the number of packages for which we generated diffoscopes that are matched by each of our heuristics, over time.*
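
To give an idea of what such heuristics can look like, here is a small Python sketch that scans the text of a diff for the four categories listed above. The regular expressions are simplified assumptions written for this post, not the heuristics used in the paper, and the `HEURISTICS` table and `match_heuristics` function are hypothetical names; real diffoscope output is richer and would need more careful patterns.

```python
import re

# Illustrative patterns only: each one is a rough approximation.
MONTHS = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
HEURISTICS = {
    "embedded date": re.compile(
        r"\b(?:19|20)\d{2}-\d{2}-\d{2}\b"  # e.g. 2023-04-12
        rf"|\b{MONTHS} +\d{{1,2}} +(?:\d{{2}}:\d{{2}}:\d{{2}} +)?(?:19|20)\d{{2}}\b"
    ),
    "embedded uname output": re.compile(r"\bLinux \S+ \d+\.\d+\.\d+\S* #\d+"),
    "embedded environment variable": re.compile(
        r"\b(?:HOME|USER|PATH|TMPDIR|NIX_BUILD_TOP)="
    ),
    "embedded build ID": re.compile(r"\bbuild[ _-]?id\b", re.IGNORECASE),
}


def match_heuristics(diff_text: str) -> list[str]:
    """Return the names of all heuristics that match the diff text."""
    return [name for name, pattern in HEURISTICS.items() if pattern.search(diff_text)]


# Hypothetical diff excerpt between two builds of the same package.
sample = "-Built on Apr 12 10:33:01 2023\n+Built on Apr 13 08:01:47 2023\n"
print(match_heuristics(sample))
```

A diff can match several heuristics at once, and many match none, which is consistent with only a fraction of the diffoscopes being explained this way.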

The interesting aspect of these causes is that they show that, even if nixpkgs already achieves great reproducibility rates, there is still some low-hanging fruit for improving reproducibility that could be tackled by the Nix community and the whole FOSS ecosystem.

Additional research questions are covered in the paper: more insights on how reproducibility differs across the ecosystems packaged in nixpkgs, as well as our study of the reproducibility fixes, await the curious reader there!

What does this mean for the NixOS community?

This research work shows that a very large proportion of the packages in nixpkgs are already bitwise reproducible. One direct consequence is that, for most packages, build reproducibility could be leveraged to increase trust in the Nix substitution protocol (downloading pre-built packages from caches). This justifies investing resources in distributed cache solutions that rely on build reproducibility, like the one envisioned by the Trustix project, for example.

¹ See https://hal.science/hal-04430009
² See https://github.com/NixOS/nixos-homepage/pull/1077
³ There exists, however, limited monitoring of the reproducibility of the ISO images; see reproducible.nixos.org.
⁴ See https://ieeexplore.ieee.org/document/10179320/
