Binary Asset Compiler Benchmarks

Tue Feb 19, 2019
~1300 Words
Tags: programming

To learn the parts of LLVM related to machine code generation, I decided to follow through on a comment that a binary asset compiler could be written in a few hundred lines of C++. The goal of the program would be to provide fast and portable compilation of binary assets. The term compilation is used loosely here: the tool really just wraps the asset into an object file, which can then be linked into a library or program. This project turned into a bit of code golf, and I implemented a second, separate binary asset compiler. What follows is a comparison of the performance of these tools with other options.

All of the benchmarking was performed on a somewhat underpowered computer. It has 8GB of memory and 32GB of eMMC storage. Admittedly, this is because that was what was available with Linux, but it also means that users with more powerful computers should be able to outperform the numbers below. The binary assets were files of random data, varying in size from 16B to 2GB.

In total, six programs were benchmarked. The first program is cp, which only copies the asset file. This program was included as a reference, as I doubt any of the compilers could be faster. The second program is xxd, which appears to be the standard approach when transpiling assets to C. Third is file2c, which is also a transpiler to C. However, file2c takes a different approach than xxd, and so has somewhat better performance. The intermediate source files from both transpilers were compiled with gcc version 7.3.0. Fourth is incbin, which uses inline assembler provided by some compilers to include binary files. Fifth and sixth are the two variants of the dedicated binary asset compiler.

To begin, look at the plots for the elapsed (wall clock) time used by the processes, and the maximum resident set size of the processes during their lifetimes. For both time and memory, the programs all exhibit similar curves: an initial flat portion, likely linked to loading and code size, transitions into a rising portion once processing the asset dominates. The six programs neatly split into three categories.

Elapsed (wall clock) time by asset size:

Asset size (log2 bytes)  binc-x86_64-linux    binc      cp  file2c  incbin     xxd
                      4              -1.30   -0.57   -2.30   -1.32   -1.09   -1.35
                      6              -1.63   -1.40   -2.78   -1.32   -1.26   -1.31
                      8              -1.44   -1.33   -2.78   -1.35   -1.23   -1.35
                     10              -1.44   -1.52   -2.78   -1.35   -1.26   -1.29
                     12              -1.63   -1.44   -2.48   -1.35   -1.23   -1.12
                     14              -1.70   -1.40   -2.78   -1.32   -1.12   -0.83
                     16              -1.57   -1.25   -2.78   -1.24   -1.13   -0.33
                     18              -1.52   -1.30   -2.78   -0.83   -1.29    0.28
                     20              -1.52   -1.33   -2.30   -0.38   -1.29    0.88
                     22              -1.33   -1.05   -1.66    0.20   -1.18    1.51
                     24              -0.99   -0.73   -1.26    0.73   -0.95    2.22
                     26              -0.41   -0.16   -0.71    1.53   -0.56    2.82
                     28               0.13    0.39   -0.17    2.17    0.20     n/a
                     30               1.16    1.44    1.41     n/a    1.43     n/a
                     31               2.21    2.47    2.27     n/a    2.37     n/a

Maximum resident set size by asset size:

Asset size (log2 bytes)  binc-x86_64-linux    binc      cp  file2c  incbin     xxd
                      4              14.50   14.99   11.31   13.95   13.99   13.95
                      6              14.50   15.00   11.31   13.95   13.99   13.95
                      8              14.51   15.00   11.30   13.95   13.99   13.95
                     10              14.50   15.00   11.31   13.95   13.99   13.95
                     12              14.50   15.00   11.33   13.96   13.99   13.99
                     14              14.50   14.98   11.33   13.96   13.99   14.10
                     16              14.50   15.01   11.30   14.00   13.99   14.48
                     18              14.50   15.00   11.31   14.13   13.99   15.44
                     20              14.50   15.08   11.33   14.91   13.99   17.03
                     22              14.51   15.42   11.30   16.30   13.99   18.91
                     24              14.49   16.30   11.32   18.06   14.65   21.30
                     26              14.50   17.80   11.31   20.00   16.19   22.88
                     28              14.50   19.64   11.33   21.72   18.05     n/a
                     30              14.49   21.60   11.32     n/a   20.01     n/a
                     31              14.50   22.59   11.35     n/a   21.01     n/a
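The post does not say how the elapsed time and maximum resident set size were collected. As one possible approach, and not necessarily the harness used here, the sketch below runs a command as a child process and reads its resource usage when it exits. The function name `measure` is made up for illustration, and the code assumes a POSIX system with Python 3.8 or later.

```python
import os
import sys
import time


def measure(argv):
    """Run argv as a child process; return (elapsed seconds, max RSS, exit code)."""
    start = time.perf_counter()
    # posix_spawnp looks up argv[0] on PATH, much as a shell would.
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # wait4 blocks until this child exits and returns its resource usage.
    _, status, usage = os.wait4(pid, 0)
    elapsed = time.perf_counter() - start
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    return elapsed, usage.ru_maxrss, exit_code


if __name__ == "__main__":
    elapsed, max_rss, code = measure([sys.executable, "-c", "pass"])
    print(f"elapsed={elapsed:.3f}s max_rss={max_rss} exit={code}")
```

Using `os.wait4` rather than `resource.getrusage(RUSAGE_CHILDREN)` keeps each measurement tied to a single child, since the latter reports a running maximum over every child that has been waited for.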

Transpiling

The first category of programs is xxd and file2c. Unlike the other approaches, which appear to be limited by disk IO, these tools are quite a bit slower than a simple file copy. They show higher CPU usage than the other approaches, independent of the asset size. I had not expected compilation speed to be an issue for these tools, but clearly it has an impact. It is also interesting to look at why file2c outperforms xxd. The former uses string literals [1], as opposed to character arrays, which reduces the number of tokens in the source code. For xxd, the upward slope of resident set size against asset size is approximately [2] 120 (i.e. for every additional byte in the asset, an additional 120 bytes of RAM are required). By comparison, the memory multiplier for file2c is approximately 15. This memory multiplier clearly impacts the resident set size of the program, and with it the compilation speed.
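To make the token-count difference concrete, here is a small sketch that emits the same bytes as C source in both styles. It is an illustration only: the function names are invented, and the output is in the spirit of the two approaches rather than a reproduction of what xxd or file2c actually print.

```python
def as_char_array(data, name="asset"):
    """Character-array style: one integer constant per byte, plus commas."""
    body = ", ".join(f"0x{b:02x}" for b in data)
    return f"unsigned char {name}[] = {{ {body} }};\n"


def as_string_literal(data, name="asset"):
    """String-literal style: the whole asset is a single token.

    Every byte is written as a three-digit octal escape, so no '?'
    character reaches the output and trigraphs cannot occur.
    """
    body = "".join(f"\\{b:03o}" for b in data)
    # The literal carries an implicit trailing NUL, so the real length
    # is emitted separately.
    return (
        f'const unsigned char {name}[] = "{body}";\n'
        f"const unsigned long {name}_len = {len(data)};\n"
    )


if __name__ == "__main__":
    sample = bytes([0, 63, 63, 61, 255])  # 63, 63, 61 is "??=" in ASCII
    print(as_char_array(sample), end="")
    print(as_string_literal(sample), end="")
```

In the array form the compiler has to lex roughly two tokens per byte (a constant and a comma), while the string form is one token regardless of the asset size.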

Both compilation speed and memory exhaustion limit the use of this technique for large assets. Neither of these tools could be used to compile the largest assets, with failure occurring as the resident set size reached the physical RAM of the machine. Given the linear relationship between resident set size and asset size, it would take a lot of additional RAM to extend these approaches further. Even if you could find enough RAM, compilation speed becomes increasingly painful.

Compilers

The second category of programs is incbin and binc. These tools load the asset into memory as part of building the object file, but bypass any tokenization or compilation of that data. This is visible in the resident set size measurements, with memory multipliers of 1 and 3, respectively. By this measure, incbin is ideal, loading a single instance of the data into memory. On the other hand, binc makes 3 copies. Reviewing the source code, one of these copies is the memory buffer used to load the data, one is the constant data array, and I am presuming that code generation creates a third copy. If the benchmarks had been extended to even larger assets, both of these tools would soon have failed due to memory exhaustion.
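The multipliers quoted for all four tools can be roughly recovered from the resident set size table. The sketch below does that arithmetic under one loud assumption that the post does not state: that the tabulated resident set size is log2 of kilobytes (1024 bytes), while the asset size is log2 of bytes. It uses the ratio at the largest asset each tool completed, which approaches the slope once the fixed baseline is negligible.

```python
def memory_multiplier(log2_asset_bytes, log2_rss_kib):
    """Bytes of resident memory per byte of asset, for one table row."""
    # Assumption: RSS is tabulated as log2 of kilobytes, hence the +10.
    return 2.0 ** (log2_rss_kib + 10 - log2_asset_bytes)


# (tool, log2 asset size, tabulated max RSS) for the largest asset
# each tool completed in the resident set size table.
rows = [
    ("xxd", 26, 22.88),
    ("file2c", 28, 21.72),
    ("binc", 31, 22.59),
    ("incbin", 31, 21.01),
]

if __name__ == "__main__":
    for tool, size, rss in rows:
        print(f"{tool:8s} {memory_multiplier(size, rss):6.1f}")
```

Under that assumption the results land close to the figures quoted in the text (about 120, 15, 3 and 1), which is some evidence that the unit guess is right, though not proof.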

With respect to speed, neither of these tools really "compiles" the assets, so compile speeds are very good. For small assets, both incbin and binc are slower than a simple copy, since it takes more time to load a larger application, but performance is virtually the same for larger assets.

Avoid loading

The third category of programs is cp and binc-x86_64-linux [3]. These tools differ from the previous tools in that they never load the entire asset into memory. Copies are done using a fixed-size buffer. cp is the venerable file copy utility, while binc-x86_64-linux is still a binary asset compiler. Unlike binc, it is specialized to create ELF files. This utility needs to know the size of the asset, but otherwise does not need to load the asset to compute the layout of the ELF file. This allows it to write the header, copy the asset using a fixed-size buffer, and then write the trailer. The only reason the resident set size is so large is that it still links to LLVM, just for access to command-line parsing [4].
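The header, streamed copy, trailer sequence can be sketched as follows. This is not the binc-x86_64-linux source and it does not produce an ELF file: the `HDR0` and `TRL0` byte strings, the function name `wrap_asset`, and the buffer size are all hypothetical placeholders, chosen only to show that the layout can be fixed from the file size alone.

```python
import os
import struct
import tempfile

BUFFER_SIZE = 64 * 1024  # fixed copy buffer; the value is an arbitrary choice


def wrap_asset(src_path, dst_path):
    """Write header + asset + trailer without holding the asset in memory."""
    # Only the size is needed up front to lay out the output file.
    size = os.stat(src_path).st_size
    # Hypothetical placeholders, NOT real ELF structures.
    header = b"HDR0" + struct.pack("<Q", size)
    trailer = b"TRL0"
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(header)
        remaining = size
        while remaining > 0:
            chunk = src.read(min(BUFFER_SIZE, remaining))
            if not chunk:
                raise IOError("asset shrank while being copied")
            dst.write(chunk)
            remaining -= len(chunk)
        dst.write(trailer)
    return len(header) + size + len(trailer)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "asset.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(200_000))
        print(wrap_asset(src, os.path.join(tmp, "asset.out")))
```

Peak memory use is bounded by `BUFFER_SIZE` plus the small header and trailer, whatever the asset size.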

Again looking at speed, binc-x86_64-linux is not faster than binc. It also gives up a lot of flexibility, such as including debug information or cross-compiling. However, for very large assets, there is no worry of memory exhaustion.

The relationship between asset size and elapsed time is consistent amongst the last four tools. The relationship is not a simple linear relationship, as would be expected if the time was set purely by the hard-drive write speed, but superlinear. Also interesting, the CPU usage for cp is essentially zero until the asset size reaches 4MB, rises to nearly 100% at 256MB, and then drops back to zero at 2GB. One might suspect that there is an interaction with the cache.
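One way to check the superlinear claim against the elapsed time table is to compute the local exponent k in time ~ size^k between neighbouring rows. The sketch below does this for cp, assuming (the post does not say) that the time column is log10 of seconds; an exponent above 1 over an interval means time grew faster than asset size there.

```python
import math

# (log2 asset size in bytes, tabulated elapsed time) for cp, taken from
# the elapsed time table. Assumption: the time column is log10 of seconds.
cp_rows = [(26, -0.71), (28, -0.17), (30, 1.41), (31, 2.27)]


def local_exponents(rows):
    """Exponent k in time ~ size**k between consecutive rows."""
    out = []
    for (n0, t0), (n1, t1) in zip(rows, rows[1:]):
        # Convert the log2 size step to log10 so both axes share a base.
        out.append((t1 - t0) / ((n1 - n0) * math.log10(2)))
    return out


if __name__ == "__main__":
    for k in local_exponents(cp_rows):
        print(f"{k:.2f}")
```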

Summary

Tools like xxd or file2c do not have any dependencies, are cross-platform, and can be used for relatively large binary assets. For those who have run into the limits of xxd, file2c has an open, portable implementation, and can be used successfully with assets up to 256MB, even on a relatively underpowered machine.

For larger assets, or to improve compile times, incbin or binc can be used. Unfortunately, incbin is not available on Windows. Otherwise, binc is available (at least in principle) on any platform where LLVM is supported.

So far, I have not heard of anyone linking assets with sizes >10GB. Should that become necessary, more RAM will extend the usefulness of incbin or binc.


  1. Benchmarking also found a bug in file2c. About 1 in 2^21 sequences of 3 bytes represents a trigraph, so if you are compiling 10MB or more of binary data (assuming random distribution of bytes), you will start seeing warnings or errors about trigraphs.
  2. The memory overhead probably varies significantly between compilers, and it would be interesting to evaluate clang.
  3. Although it is named as if it is a compiler for x86_64 and linux, the object file does not contain any code. Untested, but the output should be usable on any system that uses ELF.
  4. If LLVM were replaced with something smaller, such as getopt, for command-line parsing, binc-x86_64-linux might be able to match the performance of cp, although cp contains a lot of optimization.
