MOLD(1) General Commands Manual MOLD(1)
mold - a modern linker
mold [option...] file...
mold is a faster drop-in replacement for the default GNU ld(1).
See https://github.com/rui314/mold#how-to-use.
mold is designed to be a drop-in replacement for the GNU linkers for linking user-land programs. If your user-land program cannot be built because mold lacks a command-line option it needs, please file a bug at https://github.com/rui314/mold/issues.
mold supports a very limited set of linker script features, just enough to read /usr/lib/x86_64-linux-gnu/libc.so on Linux systems. (Despite its name, that file on Linux is not a shared library but an ASCII linker script that loads the real libc.so file.)
Beyond that, we have no plans to support any additional linker script features. The linker script is an ad-hoc, over-designed, complex language that we believe should be replaced by a simpler mechanism, and we plan to add such a replacement to mold instead.
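You can check whether a file such as libc.so is a real ELF shared library or a text linker script by reading its first four bytes. Every ELF file starts with the magic number 0x7f 'E' 'L' 'F'. The C sketch below is only an illustration: `is_elf_file` is a hypothetical helper and is not part of mold.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the file at `path` begins with the
   ELF magic number (0x7f 'E' 'L' 'F'), 0 if it does not, and -1 if the
   file cannot be opened. */
int is_elf_file(const char *path) {
    static const unsigned char elf_magic[4] = {0x7f, 'E', 'L', 'F'};
    unsigned char buf[4];
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    /* A text linker script starts with ordinary characters instead. */
    return n == sizeof buf && memcmp(buf, elf_magic, sizeof buf) == 0;
}
```

On a typical x86-64 glibc system, `is_elf_file("/usr/lib/x86_64-linux-gnu/libc.so")` is expected to return 0 because that file is a linker script. The exact path may differ between distributions.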
Traditionally, Unix linkers are sensitive to the order in which input files appear on the command line. They process input files one by one, from the first (leftmost) file to the last (rightmost) file. While reading input files, they maintain sets of defined and undefined symbols. When visiting an archive file (.a file), they pull out object files to resolve as many undefined symbols as possible and then move on to the next input file. Object files that were not pulled out never get a second look.
Because of this behavior, you usually have to place archive files at the end of a command line, so that when the linker reaches them, it knows which symbols remain undefined.
If you put archive files at the beginning of a command line, the linker has no undefined symbols yet when it reads them, so no object files are pulled out of the archives. You can work around this with the --start-group and --end-group options, which make the linker search the enclosed archives repeatedly until no new undefined references are created, though they make linking slower.
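The single-pass behavior can be modeled with a small C sketch. Everything in it is hypothetical: `struct object`, `struct input`, and `link_single_pass` are illustrations, not mold or GNU ld code. Real linkers also look up archive members through the archive's symbol index rather than scanning every member. Within one archive, the model keeps pulling members until nothing new is resolved. It never returns to an archive it has already passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 32
#define MAX_MEMBERS 8
#define MAX_LIST 4

/* Hypothetical toy object file: NULL-terminated lists (at most
   MAX_LIST - 1 entries) of defined and referenced symbols. */
struct object {
    const char *defs[MAX_LIST];
    const char *refs[MAX_LIST];
};

/* Hypothetical command-line input: one object file (n_members == 1)
   or an archive holding several member objects. */
struct input {
    bool is_archive;
    const struct object *members;
    int n_members;
};

struct symtab {
    const char *defined[MAX_SYMS];
    int n_defined;
    const char *undefined[MAX_SYMS];
    int n_undefined;
};

static bool contains(const char *const *list, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

/* Records an object's definitions and references in the symbol table. */
static void load_object(struct symtab *st, const struct object *obj) {
    for (int i = 0; i < MAX_LIST && obj->defs[i]; i++) {
        assert(st->n_defined < MAX_SYMS);
        st->defined[st->n_defined++] = obj->defs[i];
        /* A newly defined symbol is no longer undefined. */
        for (int j = 0; j < st->n_undefined;) {
            if (strcmp(st->undefined[j], obj->defs[i]) == 0)
                st->undefined[j] = st->undefined[--st->n_undefined];
            else
                j++;
        }
    }
    for (int i = 0; i < MAX_LIST && obj->refs[i]; i++) {
        const char *s = obj->refs[i];
        if (!contains(st->defined, st->n_defined, s) &&
            !contains(st->undefined, st->n_undefined, s)) {
            assert(st->n_undefined < MAX_SYMS);
            st->undefined[st->n_undefined++] = s;
        }
    }
}

/* Traditional model: inputs are visited once, left to right.
   Returns the number of symbols still undefined at the end. */
int link_single_pass(const struct input *inputs, int n) {
    struct symtab st = {0};
    for (int i = 0; i < n; i++) {
        const struct input *in = &inputs[i];
        if (!in->is_archive) {
            load_object(&st, &in->members[0]);
            continue;
        }
        assert(in->n_members <= MAX_MEMBERS);
        bool loaded[MAX_MEMBERS] = {false};
        /* Pull members while one defines a currently undefined symbol;
           this archive is never revisited afterwards. */
        bool progress = true;
        while (progress) {
            progress = false;
            for (int m = 0; m < in->n_members; m++) {
                const struct object *obj = &in->members[m];
                if (loaded[m])
                    continue;
                for (int k = 0; k < MAX_LIST && obj->defs[k]; k++) {
                    if (contains(st.undefined, st.n_undefined, obj->defs[k])) {
                        load_object(&st, obj);
                        loaded[m] = true;
                        progress = true;
                        break;
                    }
                }
            }
        }
    }
    return st.n_undefined;
}
```

Consider a hypothetical main.o that references foo, a libA.a that defines foo and references bar, and a libB.a that defines bar. The order `main.o libA.a libB.a` leaves nothing undefined. The order `main.o libB.a libA.a` leaves bar undefined, because libB.a was visited before anything needed bar.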
mold, as well as the LLVM lld(1) linker, takes a different approach. Instead of forgetting which symbols an archive file can provide once they have processed it, they remember them. Therefore, mold and lld(1) can "go back" on the command line to pull out object files from archives when those files are needed to resolve remaining undefined symbols. As a result, they are not sensitive to the input file order.
--start-group and --end-group are still accepted by mold and lld(1) for compatibility with traditional linkers, but they are silently ignored.
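The order-independent result can be sketched as a fixpoint loop: keep revisiting every archive until no more members are pulled out. This is a toy model, not how mold or lld(1) are actually implemented. It only reproduces the observable behavior described above. `struct object`, `struct input`, and `link_order_independent` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 32
#define MAX_INPUTS 8
#define MAX_MEMBERS 8
#define MAX_LIST 4

/* Hypothetical toy object file: NULL-terminated symbol lists. */
struct object {
    const char *defs[MAX_LIST];
    const char *refs[MAX_LIST];
};

/* Hypothetical input: an object file (n_members == 1) or an archive. */
struct input {
    bool is_archive;
    const struct object *members;
    int n_members;
};

struct symtab {
    const char *defined[MAX_SYMS];
    int n_defined;
    const char *undefined[MAX_SYMS];
    int n_undefined;
};

static bool contains(const char *const *list, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

/* Records an object's definitions and references in the symbol table. */
static void load_object(struct symtab *st, const struct object *obj) {
    for (int i = 0; i < MAX_LIST && obj->defs[i]; i++) {
        assert(st->n_defined < MAX_SYMS);
        st->defined[st->n_defined++] = obj->defs[i];
        for (int j = 0; j < st->n_undefined;) {
            if (strcmp(st->undefined[j], obj->defs[i]) == 0)
                st->undefined[j] = st->undefined[--st->n_undefined];
            else
                j++;
        }
    }
    for (int i = 0; i < MAX_LIST && obj->refs[i]; i++) {
        const char *s = obj->refs[i];
        if (!contains(st->defined, st->n_defined, s) &&
            !contains(st->undefined, st->n_undefined, s)) {
            assert(st->n_undefined < MAX_SYMS);
            st->undefined[st->n_undefined++] = s;
        }
    }
}

/* Toy order-independent link: all plain objects are loaded, then every
   archive is rescanned until a full pass pulls out nothing new.
   Returns the number of symbols still undefined. */
int link_order_independent(const struct input *inputs, int n) {
    struct symtab st = {0};
    bool loaded[MAX_INPUTS][MAX_MEMBERS] = {{false}};
    assert(n <= MAX_INPUTS);
    for (int i = 0; i < n; i++)
        if (!inputs[i].is_archive)
            load_object(&st, &inputs[i].members[0]);
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < n; i++) {
            if (!inputs[i].is_archive)
                continue;
            assert(inputs[i].n_members <= MAX_MEMBERS);
            for (int m = 0; m < inputs[i].n_members; m++) {
                const struct object *obj = &inputs[i].members[m];
                if (loaded[i][m])
                    continue;
                for (int k = 0; k < MAX_LIST && obj->defs[k]; k++) {
                    if (contains(st.undefined, st.n_undefined, obj->defs[k])) {
                        load_object(&st, obj);
                        loaded[i][m] = true;
                        progress = true;
                        break;
                    }
                }
            }
        }
    }
    return st.n_undefined;
}
```

In this model, the hypothetical orders `main.o libB.a libA.a` and `libA.a libB.a main.o` both resolve every symbol. bar can still be found in libB.a after a member of libA.a introduces the reference. The loop always terminates, because each pass either pulls out a new member or ends the search.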
Some Unix linker features are difficult to understand without comprehending the semantics of dynamic symbol resolution. Therefore, even though it's not specific to mold, we'll explain it here.
We use "ELF module" or just "module" as a collective term to refer to an executable or a shared library file in the ELF format.
An ELF module may have lists of imported symbols and exported symbols, as well as a list of shared library names from which imported symbols should be imported. The point is that imported symbols are not bound to any specific shared library until runtime.
Here is how the Unix dynamic linker resolves dynamic symbols. When an ELF program starts, the dynamic linker constructs a list of ELF modules that together make up the complete program. The executable file is always at the beginning of the list, followed by its dependent shared libraries. The dynamic linker searches for an imported symbol from the beginning of the list to the end. If two or more modules define the same symbol, the one that appears first in the list takes precedence.
These Unix semantics differ from those of systems such as Windows, which have a two-level namespace for dynamic symbols. On Windows, for example, a dynamic symbol is represented as a tuple of (symbol-name, shared-library-name), so that each dynamic symbol is guaranteed to be resolved from a specific library.
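The two lookup rules can be contrasted with a toy model in C. `struct module`, `resolve_flat`, and `resolve_two_level` are hypothetical illustrations. They are not the algorithms used by ld.so(8) or the Windows loader, which handle many more details (symbol versioning, for example).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_EXPORTS 8

/* Hypothetical toy module: a file name and a NULL-terminated list of
   exported symbols (at most MAX_EXPORTS - 1 entries). */
struct module {
    const char *name;
    const char *exports[MAX_EXPORTS];
};

static int exports_symbol(const struct module *m, const char *sym) {
    for (size_t j = 0; j < MAX_EXPORTS && m->exports[j]; j++)
        if (strcmp(m->exports[j], sym) == 0)
            return 1;
    return 0;
}

/* Unix-style flat lookup: walk the search list (executable first) and
   return the first module that exports `sym`, or NULL if none does. */
const struct module *resolve_flat(const struct module *list, size_t n,
                                  const char *sym) {
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (exports_symbol(&list[i], sym))
            return &list[i];
    return NULL;
}

/* Two-level lookup: the import names both the symbol and the library,
   so only that one library is consulted. */
const struct module *resolve_two_level(const struct module *list, size_t n,
                                       const char *sym, const char *library) {
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i].name, library) == 0)
            return exports_symbol(&list[i], sym) ? &list[i] : NULL;
    return NULL;
}
```

Suppose two hypothetical libraries, libfoo.so and libbar.so, both export `compress`, and libfoo.so appears first. The flat lookup always returns libfoo.so. The two-level lookup returns libbar.so whenever the import explicitly names libbar.so.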
Typically, an ELF module that exports a symbol also imports the same symbol. Such a symbol usually resolves to the module's own definition, but not if a module that appears earlier in the symbol search list provides another definition of the same symbol.
Let's take malloc as an example. Assume that you define your own version of malloc in your main executable file. Then, all malloc calls from any module are resolved to your function instead of the one in libc, because the executable is always at the beginning of the dynamic symbol search list. Note that even malloc calls within libc are resolved to your definition, since libc both exports and imports malloc. Therefore, by defining malloc yourself, you can override a library function, and the malloc in libc becomes dead code.
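The malloc scenario can be modeled with a toy function that binds an import by walking the search list. It deliberately ignores which module is asking. This is a simplified sketch, not ld.so(8) code. `struct module`, `bind_import`, and the export lists are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_EXPORTS 8

/* Hypothetical toy module: a name and a NULL-terminated export list. */
struct module {
    const char *name;
    const char *exports[MAX_EXPORTS];
};

/* Returns the name of the module that an import of `sym` made by
   `importer` binds to, or NULL. The importer is ignored on purpose:
   under Unix semantics only the order of the search list matters,
   so libc's own import of malloc is treated like anyone else's. */
const char *bind_import(const struct module *list, size_t n,
                        const char *importer, const char *sym) {
    (void)importer;
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < MAX_EXPORTS && list[i].exports[j]; j++)
            if (strcmp(list[i].exports[j], sym) == 0)
                return list[i].name;
    return NULL;
}
```

Suppose the executable exports only `main`. Then libc's own import of malloc binds back to libc.so.6, so the symbol resolves to itself. If the executable also exports `malloc`, the same import binds to the executable instead, while imports of other symbols such as `free` still bind to libc.so.6.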
These Unix semantics are tricky and sometimes considered harmful. For example, assume that you accidentally define a global function named atoi in your executable that behaves completely differently from the one in the C standard. Then, all atoi calls from any module (even calls within libc) are redirected to your function instead of the one in libc, which will very likely cause problems.
That is a surprising consequence of an accidental name conflict. On the other hand, these semantics are sometimes useful, because they allow users to override library functions without rebuilding the modules that contain them.
Whether good or bad, you should keep these semantics in mind to understand the behavior of Unix linkers.
mold's output is deterministic. That is, if you pass the same object files and the same command-line options to the same version of mold, it is guaranteed to produce bit-for-bit identical output. The linker's internal randomness, such as the timing of thread scheduling or the iteration order of hash tables, does not affect the output.
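You can check the determinism guarantee by linking the same inputs twice and comparing the two outputs byte for byte, which is what cmp(1) does. A minimal C equivalent follows. `files_identical` is a hypothetical helper and is not part of mold.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if two files have byte-for-byte
   identical contents, 0 if they differ (including in length), and -1
   if either cannot be opened. A read error is treated like end of file
   in this sketch. */
int files_identical(const char *path_a, const char *path_b) {
    assert(path_a != NULL && path_b != NULL);
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;
    if (a != NULL && b != NULL) {
        int ca, cb;
        do {
            ca = getc(a);
            cb = getc(b);
        } while (ca == cb && ca != EOF);
        result = (ca == cb); /* true only if both hit EOF together */
    }
    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

For example, suppose you link the same object files with the same options twice, into two hypothetical outputs `a.out.1` and `a.out.2`. Then `files_identical("a.out.1", "a.out.2")` is expected to return 1.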
mold does not have any host-specific default settings. This differs from the GNU linkers, in which some configurable values, such as system-dependent library search paths, are hard-coded. mold depends only on its command-line arguments.
gold(1), ld(1), elf(5), ld.so(8)
Rui Ueyama ruiu@cs.stanford.edu
Report bugs to https://github.com/rui314/mold/issues.
November 2023