# Dynamic Libraries

A **dynamic library** is a compiled binary that is loaded into a program's
address space at either load time or runtime, rather than being copied into the
final executable by the linker. On Linux these files typically have a `.so`
extension (shared object), on Windows `.dll` (dynamic-link library), and on
macOS `.dylib` (dynamic library).

Dynamic libraries enable **code sharing** between multiple processes: the
operating system maps a single physical copy of the library's read-only pages
(code and constant data) into the virtual address space of every process that
needs it, while writable data is copied per process on first write. This
reduces total memory footprint and allows a library to be updated independently
of the executables that depend on it.

## Dynamic vs. Static Libraries

| Feature | Static Library | Dynamic Library |
| :------ | :------------- | :-------------- |
| Linking phase | Build (link) time | Load time or runtime |
| Binary size | Larger executable (library copied in) | Smaller executable (reference only) |
| Memory sharing | Each process gets its own copy | OS shares one physical copy of the code |
| Updates | Relink executable to update library | Replace library file, restart process |
| Portability | Self-contained executable | Requires compatible library present at runtime |

A static library (`.a` on Unix, `.lib` on Windows) is essentially an archive of
object files. The linker extracts the needed object files and copies them into
the final executable. Once linked, the static library is no longer needed to run
the program.

A dynamic library remains a separate file on disk. The executable contains a
**reference** to the library (typically the library's recorded name and a list
of the symbols it needs), and the operating system's dynamic loader resolves
that reference when the process starts (load-time linking) or when the program
explicitly requests it (runtime linking).

## How the OS Loads Dynamic Libraries

On Linux (ELF), when a program starts the kernel reads the executable's
**interpreter** entry to find the dynamic loader (`ld.so`), and the loader then
reads the **dynamic section** to determine which shared libraries are required.
`dyld` on macOS and the Windows loader follow the same broad steps using their
own file formats. The loader performs the following steps:

1. **Dependency resolution**: read the list of needed libraries (`DT_NEEDED`
   entries on ELF) from the executable's headers.
2. **Library search**: locate each library. With glibc the order is `rpath`
   (only when no `runpath` is present), `LD_LIBRARY_PATH`, `runpath`, the
   system cache `/etc/ld.so.cache`, and finally the default system directories.
3. **Loading and mapping**: `mmap` the library's segments into the process's
   address space.
4. **Symbol resolution and relocation**: walk the relocation tables and patch
   addresses so that references in the executable point to the correct
   locations in the loaded library.
5. **Initialization**: run constructor functions (e.g., `__attribute__((constructor))`
   in C) registered in the library.

This process is known as **dynamic linking**. If a required library cannot be
found, the loader reports an error and the program fails to start.
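
The search step can be sketched as a small pure function. This is a simplified model of the glibc order only, not the loader's real implementation: the `SearchConfig` type and `candidate_paths` function are hypothetical names, and the `/etc/ld.so.cache` lookup is left out because it is an index rather than a directory.

```rust
/// Inputs that influence where a glibc-style loader looks for a library.
/// Illustrative only: a real loader reads these from the ELF dynamic
/// section and the process environment.
pub struct SearchConfig<'a> {
    pub rpath: &'a [&'a str],           // DT_RPATH entries
    pub ld_library_path: &'a [&'a str], // LD_LIBRARY_PATH entries
    pub runpath: &'a [&'a str],         // DT_RUNPATH entries
    pub default_dirs: &'a [&'a str],    // e.g. /lib and /usr/lib
}

/// Returns the file paths that would be tried, in order.
pub fn candidate_paths(cfg: &SearchConfig, lib_name: &str) -> Vec<String> {
    let mut dirs: Vec<&str> = Vec::new();

    // DT_RPATH is consulted only when the object has no DT_RUNPATH.
    if cfg.runpath.is_empty() {
        dirs.extend_from_slice(cfg.rpath);
    }
    dirs.extend_from_slice(cfg.ld_library_path);
    dirs.extend_from_slice(cfg.runpath);
    // The /etc/ld.so.cache lookup would happen here (skipped in this sketch).
    dirs.extend_from_slice(cfg.default_dirs);

    dirs.iter()
        .map(|dir| format!("{}/{}", dir.trim_end_matches('/'), lib_name))
        .collect()
}
```

The first candidate that exists and is a compatible library wins, which is why a stray `LD_LIBRARY_PATH` entry can shadow a system library.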

## Runtime Loading with `dlopen` and Friends

Programs can also load libraries explicitly after they have already started.
This is **runtime dynamic linking** and is the mechanism behind plugin systems,
extensible applications, and language interpreters that load native extensions.

POSIX specifies four key functions for this, declared in `<dlfcn.h>`:

- `dlopen(path, flags)`: load a shared object into the current address space
- `dlsym(handle, symbol)`: retrieve the address of a named symbol (function or
  variable)
- `dlclose(handle)`: decrement the reference count and possibly unload the library
- `dlerror()`: return a human-readable string describing the most recent error,
  or `NULL` if there was none

On Windows the analogous APIs are `LoadLibraryA`, `GetProcAddress`, and
`FreeLibrary`.
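
To show how the four calls fit together, here is a minimal sketch that declares them by hand in Rust and looks up `strlen` in the already-loaded program image (passing a null path to `dlopen`). It assumes a Unix target where Rust's standard library already links the library that provides these functions, and the value of `RTLD_NOW` is the Linux and macOS one. The helper names `last_dl_error` and `strlen_via_dlsym` are made up for the example.

```rust
use std::ffi::{c_char, c_int, c_void, CStr, CString};

// `unsafe extern` needs Rust 1.82+; older toolchains use plain `extern "C"`.
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
    fn dlerror() -> *mut c_char;
}

// Assumption: 2 is the value on Linux and macOS; check <dlfcn.h> elsewhere.
const RTLD_NOW: c_int = 2;

type StrlenFn = unsafe extern "C" fn(*const c_char) -> usize;

/// Copies the pending `dlerror` message into an owned String.
fn last_dl_error() -> String {
    unsafe {
        let msg = dlerror();
        if msg.is_null() {
            "unknown dynamic loader error".to_string()
        } else {
            CStr::from_ptr(msg).to_string_lossy().into_owned()
        }
    }
}

/// Resolves the C library's `strlen` at runtime and calls it on `text`.
pub fn strlen_via_dlsym(text: &str) -> Result<usize, String> {
    let text = CString::new(text).map_err(|e| e.to_string())?;
    let name = CString::new("strlen").map_err(|e| e.to_string())?;
    unsafe {
        // A null path asks for a handle to the main program and its dependencies.
        let handle = dlopen(std::ptr::null(), RTLD_NOW);
        if handle.is_null() {
            return Err(last_dl_error());
        }
        let addr = dlsym(handle, name.as_ptr());
        if addr.is_null() {
            let err = last_dl_error();
            dlclose(handle);
            return Err(err);
        }
        // The caller is responsible for the signature being correct.
        let strlen: StrlenFn = std::mem::transmute(addr);
        let len = strlen(text.as_ptr());
        dlclose(handle);
        Ok(len)
    }
}
```

Nothing checks that the address returned by `dlsym` really has the `StrlenFn` signature; that responsibility stays with the caller.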

### Example: A Minimal Dynamic C Library

Imagine a small C library that computes a checksum. Save this as `checksum.c`:

```c
#include <stddef.h> /* size_t */
#include <stdint.h> /* uint8_t, uint32_t */

/* Rotate-and-add checksum over `len` bytes starting at `data`. */
uint32_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (sum << 1) | (sum >> 31); // rotate left by one bit
        sum += data[i];
    }
    return sum;
}
```
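
When the library is later called through FFI, it helps to have an independent value to compare against. The following is a pure-Rust port of the same rotate-and-add loop, offered as a cross-check rather than as part of the original library; the name `checksum_reference` is made up for this sketch.

```rust
/// Pure-Rust port of the C `checksum` function: for each byte, rotate the
/// running sum left by one bit, then add the byte with wrapping arithmetic
/// (matching unsigned overflow in C).
pub fn checksum_reference(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |sum, &byte| sum.rotate_left(1).wrapping_add(u32::from(byte)))
}
```

For example, `checksum_reference(&[1, 2, 3])` evaluates to 11: the running sum goes 1, then 2·1 + 2 = 4, then 2·4 + 3 = 11.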

Compile it into a shared object on Linux:

```bash
gcc -shared -fPIC -o libchecksum.so checksum.c
```

The `-fPIC` flag tells the compiler to emit **position-independent code**:
machine code that executes correctly regardless of where in memory it is
mapped. This is effectively required for shared libraries on most modern
platforms because the OS may load them at different base addresses in different
processes, or in different runs of the same program, for security reasons such
as ASLR (Address Space Layout Randomization).

## Loading a Dynamic C Library from Rust

Rust can interact with dynamic C libraries through two mechanisms:

1. **Compile-time dynamic linking**: declare the functions in an `extern "C"`
   block annotated with `#[link(name = "checksum")]` and let the linker record
   a dependency on `libchecksum.so`. The OS loader resolves it automatically
   when the program starts.
2. **Runtime dynamic loading**: use a crate such as `libloading` to `dlopen`
   the library manually and look up symbols on demand.

Runtime loading is more flexible because the program can decide at execution time
whether to load a library, handle failures gracefully, and even swap
implementations without restarting.
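
The first mechanism looks like this in practice. Because `libchecksum.so` may not be on the linker's search path, this sketch links the system math library (`libm`) instead and calls its `cbrt` function; with the checksum library the attribute would read `#[link(name = "checksum")]` and the block would declare `checksum`. The wrapper name `cube_root` is made up for the example.

```rust
use std::ffi::c_double;

// The linker records a dependency on libm; the OS loader resolves `cbrt`
// before `main` runs. `unsafe extern` needs Rust 1.82+; older toolchains
// use plain `extern "C"`.
#[link(name = "m")]
unsafe extern "C" {
    fn cbrt(x: c_double) -> c_double;
}

/// Safe wrapper around the C library's cube root function.
pub fn cube_root(x: f64) -> f64 {
    // SAFETY: `cbrt` takes and returns a plain double and has no
    // pointer arguments or other preconditions.
    unsafe { cbrt(x) }
}
```

If the library is missing at startup, the program never reaches `main`, which is the main practical difference from the runtime approach.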

### Using `libloading`

Add `libloading` to your `Cargo.toml`:

```toml
[dependencies]
libloading = "0.8"
```

Then load the library and call its function:

```rust
use libloading::{Library, Symbol};

// Must match the C prototype:
// uint32_t checksum(const uint8_t *data, size_t len);
type ChecksumFn = unsafe extern "C" fn(*const u8, usize) -> u32;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    unsafe {
        // Loading a library runs its initializers, so this call is unsafe.
        let lib = Library::new("./libchecksum.so")?;

        // `Symbol` borrows `lib`, so it cannot outlive the loaded library.
        let checksum: Symbol<ChecksumFn> = lib.get(b"checksum\0")?;

        let data = b"hello dynamic world";
        let result = checksum(data.as_ptr(), data.len());

        println!("Checksum: {}", result);
    }

    Ok(())
}
```

Key observations about this code:

- The `unsafe` block is required because the compiler cannot verify the
  correctness of a C library's ABI, pointer usage, or thread safety.
- The symbol name `b"checksum\0"` is a NUL-terminated byte string, which is the
  form `dlsym` expects.
- `Library::new` calls `dlopen` (or `LoadLibrary` on Windows) under the hood.
- `Symbol` wraps the address returned by the symbol lookup and borrows from the
  `Library`, so it cannot outlive the loaded library. A struct holding several
  such function pointers plays a role similar to a
  [**vtable**](/compiler/vtable.md).

### Safety and ABI Mismatches

When calling into a dynamic C library, Rust's usual memory-safety guarantees do
not apply across the FFI boundary. The C library operates on raw pointers and
expects a specific **application binary interface (ABI)**. If the Rust side
misdeclares a function signature (for example, declaring `len` as `u32` when
the C side takes `size_t`, or omitting `extern "C"`), the behavior is undefined
and may result in a [segfault](/memory_safety/segfault.md) or silent data
corruption.

Best practices for dynamic C library interop:

- Define a thin, audited FFI module that mirrors the C headers exactly.
- Use `std::ffi` (or the older `std::os::raw`) or the `libc` crate for C types
  (`c_int`, `c_char`, etc.).
- Keep `unsafe` blocks as small as possible; validate inputs before crossing the
  boundary.
- Never pass Rust references (`&T`) directly to C code expecting mutable access
  unless you have proven alias safety manually.
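
One way to apply the first and third points is to confine the raw function pointer to a small wrapper with a single `unsafe` constructor and a safe method. The sketch below is an illustration under assumptions, not a prescribed design: `Checksum` and `stand_in_checksum` are hypothetical names, and the stand-in function only exists so the example runs without `libchecksum.so`.

```rust
/// Signature the C header is assumed to declare:
/// `uint32_t checksum(const uint8_t *data, size_t len);`
pub type ChecksumFn = unsafe extern "C" fn(*const u8, usize) -> u32;

/// Thin wrapper that keeps the raw pointer private.
pub struct Checksum {
    raw: ChecksumFn,
}

impl Checksum {
    /// # Safety
    /// `raw` must have exactly the `ChecksumFn` signature, read at most
    /// `len` bytes from `data`, and stay valid while this wrapper is used.
    pub unsafe fn from_raw(raw: ChecksumFn) -> Self {
        Checksum { raw }
    }

    /// Safe entry point: pointer and length always come from one live slice.
    pub fn compute(&self, data: &[u8]) -> u32 {
        // SAFETY: upheld by the contract of `from_raw` plus the slice above.
        unsafe { (self.raw)(data.as_ptr(), data.len()) }
    }
}

/// Hypothetical stand-in for the symbol a loader would return.
pub extern "C" fn stand_in_checksum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: callers in this sketch pass a pointer/length pair from a slice.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes
        .iter()
        .fold(0u32, |sum, &b| sum.rotate_left(1).wrapping_add(u32::from(b)))
}
```

With a real library, the value passed to `from_raw` would be the function pointer obtained from the symbol lookup, and that one call site is where the signature has to be audited against the C header.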

## Relocation and the Global Offset Table

Because a shared library can be loaded at any address, its code cannot contain
absolute addresses for functions or global data. Instead, the compiler
emits **position-independent code** that refers to a **Global Offset Table
(GOT)** and a **Procedure Linkage Table (PLT)**.

- **GOT**: a table of addresses filled in by the dynamic loader. Code reads
  global data indirectly through the GOT so that only the table entries need to
  be patched at load time, not every instruction that references the data.
- **PLT**: a set of trampolines for external function calls. The first time a
  function is called through the PLT, the dynamic linker resolves the real
  address and patches the corresponding GOT entry; subsequent calls jump
  directly to the resolved target.

This lazy resolution (often called **lazy binding**) improves startup time for
large programs with many shared libraries because not every symbol needs to be
resolved immediately. It can be turned off, for example by setting `LD_BIND_NOW`
or linking with `-z now`, in which case every symbol is resolved at load time.
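
The resolve-once behaviour can be imitated with a toy model in ordinary Rust. This is only an analogy for the control flow: real PLT stubs are machine code and the GOT is a flat array, whereas this sketch uses hash maps, and every name in it (`LazyBinder`, `Func`, `double`) is invented for illustration.

```rust
use std::collections::HashMap;

pub type Func = fn(u32) -> u32;

/// Toy model of lazy binding: `got` starts empty and is patched on first use.
pub struct LazyBinder {
    library: HashMap<&'static str, Func>, // symbols the "library" exports
    got: HashMap<&'static str, Func>,     // entries resolved so far
    pub resolutions: usize,               // how often the slow path ran
}

impl LazyBinder {
    pub fn new(symbols: &[(&'static str, Func)]) -> Self {
        LazyBinder {
            library: symbols.iter().copied().collect(),
            got: HashMap::new(),
            resolutions: 0,
        }
    }

    /// Models a call through the PLT. Returns `None` for unknown symbols.
    pub fn call(&mut self, name: &'static str, arg: u32) -> Option<u32> {
        // Fast path: the entry was patched by an earlier call.
        if let Some(target) = self.got.get(name) {
            return Some(target(arg));
        }
        // Slow path: resolve the symbol, patch the table, then jump.
        let target = *self.library.get(name)?;
        self.got.insert(name, target);
        self.resolutions += 1;
        Some(target(arg))
    }
}

/// Example exported function for the model.
pub fn double(x: u32) -> u32 {
    x * 2
}
```

Calling the same symbol twice leaves `resolutions` at 1, which mirrors why only the first call through a PLT entry pays the lookup cost.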

## Rust as a Dynamic Library Producer

Rust can also produce dynamic libraries for consumption by other languages. Two
crate types are relevant:

- **`cdylib`**: produces a C-compatible dynamic library (`.so`, `.dll`, `.dylib`).
  Use this when a C program (or Python via ctypes, or another language) needs to
  load your Rust code dynamically. The Rust compiler strips Rust-specific metadata
  and exports only symbols marked `#[no_mangle]`, typically `pub extern "C"`
  functions.
- **`dylib`**: produces a Rust-native dynamic library. Rust has no stable ABI,
  so the library and its consumers must be built with the same compiler
  version; it is rarely appropriate for general FFI.

Example `Cargo.toml` for a C-callable Rust library:

```toml
[package]
name = "rust_checksum"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]
```

And the corresponding Rust source:

```rust
/// C-callable checksum. The caller must pass either a null pointer or a
/// pointer to at least `len` readable bytes.
#[no_mangle]
pub extern "C" fn rust_checksum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: non-null was checked above; the length is the caller's promise.
    let slice = unsafe { std::slice::from_raw_parts(data, len) };
    slice.iter().fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}
```

Other languages can now `dlopen` the resulting library (`librust_checksum.so`
on Linux) and call `rust_checksum` just as if it were a C function.

## When to Use Dynamic Libraries

Prefer dynamic libraries when:

- Multiple executables share the same code and you want to reduce disk and
  memory usage.
- You need a plugin architecture where third-party code is loaded at runtime.
- You want to update a library (e.g., a security patch) without rebuilding every
  dependent executable.
- You are shipping a large framework and want consumers to link against a stable
  ABI without caring about your internal implementation details.

Prefer static linking when:

- You need a self-contained binary that runs without external dependencies
  (e.g., containers, embedded systems, or CLI tools distributed to unknown
  environments).
- Maximum startup performance is critical and you want to avoid symbol
  resolution overhead.
- You want whole-program optimization (LTO) to inline across library boundaries.

## Summary

Dynamic libraries are separate binaries loaded by the OS loader either at
process startup or on demand via `dlopen` / `LoadLibrary`. They reduce memory
usage through sharing and enable runtime extensibility, but they introduce
complexities around symbol resolution, ABI stability, and distribution.

When loading a dynamic C library from Rust, you cross an **FFI boundary** where
the compiler's usual safety checks no longer apply. Tools like `libloading`
make runtime loading ergonomic, but correctness depends on matching the C ABI
exactly and carefully auditing every `unsafe` call site.