# Dynamic Libraries

A **dynamic library** is a compiled binary that is loaded into a program's
address space at either load time or runtime, rather than being copied into the
final executable by the linker. On Linux these files typically have a `.so`
extension (shared object), on Windows `.dll` (dynamic-link library), and on
macOS `.dylib` (dynamic library).

Dynamic libraries enable **code sharing** between multiple processes: the
operating system maps a single physical copy of the library's read-only code
pages into the virtual address space of every process that needs it. This
reduces total memory footprint and allows a library to be updated independently
of the executables that depend on it.

## Dynamic vs. Static Libraries

| Feature | Static Library | Dynamic Library |
| :------ | :------------- | :-------------- |
| Linking phase | Compile / link time | Load time or runtime |
| Binary size | Larger executable (library copied in) | Smaller executable (reference only) |
| Memory sharing | Each process gets its own copy | OS shares one physical copy |
| Updates | Relink executable to update library | Replace library file, restart process |
| Portability | Self-contained executable | Requires compatible library present at runtime |

A static library (`.a` on Unix, `.lib` on Windows) is essentially an archive of
object files. The linker extracts the needed object files and copies them into
the final executable. Once linked, the static library is no longer needed to run
the program.

A dynamic library remains a separate file on disk. The executable contains a
**reference** to the library, typically a recorded name and a symbol table of
needed functions, and the operating system's dynamic loader resolves that
reference when the process starts (load-time linking) or when the program
explicitly requests it (runtime linking).

## How the OS Loads Dynamic Libraries

When a program starts, the operating system hands control to its dynamic loader
(e.g., `ld.so` on Linux, `dyld` on macOS, or the Windows loader). On Linux, the
kernel finds the loader through the executable's **interpreter** entry, and the
loader then reads the **dynamic section** to determine which shared libraries
are required. It performs the following steps:

1. **Dependency resolution**: read the list of needed libraries from the
   executable headers.
2. **Library search**: locate each library on the search path (`LD_LIBRARY_PATH`,
   system cache `/etc/ld.so.cache`, `rpath`, `runpath`, or default system
   directories).
3. **Loading and mapping**: `mmap` the library into the process's address space.
4. **Symbol resolution**: walk the relocation tables and patch addresses so
   that function calls in the executable point to the correct addresses in the
   loaded library.
5. **Initialization**: run constructor functions (e.g., `__attribute__((constructor))`
   in C) registered in the library.

This process is known as **dynamic linking**. If a required library cannot be
found, the loader aborts and the program fails to start.

## Runtime Loading with `dlopen` and Friends

Programs can also load libraries explicitly after they have already started.
This is **runtime dynamic linking** and is the mechanism behind plugin systems,
extensible applications, and language interpreters that load native extensions.

On POSIX systems, `<dlfcn.h>` declares four key functions:

- `dlopen(path, flags)`: load a shared object into the current address space
- `dlsym(handle, symbol)`: retrieve the address of a named symbol (function or
  variable)
- `dlclose(handle)`: decrement the reference count and possibly unload the library
- `dlerror()`: return a human-readable string describing the last error

On Windows the analogous APIs are `LoadLibraryA`, `GetProcAddress`, and
`FreeLibrary`.
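
To make the four POSIX calls concrete, here is a minimal sketch that declares them by hand in Rust and uses them to call `cos` from the system math library. The library name `libm.so.6`, the helper names `cos_via_dlopen` and `last_error`, and the hard-coded `RTLD_NOW` value are assumptions for illustration (they hold on typical glibc-based Linux systems); real code would normally take these declarations from the `libc` crate or use a wrapper crate.

```rust
use std::ffi::{c_char, c_int, c_void, CStr, CString};

// Hand-written declarations of the <dlfcn.h> API.
// `unsafe extern` needs Rust 1.82+; older toolchains use plain `extern "C"`.
unsafe extern "C" {
    fn dlopen(path: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
    fn dlerror() -> *mut c_char;
}

// Assumption: RTLD_NOW is 2 on Linux and macOS; check <dlfcn.h> elsewhere.
const RTLD_NOW: c_int = 2;

type CosFn = unsafe extern "C" fn(f64) -> f64;

/// Copies the loader's last error message into an owned String.
fn last_error() -> String {
    unsafe {
        let msg = dlerror();
        if msg.is_null() {
            "unknown dynamic loading error".to_string()
        } else {
            CStr::from_ptr(msg).to_string_lossy().into_owned()
        }
    }
}

/// Loads `lib_name`, looks up `cos`, calls it with `x`, then releases the library.
fn cos_via_dlopen(lib_name: &str, x: f64) -> Result<f64, String> {
    let path = CString::new(lib_name).map_err(|e| e.to_string())?;
    let symbol = CString::new("cos").map_err(|e| e.to_string())?;
    unsafe {
        let handle = dlopen(path.as_ptr(), RTLD_NOW);
        if handle.is_null() {
            return Err(last_error());
        }
        let addr = dlsym(handle, symbol.as_ptr());
        if addr.is_null() {
            let err = last_error();
            dlclose(handle);
            return Err(err);
        }
        // The cast is only sound if `cos` really has this C signature.
        let cos = std::mem::transmute::<*mut c_void, CosFn>(addr);
        let y = cos(x);
        dlclose(handle);
        Ok(y)
    }
}

fn main() {
    match cos_via_dlopen("libm.so.6", 0.0) {
        Ok(y) => println!("cos(0) = {y}"),
        Err(e) => eprintln!("load failed: {e}"),
    }
}
```

Note that every failure path calls `dlerror` before `dlclose`, since a later loader call may overwrite the stored message.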

### Example: A Minimal Dynamic C Library

Imagine a small C library that computes a checksum. Save this as `checksum.c`:

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t, uint32_t */

uint32_t checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (sum << 1) | (sum >> 31); // rotate left by one bit
        sum += data[i];                 // unsigned addition wraps on overflow
    }
    return sum;
}
```

Compile it into a shared object on Linux:

```bash
gcc -shared -fPIC -o libchecksum.so checksum.c
```

The `-fPIC` flag tells the compiler to emit **position-independent code**:
machine code that can execute correctly regardless of where in memory it is
mapped. This is effectively required for shared libraries because the OS may
load them at different base addresses in different processes (or in the same
process across restarts) for security reasons such as ASLR (Address Space
Layout Randomization).

## Loading a Dynamic C Library from Rust

Rust can interact with dynamic C libraries through two mechanisms:

1. **Compile-time dynamic linking**: declare the functions in an `extern "C"`
   block annotated with `#[link(name = "checksum")]` and let the linker record
   a dependency on `libchecksum.so`. The OS loader resolves it automatically
   when the program starts.
2. **Runtime dynamic loading**: use a crate such as `libloading` to `dlopen`
   the library manually and look up symbols on demand.

Runtime loading is more flexible because the program can decide at execution time
whether to load a library, handle failures gracefully, and even swap
implementations without restarting.
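
The first mechanism can be sketched as follows. Because `libchecksum.so` may not be installed where this runs, the sketch links against the system math library (`libm`) instead, which is a substitution made purely for illustration; the commented-out lines show what the equivalent declaration for `checksum` would plausibly look like.

```rust
use std::ffi::c_double;

// Stand-in for `#[link(name = "checksum")]`: asks the linker to record a
// dependency on libm. Assumption: libm is available on the build machine.
// `unsafe extern` needs Rust 1.82+; older toolchains use plain `extern "C"`.
#[link(name = "m")]
unsafe extern "C" {
    fn cos(x: c_double) -> c_double;
}

// For the checksum library from this document, the declaration would be:
//
// #[link(name = "checksum")]
// unsafe extern "C" {
//     fn checksum(data: *const u8, len: usize) -> u32;
// }

fn main() {
    // The call is unsafe because the compiler trusts the declared signature.
    let y = unsafe { cos(0.0) };
    println!("cos(0) = {y}");
}
```

With this approach a missing library is a startup failure reported by the OS loader, not an error the program can handle itself.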

### Using `libloading`

Add `libloading` to your `Cargo.toml`:

```toml
[dependencies]
libloading = "0.8"
```

Then load the library and call its function:

```rust
use libloading::{Library, Symbol};

// Must match the C prototype: uint32_t checksum(const uint8_t *, size_t).
type ChecksumFn = unsafe extern "C" fn(*const u8, usize) -> u32;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    unsafe {
        // Loading runs the library's initializers, so it is an unsafe call.
        let lib = Library::new("./libchecksum.so")?;

        // The symbol borrows `lib` and cannot outlive it.
        let checksum: Symbol<ChecksumFn> = lib.get(b"checksum\0")?;

        let data = b"hello dynamic world";
        let result = checksum(data.as_ptr(), data.len());

        println!("Checksum: {}", result);
    }

    Ok(())
}
```

Key observations about this code:

- The `unsafe` block is required because the compiler cannot verify the
  correctness of a C library's ABI, pointer usage, or thread safety.
- The symbol name `b"checksum\0"` is a null-terminated byte string, which is the
  form `dlsym` expects; supplying the terminator yourself spares `libloading`
  from making a terminated copy.
- `Library::new` calls `dlopen` (or `LoadLibrary` on Windows) under the hood.
- `Symbol` is a thin wrapper around the address returned by the symbol lookup.
  Its lifetime is tied to the `Library`, so the function pointer cannot be used
  after the library has been unloaded.

### Safety and ABI Mismatches

When calling into a dynamic C library, Rust's usual memory-safety guarantees do
not apply across the FFI boundary. The C library operates on raw pointers and
expects a specific **application binary interface (ABI)**. If the Rust side
misdeclares a function signature (for example, using `u32` where C expects
`size_t` on a 64-bit target, or omitting `extern "C"`), the behavior is
undefined and may result in a
[segfault](/memory_safety/segfault.md) or silent data corruption.

Best practices for dynamic C library interop:

- Define a thin, audited FFI module that mirrors the C headers exactly.
- Use `std::ffi`, `std::os::raw`, or the `libc` crate for C types (`c_int`,
  `c_char`, etc.).
- Keep `unsafe` blocks as small as possible; validate inputs before crossing the
  boundary.
- Never hand C code a pointer derived from a shared reference (`&T`) if the C
  side will write through it; use `&mut T` or a raw `*mut T` and verify
  aliasing manually.

## Relocation and the Global Offset Table

Because a shared library can be loaded at any address, it cannot contain
absolute addresses for its own functions or global data. Instead, the compiler
emits **position-independent code** that refers to a **Global Offset Table
(GOT)** and a **Procedure Linkage Table (PLT)**.

- **GOT**: an array of pointers to global data. The code reads data indirectly
  through the GOT so that only the table entries need to be patched at load time,
  not every instruction that references the data.
- **PLT**: a trampoline for external function calls. The first time a function
  is called through the PLT, the dynamic linker resolves the real address and
  patches the GOT entry; subsequent calls jump directly to the resolved target.

This lazy resolution (often called **lazy binding**) improves startup time for
large programs with many shared libraries because not every symbol needs to be
resolved immediately. It can be turned off, for example with the `LD_BIND_NOW`
environment variable or the `-z now` linker option, in which case all symbols
are resolved at load time.

## Rust as a Dynamic Library Producer

Rust can also produce dynamic libraries for consumption by other languages. Two
crate types are relevant:

- **`cdylib`**: produces a C-compatible dynamic library (`.so`, `.dll`, `.dylib`).
  Use this when a C program (or Python via ctypes, or another language) needs to
  load your Rust code dynamically. The Rust compiler omits Rust-specific metadata
  and exports only items marked `#[no_mangle]`, typically `pub extern "C"`
  functions.
- **`dylib`**: produces a Rust-native dynamic library intended to be linked by
  other Rust crates. Because Rust has no stable ABI, the library and its
  consumers must be built with the same compiler version, so it is rarely
  appropriate for general FFI.

Example `Cargo.toml` for a C-callable Rust library:

```toml
[package]
name = "rust_checksum"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]
```

And the corresponding Rust source:

```rust
/// Computes a simple multiplicative hash over `len` bytes starting at `data`.
/// Returns 0 for a null pointer.
#[no_mangle]
pub extern "C" fn rust_checksum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller must guarantee that `data` points to `len` readable
    // bytes; Rust cannot check this across the FFI boundary.
    let slice = unsafe { std::slice::from_raw_parts(data, len) };
    slice.iter().fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}
```

Other languages can now `dlopen` the resulting library (`librust_checksum.so`
on Linux) and call `rust_checksum` just as if it were a C function.

## When to Use Dynamic Libraries

Prefer dynamic libraries when:

- Multiple executables share the same code and you want to reduce disk and
  memory usage.
- You need a plugin architecture where third-party code is loaded at runtime.
- You want to update a library (e.g., a security patch) without rebuilding every
  dependent executable.
- You are shipping a large framework and want consumers to link against a stable
  ABI without caring about your internal implementation details.

Prefer static linking when:

- You need a self-contained binary that runs without external dependencies
  (e.g., containers, embedded systems, or CLI tools distributed to unknown
  environments).
- Maximum startup performance is critical and you want to avoid symbol
  resolution overhead.
- You want whole-program optimization (LTO) to inline across library boundaries.

## Summary

Dynamic libraries are separate binaries loaded by the OS loader either at
process startup or on demand via `dlopen` / `LoadLibrary`. They reduce memory
usage through sharing and enable runtime extensibility, but they introduce
complexities around symbol resolution, ABI stability, and distribution.

When loading a dynamic C library from Rust, you cross an **FFI boundary** where
the compiler's usual safety checks no longer apply. Tools like `libloading`
make runtime loading ergonomic, but correctness depends on matching the C ABI
exactly and carefully auditing every `unsafe` call site.
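
One way to keep that audit surface small is to confine the `unsafe` call to a single wrapper that takes a slice, as in the sketch below. It reuses the `ChecksumFn` signature from the `libloading` example; the names `checksum_safe` and `stand_in_checksum` are hypothetical, and the stand-in is a Rust reimplementation of `checksum.c` that takes the place of a symbol obtained from the loaded library.

```rust
type ChecksumFn = unsafe extern "C" fn(*const u8, usize) -> u32;

/// Safe wrapper: the slice guarantees a valid pointer and a matching length,
/// so this is the only place that needs an `unsafe` audit.
fn checksum_safe(f: ChecksumFn, data: &[u8]) -> u32 {
    // SAFETY: `data` is a live slice, so the pointer is readable for
    // `data.len()` bytes. We still trust that `f` has the declared signature.
    unsafe { f(data.as_ptr(), data.len()) }
}

/// Hypothetical stand-in with the same signature and algorithm as checksum.c;
/// in a real program this pointer would come from the loaded library.
unsafe extern "C" fn stand_in_checksum(data: *const u8, len: usize) -> u32 {
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes
        .iter()
        .fold(0u32, |sum, &b| sum.rotate_left(1).wrapping_add(b as u32))
}

fn main() {
    let result = checksum_safe(stand_in_checksum, b"hello dynamic world");
    println!("Checksum: {result}");
}
```

Callers of `checksum_safe` never touch raw pointers, so a signature mismatch can only originate where the `ChecksumFn` value is created.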