# Span

## Question: What is a span in relation to traces, logs and metrics?

A **span** is the foundational unit of **distributed tracing**: it represents a
single logical operation (e.g., an HTTP request, a database call, or a function
invocation) together with its timing and contextual metadata [7]. To understand
spans and how they relate to **traces**, **logs**, and **metrics**, it helps to
examine their structural, semantic, and operational relationships.

---

### **1. Span as the Atomic Unit of a Trace**

A **trace** is a directed acyclic graph (DAG) of spans that captures the
end-to-end journey of a request across services [1]. Each span:

- Has **start and end timestamps** and therefore a **duration**,
- Contains **attributes** (key-value metadata, e.g., HTTP status, user ID),
- May include **events** (timestamped annotations like “query started”),
- Has a **parent-child relationship** with other spans (e.g., a gateway span may
  have child spans for auth and DB calls) [3, 8].

Example trace structure [8]:

```
Trace
├── Span (API Gateway)
│   ├── Span (Auth Service)
│   └── Span (User Service)
│       └── Span (Database Query)
└── Span (Response Formatting)
```

---

### **2. Relationship to Logs**

- **Logs** are discrete, timestamped records of events (e.g., “error: connection
  timeout”), often unstructured or semi-structured.
- **Spans can embed logs**: when instrumentation libraries (e.g., OpenTelemetry)
  integrate with logging frameworks, log statements can be attached to spans as
  **span events**, or emitted as **log records** stamped with the active trace
  context (trace ID, span ID) [10].
- This enables **correlation**: you can view logs _within the context_ of a
  specific span, e.g., all logs emitted by a database query span during a failed
  request [10].
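
To make the span anatomy from section 1 and the log correlation from section 2
concrete, here is a minimal, dependency-free Python sketch. It is not the
OpenTelemetry API: the `Span` class, `new_trace`, `child`, and `add_log_event`
are hypothetical names chosen for illustration. IDs use the OpenTelemetry sizes
(16-byte trace ID, 8-byte span ID) rendered as hex, and the tree mirrors part of
the example above.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """Illustrative span record (hypothetical; loosely mirrors OpenTelemetry concepts)."""
    name: str
    trace_id: str                  # 32 hex chars, shared by every span in the trace
    span_id: str                   # 16 hex chars, unique to this span
    parent_span_id: Optional[str]  # None for the root span
    start_ns: int = field(default_factory=time.time_ns)
    end_ns: Optional[int] = None
    attributes: dict[str, Any] = field(default_factory=dict)
    events: list[dict[str, Any]] = field(default_factory=list)

    def child(self, name: str) -> Span:
        # A child inherits the trace ID and records this span as its parent.
        return Span(name, self.trace_id, secrets.token_hex(8), self.span_id)

    def add_log_event(self, message: str, **attrs: Any) -> dict[str, Any]:
        # Attach the log to the span as a timestamped event...
        self.events.append({"name": message, "time_ns": time.time_ns(), "attributes": attrs})
        # ...and return a standalone log record stamped with trace context,
        # so a log backend can join it back to this span.
        return {"message": message, "trace_id": self.trace_id, "span_id": self.span_id, **attrs}

    def end(self) -> None:
        self.end_ns = time.time_ns()

    @property
    def duration_ns(self) -> Optional[int]:
        return None if self.end_ns is None else self.end_ns - self.start_ns


def new_trace(root_name: str) -> Span:
    # The root span starts a new trace and has no parent.
    return Span(root_name, secrets.token_hex(16), secrets.token_hex(8), None)


# Rebuild part of the example tree: API Gateway -> User Service -> Database Query
gateway = new_trace("API Gateway")
auth = gateway.child("Auth Service")
user = gateway.child("User Service")
db = user.child("Database Query")
db.attributes["db.system"] = "postgresql"
record = db.add_log_event("error: connection timeout", retry=1)
for span in (db, user, auth, gateway):
    span.end()

print(record["trace_id"] == gateway.trace_id, record["span_id"] == db.span_id)
# prints: True True
```

In practice an SDK creates, times, and exports spans for you; the sketch only
shows that every span shares the trace ID and points at its parent, and that a
log stamped with `trace_id`/`span_id` can be matched to the span that emitted it.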

> “When adding OpenTelemetry instrumentation on top of your existing log
> libraries, the log becomes a dot on a trace span” [10].

---

### **3. Relationship to Metrics**

- **Metrics** are aggregated numerical measurements over time (e.g., request
  rate, latency percentiles, error counts).
- **Spans can feed into metrics indirectly**:
  - Span durations can be aggregated into **latency histograms** (e.g.,
    `http.server.request.duration`).
  - Span attributes (e.g., `http.response.status_code`, called
    `http.status_code` in older semantic conventions) can serve as dimensions
    for **counters** (e.g., `http_requests_total{status="500"}`).
- Spans are _individual_, _context-rich_ records, whereas metrics are
  _aggregated_ and _summarized_; both support the **RED method** (Rate, Errors,
  Duration) [7].

> “Developers can acquire a comprehensive perspective of their software
> environment by combining distributed traces, metrics, events, and logs” [7].

---

### **4. Relationship to Traces (Recap & Nuance)**

- A **trace** is a _collection of spans_ that together represent a single
  request’s path through a distributed system [3].
- Spans in a trace are linked via:
  - **Trace ID** (identifies the full trace),
  - **Span ID** (identifies the span),
  - **Parent Span ID** (enables tree-like nesting) [1, 8].
- Spans may also have **links** to spans in _other traces_ (e.g., for batch
  processing or async workflows) [1].

---

### **5. Practical Implications**

- **Troubleshooting**: a trace gives you a _map_, logs give you _narrative
  detail_, and metrics give you _aggregate trends_. For example:
  - A metric alert (e.g., high error rate) → drill into traces to find failing
    spans → inspect embedded logs for the root cause [14].
- **Context propagation**: span context (trace ID, span ID, and trace flags such
  as the sampling decision) is propagated across service boundaries, typically
  in request headers, enabling distributed correlation [9].
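
The metric derivation in section 3 and the alert-to-trace drill-down in section 5
can be sketched by aggregating finished spans into per-operation RED numbers. This
is a hypothetical in-memory illustration: the `FinishedSpan` record, `red_metrics`,
the nearest-rank percentile, and treating HTTP 5xx as errors (the usual convention
for server spans) are assumptions. Real pipelines typically aggregate into
histograms as spans are exported rather than sorting raw durations.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FinishedSpan:
    # Hypothetical, minimal export format for a completed server span.
    name: str
    trace_id: str
    duration_ms: float
    status_code: int  # HTTP response status recorded as a span attribute


def percentile(sorted_values: list[float], p: float) -> float:
    # Nearest-rank percentile over an already sorted, non-empty list.
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def red_metrics(spans: list[FinishedSpan], window_s: float) -> dict[str, dict]:
    """Aggregate individual spans into per-operation Rate, Errors, Duration."""
    groups: dict[str, list[FinishedSpan]] = defaultdict(list)
    for span in spans:
        groups[span.name].append(span)

    report = {}
    for name, group in groups.items():
        durations = sorted(s.duration_ms for s in group)
        errors = [s for s in group if s.status_code >= 500]
        report[name] = {
            "rate_per_s": len(group) / window_s,         # Rate
            "errors": len(errors),                       # Errors
            "p50_ms": percentile(durations, 50),         # Duration
            "p99_ms": percentile(durations, 99),
            # Keep a few failing trace IDs so an alert can link back to traces.
            "error_trace_ids": [s.trace_id for s in errors][:3],
        }
    return report


spans = [
    FinishedSpan("GET /users", "t1", 12.0, 200),
    FinishedSpan("GET /users", "t2", 15.0, 200),
    FinishedSpan("GET /users", "t3", 480.0, 500),
    FinishedSpan("GET /users", "t4", 20.0, 200),
]
print(red_metrics(spans, window_s=60.0)["GET /users"])
```

Keeping a handful of `error_trace_ids` next to the aggregate (similar in spirit to
exemplars) is what makes the troubleshooting flow possible: the error count fires
the alert, and the retained IDs point at concrete traces whose spans and logs
explain it.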

---

### **Summary**

| Concept     | Role                                       | Relationship to Span                                                |
| ----------- | ------------------------------------------ | ------------------------------------------------------------------- |
| **Span**    | Smallest unit of work in a trace           | (the unit itself)                                                   |
| **Trace**   | Collection of spans forming a request path | Spans are its building blocks [3]                                   |
| **Logs**    | Event records with timestamps              | Logs can be attached to spans as events or structured metadata [10] |
| **Metrics** | Aggregated numerical signals               | Span data (duration, status) can be used to derive metrics [7]      |

In essence, **spans unify the three pillars of observability**: they are the
_contextual glue_ that lets you correlate logs (what happened), metrics (how
often/long), and traces (how it flows) into actionable insights [4, 14].

## References

1. [Traces | OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/)
2. [OpenTelemetry - Understanding Traces vs. Spans | SigNoz](https://signoz.io/comparisons/opentelemetry-trace-vs-span/)
3. [Logs vs Metrics vs Traces - Engineering Fundamentals Playbook](https://microsoft.github.io/code-with-engineering-playbook/observability/log-vs-metric-vs-trace/)
4. [Observability primer | OpenTelemetry](https://opentelemetry.io/docs/concepts/observability-primer/)
5. [Unpacking Observability: Understanding Logs, Events, Spans, and Traces | Dzero Labs](https://medium.com/dzerolabs/observability-journey-understanding-logs-events-traces-and-spans-836524d63172)
6. [OpenTelemetry demystified: a deep dive into distributed tracing | CNCF](https://www.cncf.io/blog/2023/05/03/opentelemetry-demystified-a-deep-dive-into-distributed-tracing/)
7. [What Are Spans in Distributed Tracing? - LogicMonitor](https://www.logicmonitor.com/blog/what-are-spans-in-distributed-tracing)
8. [Traces & Spans: Observability Basics You Should Know - Last9](https://last9.io/blog/traces-spans-observability-basics/)
9. [software-skills/skills/system-design/references/key-concepts ...](https://github.com/itzcull/software-skills/blob/master/skills/system-design/references/key-concepts/distributed-tracing.md)
10. [Tracing the Line: Understanding Logs vs. Traces - Honeycomb](https://www.honeycomb.io/blog/understanding-logs-vs-traces)
11. [A Deep Dive into OpenTelemetry. Part 1 - AWS in Plain English](https://aws.plainenglish.io/opentelemetry-deep-dive-part-1-6ebbd2362bd3)
12. [Deep Dive into OpenTelemetry in Saleor](https://saleor.io/blog/otel-deep-dive)
13. [Logging Observability - OpenClaw AI Agent Skill | LLMBase](https://llmbase.ai/openclaw/logging-observability/)
14. [Learning Observability from Scratch: Logs, Metrics, and Traces | by Milind Nair | Mar, 2026 | Medium](https://medium.com/@nairmilind3/learning-observability-from-scratch-c36d9771003b)
15. [A Deep Dive Into OpenTelemetry Metrics | Tiger Data](https://www.tigerdata.com/blog/a-deep-dive-into-open-telemetry-metrics)
16. [GitHub - tokio-rs/tracing: Application level tracing for Rust.](https://github.com/tokio-rs/tracing)