# Edge Infra

Building a globally distributed edge infrastructure for software package
delivery requires combining multi-tiered caching, edge computation, and
resilient networking. Together, these let developers worldwide pull packages,
binaries, and containers with low latency and high availability.[^1][^2]

The design and implementation strategy below describes a global edge network
tailored for package delivery.

### Edge Architecture Overview

The system uses a tiered architecture to move data as close to the end user as
possible while protecting the origin from traffic spikes.[^3][^1]

```text
                    +-------------------+
                    |    Developer      |
                    | (npm/docker pull) |
                    +---------+---------+
                              |
                              v
                    +-------------------+
                    | Anycast / Geo-DNS |
                    +---------+---------+
                              | (Routes to nearest PoP)
                              v
+-----------------------------------------------------------+
|                       EDGE PoP                            |
|                                                           |
|  +-------------+    +---------------+   +--------------+  |
|  | Edge Compute|--->| L1 Cache      |   | WAF / DDoS   |  |
|  | (Auth/Route)|    | (Memory/NVMe) |   | Protection   |  |
|  +-------------+    +-------+-------+   +--------------+  |
+-----------------------------|-----------------------------+
                              | (Cache Miss)
                              v
+-----------------------------------------------------------+
|                  REGIONAL SHIELD CACHE                    |
|  +-----------------------------------------------------+  |
|  |   L2 Cache (High Capacity SSD, Request Collapsing)  |  |
|  +--------------------------+--------------------------+  |
+-----------------------------|-----------------------------+
                              | (Cache Miss)
                              v
+-----------------------------------------------------------+
|                   ORIGIN INFRASTRUCTURE                   |
|  +--------------------+       +------------------------+  |
|  | Blob Storage       |       | Global Metadata DB     |  |
|  | (S3 / GCS)         |       | (Spanner / DynamoDB)   |  |
|  +--------------------+       +------------------------+  |
+-----------------------------------------------------------+
```

### The Full Stack

To operate at this scale, the technology stack must be highly concurrent and
lightweight:

- **Edge Routing & Proxy:** NGINX, Envoy, or Rust-based proxies that terminate
  TLS and handle millions of concurrent TCP connections.[^4]
- **Edge Compute:** WebAssembly (Wasm) or V8 isolates running directly on the
  CDN edge to execute custom logic such as authentication, A/B testing, and
  request filtering without a round trip to the origin.[^3][^1]
- **Caching Layer:** Varnish or custom memory-mapped file caches (memory and
  NVMe) for L1 edge caching, backed by high-capacity SSDs for L2 regional
  shields.
- **Data & Origin:** Geographically replicated object storage (such as AWS S3)
  for immutable package blobs, and a globally distributed database (such as
  Google Cloud Spanner) for mutable package metadata and user entitlements.

### Edge Caching and CDN Optimizations

Software packages often see "thundering herd" traffic, such as when a popular
CI/CD pipeline kicks off thousands of identical container pulls at once.

- **Tiered Cache Hierarchy:** An L1 edge cache backed by an L2 regional shield
  reduces origin calls and lowers global latency.[^3]
- **Request Collapsing:** If 10,000 clients request the same uncached package
  simultaneously, the cache node collapses them into a single upstream
  request, preventing origin overload.
- **Predictive Caching:** By analyzing package dependency manifests (e.g.,
  `package.json`), edge servers can pre-cache required dependencies before the
  client explicitly requests them.[^1]
- **Cache Invalidation:** The `stale-while-revalidate` `Cache-Control`
  directive lets the CDN serve slightly outdated metadata (such as the
  manifest for a Docker `latest` tag) while it fetches the updated version
  asynchronously in the background.
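
The request-collapsing behaviour described in the list above can be sketched as
a small single-flight wrapper. This is a minimal, single-process illustration
under stated assumptions, not how any particular CDN implements it:
`RequestCollapser` and the `fetch` callable are hypothetical names, and a real
cache node would combine this with the cache itself and add timeouts.

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict


class RequestCollapser:
    """Collapse concurrent fetches for the same key into one upstream call."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                      # hypothetical upstream fetcher
        self._lock = threading.Lock()
        self._in_flight: Dict[str, Future] = {}  # key -> pending upstream result

    def get(self, key: str) -> bytes:
        # Decide under the lock whether this caller leads or follows.
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future

        if not leader:
            # Followers block until the leader publishes a result or an error.
            return future.result()

        try:
            future.set_result(self._fetch(key))
        except Exception as exc:
            # Propagate the failure to every waiting follower as well.
            future.set_exception(exc)
        finally:
            with self._lock:
                self._in_flight.pop(key, None)
        return future.result()


if __name__ == "__main__":
    collapser = RequestCollapser(lambda key: b"blob:" + key.encode())
    print(collapser.get("pkg-1.0.tgz"))
```

The first caller for a key becomes the leader and performs the fetch; callers
that arrive while it is in flight wait on the same `Future`, so a burst of
concurrent misses costs one upstream request. Once the fetch completes, the key
is forgotten, so storing the result is left to the surrounding cache.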

### Distributed Networking Solutions

Routing users efficiently is critical for minimizing latency and maintaining
availability during regional outages.

- **Anycast IP Routing:** Advertising the same IP address from multiple global
  locations lets the Border Gateway Protocol (BGP) route each user's
  connection to the topologically closest datacenter.
- **Dynamic Server Selection:** The system monitors round-trip time (RTT) and
  CPU usage and steers traffic away from congested or degraded Points of
  Presence (PoPs).[^5][^1]
- **Protocol Optimizations:** TCP BBR congestion control and HTTP/3 (QUIC)
  reduce connection setup time and mitigate the impact of packet loss on
  unstable mobile networks.
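
As a rough illustration of dynamic server selection, the sketch below scores
each PoP from its RTT and CPU load and picks the lowest score. The scoring
formula, the `cpu_weight_ms` and `max_cpu` parameters, and the PoP names are
assumptions made for the example; the source does not specify how the two
signals are combined.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PopHealth:
    name: str
    rtt_ms: float         # measured round-trip time to the client
    cpu_util: float       # 0.0 (idle) to 1.0 (saturated)
    healthy: bool = True  # result of an out-of-band health check


def score(pop: PopHealth, cpu_weight_ms: float = 100.0) -> float:
    """Lower is better: RTT plus a penalty that grows with CPU load."""
    return pop.rtt_ms + cpu_weight_ms * pop.cpu_util


def select_pop(pops: Iterable[PopHealth], max_cpu: float = 0.9) -> Optional[PopHealth]:
    """Return the best healthy PoP below the CPU ceiling, or None if none qualify."""
    candidates = [p for p in pops if p.healthy and p.cpu_util < max_cpu]
    return min(candidates, key=score, default=None)


if __name__ == "__main__":
    pops = [
        PopHealth("fra", rtt_ms=12.0, cpu_util=0.95),  # closest, but overloaded
        PopHealth("ams", rtt_ms=18.0, cpu_util=0.40),  # score 58.0
        PopHealth("lhr", rtt_ms=25.0, cpu_util=0.20),  # score 45.0
    ]
    best = select_pop(pops)
    print(best.name if best else "no PoP available")  # lhr
```

Here the nearest PoP is skipped because it exceeds the CPU ceiling, and the
slightly farther but less loaded PoP wins on combined score.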

### System Data Flows

When a user pulls a package, the request follows a fixed path that enforces
authorization and then serves from the nearest cache that holds the package:

1. **Resolution:** The client's DNS query hits a Geo-DNS provider, which
   returns the anycast IP that routes to the nearest Edge PoP.
2. **Edge Auth:** The request reaches the Edge Proxy. An Edge Function runs
   immediately, verifying the user's API token against an edge-cached subset
   of the metadata database.[^1]
3. **Cache Lookup:** The proxy checks the L1 cache. On a hit, the package is
   returned directly from the PoP.
4. **Shield Fallback:** On an L1 miss, the request goes to the Regional Shield.
   If the package is in the L2 cache, it is returned and populated in L1.
5. **Origin Fetch:** On an L2 miss, the shield fetches the blob from Origin
   Storage, caches it, and streams it back down the chain to the client.
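
Steps 2 through 5 above can be sketched as a single lookup function. This is
an in-memory toy under stated assumptions: `TieredCache`, `origin_fetch`, and
the token set are hypothetical stand-ins, there is no eviction or streaming,
and populating L1 on an origin fetch is assumed by analogy with step 4.

```python
from typing import Callable, Dict, Set, Tuple


class TieredCache:
    """In-memory stand-in for the edge (L1) -> shield (L2) -> origin chain."""

    def __init__(self, origin_fetch: Callable[[str], bytes], valid_tokens: Set[str]) -> None:
        self.l1: Dict[str, bytes] = {}
        self.l2: Dict[str, bytes] = {}
        self._origin_fetch = origin_fetch   # hypothetical origin storage client
        self._valid_tokens = valid_tokens   # stands in for edge-cached entitlements

    def pull(self, token: str, package: str) -> Tuple[bytes, str]:
        """Return the blob and the tier that served it."""
        # Step 2: edge auth runs before any cache is consulted.
        if token not in self._valid_tokens:
            raise PermissionError("invalid API token")

        # Step 3: L1 lookup at the edge PoP.
        if package in self.l1:
            return self.l1[package], "L1"

        # Step 4: shield fallback; a hit is copied into L1 on the way back.
        if package in self.l2:
            blob = self.l2[package]
            self.l1[package] = blob
            return blob, "L2"

        # Step 5: origin fetch; the shield caches the blob, then the edge does.
        blob = self._origin_fetch(package)
        self.l2[package] = blob
        self.l1[package] = blob
        return blob, "origin"


if __name__ == "__main__":
    cache = TieredCache(lambda name: b"blob:" + name.encode(), {"token-abc"})
    print(cache.pull("token-abc", "left-pad-1.3.0.tgz")[1])  # origin
    print(cache.pull("token-abc", "left-pad-1.3.0.tgz")[1])  # L1
```

The first pull travels the whole chain and warms both tiers; the repeat pull is
served from L1 without touching the shield or the origin.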

### Performance Impact Chart

The tiered approach lowers response latency at every step closer to the
client:

```text
Average Response Latency (ms) by Retrieval Tier
------------------------------------------------------------
Origin Fetch       |################################ (250ms)
Regional Shield L2 |########### (85ms)
Edge PoP L1        |### (20ms)
Predictive Cache   |# (5ms)
------------------------------------------------------------
```
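
To see how the per-tier figures combine into an overall average, the sketch
below weights the chart's latencies by cache hit ratios. The hit ratios are
hypothetical inputs chosen for illustration, not measurements from the source,
and the predictive-cache tier is left out for simplicity.

```python
# Per-tier latencies taken from the chart above (milliseconds).
TIER_LATENCY_MS = {"L1": 20.0, "L2": 85.0, "origin": 250.0}


def expected_latency_ms(l1_hit: float, l2_hit: float) -> float:
    """Weighted average latency.

    l1_hit: fraction of all requests served from L1.
    l2_hit: fraction of L1 misses that the L2 shield serves.
    """
    if not (0.0 <= l1_hit <= 1.0 and 0.0 <= l2_hit <= 1.0):
        raise ValueError("hit ratios must be between 0 and 1")
    l1_miss = 1.0 - l1_hit
    return (
        l1_hit * TIER_LATENCY_MS["L1"]
        + l1_miss * l2_hit * TIER_LATENCY_MS["L2"]
        + l1_miss * (1.0 - l2_hit) * TIER_LATENCY_MS["origin"]
    )


if __name__ == "__main__":
    # Hypothetical ratios: 90% L1 hits, and 80% of the remainder hit L2.
    print(f"{expected_latency_ms(0.90, 0.80):.1f} ms")  # 29.8 ms
```

With those assumed ratios, only 2% of requests reach the origin, so the average
stays close to the L1 figure even though an origin fetch is more than ten times
slower.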

[^1]: https://notionhive.com/blog/edge-computing-cdn-strategies

[^2]: https://talents.studysmarter.co.uk/companies/cloudsmith-ltd/belfast/senior-software-engineer-edge-29145650/

[^3]: https://www.daydreamsoft.com/blog/edge-caching-and-cdn-optimization-delivering-lightning-fast-web-experiences

[^4]: https://builtin.com/job/senior-software-engineer-tech-platform/6578449

[^5]: https://arxiv.org/html/2412.09474v1

[^6]: https://www.geeksforgeeks.org/system-design/edge-caching-system-design/

[^7]: https://ijrai.org/index.php/ijrai/article/view/180

[^8]: https://networks.imdea.org/trade-offs-in-optimizing-the-cache-deployments-of-cdns/

[^9]: https://careers.deliveroo.co.uk/role/senior-platform-engineer-edge-15f9904608d9/

[^10]: https://www.sciencedirect.com/science/article/abs/pii/S0140366404002889

[^11]: https://www.meegle.com/en_us/topics/content-delivery-network/cdn-caching-mechanisms

[^12]: https://builtin.com/job/senior-software-engineer-edge-infrastructure/7113935

[^13]: https://www.youtube.com/watch?v=zLblLu3rUC4

[^14]: https://www.builtinla.com/job/sr-staff-software-engineer-edge-cdn-platform/7455324

[^15]: https://www.dynadot.com/blog/global-cdn-strategies