Cilium:
eBPF 重塑云原生基础设施
Cilium:
eBPF Reshaping Cloud-Native Infrastructure

连接、安全与可观测性的终极数据面。本白皮书融合了叙事洞察、底层架构解构及全球巨头的大规模落地经验。 The ultimate data plane for connectivity, security, and observability. This whitepaper integrates narrative insights, architectural deconstruction, and large-scale deployment experiences.

今日承诺: Today's Promise:

阅读本篇,你将掌握 ClusterMesh 跨云架构,获得量化的 ROI 模型以节省 30% 以上的云账单,并理解支撑 OpenAI 万级 GPU 算力的网络底座。

By reading this, you will master ClusterMesh cross-cloud architecture, gain a quantified ROI model to save 30%+ on cloud bills, and understand the foundation supporting OpenAI's massive GPU clusters.

01. 快递员、纸条与一万个住址

01. Couriers, Slips, and 10,000 Addresses

想象你在经营一家巨大的快递公司(Kubernetes 集群)。最初,你只有 10 个快递员(Pod),你给每个快递员一张纸条,上面写着所有人的家庭住址(IP 路由表)。每增加一个新员工,你就给所有人发一张新纸条。

Imagine running a massive delivery company (K8s Cluster). Initially, you have 10 couriers (Pods). You give each courier a slip of paper with everyone's home address (IP Routing Table). Every time a new employee joins, you issue a new slip to everyone.

当快递员增加到 10,000 名时...

When the Couriers grow to 10,000...

每次新入职一个人,你要做的分发工作量会发生爆炸:

Every time a new hire starts, the workload for distribution explodes:

  • 分发风暴:每次变动都要为 10,000 人重发纸条。在大规模集群中,这会导致系统同步延迟数分钟甚至数小时。 Distribution Storm:Updating 10,000 slips for every change. In large clusters, this stalls synchronization for minutes or even hours.
  • 信息差:新员工已上岗,但其他人拿的还是旧地址簿。这种“信息不同步”是导致网络连接超时的头号元凶。 Information Gap:New couriers start working while others hold outdated books, causing "Information Lag"—the #1 cause of connection timeouts.

在 Linux 内核中,传统的 iptables 就像这种效率低下的“查纸条”行为。

In the Linux kernel, traditional iptables acts exactly like this inefficient "paper-checking" behavior.

2015 年,Thomas Graf 意识到我们需要一种不再依赖“纸条”的技术。他利用 eBPF —— 这种被称为“内核 JavaScript”的黑科技,为快递公司安装了“刷脸识别”系统:数据包不再查看地址簿,而是通过身份(Identity)直接通行。

In 2015, Thomas Graf realized we needed a technology that moved beyond "slips." He leveraged eBPF — the "JavaScript for the Kernel" — to install a "Face ID" system: packets no longer check address books; they pass based on Identity.

💡 历史瞬间:Cilium 团队在瑞士山间的浓雾中意识到,如果看不见前方的路(可观测性),再快的速度也毫无意义。这便有了后来的 Hubble。

💡 Historical Moment: In the thick fog of the Swiss mountains, the Cilium team realized that without visibility (observability), speed is meaningless. Thus, Hubble was born.

技术进化:从地址到身份
Evolution: Address to Identity
📄 传统:基于 IP 查表,O(N) Legacy: IP lookup, O(N)
🆔 Cilium:基于身份识别,O(1) Modern: Identity-based, O(1)

图 1.1: 传统方案在规则膨胀时的性能断崖(横轴:Service/Pod 数量;纵轴:网络编程延迟,秒;曲线:iptables O(N) 与 Cilium eBPF O(1)) Fig 1.1: Performance cliff of legacy solutions (x-axis: number of Services/Pods; y-axis: network programming latency in seconds; curves: iptables O(N) vs. Cilium eBPF O(1))

📜 Cilium 进化史:从内核插件到云网络基石 (2015-2025)

📜 Evolution: From Kernel Plugin to Infrastructure Bedrock

2015 - Origin

Thomas Graf 在 Linux 内核峰会上展示了 eBPF 替代 iptables 的构想,Cilium 项目正式立项。

Thomas Graf introduced the vision of replacing iptables with eBPF; Cilium project was born.

2018 - Cilium 1.0

第一个稳定版发布,引入了身份感知(Identity-aware)安全策略,震惊 CNI 社区。

1.0 released, introducing Identity-aware security policies that redefined CNI standards.

2021 - The Cloud Standard

Google Cloud 与 Azure 相继宣布采用 Cilium 作为托管 K8s (GKE/AKS) 的默认数据面。

Google Cloud and Azure adopted Cilium as the default dataplane for GKE and AKS.

2023 - CNCF Graduation

Cilium 正式从 CNCF 毕业,标志着其在大规模生产环境中的成熟度达到最高等级。

Cilium graduated from CNCF, signaling peak maturity for massive production environments.

2025 - The AI Era

发布十周年。Cilium 成为 OpenAI 等 AI 巨头支撑万级 GPU 集群的核心网络选型。

10th Anniversary. Cilium becomes the backbone for 10k+ GPU clusters at AI giants like OpenAI.

02. 共同认知:云原生哲学与物流底座

02. Common Ground: Cloud Native Philosophy & Foundations

云原生(Cloud Native)本质上是一种利用云计算优势来设计、构建和运行应用程序的现代方法论。它不是简单地把传统应用搬到云上,而是从根本上重新思考软件如何在云环境中高效、可靠地工作。核心哲学可以总结为“弹性、自动化和可移植性”:应用程序应该像云一样,能自动扩展、快速恢复、自愈,并且不受底层硬件或环境的限制。

Cloud Native is a modern methodology designed to leverage cloud advantages. It's a fundamental rethinking of software efficiency and reliability. The core philosophy: Elasticity, Automation, and Portability—allowing apps to scale, recover, and self-heal independently of underlying hardware.

弹性 (Elasticity)

Elasticity

随包裹量(负载)自动增减快递员(Pod)数量,确保资源利用率最优化。

Auto-scale courier (Pod) counts based on package volume (load) for resource optimization.

自动化 (Automation)

Automation

全流程机器决策与生命周期管理,彻底消除人工误操作引入的系统性风险。

Machine-driven lifecycle management, eliminating systemic risks from manual errors.

可移植性 (Portability)

Portability

解耦底层硬件,一套标准架构无论在私有云还是公有云,表现始终如一。

Decoupled from hardware; consistent performance across private and public clouds.

标准化流水线 (The Pipeline)

Standardized Pipeline

云原生应用通过精确、结构化的流程构建,就像搭乐高积木:每个部分模块化,确保了可重复性与可审计性。实际中,借助 Helm 或 Operator,整个过程从代码提交到上线只需几分钟。

Cloud Native apps are built via structured processes, much like Lego: every part is modular, ensuring repeatability and auditability. With Helm or Operators, the entire path from commit to live takes only minutes.

🏗️

构建 (Build)

Build

用 Docker 编写 Dockerfile
构建镜像并推送到仓库

Write Dockerfile
Build & Push Images

📜

配置 (Config)

Config

编写声明式 YAML 文件
定义网络与安全策略

Declarative YAML
Define Network & Security

🚀

部署 (Deploy)

Deploy

CI/CD 自动化管道
测试并部署到环境

CI/CD Pipelines
Test & Deploy to Runtime

⚙️

运行 (Run)

Run

监控系统状态
实现自动扩缩与自愈

Monitor Status
Auto-scale & Self-heal

Kubernetes 与 Pod:分布式指挥系统

Kubernetes & Pods: The Distributed Orchestration System

作为现代容器化工作负载的标准管理平台,Kubernetes 负责维护系统的期望状态。它通过控制平面(Control Plane)进行全局决策,并将任务分发给工作节点(Worker Nodes)。

As the standard platform for containerized workloads, Kubernetes maintains the desired state of the system, making global decisions via the Control Plane and distributing tasks to Worker Nodes.

原子单元:Pod

The Atomic Unit: Pod

Pod 是 K8s 的最小调度单位,它封装了一个或多个容器并共享网络命名空间(NetNS)。这意味着 Pod 内的所有容器共用相同的 IP 和 MAC 地址,通过 localhost 即可高效通信。

A Pod is the smallest unit in K8s, encapsulating containers that share a Network Namespace (NetNS). All containers in a Pod share the same IP/MAC address and communicate via localhost.
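下面是一个极简示例(容器名与镜像均为假设),用来说明同一 Pod 内的容器如何共享网络命名空间: A minimal sketch (container names and images are hypothetical) of how containers in one Pod share the network namespace:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache
spec:
  containers:
  - name: web
    image: nginx:1.27
    ports:
    - containerPort: 80
  - name: cache
    image: redis:7        # 与 web 共享同一 Pod IP,可经 localhost:6379 互访 / shares the Pod IP; reachable at localhost:6379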

易失性特征 (Ephemeral)

Ephemeral Nature

Pod 是临时的。它们随业务需求被频繁创建或销毁。由于每个 Pod 都会获得唯一的集群 IP,但重建后 IP 会发生变化,这种“易失性”对网络寻址提出了巨大挑战。

Pods are temporary and rescheduled based on demand. While each gets a unique IP, this IP changes upon recreation, posing a significant challenge for stable network addressing.

服务 (Service):稳定的访问入口

Service: The Stable Access Portal

为了应对 Pod IP 的不确定性,Kubernetes 引入了 Service 抽象层。它通过标签选择器(Label Selectors)动态关联一组后端 Pod,并对外暴露统一的入口。

To handle Pod IP instability, Kubernetes uses the Service abstraction. It leverages Label Selectors to associate with a set of back-end Pods, providing a unified access point.

负载均衡与虚拟 IP

Load Balancing & VIP

Service 提供长久稳定的虚拟 IP(ClusterIP)。无论后端 Pod 如何漂移,访问者的流量都会被均匀分发到健康的实例上,确保了服务发现的连续性。

Services provide stable Virtual IPs (VIPs). Regardless of Pod rescheduling, traffic is evenly distributed to healthy instances, ensuring continuous service discovery.

暴露类型

Exposure Types

除了内部访问的 ClusterIP,还支持通过 NodePort(节点端口)或 LoadBalancer(云负载均衡器)将服务安全地公开给外部世界。

Beyond internal ClusterIP, services can be exposed via NodePort or LoadBalancer to the external world securely.
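一个极简的 Service 声明示意(名称与端口为假设值),展示标签选择器与稳定虚拟 IP 的关系: A minimal Service sketch (names and ports are hypothetical) showing how a label selector maps to a stable virtual IP:

apiVersion: v1
kind: Service
metadata:
  name: payment-service
spec:
  type: ClusterIP          # 也可改为 NodePort / LoadBalancer 对外暴露 / could be NodePort or LoadBalancer for external exposure
  selector:
    app: payment           # 动态关联所有带该标签的后端 Pod / dynamically selects labeled backend Pods
  ports:
  - port: 80               # ClusterIP 上的虚拟端口 / virtual port on the ClusterIP
    targetPort: 8080       # 后端容器的真实端口 / actual container port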

CNI:容器网络插件规范

CNI: Container Network Interface Specification

Kubernetes 本身不提供网络连通性,而是通过 CNI 规范 赋予插件管理权。插件负责处理网络接口的建立、IP 分配及策略执行。

Kubernetes itself doesn't provide networking; it relies on the CNI specification to allow plugins to manage interface establishment, IP allocation, and policy enforcement.

连通性与 IPAM

Connectivity & IPAM

利用 veth pair 技术建立容器与宿主机的隧道。IPAM 机制充当“DHCP 服务器”,确保每个 Pod 在 PodCIDR 网段中拥有唯一的身份标识。

Utilizing veth pair technology to tunnel containers to hosts. IPAM acts as a "DHCP server," ensuring each Pod has a unique identity within the PodCIDR range.

生命周期指令集

Lifecycle Operations

标准定义了四个核心动作:ADD(配置接口/IP)、DEL(释放资源)、CHECK(配置检查)及 VERSION(规范匹配)。

The spec defines four core actions: ADD (setup interface/IP), DEL (release resources), CHECK (verification), and VERSION (compatibility).
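作为参考,下面是一份按 CNI 规范组织的极简网络配置示意(字段遵循规范,取值仅为示例),节点上的容器运行时据此决定由哪个插件执行上述动作: For reference, a minimal network-configuration sketch following the CNI spec (fields per the spec; values are illustrative); the container runtime on each node uses it to decide which plugin receives the actions above:

// 位于节点 /etc/cni/net.d/ 目录下的示意配置 / illustrative config under /etc/cni/net.d/ on the node
{
  "cniVersion": "0.3.1",
  "name": "cilium",
  "type": "cilium-cni"
}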

📜 Kubernetes 网络模型:帝国的基本法

📜 The Kubernetes Network Model: The Constitution

在引入任何 CNI 插件之前,必须理解 Kubernetes 强制执行的三条核心原则,这确保了 Pod 像虚拟机或物理机一样易于管理:

  • IP 唯一性:每个 Pod 拥有集群内唯一的 IP 地址。
  • IP Uniqueness: Each Pod gets its own unique cluster-wide IP address.
  • 直接通信:Pod 间通信无需经过地址转换 (NAT)。
  • Direct Communication: Pods communicate without the use of proxies or NAT.
  • 节点互通:宿主机上的 Agent (如 Kubelet) 必须能访问该节点的所有 Pod。
  • Node Connectivity: Agents on a node can communicate with all pods on that node.

通信模型:网络任务的本质

Networking Model: The Essence of Tasks

在任何系统中,网络的核心任务只有三点:寻址(我要去哪)、转发(怎么过去)以及规则(我能去吗)。根据通信边界,我们将其分为:

In any system, networking revolves around Addressing (Where?), Forwarding (How?), and Rules (Can I?). Based on boundaries, we categorize traffic as:

⬅️ 东西向流量 (East-West)

⬅️ East-West Traffic

集群内部服务间的对话。这是微服务架构的基石,流量巨大且对延迟极度敏感,通常在扁平的二层或三层网络中直接寻址。

Communication between services within the cluster. This foundation of microservices is high-volume and latency-sensitive, using direct addressing.

⬆️ 南北向流量 (North-South)

⬆️ North-South Traffic

外部用户与内部服务间的进出口通信。由 Ingress 管理入站流量的安全边界,Egress 负责服务访问外部资源的安全审计。

Traffic between external users and internal services. Ingress manages security boundaries, while Egress handles outbound auditing.

CNI 与 Kubernetes 的协作生命周期

CNI & Kubernetes Collaboration Lifecycle

CNI 与 Kubernetes 的协作是一个高度标准化的自动化流程。由于系统本身不提供内置的网络实现,它通过 CNI 规范将网络配置的控制权委派给插件,确保每个 Pod 都能在瞬息万变的集群中获得唯一的身份标识,并构建全互联的通信平面。

The collaboration between CNI and Kubernetes is a highly standardized automated process. Since the system does not provide built-in networking, it delegates configuration control to plugins via the CNI specification, ensuring every Pod receives a unique identity and forms a fully interconnected communication plane.

1. 触发阶段 (Trigger)

1. Trigger Phase

调度与唤醒:当 Pod 被调度至特定节点后,该节点上的 Kubelet 进程会监听到调度事件。Kubelet 并不直接操作容器网络,而是通过 CRI(容器运行时接口)调用 containerd 或 CRI-O 来准备 Pod 的执行环境。

Scheduling & Awakening: Once a Pod is assigned to a node, the local Kubelet detects the event. Instead of handling networking directly, it uses the CRI to invoke containerd or CRI-O to prepare the execution environment.

2. 调用阶段 (Invoke)

2. Invoke Phase

配置指令发送:在 Pod 基础设施层启动后,容器运行时会读取预设的 CNI 配置文件,并向相应的 CNI 插件发送 ADD 指令。指令中包含了 Pod 名称、命名空间以及至关重要的网络配置元数据。

Configuration Command: After the infrastructure layer is up, the container runtime reads CNI config files and sends an ADD command to the plugin, carrying metadata like Pod name, namespace, and network specs.

3. 执行阶段 (Execution)

3. Execution Phase

链路构建与分配:这是最核心的步骤。插件会执行以下动作:
• 创建 Network Namespace 实现逻辑隔离。
• 建立 veth pair 打通数据链路。
• 调用 IPAM 模块分配唯一的 Pod IP。
• 在内核中加载 eBPF 程序或过滤规则执行策略。

Link Building & Allocation: The core execution step:
• Creating Network Namespaces for isolation.
• Establishing veth pairs for the data link.
• Calling IPAM to allocate a unique Pod IP.
• Loading eBPF programs or filter rules in the kernel.

4. 反馈阶段 (Feedback)

4. Feedback Phase

状态同步上线:配置完成后,插件将包含 IP 地址和接口信息的 JSON 结果反馈给运行时。Kubelet 随后向 API Server 更新 Pod 状态。至此,Pod 状态由 Pending 转为 Running,正式接收业务流量。

State Sync & Launch: Upon completion, the plugin returns a JSON result with IP and interface info. Kubelet then updates the API Server. The Pod transitions from Pending to Running, ready for traffic.
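插件返回的结果大致形如下面的示意(结构遵循 CNI 规范,接口名、路径与 IP 均为示例值): The plugin's result looks roughly like the sketch below (structure per the CNI spec; interface name, path, and IP are illustrative):

{
  "cniVersion": "0.3.1",
  "interfaces": [
    { "name": "eth0", "sandbox": "/var/run/netns/cni-1234" }
  ],
  "ips": [
    { "version": "4", "address": "10.244.1.23/32", "interface": 0 }
  ]
}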

5. 销毁阶段 (Destroy)

5. Destroy Phase

资源回收清理:当 Pod 生命周期终结时,运行时会发送 DEL 指令。CNI 插件随即执行反向操作:回收 IP 地址、删除宿主机上的虚拟接口,并清理内核中的转发规则,确保系统资源彻底释放。

Resource Reclamation: At the end of the lifecycle, a DEL command is sent. The CNI plugin performs reverse operations: reclaiming IPs, deleting virtual interfaces, and clearing kernel forwarding rules.

03. 传统容器网络的三大痛点

03. Three major pain points of traditional container networks

在云原生架构普及的今天,基于 Linux 传统内核组件(如 iptables)的方案正面临前所未有的挑战。这些挑战不仅是技术瓶颈,更是业务增长的“隐形枷锁”。

As cloud-native architectures become ubiquitous, legacy Linux kernel components (like iptables) face unprecedented challenges, acting as "invisible shackles" on business growth.

1. 大规模下的性能与扩展性瓶颈

1. Performance & Scalability Bottlenecks

● 传统规则的线性延迟 O(n)● Linear Latency O(n)

Kubernetes 依赖的 iptables 处理规则的复杂度为 $O(n)$。随着服务和 Pod 数量增加,内核必须按顺序扫描成千上万条规则,导致网络延迟随规模线性增长。

Legacy K8s networking relies on iptables with $O(n)$ complexity. As Services/Pods grow, the kernel scans thousands of rules sequentially, causing latency to grow linearly.

● 高频变动的更新开销● High-Churn Update Overhead

在“高流转”环境中,Pod 的状态变化要求规则在所有节点刷新。在大规模集群中这可能耗时数分钟,导致严重的网络状态不一致。

In high-churn environments, Pod changes force rule refreshes across all nodes. In large clusters, this can take minutes, causing severe network state inconsistency.

AI/GPU 负载挑战AI/GPU Impact

现代 AI 训练集群要求极低延迟抖动 (Jitter)。即便几毫秒的波动,也会让数千张 GPU 在同步时集体停顿,极大浪费昂贵算力资源。

AI training clusters demand ultra-low Jitter. Millisecond fluctuations can stall thousands of GPUs during sync, wasting expensive compute resources.

2. 动态环境下的安全与策略管理

2. Security & Policy Management

● IP 地址的不可靠性● Unreliability of IP Addresses

Pod 是瞬态的(Cattle 而非 Pets),IP 地址随重启频繁变动。传统的、基于 IP 地址的防火墙规则在云原生环境中几乎无效且难以维护。

Pods are cattle, not pets; IPs change constantly. Traditional IP-based firewall rules are practically useless and unmaintainable in dynamic environments.

● 应用层 (L7) 策略缺失● Lack of L7 Policies

传统工具只工作在 L3/L4。无法精细控制“仅允许前端调用后端的特定 API 路径”,东西向流量的访问控制成为巨大挑战。

Legacy tools only work at L3/L4. Implementing Zero Trust for East-West traffic (e.g., restricting specific API paths) is a major challenge.

零信任现状Zero Trust Reality

绝大多数流量发生在集群内部(东西向)。如何在不依赖 IP 的情况下实现细粒度访问控制,是传统边界防火墙的死角。

Most traffic is East-West. Achieving fine-grained access control without IP reliance is a blind spot for edge firewalls.

3. 多层抽象带来的可观测性挑战

3. Observability & Visibility Challenges

● “盲人摸象”的调试困境● The "Finger-pointing" Problem

网络跨越物理、虚拟、容器多层,运维人员难以判断异常发生在哪一层,导致所谓的“指责游戏”。

Networking spans multiple layers. Isolating root causes is difficult, leading to the "finger-pointing" game between teams.

● 缺乏业务语境的监控● Lack of Business Context

传统工具只看原始 IP,不知道流量代表哪个微服务。在万级 Pod 中,这种缺乏标识的数据产生严重的“信噪比”问题。

Legacy tools see raw IPs, not microservices. In massive clusters, data without identity leads to high noise and difficult troubleshooting.

SRE 深度视界SRE Deep Vision

17.4% 的用户因功能或性能 Bug 处于生产受阻状态。没有内核级的细粒度监控,我们就无法追踪跨微服务的“数字指纹”。

17.4% of users are production-blocked by performance bugs. Tracing digital fingerprints across services requires deep kernel-level visibility.

💡 类比理解:当物流帝国陷入危机 💡 Analogy: When the Logistics Empire Is in Crisis

📄

性能:文书风暴

Performance: Paperwork Storm

每增加一个员工,总部都要重印万份名单发给所有人。当人员流动过快,快递员忙于看名单而无暇送货,全城陷入停摆。

Each new hire forces the office to reprint 10k lists for everyone. High turnover means couriers spend more time reading lists than delivering.

🪪

安全:消失的身份

Security: Missing Identity

员工频繁更换制服与编号(IP),你无法通过“红衣服”来拦截禁运品。必须建立一套不可伪造的“刷脸”身份识别系统。

Staff change uniforms and numbers (IP) constantly. You can't block contraband by "red shirt"; you need an unforgeable "Face ID" system.

📦

观测:分拣中心的黑盒

Observability: Sorting Center Blackbox

包裹在复杂的传送带(网络层)里流转,一旦失踪,你根本看不见它卡在哪个环节,更不知道包裹里装的是什么。

Parcels flow through complex belts (layers). If lost, you can't see where it's stuck or what's inside the package.

在云原生架构普及的今天,传统的网络方案正面临前所未有的挑战。这些挑战并非源于设计失误,而是由于底层工具在面对动态、大规模集群时,触及了其设计的算法边界。

As cloud-native architectures become ubiquitous, traditional networking solutions face unprecedented challenges. These are not due to design flaws but rather the underlying tools hitting their algorithmic limits in dynamic, large-scale clusters.

为什么 Kubernetes 依赖 iptables 实现 Kube-proxy?

Why Does Kubernetes Rely on iptables for Kube-proxy?

从系统设计的本质来看,Kubernetes 的 Service 机制是将一个稳定的虚拟 IP (ClusterIP) 映射到一组动态变化的 Pod IP。为了实现这种流量重定向,我们需要在内核层面引入一个高效的“拦截器”与“转换器”。

Essentially, the Kubernetes Service mechanism maps a stable Virtual IP (ClusterIP) to a dynamic set of Pod IPs. To implement this traffic redirection, we need an efficient "interceptor" and "translator" at the kernel level.

内核拦截与 DNAT

Kernel Interception & DNAT

Linux 内核的 Netfilter 框架是处理数据包最成熟的方案。当数据包发往虚拟的 ClusterIP 时,iptables 规则会拦截它并执行目标地址转换(DNAT),将其重定向到真实的 Pod IP。

The Netfilter framework in the Linux kernel is the most mature packet handling solution. When a packet targets a Virtual IP, iptables rules intercept it and perform DNAT to redirect it to a real Pod IP.
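kube-proxy 生成的规则大致如下(链名与 IP 为假设值,仅示意 DNAT 的拦截与改写过程): The rules kube-proxy generates look roughly like this (chain names and IPs are hypothetical, shown only to illustrate DNAT interception and rewriting):

# 命中 ClusterIP 的流量跳转到对应 Service 链 / traffic hitting the ClusterIP jumps to its Service chain
-A KUBE-SERVICES -d 10.96.120.15/32 -p tcp --dport 80 -j KUBE-SVC-EXAMPLE
# Service 链选择某个后端端点链 / the Service chain picks a backend endpoint chain
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-EXAMPLE
# 端点链执行 DNAT,把目的地址改写为真实 Pod IP / the endpoint chain DNATs to the real Pod IP
-A KUBE-SEP-EXAMPLE -p tcp -j DNAT --to-destination 10.244.1.23:8080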

历史权衡与兼容性

Historical Trade-off & Compatibility

在 K8s 早期,iptables 是唯一既能做 NAT 又能做包过滤,且随处可见的标准组件。尽管它不是为超大规模容器场景设计的,但在当时它是实现 Service 抽象最可靠的落地方案。

In the early days of K8s, iptables was the only standard tool capable of both NAT and filtering available everywhere. While not designed for massive scale, it was the most reliable way to implement Service abstractions.

iptables 运行机制与大规模集群的性能瓶颈

iptables Mechanisms & Large-Scale Performance Bottlenecks

iptables 的本质是一个线性的规则链表。当集群规模从几十个节点增长到成千上万个节点时,这种结构会导致性能的崩塌。

iptables is fundamentally a linear linked list of rules. When a cluster scales from dozens to thousands of nodes, this structure leads to a complete collapse in performance.

线性匹配的代价:$O(N)$

Linear Matching Cost: $O(N)$

内核处理每个数据包时,必须从第一条规则开始逐一比对。在大规模集群中,规则可能多达数万条。这种线性遍历会导致 CPU 消耗激增,并直接推高请求的端到端延迟。

The kernel must check every packet against rules one by one. In large clusters with tens of thousands of rules, this linear traversal spikes CPU usage and significantly increases end-to-end latency.
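在自己的集群里,可以用类似下面的命令粗略感受规则规模与刷新成本(命令为常见用法,链名前缀以实际环境为准): You can get a rough feel for rule volume and refresh cost in your own cluster with commands like these (common usage; chain prefixes depend on your environment):

# 统计 NAT 表中 kube-proxy 生成的规则条数 / count kube-proxy-generated rules in the NAT table
iptables-save -t nat | grep -c '^-A KUBE-'
# 粗测一次全量导出(近似一次全量刷新)所需时间 / rough timing of a full dump, approximating a full refresh
time iptables-save -t nat > /dev/null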

全量更新的开销

Full Table Update Overhead

每当 Pod 发生变动,整个 iptables 规则表都需要被全量读取、修改并写回。在动态调度的环境下,这种高频的原子更新操作会导致明显的网络抖动和系统响应变慢。

Whenever a Pod changes, the entire iptables table must be read, modified, and rewritten. In dynamic environments, these frequent atomic updates cause noticeable network jitter and slower system response.

04. Cilium 的内核革命:架构之美

04. The Kernel Revolution: The Beauty of Architecture

从蜂巢的结构稳定性到 eBPF 的极致执行效率,探索 Cilium 如何重塑云原生边界。

From the structural stability of the honeycomb to the ultimate execution efficiency of eBPF, explore how Cilium reshapes cloud-native boundaries.

精准定义:不只是 CNI

Precise Definition: More than just a CNI

Cilium 是一个开源的云原生解决方案,旨在通过革命性的内核技术 eBPF,为工作负载提供安全防护并观察其网络连通性。它超越了传统的 CNI 插件范畴,构建了一个集成了网络、可观测性和安全性的综合平台,能够透明地保护 HTTP、gRPC、Kafka 等高级协议。

Cilium is an open-source cloud-native solution designed to secure and observe network connectivity between workloads using the revolutionary kernel technology eBPF. It goes beyond traditional CNI plugins, building a comprehensive platform for networking, observability, and security that transparently protects protocols like HTTP, gRPC, and Kafka.

技术类比:内核界的 JavaScript Analogy: The JS of the Kernel

如果说 JavaScript 之于浏览器,使其从静态页面变为交互式应用;那么 eBPF 之于内核,则使其从固化的执行逻辑变为可编程的灵活底座。Cilium 正是利用这种“脚本化”能力,在不改变内核源码的情况下,动态注入网络逻辑。

If JavaScript is to the browser, turning static pages into interactive apps; eBPF is to the kernel, turning rigid execution into a programmable foundation. Cilium leverages this "scripting" capability to dynamically inject network logic without modifying kernel source code.

架构类比:从“检查站”到“直达专线” Analogy: Checkpoints vs. Express Lane

传统网络(iptables)像是在一条公路上设置了成千上万个检查站,包裹必须逐个排队核验;而 Cilium/eBPF 则为数据包开辟了“直达专线”,通过在内核挂钩点(Hooks)直接执行高效程序,实现极低延迟。

Traditional networking (iptables) is like thousands of checkpoints on a highway; packets must queue at each one. Cilium/eBPF creates an "Express Lane," executing efficient programs directly at kernel hooks for ultra-low latency.

组件全景:控制面与数据平面的交响乐

Landscape: Symphony of Control & Data Planes

Cilium 的架构设计严格遵循“控制与转发分离”的原则,围绕内核态加速用户态编排的理念展开。这种解耦不仅保证了极高的运行效率,还通过模块化设计实现了集群级的可扩展性。

Cilium's architecture strictly follows the "Control and Forwarding Separation" principle, centered around Kernel Acceleration and User-Space Orchestration. This decoupling ensures high efficiency and cluster-wide scalability.

用户空间 (The Brain) User Space (The Brain)
Cilium Agent

运行于每个节点的核心守护进程。负责监听 Kubernetes 事件,管理 eBPF 程序的整个生命周期,并将策略实时编译为内核字节码。

The core daemon on each node. Listens to K8s events, manages the eBPF lifecycle, and compiles policies into kernel bytecode.

Cilium Operator

集群级控制器,负责 IP 地址管理 (IPAM)、身份分配、证书轮换等全局任务。默认使用 Kubernetes CRD 存储集群状态。

Cluster-wide controller. Handles global tasks like IPAM, identity allocation, and certificate rotation using Kubernetes CRDs.

Hubble & Envoy

观测层。Hubble 提供实时流量可视化;Envoy 作为辅助高性能代理,用于执行复杂的 Layer 7 (应用层) 策略。

Observability layer. Hubble provides flow visibility, while Envoy acts as a high-performance proxy for L7 policies.

桥梁:eBPF Maps(用户态与内核态共享,O(1) 效率)
Bridge: eBPF Maps (shared between user and kernel space, O(1) efficiency)

内核空间 (The Muscle) Kernel Space (The Muscle)

eBPF Datapath Execution Engine(JIT 编译,接近原生性能),挂载于 XDP Hook、Traffic Control (TC) 与 Socket Hook。
eBPF Datapath Execution Engine (JIT compiled, near-native performance), attached at the XDP, Traffic Control (TC), and Socket hooks.

eBPF 程序:Cilium 的大脑

eBPF Program: The Brain of Cilium

直接驻留在内核中处理网络包。通过 JIT (Just-In-Time) 编译,字节码被转换为机器码执行,效率接近原生内核模块。它支持多种战略级挂载点:

Resides in the kernel to process packets. Bytecode is converted to machine code via JIT compilation for near-native efficiency.

  • XDP: 在网络驱动层最早阶段处理数据包,极速防御 DDoS。
  • XDP: Processes packets at the driver stage for ultra-fast DDoS protection.
  • TC (Traffic Control): 用于容器间流量的精细化负载均衡与限速。
  • TC: Used for granular load balancing and rate-limiting between containers.
  • Socket: 实现 Layer 7 应用层协议解析与加密策略。
  • Socket: Enables L7 protocol parsing and encryption policies.

跨集群互联:Cluster Mesh

Multi-Cluster: Cluster Mesh

Cluster Mesh 是 Cilium 的跨集群方案。它通过加密隧道打破了物理边界,实现不同 Kubernetes 集群间 Pod 的直接安全通信。

Cluster Mesh is the multi-cluster solution. It breaks physical boundaries via encrypted tunnels for secure direct Pod-to-Pod communication.

CNI 插件职责: 符合 CNI 标准,当 Pod 创建时,它就像“帝国工程队”,自动配置网络命名空间并挂载 eBPF 规则,确保“即插即用”。

CNI Plugin Role: Acts as the "Engineering Crew" to auto-configure namespaces and BPF rules during Pod creation for plug-and-play networking.

寻址革命:从 $O(N)$ 到 $O(1)$ 的算法跨越

Lookup Revolution: Scaling from $O(N)$ to $O(1)$ Complexity

要彻底解决大规模集群的网络延迟,必须在算法复杂度上实现质的飞跃。传统的线性搜索机制在规则爆炸时必然导致性能崩塌,而 Cilium 通过引入内核原生的高效数据结构,实现了寻址效率的常量化。

To fundamentally solve network latency in large-scale clusters, an algorithmic leap is required. While linear search mechanisms collapse under rule explosion, Cilium achieves constant-time lookup by leveraging kernel-native data structures.

BPF Map:内核态的“即时索引”

BPF Map: Instant Kernel-Space Indexing

BPF Map 是内核空间与用户空间共享的高效键值对存储。它充当了 Cilium 的动态数据库,存储着 Service IP 到后端 Pod IP 列表的映射关系。与静态规则不同,它是数据驱动的,且支持毫秒级的原子更新。

BPF Maps act as efficient Key-Value stores shared between kernel and user space. Serving as Cilium's dynamic database, they map Service IPs to Pod IP lists, supporting millisecond-level atomic updates.

转发机制对比:遍历 vs 寻址

Mechanism Comparison: Traversal vs. Addressing

iptables 方式: 内核必须逐行比对规则列表。如果有 10,000 条规则,数据包可能需要匹配 10,000 次,复杂度为 $O(N)$。
Cilium 方式: eBPF 程序直接将目标 IP 提取为 Key,在哈希表中进行单次精准查询,复杂度为 $O(1)$。

iptables: The kernel traverses a rule list. With 10,000 rules, a packet might need 10,000 matches ($O(N)$).
Cilium: eBPF extracts the destination IP as a Key and performs a single hash lookup ($O(1)$).

结果:与集群规模解耦

Result: Decoupling from Cluster Scale

在 $O(1)$ 模式下,无论集群中有 10 个 Service 还是 100,000 个 Service,哈希查找的时间开销几乎完全一致。这意味着网络延迟不再随节点数线性增长,为超大规模 AI 训练和微服务架构提供了稳定的底座。

Under $O(1)$, lookup time remains constant whether there are 10 or 100,000 services. Network latency no longer scales with node count, providing a stable foundation for massive AI and microservice workloads.

技术证据:iptables 规则链与 eBPF 映射表

Technical Evidence: iptables Rule Chains vs. eBPF Maps

为了理解效率为何会发生代差级跃迁,我们需要对比观察 Linux 内核中两种截然不同的数据结构。

To understand why efficiency undergoes a generational leap, we must compare two distinct data structures within the Linux kernel.

iptables:顺序匹配的“线性清单”

iptables: Sequential "Linear Checklist"

iptables 负载均衡规则深度解析
Detailed iptables Load Balancing Rule Analysis
/* 当请求到达 Service 时,内核必须按顺序逐行比对概率。
 * When a request hits a Service, the kernel must check probabilities line-by-line. */
-A KUBE-SVC-ABC -m statistic --mode random --probability 0.33 -j KUBE-SEP-1
/* 如果前 33% 没中,则判断下 50%... If the first 33% misses, check the next 50%... */
-A KUBE-SVC-ABC -m statistic --mode random --probability 0.50 -j KUBE-SEP-2
-A KUBE-SVC-ABC -j KUBE-SEP-3
/* 复杂度 O(N):规则条数与 Pod 副本数成正比。在大规模集群中,这种线性遍历会导致 CPU 剧烈震荡。
 * Complexity O(N): The number of rules scales with Pod count. In large clusters, this linear traversal causes CPU spikes. */

Cilium eBPF:基于哈希的“即时索引”

Cilium eBPF: Hash-based "Instant Index"

Cilium BPF Map 实时条目查看 (ipcache)
Real-time Cilium BPF Map Entry (ipcache)
# cilium-dbg map get cilium_ipcache
Key (Destination IP)   Value (Metadata/Identity)
172.16.1.158/32        identity=6      tunnel=10.75.59.82
172.16.2.156/32        identity=27858  tunnel=0.0.0.0

/* 复杂度 O(1):内核直接将目标 IP 提取为 Key,在哈希表中进行单次精准查询。
 * Complexity O(1): The kernel extracts the destination IP as the Key and performs a single hash lookup.
 * 无论条目有 20 条还是 20 万条,查找耗时几乎恒定。
 * Lookup time is nearly constant whether there are 20 or 200,000 entries. */

内核中的 Map 管理全景 (bpftool 视角)

In-Kernel Map Management (via bpftool)

Cilium 通过多种专用 Map 维护集群状态。例如 cilium_lb4_services_v2 管理服务映射,cilium_lxc 管理本地端点。这种数据驱动的架构,让内核能够像高性能数据库一样瞬间完成决策。

Cilium maintains state via specialized Maps. e.g., cilium_lb4_services_v2 for services and cilium_lxc for local endpoints. This data-driven architecture lets the kernel make instant decisions like a high-perf database.
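如果想亲自验证,可在 Cilium Agent 内查看这些 Map(以下命令为示意,子命令名称随版本略有差异): To verify this yourself, you can list these maps from inside the Cilium agent (illustrative commands; subcommand names vary slightly by version):

# 列出 Agent 维护的 BPF Map 及其同步状态 / list the BPF maps the agent maintains and their sync state
kubectl -n kube-system exec ds/cilium -- cilium-dbg map list
# 或用通用的 bpftool 直接查看内核中的 Map / or inspect kernel maps directly with the generic bpftool
bpftool map show | grep cilium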

05. 核心价值:eBPF 驱动的云原生进化

05. Core Value: eBPF-Driven Cloud-Native Evolution

WHY

传统模型正在撞上“规模之墙”

Traditional Models Hitting the "Scalability Wall"

传统的 Kubernetes 网络依赖于 iptables,这在处理大规模、动态的云原生环境时已捉襟见肘:

Legacy K8s networking relies on iptables, which struggles in massive, dynamic cloud-native environments:

🚀 突破 $O(N)$ 性能瓶颈

🚀 Breaking $O(N)$ Bottlenecks

iptables 依赖线性规则扫描,规则越多延迟越高。Cilium 利用 eBPF 哈希表 实现了常数级查找时间 ($O(1)$),无论集群有 10 个还是 10,000 个服务,性能始终如一。

iptables uses linear rule scans (higher latency with more rules). Cilium leverages eBPF hash maps for $O(1)$ lookup, ensuring consistent speed at any scale.

🛡️ 弃用脆弱的 IP 策略

🛡️ Deprecating Fragile IP Policies

Pod IP 随重启而变。Cilium 抛弃了不稳定的“易逝门牌号”,基于 身份 ID 进行策略匹配,实现了真正的逻辑身份隔离。

IPs are volatile. Cilium ditches "ephemeral door numbers" in favor of stable identity IDs for logic-based security enforcement.

🔍 消除可观测性盲区

🔍 Eliminating Observability Blindspots

传统工具无法解析微服务间的 L7 语义(如 gRPC 路径、Kafka 主题)。Cilium 提供了针对应用层协议的“X 光视野”,让一切透明化。

Legacy tools miss L7 semantics (gRPC, Kafka). Cilium provides "X-ray vision" for application protocols, making cross-service calls fully visible.

HOW

我们如何重塑内核?—— 内核态“硬加速”

How we remake the kernel? — In-Kernel "Hard Acceleration"

类比:从“检查站排队”到“内核直达专线”

Analogy: From "Checkpoint Queues" to "Kernel Express Lane"

传统路径:iptables 顺序核验 (O(n)),检查站越多延迟越高。 Legacy path: iptables linear checks (O(n)) — severe latency as rules/checkpoints multiply.
Cilium 路径:eBPF 内核直达专线 (O(1)),极速通行。 Cilium path: eBPF kernel express lane (O(1)) — an ultra-fast kernel fast-path.

🔍 我们如何重塑内核?

🔍 How we remake the kernel?

  • XDP (大门口门卫):在网卡驱动层处理报文。坏人刚到门口就被踢出去,保护内核不受 DDoS 冲击。
  • XDP: Handles packets at the driver stage. Malicious traffic is blocked at the "gate" before hitting the kernel stack.
  • TC (玄关安检):在进入协议栈前的玄关处检查报文内容,执行身份验证和负载均衡。
  • TC: Performs inspection and enforcement at the protocol entry point (ingress/egress).
  • sockmap (暗门重定向):利用 Socket 层重定向。本地 Pod 通信像在两个相邻房间开了一扇暗门,数据直接闪传。
  • sockmap: Short-circuits local Pod communication via a "secret door" at the socket layer.

💡 深度解析:

Cilium 利用 eBPF 实现了 BIG TCP 支持,通过减少内核处理大包时的分片开销,将 IPv6 吞吐量提升了 40-50%,并显著降低了 CPU 占用。这不仅是查找效率的革命,更是数据链路层的一次全面重构。

💡 Deep Dive:

Cilium supports BIG TCP, boosting IPv6 throughput by 40-50% by reducing segmentation overhead. This isn't just a lookup revolution; it's a complete rethink of the datapath.

WHAT

最终交付了什么?—— 统一的网络、安全与观测平面

What is delivered? — Unified Network, Security & Observability

🏎️ 高性能网络 (Networking)

🏎️ High Performance Networking

  • 替代 kube-proxy:利用 Maglev 算法实现极速负载均衡。
  • BIG TCP:支持 IPv6 大包处理,将吞吐量提升 40-50%。
  • 直通 Underlay:原生 BGP 支持,直接与物理核心交换机对话。
  • kube-proxy replacement: Instant LB via Maglev hash.
  • BIG TCP: Boosts IPv6 throughput by 40-50%.
  • Native BGP: Direct integration with physical core switches.

🛡️ 零信任安全 (Security)

🛡️ Zero-Trust Security

  • L7 深层防御:拦截 gRPC 路径、HTTP 方法及 DNS 请求。
  • 透明加密:内核级 WireGuard/IPsec 自动加密,零业务感知。
  • 运行时安全 (Tetragon):毫秒级响应风险,一旦违规直接 sigkill。
  • L7 Deep Defense: Filter gRPC, HTTP methods, and DNS.
  • Transparent Encryption: In-kernel WireGuard/IPsec protection.
  • Tetragon: Millisecond response with auto-sigkill for threats.

🔍 全栈可观测性 (Observability)

🔍 Full-Stack Observability

  • 无代理观测:无需 Sidecar 即可抓取流量“黄金指标”。
  • 依赖图谱:Hubble 自动生成实时的微服务拓扑地图。
  • 精准排障:通过 eBPF 明确报告每个包的丢弃原因与具体内核位置。
  • Sidecar-less: Extract "Golden Signals" without proxies.
  • Service Maps: Hubble auto-generates real-time service topologies.
  • Drop Diagnosis: Pinpoints the exact reason and kernel point for drops.

现代流量治理:Gateway API

Modern Traffic Governance: Gateway API

Cilium 不仅仅是一个连接插件,更是复杂的流量编排引擎。它原生支持 Gateway API,通过 eBPF 实现内核级的 HTTP 路由、Header 注入及权重分流。这解决了传统 Ingress 模式下,因规则更新导致 Proxy 频繁重载(Reload)而产生的长尾延迟问题。

Cilium is more than a connectivity plugin; it's a sophisticated traffic orchestration engine. It natively supports the Gateway API, implementing kernel-level HTTP routing and header manipulation via eBPF. This eliminates the tail latency caused by frequent proxy reloads in traditional Ingress models during rule updates.

面向角色的资源模型

Role-oriented Resource Model

通过将配置拆分为 GatewayClass、Gateway 和 Route,实现了基础设施管理员与应用开发者的职责分离,极大降低了大规模集群中的配置冲突概率。

By splitting configs into GatewayClass, Gateway, and Routes, it separates duties between infrastructure admins and app developers, significantly reducing config conflicts in large clusters.

内核级无感知切换

In-Kernel Seamless Switching

流量调度逻辑直接注入 eBPF 程序中,无论是灰度发布还是 A/B 测试,流量切换都在内核中毫秒级完成,且不产生任何上下文拷贝开销。

Traffic scheduling logic is injected directly into eBPF programs. Whether canary releases or A/B testing, traffic switching happens in-kernel in milliseconds without any context-copying overhead.
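下面是一份 Gateway API HTTPRoute 的加权分流示意(资源名、路径与权重均为假设),Cilium 会将此类声明编译为上述内核级转发逻辑: Below is a weighted-split sketch using a Gateway API HTTPRoute (names, path, and weights are hypothetical); Cilium compiles such declarations into the kernel-level forwarding logic described above:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout-canary
spec:
  parentRefs:
  - name: shared-gateway       # 由基础设施管理员维护的 Gateway / Gateway owned by the infrastructure team
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /checkout
    backendRefs:
    - name: checkout-v1
      port: 8080
      weight: 90               # 90% 流量保持稳定版本 / 90% stays on the stable version
    - name: checkout-v2
      port: 8080
      weight: 10               # 10% 流量灰度新版本 / 10% goes to the canary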

06. 性能之巅:eBPF 的降维打击

06. Peak Performance: The eBPF Advantage

性能优化不仅仅是代码的精简,更是路径的重构。Cilium 通过将转发逻辑下沉至内核的最底端,彻底消除了数据包处理过程中的系统级冗余。

Performance optimization is more than code trimming; it's path restructuring. Cilium eliminates system-level redundancy by pushing forwarding logic to the lowest layers of the kernel.

为什么传统 CNI 慢?因为数据包每经过一个 veth pair,都要触发一次 Linux 内核的中断处理和协议栈完整遍历。
在多核 CPU 成为常态的今天,性能瓶颈不再是带宽,而是 CPU 缓存失效 (Cache Miss) 与内核/用户态切换。Cilium 的优化就是:让数据包在内核中“直接闪传”,不走回头路,减少 CPU 的无效做功。这在 AI 训练等高频通信场景下,直接决定了 GPU 的有效利用率。

Why is traditional CNI slow? Each veth pair crossing triggers kernel interrupts and full stack traversals. In the era of multi-core CPUs, the bottleneck is cache misses and context switching. Cilium's principle: let packets "teleport" within the kernel without retracing paths, reducing wasted CPU work. In high-frequency communication scenarios such as AI training, this directly determines effective GPU utilization.

构建“快速通道”:内核级的直接重定向

The Fast Track: Kernel-Level Redirection

XDP

硬件级线速处理

Hardware-level Line Speed

通过 XDP (eXpress Data Path),Cilium 在数据包甚至还没进入 Linux 内核协议栈之前就完成负载均衡。实验证明,CPU 负载可降低多达 72 倍

Using XDP, Cilium processes packets before they hit the kernel stack. Studies show CPU load can be reduced by up to 72x.

sockmap

Socket 层级“短路”

Socket-layer Short-circuit

对于同节点的 Pod 通信,Cilium 利用 sockmap 在 Socket 层面直接重定向报文,跳过了昂贵的内核上下文切换与每包 NAT 操作。

For intra-node Pod comms, sockmap redirects traffic at the Socket layer, bypassing expensive context switches and per-packet NAT.

BIG TCP

高带宽吞吐增强

High-Bandwidth Boosting

支持 IPv6 BIG TCP,允许内核处理超过 64KB 的超大报文,吞吐量提升 40-50%,完美适配 AI 训练等海量数据场景。

Supports IPv6 BIG TCP, allowing the kernel to handle massive packets over 64KB, boosting throughput by 40-50% for AI workloads.

架构红利:KPR 与 Sidecar-free

Architectural ROI: KPR & Sidecar-free

KPR 是 Cilium 实现极致性能的秘密武器

KPR is Cilium's secret weapon for peak performance.

Kube-proxy Replacement (KPR)的本质是利用 eBPF 在内核级别彻底替换传统的 kube-proxy 及其依赖的 iptables 规则。在帝国初期,iptables 是万能的,但随着快递员(Pod)规模的爆炸,这种“手动翻阅纸质名单”的模式已成为发展的最大枷锁。

It uses eBPF to replace the traditional kube-proxy and its iptables-based logic at the kernel level. While iptables served the empire well initially, it has become a bottleneck as the number of couriers (Pods) exploded.

Cilium 方案:$O(1)$ 极速转发

Cilium Solution: $O(1)$ Fast-Path

哈希查找:利用 eBPF 哈希表。无论集群中有 10 个还是 10,000 个服务,查询耗时始终恒定,性能不随规模衰减。

Socket 级重定向:在应用发起 connect() 调用时,Cilium 就在 Socket 层面执行了重写。这意味着数据包在生成前就确定了目标,彻底消除了每包 NAT 的开销。

Hash Tables: Uses eBPF maps for constant-time lookup. Performance remains static regardless of the service count.

Socket-level Steering: Cilium rewrites at the connect() syscall. Targets are determined before packet generation, eliminating per-packet NAT overhead.

KPR 部署模式对比

KPR Deployment Mode Comparison

Strict (严格模式)Strict Mode

完全禁用 kube-proxy,Cilium 接管所有负载均衡逻辑。性能最高,推荐生产首选(安装示例见下方对比之后)。

Completely disables kube-proxy. Best performance, recommended for production (see the install sketch after this comparison).

Partial (部分模式)Partial Mode

Cilium 仅替换 NodePort 或 HostPort。与 kube-proxy 共存,适用于旧版内核。

Replaces only specific functions. Co-exists with kube-proxy for legacy kernels.

Disabled (默认)Disabled Mode

完全依赖传统的 kube-proxy 处理流量。

Relies entirely on legacy iptables-based kube-proxy.
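下面是启用上文推荐的 Strict/完全替换模式的 Helm 安装示意(API Server 地址为占位符;较新版本已将该开关简化为 kubeProxyReplacement=true,请以所用版本文档为准): A hedged Helm sketch for enabling the strict replacement mode recommended above (the API server address is a placeholder; newer releases simplify the switch to kubeProxyReplacement=true, so check your version's docs):

helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=<API_SERVER_IP> \
  --set k8sServicePort=6443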

Sidecar-free Service Mesh:解脱“边车”的负重

Sidecar-free Service Mesh: Relieving the Burden

传统的服务网格(如 Istio)依赖 Sidecar(边车) 模式:每个 Pod 都必须配备一个私人保镖(Envoy 代理)。 这意味着任何一封快递包离开宿舍,都必须穿过三次协议栈:快递员 → 保镖 → 宿主机内核 → 网线。

Legacy Service Meshes (e.g., Istio) rely on Sidecars: every Pod needs a private "guard" (Envoy proxy). Any packet must traverse the stack 3 times: App → Sidecar → Kernel → Network.

Cilium 的革命:将网格功能直接植入内核。L3/L4 任务(路由、加密、LB)由 eBPF 零开销完成;只有复杂的 L7 解析才按需重定向到节点级共享代理(Per-node Envoy)。

Cilium's Revolution: Mesh logic baked into the kernel. L3/L4 tasks are handled by eBPF with near-zero overhead; L7 parsing is redirected to a shared Per-node Envoy on-demand.

消灭边车后的“商业红利”

Economic ROI of Sidecar-free Architecture

AVG. SAVINGS
32% +

基于 1000 节点规模测算,大幅削减云端计算成本。

Based on a 1000-node cluster, significantly cutting cloud bills.

为什么能省钱? (The Economic Breakdown)

Why Save Money?

  • 内存冗余消除: 10,000 Pod = 500GB Envoy 浪费。Cilium 仅需微量 Node-level 内存。
  • Memory Redundancy: 10k Pods = 500GB Envoy waste. Cilium uses minimal node-level memory.
  • 指令路径缩减: 报文不再频繁跨越用户/内核边界,减少上下文切换。
  • Path Reduction: Packets no longer frequent User/Kernel boundary crossings.
  • 网关整合: Gateway API 替代了 F5/Nginx 南北向网关,削减 15% 硬件成本。
  • Gateway Consolidation: Replaces external gateways, cutting infra costs by 15%.

Cilium 性能基准测试量化数据

Cilium Benchmark Metrics

9.62 Gbps TCP 性能 TCP Throughput (Direct)
-35% P99 延迟降低 P99 Latency reduction
Unlimited Service 扩展规模 Service Scalability
72x CPU 效率提升 CPU Efficiency Gain
网络模式 Networking Mode      吞吐量 Throughput (Gbps, MTU 1500)    CPU 消耗 CPU Cost
Cilium (Direct Routing)        9.62                                  极低 (Minimal)
Calico (IPIP)                  8.45                                  中 (Medium)
Flannel (VXLAN)                7.41                                  高 (Heavy)

* 数据源自 2021 CNI 基准测试。封装(Overlay)是性能损耗的元凶。

* Data sourced from 2021 CNI benchmarks. Overlay is a major cause of performance degradation.

Maglev LB

谷歌级 Maglev 负载均衡算法

Google-class Maglev LB Algorithm

在大规模集群中,节点重启或漂移会导致传统 LB 丢失连接状态。Cilium 引入了谷歌研发的 Maglev 一致性哈希算法
价值:即使后端节点频繁变化,Maglev 也能确保已有连接始终命中相同的后端,实现极致的连接一致性与极低的 CPU 查询开销。

Cilium implements Google's Maglev Consistent Hashing. It ensures connection stickiness and consistent backend targeting even during massive node churns, with minimal CPU overhead.

性能结语:Cilium 通过 eBPF 实现了网络处理从“软件定义”向“硬件加速感”的飞跃。这种优势在支撑 OpenAI 7,500 节点 的 GPU 集群时,已转化为实实在在的算力红利。

Performance Verdict: Cilium evolves networking from "software-defined" to "hardware-accelerated" feel. At OpenAI's 7,500-node scale, this efficiency directly translates into computational ROI.

07. 身份感知安全:终结 IP 的“数字迷宫”

07. Identity-based Security: Ending the IP "Digital Maze"

在云原生环境中,IP 地址已演变为“瞬态消耗品”。我们需要一套与底层网络拓扑完全解耦的零信任安全架构

In cloud-native, IPs are ephemeral. We need a Zero Trust architecture decoupled from network topology.

💺

传统网络:看座位号 (IP)

Legacy: The Seat Number

乘客(Pod)在机舱内频繁换座。空乘(防火墙)必须时刻盯着谁坐在哪。每当有人换座,空乘就得重写上万条规则。

痛点:这种基于“座位号”的核验不仅缓慢($O(n)$ 线性损耗),且极易在换座瞬间出现安全真空。

Passengers change seats frequently. Flight attendants (Firewalls) must track every move. Every swap requires rewriting 10k+ rules.

Pain: This "Seat-based" check is slow ($O(n)$ drain) and creates security gaps during the swap moment.

🎫

Cilium:验登机牌 (Identity)

Cilium: The Boarding Pass

无论你坐在哪个位置(IP),你手里握着的“登机牌”(身份标签)是唯一的。内核只认牌子不认座,只要你是“VIP”,任何位置都能获得对应权限。

红利:规则判定不随 IP 变化,集群扩容 100 倍,安全判定的延迟依然维持在常数级 $O(1)$

No matter where you sit (IP), your "Boarding Pass" (Identity ID) is persistent. The kernel checks the pass privileges.

Bonus: Rules don't shift with IPs. Even with 100x cluster growth, check latency remains constant at $O(1)$.


深度机制:身份标识的内核生命周期

Mechanism: The Lifecycle of Identity

1. 数字化身份分层分配 (Layered Allocation) 1. Layered Digital Identity Allocation

Cilium 将标签哈希为 uint32 数值索引。为了兼顾效率与兼容性,身份被严格划分为不同段位: Cluster-local(16位空间)确保单集群极速性能; ClusterMesh(24位空间)利用高 8 位作为 cluster-id 实现跨集群寻址。 此外,Identity 0 具有特殊语义,在 Hubble 中代表未知,而在 eBPF 路径中作为通配符(Wildcard)存在。

Cilium hashes labels into uint32 indices. Identities are segmented for efficiency: Cluster-local (16-bit) for intra-cluster speed; ClusterMesh (24-bit) using high 8-bits as cluster-id. Note that Identity 0 acts as a wildcard in eBPF datapath and represents "not found" in Hubble.

2. 全程报文标记与透传 (Tagging) 2. Packet Tagging & Transparency

该 ID 会被无感知地注入 VXLAN 封装头部 或在内核数据结构(skb)中携带。报文从此自带“逻辑身份证”,跨节点通信时接收端内核能瞬间提取源身份。

The ID is transparently injected into VXLAN headers or carried in kernel structures (skb). Every packet carries a "logical ID card," allowing receivers to identify the source instantly across nodes.

3. 常数级 $O(1)$ 判定 (Enforcement) 3. Constant-time $O(1)$ Enforcement

接收端内核 eBPF 程序将 Source Identity 作为 Key,直接在 BPF Policy Map(哈希表)中执行单次检索。这种机制从根本上解决了规则多、延迟高的行业痼疾。

The receiver's eBPF program uses the Source Identity as a Key to perform a single lookup in the BPF Policy Map. This eliminates the latency scaling issue found in legacy firewalls.


零信任实践:从端口到 API 的深度控制

Zero Trust in Practice: Port to API Control

L3/L4 身份感知策略 (identity-aware.yaml)
L3/L4 Identity Aware Policy (identity-aware.yaml)
apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy metadata: name: "secure-payment-access" spec: endpointSelector: matchLabels: app: payment-db ingress: - fromEndpoints: - matchLabels: app: payment-service # 基于逻辑标签而非 IP# Based on Logical Tags not IP env: prod toPorts: - ports: - port: "6379" protocol: TCP
L7 API 级深层策略 (l7-api-aware.yaml)
L7 API level deep policy (l7-api-aware.yaml)
ingress:
- fromEndpoints:
  - matchLabels:
      app: frontend
  toPorts:
  - ports:
    - port: "80"
      protocol: TCP
    rules:
      http:
      - method: "GET"
        path: "/v1/public/.*"   # 允许 GET 公开路径 / Permit GET on public paths
      # L7 规则为白名单语义:未列出的请求(如 POST /admin)默认被拒绝
      # L7 rules are allow-list based: requests not listed here (e.g. POST /admin) are denied by default

零信任网络的三大支柱

Three Pillars of Zero Trust Networking

1. API 级 (L7) 深层防御

1. L7 API-Aware Security

传统方案止步于端口。Cilium 能够理解 gRPC、HTTP、Kafka 与 DNS。

示例:你可以通过 eBPF + Envoy 声明:“仅允许前端服务通过 GET 方法访问后端的 /v1/public/*,拒绝所有对 /admin 的尝试。”

Legacy stops at ports. Cilium understands gRPC, HTTP, Kafka, and DNS.

Example: Allow only GET requests to /v1/public/* from frontend, while blocking any attempts to /admin.

2. 内核级透明加密

2. Transparent Encryption

无需修改一行业务代码,无需笨重的 Sidecar。Cilium 在内核中集成 WireGuard 或 IPsec。

商业价值:为金融、医疗等严监管行业提供开箱即用的 mTLS 替代方案,实现数据传输的机密性与完整性。

Zero code changes, zero sidecars. Cilium integrates WireGuard or IPsec directly in the kernel.

ROI: Out-of-the-box mTLS alternative for highly regulated industries like FinTech and Healthcare.
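一个开启内核级 WireGuard 加密的 Helm 参数示意(在已有安装上追加,参数按需调整): A sketch of the Helm values for turning on in-kernel WireGuard encryption (applied on top of an existing install; adjust as needed):

helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set encryption.enabled=true \
  --set encryption.type=wireguard   # 亦可选 ipsec / ipsec is the alternative type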

3. 纵深防御:运行时安全

3. Defense-in-Depth: Runtime Security

利用 Cilium 的子项目 Tetragon,监控内核系统调用与文件访问。

实时阻断:一旦发现 Pod 试图非法读取敏感文件(如 /etc/shadow),内核会在毫秒内发送 sigkill,将威胁扼杀在萌芽状态。

Powered by Tetragon, monitoring kernel syscalls and file access.

Real-time Kill: If a Pod tries to illegally read /etc/shadow, the kernel sends a sigkill in milliseconds.

“IP 是临时的座位号,Identity 才是永久的身份证。Cilium 让安全边界真正推到了每个 Pod 的内核入口处。”

"IP is a temporary seat; Identity is a permanent passport. Cilium pushes the security boundary to the kernel entrance of every Pod."

08. 拨云见日:Hubble 全栈可观测性

08. Breaking the Fog: Hubble Full-Stack Observability

Hubble 是 Cilium 环境下的分布式网络和安全可观测性平台。它利用 eBPF 技术,以完全透明的方式提供对微服务及网络基础设施行为的深度洞察,实现了无需侵入应用代码(Sidecar-less)即可获取全量监控数据的跨越。

Hubble is a distributed observability platform for Cilium. Powered by eBPF, it provides deep, transparent insights into microservices and infrastructure behavior, achieving a leap to sidecar-less monitoring without modifying application code.

老式闭路电视 (Legacy tcpdump)

Legacy CCTV (Legacy tcpdump)

传统监控像是每辆私家车里的行车记录仪,只有装了记录仪的车才能提供数据(需要代码埋点)。在屏幕上,你只能看到模糊的人影移动(IP 流量),无法确定身份。

Legacy monitoring is like a dashcam in every car: only vehicles with sensors provide data. You see vague shadows (IP traffic) on screen but lack true identity and context.

智能监控天眼 (Hubble)

Smart AI Security (Hubble)

Hubble 是安装在城市每个路口和立交桥上的智能摄像头系统。无论车里有没有装记录仪,它都能一眼识别出那是消防车还是私家车(身份感知),并实时统计流量、通过速度及违章行为。

Hubble is an AI-powered smart city system. It identifies "Fire Trucks" vs. "Private Cars" (Identity-aware) at every intersection, tracking traffic flow, speed, and violations regardless of onboard tech.


SRE 实践:四大黄金信号的内核量化

SRE Practice: In-Kernel Golden Signals

在 SRE 方法论中,延迟、流量、错误、饱和度是衡量系统健康的基石。Hubble 通过内核级 eBPF 程序,直接提取并量化这些信号。

In SRE methodology, Latency, Traffic, Errors, and Saturation are the bedrock of health monitoring. Hubble quantifies these directly via in-kernel eBPF programs.

⏱️ 延迟 (Latency) Latency
  • 应用层追踪 (App-layer Tracking) App-layer Tracking 追踪 HTTP/gRPC 请求的 P95 和 P99 延迟,识别由于复杂业务逻辑导致的“长尾”性能瓶颈。 Track HTTP/gRPC P95/P99 latency to identify long-tail bottlenecks in complex business logic.
  • 内核网络路径耗时 (Network Path Timing) Network Path Timing 测量 TCP 往返时间 (RTT),帮助 SRE 区分延迟是出在“路”上(网络链路)还是“店”里(应用进程)。 Measure TCP RTT to distinguish if latency occurs on the "road" (link) or in the "shop" (app process).
📊 流量 (Traffic) Traffic
  • 基于身份的量化 (Identity-based Quant) Identity-based Quant 不再只是统计字节,而是精准监控 HTTP 每秒请求数 (RPS) 或 Kafka 主题的生产/消费频率。 Beyond byte counts: Monitor HTTP RPS or Kafka topic production/consumption rates with business context.
  • 多维服务拓扑 (Multi-dim Topology) Multi-dim Topology Hubble UI 自动生成实时服务依赖图,直观展示哪些微服务正在通信以及交互的频率。 Hubble UI auto-generates real-time service dependency maps, showing interaction frequency visually.
错误 (Errors) Errors
  • 协议级故障诊断 (Protocol Diagnostics) Protocol Diagnostics 自动提取 HTTP 4xx/5xx 或 gRPC PERMISSION_DENIED 等错误码,无需修改任何应用代码。 Extract HTTP 4xx/5xx or gRPC codes like PERMISSION_DENIED automatically without app instrumentation.
  • 透明丢包诊断 (Drop Diagnostics) Drop Diagnostics 明确报告数据包被丢弃的原因(如被特定的 Network Policy 拦截或端口冲突导致)。 Explicitly report packet drop reasons, such as specific Network Policy blocks or port conflicts.
📈 饱和度 (Saturation) Saturation
  • 网络路径饱和 (Path Saturation) Path Saturation 通过监控 TCP 重传率 (Retransmissions) 来预警潜在的网络拥塞或带宽瓶颈。 Monitor TCP Retransmission rates to alert for potential network congestion or bandwidth bottlenecks.
  • 内核平面饱和 (Datapath Saturation) Datapath Saturation 监控 eBPF Map Pressure。当策略哈希表接近 100% 时,预警数据平面的处理能力达到上限。 Monitor eBPF Map Pressure. Alert when policy hash tables near 100% capacity to prevent bottlenecks.
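要把上述信号落到监控面板上,常见做法是开启 Hubble 指标导出(以下 Helm 值为常见组合,指标集合按需取舍): To land these signals on dashboards, a common approach is enabling Hubble metrics export (a typical combination of Helm values; trim the metric set as needed):

helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"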

数字化取证:解析 Hubble Flow 的“身份指纹”

Forensics: Parsing the Identity Fingerprint

传统的 tcpdump 只能看到 IP 地址。而在 Hubble 的视野中,流量记录是一份带有身份的可审计凭证:

tcpdump sees IPs; Hubble sees Identity. Flow data becomes an auditable, identity-aware credential:

// 场景:来自联盟的 X-Wing 战机试图在 Death Star 着陆
// Scenario: Alliance X-Wing attempting to land on the Death Star
{
  "source":      { "pod_name": "xwing",     "identity": 36770 },
  "destination": { "pod_name": "deathstar", "identity": 15153 },
  "verdict": "FORWARDED",
  "summary": "TCP Flags: SYN",
  "l7": { "http": { "method": "POST", "url": "/v1/request-landing" } }
}
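这类流记录也可以用 Hubble CLI 直接过滤(以下为常用选项示意,命名空间 default 为假设): Flows like this can also be filtered directly with the Hubble CLI (common options shown; the default namespace is an assumption):

# 按源/目的 Pod 过滤并输出 JSON / filter by source and destination Pod, output JSON
hubble observe --pod default/xwing --to-pod default/deathstar -o jsonpb
# 只看最近被丢弃的流量 / show only recently dropped flows
hubble observe --verdict DROPPED --last 20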
服务依赖拓扑Service Dependency Topology

基于身份识别的稳定架构图

Identity-Aware Stable Topology Map

网络监控告警Network Monitoring and Alerting

精准定位 DNS/L4/L7 故障

Precise Diagnostics for DNS/L4/L7 Failures

应用性能监控Application Performance Monitoring

0 侵入提取黄金信号指标

Zero-Instrumentation Golden Signal Extraction

安全可观测性Security Observability

拦截原因与横向移动追踪

Drop Reason & Lateral Movement Tracking

💡 技术总结:从“埋点”到“天眼”的跨越

💡 Tech Summary: From Instrumentation to Omniscience

Hubble 的最大优势在于其透明性。传统 SRE 监控通常需要开发者在代码中埋点或部署代理,而 Hubble 利用 eBPF 直接从 Linux 内核获取信号,实现了近乎零开销且 100% 覆盖的可观测性。这是实现 MTTR(平均故障恢复时间)缩短 40% 的核心驱动力。

Hubble's greatest strength is its transparency. Traditional SRE monitoring requires manual instrumentation or sidecars; Hubble leverages eBPF to pull signals directly from the kernel, achieving near-zero overhead and 100% coverage. This transparency is the key driver in reducing MTTR by 40%.

09. 技术前沿:Cilium 开启 AI 训练加速时代

09. Tech Frontier: Cilium Accelerating the AI Era

AI 革命已经到来。从训练大规模语言模型到提供实时推理,AI 工作负载对基础设施提出了前所未有的吞吐量与延迟要求。Cilium 在 2025 年迎来了发布十周年,正通过 eBPF 技术重新定义 AI 训练的网络底座。

The AI revolution is here. From training LLMs to real-time inference, AI workloads demand unprecedented throughput and low latency. Celebrating its 10th anniversary in 2025, Cilium is redefining AI networking foundations via eBPF.

一、 Cilium 最新技术进展 (v1.17 - v1.18)

I. Latest Technological Progress (v1.17 - v1.18)

高性能 Netkit 设备模型High-Perf Netkit Device Model

这是最具突破性的进展。Netkit 旨在绕过传统的 Linux Bridge 和 veth pair,为容器提供接近零开销的网络性能,使容器网络能够以接近宿主机的原生速度运行。

A breakthrough model. Netkit bypasses legacy Linux bridges and veth pairs, delivering near-zero overhead and host-native speeds for containers.

BGP 控制平面 v2 与路由聚合BGP Control Plane v2 & Aggregation

通过路由聚合(Route Aggregation),将分散的 Pod IP 汇总为大前缀,防止物理交换机耗尽 TCAM 内存,极大减少因 Pod 频繁扩缩容导致的路由震荡。

Implements Route Aggregation to prevent TCAM exhaustion on physical switches, mitigating route flapping caused by high-churn Pod scaling.

IPv6 优先与 BIG TCP 支持IPv6-First & BIG TCP Support

利用 BIG TCP 技术,内核能处理超大报文包头,使 IPv6 吞吐量提升 40-50%,同时显著降低高带宽应用下的 CPU 使用率。

With BIG TCP, the kernel handles massive packet headers, boosting IPv6 throughput by 40-50% while reducing CPU load.
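下面给出按需开启上述特性的 Helm 值示意。注意:这些参数名是基于近期版本的假设,Netkit、BGP v2 与 BIG TCP 的开关请以所用版本的官方文档为准: A sketch of Helm values for opting into the features above. Note: these value names are assumptions based on recent releases; confirm the Netkit, BGP v2, and BIG TCP switches against your version's documentation:

helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
  --set bpf.datapathMode=netkit \
  --set bgpControlPlane.enabled=true \
  --set enableIPv6BIGTCP=true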

💡 类比:算力工厂的“磁悬浮”进化

💡 Analogy: The "Maglev" Evolution of Compute Factories

如果把 AI 训练集群比作一个超高速自动化工厂,那么传统的网络路径就像是工厂里拥挤的窄道和需要人工签字的关卡(iptables 线性审查)。

Cilium 的进化: 就像给工厂铺设了磁悬浮轨道 (Netkit),并将关卡升级为人脸识别感应门 (eBPF O(1))。BIG TCP 则是将小货车升级为重型卡车,让 GPU 算力不再因为等待数据(原材料)而停工。

Think of an AI cluster as a high-speed factory. Legacy paths are narrow aisles with manual checkpoints (iptables).

Cilium's Evolution: Installs magnetic levitation tracks (Netkit) and replaces checkpoints with Face-ID gates (eBPF O(1)). BIG TCP upgrades small vans to heavy trucks, ensuring GPUs never stall waiting for data.

O(1) 消除抖动Zero Jitter
72x CPU 开销缩减CPU Reduction
+50% 吞吐量红利Throughput Gain

通往 2026:内核即服务

Towards 2026: Kernel-as-a-Service

Cilium 不仅仅是一个 CNI 插件,它是 AI 时代的神经中枢。通过将复杂的逻辑从应用层下沉至内核层,它为算力基础设施提供了一层透明、高性能且不可逾越的安全屏障。

Cilium is more than a CNI; it is the central nervous system for the AI era. By moving complexity from the app to the kernel, it provides a transparent, high-performance security shield for compute infra.

10. Cluster Mesh:跨越边界的“大一统”网络

10. Cluster Mesh: Unified Networking Across Boundaries

多集群架构已成为故障隔离、地理分布和扩容的必然选择。但随之而来的网络复杂性,如跨集群发现、策略同步和负载均衡,正成为新的架构瓶颈。

Multi-cluster is the choice for isolation and scale. However, networking complexities like cross-cluster discovery and policy sync have become the new architectural bottleneck.

🏝️

传统模式:孤立的群岛

Legacy: Isolated Islands

每个集群都是一座孤岛。跨集群通信必须依赖繁琐的 Ingress 网关或复杂的 VPN 隧道。这不仅增加了延迟,更让全局安全策略的统一下发成为幻想。

Each cluster is an island. Cross-cluster talk relies on heavy gateways or VPNs, increasing latency and making global security policies nearly impossible to manage.

🗺️

Cluster Mesh:联合的大陆

Cilium: The United Continent

打破集群间的物理边界。只要运行 Cilium,多个集群就能融合成一个巨大的逻辑网络。Pod 之间可以像在同一个数据中心一样直接通信,无论它们位于何处。

Breaking boundaries. Any cluster running Cilium joins a unified logical network. Pods communicate directly as if in the same DC, regardless of location.

🛡️ 高可用与容错

High Availability & Failover

支持多区域、多可用区部署。一旦某个集群因升级或故障不可用,流量会自动故障转移至其他集群,确保服务永不掉线。

Supports multi-region/AZ ops. If one cluster goes offline for upgrades or failure, it enables failover to others, ensuring zero downtime.

🔍 透明服务发现

Transparent Service Discovery

自动合并不同集群中同名、同空间的 Service 为全局服务 (Global Service)。应用无需感知集群位置,即可发现并调用目标服务(声明方式见本组特性之后的示例)。

Automatically merges identically named Services across clusters into a Global Service. Apps discover endpoints irrespective of where they reside (see the sketch after this feature list).

⚡ 原生 IP 路由性能

Native IP Routing Performance

无需网关或代理,直接利用隧道或物理对等协议处理跨集群 Pod 通讯,实现接近原生的网络转发效率。

Handles cross-cluster Pod routing at native performance via tunneling or direct-routing, circumventing gateways or proxies.

🔐 统一策略执行

Uniform Policy Enforcement

将 Cilium 的 L3-L7 安全策略扩展至整个网格,确保无论集群数量多少,安全准则始终保持一致。

Extends L3-L7 policies across the entire mesh, ensuring a consistent security approach irrespective of the number of clusters.
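上文“透明服务发现”提到的全局服务,大致通过如下注解声明(Service 名称与端口为假设值): The Global Service mentioned under "Transparent Service Discovery" above is declared roughly like this (Service name and ports are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: payment-api
  annotations:
    service.cilium.io/global: "true"   # 在所有已互联集群中合并同名服务 / merged across all connected clusters
spec:
  selector:
    app: payment-api
  ports:
  - port: 80
    targetPort: 8080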

实战背书:为什么架构师信任 Cluster Mesh?

Proven Value: Why Architects Trust Cluster Mesh?

“Cluster Mesh 让我们实现了跨本地数据中心和 AWS 的统一网络体验。应用间通信不再需要经过 Ingress,且具备了更强的灾备韧性。”

"Cluster Mesh provides a consistent networking experience across our DCs and AWS. Apps communicate directly without Ingress, creating disruption tolerance."

— Matheus Morais, IT Infrastructure Analyst, Sicredi
50%

存储成本降低 (Ecco)

Storage Cost Reduction

33%

关键操作延迟降低 (Ecco)

Latency Reduction

50k+

每秒请求处理 (Wildlife)

RPS in Production

No.1

CNCF 最成熟多集群技术

Most Mature Technology (CNCF)

11. 行业实战:为何全球巨头纷纷转向 Cilium?

11. Industry Stories: Why Giants are Turning to Cilium?

从 OpenAI 的训练集群到 Google Cloud 的原生集成,Cilium 正在成为支撑全球最顶级算力与云架构的“隐形神经系统”。

From OpenAI's massive training clusters to native Google Cloud integration, Cilium is becoming the "invisible nervous system" for the world's most advanced compute.

OpenAI: 支撑 AI 训练的神经系统 OpenAI: The AI Training Nervous System

在训练 GPT-4 等超大模型时,上万个 GPU 节点需要频繁进行 All-Reduce 梯度同步。哪怕网络中存在 1 毫秒的抖动,都会引发集群级的同步等待,导致数百万美元的算力浪费。

Training LLMs like GPT-4 requires frequent All-Reduce gradient synchronization across 10,000+ GPUs. Even a 1ms jitter can trigger cluster-wide stalls, resulting in millions of dollars of compute waste.

  • 极致带宽优化Bandwidth Optimization

    通过 eBPF 接管数据面,优化训练网络吞吐,支持 RoCE/RDMA 辅助加速。

    Uses eBPF to optimize datapath throughput, supporting RoCE/RDMA acceleration.

  • 全量拓扑透视Topology Visibility

    Hubble 提供了模型训练任务的精确流量拓扑,让 SRE 能够瞬间定位网络瓶颈。

    Hubble provides precise flow topology for training jobs, allowing SREs to pinpoint bottlenecks instantly.

架构师点评Architect's Insight

对于 OpenAI 来说,Cilium 不仅是一个网络组件,更是一个性能放大器,确保了极其昂贵的 GPU 算力不被传统协议栈阻塞。

For OpenAI, Cilium is more than networking—it's a performance multiplier that prevents expensive GPUs from being choked by legacy stacks.

Microsoft: Azure 的终极网络选型 Microsoft: Azure's Ultimate Selection

Microsoft 意识到大规模 VNET 的可观测性是企业级客户的核心痛点。Azure CNI Powered by Cilium 实现了公有云性能与原生 Kubernetes 体验的深度融合。

Microsoft recognized that observability for large-scale VNETs is a critical pain point. Azure CNI Powered by Cilium achieves seamless fusion of cloud performance and native K8s experience.

  • 原生性能执行Native Execution

    绕过传统网桥模式,性能比传统 Overlay 模式提升 30% 以上。

    Bypassing legacy bridges to deliver 30%+ better performance than traditional overlay modes.

  • 透明度革命Visibility Revolution

    云服务商无需在客户 Pod 中注入 Sidecar 即可提供详尽的 L7 流量视图。

    Cloud providers offer deep L7 views without injecting sidecars into customer pods.

架构师点评Architect's Insight

公有云巨头的“All-in”证明了基于 eBPF 的 Cilium 已经从“极客工具”进化为下一代云网络的工业标准

The "All-in" move by hyperscalers proves Cilium has evolved from a geek tool to the industrial standard for next-gen cloud networking.

Google Cloud: GKE Dataplane V2 Google Cloud: GKE Dataplane V2

Google 将 Cilium 作为 GKE 的默认数据平面(V2),核心动力源于对 FIPS 合规与内核级透明加密 的极致追求。

Google made Cilium the default for GKE Dataplane V2, driven by the need for FIPS compliance and kernel-level transparent encryption.

  • 硬核安全共识Hard-core Security

    验证了 eBPF 在多租户环境下实现“身份指纹”识别与强力访问控制的工业可靠性。

    Validated eBPF's reliability for identity fingerprinting and strong access control in multi-tenant environments.

架构师点评Architect's Insight

当所有云巨头都采用同一技术底座时,生态兼容性跨云统一安全策略就是架构师能拿到的最大红利。

When all major clouds adopt the same foundation, ecosystem compatibility and unified cross-cloud policy become the ultimate architect ROI.

12. Cilium 互动实验室:零起点架构演练

12. Cilium Interactive Lab: Zero-Start Architecture

基于真实的生产部署手册,以下按阶段展示基础设施的“内核级”演进过程。

Based on real production guides, the phases below walk through the "kernel-level" evolution of the infrastructure.

实验初始状态 (Initial lab state):集群未初始化 (Cluster: uninitialized) · 网络数据面为标准内核栈 (Datapath: standard kernel stack) · BGP 路由宣告已禁用 (BGP advertising: disabled) · Hubble 观测引擎离线 (Hubble engine: offline)。

13. 落地路径建议

13. Implementation Path Recommendation

如何平滑地开启 Cilium 之旅,以及在 eBPF “黑盒”面前保持冷静的专家级工具箱。

How to ensure a smooth transition to Cilium and maintain control with an expert toolbox for the eBPF "black box."

1. 兼容性矩阵 (Compatibility)

1. Compatibility Matrix

维度 Dimension          推荐配置 Requirement
内核版本 Kernel          5.10+(最优性能 / optimal performance)
云平台 Cloud             AWS、Azure、GCP、On-prem/Bare-metal
运行模式 Mode            Strict(完全替换 kube-proxy / full KPR)
底层网络 Underlay        原生路由 (BGP) 或 VXLAN / direct routing (BGP) or VXLAN

💡 提示: 虽然 4.19 内核可运行,但 5.10+ 才能解锁大规模 $O(1)$ 查找和 Maglev 的完整威力。

💡 Tip: While 4.19 is supported, 5.10+ is required to unlock full $O(1)$ lookup and Maglev performance.

2. 迁移路径 (Migration Path)

2. Migration Strategy

1
基础底座:Kube-proxy 替换测试KPR Replacement Test

在非生产环境验证 kubeProxyReplacement=strict。通过移除 iptables 规则链,验证控制平面稳定性及 LB 响应(验证命令示例见本节三个步骤之后)。

Validate kubeProxyReplacement=strict in dev. Verify control plane stability by removing the iptables overhead (see the verification sketch after these three steps).

2
观测先行:灰度 L7 策略与画像Observability First: L7 Profiling

开启 Hubble 观测模式。在强制执行策略前,先进行“流量画像”,识别所有跨 Namespace 的 API 调用。

Enable Hubble in Audit Mode. Profile traffic to identify cross-namespace API calls before enforcing rules.

3
纵深防御:部署 Tetragon 集成Defense in Depth: Tetragon

从网络安全扩展到运行时安全。利用 Tetragon 强化敏感文件访问监控及内核级系统调用审计。

Extend security from network to runtime. Use Tetragon for sensitive file access and syscall auditing.
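针对步骤 1 的验证,可参考下面的命令示意(命令来自 Cilium CLI 与 Agent 调试工具,输出字段以实际版本为准): For verifying Step 1, a command sketch like the following can help (commands from the Cilium CLI and the agent debug tool; output fields depend on your version):

# 整体健康检查 / overall health check via cilium-cli
cilium status --wait
# 确认 eBPF 已接管 kube-proxy 职责 / confirm eBPF has taken over kube-proxy duties
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# 查看 eBPF 负载均衡表中的 Service 条目 / inspect Service entries in the eBPF LB table
kubectl -n kube-system exec ds/cilium -- cilium-dbg service list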

3. SRE 核心建议 (Best Practices)

3. SRE Best Practices

全栈可见性原则Full-stack Visibility

始终开启 Hubble。没有 Hubble 的 Cilium 就像盲目驾驶;它能缩短 40% 的平均故障处理时间 (MTTR)。

Always enable Hubble. Observability reduces MTTR by 40% in large-scale environments.

身份识别优先 (Identity-First)Identity-First Policy

彻底抛弃基于 IP 的防火墙思维。优先使用 K8s Labels/Identity 定义策略,确保策略随 Pod 漂移而自动跟随。

Ditch IP-based logic. Use Identity-aware policies to ensure rules follow workloads, not endpoints.

内核级加密 (Encryption)Kernel-level Encryption

利用内置的 WireGuard 加密保护东西向流量。相比 Sidecar 方案,这能减少 50% 以上的 CPU 加密损耗。

Use WireGuard for internal traffic. It cuts encryption CPU overhead by over 50% compared to sidecars.

🎯 总结:Cilium 选型决策准则

🎯 Summary: Architect's Decision Matrix

何时必须选择 Cilium? When to choose Cilium?
  • 节点数超过 500 或 Service 规则超过 5,000 条。
  • Nodes > 500 or Service rules > 5,000.
  • 追求 Sidecar-less 架构以节省 30% 以上的资源损耗。
  • Aiming for Sidecar-less Mesh to save 30% resources.
  • AI/ML 训练需要极低的网络抖动 (Jitter)。
  • AI/ML training requiring ultra-low Jitter.
如何确保落地成功? How to ensure success?
  • 内核版本锁定在 5.10+,开启 Strict 模式。
  • Lock Kernel at 5.10+; use Strict mode.
  • 先开启 Hubble 审计模式,不带策略运行两周。
  • Run Hubble in Audit mode for 2 weeks first.
  • 利用 Cluster Mesh 实现跨云的灾备韧性。
  • Leverage Cluster Mesh for multi-cloud resilience.
