Tetragon: The Final Line of Defense for Cloud-Native Security
In modern IT architectures, perimeter firewalls are no longer enough to counter threats. Tetragon uses eBPF to push security enforcement deep into the Linux kernel, enabling sub-millisecond, real-time blocking.
After reading this white paper, you will understand how eBPF can cut defense latency from minutes to microseconds, and you will have a practical method for achieving a "Zero Trust Runtime" without touching application code. This article was written by Rao Weibo with the help of Gemini.
[ Case Zero: Establishing Urgency ]
When Defense Fails: Dark-Chain Crisis Simulation
You are a SecOps lead. A month ago your company launched a new online customer engagement platform, "My Dear Customer". Thanks to an excellent user experience, the platform now has 1 million daily active users, and the company is celebrating its booming business.
However, a carefully crafted vulnerability lurks in a hidden corner. Just as the company prepares a champagne party to celebrate passing 5 million cumulative users, the XZ Utils supply chain backdoor is activated by an attack group codenamed "Dark-Chain".
Their goal is clear: encrypt the core data of your 5 million users, extort the company, and auction the data on the dark web. Meanwhile, your traditional firewalls and scanners remain eerily silent...
Dark-Chain Attack Landscape (MITRE ATT&CK Path)
Why Do Traditional Defenses Fail Against Dark-Chain?
| Stage | MITRE Technique | Attack Description | Traditional Blind Spot (Why It Fails) |
|---|---|---|---|
| Initial Access | T1195.002 | Malicious code runs in memory via dlopen; the backdoor hides inside legitimate encrypted streams. | WAF/Firewall: cannot decrypt commands carried inside legitimate TLS/SSH sessions. |
| Privilege Escalation & Persistence | T1014 | Modifies the syscall table to dynamically hide malicious processes from ps/top. | Host IDS: sees "fake security" when the kernel itself has been tampered with. |
| Credential Access | T1003.008 | A privileged process "legitimately" reads /etc/shadow or in-memory session keys. | RBAC model: permissions look normal; audit logs cannot distinguish a "normal backup" from a "malicious dump". |
| Defense Evasion | T1070 | Erases auth.log records in real time and uses a rootkit to intercept log writes. | Log audit (SIEM): can only analyze "records that exist" and cannot perceive the "erased gaps". |
| Lateral Movement | T1021 | Uses valid credentials to build encrypted tunnels and pivot between internal security zones. | Micro-segmentation: lacks fine-grained behavioral recognition for lateral movement under a legitimately obtained identity. |
The core idea of runtime security: do not judge a workload by "what it looks like" (static signatures), but by "what it is doing" (runtime behavior).
[ 1/4: Building Consensus ]
What is Runtime Security?
Runtime security means monitoring and protecting applications while they execute. It focuses on "what is happening right now" rather than on what the code looked like at build time.
Even if image scanning and static configuration analysis (Shift Left) are perfect, they cannot cover 0-day exploits, supply chain contamination, or runtime privilege escalation. Runtime security provides "Shield Right", building the final line of defense at the moment malicious behavior occurs.
Shift Left: move security checks forward into the development stage. Code scanning and image auditing find known defects before they ship.
Shield Right: build real-time barriers in production to handle unknown threats (0-days) or insider damage that slip past development-stage checks.
Process: track the full lifecycle of each binary. Who started the program? Did it spawn an unauthorized child process or shell?
File: monitor critical paths. Has a sensitive file such as /etc/shadow been tampered with or read without authorization?
Network: sense connection intent. Did the process open an unexpected outbound connection or try to reach a malicious C2 domain?
Privilege: identify privilege abuse. Did an ordinary user process suddenly become root? Did an abnormal capability change occur?
[ 2/4: Tool Deconstruction ]
What is Tetragon?
Tetragon is a core project in the Cilium open-source community that uses eBPF to provide deep, kernel-level security observability. Unlike traditional security tools, it is aware of Kubernetes identities (Pod, Namespace, Labels) and can block malicious behavior in kernel space at the moment it occurs. Traditional security relies on user-space agents, which are easy to bypass. Tetragon attaches to core points on the kernel execution path (such as LSM hooks or syscall entry), providing "unbypassable" audit consistency.
Remember this: Tetragon does not just "see" threats; the kernel forcibly interrupts the malicious operation before it completes.
🔍 Core Pain Point: Why Is Kernel Observability Alone Not Enough?
Traditional kernel security tools (such as auditd) suffer from a fatal "context gap": the kernel only knows PID 2550 or hexadecimal memory addresses. In cloud-native environments, IPs drift and PIDs are quickly recycled. If a security log only tells you "PID 2550 deleted a file", you cannot tell which workload Pod it came from.
Tetragon's solution: at the moment the kernel collects the data, it uses BPF maps to translate these dry kernel IDs into Kubernetes identity labels.
Why is Identity-Awareness Crucial?
In cloud-native environments, IP addresses are ephemeral and PIDs may collide across containers. Tetragon's core strength is real-time context awareness.
- task_struct association: retrieves cgroup information directly from the kernel's process structure.
- Zero-overhead mapping: uses eBPF maps to map cgroup IDs to Kubernetes Pod labels within milliseconds.
- Audit consistency: logs carry Namespace/ServiceAccount fields, so no complex after-the-fact Trace ID correlation in a log platform is needed.
PID: 2550
SYS_CALL: connect()
DEST: 185.151.x.x
POD: payment-api-v2-7f8c
NAMESPACE: production
SERVICE_ACCOUNT: payment-service
LABELS: env=prod, app=payment, version=2.1
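The contrast above (a bare PID and syscall versus the same event with Pod, Namespace, and ServiceAccount) can be sketched as a lookup keyed by cgroup ID. This is a minimal user-space illustration, not Tetragon's implementation: the `CGROUP_TO_POD` table, the `cgroup_id` value 4242, and the field names are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict


@dataclass
class PodIdentity:
    pod: str
    namespace: str
    service_account: str
    labels: dict


# Hypothetical stand-in for the cgroup-ID -> Pod identity table.
CGROUP_TO_POD = {
    4242: PodIdentity(
        pod="payment-api-v2-7f8c",
        namespace="production",
        service_account="payment-service",
        labels={"env": "prod", "app": "payment", "version": "2.1"},
    ),
}


def enrich(raw_event: dict, table: dict) -> dict:
    """Return a copy of the raw kernel event with identity fields attached."""
    enriched = dict(raw_event)
    identity = table.get(raw_event.get("cgroup_id"))
    if identity is None:
        enriched["pod"] = None  # host process or unknown workload
    else:
        enriched.update(asdict(identity))
    return enriched


raw = {"pid": 2550, "syscall": "connect", "dest": "185.151.x.x", "cgroup_id": 4242}
event = enrich(raw, CGROUP_TO_POD)
print(event)
```

The design point is that the join happens once, at collection time, so every downstream consumer sees the identity without re-deriving it from PIDs or IPs that may already have been recycled.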
The Output: Identity-Aware Structured Logs
Identity awareness is more than visual decoration; it shows up in every audit log entry. Tetragon automatically combines kernel syscall data with Kubernetes metadata, removing the pain of manual after-the-fact correlation.
No IP-to-Pod mapping is needed: logs carry Namespace and Label metadata.
Execution paths are identified precisely and, combined with an SBOM, help ensure that only trusted programs run.
exec_id traces parent-child process relationships to reconstruct the complete attack chain.
TracingPolicy is a CRD (Custom Resource Definition) that defines the following key elements:
- Hook points: specific syscalls or kernel functions and their arguments. For example, monitoring execve to detect shell launches.
- Selectors [New]: deep matching on syscall arguments. For example, monitoring only writes to files ending in .conf under /etc/, or matching specific environment variables.
- Actions: record an observation, send a signal (such as SIGKILL), or override the return value (so that a malicious write does not take effect).
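To make the hook point, selector, and action elements concrete, here is a sketch that builds a TracingPolicy manifest as a Python dictionary and prints it as JSON (which kubectl also accepts). The field names (`kprobes`, `call`, `args`, `selectors`, `matchArgs`, `matchActions`) and the `cilium.io/v1alpha1` API version follow the Tetragon documentation as best understood here; treat them as assumptions and check them against the version you deploy. The policy name is hypothetical.

```python
import json


def shadow_read_policy(name: str = "watch-etc-shadow") -> dict:
    """Build an observe-only policy for file access under /etc/shadow."""
    return {
        "apiVersion": "cilium.io/v1alpha1",
        "kind": "TracingPolicy",
        "metadata": {"name": name},
        "spec": {
            "kprobes": [{
                # Hook point: kernel function consulted on file read/write.
                "call": "security_file_permission",
                "syscall": False,
                "args": [
                    {"index": 0, "type": "file"},
                    {"index": 1, "type": "int"},
                ],
                "selectors": [{
                    # Selector: only events whose file path matches.
                    "matchArgs": [{
                        "index": 0,
                        "operator": "Prefix",
                        "values": ["/etc/shadow"],
                    }],
                    # Action: emit an event; no enforcement in this sketch.
                    "matchActions": [{"action": "Post"}],
                }],
            }],
        },
    }


manifest = json.dumps(shadow_read_policy(), indent=2)
print(manifest)
```

Starting with an observe-only action is a deliberate choice: it shows what the selector would match before any enforcement action is attached.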
Scenario Correlation:
At this point the attacker is trying to run cat /etc/shadow. Without a TracingPolicy, this action would be buried in a flood of system logs. Tetragon's value is that it pinpoints this exact "moment".
Expert Perspective:
Tetragon uses kprobes and tracepoints to intercept kernel events, then passes kernel data efficiently through BPF maps to the user-space Tetragon agent, which exports it as JSON logs. This architecture captures and handles threats at microsecond granularity without noticeable jitter for the workload.
Traditional Security: Camera
Earlier security tools (such as log analysis) are like cameras. They record the thief in the act, but by the time you watch the footage, the thief is long gone and the assets are lost. This is "asynchronous detection".
Sidecar Mode: Security Gate
Sidecar proxies are like a security gate at every room's door. They can block, but everyone has to queue to get in or out, which congests the building (the server), degrades performance, and can still be bypassed.
Tetragon: Smart Immunity
Tetragon is like immune cells stationed directly in the bloodstream (the kernel). Once a virus (malicious process) is recognized, it engulfs (kills) the virus without waiting for instructions from the brain. This is "synchronous enforcement".
A shortcut to understanding Tetragon: it is a "logic decider embedded in the kernel". It combines Kubernetes "identity" with syscall "arguments" into a Boolean condition: only "true (allow)" may execute, while "false (deny)" leads to immediate termination.
Kernel-level Observability
Leveraging eBPF (extended Berkeley Packet Filter), Tetragon can safely and dynamically trace kernel functions (kprobes, tracepoints) without modifying kernel source or loading kernel modules.
Zero-Overhead Data Context
Traditional eBPF tools output only PID/IP. Tetragon maintains an in-kernel state table that maps low-level syscalls to high-level Kubernetes metadata such as Pod, Namespace, and Service.
In-Kernel Enforcement
Tetragon is more than an "observer". It uses bpf_send_signal to terminate a process before the malicious syscall completes, which closes the window for TOCTOU attacks.
TOCTOU: Time-of-Check to Time-of-Use. In the gap between a security check and the actual use, an attacker replaces the resource that passed the check (for example, a file link), thereby bypassing user-space security tools.
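The TOCTOU gap is easy to reproduce in user space. The sketch below is a deterministic simulation on a POSIX system, using temporary files with made-up names: a guard resolves a symlink and approves it, the "attacker" swaps the link, and the later open reads different content than what was checked. It illustrates the race itself, not any Tetragon code path.

```python
import os
import tempfile


def demo_toctou() -> tuple:
    """Return (name approved at check time, content read at use time)."""
    with tempfile.TemporaryDirectory() as root:
        safe = os.path.join(root, "report.txt")
        secret = os.path.join(root, "shadow.txt")
        link = os.path.join(root, "upload")
        with open(safe, "w") as f:
            f.write("public report")
        with open(secret, "w") as f:
            f.write("root:$6$example-hash")
        os.symlink(safe, link)

        # Time of check: a user-space guard resolves the path and approves it.
        checked_target = os.path.realpath(link)

        # Race window: the attacker swaps the link after the check.
        os.remove(link)
        os.symlink(secret, link)

        # Time of use: the same path now leads to different content.
        with open(link) as f:
            used_content = f.read()
        return os.path.basename(checked_target), used_content


checked, content = demo_toctou()
print(f"checked {checked!r}, but read {content!r}")
```

A check performed at the point where the kernel has already resolved the object, rather than on a path string beforehand, does not have this window.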
[ 3/4: Deep Dive ]
Technical Architecture: Tetragon's Runtime Logic
Tetragon uses kprobes for deep observation and kernel LSM BPF for unbypassable enforcement.
Data flows from a user-space request into kernel hooks, where the policy engine's decision triggers a real-time SIGKILL. This loop completes within microseconds without passing through complex application stacks.
Architecture: Smart Sensors & Command Center
Isovalent Tetragon is an eBPF-based security observability tool designed for Kubernetes. Its architecture follows a kernel-first principle and separates the data plane from the control plane, pushing the center of gravity of defense down into the Linux kernel to achieve sub-millisecond response. The layered model works as follows:
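Kernel: Smart Sensors
Tetragon's eBPF programs are embedded directly in the kernel execution path as "built-in kernel guards", following the kernel-first principle to achieve sub-millisecond defense.
- Efficient filtering: since 90% of filtering happens in kernel space, the sensors use kprobes (monitoring execve), tracepoints (capturing scheduler events), and LSM BPF (blocking sensitive file access) to observe the system directly, pushing only policy-matching "high-value signals" to user space. This avoids expensive kernel/user boundary crossings.
- Localized policy: security policies (TracingPolicy) are loaded into in-kernel BPF maps. Kernel programs use these atomic state tables to hold transient state, preserving audit consistency even under very heavy load and making policy decisions within milliseconds.
- Immediate enforcement: this is true "synchronous enforcement". Using the bpf_send_signal helper, SIGKILL is raised in kernel context before a malicious syscall (such as an illegal file write or privilege escalation) completes, which shuts out TOCTOU (time-of-check to time-of-use) attacks that exploit the timing gap.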
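User: Command Center
The Tetragon agent typically runs in user space as a DaemonSet and acts as the brain, handling complex logic, metadata orchestration, and ecosystem integration.
- Identity association: subscribes to the Kubernetes API in real time and uses cgroup information from the kernel to map low-level PIDs to Pod labels, Namespaces, and service identities within milliseconds, so security events keep a clear identity context even on dynamic infrastructure.
- Policy orchestration: translates declarative YAML policies into kernel bytecode. Before the programs run, the eBPF verifier performs strict static safety analysis to ensure memory safety, no infinite loops, and no kernel panics, which protects the stability of production.
- Structured output: exports telemetry via gRPC streams, SIEM-friendly JSON logs (with full arguments, Pod labels, and container image fingerprints), or Prometheus metrics, so that higher-level behavioral analysis can generate high-priority alerts.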
Kernel Protection Rings
* The deeper the kernel layer, the stronger the unbypassability of the defense.
bpf_send_signal: this is the key to Tetragon's real-time enforcement. When the kernel detects a policy violation, the eBPF program calls this helper. Rather than merely queuing an asynchronous notification, it ensures the signal is attached to the process's task structure (task_struct) before the process returns to user space to execute its next instruction. This synchronous property leaves no "execution gap".
Moment of Decision:
Thanks to this mechanism, Dark-Chain's command to delete auth.log never reached the file system before the kernel "pulled the plug" (SIGKILL) from below.
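What "pulling the plug" looks like from the outside can be shown with a user-space analogy. The sketch below (POSIX only) starts a child process, sends it SIGKILL with `os.kill`, and inspects the exit status. It is not `bpf_send_signal`, which runs inside an eBPF program; it only shows that SIGKILL cannot be caught and that the supervisor sees a signal exit.

```python
import os
import signal
import subprocess
import sys


def kill_child() -> int:
    """Start a sleeping child, SIGKILL it, and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # SIGKILL cannot be caught, blocked, or ignored by the target process.
    os.kill(child.pid, signal.SIGKILL)
    return child.wait(timeout=10)


code = kill_child()
# On POSIX, a negative return code -N means "terminated by signal N".
print(f"child exit status: {code}")
```

The difference in the kernel path described above is timing: the signal is raised from the hook that observed the violation, rather than by a separate process reacting afterwards.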
eBPF Verifier: a "strict security check" before entering the kernel. It statically verifies the code before it runs to ensure that the program:
- Never hangs: infinite loops are forbidden, so the program finishes within a predictable time.
- Is memory safe: reads of unauthorized memory regions are forbidden.
- Carries no crash risk: if verification fails, the program is never loaded.
LSM BPF: unlike earlier kprobe-based interception (which can be bypassed via TOCTOU), Tetragon uses LSM BPF. LSM hooks sit at the lowest layer of kernel permission checks, and only requests that pass the LSM check obtain a resource handle. Tetragon attaches at this point, which is what makes its enforcement unbypassable.
Why is Tetragon Safe for Production?
Unlike traditional runtime tools, Tetragon is designed for performance-critical workloads. The eBPF verifier ensures loaded programs can never loop forever or hang the kernel, and in-kernel execution saves a large number of CPU cycles.
- Avoids context switches: filtering logic runs directly in the kernel, and only a small number of events are sent to user space.
- Zero-copy data: probes and the policy engine share BPF maps, avoiding expensive data serialization and copying.
Based on field data from Cisco's internal large-scale Kubernetes clusters (1000+ nodes):
Because all of Tetragon's filtering logic runs in kernel space, only policy-matching "high-value signals" cross the kernel boundary into user space, which keeps the system stable even under highly concurrent I/O.
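The effect of filtering at the source can be sketched with a small simulation. The event shape, the `SENSITIVE_PREFIXES` policy, and the synthetic workload (one sensitive access per thousand events) are all assumptions chosen for illustration; real ratios depend entirely on the workload and the policy.

```python
from typing import Iterable, Iterator

# Hypothetical policy: only these path prefixes are worth reporting.
SENSITIVE_PREFIXES = ("/etc/shadow", "/etc/ld.so.preload")


def kernel_side_filter(events: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for the in-kernel filter: drop everything the policy ignores."""
    for ev in events:
        if ev["path"].startswith(SENSITIVE_PREFIXES):
            yield ev


def synthetic_events(n: int) -> Iterator[dict]:
    """Generate n file-open events; every 1000th touches a sensitive file."""
    for i in range(n):
        path = "/etc/shadow" if i % 1000 == 0 else f"/var/data/file-{i}"
        yield {"pid": 1000 + i, "op": "open", "path": path}


total = 100_000
forwarded = list(kernel_side_filter(synthetic_events(total)))
print(f"{len(forwarded)} of {total} events crossed into user space")
```

Whatever the real numbers are, the cost that matters (the boundary crossing and user-space processing) is paid only for the events that survive the filter.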
Deep Context and Process Ancestry
In security auditing, knowing "who was killed" is not enough; you must also know "where it came from". Tetragon maintains an in-kernel process state table via eBPF, so the full attack path can be traced back even if intermediate parent processes have exited.
Solving the Fragmentation Problem of Traditional Logs
Traditional tools stitch together streams of process events. If an intermediate script exits before the attack happens, the audit log loses the link. Tetragon uses BPF maps to preserve the integrity and non-forgeability of the evidence chain.
- ✅ In-kernel persistence: process information is unaffected by process exit.
- ✅ Identity association: automatically correlated with Kubernetes Pod labels.
- ✅ Non-forgeability: data comes from traversing the kernel's task_struct.
In-Kernel Process State Machine
Why not just track PIDs?
In highly concurrent environments, PIDs are recycled quickly (PID recycling). A security tool that looks only at PIDs will kill the wrong process or attach evidence to the wrong one.
- Unique identity: by hooking sched_process_fork, Tetragon binds the PID to the process start time and cgroup ID, producing a kernel-level globally unique ID.
- State persistence: an eBPF map maintains a table of active processes. Even if an intermediate parent exits, the entry retains its ancestry until the process itself finally exits.
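The two ideas above (a composite identity instead of a bare PID, and ancestry that survives a parent's exit) can be sketched in a few lines. `ProcKey` and `ProcessTable` are hypothetical names for this illustration, and the sketch simplifies retention by never evicting entries; it is not the layout Tetragon uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcKey:
    """Composite identity: the PID alone is not unique over time."""
    pid: int
    start_time_ns: int
    cgroup_id: int


class ProcessTable:
    def __init__(self) -> None:
        self._entries: dict = {}

    def fork(self, key: ProcKey, binary: str, parent: Optional[ProcKey]) -> None:
        self._entries[key] = {"binary": binary, "parent": parent, "exited": False}

    def exit(self, key: ProcKey) -> None:
        # Mark as exited but keep the record so descendants stay traceable.
        self._entries[key]["exited"] = True

    def ancestry(self, key: ProcKey) -> list:
        chain = []
        cur: Optional[ProcKey] = key
        while cur is not None:
            entry = self._entries[cur]
            chain.append(entry["binary"])
            cur = entry["parent"]
        return chain


table = ProcessTable()
sshd = ProcKey(900, 1_000, 7)
script = ProcKey(2550, 2_000, 7)
shell = ProcKey(2551, 3_000, 7)
table.fork(sshd, "/usr/sbin/sshd", None)
table.fork(script, "/tmp/stage1.sh", sshd)
table.fork(shell, "/bin/sh", script)
table.exit(script)                      # intermediate parent is gone

reused = ProcKey(2550, 9_000, 7)        # same PID, later start time
table.fork(reused, "/usr/bin/java", None)
print(table.ancestry(shell))
```

Because the key includes the start time, the recycled PID 2550 gets its own entry and cannot be confused with the earlier script in the attack chain.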
Application Scenarios & Defensive Value
Push security policy down into the kernel to build an unbypassable runtime defense boundary.
- File tampering: monitor write operations on sensitive configuration files such as /etc/passwd and /etc/shadow.
- Reverse shell: monitor connect calls and block outbound connection attempts to addresses that are not on the allowlist.
- Container escape: detect abnormal namespace mount behavior and enforce boundary isolation.
Detect prohibited operational commands such as curl, apt-get, and wget executed inside containers in real time. A behavioral allowlist keeps production "immutable" and supports SOC2/PCI-DSS compliance audits.
Locate network latency bottlenecks by monitoring TCP SYN retransmissions. Enforce "least privilege access" based on Kubernetes identity so that processes can reach only their declared external domains.
After an intrusion, use Tetragon's deep process ancestry tree to retrace the attacker's lateral movement, command execution sequence, and environment variable fingerprints, providing a forensic-grade evidence chain.
[Enhanced] Deep integration with SBOM (Software Bill of Materials): verify that the fingerprints of binaries running in a Pod match the image's SBOM records exactly, to guard against supply chain poisoning.
Using Tetragon's parsing engine, HTTP/DNS/TLS metadata becomes visible at the kernel layer without sidecars. This lets security teams see past the "encrypted black box" and identify hidden command traffic or access to illegitimate C2 domains inside the network.
Tetragon is not just an enforcement tool but a core data source for the modern SOC. It turns raw kernel signals into a structured JSON event stream.
eBPF Signals -> Identity Enrichment -> Structured Telemetry -> Alerting & Visualization
Policy Discovery: Automated Transition from Observation to Defense
Security policies should not depend on manual guesswork. By observing a workload's normal behavior in a test environment, Tetragon can automatically discover and distill security policies.
Complex production environments need more precise defenses, for example allowing only specific Pods to access specific file paths.
[ 2025 Strategic Frontier ]
Frontier Trends: From "Manual Defense" to "Smart Immunity"
In 2025, Tetragon's evolution goes beyond simple hooking and blocking. We are building a self-evolving runtime security hub that eliminates YAML fatigue and performs intelligent noise reduction in kernel space.
Say goodbye to tedious hand-written YAML. Tetragon 2025 introduces an AI-assisted generation engine: describe a security intent in natural language (for example, "protect my payment service from reverse shell attacks") and it is converted automatically into a precise in-kernel TracingPolicy.
A better day for SREs. BPF maps aggregate at the data source, compressing repetitive, high-frequency, harmless events in kernel space and cutting the log volume reported to user space by more than 95%.
The leap from "identity" to "fingerprint". Tetragon extracts the hash of each executed program in kernel space in real time and compares it against the image's SBOM records, so that no "mutated code" from a poisoned supply chain can run.
Tetragon's future lies not only in what it can capture, but in how it reduces the security team's cognitive load. With AI assistance and in-kernel aggregation, we move from being "monitors of massive alert volumes" to "orchestrators of security intent".
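The aggregation idea (collapsing repeated, harmless events into counters before they are reported) can be sketched as follows. The grouping key and the synthetic event mix are assumptions for illustration, and the reduction printed here is a property of this toy input, not a measurement of Tetragon.

```python
from collections import Counter


def aggregate(events: list) -> list:
    """Collapse identical (pod, binary, op) events into one record with a count."""
    counts = Counter((e["pod"], e["binary"], e["op"]) for e in events)
    return [
        {"pod": pod, "binary": binary, "op": op, "count": n}
        for (pod, binary, op), n in counts.items()
    ]


# Toy input: 950 routine file opens plus 50 shell executions.
events = (
    [{"pod": "web-1", "binary": "/usr/sbin/nginx", "op": "open"}] * 950
    + [{"pod": "web-1", "binary": "/bin/sh", "op": "exec"}] * 50
)
summary = aggregate(events)
reduction = 1 - len(summary) / len(events)
print(f"{len(events)} events -> {len(summary)} records ({reduction:.1%} fewer)")
```

The counts are preserved, so frequency information survives the compression even though individual events do not.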
[ 4/4: Hands-on Lab ]
Real-Time Threat Defense Lab
“Hear and you forget; see and you remember; DO and you understand.”
Experience kernel-level threat blocking first-hand. Run the attack simulations in order and observe Tetragon's real-time defensive actions.
Tetragon Deployment & Policy Distribution
Before defense can begin, Tetragon must be deployed into the infrastructure. It supports multiple environments and picks up cloud-native context seamlessly.
Kubernetes deployment: install from the Helm repository. Tetragon runs as a DaemonSet on every node, automatically attaches its eBPF programs, and watches the API server.
Initial Access: Supply Chain Backdoor Interception
Attack principle: IFUNC hijacking. Dark-Chain exploits the XZ backdoor to bypass RSA checks through in-memory hooks during SSH authentication. Because the attack creates no new process, traditional agents cannot see it.
Tetragon response:
Monitor the sshd process for abnormal child processes. As soon as the backdoor tries to spawn a shell, Tetragon detects it in kernel space and immediately issues SIGKILL.
The value of this policy lies not only in blocking but in the "deterministic semantics" of the JSON it produces. Compared with the massive process-tree logs of a traditional EDR, this log entry is explicitly marked SIGKILL, so the SOC team can instantly tell "legitimate administrative operations" from "backdoor behavior triggered by a supply chain attack".
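On the consuming side, triage of such events can be a few lines of code. The sketch below filters JSON lines for a shell executed directly under sshd. The event shape (`process_exec` with `process` and `parent` objects carrying a `binary` field) is a simplified assumption modeled on Tetragon's JSON export, and the sample events are invented for the example.

```python
import json

SHELLS = {"/bin/sh", "/bin/bash", "/usr/bin/bash", "/bin/dash"}


def is_sshd_shell_spawn(event: dict) -> bool:
    """True if the event is an exec of a shell whose parent is sshd."""
    exec_ev = event.get("process_exec")
    if not exec_ev:
        return False
    child = exec_ev.get("process", {}).get("binary", "")
    parent = exec_ev.get("parent", {}).get("binary", "")
    return parent.endswith("/sshd") and child in SHELLS


# Invented sample lines standing in for an exported event stream.
lines = [
    json.dumps({"process_exec": {
        "process": {"binary": "/bin/bash", "pod": {"namespace": "production"}},
        "parent": {"binary": "/usr/sbin/sshd"},
    }}),
    json.dumps({"process_exec": {
        "process": {"binary": "/usr/bin/java"},
        "parent": {"binary": "/usr/bin/containerd-shim"},
    }}),
]
alerts = [e for e in map(json.loads, lines) if is_sshd_shell_spawn(e)]
print(f"{len(alerts)} suspicious exec event(s)")
```

A legitimate interactive SSH login also starts a shell under sshd, so a rule like this only makes sense for workloads where no interactive login is expected, or when combined with the Pod identity carried in the event.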
Persistence: Rootkit Implant Interception
Attack principle: hijacking ld.so.preload.
The attacker tries to modify the system preload path to inject a malicious .so file into every process's address space. This is the key step that makes Dark-Chain invisible.
Tetragon response:
Lock down write access to /etc/ld.so.preload. Any unauthorized process that attempts to open or write the file triggers SIGKILL.
Credential Access: Sensitive File Access Restriction
Attack principle: reading the shadow file. Dark-Chain tries to steal the root password hash for large-scale lateral movement. Even with root privileges, this behavior should be treated as abnormal.
Tetragon response:
Use LSM hooks to apply zero-trust protection to /etc/shadow, allowing only the legitimate passwd or login processes to read it.
For SOC2 or PCI-DSS audits, proving "who cannot read" is harder than proving "who did read". Tetragon's NotIn logic enforces least privilege directly at the kernel level, and its denial records can serve as automated evidence for a compliance control point.
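The NotIn logic amounts to a deny-by-default decision: any binary outside the allowlist that touches the protected file is terminated, and the denial itself becomes the audit record. A minimal sketch follows, with hypothetical binary paths and a made-up record shape.

```python
# Hypothetical allowlist of binaries permitted to read the protected file.
ALLOWED_READERS = frozenset({"/usr/bin/passwd", "/bin/login"})
PROTECTED_PATH = "/etc/shadow"


def decide(binary: str, path: str, uid: int) -> dict:
    """Return a decision record; uid is logged but deliberately not trusted."""
    if path != PROTECTED_PATH or binary in ALLOWED_READERS:
        action = "ALLOW"
    else:
        action = "SIGKILL"  # binary is "NotIn" the allowlist
    return {"binary": binary, "path": path, "uid": uid, "action": action}


for binary in ("/usr/bin/passwd", "/bin/cat"):
    print(decide(binary, "/etc/shadow", uid=0))
```

Note that the decision ignores the uid: in line with the zero-trust stance above, being root does not make `/bin/cat` an acceptable reader of the file.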
Defense Evasion: Real-time Log Erasure Interception
Attack principle: deleting auth.log. To leave no trace, Dark-Chain tries to delete all audit records. Because the deletion takes less than a second, a SIEM often cannot record it in time.
Tetragon response:
Monitor the unlinkat system call at the kernel level. As soon as a deletion targets a system log path, the kernel blocks it immediately and reports evidence of the attempted deletion.
Lateral Movement: Illegal Tunnel Interception
Attack principle: SSH tunneling. The attacker sets up an SSH tunnel to encrypt and exfiltrate the core database. To network traffic analysis (NTA), this looks like ordinary SSH traffic.
Tetragon response: combine context awareness. The SSH process itself is legitimate, but if it initiates an unauthorized connection to the database network segment, Tetragon forcibly terminates the socket connection.
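The context-aware rule described here combines three facts: where the connection is going, which workload is making it, and which binary is making it. A minimal sketch using the standard `ipaddress` module follows; the database CIDR, the authorized client tuple, and the verdict names are all hypothetical.

```python
import ipaddress

# Hypothetical database network segment and the only client allowed into it.
DB_SEGMENT = ipaddress.ip_network("10.20.0.0/24")
AUTHORIZED_CLIENTS = {("production", "payment", "/usr/bin/java")}


def verdict(namespace: str, app: str, binary: str, dest_ip: str) -> str:
    """Decide on an outbound connect() using identity plus destination."""
    if ipaddress.ip_address(dest_ip) not in DB_SEGMENT:
        return "ALLOW"  # destination is outside the protected segment
    if (namespace, app, binary) in AUTHORIZED_CLIENTS:
        return "ALLOW"
    return "BLOCK"


print(verdict("production", "payment", "/usr/bin/java", "10.20.0.15"))
print(verdict("production", "payment", "/usr/bin/ssh", "10.20.0.15"))
```

The second call is the tunnel case: the Pod identity is the same and the traffic looks like ordinary SSH on the wire, but the binary making the connection is not one the policy expects to talk to the database segment.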
Defense Visualization: Cisco XDR & Splunk Operational Dashboard
The sub-millisecond signals from Tetragon not only drive blocking in the kernel but are also fed to the SIEM in real time, giving SecOps a complete view of the threat.
Performance Comparison: Mean Time to Respond (MTTR)
Traditional pipeline: (Collect -> Transfer -> Analyze -> Policy Distribution)
Tetragon: (In-kernel real-time identification and SIGKILL dispatch)
From Observation to Defense: Enterprise Evolution Roadmap
Building cloud-native runtime security does not happen overnight; we recommend a "progressive" defense path.
Deploy Tetragon initially. Enable audit logs for all critical syscalls and send high-quality raw kernel signals to Splunk/Cisco XDR to build a behavior view of all assets.
Enable policy discovery in test/staging environments. By observing the behavior of legitimate processes (such as Java/Nginx), automatically distill a "known-good" golden behavior allowlist.
Roll out SIGKILL enforcement policies. Raise the defense boundary from "detect and alert" to "sub-millisecond, in-kernel enforcement", closing the TOCTOU window.
The Leap: Why Choose Tetragon?
Conclusion & Core Insights
🔑 Key Takeaways
- Sync is safer than async: completing syscall blocking in kernel space is the only way to shut out 0-day escapes.
- Identity over IP: native Kubernetes metadata mapping removes audit blind spots in dynamic infrastructure.
- Simple is more robust than complex: the eBPF verifier ensures that defense logic cannot drag down workload performance.
Using eBPF's high-performance hooks, defense responds at the syscall level within sub-millisecond time, blocking malicious operations before they take effect.
Assume no process is safe. With deep context awareness (user, binary path, namespace), only allowlisted legitimate operations are permitted.
Filtering completes in kernel space, avoiding expensive context switches and giving high-performance production workloads rock-solid protection.
Final thought: if security is not real-time, it is just forensics.
Ready to build your kernel-level defense?