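Preface
I happen to have a day off today, so I have time to write up how iptables works. My earlier exposure to iptables/netfilter came from studying k8s service load balancing, when I looked closely at how it works.
Concepts
Below are the official definitions: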
netfilter is a set of hooks inside the Linux kernel that allows kernel modules to register callback functions with the network stack. A registered callback function is then called back for every packet that traverses the respective hook within the network stack.
iptables is a generic table structure for the definition of rulesets. Each rule within an IP table consists of a number of classifiers (iptables matches) and one connected action (iptables target).
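In other words: netfilter is a set of hooks in kernel space that allows kernel modules to register callback functions with the network protocol stack (for the full definition, see netfilter). iptables is a generic table structure, managed from user space, for defining rulesets (for the full definition, see iptables).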
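To make the "register a callback, get called for every packet" idea concrete, here is a minimal Python sketch. It is a toy model, not kernel code: the `HookRegistry` class, the dictionary-based packets, and the two callbacks are all hypothetical names invented for illustration. The one detail borrowed from the real system is that callbacks carry a numeric priority and lower values run first.

```python
from collections import defaultdict

ACCEPT, DROP = "ACCEPT", "DROP"  # simplified verdicts for this sketch


class HookRegistry:
    """Toy stand-in for the kernel's per-hook callback lists."""

    def __init__(self):
        # hook name -> list of (priority, callback), kept sorted by priority
        self._callbacks = defaultdict(list)

    def register(self, hook, priority, fn):
        self._callbacks[hook].append((priority, fn))
        # Lower priority values run first; the sort is stable for ties.
        self._callbacks[hook].sort(key=lambda item: item[0])

    def traverse(self, hook, packet):
        """Call every callback registered on `hook`; stop at the first DROP."""
        for _priority, fn in self._callbacks[hook]:
            if fn(packet) == DROP:
                return DROP
        return ACCEPT


seen = []


def log_packet(packet):
    seen.append(packet["dst"])
    return ACCEPT


def drop_telnet(packet):
    return DROP if packet.get("dport") == 23 else ACCEPT


registry = HookRegistry()
registry.register("PREROUTING", 0, drop_telnet)
registry.register("PREROUTING", -300, log_packet)  # runs first despite later registration
```

Calling `registry.traverse("PREROUTING", packet)` runs `log_packet` and then `drop_telnet`, which mirrors how a registered callback is invoked for each packet that passes the hook.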
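Relationship between iptables and netfilter
As mentioned above, netfilter defines a set of hooks. More precisely, it defines five hook points (think of them as callback points: when a packet reaches one of these positions, the registered functions are invoked, giving us the chance to change the packet's direction or content while it is being routed). They are PREROUTING, INPUT, OUTPUT, FORWARD, and POSTROUTING. The figure below shows the order in which the five hooks are called:
iptables, as noted, is a user-space application. Through the interface that netfilter exposes, it modifies the XXtables (netfilter's configuration tables) held in kernel memory. These XXtables are made up of tables, chains, and rules, and iptables is the application-layer tool responsible for modifying those rules. The figure below shows how the two fit into the system:
Configuration tables that iptables can operate on
As described above, iptables modifies the configuration tables exposed by netfilter in order to implement packet filtering, packet processing, address masquerading, transparent proxying, dynamic Network Address Translation (NAT), and similar features. So which configuration tables does netfilter expose?
As mentioned, these configuration tables are made up of Tables, Chains, and Rules.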
Tables
The official documentation defines the following five tables.
Filter: This is the default table (if no -t option is passed). It contains the built-in chains INPUT (for packets destined to local sockets), FORWARD (for packets being routed through the box), and OUTPUT (for locally-generated packets).
Nat: This table is consulted when a packet that creates a new connection is encountered. It consists of four built-ins: PREROUTING (for altering packets as soon as they come in), INPUT (for altering packets destined for local sockets), OUTPUT (for altering locally-generated packets before routing), and POSTROUTING (for altering packets as they are about to go out). IPv6 NAT support is available since kernel 3.7.
Mangle: This table is used for specialized packet alteration. Until kernel 2.4.17 it had two built-in chains: PREROUTING (for altering incoming packets before routing) and OUTPUT (for altering locally-generated packets before routing). Since kernel 2.4.18, three other built-in chains are also supported: INPUT (for packets coming into the box itself), FORWARD (for altering packets being routed through the box), and POSTROUTING (for altering packets as they are about to go out).
Raw: This table is used mainly for configuring exemptions from connection tracking in combination with the NOTRACK target. It registers at the netfilter hooks with higher priority and is thus called before ip_conntrack, or any other IP tables. It provides the following built-in chains: PREROUTING (for packets arriving via any network interface) and OUTPUT (for packets generated by local processes).
Security: This table is used for Mandatory Access Control (MAC) networking rules, such as those enabled by the SECMARK and CONNSECMARK targets. Mandatory Access Control is implemented by Linux Security Modules such as SELinux. The security table is called after the filter table, allowing any Discretionary Access Control (DAC) rules in the filter table to take effect before MAC rules. This table provides the following built-in chains: INPUT (for packets coming into the box itself), OUTPUT (for altering locally-generated packets before routing), and FORWARD (for altering packets being routed through the box).
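The built-in chains named in the five definitions above can be collected into one lookup structure. The Python sketch below does only that; the names `BUILTIN_CHAINS` and `tables_with_chain` are made up for this example, and the tables appear in the order they are quoted above, not in the order the kernel evaluates them.

```python
# Built-in chains per table, transcribed from the definitions quoted above.
BUILTIN_CHAINS = {
    "filter": ["INPUT", "FORWARD", "OUTPUT"],
    "nat": ["PREROUTING", "INPUT", "OUTPUT", "POSTROUTING"],
    "mangle": ["PREROUTING", "INPUT", "FORWARD", "OUTPUT", "POSTROUTING"],
    "raw": ["PREROUTING", "OUTPUT"],
    "security": ["INPUT", "OUTPUT", "FORWARD"],
}


def tables_with_chain(chain):
    """Return the tables that have `chain` as a built-in chain (listing order only)."""
    return [table for table, chains in BUILTIN_CHAINS.items() if chain in chains]


print(tables_with_chain("FORWARD"))
print(tables_with_chain("PREROUTING"))
```

Inverting the mapping this way answers the practical question "which tables can hold rules for this chain?", for example that FORWARD rules can live in filter, mangle, and security but not in nat or raw.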
Chains
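Each Table has its own built-in Chains, and each Chain is a list of Rules that can match a set of packets. For details, see the official definitions quoted under Tables above; each entry names that table's built-in chains.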
Rules
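Each Rule specifies what to do with a packet that matches it. This is called the "Target", and it may be a jump to a user-defined Chain in the same table. Each Rule consists of match criteria for a packet and a Target. If the packet does not match, the next Rule in the Chain is examined; if it does match, what happens next is determined by the Rule's Target (which can be a user-defined Chain).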
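The matching behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a rule is a `(match_function, target)` pair, only the targets ACCEPT, DROP, and RETURN plus jumps to user-defined chains are modelled, and a built-in chain falls back to its policy when no rule decides the packet's fate. The chain name `TCP_IN` and the function names are hypothetical.

```python
TERMINATING = {"ACCEPT", "DROP"}


def run_chain(chains, name, packet):
    """Walk one chain; return a verdict, or None if the chain is exhausted or hits RETURN."""
    for match, target in chains[name]:
        if not match(packet):
            continue  # no match: examine the next rule in this chain
        if target in TERMINATING:
            return target
        if target == "RETURN":
            return None  # stop this chain and resume in the caller
        # Any other target is treated as a jump to a user-defined chain.
        verdict = run_chain(chains, target, packet)
        if verdict is not None:
            return verdict
        # The user-defined chain fell through: continue with the next rule here.
    return None


def evaluate(chains, policies, builtin, packet):
    """Evaluate a built-in chain, applying its policy if no rule decided."""
    verdict = run_chain(chains, builtin, packet)
    return verdict if verdict is not None else policies[builtin]


chains = {
    "INPUT": [
        (lambda p: p["proto"] == "tcp", "TCP_IN"),
        (lambda p: p["src"] == "10.0.0.9", "DROP"),
    ],
    "TCP_IN": [
        (lambda p: p["dport"] == 23, "DROP"),
        (lambda p: p["dport"] == 22, "ACCEPT"),
    ],
}
policies = {"INPUT": "ACCEPT"}
```

With these rules, a TCP packet to port 80 from 10.0.0.9 falls through `TCP_IN`, returns to `INPUT`, and is dropped by the second rule, while a UDP packet from any other source matches nothing and receives the chain policy.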
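Workflow
The concepts above are somewhat hard to digest. The figure below makes the workflow much easier to follow.
From the figure, we can summarize the following rules.
When a packet arrives at a network interface, it first enters the PREROUTING chain, where we have the chance to modify its DestIP (destination IP). The kernel's routing module then uses the packet's destination IP and the kernel routing table to decide whether the packet needs to be forwarded (note that by this point the DestIP may already have been modified by us).
If the packet is destined for this host (that is, its destination IP is the IP of one of the host's interfaces), it moves down the figure to the INPUT chain. After it passes the INPUT chain, our application receives it.
When our application sends packets, they pass through the OUTPUT chain and then reach the POSTROUTING chain before leaving the host (note that the SrcIP may have been modified by us by the time the packet goes out).
If the packet is to be forwarded (that is, its destination IP is not an address of this host) and the kernel allows forwarding, it moves to the right in the figure, passes through the FORWARD chain, and then reaches the POSTROUTING chain to be sent out on the interface that leads to the appropriate subnet.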
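The routing decision described above can be summarized as a small function. The Python sketch below is a deliberately simplified model: `hook_path`, its parameters, and the `dnat` mapping are hypothetical names, DNAT is reduced to a dictionary lookup in PREROUTING, and the loopback path taken by locally generated packets addressed to the host itself is ignored.

```python
def hook_path(dst_ip, local_ips, *, locally_generated=False, ip_forward=True, dnat=None):
    """Return the sequence of chains a packet traverses in this simplified model."""
    if locally_generated:
        # Sent by a local application: OUTPUT, then POSTROUTING.
        return ["OUTPUT", "POSTROUTING"]

    path = ["PREROUTING"]
    if dnat and dst_ip in dnat:
        # The destination may be rewritten in PREROUTING, before routing.
        dst_ip = dnat[dst_ip]

    if dst_ip in local_ips:
        # Destined for this host: delivered to the application via INPUT.
        return path + ["INPUT"]
    if ip_forward:
        # Not for this host and forwarding is allowed.
        return path + ["FORWARD", "POSTROUTING"]
    # Not for this host and forwarding is disabled: the packet goes no further.
    return path


local = {"192.168.1.10"}
print(hook_path("192.168.1.10", local))
print(hook_path("8.8.8.8", local))
print(hook_path("192.168.1.10", local, dnat={"192.168.1.10": "10.0.0.5"}))
```

The third call shows why the note about a modified DestIP matters: a packet originally addressed to the host is rewritten in PREROUTING and therefore takes the FORWARD path instead of INPUT.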
iptables command format
The command format is shown in the figure below:
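As a rough textual companion to the command format, the Python sketch below assembles an iptables command line from its parts: the table (`-t`), the operation and chain (here `-A` to append), optional match criteria (`-p`, `-s`, `-d`, `--dport`), and the target (`-j`). The helper `build_rule` is hypothetical and covers only this handful of common options; it builds the argument list and does not execute anything.

```python
import shlex


def build_rule(chain, target, *, table="filter", action="-A",
               protocol=None, source=None, destination=None, dport=None):
    """Build an iptables argument list: table, operation + chain, matches, target."""
    argv = ["iptables", "-t", table, action, chain]
    if protocol:
        argv += ["-p", protocol]
    if source:
        argv += ["-s", source]
    if destination:
        argv += ["-d", destination]
    if dport is not None:
        # This sketch only allows --dport together with -p tcp or -p udp.
        if protocol not in ("tcp", "udp"):
            raise ValueError("--dport requires protocol tcp or udp in this sketch")
        argv += ["--dport", str(dport)]
    argv += ["-j", target]
    return argv


rule = build_rule("INPUT", "ACCEPT", protocol="tcp", dport=22)
print(shlex.join(rule))  # prints: iptables -t filter -A INPUT -p tcp --dport 22 -j ACCEPT
```

Returning a list rather than a string keeps each argument separate, so the result could be handed to `subprocess.run` without shell quoting issues if one chose to apply it (which requires root privileges).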
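Summary
iptables and netfilter are widely used, and most material online covers them from the firewall angle. I did not come to them because of firewall requirements, though: K8s and istio use this mechanism for load balancing and route forwarding, which is what led me to dig deeper. A follow-up article will summarize how K8s kube-proxy uses iptables/netfilter for route forwarding and load balancing.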