kernel_optimize_test/include/net
Pablo Neira Ayuso 0838aa7fcf netfilter: fix netns dependencies with conntrack templates
Quoting Daniel Borkmann:

"When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present, which is problematic i.e. if there is such bravery that
the priviledged user inside the netns is assumed untrusted.

Minimal example:

  ip netns add foo
  ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
  ip netns del foo

What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.

Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net->ct.count. The issue why they are not yet nf_ct_put() is
because the per netns .exit() handler from x_tables (which would eventually
invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
called in the dependency chain at a *later* point in time than the per
netns .exit() handler for the connection tracker.

This is clearly a chicken'n'egg problem: after the connection tracker
.exit() handler, we've teared down all the connection tracking
infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
invoked at a later point in time during the netns cleanup, as that would
lead to a use-after-free. At the same time, we cannot make x_tables depend
on the connection tracker module, so that the xt_ct_tg_destroy() would
be invoked earlier in the cleanup chain."

Daniel confirms this has to do with the order in which modules are loaded or
having compiled nf_conntrack as modules while x_tables built-in. So we have no
guarantees regarding the order in which netns callbacks are executed.

Fix this by allocating the templates through kmalloc() from the respective
SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
is marked as unlikely since conntrack templates are rarely allocated and only
from the configuration plane path.

Note that templates are not kept in any list to avoid further dependencies with
nf_conntrack anymore, thus, the tmpl larval list is removed.

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Daniel Borkmann <daniel@iogearbox.net>
2015-07-20 14:58:19 +02:00
..
9p 9p: switch p9_client_read() to passing struct iov_iter * 2015-04-11 22:28:27 -04:00
bluetooth Bluetooth: hci_core: increase max adv inst 2015-06-18 18:11:53 +02:00
caif caif: fix a signedness bug in cfpkt_iterate() 2015-02-20 17:35:14 -05:00
irda
iucv
netfilter netfilter: fix netns dependencies with conntrack templates 2015-07-20 14:58:19 +02:00
netns netfilter: fix netns dependencies with conntrack templates 2015-07-20 14:58:19 +02:00
nfc NFC: nci: add generic uart support 2015-06-11 23:37:37 +02:00
phonet
sctp sctp: fix ASCONF list handling 2015-06-14 12:55:49 -07:00
tc_act act_bpf: add initial eBPF support for actions 2015-03-20 19:10:44 -04:00
6lowpan.h
act_api.h
addrconf.h net: Export IGMP/MLD message validation code 2015-05-04 14:49:23 -04:00
af_ieee802154.h
af_rxrpc.h
af_unix.h net/unix: support SCM_SECURITY for stream sockets 2015-06-10 22:49:20 -07:00
af_vsock.h net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
ah.h
arp.h neigh: Factor out ___neigh_lookup_noref 2015-03-04 00:23:23 -05:00
atmclip.h
ax25.h ax25: Stop using sock->sk_protinfo. 2015-06-28 16:55:44 -07:00
ax88796.h
bond_3ad.h bonding: Implement port churn-machine (AD standard 43.4.17). 2015-02-24 16:05:48 -05:00
bond_alb.h
bond_options.h bonding: Implement user key part of port_key in an AD system. 2015-05-11 10:59:32 -04:00
bonding.h bonding: Implement user key part of port_key in an AD system. 2015-05-11 10:59:32 -04:00
busy_poll.h
cfg80211-wext.h
cfg80211.h cfg80211: properly send NL80211_ATTR_DISCONNECTED_BY_AP in disconnect 2015-05-26 15:21:27 +02:00
cfg802154.h nl802154: add support to set cca ed level 2015-05-27 19:29:42 +02:00
checksum.h net: fix sparse error in csum_replace4() 2015-05-17 13:08:29 -04:00
cipso_ipv4.h cipso: don't use IPCB() to locate the CIPSO IP option 2015-02-11 14:46:37 -05:00
cls_cgroup.h
codel.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-05-13 14:31:43 -04:00
compat.h net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
datalink.h
dcbevent.h
dcbnl.h net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
dn_dev.h
dn_fib.h
dn_neigh.h netfilter: Pass socket pointer down through okfn(). 2015-04-07 15:25:55 -04:00
dn_nsp.h
dn_route.h
dn.h
dsa.h net: dsa: Add basic framework to support ndo_fdb functions 2015-03-29 13:23:54 -07:00
dsfield.h
dst_ops.h net: Remove protocol from struct dst_ops 2015-03-09 16:06:10 -04:00
dst.h net: make skb_dst_pop routine static 2015-05-12 23:19:49 -04:00
esp.h
ethoc.h
fib_rules.h net: ipv4 sysctl option to ignore routes when nexthop link is down 2015-06-24 02:15:54 -07:00
firewire.h
flow_dissector.h mpls: Add MPLS entropy label in flow_keys 2015-06-04 15:44:31 -07:00
flow.h
flowcache.h
fou.h
garp.h
gen_stats.h
genetlink.h net: Introduce possible_net_t 2015-03-12 14:39:40 -04:00
geneve.h geneve: move definition of geneve_hdr() to geneve.h 2015-05-13 15:59:13 -04:00
gre.h
gro_cells.h ip_tunnel: Create percpu gro_cell 2015-01-18 01:56:32 -05:00
gue.h
icmp.h
ieee80211_radiotap.h
ieee802154_netdev.h mac802154: cleanup llsec param flags 2015-06-12 11:42:29 +02:00
if_inet6.h ipv6: do retries on stable privacy addresses 2015-03-23 22:12:09 -04:00
inet6_connection_sock.h inet: get rid of central tcp/dccp listener timer 2015-03-20 12:40:25 -04:00
inet6_hashtables.h ipv6: get rid of __inet6_hash() 2015-03-18 22:00:35 -04:00
inet_common.h net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
inet_connection_sock.h tcp: fix child sockets to use system default congestion control if not set 2015-05-31 21:49:14 -07:00
inet_ecn.h
inet_frag.h ip_fragment: don't forward defragmented DF packet 2015-05-27 13:03:31 -04:00
inet_hashtables.h tcp: fix/cleanup inet_ehash_locks_alloc() 2015-05-26 19:48:46 -04:00
inet_sock.h inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations 2015-06-06 23:57:12 -07:00
inet_timewait_sock.h tcp/dccp: get rid of central timewait timer 2015-04-13 16:40:05 -04:00
inetpeer.h tcp: simplify inetpeer_addr_base use 2015-03-31 13:58:35 -04:00
ip6_checksum.h
ip6_fib.h ipv6: Create percpu rt6_info 2015-05-25 13:25:35 -04:00
ip6_route.h ipv6: Add rt6_get_cookie() function 2015-05-25 13:25:34 -04:00
ip6_tunnel.h udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). 2015-04-07 15:29:08 -04:00
ip_fib.h net: ipv4 sysctl option to ignore routes when nexthop link is down 2015-06-24 02:15:54 -07:00
ip_tunnels.h ipip,gre,vti,sit: implement ndo_get_iflink 2015-04-02 14:05:00 -04:00
ip_vs.h net: Introduce possible_net_t 2015-03-12 14:39:40 -04:00
ip.h net: Add full IPv6 addresses to flow_keys 2015-06-04 15:44:30 -07:00
ipcomp.h
ipconfig.h
ipv6.h net: Add full IPv6 addresses to flow_keys 2015-06-04 15:44:30 -07:00
ipx.h
iw_handler.h wext: add checked wrappers for adding events/points to streams 2015-02-28 21:31:12 +01:00
lapb.h
lib80211.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h
mac80211.h mac80211: convert HW flags to unsigned long bitmap 2015-06-10 16:05:36 +02:00
mac802154.h mac802154: fix flags BIT definitions order 2015-06-12 11:43:58 +02:00
mip6.h
mld.h
mpls.h
mrp.h
ndisc.h neigh: Factor out ___neigh_lookup_noref 2015-03-04 00:23:23 -05:00
neighbour.h net: neighbour: Add mcast_resolicit to configure the number of multicast resolicitations in PROBE state. 2015-03-20 21:47:40 -04:00
net_namespace.h net: include missing headers in net/net_namespace.h 2015-06-18 21:14:29 +02:00
net_ratelimit.h
netevent.h
netlabel.h
netlink.h netlink: implement nla_get_in_addr and nla_get_in6_addr 2015-03-31 13:58:35 -04:00
netprio_cgroup.h
netrom.h
nexthop.h
nl802154.h nl802154: fix misspelled enum 2015-06-10 12:24:33 +02:00
p8022.h
ping.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
pkt_cls.h
pkt_sched.h net: rename vlan_tx_* helpers since "tx" is misleading there 2015-01-13 17:51:08 -05:00
protocol.h
psnap.h
raw.h
rawv6.h
red.h
regulatory.h
request_sock.h tcp: provide SYN headers for passive connections 2015-05-05 16:02:34 -04:00
rose.h
route.h ipv4: per cpu uncached list 2015-01-15 18:26:16 -05:00
rtnetlink.h rtnetlink: Mark name argument of rtnl_create_link() const 2015-04-10 12:42:40 -07:00
sch_generic.h net: sched: use counter to break reclassify loops 2015-05-13 15:08:14 -04:00
scm.h
secure_seq.h
slhc_vj.h
snmp.h
sock.h net: Kill sock->sk_protinfo 2015-06-28 16:55:44 -07:00
Space.h
stp.h
switchdev.h switchdev: rename vlan vid_start to vid_begin 2015-06-23 06:56:18 -07:00
tcp_memcontrol.h
tcp_states.h inet: add TCP_NEW_SYN_RECV state 2015-03-12 22:58:12 -04:00
tcp.h tcp: fill shinfo->gso_size at last moment 2015-06-11 16:33:11 -07:00
timewait_sock.h
transp_v6.h
tso.h
udp_tunnel.h udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). 2015-04-07 15:29:08 -04:00
udp.h net: Remove iocb argument from sendmsg and recvmsg 2015-03-02 13:06:31 -05:00
udplite.h net: switch memcpy_fromiovec()/memcpy_fromiovecend() users to copy_from_iter() 2015-02-04 01:34:15 -05:00
vsock_addr.h
vxlan.h udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb(). 2015-04-07 15:29:08 -04:00
wext.h
wimax.h
x25.h
x25device.h
xfrm.h ipsec: Add IV generator information to xfrm_state 2015-05-28 11:23:20 +08:00