kernel_optimize_test/net/ipv4
Jiri Wiesner ebaf39e603 ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
The *_frag_reasm() functions are susceptible to miscalculating the byte
count of packet fragments in case the truesize of a head buffer changes.
The truesize member may be changed by the call to skb_unclone(), leaving
the fragment memory limit counter unbalanced even if all fragments are
processed. This miscalculation goes unnoticed as long as the network
namespace which holds the counter is not destroyed.

Should an attempt be made to destroy a network namespace that holds an
unbalanced fragment memory limit counter, the cleanup of the namespace
never finishes. The thread handling the cleanup gets stuck in
inet_frags_exit_net() waiting for the percpu counter to reach zero. The
thread is usually in the running state with a stack trace similar to:

 PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
  #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
  #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
  #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
  #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
  #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
 #10 [ffff880621563e38] process_one_work at ffffffff81096f14

It is not possible to create new network namespaces, and processes
that call unshare() end up being stuck in uninterruptible sleep state
waiting to acquire the net_mutex.

The bug was observed in the IPv6 netfilter code by Per Sundstrom.
I thank him for his analysis of the problem. The parts of this patch
that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.

Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
Reported-by: Per Sundstrom <per.sundstrom@redqube.se>
Acked-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05 20:44:46 -08:00
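
The accounting problem described in the commit message above can be illustrated with a small user-space sketch. This is not the kernel code: `sim_skb`, `sim_netns_frags`, `sim_unclone()` and `sim_reasm()` are hypothetical stand-ins, and the buffer sizes are arbitrary. The sketch assumes that fragments are charged to a per-namespace counter when queued, that un-sharing the head buffer may change its truesize, and that the release path subtracts the truesize as it stands after reassembly.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct sim_skb {
	unsigned int truesize;	/* bytes charged for this buffer */
	int cloned;		/* nonzero while the data is shared */
};

struct sim_netns_frags {
	long mem;		/* models the per-netns fragment memory counter */
};

static void sim_add_frag_mem(struct sim_netns_frags *nf, long val)
{
	nf->mem += val;
}

static void sim_sub_frag_mem(struct sim_netns_frags *nf, long val)
{
	nf->mem -= val;
}

/*
 * Models the effect the commit message attributes to skb_unclone():
 * un-sharing a cloned buffer may change its truesize. The growth
 * value is an arbitrary number chosen for illustration.
 */
static int sim_unclone(struct sim_skb *skb, unsigned int growth)
{
	if (skb->cloned) {
		skb->truesize += growth;
		skb->cloned = 0;
	}
	return 0;
}

/*
 * Queue two fragments, un-share the head, then release everything.
 * Returns the counter value left behind; zero means balanced.
 * With adjust == 0 the truesize change is ignored (the buggy case).
 */
static long sim_reasm(int adjust)
{
	struct sim_netns_frags nf = { 0 };
	struct sim_skb head = { 768, 1 };
	struct sim_skb tail = { 512, 0 };
	long delta;

	/* Each fragment is charged at its truesize when it is queued. */
	sim_add_frag_mem(&nf, (long)head.truesize);
	sim_add_frag_mem(&nf, (long)tail.truesize);

	/* Record how much the head's truesize changes across the unclone. */
	delta = -(long)head.truesize;
	if (sim_unclone(&head, 256))
		return -1;
	assert(head.cloned == 0);
	delta += (long)head.truesize;

	/* Charge the difference so the later release balances out. */
	if (adjust && delta)
		sim_add_frag_mem(&nf, delta);

	/* The release path subtracts the current truesize of all buffers. */
	sim_sub_frag_mem(&nf, (long)head.truesize + (long)tail.truesize);
	return nf.mem;
}
```

With these illustrative sizes, `sim_reasm(0)` returns -256, a counter that never reaches zero, while `sim_reasm(1)` returns 0. In the scenario the commit message describes, it is such a non-zero residue that keeps the namespace cleanup waiting.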
..
bpfilter
netfilter netfilter: nat: fix double register in masquerade modules 2018-11-27 00:36:46 +01:00
af_inet.c
ah4.c
arp.c
cipso_ipv4.c
datagram.c
devinet.c
esp4_offload.c
esp4.c
fib_frontend.c
fib_lookup.h
fib_notifier.c
fib_rules.c
fib_semantics.c
fib_trie.c
fou.c
gre_demux.c
gre_offload.c
icmp.c
igmp.c
inet_connection_sock.c
inet_diag.c
inet_fragment.c inet: frags: better deal with smp races 2018-11-08 18:40:30 -08:00
inet_hashtables.c
inet_timewait_sock.c
inetpeer.c
ip_forward.c
ip_fragment.c ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes 2018-12-05 20:44:46 -08:00
ip_gre.c
ip_input.c net: use skb_list_del_init() to remove from RX sublists 2018-12-05 16:22:05 -08:00
ip_options.c
ip_output.c net: always initialize pagedlen 2018-11-24 17:42:57 -08:00
ip_sockglue.c net: bpfilter: fix iptables failure if bpfilter_umh is disabled 2018-11-05 17:12:18 -08:00
ip_tunnel_core.c ip_tunnel: don't force DF when MTU is locked 2018-11-17 21:50:55 -08:00
ip_tunnel.c
ip_vti.c
ipcomp.c
ipconfig.c
ipip.c
ipmr_base.c
ipmr.c
Kconfig
Makefile
metrics.c
netfilter.c
netlink.c
ping.c
proc.c
protocol.c
raw_diag.c
raw.c
route.c
syncookies.c
sysctl_net_ipv4.c
tcp_bbr.c
tcp_bic.c
tcp_bpf.c
tcp_cdg.c
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c
tcp_dctcp.h
tcp_diag.c
tcp_fastopen.c
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: address problems caused by EDT misshaps 2018-11-24 17:41:37 -08:00
tcp_ipv4.c
tcp_lp.c
tcp_metrics.c
tcp_minisocks.c
tcp_nv.c
tcp_offload.c
tcp_output.c tcp: fix NULL ref in tail loss probe 2018-12-05 16:34:40 -08:00
tcp_rate.c
tcp_recovery.c
tcp_scalable.c
tcp_timer.c tcp: fix SNMP TCP timeout under-estimation 2018-11-30 17:22:41 -08:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c
tunnel4.c
udp_diag.c
udp_impl.h
udp_offload.c
udp_tunnel.c
udp.c
udplite.c
xfrm4_input.c
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c
xfrm4_policy.c
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c