kernel_optimize_test/net/ipv4
Jason A. Donenfeld 46d6c5ae95 netfilter: use actual socket sk rather than skb sk when routing harder
If netfilter changes the packet mark when mangling, the packet is
rerouted using the route_me_harder set of functions. Prior to this
commit, there's one big difference between route_me_harder and the
ordinary initial routing functions, described in the comment above
__ip_queue_xmit():

   /* Note: skb->sk can be different from sk, in case of tunnels */
   int __ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl,

That function goes on to correctly make use of sk->sk_bound_dev_if,
rather than skb->sk->sk_bound_dev_if. And indeed the comment is true: a
tunnel will receive a packet in ndo_start_xmit with an initial skb->sk.
It will make some transformations to that packet, and then it will send
the encapsulated packet out of a *new* socket. That new socket will
basically always have a different sk_bound_dev_if (otherwise there'd be
a routing loop). So for the purposes of routing the encapsulated packet,
the routing information as it pertains to the socket should come from
that socket's sk, rather than the packet's original skb->sk. For that
reason __ip_queue_xmit() and related functions all do the right thing.

One might argue that all tunnels should just call skb_orphan(skb) before
transmitting the encapsulated packet into the new socket. But tunnels do
*not* do this -- and this is wisely avoided in skb_scrub_packet() too --
because features like TSQ rely on skb->destructor() being called when
that buffer space is truely available again. Calling skb_orphan(skb) too
early would result in buffers filling up unnecessarily and accounting
info being all wrong. Instead, additional routing must take into account
the new sk, just as __ip_queue_xmit() notes.

So, this commit addresses the problem by fishing the correct sk out of
state->sk -- it's already set properly in the call to nf_hook() in
__ip_local_out(), which receives the sk as part of its normal
functionality. So we make sure to plumb state->sk through the various
route_me_harder functions, and then make correct use of it following the
example of __ip_queue_xmit().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-10-30 12:57:39 +01:00
..
bpfilter net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces" 2020-08-10 12:06:44 -07:00
netfilter netfilter: use actual socket sk rather than skb sk when routing harder 2020-10-30 12:57:39 +01:00
af_inet.c io_uring: allow tcp ancillary data for __sys_recvmsg_sock() 2020-08-24 16:16:06 -07:00
ah4.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
arp.c inet: Use fallthrough; 2020-03-12 15:55:00 -07:00
bpf_tcp_ca.c bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON 2020-09-25 13:58:01 -07:00
cipso_ipv4.c cipso: fix 'audit_secid' kernel-doc warning in cipso_ipv4.c 2020-09-08 20:03:36 -07:00
datagram.c
devinet.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-05-31 17:48:46 -07:00
esp4_offload.c net: Add MODULE_DESCRIPTION entries to network modules 2020-06-20 21:33:57 -07:00
esp4.c ESP: Export esp_output_fill_trailer function 2020-02-19 13:52:32 +01:00
fib_frontend.c ipv4: Initialize flowi4_multipath_hash in data path 2020-09-14 14:54:56 -07:00
fib_lookup.h net: add net available in build_state 2020-03-29 22:30:57 -07:00
fib_notifier.c
fib_rules.c fib: use indirect call wrappers in the most common fib_rules_ops 2020-07-28 17:42:31 -07:00
fib_semantics.c net: Fix the arp error in some cases 2020-06-18 20:21:51 -07:00
fib_trie.c ipv4: Silence suspicious RCU usage warning 2020-08-26 15:58:48 -07:00
fou.c genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
gre_demux.c gre: fix uninit-value in __iptunnel_pull_header 2020-03-08 21:25:37 -07:00
gre_offload.c net: gre: recompute gre csum for sctp over gre tunnels 2020-08-03 15:29:44 -07:00
icmp.c icmp: randomize the global rate limiter 2020-10-16 16:47:09 -07:00
igmp.c ip*_mc_gsfget(): lift copyout of struct group_filter into callers 2020-05-20 20:31:27 -04:00
inet_connection_sock.c inet: remove icsk_ack.blocked 2020-09-30 14:21:30 -07:00
inet_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
inet_fragment.c
inet_hashtables.c net: ipv4: remove unused arg exact_dif in compute_score 2020-08-31 13:08:29 -07:00
inet_timewait_sock.c
inetpeer.c
ip_forward.c
ip_fragment.c
ip_gre.c ip_gre: set dev->hard_header_len and dev->needed_headroom properly 2020-10-13 18:35:29 -07:00
ip_input.c bpf: Add socket assign support 2020-03-30 13:45:04 -07:00
ip_options.c net: clean up codestyle for net/ipv4 2020-08-25 06:28:02 -07:00
ip_output.c networking changes for the 5.10 merge window 2020-10-15 18:42:13 -07:00
ip_sockglue.c net: Remove duplicated midx check against 0 2020-08-25 06:23:59 -07:00
ip_tunnel_core.c iptunnel: use new function dev_fetch_sw_netstats 2020-10-13 17:33:49 -07:00
ip_tunnel.c ipv4: use dev_sw_netstats_rx_add() 2020-10-06 06:23:21 -07:00
ip_vti.c ipv4: use dev_sw_netstats_rx_add() 2020-10-06 06:23:21 -07:00
ipcomp.c ipcomp: assign if_id to child tunnel from parent tunnel 2020-07-09 12:55:37 +02:00
ipconfig.c Documentation: nfsroot.rst: Fix references to nfsroot.rst 2020-03-02 13:11:46 -07:00
ipip.c net: ipip: implement header_ops->parse_protocol for AF_PACKET 2020-06-30 12:29:39 -07:00
ipmr_base.c
ipmr.c ipmr: Use full VIF ID in netlink cache reports 2020-09-10 12:25:51 -07:00
Kconfig net: ipv4: remove duplicate "the the" phrase in Kconfig text 2020-08-18 16:02:16 -07:00
Makefile udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
metrics.c
netfilter.c netfilter: use actual socket sk rather than skb sk when routing harder 2020-10-30 12:57:39 +01:00
netlink.c
nexthop.c nexthop: Fix performance regression in nexthop deletion 2020-10-19 20:07:15 -07:00
ping.c net: clean up codestyle 2020-08-31 12:33:34 -07:00
proc.c tcp: skip DSACKs with dubious sequence ranges 2020-09-24 20:15:45 -07:00
protocol.c
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
raw.c networking changes for the 5.10 merge window 2020-10-15 18:42:13 -07:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-15 12:43:21 -07:00
syncookies.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-05 18:40:01 -07:00
sysctl_net_ipv4.c tcp: reflect tos value received in SYN to the socket 2020-09-10 13:15:40 -07:00
tcp_bbr.c
tcp_bic.c tcp: fix stretch ACK bugs in BIC 2020-03-16 18:26:54 -07:00
tcp_bpf.c net: sk_msg: Simplify sk_psock initialization 2020-08-21 15:16:11 -07:00
tcp_cdg.c
tcp_cong.c tcp: Simplify tcp_set_congestion_control() load=false case 2020-09-10 20:53:01 -07:00
tcp_cubic.c tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT 2020-06-25 16:08:47 -07:00
tcp_dctcp.c
tcp_dctcp.h
tcp_diag.c inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data 2020-02-27 18:50:19 -08:00
tcp_fastopen.c bpf: tcp: Add bpf_skops_established() 2020-08-24 14:35:00 -07:00
tcp_highspeed.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_htcp.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: Prevent low rmem stalls with SO_RCVLOWAT. 2020-10-23 19:11:20 -07:00
tcp_ipv4.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-10-08 15:44:50 -07:00
tcp_lp.c
tcp_metrics.c genetlink: move to smaller ops wherever possible 2020-10-02 19:11:11 -07:00
tcp_minisocks.c bpf: tcp: Do not limit cb_flags when creating child sk from listen sk 2020-10-02 11:34:48 -07:00
tcp_nv.c
tcp_offload.c
tcp_output.c tcp: add exponential backoff in __tcp_send_ack() 2020-09-30 14:21:30 -07:00
tcp_rate.c
tcp_recovery.c tcp: move tcp_mark_skb_lost 2020-09-25 17:17:14 -07:00
tcp_scalable.c net: ipv4: delete repeated words 2020-08-24 17:31:20 -07:00
tcp_timer.c inet: remove icsk_ack.blocked 2020-09-30 14:21:30 -07:00
tcp_ulp.c bpf: sockmap: Only check ULP for TCP sockets 2020-03-09 22:34:58 +01:00
tcp_vegas.c tcp: use semicolons rather than commas to separate statements 2020-10-13 17:11:52 -07:00
tcp_vegas.h
tcp_veno.c Replace HTTP links with HTTPS ones: IPv* 2020-07-06 13:23:03 -07:00
tcp_westwood.c
tcp_yeah.c tcp: fix stretch ACK bugs in Yeah 2020-03-16 18:26:55 -07:00
tcp.c tcp: Prevent low rmem stalls with SO_RCVLOWAT. 2020-10-23 19:11:20 -07:00
tunnel4.c tunnel4: add cb_handler to struct xfrm_tunnel 2020-07-09 12:51:36 +02:00
udp_bpf.c net: sk_msg: Simplify sk_psock initialization 2020-08-21 15:16:11 -07:00
udp_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
udp_impl.h net: pass a sockptr_t into ->setsockopt 2020-07-24 15:41:54 -07:00
udp_offload.c udp: initialize is_flist with 0 in udp_gro_receive 2020-03-30 10:35:03 -07:00
udp_tunnel_core.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udp_tunnel_nic.c udp_tunnel: add the ability to share port tables 2020-09-28 12:50:12 -07:00
udp_tunnel_stub.c udp_tunnel: add central NIC RX port offload infrastructure 2020-07-10 13:54:00 -07:00
udp.c net: ipv4: delete repeated words 2020-08-24 17:31:20 -07:00
udplite.c net/ipv4: remove compat_ip_{get,set}sockopt 2020-07-19 18:16:41 -07:00
xfrm4_input.c xfrm: state: remove extract_input indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_output.c xfrm: fix unused variable warning if CONFIG_NETFILTER=n 2020-05-11 15:12:27 +02:00
xfrm4_policy.c
xfrm4_protocol.c
xfrm4_state.c xfrm: remove output_finish indirection from xfrm_state_afinfo 2020-05-06 09:40:08 +02:00
xfrm4_tunnel.c