Commit Graph

779521 Commits

Author SHA1 Message Date
Will Deacon
f6cc0c5016 arm64: Avoid calling stop_machine() when patching jump labels
Patching a jump label involves patching a single instruction at a time,
swizzling between a branch and a NOP. The architecture treats these
instructions specially, so a concurrently executing CPU is guaranteed to
see either the NOP or the branch, rather than an amalgamation of the two
instruction encodings.

However, in order to guarantee that the new instruction is visible, it
is necessary to send an IPI to the concurrently executing CPU so that it
discards any previously fetched instructions from its pipeline. This
operation therefore cannot be completed from a context with IRQs
disabled, but this is exactly what happens on the jump label path where
the hotplug lock is held and irqs are subsequently disabled by
stop_machine_cpuslocked(). This results in a deadlock during boot on
Hikey-960.

Due to the architectural guarantees around patching NOPs and branches,
we don't actually need to stop_machine() at all on the jump label path,
so we can avoid the deadlock by using the "nosync" variant of our
instruction patching routine.
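Here is a userspace model of the property relied on above, using C11 atomics. The patch site is one naturally aligned 32-bit word, and each patch is a single store, so a concurrent reader loads either the full NOP or the full branch encoding. The NOP value 0xd503201f and the B opcode 0x14000000 with a 26-bit word offset are the real AArch64 encodings. The helper names are hypothetical. The model also leaves out the instruction-cache maintenance and context synchronization that real code patching needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define A64_NOP      0xd503201fu  /* AArch64 NOP */
#define A64_B_OPCODE 0x14000000u  /* B: bits [31:26] = 0b000101 */

/* Encode "B <offset>", offset in bytes relative to the branch itself;
 * it must be 4-byte aligned and within +/-128 MiB (signed imm26 words). */
static uint32_t a64_encode_b(int32_t offset)
{
    assert(offset % 4 == 0);
    assert(offset >= -(1 << 27) && offset < (1 << 27));
    return A64_B_OPCODE | (((uint32_t)offset >> 2) & 0x03ffffffu);
}

/* Hypothetical patch site: a single naturally aligned 32-bit word. */
static _Atomic uint32_t patch_site = A64_NOP;

/* Swizzle NOP <-> branch with one atomic store; a concurrent reader can
 * only ever observe one complete encoding or the other. */
static void patch_jump_label(int enable, int32_t offset)
{
    uint32_t insn = enable ? a64_encode_b(offset) : A64_NOP;

    atomic_store_explicit(&patch_site, insn, memory_order_release);
}

static uint32_t read_patch_site(void)
{
    return atomic_load_explicit(&patch_site, memory_order_acquire);
}
```

For example, enabling a label whose target is 8 bytes ahead stores 0x14000002, and disabling it stores the NOP again.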

Fixes: 693350a799 ("arm64: insn: Don't fallback on nosync path for general insn patching")
Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-08-17 10:26:44 +01:00
Dave Airlie
3d63a3c147 Merge tag 'drm-msm-next-2018-08-10' of git://people.freedesktop.org/~robclark/linux into drm-next
An optional follow-on PR for 4.19, on top of previous -fixes PR, which
brings in a6xx support.

These patches have been on list since earlier in the year (mostly
waiting for userspace).  They have been in linux-next since earlier in
the week, now that we have freedreno userspace working on a6xx[1][2].
So far glmark2, Chromium/ChromiumOS, gnome-shell, glamor, xonotic,
etc, are working.  And a healthy chunk of deqp works, and I've been
busy fixing things.  The needed libdrm changes (no new uapi changes
needed) are already on master, and the 2nd branch is rebased on that.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGuCKekZ2Dho80qxODT1BEUGg4hbq33ACUy5VXs3dHbDLA@mail.gmail.com
2018-08-17 10:46:51 +10:00
Bartosz Golaszewski
b2201ee554 remoteproc/davinci: use the reset framework
Switch to using the reset framework instead of the handcoded reset
routines used so far.

Reviewed-by: Sekhar Nori <nsekhar@ti.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2018-08-16 17:39:55 -07:00
Dave Airlie
0258d7a5e2 Fixes for:
- DP full color range.
- selftest for gem_object
- forcewake on suspend
- GPU reset

This also includes accumulated fixes from GVT:
- Fix an error code in gvt_dma_map_page() (Dan)
- Fix off by one error in intel_vgpu_write_fence() (Dan)
- Fix potential Spectre v1 (Gustavo)
- Fix workload free in vgpu release (Henry)
- Fix cleanup sequence in intel_gvt_clean_device (Henry)
- dmabuf mutex init place fix (Henry)
- possible memory leak in intel_vgpu_ioctl() err path (Yi)
- return error on cmd access check failure (Yan)

Merge tag 'drm-intel-next-fixes-2018-08-16-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180816190335.GA7765@intel.com
2018-08-17 10:33:48 +10:00
Dave Airlie
637319c678 Merge branch 'drm-next-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-next
Fixes for 4.19:
- Add VCN PSP FW loading for RV (this is required on upcoming parts)
- Fix scheduler setup ordering for VCE and UVD
- Few misc display fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180816181840.2786-1-alexander.deucher@amd.com
2018-08-17 09:26:13 +10:00
Dave Airlie
d32e2c6de7 Merge tag 'drm-msm-fixes-2018-08-10' of git://people.freedesktop.org/~robclark/linux into drm-next
Some small msm fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGuZE0VEpatrtxGZtUB6FaQYr6Gf07UVpMsD15ook+5_WQ@mail.gmail.com
2018-08-17 09:25:32 +10:00
Steven Rostedt (VMware)
bb730b5833 tracing: Fix SPDX format headers to use C++ style comments
The Linux kernel adopted the SPDX License format headers to ease license
compliance management, and uses the C++ '//' style comments for the SPDX
header tags. Some files in the tracing directory used the C style /* */
comments for them. To be consistent across all files, replace the /* */
C style SPDX tags with the C++ // SPDX tags.
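For reference, the kernel's license rules ask for the `//` form on the first line of .c files (e.g. `// SPDX-License-Identifier: GPL-2.0`) and keep the `/* */` form for headers. The hypothetical helper below is not part of any kernel script. It classifies a tag line by comment style, which is the distinction this patch is about.

```c
#include <assert.h>
#include <string.h>

enum spdx_style { SPDX_NONE, SPDX_CXX, SPDX_C };

/* Hypothetical checker: report which comment style an SPDX tag line uses. */
static enum spdx_style spdx_tag_style(const char *line)
{
    static const char tag[] = "SPDX-License-Identifier:";
    const size_t taglen = sizeof(tag) - 1;

    assert(line != NULL);
    if (strncmp(line, "// ", 3) == 0 && strncmp(line + 3, tag, taglen) == 0)
        return SPDX_CXX;   /* C++ style, as used after this patch */
    if (strncmp(line, "/* ", 3) == 0 && strncmp(line + 3, tag, taglen) == 0)
        return SPDX_C;     /* C style, replaced in .c files by this patch */
    return SPDX_NONE;
}
```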

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-16 19:08:06 -04:00
Steven Rostedt (VMware)
bcea3f96e1 tracing: Add SPDX License format tags to tracing files
Add the SPDX License header to ease license compliance management.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-16 19:08:06 -04:00
Steven Rostedt (VMware)
179a0cc4e0 tracing: Add SPDX License format to bpf_trace.c
Add the SPDX License header to ease license compliance management.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-16 19:07:36 -04:00
Alexei Starovoitov
cbb2fb13db Merge branch 'sockmap-ulp-fixes'
Daniel Borkmann says:

====================
Batch of various fixes related to BPF sockmap and ULP, including
adding module alias to restrict module requests, races and memory
leaks in sockmap code. For details please refer to the individual
patches. Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-16 14:58:09 -07:00
Daniel Borkmann
585f5a6252 bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
The current code in sock_map_ctx_update_elem() allows for BPF_EXIST
and BPF_NOEXIST map update flags. While on array-like maps this approach
is rather uncommon, e.g. bpf_fd_array_map_update_elem() and others
enforce map update flags to be BPF_ANY such that xchg() can be used
directly, the current implementation in sock map does not guarantee
that such operation with BPF_EXIST / BPF_NOEXIST is atomic.

The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the
socket from the slot which is then tested for NULL / non-NULL. However
later after __sock_map_ctx_update_elem(), the actual update is done
through osock = xchg(&stab->sock_map[i], sock). Problem is that in
the meantime a different CPU could have updated / deleted a socket
on that specific slot and thus flag constraints won't hold anymore.

I've considered whether it would be best to just break UAPI and
enforce BPF_ANY to see if someone actually complains; the trouble,
however, is that the BPF kselftest already uses BPF_NOEXIST
for the map update, and therefore it might have been copied into
applications already. The fix to keep the current behavior intact
would be to add a map lock similar to the sock hash bucket lock, but
covering the whole map.
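Here is a userspace sketch of the fix direction described above. It assumes a pthread mutex stands in for the proposed map-wide lock. The flag check and the exchange happen under the same lock, so no other CPU can change the slot in between. The flag values match the BPF UAPI (BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define BPF_ANY     0  /* create new element or update existing */
#define BPF_NOEXIST 1  /* create new element only */
#define BPF_EXIST   2  /* update existing element only */

#define NSLOTS 4

struct sock_array {            /* hypothetical stand-in for stab */
    pthread_mutex_t lock;      /* map-wide lock */
    void *slot[NSLOTS];
};

/* Test the flag constraint and swap the slot in one critical section;
 * the previous socket (possibly NULL) is returned through *old. */
static int sock_array_update(struct sock_array *m, size_t i, void *sk,
                             int flags, void **old)
{
    int err = 0;

    assert(i < NSLOTS);
    pthread_mutex_lock(&m->lock);
    if (flags == BPF_NOEXIST && m->slot[i] != NULL)
        err = -EEXIST;
    else if (flags == BPF_EXIST && m->slot[i] == NULL)
        err = -ENOENT;
    if (!err) {
        *old = m->slot[i];
        m->slot[i] = sk;
    }
    pthread_mutex_unlock(&m->lock);
    return err;
}
```

Because the test and the update are no longer separated, a concurrent delete or update on the same slot can only run before or after the whole operation.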

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-16 14:58:08 -07:00
Daniel Borkmann
166ab6f0a0 bpf, sockmap: fix map elem deletion race with smap_stop_sock
The smap_start_sock() and smap_stop_sock() are each protected under
the sock->sk_callback_lock from their call-sites except in the case
of sock_map_delete_elem() where we drop the old socket from the map
slot. This is racy because the same sock could be part of multiple
sock maps, so we run smap_stop_sock() in parallel, and given at that
point psock->strp_enabled might be true on both CPUs, we might for
example wrongly restore the sk->sk_data_ready / sk->sk_write_space.
Therefore, hold the sock->sk_callback_lock as well on delete. Looks
like 2f857d0460 ("bpf: sockmap, remove STRPARSER map_flags and add
multi-map support") had this right, but later on e9db4ef6bf ("bpf:
sockhash fix omitted bucket lock in sock_close") removed it again
from delete leaving this smap_stop_sock() instance unprotected.

Fixes: e9db4ef6bf ("bpf: sockhash fix omitted bucket lock in sock_close")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-16 14:58:08 -07:00
Daniel Borkmann
d40b0116c9 bpf, sockmap: fix leakage of smap_psock_map_entry
While working on sockmap I noticed that we do not always kfree the
struct smap_psock_map_entry list elements which track psocks attached
to maps. In the case of sock_hash_ctx_update_elem(), these map entries
are allocated outside of __sock_map_ctx_update_elem() with their
linkage to the socket hash table filled. In the case of sock array,
the map entries are allocated inside of __sock_map_ctx_update_elem()
and added with their linkage to the psock->maps. Both additions are
under psock->maps_lock each.

Now, we drop these elements from their psock->maps list in a few
occasions: i) in sock array via smap_list_map_remove() when an entry
is either deleted from the map from user space, or updated via
user space or BPF program where we drop the old socket at that map
slot, or the sock array is freed via sock_map_free() and drops all
its elements; ii) for sock hash via smap_list_hash_remove() in exactly
the same occasions as just described for sock array; iii) in the
bpf_tcp_close() where we remove the elements from the list via
psock_map_pop() and iterate over them dropping themselves from either
sock array or sock hash; and last but not least iv) once again in
smap_gc_work() which is a callback for deferring the work once the
psock refcount hit zero and thus the socket is being destroyed.

Problem is that the only case where we kfree() the list entry is
in case iv), which at that point should have an empty list in
normal cases. So in cases from i) to iii) we unlink the elements
without freeing where they go out of reach from us. Hence fix is
to properly kfree() them as well to stop the leakage. Given these
are all handled under psock->maps_lock there is no need for deferred
RCU freeing.
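Here is a minimal userspace model of the fix. It assumes a singly linked list protected by a mutex stands in for psock->maps and psock->maps_lock. Whoever unlinks an entry also frees it, in the same critical section, so no entry can become unreachable. The names are hypothetical; the kernel code uses list_head and kfree().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct map_entry {              /* stands in for smap_psock_map_entry */
    struct map_entry *next;
    const void *map;            /* map the psock is attached to */
};

struct psock_model {
    pthread_mutex_t maps_lock;  /* stands in for psock->maps_lock */
    struct map_entry *maps;
};

static int live_entries;        /* allocation counter for the demo */

static int psock_add_map(struct psock_model *p, const void *map)
{
    struct map_entry *e;

    assert(p != NULL);
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->map = map;
    pthread_mutex_lock(&p->maps_lock);
    e->next = p->maps;
    p->maps = e;
    live_entries++;
    pthread_mutex_unlock(&p->maps_lock);
    return 0;
}

/* Unlink every entry for @map and free it while still holding the lock.
 * All list users hold maps_lock, so no deferred (RCU) freeing is needed. */
static void psock_remove_map(struct psock_model *p, const void *map)
{
    pthread_mutex_lock(&p->maps_lock);
    for (struct map_entry **pp = &p->maps; *pp; ) {
        struct map_entry *e = *pp;

        if (e->map == map) {
            *pp = e->next;
            free(e);            /* the step that was missing */
            live_entries--;
        } else {
            pp = &e->next;
        }
    }
    pthread_mutex_unlock(&p->maps_lock);
}
```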

I later also ran the kmemleak detector, and it confirmed the finding:
before the fix the objects go unreferenced, while after the patch no
kmemleak report related to BPF showed up.

  [...]
  unreferenced object 0xffff880378eadae0 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
      00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
      50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  PMu]............
    backtrace:
      [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
      [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
      [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
      [<000000002ef89e83>] 0xffffffffffffffff
  unreferenced object 0xffff880378ead240 (size 64):
    comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
    hex dump (first 32 bytes):
      00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
      00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  .Du]............
    backtrace:
      [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
      [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
      [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
      [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
      [<0000000000763660>] do_syscall_64+0x9a/0x300
      [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [<000000002ef89e83>] 0xffffffffffffffff
  [...]

Fixes: e9db4ef6bf ("bpf: sockhash fix omitted bucket lock in sock_close")
Fixes: 54fedb42c6 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
Fixes: 2f857d0460 ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-16 14:58:08 -07:00
Daniel Borkmann
90545cdc3f tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
I found that in BPF sockmap programs, once we either delete a socket
from the map or update a map slot so that the old socket is purged
from the map, these sockets can never be reattached to a map,
even though their related psock has been dropped entirely at that
point.

Reason is that tcp_cleanup_ulp() leaves the old icsk->icsk_ulp_ops
intact, so that on the next tcp_set_ulp_id() the kernel returns an
-EEXIST thinking there is still some active ULP attached.

BPF sockmap is the only one that has this issue as the other user,
kTLS, only calls tcp_cleanup_ulp() from tcp_v4_destroy_sock() whereas
sockmap semantics allow dropping the socket from the map with all
related psock state being cleaned up.
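The core of the fix can be sketched in userspace C. It assumes simplified stand-ins for struct inet_connection_sock and the ULP ops. The cleanup path must clear the ops pointer as well as release state. Otherwise the next attach sees a stale pointer and fails with -EEXIST. All names other than -EEXIST are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ulp_ops_model {
    const char *name;
    void (*release)(void *ctx);
};

struct icsk_model {                     /* stands in for inet_connection_sock */
    const struct ulp_ops_model *ulp_ops;
    void *ulp_data;
};

static int set_ulp(struct icsk_model *icsk, const struct ulp_ops_model *ops)
{
    assert(ops != NULL);
    if (icsk->ulp_ops)                  /* some ULP still looks attached */
        return -EEXIST;
    icsk->ulp_ops = ops;
    return 0;
}

static void cleanup_ulp(struct icsk_model *icsk)
{
    if (!icsk->ulp_ops)
        return;
    if (icsk->ulp_ops->release)
        icsk->ulp_ops->release(icsk->ulp_data);
    icsk->ulp_ops = NULL;               /* the fix: forget the old ops */
    icsk->ulp_data = NULL;
}
```

Without the `icsk->ulp_ops = NULL` line, the third call in a sequence of attach, cleanup, attach would return -EEXIST.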

Fixes: 1aa12bdf1b ("bpf: sockmap, add sock close() hook to remove socks")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-16 14:58:08 -07:00
Daniel Borkmann
037b0b86ec tcp, ulp: add alias for all ulp modules
Let's not turn the TCP ULP lookup into an arbitrary module loader as
we only intend to load ULP modules through this mechanism, not other
unrelated kernel modules:

  [root@bar]# cat foo.c
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <linux/tcp.h>
  #include <linux/in.h>

  int main(void)
  {
      int sock = socket(PF_INET, SOCK_STREAM, 0);
      setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
      return 0;
  }

  [root@bar]# gcc foo.c -O2 -Wall
  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  sctp                 1077248  4
  libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
  [root@bar]#

Fix it by adding module alias to TCP ULP modules, so probing module
via request_module() will be limited to tcp-ulp-[name]. The existing
modules like kTLS will load fine given tcp-ulp-tls alias, but others
will fail to load:

  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  [root@bar]#

Sockmap is not affected by this since it's either built-in or not.
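In kernel terms, the lookup asks request_module() for a prefixed name, and each ULP module declares a matching module alias. The userspace sketch below shows only the name construction. It assumes the "tcp-ulp-" prefix stated above, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TCP_ULP_PREFIX "tcp-ulp-"

/* Build the module name probed for a ULP called @name. Returns 0 on
 * success, -1 if the result does not fit in @buf. */
static int tcp_ulp_module_name(char *buf, size_t len, const char *name)
{
    int n;

    assert(buf != NULL && name != NULL);
    n = snprintf(buf, len, TCP_ULP_PREFIX "%s", name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the prefix in place, the reproducer above makes the kernel probe for "tcp-ulp-sctp". No module carries that alias, so sctp stays unloaded, while kTLS still matches "tcp-ulp-tls".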

Fixes: 734942cc4e ("tcp: ULP infrastructure")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-16 14:58:07 -07:00
Jason Gunthorpe
0a3173a5f0 Merge branch 'linus/master' into rdma.git for-next
rdma.git merge resolution for the 4.19 merge window

Conflicts:
 drivers/infiniband/core/rdma_core.c
   - Use the rdma code and revise with the new spelling for
     atomic_fetch_add_unless
 drivers/nvme/host/rdma.c
   - Replace max_sge with max_send_sge in new blk code
 drivers/nvme/target/rdma.c
   - Use the blk code and revise to use NULL for ib_post_recv when
     appropriate
   - Replace max_sge with max_recv_sge in new blk code
 net/rds/ib_send.c
   - Use the net code and revise to use NULL for ib_post_recv when
     appropriate

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 14:21:29 -06:00
Jason Gunthorpe
92f4e77c85 Revert "net/smc: Replace ib_query_gid with rdma_get_gid_attr"
This reverts commit ddb457c699.

The include rdma/ib_cache.h is kept, and we have to add a memset
to the compat wrapper to avoid compiler warnings in gcc-7.

This revert is done to avoid extensive merge conflicts with SMC
changes in netdev during the 4.19 merge window.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 14:03:34 -06:00
Yonghong Song
965931e3a8 bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
Commit 394e40a297 ("bpf: extend bpf_prog_array to store pointers
to the cgroup storage") refactored the bpf_prog_array_copy_core()
to accommodate the new structure bpf_prog_array_item, which contains
the bpf_prog array itself.

In the old code, we had
   perf_event_query_prog_array():
     mutex_lock(...)
     bpf_prog_array_copy_call():
       prog = rcu_dereference_check(array, 1)->progs
       bpf_prog_array_copy_core(prog, ...)
     mutex_unlock(...)

With the above commit, we had
   perf_event_query_prog_array():
     mutex_lock(...)
     bpf_prog_array_copy_call():
       bpf_prog_array_copy_core(array, ...):
         item = rcu_dereference(array)->items;
         ...
     mutex_unlock(...)

The new code will trigger a lockdep rcu checking warning.
The fix is to change rcu_dereference() to rcu_dereference_check()
to prevent such a warning.

Reported-by: syzbot+6e72317008eef84a216b@syzkaller.appspotmail.com
Fixes: 394e40a297 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-16 21:55:32 +02:00
Jesper Dangaard Brouer
817b89beb9 samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM
It is common XDP practice to unload/detach the XDP bpf program
when the XDP sample program is Ctrl-C interrupted (SIGINT) or
killed (SIGTERM).

The samples/bpf programs xdp_redirect_cpu and xdp_rxq_info
forgot to trap the SIGTERM signal (which is the default signal used
by the kill command).

This was discovered by Red Hat QA, whose automated scripts depend
on killing the XDP sample program after a timeout period.
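Here is a minimal sketch of the pattern in plain C: SIGINT and SIGTERM both go to the same exit handler. The samples do the detach work inside their handler. This version differs: it only sets a flag. The detach step is left as a hypothetical helper named in a comment.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t stop_requested;

/* Keep the handler async-signal-safe: only record the request. The main
 * loop would then detach the XDP program (e.g. via a hypothetical
 * detach_xdp_prog() helper) before exiting. */
static void int_exit(int sig)
{
    (void)sig;
    stop_requested = 1;
}

static int install_exit_handlers(void)
{
    if (signal(SIGINT, int_exit) == SIG_ERR)    /* Ctrl-C */
        return -1;
    if (signal(SIGTERM, int_exit) == SIG_ERR)   /* kill: the missing trap */
        return -1;
    return 0;
}
```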

Fixes: fad3917e36 ("samples/bpf: add cpumap sample program xdp_redirect_cpu")
Fixes: 0fca931a6f ("samples/bpf: program demonstrating access to xdp_rxq_info")
Reported-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-16 21:55:32 +02:00
Tariq Toukan
21b172ee11 net/xdp: Fix suspicious RCU usage warning
Fix the warning below by calling rhashtable_lookup_fast.
Also, move some code around for better quality and
readability.

[  342.450870] WARNING: suspicious RCU usage
[  342.455856] 4.18.0-rc2+ #17 Tainted: G           O
[  342.462210] -----------------------------
[  342.467202] ./include/linux/rhashtable.h:481 suspicious rcu_dereference_check() usage!
[  342.476568]
[  342.476568] other info that might help us debug this:
[  342.476568]
[  342.486978]
[  342.486978] rcu_scheduler_active = 2, debug_locks = 1
[  342.495211] 4 locks held by modprobe/3934:
[  342.500265]  #0: 00000000e23116b2 (mlx5_intf_mutex){+.+.}, at: mlx5_unregister_interface+0x18/0x90 [mlx5_core]
[  342.511953]  #1: 00000000ca16db96 (rtnl_mutex){+.+.}, at: unregister_netdev+0xe/0x20
[  342.521109]  #2: 00000000a46e2c4b (&priv->state_lock){+.+.}, at: mlx5e_close+0x29/0x60 [mlx5_core]
[  342.531642]  #3: 0000000060c5bde3 (mem_id_lock){+.+.}, at: xdp_rxq_info_unreg+0x93/0x6b0
[  342.541206]
[  342.541206] stack backtrace:
[  342.547075] CPU: 12 PID: 3934 Comm: modprobe Tainted: G           O      4.18.0-rc2+ #17
[  342.556621] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[  342.565606] Call Trace:
[  342.568861]  dump_stack+0x78/0xb3
[  342.573086]  xdp_rxq_info_unreg+0x3f5/0x6b0
[  342.578285]  ? __call_rcu+0x220/0x300
[  342.582911]  mlx5e_free_rq+0x38/0xc0 [mlx5_core]
[  342.588602]  mlx5e_close_channel+0x20/0x120 [mlx5_core]
[  342.594976]  mlx5e_close_channels+0x26/0x40 [mlx5_core]
[  342.601345]  mlx5e_close_locked+0x44/0x50 [mlx5_core]
[  342.607519]  mlx5e_close+0x42/0x60 [mlx5_core]
[  342.613005]  __dev_close_many+0xb1/0x120
[  342.617911]  dev_close_many+0xa2/0x170
[  342.622622]  rollback_registered_many+0x148/0x460
[  342.628401]  ? __lock_acquire+0x48d/0x11b0
[  342.633498]  ? unregister_netdev+0xe/0x20
[  342.638495]  rollback_registered+0x56/0x90
[  342.643588]  unregister_netdevice_queue+0x7e/0x100
[  342.649461]  unregister_netdev+0x18/0x20
[  342.654362]  mlx5e_remove+0x2a/0x50 [mlx5_core]
[  342.659944]  mlx5_remove_device+0xe5/0x110 [mlx5_core]
[  342.666208]  mlx5_unregister_interface+0x39/0x90 [mlx5_core]
[  342.673038]  cleanup+0x5/0xbfc [mlx5_core]
[  342.678094]  __x64_sys_delete_module+0x16b/0x240
[  342.683725]  ? do_syscall_64+0x1c/0x210
[  342.688476]  do_syscall_64+0x5a/0x210
[  342.693025]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 8d5d885275 ("xdp: rhashtable with allocator ID to pointer mapping")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-16 21:55:21 +02:00
Steven Rostedt (VMware)
91c1e6ba39 blktrace: Add SPDX License format header
Add the SPDX License header to ease license compliance management.

Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-16 15:49:02 -04:00
Yuval Shaia
54c73f8651 net/mlx5e: Delete unneeded function argument
The priv argument is not used by the function; delete it.

Fixes: a89842811e ("net/mlx5e: Merge per priority stats groups")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:28:06 -07:00
Ivan Khoronzhuk
70fd8036d0 Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb
If CBS parameters calculated for 1000Mb are used on a 100Mb port
without h/w offload (with cpsw offload it doesn't matter), the shaper
works incorrectly. According to the example and the test board, the
second port is a 100Mb interface, so correct the parameters with values
recalculated for a 100Mb interface. This allows the same command to be
used for the CBS software implementation on the board in the example.
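The 1000Mb values cannot be reused because the derived CBS parameters scale with the port rate. Here is a sketch of two of the relations, assuming the IEEE 802.1Q credit-based shaper definitions with rates in kbit/s: sendslope = idleslope - port_rate and locredit = max_frame * sendslope / port_rate. The inputs (a 20000 kbit/s idle slope and 1500-byte frames) are illustrative. They are not the values from the ti-cpsw document.

```c
#include <assert.h>

/* sendslope = idleslope - port transmit rate, all in kbit/s. */
static long long cbs_sendslope(long long idleslope_kbps,
                               long long port_rate_kbps)
{
    assert(port_rate_kbps > 0 && idleslope_kbps <= port_rate_kbps);
    return idleslope_kbps - port_rate_kbps;
}

/* locredit (bytes) = max frame size (bytes) * sendslope / port rate.
 * C integer division truncates toward zero. */
static long long cbs_locredit(long long max_frame_bytes,
                              long long sendslope_kbps,
                              long long port_rate_kbps)
{
    assert(port_rate_kbps > 0);
    return max_frame_bytes * sendslope_kbps / port_rate_kbps;
}
```

For the same 20000 kbit/s reservation, this gives sendslope -80000 and locredit -1200 on a 100Mb port. On a 1000Mb port it gives -980000 and -1470.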

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:27:16 -07:00
Kees Cook
5e22002aa8 isdn: Disable IIOCDBGVAR
It was possible to directly leak the kernel address where the isdn_dev
structure pointer was stored. This is a kernel ASLR bypass for anyone
with access to the ioctl. The code has been present since the beginning
of git history, but it should never be needed for normal operation,
so remove it.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Karsten Keil <isdn@linux-pingi.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:26:24 -07:00
Lad, Prabhakar
4531681837 net: dsa: add support for ksz9897 ethernet switch
The ksz9477 is a superset of the ksz9xx series, so the driver works
out of the box for the ksz9897 chip with this patch.

Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:24:55 -07:00
Toshiaki Makita
7797b93b75 veth: Free queues on link delete
David Ahern reported a memory leak in veth.

=======================================================================
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800354d5c00 (size 1024):
  comm "ip", pid 836, jiffies 4294722952 (age 25.904s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<(____ptrval____)>] kmemleak_alloc+0x70/0x94
    [<(____ptrval____)>] slab_post_alloc_hook+0x42/0x52
    [<(____ptrval____)>] __kmalloc+0x101/0x142
    [<(____ptrval____)>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
    [<(____ptrval____)>] veth_newlink+0x147/0x3ac [veth]
    ...
unreferenced object 0xffff88002e009c00 (size 1024):
  comm "ip", pid 836, jiffies 4294722958 (age 25.898s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<(____ptrval____)>] kmemleak_alloc+0x70/0x94
    [<(____ptrval____)>] slab_post_alloc_hook+0x42/0x52
    [<(____ptrval____)>] __kmalloc+0x101/0x142
    [<(____ptrval____)>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
    [<(____ptrval____)>] veth_newlink+0x219/0x3ac [veth]
=======================================================================

veth_rq allocated in veth_newlink() was not freed on dellink.

We need to free them after veth_close() so that no packets reference
the queues afterwards. Thus free them in veth_dev_free(), in the same
way as the stats structure (vstats) is freed.

Also move queues allocation to veth_dev_init() to be in line with stats
allocation.
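Here is a userspace sketch of the lifetime rule. It assumes simplified stand-ins for the net_device callbacks; the names are hypothetical, modeled on veth_dev_init() and veth_dev_free(). The per-queue array is allocated alongside the stats in the init callback. It is released in the free callback, which runs only after the device has been closed, so no packet can still reference the queues.

```c
#include <assert.h>
#include <stdlib.h>

struct rq_model { int napi_enabled; };      /* stands in for veth_rq */

struct veth_priv_model {
    unsigned int num_rx_queues;
    struct rq_model *rq;                    /* per-queue state */
    long *stats;                            /* stands in for vstats */
};

/* Init callback: allocate queues together with the stats. */
static int dev_init(struct veth_priv_model *priv, unsigned int nqueues)
{
    assert(nqueues > 0);
    priv->stats = calloc(1, sizeof(*priv->stats));
    if (!priv->stats)
        return -1;
    priv->rq = calloc(nqueues, sizeof(*priv->rq));
    if (!priv->rq) {
        free(priv->stats);
        priv->stats = NULL;
        return -1;
    }
    priv->num_rx_queues = nqueues;
    return 0;
}

/* Free callback, run after close: release queues like the stats. */
static void dev_free(struct veth_priv_model *priv)
{
    free(priv->rq);
    priv->rq = NULL;
    free(priv->stats);
    priv->stats = NULL;
}
```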

Fixes: 638264dc90 ("veth: Support per queue XDP ring")
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:22:31 -07:00
Cong Wang
ff93bca769 ila: make lockdep happy again
Previously, alloc_ila_locks() and bucket_table_alloc() called
spin_lock_init() separately, so they had two different
lock names and lock class keys. However, after commit b893281715
("ila: Call library function alloc_bucket_locks") they both call the
helper alloc_bucket_spinlocks(), which has only one lock
name and lock class key. This causes a few bogus lockdep warnings,
as reported by syzbot.

Fix this by making alloc_bucket_locks() a macro that passes the
declaration name as the lock name and defines a static lock class key
inside the macro.
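The kernel's spin_lock_init() uses the same pattern when lock debugging is enabled: it declares a static struct lock_class_key inside the macro body, so each call site gets its own key. Below is a userspace sketch of the idea. The struct and helper names are simplified, hypothetical stand-ins for the lockdep types.

```c
#include <assert.h>
#include <string.h>

/* Non-empty so every instance is guaranteed its own address. */
struct lock_class_key { char unused; };

struct lock_meta {                  /* hypothetical lockdep metadata */
    const char *name;
    const struct lock_class_key *key;
};

static void init_lock_meta(struct lock_meta *m, const char *name,
                           const struct lock_class_key *key)
{
    assert(m != NULL && name != NULL && key != NULL);
    m->name = name;
    m->key = key;
}

/* Each expansion defines its own static key, and #locks turns the
 * argument into the lock name, so every call site gets its own class. */
#define alloc_locks_meta(m, locks)                  \
    do {                                            \
        static struct lock_class_key site_key;      \
        init_lock_meta((m), #locks, &site_key);     \
    } while (0)
```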

Fixes: b893281715 ("ila: Call library function alloc_bucket_locks")
Reported-by: <syzbot+b66a5a554991a8ed027c@syzkaller.appspotmail.com>
Cc: Tom Herbert <tom@quantonium.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:14:42 -07:00
Vlad Buslov
32039eac4c net: sched: act_ife: always release ife action on init error
Action init API was changed to always take reference to action, even when
overwriting existing action. Substitute conditional action release, which
was executed only if action is newly created, with unconditional release in
tcf_ife_init() error handling code to prevent double free or memory leak in
case of overwrite.

Fixes: 4e8ddd7f17 ("net: sched: don't release reference on action overwrite")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:12:12 -07:00
Jason Gunthorpe
89982f7cce Linux 4.18

Merge tag 'v4.18' into rdma.git for-next

Resolve merge conflicts from the -rc cycle against the rdma.git tree:

Conflicts:
 drivers/infiniband/core/uverbs_cmd.c
  - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
  - Merge removal of file->ucontext in for-next with new code in -rc
 drivers/infiniband/core/uverbs_main.c
  - for-next removed code from ib_uverbs_write() that was modified
    in for-rc

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-16 13:12:00 -06:00
Fabrizio Castro
5f34f69ede dt-bindings: net: ravb: Add support for r8a774a1 SoC
Document RZ/G2M (R8A774A1) SoC bindings.

Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Biju Das <biju.das@bp.renesas.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:09:44 -07:00
Hangbin Liu
a51c76b4df cls_matchall: fix tcf_unbind_filter missing
Add the missing tcf_unbind_filter() call to cls_matchall; without it,
WARN_ON() triggers in cbq_destroy_class().
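Here is a userspace sketch of the bind/unbind pairing. As a simplification, it assumes the warning fires when a class is destroyed while filters are still bound to it. The types and functions are hypothetical stand-ins, not the tc API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct class_model { int filter_cnt; };   /* filters bound to this class */

struct filter_model { struct class_model *bound; };

static void bind_filter(struct filter_model *f, struct class_model *cl)
{
    assert(f->bound == NULL);
    f->bound = cl;
    cl->filter_cnt++;
}

/* The call that was missing from the filter's teardown path. */
static void unbind_filter(struct filter_model *f)
{
    if (f->bound) {
        f->bound->filter_cnt--;
        f->bound = NULL;
    }
}

/* Mirrors the sanity check: destroying a class that still has bound
 * filters is what trips the warning. */
static bool class_destroy_would_warn(const struct class_model *cl)
{
    return cl->filter_cnt != 0;
}
```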

Fixes: fd62d9f5c5 ("net/sched: matchall: Fix configuration race")
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-16 12:08:26 -07:00
Dmitry Torokhov
13fe7056be Merge branch 'next' into for-linus
Prepare input updates for 4.19 merge window.
2018-08-16 11:10:56 -07:00
Michel Dänzer
c9533d1bca drm/amdgpu: Use kvmalloc for allocating UVD/VCE/VCN BO backup memory
The allocated size can be (at least?) as large as megabytes, and
there's no need for it to be physically contiguous.

May avoid spurious failures to initialize / suspend the corresponding
block while there's memory pressure.

Bugzilla: https://bugs.freedesktop.org/107432
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-16 12:59:11 -05:00
Linus Torvalds
5c60a7389d Orangefs: one cleanup and Souptick's vm_fault_t patch
1. Adding new return type vm_fault_t (Souptick Joarder)
2. remove redundant pointer orangefs_inode (Colin Ian King)

Merge tag 'for-linus-4.19-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Orangefs: one cleanup and Souptick's vm_fault_t patch:

   - add new return type vm_fault_t (Souptick Joarder)

   - remove redundant pointer (Colin Ian King)"

* tag 'for-linus-4.19-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: remove redundant pointer orangefs_inode
  orangefs: Adding new return type vm_fault_t
2018-08-16 10:53:45 -07:00
Mikulas Patocka
1e1132ea21 dm writecache: fix a crash due to reading past end of dirty_bitmap
wc->dirty_bitmap_size is in bytes, so it must be multiplied by 8, not by
BITS_PER_LONG, to get the number of bitmap_bits.

This fixes the crash in find_next_bit() that was reported at:
https://bugzilla.kernel.org/show_bug.cgi?id=200819
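A minimal userspace sketch of the unit mix-up. The helper names and DEMO_BITS_PER_LONG are hypothetical stand-ins that only mirror the arithmetic described above, not the dm-writecache code itself:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BITS_PER_LONG (64 on 64-bit builds). */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Buggy: treats a byte count as if it were a count of longs. */
static size_t demo_bitmap_bits_buggy(size_t bitmap_size_bytes)
{
	return bitmap_size_bytes * DEMO_BITS_PER_LONG;
}

/* Fixed: each byte of the bitmap holds 8 bits. */
static size_t demo_bitmap_bits_fixed(size_t bitmap_size_bytes)
{
	return bitmap_size_bytes * 8;
}
```

For a 16-byte bitmap the fixed helper reports 128 bits, while the buggy one reports 16 * BITS_PER_LONG (1024 on a 64-bit build), so a scan bounded by that count walks far past the end of the allocation.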

Reported-by: edo.rus@gmail.com
Fixes: 48debafe4f ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-08-16 13:43:01 -04:00
Pablo Neira Ayuso
feb9f55c33 netfilter: nft_dynset: allow dynamic updates of non-anonymous set
This check is superfluous and breaks valid configurations, so remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:11 +02:00
Máté Eckl
90d827f06b netfilter: nft_tproxy: Fix missing-braces warning
This patch fixes a warning reported by the kbuild test robot (from linux-next
tree):
   net/netfilter/nft_tproxy.c: In function 'nft_tproxy_eval_v6':
>> net/netfilter/nft_tproxy.c:85:9: warning: missing braces around initializer [-Wmissing-braces]
     struct in6_addr taddr = {0};
            ^
   net/netfilter/nft_tproxy.c:85:9: warning: (near initialization for 'taddr.in6_u') [-Wmissing-braces]

This warning is actually caused by a gcc bug that is already resolved in
newer versions (kbuild used gcc 4.9), so this form of initialization is
dropped and memset() is used instead.
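For illustration, a minimal userspace sketch of the memset() approach. It uses glibc's struct in6_addr from <netinet/in.h> rather than the kernel definition (both start with a union member, which is what trips -Wmissing-braces), and the function name is hypothetical:

```c
#include <netinet/in.h>
#include <string.h>

/* Return an all-zero IPv6 address without a "= {0}" initializer,
 * which older gcc flags because the first member is itself a union. */
static struct in6_addr demo_zero_taddr(void)
{
	struct in6_addr taddr;

	memset(&taddr, 0, sizeof(taddr));
	return taddr;
}
```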

Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:10 +02:00
Dmitry V. Levin
cdb2f40124 netfilter: uapi: fix linux/netfilter/nf_osf.h userspace compilation errors
Move inclusion of <linux/ip.h> and <linux/tcp.h> from
linux/netfilter/xt_osf.h to linux/netfilter/nf_osf.h to fix
the following linux/netfilter/nf_osf.h userspace compilation errors:

/usr/include/linux/netfilter/nf_osf.h:59:24: error: 'MAX_IPOPTLEN' undeclared here (not in a function)
  struct nf_osf_opt opt[MAX_IPOPTLEN];
/usr/include/linux/netfilter/nf_osf.h:64:17: error: field 'ip' has incomplete type
  struct iphdr   ip;
/usr/include/linux/netfilter/nf_osf.h:65:18: error: field 'tcp' has incomplete type
  struct tcphdr   tcp;
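A minimal sketch of why the include placement matters: a userspace-visible header that uses MAX_IPOPTLEN, struct iphdr and struct tcphdr only compiles on its own if it pulls in <linux/ip.h> and <linux/tcp.h> itself. The struct names below are hypothetical and only mimic the shape of the fields in the errors above:

```c
/* Hypothetical self-contained uapi-style header fragment. */
#include <linux/types.h>
#include <linux/ip.h>   /* MAX_IPOPTLEN, struct iphdr */
#include <linux/tcp.h>  /* struct tcphdr */

struct demo_osf_opt {
	__u16 kind;
	__u16 length;
};

struct demo_osf_finger {
	struct demo_osf_opt opt[MAX_IPOPTLEN];  /* needs <linux/ip.h> */
};

struct demo_osf_headers {
	struct iphdr ip;    /* complete type only with <linux/ip.h> */
	struct tcphdr tcp;  /* complete type only with <linux/tcp.h> */
};
```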

Fixes: bfb15f2a95 ("netfilter: extract Passive OS fingerprint infrastructure from xt_osf")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:09 +02:00
Harsha Sharma
3206c516ce netfilter: nft_ct: make l3 protocol field optional for timeout object
If the l3 protocol value is not specified for a ct timeout object, use the
value from the nft_ctx protocol family.

Signed-off-by: Harsha Sharma <harshasharmaiitr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:08 +02:00
Máté Eckl
1bfc2bc7ad netfilter: doc: Add nf_tables part in tproxy.txt
Transparent proxy support was recently added to nf_tables, so this
document should be updated with the new information.

- nft commands are added as alternatives to the iptables ones.
- The link to a patched iptables is removed, as it is already part of
  the mainline iptables implementation (and the link is dead).
- tcprdr is added as an example implementation of a transparent proxy.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Florian Westphal <fw@strlen.de>
Cc: KOVACS Krisztian <hidden@sch.bme.hu>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:07 +02:00
Michal Hocko
a148ce1537 netfilter: x_tables: do not fail xt_alloc_table_info too easilly
eacd86ca3b ("net/netfilter/x_tables.c: use kvmalloc()
in xt_alloc_table_info()") unintentionally fortified the
xt_alloc_table_info() allocation when __GFP_NORETRY was dropped from
the vmalloc fallback. Later, a syzbot report showed that this
can lead to OOM killer invocations when tables are too large, and
0537250fdc ("netfilter: x_tables: make allocation less aggressive")
was merged to restore the original behavior. However, Georgi Nikolov
noticed that he was no longer able to install his iptables rules, so
this can be seen as a regression.

The primary argument for 0537250fdc was that this allocation path
shouldn't really trigger the OOM killer and kill innocent tasks. On the
other hand, the interface requires root and as such should allow what the
admin asks for. Root inside a namespace makes this more complicated
because it may not be trusted in general. If it is not, such
namespaces should be restricted anyway. Therefore, drop __GFP_NORETRY
and replace it with __GFP_ACCOUNT to enforce memcg constraints on it.

Fixes: 0537250fdc ("netfilter: x_tables: make allocation less aggressive")
Reported-by: Georgi Nikolov <gnikolov@icdsoft.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:05 +02:00
Florian Westphal
1c117d3b72 netfilter: conntrack: fix removal of conntrack entries when l4tracker is removed
nf_ct_l4proto_unregister_one() leaves behind conntrack entries created by
the tracker that is being removed, so nf_ct_l4proto_unregister() has to
iterate once for each protocol being removed.

v2: call nf_ct_iterate_destroy without holding nf_ct_proto_mutex.

Fixes: 2c41f33c1b ("netfilter: move table iteration out of netns exit paths")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:04 +02:00
Florian Westphal
6a48de0144 netfilter: nf_tables: don't prevent event handler from device cleanup on netns exit
When a network namespace exits, the nf_tables pernet_ops will remove all rules.
However, there is one caveat:

Base chains that register ingress hooks would cause a use-after-free,
because the device is already gone at that point.

The device event handlers prevent this from happening:
netns exit synthesizes unregister events for all devices.

However, an improper fix for a race condition made the notifiers a no-op
when they are called from the netns exit path, so revert that part.

This is safe now as the previous patch fixed nf_tables pernet ops
and device notifier initialisation ordering.

Fixes: 0a2cf5ee43 ("netfilter: nf_tables: close race between netns exit and rmmod")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:03 +02:00
Florian Westphal
d209df3e7f netfilter: nf_tables: fix register ordering
We must register the nfnetlink ops last, as that exposes nf_tables to
userspace.  Without this, we could theoretically get an nfnetlink request
before the net->nft state has been initialized.
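A minimal userspace model of the ordering rule. Every function here is hypothetical and merely records the step it stands for; only "register nfnetlink last" comes from the message above, while the relative order of the earlier steps is an assumption. Teardown on error runs in reverse:

```c
#include <string.h>

/* Hypothetical log of registration steps, for illustration only. */
static char demo_log[128];

static void demo_step(const char *name)
{
	strcat(demo_log, name);
	strcat(demo_log, ";");
}

static int demo_register_pernet_ops(void)      { demo_step("pernet"); return 0; }
static int demo_register_netdev_notifier(void) { demo_step("notifier"); return 0; }
static int demo_register_nfnetlink(void)       { demo_step("nfnetlink"); return 0; }
static void demo_unregister_notifier(void)     { demo_step("-notifier"); }
static void demo_unregister_pernet_ops(void)   { demo_step("-pernet"); }

static int demo_module_init(void)
{
	int err;

	demo_log[0] = '\0';
	err = demo_register_pernet_ops();      /* internal per-netns state first */
	if (err < 0)
		return err;
	err = demo_register_netdev_notifier();
	if (err < 0)
		goto err_notifier;
	err = demo_register_nfnetlink();       /* exposes the API to userspace: last */
	if (err < 0)
		goto err_nfnetlink;
	return 0;

err_nfnetlink:
	demo_unregister_notifier();
err_notifier:
	demo_unregister_pernet_ops();
	return err;
}
```

Because the userspace-facing registration comes last, no request can arrive before the state it depends on exists, and a failure at any step unwinds only what was already set up.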

Fixes: 99633ab29b ("netfilter: nf_tables: complete net namespace support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:02 +02:00
Florian Westphal
3e673b23b5 netfilter: fix memory leaks on netlink_dump_start error
Shaochun Chen points out that we leak the dumper filter state allocations
stored in dump_control->data if an error occurs before netlink sets
cb_running (after which ->done will be called at some point).

In order to fix this, add .start functions and move allocations there.

Same pattern as used in commit 90fd131afc
("netfilter: nf_tables: move dumper state allocation into ->start").

Reported-by: shaochun chen <cscnull@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:37:00 +02:00
Taehee Yoo
4ef360dd6a netfilter: nft_set: fix allocation size overflow in privsize callback.
In order to determine the allocation size of a set, ->privsize is invoked.
At this point, both desc->size and the size of each of the set's data
structures are used. desc->size is the number of elements given by the
user. desc->size is of type u32, so the upper limit on set elements is
4294967295. However, the return type of ->privsize is also u32, hence
an overflow can occur.
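A minimal sketch of the truncation, assuming a hypothetical set layout of a 64-byte header plus 8 bytes per element (the real per-backend sizes differ); only the u32-versus-u64 arithmetic is the point:

```c
#include <stdint.h>

/* Hypothetical sizes, chosen only to illustrate the arithmetic. */
#define DEMO_SET_HDR_SIZE 64u
#define DEMO_ELEM_SIZE     8u

/* Buggy: product and sum are computed and returned as u32,
 * so a user-supplied size near UINT32_MAX wraps around. */
static uint32_t demo_privsize_u32(uint32_t nelems)
{
	return DEMO_SET_HDR_SIZE + nelems * DEMO_ELEM_SIZE;
}

/* Widened: multiply in 64 bits and return a 64-bit size. */
static uint64_t demo_privsize_u64(uint32_t nelems)
{
	return DEMO_SET_HDR_SIZE + (uint64_t)nelems * DEMO_ELEM_SIZE;
}
```

With the size from the test commands below (4294967295), demo_privsize_u32() returns just 56 bytes, while demo_privsize_u64() returns 34359738424, so an allocation sized by the 32-bit result is far too small for the set that is later walked.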

test commands:
   %nft add table ip filter
   %nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; }
   %nft list ruleset

splat looks like:
[ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled
[ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7
[ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
[ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
[ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
[ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
[ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
[ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
[ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
[ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
[ 1239.229091] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 1239.229091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
[ 1239.229091] Call Trace:
[ 1239.229091]  ? nft_hash_remove+0xf0/0xf0 [nf_tables_set]
[ 1239.229091]  ? memset+0x1f/0x40
[ 1239.229091]  ? __nla_reserve+0x9f/0xb0
[ 1239.229091]  ? memcpy+0x34/0x50
[ 1239.229091]  nf_tables_dump_set+0x9a1/0xda0 [nf_tables]
[ 1239.229091]  ? __kmalloc_reserve.isra.29+0x2e/0xa0
[ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
[ 1239.229091]  ? nf_tables_commit+0x2c60/0x2c60 [nf_tables]
[ 1239.229091]  netlink_dump+0x470/0xa20
[ 1239.229091]  __netlink_dump_start+0x5ae/0x690
[ 1239.229091]  nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables]
[ 1239.229091]  nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables]
[ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
[ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
[ 1239.229091]  ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables]
[ 1239.229091]  ? nla_parse+0xab/0x230
[ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
[ 1239.229091]  nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink]
[ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
[ 1239.229091]  ? debug_show_all_locks+0x290/0x290
[ 1239.229091]  ? sched_clock_cpu+0x132/0x170
[ 1239.229091]  ? find_held_lock+0x39/0x1b0
[ 1239.229091]  ? sched_clock_local+0x10d/0x130
[ 1239.229091]  netlink_rcv_skb+0x211/0x320
[ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
[ 1239.229091]  ? netlink_ack+0x7b0/0x7b0
[ 1239.229091]  ? ns_capable_common+0x6e/0x110
[ 1239.229091]  nfnetlink_rcv+0x2d1/0x310 [nfnetlink]
[ 1239.229091]  ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink]
[ 1239.229091]  ? netlink_deliver_tap+0x829/0x930
[ 1239.229091]  ? lock_acquire+0x265/0x2e0
[ 1239.229091]  netlink_unicast+0x406/0x520
[ 1239.509725]  ? netlink_attachskb+0x5b0/0x5b0
[ 1239.509725]  ? find_held_lock+0x39/0x1b0
[ 1239.509725]  netlink_sendmsg+0x987/0xa20
[ 1239.509725]  ? netlink_unicast+0x520/0x520
[ 1239.509725]  ? _copy_from_user+0xa9/0xc0
[ 1239.509725]  __sys_sendto+0x21a/0x2c0
[ 1239.509725]  ? __ia32_sys_getpeername+0xa0/0xa0
[ 1239.509725]  ? retint_kernel+0x10/0x10
[ 1239.509725]  ? sched_clock_cpu+0x132/0x170
[ 1239.509725]  ? find_held_lock+0x39/0x1b0
[ 1239.509725]  ? lock_downgrade+0x540/0x540
[ 1239.509725]  ? up_read+0x1c/0x100
[ 1239.509725]  ? __do_page_fault+0x763/0x970
[ 1239.509725]  ? retint_user+0x18/0x18
[ 1239.509725]  __x64_sys_sendto+0x177/0x180
[ 1239.509725]  do_syscall_64+0xaa/0x360
[ 1239.509725]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1239.509725] RIP: 0033:0x7f5a8f468e03
[ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8
[ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03
[ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003
[ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c
[ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0
[ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0
[ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables
[ 1239.670713] ---[ end trace 39375adcda140f11 ]---
[ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
[ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
[ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
[ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
[ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
[ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
[ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
[ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
[ 1239.751785] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 1239.760993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
[ 1239.775679] Kernel panic - not syncing: Fatal exception
[ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1239.776630] Rebooting in 5 seconds..

Fixes: 20a69341f2 ("netfilter: nf_tables: add netlink set API")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:36:59 +02:00
Florian Westphal
da786717e0 netfilter: ip6t_rpfilter: set F_IFACE for linklocal addresses
Roman reports that the DHCPv6 client no longer sees replies from the
server due to the rule

ip6tables -t raw -A PREROUTING -m rpfilter --invert -j DROP

We need to set the F_IFACE flag for link-local addresses, because they
are scoped per device.
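A minimal userspace sketch of the decision, assuming glibc's IN6_IS_ADDR_LINKLOCAL() macro for the fe80::/10 test; DEMO_F_IFACE and the function name are hypothetical stand-ins for the kernel's F_IFACE lookup flag and rpfilter code:

```c
#include <netinet/in.h>

/* Hypothetical stand-in for the "restrict lookup to this interface" flag. */
#define DEMO_F_IFACE 0x1u

/* Link-local sources (fe80::/10) are only meaningful on the interface
 * they arrived on, so the reverse-path lookup must be scoped to it. */
static unsigned int demo_rpfilter_lookup_flags(const struct in6_addr *saddr)
{
	unsigned int flags = 0;

	if (IN6_IS_ADDR_LINKLOCAL(saddr))
		flags |= DEMO_F_IFACE;
	return flags;
}
```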

Fixes: 47b7e7f828 ("netfilter: don't set F_IFACE on ipv6 fib lookups")
Reported-by: Roman Mamedov <rm@romanrm.net>
Tested-by: Roman Mamedov <rm@romanrm.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:36:58 +02:00
Matteo Croce
b71ed54dc2 ipvs: don't show negative times in ip_vs_conn
Since commit 500462a9de ("timers: Switch to a non-cascading wheel"),
a timer can fire as much as 12.5% later than the scheduled interval.

IPVS has two handlers, /proc/net/ip_vs_conn and /proc/net/ip_vs_conn_sync,
which show the remaining time before a connection expires.
The default expire time for a connection is 60 seconds, and the
expiration timer can fire as much as 4 seconds later than the scheduled time.
The remaining time is calculated by subtracting jiffies from the scheduled
expiration time, and it is shown as a huge number when the timer fires late,
since both values are unsigned.

This can confuse scripts and tools that rely on it, like ipvsadm:

    root@mcroce-redhat:~# while ipvsadm -lc |grep SYN_RECV; do sleep 1 ; done
    TCP 00:05  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:04  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:03  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:02  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:01  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 00:00  SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:44 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:43 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:42 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:41 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:40 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000
    TCP 68719476:39 SYN_RECV    [fc00:1::1]:55732  [fc00:1::2]:8000   [fc00:2000::1]:8000

Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:36:57 +02:00
Matteo Croce
14d32b2525 jiffies: add utility function to calculate delta in ms
Add a jiffies_delta_to_msecs() helper that converts the delta between
two times to milliseconds, returning 0 if the delta is negative.
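A minimal userspace sketch of such a helper, assuming a fixed tick rate of 250 Hz. DEMO_HZ and the function name are hypothetical; the kernel's jiffies_to_msecs() handles arbitrary HZ values and rounding more carefully:

```c
/* Hypothetical fixed tick rate; the kernel takes HZ from its config. */
#define DEMO_HZ 250L

/* Convert a signed jiffies delta to milliseconds, clamping negative
 * deltas (e.g. a timer that fired late) to 0. */
static unsigned int demo_jiffies_delta_to_msecs(long delta)
{
	if (delta < 0)
		delta = 0;
	return (unsigned int)(delta * 1000L / DEMO_HZ);
}
```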

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:36:55 +02:00
Tan Hu
a53b42c118 ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest()
We came across an infinite loop in IPVS when using IPVS in a Docker
environment.

When IPVS receives new packets and cannot find an IPVS connection,
it creates a new connection; then, if the dest is unavailable
(i.e. IP_VS_DEST_F_AVAILABLE is cleared), the packet is dropped silently.

But if the dropped packet is the first packet of this connection,
the connection control timer never has a chance to start and the
IPVS connection cannot be released. This leads to a memory leak, or
an infinite loop in cleanup_net() when the net namespace is released,
like this:

    ip_vs_conn_net_cleanup at ffffffffa0a9f31a [ip_vs]
    __ip_vs_cleanup at ffffffffa0a9f60a [ip_vs]
    ops_exit_list at ffffffff81567a49
    cleanup_net at ffffffff81568b40
    process_one_work at ffffffff810a851b
    worker_thread at ffffffff810a9356
    kthread at ffffffff810b0b6f
    ret_from_fork at ffffffff81697a18

race condition:
    CPU1                           CPU2
    ip_vs_in()
      ip_vs_conn_new()
                                   ip_vs_del_dest()
                                     __ip_vs_unlink_dest()
                                       ~IP_VS_DEST_F_AVAILABLE
      cp->dest && !IP_VS_DEST_F_AVAILABLE
      __ip_vs_conn_put
    ...
    cleanup_net  ---> infinite looping

Fix this by checking whether the timer has already started.
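A minimal userspace model of the fix, in which the struct, its fields and the helper names are all hypothetical: dropping a reference without arming the expiry timer leaves the connection with nothing to ever release it, so the drop path arms the timer unless it is already pending.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified connection state. */
struct demo_conn {
	int refcnt;
	bool timer_pending;  /* is the expiry timer armed? */
};

/* Drop a reference only; relies on an already-armed timer. */
static void demo_conn_put_ref(struct demo_conn *cp)
{
	cp->refcnt--;
}

/* Drop a reference and arm the expiry timer. */
static void demo_conn_put_and_arm(struct demo_conn *cp)
{
	cp->timer_pending = true;
	cp->refcnt--;
}

/* Packet dropped because the destination is unavailable. */
static void demo_drop_unavailable(struct demo_conn *cp)
{
	if (cp->timer_pending)
		demo_conn_put_ref(cp);      /* timer will expire the conn later */
	else
		demo_conn_put_and_arm(cp);  /* first packet: ensure it can expire */
}
```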

Signed-off-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-08-16 19:36:51 +02:00