Commit Graph

812148 Commits

Author SHA1 Message Date
Linus Torvalds
6e7bd3b549 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix MAC address setting in mac80211 pmsr code, from Johannes Berg.

 2) Probe SFP modules after being attached, from Russell King.

 3) Byte ordering bug in SMC rx_curs_confirmed code, from Ursula Braun.

 4) Revert some r8169 changes that are causing regressions, from Heiner
    Kallweit.

 5) Fix spurious connection timeouts in netfilter nat code, from Florian
    Westphal.

 6) SKB leak in tipc, from Hoang Le.

 7) Short packet checkum issue in mlx4, similar to a previous mlx5
    change, from Saeed Mahameed. The issue is that whilst padding bytes
    are usually zero, it is not guarateed and the hardware doesn't take
    the padding bytes into consideration when generating the checksum.

 8) Fix various races in cls_tcindex, from Cong Wang.

 9) Need to set stream ext to NULL before freeing in SCTP code, from Xin
    Long.

10) Fix locking in phy_is_started, from Heiner Kallweit.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
  net: ethernet: freescale: set FEC ethtool regs version
  net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
  mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
  net: phy: fix potential race in the phylib state machine
  net: phy: don't use locking in phy_is_started
  selftests: fix timestamping Makefile
  net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
  net: fix possible overflow in __sk_mem_raise_allocated()
  dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
  net: phy: fix interrupt handling in non-started states
  sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
  sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
  net/mlx5e: XDP, fix redirect resources availability check
  net/mlx5: Fix a compilation warning in events.c
  net/mlx5: No command allowed when command interface is not ready
  net/mlx5e: Fix NULL pointer derefernce in set channels error flow
  netfilter: nft_compat: use-after-free when deleting targets
  team: avoid complex list operations in team_nl_cmd_options_set()
  net_sched: fix two more memory leaks in cls_tcindex
  net_sched: fix a memory leak in cls_tcindex
  ...
2019-02-15 08:00:11 -08:00
Linus Torvalds
02d7504089 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal fix from Eric Biederman:
 "Just a single patch that restores PTRACE_EVENT_EXIT functionality that
  was accidentally broken by last weeks fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Restore the stop PTRACE_EVENT_EXIT
2019-02-15 07:56:24 -08:00
Linus Torvalds
cb5b020a8d Revert "exec: load_script: don't blindly truncate shebang string"
This reverts commit 8099b047ec.

It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.

Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-14 15:02:18 -08:00
Bob Peterson
23e93c9b2c Revert "gfs2: read journal in large chunks to locate the head"
This reverts commit 2a5f14f279.

This patch causes xfstests generic/311 to fail. Reverting this for
now until we have a proper fix.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-14 09:52:51 -08:00
Vivien Didelot
f9bcc9f3ee net: ethernet: freescale: set FEC ethtool regs version
Currently the ethtool_regs version is set to 0 for FEC devices.

Use this field to store the register dump version exposed by the
kernel. The choosen version 2 corresponds to the kernel compile test:

        #if defined(CONFIG_M523x) || defined(CONFIG_M527x)
        || defined(CONFIG_M528x) || defined(CONFIG_M520x)
        || defined(CONFIG_M532x) || defined(CONFIG_ARM)
        || defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)

and version 1 corresponds to the opposite. Binaries of ethtool unaware
of this version will dump the whole set as usual.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14 12:45:35 -05:00
Huang Zijiang
c969c6e7ab net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.

Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14 12:28:52 -05:00
Jann Horn
2c2ade8174 mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
number of references that we might need to create in the fastpath later,
the bump-allocation fastpath only has to modify the non-atomic bias value
that tracks the number of extra references we hold instead of the atomic
refcount. The maximum number of allocations we can serve (under the
assumption that no allocation is made with size 0) is nc->size, so that's
the bias used.

However, even when all memory in the allocation has been given away, a
reference to the page is still held; and in the `offset < 0` slowpath, the
page may be reused if everyone else has dropped their references.
This means that the necessary number of references is actually
`nc->size+1`.

Luckily, from a quick grep, it looks like the only path that can call
page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
requires CAP_NET_ADMIN in the init namespace and is only intended to be
used for kernel testing and fuzzing.

To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
`offset < 0` path, below the virt_to_page() call, and then repeatedly call
writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
with a vector consisting of 15 elements containing 1 byte each.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14 12:12:17 -05:00
David S. Miller
61c4c0bcff Merge branch 'net-phy-fix-locking-issue'
Heiner Kallweit says:

====================
net: phy: fix locking issue

Russell pointed out that the locking used in phy_is_started() isn't
needed and misleading. This locking also contributes to a race fixed
with patch 2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14 12:04:55 -05:00
Heiner Kallweit
a200490717 net: phy: fix potential race in the phylib state machine
Russell reported the following race in the phylib state machine
(quoting from his mail):

if (phy_polling_mode(phydev) && phy_is_started(phydev))
	phy_queue_state_machine(phydev, PHY_STATE_TIME);

state = PHY_UP
thread 0			thread 1
				phy_disconnect()
				+-phy_is_started()
phy_is_started()                |
				`-phy_stop()
				  +-phydev->state = PHY_HALTED
				  `-phy_stop_machine()
				    `-cancel_delayed_work_sync()
phy_queue_state_machine()
`-mod_delayed_work()

At this point, the phydev->state_queue() has been added back onto the
system workqueue despite phy_stop_machine() having been called and
cancel_delayed_work_sync() called on it.

Fix this by protecting the complete operation in thread 0.

Fixes: 2b3e88ea65 ("net: phy: improve phy state checking")
Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14 12:04:55 -05:00
Heiner Kallweit
a2fc9d7e36 net: phy: don't use locking in phy_is_started
Russell suggested to remove the locking from phy_is_started() because
the read is atomic anyway and actually the locking may be more
misleading.

Fixes: 2b3e88ea65 ("net: phy: improve phy state checking")
Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14 12:04:55 -05:00
Deepa Dinamani
39c1331962 selftests: fix timestamping Makefile
The clean target in the makefile conflicts with the generic
kselftests lib.mk, and fails to properly remove the compiled
test programs.

Remove the redundant rule, the TEST_GEN_FILES will be already
removed by the CLEAN macro in lib.mk.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-14 12:03:16 -05:00
Dan Carpenter
8d6ea93285 net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()
The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
than or equal to DSA_MAX_PORTS.  The ds->ports[] array is used inside
the dsa_is_user_port() and dsa_is_cpu_port() functions.  The ds->ports[]
array is allocated in dsa_switch_alloc() and it has ds->num_ports
elements so this leads to a static checker warning about a potential out
of bounds read.

Fixes: 8cfa94984c ("net: dsa: bcm_sf2: add suspend/resume callbacks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 22:11:53 -08:00
Eric Dumazet
5bf325a532 net: fix possible overflow in __sk_mem_raise_allocated()
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.

They would increase their share of the memory, instead
of decreasing it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 21:05:18 -08:00
John David Anglin
7c0db24cc4 dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
The GPIO interrupt controller on the espressobin board only supports edge interrupts.
If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
possible to miss an edge.  When this happens, the INTn pin on the Marvell switch is
stuck low and no further interrupts occur.

I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
a race in handling device interrupts (e.g. PHY link interrupts).  Some interrupts are
directly cleared by reading the Global 1 status register.  However, the device interrupt
flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
This is done by reading the relevant SERDES and PHY status register.

The code only services interrupts whose status bit is set at the time of reading its status
register.  If an interrupt event occurs after its status is read and before all interrupts
are serviced, then this event will not be serviced and the INTn output pin will remain low.

This is not a problem with polling or level interrupts since the handler will be called
again to process the event.  However, it's a big problem when using level interrupts.

The fix presented here is to add a loop around the code servicing switch interrupts.  If
any pending interrupts remain after the current set has been handled, we loop and process
the new set.  If there are no pending interrupts after servicing, we are sure that INTn has
gone high and we will get an edge when a new event occurs.

Tested on espressobin board.

Fixes: dc30c35be7 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by:  John David Anglin <dave.anglin@bell.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 20:46:38 -08:00
Heiner Kallweit
b79555d5d8 net: phy: fix interrupt handling in non-started states
phylib enables interrupts before phy_start() has been called, and if
we receive an interrupt in a non-started state, the interrupt handler
returns IRQ_NONE. This causes problems with at least one Marvell chip
as reported by Andrew.
Fix this by handling interrupts the same as in phy_mac_interrupt(),
basically always running the phylib state machine. It knows when it
has to do something and when not.
This change allows to handle interrupts gracefully even if they
occur in a non-started state.

Fixes: 2b3e88ea65 ("net: phy: improve phy state checking")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 20:44:12 -08:00
Xin Long
af98c5a785 sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
In sctp_stream_init(), after sctp_stream_outq_migrate() freed the
surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM,
stream->outcnt will not be set to 'outcnt'.

With the bigger value on stream->outcnt, when closing the assoc and
freeing its streams, the ext of those surplus streams will be freed
again since those stream exts were not set to NULL after freeing in
sctp_stream_outq_migrate(). Then the invalid-free issue reported by
syzbot would be triggered.

We fix it by simply setting them to NULL after freeing.

Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 19:33:44 -05:00
Xin Long
fc228abc23 sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
Jianlin reported a panic when running sctp gso over gre over vlan device:

  [   84.772930] RIP: 0010:do_csum+0x6d/0x170
  [   84.790605] Call Trace:
  [   84.791054]  csum_partial+0xd/0x20
  [   84.791657]  gre_gso_segment+0x2c3/0x390
  [   84.792364]  inet_gso_segment+0x161/0x3e0
  [   84.793071]  skb_mac_gso_segment+0xb8/0x120
  [   84.793846]  __skb_gso_segment+0x7e/0x180
  [   84.794581]  validate_xmit_skb+0x141/0x2e0
  [   84.795297]  __dev_queue_xmit+0x258/0x8f0
  [   84.795949]  ? eth_header+0x26/0xc0
  [   84.796581]  ip_finish_output2+0x196/0x430
  [   84.797295]  ? skb_gso_validate_network_len+0x11/0x80
  [   84.798183]  ? ip_finish_output+0x169/0x270
  [   84.798875]  ip_output+0x6c/0xe0
  [   84.799413]  ? ip_append_data.part.50+0xc0/0xc0
  [   84.800145]  iptunnel_xmit+0x144/0x1c0
  [   84.800814]  ip_tunnel_xmit+0x62d/0x930 [ip_tunnel]
  [   84.801699]  gre_tap_xmit+0xac/0xf0 [ip_gre]
  [   84.802395]  dev_hard_start_xmit+0xa5/0x210
  [   84.803086]  sch_direct_xmit+0x14f/0x340
  [   84.803733]  __dev_queue_xmit+0x799/0x8f0
  [   84.804472]  ip_finish_output2+0x2e0/0x430
  [   84.805255]  ? skb_gso_validate_network_len+0x11/0x80
  [   84.806154]  ip_output+0x6c/0xe0
  [   84.806721]  ? ip_append_data.part.50+0xc0/0xc0
  [   84.807516]  sctp_packet_transmit+0x716/0xa10 [sctp]
  [   84.808337]  sctp_outq_flush+0xd7/0x880 [sctp]

It was caused by SKB_GSO_CB(skb)->csum_start not set in sctp_gso_segment.
sctp_gso_segment() calls skb_segment() with 'feature | NETIF_F_HW_CSUM',
which causes SKB_GSO_CB(skb)->csum_start not to be set in skb_segment().

For TCP/UDP, when feature supports HW_CSUM, CHECKSUM_PARTIAL will be set
and gso_reset_checksum will be called to set SKB_GSO_CB(skb)->csum_start.

So SCTP should do the same as TCP/UDP, to call gso_reset_checksum() when
computing checksum in sctp_gso_segment.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 19:31:43 -05:00
David S. Miller
f325ef7297 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Missing structure initialization in ebtables causes splat with
   32-bit user level on a 64-bit kernel, from Francesco Ruggeri.

2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from
   Andrea Claudi.

3) Fix possible use-after-free from release path of target extensions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 19:13:47 -05:00
David S. Miller
41ceb5e87f mlx5-fixes-2019-02-13
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcZKsDAAoJEEg/ir3gV/o+gewH/3yQYIgVIjibdw1YF5S4vGXV
 M0FOsx2OEcaxVjsm+GzWTthARtIO/d5NuYO/j7QsWLopGugTGCSc6SusrVQtH7PY
 b5Fz/pR+ElArt0Ovx06A1twdO5gtoT920rP+E+Gvl/ZR78ceknHUGxJBnGEh2p9D
 svM3Q4mdmJZ+a3ehMCxMWS2vf7If43fhHXyt9OKEMpbzAJ03MVLVagt3OGUNi6QP
 vl4Fuq5OX6zumKLXPpql8kh+Fbu9QHc8vcuwPnG8GmsBetS083HSC6EPRUbo/EPR
 3IzuAibltXoYWlvCPpotCjjSuyqvkOdVudD5jyM3gFIS3XuJdnwkn6k5gvPCk6E=
 =W+Ws
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2019-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-02-13

This series introduces some fixes to mlx5 driver.
For more information please see tag log below.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 19:07:47 -05:00
Saeed Mahameed
407e17b1a6 net/mlx5e: XDP, fix redirect resources availability check
Currently mlx5 driver creates xdp redirect hw queues unconditionally on
netdevice open, This is great until someone starts redirecting XDP traffic
via ndo_xdp_xmit on mlx5 device and changes the device configuration at
the same time, this might cause crashes, since the other device's napi
is not aware of the mlx5 state change (resources un-availability).

To fix this we must synchronize with other devices napi's on the system.
Added a new flag under mlx5e_priv to determine XDP TX resources are
available, set/clear it up when necessary and use synchronize_rcu()
when the flag is turned off, so other napi's are in-sync with it, before
we actually cleanup the hw resources.

The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
it is sufficient to determine if it safe to transmit or not. The other
two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
unnecessary. Thus, they are removed from data path.

Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:51 -08:00
Tariq Toukan
5400261e4d net/mlx5: Fix a compilation warning in events.c
Eliminate the following compilation warning:

drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str'
may be used uninitialized in this function [-Wuninitialized]:  => 238:3

Fixes: c2fb3db22d ("net/mlx5: Rework handling of port module events")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Mikhael Goikhman <migo@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:50 -08:00
Huy Nguyen
4cab346bcf net/mlx5: No command allowed when command interface is not ready
When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.

Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.

Fixes: c1d4d2e92a ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:50 -08:00
Maria Pasechnik
fb35c534b7 net/mlx5e: Fix NULL pointer derefernce in set channels error flow
New channels are applied to the priv channels only after they
are successfully opened. Then, the indirection table should be built
according to the new number of channels.
Currently, such build is preformed independently of whether the
channels opening is successful, and is not reverted on failure.

The bug is caused due to removal of rss params from channels struct
and moving it to priv struct. That change cause to independency between
channels and rss params.
This causes a crash on a later point, when accessing rqn of a non
existing channel.

This patch fixes it by moving the indirection table build right before
switching the priv channels to new channels struct, after the new set of
channels was successfully opened.

Fixes: bbeb53b8b2 ("net/mlx5e: Move RSS params to a dedicated struct")
Signed-off-by: Maria Pasechnik <mariap@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-13 15:40:50 -08:00
Linus Torvalds
b6ea7bcf77 This fixes kprobes/uprobes dynamic processing of strings, where
it processes the args but does not update the remaining length
 of the buffer that the string arguments will be placed in. It
 constantly passes in the total size of buffer used instead of
 passing in the remaining size of the buffer used. This could cause
 issues if the strings are larger than the max size of an event
 which could cause the strings to be written beyond what was reserved
 on the buffer.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXGN7BRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoa1AQD7+6O0DncGwk5aWqRHESXKlmOWteW6
 eMFbEw3KDcvs2gEAvNLB1i2yVH6Enn50M0KpmYJMbyZK/LVn2QsPZfU/LgQ=
 =KMBZ
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "This fixes kprobes/uprobes dynamic processing of strings, where it
  processes the args but does not update the remaining length of the
  buffer that the string arguments will be placed in. It constantly
  passes in the total size of buffer used instead of passing in the
  remaining size of the buffer used.

  This could cause issues if the strings are larger than the max size of
  an event which could cause the strings to be written beyond what was
  reserved on the buffer"

* tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: probeevent: Correctly update remaining space in dynamic area
2019-02-13 10:28:17 -08:00
Pablo Neira Ayuso
753c111f65 netfilter: nft_compat: use-after-free when deleting targets
Fetch pointer to module before target object is released.

Fixes: 29e3880109 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions")
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-13 18:14:54 +01:00
Eric W. Biederman
cf43a757fd signal: Restore the stop PTRACE_EVENT_EXIT
In the middle of do_exit() there is there is a call
"ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process
in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for
for the debugger to release the task or SIGKILL to be delivered.

Skipping past dequeue_signal when we know a fatal signal has already
been delivered resulted in SIGKILL remaining pending and
TIF_SIGPENDING remaining set.  This in turn caused the
scheduler to not sleep in PTACE_EVENT_EXIT as it figured
a fatal signal was pending.  This also caused ptrace_freeze_traced
in ptrace_check_attach to fail because it left a per thread
SIGKILL pending which is what fatal_signal_pending tests for.

This difference in signal state caused strace to report
strace: Exit of unknown pid NNNNN ignored

Therefore update the signal handling state like dequeue_signal
would when removing a per thread SIGKILL, by removing SIGKILL
from the per thread signal mask and clearing TIF_SIGPENDING.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Ivan Delalande <colona@arista.com>
Cc: stable@vger.kernel.org
Fixes: 35634ffa17 ("signal: Always notice exiting tasks")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-02-13 08:31:41 -06:00
Linus Torvalds
1f947a7a01 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "6 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: proc: smaps_rollup: fix pss_locked calculation
  Rename include/{uapi => }/asm-generic/shmparam.h really
  Revert "mm: use early_pfn_to_nid in page_ext_init"
  mm/gup: fix gup_pmd_range() for dax
  Revert "mm: slowly shrink slabs with a relatively small number of objects"
  Revert "mm: don't reclaim inodes with many attached pages"
2019-02-12 17:15:33 -08:00
Sandeep Patil
27dd768ed8 mm: proc: smaps_rollup: fix pss_locked calculation
The 'pss_locked' field of smaps_rollup was being calculated incorrectly.
It accumulated the current pss everytime a locked VMA was found.  Fix
that by adding to 'pss_locked' the same time as that of 'pss' if the vma
being walked is locked.

Link: http://lkml.kernel.org/r/20190203065425.14650-1-sspatil@android.com
Fixes: 493b0e9d94 ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Sandeep Patil <sspatil@android.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: <stable@vger.kernel.org>	[4.14.x, 4.19.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12 16:33:18 -08:00
Masahiro Yamada
76ce2a80a2 Rename include/{uapi => }/asm-generic/shmparam.h really
Commit 36c0f7f0f8 ("arch: unexport asm/shmparam.h for all
architectures") is different from the patch I submitted.

My patch is this:

  https://lore.kernel.org/lkml/1546904307-11124-1-git-send-email-yamada.masahiro@socionext.com/T/#u

The file renaming part:

  rename include/{uapi => }/asm-generic/shmparam.h (100%)

was lost when it was picked up.

I think it was an accident because Andrew did not say anything.

Link: http://lkml.kernel.org/r/1549158277-24558-1-git-send-email-yamada.masahiro@socionext.com
Fixes: 36c0f7f0f8 ("arch: unexport asm/shmparam.h for all architectures")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12 16:33:18 -08:00
Qian Cai
2f1ee0913c Revert "mm: use early_pfn_to_nid in page_ext_init"
This reverts commit fe53ca5427 ("mm: use early_pfn_to_nid in
page_ext_init").

When booting a system with "page_owner=on",

start_kernel
  page_ext_init
    invoke_init_callbacks
      init_section_page_ext
        init_page_owner
          init_early_allocated_pages
            init_zones_in_node
              init_pages_in_zone
                lookup_page_ext
                  page_to_nid

The issue here is that page_to_nid() will not work since some page flags
have no node information until later in page_alloc_init_late() due to
DEFERRED_STRUCT_PAGE_INIT.  Hence, it could trigger an out-of-bounds
access with an invalid nid.

  UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
  index 7 is out of range for type 'zone [5]'

Also, kernel will panic since flags were poisoned earlier with,

CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

start_kernel
  setup_arch
    pagetable_init
      paging_init
        sparse_init
          sparse_init_nid
            memblock_alloc_try_nid_raw

It did not handle it well in init_pages_in_zone() which ends up calling
page_to_nid().

  page:ffffea0004200000 is uninitialized and poisoned
  raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
  raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
  page_owner info is not active (free page?)
  kernel BUG at include/linux/mm.h:990!
  RIP: 0010:init_page_owner+0x486/0x520

This means that assumptions behind commit fe53ca5427 ("mm: use
early_pfn_to_nid in page_ext_init") are incomplete.  Therefore, revert
the commit for now.  A proper way to move the page_owner initialization
to sooner is to hook into memmap initialization.

Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12 16:33:18 -08:00
Yu Zhao
414fd080d1 mm/gup: fix gup_pmd_range() for dax
For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true
on x86.  So the function works as long as hugetlb is configured.
However, dax doesn't depend on hugetlb.

Link: http://lkml.kernel.org/r/20190111034033.601-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Michael S . Tsirkin" <mst@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12 16:33:18 -08:00
Dave Chinner
a9a238e83f Revert "mm: slowly shrink slabs with a relatively small number of objects"
This reverts commit 172b06c32b ("mm: slowly shrink slabs with a
relatively small number of objects").

This change changes the agressiveness of shrinker reclaim, causing small
cache and low priority reclaim to greatly increase scanning pressure on
small caches.  As a result, light memory pressure has a disproportionate
affect on small caches, and causes large caches to be reclaimed much
faster than previously.

As a result, it greatly perturbs the delicate balance of the VFS caches
(dentry/inode vs file page cache) such that the inode/dentry caches are
reclaimed much, much faster than the page cache and this drives us into
several other caching imbalance related problems.

As such, this is a bad change and needs to be reverted.

[ Needs some massaging to retain the later seekless shrinker
  modifications.]

Link: http://lkml.kernel.org/r/20190130041707.27750-3-david@fromorbit.com
Fixes: 172b06c32b ("mm: slowly shrink slabs with a relatively small number of objects")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12 16:33:18 -08:00
Dave Chinner
69056ee6a8 Revert "mm: don't reclaim inodes with many attached pages"
This reverts commit a76cf1a474 ("mm: don't reclaim inodes with many
attached pages").

This change causes serious changes to page cache and inode cache
behaviour and balance, resulting in major performance regressions when
combining worklaods such as large file copies and kernel compiles.

  https://bugzilla.kernel.org/show_bug.cgi?id=202441

This change is a hack to work around the problems introduced by changing
how agressive shrinkers are on small caches in commit 172b06c32b ("mm:
slowly shrink slabs with a relatively small number of objects").  It
creates more problems than it solves, wasn't adequately reviewed or
tested, so it needs to be reverted.

Link: http://lkml.kernel.org/r/20190130041707.27750-2-david@fromorbit.com
Fixes: a76cf1a474 ("mm: don't reclaim inodes with many attached pages")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Wolfgang Walter <linux@stwm.de>
Cc: Roman Gushchin <guro@fb.com>
Cc: Spock <dairinin@gmail.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-12 16:33:18 -08:00
Linus Torvalds
991b9eb424 hwmon fix for v5.0-rc7
Fix fan detection for NCT6793D.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJcY0DCAAoJEMsfJm/On5mBVSoP/1UGF7EV24gNh27W9o8astqr
 RzdM0Bwd5weimJ2TjbIHMBOcxfYRd74+cBSYb2ET0HhJtMf/qDRHRn6aT+UQxdYX
 oiPLYFoujZHkcK3mrKGOwgO0JNW7xuaY0Y2CcwI4iJGFKohBLxATU5q5gNLFtT5/
 xOzw7B2B7+D4b9Ez7QNbzCPjvRMrkUzQ3GNwujRSKSB0Hk0bWtgLqSth2fLbM6P2
 BAnwFtrA9nS11I4c5lw4YIJF3kNc4YRWaWIR42qIHjG0KS2zYiUagziEEyTELPQf
 +MwhPHjHt2vQpR6/KxMmyXK1s81YE1wufFWCw0YDjoQvPgJPeGOnMOuov3z2wOyM
 /BNDttVunnBib6/CbXlLyGbVDtFKNTQDyVz+87VI965C7jsDiGmIEIHuZUIUnVzc
 fW2SAQ7Yf2JoqXHIRw7HvWFyH78FF3sOR/iax2jnqdH+NGbwR/RSRp98dZjX7C0m
 51unNopeTSkeD9aYzLR9dJzZE0Q5iBdsieht7nc0ce5eZbmcBpWfRzQBedbdWhO4
 SOjVvQgwAC+NuwV5d2F256RzScit8fq7Kss4KMvmmQuoOKu8/ZjRP5puUuwvgLL/
 MQqkNdeS7cgjk1udNHsqD333jpbWiHQmAdGhOyS1Ucrv8iCpgNPrynjc7s+lPkCy
 Q4koniuqN7zU8q6oYuzt
 =XfB6
 -----END PGP SIGNATURE-----

Merge tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Fix fan detection for NCT6793D"

* tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct6775) Fix fan6 detection for NCT6793D
2019-02-12 16:29:33 -08:00
Linus Torvalds
57902dc067 RISC-V Fixes for 5.0-rc7
This tag contains a pair of bug fixes that I'd like to include in 5.0:
 
 * A fix to disambiguate swap from invalid PTEs, which fixes an error
   when trying to unmap PROT_NONE pages.
 * A revert to an optimization of the size of flat binaries.  This is
   really a workaround to prevent breaking existing boot flows, but since
   the change was introduced as part of the 5.0 merge window I'd like to
   have the fix in before 5.0 so we can avoid a regression for any proper
   releases.
 
 With these I hope we're out of patches for 5.0 in RISC-V land.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlxjN2gTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQY73EACOoplTb4zBKXY4pCSZG/Gnu+AEJz77
 CS1wR86TThkyxIJTlIGmX0xSFuds4f+ymCMbZ7SfXa20es62iZtEQA72TVb+g80L
 hD/0yZFZQkcBEQOVOC2ZWvaj4tVZB703yHYZGSnN/6sHISS0yoO8fqf19SGSmDC3
 aCNE5Fm6z5MNozS6Z5Lrtpw2rmpDwQCHnxfLPzEvjC9DPMwzHQbyP3tALWcfLpK7
 aIPmq/xjUtS5eVb/3CHhLmkYFTNlOs4p0jsKjw2Sw/3NWHCXrPw6afrjZ533j8nR
 BI/u/EErMJWl6qLasdzTTfhE16sLDzbowiGdOfSurjwRDkAokw9/zvLDLDzLXwoT
 UQkCcp2mnOsCmdXaC8L0yUi3mHh2M82GL1RXq4POyCgD77DqVvAJ/ot/Lq3UFila
 T2rfsxIFC2pHmdDlQe1zo31g6TeI4FJaHA4juNXce2bIbtaU4OxqiXfN/cKjyAX2
 8hevxUwy/PY0aqzLl14Xg2tcdurVbFh8BZ7eWYwqWsTrVwRg0ZhnWG0FdopNMA9D
 z5nBbE3KnVEDrMh7QzSgtNDzSmWRUKD0S5B0t4/lMv9uvYrtqeBXCKS3vJ1tc0zy
 tzj+tyjmgsfkAIVfA/fQqznx7sXAdD/jCPkUtotQUP2YkKRFbVyLjVTXKQ3BiyAl
 xrZhsMPr8BPj0Q==
 =EjWP
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This contains a pair of bug fixes that I'd like to include in 5.0:

   - A fix to disambiguate swap from invalid PTEs, which fixes an error
     when trying to unmap PROT_NONE pages.

   - A revert to an optimization of the size of flat binaries. This is
     really a workaround to prevent breaking existing boot flows, but
     since the change was introduced as part of the 5.0 merge window I'd
     like to have the fix in before 5.0 so we can avoid a regression for
     any proper releases.

  With these I hope we're out of patches for 5.0 in RISC-V land"

* tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S"
  riscv: Add pte bit to distinguish swap from invalid
2019-02-12 13:33:48 -08:00
Cong Wang
2fdeee2549 team: avoid complex list operations in team_nl_cmd_options_set()
The current opt_inst_list operations inside team_nl_cmd_options_set()
is too complex to track:

    LIST_HEAD(opt_inst_list);
    nla_for_each_nested(...) {
        list_for_each_entry(opt_inst, &team->option_inst_list, list) {
            if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst))
                continue;
            list_add(&opt_inst->tmp_list, &opt_inst_list);
        }
    }
    team_nl_send_event_options_get(team, &opt_inst_list);

as while we retrieve 'opt_inst' from team->option_inst_list, it could
be added to the local 'opt_inst_list' for multiple times. The
__team_option_inst_tmp_find() doesn't work, as the setter
team_mode_option_set() still calls team->ops.exit() which uses
->tmp_list too in __team_options_change_check().

Simplify the list operations by moving the 'opt_inst_list' and
team_nl_send_event_options_get() into the nla_for_each_nested() loop so
that it can be guranteed that we won't insert a same list entry for
multiple times. Therefore, __team_option_inst_tmp_find() can be removed
too.

Fixes: 4fb0534fb7 ("team: avoid adding twice the same option to the event list")
Fixes: 2fcdb2c9e6 ("team: allow to send multiple set events in one message")
Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:19:27 -05:00
David S. Miller
a090d7948e Merge branch 'net_sched-some-fixes-for-cls_tcindex'
Cong Wang says:

====================
net_sched: some fixes for cls_tcindex

This patchset contains 3 bug fixes for tcindex filter. Please check
each patch for details.

v2: fix a compile error in patch 2
    drop netns refcnt in patch 1
====================

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
2019-02-12 14:10:56 -05:00
Cong Wang
1db817e75f net_sched: fix two more memory leaks in cls_tcindex
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.

For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.

For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Cong Wang
033b228e7f net_sched: fix a memory leak in cls_tcindex
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.

This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.

As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Cong Wang
8015d93ebd net_sched: fix a race condition in tcindex_destroy()
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.

Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.

Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.

Fixes: 27ce4f05e2 ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
David S. Miller
6a7dd17200 Merge branch 'ena-races'
Arthur Kiyanovski says:

====================
net: ena: race condition bug fix and version update

This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:05:07 -05:00
Arthur Kiyanovski
d9b8656da9 net: ena: update driver version from 2.0.2 to 2.0.3
Update driver version due to bug fix.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:05:07 -05:00
Arthur Kiyanovski
e1f1bd9bfb net: ena: fix race between link up and device initalization
Fix race condition between ena_update_on_link_change() and
ena_restore_device().

This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.

Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.

Fixes: d18e4f6834 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:05:07 -05:00
Kal Conley
fc62814d69 net/packet: fix 4gb buffer limit due to overflow check
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.

This change fixes support for packet ring buffers >= UINT_MAX.

Fixes: 8f8d28e4d6 ("net/packet: fix overflow in check for tp_frame_nr")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:37:23 -05:00
Konstantin Khlebnikov
1ec17dbd90 inet_diag: fix reporting cgroup classid and fallback to priority
Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.

Extension INET_DIAG_CLASS_ID has not way to request from the beginning.

This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.

Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.

Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).

So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.

Fixes: 0888e372c3 ("net: inet: diag: expose sockets cgroup classid")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:35:57 -05:00
Eric Dumazet
4ffcbfac60 batman-adv: fix uninit-value in batadv_interface_tx()
KMSAN reported batadv_interface_tx() was possibly using a
garbage value [1]

batadv_get_vid() does have a pskb_may_pull() call
but batadv_interface_tx() does not actually make sure
this did not fail.

[1]
BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x173/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
 batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
 __netdev_start_xmit include/linux/netdevice.h:4356 [inline]
 netdev_start_xmit include/linux/netdevice.h:4365 [inline]
 xmit_one net/core/dev.c:3257 [inline]
 dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
 __dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
 packet_snd net/packet/af_packet.c:2928 [inline]
 packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441889
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
 kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
 kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
 kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
 slab_post_alloc_hook mm/slab.h:446 [inline]
 slab_alloc_node mm/slub.c:2759 [inline]
 __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
 __kmalloc_reserve net/core/skbuff.c:137 [inline]
 __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
 alloc_skb include/linux/skbuff.h:998 [inline]
 alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
 sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
 packet_alloc_skb net/packet/af_packet.c:2781 [inline]
 packet_snd net/packet/af_packet.c:2872 [inline]
 packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc:	Marek Lindner <mareklindner@neomailbox.ch>
Cc:	Simon Wunderlich <sw@simonwunderlich.de>
Cc:	Antonio Quartulli <a@unstable.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:30:43 -05:00
Linus Torvalds
1d110257c2 sound fixes for 5.0-rc7
It's a bit of surprising that we've got more changes than hoped
 at this late stage, but the all don't look too scaring but small
 fixes.
 
 One change in ALSA core side is again the PCM regression fix that
 was partially addressed for OSS, but now the all relevant change
 is reverted instead.  Also, a few ASoC core fixes for UAF and OOB
 are included, while the rest are usual random device-specific
 fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlxhn2AOHHRpd2FpQHN1
 c2UuZGUACgkQLtJE4w1nLE9VHg/+LYWUmEcC5jpJrknCMu9SGvvjK8BZahEB/vUH
 NR+Dcy61SszIng4Wc2rtKdQscmKXf4XZmHejiu2oJDYk7KWaU0eQnL06NLIdHCRg
 4Vu302dhnG7THBzG99CtE2cWjTA3YLW93H/WgG7gQEKBNGjkUz4OpAbjYmJle/GO
 TyH6c2a9vFia/+8h4LX+HJrLysrbdK0mpibDQ/kRqcRxwd1i1becf3CDXj8KfMdE
 KS+RpimxgMOF8PokjdhzaomnIGgcf0OrIN6iSXOAxwBWRcIhyWFE6npIz2+vxJ3f
 Di69z7z8Oju3PvfXwUyLbSWjhhle6Mrm3qOOs4sPmAu45uuMFtxClo6WzOmT0OQt
 5bVw60ITLE06U/Za9bR4W/uWbdb6I4Q2I5yG50KzoIfqsmk9sejd1975MJUaz5WL
 HrGxBGqT3wXvbq8tCdctRl0q+jQotSRDrQS+S5vqvSzSNLyYevJkWAYchdnbXPxU
 UVXGLWSBM4yM+KX4fYzi33cQaIIST2m9kWwgs3R+PYxoSMAIrYVBOQDn3aU7xmdF
 9O6Sm3xnf7Eo+HUC5gnZ6FOTJJJHqaXN9E019bhUieS60pUoMfwsfSaEddkf46KI
 eIttfY6RNyOvL9tuQj+XoP6lJTy1xwe29ZzpH0zlgp1q79DIOeizB3/bNf4j97DX
 8mRzg4I=
 =NQQX
 -----END PGP SIGNATURE-----

Merge tag 'sound-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "It's a bit of surprising that we've got more changes than hoped at
  this late stage, but they all don't look too scary but small fixes.

  One change in ALSA core side is again the PCM regression fix that was
  partially addressed for OSS, but now the all relevant change is
  reverted instead. Also, a few ASoC core fixes for UAF and OOB are
  included, while the rest are usual random device-specific fixes"

* tag 'sound-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: pcm: Revert capture stream behavior change in blocking mode
  ALSA: usb-audio: Fix implicit fb endpoint setup by quirk
  ALSA: hda - Add quirk for HP EliteBook 840 G5
  ASoC: samsung: Prevent clk_get_rate() calls in atomic context
  ASoC: rsnd: ssiu: correct shift bit for ssiu9
  ASoC: rsnd: fixup rsnd_ssi_master_clk_start() user count check
  ASoC: dapm: fix out-of-bounds accesses to DAPM lookup tables
  ASoC: topology: fix oops/use-after-free case with dai driver
  ASoC: rsnd: fixup MIX kctrl registration
  ASoC: core: Allow soc_find_component lookups to match parent of_node
  ASoC: rt5682: Correct the setting while select ASRC clk for AD/DA filter
  ASoC: MAINTAINERS: fsl: Change Fabio's email address
  ASoC: hdmi-codec: fix oops on re-probe
2019-02-12 10:18:08 -08:00
Linus Torvalds
8ae757efd3 Revert a commit which landed in v5.0-rc1 since it makes fsync in ext4
nojournal mode unsafe.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlxiSNUACgkQ8vlZVpUN
 gaPcCgf+LTKVur85OUKqeEDRfiCYjNemzgSOve8EtiGnPzG5s8vVeVQ5n880lTzY
 c/Q2/CotN1/0slWxF8WmZ6xTzdrdadtdJ2iv6O5R58uj3lz3UK4cXJf+XmvQRj3L
 5UqMygRN1omKFCp+dB38SeZuE1amV/tV61Q3ATnVMLi+xZcH8lV+efulNi19p2Gd
 EC7DYs2seAiNEJAh9nVSL1LWo3i0OH5JJQCv08+c71CPPqixAYuuEy59q55k6OvW
 Jp8mhYd4/5ueoV+rxH8Gft52MaOPvPHaxWvxrIbpKb5u8EZ6WUXGZ74qdK3KG6hG
 k29IX8XoXWlTHkfSiOOZLUJuW7n+XQ==
 =w43x
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
 "Revert a commit which landed in v5.0-rc1 since it makes fsync in ext4
  nojournal mode unsafe"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal"
2019-02-12 09:57:45 -08:00
Li RongQing
d1f20798a1 ipv6: propagate genlmsg_reply return code
genlmsg_reply can fail, so propagate its return code

Fixes: 915d7e5e59 ("ipv6: sr: add code base for control plane support of SR-IPv6")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:36:02 -05:00
Saeed Mahameed
29dded89e8 net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.

Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.

Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.

Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:26:21 -05:00