commit 6b285a5587506bae084cf9a3ed5aa491d623b91b upstream.
Reject userspace memslots whose size exceeds the storage capacity of an
"unsigned long". KVM's uAPI takes the size as u64 to support large slots
on 64-bit hosts, but does not account for the size being truncated on
32-bit hosts in various flows. The access_ok() check on the userspace
virtual address in particular casts the size to "unsigned long" and will
check the wrong number of bytes.
KVM doesn't actually support slots whose size doesn't fit in an "unsigned
long", e.g. KVM's internal kvm_memory_slot.npages is an "unsigned long",
not a "u64", and misc arch specific code follows that behavior.
Fixes: fa3d315a4c ("KVM: Validate userspace_addr of memslot when registered")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <20211104002531.1176691-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 94ebc035456a4ccacfbbef60c444079a256623ad upstream.
[Why]
When trying to lightup two 4k60 non-DSC displays behind a branch device
that supports DSC we can't lightup both at once due to bandwidth
limitations - each requires 48 VCPI slots but we only have 63.
[How]
The workaround already exists in the code but is guarded by a CONFIG
that cannot be set by the user and shouldn't need to be.
Check for specific branch device IDs to device whether to enable
the workaround for multiple display scenarios.
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cdef485217d30382f3bf6448c54b4401648fe3f1 upstream.
The kernel leaks memory when a `fib` rule is present in IPv6 nftables
firewall rules and a suppress_prefix rule is present in the IPv6 routing
rules (used by certain tools such as wg-quick). In such scenarios, every
incoming packet will leak an allocation in `ip6_dst_cache` slab cache.
After some hours of `bpftrace`-ing and source code reading, I tracked
down the issue to ca7a03c417 ("ipv6: do not free rt if
FIB_LOOKUP_NOREF is set on suppress rule").
The problem with that change is that the generic `args->flags` always have
`FIB_LOOKUP_NOREF` set[1][2] but the IPv6-specific flag
`RT6_LOOKUP_F_DST_NOREF` might not be, leading to `fib6_rule_suppress` not
decreasing the refcount when needed.
How to reproduce:
- Add the following nftables rule to a prerouting chain:
meta nfproto ipv6 fib saddr . mark . iif oif missing drop
This can be done with:
sudo nft create table inet test
sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
- Run:
sudo ip -6 rule add table main suppress_prefixlength 0
- Watch `sudo slabtop -o | grep ip6_dst_cache` to see memory usage increase
with every incoming ipv6 packet.
This patch exposes the protocol-specific flags to the protocol
specific `suppress` function, and check the protocol-specific `flags`
argument for RT6_LOOKUP_F_DST_NOREF instead of the generic
FIB_LOOKUP_NOREF when decreasing the refcount, like this.
[1]: ca7a03c417/net/ipv6/fib6_rules.c (L71)
[2]: ca7a03c417/net/ipv6/fib6_rules.c (L99)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215105
Fixes: ca7a03c417 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
Cc: stable@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6f48394cf1f3e8486591ad98c11cdadb8f1ef2ad upstream.
Trying to remove the fsl-sata module in the PPC64 GNU/Linux
leads to the following warning:
------------[ cut here ]------------
remove_proc_entry: removing non-empty directory 'irq/69',
leaking at least 'fsl-sata[ff0221000.sata]'
WARNING: CPU: 3 PID: 1048 at fs/proc/generic.c:722
.remove_proc_entry+0x20c/0x220
IRQMASK: 0
NIP [c00000000033826c] .remove_proc_entry+0x20c/0x220
LR [c000000000338268] .remove_proc_entry+0x208/0x220
Call Trace:
.remove_proc_entry+0x208/0x220 (unreliable)
.unregister_irq_proc+0x104/0x140
.free_desc+0x44/0xb0
.irq_free_descs+0x9c/0xf0
.irq_dispose_mapping+0x64/0xa0
.sata_fsl_remove+0x58/0xa0 [sata_fsl]
.platform_drv_remove+0x40/0x90
.device_release_driver_internal+0x160/0x2c0
.driver_detach+0x64/0xd0
.bus_remove_driver+0x70/0xf0
.driver_unregister+0x38/0x80
.platform_driver_unregister+0x14/0x30
.fsl_sata_driver_exit+0x18/0xa20 [sata_fsl]
---[ end trace 0ea876d4076908f5 ]---
The driver creates the mapping by calling irq_of_parse_and_map(),
so it also has to dispose the mapping. But the easy way out is to
simply use platform_get_irq() instead of irq_of_parse_map(). Also
we should adapt return value checking and propagate error values.
In this case the mapping is not managed by the device but by
the of core, so the device has not to dispose the mapping.
Fixes: faf0b2e5af ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c8ad7e8cf29eb55836e7a0215f967746ab2b504 upstream.
When the `rmmod sata_fsl.ko` command is executed in the PPC64 GNU/Linux,
a bug is reported:
==================================================================
BUG: Unable to handle kernel data access on read at 0x80000800805b502c
Oops: Kernel access of bad area, sig: 11 [#1]
NIP [c0000000000388a4] .ioread32+0x4/0x20
LR [80000000000c6034] .sata_fsl_port_stop+0x44/0xe0 [sata_fsl]
Call Trace:
.free_irq+0x1c/0x4e0 (unreliable)
.ata_host_stop+0x74/0xd0 [libata]
.release_nodes+0x330/0x3f0
.device_release_driver_internal+0x178/0x2c0
.driver_detach+0x64/0xd0
.bus_remove_driver+0x70/0xf0
.driver_unregister+0x38/0x80
.platform_driver_unregister+0x14/0x30
.fsl_sata_driver_exit+0x18/0xa20 [sata_fsl]
.__se_sys_delete_module+0x1ec/0x2d0
.system_call_exception+0xfc/0x1f0
system_call_common+0xf8/0x200
==================================================================
The triggering of the BUG is shown in the following stack:
driver_detach
device_release_driver_internal
__device_release_driver
drv->remove(dev) --> platform_drv_remove/platform_remove
drv->remove(dev) --> sata_fsl_remove
iounmap(host_priv->hcr_base); <---- unmap
kfree(host_priv); <---- free
devres_release_all
release_nodes
dr->node.release(dev, dr->data) --> ata_host_stop
ap->ops->port_stop(ap) --> sata_fsl_port_stop
ioread32(hcr_base + HCONTROL) <---- UAF
host->ops->host_stop(host)
The iounmap(host_priv->hcr_base) and kfree(host_priv) functions should
not be executed in drv->remove. These functions should be executed in
host_stop after port_stop. Therefore, we move these functions to the
new function sata_fsl_host_stop and bind the new function to host_stop.
Fixes: faf0b2e5af ("drivers/ata: add support to Freescale 3.0Gbps SATA Controller")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 054aa8d439b9185d4f5eb9a90282d1ce74772969 upstream.
Jann Horn points out that there is another possible race wrt Unix domain
socket garbage collection, somewhat reminiscent of the one fixed in
commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
See the extended comment about the garbage collection requirements added
to unix_peek_fds() by that commit for details.
The race comes from how we can locklessly look up a file descriptor just
as it is in the process of being closed, and with the right artificial
timing (Jann added a few strategic 'mdelay(500)' calls to do that), the
Unix domain socket garbage collector could see the reference count
decrement of the close() happen before fget() took its reference to the
file and the file was attached onto a new file descriptor.
This is all (intentionally) correct on the 'struct file *' side, with
RCU lookups and lockless reference counting very much part of the
design. Getting that reference count out of order isn't a problem per
se.
But the garbage collector can get confused by seeing this situation of
having seen a file not having any remaining external references and then
seeing it being attached to an fd.
In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the
fix was to serialize the file descriptor install with the garbage
collector by taking and releasing the unix_gc_lock.
That's not really an option here, but since this all happens when we are
in the process of looking up a file descriptor, we can instead simply
just re-check that the file hasn't been closed in the meantime, and just
re-do the lookup if we raced with a concurrent close() of the same file
descriptor.
Reported-and-tested-by: Jann Horn <jannh@google.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 52d04d408185b7aa47628d2339c28ec70074e0ae upstream.
When running without MIO support, with pci=nomio or for devices which
are not MIO-capable the zPCI subsystem generates pseudo-MMIO addresses
to allow access to PCI BARs via MMIO based Linux APIs even though the
platform uses function handles and BAR numbers.
This is done by stashing an index into our global IOMAP array which
contains the function handle in the 16 most significant bits of the
addresses returned by ioremap() always setting the most significant bit.
On the other hand the MIO addresses assigned by the platform for use,
while requiring special instructions, allow PCI access with virtually
mapped physical addresses. Now the problem is that these MIO addresses
and our own pseudo-MMIO addresses may overlap, while functionally this
would not be a problem by itself this overlap is detected by common code
as both address types are added as resources in the iomem_resource tree.
This leads to the overlapping resource claim of either the MIO capable
or non-MIO capable devices with being rejected.
Since PCI is tightly coupled to the use of the iomem_resource tree, see
for example the code for request_mem_region(), we can't reasonably get
rid of the overlap being detected by keeping our pseudo-MMIO addresses
out of the iomem_resource tree.
Instead let's move the range used by our own pseudo-MMIO addresses by
starting at (1UL << 62) and only using addresses below (1UL << 63) thus
avoiding the range currently used for MIO addresses.
Fixes: c7ff0e918a ("s390/pci: deal with devices that have no support for MIO instructions")
Cc: stable@vger.kernel.org # 5.3+
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c1b5a84669d2477d8fffe9136e86a2cff591729 upstream.
When I hot added a CPU, I found 'cpufreq' directory was not created
below /sys/devices/system/cpu/cpuX/.
It is because get_cpu_device() failed in add_cpu_dev_symlink().
cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface.
It will be called when the CPU device registered into the system.
The call chain is as follows:
register_cpu()
->device_register()
->device_add()
->bus_probe_device()
->cpufreq_add_dev()
But only after the CPU device has been registered, we can get the
CPU device by get_cpu_device(), otherwise it will return NULL.
Since we already have the CPU device in cpufreq_add_dev(), pass
it to add_cpu_dev_symlink().
I noticed that the 'kobj' of the CPU device has been added into
the system before cpufreq_add_dev().
Fixes: 2f0ba790df ("cpufreq: Fix creation of symbolic links to policy directories")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d49eb91e86e8c1c1614c72e3e958b6b7e2472a9 upstream.
Currently when removing an ipmi_user the removal is deferred as a work on
the system's workqueue. Although this guarantees the free operation will
occur in non atomic context, it can race with the ipmi_msghandler module
removal (see [1]) . In case a remove_user work is scheduled for removal
and shortly after ipmi_msghandler module is removed we can end up in a
situation where the module is removed fist and when the work is executed
the system crashes with :
BUG: unable to handle page fault for address: ffffffffc05c3450
PF: supervisor instruction fetch in kernel mode
PF: error_code(0x0010) - not-present page
because the pages of the module are gone. In cleanup_ipmi() there is no
easy way to detect if there are any pending works to flush them before
removing the module. This patch creates a separate workqueue and schedules
the remove_work works on it. When removing the module the workqueue is
drained when destroyed to avoid the race.
[1] https://bugs.launchpad.net/bugs/1950666
Cc: stable@vger.kernel.org # 5.1
Fixes: 3b9a907223 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier)
Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Message-Id: <20211115131645.25116-1-ioanna-maria.alifieraki@canonical.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed53ae75693096f1c10b4561edd31a07b631bd72 upstream.
As reported by Exuvo is possible that we have lot's of EPROTO errors
during device start i.e. firmware load. But after that device works
correctly. Hence marking device gone by few EPROTO errors done by
commit e383c70474 ("rt2x00: check number of EPROTO errors") caused
regression - Exuvo device stop working after kernel update. To fix
disable the check during device start.
Link: https://lore.kernel.org/linux-wireless/bff7d309-a816-6a75-51b6-5928ef4f7a8c@exuvo.se/
Reported-and-tested-by: Exuvo <exuvo@exuvo.se>
Fixes: e383c70474 ("rt2x00: check number of EPROTO errors")
Cc: stable@vger.kernel.org
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211111141003.GA134627@wp.pl
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6bbfa44116689469267f1a6e3d233b52114139d2 upstream.
The 'kprobe::data_size' is unsigned, thus it can not be negative. But if
user sets it enough big number (e.g. (size_t)-8), the result of 'data_size
+ sizeof(struct kretprobe_instance)' becomes smaller than sizeof(struct
kretprobe_instance) or zero. In result, the kretprobe_instance are
allocated without enough memory, and kretprobe accesses outside of
allocated memory.
To avoid this issue, introduce a max limitation of the
kretprobe::data_size. 4KB per instance should be OK.
Link: https://lkml.kernel.org/r/163836995040.432120.10322772773821182925.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: f47cd9b553 ("kprobes: kretprobe user entry-handler")
Reported-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee201011c1e1563c114a55c86eb164b236f18e84 upstream.
IPCB/IP6CB need to be initialized when processing outbound v4 or v6 pkts
in the codepath of vrf device xmit function so that leftover garbage
doesn't cause futher code that uses the CB to incorrectly process the
pkt.
One occasion of the issue might occur when MPLS route uses the vrf
device as the outgoing device such as when the route is added using "ip
-f mpls route add <label> dev <vrf>" command.
The problems seems to exist since day one. Hence I put the day one
commits on the Fixes tags.
Fixes: 193125dbd8 ("net: Introduce VRF device driver")
Fixes: 35402e3136 ("net: Add IPv6 support to VRF device")
Cc: stable@vger.kernel.org
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20211130162637.3249-1-ssuryaextr@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e9380df851878cee71df5a1c7611584421527f7e ]
The commit ddfd9dcf27 ("ACPI: PM: Add acpi_[un]register_wakeup_handler()")
added new functions for drivers to use during the s2idle wakeup path, but
didn't add stubs for when CONFIG_ACPI wasn't set.
Add those stubs in for other drivers to be able to use.
Fixes: ddfd9dcf27 ("ACPI: PM: Add acpi_[un]register_wakeup_handler()")
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20211101014853.6177-1-mario.limonciello@amd.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7a61432dc81375be06b02f0061247d3efbdfce3a ]
Possible recursive locking is detected by lockdep when SMC
falls back to TCP. The corresponding warnings are as follows:
============================================
WARNING: possible recursive locking detected
5.16.0-rc1+ #18 Tainted: G E
--------------------------------------------
wrk/1391 is trying to acquire lock:
ffff975246c8e7d8 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0x109/0x250 [smc]
but task is already holding lock:
ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&ei->socket.wq.wait);
lock(&ei->socket.wq.wait);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by wrk/1391:
#0: ffff975246040130 (sk_lock-AF_SMC){+.+.}-{0:0}, at: smc_connect+0x43/0x150 [smc]
#1: ffff975246c8f918 (&ei->socket.wq.wait){..-.}-{3:3}, at: smc_switch_to_fallback+0xfe/0x250 [smc]
stack backtrace:
Call Trace:
<TASK>
dump_stack_lvl+0x56/0x7b
__lock_acquire+0x951/0x11f0
lock_acquire+0x27a/0x320
? smc_switch_to_fallback+0x109/0x250 [smc]
? smc_switch_to_fallback+0xfe/0x250 [smc]
_raw_spin_lock_irq+0x3b/0x80
? smc_switch_to_fallback+0x109/0x250 [smc]
smc_switch_to_fallback+0x109/0x250 [smc]
smc_connect_fallback+0xe/0x30 [smc]
__smc_connect+0xcf/0x1090 [smc]
? mark_held_locks+0x61/0x80
? __local_bh_enable_ip+0x77/0xe0
? lockdep_hardirqs_on+0xbf/0x130
? smc_connect+0x12a/0x150 [smc]
smc_connect+0x12a/0x150 [smc]
__sys_connect+0x8a/0xc0
? syscall_enter_from_user_mode+0x20/0x70
__x64_sys_connect+0x16/0x20
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
The nested locking in smc_switch_to_fallback() is considered to
possibly cause a deadlock because smc_wait->lock and clc_wait->lock
are the same type of lock. But actually it is safe so far since
there is no other place trying to obtain smc_wait->lock when
clc_wait->lock is held. So the patch replaces spin_lock() with
spin_lock_nested() to avoid false report by lockdep.
Link: https://lkml.org/lkml/2021/11/19/962
Fixes: 2153bd1e3d3d ("Transfer remaining wait queue entries during fallback")
Reported-by: syzbot+e979d3597f48262cb4ee@syzkaller.appspotmail.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Acked-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d9fc706108c15f8bc2d4ccccf8e50f74830fabd9 ]
perf_tip() may allocate memory or use a literal, this means memory
wasn't freed if allocated. Change the API so that literals aren't used.
At the same time add missing frees for system_path. These issues were
spotted using leak sanitizer.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118073804.2149974-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ca1f534a776cc7d42f2c33da4732b74ec2790cd ]
perf_hpp__column_unregister() removes an entry from a list but doesn't
free the memory causing a memory leak spotted by leak sanitizer.
Add the free while at the same time reducing the scope of the function
to static.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211118071247.2140392-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9e1a8d9f683260d50e0a14176d3f7c46a93b2700 ]
'perf inject' is currently not working for Arm SPE. When you try to run
'perf inject' and 'perf report' with a perf.data file that contains SPE
traces, the tool reports a "Bad address" error:
# ./perf record -e arm_spe_0/ts_enable=1,store_filter=1,branch_filter=1,load_filter=1/ -a -- sleep 1
# ./perf inject -i perf.data -o perf.inject.data --itrace
# ./perf report -i perf.inject.data --stdio
0x42c00 [0x8]: failed to process type: 9 [Bad address]
Error:
failed to process sample
As far as I know, the issue was first spotted in [1], but 'perf inject'
was not yet injecting the samples. This patch does something similar to
what cs_etm does for injecting the samples [2], but for SPE.
[1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210412091006.468557-1-leo.yan@linaro.org/#24117339
[2] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/util/cs-etm.c?h=perf/core&id=133fe2e617e48ca0948983329f43877064ffda3e#n1196
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20211105104130.28186-2-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0fa68da72c3be09e06dd833258ee89c33374195f ]
The definition of macro MOTO_SROM_BUG is:
#define MOTO_SROM_BUG (lp->active == 8 && (get_unaligned_le32(
dev->dev_addr) & 0x00ffffff) == 0x3e0008)
and the if statement
if (MOTO_SROM_BUG) lp->active = 0;
using this macro indicates lp->active could be 8. If lp->active is 8 and
the second comparison of this macro is false. lp->active will remain 8 in:
lp->phy[lp->active].gep = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].rst = (*p ? p : NULL); p += (2 * (*p) + 1);
lp->phy[lp->active].mc = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ana = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].fdx = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].ttm = get_unaligned_le16(p); p += 2;
lp->phy[lp->active].mci = *p;
However, the length of array lp->phy is 8, so array overflows can occur.
To fix these possible array overflows, we first check lp->active and then
return -EINVAL if it is greater or equal to ARRAY_SIZE(lp->phy) (i.e. 8).
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Teng Qi <starmiku1207184332@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 61217be886b5f7402843677e4be7e7e83de9cb41 ]
In line 5001, if all id in the array 'lp->phy[8]' is not 0, when the
'for' end, the 'k' is 8.
At this time, the array 'lp->phy[8]' may be out of bound.
Signed-off-by: zhangyue <zhangyue1@kylinos.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f9c55c8066bcd93ac25234a02585701fe2e31df ]
The offset value is used in pointer math on skb->data.
Since ipv6_skip_exthdr may return -1 the pointer to uh and th
may not point to the actual udp and tcp headers and potentially
overwrite other stuff. This is why I think this should be checked.
EDIT: added {}'s, thanks Kees
Signed-off-by: Jordy Zomer <jordy@pwning.systems>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a66998e0fbf213d47d02813b9679426129d0d114 ]
The if statement:
if (port >= DSAF_GE_NUM)
return;
limits the value of port less than DSAF_GE_NUM (i.e., 8).
However, if the value of port is 6 or 7, an array overflow could occur:
port_rst_off = dsaf_dev->mac_cb[port]->port_rst_off;
because the length of dsaf_dev->mac_cb is DSAF_MAX_PORT_NUM (i.e., 6).
To fix this possible array overflow, we first check port and if it is
greater than or equal to DSAF_MAX_PORT_NUM, the function returns.
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Teng Qi <starmiku1207184332@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1527f69204fe35f341cb599f1cb01bd02daf4374 ]
AMD requires that the SATA controller be configured for devsleep in order
for S0i3 entry to work properly.
commit b1a9585cc3 ("ata: ahci: Enable DEVSLP by default on x86 with
SLP_S0") sets up a kernel policy to enable devsleep on Intel mobile
platforms that are using s0ix. Add the PCI ID for the SATA controller in
Green Sardine platforms to extend this policy by default for AMD based
systems using s0i3 as well.
Cc: Nehal-bakulchandra Shah <Nehal-bakulchandra.Shah@amd.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=214091
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 27dfaedc0d321b4ea4e10c53e4679d6911ab17aa ]
In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed
There is a potential memleak if not call kobject_put.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Bernard Zhao <bernard@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2cf49e00d40d5132e3d067b5aa6d84791929ab15 ]
In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case. When reset been triggered again, driver should avoid to do uninitialization again.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a0c2f8b6709a9a4af175497ca65f93804f57b248 ]
We can race where iscsi_session_recovery_timedout() has woken up the error
handler thread and it's now setting the devices to offline, and
session_recovery_timedout()'s call to scsi_target_unblock() is also trying
to set the device's state to transport-offline. We can then get a mix of
states.
For the case where we can't relogin we want the devices to be in
transport-offline so when we have repaired the connection
__iscsi_unblock_session() can set the state back to running.
Set the device state then call into libiscsi to wake up the error handler.
Link: https://lore.kernel.org/r/20211105221048.6541-2-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99b63316c39988039965693f5f43d8b4ccb1c86c ]
During the suspend is in process, thermal_zone_device_update bails out
thermal zone re-evaluation for any sensor trip violation without
setting next valid trip to that sensor. It assumes during resume
it will re-evaluate same thermal zone and update trip. But when it is
in suspend temperature goes down and on resume path while updating
thermal zone if temperature is less than previously violated trip,
thermal zone set trip function evaluates the same previous high and
previous low trip as new high and low trip. Since there is no change
in high/low trip, it bails out from thermal zone set trip API without
setting any trip. It leads to a case where sensor high trip or low
trip is disabled forever even though thermal zone has a valid high
or low trip.
During thermal zone device init, reset thermal zone previous high
and low trip. It resolves above mentioned scenario.
Signed-off-by: Manaf Meethalavalappu Pallikunhi <manafm@codeaurora.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a91cf0ffbc244792e0b3ecf7d0fddb2f344b461f ]
When a disk has write caching disabled, we skip submission of a bio with
flush and sync requests before writing the superblock, since it's not
needed. However when the integrity checker is enabled, this results in
reports that there are metadata blocks referred by a superblock that
were not properly flushed. So don't skip the bio submission only when
the integrity checker is enabled for the sake of simplicity, since this
is a debug tool and not meant for use in non-debug builds.
fstests/btrfs/220 trigger a check-integrity warning like the following
when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk with WCE=0.
btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
------------[ cut here ]------------
WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
FS: 00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
Call Trace:
btrfsic_process_written_block+0x2f7/0x850 [btrfs]
__btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
? bio_associate_blkg_from_css+0xa4/0x2c0
btrfsic_submit_bio+0x18/0x30 [btrfs]
write_dev_supers+0x81/0x2a0 [btrfs]
? find_get_pages_range_tag+0x219/0x280
? pagevec_lookup_range_tag+0x24/0x30
? __filemap_fdatawait_range+0x6d/0xf0
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
? find_first_extent_bit+0x9b/0x160 [btrfs]
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
write_all_supers+0x1b3/0xa70 [btrfs]
? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
btrfs_commit_transaction+0x59d/0xac0 [btrfs]
close_ctree+0x11d/0x339 [btrfs]
generic_shutdown_super+0x71/0x110
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x31/0x70
cleanup_mnt+0xb8/0x140
task_work_run+0x6d/0xb0
exit_to_user_mode_prepare+0x1f0/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f009f711dfb
RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
---[ end trace 2c4b82abcef9eec4 ]---
S-65536(sdb2/65536/1)
-->
M-1064960(sdb2/1064960/1)
Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5dbc4cb4667457b0c53bcd7bff11500b3c362975 ]
There is a difference in how architectures treat "mem=" option. For some
that is an amount of online memory, for s390 and x86 this is the limiting
max address. Some memblock api like memblock_enforce_memory_limit()
take limit argument and explicitly treat it as the size of online memory,
and use __find_max_addr to convert it to an actual max address. Current
s390 usage:
memblock_enforce_memory_limit(memblock_end_of_DRAM());
yields different results depending on presence of memory holes (offline
memory blocks in between online memory). If there are no memory holes
limit == max_addr in memblock_enforce_memory_limit() and it does trim
online memory and reserved memory regions. With memory holes present it
actually does nothing.
Since we already use memblock_remove() explicitly to trim online memory
regions to potential limit (think mem=, kdump, addressing limits, etc.)
drop the usage of memblock_enforce_memory_limit() altogether. Trimming
reserved regions should not be required, since we now use
memblock_set_current_limit() to limit allocations and any explicit memory
reservations above the limit is an actual problem we should not hide.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 39f53292181081d35174a581a98441de5da22bc9 ]
When WWAN device wake from S3 deep, under thinkpad platform,
WWAN would be disabled. This disable status could be checked
by command 'nmcli r wwan' or 'rfkill list'.
Issue analysis as below:
When host resume from S3 deep, thinkpad_acpi driver would
call hotkey_resume() function. Finnaly, it will use
wan_get_status to check the current status of WWAN device.
During this resume progress, wan_get_status would always
return off even WWAN boot up completely.
In patch V2, Hans said 'sw_state should be unchanged
after a suspend/resume. It's better to drop the
tpacpi_rfk_update_swstate call all together from the
resume path'.
And it's confimed by Lenovo that GWAN is no longer
available from WHL generation because the design does not
match with current pin control.
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Link: https://lore.kernel.org/r/20211108060648.8212-1-slark_xiao@163.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f338954a5fbe21eb22b4223141e31f2a26366d5 ]
This adds dual fan control for P1 / X1 Extreme Gen4
Signed-off-by: Jimmy Wang <jimmy221b@163.com>
Link: https://lore.kernel.org/r/20211105090528.39677-1-jimmy221b@163.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6def480181f15f6d9ec812bca8cbc62451ba314c ]
When kmemdup called failed and register_net_sysctl return NULL, should
return ENOMEM instead of ENOBUFS
Signed-off-by: liuguoqiang <liuguoqiang@uniontech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b922f622592af76b57cbc566eaeccda0b31a3496 ]
This bug report shows up when running our research tools. The
reports is SOOB read, but it seems SOOB write is also possible
a few lines below.
In details, fw.len and sw.len are inputs coming from io. A len
over the size of self->rpc triggers SOOB. The patch fixes the
bugs by adding sanity checks.
The bugs are triggerable with compromised/malfunctioning devices.
They are potentially exploitable given they first leak up to
0xffff bytes and able to overwrite the region later.
The patch is tested with QEMU emulater.
This is NOT tested with a real device.
Attached is the log we found by fuzzing.
BUG: KASAN: slab-out-of-bounds in
hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
Read of size 4 at addr ffff888016260b08 by task modprobe/213
CPU: 0 PID: 213 Comm: modprobe Not tainted 5.6.0 #1
Call Trace:
dump_stack+0x76/0xa0
print_address_description.constprop.0+0x16/0x200
? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
__kasan_report.cold+0x37/0x7c
? aq_hw_read_reg_bit+0x60/0x70 [atlantic]
? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
kasan_report+0xe/0x20
hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
hw_atl_utils_fw_rpc_call+0x95/0x130 [atlantic]
hw_atl_utils_fw_rpc_wait+0x176/0x210 [atlantic]
hw_atl_utils_mpi_create+0x229/0x2e0 [atlantic]
? hw_atl_utils_fw_rpc_wait+0x210/0x210 [atlantic]
? hw_atl_utils_initfw+0x9f/0x1c8 [atlantic]
hw_atl_utils_initfw+0x12a/0x1c8 [atlantic]
aq_nic_ndev_register+0x88/0x650 [atlantic]
? aq_nic_ndev_init+0x235/0x3c0 [atlantic]
aq_pci_probe+0x731/0x9b0 [atlantic]
? aq_pci_func_init+0xc0/0xc0 [atlantic]
local_pci_probe+0xd3/0x160
pci_device_probe+0x23f/0x3e0
Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2153bd1e3d3dbf6a3403572084ef6ed31c53c5f0 ]
The SMC fallback is incomplete currently. There may be some
wait queue entries remaining in smc socket->wq, which should
be removed to clcsocket->wq during the fallback.
For example, in nginx/wrk benchmark, this issue causes an
all-zeros test result:
server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html
Running 5s test @ http://11.200.15.93/index.html
1 threads and 1 connections
Thread Stats Avg Stdev Max ± Stdev
Latency 0.00us 0.00us 0.00us -nan%
Req/Sec 0.00 0.00 0.00 -nan%
0 requests in 5.00s, 0.00B read
Requests/sec: 0.00
Transfer/sec: 0.00B
The reason for this all-zeros result is that when wrk used SMC
to replace TCP, it added an eppoll_entry into smc socket->wq
and expected to be notified if epoll events like EPOLL_IN/
EPOLL_OUT occurred on the smc socket.
However, once a fallback occurred, wrk switches to use clcsocket.
Now it is clcsocket->wq instead of smc socket->wq which will
be woken up. The eppoll_entry remaining in smc socket->wq does
not work anymore and wrk stops the test.
This patch fixes this issue by removing remaining wait queue
entries from smc socket->wq to clcsocket->wq during the fallback.
Link: https://www.spinics.net/lists/netdev/msg779769.html
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 77dfc2bc0bb4b8376ecd7a430f27a4a8fff6a5a0 ]
ieee80211_get_keyid() will return false value if IV has been stripped,
such as return 0 for IP/ARP frames due to LLC header, and return -EINVAL
for disassociation frames due to its length... etc. Don't try to access
it if it's not present.
Signed-off-by: Xing Song <xing.song@mediatek.com>
Link: https://lore.kernel.org/r/20211101024657.143026-1-xing.song@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb162bb2b4394108c8f055d1b115735331205e28 ]
When PHY_SUN6I_MIPI_DPHY is selected, and RESET_CONTROLLER
is not selected, Kbuild gives the following warning:
WARNING: unmet direct dependencies detected for PHY_SUN6I_MIPI_DPHY
Depends on [n]: (ARCH_SUNXI [=n] || COMPILE_TEST [=y]) && HAS_IOMEM [=y] && COMMON_CLK [=y] && RESET_CONTROLLER [=n]
Selected by [y]:
- DRM_SUN6I_DSI [=y] && HAS_IOMEM [=y] && DRM_SUN4I [=y]
This is because DRM_SUN6I_DSI selects PHY_SUN6I_MIPI_DPHY
without selecting or depending on RESET_CONTROLLER, despite
PHY_SUN6I_MIPI_DPHY depending on RESET_CONTROLLER.
These unmet dependency bugs were detected by Kismet,
a static analysis tool for Kconfig. Please advise if this
is not the appropriate solution.
v2:
Fixed indentation to match the rest of the file.
Signed-off-by: Julian Braha <julianbraha@gmail.com>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20211109032351.43322-1-julianbraha@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2d33f5504490a9d90924476dbccd4a5349ee1ad0 ]
This reverts commit 54fc3c681d
which does not allow 1:1 mapping even for the system RAM which
is usually possible.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211108040320.3857636-2-aik@ozlabs.ru
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f3506eee81d1f700d9ee2d2f4a88fddb669ec032 ]
Fix the length of holes reported at the end of a file: the length is
relative to the beginning of the extent, not the seek position which is
rounded down to the filesystem block size.
This bug went unnoticed for some time, but is now caught by the
following assertion in iomap_iter_done():
WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 49462e2be119d38c5eb5759d0d1b712df3a41239 ]
Before this patch, evict would clear the iopen glock's gl_object after
releasing the inode glock. In the meantime, another process could reuse
the same block and thus glocks for a new inode. It would lock the inode
glock (exclusively), and then the iopen glock (shared). The shared
locking mode doesn't provide any ordering against the evict, so by the
time the iopen glock is reused, evict may not have gotten to setting
gl_object to NULL.
Fix that by releasing the iopen glock before the inode glock in
gfs2_evict_inode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>gl_object
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9b91b6b019fda817eb52f728eb9c79b3579760bc upstream.
There's possibility of an ABBA deadlock in case of a splice write to an
overlayfs file and a concurrent splice write to a corresponding real file.
The call chain for splice to an overlay file:
-> do_splice [takes sb_writers on overlay file]
-> do_splice_from
-> iter_file_splice_write [takes pipe->mutex]
-> vfs_iter_write
...
-> ovl_write_iter [takes sb_writers on real file]
And the call chain for splice to a real file:
-> do_splice [takes sb_writers on real file]
-> do_splice_from
-> iter_file_splice_write [takes pipe->mutex]
Syzbot successfully bisected this to commit 82a763e61e2b ("ovl: simplify
file splice").
Fix by reverting the write part of the above commit and by adding missing
bits from ovl_write_iter() into ovl_splice_write().
Fixes: 82a763e61e2b ("ovl: simplify file splice")
Reported-and-tested-by: syzbot+579885d1a9a833336209@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Stan Hu <stanhu@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 82a763e61e2b601309d696d4fa514c77d64ee1be upstream.
generic_file_splice_read() and iter_file_splice_write() will call back into
f_op->iter_read() and f_op->iter_write() respectively. These already do
the real file lookup and cred override. So the code in ovl_splice_read()
and ovl_splice_write() is redundant.
In addition the ovl_file_accessed() call in ovl_splice_write() is
incorrect, though probably harmless.
Fix by calling generic_file_splice_read() and iter_file_splice_write()
directly.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Stan Hu <stanhu@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 164051a6ab5445bd97f719f50b16db8b32174269 upstream.
The TP.CM_BAM message must be sent to the global address [1], so add a
check to drop TP.CM_BAM sent to a non-global address.
Without this patch, the receiver will treat the following packets as
normal RTS/CTS transport:
18EC0102#20090002FF002301
18EB0102#0100000000000000
18EB0102#020000FFFFFFFFFF
[1] SAE-J1939-82 2015 A.3.3 Row 1.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Link: https://lore.kernel.org/all/1635431907-15617-4-git-send-email-zhangchangzhong@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f015d89a47cd8855cd92f71fff770095bd885a1 upstream.
The mechanism in use to allow the client to see the results of COPY/CLONE
is to drop those pages from the pagecache. This forces the client to read
those pages once more from the server. However, truncate_pagecache_range()
zeros out partial pages instead of dropping them. Let us instead use
invalidate_inode_pages2_range() with full-page offsets to ensure the client
properly sees the results of COPY/CLONE operations.
Cc: <stable@vger.kernel.org> # v4.7+
Fixes: 2e72448b07 ("NFS: Add COPY nfs operation")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 53af98c091bc42fd9ec64cfabc40da4e5f3aae93 upstream.
Renoir and newer gfx9 APUs have new TSC register that is
not part of the gfxoff tile, so it can be read without
needing to disable gfx off.
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a48d96fd58a666ae231c3dd6fe4a458798ac645 upstream.
Use __maybe_unused for noirq_suspend()/noirq_resume() hooks to avoid
build warning with !CONFIG_PM_SLEEP:
>> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:796:12: error: 'stmmac_pltfr_noirq_resume' defined but not used [-Werror=unused-function]
796 | static int stmmac_pltfr_noirq_resume(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:775:12: error: 'stmmac_pltfr_noirq_suspend' defined but not used [-Werror=unused-function]
775 | static int stmmac_pltfr_noirq_suspend(struct device *dev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Fixes: 276aae377206 ("net: stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 85b6d24646e4125c591639841169baa98a2da503 upstream.
Currently, the exit_shm() function not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to
handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we
initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in
task->sysvshm.shm_clist and gets IPC namespace not from current task
as it was before but from shp's object itself, then call
shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before
(1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we
using special helper get_ipc_ns_not_zero() which allows to get IPC ns
refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because shp object lifetime is always shorther than IPC namespace
lifetime, so, if we get shp object from the task->sysvshm.shm_clist
while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It's just fixes non-covered case when process may leave IPC
namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f7991 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fe3d10024073f06f04c74b9674bd71ccc1d787cf upstream.
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f260 ("mm: mmap: zap pages
with read mmap_sem in munmap"). gfn_to_hva() will only translate using
KVM memory regions, but won't validate the VMA.
Further, we should not allocate page tables outside of VMA boundaries: if
evil user space decides to map hugetlbfs to these ranges, bad things will
happen because we suddenly have PTE or PMD page tables where we
shouldn't have them.
Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
calling get_locked_pte().
Fixes: 2d42f94773 ("s390/kvm: Add PGSTE manipulation functions")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e679004dec37566f658a255157d3aed9d762a2b7 upstream.
Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20210707091045.460-1-jgross@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a884daa61a7d91650987e855464526aef219590f upstream.
Today netfront will trust the backend to send only sane response data.
In order to avoid privilege escalations or crashes in case of malicious
backends verify the data to be within expected limits. Especially make
sure that the response always references an outstanding request.
Note that only the tx queue needs special id handling, as for the rx
queue the id is equal to the index in the ring page.
Introduce a new indicator for the device whether it is broken and let
the device stop working when it is set. Set this indicator in case the
backend sets any weird data.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 21631d2d741a64a073e167c27769e73bc7844a2f upstream.
The tx_skb_freelist elements are in a single linked list with the
request id used as link reference. The per element link field is in a
union with the skb pointer of an in use request.
Move the link reference out of the union in order to enable a later
reuse of it for requests which need a populated skb pointer.
Rename add_id_to_freelist() and get_id_from_freelist() to
add_id_to_list() and get_id_from_list() in order to prepare using
those for other lists as well. Define ~0 as value to indicate the end
of a list and place that value into the link for a request not being
on the list.
When freeing a skb zero the skb pointer in the request. Use a NULL
value of the skb pointer instead of skb_entry_is_link() for deciding
whether a request has a skb linked to it.
Remove skb_entry_set_link() and open code it instead as it is really
trivial now.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>