We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e9 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad87170 ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Hughes <james.hughes@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The icmpv6_param_prob() function already does a kfree_skb(),
this patch removes the duplicate one.
Fixes: 1ababeba4a ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph writes:
This is the current NVMe pile: virtualization extensions, lots of FC
updates and various misc bits. There are a few more FC bits that didn't
make the cut, but we'd like to get this request out before the merge
window for sure.
Just two fixes. The first fixes kprobing a stdu, and is marked for stable as
it's been broken for ~ever. In hindsight this could have gone in next.
The other is a fix for a change we merged this cycle, where if we take a certain
exception when the kernel is running relocated (currently only used for kdump),
we checkstop the box.
Thanks to:
Ravi Bangoria.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY+YyZAAoJEFHr6jzI4aWA6iYP/iwHOwFrAVMKl/zN8f5Vq/Ci
pNPhOJoWFNkKfNkjzqGZXleoB76jc342d3uDDCuGI65YZVFIrlCFc1k93hCrlZYS
jU0dJX+RLFEqcqXbwGhsBLEUjT17R8kxzgQ1J8gHzsFUjvbo7c2u49e7WvGrmqB9
+ksoqy81XfN9nkW4xDw+ME6bUcodW5rkxYNIPuZ0BUdnarPC/sdVvLPVzKuPwcRj
56wlry8kIwTdhqUSA9pYDaq1BY80AEp8d2VFEVsibhiNyJjyDFVHX6t4k/9an7oD
IXiqBVuMX7RnYnAI86aaaoqkZ8EOVeNX0A4U2XtQjGuu+avnwAlYaJ+cFvhbzWXX
zfjX8XanuFc7+Yok4G5W6Rqlye6DB6Ep4Asj3S1Nihv/UHKToVfvtAd46pXmUf2e
3Y+ut69AhT4aJZV4QGpJdUuh98xVR5dnmiAV/Yx+vKkcf3Bz2FzJ3OA/PGkevE0C
M6hY8kjMI9cKFgN6WO5ziNFwAj4t2JHf78F7A5Fkp3I3H0FbDKqhU31Gp/Gnrv3L
Giavyms78Z2+XVg+uxvXUIakKfnrWLao8HxwwCsfKKh9uhPoltM+I+5FI9mG3FSq
XVyA81XqcSkH3Gq3Y5aYZI3cq7YOk3auiWKazQ9H4Fbpi342LiT0v9eIxaP4XxbC
cY/QV6StaJ8cvQA/p2oL
=yUj5
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Just two fixes.
The first fixes kprobing a stdu, and is marked for stable as it's been
broken for ~ever. In hindsight this could have gone in next.
The other is a fix for a change we merged this cycle, where if we take
a certain exception when the kernel is running relocated (currently
only used for kdump), we checkstop the box.
Thanks to Ravi Bangoria"
* tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY+jDWAAoJEFmIoMA60/r8QA0QALF2maWciwCps10R1rSMJOxn
MIZbc5nXlgcI8jEnla05x5qGa/QSUmkq0zRF/RW4RbdshN1OZABlrcc4NFyJ50oL
ARo7KBlXyDmWUBcXOKZsMMGsxLHexUl6BUoctedkUR83tfSoS36BA+DFvGZsyHAJ
D3Nq/67VR+xR8OoZgXyd7pRirvVW0x6o2RbYXFV7au8p7ds13BWo3Xxwtn/2h+3p
ybrdmEUFY9DGkLtULrHIWvcPK9LDBAU0uVU3pAzp6Ss6noVDLQH8d4+tyuUB5aIm
/CGPDc9N89YWv4yyqVdvsPVJXbBg17byhO2kgKYrlNmt6Q/PFwMf4/cF1EWvgt/K
eIWfNtTSVVa57Seb7B+AKN/leIwT++C19ZJvBWFtJVK+VCpmlsc5vtR0K2beIBI7
p6lW5PbqlttfoQphKSnJy4Gf/2B+6jft+nzOydfei2Olnjsr4kkwSC9/UWBKbXiS
Q/6S7clz3bBiPb8IxQArtaGzemPvsB7Gy55stTWXq3ux3IVLbpmzGspZBa30gKq8
9H0bjPjY5c0gMGcvoCmd7+fZ4ZTqIsBWMQB/0O/iAZv4d42/BZS70khbRC7gXZ4c
O3MVRyUL9ADx2Jysoy0Yg+Q//hB6t7xOZWMQRqmoMGxO2PtFgrObg0SwNrEW+K0b
BFsHfZykauzjCOPZmeFx
=qqfj
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Sorry this is so late. It's been in -next for over a week, but I
forgot to send it on until now.
A single fix to the DT binding of the HiSilicon PCIe host support"
* tag 'pci-v4.11-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: hisi: Fix DT binding (hisi-pcie-almost-ecam)
Pull block layer fixes from Jens Axboe:
"A couple of last minute fixes for regressions in this cycle. More
specifically:
- Two patches from Andy, adjusting the NVMe APST quirks to avoid some
issues specific to one Toshiba drive, and some variant of Samsung
on two specific Dell laptops.
- A fix for mtip32xx, turning off mq scheduling on that device. We
have a real fix for this, but it's too late in the cycle.
Thankfully we already have a NO_SCHED flag we can apply here. A
prep patch for this is ensuring that we honor the NO_SCHED flag
when attempting to online switch schedulers, previsouly we only did
so for drive load time. From Ming.
- Fixing an oops in blk-mq polling with scheduling attached. This one
is easily reproducible, it would be a shame to release 4.11 with
that issue. From me.
I'd prefer not having to send in patches at this point in time, but
the above are all things that have regressed in this cycle and the
fixes are relatively straight forward"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix potential oops with polling and blk-mq scheduler
nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA"
nvme: Adjust the Samsung APST quirk
mtip32xx: pass BLK_MQ_F_NO_SCHED
block: respect BLK_MQ_F_NO_SCHED
- Avoid a false-positive warning regarding a variable that may not
be initialized that started to trigger after a previous general
build fix (Arnd Bergmann).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJY+d0VAAoJEILEb/54YlRxh+kQAIO0Eegg7JMZrxVsMl+t53sY
wBwUGUbeALZeh5H1xju0rH5mcyDtTCFc5ufPG46EPAoNog7o+hENOMcKkEOygf7c
h3rzfY+L3+22mYTEtjy9g8/nSwGtzNNTSFXlx/XXWuu5sB+tum8zL6xxguUAOh1R
v74Q+qi71UoEpWTQEiREun3+f3ze5pNG5qWas+prcK+GoPXlu2CiSbTY1IkGN5l0
MTn1//hkj+UY9c38ZtHJyQmOqXx4RaHZv4GwtX3yspzCvqx0n2SsItxKBHO8vSaG
H0w88Kmrc+lkrqfa28Uy5disiakqbFOTmmn+dC/M6UYQGWiwTejI6G3RuBdy1XKA
PilJvu3gRdAJT1X79onC5Mum9OFO4lupG2s3X3G9vrU7tBUvqdfpERcCRoHMXsvj
KYAExu9aQUGx1bPs9/3mjXwTjQJvP1l5sEl9fwfxp5/TovrcySqUWOs1xt95CEYd
w74NIKPlomeC0UM4cfKXyohw8xD1GKgdXo6+Sq014FlFlYXn00jDIQwfZEwlG4Gb
8ZyOfs3GFBdUyEZ4rL7RIJ4/JJa1MDYHohwqX1Mb67HjnZENY0PJ1CTaBfw8iH83
rz3edw92Zr0QvOkeIj+ReDF4eZFndL3GzCOGewjJMXP+k6WpTLokzfbojIYN5w+K
Clpnfz8DkK8bB4QcxF7Z
=W+qr
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI build fix from Rafael Wysocki:
"This avoids a false-positive build warning from the compiler.
Specifics:
- Avoid a false-positive warning regarding a variable that may not be
initialized that started to trigger after a previous general build
fix (Arnd Bergmann)"
* tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / power: Avoid maybe-uninitialized warning
- kmalloc sdio scratch buffer to make it DMA-friendly
MMC host:
- dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used
- sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50 cards
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY+cg7AAoJEP4mhCVzWIwpe+AP/RGstHkyUJkX0GEtwkZxRR+K
1mXjq9sACIbB3ERp9U/3p5+sQUN9VYY3JXlq2nlUTjF0V80Cu0KuWcu/la4wt9Q9
ARrl9ZKbmgfox/o2GWPNvF0tloJ1C0pPmfedUe2nGRtoRHbPwVGA7OT3XmQF4nz8
6kJH98wfa51eaxwY2u50gP1urLO2ayu9eO+S4jaBj6HAFw4aoqJbzK8Hg/sCFSZ+
N9pWooBVl5Km+xWFdgutofjK7C+C/qXmSuLWqrJhMnMtaXCkU9NkdTUiTmgQbHJU
ZZ35gwTZqmidaGz99OTaf+o98XmXIQuGu2teS+v7bA20Ak78AnH/eueRS3lKGFE/
LIfcdIGu/8063OKl6vNYVCG/G9ADSENeKtCHSf4dzXetuz4PWIudW3aIAV831EaX
lRBY/cyTD5O0ns1NlfkEqvW0HlGxoViMsKqPhUznWSPG0lIVIMkNdKNY9kuNrlK9
59RXusCxb4wEt95+EFycYsN5SEa+jer8b3mkZbpe0N7yn/NmLCY2cWpR5iWWn55z
N5hh62vNy071wMQj6LYhBQ9J+PGTnnxTtOQ/WW8fT2m19ONmlIuAIRv5r3/cgWOS
E9/6wOvl3cx9x3NxopdO5ygUED9LCaXH8w73c5YTSynqn3ZZRKg7T+O8lF07T57j
NRXVe01w7dShecMTaL6x
=XYxi
-----END PGP SIGNATURE-----
Merge tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- kmalloc sdio scratch buffer to make it DMA-friendly
MMC host:
- dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used
- sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50
cards"
* tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
mmc: dw_mmc: Don't allow Runtime PM for SDIO cards
mmc: sdio: fix alignment issue in struct sdio_func
Pull input fixlet from Dmitry Torokhov:
"An update to Elan PS/2 driver to allow working on yet another
Lifebook"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled
We need to get the command payload from the request before
we attempt to dereference it.
Fixes: 4dda4735c5 ("mtip32xx: add a status field to struct mtip_cmd")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently most IOs which return the nvme error codes are retried on
the other path if those IOs returns EIO from NVMe driver. This
patch let Multipath distinguish nvme media error codes and some
generic or cmd-specific nvme error codes so that multipath will
not retry those kinds of IO, to save bandwidth.
Signed-off-by: Junxiong Guan <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If an IO timeout occurs, it's helpful to know if the controller did not
post a completion or the driver missed an interrupt. While we never expect
the latter, this patch will make it possible to tell the difference so
we don't have to guess.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
The FC-NVME spec revised syntax to avoid comma separators.
Sync with the change in the parser for traddr on port attachments.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
remoteport teardown never aborted the LS opertions. Add support.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Link LS's on the remoteport rather than the controller. LS's are
between nport's. Makes more sense, especially on async teardown where
the controller is torn down regardless of the LS (LS is more of a notifier
to the target of the teardown), to have them on the remoteport.
While revising ls send/done routines, issues were seen relative to
refcounting and cleanup, especially in async path. Reworked these code
paths.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
target transport:
----------------------
There are cases when there is a need to abort in-progress target
operations (writedata) so that controller termination or errors can
clean up. That can't happen currently as the abort is another target
op type, so it can't be used till the running one finishes (and it may
not). Solve by removing the abort op type and creating a separate
downcall from the transport to the lldd to request an io to be aborted.
The transport will abort ios on queue teardown or io errors. In general
the transport tries to call the lldd abort only when the io state is
idle. Meaning: ops that transmit data (readdata or rsp) will always
finish their transmit (or the lldd will see a state on the
link or initiator port that fails the transmit) and the done call for
the operation will occur. The transport will wait for the op done
upcall before calling the abort function, and as the io is idle, the
io can be cleaned up immediately after the abort call; Similarly, ios
that are not waiting for data or transmitting data must be in the nvmet
layer being processed. The transport will wait for the nvmet layer
completion before calling the abort function, and as the io is idle,
the io can be cleaned up immediately after the abort call; As for ops
that are waiting for data (writedata), they may be outstanding
indefinitely if the lldd doesn't see a condition where the initiatior
port or link is bad. In those cases, the transport will call the abort
function and wait for the lldd's op done upcall for the operation, where
it will then clean up the io.
Additionally, if a lldd receives an ABTS and matches it to an outstanding
request in the transport, A new new transport upcall was created to abort
the outstanding request in the transport. The transport expects any
outstanding op call (readdata or writedata) will completed by the lldd and
the operation upcall made. The transport doesn't act on the reported
abort (e.g. clean up the io) until an op done upcall occurs, a new op is
attempted, or the nvmet layer completes the io processing.
fcloop:
----------------------
Updated to support the new target apis.
On fcp io aborts from the initiator, the loopback context is updated to
NULL out the half that has completed. The initiator side is immediately
called after the abort request with an io completion (abort status).
On fcp io aborts from the target, the io is stopped and the initiator side
sees it as an aborted io. Target side ops, perhaps in progress while the
initiator side is done, continue but noop the data movement as there's no
structure on the initiator side to reference.
patch also contains:
----------------------
Revised lpfc to support the new abort api
commonized rsp buffer syncing and nulling of private data based on
calling paths.
errors in op done calls don't take action on the fod. They're bad
operations which implies the fod may be bad.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Current design has the fcloop job struct, used for both initiator and
target processing, allocated as part of the initiator request structure.
On aborts, the initiator side (based on the request) may terminate, yet
the target side wants to continue processing. the target side can't do
that if the initiator side goes away.
Revise fcloop to allocate an independent target side structure when it
starts an io from the initiator.
Added a lock to the request struct as well to synchronize pointer updates
on abort calls.
Modified target downcalls to recognize conditions where initiator has
aborted the io (thus nulled the pointer between job structs), thus
avoid referencing sgl lists which are gone and no longer making upcalls
to the initiator.
In conditions where the targetport is no longer connected, have the
initiator return an access failure rather than simulating a command
completion.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
With the advent of the opdone calls changing context, the lldd can no
longer assume that once the op->done call returns for RSP operations
that the request struct is no longer being accessed.
As such, revise the lldd api for a req_release callback that the
transport will call when the job is complete. This will also be used
with abort cases.
Fixed text in api header for change in io complete semantics.
Revised lpfc to support the new req_release api.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Two new feature flags were added to control whether upcalls to the
transport result in context switches or stay in the calling context.
NVMET_FCTGTFEAT_CMD_IN_ISR:
By default, if the flag is not set, the transport assumes the
lldd is in a non-isr context and in the cpu context it should be
for the io queue. As such, the cmd handler is called directly in the
calling context.
If the flag is set, indicating the upcall is an isr context, the
transport mandates a transition to a workqueue. The workqueue assigned
to the queue is used for the context.
NVMET_FCTGTFEAT_OPDONE_IN_ISR
By default, if the flag is not set, the transport assumes the
lldd is in a non-isr context and in the cpu context it should be
for the io queue. As such, the fcp operation done callback is called
directly in the calling context.
If the flag is set, indicating the upcall is an isr context, the
transport mandates a transition to a workqueue. The workqueue assigned
to the queue is used for the context.
Updated lpfc for flags
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
This is safer as it doesn't rely on the data being stored in
a single page in an sgl.
It also aids our effort to start phasing out users of sg_page. See [1].
For this we kmalloc some memory, copy to it and free at the end. Note:
we can't allocate this memory on the stack as the kbuild test robot
reports some frame size overflows on i386.
[1] https://lwn.net/Articles/720053/
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
This change provides a mechanism to reduce the number of MMIO doorbell
writes for the NVMe driver. When running in a virtualized environment
like QEMU, the cost of an MMIO is quite hefy here. The main idea for
the patch is provide the device two memory location locations:
1) to store the doorbell values so they can be lookup without the doorbell
MMIO write
2) to store an event index.
I believe the doorbell value is obvious, the event index not so much.
Similar to the virtio specification, the virtual device can tell the
driver (guest OS) not to write MMIO unless you are writing past this
value.
FYI: doorbell values are written by the nvme driver (guest OS) and the
event index is written by the virtual device (host OS).
The patch implements a new admin command that will communicate where
these two memory locations reside. If the command fails, the nvme
driver will work as before without any optimizations.
Contributions:
Eric Northup <digitaleric@google.com>
Frank Swiderski <fes@google.com>
Ted Tso <tytso@mit.edu>
Keith Busch <keith.busch@intel.com>
Just to give an idea on the performance boost with the vendor
extension: Running fio [1], a stock NVMe driver I get about 200K read
IOPs with my vendor patch I get about 1000K read IOPs. This was
running with a null device i.e. the backing device simply returned
success on every read IO request.
[1] Running on a 4 core machine:
fio --time_based --name=benchmark --runtime=30
--filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32
--direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4
--rw=randread --blocksize=4k --randrepeat=false
Signed-off-by: Rob Nelson <rlnelson@google.com>
[mlin: port for upstream]
Signed-off-by: Ming Lin <mlin@kernel.org>
[koike: updated for upstream]
Signed-off-by: Helen Koike <helen.koike@collabora.co.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
The QPRIO field is only valid if weighted round robin arbitration is used,
and this driver doesn't enable that controller configuration option.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
No point in providing and exporting this helper. There's just
one (real) user of it, just use rq_data_dir().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
I lack the basic understanding of what segments mean, so we were being
limited to 512kib requests even with higher max_sectors sizes set.
Setting the maximum number of segments to unlimited allows us to
actually have arbitrarily large IO's go through NBD.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
commit c13660a08c ("blk-mq-sched: change ->dispatch_requests()
to ->dispatch_request()") removed the last user of this function.
Hence also remove the function itself.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
If the caller passes in wait=true, it has to be able to block
for a driver tag. We just had a bug where flush insertion
would block on tag allocation, while we had preempt disabled.
Ensure that we catch cases like that earlier next time.
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Fixes an issue where the size of the poll_stat array in request_queue
does not match the size expected by the new size based bucketing for
IO completion polling.
Fixes: 720b8ccc45 ("blk-mq: Add a polling specific stats function")
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Accumulator is present in configs with FPU and/or DSP MPY (mpy > 6)
Instead of doing this in pt_regs (and thus every kernel entry/exit),
this could have been done in context switch (and for user task only) as
currently kernel doesn't clobber these registers for its own accord.
However we will soon start using 64-bit multiply instructions for kernel
which can clobber these. Also gcc folks also plan to start using these
as GPRs, hence better to always save/restore them
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Merge two mm fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: prevent NR_ISOLATE_* stats from going negative
Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"
Commit 6afcf8ef0c ("mm, compaction: fix NR_ISOLATED_* stats for pfn
based migration") moved the dec_node_page_state() call (along with the
page_is_file_cache() call) to after putback_lru_page().
But page_is_file_cache() can change after putback_lru_page() is called,
so it should be called before putback_lru_page(), as it was before that
patch, to prevent NR_ISOLATE_* stats from going negative.
Without this fix, non-CONFIG_SMP kernels end up hanging in the
while(too_many_isolated()) { congestion_wait() } loop in
shrink_active_list() due to the negative stats.
Mem-Info:
active_anon:32567 inactive_anon:121 isolated_anon:1
active_file:6066 inactive_file:6639 isolated_file:4294967295
^^^^^^^^^^
unevictable:0 dirty:115 writeback:0 unstable:0
slab_reclaimable:2086 slab_unreclaimable:3167
mapped:3398 shmem:18366 pagetables:1145 bounce:0
free:1798 free_pcp:13 free_cma:0
Fixes: 6afcf8ef0c ("mm, compaction: fix NR_ISOLATED_* stats for pfn based migration")
Link: http://lkml.kernel.org/r/1492683865-27549-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ming Ling <ming.ling@spreadtrum.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 374ad05ab6.
While the patch worked great for userspace allocations, the fact that
softirq loses the per-cpu allocator caused problems. It needs to be
redone taking into account that a separate list is needed for hard/soft
IRQs or alternatively find a cheap way of detecting reentry due to an
interrupt. Both are possible but sufficiently tricky that it shouldn't
be rushed.
Jesper had one method for allowing softirqs but reported that the cost
was high enough that it performed similarly to a plain revert. His
figures for netperf TCP_STREAM were as follows
Baseline v4.10.0 : 60316 Mbit/s
Current 4.11.0-rc6: 47491 Mbit/s
Jesper's patch : 60662 Mbit/s
This patch : 60106 Mbit/s
As this is a regression, I wish to revert to noirq allocator for now and
go back to the drawing board.
Link: http://lkml.kernel.org/r/20170415145350.ixy7vtrzdzve57mh@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reported-by: Tariq Toukan <ttoukan.linux@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rather than bucketing IO statisics based on direction only we also
bucket based on the IO size. This leads to improved polling
performance. Update the bucket callback function and use it in the
polling latency estimation.
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
In order to allow for filtering of IO based on some other properties
of the request than direction we allow the bucket function to return
an int.
If the bucket callback returns a negative do no count it in the stats
accumulation.
Signed-off-by: Stephen Bates <sbates@raithlin.com>
Fixed up Kyber scheduler stat callback.
Signed-off-by: Jens Axboe <axboe@fb.com>
If we have a scheduler attached, blk_mq_tag_to_rq() on the
scheduled tags will return NULL if a request is no longer
in flight. This is different than using the normal tags,
where it will always return the fixed request. Check for
this condition for polling, in case we happen to enter
polling for a completed request.
The request address remains valid, so this check and return
should be perfectly safe.
Fixes: bd166ef183 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Tested-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
There's a report that it malfunctions with APST on.
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
I got a couple more reports: the Samsung APST issues appears to
affect multiple 950-series devices in Dell XPS 15 9550 and Precision
5510 laptops. Change the quirk: rather than blacklisting the
firmware on the first problematic SSD that was reported, disable
APST on all 144d:a802 devices if they're installed in the two
affected Dell models. While we're at it, disable only the deepest
sleep state instead of all of them -- the reporters say that this is
sufficient to fix the problem.
(I have a device that appears to be entirely identical to one of the
affected devices, but I have a different Dell laptop, so it's not
the case that all Samsung devices with firmware BXW75D0Q are broken
under all circumstances.)
Samsung engineers have an affected system, and hopefully they'll
give us a better workaround some time soon. In the mean time, this
should minimize regressions.
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Policing filters do not use the TCA_ACT_* enum and the tb[]
nlattr array in tcf_action_init_1() doesn't get filled for
them so we should not try to look for a TCA_ACT_COOKIE
attribute in the then uninitialized array.
The error handling in cookie allocation then calls
tcf_hash_release() leading to invalid memory access later
on.
Additionally, if cookie allocation fails after an already
existing non-policing filter has successfully been changed,
tcf_action_release() should not be called, also we would
have to roll back the changes in the error handling, so
instead we now allocate the cookie early and assign it on
success at the end.
CVE-2017-7979
Fixes: 1045ba77a5 ("net sched actions: Add support for user cookies")
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sudarsana Reddy Kalluru says:
====================
qed: Dcbx bug fixes
The series has set of bug fixes for dcbx implementation of qed driver.
Please consider applying this to 'net' branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change ieee_setpfc() callback implementation to populate traffic class
count with the user provided value.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qed_dcbnl_get_dcbx() API uses kmalloc in GFT_KERNEL mode. The API gets
invoked in the interrupt context by qed_dcbnl_getdcbx callback. Need
to invoke this kmalloc in atomic mode.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PFC error-mask value is not supported by MFW, but this bit could be
set in the pfc bit-map of the operational parameters if remote device
supports it. These operational parameters are used as basis for
populating the dcbx config parameters. User provided configs will be
applied on top of these parameters and then send them to MFW when
requested. Driver need to clear the error-mask bit before sending the
config parameters to MFW.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some adapters may not publish the max_tc value. Populate the default
value for max_tc field in case the mfw didn't provide one.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver was failing to check that the SKB wasn't cloned
before adding checksum data.
Replace existing handling to extend/copy the header buffer
with skb_cow_head.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N, who was reviewing TI's CPSW driver patches is
not working for TI anymore and wont be reviewing patches for
that driver.
Drop Mugunthan as the maintiainer for this driver.
Grygorii continues to be a reviewer. Dave Miller applies the
patches directly and adding a maintainer is actually
misleading since get_maintainer.pl script stops suggesting
that Dave Miller be copied.
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2017-04-19
Two fixes for af_key:
1) Add a lock to key dump to prevent a NULL pointer dereference.
From Yuejie Shi.
2) Fix slab-out-of-bounds in parse_ipsecrequests.
From Herbert Xu.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>