Commit Graph

841756 Commits

Author SHA1 Message Date
Michael J. Ruhl
fe2ac04712 IB/rdmavt: Set QP allowed opcodes after QP allocation
Currently QP allowed_ops is set after the QP is completely initialized.
This curtails the use of this optimization for any initialization before
allowed_ops is set.

Fix by adding a helper to determine the correct allowed_ops and moving the
setting of the allowed_ops to just after QP allocation.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
5136bfea7e IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is full
When a completion queue is full, the associated queue pairs are not put
into the error state. According to the IBTA specification, this is a
violation.

Quote from IBTA spec:
C9-218: A Requester Class F error occurs when the CQ is inaccessible or
full and an attempt is made to complete a WQE.  The Affected QP shall be
moved to the error state and affiliated asynchronous errors generated as
described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The
current WQE and any subsequent WQEs are left in an unknown state.

C11-37: The CI shall generate a CQ Error when a CQ overrun is
detected. This condition will result in an Affiliated Asynchronous Error
for any associated Work Queues when they attempt to use that
CQ. Completions can no longer be added to the CQ. It is not guaranteed
that completions present in the CQ at the time the error occurred can be
retrieved. Possible causes include a CQ overrun or a CQ protection error.

Put the qp in error state when cq is full. Implement a state called full
to continue to put other associated QPs in error state.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
f592ae3c99 IB/rdmavt: Fracture single lock used for posting and processing RWQEs
Usage of single lock prevents fetching posted and processing receive work
queue entries from progressing simultaneously and impacts overall
performance.

Fracture the single lock used for posting and processing Receive Work
Queue Entries (RWQEs) to allow the circular buffer to be filled and
emptied at the same time. Two new spinlocks - one for the producers and
one for the consumers used for posting and processing RWQEs simultaneously
and the two indices are define on two different cache lines. The threshold
count is used to avoid reading other index in different cache line every
time.

Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Kamenee Arumugam
dabac6e460 IB/hfi1: Move receive work queue struct into uapi directory
The rvt_rwqe and rvt_rwq struct elements are shared between rdmavt and the
providers but are not in uapi directory.  As per the comment in
https://marc.info/?l=linux-rdma&m=152296522708522&w=2, The hfi1 driver and
the rdma core driver are not using shared structures in the uapi
directory.

Move rvt_rwqe and rvt_rwq struct into rvt-abi.h header in uapi directory.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Kamenee Arumugam
239b0e52d8 IB/hfi1: Move rvt_cq_wc struct into uapi directory
The rvt_cq_wc struct elements are shared between rdmavt and the providers
but not in uapi directory.  As per the comment in
https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and
the rdma core driver are not using shared structures in the uapi
directory.

In that case, move rvt_cq_wc struct into the rvt-abi.h header file and
create a rvt_k_cq_w for the kernel completion queue.

Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Jason Gunthorpe
371bb62158 Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into rdma.git for-next

For dependencies in next patches.

Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 21:18:23 -03:00
Colin Ian King
10dcc7448e RDMA/hns: fix spelling mistake "attatch" -> "attach"
There is a spelling mistake in an dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 16:33:00 -03:00
Doug Ledford
34d65cd837 RDMA/netlink: Audit policy settings for netlink attributes
For all string attributes for which we don't currently accept the element
as input, we only use it as output, set the string length to
RDMA_NLDEV_ATTR_EMPTY_STRING which is defined as 1.  That way we will only
accept a null string for that element.  This will prevent someone from
writing a new input routine that uses the element without also updating
the policy to have a valid value.

Also while there, make sure the existing entries that are valid have the
correct policy, if not, correct the policy.  Remove unnecessary checks
for nla_strlcpy() overflow once the policy has been set correctly.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 16:26:54 -03:00
Lijun Ou
e9816ddf2a RDMA/hns: Cleanup unnecessary exported symbols
This patch removes the hns-roce.ko for cleanup all the exported symbols in
common part.

Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 14:48:44 -03:00
Mauro Carvalho Chehab
97162a1ee8 docs: infiniband: convert docs to ReST and rename to *.rst
The InfiniBand docs are plain text with no markups.  So, all we needed to
do were to add the title markups and some markup sequences in order to
properly parse tables, lists and literal blocks.

At its new index.rst, let's add a :orphan: while this is not linked to the
main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:23:09 -03:00
Dan Carpenter
b417c0879d RDMA/hns: Fix an error code in hns_roce_set_user_sq_size()
This function is supposed to return negative kernel error codes but here
it returns CMD_RST_PRC_EBUSY (2).  The error code eventually gets passed
to IS_ERR() and since it's not an error pointer it leads to an Oops in
hns_roce_v1_rsv_lp_qp()

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:21:13 -03:00
Colin Ian King
7ef7587541 RDMA/hns: fix potential integer overflow on left shift
There is a potential integer overflow when int i is left shifted as this
is evaluated using 32 bit arithmetic but is being used in a context that
expects an expression of type dma_addr_t.  Fix this by casting integer i
to dma_addr_t before shifting to avoid the overflow.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 2ac0bc5e72 ("RDMA/hns: Add a group interfaces for optimizing buffers getting flow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-25 10:18:19 -03:00
Max Gurtovoy
7796d2a3bb RDMA/mlx5: Refactor MR descriptors allocation
Improve code readability using static helpers for each memory region
type. Re-use the common logic to get smaller functions that are easy
to maintain and reduce code duplication.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Max Gurtovoy
2563e2f30a RDMA/mlx5: Use PA mapping for PI handover
If possibe, avoid doing a UMR operation to register data and protection
buffers (via MTT/KLM mkeys). Instead, use the local DMA key and map the
SG lists using PA access. This is safe, since the internal key for data
and protection never exposed to the remote server (only signature key
might be exposed). If PA mappings are not possible, perform mapping
using MTT/KLM descriptors.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1266.4K/1262.4K    1720.1K/1732.1K
4k    793139/570902      1129.6K/773982
32k   72660/72086        97229/96164

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1590.2K/1600.1K    1828.2K/1830.3K
4k    1078.1K/937272     1142.1K/815304
32k   77012/77369        98125/97435

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
de0ae958de RDMA/mlx5: Improve PI handover performance
In some loads, there is performance degradation when using KLM mkey
instead of MTT mkey. This is because KLM descriptor access is via
indirection that might require more HW resources and cycles.
Using KLM descriptor is not necessary when there are no gaps at the
data/metadata sg lists. As an optimization, use MTT mkey whenever it
is possible. For that matter, allocate internal MTT mkey and choose the
effective pi_mr for in transaction according to the required mapping
scheme.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o/baseline):

bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1262.4K/1243.3K/1147.1K    1732.1K/1725.1K/1423.8K
4k    570902/571233/457874       773982/743293/642080
32k   72086/72388/71933          96164/71789/93249

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)                IOPS(write)
----    ----------                ----------
512   1600.1K/1572.1K/1393.3K    1830.3K/1823.5K/1557.2K
4k    937272/921992/762934       815304/753772/646071
32k   77369/75052/72058          97435/73180/94612

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Max Gurtovoy <maxg@mellanox.com>
Suggested-by: Idan Burstein <idanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
5c171cbe3a RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR code
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY
was used.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
e9a53e73a2 RDMA/rw: Use IB_WR_REG_MR_INTEGRITY for PI handover
Replace the old signature handover API with the new one. The new API
simplifes PI handover code complexity for ULPs and improve performance.
For RW API it will reduce the maximum number of work requests per task
and the need of dealing with multiple MRs (and their registrations and
invalidations) per task. All the mappings and registration of the data
and the protection buffers is done by the LLD using a single WR and a
special MR type (IB_MR_TYPE_INTEGRITY) for the PI handover operation.

The setup of the tested benchmark (using iSER ULP):
 - 2 servers with 24 cores (1 initiator and 1 target)
 - ConnectX-4/ConnectX-5 adapters
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1243.3K/1182.3K    1725.1K/1680.2K
4k    571233/528835      743293/748259
32k   72388/71086        71789/93573

Using write_generate=0 and read_verify=0 (w/w.o patch):
bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512   1572.1K/1427.2K    1823.5K/1724.3K
4k    921992/916194      753772/768267
32k   75052/73960        73180/95484

There is a performance degradation when writing big block sizes.
Degradation is caused by the complexity of combining multiple
indirections and perform RDMA READ operation from it. This will be
fixed in the following patches by reducing the indirections if
possible.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:28 -03:00
Israel Rukshin
6cb2d5b105 RDMA/rw: Introduce rdma_rw_inv_key helper
This is a preparation for adding new signature API to the rw-API.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
185eddc457 RDMA/core: Validate integrity handover device cap
Protect the case that a ULP tries to allocate a QP with signature
enabled flag while the LLD doesn't support this feature.
While we're here, also move integrity_en attribute from mlx5_qp to
ib_qp as a preparation for adding new integrity API to the rw-API
(that is part of ib_core module).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
c0a6cbb9cb RDMA/core: Rename signature qp create flag and signature device capability
Rename IB_QP_CREATE_SIGNATURE_EN to IB_QP_CREATE_INTEGRITY_EN
and IB_DEVICE_SIGNATURE_HANDOVER to IB_DEVICE_INTEGRITY_HANDOVER.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
5a6781a558 RDMA/core: Add an integrity MR pool support
This is a preparation for adding new signature API to the rw-API.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
b9294f8b7c IB/iser: Unwind WR union at iser_tx_desc
After decreasing WRs array size from 7 to 3 it is more
readable to give each WR a descriptive name.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Israel Rukshin
b76a439982 IB/iser: Use IB_WR_REG_MR_INTEGRITY for PI handover
Using this new API reduces iSER code complexity.
It also reduces the maximum number of work requests per task and the need
of dealing with multiple MRs (and their registrations and invalidations)
per task. It is done by using a single WR and a special MR type
(IB_MR_TYPE_INTEGRITY) for PI operation.

The setup of the tested benchmark:
 - 2 servers with 24 cores (1 initiator and 1 target)
 - 24 target sessions with 1 LUN each
 - ramdisk backstore
 - PI active

Performance results running fio (24 jobs, 128 iodepth) using
write_generate=0 and read_verify=0 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512     1236.6K/1164.3K   1357.2K/1332.8K
1k      1196.5K/1163.8K   1348.4K/1262.7K
2k      1016.7K/921950    1003.7K/931230
4k      662728/600545     595423/501513
8k      385954/384345     333775/277090
16k     222864/222820     170317/170671
32k     116869/114896     82331/82244
64k     55205/54931       40264/40021

Using write_generate=1 and read_verify=1 (w/w.o patch):

bs      IOPS(read)        IOPS(write)
----    ----------        ----------
512     1090.1K/1030.9K   1303.9K/1101.4K
1k      1057.7K/904583    1318.4K/988085
2k      965226/638799     1008.6K/692514
4k      555479/410151     542414/414517
8k      298675/224964     264729/237508
16k     133485/122481     164625/138647
32k     74329/67615       80143/78743
64k     35716/35519       39294/37334

We get performance improvement at all block sizes.
The most significant improvement is when writing 4k bs (almost 30% more
iops).

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
38ca87c6f1 RDMA/mlx5: Introduce and implement new IB_WR_REG_MR_INTEGRITY work request
This new WR will be used to perform PI (protection information) handover
using the new API. Using the new API, the user will post a single WR that
will internally perform all the needed actions to complete PI operation.
This new WR will use a memory region that was allocated as
IB_MR_TYPE_INTEGRITY and was mapped using ib_map_mr_sg_pi to perform the
registration. In the old API, in order to perform a signature handover
operation, each ULP should perform the following:
1. Map and register the data buffers.
2. Map and register the protection buffers.
3. Post a special reg WR to configure the signature handover operation
   layout.
4. Invalidate the signature memory key.
5. Invalidate protection buffers memory key.
6. Invalidate data buffers memory key.

In the new API, the mapping of both data and protection buffers is
performed using a single call to ib_map_mr_sg_pi function. Also the
registration of the buffers and the configuration of the signature
operation layout is done by a single new work request called
IB_WR_REG_MR_INTEGRITY.
This patch implements this operation for mlx5 devices that are capable to
offload data integrity generation/validation while performing the actual
buffer transfer.
This patch will not remove the old signature API that is used by the iSER
initiator and target drivers. This will be done in the future.

In the internal implementation, for each IB_WR_REG_MR_INTEGRITY work
request, we are using a single UMR operation to register both data and
protection buffers using KLM's.
Afterwards, another UMR operation will describe the strided block format.
These will be followed by 2 SET_PSV operations to set the memory/wire
domains initial signature parameters passed by the user.
In the end of the whole transaction, only the signature memory key
(the one that exposed for the RDMA operation) will be invalidated.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
22465bba39 RDMA/mlx5: Update set_sig_data_segment attribute for new signature API
Explicitly pass the sig_mr and the access flags for the mkey segment
configuration. This function will be used also in the new signature
API, so modify it in order to use it in both APIs. This is a preparation
commit before adding new signature API.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:27 -03:00
Max Gurtovoy
9ac7c4bcd3 RDMA/mlx5: Pass UMR segment flags instead of boolean
UMR ctrl segment flags can vary between UMR operations. for example,
using inline UMR or adding free/not-free checks for a memory key.
This is a preparation commit before adding new signature API that
will not need not-free checks for the internal memory key during the
UMR operation.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
62e3c379d4 RDMA/mlx5: Add attr for max number page list length for PI operation
PI offload (protection information) is a feature that each RDMA provider
can implement differently. Thus, introduce new device attribute to define
the maximal length of the page list for PI fast registration operation. For
example, mlx5 driver uses a single internal MR to map both data and
protection SGL's, so it's equal to max_fast_reg_page_list_len / 2.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
6c984472ba RDMA/mlx5: Implement mlx5_ib_map_mr_sg_pi and mlx5_ib_alloc_mr_integrity
mlx5_ib_map_mr_sg_pi() will map the PI and data dma mapped SG lists to the
mlx5 memory region prior to the registration operation. In the new
API, the mlx5 driver will allocate an internal memory region for the
UMR operation to register both PI and data SG lists. The internal MR
will use KLM mode in order to map 2 (possibly non-contiguous/non-align)
SG lists using 1 memory key. In the new API, each ULP will use 1 memory
region for the signature operation (instead of 3 in the old API). This
memory region will have a key that will be exposed to remote server to
perform RDMA operation. The internal memory key that will map the SG lists
will stay private.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
7c717d3aee RDMA/core: Add signature attrs element for ib_mr structure
This element will describe the needed characteristics for the signature
operation per signature enabled memory region (type IB_MR_TYPE_INTEGRITY).
Also add meta_length attribute to ib_sig_attrs structure for saving the
mapped metadata length (needed for the new API implementation).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
2cdfcdd867 RDMA/core: Introduce ib_map_mr_sg_pi to map data/protection sgl's
This function will map the previously dma mapped SG lists for PI
(protection information) and data to an appropriate memory region for
future registration.
The given MR must be allocated as IB_MR_TYPE_INTEGRITY.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Israel Rukshin
26bc7eaee9 RDMA/core: Introduce IB_MR_TYPE_INTEGRITY and ib_alloc_mr_integrity API
This is a preparation for signature verbs API re-design. In the new
design a single MR with IB_MR_TYPE_INTEGRITY type will be used to perform
the needed mapping for data integrity operations.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
a0bc099abf RDMA/core: Save the MR type in the ib_mr structure
This is a preparation for the signature verbs API change. This change is
needed since the MR type will define, in the upcoming patches, the need
for allocating internal resources in LLD for signature handover related
operations. It will also help to make sure that signature related
functions are called with an appropriate MR type and fail otherwise.
Also introduce new mr types IB_MR_TYPE_USER, IB_MR_TYPE_DMA and
IB_MR_TYPE_DM for correctness.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Max Gurtovoy
36b1e47ff0 RDMA/core: Introduce new header file for signature operations
Ease the exhausted ib_verbs.h file and make the code more readable.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24 11:49:26 -03:00
Linus Torvalds
4b972a01a7 Linux 5.2-rc6 2019-06-22 16:01:36 -07:00
Linus Torvalds
6698a71a1e IOMMU Fix for v5.2-rc5:
- Revert a commit from the previous pile of fixes which causes
 	  new lockdep splats. It is better to revert it for now and work
 	  on a better and more well tested fix.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl0Og/QACgkQK/BELZcB
 GuOkZhAAqzTHRndHUzXpSr7L+MrioQI3pmPKuyK72pxQyhuNlwRNn84c8pUxyQe/
 0I6xxvPRu/763BepA0iHW4wKuFtuM299y95STxh7pcEV9+LSTdsrl7TRDxebDG+A
 I7ZJoPlDSrZfnAdlzXAXF1GDmNS+6tM3zRYaH/DSBk5EQhmYLZ6qGiFfrY3DZ7A4
 Y6MhP4OK0bsTE6nL8MFvxGXQA00CdiYIwxBbdCy0uYyOMIrGRE6/Wu+n6Ld1f3PM
 bXuJrj3ZV4Wm1emolG5CWACtJcwvvC+/gn+qQ23mTQvSe4cwK97lUgZ7dS3sTpfr
 ZZFAyzTKAwLN7mA3n/5vu0HI6ZwfipE1izkZhTqHzR/Q+Gy0y0afhGuJOWY5N7Y7
 2AGxp78e3cuIMN8VChHjSlpRvg/EUA9xgX1oylX3lR/YJLW62V1QXmUiHU66KTuo
 QESUSw6IPedqsoLmAJeaP7a5hC+MdliliQEJKOhbluyLfHnMZMo0Ao0jCn/Qtpqr
 S+PfiroXKWaP6jsRKPD6p9mE4g4S01pREGnfTULH5RKqBYu8LjPjV++xWNxoy66Q
 bhoxkIX8+5x9eUlX9zKIm9faqC8P4VbFdu7P6E/wwzeQrRlMYHpWCyBHhmSEj1Xh
 loLvJe5mQv9l+eKWWh4CUngripM1tSnAONWBRsnz0Bz1LoraWqQ=
 =JRtp
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fix from Joerg Roedel:
 "Revert a commit from the previous pile of fixes which causes new
  lockdep splats. It is better to revert it for now and work on a better
  and more well tested fix"

* tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
2019-06-22 14:08:47 -07:00
Peter Xu
0aafc8ae66 Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
This reverts commit 7560cc3ca7.

With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:

    ======================================================
    WARNING: possible circular locking dependency detected
    5.2.0-rc5 #78 Not tainted
    ------------------------------------------------------
    swapper/0/1 is trying to acquire lock:
    00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
    but task is already holding lock:
    00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
    which lock already depends on the new lock.
    the existing dependency chain (in reverse order) is:
    -> #1 (device_domain_lock){....}:
           _raw_spin_lock_irqsave+0x3c/0x50
           dmar_insert_one_dev_info+0xbb/0x510
           domain_add_dev_info+0x50/0x90
           dev_prepare_static_identity_mapping+0x30/0x68
           intel_iommu_init+0xddd/0x1422
           pci_iommu_init+0x16/0x3f
           do_one_initcall+0x5d/0x2b4
           kernel_init_freeable+0x218/0x2c1
           kernel_init+0xa/0x100
           ret_from_fork+0x3a/0x50
    -> #0 (&(&iommu->lock)->rlock){+.+.}:
           lock_acquire+0x9e/0x170
           _raw_spin_lock+0x25/0x30
           domain_context_mapping_one+0xa5/0x4e0
           pci_for_each_dma_alias+0x30/0x140
           dmar_insert_one_dev_info+0x3b2/0x510
           domain_add_dev_info+0x50/0x90
           dev_prepare_static_identity_mapping+0x30/0x68
           intel_iommu_init+0xddd/0x1422
           pci_iommu_init+0x16/0x3f
           do_one_initcall+0x5d/0x2b4
           kernel_init_freeable+0x218/0x2c1
           kernel_init+0xa/0x100
           ret_from_fork+0x3a/0x50

    other info that might help us debug this:
     Possible unsafe locking scenario:
           CPU0                    CPU1
           ----                    ----
      lock(device_domain_lock);
                                   lock(&(&iommu->lock)->rlock);
                                   lock(device_domain_lock);
      lock(&(&iommu->lock)->rlock);

     *** DEADLOCK ***
    2 locks held by swapper/0/1:
     #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
     #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0

    stack backtrace:
    CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
    Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
    Call Trace:
     dump_stack+0x85/0xc0
     print_circular_bug.cold.57+0x15c/0x195
     __lock_acquire+0x152a/0x1710
     lock_acquire+0x9e/0x170
     ? domain_context_mapping_one+0xa5/0x4e0
     _raw_spin_lock+0x25/0x30
     ? domain_context_mapping_one+0xa5/0x4e0
     domain_context_mapping_one+0xa5/0x4e0
     ? domain_context_mapping_one+0x4e0/0x4e0
     pci_for_each_dma_alias+0x30/0x140
     dmar_insert_one_dev_info+0x3b2/0x510
     domain_add_dev_info+0x50/0x90
     dev_prepare_static_identity_mapping+0x30/0x68
     intel_iommu_init+0xddd/0x1422
     ? printk+0x58/0x6f
     ? lockdep_hardirqs_on+0xf0/0x180
     ? do_early_param+0x8e/0x8e
     ? e820__memblock_setup+0x63/0x63
     pci_iommu_init+0x16/0x3f
     do_one_initcall+0x5d/0x2b4
     ? do_early_param+0x8e/0x8e
     ? rcu_read_lock_sched_held+0x55/0x60
     ? do_early_param+0x8e/0x8e
     kernel_init_freeable+0x218/0x2c1
     ? rest_init+0x230/0x230
     kernel_init+0xa/0x100
     ret_from_fork+0x3a/0x50

domain_context_mapping_one() is taking device_domain_lock first then
iommu lock, while dmar_insert_one_dev_info() is doing the reverse.

That should be introduced by commit:

7560cc3ca7 ("iommu/vt-d: Fix lock inversion between iommu->lock and
              device_domain_lock", 2019-05-27)

So far I still cannot figure out how the previous deadlock was
triggered (I cannot find iommu lock taken before calling of
iommu_flush_dev_iotlb()), however I'm pretty sure that that change
should be incomplete at least because it does not fix all the places
so we're still taking the locks in different orders, while reverting
that commit is very clean to me so far that we should always take
device_domain_lock first then the iommu lock.

We can continue to try to find the real culprit mentioned in
7560cc3ca7, but for now I think we should revert it to fix current
breakage.

CC: Joerg Roedel <joro@8bytes.org>
CC: Lu Baolu <baolu.lu@linux.intel.com>
CC: dave.jiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-06-22 21:19:58 +02:00
Linus Torvalds
b253d5f3ec pci-v5.2-fixes-1
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl0OTC0UHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vxBkQ//Zye+wY3Md9M3TWHYjj73jeJTmfb/
 BQgWU2OsOvGWDgQe7gX2c5sjkJe1S450Mf9CvRYu77z1SHgT/2E25yY3OxV1wNMS
 UX1xjm91N1/KBnPj2L6ks5exequobVdAkkhUF0WsRB8L09KHe+E/eg8hKNhRA5jK
 mqNCmRox20LjbkKpKJF2p20ynU8+psFoM3Enm1JRo5UprgXsfFwBJaB75qQBGhN8
 SaRBQVMP2vJoghRVeofj2y/cZWSIKEcZ/dOY+q6MMzd2hsRxjiuqRLrm4f6MStdV
 W+0qr6qd0V6BzxMO+NrCrrhWrkjEb8cqB0F8V/hw3xU8G/17CBEk34HLrqWi2+3D
 //puT7TjXA8t/awkuz+wH2saDldZU4BfeDgpEriop//jQa30EhXM0RLiUBZofKfS
 U88qkd4N/CqPbScTe71ve5pUW5WH5kdcBYWHTN5venEW3sxCR13vFtlfDheki/Mc
 C865E7+ZEep8FhakhGrwiS6MjQrlF1Mzq3BxGviED0Cw92Rz3SEShdp+C0Qk/Av6
 5OYUaDfOw6tx92hBz6DtlbTHUNYbXCMv9aXCR2ju1DjDrIkFeIIr8cJwKI64f4LZ
 EJXIQrEKVWD7QOLe1ebRBlZKV+mEN0q9ZTII2waMcUfZ7GXLLueqYkCKRC8hd805
 +5sZtKspodBpl04=
 =0Qnd
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
 "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2
  because we don't yet know how to support P2PDMA in that case (Logan
  Gunthorpe)"

* tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
2019-06-22 09:42:29 -07:00
Linus Torvalds
f410276646 SCSI fixes on 20190622
Three driver fixes (and one version number update): a suspend hang in
 ufs, a qla hard lock on module removal and a qedi panic during
 discovery.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXQ5HJiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishYoMAQDmq1hE
 rnBYcnQHdLWztkg7taCqnFdvjUocQES3+tuuVQEA+OOSqD7cz6ZKX5crZdSJm0Ka
 BqeTaOMoZXUFkQ4NA9c=
 =Vr0g
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three driver fixes (and one version number update): a suspend hang in
  ufs, a qla hard lock on module removal and a qedi panic during
  discovery"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla2xxx: Fix hardlockup in abort command during driver remove
  scsi: ufs: Avoid runtime suspend possibly being blocked forever
  scsi: qedi: update driver version to 8.37.0.20
  scsi: qedi: Check targetname while finding boot target information
2019-06-22 09:39:03 -07:00
Linus Torvalds
a8282bf087 powerpc fixes for 5.2 #5
Seven fixes, all for bugs introduced this cycle.
 
 The commit to add KASAN support broke booting on 32-bit SMP machines, due to a
 refactoring that moved some setup out of the secondary CPU path.
 
 A fix for another 32-bit SMP bug introduced by the fast syscall entry
 implementation for 32-bit BOOKE. And a build fix for the same commit.
 
 Our change to allow the DAWR to be force enabled on Power9 introduced a bug in
 KVM, where we clobber r3 leading to a host crash.
 
 The same commit also exposed a previously unreachable bug in the nested KVM
 handling of DAWR, which could lead to an oops in a nested host.
 
 One of the DMA reworks broke the b43legacy WiFi driver on some people's
 powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.
 
 A fix for TLB flushing in KVM introduced a new bug, as it neglected to also
 flush the ERAT, this could lead to memory corruption in the guest.
 
 Thanks to:
   Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry Finger, Michael
   Neuling, Suraj Jitindar Singh.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdDg4YAAoJEFHr6jzI4aWACKYP/RG1cqYDjEWz0N9bxjAOanx6
 z//hPZqZrObEORx0mek07LNj6JDy4eL7CB9WaEudJjHt7mYugLYq0g7hUMVvBWnB
 irFEuzGJ8EgWl1aMbmz+fgf49PBIuroy2o/4pyzzQXoDaw44QyUaCke2VEBskQNG
 RW64C2rDVrPgpRHzBB9EZVNv7svmo6ERJsEpRvqP3PZG1ZxgXW+DXbEdSmJCcgAt
 8oI+z6frRv+0ez+nge7TULo8DuheShfxc7l0jFrd48i35v2qB/IowPr8cof9fRwM
 TqnB+3dZXHPKPz6J9mz80p9ZDe1omLzg6i9EiR2/7a3XGpRBo7kCg3Iri7N5pu0j
 LotK9l1+mXWLy5P6lOHH5/tEHv52Wqsvh5IetpNJ2tgXp3MzbOc1/Ut9h7Ag7cRw
 WRa7tNXQ5Ud8uPM1Pds8Ymhd+/nZ9RItjGcu6S095/OGpM1FJR9a0QnfUHMyfyuX
 kAGrJDWcAkCd/Q9tKHsQotuZAFmRCQe4JFkzTiGzwdjYWYgtTA1c/eIv3+SG7eLV
 1dsaIYzIS56b+Qz2Qc/pKHwho+I9o505Y7LFXxlCGXDDjyI72ioTQDwiSBzaZdc9
 ORwNchLfpXNpiNXRoRqAnqmhWxavYmA6oJ13RDBiMBxIUWHynVbEzLlX9fPNdBFj
 Kw3Zd15znokXBzU+1mDE
 =Ju1y
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "This is a frustratingly large batch at rc5. Some of these were sent
  earlier but were missed by me due to being distracted by other things,
  and some took a while to track down due to needing manual bisection on
  old hardware. But still we clearly need to improve our testing of KVM,
  and of 32-bit, so that we catch these earlier.

  Summary: seven fixes, all for bugs introduced this cycle.

   - The commit to add KASAN support broke booting on 32-bit SMP
     machines, due to a refactoring that moved some setup out of the
     secondary CPU path.

   - A fix for another 32-bit SMP bug introduced by the fast syscall
     entry implementation for 32-bit BOOKE. And a build fix for the same
     commit.

   - Our change to allow the DAWR to be force enabled on Power9
     introduced a bug in KVM, where we clobber r3 leading to a host
     crash.

   - The same commit also exposed a previously unreachable bug in the
     nested KVM handling of DAWR, which could lead to an oops in a
     nested host.

   - One of the DMA reworks broke the b43legacy WiFi driver on some
     people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.

   - A fix for TLB flushing in KVM introduced a new bug, as it neglected
     to also flush the ERAT, this could lead to memory corruption in the
     guest.

  Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry
  Finger, Michael Neuling, Suraj Jitindar Singh"

* tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
  powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
  KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
  KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
  powerpc/32: fix build failure on book3e with KVM
  powerpc/booke: fix fast syscall entry on SMP
  powerpc/32s: fix initial setup of segment registers on secondary CPU
2019-06-22 09:09:42 -07:00
Marcel Holtmann
693cd8ce3f Bluetooth: Fix regression with minimum encryption key size alignment
When trying to align the minimum encryption key size requirement for
Bluetooth connections, it turns out doing this in a central location in
the HCI connection handling code is not possible.

Original Bluetooth version up to 2.0 used a security model where the
L2CAP service would enforce authentication and encryption.  Starting
with Bluetooth 2.1 and Secure Simple Pairing that model has changed into
that the connection initiator is responsible for providing an encrypted
ACL link before any L2CAP communication can happen.

Now connecting Bluetooth 2.1 or later devices with Bluetooth 2.0 and
before devices are causing a regression.  The encryption key size check
needs to be moved out of the HCI connection handling into the L2CAP
channel setup.

To achieve this, the current check inside hci_conn_security() has been
moved into l2cap_check_enc_key_size() helper function and then called
from four decisions point inside L2CAP to cover all combinations of
Secure Simple Pairing enabled devices and device using legacy pairing
and legacy service security model.

Fixes: d5bb334a8e ("Bluetooth: Align minimum encryption key size for LE and BR/EDR connections")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203643
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-22 09:07:39 -07:00
Linus Torvalds
c356dc4b54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix leak of unqueued fragments in ipv6 nf_defrag, from Guillaume
    Nault.

 2) Don't access the DDM interface unless the transceiver implements it
    in bnx2x, from Mauro S. M. Rodrigues.

 3) Don't double fetch 'len' from userspace in sock_getsockopt(), from
    JingYi Hou.

 4) Sign extension overflow in lio_core, from Colin Ian King.

 5) Various netem bug fixes wrt. corrupted packets from Jakub Kicinski.

 6) Fix epollout hang in hvsock, from Sunil Muthuswamy.

 7) Fix regression in default fib6_type, from David Ahern.

 8) Handle memory limits in tcp_fragment more appropriately, from Eric
    Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  tcp: refine memory limit test in tcp_fragment()
  inet: clear num_timeout reqsk_alloc()
  net: mvpp2: debugfs: Add pmap to fs dump
  ipv6: Default fib6_type to RTN_UNICAST when not set
  net: hns3: Fix inconsistent indenting
  net/af_iucv: always register net_device notifier
  net/af_iucv: build proper skbs for HiperTransport
  net/af_iucv: remove GFP_DMA restriction for HiperTransport
  net: dsa: mv88e6xxx: fix shift of FID bits in mv88e6185_g1_vtu_loadpurge()
  hvsock: fix epollout hang from race condition
  net/udp_gso: Allow TX timestamp with UDP GSO
  net: netem: fix use after free and double free with packet corruption
  net: netem: fix backlog accounting for corrupted GSO frames
  net: lio_core: fix potential sign-extension overflow on large shift
  tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
  ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
  ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
  tun: wake up waitqueues after IFF_UP is set
  net: remove duplicate fetch in sock_getsockopt
  tipc: fix issues with early FAILOVER_MSG from peer
  ...
2019-06-21 22:23:35 -07:00
Eric Dumazet
b6653b3629 tcp: refine memory limit test in tcp_fragment()
tcp_fragment() might be called for skbs in the write queue.

Memory limits might have been exceeded because tcp_sendmsg() only
checks limits at full skb (64KB) boundaries.

Therefore, we need to make sure tcp_fragment() wont punish applications
that might have setup very low SO_SNDBUF values.

Fixes: f070ef2ac6 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-21 20:58:42 -04:00
Linus Torvalds
121bddf39a Pull request for 5.1-rc5
- 2 minor EFA fixes
 - 10 hfi1 fixes related to scaling issues
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEErmsb2hIrI7QmWxJ0uCajMw5XL90FAl0NBtIACgkQuCajMw5X
 L92+mBAApWkaC+Zvy3ZkbmSpayK3+/q3KmOF7HLUM7iQzKa/G5JknKlLE25Ye0wB
 wMlMn77bCg4RRx+mOpFcsSQYzaJ41Dh5Xp7IRWvVTR51b/UYdJ7dT04HQEIycAN1
 WtW7+SPsFIYWWv6s1P2gSYZnnAFy+wtd7njVUIIc/DS9LmlGU4+Ok5zcUA8nfVqu
 aW01cB5PqOEn4XrkcR76eUJhQjr4Ucq8JCneKOjp0gGuPkqtRwqyWaHGCDvADhS2
 QG0Fc8ISaj8WMevH83lzWB+3Qw0R4rBuPfGdy2EmyDz4/lWKkkaG4phrqLCINLlA
 05hzVZpyG2Ad+5UGPrS1waX/QsNEa2o0UB4sEwnS93zKj89uvcFJ3i9N0Ve47WSm
 f6RKwaFZUYoRkI+I9VtQ7YyJswtakHQ8l3aUCCniJ0ah85VUn8vfYVe5UDP505Fp
 Kc7e3pv3s+L91ZuVL/Z53rSclqlhh5RRuozK3jLFxFjBO/b9TohwWVYE83UPQEdN
 NRy4RsH8PTpcmxNnGGBIO3iB6QZNlSy5qAlRCqeUQKJfowb2qDwYcR955R69komG
 0TNHv3siQluhq3gmfgjxOfE/YDui0erkYwu0ZN9Y9NX+o5OjruDrcHs5B3T0/sV5
 xolnHCpoq9mlvrTBs7rQVyjaHsHjN6ThzV3MzUYBZp8yFz70zwg=
 =Amd0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:
 "This is probably our last -rc pull request. We don't have anything
  else outstanding at the moment anyway, and with the summer months on
  us and people taking trips, I expect the next weeks leading up to the
  merge window to be pretty calm and sedate.

  This has two simple, no brainer fixes for the EFA driver.

  Then it has ten not quite so simple fixes for the hfi1 driver. The
  problem with them is that they aren't simply one liner typo fixes.
  They're still fixes, but they're more complex issues like livelock
  under heavy load where the answer was to change work queue usage and
  spinlock usage to resolve the problem, or issues with orphaned
  requests during certain types of failures like link down which
  required some more complex work to fix too. They all look like
  legitimate fixes to me, they just aren't small like I wish they were.

  Summary:

   - 2 minor EFA fixes

   - 10 hfi1 fixes related to scaling issues"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/efa: Handle mmap insertions overflow
  RDMA/efa: Fix success return value in case of error
  IB/hfi1: Handle port down properly in pio
  IB/hfi1: Handle wakeup of orphaned QPs for pio
  IB/hfi1: Wakeup QPs orphaned on wait list after flush
  IB/hfi1: Use aborts to trigger RC throttling
  IB/hfi1: Create inline to get extended headers
  IB/hfi1: Silence txreq allocation warnings
  IB/hfi1: Avoid hardlockup with flushlist_lock
  IB/hfi1: Correct tid qp rcd to match verbs context
  IB/hfi1: Close PSM sdma_progress sleep window
  IB/hfi1: Validate fault injection opcode user input
2019-06-21 14:47:09 -07:00
Linus Torvalds
c036f7dabc More NFS client fixes for Linux 5.2
Bugfixes:
 - SUNRPC: Fix a credential refcount leak
 - Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
 - SUNRPC: Fix xps refcount imbalance on the error path
 - NFS4: Only set creation opendata if O_CREAT
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl0NMHQACgkQ18tUv7Cl
 QOujAw//bL4p0ADvCSglqO0ceZcHpHuQVhj10f+tyXfkq3R7WDQmP2qok/+6uZjk
 rYxWh6nCqQqTBSmstf4ouLMjdQTk+zDQf38zSSWGPW0U6rCX4kl/yOyskBriYqnf
 W4U5+hOCvQ8prKdkcjbhYqdvz6qT9bhc5X7kHgtD66CtvyzUDraYw04Mojzodl91
 CPV97rDZbiHBAgZPBFKF+qoTXEL7hQlHUREcR/DZBLV3qrMBsraog1T1OpWRGev0
 OAxVAwZyXFcWDFm7mFcItMA4WcUnZbL77gPNtvZSfgYPxHsytHfH8KOmEG7zP5Yy
 +blko41nvNR2UVOPQ/zTvWj9pkuHQlacDUrlYdgnOmHuGhufj1/xx4Z5C2dkTTnp
 ufjcVXJOqZE7lWyA4IOWapc7gLM3q6I8sUR9w5nLc1meYphNAqHZgK0fTgfbMoo7
 JweUBcgGmNvkMpkX561HBe16ENKvcgQQp666VsnTTI/BPZ/BBdlayKGBbxA3j22F
 znF4gwznvQ0jVxtlNsibzSwM8GQbc6UM5fGPM+atlPJEzban0waQaIbCW347DViZ
 fTXP2NQmvH1x+YNgx6RTRwVBWWT02u/ijQHf7+NvwWLWTKb9OXwQVjlnWHu/E/Qi
 MpNXxtBTT6y+DG09VNV9CYwmAsULnlRQWe9RNBfMz+4sjLCyIjg=
 =+z30
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull more NFS client fixes from Anna Schumaker:
 "These are mostly refcounting issues that people have found recently.
  The revert fixes a suspend recovery performance issue.

   - SUNRPC: Fix a credential refcount leak

   - Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"

   - SUNRPC: Fix xps refcount imbalance on the error path

   - NFS4: Only set creation opendata if O_CREAT"

* tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Fix a credential refcount leak
  Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
  net :sunrpc :clnt :Fix xps refcount imbalance on the error path
  NFS4: Only set creation opendata if O_CREAT
2019-06-21 13:45:41 -07:00
Andy Lutomirski
ff17bbe0bb x86/vdso: Prevent segfaults due to hoisted vclock reads
GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock
pages before the vclock mode checks.  This creates a path through
vclock_gettime() in which no vclock is enabled at all (due to disabled
TSC on old CPUs, for example) but the pvclock or hvclock page
nevertheless read.  This will segfault on bare metal.

This fixes commit 459e3a2153 ("gcc-9: properly declare the
{pv,hv}clock_page storage") in the sense that, before that commit, GCC
didn't seem to generate the offending code.  There was nothing wrong
with that commit per se, and -stable maintainers should backport this to
all supported kernels regardless of whether the offending commit was
present, since the same crash could just as easily be triggered by the
phase of the moon.

On GCC 9.1.1, this doesn't seem to affect the generated code at all, so
I'm not too concerned about performance regressions from this fix.

Cc: stable@vger.kernel.org
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Reported-by: Duncan Roe <duncan_roe@optusnet.com.au>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-21 13:31:52 -07:00
Trond Myklebust
19d55046cd SUNRPC: Fix a credential refcount leak
All callers of __rpc_clone_client() pass in a value for args->cred,
meaning that the credential gets assigned and referenced in
the call to rpc_new_client().

Reported-by: Ido Schimmel <idosch@idosch.org>
Fixes: 79caa5fad4 ("SUNRPC: Cache cred of process creating the rpc_client")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-21 14:45:09 -04:00
Anna Schumaker
502980e84e Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
Jon Hunter reports:
  "I have been noticing intermittent failures with a system suspend test on
   some of our machines that have a NFS mounted root file-system. Bisecting
   this issue points to your commit 431235818b ("SUNRPC: Declare RPC
   timers as TIMER_DEFERRABLE") and reverting this on top of v5.2-rc3 does
   appear to resolve the problem.

   The cause of the suspend failure appears to be a long delay observed
   sometimes when resuming from suspend, and this is causing our test to
   timeout."

This reverts commit 431235818b.

Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-21 14:43:42 -04:00
Lin Yi
b962261484 net :sunrpc :clnt :Fix xps refcount imbalance on the error path
rpc_clnt_add_xprt take a reference to struct rpc_xprt_switch, but forget
to release it before return, may lead to a memory leak.

Signed-off-by: Lin Yi <teroincn@163.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-21 14:43:35 -04:00
Benjamin Coddington
909105199a NFS4: Only set creation opendata if O_CREAT
We can end up in nfs4_opendata_alloc during task exit, in which case
current->fs has already been cleaned up.  This leads to a crash in
current_umask().

Fix this by only setting creation opendata if we are actually doing an open
with O_CREAT.  We can drop the check for NULL nfs4_open_createattrs, since
O_CREAT will never be set for the recovery path.

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-21 14:43:25 -04:00
Linus Torvalds
a4c33bbb66 Just one ARM fix this time around for Jason Donenfeld, fixing a
problem with the VDSO generation on big endian.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAXQzkd/TnkBvkraxkAQIJFA/+OKfKuVmT7VUynlb3w1iDn1jzLt8Ja1Vt
 N/PRdGi9B/5P+XjRtYvqhANc3Yea431vj2K46Il4aHzFWkDQNUvAF7oYeX32dhqv
 FlgdDgfXsG3pOajjEe/X4dqYiM1lwCsSn0bev7haLeIaY0sMcqPrVnYLTY5WoohR
 26nxVryyVwHY5B5Zwt62sYGzO9XkYsi6v81zC4T19+iVAxloKcqYlCMFlNYld0pE
 rXSXd0kHz4a/Jpyfa2glWhvOknJ74VPyzV3XMjt1+VcHAVC9fM7ziNHHZO+fxFjW
 h5ANj5+YO11pcsUOTe0Pw5xxqo8q9ixN23hkwoiYeV2vvkQdVLOwID5Ms6RTwkMD
 oR56fXwzg7V2pEnVO1sMAPlaC8uJEp0JoF/6MAUhRpLAT4KKwO45QWlZr8SccvEW
 wurSPMMCL7zSHejO7v3Nnq03fvo0PESpGRH2RiDB7jkvjHBstNYgkhYjC9foAHRu
 jl0fJroJW+U6pK9wYJhRQdYDo9ON9TQD5WZuAS+iTPPecTCTz66s/dBp7b46vrBj
 269jFpA7sClw2Cu41zHC7Z132LMd6HrdI3VLdjM0uoYwn/jnRf2gbihY3Py+yIED
 rLDWHT1XSXzTx7LPxdEV3miDnHTNigAKR7y+O5vZ/q3Y8d3HBRAQsset8HoZNJaK
 AzHAUfNi53c=
 =VtLs
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fix from Russell King:
 "Just one ARM fix this time around for Jason Donenfeld, fixing a
  problem with the VDSO generation on big endian"

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8867/1: vdso: pass --be8 to linker if necessary
2019-06-21 11:11:30 -07:00