Currently, there's no good way to test for the presence of
set_memory_ro/rw/x/nx() helpers implemented by archs such as
x86, arm, arm64 and s390.
There's DEBUG_SET_MODULE_RONX and DEBUG_RODATA, however both
don't really reflect that: set_memory_*() are also available
even when DEBUG_SET_MODULE_RONX is turned off, and DEBUG_RODATA
is set by parisc, but doesn't implement above functions. Thus,
add ARCH_HAS_SET_MEMORY that is selected by mentioned archs,
where generic code can test against this.
This also allows later on to move DEBUG_SET_MODULE_RONX out of
the arch specific Kconfig to define it only once depending on
ARCH_HAS_SET_MEMORY.
Suggested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
hrtimer handlers run with masked hard IRQ, we can therefore
use napi_schedule_irqoff()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit e70ac17165.
jtcp_rcv_established() is in fact called with hard irq being disabled.
Initial bug report from Ricardo Nabinger Sanchez [1] still needs
to be investigated, but does not look like a TCP bug.
[1] https://www.spinics.net/lists/netdev/msg420960.html
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Ricardo Nabinger Sanchez <rnsanchez@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use eth_hw_addr_random() to set a random MAC address in order to make sure
dev->addr_assign_type will be properly set to NET_ADDR_RANDOM.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Petazzoni says:
====================
net: mvpp2: misc improvements and preparation patches
This series contains a number of fixes, misc improvements and
preparation patches for an upcoming series that adds support for the
new PPv2.2 network controller to the mvpp2 driver.
The most significant improvements are:
- Switching to using build_skb(), which is necessary for the upcoming
PPv2.2 support, but anyway a good improvement to the current mvpp2
driver (supporting PPv2.1).
- Making the driver build on 64-bit platforms.
Changes since v3:
- Addition of a patch "net: mvpp2: fix DMA address calculation in
mvpp2_txq_inc_put()", which fixes a bug in the driver in the
calculation of DMA addresses. This bug was found using
DMA_API_DEBUG.
- Modify the "net: mvpp2: switch to build_skb() in the RX path" patch
to recalculate the fragment size when the MTU is changed in
mvpp2_bm_update_mtu().
- Added Acked-by from Russell King on all patches, except:
* "net: mvpp2: fix DMA address calculation in
mvpp2_txq_inc_put()", because it's a new patch
* "net: mvpp2: switch to build_skb() in the RX path" because I
modified it since the v3.
- Rebased on top of 4.10.
Changes since v2:
- Fix remaining 64-bit build warning, reported by David Miller.
- Adjust how bit mask related definitions are done in "net: mvpp2:
simplify MVPP2_PRS_RI_* definitions" according to Russell King
suggestions.
- Add a patch "net: mvpp2: remove useless arguments in
mvpp2_rx_{pkts,time}_coal_set", suggested by Russell King.
- Rework mvpp2_rx_time_coal_set() implementation to avoid overflows
and rounding errors. I've used the implementation suggested by
Russell King.
Changes since v1:
- This series is split as a separate series from the larger patch set
adding support for PPv2.2 in the mvpp2 driver, as requested by
David Miller.
- Rebased on top of v4.10-rc1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvpp2 is going to be extended to support the Marvell Armada 7K/8K
platform, which is ARM64. As a preparation to this work, this commit
enables building the mvpp2 driver on ARM64, by:
- Adjusting the Kconfig dependency
- Fixing the types used in the driver so that they are 32/64-bits
compliant. We use dma_addr_t for DMA addresses, and unsigned long
for virtual addresses.
It is worth mentioning that after this commit, the driver is for now
still only used on 32-bits platforms, and will only work on 32-bits
platforms.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adapts the mvpp2 RX path to use the build_skb() method. Not
only build_skb() is now the recommended mechanism, but it also
simplifies the addition of support for the PPv2.2 variant.
Indeed, without build_skb(), we have to keep track for each RX
descriptor of the physical address of the packet buffer, and the virtual
address of the SKB. However, in PPv2.2 running on 64 bits platform,
there is not enough space in the descriptor to store the virtual address
of the SKB. So having to take care only of the address of the packet
buffer, and building the SKB upon reception helps in supporting PPv2.2.
The implementation is fairly straightforward:
- mvpp2_skb_alloc() is renamed to mvpp2_buf_alloc() and no longer
allocates a SKB. Instead, it allocates a buffer using the new
mvpp2_frag_alloc() function, with enough space for the data and SKB.
- The initialization of the RX buffers in mvpp2_bm_bufs_add() as well
as the refill of the RX buffers in mvpp2_rx_refill() is adjusted
accordingly.
- Finally, the mvpp2_rx() is modified to use build_skb().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the MVPP2_PRS_RI_* definitions use the ~(value) syntax, which
doesn't compile nicely on 64-bit. Moreover, those definitions are in
fact unneeded, since they are always used in combination with a bit
mask that ensures only the appropriate bits are modified.
Therefore, such definitions should just be set to 0x0. In addition, as
suggested by Russell King, we change the _MASK definitions to also use
the BIT() macro so that it is clear they are related to the values
defined afterwards.
For example:
#define MVPP2_PRS_RI_L2_CAST_MASK 0x600
#define MVPP2_PRS_RI_L2_UCAST ~(BIT(9) | BIT(10))
#define MVPP2_PRS_RI_L2_MCAST BIT(9)
#define MVPP2_PRS_RI_L2_BCAST BIT(10)
becomes
#define MVPP2_PRS_RI_L2_CAST_MASK (BIT(9) | BIT(10))
#define MVPP2_PRS_RI_L2_UCAST 0x0
#define MVPP2_PRS_RI_L2_MCAST BIT(9)
#define MVPP2_PRS_RI_L2_BCAST BIT(10)
Because the values (MVPP2_PRS_RI_L2_UCAST, MVPP2_PRS_RI_L2_MCAST and
MVPP2_PRS_RI_L2_BCAST) are always applied with
MVPP2_PRS_RI_L2_CAST_MASK, and therefore there is no need for
MVPP2_PRS_RI_L2_UCAST to be defined as ~(BIT(9) | BIT(10)).
It fixes the following warnings when building the driver on a 64-bit
platform (which is not possible as of this commit, but will be enabled
in a follow-up commit):
drivers/net/ethernet/marvell/mvpp2.c: In function ‘mvpp2_prs_mac_promisc_set’:
drivers/net/ethernet/marvell/mvpp2.c:524:33: warning: large integer implicitly truncated to unsigned type [-Woverflow]
#define MVPP2_PRS_RI_L2_UCAST ~(BIT(9) | BIT(10))
^
drivers/net/ethernet/marvell/mvpp2.c:1459:33: note: in expansion of macro ‘MVPP2_PRS_RI_L2_UCAST’
mvpp2_prs_sram_ri_update(&pe, MVPP2_PRS_RI_L2_UCAST,
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvpp2_bm_bufs_add() currently creates a fake cookie by calling
mvpp2_bm_cookie_pool_set(), just to be able to call
mvpp2_pool_refill(). But all what mvpp2_pool_refill() does is extract
the pool ID from the cookie, and call mvpp2_bm_pool_put() with this ID.
Instead of doing this convoluted thing, just call mvpp2_bm_pool_put()
directly, since we have the BM pool ID.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit drops dead code from the mvpp2 driver. The 'in_use' and
'in_use_thresh' fields of 'struct mvpp2_bm_pool' are
incremented/decremented/initialized in various places. But they are only
used in one place:
if (is_recycle &&
(atomic_read(&bm_pool->in_use) < bm_pool->in_use_thresh))
return 0;
However 'is_recycle', passed as argument to mvpp2_rx_refill() is always
false. So in fact, this code is never reached, and the 'is_recycle'
argument is useless. So let's drop this code.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit remove a field of 'struct mvpp2_tx_queue' that is not used
anywhere.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mvpp2_txq_bufs_free() function is called upon TX completion to DMA
unmap TX buffers, and free the corresponding SKBs. It gets the
references to the SKB to free and the DMA buffer to unmap from a per-CPU
txq_pcpu data structure.
However, the code currently increments the pointer to the next entry
before doing the DMA unmap and freeing the SKB. It does not cause any
visible problem because for a given SKB the TX completion is guaranteed
to take place on the CPU where the TX was started. However, it is much
more logical to increment the pointer to the next entry once the current
entry has been completely unmapped/released.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When configuring the MVPP2_ISR_RX_THRESHOLD_REG with the RX coalescing
time threshold, we do not check for the maximum allowed value supported
by the driver, which means we might overflow and use a bogus value. This
commit adds a check for this situation, and if a value higher than what
is supported by the hardware is provided, then we use the maximum value
supported by the hardware.
In order to achieve this in a way that avoids overflow and rounding
errors, we introduce two utility functions mvpp2_usec_to_cycles() and
cycles_to_usec(). Many thanks to Russell King for suggesting this
implementation.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mvpp2_rx_pkts_coal_set() does the following to avoid setting
a too large value for the RX coalescing by packet number:
val = (pkts & MVPP2_OCCUPIED_THRESH_MASK);
This means that if you set a value that is slightly higher the the
maximum number of packets, you in fact get a very low value. It makes a
lot more sense to simply check if the value is too high, and if it's too
high, limit it to the maximum possible value.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by Russell King, the last argument of
mvpp2_rx_{pkts,time}_coal_set() is useless, since the packet/time
coalescing value is already stored in the 'struct mvpp2_rx_queue *'
passed as argument to these functions. So passing the packet/time value
as an additional argument, and setting them again in the mvpp2_rx_queue
structure is useles.
This commit therefore gets rid of this additional argument, assuming the
caller has assigned the appropriate value to rxq->pkts_coal or
rxq->time_coal before calling the respective functions.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TX descriptors are filled in, the buffer DMA address is split
between the tx_desc->buf_phys_addr field (high-order bits) and
tx_desc->packet_offset field (5 low-order bits).
However, when we re-calculate the DMA address from the TX descriptor in
mvpp2_txq_inc_put(), we do not take tx_desc->packet_offset into
account. This means that when the DMA address is not aligned on a 32
bytes boundary, we end up calling dma_unmap_single() with a DMA address
that was not the one returned by dma_map_single().
This inconsistency is detected by the kernel when DMA_API_DEBUG is
enabled. We fix this problem by properly calculating the DMA address in
mvpp2_txq_inc_put().
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MACSec test failed when asynchronous crypto operations is used. It
encounters packet validation failed since macsec_skb_cb(skb)->valid
is always 'false'.
This patch adds missing "macsec_skb_cb(skb)->valid = true" in
macsec_decrypt_done() when "err == 0".
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The USEC_PER_SEC is used once in sock_set_timeout as the max value of
tv_usec. But there are other similar codes which use the literal
1000000 in this file.
It is minor cleanup to keep consitent.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch switch to use build_skb() for small buffer which can have
better performance for both TCP and XDP (since we can work at page
before skb creation). It also remove lots of XDP codes since both
mergeable and small buffer use page frag during refill now.
Before | After
XDP_DROP(xdp1) 64B : 11.1Mpps | 14.4Mpps
Tested with xdp1/xdp2/xdp_ip_tx_tunnel and netperf.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skbs processed by ip_cmsg_recv() are not guaranteed to
be linear e.g. when sending UDP packets over loopback with
MSGMORE.
Using csum_partial() on [potentially] the whole skb len
is dangerous; instead be on the safe side and use skb_checksum().
Thanks to syzkaller team to detect the issue and provide the
reproducer.
v1 -> v2:
- move the variable declaration in a tighter scope
Fixes: ad6f939ab1 ("ip: Add offset parameter to ip_cmsg_recv")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
silences the below warning:
drivers/net/vxlan.c: In function ‘neigh_reduce’:
drivers/net/vxlan.c:1599:25: warning: variable ‘saddr’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds changelink rtnl op support for vxlan netdevs.
code changes involve:
- refactor vxlan_newlink into vxlan_nl2conf to be
used by vxlan_newlink and vxlan_changelink
- vxlan_nl2conf and vxlan_dev_configure take a
changelink argument to isolate changelink checks
and updates.
- Allow changing only a few attributes:
- return -EOPNOTSUPP for attributes that cannot
be changed for now. Incremental patches can
make the non-supported one available in the future
if needed.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is only one possible error path which reaches the err label, so
return ERR_PTR(-ENOMEM) directly if alloc_netdev_mqs() fails. This also
allows to omit the err variable.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Andy Price submitted a patch to make gfs2_write_full_page a
static function.
2. Dan Carpenter submitted a patch to fix a ERR_PTR thinko.
I've also got a few patches, three of which fix bugs related to
deleting very large files, which cause GFS2 to run out of
journal space:
3. The first one prevents GFS2 delete operation from requesting too
much journal space.
4. The second one fixes a problem whereby GFS2 can hang because it
wasn't taking journal space demand into its calculations.
5. The third one wakes up IO waiters when a flush is done to restart
processes stuck waiting for journal space to become available.
The other three patches are a performance improvement related to
spin_lock contention between multiple writers:
6. The "tr_touched" variable was switched to a flag to be more atomic
and eliminate the possibility of some races.
7. Function meta_lo_add was moved inline with its only caller to make
the code more readable and efficient.
8. Contention on the gfs2_log_lock spinlock was greatly reduced by
avoiding the lock altogether in cases where we don't really need
it: buffers that already appear in the appropriate metadata list
for the journal. Many thanks to Steve Whitehouse for the ideas and
principles behind these patches.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYrEEEAAoJENeLYdPf93o7bjoIAIqPG/EAzi+idgMWDPQa9Eit
53dPy16snkrbWwtaK6spSWlH6bGYuHeanXORYon9bvtVjKYaa4NQclGihN2IE6uB
O8zT+MGwP45LDhNplVJpumaOALZ9ZDqQSe+3tHeNK3FhNirLyiIjSqrHt/7Yi1qi
fPLlT4Jx0TBo5rhvEGa7Yg01WhWVtnmVSMqJXj/7ZtC50s1aPyDUikdNIDfDCN2X
LxfKGDXuk6p63VQ6JKqYSBVATCR0/bbKfkuk/kBUTYLoHoapImxB8d0HgIdsh1Mv
9PlbZnnNW8k5oapuhVxjl0T5G0JsQgCkPb/wlte+ryOCjBoc2L2fCUV5qc0QxWc=
=xQyl
-----END PGP SIGNATURE-----
Merge tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Robert Peterson:
"We've got eight GFS2 patches for this merge window:
- Andy Price submitted a patch to make gfs2_write_full_page a static
function.
- Dan Carpenter submitted a patch to fix a ERR_PTR thinko.
Three patches fix bugs related to deleting very large files, which
cause GFS2 to run out of journal space:
- The first one prevents GFS2 delete operation from requesting too
much journal space.
- The second one fixes a problem whereby GFS2 can hang because it
wasn't taking journal space demand into its calculations.
- The third one wakes up IO waiters when a flush is done to restart
processes stuck waiting for journal space to become available.
The final three patches are a performance improvement related to
spin_lock contention between multiple writers:
- The "tr_touched" variable was switched to a flag to be more atomic
and eliminate the possibility of some races.
- Function meta_lo_add was moved inline with its only caller to make
the code more readable and efficient.
- Contention on the gfs2_log_lock spinlock was greatly reduced by
avoiding the lock altogether in cases where we don't really need
it: buffers that already appear in the appropriate metadata list
for the journal. Many thanks to Steve Whitehouse for the ideas and
principles behind these patches"
* tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Make gfs2_write_full_page static
GFS2: Reduce contention on gfs2_log_lock
GFS2: Inline function meta_lo_add
GFS2: Switch tr_touched to flag in transaction
GFS2: Wake up io waiters whenever a flush is done
GFS2: Made logd daemon take into account log demand
GFS2: Limit number of transaction blocks requested for truncates
GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
Pull UDF fixes and cleanups from Jan Kara:
"Several small UDF fixes and cleanups and a small cleanup of fanotify
code"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: simplify the code of fanotify_merge
udf: simplify udf_ioctl()
udf: fix ioctl errors
udf: allow implicit blocksize specification during mount
udf: check partition reference in udf_read_inode()
udf: atomically read inode size
udf: merge module informations in super.c
udf: remove next_epos from udf_update_extent_cache()
udf: Factor out trimming of crtime
udf: remove empty condition
udf: remove unneeded line break
udf: merge bh free
udf: use pointer for kernel_long_ad argument
udf: use __packed instead of __attribute__ ((packed))
udf: Make stat on symlink report symlink length as st_size
fs/udf: make #ifdef UDF_PREALLOCATE unconditional
fs: udf: Replace CURRENT_TIME with current_time()
It was found when running fio sequential write test with a XFS ramdisk
on a KVM guest running on a 2-socket x86-64 system, the %CPU times
as reported by perf were as follows:
69.75% 0.59% fio [k] down_write
69.15% 0.01% fio [k] call_rwsem_down_write_failed
67.12% 1.12% fio [k] rwsem_down_write_failed
63.48% 52.77% fio [k] osq_lock
9.46% 7.88% fio [k] __raw_callee_save___kvm_vcpu_is_preempt
3.93% 3.93% fio [k] __kvm_vcpu_is_preempted
Making vcpu_is_preempted() a callee-save function has a relatively
high cost on x86-64 primarily due to at least one more cacheline of
data access from the saving and restoring of registers (8 of them)
to and from stack as well as one more level of function call.
To reduce this performance overhead, an optimized assembly version
of the the __raw_callee_save___kvm_vcpu_is_preempt() function is
provided for x86-64.
With this patch applied on a KVM guest on a 2-socket 16-core 32-thread
system with 16 parallel jobs (8 on each socket), the aggregrate
bandwidth of the fio test on an XFS ramdisk were as follows:
I/O Type w/o patch with patch
-------- --------- ----------
random read 8141.2 MB/s 8497.1 MB/s
seq read 8229.4 MB/s 8304.2 MB/s
random write 1675.5 MB/s 1701.5 MB/s
seq write 1681.3 MB/s 1699.9 MB/s
There are some increases in the aggregated bandwidth because of
the patch.
The perf data now became:
70.78% 0.58% fio [k] down_write
70.20% 0.01% fio [k] call_rwsem_down_write_failed
69.70% 1.17% fio [k] rwsem_down_write_failed
59.91% 55.42% fio [k] osq_lock
10.14% 10.14% fio [k] __kvm_vcpu_is_preempted
The assembly code was verified by using a test kernel module to
compare the output of C __kvm_vcpu_is_preempted() and that of assembly
__raw_callee_save___kvm_vcpu_is_preempt() to verify that they matched.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The cpu argument in the function prototype of vcpu_is_preempted()
is changed from int to long. That makes it easier to provide a better
optimized assembly version of that function.
For Xen, vcpu_is_preempted(long) calls xen_vcpu_stolen(int), the
downcast from long to int is not a problem as vCPU number won't exceed
32 bits.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guest segment selector is 16 bit field and guest segment base is natural
width field. Fix two incorrect invocations accordingly.
Without this patch, build fails when aggressive inlining is used with ICC.
Cc: stable@vger.kernel.org
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel's VMX is daft and resets the hidden TSS limit register to 0x67
on VMX reload, and the 0x67 is not configurable. KVM currently
reloads TR using the LTR instruction on every exit, but this is quite
slow because LTR is serializing.
The 0x67 limit is entirely harmless unless ioperm() is in use, so
defer the reload until a task using ioperm() is actually running.
Here's some poorly done benchmarking using kvm-unit-tests:
Before:
cpuid 1313
vmcall 1195
mov_from_cr8 11
mov_to_cr8 17
inl_from_pmtimer 6770
inl_from_qemu 6856
inl_from_kernel 2435
outl_to_kernel 1402
After:
cpuid 1291
vmcall 1181
mov_from_cr8 11
mov_to_cr8 16
inl_from_pmtimer 6457
inl_from_qemu 6209
inl_from_kernel 2339
outl_to_kernel 1391
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[Force-reload TR in invalidate_tss_limit. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Historically, the entire TSS + io bitmap structure was cacheline
aligned, but commit ca241c7503 ("x86: unify tss_struct") changed it
(presumably inadvertently) so that the fixed-layout hardware part is
cacheline-aligned and the io bitmap is after the padding. This wastes
24 bytes (the hardware part should be 104 bytes, but this pads it to
128 bytes) and, serves no purpose, and causes sizeof(struct
x86_hw_tss) to have a confusing value.
Drop the pointless alignment.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use actual pointer types for pointers (instead of unsigned long) and
replace hardcoded constants with the appropriate self-documenting
macros.
The function is still a bit messy, but this seems a lot better than
before to me.
This is mostly borrowed from a patch by Thomas Garnier.
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It was a bit buggy (it didn't list all segment types that needed
64-bit fixups), but the bug was irrelevant because it wasn't called
in any interesting context on 64-bit kernels and was only used for
data segents on 32-bit kernels.
To avoid confusion, make it explicitly 32-bit only.
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The current CPU's TSS base is a foregone conclusion, so there's no need
to parse it out of the segment tables. This should save a couple cycles
(as STR is surely microcoded and poorly optimized) but, more importantly,
it's a cleanup and it means that segment_base() will never be called on
64-bit kernels.
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rather than open-coding the kernel TSS limit in set_tss_desc(), make
it a real macro near the TSS layout definition.
This is purely a cleanup.
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This commit adds the ndo_do_ioctl() callback which allows the userspace to
access PHY registers, for example. This will make mii-diag and similar
tools work.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
As I don't have the hardware, I'd be very pleased if
someone may test this patch.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Fix probe and open bugs.
Fix 3 issues related to bnxt_init_one() and bnxt_open(). Don't probe
bridge devices and fixup some error code paths.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In the bnxt_init_one() failure path, bar1 and bar2 are not
being unmapped. This commit fixes this issue. Reorganize the
code so that bnxt_init_one()'s failure path and bnxt_remove_one()
can call the same function to do the PCI cleanup.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If bnxt_hwrm_ring_free() is called during a failure path in bnxt_open(),
it is possible that the completion rings have not been allocated yet.
In that case, the completion doorbell has not been initialized, and
calling bnxt_disable_int() will crash. Fix it by checking that the
completion ring has been initialized before writing to the completion
ring doorbell.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are additional SoC devices that use the same device ID for
bridge and NIC devices. The bnxt driver should reject probe against
all bridge devices since it's meant to be used with only endpoint
devices.
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull CIFS/SMB3 updates from Steve French:
"Includes support for a critical SMB3 security feature: per-share
encryption from Pavel, and a cleanup from Jean Delvare.
Will have another cifs/smb3 merge next week"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Allow to switch on encryption with seal mount option
CIFS: Add capability to decrypt big read responses
CIFS: Decrypt and process small encrypted packets
CIFS: Add copy into pages callback for a read operation
CIFS: Add mid handle callback
CIFS: Add transform header handling callbacks
CIFS: Encrypt SMB3 requests before sending
CIFS: Enable encryption during session setup phase
CIFS: Add capability to transform requests before sending
CIFS: Separate RFC1001 length processing for SMB2 read
CIFS: Separate SMB2 sync header processing
CIFS: Send RFC1001 length in a separate iov
CIFS: Make send_cancel take rqst as argument
CIFS: Make SendReceive2() takes resp iov
CIFS: Separate SMB2 header structure
CIFS: Fix splice read for non-cached files
cifs: Add soft dependencies
cifs: Only select the required crypto modules
cifs: Simplify SMB2 and SMB311 dependencies
primarily used for testing, but which can be useful on production
systems when a scratch volume is being destroyed and the data on it
doesn't need to be saved. This found (and we fixed) a number of bugs
with ext4's recovery to corrupted file system --- the bugs increased
the amount of data that could be potentially lost, and in the case of
the inline data feature, could cause the kernel to BUG.
Also included are a number of other bug fixes, including in ext4's
fscrypt, DAX, inline data support.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlirXesACgkQ8vlZVpUN
gaMOzQf8Ct6uPatV+m855oR4dAbZr2+lY4A4C+vHDzBtSMkPRyLX8cuo8XcwfTIm
vPVyDnL6EPyhXPxxfItu+92wAq1m5mVpKo57d0Ft5lw0rHxNtJTgVSRzsQ7VDRjj
5qMHW2K7Bk7EjzTeW3SF8/3+hqpzkAvRtNCntcomk5h08+cWMC8JSnn1kqw+naIn
EcbrC72GZb8JUELogVXC2vU58lp50SSBdr3l005jqKc5BvljMvdJ0Izn/3RVyU7u
q7vtynhe2ScFcHe/UzL1QgmQOy32tJpbS0NHalW47aw3Ynmn4cSX0YhhT9FDjRNQ
VOOfo1m1sAg166x0E+Nn7FeghTSSyA==
=cPIf
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"For this cycle we add support for the shutdown ioctl, which is
primarily used for testing, but which can be useful on production
systems when a scratch volume is being destroyed and the data on it
doesn't need to be saved.
This found (and we fixed) a number of bugs with ext4's recovery to
corrupted file system --- the bugs increased the amount of data that
could be potentially lost, and in the case of the inline data feature,
could cause the kernel to BUG.
Also included are a number of other bug fixes, including in ext4's
fscrypt, DAX, inline data support"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
ext4: rename EXT4_IOC_GOINGDOWN to EXT4_IOC_SHUTDOWN
ext4: fix fencepost in s_first_meta_bg validation
ext4: don't BUG when truncating encrypted inodes on the orphan list
ext4: do not use stripe_width if it is not set
ext4: fix stripe-unaligned allocations
dax: assert that i_rwsem is held exclusive for writes
ext4: fix DAX write locking
ext4: add EXT4_IOC_GOINGDOWN ioctl
ext4: add shutdown bit and check for it
ext4: rename s_resize_flags to s_ext4_flags
ext4: return EROFS if device is r/o and journal replay is needed
ext4: preserve the needs_recovery flag when the journal is aborted
jbd2: don't leak modified metadata buffers on an aborted journal
ext4: fix inline data error paths
ext4: move halfmd4 into hash.c directly
ext4: fix use-after-iput when fscrypt contexts are inconsistent
jbd2: fix use after free in kjournald2()
ext4: fix data corruption in data=journal mode
ext4: trim allocation requests to group size
ext4: replace BUG_ON with WARN_ON in mb_find_extent()
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlirP6wACgkQ8vlZVpUN
gaMwpQgApR67CxzlstxYjZpWPAqC8McJ2FBDX+mCOle5Vkc1WQDklwr0oCfQThTj
eDSFRhNfIvyPh0DJ589PxBCsWOqN5h6Si7hD5ZinomVNI+IL89OytaU5EV2OpWaW
iKdJgO9Tm8U7LuY6FOIoVdX57kUXVdkWoj61rC056B1SNhnNiVeofi7lYDM8Ix4q
IGSQ9W24iQKmCk4hCwgObhJBRK9RnlOH0GLUmpMaS+jnfnj/uePwdxWEFsPuCOob
8acAJ49lr55kjIw79E0BAyWxhEZ2aiArHk8PaWynT/DyNq3ftcapPlpftoeba8vo
glBJRX70QxPvt0iHEp0ykfExkhWhFA==
=Joki
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
"Various cleanups for the file system encryption feature"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: constify struct fscrypt_operations
fscrypt: properly declare on-stack completion
fscrypt: split supp and notsupp declarations into their own headers
fscrypt: remove redundant assignment of res
fscrypt: make fscrypt_operations.key_prefix a string
fscrypt: remove unused 'mode' member of fscrypt_ctx
ext4: don't allow encrypted operations without keys
fscrypt: make test_dummy_encryption require a keyring key
fscrypt: factor out bio specific functions
fscrypt: pass up error codes from ->get_context()
fscrypt: remove user-triggerable warning messages
fscrypt: use EEXIST when file already uses different policy
fscrypt: use ENOTDIR when setting encryption policy on nondirectory
fscrypt: use ENOKEY when file cannot be created w/o key