BPF map offload follow similar path to program offload. At creation
time users may specify ifindex of the device on which they want to
create the map. Map will be validated by the kernel's
.map_alloc_check callback and device driver will be called for the
actual allocation. Map will have an empty set of operations
associated with it (save for alloc and free callbacks). The real
device callbacks are kept in map->offload->dev_ops because they
have slightly different signatures. Map operations are called in
process context so the driver may communicate with HW freely,
msleep(), wait() etc.
Map alloc and free callbacks are muxed via existing .ndo_bpf, and
are always called with rtnl lock held. Maps and programs are
guaranteed to be destroyed before .ndo_uninit (i.e. before
unregister_netdev() returns). Map callbacks are invoked with
bpf_devs_lock *read* locked, drivers must take care of exclusive
locking if necessary.
All offload-specific branches are marked with unlikely() (through
bpf_map_is_dev_bound()), given that branch penalty will be
negligible compared to IO anyway, and we don't want to penalize
SW path unnecessarily.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a helper to check if netdev could be found and whether it
has .ndo_bpf callback. There is no need to check the callback
every time it's invoked, ndos can't reasonably be swapped for
a set without .ndp_bpf while program is loaded.
bpf_dev_offload_check() will also be used by map offload.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
With map offload coming, we need to call program offload structure
something less ambiguous. Pure rename, no functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
All map types reimplement the field-by-field copy of union bpf_attr
members into struct bpf_map. Add a helper to perform this operation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use the new callback to perform allocation checks for hash maps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Number of attribute checks are currently performed after hashtab
is already allocated. Move them to be able to split them out to
the check function later on. Checks have to now be performed on
the attr union directly instead of the members of bpf_map, since
bpf_map will be allocated later. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
.map_alloc callbacks contain a number of checks validating user-
-provided map attributes against constraints of a particular map
type. For offloaded maps we will need to check map attributes
without actually allocating any memory on the host. Add a new
callback for validating attributes before any memory is allocated.
This callback can be selectively implemented by map types for
sharing code with offloads, or simply to separate the logical
steps of validation and allocation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Masami Hiramatsu says:
====================
Here are the 5th version of patches to moving error injection
table from kprobes. This version fixes a bug and update
fail-function to support multiple function error injection.
Here is the previous version:
https://patchwork.ozlabs.org/cover/858663/
Changes in v5:
- [3/5] Fix a bug that within_error_injection returns false always.
- [5/5] Update to support multiple function error injection.
Thank you,
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Support in-kernel fault-injection framework via debugfs.
This allows you to inject a conditional error to specified
function using debugfs interfaces.
Here is the result of test script described in
Documentation/fault-injection/fault-injection.txt
===========
# ./test_fail_function.sh
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0227404 s, 46.1 MB/s
btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: bfa96010-12e9-4360-aed0-42eec7af5798
Node size: 16384
Sector size: 4096
Filesystem size: 1001.00MiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 58.00MiB
System: DUP 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 1001.00MiB /dev/loop2
mount: mount /dev/loop2 on /opt/tmpmnt failed: Cannot allocate memory
SUCCESS!
===========
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add injectable error types for each error-injectable function.
One motivation of error injection test is to find software flaws,
mistakes or mis-handlings of expectable errors. If we find such
flaws by the test, that is a program bug, so we need to fix it.
But if the tester miss input the error (e.g. just return success
code without processing anything), it causes unexpected behavior
even if the caller is correctly programmed to handle any errors.
That is not what we want to test by error injection.
To clarify what type of errors the caller must expect for each
injectable function, this introduces injectable error types:
- EI_ETYPE_NULL : means the function will return NULL if it
fails. No ERR_PTR, just a NULL.
- EI_ETYPE_ERRNO : means the function will return -ERRNO
if it fails.
- EI_ETYPE_ERRNO_NULL : means the function will return -ERRNO
(ERR_PTR) or NULL.
ALLOW_ERROR_INJECTION() macro is expanded to get one of
NULL, ERRNO, ERRNO_NULL to record the error type for
each function. e.g.
ALLOW_ERROR_INJECTION(open_ctree, ERRNO)
This error types are shown in debugfs as below.
====
/ # cat /sys/kernel/debug/error_injection/list
open_ctree [btrfs] ERRNO
io_ctl_init [btrfs] ERRNO
====
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Since error-injection framework is not limited to be used
by kprobes, nor bpf. Other kernel subsystems can use it
freely for checking safeness of error-injection, e.g.
livepatch, ftrace etc.
So this separate error-injection framework from kprobes.
Some differences has been made:
- "kprobe" word is removed from any APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
feature. It is automatically enabled if the arch supports
error injection feature for kprobe or ftrace etc.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Compare instruction pointer with original one on the
stack instead using per-cpu bpf_kprobe_override flag.
This patch also consolidates reset_current_kprobe() and
preempt_enable_no_resched() blocks. Those can be done
in one place.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Check whether error injectable event is on function entry or not.
Currently it checks the event is ftrace-based kprobes or not,
but that is wrong. It should check if the event is on the entry
of target function. Since error injection will override a function
to just return with modified return value, that operation must
be done before the target function starts making stackframe.
As a side effect, bpf error injection is no need to depend on
function-tracer. It can work with sw-breakpoint based kprobe
events too.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As pointed out by Daniel Borkmann, using bpf_target_off() is not
necessary for xdp_rxq_info when extracting queue_index and
ifindex, as these members are u32 like BPF_W.
Also fix trivial spelling mistake introduced in same commit.
Fixes: 02dd3291b2 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Peng Li says:
====================
hns3: add some new features and fix some bugs
This patchset adds 3 ethtool features: get_channels,
get_coalesce and get_coalesce, and fix some bugs.
[patch 1/11] adds ethtool_ops.get_channels (ethtool -l) support
for VF.
[patch 2/11] removes TSO config command from VF driver,
as only main PF can config TSO MSS length according to
hardware.
[patch 3/11 - 4/11] add ethtool_ops {get|set}_coalesce
(ethtool -c/-C) support to PF.
[patch 5/11 - 9/11] fix some bugs related to {get|set}_coalesce.
[patch 10/11 - 11/11] fix the features handling in
hns3_nic_set_features(). Local variable "changed" was defined
to indicates features changed, but was used only for feature
NETIF_F_HW_VLAN_CTAG_RX. Add checking to improve the reliability.
---
Change log:
V1 -> V2:
1, Rewrite the cover letter requested by David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It's necessary to check hook whether being defined before
calling, improve the reliability.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Local variable "changed" was defined to indicates features changed,
but was used only for feature NETIF_F_HW_VLAN_CTAG_RX. Add checking
for other features.
Fixes: 052ece6dc1 ("net: hns3: add ethtool related offload command")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the int_gl_idx does not be set, the default interrupt coalesce index
is 0. The TX queues and the RX queues will both use the GL0 as the
interrupt coalesce GL switch. But it should be GL1 for TX queues and GL0
for RX queues.
This patch adds the int_gl_idx setup for TX queues and RX queues.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, driver used 2us as the GL unit. The time unit ethtool
command "-c" and "-C" use is 1us, so now the GL unit driver uses
actually is 1us.
This patch changes the unit of GL value macro from
2us to 1us.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the TX GL and the RX GL need to be set separately,
hns3_set_vector_coalesc_gl() has been replaced with
hns3_set_vector_coalesce_rx_gl() and hns3_set_vector_coalesce_tx_gl().
This patch removes hns3_set_vector_coalesc_gl().
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GL update function uses the max GL value between tx_int_gl and
rx_int_gl to set both new tx_int_gl and new rx_int_gl. Therefore, User
can not enable TX GL self-adaptive or RX GL self-adaptive individually.
This patch refactors the code to update the TX GL and the RX GL
separately, making user can enable TX GL self-adaptive or RX GL
self-adaptive individually.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the hardware, the coalesce configurable registers include GL0, GL1,
GL2. In the driver, the TX queues use the register GL1 and the RX queues
use the register GL0. This function initializes the configuration of the
interrupt coalescing, but does not distinguish between the TX direction
and the RX direction. It will cause some confusion.
This patch refactors the function to initialize the TX GL and the RX GL
separately. And the initialization of related variables also is added to
this patch.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ethtool_ops.set_coalesce support to PF.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ethtool_ops.get_coalesce support to PF.
Whilst our hardware supports per queue values, external interfaces
support only a single shared value. As such we use the values for
queue 0.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only main PF can config TSO MSS length according to hardware.
This patch removes TSO config command from VF driver.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch supports the ethtool's get_channels() for VF.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF alignment tests got a conflict because the registers
are output as Rn_w instead of just Rn in net-next, and
in net a fixup for a testcase prohibits logical operations
on pointers before using them.
Also, we should attempt to patch BPF call args if JIT always on is
enabled. Instead, if we fail to JIT the subprogs we should pass
an error back up and fail immediately.
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJaV5ghAAoJEEp/3jgCEfOL/bEH/3UopVYhLpHPqIAGCQuONuvG
4imm/19uVIsbJMCHZ/bbksPPOaswqCKws5ScfIJKzg9vI8PQaQ28BnQh/vELHlmk
lUgdXgVPaCoM/3UlCquxmBn5IooLt0t7LdCvMO2MvF3mf+jThyfwcoc1H2xe1Yh2
k9dPQWcfNBY7jGTBGzqyuNwfg9DZSMyBRx4JmOsqlCI/GDcA8DsLHIykA4wRQRF+
gtSCYeUuZsfukCk4Z1ImoYyfbtK/tYH4s9UOc5pEsd0s0NUs8bn7hgQn2QKWmQt4
HsOzNTiAl/pXSRi8BMqPyFGVdZLT1cRkeo/FW1Lkxg5NqJmv005ebJ1Jx8YHW9A=
=JTlC
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
stable"
* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
rbd: set max_segments to USHRT_MAX
rbd: reacquire lock should update lock owner client id
introduced by yours truly.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaVzO3AAoJEEEQszewGV1zATUQAJKL3aYxRbm+BzszDAivdZ9d
Pb4OEmMPyUMnYyl6o4GvCqJrWhIFCW3F5QY/4gsFt+ZYf2wsVZGvGGca3nYPFYG6
qCp+gIhpY32GlVW9lqVKVEplYkxYyLva+qCKzV+aiMar2kDV8YApEbv59X7R97ir
lsxujweIcYgTQYuEJI/Wr3vwqWGZL/hPVqydfI/gSKVYdqHca2hIvz5mZpYDVM/f
34hgX5V8z0yj8z4rUmgoShAicToK8de3sXChG7uBaFDGeOVX1LMgjUdFucQzHr0U
aPUAWYsrqVi5ZczK5ClhKFCr5y1cfYPv0I9bYsyqHsHFMJq3HMI1eRoFy25POFTi
RnelnThRQKPydPCVIGif3/xfDZOn/IVi9BN6AOxTjnAw/DA/gIavQUDcoh8j4EFK
BePwSiKyJKg4CT3q/beIz2Lxx7JrCzioB3Q1kNGQipIv/X5LeT7lNVEmWP3X1Dif
0od6eeOhFwTSK+sXO0pSHuTB+QhOatLBdoOaHbRnwWqkWMjwVxr2KAFrHzxBueXg
qBGD+GyytZjda9wxBJxWTJNYQA75xBZM1eZUvlv2Fu3hzlDvHUcSVJ8rC82qJ6k7
/xz2rJz86ic+MJ3p3qOdNlUOJ7pEskqiUcttK0/LWXeKhgw3+oTQEG/asGfrPeZg
GAp/6k4XgaXI8iALKsWi
=S8Fj
-----END PGP SIGNATURE-----
Merge tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"Fix a raw vs elaborate GPIO descriptor bug introduced by yours truly"
* tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-01-11
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various BPF related improvements and fixes to nfp driver: i) do
not register XDP RXQ structure to control queues, ii) round up
program stack size to word size for nfp, iii) restrict MTU changes
when BPF offload is active, iv) add more fully featured relocation
support to JIT, v) add support for signed compare instructions to
the nfp JIT, vi) export and reuse verfier log routine for nfp, and
many more, from Jakub, Quentin and Nic.
2) Fix a syzkaller reported GPF in BPF's copy_verifier_state() when
we hit kmalloc failure path, from Alexei.
3) Add two follow-up fixes for the recent XDP RXQ series: i) kvzalloc()
allocated memory was only kfree()'ed, and ii) fix a memory leak where
RX queue was not freed in netif_free_rx_queues(), from Jakub.
4) Add a sample for transferring XDP meta data into the skb, here it
is used for setting skb->mark with the buffer from XDP, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2018-01-11
Here's likely the last bluetooth-next pull request for the 4.16 kernel.
- Added support for Bluetooth on 2015+ MacBook (Pro)
- Fix to QCA Rome suspend/resume handling
- Two new QCA_ROME USB IDs in btusb
- A few other minor fixes
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
platform_get_resource() may fail and return NULL, so we should
better check it's return value to avoid a NULL pointer dereference
a bit later in the code.
This is detected by Coccinelle semantic patch.
@@
expression pdev, res, n, t, e, e1, e2;
@@
res = platform_get_resource(pdev, t, n);
+ if (!res)
+ return -EINVAL;
... when != res == NULL
e = devm_ioremap(e1, res->start, e2);
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return error code -ENODEV from the of_phy_connect() error
handling case instead of 0, as done elsewhere in this function.
Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I ran into a randconfig build failure:
drivers/net/ethernet/socionext/netsec.c: In function 'netsec_probe':
drivers/net/ethernet/socionext/netsec.c:1583:17: error: implicit declaration of function 'devm_ioremap'; did you mean 'ioremap'? [-Werror=implicit-function-declaration]
Including linux/io.h directly fixes this.
Fixes: 533dd11a12 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2018-01-10
This series contains updates to i40e and i40evf only.
Alice adds the displaying of priority xon/xoff packet stats, since we
were already keeping track of them. Based on the recent changes, bump
the driver versions.
Jake changes how the driver determines whether or not the device is
currently up to resolve the possible issue of freeing data structures
and other memory before they have been fully allocated. Refactored
the driver to simplify the locking behavior and to consistently use
spinlocks instead of an overloaded bit lock to protect MAC and filter
lists. Created a helper function which can convert the AdminQ link
speed definition into a virtchnl definition.
Colin Ian King cleans up a redundant variable initialization.
Alex cleans up the driver to stop clearing the pending bit array for
each vector manually, since it is prone to dropping an interrupt and
based on the hardware specs, the pending bit array will be cleared
automatically in MSI-X mode. Cleaned up flags for promiscuous mode to
resolve an issue where enabling & disabling promiscuous mode on a VF
would leave us in a high polling rate for the adminq task. Cleaned up
code that was prone to race issues.
Jingjing renames pipeline personalization profile (ppp) to dynamic
device personalization (ddp) because it was being confused with the
well known point to point protocol. Also removed checks for "track_id"
being zero, since it is valid for it to be zero for profiles that do
not have any 'write' commands.
v2: cleaned up commit message for patch 12 based on feedback from Sergei
Shtylyov and Alex Duyck
v3: dropped patch 15 from the original series while Mariusz Stachura
works on the changes that Jakub Kicinski has suggested
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Initiating a kdump via the command line can cause a pending interrupt
to be handled by the ibmvnic driver when initializing the sub-CRQ
irqs during driver initialization.
NIP [d000000000ca34f0] ibmvnic_interrupt_rx+0x40/0xd0 [ibmvnic]
LR [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
Call Trace:
[c000000047fcfde0] [c000000008132ef0] __handle_irq_event_percpu+0xa0/0x2f0
[c000000047fcfea0] [c00000000813317c] handle_irq_event_percpu+0x3c/0x90
[c000000047fcfee0] [c00000000813323c] handle_irq_event+0x6c/0xd0
[c000000047fcff10] [c0000000081385e0] handle_fasteoi_irq+0xf0/0x250
[c000000047fcff40] [c0000000081320a0] generic_handle_irq+0x50/0x80
[c000000047fcff60] [c000000008014984] __do_irq+0x84/0x1d0
[c000000047fcff90] [c000000008027564] call_do_irq+0x14/0x24
[c00000003c92af00] [c000000008014b70] do_IRQ+0xa0/0x120
[c00000003c92af50] [c000000008002594] hardware_interrupt_common+0x114/0x180
Signed-off-by: David S. Miller <davem@davemloft.net>
add changes to t4_eth_xmit to enable vxlan segmentation
offload support.
Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ndo_udp_tunnel_add and ndo_udp_tunnel_del
to support vxlan tunnelling.
Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add data structures and macros to be used in vxlan
offload.
Original work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull vfs regression fix from Al Viro/
Fix a leak in socket() introduced by commit 8e1611e235 ("make
sock_alloc_file() do sock_release() on failures").
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Fix a leak in socket(2) when we fail to allocate a file descriptor.
Pull networking fixes from David Miller:
1) BPF speculation prevention and BPF_JIT_ALWAYS_ON, from Alexei
Starovoitov.
2) Revert dev_get_random_name() changes as adjust the error code
returns seen by userspace definitely breaks stuff.
3) Fix TX DMA map/unmap on older iwlwifi devices, from Emmanuel
Grumbach.
4) From wrong AF family when requesting sock diag modules, from Andrii
Vladyka.
5) Don't add new ipv6 routes attached to the null_entry, from Wei Wang.
6) Some SCTP sockopt length fixes from Marcelo Ricardo Leitner.
7) Don't leak when removing VLAN ID 0, from Cong Wang.
8) Hey there's a potential leak in ipv6_make_skb() too, from Eric
Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: sr: fix TLVs not being copied using setsockopt
ipv6: fix possible mem leaks in ipv6_make_skb()
mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
mlxsw: pci: Wait after reset before accessing HW
nfp: always unmask aux interrupts at init
8021q: fix a memory leak for VLAN 0 device
of_mdio: avoid MDIO bus removal when a PHY is missing
caif_usb: use strlcpy() instead of strncpy()
doc: clarification about setting SO_ZEROCOPY
net: gianfar_ptp: move set_fipers() to spinlock protecting area
sctp: make use of pre-calculated len
sctp: add a ceiling to optlen in some sockopts
sctp: GFP_ATOMIC is not needed in sctp_setsockopt_events
bpf: introduce BPF_JIT_ALWAYS_ON config
bpf: avoid false sharing of map refcount with max_entries
ipv6: remove null_entry before adding default route
SolutionEngine771x: add Ether TSU resource
SolutionEngine771x: fix Ether platform data
docs-rst: networking: wire up msg_zerocopy
net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
...
Creating a bpf sample that shows howto use the XDP 'data_meta'
infrastructure, created by Daniel Borkmann. Very few drivers support
this feature, but I wanted a functional sample to begin with, when
working on adding driver support.
XDP data_meta is about creating a communication channel between BPF
programs. This can be XDP tail-progs, but also other SKB based BPF
hooks, like in this case the TC clsact hook. In this sample I show
that XDP can store info named "mark", and TC/clsact chooses to use
this info and store it into the skb->mark.
It is a bit annoying that XDP and TC samples uses different tools/libs
when attaching their BPF hooks. As the XDP and TC programs need to
cooperate and agree on a struct-layout, it is best/easiest if the two
programs can be contained within the same BPF restricted-C file.
As the bpf-loader, I choose to not use bpf_load.c (or libbpf), but
instead wrote a bash shell scripted named xdp2skb_meta.sh, which
demonstrate howto use the iproute cmdline tools 'tc' and 'ip' for
loading BPF programs. To make it easy for first time users, the shell
script have command line parsing, and support --verbose and --dry-run
mode, if you just want to see/learn the tc+ip command syntax:
# ./xdp2skb_meta.sh --dev ixgbe2 --dry-run
# Dry-run mode: enable VERBOSE and don't call TC+IP
tc qdisc del dev ixgbe2 clsact
tc qdisc add dev ixgbe2 clsact
tc filter add dev ixgbe2 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark
# Flush XDP on device: ixgbe2
ip link set dev ixgbe2 xdp off
ip link set dev ixgbe2 xdp obj ./xdp2skb_meta_kern.o sec xdp_mark
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Got broken by "make sock_alloc_file() do sock_release() on failures" -
cleanup after sock_map_fd() failure got pulled all the way into
sock_alloc_file(), but it used to serve the case when sock_map_fd()
failed *before* getting to sock_alloc_file() as well, and that got
lost. Trivial to fix, fortunately.
Fixes: 8e1611e235 (make sock_alloc_file() do sock_release() on failures)
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Edward Cree says:
====================
sfc: support 25G configuration with ethtool
Adds support for advertise bits beyond the 32-bit legacy masks, and plumbs in
translation of the new 25/50/100G bits to/from MCDI.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Store and handle ethtool link mode masks within the driver instead of
just a single u32. However, quite a significant amount of existing code
wants to manipulate the masks directly, and thus now uses the first
unsigned long (i.e. mask[0]) as though it were a legacy u32 mask. This
is ok because all the bits that code is interested in are in the first
32 bits of the mask; but it might be a good idea to change them in
future to use the proper bitmap API.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only handles direct speed setting, not autoneg, because the driver is
still trying to pretend it uses the legacy ethtool API which doesn't
have advertised/supported bits for 25/50/100G.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw qdisc refactoring
This patchset refactors the qdisc handling in mlxsw driver in order to make
it more object oriented like.
It helps readability, laying the groundwork for the offloading of
additional qdiscs by the driver
This patchset also makes the qdiscs statistics more generic.
Patch 1 moves the qdiscs declaration to the spectrum_qdisc.c
Patches 2-3 clean the offloaded stats requests. Patch 2 changes the RED
generic stats struct to be sharable by other offloaded qdiscs. Patch 3
changes the xstats request to be like the stats. Note that these patches
are outside the driver scope.
Patches 4-5 clean the statistics related functions and structs within the
driver.
Patches 6-7 decrease the need for the same parameters to be sent to many
functions.
Patches 8-11 create a functions pointers struct, to make the qdiscs
handling more object oriented like.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If a qdisc is being replaced by another qdisc of the same type, it can
simply override over its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>