kernel_optimize_test

Author	SHA1	Message	Date
Florian Fainelli	82272db84d	net: dsa: Drop WARN() in tag_brcm.c We may be able to see invalid Broadcom tags when the hardware and drivers are misconfigured, or just while exercising the error path. Instead of flooding the console with messages, flat out drop the packet. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 15:00:22 -05:00
Stephen Boyd	3ebe8344eb	net: ks8851: Drop eeprom_size structure member After commit `51b7b1c34e` (KSZ8851-SNL: Add ethtool support for EEPROM via eeprom_93cx6, 2011-11-21) this structure member is unused. Delete it. Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:56:44 -05:00
David S. Miller	bef4e179b0	Merge branch 'bpf-misc' Daniel Borkmann says: ==================== Misc BPF improvements This series adds various misc improvements to BPF, f.e. allowing skb_load_bytes() helper to be used with filter/reuseport programs to facilitate programming, test cases for program tag, etc. For details, please see individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:46:07 -05:00
Daniel Borkmann	3fadc80115	bpf: enable verifier to better track const alu ops William reported couple of issues in relation to direct packet access. Typical scheme is to check for data + [off] <= data_end, where [off] can be either immediate or coming from a tracked register that contains an immediate, depending on the branch, we can then access the data. However, in case of calculating [off] for either the mentioned test itself or for access after the test in a more "complex" way, then the verifier will stop tracking the CONST_IMM marked register and will mark it as UNKNOWN_VALUE one. Adding that UNKNOWN_VALUE typed register to a pkt() marked register, the verifier then bails out in check_packet_ptr_add() as it finds the registers imm value below 48. In the first below example, that is due to evaluate_reg_imm_alu() not handling right shifts and thus marking the register as UNKNOWN_VALUE via helper __mark_reg_unknown_value() that resets imm to 0. In the second case the same happens at the time when r4 is set to r4 &= r5, where it transitions to UNKNOWN_VALUE from evaluate_reg_imm_alu(). Later on r4 we shift right by 3 inside evaluate_reg_alu(), where the register's imm turns into 3. That is, for registers with type UNKNOWN_VALUE, imm of 0 means that we don't know what value the register has, and for imm > 0 it means that the value has [imm] upper zero bits. F.e. when shifting an UNKNOWN_VALUE register by 3 to the right, no matter what value it had, we know that the 3 upper most bits must be zero now. This is to make sure that ALU operations with unknown registers don't overflow. Meaning, once we know that we have more than 48 upper zero bits, or, in other words cannot go beyond 0xffff offset with ALU ops, such an addition will track the target register as a new pkt() register with a new id, but 0 offset and 0 range, so for that a new data/data_end test will be required. 
If the source register is a CONST_IMM one that is to be added to the pkt() register, or the source instruction is an add instruction with an immediate value, then it will get added if it stays within the max 0xffff bounds. From there, the pkt() type can be accessed should reg->off + imm be within the access range of pkt(). [...] from 28 to 30: R0=imm1,min_value=1,max_value=1 R1=pkt(id=0,off=0,r=22) R2=pkt_end R3=imm144,min_value=144,max_value=144 R4=imm0,min_value=0,max_value=0 R5=inv48,min_value=2054,max_value=2054 R10=fp 30: (bf) r5 = r3 31: (07) r5 += 23 32: (77) r5 >>= 3 33: (bf) r6 = r1 34: (0f) r6 += r5 cannot add integer value with 0 upper zero bits to ptr_to_packet [...] from 52 to 80: R0=imm1,min_value=1,max_value=1 R1=pkt(id=0,off=0,r=34) R2=pkt_end R3=inv R4=imm272 R5=inv56,min_value=17,max_value=17 R6=pkt(id=0,off=26,r=34) R10=fp 80: (07) r4 += 71 81: (18) r5 = 0xfffffff8 83: (5f) r4 &= r5 84: (77) r4 >>= 3 85: (0f) r1 += r4 cannot add integer value with 3 upper zero bits to ptr_to_packet Thus to get above use-cases working, evaluate_reg_imm_alu() has been extended for further ALU ops. This is fine, because we only operate strictly within realm of CONST_IMM types, so here we don't care about overflows as they will happen in the simulated but also real execution and interaction with pkt() in check_packet_ptr_add() will check actual imm value once added to pkt(), but it's irrelevant before. With regards to `06c1c04972` ("bpf: allow helpers access to variable memory") that works on UNKNOWN_VALUE registers, the verifier becomes now a bit smarter as it can better resolve ALU ops, so we need to adapt two test cases there, as min/max bound tracking only becomes necessary when registers were spilled to stack. So while mask was set before to track upper bound for UNKNOWN_VALUE case, it's now resolved directly as CONST_IMM, and such constructs are only necessary when f.e. registers are spilled. 
For commit `6b17387307` ("bpf: recognize 64bit immediate loads as consts") that initially enabled dw load tracking only for nfp jit/ analyzer, I did couple of tests on large, complex programs and we don't increase complexity badly (my tests were in ~3% range on avg). I've added a couple of tests similar to affected code above, and it works fine with verifier now. Reported-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Gianluca Borello <g.borello@gmail.com> Cc: William Tu <u9012063@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:46:06 -05:00
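The constant tracking described in the commit above can be illustrated with a small model. This is only a sketch in Python, not the verifier's code: `fold_const_alu` and `upper_zero_bits` are hypothetical helper names, and the model assumes 64-bit registers holding a known constant. It replays the two instruction sequences from the verifier logs and shows that, once add, and, and right-shift are folded on constants, the resulting offsets stay well inside the 0xffff bound mentioned in the message.

```python
MASK64 = (1 << 64) - 1  # model registers as 64-bit values


def fold_const_alu(imm, ops):
    """Apply (op, operand) pairs to a known constant, keeping it a constant.

    Hypothetical helper: it mirrors the idea of resolving ALU ops on
    CONST_IMM registers, not the kernel's actual implementation.
    """
    for op, operand in ops:
        if op == "add":
            imm = (imm + operand) & MASK64
        elif op == "and":
            imm &= operand
        elif op == "rsh":
            imm >>= operand
        else:
            raise ValueError(f"unsupported op: {op}")
    return imm


def upper_zero_bits(value):
    """Number of leading zero bits of a 64-bit value."""
    return 64 - value.bit_length()


# First log excerpt: r5 = r3 (imm144); r5 += 23; r5 >>= 3
first = fold_const_alu(144, [("add", 23), ("rsh", 3)])

# Second log excerpt: r4 = imm272; r4 += 71; r4 &= 0xfffffff8; r4 >>= 3
second = fold_const_alu(272, [("add", 71), ("and", 0xFFFFFFF8), ("rsh", 3)])

print(first, second)
print(upper_zero_bits(first), upper_zero_bits(second))
```

Both results are small constants, so in this model they have far more than 48 upper zero bits and could be added to a packet pointer as a plain offset.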
Daniel Borkmann	62b6466026	bpf: add prog tag test case to bpf selftests Add the test case used to compare the results from fdinfo with af_alg's output on the tag. Tests are from min to max sized programs, with and without maps included. # ./test_tag test_tag: OK (40945 tests) Tested on x86_64 and s390x. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:46:06 -05:00
Daniel Borkmann	d1b662adcd	bpf: allow option for setting bpf_l4_csum_replace from scratch When programs need to calculate the csum from scratch for small UDP packets and use bpf_l4_csum_replace() to feed the result from helpers like bpf_csum_diff(), then we need a flag besides BPF_F_MARK_MANGLED_0 that would ignore the case of current csum being 0, and which would still allow for the helper to set the csum and transform when needed to CSUM_MANGLED_0. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:46:06 -05:00
Daniel Borkmann	2492d3b867	bpf: enable load bytes helper for filter/reuseport progs BPF_PROG_TYPE_SOCKET_FILTER are used in various facilities such as for SO_REUSEPORT and packet fanout demuxing, packet filtering, kcm, etc, and yet the only facility they can use is BPF_LD with {BPF_ABS, BPF_IND} for single byte/half/word access. Direct packet access is only restricted to tc programs right now, but we can still facilitate usage by allowing skb_load_bytes() helper added back then in `05c74e5e53` ("bpf: add bpf_skb_load_bytes helper") that calls skb_header_pointer() similarly to bpf_load_pointer(), but for stack buffers with larger access size. Name the previous sk_filter_func_proto() as bpf_base_func_proto() since this is used everywhere else as well, similarly for the ctx converter, that is, bpf_convert_ctx_access(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:46:05 -05:00
Daniel Borkmann	4faf940dd8	bpf: simplify __is_valid_access test on cb The __is_valid_access() test for cb[] from `62c7989b24` ("bpf: allow b/h/w/dw access for bpf's cb in ctx") was done unnecessarily complex, we can just simplify it the same way as recent fix from `2d071c643f` ("bpf, trace: make ctx access checks more robust") did. Overflow can never happen as size is 1/2/4/8 depending on access. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:46:05 -05:00
Arnd Bergmann	187024144c	phy: marvell: remove conflicting initializer One line was apparently pasted incorrectly during a new feature patch: drivers/net/phy/marvell.c:2090:15: error: initialized field overwritten [-Werror=override-init] .features = PHY_GBIT_FEATURES, I'm removing the extraneous line here to avoid the W=1 warning and restore the previous flags value, and I'm slightly reordering the lines for consistency to make it less likely to happen again in the future. The ordering in the array is still not the same as in the structure definition, instead I picked the order that is most common in this file and that seems to make more sense here. Fixes: `0b04680fda` ("phy: marvell: Add support for temperature sensor") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:08:46 -05:00
Phil Sutter	e1636836e0	net: dummy: Introduce dummy virtual functions The idea for this was born when testing VF support in iproute2 which was impeded by hardware requirements. In fact, not every VF-capable hardware driver implements all netdev ops, so testing the interface is still hard to do even with a well-sorted hardware shelf. To overcome this and allow for testing the user-kernel interface, this patch allows to turn dummy into a PF with a configurable amount of VFs. Since my patch series 'bus-agnostic-num-vf' has been accepted, implementing the required interfaces is pretty straightforward: Iff 'num_vfs' module parameter was given a value >0, a dummy bus type is being registered which implements the 'num_vf()' callback. Additionally, a dummy parent device common to all dummy devices is registered which sits on the above dummy bus. Joint work with Sabrina Dubroca. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 14:07:22 -05:00
Philippe Reynes	8b86b2c1b8	net: broadcom: bnx2x: use new api ethtool_{get|set}_link_ksettings The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone may test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:49:19 -05:00
David S. Miller	36f877804c	Merge branch 'packet-sampling-offload' Jiri Pirko says: ==================== Add support for offloading packet-sampling Yotam says: The first patch introduces the psample module, a netlink channel dedicated to packet sampling implemented using generic netlink. This module provides a generic way for kernel modules to sample packets, while not being tied to any specific subsystem like NFLOG. The second patch adds the sample tc action, which uses psample to randomly sample packets that match a classifier. The user can configure the psample group number, the sampling rate and the packet's truncation (to save kernel-user traffic). The last two patches add the support for offloading the matchall-sample tc command in the mlxsw driver, for ingress qdiscs. An example for psample usage can be found in the libpsample project at: https://github.com/Mellanox/libpsample v1->v2: - Reword first patch's commit message - Fix typo in comment in second patch - Change order of tc_sample uapi enum to match convention - Rename act_sample action callback tcf_sample -> tcf_sample_act ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:44:29 -05:00
Yotam Gigi	98d0f7b9ac	mlxsw: spectrum: Add packet sample offloading support Using the MPSC register, add the functions that configure port-based packet sampling in hardware and the necessary datatypes in the mlxsw_sp_port struct. In addition, add the necessary trap for sampled packets and integrate with matchall offloading to allow offloading of the sample tc action. The current offload support is for the tc command: tc filter add dev <DEV> parent ffff: \ matchall skip_sw \ action sample rate <RATE> group <GROUP> [trunc <SIZE>] Where only ingress qdiscs are supported, and only a combination of matchall classifier and sample action will lead to activating hardware packet sampling. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:44:28 -05:00
Yotam Gigi	0677d6828b	mlxsw: reg: add the Monitoring Packet Sampling Configuration Register The MPSC register allows to configure ingress packet sampling on specific port of the mlxsw device. The sampled packets are then trapped via PKT_SAMPLE trap. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:44:28 -05:00
Yotam Gigi	5c5670fae4	net/sched: Introduce sample tc action This action allows the user to sample traffic matched by tc classifier. The sampling consists of choosing packets randomly and sampling them using the psample module. The user can configure the psample group number, the sampling rate and the packet's truncation (to save kernel-user traffic). Example: To sample ingress traffic from interface eth1, one may use the commands: tc qdisc add dev eth1 handle ffff: ingress tc filter add dev eth1 parent ffff: \ matchall action sample rate 12 group 4 Where the first command adds an ingress qdisc and the second starts sampling randomly with an average of one sampled packet per 12 packets on dev eth1 to psample group 4. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:44:28 -05:00
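The "one sampled packet per 12 packets on average" behaviour described above can be sketched as a small simulation. This is a user-space illustration in Python, not the kernel action: `sample_packets` is a hypothetical function, and drawing a uniform random number per packet is an assumption about how such sampling could be done. The optional truncation corresponds to the `trunc` setting mentioned in the message.

```python
import random


def sample_packets(packets, rate, trunc_size=None, rng=None):
    """Return the randomly sampled packets, on average one per `rate`.

    Hypothetical model: each packet is chosen independently with
    probability 1/rate; sampled packets are optionally truncated.
    """
    rng = rng or random.Random()
    sampled = []
    for pkt in packets:
        if rng.randrange(rate) == 0:  # true with probability 1/rate
            sampled.append(pkt if trunc_size is None else pkt[:trunc_size])
    return sampled


# 120000 dummy 100-byte packets, sampled at rate 12 and truncated to 64 bytes
packets = [bytes(100)] * 120000
sampled = sample_packets(packets, rate=12, trunc_size=64, rng=random.Random(0))
print(len(sampled))  # close to 120000 / 12, varying with the random seed
```

A rate of 1 samples every packet in this model, since `randrange(1)` always returns 0.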
Yotam Gigi	6ae0a62861	net: Introduce psample, a new genetlink channel for packet sampling Add a general way for kernel modules to sample packets, without being tied to any specific subsystem. This netlink channel can be used by tc, iptables, etc. and allow to standardize packet sampling in the kernel. For every sampled packet, the psample module adds the following metadata fields: PSAMPLE_ATTR_IIFINDEX - the packets input ifindex, if applicable PSAMPLE_ATTR_OIFINDEX - the packet output ifindex, if applicable PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been truncated during sampling PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the user who initiated the sampling. This field allows the user to differentiate between several samplers working simultaneously and filter packets relevant to him PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The sequence is kept for each group PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets PSAMPLE_ATTR_DATA - the actual packet bits The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast group. In addition, add the GET_GROUPS netlink command which allows the user to see the current sample groups, their refcount and sequence number. This command currently supports only netlink dump mode. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:44:28 -05:00
David S. Miller	d36db83bac	Merge branch 'mdio_module_driver-misc' Florian Fainelli says: ==================== net: couple mdio_module_driver changes Small patch series fixing a comment for mdio_module_driver and finally utilizing it in b53_mdio. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:37:51 -05:00
Florian Fainelli	8a180cc79d	net: dsa: b53: Utilize mdio_module_driver Eliminate a bit of boilerplate code. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:37:51 -05:00
Florian Fainelli	b70f43a161	net: phy: Fix typo for MDIO module boilerplate comment The module boilerplate macro is named mdio_module_driver and not module_mdio_driver, fix that. Fixes: `a9049e0c51` ("mdio: Add support for mdio drivers.") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:37:50 -05:00
David S. Miller	dd8e01fbff	Merge branch 'stmmac-dwmac-meson8b-configurable-RGMII-TX-delay' Martin Blumenstingl says: ==================== stmmac: dwmac-meson8b: configurable RGMII TX delay Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4 cycle (= 2ns) TX clock delay. This seems to work fine for many boards (for example Odroid-C2 or Amlogic's reference boards) but there are some others where TX traffic is simply broken. There are probably multiple reasons why it's working on some boards while it's broken on others: - some of Amlogic's reference boards are using a Micrel PHY - hardware circuit design - maybe more... iperf3 results on my Mecool BB2 board (Meson GXM, RTL8211F PHY) with TX clock delay disabled on the MAC (as it's enabled in the PHY driver). TX throughput was virtually zero before: $ iperf3 -c 192.168.1.100 -R Connecting to host 192.168.1.100, port 5201 Reverse mode, remote host 192.168.1.100 is sending [ 4] local 192.168.1.206 port 52828 connected to 192.168.1.100 port 5201 [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 108 MBytes 901 Mbits/sec [ 4] 1.00-2.00 sec 94.2 MBytes 791 Mbits/sec [ 4] 2.00-3.00 sec 96.5 MBytes 810 Mbits/sec [ 4] 3.00-4.00 sec 96.2 MBytes 808 Mbits/sec [ 4] 4.00-5.00 sec 96.6 MBytes 810 Mbits/sec [ 4] 5.00-6.00 sec 96.5 MBytes 810 Mbits/sec [ 4] 6.00-7.00 sec 96.6 MBytes 810 Mbits/sec [ 4] 7.00-8.00 sec 96.5 MBytes 809 Mbits/sec [ 4] 8.00-9.00 sec 105 MBytes 884 Mbits/sec [ 4] 9.00-10.00 sec 111 MBytes 934 Mbits/sec - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 1000 MBytes 839 Mbits/sec 0 sender [ 4] 0.00-10.00 sec 998 MBytes 837 Mbits/sec receiver iperf Done. 
$ iperf3 -c 192.168.1.100 Connecting to host 192.168.1.100, port 5201 [ 4] local 192.168.1.206 port 52832 connected to 192.168.1.100 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.01 sec 99.5 MBytes 829 Mbits/sec 117 139 KBytes [ 4] 1.01-2.00 sec 105 MBytes 884 Mbits/sec 129 70.7 KBytes [ 4] 2.00-3.01 sec 107 MBytes 889 Mbits/sec 106 187 KBytes [ 4] 3.01-4.01 sec 105 MBytes 878 Mbits/sec 92 143 KBytes [ 4] 4.01-5.00 sec 105 MBytes 882 Mbits/sec 140 129 KBytes [ 4] 5.00-6.01 sec 106 MBytes 883 Mbits/sec 115 195 KBytes [ 4] 6.01-7.00 sec 102 MBytes 863 Mbits/sec 133 70.7 KBytes [ 4] 7.00-8.01 sec 106 MBytes 884 Mbits/sec 143 97.6 KBytes [ 4] 8.01-9.01 sec 104 MBytes 875 Mbits/sec 124 107 KBytes [ 4] 9.01-10.01 sec 105 MBytes 876 Mbits/sec 90 139 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.01 sec 1.02 GBytes 874 Mbits/sec 1189 sender [ 4] 0.00-10.01 sec 1.02 GBytes 873 Mbits/sec receiver iperf Done. I get similar TX throughput on my Meson GXBB "MXQ Pro+" board when I disable the PHY's TX-delay and configure a 4ns TX-delay on the MAC. So changes to at least the RTL8211F PHY driver are needed to get it working properly in all situations. Changes since v4: - add a fallback of 2ns (the value which was previously hardcoded) for the TX delay so we are backwards-compatible with older .dts' - update the documentation with the new fallback value and add a small note that the "amlogic,tx-delay" property is ignored when the phy-mode is "rmii". 
Changes since v3: - rebased to apply against current net-next branch (fixes a conflict with `d2ed0a7755` "net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks") Changes since v2: - moved all .dts patches (3-7) to a separate series - removed the default 2ns TX delay when phy-mode RGMII is specified - (rebased against current net-next) Changes since v1: - renamed the devicetree property "amlogic,tx-delay" to "amlogic,tx-delay-ns", which makes the .dts easier to read as we can simply specify human-readable values instead of having "preprocessor defines and calculation in human brain". Thanks to Andrew Lunn for the suggestion! - improved documentation to indicate when the MAC TX-delay should be configured and how to use the PHY's TX-delay - changed the default TX-delay in the dwmac-meson8b driver from 2ns to 0ns when any of the rgmii-*id modes are used (the 2ns default value still applies for phy-mode "rgmii") - added patches to properly reset the PHY on Meson GXBB devices and to use a configuration similar to the one we use on Meson GXL devices (by passing a phy-handle to stmmac and defining the PHY in the mdio0 bus - patch 3-6) - add the "amlogic,tx-delay-ns" property to all boards which are using the RGMII PHY (patch 7) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:35:41 -05:00
Martin Blumenstingl	b765234e72	net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable Prior to this patch we were using a hardcoded RGMII TX clock delay of 2ns (= 1/4 cycle of the 125MHz RGMII TX clock). This value works for many boards, but unfortunately not for all (due to the way the actual circuit is designed, sometimes because the TX delay is enabled in the PHY, etc.). Making the TX delay on the MAC side configurable allows us to support all possible hardware combinations. This allows fixing a compatibility issue on some boards, where the RTL8211F PHY is configured to generate the TX delay. We can now turn off the TX delay in the MAC, because otherwise we would be applying the delay twice (which results in non-working TX traffic). Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Tested-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:35:40 -05:00
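The "2ns (= 1/4 cycle of the 125MHz RGMII TX clock)" figure in the commit above follows from simple arithmetic, worked through here in Python. The function name `cycle_fraction_ns` is hypothetical and exists only for this illustration. The last line shows why applying the delay in both the MAC and the PHY is a problem: the two quarter-cycle delays add up to half a clock period.

```python
def cycle_fraction_ns(clock_hz, fraction):
    """Duration in nanoseconds of a given fraction of one clock period."""
    period_ns = 1e9 / clock_hz  # 125 MHz gives a period of 8 ns
    return period_ns * fraction


RGMII_TX_CLOCK_HZ = 125_000_000

quarter_cycle_ns = cycle_fraction_ns(RGMII_TX_CLOCK_HZ, 0.25)
double_delay_ns = 2 * quarter_cycle_ns  # delay applied by both MAC and PHY

print(quarter_cycle_ns)  # 2.0
print(double_delay_ns)   # 4.0, i.e. half of the 8 ns period
```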
Martin Blumenstingl	d5490f1f67	net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac This allows configuring the RGMII TX clock delay. The RGMII clock is generated by underlying hardware of the Meson 8b / GXBB DWMAC glue. The configuration depends on the actual hardware (no delay may be needed due to the design of the actual circuit, the PHY might add this delay, etc.). Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Tested-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:35:40 -05:00
Andrew Lunn	23e3d618e4	net: dsa: Fix inverted test for multiple CPU interface Remove the wrong !, otherwise we get false positives about having multiple CPU interfaces. Fixes: `b22de49086` ("net: dsa: store CPU switch structure in the tree") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 13:33:50 -05:00
Felix Fietkau	6db6f0eae6	bridge: multicast to unicast Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Signed-off-by: Felix Fietkau <nbd@nbd.name> [linus.luessing@c0d3.blue: various bug + style fixes, commit message] Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 12:39:52 -05:00
Krister Johansen	4548b683b7	Introduce a sysctl that modifies the value of PROT_SOCK. Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl that denotes the first unprivileged inet port in the namespace. To disable all privileged ports set this to zero. It also checks for overlap with the local port range. The privileged and local range may not overlap. The use case for this change is to allow containerized processes to bind to privileged ports, but prevent them from ever being allowed to modify their container's network configuration. The latter is accomplished by ensuring that the network namespace is not a child of the user namespace. This modification was needed to allow the container manager to disable a namespace's privileged port restrictions without exposing control of the network namespace to processes in the user namespace. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-24 12:10:51 -05:00
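The semantics of the sysctl described above can be summarized in a short Python model. This is a sketch under stated assumptions, not the kernel's implementation: `needs_privilege` and `start_is_valid` are hypothetical names, the default of 1024 is the traditional PROT_SOCK value, and the overlap rule is modelled as "the first unprivileged port may not lie above the start of the local port range".

```python
def needs_privilege(port, unprivileged_port_start=1024):
    """True if binding to `port` requires privilege in this model.

    Ports below the first unprivileged port are privileged; a start
    value of 0 therefore disables all privileged ports.
    """
    return port < unprivileged_port_start


def start_is_valid(unprivileged_port_start, local_port_range):
    """Check that the privileged range [0, start) stays below the local range.

    Assumption: overlap is rejected whenever `start` exceeds the low end
    of the local (ephemeral) port range.
    """
    low, _high = local_port_range
    return 0 <= unprivileged_port_start <= low


LOCAL_PORT_RANGE = (32768, 60999)  # example local port range

print(needs_privilege(80))                      # True with the default start
print(needs_privilege(80, 0))                   # False: no privileged ports
print(start_is_valid(1024, LOCAL_PORT_RANGE))   # True: ranges do not overlap
print(start_is_valid(40000, LOCAL_PORT_RANGE))  # False: ranges would overlap
```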
Daniel Borkmann	d140199af5	bpf, lpm: fix kfree of im_node in trie_update_elem We need to initialize im_node to NULL, otherwise in case of error path it gets passed to kfree() as uninitialized pointer. Fixes: `b95a5c4db0` ("bpf: add a longest prefix match trie map implementation") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 21:17:35 -05:00
David S. Miller	2acc76cbb7	Merge branch 'bpf-lpm' Daniel Mack says: ==================== bpf: add longest prefix match map This patch set adds a longest prefix match algorithm that can be used to match IP addresses to a stored set of ranges. It is exposed as a bpf map type. Internally, data is stored in an unbalanced tree of nodes that has a maximum height of n, where n is the prefixlen the trie was created with. Note that this has nothing to do with fib or fib6 and is in no way meant to replace or share code with it. It's rather a much simpler implementation that is specifically written with bpf maps in mind. Patch 1/3 adds the implementation, 2/3 an extensive test suite and 3/3 has benchmarking code for the new trie type. Feedback is much appreciated. Changelog: v3 -> v4: * David added a 3rd patch that augments map_perf_test for LPM trie benchmarks * Limit allocation of maps of this new type to CAP_SYS_ADMIN for now, as requested by Alexei * Add a stub .map_delete_elem so the core does not stumble over a NULL pointer when the syscall is invoked * Tests for non-power-of-2 prefix lengths were added * More comment style fixes v2 -> v3: * Store both the key match data and the caller provided value in the same byte array attached to a node. This avoids double allocations * Bring back node->flags to distinguish between 'real' and intermediate nodes * Fix comment style and some typos v1 -> v2: * Turn spin lock into raw spinlock * Lock with irqsave options during trie_update_elem() * Return -ENOMEM properly from trie_alloc() * Force attr->flags == BPF_F_NO_PREALLOC during creation * Set trie->map.pages after creation to account for map memory * Allow arbitrary value sizes * Removed node->flags and denote intermediate nodes through node->value == NULL instead rfc -> v1: * Add __rcu pointer annotations to make sparse happy * Fold _lpm_trie_find_target_node() into its only caller * Fix some minor documentation issues ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 16:10:38 -05:00
David Herrmann	b8a943e294	samples/bpf: add lpm-trie benchmark Extend the map_perf_test_{user,kern}.c infrastructure to stress test lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure the latency depending on trie size and lookup count. On my Intel Haswell i7-6400U, a single gettid() syscall with an empty bpf program takes roughly 6.5us on my system. Lookups in empty tries take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192 entries take ~7.1us (on the first _and_ any subsequent try). Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 16:10:38 -05:00
David Herrmann	4d3381f5a3	bpf: Add tests for the lpm trie map The first part of this program runs randomized tests against the lpm-bpf-map. It implements a "Trivial Longest Prefix Match" (tlpm) based on simple, linear, single linked lists. The implementation should be pretty straightforward. Based on tlpm, this inserts randomized data into bpf-lpm-maps and verifies the trie-based bpf-map implementation behaves the same way as tlpm. The second part uses 'real world' IPv4 and IPv6 addresses and tests the trie with those. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 16:10:38 -05:00
Daniel Mack	b95a5c4db0	bpf: add a longest prefix match trie map implementation This trie implements a longest prefix match algorithm that can be used to match IP addresses to a stored set of ranges. Internally, data is stored in an unbalanced trie of nodes that has a maximum height of n, where n is the prefixlen the trie was created with. Tries may be created with prefix lengths that are multiples of 8, in the range from 8 to 2048. The key used for lookup and update operations is a struct bpf_lpm_trie_key, and the value is a uint64_t. The code carries more information about the internal implementation. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 16:10:38 -05:00
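What "longest prefix match" means for the lookups described above can be shown with a linear reference model, similar in spirit to the "Trivial Longest Prefix Match" that the test commit describes. This Python sketch is not the trie from the patch: `prefix_matches` and `lpm_lookup` are hypothetical names, and entries are kept in a plain list of (prefix bytes, prefix length in bits, value) tuples.

```python
def prefix_matches(key, prefix, prefixlen):
    """True if the first `prefixlen` bits of `key` equal those of `prefix`."""
    full_bytes, rem_bits = divmod(prefixlen, 8)
    if key[:full_bytes] != prefix[:full_bytes]:
        return False
    if rem_bits == 0:
        return True
    mask = (0xFF << (8 - rem_bits)) & 0xFF  # top `rem_bits` bits of one byte
    return (key[full_bytes] & mask) == (prefix[full_bytes] & mask)


def lpm_lookup(entries, key):
    """Return the value of the matching entry with the longest prefix, or None."""
    best_len, best_value = -1, None
    for prefix, prefixlen, value in entries:
        if prefixlen > best_len and prefix_matches(key, prefix, prefixlen):
            best_len, best_value = prefixlen, value
    return best_value


entries = [
    (bytes([192, 168, 0, 0]), 16, "net-16"),
    (bytes([192, 168, 1, 0]), 24, "net-24"),
    (bytes([10, 128, 0, 0]), 9, "net-9"),
]

print(lpm_lookup(entries, bytes([192, 168, 1, 5])))  # net-24 (longest match)
print(lpm_lookup(entries, bytes([192, 168, 2, 5])))  # net-16
print(lpm_lookup(entries, bytes([10, 200, 0, 1])))   # net-9
print(lpm_lookup(entries, bytes([8, 8, 8, 8])))      # None
```

A linear scan like this costs time proportional to the number of entries, which is why it suits a reference model for tests rather than a map implementation.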
Bhumika Goyal	10eeb5e645	net: xilinx: constify net_device_ops structure Declare net_device_ops structure as const as it is only stored in the netdev_ops field of a net_device structure. This field is of type const, so net_device_ops structures having same properties can be made const too. Done using Coccinelle: @r1 disable optional_qualifier@ identifier i; position p; @@ static struct net_device_ops i@p={...}; @ok1@ identifier r1.i; position p; struct net_device ndev; @@ ndev.netdev_ops=&i@p @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct net_device_ops i; File size before: text data bss dec hex filename 6201 744 0 6945 1b21 ethernet/xilinx/xilinx_emaclite.o File size after: text data bss dec hex filename 6745 192 0 6937 1b19 ethernet/xilinx/xilinx_emaclite.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 15:58:49 -05:00
Bhumika Goyal	30bd2f52e5	net: moxa: constify net_device_ops structures Declare net_device_ops structure as const as it is only stored in the netdev_ops field of a net_device structure. This field is of type const, so net_device_ops structures having same properties can be made const too. Done using Coccinelle: @r1 disable optional_qualifier@ identifier i; position p; @@ static struct net_device_ops i@p={...}; @ok1@ identifier r1.i; position p; struct net_device ndev; @@ ndev.netdev_ops=&i@p @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct net_device_ops i; File size before: text data bss dec hex filename 4821 744 0 5565 15bd ethernet/moxa/moxart_ether.o File size after: text data bss dec hex filename 5373 192 0 5565 15bd ethernet/moxa/moxart_ether.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 15:58:49 -05:00
Timur Tabi	4404323c6a	net: qcom/emac: claim the irq only when the device is opened During reset, functions emac_mac_down() and emac_mac_up() are called, so we don't want to free and claim the IRQ unnecessarily. Move those operations to open/close. Signed-off-by: Timur Tabi <timur@codeaurora.org> Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 13:03:28 -05:00
Timur Tabi	41c1093f2e	net: qcom/emac: rename emac_phy to emac_sgmii and move it The EMAC has an internal PHY that is often called the "SGMII". This SGMII is also connected to an external PHY, which is managed by phylib. These dual PHYs often cause confusion. In this case, the data structure for managing the SGMII was mis-named and located in the wrong header file. Structure emac_phy is renamed to emac_sgmii to clearly indicate it applies to the internal PHY only. It is also moved from emac_phy.h (which supports the external PHY) to emac_sgmii.h (where it belongs). To keep the changes minimal, only the structure name is changed, not the names of any variables of that type. Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 12:54:35 -05:00
Eric Dumazet	b9032741e4	bnx2x: avoid two atomic ops per page on x86 Commit `4cace675d6` ("bnx2x: Alloc 4k fragment for each rx ring buffer element") added extra put_page() and get_page() calls on arches where PAGE_SIZE=4K, like x86. Reorder things to avoid this overhead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 11:16:27 -05:00
David S. Miller	41e8c70ee1	Merge branch 'bcm7278' Florian Fainelli says: ==================== net: dsa: bcm_sf2: Add support for BCM7278 This patch series adds support for the Broadcom BCM7278 integrated switch which is a successor of the BCM7445 switch. We have a little bit of register shuffling going on, which is why most of the functional changes are to deal with that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:59:00 -05:00
Florian Fainelli	039a7b8592	net: phy: bcm7xxx: Implement EGPHY workaround for 7278 Implement the HW design team recommended workaround for 7278. Since the GPHY now returns its revision information in MII_PHYS_ID[23] we need to check whether the revision provided in flags is 0 or not. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	582d0ac397	net: phy: bcm7xxx: Add entry for BCM7278 Add support for the BCM7278 28nm process Gigabit Ethernet PHY. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	64ff2aef91	net: dsa: bcm_sf2: Allow non-IMP ports to have Broadcom tags enabled Parse the "brcm,use-bcm-hdr" boolean property during ports identification to fill a bitmask of ports that should have Broadcom tags enabled. This is needed in some configurations where per-packet metadata can be exchanged using Broadcom tags between the switch and an on-chip acceleration device. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	ebb2ac4f32	net: dsa: bcm_sf2: Move code enabling Broadcom tags In preparation for enabling Broadcom tags on different ports based on configuration information, dedicate a function that is responsible for enabling Broadcom tags for a given port and update the IMP port setup to call it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	0fe9933804	net: dsa: bcm_sf2: Add support for BCM7278 integrated switch Add support for the integrated switch found on BCM7278: - core_reg_align is set to 1, to force a translation into the target address space which is 8 bytes aligned - an alternate SWITCH_REG layout is provided since registers are largely bit/masks compatible but have different offsets - conditional for all CORE_STS_OVERRIDE_{IMP,GMII_P} since those got moved way out of the traditional register space Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	a78e86ed58	net: dsa: bcm_sf2: Prepare for different register layouts In preparation for supporting a new device with a slightly different register layout, affecting the SWITCH_REG and SWITCH_CORE address spaces, perform a few preparatory steps: - allow matching the compatible string against a data description - convert the SWITCH_REG register accesses into an indirection table - prepare for supporting a SWITCH_CORE register alignment requirement Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	329b5c58f8	net: dsa: bcm_sf2: Make SF2_IO64_MACRO() utilize 32-bit macro There is no point inlining the 32-bit direct register read/write part, just infer it from the existing macro. This will make it easier to centralize the address rewriting that we are going to introduce later on. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
David S. Miller	b20b564b95	Merge branch 'systemport-lite' Florian Fainelli says: ==================== net: systemport: Add support for SYSTEMPORT lite This patch series adds support for SYSTEMPORT Lite which is an evolution of the existing SYSTEMPORT adapter. The two generations are largely identical as far as the transmit/receive path are concerned, and there were just a few control path changes here and there. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:56:07 -05:00
Florian Fainelli	44a4524c54	net: systemport: Add support for SYSTEMPORT Lite Add support for the SYSTEMPORT Lite Ethernet controller, this piece of hardware is largely based on the full-blown SYSTEMPORT and differs in the following: - no full-blown UniMAC, instead we have the MagicPacket matching from UniMAC at same offset, and a GMII Interface Block (GIB) for the MAC-level stuff, since we are always interfaced to an Ethernet switch which is fully Ethernet compliant shortcuts could be made - 16 transmit queues, whose interrupts are moved into the first Level-2 interrupt controller bank - slight TDMA offset change (a register was inserted after TDMA_STATUS, sigh) - 256 RX descriptors (512 words) and 256 TX descriptors (not visible) As a consequence of these two things, update the code paths accordingly to differentiate the full-blown from the light version. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:56:06 -05:00
Florian Fainelli	7b78be48a8	net: systemport: Dynamically allocate number of TX rings In preparation for adding SYSTEMPORT Lite, which has half as many transmit queues as SYSTEMPORT, make sure we allocate TX rings based on the systemport,txq property to get an appropriate memory footprint. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:56:06 -05:00
Eric Dumazet	9ca677b1bd	ipv6: add NUMA awareness to seg6_hmac_init_algo() Since we allocate per cpu storage, let's also use NUMA hints. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:50:36 -05:00
jpinto	f4ec60644a	net: stmicro: fix LS field mask in EEE configuration This patch fixes the LS mask when setting EEE timer. LS field is 10 bits long and not 11 as currently. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Reported-By: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:47:36 -05:00
Geliang Tang	3704eb6f6f	net/mlx4: use rb_entry() To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:46:13 -05:00
Geliang Tang	530cef21d9	6lowpan: use rb_entry() To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:46:13 -05:00
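
Commit b95a5c4db0 in the log above describes a longest prefix match over stored IP ranges. The sketch below only illustrates the matching rule itself (the entry with the most leading bits in common with the address wins); it is a linear scan over a small table, not a trie and not the kernel's implementation. The names lpm_entry, common_prefix_bits and lpm_lookup are hypothetical and chosen for this example, and the prefix is limited to IPv4 (32 bits) as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry type for illustration; not the kernel's layout. */
struct lpm_entry {
	uint32_t prefixlen;	/* number of significant leading bits, 0..32 */
	uint8_t  data[4];	/* IPv4 prefix, most significant byte first */
	uint64_t value;		/* value stored for this prefix */
};

/* Count how many leading bits of a and b agree, up to max_bits. */
static uint32_t common_prefix_bits(const uint8_t *a, const uint8_t *b,
				   uint32_t max_bits)
{
	uint32_t bits = 0;

	while (bits < max_bits) {
		uint8_t diff = a[bits / 8] ^ b[bits / 8];
		uint8_t mask = (uint8_t)(0x80u >> (bits % 8));

		if (diff & mask)
			break;
		bits++;
	}
	return bits;
}

/*
 * Return the entry with the longest prefix that covers addr, or NULL
 * if no entry matches. Linear scan: O(n) per lookup, unlike a trie.
 */
static const struct lpm_entry *lpm_lookup(const struct lpm_entry *table,
					  size_t n, const uint8_t addr[4])
{
	const struct lpm_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct lpm_entry *e = &table[i];

		/* Skip entries whose whole prefix does not match addr. */
		if (common_prefix_bits(e->data, addr, e->prefixlen) < e->prefixlen)
			continue;
		if (!best || e->prefixlen > best->prefixlen)
			best = e;
	}
	return best;
}
```

With a table holding 10.0.0.0/8, 10.1.0.0/16 and 10.1.2.0/24, a lookup for 10.1.2.3 selects the /24 entry, 10.1.9.9 selects the /16 entry, and 192.168.0.1 matches nothing. A trie reaches the same answer by walking the address bit by bit instead of scanning every entry.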

649359 Commits