kernel_optimize_test

Author	SHA1	Message	Date
Martin KaFai Lau	bcc6b1b7eb	bpf: Add hash of maps support This patch adds hash of maps support (hashmap->bpf_map). BPF_MAP_TYPE_HASH_OF_MAPS is added. A map-in-map contains a pointer to another map and lets call this pointer 'inner_map_ptr'. Notes on deleting inner_map_ptr from a hash map: 1. For BPF_F_NO_PREALLOC map-in-map, when deleting an inner_map_ptr, the htab_elem itself will go through a rcu grace period and the inner_map_ptr resides in the htab_elem. 2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC), when deleting an inner_map_ptr, the htab_elem may get reused immediately. This situation is similar to the existing prealloc-ated use cases. However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls inner_map->ops->map_free(inner_map) which will go through a rcu grace period (i.e. all bpf_map's map_free currently goes through a rcu grace period). Hence, the inner_map_ptr is still safe for the rcu reader side. This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the check_map_prealloc() in the verifier. preallocation is a must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using map-in-map first. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:45:45 -07:00
Martin KaFai Lau	56f668dfe0	bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:45:45 -07:00
Martin KaFai Lau	fad73a1a35	bpf: Fix and simplifications on inline map lookup Fix in verifier: For the same bpf_map_lookup_elem() instruction (i.e. "call 1"), a broken case is "a different type of map could be used for the same lookup instruction". For example, an array in one case and a hashmap in another. We have to resort to the old dynamic call behavior in this case. The fix is to check for collision on insn_aux->map_ptr. If there is collision, don't inline the map lookup. Please see the "do_reg_lookup()" in test_map_in_map_kern.c in the later patch for how-to trigger the above case. Simplifications on array_map_gen_lookup(): 1. Calculate elem_size from map->value_size. It removes the need for 'struct bpf_array' which makes the later map-in-map implementation easier. 2. Remove the 'elem_size == 1' test Fixes: `81ed18ab30` ("bpf: add helper inlining infra and optimize map_array lookup") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:45:45 -07:00
Joao Pinto	b4f0a66155	net: stmmac: fix dma operation mode config for older versions The dma operation mode configuration routine was wrongly moved to a function (stmmac_mtl_configuration) that is only executed if the core version is >= 4.00. Fixes: `6deee2221e` ("net: stmmac: prepare dma op mode config for multiple queues") Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Joao Pinto <jpinto@synopsys.com> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:35:03 -07:00
Joel Scherpelz	bbea124bc9	net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs. This commit adds a new sysctl accept_ra_rt_info_min_plen that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 14:20:54 -07:00
David S. Miller	0e4c9f13da	Merge branch 'nfp-concurrency' Jakub Kicinski says: ==================== nfp: allow concurrency in core and minor fixes The first 10 patches of this series prepare nfpcore for concurrent accesses. This will be needed by upcoming hwmon and devlink patches. Most locking is already in place, the patches in this series iron out a few bugs. Last 5 patches are fixes and cleanups to the netdev code, including removal of doorbell pointers used only on old versions of the chip, removal of unnecessarily defensive code and flushing xmit_more more carefully on error paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:10 -07:00
Jakub Kicinski	ac0488ef59	nfp: disable FW on reconfiguration errors Since we no longer need to keep the FW enabled for .ndo_close() to work we can always stop FW after reconfiguration failure. This seems to make most FWs more resilient to faults (at least in error injection scenarios). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:09 -07:00
Jakub Kicinski	219ad6c1c3	nfp: remove defensive checks around ndo_open()/ndo_close() Device open and close handlers check if the device is already in the desired state. Thanks to our reconfig infrastructure this should not be necessary, there doesn't seem to be any code in the driver which depends on it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:09 -07:00
Jakub Kicinski	28b0cfee7b	nfp: flush xmit_more on error paths In case of ring full or DMA mapping error remember to flush xmit_more delayed kicks. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:09 -07:00
Jakub Kicinski	83d08a1d74	nfp: remove RX queue pointers NFP6000 doesn't use queue pointers/doorbells for RX, it uses 'done' bit in descriptors. Remove the pointers from data structures. Since we are saving space in rx_ring structure make fields we previously compressed to 16bits word size again. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:09 -07:00
Jakub Kicinski	87232d9615	nfp: don't use netdev_warn() before netdev is registered Fix warning which was using netdev_warn() instead of dev_warn() to early. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:08 -07:00
Jakub Kicinski	7d2da60382	nfp: fix nfp_cpp_read()/nfp_cpp_write() error paths When acquiring an area fails we can't call function doing both release and free. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:08 -07:00
Jakub Kicinski	1bb665e3dd	nfp: fix invalid area detection Core should detect when someone is trying to request an access window which is too large for a given type of access. Otherwise the requester will be put on a wait queue for ever without any error message. Add const qualifiers to clarify that we are only looking at read- -only members in relevant functions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:08 -07:00
Jakub Kicinski	76e8f93e89	nfp: don't ignore return value of wait_event_interruptible When signal interrupts waiting for an area to become available we assume success. Pay attention to the return code. Unpack the code a little bit to make it more readable. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:08 -07:00
Jakub Kicinski	69a4aa8931	nfp: correct return codes when msleep gets interrupted msleep_interruptible() returns time left to wait, not error code. Return ERESTARTSYS when interrupted. While at it correct a comment and make the polling a bit more aggressive. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:08 -07:00
Jakub Kicinski	a831ffb560	nfp: lock area cache earlier We shouldn't access area_cache_list without its lock even to check if it's empty. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:07 -07:00
Jakub Kicinski	61e81abdce	nfp: document expected locking in the core Document which fields of nfp_cpp are protected by which locks. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:07 -07:00
Jakub Kicinski	8672103f41	nfp: move mutex code out of nfp_cppcore.c After mutex cache removal we can put the mutex code in a separate source file. This makes it clear it doesn't play with internals of struct nfp_cpp any more. No functional changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:07 -07:00
Jakub Kicinski	832ff9482e	nfp: remove cpp mutex cache CPP mutex cache was introduced to work around the fact that the same host could successfully acquire a lock multiple times. It used to collapse multiple users to the same struct nfp_cpp_mutex and track use count. Unfortunately it's racy. Since we now force all nfp_mutex_lock() callers within the host to actually succeed at acquiring the lock we no longer need the cache, let's remove it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:07 -07:00
Jakub Kicinski	f1ba63ec90	nfp: fail graciously when someone tries to grab global lock The global device lock is acquired to search the resource table. The lock is actually itself part of the table (entry 0). Therefore if someone asks for resource 0 we would deadlock since double locking is no longer allowed. Currently the driver doesn't try to lock that resource so let's simply make sure we fail graciously and not add special handling of this case until really need. Hide the relevant defines in the source file. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:06 -07:00
Jakub Kicinski	3d4fc6eb4b	nfp: disallow sharing mutexes on the same machine NFP can be connected to multiple machines via PCI or other buses. Access to hardware resources is arbitrated using locks residing in device memory. Currently nfpcore only respects the mutexes when it comes to inter-host locking, but if we try to acquire the same lock again, on one host - it will simply return success because owner of the lock is already set to that host. This makes the locks useless for arbitration within one host and unfair because whichever host grabbed the lock will have a chance to reacquire it without others getting a shot. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:59:06 -07:00
Dan Carpenter	31c7ba9eff	net: dwc-xlgmac: fix an error code in xlgmac_alloc_pages() The dma_mapping_error() returns true if there is an error but we want to return -ENOMEM and not 1. Fixes: `65e0ace2c5` ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jie Deng <jiedeng@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:46:10 -07:00
David Ahern	a7678c70ef	rtnetlink: Add dump all for netconf Use the rtnl_dump_all to dump all netconf handlers that have been registered. Allows userspace to send a dump request for PF_UNSPEC and get all families. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:45:17 -07:00
David S. Miller	276a74d835	Merge branch 'phy-mmd-cleanup' Russell King says: ==================== Clean up PHY MMD accessors This series cleans up phylib's MMD accessors, so that we have a common way of accessing the Clause 45 register set. The current situation is far from ideal - we have phy_(read\|write)_mmd() which accesses Clause 45 registers over Clause 45 accesses, and we have phy_(read\|write)_mmd_indirect(), which accesses Clause 45 registers via Clause 22 register 13/14. Generic code uses the indirect methods to access standard Clause 45 features, and when we come to add Clause 45 PHY support to phylib, we would need to make these conditional upon the PHY type, or duplicate these functions. An alternative solution is to merge these accessors together, and select the appropriate access method depending upon the 802.3 clause that the PHY conforms with. The result is that we have a single set of phy_(read\|write)_mmd() accessors. For cases which require special handling, we still allow PHY drivers to override all MMD accesses - except rather than just overriding the indirect accesses. This keeps existing overrides working. Combining the two also has another beneficial side effect - we get rid of similar functions that take arguments in different orders. The old direct accessors took the phy structure, devad and register number, whereas the indirect accessors took the phy structure, register number and devad in that order. Care must be taken when updating future drivers that the argument order is correct, and the function name is not merely replaced. This patch set is against net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:43:01 -07:00
Russell King	060fbc894b	net: phy: clean up mmd_phy_indirect() Make mmd_phy_indirect() use the same terminology as the rest of the code, making clear what each address is - phy address, devad, and register number. While here, remove the "inline" from this static function, leaving it to the compiler to decide whether to inline this function, and get rid of unnecessary parens. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:43:00 -07:00
Russell King	3b85d8df26	net: phy: remove the indirect MMD read/write methods Remove the indirect MMD read/write methods which are now no longer necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:43:00 -07:00
Russell King	d11437e0af	net: phy: convert micrel to new read_mmd/write_mmd driver methods Convert micrel to the new read_mmd/write_mmd driver methods. This Clause 22 PHY does not support any MMD access method. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:43:00 -07:00
Russell King	a6d99fcd3f	net: phy: switch remaining users to phy_(read\|write)_mmd() Switch everyone over to using phy_read_mmd() and phy_write_mmd() now that they are able to handle both Clause 22 indirect addressing and Clause 45 direct addressing methods to the MMD registers. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:43:00 -07:00
Russell King	5f61367729	net: lan78xx: update for phy_(read\|write)_mmd_indirect() removal lan78xx appears to use phylib in a rather weird way, accessing the PHY partly through phylib, and partly by making direct accesses to it, including to the Clause 45 registers. As the indirect MMD accessors are going away, update this driver to use the plain phy_(read\|write)_mmd() accessors instead. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:43:00 -07:00
Russell King	1ee6b9bc62	net: phy: make phy_(read\|write)_mmd() generic MMD accessors Make phy_(read\|write)_mmd() generic 802.3 clause 45 register accessors for both Clause 22 and Clause 45 PHYs, using either the direct register reading for Clause 45, or the indirect method for Clause 22 PHYs. Allow this behaviour to be overriden by PHY drivers where necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:43:00 -07:00
Russell King	9860118b58	net: phy: move phy MMD accessors to phy-core.c Move the phy_(read\|write)__mmd() helpers out of line, they will become our main MMD accessor functions, and so will be a little more complex. This complexity doesn't belong in an inline function. Also move the _indirect variants as well to keep like functionality together. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:42:59 -07:00
Thierry Reding	2d72d5016f	net: stmmac: Use AVB mode by default Prior to the recent multi-queue changes the driver would configure the queues to use the AVB mode, but the mode then got switched to DCB. The hardware still works fine in DCB mode, but my testing capabilities are limited, so it's safer to revert to the prior setting anyway. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:15:15 -07:00
Thierry Reding	33e85b8dd6	net: stmmac: Restore DT backwards-compatibility Recent changes to support multiple queues in the device tree bindings resulted in the number of RX and TX queues to be initialized to zero for device trees not adhering to the new bindings. Restore backwards-compatibility with those device trees by falling back to a single RX and TX queues each. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:15:15 -07:00
Thierry Reding	f39768744f	net: stmmac: Always enable MAC RX queues The MAC RX queues always need to be enabled in order to receive network packets. Remove the condition that this only needs to be done for multi- queue configurations. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:15:14 -07:00
Reshetova, Elena	4c355cdfbb	net: convert sk_filter.refcnt from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:06:08 -07:00
Tobias Klauser	726bceca81	net: greth: Utilize of_get_mac_address() Do not open code getting the MAC address exclusively from the "local-mac-address" property, but instead use of_get_mac_address() which looks up the MAC address using the 3 typical property names. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 12:00:39 -07:00
Felix Manlunas	58ad319834	liquidio: fix Coverity scan errors Fix Coverity scan errors by not dereferencing lio->glists_dma_base pointer if it's NULL. See http://marc.info/?l=linux-netdev&m=149002294305614&w=2 Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: VSR Burru <veerasenareddy.burru@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:48:34 -07:00
Gao Feng	cfc62d878f	net: tcp: Permit user set TCP_MAXSEG to default value When user_mss is zero, it means use the default value. But the current codes don't permit user set TCP_MAXSEG to the default value. It would return the -EINVAL when val is zero. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:45:13 -07:00
David S. Miller	b2a1674aa1	Merge branch 'ovs-sample-action-optimization' Andy Zhou says: ==================== net-next sample action optimization v4 The sample action can be used for translating Openflow 'clone' action. However its implementation has not been sufficiently optimized for this use case. This series attempts to close the gap. Patch 3 commit message has more details on the specific optimizations implemented. --- v3->v4: Enhance patch 4. Fix two bugs pointed out by Pravin, Remove 'is_sample' variable. v2->v3: Enhance patch 4, Rafctor to move more common logic to clone_execute(). v1->v2: Address Pravin's comment, Refactor recirc and sample to share more common code ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:28:35 -07:00
andy zhou	bef7f7567a	Openvswitch: Refactor sample and recirc actions implementation Added clone_execute() that both the sample and the recirc action implementation can use. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:28:35 -07:00
andy zhou	798c166173	openvswitch: Optimize sample action for the clone use cases With the introduction of open flow 'clone' action, the OVS user space can now translate the 'clone' action into kernel datapath 'sample' action, with 100% probability, to ensure that the clone semantics, which is that the packet seen by the clone action is the same as the packet seen by the action after clone, is faithfully carried out in the datapath. While the sample action in the datpath has the matching semantics, its implementation is only optimized for its original use. Specifically, there are two limitation: First, there is a 3 level of nesting restriction, enforced at the flow downloading time. This limit turns out to be too restrictive for the 'clone' use case. Second, the implementation avoid recursive call only if the sample action list has a single userspace action. The main optimization implemented in this series removes the static nesting limit check, instead, implement the run time recursion limit check, and recursion avoidance similar to that of the 'recirc' action. This optimization solve both #1 and #2 issues above. One related optimization attempts to avoid copying flow key as long as the actions enclosed does not change the flow key. The detection is performed only once at the flow downloading time. Another related optimization is to rewrite the action list at flow downloading time in order to save the fast path from parsing the sample action list in its original form repeatedly. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:28:35 -07:00
andy zhou	4572ef52a0	openvswitch: Refactor recirc key allocation. The logic of allocating and copy key for each 'exec_actions_level' was specific to execute_recirc(). However, future patches will reuse as well. Refactor the logic into its own function clone_key(). Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:28:35 -07:00
andy zhou	47c697aa2d	openvswitch: Deferred fifo API change. add_deferred_actions() API currently requires actions to be passed in as a fully encoded netlink message. So far both 'sample' and 'recirc' actions happens to carry actions as fully encoded netlink messages. However, this requirement is more restrictive than necessary, future patch will need to pass in action lists that are not fully encoded by themselves. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:28:34 -07:00
David S. Miller	29dd5ec094	Merge branch 'vrf-perf' David Ahern says: ==================== net: vrf: performance improvements Device based features for VRF such as qdisc, netfilter and packet captures are implemented by switching the dst on skbuffs to its per-VRF dst. This has the effect of controlling the output function which points a function in the VRF driver. [1] The skb proceeds down the stack with dst->dev pointing to the VRF device. Netfilter, qdisc and tc rules and network taps are evaluated based on this device. Finally, the skb makes it to the vrf_xmit function which resets the dst based on a FIB lookup. The feature comes at cost - between 5 and 10% depending on test (TCP vs UDP, stream vs RR and IPv4 vs IPv6). The main cost is requiring a FIB lookup in the VRF driver for each packet sent through it. The FIB lookup is required because the real dst gets dropped so that the skb can traverse the stack with dst->dev set to the VRF device. All of that is really driven by the qdisc and not replicating the processing of __dev_queue_xmit if a qdisc is set up on the device. But, VRF devices by default do not have a qdisc and really have no need for multiple Tx queues. This means the performance overhead is inflicted upon all users for the potential use case of a qdisc being configured. The overhead can be avoided by checking if the default configuration applies to a specific VRF device before switching the dst. If a device does not have a qdisc, the pass through netfilter hooks and packet taps can be done inline without dropping the dst and thus avoiding the performance penalty. With this change performance overhead of VRF drops to neglible (difference with run-over-run variance) to 3% depending on test type. netperf performance comparison for 3 cases: 1. L3_MASTER_DEVICE compiled out 2. VRF with this patch set 3. current VRF code IPv4 ---- no-l3mdev new-vrf old-vrf TCP_RR 28778 28938* 27169 TCP_CRR 10706 10490 9770 UDP_RR 30750 29813 29256 * Although higher in the final run used for submitting this patch set, I think what this really represents is a neglible performance overhead for VRF with this change (i.e, within the +-1% variance of runs). Most notably the FIB lookups in the Tx path are avoided for TCP_RR. IPv6 ---- no-l3mdev new-vrf old-vrf TCP_RR 29495 29432 27794 TCP_CRR 10520 10338 9870 UDP_RR 26137 27019* 26511 * UDP is consistently better with VRF for two reasons: 1. Source address selection with L3 domains is considering fewer addresses since only addresses on interfaces in the domain are considered for the selection. Specifically, perf-top shows shows ipv6_get_saddr_eval, ipv6_dev_get_saddr and __ipv6_dev_get_saddr running much lower with vrf than without. 2. The VRF table contains all routes (i.e, there are no separate local and main tables per VRF). That means ip6_pol_route_output only has 1 lookup for VRF where it does 2 without it (1 in the local table and 1 in the main table). [1] http://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:19:48 -07:00
David Ahern	a9ec54d1b0	net: vrf: performance improvements for IPv6 The VRF driver allows users to implement device based features for an entire domain. For example, a qdisc or netfilter rules can be attached to a VRF device or tcpdump can be used to view packets for all devices in the L3 domain. The device-based features come with a performance penalty, most notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook to switch the dst on an skb to its private dst. This allows the skb to traverse the xmit stack with the device set to the VRF device which in turn enables the netfilter and qdisc features. The VRF driver then performs the FIB lookup again and reinserts the packet. This patch avoids the redirect for IPv6 packets if a qdisc has not been attached to a VRF device which is the default config. In this case the netfilter hooks and network taps are directly traversed in the l3mdev_l3_out handler. If a qdisc is attached to a VRF device, then the redirect using the vrf dst is done. Additional overhead is removed by only checking packet taps if a socket is open on the device (vrf_dev->ptype_all list is not empty). Packet sockets bound to any device will still get a copy of the packet via the real ingress or egress interface. The end result of this change is a decrease in the overhead of VRF for the default, baseline case (ie., no netfilter rules, no packet sockets, no qdisc) from a +3% improvement for UDP which has a lookup per packet (VRF being better than no l3mdev) to ~2% loss for TCP_CRR which connects a socket for each request-response. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:19:48 -07:00
David Ahern	dcdd43c41e	net: vrf: performance improvements for IPv4 The VRF driver allows users to implement device based features for an entire domain. For example, a qdisc or netfilter rules can be attached to a VRF device or tcpdump can be used to view packets for all devices in the L3 domain. The device-based features come with a performance penalty, most notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook to switch the dst on an skb to its private dst. This allows the skb to traverse the xmit stack with the device set to the VRF device which in turn enables the netfilter and qdisc features. The VRF driver then performs the FIB lookup again and reinserts the packet. This patch avoids the redirect for IPv4 packets if a qdisc has not been attached to a VRF device which is the default config. In this case the netfilter hooks and network taps are directly traversed in the l3mdev_l3_out handler. If a qdisc is attached to a VRF device, then the redirect using the vrf dst is done. Additional overhead is removed by only checking packet taps if a socket is open on the device (vrf_dev->ptype_all list is not empty). Packet sockets bound to any device will still get a copy of the packet via the real ingress or egress interface. The end result of this change is a decrease in the overhead of VRF for the default, baseline case (ie., no netfilter rules, no packet sockets, no qdisc) to ~3% for UDP which has a lookup per packet and < 1% overhead for connected sockets that leverage early demux and avoid FIB lookups. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:19:48 -07:00
Josh Hunt	a2d133b1d4	sock: introduce SO_MEMINFO getsockopt Allows reading of SK_MEMINFO_VARS via socket option. This way an application can get all meminfo related information in single socket option call instead of multiple calls. Adds helper function, sk_get_meminfo(), and uses that for both getsockopt and sock_diag_put_meminfo(). Suggested by Eric Dumazet. Signed-off-by: Josh Hunt <johunt@akamai.com> Reviewed-by: Jason Baron <jbaron@akamai.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 11:18:58 -07:00
Colin Ian King	c7cd4c9bf8	mlxsw: spectrum: fix swapped order of arguments packets and bytes The arguments packets and bytes to call mlxsw_sp_acl_rule_get_stats are in the wrong order. Fix this by swapping them. Detected by CoverityScan, CID#1419705 ("Arguments in wrong order") Fixes: `7c1b8eb175` ("mlxsw: spectrum: Add support for TC flower offload statistics") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 10:59:12 -07:00
Arjun Vynipadath	bb58d07964	cxgb4: Update IngPad and IngPack values We are using the smallest padding boundary (8 bytes), which isn't smaller than the Memory Controller Read/Write Size We get best performance in 100G when the Packing Boundary is a multiple of the Maximum Payload Size. Its related to inefficient chopping of DMA packets by PCIe, that causes more overhead on bus. So driver is helping by making the starting address alignment to be MPS size. We will try to determine PCIE MaxPayloadSize capabiltiy and set IngPackBoundary based on this value. If cache line size is greater than MPS or determinig MPS fails, we will use cache line size to determine IngPackBoundary(as before). Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 10:53:49 -07:00
Arnd Bergmann	3588f29e06	net: dwc-xlgmac: add module license When building the driver as a module, we get a warning about the lack of a license: WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/synopsys/dwc-xlgmac.o see include/linux/module.h for more information Curiously the text in the .c files only mentions GPLv2+, while the license tag in the PCI driver contains both GPL and BSD. I picked the license text as the more definite reference here and put a GPL tag in there. Fixes: `65e0ace2c5` ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 10:52:34 -07:00

1 2 3 4 5 ...

662587 Commits