kernel_optimize_test

Author	SHA1	Message	Date
Michael Thalmeier	dd7bedcd26	NFC: pn533: add I2C phy driver This adds the I2C phy interface for the pn533 driver. This way the driver can be used to interact with I2C connected pn532 devices. Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2016-04-09 23:53:20 +02:00
Michael Thalmeier	9815c7cf22	NFC: pn533: Separate physical layer from the core implementation The driver now has all core stuff isolated in one file, and all the hardware link specifics in another. Writing a pn533 driver on top of another hardware link is now just a matter of adding a new file for that new hardware specifics. The first user of this separation will be the i2c based pn532 driver that reuses pn533 core implementation on top of an i2c layer. Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2016-04-09 23:53:15 +02:00
Michael Thalmeier	37f895d7e8	NFC: pn533: Fix socket deadlock A deadlock can occur when the NFC raw socket is closed while the driver is processing a command. Following is the call graph of the affected situation: send data via raw_sock: ------------- rawsock_tx_work sock_hold => socket refcnt++ nfc_data_exchange => cb = rawsock_data_exchange_complete ops->im_transceive = pn533_transceive => arg->cb = db = rawsock_data_exchange_complete pn533_send_data_async => cb = pn533_data_exchange_complete __pn533_send_async => cmd->complete_cb = cb = pn533_data_exchange_complete if_ops->send_frame_async response: -------- pn533_recv_response queue_work(priv->wq, &priv->cmd_complete_work) pn533_wq_cmd_complete pn533_send_async_complete cmd->complete_cb() = pn533_data_exchange_complete() arg->cb() = rawsock_data_exchange_complete() sock_put => socket refcnt-- => If the corresponding socket gets closed in the meantime socket will be destructed sk_free __sk_free sk->sk_destruct = rawsock_destruct nfc_deactivate_target ops->deactivate_target = pn533_deactivate_target pn533_send_cmd_sync pn533_send_cmd_async __pn533_send_async list_add_tail(&cmd->queue,&dev->cmd_queue) => add to command list because a command is currently processed wait_for_completion => the workqueue thread waits here because it is the one processing the commands => deadlock To fix the deadlock pn533_deactivate_target is changed to issue the PN533_CMD_IN_RELEASE command in async mode. This way nothing blocks and the release command is executed after the current command. Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2016-04-09 23:53:11 +02:00
Michael Thalmeier	e997ebbe46	NFC: pn533: Send ATR_REQ only if NFC_PROTO_NFC_DEP bit is set Currently it is not possible to only poll for passive targets with the pn533 driver. To change this ATR_REQ is only sent when NFC_PROTO_NFC_DEP is explicitly requested in poll_protocols. As most implementations (e.g. neard) poll for all protocols that are reported to be supported by the adapter, this should not have much of an effect on current implementations. Signed-off-by: Michael Thalmeier <michael.thalmeier@hale.at> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2016-04-09 23:53:05 +02:00
Eric Dumazet	03c5b53418	ipv6: fix inet6_lookup_listener() A stupid refactoring bug in inet6_lookup_listener() needs to be fixed in order to get proper SO_REUSEPORT behavior. Fixes: `3b24d854cb` ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-09 16:53:52 -04:00
John Allen	498cd8e495	ibmvnic: Enable use of multiple tx/rx scrqs Enables the use of multiple transmit and receive scrqs allowing the ibmvnic driver to take advantage of multiqueue functionality. To achieve this, the driver must implement the process of negotiating the maximum number of queues allowed by the server. Initially, the driver will attempt to login with the maximum number of tx and rx queues supported by the server. If the server fails to allocate the requested number of scrqs, it will return partial success in the login response. In this case, we must reinitiate the login process from the request capabilities stage and attempt to login requesting fewer scrqs. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-09 00:02:41 -04:00
David S. Miller	e013b7780c	Merge branch 'dsa-voidify-ops' Vivien Didelot says: ==================== net: dsa: voidify STP setter and FDB/VLAN add ops Neither the DSA layer nor the bridge code (see br_set_state) really care about eventual errors from STP state setters, so make it void. The DSA layer separates the prepare and commit phases of switchdev in two different functions. Logical errors must not happen in commit routines, so make them void. Changes v1 -> v2: - rename port_stp_update to port_stp_state_set - don't change code flow of bcm_sf2_sw_br_set_stp_state - prefer netdev_err over netdev_warn ==================== Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:51:15 -04:00
Vivien Didelot	4d5770b397	net: dsa: make the VLAN add function return void The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If a hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_vlan_prepare and port_vlan_add for simplicity and convenience. If a hardware error occurs during the commit phase, there is no need to report it outside the driver itself. Make the DSA port_vlan_add routine return void for explicitness. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:50:41 -04:00
Vivien Didelot	8497aa618d	net: dsa: make the FDB add function return void The switchdev design implies that a software error should not happen in the commit phase since it must have been previously reported in the prepare phase. If a hardware error occurs during the commit phase, there is nothing switchdev can do about it. The DSA layer separates port_fdb_prepare and port_fdb_add for simplicity and convenience. If a hardware error occurs during the commit phase, there is no need to report it outside the DSA driver itself. Make the DSA port_fdb_add routine return void for explicitness. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:50:40 -04:00
Vivien Didelot	43c44a9f65	net: dsa: make the STP state function return void The DSA layer doesn't care about the return code of the port_stp_update routine, so make it void in the layer and the DSA drivers. Replace the useless dsa_slave_stp_update function with a dsa_slave_stp_state function used to reply to the switchdev SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute. In the meantime, rename port_stp_update to port_stp_state_set to explicit the state change. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:50:40 -04:00
Vivien Didelot	f453939c1a	net: dsa: document missing functions Add description for the missing port_vlan_prepare, port_fdb_prepare, port_fdb_dump functions in the DSA documentation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:50:02 -04:00
David S. Miller	1089ac6977	Merge tag 'mac80211-next-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the 4.7 cycle, we have a number of changes: * Bob's mesh mode rhashtable conversion, this includes the rhashtable API change for allocation flags * BSSID scan, connect() command reassoc support (Jouni) * fast (optimised data only) and support for RSS in mac80211 (myself) * various smaller changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:42:31 -04:00
Daniel Borkmann	07016151a4	bpf, verifier: further improve search pruning The verifier needs to go through every path of the program in order to check that it terminates safely, which can be quite a lot of instructions that need to be processed f.e. in cases with more branchy programs. With search pruning from `f1bca824da` ("bpf: add search pruning optimization to verifier") the search space can already be reduced significantly when the verifier detects that a previously walked path with same register and stack contents terminated already (see verifier's states_equal()), so the search can skip walking those states. When working with larger programs of > ~2000 (out of max 4096) insns, we found that the current limit of 32k instructions is easily hit. For example, a case we ran into is that the search space cannot be pruned due to branches at the beginning of the program that make use of certain stack space slots (STACK_MISC), which are never used in the remaining program (STACK_INVALID). Therefore, the verifier needs to walk paths for the slots in STACK_INVALID state, but also all remaining paths with a stack structure, where the slots are in STACK_MISC, which can nearly double the search space needed. After various experiments, we find that a limit of 64k processed insns is a more reasonable choice when dealing with larger programs in practice. This still allows to reject extreme crafted cases that can have a much higher complexity (f.e. > ~300k) within the 4096 insns limit due to search pruning not being able to take effect. Furthermore, we found that a lot of states can be pruned after a call instruction, f.e. we were able to reduce the search state by ~35% in some cases with this heuristic, trade-off is to keep a bit more states in env->explored_states. Usually, call instructions have a number of preceding register assignments and/or stack stores, where search pruning has a better chance to succeed in states_equal() test. 
The current code marks the branch targets with STATE_LIST_MARK in case of conditional jumps, and the next (t + 1) instruction in case of unconditional jump so that f.e. a backjump will walk it. We also did experiments with using t + insns[t].off + 1 as a marker in the unconditionally jump case instead of t + 1 with the rationale that these two branches of execution that converge after the label might have more potential of pruning. We found that it was a bit better, but not necessarily significantly better than the current state, perhaps also due to clang not generating back jumps often. Hence, we left that as is for now. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:16:42 -04:00
Jiri Pirko	1fc2257e83	devlink: share user_ptr pointer for both devlink and devlink_port Ptr to devlink structure can be easily obtained from devlink_port->devlink. So share user_ptr[0] pointer for both and leave user_ptr[1] free for other users. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:40:08 -04:00
David S. Miller	67b5b21f38	Merge branch 'mlxsw-next' Jiri Pirko says: ==================== mlxsw: small driver update + one tiny devlink dependency Cosmetics, in preparation to sharedbuffer patchset. First patch is here to allow patch number two. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:43 -04:00
Jiri Pirko	9efc8f655c	mlxsw: reg: Fix SBPM register name Fix copy&paste error and state the name of SBPM register correctly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:43 -04:00
Jiri Pirko	497e8592c6	mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM Same field, same values, so share the same enum. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:43 -04:00
Jiri Pirko	b2f10571b9	mlxsw: Do not pass around driver_priv directly Instead of that, pass mlxsw_core and use a helper to get driver priv from driver code. Looks much cleaner that way. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Jiri Pirko	307c2431ab	mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit* Instead of passing around driver priv, pass struct mlxsw_core * directly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Jiri Pirko	932762b69a	mlxsw: Move devlink port registration into common core code Remove devlink port reg/unreg from spectrum and switchx2 code and rather do the common work in core. That also ensures code separation where devlink is only used in core.c. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Jiri Pirko	a9844881ba	devlink: remove implicit type set in port register As we rely on caller zeroing or correctly set the struct before the call, this implicit type set is either no-op (DEVLINK_PORT_TYPE_NOTSET is 0) or it rewrites wanted value. So remove this. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
David S. Miller	24d390b2ac	Merge branch 'nfp-mtu-buffer-reconfig' Jakub Kicinski says: ==================== MTU/buffer reconfig changes I re-discussed MPLS/MTU internally, dropped it from the patch 1, re-tested everything, found out I forgot about debugfs pointers, fixed that as well. v5: - don't reserve space in RX buffers for MPLS label stack (patch 1); - fix debugfs pointers to ring structures (patch 5). v4: - cut down on unrelated patches; - don't "close" the device on error path. --- v4 cover letter Previous series included some not entirely related patches, this one is cut down. Main issue I'm trying to solve here is that .ndo_change_mtu() in nfpvf driver is doing full close/open to reallocate buffers - which if open fails can result in device being basically closed even though the interface is started. As suggested by you I try to move towards a paradigm where the resources are allocated first and the MTU change is only done once I'm certain (almost) nothing can fail. Almost because I need to communicate with FW and that can always time out. Patch 1 fixes small issue. Next 10 patches reorganize things so that I can easily allocate new rings and sets of buffers while the device is running. Patches 13 and 15 reshape the .ndo_change_mtu() and ethtool's ring-resize operation into desired form. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:07 -04:00
Jakub Kicinski	cc7c033330	nfp: allow ring size reconfiguration at runtime Since much of the required changes have already been made for changing MTU at runtime let's use it for ring size changes as well. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:06 -04:00
Jakub Kicinski	a98cb25812	nfp: pass ring count as function parameter Soon ring resize will call these functions with values different from the current configuration, so we need to explicitly pass the ring count as a parameter. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:06 -04:00
Jakub Kicinski	36a857e4f2	nfp: convert .ndo_change_mtu() to prepare/commit paradigm When changing MTU on running device first allocate new rings and buffers and once it succeeds proceed with changing MTU. Allocation of new rings is not really necessary for this operation - it's done to keep the code simple and because size of the extra ring memory is quite small compared to the size of buffers. Operation can still fail midway through if FW communication times out. In that case we retry with old MTU (rings). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:06 -04:00
Jakub Kicinski	30d2117191	nfp: propagate list buffer size in struct rx_ring Free list buffer size needs to be propagated to few functions as a parameter and added to struct nfp_net_rx_ring since soon some of the functions will be reused to manage rings with buffers of size different than nn->fl_bufsz. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:05 -04:00
Jakub Kicinski	aba52df80b	nfp: sync ring state during FW reconfiguration FW reconfiguration in .ndo_open()/.ndo_stop() should reset/ restore queue state. Since we need IRQs to be disabled when filling rings on RX path we have to move disable_irq() from .ndo_open() all the way up to IRQ allocation. nfp_net_start_vec() becomes trivial now so it's inlined. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:05 -04:00
Jakub Kicinski	1cd0cfc498	nfp: slice .ndo_open() and .ndo_stop() up Divide .ndo_open() and .ndo_stop() into logical, callable chunks. No functional changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:05 -04:00
Jakub Kicinski	ca40feab8f	nfp: move filling ring information to FW config nfp_net_[rt]x_ring_{alloc,free} should only allocate or free ring resources without touching the device. Move setting parameters in the BAR to separate functions. This will make it possible to reuse alloc/free functions to allocate new rings while the device is running. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:05 -04:00
Jakub Kicinski	114bdef0be	nfp: preallocate RX buffers early in .ndo_open We want the .ndo_open() to have following structure: - allocate resources; - configure HW/FW; - enable the device from stack perspective. Therefore filling RX rings needs to be moved to the beginning of .ndo_open(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:05 -04:00
Jakub Kicinski	1934680f55	nfp: reorganize initial filling of RX rings Separate allocation of buffers from giving them to FW, thanks to this it will be possible to move allocation earlier on .ndo_open() path and reuse buffers during runtime reconfiguration. Similar to TX side clean up the spill of functionality from flush to freeing the ring. Unlike on TX side, RX ring reset does not free buffers from the ring. Ring reset means only that FW pointers are zeroed and buffers on the ring must be placed in [0, cnt - 1) positions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:04 -04:00
Jakub Kicinski	827deea9bc	nfp: cleanup tx ring flush and rename to reset Since we never used flush without freeing the ring later the functionality of the two operations is mixed. Rename flush to ring reset and move there all the things which have to be done after FW ring state is cleared. While at it do some clean-ups. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:04 -04:00
Jakub Kicinski	73725d9dfd	nfp: allocate ring SW structs dynamically To be able to switch rings more easily on config changes allocate them dynamically, separately from nfp_net structure. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:04 -04:00
Jakub Kicinski	d79737c25e	nfp: make *x_ring_init do all the init nfp_net_[rt]x_ring_init functions used to be called from probe path only and some of their functionality was spilled to the call site. In order to reuse them for ring reconfiguration we need them to do all the init. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:04 -04:00
Jakub Kicinski	0afbfb183b	nfp: break up nfp_net_{alloc\|free}_rings nfp_net_{alloc\|free}_rings contained strange mix of allocations and vector initialization. Remove it, declare vector init as a separate function and handle allocations explicitly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:04 -04:00
Jakub Kicinski	0ba40af963	nfp: move link state interrupt request/free calls We need to be able to disable the link state interrupt when the device is brought down. We used to just free the IRQ at the beginning of .ndo_stop(). As we now move towards more ordered .ndo_open()/.ndo_stop() paths LSC allocation should be placed in the "allocate resource" section. Since the IRQ can't be freed early in .ndo_stop(), it is disabled instead. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:03 -04:00
Jakub Kicinski	ff1b68ab2d	nfp: correct RX buffer length calculation When calculating the RX buffer length we need to account for up to 2 VLAN tags. Rounding up to 1k is a relic of a distant past and can be removed. While at it also remove trivial print statement. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:26:03 -04:00
David S. Miller	70f767d3af	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2016-04-07 This series contains updates to ixgbe and ixgbevf. This entire series (except for one patch from Alex) comes from Mark and is mainly to add support for our new MAC (x550em_a). So let's get Alex's patch out of the way first before we cover Mark's many changes. Alex does his enable bulk free in transmit cleanup for ixgbe and ixgbevf, like he has done for all of our other drivers. First Mark cleans up registers that were not being used, so do some house cleaning. Then to avoid casting lan_id and func fields, just make them u8 since they only hold small values anyways. Found and fixed an issue where on read operations it could be possible to modify locations beyond the length passed in, so change the check to round up in the same way. Cleaned up the interface for issuing firmware commands to use a void * instead of a u32 * which eliminates a number of casts. Added support for the new MAC and provided method pointers and use them to access IOSF-attached devices, since the new MAC will also need a new access method. Added support for SFPs with an external retimer and for an SGMII backplane interface. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 12:13:30 -04:00
David S. Miller	f8711655f8	Merge branch 'bpf-tracepoints' Alexei Starovoitov says: ==================== allow bpf attach to tracepoints Hi Steven, Peter, v1->v2: addressed Peter's comments: - fixed wording in patch 1, added ack - refactored 2nd patch into 3: 2/10 remove unused __perf_addr macro which frees up an argument in perf_trace_buf_submit 3/10 split perf_trace_buf_prepare into alloc and update parts, so that bpf programs don't have to pay performance penalty for update of struct trace_entry which is not going to be accessed by bpf 4/10 actual addition of bpf filter to perf tracepoint handler is now trivial and bpf prog can be used as proper filter of tracepoints v1 cover: last time we discussed bpf+tracepoints it was a year ago [1] and the reason we didn't proceed with that approach was that bpf would make arguments arg1, arg2 to trace_xx(arg1, arg2) call to be exposed to bpf program and that was considered unnecessary extension of abi. Back then I wanted to avoid the cost of buffer alloc and field assign part in all of the tracepoints, but looks like when optimized the cost is acceptable. So this new approach doesn't expose any new abi to bpf program. The program is looking at tracepoint fields after they were copied by perf_trace_xx() and described in /sys/kernel/debug/tracing/events/xxx/format We made a tool [2] that takes arguments from /sys/.../format and works as: $ tplist.py -v random:urandom_read int got_bits; int pool_left; int input_left; Then these fields can be copy-pasted into bpf program like: struct urandom_read { __u64 hidden_pad; int got_bits; int pool_left; int input_left; }; and the program can use it: SEC("tracepoint/random/urandom_read") int bpf_prog(struct urandom_read *ctx) { return ctx->pool_left > 0 ? 1 : 0; } This way the program can access tracepoint fields faster than equivalent bpf+kprobe program, which is the main goal of these patches. Patch 1-4 are simple changes in perf core side, please review. 
I'd like to take the whole set via net-next tree, since the rest of the patches might conflict with other bpf work going on in net-next and we want to avoid cross-tree merge conflicts. Alternatively we can put patches 1-4 into both tip and net-next. Patch 9 is an example of access to tracepoint fields from bpf prog. Patch 10 is a micro benchmark for bpf+kprobe vs bpf+tracepoint. Note that for actual tracing tools the user doesn't need to run tplist.py and copy-paste fields manually. The tools do it automatically. Like argdist tool [3] can be used as: $ argdist -H 't:block:block_rq_complete():u32:nr_sector' where 'nr_sector' is name of tracepoint field taken from /sys/kernel/debug/tracing/events/block/block_rq_complete/format and appropriate bpf program is generated on the fly. [1] http://thread.gmane.org/gmane.linux.kernel.api/8127/focus=8165 [2] https://github.com/iovisor/bcc/blob/master/tools/tplist.py [3] https://github.com/iovisor/bcc/blob/master/tools/argdist.py ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:27 -04:00
Alexei Starovoitov	e3edfdec04	samples/bpf: add tracepoint vs kprobe performance tests the first microbenchmark does fd=open("/proc/self/comm"); for() { write(fd, "test"); } and on 4 cpus in parallel: writes per sec base (no tracepoints, no kprobes) 930k with kprobe at __set_task_comm() 420k with tracepoint at task:task_rename 730k For kprobe + full bpf program manually fetches oldcomm, newcomm via bpf_probe_read. For tracepoint bpf program does nothing, since arguments are copied by tracepoint. 2nd microbenchmark does: fd=open("/dev/urandom"); for() { read(fd, buf); } and on 4 cpus in parallel: reads per sec base (no tracepoints, no kprobes) 300k with kprobe at urandom_read() 279k with tracepoint at random:urandom_read 290k bpf progs attached to kprobe and tracepoint are noop. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:27 -04:00
Alexei Starovoitov	3c9b16448c	samples/bpf: tracepoint example modify offwaketime to work with sched/sched_switch tracepoint instead of kprobe into finish_task_switch Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:27 -04:00
Alexei Starovoitov	c07660409e	samples/bpf: add tracepoint support to bpf loader Recognize "tracepoint/" section name prefix and attach the program to that tracepoint. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:27 -04:00
Alexei Starovoitov	32bbe0078a	bpf: sanitize bpf tracepoint access during bpf program loading remember the last byte of ctx access and at the time of attaching the program to tracepoint check that the program doesn't access bytes beyond defined in tracepoint fields This also disallows access to __dynamic_array fields, but can be relaxed in the future. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:26 -04:00
Alexei Starovoitov	9940d67c93	bpf: support bpf_get_stackid() and bpf_perf_event_output() in tracepoint programs needs two wrapper functions to fetch 'struct pt_regs *' to convert tracepoint bpf context into kprobe bpf context to reuse existing helper functions Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:26 -04:00
Alexei Starovoitov	9fd82b610b	bpf: register BPF_PROG_TYPE_TRACEPOINT program type register tracepoint bpf program type and let it call the same set of helper functions as BPF_PROG_TYPE_KPROBE Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:26 -04:00
Alexei Starovoitov	98b5c2c65c	perf, bpf: allow bpf programs attach to tracepoints introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be attached to the perf tracepoint handler, which will copy the arguments into the per-cpu buffer and pass it to the bpf program as its first argument. The layout of the fields can be discovered by doing 'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format' prior to the compilation of the program with exception that first 8 bytes are reserved and not accessible to the program. This area is used to store the pointer to 'struct pt_regs' which some of the bpf helpers will use: +---------+ \| 8 bytes \| hidden 'struct pt_regs *' (inaccessible to bpf program) +---------+ \| N bytes \| static tracepoint fields defined in tracepoint/format (bpf readonly) +---------+ \| dynamic \| __dynamic_array bytes of tracepoint (inaccessible to bpf yet) +---------+ Note that all of the fields are already dumped to user space via perf ring buffer and broken applications access it directly without consulting tracepoint/format. Same rule applies here: static tracepoint fields should only be accessed in a format defined in tracepoint/format. The order of fields and field sizes are not an ABI. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:26 -04:00
Alexei Starovoitov	1e1dcd93b4	perf: split perf_trace_buf_prepare into alloc and update parts split allows to move expensive update of 'struct trace_entry' to later phase. Repurpose unused 1st argument of perf_tp_event() to indicate event type. While splitting use temp variable 'rctx' instead of '*rctx' to avoid unnecessary loads done by the compiler due to -fno-strict-aliasing Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:26 -04:00
Alexei Starovoitov	e93735be6a	perf: remove unused __addr variable now all calls to perf_trace_buf_submit() pass 0 as 4th argument which will be repurposed in the next patch which will change the meaning of 1st arg of perf_tp_event() to event_type Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:26 -04:00
Alexei Starovoitov	ec5e099d6e	perf: optimize perf_fetch_caller_regs avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints. It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by percpu init logic and subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs, so we can safely drop memset from all of the above cases and move it into perf_ftrace_function_call that calls it with stack allocated pt_regs. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:26 -04:00
David S. Miller	b33b0a1bf6	net: Fix build failure due to lockdep_sock_is_held(). Needs to be protected with CONFIG_LOCKDEP. Based upon a patch by Hannes Frederic Sowa. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 20:40:25 -04:00


588999 Commits