ctnetlink does not need to allocate the conntrack entries with GFP_ATOMIC
as its code is executed in user context.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all qdiscs which allow to create classes uses a fixed sized hash
table with size 16 to hash the classes. This causes a large bottleneck
when using thousands of classes and unbound filters.
Add helpers for dynamically sized class hashes to fix this. The following
patches will convert the qdiscs to use them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add GVRP support for dynamically registering VLANs with switches.
By default GVRP is disabled because we only support the applicant-only
participant model, which means it should not be enabled on vlans that
are members of a bridge. Since there is currently no way to cleanly
determine that, the user is responsible for enabling it.
The code is pretty small and low impact, its wrapped in a config
option though because it depends on the GARP implementation and
the STP core.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an implementation of the GARP (Generic Attribute Registration Protocol)
applicant-only participant. This will be used by the following patch to
add GVRP support to the VLAN code.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add small STP demux layer for demuxing STP PDUs based on MAC address.
This is needed to run both GARP and STP in parallel (or even load the
modules) since both use LLC_SAP_BSPAN.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As simple as the patch #1 in this set.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two special cases here - one is rxrpc - I put init_net there
explicitly, since we haven't touched this part yet. The second
place is in __udp4_lib_rcv - we already have a struct net there,
but I have to move its initialization above to make it ready
at the "drop" label.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing special - all the places already have a struct sock
at hands, so use the sock_net() net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is required to pass namespace context into rt_cache_flush called from
->flush_cache.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass double tcf_proto pointers to tcf_destroy_chain() to make it
clear the start of the filter list for more consistency.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add spectrum capability and required information
elements to association request providing AP has requested it and
it is supported by the driver
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Assaf Krauss <assaf.krauss@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 refuse a WEP key whose length is not WEP40 nor
WEP104.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The workqueue provided by mac80211 should not be used for
scheduled tasks that acquire the RTNL lock. This could be done
when the driver uses the function ieee80211_iterate_active_interfaces()
within the scheduled work. Such behavior will end in locking
dependencies problems when an interface is being removed.
This patch will add a notification about the RTNL locking and
the mac80211 workqueue to prevent driver developers from
blindly using it.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some drivers may want to to use the TKIP key offsets for TX and RX
MIC so lets move this out. Lets also clear up a bit how this is used
internally in mac80211.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[ Based upon original report and patch by Karsten Keil. Karsten
has verified that this fixes the TAHI test case "ICMPv6 test
v6LC.5.1.2 Part F". -DaveM ]
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet. If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet. If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses. Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.
We implement this by not servicing our outqueue until we reach the end
of the packet. This enables maximum bundling. We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to more easily grep for all things that set
sk->sk_socket, add sk_set_socket() helper inline function.
Suggested (although only half-seriously) by Evgeniy Polyakov.
Signed-off-by: David S. Miller <davem@davemloft.net>
In commits 33c732c361 ([IPV4]: Add raw
drops counter) and a92aa318b4 ([IPV6]:
Add raw drops counter), Wang Chen added raw drops counter for
/proc/net/raw & /proc/net/raw6
This patch adds this capability to UDP sockets too (/proc/net/udp &
/proc/net/udp6).
This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also
be examined for each udp socket.
# grep Udp: /proc/net/snmp
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 23971006 75 899420 16390693 146348 0
# cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt ---
uid timeout inode ref pointer drops
75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
0 0 2358 2 ffff81082a538c80 0
111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 ---
0 0 2286 2 ffff81042dd35c80 146348
In this example, only port 111 (0x006F) was flooded by messages that
user program could not read fast enough. 146348 messages were lost.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ROSE network is organized through nodes connected via hamradio or Internet.
AX25 packet radio frames sent to a remote ROSE address destination are routed
through these nodes.
Without the present patch, automatic routing mechanism did not work optimally
due to an improper parameter checking.
rose_get_neigh() function is called either by rose_connect() or by
rose_route_frame().
In the case of a call from rose_connect(), f0 timer is checked to find if a connection
is already pending. In that case it returns the address of the neighbour, or returns a NULL otherwise.
When called by rose_route_frame() the purpose was to route a packet AX25 frame
through an adjacent node given a destination rose address.
However, in that case, t0 timer checked does not indicate if the adjacent node
is actually connected even if the timer is not null. Thus, for each frame sent, the
function often tried to start a new connexion even if the adjacent node was already connected.
The patch adds a "new" parameter that is true when the function is called by
rose route_frame().
This instructs rose_get_neigh() to check node parameter "restarted".
If restarted is true it means that the route to the destination address is opened via a neighbour
node already connected.
If "restarted" is false the function returns a NULL.
In that case the calling function will initiate a new connection as before.
This results in a fast routing of frames, from nodes to nodes, until
destination is reached, as originaly specified by ROSE protocole.
Signed-off-by: Bernard Pidoux <f6bvp@amsat.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix three ct_extend/NAT extension related races:
- When cleaning up the extension area and removing it from the bysource hash,
the nat->ct pointer must not be set to NULL since it may still be used in
a RCU read side
- When replacing a NAT extension area in the bysource hash, the nat->ct
pointer must be assigned before performing the replacement
- When reallocating extension storage in ct_extend, the old memory must
not be freed immediately since it may still be used by a RCU read side
Possibly fixes https://bugzilla.redhat.com/show_bug.cgi?id=449315
and/or http://bugzilla.kernel.org/show_bug.cgi?id=10875
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three major portions to this change:
1) Add IW_EV_COMPAT_LCP_LEN, IW_EV_COMPAT_POINT_OFF,
and IW_EV_COMPAT_POINT_LEN helper defines.
2) Delete iw_stream_check_add_*(), they are unused.
3) Add iw_request_info argument to iwe_stream_add_*(), and use it to
size the event and pointer lengths correctly depending upon whether
IW_REQUEST_FLAG_COMPAT is set or not.
4) The mechanical transformations to the drivers and wireless stack
bits to get the iw_request_info passed down into the routines
modified in #3. Also, explicit references to IW_EV_LCP_LEN are
replaced with iwe_stream_lcp_len(info).
With a lot of help and bug fixes from Masakazu Mokuno.
Signed-off-by: David S. Miller <davem@davemloft.net>
Next we can kill the hacks in fs/compat_ioctl.c and also
dispatch compat ioctls down into the driver and 80211 protocol
helper layers in order to handle iw_point objects embedded in
stream replies which need to be translated.
Signed-off-by: David S. Miller <davem@davemloft.net>
There are many possible ways to add this "salt", thus I made this
patch to be the last in the series to change it if required.
Currently I propose to use the struct net pointer itself as this
salt, but since this pointer is most often cache-line aligned, shift
this right to eliminate the bits, that are most often zeroed.
After this, simply add this mix to prepared hashfn-s.
For CONFIG_NET_NS=n case this salt is 0 and no changes in hashfn
appear.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as for inet_hashfn, prepare its ipv6 incarnation.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although this hash takes addresses into account, the ehash chains
can also be too long when, for instance, communications via lo occur.
So, prepare the inet_hashfn to take struct net into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Listening-on-one-port sockets in many namespaces produce long
chains in the listening_hash-es, so prepare the inet_lhashfn to
take struct net into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Binding to some port in many namespaces may create too long
chains in bhash-es, so prepare the hashfn to take struct net
into account.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change struct proto destroy function pointer to return void. Noticed
by Al Viro.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take a __le16 directly rather than a host-endian value.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
reorder udp_iter_state to remove padding on 64bit builds
shrinks from 24 to 16 bytes, moving to a smaller slab when
CONFIG_NET_NS is undefined & seq_net_private = {}
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts two changesets, ec3c0982a2
("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
the follow-on bug fix 9ae27e0adb
("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
This change causes several problems, first reported by Ingo Molnar
as a distcc-over-loopback regression where connections were getting
stuck.
Ilpo Järvinen first spotted the locking problems. The new function
added by this code, tcp_defer_accept_check(), only has the
child socket locked, yet it is modifying state of the parent
listening socket.
Fixing that is non-trivial at best, because we can't simply just grab
the parent listening socket lock at this point, because it would
create an ABBA deadlock. The normal ordering is parent listening
socket --> child socket, but this code path would require the
reverse lock ordering.
Next is a problem noticed by Vitaliy Gusev, he noted:
----------------------------------------
>--- a/net/ipv4/tcp_timer.c
>+++ b/net/ipv4/tcp_timer.c
>@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> goto death;
> }
>
>+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
>+ tcp_send_active_reset(sk, GFP_ATOMIC);
>+ goto death;
Here socket sk is not attached to listening socket's request queue. tcp_done()
will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
release this sk) as socket is not DEAD. Therefore socket sk will be lost for
freeing.
----------------------------------------
Finally, Alexey Kuznetsov argues that there might not even be any
real value or advantage to these new semantics even if we fix all
of the bugs:
----------------------------------------
Hiding from accept() sockets with only out-of-order data only
is the only thing which is impossible with old approach. Is this really
so valuable? My opinion: no, this is nothing but a new loophole
to consume memory without control.
----------------------------------------
So revert this thing for now.
Signed-off-by: David S. Miller <davem@davemloft.net>