Commit Graph

2562 Commits

Author SHA1 Message Date
Herbert Xu
e9fa4f7bd2 [INET]: Use pskb_trim_unique when trimming paged unique skbs
The IPv4/IPv6 datagram output path was using skb_trim to trim paged
packets because they know that the packet has not been cloned yet
(since the packet hasn't been given to anything else in the system).

This broke because skb_trim no longer allows paged packets to be
trimmed.  Paged packets must be given to one of the pskb_trim functions
instead.

This patch adds a new pskb_trim_unique function to cover the IPv4/IPv6
datagram output path scenario and replaces the corresponding skb_trim
calls with it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 20:12:58 -07:00
Mark Huang
dcb7cd97f1 [NETFILTER]: ulog: fix panic on SMP kernels
Fix kernel panic on various SMP machines. The culprit is a null
ub->skb in ulog_send(). If ulog_timer() has already been scheduled on
one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the
queue on another CPU by calling ulog_send() right before it exits,
there will be no skbuff when ulog_timer() acquires the lock and calls
ulog_send(). Cancelling the timer in ulog_send() doesn't help because
it has already been scheduled and is running on the first CPU.

Similar problem exists in ebt_ulog.c and nfnetlink_log.c.

Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 18:57:54 -07:00
Patrick McHardy
0eff66e625 [NETFILTER]: {arp,ip,ip6}_tables: proper error recovery in init path
Neither of {arp,ip,ip6}_tables cleans up behind itself when something goes
wrong during initialization.

Noticed by Rennie deGraaf <degraaf@cpsc.ucalgary.ca>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 18:57:28 -07:00
Stephen Hemminger
7ee66fcb94 [LLC]: multicast receive device match
Fix from Aji_Srinivas@emc.com, STP packets are incorrectly received on
all LLC datagram sockets, whichever interface they are bound to.  The
llc_sap datagram receive logic sends packets with a unicast
destination MAC to one socket bound to that SAP and MAC, and multicast
packets to all sockets bound to that SAP. STP packets are multicast,
and we do need to know on which interface they were received.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 18:56:26 -07:00
David S. Miller
d49c73c729 [IPSEC]: Validate properly in xfrm_dst_check()
If dst->obsolete is -1, this is a signal from the
bundle creator that we want the XFRM dst and the
dsts that it references to be validated on every
use.

I misunderstood this intention when I changed
xfrm_dst_check() to always return NULL.

Now, when we purge a dst entry, by running dst_free()
on it.  This will set the dst->obsolete to a positive
integer, and we want to return NULL in that case so
that the socket does a relookup for the route.

Thus, if dst->obsolete<0, let stale_bundle() validate
the state, else always return NULL.

In general, we need to do things more intelligently
here because we flush too much state during rule
changes.  Herbert Xu has some ideas wherein the key
manager gives us some help in this area.  We can also
use smarter state management algorithms inside of
the kernel as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 18:55:53 -07:00
Patrick McHardy
1c7628bd7a [NETFILTER]: xt_hashlimit: fix limit off-by-one
Hashlimit doesn't account for the first packet, which is inconsistent
with the limit match.

Reported by ryan.castellucci@gmail.com, netfilter bugzilla #500.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 18:06:02 -07:00
Phil Oester
97c802a113 [NETFILTER]: xt_string: fix negation
The xt_string match is broken with ! negation.
This resolves a portion of netfilter bugzilla #497.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 18:05:35 -07:00
David S. Miller
18b6fe64d4 [TCP]: Fix botched memory leak fix to tcpprobe_read().
Somehow I clobbered James's original fix and only my
subsequent compiler warning change went in for that
changeset.

Get the real fix in there.

Noticed by Jesper Juhl.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-13 18:05:09 -07:00
David S. Miller
fff642570d [IPX]: Fix typo, ipxhdr() --> ipx_hdr()
Noticed by Dave Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-09 17:36:15 -07:00
Herbert Xu
06aebfb7fa [IPV6]: The ifa lock is a BH lock
The ifa lock is expected to be taken in BH context (by addrconf timers)
so we must disable BH when accessing it from user context.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-09 16:52:04 -07:00
Greg Kroah-Hartman
a465714109 Merge gregkh@master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-08-09 11:49:13 -07:00
Dmitry Mishin
7c91767a6b [NET]: add_timer -> mod_timer() in dst_run_gc()
Patch from Dmitry Mishin <dim@openvz.org>:

Replace add_timer() by mod_timer() in dst_run_gc
in order to avoid BUG message.

       CPU1                            CPU2
dst_run_gc()  entered           dst_run_gc() entered
spin_lock(&dst_lock)                   .....
del_timer(&dst_gc_timer)         fail to get lock
       ....                         mod_timer() <--- puts 
                                                 timer back
                                                 to the list
add_timer(&dst_gc_timer) <--- BUG because timer is in list already.

Found during OpenVZ internal testing.

At first we thought that it is OpenVZ specific as we
added dst_run_gc(0) call in dst_dev_event(),
but as Alexey pointed to me it is possible to trigger
this condition in mainstream kernel.

F.e. timer has fired on CPU2, but the handler was preeempted
by an irq before dst_lock is tried.
Meanwhile, someone on CPU1 adds an entry to gc list and
starts the timer.
If CPU2 was preempted long enough, this timer can expire
simultaneously with resuming timer handler on CPU1, arriving
exactly to the situation described.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-09 02:25:54 -07:00
Jeff Garzik
2f96740c26 Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2006-08-08 23:48:48 -04:00
Stephen Hemminger
7b1ba8de56 [IPX]: Another nonlinear receive fix
Need to check some more cases in IPX receive.  If the skb is purely
fragments, the IPX header needs to be extracted. The function
pskb_may_pull() may in theory invalidate all the pointers in the skb,
so references to ipx header must be refreshed.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-08 16:48:51 -07:00
David S. Miller
70f8e78e15 [RTNETLINK]: Fix IFLA_ADDRESS handling.
The ->set_mac_address handlers expect a pointer to a
sockaddr which contains the MAC address, whereas
IFLA_ADDRESS provides just the MAC address itself.

So whip up a sockaddr to wrap around the netlink
attribute for the ->set_mac_address call.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-08 16:47:37 -07:00
Wei Yongjun
bd37a08859 [TCP]: SNMPv2 tcpOutSegs counter error
Do not count retransmitted segments.

Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07 21:04:15 -07:00
David S. Miller
69d8c28c95 [PKTGEN]: Make sure skb->{nh,h} are initialized in fill_packet_ipv6() too.
Mirror the bug fix from fill_packet_ipv4()

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07 20:52:10 -07:00
Chen-Li Tien
aaf580601f [PKTGEN]: Fix oops when used with balance-tlb bonding
Signed-off-by: Chen-Li Tien <cltien@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07 20:49:07 -07:00
Kirill Korotaev
8d1502de27 [IPV4]: Limit rt cache size properly.
From: Kirill Korotaev <dev@sw.ru>

During OpenVZ stress testing we found that UDP traffic with random src
can generate too much excessive rt hash growing leading finally to OOM
and kernel panics.

It was found that for 4GB i686 system (having 1048576 total pages and
  225280 normal zone pages) kernel allocates the following route hash:
syslog: IP route cache hash table entries: 262144 (order: 8, 1048576
bytes) => ip_rt_max_size = 4194304 entries, i.e.  max rt size is
4194304 * 256b = 1Gb of RAM > normal_zone

Attached the patch which removes HASH_HIGHMEM flag from
alloc_large_system_hash() call.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07 20:44:22 -07:00
Stephen Hemminger
8b5cc5ef40 [IPX]: Header length validation needed
This patch will linearize and check there is enough data.
It handles the pprop case as well as avoiding a whole audit of
the routing code.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07 20:09:20 -07:00
Christoph Hellwig
7b2e497a06 [NET]: Assign skb->dev in netdev_alloc_skb
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-07 16:09:04 -07:00
Linus Torvalds
cb3f1e7b83 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [LAPB]: Fix windowsize check
  [TCP]: Fixes IW > 2 cases when TCP is application limited
  [PKT_SCHED] RED: Fix overflow in calculation of queue average
  [LLX]: SOCK_DGRAM interface fixes
  [PKT_SCHED]: Return ENOENT if qdisc module is unavailable
  [BRIDGE]: netlink status fix
2006-08-06 08:58:24 -07:00
Neil Brown
2f34931fdc [PATCH] knfsd: fix race related problem when adding items to and svcrpc auth cache
If we don't find the item we are lookng for, we allocate a new one, and
then grab the lock again and search to see if it has been added while we
did the alloc.  If it had been added we need to 'cache_put' the newly
created item that we are never going to use.  But as it hasn't been
initialised properly, putting it can cause an oops.

So move the ->init call earlier to that it will always be fully initilised
if we have to put it.

Thanks to Philipp Matthias Hahn <pmhahn@svs.Informatik.Uni-Oldenburg.de>
for reporting the problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-06 08:57:47 -07:00
Diego Calleja
558e10a57d [LAPB]: Fix windowsize check
In bug #6954, Norbert Reinartz reported the following issue:

"Function lapb_setparms() in file net/lapb/lapb_iface.c checks if the given
parameters are valid. If the given window size is in the range of 8 .. 127,
lapb_setparms() fails and returns an error value of LAPB_INVALUE, even if bit
LAPB_EXTENDED in parms->mode is set.
If bit LAPB_EXTENDED in parms->mode is set and the window size is in the range
of 8 .. 127, the first check "(parms->mode & LAPB_EXTENDED)" results true  and
the second check "(parms->window < 1 || parms->window > 127)" results false.
Both checks in conjunction result to false, thus the third check "(parms->window
< 1 || parms->window > 7)" is done by fault.
This third check results true, so that we leave lapb_setparms() by 'goto out_put'.
Seems that this bug doesn't cause any problems, because lapb_setparms() isn't
used to change the default values of LAPB. We are using kernel lapb in our
software project and also change the default parameters of lapb, so we found
this bug"

He also pasted a fix, that I've transformated into a patch:

Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-05 21:15:58 -07:00
Ilpo Järvinen
d254bcdbf2 [TCP]: Fixes IW > 2 cases when TCP is application limited
Whenever a transfer is application limited, we are allowed at least
initial window worth of data per window unless cwnd is previously
less than that.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 22:59:52 -07:00
Stephen Hemminger
30a584d944 [LLX]: SOCK_DGRAM interface fixes
The datagram interface of LLC is broken in a couple of ways.
These were discovered when trying to use it to build an out-of-kernel
version of STP.

First it didn't pass the source address of the received packet
in recvfrom(). It needs to copy the source address of received LLC packets
into the socket control block. At the same time fix a security issue
because there was uninitialized data leakage. Every recvfrom call
was just copying out old data.

Second, LLC should not merge multiple packets in one receive call
on datagram sockets. LLC should preserve packet boundaries on
SOCK_DGRAM.

This fix goes against the old historical comments about UNIX98 semantics
but without this fix SOCK_DGRAM is broken and useless. So either ANK's
interpretation was incorect or UNIX98 standard was wrong.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 22:59:50 -07:00
Jamal Hadi Salim
b9e2cc0f0e [PKT_SCHED]: Return ENOENT if qdisc module is unavailable
Return ENOENT if qdisc module is unavailable

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 22:59:49 -07:00
Stephen Hemminger
bea1b42e1b [BRIDGE]: netlink status fix
Fix code that passes back netlink status messages about
bridge changes. Submitted by Aji_Srinivas@emc.com

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-04 22:59:48 -07:00
Herbert Xu
782a667511 [PATCH] Send wireless netlink events with a clean slate
Drivers expect to be able to call wireless_send_event in arbitrary
contexts.  On the other hand, netlink really doesn't like being
invoked in an IRQ context.  So we need to postpone the sending of
netlink skb's to a tasklet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-08-04 14:57:19 -04:00
Trond Myklebust
5c3e985a2c SUNRPC: Fix obvious refcounting bugs in rpc_pipefs.
Doh!

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 496f408f2f0e7ee5481a7c2222189be6c4f5aa6c commit)
2006-08-03 16:57:26 -04:00
Trond Myklebust
e0ab53deaa RPC: Ensure that we disconnect TCP socket when client requests error out
If we're part way through transmitting a TCP request, and the client
errors, then we need to disconnect and reconnect the TCP socket in order to
avoid confusing the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 031a50c8b9ea82616abd4a4e18021a25848941ce commit)
2006-08-03 16:56:55 -04:00
Alexey Dobriyan
29bbd72d6e [NET]: Fix more per-cpu typos
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 15:02:31 -07:00
Chris Leech
e6eb307d48 [I/OAT]: Remove CPU hotplug lock from net_dma_rebalance
Remove the lock_cpu_hotplug()/unlock_cpu_hotplug() calls from
net_dma_rebalance

The lock_cpu_hotplug()/unlock_cpu_hotplug() sequence in
net_dma_rebalance is both incorrect (as pointed out by David Miller)
because lock_cpu_hotplug() may sleep while the net_dma_event_lock
spinlock is held, and unnecessary (as pointed out by Andrew Morton) as
spin_lock() disables preemption which protects from CPU hotplug
events.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 14:21:19 -07:00
Patrick Caulfield
9bbf28a1ff [DECNET]: Fix for routing bug
This patch fixes a bug in the DECnet routing code where we were
selecting a loopback device in preference to an outward facing device
even when the destination was known non-local. This patch should fix
the problem.

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 14:14:44 -07:00
Catherine Zhang
dc49c1f94e [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch
From: Catherine Zhang <cxzhang@watson.ibm.com>

This patch implements a cleaner fix for the memory leak problem of the
original unix datagram getpeersec patch.  Instead of creating a
security context each time a unix datagram is sent, we only create the
security context when the receiver requests it.

This new design requires modification of the current
unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
secid_to_secctx and release_secctx.  The former retrieves the security
context and the latter releases it.  A hook is required for releasing
the security context because it is up to the security module to decide
how that's done.  In the case of Selinux, it's a simple kfree
operation.

Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 14:12:06 -07:00
Adrian Bunk
2b7e24b66d [NET]: skb_queue_lock_key() is no longer used.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 14:07:58 -07:00
Wei Dong
dafee49085 [IPV6]: SNMPv2 "ipv6IfStatsOutFragCreates" counter error
When I tested linux kernel 2.6.71.7 about statistics
"ipv6IfStatsOutFragCreates", and found that it couldn't increase
correctly. The criteria is RFC 2465:

  ipv6IfStatsOutFragCreates OBJECT-TYPE
      SYNTAX      Counter32
      MAX-ACCESS  read-only
      STATUS      current
      DESCRIPTION
         "The number of output datagram fragments that have
         been generated as a result of fragmentation at
         this output interface."
      ::= { ipv6IfStatsEntry 15 }

I think there are two issues in Linux kernel. 
1st:
RFC2465 specifies the counter is "The number of output datagram
fragments...". I think increasing this counter after output a fragment
successfully is better. And it should not be increased even though a
fragment is created but failed to output.

2nd:
If we send a big ICMP/ICMPv6 echo request to a host, and receive
ICMP/ICMPv6 echo reply consisted of some fragments. As we know that in
Linux kernel first fragmentation occurs in ICMP layer(maybe saying
transport layer is better), but this is not the "real"
fragmentation,just do some "pre-fragment" -- allocate space for date,
and form a frag_list, etc. The "real" fragmentation happens in IP layer
-- set offset and MF flag and so on. So I think in "fast path" for
ip_fragment/ip6_fragment, if we send a fragment which "pre-fragment" by
upper layer we should also increase "ipv6IfStatsOutFragCreates".

Signed-off-by: Wei Dong <weid@nanjing-fnst.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:41:21 -07:00
Wei Dong
32c524d1c4 [IPV6]: SNMPv2 "ipv6IfStatsInHdrErrors" counter error
When I tested Linux kernel 2.6.17.7 about statistics
"ipv6IfStatsInHdrErrors", found that this counter couldn't increase
correctly. The criteria is RFC2465:
  ipv6IfStatsInHdrErrors OBJECT-TYPE
      SYNTAX     Counter3
      MAX-ACCESS read-only
      STATUS     current
      DESCRIPTION
         "The number of input datagrams discarded due to
         errors in their IPv6 headers, including version
         number mismatch, other format errors, hop count
         exceeded, errors discovered in processing their
         IPv6 options, etc."
      ::= { ipv6IfStatsEntry 2 }

When I send TTL=0 and TTL=1 a packet to a router which need to be
forwarded, router just sends an ICMPv6 message to tell the sender that
TIME_EXCEED and HOPLIMITS, but no increments for this counter(in the
function ip6_forward).

Signed-off-by: Wei Dong <weid@nanjing-fnst.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:39:57 -07:00
David S. Miller
b60dfc6c20 [NET]: Kill the WARN_ON() calls for checksum fixups.
We have a more complete solution in the works, involving
the seperation of CHECKSUM_HW on input vs. output, and
having netfilter properly do incremental checksums.

But that is a very involved patch and is thus 2.6.19
material.

What we have now is infinitely better than the past,
wherein all TSO packets were dropped due to corrupt
checksums as soon at the NAT module was loaded.  At
least now, the checksums do get fixed up, it just
isn't the cleanest nor most optimal solution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:30 -07:00
Patrick McHardy
3ab720881b [NETFILTER]: xt_hashlimit/xt_string: missing string validation
The hashlimit table name and the textsearch algorithm need to be
terminated, the textsearch pattern length must not exceed the
maximum size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:29 -07:00
Patrick McHardy
b10866fd7d [NETFILTER]: SIP helper: expect RTP streams in both directions
Since we don't know in which direction the first packet will arrive, we
need to create one expectation for each direction, which is currently
prevented by max_expected beeing set to 1.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:28 -07:00
Christoph Hellwig
8af2745645 [NET]: Add netdev_alloc_skb().
Add a dev_alloc_skb variant that takes a struct net_device * paramater.
For now that paramater is unused, but I'll use it to allocate the skb
from node-local memory in a follow-up patch.  Also there have been some
other plans mentioned on the list that can use it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:25 -07:00
David S. Miller
52499afe40 [TCP]: Process linger2 timeout consistently.
Based upon guidance from Alexey Kuznetsov.

When linger2 is active, we check to see if the fin_wait2
timeout is longer than the timewait.  If it is, we schedule
the keepalive timer for the difference between the timewait
timeout and the fin_wait2 timeout.

When this orphan socket is seen by tcp_keepalive_timer()
it will try to transform this fin_wait2 socket into a
fin_wait2 mini-socket, again if linger2 is active.

Not all paths were setting this initial keepalive timer correctly.
The tcp input path was doing it correctly, but tcp_close() wasn't,
potentially making the socket linger longer than it really needs to.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:24 -07:00
James Morris
a280b89982 [SECURITY] secmark: nul-terminate secdata
The patch below fixes a problem in the iptables SECMARK target, where
the user-supplied 'selctx' string may not be nul-terminated.

From initial analysis, it seems that the strlen() called from
selinux_string_to_sid() could run until it arbitrarily finds a zero,
and possibly cause a kernel oops before then.

The impact of this appears limited because the operation requires
CAP_NET_ADMIN, which is essentially always root.  Also, the module is
not yet in wide use.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:23 -07:00
Tom Tucker
8d71740c56 [NET]: Core net changes to generate netevents
Generate netevents for:
- neighbour changes
- routing redirects
- pmtu changes

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:21 -07:00
Tom Tucker
792d1932e3 [NET]: Network Event Notifier Mechanism.
This patch uses notifier blocks to implement a network event
notifier mechanism.

Clients register their callback function by calling
register_netevent_notifier() like this:

static struct notifier_block nb = {
        .notifier_call = my_callback_func
};

...

register_netevent_notifier(&nb);

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:20 -07:00
Wei Yongjun
3687b1dc6f [TCP]: SNMPv2 tcpAttemptFails counter error
Refer to RFC2012, tcpAttemptFails is defined as following:
  tcpAttemptFails OBJECT-TYPE
      SYNTAX      Counter32
      MAX-ACCESS  read-only
      STATUS      current
      DESCRIPTION
              "The number of times TCP connections have made a direct
              transition to the CLOSED state from either the SYN-SENT
              state or the SYN-RCVD state, plus the number of times TCP
              connections have made a direct transition to the LISTEN
              state from the SYN-RCVD state."
      ::= { tcp 7 }

When I lookup into RFC793, I found that the state change should occured
under following condition:
  1. SYN-SENT -> CLOSED
     a) Received ACK,RST segment when SYN-SENT state.

  2. SYN-RCVD -> CLOSED
     b) Received SYN segment when SYN-RCVD state(came from LISTEN).
     c) Received RST segment when SYN-RCVD state(came from SYN-SENT).
     d) Received SYN segment when SYN-RCVD state(came from SYN-SENT).

  3. SYN-RCVD -> LISTEN
     e) Received RST segment when SYN-RCVD state(came from LISTEN).

In my test, those direct state transition can not be counted to
tcpAttemptFails.

Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:19 -07:00
James Morris
118075b3cd [TCP]: fix memory leak in net/ipv4/tcp_probe.c::tcpprobe_read()
Based upon a patch by Jesper Juhl.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:18 -07:00
Herbert Xu
f4d26fb336 [NET]: Fix ___pskb_trim when entire frag_list needs dropping
When the trim point is within the head and there is no paged data,
___pskb_trim fails to drop the first element in the frag_list.
This patch fixes this by moving the len <= offset case out of the
page data loop.

This patch also adds a missing kfree_skb on the frag that we just
cloned.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:16 -07:00
Herbert Xu
497c615aba [IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls
The current users of ip6_dst_lookup can be divided into two classes:

1) The caller holds no locks and is in user-context (UDP).
2) The caller does not want to lookup the dst cache at all.

The second class covers everyone except UDP because most people do
the cache lookup directly before calling ip6_dst_lookup.  This patch
adds ip6_sk_dst_lookup for the first class.

Similarly ip6_dst_store users can be divded into those that need to
take the socket dst lock and those that don't.  This patch adds
__ip6_dst_store for those (everyone except UDP/datagram) that don't
need an extra lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-08-02 13:38:14 -07:00