When NAT helpers change the TCP packet size, the highest seen sequence
number needs to be corrected. This is currently only done upwards, when
the packet size is reduced the sequence number is unchanged. This causes
TCP conntrack to falsely detect unacknowledged data and decrease the
timeout.
Fix by updating the highest seen sequence number in both directions after
packet mangling.
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
While testing the driver on PPC, we ran into a crash with LRO, Jumbo frames.
With CONFIG_PPC_64K_PAGES configured (a default in PPC), MAX_SKB_FRAGS drops to 3 and we were crossing the array limits on skb_shinfo(skb)->frags[].
Now we coalesce the frags from the same physical page into one slot in
skb_shinfo(skb)->frags[] and go to the next index when the frag is from
different physical page.
This patch is against the net-2.6 tree.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When NAPI is disabled while we're in net_rx_action, we end up
calling __napi_complete without flushing GRO packets. This is
a bug as it would cause the GRO packets to linger, of course it
also literally BUGs to catch error like this :)
This patch changes it to napi_complete, with the obligatory IRQ
reenabling. This should be safe because we've only just disabled
IRQs and it does not materially affect the test conditions in
between.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As transparent proxying looks up the socket early and assigns
it to the skb for later processing, we must drop any existing
socket ownership prior to that in order to distinguish between
the case where tproxy is active and where it is not.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mac80211 module uses rcu_call() thus it should use rcu_barrier()
on module unload.
The rcu_barrier() is placed in mech.c ieee80211_stop_mesh() which is
invoked from ieee80211_stop() in case vif.type == NL80211_IFTYPE_MESH_POINT.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sunrpc module uses rcu_call() thus it should use rcu_barrier() on
module unload.
Have not verified that the possibility for new call_rcu() callbacks
has been disabled. As a hint for checking, the functions calling
call_rcu() (unx_destroy_cred and generic_destroy_cred) are
registered as crdestroy function pointer in struct rpc_credops.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When unloading modules that uses call_rcu() callbacks, then we must
use rcu_barrier(). This module uses syncronize_net() which is not
enough to be sure that all callback has been completed.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ipv6 module uses rcu_call() thus it should use rcu_barrier() on
module unload.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The decnet module unloading as been disabled with a '#if 0' statement,
because it have had issues.
We add a rcu_barrier() anyhow for correctness.
The maintainer (Chrissie Caulfield) will look into the unload issue
when time permits.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Chrissie Caulfield <christine.caulfield@googlemail.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
sky2 driver on PowerPC targets floods kernel log with following errors:
eth1: hw csum failure.
Call Trace:
[ef84b8a0] [c00075e4] show_stack+0x50/0x160 (unreliable)
[ef84b8d0] [c02fa178] netdev_rx_csum_fault+0x3c/0x5c
[ef84b8f0] [c02f6920] __skb_checksum_complete_head+0x7c/0x84
[ef84b900] [c02f693c] __skb_checksum_complete+0x14/0x24
[ef84b910] [c0337e08] tcp_v4_rcv+0x4c8/0x6f8
[ef84b940] [c031a9c8] ip_local_deliver+0x98/0x210
[ef84b960] [c031a788] ip_rcv+0x38c/0x534
[ef84b990] [c0300338] netif_receive_skb+0x260/0x36c
[ef84b9c0] [c025de00] sky2_poll+0x5dc/0xcf8
[ef84ba20] [c02fb7fc] net_rx_action+0xc0/0x144
The NIC is Yukon-2 EC chip revision 1.
Converting checksum field from le16 to CPU byte order fixes the issue.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing GPL flag and description.
mdio: module license 'unspecified' taints kernel.
Disabling lock debugging due to kernel taint
Signed-off-by: Nicolas Reinecke <nr <at> das-labor.org>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the case when ucc_geth or gianfar are compiled
as modules. Without this patch the call to phy_connect() fails.
Signed-off-by: Ionut Nicu <ionut.nicu@freescale.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid showing wrong high values when the preferred lifetime of an address
is expired.
Signed-off-by: Jens Rosenboom <me@jayr.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC0793 defined that in FIN-WAIT-2 state if the ACK bit is off drop
the segment and return[Page 72]. But this check is missing in function
tcp_timewait_state_process(). This cause the segment with FIN flag but
no ACK has two diffent action:
Case 1:
Node A Node B
<------------- FIN,ACK
(enter FIN-WAIT-1)
ACK ------------->
(enter FIN-WAIT-2)
FIN -------------> discard
(move sk to tw list)
Case 2:
Node A Node B
<------------- FIN,ACK
(enter FIN-WAIT-1)
ACK ------------->
(enter FIN-WAIT-2)
(move sk to tw list)
FIN ------------->
<------------- ACK
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU barriers, rcu_barrier(), is inserted two places.
In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
kmem_cache_destroy(). Firstly to make sure the callback to the
nf_ct_expect_free_rcu() code is still around. Secondly because I'm
unsure about the consequence of having in flight
nf_ct_expect_free_rcu/kmem_cache_free() calls while doing a
kmem_cache_destroy() slab destroy.
And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
invoked by __nf_ct_ext_add(). It might be more efficient to call
rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
thats make it more difficult to read the code (as the callback code
in located in nf_conntrack_extend.c).
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Tell PCI core that atl1* device can wakeup the system when WOL is
enabled by calling device_set_wakeup_enable.
Joerg noted that his atl1e device WOL fine after enabling it with
ethtool and changing /sys/class/net/eth0/device/power/wakeup to enabled
Tested on atl1e: https://bugzilla.novell.com/show_bug.cgi?id=493214
Tested by: Joerg Reuter <jreuter@novell.com>
Signed-off-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netlink address deletion events were not sent when a network device
vanished neither when Phonet was unloaded.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit ae0e8e8220.
This change had two problems:
1) Since it frees the stats in the drivers' close method, we
can OOPS in the transmit routine.
2) stats are no longer remembered across ifdown/ifup which
disagrees with how every other device operates.
Thanks to analysis and test patch from Serge E. Hallyn
and initial OOPS report by Sachin Sant.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes and obvious typo in the netdev_ops initialization:
ndo_so_ioctl should be ndo_do_ioctl.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our CAST algorithm is called cast5, not cast128. Clearly nobody
has ever used it :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 9e9f46c44e.
Quoting from the commit message:
"At this point, it seems to solve more problems than it causes, so let's
try using it by default. It's an easy revert if it ends up causing
trouble."
And guess what? The _CRS code causes trouble.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/battery-2.6:
da9030_battery: Fix race between event handler and monitor
Add MAX17040 Fuel Gauge driver
w1: ds2760_battery: add support for sleep mode feature
w1: ds2760: add support for EEPROM read and write
ds2760_battery: cleanups in ds2760_battery_probe()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
another race fix in jfs_check_acl()
Get "no acls for this inode" right, fix shmem breakage
inline functions left without protection of ifdef (acl)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
audit: inode watches depend on CONFIG_AUDIT not CONFIG_AUDIT_SYSCALL
Even though one cannot make use of the audit watch code without
CONFIG_AUDIT_SYSCALL the spaghetti nature of the audit code means that
the audit rule filtering requires that it at least be compiled.
Thus build the audit_watch code when we build auditfilter like it was
before cfcad62c74
Clearly this is a point of potential future cleanup..
Reported-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
commit 64d1304a64 (futex: setup writeable mapping for futex ops which
modify user space data) did address only half of the problem of write
access faults.
The patch was made on two wrong assumptions:
1) access_ok(VERIFY_WRITE,...) would actually check write access.
On x86 it does _NOT_. It's a pure address range check.
2) a RW mapped region can not go away under us.
That's wrong as well. Nobody can prevent another thread to call
mprotect(PROT_READ) on that region where the futex resides. If that
call hits between the get_user_pages_fast() verification and the
actual write access in the atomic region we are toast again.
The solution is to not rely on access_ok and get_user() for any write
access related fault on private and shared futexes. Instead we need to
fault it in with verification of write access.
There is no generic non destructive write mechanism which would fault
the user page in trough a #PF, but as we already know that we will
fault we can as well call get_user_pages() directly and avoid the #PF
overhead.
If get_user_pages() returns -EFAULT we know that we can not fix it
anymore and need to bail out to user space.
Remove a bunch of confusing comments on this issue as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
SLUB uses higher order allocations by default but falls back to small
orders under memory pressure. Make sure the GFP mask used in the initial
allocation doesn't include __GFP_NOFAIL.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Traditionally, we never failed small orders (even regardless of any
__GFP_NOFAIL flags), and slab will allocate order-1 allocations even for
small allocations that could fit in a single page (in order to avoid
excessive fragmentation).
Maybe we should remove this warning entirely, but before making that
judgement, at least limit it to bigger allocations.
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
Staging: octeon-ethernet: Fix race freeing transmit buffers.
Staging: octeon-ethernet: Convert to use net_device_ops.
MIPS: Cavium: Add CPU hotplugging code.
MIPS: SMP: Allow suspend and hibernation if CPU hotplug is available
MIPS: Add arch generic CPU hotplug
DMA: txx9dmac: use dma_unmap_single if DMA_COMPL_{SRC,DEST}_UNMAP_SINGLE set
MIPS: Sibyte: Fix build error if CONFIG_SERIAL_SB1250_DUART is undefined.
MIPS: MIPSsim: Fix build error if MSC01E_INT_BASE is undefined.
MIPS: Hibernation: Remove SMP TLB and cacheflushing code.
MIPS: Build fix - include <linux/smp.h> into all smp_processor_id() users.
MIPS: bug.h Build fix - include <linux/compiler.h>.
The existing code had the following race:
Thread-1 Thread-2
inc/read in_use
inc/read in_use
inc tx_free_list[qos].len
inc tx_free_list[qos].len
The actual in_use value was incremented twice, but thread-1 is going
to free memory based on its stale value, and will free one too many
times. The result is that memory is freed back to the kernel while
its packet is still in the transmit buffer. If the memory is
overwritten before it is transmitted, the hardware will put a valid
checksum on it and send it out (just like it does with good packets).
If by chance the TCP flags are clobbered but not the addresses or
ports, the result can be a broken TCP stream.
The fix is to track the number of freed packets in a single location
(a Fetch-and-Add Unit register). That way it can never get out of sync
with itself.
We try to free up to MAX_SKB_TO_FREE (currently 10) buffers at a time.
If fewer are available we adjust the free count with the difference.
The action of claiming buffers to free is atomic so two threads cannot
claim the same buffers.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Convert the driver to use net_device_ops as it is now mandatory.
Also compensate for the removal of struct sk_buff's dst field.
The changes are mostly mechanical, the content of ethernet-common.c
was moved to ethernet.c and ethernet-common.{c,h} are removed.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The SMP implementation of suspend and hibernate depends on CPU hotplugging.
In the past we didn't have CPU hotplug so suspend and hibernation were not
possible on SMP systems.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Each platform has to add support for CPU hotplugging itself by providing
suitable definitions for the cpu_disable and cpu_die of the smp_ops
methods and setting SYS_SUPPORTS_HOTPLUG_CPU. A platform should only set
SYS_SUPPORTS_HOTPLUG_CPU once all it's smp_ops definitions have the
necessary changes. This patch contains the changes to the dummy smp_ops
definition for uni-processor systems.
Parts of the code contributed by Cavium Inc.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch does not change actual behaviour since dma_unmap_page is just
an alias of dma_unmap_single on MIPS.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
We can't perform any flushes on SMP from swsusp_arch_resume because
interrupts are disabled. A cross-CPU flush is unnecessary anyway
because all but the local CPU have already been disabled. A local
flush is not needed either because we didn't change any mappings. So
just delete the code.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Some of the were relying into smp.h being dragged in by another header
which of course is fragile. <asm/cpu-info.h> uses smp_processor_id()
only in macros and including smp.h there leads to an include loop, so
don't change cpu-info.h.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (48 commits)
dm mpath: change to be request based
dm: disable interrupt when taking map_lock
dm: do not set QUEUE_ORDERED_DRAIN if request based
dm: enable request based option
dm: prepare for request based option
dm raid1: add userspace log
dm: calculate queue limits during resume not load
dm log: fix create_log_context to use logical_block_size of log device
dm target:s introduce iterate devices fn
dm table: establish queue limits by copying table limits
dm table: replace struct io_restrictions with struct queue_limits
dm table: validate device logical_block_size
dm table: ensure targets are aligned to logical_block_size
dm ioctl: support cookies for udev
dm: sysfs add suspended attribute
dm table: improve warning message when devices not freed before destruction
dm mpath: add service time load balancer
dm mpath: add queue length load balancer
dm mpath: add start_io and nr_bytes to path selectors
dm snapshot: use barrier when writing exception store
...
* 'audit.b63' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
Fix rule eviction order for AUDIT_DIR
Audit: clean up all op= output to include string quoting
Audit: move audit_get_nd completely into audit_watch
audit: seperate audit inode watches into a subfile
Audit: clean up audit_receive_skb
Audit: cleanup netlink mesg handling
Audit: unify the printk of an skb when auditd not around
Audit: dereferencing krule as if it were an audit_watch
Audit: better estimation of execve record length
Audit: fix audit watch use after free
Posting to the dev-etrax mailing list is only allowed for subscribers,
and the list is more geared toward user applications than kernel
developers.
Change to newly created mailing list for CRIS.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>