"struct proto" currently uses an array stats[NR_CPUS] to track change on
'inuse' sockets per protocol.
If NR_CPUS is big, this means we use a big memory area for this.
Moreover, all this memory area is located on a single node on NUMA
machines, increasing memory pressure on the boot node.
In this patch, I tried to :
- Keep a fast !CONFIG_SMP implementation
- Keep a fast CONFIG_SMP implementation for often used protocols
(tcp,udp,raw,...)
- Introduce a NUMA efficient implementation
Some helper macros are defined in include/net/sock.h
These macros take into account CONFIG_SMP
If a "struct proto" is declared without using DEFINE_PROTO_INUSE /
REF_PROTO_INUSE
macros, it will automatically use a default implementation, using a
dynamically allocated percpu zone.
This default implementation will be NUMA efficient, but might use 32/64
bytes per possible cpu
because of current alloc_percpu() implementation.
However it still should be better than previous implementation based on
stats[NR_CPUS] field.
When a "struct proto" is changed to use the new macros, we use a single
static "int" percpu variable,
lowering the memory and cpu costs, still preserving NUMA efficiency.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes made on 18-sep to fix skb handling in the pppol2tp driver
broke the transmit and receive paths. Users are only running into this
now because distros are now using 2.6.23 and I must have messed up
when I tested the change.
For receive, we now do our own calculation of how much to pull from
the skb (variable length L2TP header) rather than using
skb_transport_offset(). Also, if the skb isn't a data packet, it must
be passed back to UDP with skb->data pointing to the UDP header.
For transmit, make sure skb->sk is set up because ip_queue_xmit()
needs it.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The #idfed CONFIG_IP_MROUTE is sometimes places inside the if-s,
which looks completely bad. Similar ifdefs inside the functions
looks a bit better, but they are also not recommended to be used.
Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the checksum verification is postponed till user calls recv or poll,
the inrementation of Udp6InErrors counter should be also postponed.
Currently, it is postponed in non-blocking operation case. However it
should be postponed in all case like the IPv4 code.
Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip6_push_pending_frames and ip6_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~100 bytes from the .text
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ip_push_pending_frames and ip_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~80 bytes from the .text
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix --arp-gratuitous matching dependence on --arp-ip-{src,dst}
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Lutz Preßler <Lutz.Pressler@SerNet.DE>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code is using knowledge that nf_sockopt_ops::list list_head is first
field in structure by using casts. Switch to list_for_each_entry()
itetators while I am at it.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by Paul McKenney, the rcu_dereference calls in the init path
of NAT modules are unneeded, remove them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sort matches and targets in the NF makefiles.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sort matches and targets in the Kbuild file.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transfer all my copyright over to our company.
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I plan to kill ->get_info which means killing proc_net_create().
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have nothing to do here, but there are continually drivers that
fail to build without it. Stub it in.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Follows the changes of some of the other RTC drivers. If the tm
value is bogus, just zero it out. Adds some sanity for RTC_RD_TIME.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
After the fresh boot:
ionice -c3 -p $$
echo cfq >> /sys/block/XXX/queue/scheduler
dd if=/dev/XXX of=/dev/null bs=512 count=1
Now dd hangs in D state and the queue is completely stalled for approximately
INITIAL_JIFFIES + CFQ_IDLE_GRACE jiffies. This is because cfq_init_queue()
forgets to initialize cfq_data->last_end_request.
(I guess this patch is not complete, overflow is still possible)
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Spotted by Nick <gentuu@gmail.com>, hopefully can explain the second trace in
http://bugzilla.kernel.org/show_bug.cgi?id=9180.
If ->async_idle_cfqq != NULL cfq_put_async_queues() puts it IOPRIO_BE_NR times
in a loop. Fix this.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently if rtc_device_register() fails we have an IS_ERR() on
the wrong pointer, which causes this to always be skipped. Fix
this up to actually check the right pointer. The return value
was always correct, even though the check was wrong.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Access size to LED is not added on Solution Engine series.
LED doesn't work. Fixed this problem.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
When SH7710 and SH7712 are used, SCI_NPORTS redefined.
Remove it.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Now that copy_to_user_page()/copy_from_user_page() are wired up, we
can drop the old __copy_xxx() implementations. Now that the page
colouring scheme has changed via kmap_coherent(), we can avoid the
flush in these specific helpers.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This moves copy_{to,from}_user_page() out-of-line on SH-4 and
converts for the kmap_coherent() API. Based on the MIPS
implementation.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
With the kmap_coherent() API in place, this is trivial to implement,
and lets us avoid the cache flush in certain cases.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The ST40 stuff in-tree hasn't built for some time, and hasn't been
updated for over 3 years. ST maintains their own out-of-tree changes
and rebases occasionally, and that's ultimately where all of the ST40
users go anyways.
In order for the ST40 code to be brought up to date most of the stuff
removed in this changeset would have to be rewritten anyways, so there's
very little benefit in keeping the remnants around either.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Currently these are only being exported for CONFIG_CPU_SH4. This
invariably breaks when building for an SH-3 that includes multiple
targets in multilib.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This was all reworked some time ago, the old debug_enter was ripped
out with everything going through a debug trap jump table instead.
Kill off the debug_enter target and reference kgdb_handle_exception
directly.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
SH-4A parts generally don't have any use for this, and it requires an
alternate implementation anyways. Leave this as an SH-4 only option,
as that's the only place this has been needed in the past.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
When uImage is made by using 'make uImage', zImage is used.
If zImage is used, the compression method need not be set.
However, it is set for "gzip" for a compression method.
I corrected to set "none".
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The common linux/ptrace.h already defines PTRACE_O_TRACESYSGOOD so there is no
need to have arches do it. This also keeps glibc-2.7 from breaking since it
has an enum for the PTRACE_O_* flags.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Since patch "fw-sbp2: use an own workqueue (fix system responsiveness)"
increased parallelism between fw-sbp2 and fw-core, it was possible that
fw-sbp2 didn't release the SCSI device when the FireWire device was
disconnected.
This happened if sbp2_update() ran during sbp2_login(), because a bus
reset occurred during sbp2_login(). The sbp2_login() work would [try
to] reschedule itself because it failed due to the bus reset, and it
would _not_ drop its reference on the target. However, sbp2_update()
would schedule sbp2_login() too before sbp2_login() rescheduled itself
and hence sbp2_update() would take an additional reference. And then
we would have one reference too many.
The fix is to _always_ drop the reference when leaving the sbp2_login()
work. If the sbp2_login() work reschedules itself, it takes a
reference, but only if it wasn't already rescheduled by sbp2_update().
Ditto in the sbp2_reconnect() work.
The resulting code is actually simpler than before: We _always_ take
a reference when successfully scheduling work. And we _always_ drop
a reference when leaving a workqueue job. No exceptions.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Clean up /proc/interrupts output on the system that has 10 or more
CPUs.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When the CPE handler encounters too many CPEs (such as a solid single
bit memory error), it sets up a polling timer and disables the CPE
interrupt (to avoid excessive overhead logging the stream of single
bit errors). disable_irq_nosync() calls chip->disable() to provide
a chipset specifiec interface for disabling the interrupt. This patch
adds the Altix specific support to disable and re-enable the CPE interrupt.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
No need to print "McKinley Errata 9 workaround not needed; disabling it"
on every non-McKinley Itanium, which at this point is almost all of them.
Signed-off-by: Russ Anderson (rja@sgi.com)
Signed-off-by: Tony Luck <tony.luck@intel.com>
If you build the kernel `in-place' then do a git update, git
complains about arch/ia64/kernel/gate.lds being modified and
untracked.
Add that (generated) file to a .gitignore file.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
If another node unlinks the destination while ocfs2_rename() is waiting on a
cluster lock, ocfs2_rename() simply logs an error and continues. This causes
a crash because the renaming node is now trying to delete a non-existent
inode. The correct solution is to return -ENOENT.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
We should subtract start of our IO from PAGE_CACHE_SIZE to get the right
length of the write we want to perform.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
On file systems which don't support sparse files, Ocfs2_map_page_blocks()
was reading blocks on appending writes. This caused write performance to
suffer dramatically. Fix this by detecting an appending write on a nonsparse
fs and skipping the read.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>