POSIX states that poll() shall fail with EINVAL if nfds > OPEN_MAX. In
this context, POSIX is referring to sysconf(OPEN_MAX), which is the value
of current->signal->rlim[RLIMIT_NOFILE].rlim_cur in the linux kernel, not
the compile-time constant which happens to also be named OPEN_MAX. In the
current code, an application may poll up to max_fdset file descriptors,
even if this exceeds RLIMIT_NOFILE. The current code also breaks
applications which poll more than max_fdset descriptors, which worked circa
2.4.18 when the check was against NR_OPEN, which is 1024*1024. This patch
enforces the limit precisely as POSIX defines, even if RLIMIT_NOFILE has
been changed at run time with ulimit -n.
To elaborate on the rationale for this, there are three cases:
1) RLIMIT_NOFILE is at the default value of 1024
In this (default) case, the patch changes nothing. Calls with nfds > 1024
fail with EINVAL both before and after the patch, and calls with nfds <=
1024 pass the check both before and after the patch, since 1024 is the
initial value of max_fdset.
2) RLIMIT_NOFILE has been raised above the default
In this case, poll() becomes more permissive, allowing polling up to
RLIMIT_NOFILE file descriptors even if less than 1024 have been opened.
The patch won't introduce new errors here. If an application somehow
depends on poll() failing when it polls with duplicate or invalid file
descriptors, it's already broken, since this is already allowed below 1024,
and will also work above 1024 if enough file descriptors have been open at
some point to cause max_fdset to have been increased above nfds.
3) RLIMIT_NOFILE has been lowered below the default
In this case, the system administrator or the user has gone out of their
way to protect the system from inefficient (or malicious) applications
wasting kernel memory. The current code allows polling up to 1024 file
descriptors even if RLIMIT_NOFILE is much lower, which is not what the user
or administrator intended. Well-written applications which only poll
valid, unique file descriptors will never notice the difference, because
they'll hit the limit on open() first. If an application gets broken
because of the patch in this case, then it was already poorly/maliciously
designed, and allowing it to work in the past was a violation of POSIX and
a DoS risk on low-resource systems.
With this patch, poll() will permit exactly what POSIX suggests, no more,
no less, and for any run-time value set with ulimit -n, not just 256 or
1024. There are existing apps which which poll a large number of file
descriptors, some of which may be invalid, and if those numbers stradle
1024, they currently fail with or without the patch in -mm, though they
worked fine under 2.4.18.
Signed-off-by: Chris Snook <csnook@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For len equal to 4, we never call sppp_lcp_conf_parse_options(),
therefore rmagic does not get initialized.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I've been using systemtap for some debugging and I noticed that it can't
probe a lot of modules. Turns out it's kind of silly, the sections section
of /sys/module is limited to 32byte filenames and many of the actual
sections are a a bit longer than that.
[akpm@osdl.org: rewrite to use dymanic allocation]
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I just got a bounce telling me my contributions aren't welcome.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
People search maintainers for NBD and then decide it is not
maintained.
(akpm: ditto LVM. And other things, but I forget what they were)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This cleans up SubmittingPatches a bit.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Constify two structs.
Correct some typos.
Compile-tested and run-tested (module inserted) on 2.6.18-rc4-mm3.
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch limits the messages when ldisc open faulures happen. It happens
under memory pressure.
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
akpm draws my attention to the fact that sysctl(VM_PAGE_CLUSTER) might
conceivably change page_cluster to 0 while valid_swaphandles() is in the
middle of using it, leading to an embarrassingly long loop: take a local
snapshot of page_cluster and work with that.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ratelimit_pages in page-writeback.c is recalculated (in set_ratelimit())
every time a CPU is hot-added/removed. But this value is not recalculated
when new pages are hot-added.
This patch fixes that problem by calling set_ratelimit() when new pages
are hot-added.
[akpm@osdl.org: cleanups]
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
page-writeback.c has a static local variable "total_pages", which is the
total number of pages in the system.
There is a global variable "vm_total_pages", which is the total number of
pages the VM controls.
Both are assigned from the return value of nr_free_pagecache_pages().
This patch removes the local variable and uses the global variable in that
place.
One more issue with the local static variable "total_pages" is that it is
not updated when new pages are hot-added. Since vm_total_pages is updated
when new pages are hot-added, this patch fixes that problem too.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change corrects the logic on the preprocessor conditionals that
include support for ISA port i/o (/dev/ioports) into the mem character
driver.
This fixes the following error when building for powerpc platforms with
CONFIG_PCI=n.
drivers/built-in.o: undefined reference to `pci_io_base'
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Linas Vepstas <lins@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I think this is a time to step down from my SUPERH architecture
maintainerships. The major development issues for this port seem to shift
on the hardwares I can't access and I have no recent activity on kernel. I
shouldn't qualify as a maintainer of SUPERH port now and there is no
problem because Paul is actively maintaining it. The attached patch drops
my name, address and web URL from MAINTAINERS file.
Signed-off-by: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace current->fs by fs helper variable to reduce some indirection
overhead and (at least at the moment, before the current_thread_info() %gs
PDA improvement is available) get rid of more costly current references.
Reduces fs/namei.o from 37786 to 37082 Bytes (704 Bytes saved).
[akpm@osdl.org: cleanup]
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch (as776) adds a new chapter to Documentation/CodingStyle,
explaining the circumstances under which a function should return 0 for
failure and non-zero for success as opposed to a negative error code for
failure and 0 for success.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Un-inlining rwsem_down_failed_common() (two callsites) reduced lib/rwsem.o
on my Athlon, gcc 4.1.2 from 5935 to 5480 Bytes (455 Bytes saved).
I thus guess that reduced icache footprint (and better function caching) is
worth more than any function call overhead.
Signed-off-by: Andreas Mohr <andi@lisas.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds Jim Lewis to the MAINTAINERS file for the Spidernet
network driver.
Signed-off-by: James K Lewis <jklewis@us.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Forward port of the patch by Solar and ported by Julio.
Compiles, boots, and passes my looptorturetest.sh.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Julio Auto <mindvortex@gmail.com>
Cc: Solar Designer <solar@openwall.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This appears to be a verbatim copy-n-paste of the GPL.
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The cpuset code handling hot unplug of CPUs or Memory Nodes was incorrect -
it could remove a CPU or Node from the top cpuset, while leaving it still
in some child cpusets.
One basic rule of cpusets is that each cpusets cpus and mems are subsets of
its parents. The cpuset hot unplug code violated this rule.
So the cpuset hotunplug handler must walk down the tree, removing any
removed CPU or Node from all cpusets.
However, it is not allowed to make a cpusets cpus or mems become empty.
They can only transition from empty to non-empty, not back.
So if the last CPU or Node would be removed from a cpuset by the above
walk, we scan back up the cpuset hierarchy, finding the nearest ancestor
that still has something online, and copy its CPU or Memory placement.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Nathan Lynch <ntl@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the list of memory nodes allowed to tasks in the top (root) nodeset
to dynamically track what cpus are online, using a call to a cpuset hook
from the memory hotplug code. Make this top cpus file read-only.
On systems that have cpusets configured in their kernel, but that aren't
actively using cpusets (for some distros, this covers the majority of
systems) all tasks end up in the top cpuset.
If that system does support memory hotplug, then these tasks cannot make
use of memory nodes that are added after system boot, because the memory
nodes are not allowed in the top cpuset. This is a surprising regression
over earlier kernels that didn't have cpusets enabled.
One key motivation for this change is to remain consistent with the
behaviour for the top_cpuset's 'cpus', which is also read-only, and which
automatically tracks the cpu_online_map.
This change also has the minor benefit that it fixes a long standing,
little noticed, minor bug in cpusets. The cpuset performance tweak to
short circuit the cpuset_zone_allowed() check on systems with just a single
cpuset (see 'number_of_cpusets', in linux/cpuset.h) meant that simply
changing the 'mems' of the top_cpuset had no affect, even though the change
(the write system call) appeared to succeed. With the following change,
that write to the 'mems' file fails -EACCES, and the 'mems' file stubbornly
refuses to be changed via user space writes. Thus no one should be mislead
into thinking they've changed the top_cpusets's 'mems' when in affect they
haven't.
In order to keep the behaviour of cpusets consistent between systems
actively making use of them and systems not using them, this patch changes
the behaviour of the 'mems' file in the top (root) cpuset, making it read
only, and making it automatically track the value of node_online_map. Thus
tasks in the top cpuset will have automatic use of hot plugged memory nodes
allowed by their cpuset.
[akpm@osdl.org: build fix]
[bunk@stusta.de: build fix]
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A previous patch to allow an exiting task to OOM kill itself (and thereby
avoid a little deadlock) introduced a problem. We don't want the
PF_EXITING task, even if it is 'current', to access mem reserves if there
is already a TIF_MEMDIE process in the system sucking up reserves.
Also make the commenting a little bit clearer, and note that our current
scheme of effectively single threading the OOM killer is not itself
perfect.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- It is not possible to have task->mm == &init_mm.
- task_lock() buys nothing for 'if (!p->mm)' check.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No logic changes, but imho easier to read.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The only one usage of TASK_DEAD outside of last schedule path,
select_bad_process:
for_each_task(p) {
if (!p->mm)
continue;
...
if (p->state == TASK_DEAD)
continue;
...
TASK_DEAD state is set at the end of do_exit(), this means that p->mm
was already set == NULL by exit_mm(), so this task was already rejected
by 'if (!p->mm)' above.
Note also that the caller holds tasklist_lock, this means that p can't
pass exit_notify() and then set TASK_DEAD when p->mm != NULL.
Also, remove open-coded is_init().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I am not sure about this patch, I am asking Ingo to take a decision.
task_struct->state == EXIT_DEAD is a very special case, to avoid a confusion
it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should
live only in ->exit_state as documented in sched.h.
Note that this state is not visible to user-space, get_task_state() masks off
unsuitable states.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD
to ensure that the exiting task will be deactivated. Note that this EXIT_DEAD
is in fact a "random" value, we can use any bit except normal TASK_XXX values.
It is better to set this state in do_exit() along with PF_DEAD flag and remove
that check in schedule().
We are safe wrt concurrent try_to_wake_up() (for example ptrace, tkill), it
can not change task's ->state: the 'state' argument of try_to_wake_up() can't
have EXIT_DEAD bit. And in case when try_to_wake_up() sees a stale value of
->state == TASK_RUNNING it will do nothing.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce the disable_irq_nosync_lockdep_irqsave() and
enable_irq_lockdep_irqrestore() APIs. These are needed for NE2000; basically
NE2000 calls disable_irq and enable_irq as locking against the IRQ handler,
but both in cases where interrupts are on and off. This means that lockdep
needs to track the old state of the virtual irq flags on disable_irq, and
restore these at enable_irq time.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If register_filesystem() fails mux workqueue must be killed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@lanl.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It always returns 0, so relying on it is useless. The only caller isn't
checking return value. In general, un-, de-, -free functions should return
void.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If register_filesystem() fails, vxfs_inode cache must be destroyed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently, __acquire and __release take a lock expression, but __cond_lock
takes only a condition, not the lock acquired if the expression evaluates
to true. Change __cond_lock to accept a lock expression, and change all
the callers to pass in a lock expression.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix "quiet" parameter doc. No trailing '=' sign, no value after it. And
it disables "most" kernel messages, not all of them.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
At the beginning of the routine, "copied" is set to 0, but it is no good
because in lines 805 and 812 it is set to other values. Finally, the
routine returns as if it copied 12 (=ENOMEM) bytes less than it actually
did.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the case below we are locking the whole disk not a partition. This
change simply brings the code in line with the piece above where when we
are the 'first' opener, and we are a partition.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
spin_trylock_irq and spin_trylock_irqsave use _spin_trylock, which does not
use the __cond_lock wrapper annotation and thus does not affect the lock
context; change them to use spin_trylock instead, which does use
__cond_lock.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The lock annotations used on spinlocks and rwlocks currently use
__{acquires,releases}(spinlock_t) and __{acquires,releases}(rwlock_t),
respectively. This loses the information of which lock actually got
acquired or released, and assumes a different type for the parameter of
__acquires and __releases than the rest of the kernel. While the current
implementations of __acquires and __releases throw away their argument,
this will not always remain the case. Change this to use the lock
parameter instead, to preserve this information and increase consistency in
usage of __acquires and __releases.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If your driver implements "break on" and "break off" this ensures you won't
get multiple overlapping requests or requests in parallel. If your driver
has its own break handling then its still your problem as the driver
author.
Break is also now serialized against writes from user space properly but no
new guarantees are made driver level about writes from the line discipline
itself (eg flow control or echo)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[akpm@osdl.org: build fix]
[akpm@osdl.org: warning fix]
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adds a missing exit, if the file that should be parsed couldn't be opened.
Without it crashes with a segfault, cause the filedescriptor is accessed
even if the file could not be opened.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It is ok to do find_task_by_pid() + get_task_struct() under
rcu_read_lock(), we cand drop tasklist_lock.
Note that testing of ->exit_state is racy with or without tasklist anyway.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
During testing I've found that the mount pending flag can be left set at
exit from autofs4_lookup after a failed mount request. This shouldn't be
allowed to happen and causes incorrect error returns.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The check for an empty directory in the autofs4_follow_link method fails
occassionally due to old dentrys. We had the same problem
autofs4_revalidate ages ago. I thought we wouldn't need this in
autofs4_follow_link, silly me.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>