This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value. This patch implements
the handling of O_CLOEXEC for the flag. I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation. I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.
The implementation introduces do_pipe_flags. I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code. To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags. Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_pipe2
# ifdef __x86_64__
# define __NR_pipe2 293
# elif defined __i386__
# define __NR_pipe2 331
# else
# error "need __NR_pipe2"
# endif
#endif
int
main (void)
{
int fd[2];
if (syscall (__NR_pipe2, fd, 0) != 0)
{
puts ("pipe2(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
{
puts ("pipe2(O_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based upon a report by Daniel Smolik.
We do it too early, which triggers a BUG in
cpufreq_register_notifier().
Signed-off-by: David S. Miller <davem@davemloft.net>
We're calling request_irq() with a IRQs disabled.
No straightforward fix exists because we want to
enable these IRQs and setup state atomically before
getting into the IRQ handler the first time.
What happens now is that we mark the VIRQ to not be
automatically enabled by request_irq(). Then we
make explicit enable_irq() calls when we grab the
LDC channel.
This way we don't need to call request_irq() illegally
under the LDC channel lock any more.
Bump LDC version and release date.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
remove CONFIG_KMOD from core kernel code
remove CONFIG_KMOD from lib
remove CONFIG_KMOD from sparc64
rework try_then_request_module to do less in non-modular kernels
remove mention of CONFIG_KMOD from documentation
make CONFIG_KMOD invisible
modules: Take a shortcut for checking if an address is in a module
module: turn longs into ints for module sizes
Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module: reorder struct module to save space on 64 bit builds
module: generic each_symbol iterator function
module: don't use stop_machine for waiting rmmod
One place is just a comment, the other a conditional, unused
inclusion of linux/kmod.h.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This converts all instances of bus_id in the sparc core kernel to use
either dev_set_name(), or dev_name() depending on the need.
This is done in anticipation of removing the bus_id field from struct
driver.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kobjects do not have a limit in name size since a while, so stop
pretending that they do.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
* 'tracing/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (228 commits)
ftrace: build fix for ftraced_suspend
ftrace: separate out the function enabled variable
ftrace: add ftrace_kill_atomic
ftrace: use current CPU for function startup
ftrace: start wakeup tracing after setting function tracer
ftrace: check proper config for preempt type
ftrace: trace schedule
ftrace: define function trace nop
ftrace: move sched_switch enable after markers
ftrace: prevent ftrace modifications while being kprobe'd, v2
fix "ftrace: store mcount address in rec->ip"
mmiotrace broken in linux-next (8-bit writes only)
ftrace: avoid modifying kprobe'd records
ftrace: freeze kprobe'd records
kprobes: enable clean usage of get_kprobe
ftrace: store mcount address in rec->ip
ftrace: build fix with gcc 4.3
namespacecheck: fixes
ftrace: fix "notrace" filtering priority
ftrace: fix printout
...
Today's linux-next build (spac64 allmodconfig) failed like this:
arch/sparc64/kernel/stacktrace.c:50: warning: type defaults to `int' in declaration of `EXPORT_SYMBOL_GPL'
arch/sparc64/kernel/stacktrace.c:50: warning: parameter names (without types) in function declaration
arch/sparc64/kernel/stacktrace.c:50: warning: data definition has no type or storage class
Signed-off-by: Stephen Rothwell <sf@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Also fixes up the sparc code that was assuming this is not a constant.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Andrew Morton reported this against linux-next:
ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Alexander Beregalov reported this build failure:
$ make CROSS_COMPILE=sparc64-unknown-linux-gnu- image modules && sudo
make modules_install
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CALL scripts/checksyscalls.sh
CHK include/linux/compile.h
dnsdomainname: Unknown host
CC arch/sparc64/kernel/sparc64_ksyms.o
arch/sparc64/kernel/sparc64_ksyms.c:116: error: '_mcount' undeclared
here (not in a function)
cc1: warnings being treated as errors
arch/sparc64/kernel/sparc64_ksyms.c:116: error: type defaults to 'int'
in declaration of '_mcount'
And bisected it back to:
| commit 395a59d0f8
| Author: Abhishek Sagar <sagar.abhishek@gmail.com>
| Date: Sat Jun 21 23:47:27 2008 +0530
|
| ftrace: store mcount address in rec->ip
the mcount prototype is only available under CONFIG_FTRACE,
extend it to CONFIG_MCOUNT as well.
Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's never used and the comments refer to nonatomic and retry
interchangably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds kernel/smp.c which contains helpers for IPI function calls. In
addition to supporting the existing smp_call_function() in a more efficient
manner, it also adds a more scalable variant called smp_call_function_single()
for calling a given function on a single CPU only.
The core of this is based on the x86-64 patch from Nick Piggin, lots of
changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has
contributed lots of fixes and suggestions as well. Also thanks to
Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage
and getting rid of the data allocation fallback deadlock.
Acked-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Record the address of the mcount call-site. Currently all archs except sparc64
record the address of the instruction following the mcount call-site. Some
general cleanups are entailed. Storing mcount addresses in rec->ip enables
looking them up in the kprobe hash table later on to check if they're kprobe'd.
Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: davem@davemloft.net
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we fully commit to returning back to kernel mode from
a trap, zero out the regs->magic value to prevent false
positives during stack backtraces.
Signed-off-by: David S. Miller <davem@davemloft.net>
The offset to the pt_regs area was wrong, so we weren't
looking at the right location for the magic cookie.
A trap frame is composed of a "struct sparc_stackf" then
a "struct pt_regs", the code was using "struct reg_window"
instead of "struct sparc_stackf".
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the silly way I set up the initial stack for
new kernel threads, there is a loop at the top of the
stack.
To fix this, properly add another stack frame that is copied
from the parent and terminate it in the child by setting
the frame pointer in that frame to zero.
Signed-off-by: David S. Miller <davem@davemloft.net>
When a cpu really is stuck in the kernel, it can be often
impossible to figure out which cpu is stuck where. The
worst case is when the stuck cpu has interrupts disabled.
Therefore, implement a global cpu state capture that uses
SMP message interrupts which are not disabled by the
normal IRQ enable/disable APIs of the kernel.
As long as we can get a sysrq 'y' to the kernel, we can
get a dump. Even if the console interrupt cpu is wedged,
we can trigger it from userspace using /proc/sysrq-trigger
The output is made compact so that this facility is more
useful on high cpu count systems, which is where this
facility will likely find itself the most useful :)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just like mmap, we need to validate address ranges regardless
of MAP_FIXED.
sparc{,64}_mmap_check()'s flag argument is unused, remove.
Based upon a report and preliminary patch by
Jan Lieskovsky <jlieskov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So, forever, we've had this ptrace_signal_deliver implementation
which tries to handle all of the nasties that can occur when the
debugger looks at a process about to take a signal. It's meant
to address all of these issues inside of the kernel so that the
debugger need not be mindful of such things.
Problem is, this doesn't work.
The idea was that we should do the syscall restart business first, so
that the debugger captures that state. Otherwise, if the debugger for
example saves the child's state, makes the child execute something
else, then restores the saved state, we won't handle the syscall
restart properly because we lose the "we're in a syscall" state.
The code here worked for most cases, but if the debugger actually
passes the signal through to the child unaltered, it's possible that
we would do a syscall restart when we shouldn't have.
In particular this breaks the case of debugging a process under a gdb
which is being debugged by yet another gdb. gdb uses sigsuspend
to wait for SIGCHLD of the inferior, but if gdb itself is being
debugged by a top-level gdb we get a ptrace_stop(). The top-level gdb
does a PTRACE_CONT with SIGCHLD to let the inferior gdb see the
signal. But ptrace_signal_deliver() assumed the debugger would cancel
out the signal and therefore did a syscall restart, because the return
error was ERESTARTNOHAND.
Fix this by simply making ptrace_signal_deliver() a nop, and providing
a way for the debugger to control system call restarting properly:
1) Report a "in syscall" software bit in regs->{tstate,psr}.
It is set early on in trap entry to a system call and is fully
visible to the debugger via ptrace() and regsets.
2) Test this bit right before doing a syscall restart. We have
to do a final recheck right after get_signal_to_deliver() in
case the debugger cleared the bit during ptrace_stop().
3) Clear the bit in trap return so we don't accidently try to set
that bit in the real register.
As a result we also get a ptrace_{is,clear}_syscall() for sparc32 just
like sparc64 has.
M68K has this same exact bug, and is now the only other user of the
ptrace_signal_deliver hook. It needs to be fixed in the same exact
way as sparc.
Signed-off-by: David S. Miller <davem@davemloft.net>
Forever we had a PTRACE_SUNOS_DETACH which was unconditionally
recognized, regardless of the personality of the process.
Unfortunately, this value is what ended up in the GLIBC sys/ptrace.h
header file on sparc as PTRACE_DETACH and PT_DETACH.
So continue to recognize this old value. Luckily, it doesn't conflict
with anything we actually care about.
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be more liberal about the alignment of the buffer given to
us by sigaltstack(). The user should not need to be mindful of all of
the alignment constraints we have for the stack frame.
This mirrors how we handle this situation in clone() as well.
Also, we align the stack even in non-SA_ONSTACK cases so that signals
due to bad stack alignment can be delivered properly. This makes such
errors easier to debug and recover from.
Finally, add the sanity check x86 has to make sure we won't overflow
the signal stack.
This fixes glibc testcases nptl/tst-cancel20.c and
nptl/tst-cancelx20.c
Signed-off-by: David S. Miller <davem@davemloft.net>
We clobber %i1 as well as %i0 for these system calls,
because they give two return values.
Therefore, on error, we have to restore %i1 properly
or else the restart explodes since it uses the wrong
arguments.
This fixes glibc's nptl/tst-eintr1.c testcase.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 2664ef44cf.
Ingo moved around where the softlockup dependency sits
so this change is no longer necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
The change I put into copy_thread() just papered over the real
problem.
When we are looking to see if we should do a syscall restart, when
deliverying a signal, we should only interpret the syscall return
value as an error if the carry condition code(s) are set.
Otherwise it's a success return.
Also, sigreturn paths should do a pt_regs_clear_trap_type().
It turns out that doing a syscall restart when returning from a fork()
does and should happen, from time to time. Even if copy_thread()
returns success, copy_process() can still unwind and signal
-ERESTARTNOINTR in the parent.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just creates confusion, errors, and bugs.
For one thing, this can cause dup sysfs or procfs nodes to get
created:
[ 1.198015] proc_dir_entry '00.0' already registered
[ 1.198036] Call Trace:
[ 1.198052] [00000000004f2534] create_proc_entry+0x7c/0x98
[ 1.198092] [00000000005719e4] pci_proc_attach_device+0xa4/0xd4
[ 1.198126] [00000000007d991c] pci_proc_init+0x64/0x88
[ 1.198158] [00000000007c62a4] kernel_init+0x190/0x330
[ 1.198183] [0000000000426cf8] kernel_thread+0x38/0x48
[ 1.198210] [00000000006a0d90] rest_init+0x18/0x5c
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove dulicated include file <asm/timer.h> in arch/sparc64/kernel/smp.c.
Signed-off-by: Huang Weiyi <hwy@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current limitations:
1) On SMP single stepping has some fundamental issues,
shared with other sw single-step architectures such
as mips and arm.
2) On 32-bit sparc we don't support SMP kgdb yet. That
requires some reworking of the IPI mechanisms and
infrastructure on that platform.
Signed-off-by: David S. Miller <davem@davemloft.net>
entry.S was a hodge-podge of several totally unrelated
sets of assembler routines, ranging from FPU trap handlers
to hypervisor call functions.
Split it up into topic-sized pieces.
Signed-off-by: David S. Miller <davem@davemloft.net>