Commit Graph

157377 Commits

Author SHA1 Message Date
Linus Torvalds
b71b7dc09a Merge branch 'fix/oxygen' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'fix/oxygen' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  sound: oxygen: handle cards with missing EEPROM
  sound: oxygen: fix MCLK rate for 192 kHz playback
2009-09-05 14:55:30 -07:00
Linus Torvalds
59430c2f43 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  tc: Fix unitialized kernel memory leak
  pkt_sched: Revert tasklet_hrtimer changes.
  net: sk_free() should be allowed right after sk_alloc()
  gianfar: gfar_remove needs to call unregister_netdev()
  ipw2200: firmware DMA loading rework
2009-09-05 14:52:41 -07:00
Linus Torvalds
e9ee3a54a1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: skcipher - Fix skcipher_dequeue_givcrypt NULL test
2009-09-05 14:51:45 -07:00
Linus Torvalds
3bb314f01c Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Re-enable cpufreq suspend and resume code
2009-09-05 14:51:24 -07:00
Linus Torvalds
535e0c1726 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] fix csum_ipv6_magic()
  [IA64] Fix warning in dma-mapping.c
2009-09-05 14:50:53 -07:00
Linus Torvalds
0edfa2b1b5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: actually enable the swapext compat handler
2009-09-05 14:25:14 -07:00
Linus Torvalds
5a09adf130 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key
2009-09-05 14:24:33 -07:00
Linus Torvalds
931f70350e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: don't assume existence of cpu0
2009-09-05 14:22:00 -07:00
Linus Torvalds
e305fc5ecd Merge branch 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU
2009-09-05 13:57:53 -07:00
Linus Torvalds
154f807e55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm snapshot: fix on disk chunk size validation
  dm exception store: split set_chunk_size
  dm snapshot: fix header corruption race on invalidation
  dm snapshot: refactor zero_disk_area to use chunk_io
  dm log: userspace add luid to distinguish between concurrent log instances
  dm raid1: do not allow log_failure variable to unset after being set
  dm log: remove incorrect field from userspace table output
  dm log: fix userspace status output
  dm stripe: expose correct io hints
  dm table: add more context to terse warning messages
  dm table: fix queue_limit checking device iterator
  dm snapshot: implement iterate devices
  dm multipath: fix oops when request based io fails when no paths
2009-09-05 13:51:07 -07:00
Linus Torvalds
9b6a3df372 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI SR-IOV: correct broken resource alignment calculations
2009-09-05 13:50:46 -07:00
Linus Torvalds
d3acd16cda Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix bootup with mcount in some configs.
  sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.
2009-09-05 13:49:06 -07:00
Linus Torvalds
93697a3cab Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_counter/powerpc: Fix cache event codes for POWER7
  perf_counter: Fix /0 bug in swcounters
  perf_counters: Increase paranoia level
2009-09-05 13:48:37 -07:00
Linus Torvalds
6399534472 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: atkbd - add Compaq Presario R4000-series repeat quirk
  Input: i8042 - add Acer Aspire 5536 to the nomux list
2009-09-05 13:41:29 -07:00
Nicolas Pitre
9de6886ec6 ext2: fix unbalanced kmap()/kunmap()
In ext2_rename(), dir_page is acquired through ext2_dotdot().  It is
then released through ext2_set_link() but only if old_dir != new_dir.
Failing that, the pkmap reference count is never decremented and the
page remains pinned forever.  Repeat that a couple times with highmem
pages and all pkmap slots get exhausted, and every further kmap() calls
end up stalling on the pkmap_map_wait queue at which point the whole
system comes to a halt.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 13:41:08 -07:00
Linus Torvalds
ac7ac9f2b9 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: ocfs2_write_begin_nolock() should handle len=0
  ocfs2: invalidate dentry if its dentry_lock isn't initialized.
2009-09-05 13:38:37 -07:00
Linus Torvalds
ac89a9174d pty: don't limit the writes to 'pty_space()' inside 'pty_write()'
The whole write-room thing is something that is up to the _caller_ to
worry about, not the pty layer itself.  The total buffer space will
still be limited by the buffering routines themselves, so there is no
advantage or need in having pty_write() artificially limit the size
somehow.

And what happened was that the caller (the n_tty line discipline, in
this case) may have verified that there is room for 2 bytes to be
written (for NL -> CRNL expansion), and it used to then do those writes
as two single-byte writes.  And if the first byte written (CR) then
caused a new tty buffer to be allocated, pty_space() may have returned
zero when trying to write the second byte (LF), and then incorrectly
failed the write - leading to a lost newline character.

This should finally fix

	http://bugzilla.kernel.org/show_bug.cgi?id=14015

Reported-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 13:27:10 -07:00
Linus Torvalds
37f81fa1f6 n_tty: do O_ONLCR translation as a single write
When translating CR to CRNL in the n_tty line discipline, we did it as
two tty_put_char() calls.  Which works, but is stupid, and has caused
problems before too with bad interactions with the write_room() logic.
The generic USB serial driver had that problem, for example.

Now the pty layer had similar issues after being moved to the generic
tty buffering code (in commit d945cb9cce:
"pty: Rework the pty layer to use the normal buffering logic").

So stop doing the silly separate two writes, and do it as a single write
instead.  That's what the n_tty layer already does for the space
expansion of tabs (XTABS), and it means that we'll now always have just
a single write for the CRNL to match the single 'tty_write_room()' test,
which hopefully means that the next time somebody screws up buffering,
it won't cause weeks of debugging.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 12:46:07 -07:00
Oleg Nesterov
a2a8474c3f exec: do not sleep in TASK_TRACED under ->cred_guard_mutex
Tom Horsley reports that his debugger hangs when it tries to read
/proc/pid_of_tracee/maps, this happens since

	"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"
	04b836cbf19e885f8366bccb2e4b0474346c02d

commit in 2.6.31.

But the root of the problem lies in the fact that do_execve() path calls
tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.

The tracee must not sleep in TASK_TRACED holding this mutex.  Even if we
remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
another task doing PTRACE_ATTACH should not hang until it is killed or the
tracee resumes.

With this patch do_execve() does not use ->cred_guard_mutex directly and
we do not hold it throughout, instead:

	- introduce prepare_bprm_creds() helper, it locks the mutex
	  and calls prepare_exec_creds() to initialize bprm->cred.

	- install_exec_creds() drops the mutex after commit_creds(),
	  and thus before tracehook_report_exec()->ptrace_stop().

	  or, if exec fails,

	  free_bprm() drops this mutex when bprm->cred != NULL which
	  indicates install_exec_creds() was not called.

Reported-by: Tom Horsley <tom.horsley@att.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 11:30:42 -07:00
Mel Gorman
dd5d241ea9 page-allocator: always change pageblock ownership when anti-fragmentation is disabled
On low-memory systems, anti-fragmentation gets disabled as fragmentation
cannot be avoided on a sufficiently large boundary to be worthwhile.  Once
disabled, there is a period of time when all the pageblocks are marked
MOVABLE and the expectation is that they get marked UNMOVABLE at each call
to __rmqueue_fallback().

However, when MAX_ORDER is large the pageblocks do not change ownership
because the normal criteria are not met.  This has the effect of
prematurely breaking up too many large contiguous blocks.  This is most
serious on NOMMU systems which depend on high-order allocations to boot.
This patch causes pageblocks to change ownership on every fallback when
anti-fragmentation is disabled.  This prevents the large blocks being
prematurely broken up.

This is a fix to commit 49255c619f [page
allocator: move check for disabled anti-fragmentation out of fastpath] and
the problem affects 2.6.31-rc8.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Tested-by: Paul Mundt <lethal@linux-sh.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 11:30:42 -07:00
David Howells
a190887b58 nommu: fix error handling in do_mmap_pgoff()
Fix the error handling in do_mmap_pgoff().  If do_mmap_shared_file() or
do_mmap_private() fail, we jump to the error_put_region label at which
point we cann __put_nommu_region() on the region - but we haven't yet
added the region to the tree, and so __put_nommu_region() may BUG
because the region tree is empty or it may corrupt the region tree.

To get around this, we can afford to add the region to the region tree
before calling do_mmap_shared_file() or do_mmap_private() as we keep
nommu_region_sem write-locked, so no-one can race with us by seeing a
transient region.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 11:30:42 -07:00
Oleg Nesterov
4e49627b9b workqueues: introduce __cancel_delayed_work()
cancel_delayed_work() has to use del_timer_sync() to guarantee the timer
function is not running after return.  But most users doesn't actually
need this, and del_timer_sync() has problems: it is not useable from
interrupt, and it depends on every lock which could be taken from irq.

Introduce __cancel_delayed_work() which calls del_timer() instead.

The immediate reason for this patch is
http://bugzilla.kernel.org/show_bug.cgi?id=13757
but hopefully this helper makes sense anyway.

As for 13757 bug, actually we need requeue_delayed_work(), but its
semantics are not yet clear.

Merge this patch early to resolves cross-tree interdependencies between
input and infiniband.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05 11:30:42 -07:00
Stefan Richter
baed6b82d9 firewire: sbp2: fix freeing of unallocated memory
If a target writes invalid status (typically status of a command that
already timed out), firewire-sbp2 attempts to put away an ORB that
doesn't exist.  https://bugzilla.redhat.com/show_bug.cgi?id=519772

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05 15:59:34 +02:00
Stefan Richter
4fe0badd58 firewire: ohci: fix Ricoh R5C832, video reception
In dual-buffer DMA mode, no video frames are ever received from R5C832
by libdc1394.  Fallback to packet-per-buffer DMA works reliably.
http://thread.gmane.org/gmane.linux.kernel.firewire.devel/13393/focus=13476

Reported-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05 15:59:34 +02:00
Stefan Richter
fc383796a8 firewire: ohci: fix Agere FW643 and multiple cameras
An Agere FW643 OHCI 1.1 card works fine for video reception from one
camera but fails early if receiving from two cameras.  After a short
while, no IR IRQ events occur and the context control register does not
react anymore.  This happens regardless whether both IR DMA contexts are
dual-buffer or one is dual-buffer and the other packet-per-buffer.

This can be worked around by disabling dual buffer DMA mode entirely.
http://sourceforge.net/mailarchive/message.php?msg_name=4A7C0594.2020208%40gmail.com
(Reported by Samuel Audet.)

In another report (by Jonathan Cameron), an FW643 works OK with two
cameras in dual buffer mode.  Whether this is due to different chip
revisions or different usage patterns (different video formats) is not
yet clear.  However, as far as the current capabilities of
firewire-core's isochronous I/O interface are concerned, simply
switching off dual-buffer on non-working and working FW643s alike is not
a problem in practice.  We only need to revisit this issue if we are
going to enhance the interface, e.g. so that applications can explicitly
choose modes.

Reported-by: Samuel Audet <samuel.audet@gmail.com>
Reported-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05 15:59:34 +02:00
Stefan Richter
1821bc19d5 firewire: core: fix crash in iso resource management
This fixes a regression due to post 2.6.30 commit "firewire: core: do
not DMA-map stack addresses" 6fdc037094.

As David Moore noted, a previously correct sizeof() expression became
wrong since the commit changed its argument from an array to a pointer.
This resulted in an oops in ohci_cancel_packet in the shared workqueue
thread's context when an isochronous resource was to be freed.

Reported-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05 15:59:34 +02:00
Sunil Mushran
8379e7c46c ocfs2: ocfs2_write_begin_nolock() should handle len=0
Bug introduced by mainline commit e7432675f8
The bug causes ocfs2_write_begin_nolock() to oops when len=0.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 14:28:31 -07:00
Zhenyu Wang
5e17ee74b5 drm/i915: do dynamic clock freq control only in kernel modesetting
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:46 -07:00
Keith Packard
57cdaf90f5 drm/I915: Use the CRT DDC to get the EDID for DVI-connector on Mac
mac Mini's have a single DDC line on the DVI connector, shared between the
analog link and the digital link. So, if DDC isn't detected on GPIOE (the
usual SDVO DDC link), try GPIOA (the usual VGA DDC link) when there isn't a
VGA monitor connected.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:45 -07:00
Zhenyu Wang
553bd149bb drm/i915: fix tiling on IGDNG
It seems that on IGDNG the same swizzling setup always applys.
And front buffer tiling needs to set address swizzle in display
arb control too.

Fix plane tricle feed setting in v1 which should be disable bit,
and always setup address swizzle to let hardware care for buffer
tiling in all cases.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:44 -07:00
Daniel Vetter
65655d4ab7 drm/i915: modeset: always set intel_crtc->dpms_mode by moving the assignment up.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:43 -07:00
Daniel Vetter
c05422d52e drm/i915: remove open-coded drm_mode_object_find
And clean up a small whitespace goof-up in the same function, while
I was looking at it.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:41 -07:00
Eric Anholt
67cf781bea drm/i915: Make the downclocking debug code be under DRM_DEBUG not DRM_ERROR.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-04 13:05:40 -07:00
Kyle McMartin
d6073d775c drm/i915: i915_modeset is signed
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:39 -07:00
Jesse Barnes
652c393a33 drm/i915: add dynamic clock frequency control
There are several sources of unnecessary power consumption on Intel
graphics systems. The first is the LVDS clock. TFTs don't suffer from
persistence issues like CRTs, and so we can reduce the LVDS refresh rate
when the screen is idle. It will be automatically upclocked when
userspace triggers graphical activity. Beyond that, we can enable memory
self refresh. This allows the memory to go into a lower power state when
the graphics are idle. Finally, we can drop some clocks on the gpu
itself. All of these things can be reenabled between frames when GPU
activity is triggered, and so there should be no user visible graphical
changes.

Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:38 -07:00
Shaohua Li
0430296558 drm/i915: Support IGD EOS
In the event that any one of the DAC analog outputs (R,G,B) were driven
at full-scale (white video) or some analog level close to full-scale
voltage, and if the video cable were then disconnected, the analog video
voltage level would exceed the maximum electrical overstress limit of the
native (thin-oxide) transistors thus causing a long-term reliability concern.
The electrical overstress condition occurs in this particular case.

This patch address the IGD EOS (electrical overstress condition) issue.
When the EOS interrupt occurs, OS should disable DAC and then disable EOS,
then the normal hotplug operation follows.

TODO: it appears the normal unplug interrupt is missed as reported by Li Peng,
need more checks here.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:30 -07:00
Zhao Yakui
ce6feabd1b drm/i915: Enable PAL and SECAM format and add the propery for SDVO-TV
Currently SDVO TV only support NTSC-M format. In this patch
we introduce PAL and SECAM formats available and create seting-format
property at init time. When user dynamically chose preferred
format by xrandr command, it will refine all modelines
provided by SDVO device, then instruct SDVO device to execute.
At the same time the property is added for SDVO-TV so that the SDVO-TV mode can be changed
by using xrandr.

https://bugs.freedesktop.org/show_bug.cgi?id=22891

Signed-off-by: Ma Ling <ling.ma@intel.com>
review-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:11 -07:00
Ma Ling
213c2e6431 drm/i915: select TV format according to connector type
For integrated TV there are 3 connector types: S-VIDEO, Composite and
Component(YprPb). Those tv formats whose component flag is true should
be assigned to Component connector, others are for S-VIDEO and Composite.
The patch intends to find appropriate tv format for each connector.
In such case it will return the correct modeline to user space. Otherwise
it will return the incorrect modeline when S-video/composite is connected.

Signed-off-by: Ma Ling <ling.ma@intel.com>
reviewed-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:10 -07:00
Zhenyu Wang
5f6a169598 drm/i915: update debugfs interrupt info on IGDNG
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:09 -07:00
Ben Gamari
9e3a6d155e drm/i915: Add i915 register dumping debugfs file
Add a debugfs file to dump the entire register range. Here we
assume that reading write-only/reserved registers won't make the chip
angry. Seems to hold true, thankfully.

Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:09 -07:00
Ben Gamari
27c202ad7f drm/i915: Move i915_gem_debugfs.c to i915_debugfs.c
Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
[anholt: hand-applied for conflicts]
Signed-off-by: Eric Anholt <eric@anholt.net>
2009-09-04 13:05:08 -07:00
Mikulas Patocka
ae0b7448e9 dm snapshot: fix on disk chunk size validation
Fix some problems seen in the chunk size processing when activating a
pre-existing snapshot.

For a new snapshot, the chunk size can either be supplied by the creator
or a default value can be used.  For an existing snapshot, the
chunk size in the snapshot header on disk should always be used.

If someone attempts to load an existing snapshot and has the 'default
chunk size' option set, the kernel uses its default value even when it
is incorrect for the snapshot being loaded.  This patch ensures the
correct on-disk value is always used.

Secondly, when the code does use the chunk size stored on the disk it is
prudent to revalidate it, so the code can exit cleanly if it got
corrupted as happened in
https://bugzilla.redhat.com/show_bug.cgi?id=461506 .

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:43 +01:00
Mikulas Patocka
2defcc3fb4 dm exception store: split set_chunk_size
Break the function set_chunk_size to two functions in preparation for
the fix in the following patch.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:41 +01:00
Mikulas Patocka
61578dcd3f dm snapshot: fix header corruption race on invalidation
If a persistent snapshot fills up, a race can corrupt the on-disk header
which causes a crash on any future attempt to activate the snapshot
(typically while booting).  This patch fixes the race.

When the snapshot overflows, __invalidate_snapshot is called, which calls
snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
calls write_header. write_header constructs the new header in the "area"
location.

Concurrently, an existing kcopyd job may finish, call copy_callback
and commit_exception method, that goes to persistent_commit_exception.
persistent_commit_exception doesn't do locking, relying on the fact that
callbacks are single-threaded, but it can race with snapshot invalidation and
overwrite the header that is just being written while the snapshot is being
invalidated.

The result of this race is a corrupted header being written that can
lead to a crash on further reactivation (if chunk_size is zero in the
corrupted header).

The fix is to use separate memory areas for each.

See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:39 +01:00
Mikulas Patocka
02d2fd31de dm snapshot: refactor zero_disk_area to use chunk_io
Refactor chunk_io to prepare for the fix in the following patch.

Pass an area pointer to chunk_io and simplify zero_disk_area to use
chunk_io.  No functional change.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:37 +01:00
Jonathan Brassow
7ec23d5094 dm log: userspace add luid to distinguish between concurrent log instances
Device-mapper userspace logs (like the clustered log) are
identified by a universally unique identifier (UUID).  This
identifier is used to associate requests from the kernel to
a specific log in userspace.  The UUID must be unique everywhere,
since multiple machines may use this identifier when communicating
about a particular log, as is the case for cluster logs.

Sometimes, device-mapper/LVM may re-use a UUID.  This is the
case during pvmoves, when moving from one segment of an LV
to another, or when resizing a mirror, etc.  In these cases,
a new log is created with the same UUID and loaded in the
"inactive" slot.  When a device-mapper "resume" is issued,
the "live" table is deactivated and the new "inactive" table
becomes "live".  (The "inactive" table can also be removed
via a device-mapper 'clear' command.)

The above two issues were colliding.  More than one log was being
created with the same UUID, and there was no way to distinguish
between them.  So, sometimes the wrong log would be swapped
out during the exchange.

The solution is to create a locally unique identifier,
'luid', to go along with the UUID.  This new identifier is used
to determine exactly which log is being referenced by the kernel
when the log exchange is made.  The identifier is not
universally safe, but it does not need to be, since
create/destroy/suspend/resume operations are bound to a specific
machine; and these are the operations that make up the exchange.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:34 +01:00
Jonathan Brassow
d2b698644c dm raid1: do not allow log_failure variable to unset after being set
This patch fixes a bug which was triggering a case where the primary leg
could not be changed on failure even when the mirror was in-sync.

The case involves the failure of the primary device along with
the transient failure of the log device.  The problem is that
bios can be put on the 'failures' list (due to log failure)
before 'fail_mirror' is called due to the primary device failure.
Normally, this is fine, but if the log device failure is transient,
a subsequent iteration of the work thread, 'do_mirror', will
reset 'log_failure'.  The 'do_failures' function then resets
the 'in_sync' variable when processing bios on the failures list.
The 'in_sync' variable is what is used to determine if the
primary device can be switched in the event of a failure.  Since
this has been reset, the primary device is incorrectly assumed
to be not switchable.

The case has been seen in the cluster mirror context, where one
machine realizes the log device is dead before the other machines.
As the responsibilities of the server migrate from one node to
another (because the mirror is being reconfigured due to the failure),
the new server may think for a moment that the log device is fine -
thus resetting the 'log_failure' variable.

In any case, it is inappropiate for us to reset the 'log_failure'
variable.  The above bug simply illustrates that it can actually
hurt us.

Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:32 +01:00
Jonathan Brassow
b8313b6da7 dm log: remove incorrect field from userspace table output
The output of 'dmsetup table' includes an internal field that should not
be there.  This patch removes it.  To make the fix simpler, we first
reorder a constructor argument

The 'device size' argument is generated internally.  Currently it is
placed as the last space-separated word of the constructor string.
However, we need to use a version of the string without this word, so we
move it to the beginning instead so it is trivial to skip past it.

We keep a copy of the arguments passed to userspace for creating a log,
just in case we need to resend them.  These are the same arguments that
are desired in the STATUSTYPE_TABLE request, except for one.  When
creating the userspace log, the userspace daemon must know the size of
the mirror, so that is added to the arguments given in the constructor
table.  We were printing this extra argument out as well, which is a
mistake.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:30 +01:00
Jonathan Brassow
4142a96917 dm log: fix userspace status output
Fix 'dmsetup table' output.

There is a missing ' ' at the end of the string causing two
words to run together.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:28 +01:00
Mike Snitzer
40bea43127 dm stripe: expose correct io hints
Set sensible I/O hints for striped DM devices in the topology
infrastructure added for 2.6.31 for userspace tools to
obtain via sysfs.

Add .io_hints to 'struct target_type' to allow the I/O hints portion
(io_min and io_opt) of the 'struct queue_limits' to be set by each
target and implement this for dm-stripe.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04 20:40:25 +01:00