Commit Graph

180076 Commits

Author SHA1 Message Date
Boaz Harrosh
96391e2bae exofs: Error recovery if object is missing from storage
If an object is referenced by a directory but does not
exist on a target, it is a very serious corruption that
means:
1. Either a power failure with very slim chance of it
  happening. Because the directory update is always submitted
  much after object creation, but if a directory is written
  to one device and the object creation to another it might
  theoretically happen.
2. It only ever happened to me while developing with BUGs
  causing file corruption. Crashes could also cause it but
  they are more like case 1.

In any way the object does not exist, so data is surely lost.
If there is a mix-up in the obj-id or data-map, then lost objects
can be salvaged by off-line fsck. The only recoverable information
is the directory name. By letting it appear as a regular empty file,
with date==0 (1970 Jan 1st) ownership to root, we enable recovery
of the only useful information. And also enable deletion or over-write.
I can see how this can hurt.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:44:43 -08:00
Boaz Harrosh
86093aaff5 exofs: convert io_state to use pages array instead of bio at input
* inode.c operations are full-pages based, and not actually
  true scatter-gather
* Lets us use more pages at once upto 512 (from 249) in 64 bit
* Brings us much much closer to be able to use exofs's io_state engine
  from objlayout driver. (Once I decide where to put the common code)

After RAID0 patch the outer (input) bio was never used as a bio, but
was simply a page carrier into the raid engine. Even in the simple
mirror/single-dev arrangement pages info was copied into a second bio.
It is now easer to just pass a pages array into the io_state and prepare
bio(s) once.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:44:42 -08:00
Boaz Harrosh
5d952b8391 exofs: RAID0 support
We now support striping over mirror devices. Including variable sized
stripe_unit.

Some limits:
* stripe_unit must be a multiple of PAGE_SIZE
* stripe_unit * stripe_count is maximum upto 32-bit (4Gb)

Tested RAID0 over mirrors, RAID0 only, mirrors only. All check.

Design notes:
* I'm not using a vectored raid-engine mechanism yet. Following the
  pnfs-objects-layout data-map structure, "Mirror" is just a private
  case of "group_width" == 1, and RAID0 is a private case of
  "Mirrors" == 1. The performance lose of the general case over the
  particular special case optimization is totally negligible, also
  considering the extra code size.

* In general I added a prepare_stripes() stage that divides the
  to-be-io pages to the participating devices, the previous
  exofs_ios_write/read, now becomes _write/read_mirrors and a new
  write/read upper layer loops on all devices calling
  _write/read_mirrors. Effectively the prepare_stripes stage is the all
  secret.
  Also truncate need fixing to accommodate for striping.

* In a RAID0 arrangement, in a regular usage scenario, if all inode
  layouts will start at the same device, the small files fill up the
  first device and the later devices stay empty, the farther the device
  the emptier it is.

  To fix that, each inode will start at a different stripe_unit,
  according to it's obj_id modulus number-of-stripe-units. And
  will then span all stripe-units in the same incrementing order
  wrapping back to the beginning of the device table. We call it
  a stripe-units moving window.

  Special consideration was taken to keep all devices in a mirror
  arrangement identical. So a broken osd-device could just be cloned
  from one of the mirrors and no FS scrubbing is needed. (We do that
  by rotating stripe-unit at a time and not a single device at a time.)

TODO:
 We no longer verify object_length == inode->i_size in exofs_iget.
 (since i_size is stripped on multiple objects now).
 I should introduce a multiple-device attribute reading, and use
 it in exofs_iget.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:43:08 -08:00
Boaz Harrosh
d9c740d225 exofs: Define on-disk per-inode optional layout attribute
* Layouts describe the way a file is spread on multiple devices.
  The layout information is stored in the objects attribute introduced
  in this patch.

* There can be multiple generating function for the layout.
  Currently defined:
    - No attribute present - use below moving-window on global
      device table, all devices.
      (This is the only one currently used in exofs)
    - an obj_id generated moving window - the obj_id is a randomizing
      factor in the otherwise global map layout.
    - An explicit layout stored, including a data_map and a device
      index list.
    - More might be defined in future ...

* There are two attributes defined of the same structure:
  A-data-files-layout - This layout is used by data-files. If present
                        at a directory, all files of that directory will
                        be created with this layout.
  A-meta-data-layout - This layout is used by a directory and other
                       meta-data information. Also inherited at creation
                       of subdirectories.

* At creation time inodes are created with the layout specified above.
  A usermode utility may change the creation layout on a give directory
  or file. Which in the case of directories, will also apply to newly
  created files/subdirectories, children of that directory.
  In the simple unaltered case of a newly created exofs, no layout
  attributes are present, and all layouts adhere to the layout specified
  at the device-table.

* In case of a future file system loaded in an old exofs-driver.
  At iget(), the generating_function is inspected and if not supported
  will return an IO error to the application and the inode will not
  be loaded. So not to damage any data.
  Note: After this patch we do not yet support any type of layout
        only the RAID0 patch that enables striping at the super-block
        level will add support for RAID0 layouts above. This way we
        are past and future compatible and fully bisectable.

* Access to the device table is done by an accessor since
  it will change according to above information.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:28 -08:00
Boaz Harrosh
46f4d973f6 exofs: unindent exofs_sbi_read
The original idea was that a mirror read can be sub-divided
to multiple devices. But this has very little gain and only
at very large IOes so it's not going to be implemented soon.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:27 -08:00
Boaz Harrosh
45d3abcb1a exofs: Move layout related members to a layout structure
* Abstract away those members in exofs_sb_info that are related/needed
  by a layout into a new exofs_layout structure. Embed it in exofs_sb_info.

* At exofs_io_state receive/keep a pointer to an exofs_layout. No need for
  an exofs_sb_info pointer, all we need is at exofs_layout.

* Change any usage of above exofs_sb_info members to their new name.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:27 -08:00
Boaz Harrosh
22ddc55638 exofs: Recover in the case of read-passed-end-of-file
In check_io, implement the case of reading passed end of
file, by clearing the pages and recover with no error. In
a raid arrangement this can become a legitimate situation
in case of holes in the file.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:26 -08:00
Boaz Harrosh
518f167a37 exofs: Micro-optimize exofs_i_info
optimize the exofs_i_info struct usage by moving the embedded
vfs_inode to be first. A compiler might optimize away an "add"
operation with constant zero. (Which it cannot with other constants)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:25 -08:00
Boaz Harrosh
34ce4e7c23 exofs: debug print even less
* Last debug trimming left in some stupid print, remove them.
  Fixup some other prints
* Shift printing from inode.c to ios.c
* Add couple of prints when memory allocation fails.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:25 -08:00
Linus Torvalds
abe94c756c Linux 2.6.33-rc6 2010-01-29 13:57:50 -08:00
Dmitry Artamonow
4995c0b367 mfd: Fix asic3 build
asic3 also needs tmio_core or otherwise will fail to build.

Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2010-01-29 21:03:09 +01:00
Linus Torvalds
499a267371 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: update multi-touch protocol documentation
  Input: add the ABS_MT_PRESSURE event
  Input: winbond-cir - remove dmesg spam
  Input: lifebook - add another Lifebook DMI signature
  Input: ad7879 - support auxiliary GPIOs via gpiolib
2010-01-29 11:15:32 -08:00
Hugh Dickins
a7016235a6 mm: fix migratetype bug which slowed swapping
After memory pressure has forced it to dip into the reserves, 2.6.32's
5f8dcc2121 "page-allocator: split per-cpu
list into one-list-per-migrate-type" has been returning MIGRATE_RESERVE
pages to the MIGRATE_MOVABLE free_list: in some sense depleting reserves.

Fix that in the most straightforward way (which, considering the overheads
of alternative approaches, is Mel's preference): the right migratetype is
already in page_private(page), but free_pcppages_bulk() wasn't using it.

How did this bug show up?  As a 20% slowdown in my tmpfs loop kbuild
swapping tests, on PowerMac G5 with SLUB allocator.  Bisecting to that
commit was easy, but explaining the magnitude of the slowdown not easy.

The same effect appears, but much less markedly, with SLAB, and even
less markedly on other machines (the PowerMac divides into fewer zones
than x86, I think that may be a factor).  We guess that lumpy reclaim
of short-lived high-order pages is implicated in some way, and probably
this bug has been tickling a poor decision somewhere in page reclaim.

But instrumentation hasn't told me much, I've run out of time and
imagination to determine exactly what's going on, and shouldn't hold up
the fix any longer: it's valid, and might even fix other misbehaviours.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29 10:28:09 -08:00
Linus Torvalds
67f15b06c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: check total number of devices when removing missing
  Btrfs: check return value of open_bdev_exclusive properly
  Btrfs: do not mark the chunk as readonly if in degraded mode
  Btrfs: run orphan cleanup on default fs root
  Btrfs: fix a memory leak in btrfs_init_acl
  Btrfs: Use correct values when updating inode i_size on fallocate
  Btrfs: remove tree_search() in extent_map.c
  Btrfs: Add mount -o compress-force
2010-01-29 10:27:37 -08:00
David Miller
94673e968c sparc: TIF_ABI_PENDING bit removal
Here are the sparc bits to remove TIF_ABI_PENDING now that
set_personality() is called at the appropriate place in exec.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29 08:22:01 -08:00
H. Peter Anvin
05d43ed8a8 x86: get rid of the insane TIF_ABI_PENDING bit
Now that the previous commit made it possible to do the personality
setting at the point of no return, we do just that for ELF binaries.
And suddenly all the reasons for that insane TIF_ABI_PENDING bit go
away, and we can just make SET_PERSONALITY() just do the obvious thing
for a 32-bit compat process.

Everything becomes much more straightforward this way.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29 08:22:01 -08:00
Linus Torvalds
221af7f87b Split 'flush_old_exec' into two functions
'flush_old_exec()' is the point of no return when doing an execve(), and
it is pretty badly misnamed.  It doesn't just flush the old executable
environment, it also starts up the new one.

Which is very inconvenient for things like setting up the new
personality, because we want the new personality to affect the starting
of the new environment, but at the same time we do _not_ want the new
personality to take effect if flushing the old one fails.

As a result, the x86-64 '32-bit' personality is actually done using this
insane "I'm going to change the ABI, but I haven't done it yet" bit
(TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
personality, but just the "pending" bit, so that "flush_thread()" can do
the actual personality magic.

This patch in no way changes any of that insanity, but it does split the
'flush_old_exec()' function up into a preparatory part that can fail
(still called flush_old_exec()), and a new part that will actually set
up the new exec environment (setup_new_exec()).  All callers are changed
to trivially comply with the new world order.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29 08:22:01 -08:00
Henrik Rydberg
f6bdc2303d Input: update multi-touch protocol documentation
This patch documents a new ABS_MT parameter and adds further text to
clarify some points around the MT protocol.

Requested-by: Yoonyoung Shim <jy0922.shim@samsung.com>
Requested-by: Mika Kuoppala <mika.kuoppala@nokia.com>
Requested-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-01-28 22:32:52 -08:00
Henrik Rydberg
cb6ecf6f7a Input: add the ABS_MT_PRESSURE event
For pressure-based multi-touch devices, a direct way to send sensor
intensity data per finger is needed. This patch adds the ABS_MT_PRESSURE
event to the MT protocol.

Requested-by: Yoonyoung Shim <jy0922.shim@samsung.com>
Requested-by: Mika Kuoppala <mika.kuoppala@nokia.com>
Requested-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-01-28 22:32:45 -08:00
David Härdeman
93fb84b50f Input: winbond-cir - remove dmesg spam
I missed converting one dev_info call to deb_dbg before submitting the driver.
Without this change, a message will be printed to dmesg for each button press
if a RC6 remote is used.

Signed-off-by: David Härdeman <david@hardeman.nu>
Cc: stable <stable@kernel.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-01-28 22:32:38 -08:00
Linus Torvalds
64a028a6de Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Fix failure exit in ipathfs
  fix oops in fs/9p late mount failure
  fix leak in romfs_fill_super()
  get rid of pointless checks after simple_pin_fs()
  Fix failure exits in bfs_fill_super()
  fix affs parse_options()
  Fix remount races with symlink handling in affs
  Fix a leak in affs_fill_super()
2010-01-28 18:48:53 -08:00
Linus Torvalds
3d29935ff0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/PCI: remove IOH range fetching
  PCI: fix nested spinlock hang in aer_inject
2010-01-28 16:33:12 -08:00
Linus Torvalds
474118d06d Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] Update mach-types
  [ARM] orion5x: D-link DNS-323 rev. B1 power-off
  [ARM] Orion5x: add GPIO LED and buttons for wrt350n v2
  [ARM] pxa: fix irq suspend/resume for pxa25x
  [ARM] pxa: fix the incorrect naming of AC97 reset pin config for pxa26x
  [ARM] pxa/corgi: fix incorrect default GPIO for UDC Vbus
  [ARM] Kirkwood: drive USB VBUS pin on rd88f6192-nas high on boot
  [ARM] Orion: fix PCIe inbound window programming when RAM size is not a power of two
2010-01-28 14:34:11 -08:00
Russell King
ba45d52574 [ARM] Update mach-types
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-01-28 22:17:45 +00:00
Russell King
0b6c135ea9 Merge branch 'fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 2010-01-28 21:59:58 +00:00
Josef Bacik
035fe03a7a Btrfs: check total number of devices when removing missing
If you have a disk failure in RAID1 and then add a new disk to the
array, and then try to remove the missing volume, it will fail.  The
reason is the sanity check only looks at the total number of rw devices,
which is just 2 because we have 2 good disks and 1 bad one.  Instead
check the total number of devices in the array to make sure we can
actually remove the device.  Tested this with a failed disk setup and
with this test we can now run

btrfs-vol -r missing /mount/point

and it works fine.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Josef Bacik
7f59203abe Btrfs: check return value of open_bdev_exclusive properly
Hit this problem while testing RAID1 failure stuff.  open_bdev_exclusive
returns ERR_PTR(), not NULL.  So change the return value properly.  This
is important if you accidently specify a device that doesn't exist when
trying to add a new device to an array, you will panic the box
dereferencing bdev.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Josef Bacik
f48b90756b Btrfs: do not mark the chunk as readonly if in degraded mode
If a RAID setup has chunks that span multiple disks, and one of those
disks has failed, btrfs_chunk_readonly will return 1 since one of the
disks in that chunk's stripes is dead and therefore not writeable.  So
instead if we are in degraded mode, return 0 so we can go ahead and
allocate stuff.  Without this patch all of the block groups in a RAID1
setup will end up read-only, which will mean we can't add new disks to
the array since we won't be able to make allocations.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Josef Bacik
e3acc2a685 Btrfs: run orphan cleanup on default fs root
This patch revert's commit

6c090a11e1

Since it introduces this problem where we can run orphan cleanup on a
volume that can have orphan entries re-added.  Instead of my original
fix, Yan Zheng pointed out that we can just revert my original fix and
then run the orphan cleanup in open_ctree after we look up the fs_root.
I have tested this with all the tests that gave me problems and this
patch fixes both problems.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Yang Hongyang
f858153c36 Btrfs: fix a memory leak in btrfs_init_acl
In btrfs_init_acl() cloned acl is not released

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:39 -05:00
Aneesh Kumar K.V
d1ea6a6145 Btrfs: Use correct values when updating inode i_size on fallocate
commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed Jan 20 12:57:53 2010 +0530

    Btrfs: Use correct values when updating inode i_size on fallocate

    Even though we allocate more, we should be updating inode i_size
    as per the arguments passed

    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:38 -05:00
Miao Xie
b8d9bfeb18 Btrfs: remove tree_search() in extent_map.c
This patch removes tree_search() in extent_map.c because it is not called by
anything.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:20:38 -05:00
Chris Mason
a555f810af Btrfs: Add mount -o compress-force
The default btrfs mount -o compress mode will quickly back off
compressing a file if it notices that compression does not reduce the
size of the data being written.  This can save considerable CPU because
all future writes to the file go through uncompressed.

But some files are both very large and have mixed data stored in
them.  In that case, we want to add the ability to always try
compressing data before writing it.

This commit adds mount -o compress-force.  A later commit will add
a new inode flag that does the same thing.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28 16:18:15 -05:00
Linus Torvalds
d4d37bde3d Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  MIPS: PowerTV: Fix support for timer interrupts with > 64 external IRQs
  MIPS: PowerTV: Streamline access to platform device registers
  MIPS: Fix vmlinuz build for 32bit-only math shells
  MIPS: Add support of LZO-compressed kernels
2010-01-28 12:59:43 -08:00
Linus Torvalds
551e28dbe8 Merge branch 'for-linus' of git://git.infradead.org/ubi-2.6
* 'for-linus' of git://git.infradead.org/ubi-2.6:
  UBI: fix volume creation input checking
2010-01-28 12:57:50 -08:00
Linus Torvalds
b39bda6e73 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: ohci: fix crashes with TSB43AB23 on 64bit systems
  firewire: core: fix use-after-free regression in FCP handler
  firewire: cdev: add_descriptor documentation fix
  firewire: core: add_descriptor size check
2010-01-28 12:56:23 -08:00
Jeff Garrett
e8e06eae4f x86/PCI: remove IOH range fetching
Turned out to cause trouble on single IOH machines, and is superceded by
_CRS on multi-IOH machines with production BIOSes.

Signed-off-by: Jeff Garrett <jeff@jgarrett.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-01-28 08:24:11 -08:00
Jon Dodgson
57b5e2ae5b Input: lifebook - add another Lifebook DMI signature
There are many many ways one can capitalize "Lifebook B Series"...

Signed-off-by: Jon Dodgson <crayzeejon@gmail.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2010-01-28 00:29:03 -08:00
David VomLehn
010c108d7a MIPS: PowerTV: Fix support for timer interrupts with > 64 external IRQs
The MIPS processor is limited to 64 external interrupt sources. Using a
greater number without IRQ sharing requires reading platform-specific
registers. On such platforms, reading the IntCtl register to determine
which interrupt corresponds to a timer interrupt will not work.

On MIPSR2 systems there is a solution - the TI bit in the Cause register,
specifically indicates that a timer interrupt has occured. This patch uses
that bit to detect interrupts for MIPSR2 processors, which may be expected
to work regardless of how the timer interrupt may be routed in the hardware.

Signed-off-by: David VomLehn (dvomlehn@cisco.com)
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/804/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-01-28 00:03:31 +01:00
David VomLehn
59dfa2fcae MIPS: PowerTV: Streamline access to platform device registers
Pre-compute addresses for the basic ASIC registers. This speeds up access
and allows memory for unused configurations to be freed. In addition,
uninitialized register addresses will be returned as NULL to catch bad
usage quickly.

Signed-off-by: David VomLehn <dvomlehn@cisco.com>
To: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/806/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-01-28 00:03:31 +01:00
Alexander Clouter
9c4a6fce20 MIPS: Fix vmlinuz build for 32bit-only math shells
POSIX requires $((<expression>)) arithmetic in sh only to have long
arithmetic so on 32-bit sh binaries might do only 32-bit arithmetic but
the arithmetic done in arch/mips/boot/compressed/Makefile needs 64-bit.

I play with the AR7 platform, so VMLINUX_LOAD_ADDRESS is
0xffffffff94100000, and for an example 4MiB kernel
VMLINUZ_LOAD_ADDRESS is made out to be:
----
alex@berk:~$ bash -c 'printf "%x\n" $((0xffffffff94100000 + 0x400000))'
ffffffff94500000
alex@berk:~$ dash -c 'printf "%x\n" $((0xffffffff94100000 + 0x400000))'
80000000003fffff
----

The former is obviously correct whilst the later breaks things royally.

Fortunately working with only the lower 32bit's works for both bash and
dash:
----
$ bash -c 'printf "%x\n" $((0x94100000 + 0x400000))'
94500000
$ dash -c 'printf "%x\n" $((0x94100000 + 0x400000))'
94500000
----

So, we can split the original 64bit string to two parts, and only
calculate the low 32bit part, which is big enough (1GiB kernel sizes
anyone?) for a normal Linux kernel image file, now, we calculate the
VMLINUZ_LOAD_ADDRESS like this:

1. if present, append top 32bit of VMLINUX_LOAD_ADDRESS" as a prefix
2. get the sum of the low 32bit of VMLINUX_LOAD_ADDRESS + VMLINUX_SIZE

This patch fixes vmlinuz kernel builds on systems where only a
32bit-only math shell is available.

Patch Changelog:
  Version 2
    - simplified method by using 'expr' for 'substr' and making it work
	with dash once again
  Version 1
    - Revert the removals of '-n "$(VMLINUX_SIZE)"' to avoid the error
        of "make clean"
    - Consider more cases of the VMLINUX_LOAD_ADDRESS
  Version 0
    - initial release

Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Acked-by: Wu Zhangjin <wuzhangjin@gmail.com>
Patchwork: http://patchwork.linux-mips.org/patch/861/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-01-28 00:03:30 +01:00
Wu Zhangjin
fe1d45e086 MIPS: Add support of LZO-compressed kernels
The necessary changes to the x86 Kconfig and boot/compressed to allow the
use of this new compression method.

Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Patchwork: http://patchwork.linux-mips.org/patch/857/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2010-01-28 00:03:30 +01:00
Russell King
00e4acb1e2 Merge branch 'for-rmk' of git://git.marvell.com/orion 2010-01-27 22:16:05 +00:00
Linus Torvalds
be8cde8b24 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] aic79xx: check for non-NULL scb in ahd_handle_nonpkt_busfree
  [SCSI] zfcp: Set hardware timeout as requested by BSG request.
  [SCSI] zfcp: Introduce bsg_timeout callback.
  [SCSI] scsi_transport_fc: Allow LLD to reset FC BSG timeout
  [SCSI] zfcp: add missing compat ptr conversion
  [SCSI] zfcp: Fix linebreak in hba trace
  [SCSI] zfcp: Issue zfcp_fc_wka_port_put after FC CT BSG request
  [SCSI] qla2xxx: Update version number to 8.03.01-k10.
  [SCSI] fc-transport: Use packed modifier for fc_bsg_request structure.
  [SCSI] qla2xxx: Perform fast mailbox read of flash regardless of size nor address alignment.
  [SCSI] qla2xxx: Correct FCP2 recovery handling.
  [SCSI] scsi_lib: Fix bug in completion of bidi commands
  [SCSI] mptsas: Fix issue with chain pools allocation on katmai
  [SCSI] aacraid: fix File System going into read-only mode
  [SCSI] lpfc: fix file permissions
2010-01-27 09:54:08 -08:00
Linus Torvalds
981a2edd19 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] fix single stepped svcs with TRACE_IRQFLAGS=y
  [S390] zcrypt: Do not remove coprocessor for error 8/72
  [S390] sclp_vt220: set initial terminal window size
  [S390] use set_current_state in sigsuspend
  [S390] irqflags: add missing types.h include
  [S390] dasd: fix possible NULL pointer errors
2010-01-27 09:27:44 -08:00
Chris Wilson
4bdadb9785 drm/i915: Selectively enable self-reclaim
Having missed the ENOMEM return via i915_gem_fault(), there are probably
other paths that I also missed. By not enabling NORETRY by default these
paths can run the shrinker and take memory from the system (but not from
our own inactive lists because our shrinker can not run whilst we hold
the struct mutex) and this may allow the system to survive a little longer
whilst our drivers consume all available memory.

References:
  OOM killer unexpectedly called with kernel 2.6.32
  http://bugzilla.kernel.org/show_bug.cgi?id=14933

v2: Pass gfp into page mapping.
v3: Use new read_cache_page_gfp() instead of open-coding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-27 09:26:43 -08:00
Stefan Richter
7a48143678 firewire: ohci: fix crashes with TSB43AB23 on 64bit systems
Unsurprisingly, Texas Instruments TSB43AB23 exhibits the same behaviour
as TSB43AB22/A in dual buffer IR DMA mode:  If descriptors are located
at physical addresses above the 31 bit address range (2 GB), the
controller will overwrite random memory.  With luck, this merely
prevents video reception.  With only a little less luck, the machine
crashes.

We use the same workaround here as with TSB43AB22/A:  Switch off the
dual buffer capability flag and use packet-per-buffer IR DMA instead.
Another possible workaround would be to limit the coherent DMA mask to
31 bits.

In Linux 2.6.33, this change serves effectively only as documentation
since dual buffer mode is not used for any controller anymore.  But
somebody might want to re-enable it in the future to make use of
features of dual buffer DMA that are not available in packet-per-buffer
mode.

In Linux 2.6.32 and older, this update is vital for anyone with this
controller, more than 2 GB RAM, a 64 bit kernel, and FireWire video or
audio applications.

We have at least four reports:
http://bugzilla.kernel.org/show_bug.cgi?id=13808
http://marc.info/?l=linux1394-user&m=126154279004083
https://bugzilla.redhat.com/show_bug.cgi?id=552142
http://marc.info/?l=linux1394-user&m=126432246128386

Reported-by: Paul Johnson
Reported-by: Ronneil Camara
Reported-by: G Zornetzer
Reported-by: Mark Thompson
Cc: stable@kernel.org
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-01-27 18:24:53 +01:00
Linus Torvalds
0531b2aac5 mm: add new 'read_cache_page_gfp()' helper function
It's a simplified 'read_cache_page()' which takes a page allocation
flag, so that different paths can control how aggressive the memory
allocations are that populate a address space.

In particular, the intel GPU object mapping code wants to be able to do
a certain amount of own internal memory management by automatically
shrinking the address space when memory starts getting tight.  This
allows it to dynamically use different memory allocation policies on a
per-allocation basis, rather than depend on the (static) address space
gfp policy.

The actual new function is a one-liner, but re-organizing the helper
functions to the point where you can do this with a single line of code
is what most of the patch is all about.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-27 09:20:03 -08:00
Linus Torvalds
caf0801e0c Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, msr/cpuid: Pass the number of minors when unregistering MSR and CPUID drivers.
  x86: Remove "x86 CPU features in debugfs" (CONFIG_X86_CPU_DEBUG)
  Revert "x86: ucode-amd: Load ucode-patches once ..."
  x86: Disable HPET MSI on ATI SB700/SB800
  x86: Set hotpluggable nodes in nodes_possible_map
2010-01-27 02:49:10 -08:00
Linus Torvalds
5bc6d799e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix UP build.
2010-01-27 02:47:24 -08:00