The `last_bh' logic probably isn't worth much. In those situations where only
the front part of the page is being written out we will save some looping but
in the vastly more common case of an all-page writeout it just adds more code.
Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove all those get_bh()'s and put_bh()'s by extending lock_page() to cover
the troublesome regions.
(get_bh() and put_bh() happen every time whereas contention on a page's lock
in there happens basically never).
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When running
fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
page size over loopback to an image file on a tmpfs filesystem, I would
very quickly hit
BUG_ON(!buffer_async_write(bh));
in fs/buffer.c:end_buffer_async_write
It seems that more than one request would be submitted for a given bh
at a time.
What would happen is the following:
2 threads doing __mpage_writepages on the same page.
Thread 1 - lock the page first, and enter __block_write_full_page.
Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
Thread 1 - set page writeback, unlock page.
Thread 2 - lock page, wait on page writeback
Thread 1 - submit_bh on the first 2 buffers.
=> both requests complete, none of the page buffers are async_write,
end_page_writeback is called.
Thread 2 - wakes up. enters __block_write_full_page.
Thread 2 - mark_buffer_async_write on (eg.) the last buffer
Thread 1 - finds the last buffer has async_write set, submit_bh on that.
Thread 2 - submit_bh on the last buffer.
=> oops.
So change __block_write_full_page to explicitly keep track of the last bh
we need to issue, so we don't touch anything after issuing the last
request.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a race where __block_prepare_write can leak out an in-flight read
against a bh if get_block returns an error. This can lead to the page
becoming unlocked while the buffer is locked and the read still in flight.
__mpage_writepage BUGs on this condition.
BUG sighted on a 2-way Itanium2 system with 16K PAGE_SIZE running
fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
where $DIR is a new ext2 filesystem with 4K blocks that is quite
small (causing get_block to fail often with -ENOSPC).
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This cleans up the error handling and fixes a crash if a hostfs mount fails.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes sure that reclaimable buffer headers and reclaimable inodes
are accounted properly during the overcommit checks.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
handling for unwritten extents can be moved out of interrupt context.
SGI Modid: xfs-linux:xfs-kern:22343a
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Christoph Hellwig <hch@sgi.com>
Modify xtSearch so that it returns the next allocated block when the
requested block is unmapped. This can be used to make sure we don't
create a new extent that overlaps the next one.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds jfs_syncpt, which calls lmLogSync to write sync points
to the journal both in jfs_sync_fs and when sync barrier processing
completes.
lmLogSync accomplishes two things: 1) it pushes logged-but-dirty
metadata pages to disk, and 2) it writes a sync record to the journal
so that jfs_fsck doesn't need to replay more transactions than is
necessary.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
jfs has never worked on architectures where the page size was not 4K.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JFS code has always assumed a page size of 4K. This patch fixes the
non-pagecache uses of pages to deal with larger pages.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JFS was creating a new IAG (inode aggregate group) in one address
space, and afterwards, accessing it from another. This could lead to
complications when cache pages contain more than one page of jfs
metadata. This patch causes the IAG to be initialized in the same
address space that it is subsequently accessed with.
This also eliminates an I/O, but IAG's aren't created too often.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use an inline pxd list rather than an xad list in the xadlock.
When the number of extents being modified can fit within the xadlock,
a transaction can be committed asynchronously. Using a list of
pxd's instead of xad's allows us to fit 4 extents, rather than 2.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some KernelDoc descriptions are updated to match the current code.
No code changes.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I have recompiled Linux kernel 2.6.11.5 documentation for me and our
university students again. The documentation could be extended for more
sources which are equipped by structured comments for recent 2.6 kernels. I
have tried to proceed with that task. I have done that more times from 2.6.0
time and it gets boring to do same changes again and again. Linux kernel
compiles after changes for i386 and ARM targets. I have added references to
some more files into kernel-api book, I have added some section names as well.
So please, check that changes do not break something and that categories are
not too much skewed.
I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
by kernel convention. Most of the other changes are modifications in the
comments to make kernel-doc happy, accept some parameters description and do
not bail out on errors. Changed <pid> to @pid in the description, moved some
#ifdef before comments to correct function to comments bindings, etc.
You can see result of the modified documentation build at
http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz
Some more sources are ready to be included into kernel-doc generated
documentation. Sources have been added into kernel-api for now. Some more
section names added and probably some more chaos introduced as result of quick
cleanup work.
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The extra race-with-truncate-then-retry logic around
ext3_get_block_handle(), which was inherited from ext2, becomes unnecessary
for ext3, since we have already obtained the ei->truncate_sem in
ext3_get_block_handle() before calling ext3_alloc_branch(). The
ei->truncate_sem is already there to block concurrent truncate and block
allocation on the same inode. So the inode's indirect addressing tree
won't be changed after we grab that semaphore.
We could, after getting the semaphore, re-verify whether the branch is
up-to-date. If it has been changed, then get the updated branch. If we still
need block allocation, we will have a safe version of the branch to work
with in the ext3_find_goal()/ext3_splice_branch().
The code becomes more readable after removing that retry logic. The patch
also cleans up some gotos in ext3_get_block_handle() to make it more
readable.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
comp_short_keys() massaged into sane form, which kills the last place where
pointer to in_core_key (or any object containing such) would be cast to or
from something else. At that point we are free to change layout of
in_core_key - nothing depends on it anymore.
So we drop the mess with union in there and simply use (unconditional) __u64
k_offset and __u8 k_type instead; places using in_core_key switched to those.
That gives _far_ better code than current mess - on all platforms.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fixes for a couple of bugs exposed by the above: le32_to_cpu() used on 16bit
value and missing conversion in comparison of host- and little-endian values.
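The first class of bug can be shown outside the kernel. This is only a
sketch, assuming a big-endian host where le16_to_cpu() and le32_to_cpu()
reduce to byte swaps; swab16_demo(), swab32_demo() and the two read_*
helpers are hypothetical stand-ins, not kernel or reiserfs functions:

```c
#include <stdint.h>

/* Hypothetical stand-ins for what le16_to_cpu()/le32_to_cpu() do on a
 * big-endian host: an unconditional byte swap of the given width. */
static uint16_t swab16_demo(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t swab32_demo(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Correct: a 16-bit little-endian field gets a 16-bit swap. */
static uint32_t read_le16_field_ok(uint16_t raw)
{
	return swab16_demo(raw);
}

/* Buggy: the 16-bit value is widened first, so the 32-bit swap moves
 * the payload into the upper half of the result. */
static uint32_t read_le16_field_bad(uint16_t raw)
{
	return swab32_demo(raw);
}
```

On a little-endian host both conversions are no-ops, which is presumably
why this kind of mistake goes unnoticed until the types are annotated.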
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
little-endian objects annotated as such; again, obviously no changes of
resulting code, we only replace __u16 with __le16, etc. in relevant places.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
struct reiserfs_key cloned; (currently) identical struct in_core_key added.
Places that expect host-endian data in reiserfs_key switched to in_core_key.
Basically, we get annotation of reiserfs_key users and keep the resulting tree
obviously equivalent to original.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Cc: <reiserfs-dev@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For tree mount maps, a call to chdir or chroot, to a directory above the
mount point directories at a certain time during the expire results in the
expire incorrectly thinking the tree is not busy. This patch adds a check
to see if the filesystem above the tree mount points is busy and also locks
the filesystem during the tree mount expire to prevent the race.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's possible for an event wait request to arrive before the event
requestor. If this happens the daemon never gets notified and autofs
hangs.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes the leak of sb->s_fs_info in both the HFS and HFS+
modules. In addition to this, it fixes an oops happening when trying to
mount a non-hfsplus filesystem using hfsplus. This patch is from Roman
Zippel, based off patches sent by myself.
Signed-off-by: Colin Leroy <colin@colino.net>
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch optimizes io_submit_one to call aio_run_iocb() directly if
ctx->run_list is empty. When the list is empty, the operation of adding to
the list, then call to __aio_run_iocbs() is unnecessary because these
operations are done in one atomic step. ctx->run_list always has only one
element in this case. This optimization speeds up industry standard db
transaction processing benchmark by 0.2%.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Clean up code that was previously used for debug purpose. Remove aio_run,
aio_wakeups, iocb->ki_queued and iocb->ki_kicked. Also clean up unused
variable count in __aio_run_iocbs() and debug code in read_events().
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since the tail pointer in the aio_ring structure never exceeds the ring size
by more than one wrap, a simple compare is sufficient to wrap the index
around. This avoids a more expensive mod operation.
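The idea, as a standalone sketch rather than the actual fs/aio.c change
(ring_next() and ring_next_mod() are hypothetical names, and the
precondition tail < size is an assumption of the sketch):

```c
#include <assert.h>

/* Advance a ring index by one slot. The caller guarantees tail < size
 * on entry, so tail + 1 can overshoot by at most one wrap and a single
 * compare replaces the modulo. */
static unsigned int ring_next(unsigned int tail, unsigned int size)
{
	assert(size > 0 && tail < size);
	if (++tail >= size)
		tail = 0;
	return tail;
}

/* Reference version using the more expensive modulo. */
static unsigned int ring_next_mod(unsigned int tail, unsigned int size)
{
	return (tail + 1) % size;
}
```

Both functions agree for every tail in [0, size), which is the only range
the compare-based version needs to handle.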
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes superfluous kiocb member initialization in the AIO
allocation and deallocation path. For example, in really_put_req(),
right before kiocb is returned to slab, 5 variables are reset to NULL.
The same variables will be initialized at the kiocb allocation time,
so why bother reset them knowing that they will be set to valid data
at alloc time? Another example: ki_retry is initialized in __aio_get_req,
but is initialized again in io_submit_one.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert most of the current code that uses _NSIG directly to instead use
valid_signal(). This avoids gcc -W warnings and off-by-one errors.
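A userspace sketch of such a helper, for illustration only. DEMO_NSIG is
an assumed value (64) and valid_signal_demo() is a hypothetical name; the
kernel's own definition may differ in detail:

```c
/* Assumption for this sketch: 64 signals. */
#define DEMO_NSIG 64

/* Taking the argument as unsigned long folds the "negative" check into
 * the single upper-bound comparison, because a negative int converts to
 * a very large unsigned value. Signal 0 is accepted here since callers
 * such as kill() use it as the "null signal". */
static int valid_signal_demo(unsigned long sig)
{
	return sig <= DEMO_NSIG ? 1 : 0;
}
```

Keeping the comparison in one place means callers cannot mix up
"sig > _NSIG" with "sig >= _NSIG", and there is no "sig < 0" test on an
unsigned value for gcc -W to complain about.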
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes a segmentation fault when specifying a bad journal device via
a mount option.
Don't pass a zero pointer to bdevname() if filp_open() returns error.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow rewriting of a file and extending a file up to the end of the
allocated block on a full filesystem.
From: Chris Mason <mason@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's trivial for the resize option to auto-get the underlying device size,
while it's harder for the user. I've copied the code from jfs.
Because of the different reiserfs option parser (which does not use the
superior match_token used by almost every other filesystem), I've had to
use the "resize=auto" and not "resize" option to specify this behaviour.
Changing the option parser to the kernel one wouldn't be bad but I've no
time to do this cleanup at the moment.
Btw, the mount(8) man page should be updated to include this option. Cc
the relevant people, please (I hope I cc'ed the right people).
Cc: <reiserfs-dev@namesys.com>
Cc: <reiserfs-list@namesys.com>
Cc: <mtk-manpages@gmx.net>
Cc: Alex Zarochentsev <zam@namesys.com>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current logic assumes that a /proc/<PID>/task directory should have a
hardlink count of 3, probably counting ".", "..", and a directory for a
single child task.
It's fairly obvious that this doesn't work out correctly when a PID has
more than one child task, which is quite often the case.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The pid directories in /proc/ currently return the wrong hardlink count - 3,
when there are actually 4 : ".", "..", "fd", and "task".
This is easy to notice using find(1):
cd /proc/<pid>
find
In the output, you'll see a message similar to:
find: WARNING: Hard link count is wrong for .: this may be a bug in your
filesystem driver. Automatically turning on find's -noleaf option.
Earlier results may have failed to include directories that should have
been searched.
http://bugs.gentoo.org/show_bug.cgi?id=86031
I also noticed that CONFIG_SECURITY can add a 5th: attr, and performed a
similar fix on the task directories too.
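The arithmetic behind the count, as a small sketch (dir_nlink() and
proc_pid_nlink() are hypothetical helpers, not the names used in
fs/proc/base.c):

```c
#include <stddef.h>

/* Link count of a directory, counted the way this changelog does:
 * two for "." and "..", plus one for each subdirectory. */
static unsigned int dir_nlink(size_t nr_subdirs)
{
	return 2 + (unsigned int)nr_subdirs;
}

/* /proc/<pid> has the subdirectories "fd" and "task", plus "attr"
 * when security support is configured. */
static unsigned int proc_pid_nlink(int have_security)
{
	return dir_nlink(have_security ? 3 : 2);
}
```

find(1) relies on this relationship for its leaf optimisation, which is
why a wrong count triggers the warning quoted above.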
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove PAGE_BUG - replace it with BUG and BUG_ON.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use this:
.set_page_dirty = __set_page_dirty_nobuffers,
We already dropped the inclusion of <linux/buffer_head.h>, and we don't have a
backing block device for this FS.
"Without having looked at it, I'm sure that hostfs does not use buffer_heads.
So setting your ->set_page_dirty a_op to point at __set_page_dirty_nobuffers()
is a reasonable thing to do - it'll provide a slight speedup."
This speedup is one less spinlock held and one less conditional branch, which
isn't bad.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace a number of memory barriers with smp_ variants. This means we won't
take the unnecessary hit on UP machines.
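Roughly how the selection works, as a userspace sketch for GCC or Clang.
DEMO_CONFIG_SMP and all demo_* names are hypothetical stand-ins for the
kernel's CONFIG_SMP, barrier(), mb() and smp_mb(); the real per-arch
definitions are not reproduced here:

```c
/* Compiler-only barrier: stops the compiler from reordering memory
 * accesses across it, emits no instruction. */
#define demo_barrier()	__asm__ __volatile__("" ::: "memory")

/* Full hardware barrier (GCC/Clang builtin). */
#define demo_mb()	__sync_synchronize()

/* The smp_ variant only pays for the hardware barrier on SMP builds. */
#ifdef DEMO_CONFIG_SMP
#define demo_smp_mb()	demo_mb()
#else
#define demo_smp_mb()	demo_barrier()
#endif

static int demo_data;
static int demo_ready;

/* Publish a value: the data store must be ordered before the flag.
 * This single-threaded demo only illustrates which macro is chosen. */
static void demo_publish(int value)
{
	demo_data = value;
	demo_smp_mb();
	demo_ready = 1;
}
```

On a UP build the ordering against other CPUs is moot, so the compiler
barrier is all that is needed.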
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In rare situations, drop_buffers() can be called for a page which has buffers,
but no ->mapping (it was truncated, but the buffers were left behind because
ext3 was still fiddling with them).
But if there was an I/O error in a buffer_head, drop_buffers() will try to get
at the address_space and will oops.
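The shape of the check, as a mock-up with hypothetical types (struct
demo_page and struct demo_mapping are stand-ins, not the kernel's struct
page and struct address_space, and this is not the literal fs/buffer.c
change):

```c
#include <stddef.h>

struct demo_mapping {
	int eio;			/* stands in for an error flag on the mapping */
};

struct demo_page {
	struct demo_mapping *mapping;	/* NULL once the page was truncated */
};

/* Record a write error on the owning mapping, if there still is one. */
static void demo_note_write_error(struct demo_page *page)
{
	if (page->mapping)
		page->mapping->eio = 1;
}
```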
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When ->writepage() returns WRITEPAGE_ACTIVATE, the page is still locked.
Explicitly unlock the page in mpage_writepages().
Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attached is a new patch that solves the issue of getting valid credentials
into the LOGIN message. The current code was assuming that the audit context
had already been copied. This is not always the case for LOGIN messages.
To solve the problem, the patch passes the task struct to the function that
emits the message where it can get valid credentials.
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Main change is in path_lookup: added a goto to do audit_inode
instead of return statement, when emul_lookup_dentry for root
is successful. The existing code does audit_inode only when
lookup is done in normal root or cwd.
Other changes: Some lookup routines are returning zero on success,
and some are returning zero on failure. I documented the related
function signatures in this code path, so that one can glance over
abstract functions without understanding the entire code.
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
.. since it can be due to pending kill.
Update readme information to better describe cifs umount
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
if cifsd thread is no longer running to demultiplex responses.
Do not send FindClose request when FindFirst failed without reaching end
of search.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pointed out by Dave Stahl and Vince Negri in which cifs can update the
last modify time on a server modified file without invalidating the
local cached data due to an intervening readdir.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
unless response is larger than 256 bytes. This cuts more than 1/3 of
the large memory allocations that cifs does and should be a huge help to
memory pressure under stress.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
And fix to not needlessly send new POSIX QFSInfo when server does not
explicitly claim support for the new protocol extensions.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. and do not double endian convert the special characters when mounted
with mapchars mount parm.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For handling seven special characters that shells use for filenames.
This first part implements conversions from Unicode.
Signed-off-by: Steve French
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
remove sparse warnings, unnecessary pad in QueryFileInfo and redundant
function define.
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Old servers such as NT4 do not support this level of FindFirst (and
retry with a lower infolevel)
Signed-off-by: Steve French (sfrench@us.ibm.com)
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If arch_setup_additional_pages fails, the error path will do some double-frees.
This fixes it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch removes __user from compat_uptr_t types in the NFS4 mount
32-bit->64-bit compatibility structures.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fs/isofs includes trimmed down to something resembling sanity.
Kernel-only parts of linux/iso_fs.h and entire linux/iso_fs_{sb,i}.h
moved to fs/isofs/isofs.h.
A lot of useless #include in fs/isofs/*.c killed.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes some needlessly global code static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kobject_add() and kobject_del() don't emit hotplug events anymore. Do it
ourselves if we are finished populating the device directory.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sysfs: allow changing the permissions for already created attributes
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds 32-bit compatibility for mounting an NFSv4 mount on a 64-bit
kernel (such as happens with PPC64).
The problem is that the mount data for the NFS4 mount process includes
auxiliary data pointers, probably because the NFS4 mount data may
conceivably exceed PAGE_SIZE in size - thus breaking against the hard
limit imposed by sys_mount().
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts a fs/char_dev.c patch that was merged into BK on March 3.
The problem is that it breaks things ... __register_chrdev_region() has
a block of code, commented "temporary" for over two years now, which
fails rudely during PCMCIA initialization or other register_chrdev()
calls, because it doesn't "degrade to linked list". This keeps whole
subsystems from working.
A real fix to that "temporary" code should be possible, using some better
scheme to allocate major numbers, but it's not something I want to spend
time on just now.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We were failing to close on an error path, resulting in a leak of struct files
which could take a v4 server down fairly quickly.... So call
nfs4_close_delegation instead of just open-coding parts of it.
Simplify the cleanup on delegation failure while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
rpc_create_clnt and friends return errors, not NULL, on failure.
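Why a NULL test misses such failures, as a userspace sketch of the
error-pointer convention. All demo_* names are hypothetical;
DEMO_MAX_ERRNO is an assumed bound, and the kernel's own ERR_PTR(),
PTR_ERR() and IS_ERR() are only imitated here:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* assumed bound for this sketch */

/* Encode a negative errno in the top of the address range. */
static void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_clnt {
	int unused;
};

/* Hypothetical constructor that reports failure as an error pointer. */
static struct demo_clnt *demo_create_clnt(int fail)
{
	static struct demo_clnt clnt;

	return fail ? demo_err_ptr(-ENOMEM) : &clnt;
}

static long demo_setup(int fail)
{
	struct demo_clnt *clnt = demo_create_clnt(fail);

	/* "if (!clnt)" would not catch this: the error pointer is non-NULL. */
	if (demo_is_err(clnt))
		return demo_ptr_err(clnt);
	return 0;
}
```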
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes the error "RPC: failed to contact portmap (errno -512)." when the server
later tries to unregister from the portmapper.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This fixes the lots-of-fsx-linux-instances-cause-a-slow-leak bug.
It's been there since 2.6.6, caused by:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.5/2.6.5-mm4/broken-out/jbd-move-locked-buffers.patch
That patch moves under-writeout ordered-data buffers onto a separate journal
list during commit. It took out the old code which was based on a single
list.
The old code (necessarily) had logic which would restart I/O against buffers
which had been redirtied while they were on the committing transaction's
t_sync_datalist list. The new code only writes buffers once, ignoring
redirtyings by a later transaction, which is good.
But over on the truncate side of things, in journal_unmap_buffer(), we're
treating buffers on the t_locked_list as inviolable things which belong to the
committing transaction, and we just leave them alone during concurrent
truncate-vs-commit.
The net effect is that when truncate tries to invalidate a page whose buffers
are on t_locked_list and have been redirtied, journal_unmap_buffer() just
leaves those buffers alone. truncate will remove the page from its mapping
and we end up with an anonymous clean page with dirty buffers, which is an
illegal state for a page. The JBD commit will not clean those buffers as they
are removed from t_locked_list. The VM (try_to_free_buffers) cannot reclaim
these pages.
The patch teaches journal_unmap_buffer() about buffers which are on the
committing transaction's t_locked_list. These buffers have been written and
I/O has completed. We can take them off the transaction and undirty them
within the context of journal_invalidatepage()->journal_unmap_buffer().
Acked-by: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The direct I/O code is mapping the read request to the file system block. If
the file size was not on a block boundary, the result would show the read
reading past EOF. This was only happening for the AIO case. The non-AIO case
truncates the result to match file size (in direct_io_worker). This patch
does the same thing for the AIO case, it truncates the result to match the
file size if the read reads past EOF.
When I/O completes the result can be truncated to match the file size
without using i_size_read(), thus the aio result now matches the number of
bytes read to the end of file.
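The clamping itself is simple arithmetic. A sketch, assuming the file size
was sampled when the request was set up (dio_read_result() and its
parameters are hypothetical names, not the fs/direct-io.c code):

```c
#include <stdint.h>

/* offset:      file position where the read started
 * transferred: bytes moved by the I/O, rounded to whole fs blocks
 * i_size:      file size recorded at submission time (assumption) */
static int64_t dio_read_result(int64_t offset, int64_t transferred,
			       int64_t i_size)
{
	if (offset >= i_size)
		return 0;			/* read started at or past EOF */
	if (offset + transferred > i_size)
		transferred = i_size - offset;	/* trim the part beyond EOF */
	return transferred;
}
```

For example, a 4096-byte block read at offset 0 of a 1000-byte file
reports 1000 bytes rather than 4096.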
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>