Commit Graph

30175 Commits

Author SHA1 Message Date
Jeff Layton
f6488c9ba5 nfs: don't allow nfs_find_actor to match inodes of the wrong type
Benny Halevy reported the following oops when testing RHEL6:

<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6
<4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [<ffffffff81182745>] __fput+0xf5/0x210
<4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182885>] fput+0x25/0x30
<4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
<4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
<4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
<4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
<4> [<ffffffff8117df90>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
<1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4> RSP <ffff88081458bb98>
<4>CR2: 0000000000000000

I think this is ultimately due to a bug on the server. The client had
previously found a directory dentry. It then later tried to do an atomic
open on a new (regular file) dentry. The attributes it got back had the
same filehandle as the previously found directory inode. It then tried
to put the filp because it failed the aops tests for O_DIRECT opens, and
oopsed here because the ctx was still NULL.

Obviously the root cause here is a server issue, but we can take steps
to mitigate this on the client. When nfs_fhget is called, we always know
what type of inode it is. In the event that there's a broken or
malicious server on the other end of the wire, the client can end up
crashing because the wrong ops are set on it.

Have nfs_find_actor check that the inode type is correct after checking
the fileid. The fileid check should rarely ever match, so it should only
rarely ever get to this check. In the case where we have a broken
server, we may see two different inodes with the same i_ino, but the
client should be able to cope with them without crashing.

This should fix the oops reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=913660

Reported-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-27 17:28:20 -08:00
Weston Andros Adamson
a47970ff78 NFSv4.1: Hold reference to layout hdr in layoutget
This fixes an oops where a LAYOUTGET is in still in the rpciod queue,
but the requesting processes has been killed.  Without this, killing
the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout
to NULL while the LAYOUTGET rpc task still references it.

Example oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
PGD 7365b067 PUD 7365d067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4
CPU 0
Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa01bd586>]  [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
RSP: 0018:ffff88007b0c1c88  EFLAGS: 00010246
RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc
RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958
RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958
R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e
FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0)
Stack:
 ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388
 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900
 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791
Call Trace:
 [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4]
 [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc]
 [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc]

Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-25 18:32:59 -08:00
Benny Halevy
78f33277f9 pnfs: fix resend_to_mds for directio
Pass the directio request on pageio_init to clean up the API.

Percolate pg_dreq from original nfs_pageio_descriptor to the
pnfs_{read,write}_done_resend_to_mds and use it on respective
call to nfs_pageio_init_{read,write} on the newly created
nfs_pageio_descriptor.

Reproduced by command:
 mount -o vers=4.1 server:/ /mnt
 dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
PGD 34786067 PUD 34794067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4
CPU 1
Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs
RIP: 0010:[<ffffffffa021a3a8>]  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
RSP: 0018:ffff880038f8fa68  EFLAGS: 00010206
RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000
RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028
RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510
R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40
R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878
FS:  0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480)
Stack:
 ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82
 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d
 ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48
Call Trace:
 [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs]
 [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs]
 [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs]
 [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs]
 [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs]
 [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4]
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files]
 [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files]
 [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4]
 [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs]
 [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs]
 [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files]
 [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc]
 [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc]
 [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173
 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff8105f8c1>] process_one_work+0x226/0x422
 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422
 [<ffffffff81094757>] ? lock_acquired+0x210/0x249
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff810600d8>] worker_thread+0x126/0x1c4
 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240
 [<ffffffff81064ef8>] kthread+0xb1/0xb9
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f
RIP  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
 RSP <ffff880038f8fa68>
CR2: 0000000000000028

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [>= 3.6]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-24 10:07:36 -05:00
Trond Myklebust
5a7a613a47 NFS: Don't allow NFS silly-renamed files to be deleted, no signal
Commit 73ca100 broke the code that prevents the client from deleting
a silly renamed dentry.  This affected "delete on last close"
semantics as after that commit, nothing prevented removal of
silly-renamed files.  As a result, a process holding a file open
could easily get an ESTALE on the file in a directory where some
other process issued 'rm -rf some_dir_containing_the_file' twice.
Before the commit, any attempt at unlinking silly renamed files would
fail inside may_delete() with -EBUSY because of the
DCACHE_NFSFS_RENAMED flag.  The following testcase demonstrates
the problem:
  tail -f /nfsmnt/dir/file &
  rm -rf /nfsmnt/dir
  rm -rf /nfsmnt/dir
  # second removal does not fail, 'tail' process receives ESTALE

The problem with the above commit is that it unhashes the old and
new dentries from the lookup path, even in the normal case when
a signal is not encountered and it would have been safe to call
d_move.  Unfortunately the old dentry has the special
DCACHE_NFSFS_RENAMED flag set on it.  Unhashing has the
side-effect that future lookups call d_alloc(), allocating a new
dentry without the special flag for any silly-renamed files.  As a
result, subsequent calls to unlink silly renamed files do not fail
but allow the removal to go through.  This will result in ESTALE
errors for any other process doing operations on the file.

To fix this, go back to using d_move on success.
For the signal case, it's unclear what we may safely do beyond d_drop.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
2013-02-22 14:55:34 -05:00
Trond Myklebust
666b3d803a NLM: Ensure that we resend all pending blocking locks after a reclaim
Currently, nlmclnt_lock will break out of the for(;;) loop when
the reclaimer wakes up the blocking lock thread by setting
nlm_lck_denied_grace_period. This causes the lock request to fail
with an ENOLCK error.
The intention was always to ensure that we resend the lock request
after the grace period has expired.

Reported-by: Wangyuan Zhang <Wangyuan.Zhang@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-19 12:18:27 -05:00
fanchaoting
5a12cca697 umount oops when remove blocklayoutdriver first
now pnfs client uses block layout, maybe we can remove
blocklayoutdriver first. if we umount later,
it can cause oops in unset_pnfs_layoutdriver.
because nfss->pnfs_curr_ld->clear_layoutdriver is invalid.

reproduce it:
 modprobe  blocklayoutdriver
 mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/
 rmmod blocklayoutdriver
 umount /mnt

then you can see following

CPU 0
Pid: 17023, comm: umount.nfs4 Tainted: GF          O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa04cfe6d>]  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP: 0018:ffff8800022d9e48  EFLAGS: 00010286
RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001
RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800
RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400
R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550
FS:  00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0)
Stack:
ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce
ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7
ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400
Call Trace:
[<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4]
[<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs]
[<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs]
[<ffffffff81178d35>] deactivate_locked_super+0x45/0x70
[<ffffffff8117986a>] deactivate_super+0x4a/0x70
[<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130
[<ffffffff81194d62>] sys_umount+0x72/0xe0
[<ffffffff8154af59>] system_call_fastpath+0x16/0x1b
Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2
RIP  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP <ffff8800022d9e48>
CR2: ffffffffa04a1b38
---[ end trace 29f75aaedda058bf ]---

Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-17 15:40:15 -05:00
Tim Gardner
96aa1549af nfs: remove kfree() redundant null checks
smatch analysis:

fs/nfs/getroot.c:130 nfs_get_root() info: redundant null
 check on name calling kfree()

fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null
 check on devname_garbage calling kfree()

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-17 15:27:21 -05:00
Weston Andros Adamson
085b7a45c6 NFSv4.1: Don't decode skipped layoutgets
layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0).
Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean
that the RPC was successfully sent, received and parsed.

To fix this, use the result's len member to see if parsing took place.

This fixes the following OOPS -- calling xdr_init_decode() with a buffer length
0 doesn't set the stream's 'p' member and ends up using uninitialized memory
in filelayout_decode_layout.

BUG: unable to handle kernel paging request at 0000000000008050
IP: [<ffffffff81282e78>] memcpy+0x18/0x120
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq
CPU 1
Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib]

Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffff81282e78>]  [<ffffffff81282e78>] memcpy+0x18/0x120
RSP: 0018:ffff88003dfab588  EFLAGS: 00010206
RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009
RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000
RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024
R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0
FS:  0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540)
Stack:
ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0
<d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70
<d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc]
[<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files]
[<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files]
[<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-17 15:24:16 -05:00
Trond Myklebust
fd9a8d7160 NFSv4.1: Fix bulk recall and destroy of layouts
The current code in pnfs_destroy_all_layouts() assumes that removing
the layout from the server->layouts list is sufficient to make it
invisible to other processes. This ignores the fact that most
users access the layout through the nfs_inode->layout...
There is further breakage due to lack of reference counting of the
layouts, meaning that the whole thing Oopses at the drop of a hat.

The code in initiate_bulk_draining() is almost correct, and can be
used as a model for pnfs_destroy_all_layouts(), so move that
code to pnfs.c, and refactor the code to allow us to choose between
a single filesystem bulk recall, and a recall of all layouts.
Also note that initiate_bulk_draining() currently calls iput() while
holding locks. Fix that too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-14 13:22:50 -05:00
Trond Myklebust
c8da19b986 NFSv4.1: Fix an ABBA locking issue with session and state serialisation
Ensure that if nfs_wait_on_sequence() causes our rpc task to wait for
an NFSv4 state serialisation lock, then we also drop the session slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-11 19:04:25 -05:00
Trond Myklebust
c21443c2c7 NFSv4: Fix a reboot recovery race when opening a file
If the server reboots after it has replied to our OPEN, but before we
call nfs4_opendata_to_nfs4_state(), then the reboot recovery thread
will not see a stateid for this open, and so will fail to recover it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:14 -05:00
Trond Myklebust
65b62a29f7 NFSv4: Ensure delegation recall and byte range lock removal don't conflict
Add a mutex to the struct nfs4_state_owner to ensure that delegation
recall doesn't conflict with byte range lock removal.

Note that we nest the new mutex _outside_ the state manager reclaim
protection (nfsi->rwsem) in order to avoid deadlocks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:13 -05:00
Trond Myklebust
37380e4264 NFSv4: Fix up the return values of nfs4_open_delegation_recall
Adjust the return values so that they return EAGAIN to the caller in
cases where we might want to retry the delegation recall after
the state recovery has run.
Note that we can't wait and retry in this routine, because the caller
may be the state manager thread.

If delegation recall fails due to a session or reboot related issue,
also ensure that we mark the stateid as delegated so that
nfs_delegation_claim_opens can find it again later.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:13 -05:00
Trond Myklebust
d25be546a8 NFSv4.1: Don't lose locks when a server reboots during delegation return
If the server reboots while we are converting a delegation into
OPEN/LOCK stateids as part of a delegation return, the current code
will simply exit with an error. This causes us to lose both
delegation state and locking state (i.e. locking atomicity).

Deal with this by exposing the delegation stateid during delegation
return, so that we can recover the delegation, and then resume
open/lock recovery.

Note that not having to hold the nfs_inode->rwsem across the
calls to nfs_delegation_claim_opens() also fixes a deadlock against
the NFSv4.1 reboot recovery code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:12 -05:00
Trond Myklebust
9a99af494b NFSv4.1: Prevent deadlocks between state recovery and file locking
We currently have a deadlock in which the state recovery thread
ends up blocking due to one of the locks which it is trying to
recover holding the nfs_inode->rwsem.
The situation is as follows: the state recovery thread is
scheduled in order to recover from a reboot. It immediately
drains the session, forcing all ordinary NFSv4.1 calls to
nfs41_setup_sequence() to be put to sleep.  This includes the
file locking process that holds the nfs_inode->rwsem.
When the thread gets to nfs4_reclaim_locks(), it tries to
grab a write lock on nfs_inode->rwsem, and boom...

Fix is to have the lock drop the nfs_inode->rwsem while it is
doing RPC calls. We use a sequence lock in order to signal to
the locking process whether or not a state recovery thread has
run on that inode, in which case it should retry the lock.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:12 -05:00
Trond Myklebust
c137afabe3 NFSv4: Allow the state manager to mark an open_owner as being recovered
This patch adds a seqcount_t lock for use by the state manager to
signal that an open owner has been recovered. This mechanism will be
used by the delegation, open and byte range lock code in order to
figure out if they need to replay requests due to collisions with
lock recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:11 -05:00
Trond Myklebust
322b2b9032 Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"
This reverts commit 324d003b0c.

The deadlock turned out to be caused by a workqueue limitation that has
now been worked around in the RPC code (see comment in rpc_free_task).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 10:13:48 -05:00
Linus Torvalds
bf6c8a8148 NFS client bugfixe for Linux 3.8
- Error reporting in nfs_xdev_mount incorrectly maps all errors to ENOMEM
 - Fix an NFSv4 refcounting issue
 - Fix a mount failure when the server reboots during NFSv4 trunking discovery
 - NFSv4.1 mounts may need to run the lease recovery thread.
 - Don't silently fail setattr() requests on mountpoints
 - Fix a SUNRPC socket/transport livelock and priority queue issue
 - We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRCpS4AAoJEGcL54qWCgDyqucP/2CTv5leu+X5/0PXBOykAIHg
 8oEsTEz7/4IvIxTXzuHDYirMnm/mulfGF6NrdPqfvxpAHqRfVBfLFocfNLMVhQci
 97RmBfEEGM22AToYUubML5bIxr0QllV4s9Vmyh/zGDan52y7zNNlZX+v6aLjZbJB
 Fbolihpcch6lhQEUNAzK0B0ddimDl9lazx/WTmMOD/JrwOqzA4FJC+YxBe88nfzQ
 c6sYyEptBaSirbCOlueqGpv8skB1CLpFJXguXToPXFxpWed6uoGrIwLO7MLUdFpJ
 Xw+j8cuv/wjyYJGVKjhW7kXtwK8T7+u4bT2L883R01XYXr8XfkkLON0dgG1X/unk
 80mLzCO1+qRdoDSQ4b/V4B0nScPRCJuoZpftjCi2uhKewNcxQPMZ2V5/D7pO3uyE
 NdhxByB8D86JfNrIcBcRaxfuiQsurQBDvsDNmWPBZSOmH/dmHqTQGLcIe6N94A0B
 c7KFtXrN2MzOl8S68dUpbhftObq9X0oK2oxFFLWRQoqjrFtiDLU5JilV0bEH2CMo
 gJX7CRrPoJ2wmNToKdPJRWBYDmtMMIThq3vIpCj1FFzX17r5grJpqCmp/TViu/ER
 r8rmJzni+nmHaO1NLJMTCHzLJ8soiKyBq8PZKAbipTJxy+TXI/69jnWzpIQ/pbM/
 JN+0tiCcqmUmXsO+hrCP
 =lWFV
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Error reporting in nfs_xdev_mount incorrectly maps all errors to
   ENOMEM

 - Fix an NFSv4 refcounting issue

 - Fix a mount failure when the server reboots during NFSv4 trunking
   discovery

 - NFSv4.1 mounts may need to run the lease recovery thread.

 - Don't silently fail setattr() requests on mountpoints

 - Fix a SUNRPC socket/transport livelock and priority queue issue

 - We must handle NFS4ERR_DELAY when resetting the NFSv4.1 session.

* tag 'nfs-for-3.8-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
  SUNRPC: When changing the queue priority, ensure that we change the owner
  NFS: Don't silently fail setattr() requests on mountpoints
  NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
  NFSv4: Fix NFSv4 trunking discovery
  NFSv4: Fix NFSv4 reference counting for trunked sessions
  NFS: Fix error reporting in nfs_xdev_mount
2013-02-01 08:43:52 +11:00
Trond Myklebust
c489ee290b NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It
usually means that the server is busy handling an unfinished RPC
request. Just sleep for a second and then retry.
We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return
value. If the NFS server has outstanding callbacks, we just want to
similarly sleep & retry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-01-30 17:45:15 -05:00
Trond Myklebust
ab22541782 NFS: Don't silently fail setattr() requests on mountpoints
Ensure that any setattr and getattr requests for junctions and/or
mountpoints are sent to the server. Ever since commit
0ec26fd069 (vfs: automount should ignore LOOKUP_FOLLOW), we have
silently dropped any setattr requests to a server-side mountpoint.
For referrals, we have silently dropped both getattr and setattr
requests.

This patch restores the original behaviour for setattr on mountpoints,
and tries to do the same for referrals, provided that we have a
filehandle...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-01-30 17:41:04 -05:00
Linus Torvalds
f96736e1ba xfs: bugfixes for 3.8-rc6
- fix return value when filesystem probe finds no XFS magic, a
   regression introduced in 9802182.
 
 - fix stack switch in __xfs_bmapi_allocate by moving the check for stack
   switch up into xfs_bmapi_write.
 
 - fix oops in _xfs_buf_find by validating that the requested block is
   within the filesystem bounds.
 
 - limit speculative preallocation near ENOSPC.
 
 - fix an unmount hang in xfs_wait_buftarg by freeing the
   xfs_buf_log_item in xfs_buf_item_unlock.
 
 - fix a possible use after free with AIO.
 
 - fix xfs_swap_extents after removal of xfs_flushinval_pages, a
   regression introduced in fb59581404.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRBvmgAAoJENaLyazVq6ZOOacP/RilCPsi41NkJqRx1Rs5aRGE
 UvinrHfAL/tBupS2JVo1niIilBNJG/cI+lLcpV/P5omLBJfpEu0trzZUSxU7S1Vc
 a2M8J0qhmKfBcl70fuCALAxPY52895y+44gufxaH0O5HQDN6tB8n4MMqYGPmS8hz
 Ul/q3MO601hVyBHaoYa7BNGS3YG0TCdFGtWcC5tQaR3v7upTLR2ouZrGQ8CV0BBa
 Ek1xdxLh4D0fRybSL7lUw64W957iyldoLsEg+zQrE9NSfTE8DSqUG+NPWB0wjPce
 ICtmO6TbE5c6q1ScOL3YCC2cmYvjR9mlAHnPy73SqWSIsTqUsVzdibNo+tUJJZ5r
 RZf3u6Uri6uKC6Hl4XEtg4LVnnquKosTXfoiHmn+eh0dhYL7sZG0Ya5we5pH5Tmi
 P6B2DlfUA1fj4Ne4Asx2d7mwOJaZcLHDZoeCs/Haz2Z6kGVEm7ImyAb1h76uNOZo
 l0NFhXJGcOQLyjPtQjl81SGjQmntiIN0Poia3528zjxxGXlBNwAwalkOtdJnk5iN
 IaYRKtvIcrdjFvunasiKZIsV/O9w3/mguXlrSqBDgUsKPUc/cq5vLfsa70jYGc2j
 M6ldJRRqTvSjkVXc/7SXv4GLt/qbUWa92ESzhZQXEABIjZJUnOZpZuLmSVdRUyZk
 +SaGvbMphE0U/4ps3pO/
 =wViN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.8-rc6' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 "Here are fixes for returning EFSCORRUPTED on probe of a non-xfs
  filesystem, the stack switch in xfs_bmapi_allocate, a crash in
  _xfs_buf_find, speculative preallocation as the filesystem nears
  ENOSPC, an unmount hang, a race with AIO, and a regression with
  xfs_fsr:

   - fix return value when filesystem probe finds no XFS magic, a
     regression introduced in 9802182.

   - fix stack switch in __xfs_bmapi_allocate by moving the check for
     stack switch up into xfs_bmapi_write.

   - fix oops in _xfs_buf_find by validating that the requested block is
     within the filesystem bounds.

   - limit speculative preallocation near ENOSPC.

   - fix an unmount hang in xfs_wait_buftarg by freeing the
     xfs_buf_log_item in xfs_buf_item_unlock.

   - fix a possible use after free with AIO.

   - fix xfs_swap_extents after removal of xfs_flushinval_pages, a
     regression introduced in commit fb59581404a."

* tag 'for-linus-v3.8-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()
  xfs: Fix possible use-after-free with AIO
  xfs: fix shutdown hang on invalid inode during create
  xfs: limit speculative prealloc near ENOSPC thresholds
  xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
  xfs: pull up stack_switch check into xfs_bmapi_write
  xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magic
2013-01-30 11:59:37 +11:00
Torsten Kaiser
65e3aa77f1 xfs: Fix xfs_swap_extents() after removal of xfs_flushinval_pages()
Commit fb59581404 removed
xfs_flushinval_pages() and changed its callers to use
filemap_write_and_wait() and  truncate_pagecache_range() directly.

But in xfs_swap_extents() this change accidental switched the argument
for 'tip' to 'ip'. This patch switches it back to 'tip'

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 16:05:10 -06:00
Jan Kara
4b05d09c18 xfs: Fix possible use-after-free with AIO
Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: xfs@oss.sgi.com
CC: Ben Myers <bpm@sgi.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:51:22 -06:00
Dave Chinner
9f87832a82 xfs: fix shutdown hang on invalid inode during create
When the new inode verify in xfs_iread() fails, the create
transaction is aborted and a shutdown occurs. The subsequent unmount
then hangs in xfs_wait_buftarg() on a buffer that has an elevated
hold count. Debug showed that it was an AGI buffer getting stuck:

[   22.576147] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   22.976213] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   23.376206] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck
[   23.776325] XFS (vdb): buffer 0x2/0x1, hold 0x2 stuck

The trace of this buffer leading up to the shutdown (trimmed for
brevity) looks like:

xfs_buf_init:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_get_map
xfs_buf_get:         bno 0x2 len 0x200 hold 1 caller xfs_buf_read_map
xfs_buf_read:        bno 0x2 len 0x200 hold 1 caller xfs_trans_read_buf_map
xfs_buf_iorequest:   bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_iorequest
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_iorequest
xfs_buf_iowait:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_ioerror:     bno 0x2 len 0x200 hold 1 caller xfs_buf_bio_end_io
xfs_buf_iodone:      bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_ioend
xfs_buf_iowait_done: bno 0x2 nblks 0x1 hold 1 caller _xfs_buf_read
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 1 caller xfs_buf_item_init
xfs_trans_read_buf:  bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_trans_brelse:    bno 0x2 len 0x200 hold 2 recur 0 refcount 1
xfs_buf_item_relse:  bno 0x2 nblks 0x1 hold 2 caller xfs_trans_brelse
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_relse
xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 1 caller xfs_trans_brelse
xfs_buf_trylock:     bno 0x2 nblks 0x1 hold 2 caller _xfs_buf_find
xfs_buf_find:        bno 0x2 len 0x200 hold 2 caller xfs_buf_get_map
xfs_buf_get:         bno 0x2 len 0x200 hold 2 caller xfs_buf_read_map
xfs_buf_read:        bno 0x2 len 0x200 hold 2 caller xfs_trans_read_buf_map
xfs_buf_hold:        bno 0x2 nblks 0x1 hold 2 caller xfs_buf_item_init
xfs_trans_read_buf:  bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_trans_log_buf:   bno 0x2 len 0x200 hold 3 recur 0 refcount 1
xfs_buf_item_unlock: bno 0x2 len 0x200 hold 3 flags DIRTY liflags ABORTED
xfs_buf_unlock:      bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock
xfs_buf_rele:        bno 0x2 nblks 0x1 hold 3 caller xfs_buf_item_unlock

And that is the AGI buffer from cold cache read into memory to
transaction abort. You can see at transaction abort the bli is dirty
and only has a single reference. The item is not pinned, and it's
not in the AIL. Hence the only reference to it is this transaction.

The problem is that the xfs_buf_item_unlock() call is dropping the
last reference to the xfs_buf_log_item attached to the buffer (which
holds a reference to the buffer), but it is not freeing the
xfs_buf_log_item. Hence nothing will ever release the buffer, and
the unmount hangs waiting for this reference to go away.

The fix is simple - xfs_buf_item_unlock needs to detect the last
reference going away in this case and free the xfs_buf_log_item to
release the reference it holds on the buffer.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:51:12 -06:00
Dave Chinner
f2a459565b xfs: limit speculative prealloc near ENOSPC thresholds
There is a window on small filesytsems where specualtive
preallocation can be larger than that ENOSPC throttling thresholds,
resulting in specualtive preallocation trying to reserve more space
than there is space available. This causes immediate ENOSPC to be
triggered, prealloc to be turned off and flushing to occur. One the
next write (i.e. next 4k page), we do exactly the same thing, and so
effective drive into synchronous 4k writes by triggering ENOSPC
flushing on every page while in the window between the prealloc size
and the ENOSPC prealloc throttle threshold.

Fix this by checking to see if the prealloc size would consume all
free space, and throttle it appropriately to avoid premature
ENOSPC...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:50:50 -06:00
Dave Chinner
eb178619f9 xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
When _xfs_buf_find is passed an out of range address, it will fail
to find a relevant struct xfs_perag and oops with a null
dereference. This can happen when trying to walk a filesystem with a
metadata inode that has a partially corrupted extent map (i.e. the
block number returned is corrupt, but is otherwise intact) and we
try to read from the corrupted block address.

In this case, just fail the lookup. If it is readahead being issued,
it will simply not be done, but if it is real read that fails we
will get an error being reported.  Ideally this case should result
in an EFSCORRUPTED error being reported, but we cannot return an
error through xfs_buf_read() or xfs_buf_get() so this lookup failure
may result in ENOMEM or EIO errors being reported instead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:49:21 -06:00
Brian Foster
d26978dd86 xfs: pull up stack_switch check into xfs_bmapi_write
The stack_switch check currently occurs in __xfs_bmapi_allocate,
which means the stack switch only occurs when xfs_bmapi_allocate()
is called in a loop. Pull the check up before the loop in
xfs_bmapi_write() such that the first iteration of the loop has
consistent behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:48:55 -06:00
Eric Sandeen
1bee12b8c4 xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magic
9802182 changed the return value from EWRONGFS (aka EINVAL)
to EFSCORRUPTED which doesn't seem to be handled properly by
the root filesystem probe.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2013-01-28 12:48:21 -06:00
David Teigland
d4e0bfec9b GFS2: fix skip unlock condition
The recent commit fb6791d100
included the wrong logic.  The lvbptr check was incorrectly
added after the patch was tested.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-28 09:49:15 +00:00
Trond Myklebust
65436ec0c8 NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
We do need to start the lease recovery thread prior to waiting for the
client initialisation to complete in NFSv4.1.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-01-27 15:51:41 -05:00
Trond Myklebust
202c312dba NFSv4: Fix NFSv4 trunking discovery
If walking the list in nfs4[01]_walk_client_list fails, then the most
likely explanation is that the server dropped the clientid before we
actually managed to confirm it. As long as our nfs_client is the very
last one in the list to be tested, the caller can be assured that this
is the case when the final return value is NFS4ERR_STALE_CLIENTID.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
Tested-by: Ben Greear <greearb@candelatech.com>
2013-01-27 15:51:28 -05:00
Trond Myklebust
4ae19c2dd7 NFSv4: Fix NFSv4 reference counting for trunked sessions
The reference counting in nfs4_init_client assumes wongly that it
is safe for nfs4_discover_server_trunking() to return a pointer to a
nfs_client prior to bumping the reference count.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-01-27 15:51:15 -05:00
Trond Myklebust
dee972b967 NFS: Fix error reporting in nfs_xdev_mount
Currently, nfs_xdev_mount converts all errors from clone_server() to
ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that.
Also ensure that if nfs_fs_mount_common() returns an error, we
don't dprintk(0)...

The regression originated in commit 3d176e3fe4
(NFS: Use nfs_fs_mount_common() for xdev mounts)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.5]
2013-01-27 15:51:15 -05:00
Linus Torvalds
d7df025eb4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "It turns out that we had two crc bugs when running fsx-linux in a
  loop.  Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it
  all down.  Miao also has a new OOM fix in this v2 pull as well.

  Ilya fixed a regression Liu Bo found in the balance ioctls for pausing
  and resuming a running balance across drives.

  Josef's orphan truncate patch fixes an obscure corruption we'd see
  during xfstests.

  Arne's patches address problems with subvolume quotas.  If the user
  destroys quota groups incorrectly the FS will refuse to mount.

  The rest are smaller fixes and plugs for memory leaks."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (30 commits)
  Btrfs: fix repeated delalloc work allocation
  Btrfs: fix wrong max device number for single profile
  Btrfs: fix missed transaction->aborted check
  Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
  Btrfs: put csums on the right ordered extent
  Btrfs: use right range to find checksum for compressed extents
  Btrfs: fix panic when recovering tree log
  Btrfs: do not allow logged extents to be merged or removed
  Btrfs: fix a regression in balance usage filter
  Btrfs: prevent qgroup destroy when there are still relations
  Btrfs: ignore orphan qgroup relations
  Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
  Btrfs: fix unlock order in btrfs_ioctl_rm_dev
  Btrfs: fix unlock order in btrfs_ioctl_resize
  Btrfs: fix "mutually exclusive op is running" error code
  Btrfs: bring back balance pause/resume logic
  btrfs: update timestamps on truncate()
  btrfs: fix btrfs_cont_expand() freeing IS_ERR em
  Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
  Btrfs: fix off-by-one in lseek
  ...
2013-01-25 10:55:21 -08:00
Linus Torvalds
66e2d3e8c2 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Two small cifs fixes"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  fs/cifs/cifs_dfs_ref.c: fix potential memory leakage
  cifs: fix srcip_matches() for ipv6
2013-01-24 19:15:43 -08:00
Miao Xie
1eafa6c737 Btrfs: fix repeated delalloc work allocation
btrfs_start_delalloc_inodes() locks the delalloc_inodes list, fetches the
first inode, unlocks the list, triggers btrfs_alloc_delalloc_work/
btrfs_queue_worker for this inode, and then it locks the list, checks the
head of the list again. But because we don't delete the first inode that it
deals with before, it will fetch the same inode. As a result, this function
allocates a huge amount of btrfs_delalloc_work structures, and OOM happens.

Fix this problem by splice this delalloc list.

Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:27 -05:00
Miao Xie
c9f01bfe0c Btrfs: fix wrong max device number for single profile
The max device number of single profile is 1, not 0 (0 means 'as many as
possible'). Fix it.

Cc: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:26 -05:00
Miao Xie
2cba30f172 Btrfs: fix missed transaction->aborted check
First, though the current transaction->aborted check can stop the commit early
and avoid unnecessary operations, it is too early, and some transaction handles
don't end, those handles may set transaction->aborted after the check.

Second, when we commit the transaction, we will wake up some worker threads to
flush the space cache and inode cache. Those threads also allocate some transaction
handles and may set transaction->aborted if some serious error happens.

So we need more check for ->aborted when committing the transaction. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:25 -05:00
Miao Xie
8d25a086eb Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
We may access and update transaction->aborted on the different CPUs without
lock, so we need ACCESS_ONCE() wrapper to prevent the compiler from creating
unsolicited accesses and make sure we can get the right value.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:23 -05:00
Josef Bacik
e58dd74bcc Btrfs: put csums on the right ordered extent
I noticed a WARN_ON going off when adding csums because we were going over
the amount of csum bytes that should have been allowed for an ordered
extent.  This is a leftover from when we used to hold the csums privately
for direct io, but now we use the normal ordered sum stuff so we need to
make sure and check if we've moved on to another extent so that the csums
are added to the right extent.  Without this we could end up with csums for
bytenrs that don't have extents to cover them yet.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:22 -05:00
Liu Bo
192000dda2 Btrfs: use right range to find checksum for compressed extents
For compressed extents, the range of checksum is covered by disk length,
and the disk length is different with ram length, so we need to use disk
length instead to get us the right checksum.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:51:17 -05:00
Josef Bacik
b0175117b9 Btrfs: fix panic when recovering tree log
A user reported a BUG_ON(ret) that occured during tree log replay.  Ret was
-EAGAIN, so what I think happened is that we removed an extent that covered
a bitmap entry and an extent entry.  We remove the part from the bitmap and
return -EAGAIN and then search for the next piece we want to remove, which
happens to be an entire extent entry, so we just free the sucker and return.
The problem is ret is still set to -EAGAIN so we trip the BUG_ON().  The
user used btrfs-zero-log so I'm not 100% sure this is what happened so I've
added a WARN_ON() to catch the other possibility.  Thanks,

Reported-by: Jan Steffens <jan.steffens@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:49:49 -05:00
Josef Bacik
201a903894 Btrfs: do not allow logged extents to be merged or removed
We drop the extent map tree lock while we're logging extents, so somebody
could come in and merge another extent into this one and screw up our
logging, or they could even remove us from the list which would keep us from
logging the extent or freeing our ref on it, so we need to make sure to not
clear LOGGING until after the extent is logged, and then we can merge it to
adjacent extents.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-01-24 12:49:48 -05:00
Cong Ding
10b8c7dff5 fs/cifs/cifs_dfs_ref.c: fix potential memory leakage
When it goes to error through line 144, the memory allocated to *devname is
not freed, and the caller doesn't free it either in line 250. So we free the
memroy of *devname in function cifs_compose_mount_options() when it goes to
error.

Signed-off-by: Cong Ding <dinggnu@gmail.com>
CC: stable <stable@kernel.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-01-22 23:58:16 -06:00
Linus Torvalds
262060ea46 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "This contain a bugfix for CUSE and miscellaneous small fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: remove unused variable in fuse_try_move_page()
  fuse: make fuse_file_fallocate() static
  fuse: Move CUSE Kconfig entry from fs/Kconfig into fs/fuse/Kconfig
  cuse: fix uninitialized variable warnings
  cuse: do not register multiple devices with identical names
  cuse: use mutex as registration lock instead of spinlocks
2013-01-22 11:53:19 -08:00
Linus Torvalds
05c2cf35c3 f2fs fixes for v3.8-rc5
o Support swap file and link generic_file_remap_pages
 o Enhance the bio streaming flow and free section control
 o Major bug fix on recovery routine
 o Minor bug/warning fixes and code cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQ/fpIAAoJEEAUqH6CSFDS6BAP/jz/JIPoSu4NunWgVH68tzA4
 xx4lsmJQJQG+1241uA9v4MMkcTzxE0QVNTOX3BpUXBhCMc00aPOPWlWjULridLI9
 0vRE6LpG+NntkyKF+M5T1ydGuQoDlEvqoGs6c5p3yaI6PxzbzmBmipsmqXUA8fyu
 280+OWJoAcALEMJiQ8JsHcDmvM9wdQ+BV/j3BNCm4dqBUA4dYPfDzRKUJYfwqiig
 qmVRseJJaekxrQ2lHG/K/WPAXa8aRcV6khP9tv/BPGRMt+I/fli/J4sWEFT6c73B
 +qYmhrf/RLNJ1O13dePlo3URwMu083PL8QN355GAKUJbMaX/UPjEnq0DLbBOkCS2
 KBQI5O1eiFauEE6YU7p7GuvnLeVkukcXSQNVnRrnWzTUA9CMThZZ2mAgb2lz+iEP
 oZWNirRwnxdcTQRXjPyTCtpPTCgJi4GO1WS5s6HeLP8G8Muo4PzDzcEwMX0Mw5ih
 s4n4wpQ+Zp3h53cc/DxdFAK15uM3XYtUfb92kJwqaEG5VmBy6KvliXDRnNXg7WMI
 imCb08c0Fr0M8ZHOd+UCveICcndFj25jkjx8w7PoE1KBbGJkKf+cjpZ3OhOAliux
 sGtH3EZLV6jx1MjV79OpDXQGoVsvktesCbRcaxc7BbqljXYVNiap8QbzjPnmAt6z
 KKN0GU32eolGgK4Zd/iF
 =vJQc
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:
 o Support swap file and link generic_file_remap_pages
 o Enhance the bio streaming flow and free section control
 o Major bug fix on recovery routine
 o Minor bug/warning fixes and code cleanups

* tag 'f2fs-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (22 commits)
  f2fs: use _safe() version of list_for_each
  f2fs: add comments of start_bidx_of_node
  f2fs: avoid issuing small bios due to several dirty node pages
  f2fs: support swapfile
  f2fs: add remap_pages as generic_file_remap_pages
  f2fs: add __init to functions in init_f2fs_fs
  f2fs: fix the debugfs entry creation path
  f2fs: add global mutex_lock to protect f2fs_stat_list
  f2fs: remove the blk_plug usage in f2fs_write_data_pages
  f2fs: avoid redundant time update for parent directory in f2fs_delete_entry
  f2fs: remove redundant call to set_blocksize in f2fs_fill_super
  f2fs: move f2fs_balance_fs to punch_hole
  f2fs: add f2fs_balance_fs in several interfaces
  f2fs: revisit the f2fs_gc flow
  f2fs: check return value during recovery
  f2fs: avoid null dereference in f2fs_acl_from_disk
  f2fs: initialize newly allocated dnode structure
  f2fs: update f2fs partition info about SIT/NAT layout
  f2fs: update f2fs document to reflect SIT/NAT layout correctly
  f2fs: remove unneeded INIT_LIST_HEAD at few places
  ...
2013-01-22 10:33:17 -08:00
Dan Carpenter
d8b79b2f94 f2fs: use _safe() version of list_for_each
This is calling list_del() inside a loop which is a problem when we try
move to the next item on the list.  I've converted it to use the _safe
version.  And also, as a cleanup, I've converted it to use
list_for_each_entry instead of list_for_each.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:49:00 +09:00
Jaegeuk Kim
9af45ef5ab f2fs: add comments of start_bidx_of_node
The caller of start_bidx_of_node() should give proper node offsets which
point only direct node blocks. Otherwise, it is a caller's bug.
This patch adds comments to make it clear.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim
a7fdffbd3e f2fs: avoid issuing small bios due to several dirty node pages
If some small bios of dirty node pages are supposed to be issued during the
sequential data writes, there-in well-produced consecutive data bios are able
to be split by the small node bios, resulting in performance degradation.
So, let's collect a number of dirty node pages until reaching a threshold.
And, by default, I set the threshold as 2MB, a segment size.

This improves sequential write performance on i5, 512GB SSD (830 w/ SATA2) as
follows.
Before: 231 MB/s -> After: 255 MB/s

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim
c01e54b770 f2fs: support swapfile
This patch adds f2fs_bmap operation to the data address space.
This enables f2fs to support swapfile.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:58 +09:00