Make better use of inode* dereferencing macros to hide dereferencing chains
(including NFS_PROTO and NFS_CLIENT).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Maintain a common server record for NFS2/3 as well as for NFS4 so that common
stuff can be moved there from struct nfs_server.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add some extra const qualifiers into NFS.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the nominated dentry's superblock directly in the NFS statfs() op to get a
file handle, rather than using s_root (which will become a dummy dentry in a
future patch).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Generalise the nfs_client structure by:
(1) Moving nfs_client to a more general place (nfs_fs_sb.h).
(2) Renaming its maintenance routines to be non-NFS4 specific.
(3) Move those maintenance routines to a new non-NFS4 specific file (client.c)
and move the declarations to internal.h.
(4) Make nfs_find/get_client() take a full sockaddr_in to include the port
number (will be required for NFS2/3).
(5) Make nfs_find/get_client() take the NFS protocol version (again will be
required to differentiate NFS2, 3 & 4 client records).
Also:
(6) Make nfs_client construction proceed akin to inodes, marking them as under
construction and providing a function to indicate completion.
(7) Make nfs_get_client() wait interruptibly if it finds a client that it can
share, but that client is currently being constructed.
(8) Make nfs4_create_client() use (6) and (7) instead of locking cl_sem.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a set_capabilities NFS RPC op so that the server capabilities can be set.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a lookup filehandle NFS RPC op so that a file handle can be looked up
without requiring dentries and inodes and other VFS stuff when doing an NFS4
pathwalk during mounting.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Return an error when starting the idmapping pipe so that we can detect it
failing.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename nfs_server::nfs4_state to nfs_client as it will be used to represent the
client state for NFS2 and NFS3 also.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename struct nfs4_client to struct nfs_client so that it can become the basis
for a general client record for NFS2 and NFS3 in addition to NFS4.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make the nfs_callback_up()/down() prototypes just do nothing if NFS4 is not
enabled. Also make the down function void type since we can't really do
anything if it fails.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename the NFS4 version of nfs_stat_to_errno() so that it doesn't conflict with
the common one used by NFS2 and NFS3.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix ups for the splitting of the superblock stuff out of fs/nfs/inode.c,
including:
(*) Move the callback tcpport module param into callback.c.
(*) Move the idmap cache timeout module param into idmap.c.
(*) Changes to internal.h:
(*) namespace-nfs4.c was renamed to nfs4namespace.c.
(*) nfs_stat_to_errno() is in nfs2xdr.c, not nfs4xdr.c.
(*) nfs4xdr.c is contingent on CONFIG_NFS_V4.
(*) nfs4_path() is only uses if CONFIG_NFS_V4 is set.
Plus also:
(*) The sec_flavours[] table should really be const.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The attached patch adds a new directory cache management function that prepares
a disconnected anonymous function to be connected into the dentry tree. The
anonymous dentry is transferred the name and parentage from another dentry.
The following changes were made in [try #2]:
(*) d_materialise_dentry() now switches the parentage of the two nodes around
correctly when one or other of them is self-referential.
The following changes were made in [try #7]:
(*) d_instantiate_unique() has had the interior part split out as function
__d_instantiate_unique(). Callers of this latter function must be holding
the appropriate locks.
(*) _d_rehash() has been added as a wrapper around __d_rehash() to call it
with the most obvious hash list (the one from the name). d_rehash() now
calls _d_rehash().
(*) d_materialise_dentry() is now __d_materialise_dentry() and is static.
(*) d_materialise_unique() added to perform the combination of d_find_alias(),
d_materialise_dentry() and d_add_unique() that the NFS client was doing
twice, all within a single dcache_lock critical section. This reduces the
number of times two different spinlocks were being accessed.
The following further changes were made:
(*) Add the dentries onto their parents d_subdirs lists.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A pinned inode may in theory end up filling memory with cached ACCESS
calls. This patch ensures that the VM may shrink away the cache in these
particular cases.
The shrinker works by iterating through the list of inodes on the global
nfs_access_lru_list, and removing the least recently used access
cache entry until it is done (or until the entire cache is empty).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current access cache only allows one entry at a time to be cached for each
inode. Add a per-inode red-black tree in order to allow more than one to
be cached at a time.
Should significantly cut down the time spent in path traversal for shared
directories such as ${PATH}, /usr/share, etc.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[CIFS] statfs for cifs unix extensions no longer experimental
[CIFS] New POSIX locking code not setting rc properly to zero on successful
[CIFS] Support deep tree mounts (e.g. mounts to //server/share/path)
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] sw_any_bug_dmi_table can be used on resume, so it isn't initdata
[CPUFREQ] Fix some more CPU hotplug locking.
[CPUFREQ] Workaround for BIOS bug in software coordination of frequency
[CPUFREQ] Longhaul - Add voltage scaling to driver
[CPUFREQ] Fix sparse warning in ondemand
[CPUFREQ] make drivers/cpufreq/cpufreq_ondemand.c:powersave_bias_target() static
[CPUFREQ] Longhaul - Add ignore_latency option
[CPUFREQ] Longhaul - Disable arbiter
[CPUFREQ][2/2] ondemand: updated add powersave_bias tunable
[CPUFREQ][1/2] ondemand: updated tune for hardware coordination
[CPUFREQ] Fix typo.
In
|Author: James Simmons <jsimmons@kozmo.(none)>
|Date: Thu Mar 13 22:37:08 2003 -0800
|
| [FBCON] Cursor handling clean up. I nuked several static variables.
we have
-static void fbcon_vbl_handler(int irq, void *dummy, struct pt_regs *fp)
+static void fb_vbl_handler(int irq, void *dev_id, struct pt_regs *fp)
and 3 years later a couple of instances missed back then still remains
there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
iommu_init() and iounit_init() are never called for sun4, but that's not
enough - these calls should be ifdefed out since the functions in question
simply do not exist for CONFIG_SUN4 kernel.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Back when pci_dev had base_address[], loop of form
base = &...->base_address[0];
for (.....) {
...
*base++ = addr;
}
was fine, but when that array got spread in ->resource[...].start
replacing the initialization with
base = &...->resource[0].start;
was not a sufficient modification. IOW this code got broken for cases
when there had been more than one resource to fill. All way back in
2.3.41-pre3...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
several targets have no ....at() family and m32r calls its only chown variant
chown32(), with __NR_chown being undefined. creat(2) is also absent in some
targets.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sw_any_bug_dmi_table can be used on resume, so it isn't initdata.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Lukewarm IQ detected in hotplug locking
BUG: warning at kernel/cpu.c:38/lock_cpu_hotplug()
[<b0134a42>] lock_cpu_hotplug+0x42/0x65
[<b02f8af1>] cpufreq_update_policy+0x25/0xad
[<b0358756>] kprobe_flush_task+0x18/0x40
[<b0355aab>] schedule+0x63f/0x68b
[<b01377c2>] __link_module+0x0/0x1f
[<b0119e7d>] __cond_resched+0x16/0x34
[<b03560bf>] cond_resched+0x26/0x31
[<b0355b0e>] wait_for_completion+0x17/0xb1
[<f965c547>] cpufreq_stat_cpu_callback+0x13/0x20 [cpufreq_stats]
[<f9670074>] cpufreq_stats_init+0x74/0x8b [cpufreq_stats]
[<b0137872>] sys_init_module+0x91/0x174
[<b0102c81>] sysenter_past_esp+0x56/0x79
As there are other places that call cpufreq_update_policy without
the hotplug lock, it seems better to keep the hotplug locking
at the lower level for the time being until this is revamped.
Signed-off-by: Dave Jones <davej@redhat.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (65 commits)
IB: Fix typo in kerneldoc for ib_set_client_data()
IPoIB: Add some likely/unlikely annotations in hot path
IPoIB: Remove unused include of vmalloc.h
IPoIB: Rejoin all multicast groups after a port event
IPoIB: Create MCGs with all attributes required by RFC
IB/sa: fix ib_sa_selector names
IB/iser: INFINIBAND_ISER depends on INET
IB/mthca: Simplify calls to mthca_cq_clean()
RDMA/cma: Document rdma_accept() error handling
IB/mthca: Recover from catastrophic errors
RDMA/cma: Document rdma_destroy_id() function
IB/cm: Do not track remote QPN in timewait state
IB/sa: Require SA registration
IPoIB: Refactor completion handling
IB/iser: Do not use FMR for a single dma entry sg
IB/iser: fix some debug prints
IB/iser: make FMR "page size" be 4K and not PAGE_SIZE
IB/iser: Limit the max size of a scsi command
IB/iser: fix a check of SG alignment for RDMA
RDMA/cma: Protect against adding device during destruction
...
IPoIB doesn't use anything from <linux/vmalloc.h>, so don't include it.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When ipoib_ib_dev_flush() is called because of a port event, the
driver needs to rejoin all multicast groups, since the flush will call
ipoib_mcast_dev_flush() (via ipoib_ib_dev_down()). Otherwise no
(non-broadcast) multicast groups will be rejoined until the networking
core calls ->set_multicast_list again, and so multicast reception will
be broken for potentially a long time.
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:
If the IB multicast group does not already exist, one must be
created first with the IPoIB link MTU. The MGID MUST use the same
P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
broadcast-GID. The rest of attributes SHOULD follow the values used
in the broadcast-GID as well.
However, the current IPoIB driver is only setting the attributes
required by the InfiniBand spec to create a multicast group, so in
particular the MTU and HopLimit are not being set. Add these
attributes when creating MCGs, and also set the Rate attribute, since
IPoIB pays attention to that attribute as well.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Relevant SA queries are actually "greater than" / "less than", not
"greater than or equal" / "less than or equal" as the names imply.
(See IB spec 1.2 Vol 1, 15.2.5.16 PATHRECORD/Table 205 PathRecord)
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a QP has separate send and receive CQs, then the send CQ will never
have receive completions from that QP in it. So when cleaning the
send CQ, there's no need to pass in an SRQ pointer, even if the QP is
attached to an SRQ.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Document the reject sending and modifying QP to error done in rdma_accept().
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Trigger device remove and then add when a catastrophic error is
detected in hardware. This, in turn, will cause a device reset, which
we hope will recover from the catastrophic condition.
Since this might interefere with debugging the root cause, add a
module option to suppress this behaviour.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Clarify that rdma_destroy_id cancels outstanding asynchronous operations on the
Associated id.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Do not track remote QPN in TimeWait state, since QP is not connected.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running. Update all in-tree users for the new interface.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Split up ipoib_ib_handle_wc() into ipoib_ib_handle_rx_wc() and
ipoib_ib_handle_tx_wc() to make the code easier to read. This will
also help implement NAPI in the future.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fast Memory Registration (fmr) is used to register for rdma an sg whose
elements are not linearly sequential after dma mapping.
The IB verbs layer provides an "all dma memory MR (memory region)" which
can be used for RDMA-ing a dma linearly sequential buffer.
Change the code to use the dma mr instead of doing fmr when dma mapping
produces a single dma entry sg.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
fix and add some debug prints related to iser
handling of memory for rdma.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
As iser is able to use at most one rdma operation for the
execution of a scsi command, and registration of the sg
associated with scsi command has its restrictions, the code
checks if an sg is "aligned for rdma".
Alignment for rdma is measured in "fmr page" units whose
possible resolutions are different between HCAs and can be
smaller, equal or bigger to the system page size.
When the system page size is bigger than 4KB (eg the default
with ia64 kernels) there a bigger chance that an sg would be
aligned for rdma if the fmr page size is 4KB.
Change the code to create FMR whose pages are of size 4KB
and to take that into account when processing the sg.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>