We should consistently treat uid's as unsigned--it's confusing when
the display of uid's in the cache contents isn't consistent with their
representation in upcalls.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
One the changes in commit d7979ae4a "svc: Move close processing to a
single place" is:
err_delete:
- svc_delete_socket(svsk);
+ set_bit(SK_CLOSE, &svsk->sk_flags);
return -EAGAIN;
This is insufficient. The recvfrom methods must always call
svc_xprt_received on completion so that the socket gets re-queued if
there is any more work to do. This particular path did not make that
call because it actually destroyed the svsk, making requeue pointless.
When the svc_delete_socket was change to just set a bit, we should have
added a call to svc_xprt_received,
This is the problem that b0401d7253 attempted to fix, incorrectly.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reverts commit b0401d7253, which
moved svc_delete_xprt() outside of XPT_BUSY, and allowed it to be called
after svc_xpt_recived(), removing its last reference and destroying it
after it had already been queued for future processing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reverts commit b292cf9ce7. The
commit that it attempted to patch up,
b0401d7253, was fundamentally wrong, and
will also be reverted.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The 'struct svc_deferred_req's on the xpt_deferred queue do not
own a reference to the owning xprt. This is seen in svc_revisit
which is where things are added to this queue. dr->xprt is set to
NULL and the reference to the xprt it put.
So when this list is cleaned up in svc_delete_xprt, we mustn't
put the reference.
Also, replace the 'for' with a 'while' which is arguably
simpler and more likely to compile efficiently.
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
write_ports() converts svc_create_xprt()'s ENOENT error return to
EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
message that makes sense.
It turns out that several of the other kernel APIs rpc.nfsd use can
also return ENOENT from svc_create_xprt(), by way of lockd_up().
On the client side, an NFSv2 or NFSv3 mount request can also return
the result of lockd_up(). This error may also be returned during an
NFSv4 mount request, since the NFSv4 callback service uses
svc_create_xprt() to create the callback listener. An ENOENT error
return results in a confusing error message from the mount command.
Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Clean up: Bruce observed we have more or less common logic in each of
svc_create_xprt()'s callers: the check to create an IPv6 RPC listener
socket only if CONFIG_IPV6 is set. I'm about to add another case
that does just the same.
If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
call sites can get rid of the "#ifdef" ugliness, and can use the same
logic with or without IPv6 support available in the kernel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Relax the address family check at the top of svc_addsock() to allow AF_INET6
listener sockets to be specified via /proc/fs/nfsd/portlist.
Signed-off-by: Aime Le Rouzic <aime.le-rouzic@bull.net>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The existing logic in ip_map_parse() can not currently parse
shorthanded IPv6 addresses (anything with a double colon), nor can
it parse an IPv6 presentation address with a scope ID. An
IPv6-enabled mountd can pass down both.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
nfs: fix oops in nfs_rename()
sunrpc: fix build-time warning
sunrpc: on successful gss error pipe write, don't return error
SUNRPC: Fix the return value in gss_import_sec_context()
SUNRPC: Fix up an error return value in gss_import_sec_context_kerberos()
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
sunrpc: fix peername failed on closed listener
nfsd: make sure data is on disk before calling ->fsync
nfsd: fix "insecure" export option
There're some warnings of "nfsd: peername failed (err 107)!"
socket error -107 means Transport endpoint is not connected.
This warning message was outputed by svc_tcp_accept() [net/sunrpc/svcsock.c],
when kernel_getpeername returns -107. This means socket might be CLOSED.
And svc_tcp_accept was called by svc_recv() [net/sunrpc/svc_xprt.c]
if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
<snip>
newxpt = xprt->xpt_ops->xpo_accept(xprt);
<snip>
So this might happen when xprt->xpt_flags has both XPT_LISTENER and XPT_CLOSE.
Let's take a look at commit b0401d72, this commit has moved the close
processing after do recvfrom method, but this commit also introduces this
warnings, if the xpt_flags has both XPT_LISTENER and XPT_CLOSED, we should
close it, not accpet then close.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Fix auth_gss printk format warning:
net/sunrpc/auth_gss/auth_gss.c:660: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When handling the gssd downcall, the kernel should distinguish between a
successful downcall that contains an error code and a failed downcall
(i.e. where the parsing failed or some other sort of problem occurred).
In the former case, gss_pipe_downcall should be returning the number of
bytes written to the pipe instead of an error. In the event of other
errors, we generally want the initiating task to retry the upcall so
we set msg.errno to -EAGAIN. An unexpected error code here is a bug
however, so BUG() in that case.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the context allocation fails, it will return GSS_S_FAILURE, which is
neither a valid error code, nor is it even negative.
Return ENOMEM instead...
Reported-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the context allocation fails, the function currently returns a random
error code, since the variable 'p' still points to a valid memory location.
Ensure that it returns ENOMEM...
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFSv4: Fix a regression in the NFSv4 state manager
NFSv4: Release the sequence id before restarting a CLOSE rpc call
nfs41: fix session fore channel negotiation
nfs41: do not zero seqid portion of stateid on close
nfs: run state manager in privileged mode
nfs: make recovery state manager operations privileged
nfs: enforce FIFO ordering of operations trying to acquire slot
rpc: add a new priority in RPC task
nfs: remove rpc_task argument from nfs4_find_slot
rpc: add rpc_queue_empty function
nfs: change nfs4_do_setlk params to identify recovery type
nfs: do not do a LOOKUP after open
nfs: minor cleanup of session draining
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: (42 commits)
nfsd: remove pointless paths in file headers
nfsd: move most of nfsfh.h to fs/nfsd
nfsd: remove unused field rq_reffh
nfsd: enable V4ROOT exports
nfsd: make V4ROOT exports read-only
nfsd: restrict filehandles accepted in V4ROOT case
nfsd: allow exports of symlinks
nfsd: filter readdir results in V4ROOT case
nfsd: filter lookup results in V4ROOT case
nfsd4: don't continue "under" mounts in V4ROOT case
nfsd: introduce export flag for v4 pseudoroot
nfsd: let "insecure" flag vary by pseudoflavor
nfsd: new interface to advertise export features
nfsd: Move private headers to source directory
vfs: nfsctl.c un-used nfsd #includes
lockd: Remove un-used nfsd headers #includes
s390: remove un-used nfsd #includes
sparc: remove un-used nfsd #includes
parsic: remove un-used nfsd #includes
compat.c: Remove dependence on nfsd private headers
...
The pointer to struct gss_auth parameter in gss_add_msg is not really needed
after commit 5b7ddd4a. Zap it.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
Introduce soft connect behavior for UDP transports. In this case, a
major timeout returns ETIMEDOUT instead of EIO.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, if a remote RPC service is unreachable, an RPC ping will
hang until the underlying transport connect attempt times out. A more
desirable behavior might be to have the ping fail immediately so upper
layers can recover appropriately.
In the case of an NFS mount, for instance, this would mean the
mount(2) system call could fail immediately if the server isn't
listening, rather than hanging uninterruptibly for more than 3
minutes.
Change rpc_ping() so that it fails immediately for connection-oriented
transports. rpc_create() will then fail immediately for such
transports if an RPC ping was requested.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Autobinding is handled by the rpciod process, not in user processes
that are generating regular RPC requests. Thus autobinding is usually
not affected by signals targetting user processes, such as KILL or
timer expiration events.
In addition, an RPC request generated by a user process that has
RPC_TASK_SOFTCONN set and needs to perform an autobind will hang if
the remote rpcbind service is not available.
For rpcbind queries on connection-oriented transports, let's use the
new soft connect semantic to return control to the user's process
quickly, if the kernel's rpcbind client can't connect to the remote
rpcbind service.
Logic is introduced in call_bind_status() to handle connection errors
that occurred during an asynchronous rpcbind query. The logic
abandons the rpcbind query if the RPC request has SOFTCONN set, and
retries after a few seconds in the normal case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use TCP with the soft connect semantic for local rpcbind upcalls so
the kernel can detect immediately if the local rpcbind daemon is not
running.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The kernel's rpcbind client creates and deletes an rpc_clnt and its
underlying transport socket for every upcall to the local rpcbind
daemon.
When starting a typical NFS server on IPv4 and IPv6, the NFS service
itself does three upcalls (one per version) times two upcalls (one
per transport) times two upcalls (one per address family), making 12,
plus another one for the initial call to unregister previous NFS
services. Starting the NLM service adds an additional 13 upcalls,
for similar reasons.
(Currently the NFS service doesn't start IPv6 listeners, but it will
soon enough).
Instead, let's create an rpc_clnt for rpcbind upcalls during the
first local rpcbind query, and cache it. This saves the overhead of
creating and destroying an rpc_clnt and a socket for every upcall.
The new logic also prevents the kernel from attempting an RPCB_SET or
RPCB_UNSET if it knows from the start that the local portmapper does
not support rpcbind protocol version 4. This will cut down on the
number of rpcbind upcalls in legacy environments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: At one point, rpcb_local_clnt() handled IPv6 loopback
addresses too, but it doesn't any more; only IPv4 loopback is used
now. Get rid of the @addr and @addrlen arguments to
rpcb_local_clnt().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The kernel sometimes makes RPC calls to services that aren't running.
Because the kernel's RPC client always assumes the hard retry semantic
when reconnecting a connection-oriented RPC transport, the underlying
reconnect logic takes a long while to time out, even though the remote
may have responded immediately with ECONNREFUSED.
In certain cases, like upcalls to our local rpcbind daemon, or for NFS
mount requests, we'd like the kernel to fail immediately if the remote
service isn't reachable. This allows another transport to be tried
immediately, or the pending request can be abandoned quickly.
Introduce a per-request flag which controls how call_transmit_status()
behaves when request transmission fails because the server cannot be
reached.
We don't want soft connection semantics to apply to other errors. The
default case of the switch statement in call_transmit_status() no
longer falls through; the fall through code is copied to the default
case, and a "break;" is added.
The transport's connection re-establishment timeout is also ignored for
such requests. We want the request to fail immediately, so the
reconnect delay is skipped. Additionally, we don't want a connect
failure here to further increase the reconnect timeout value, since
this request will not be retried.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The success case, where task->tk_status == 0, is by far the most
frequent case in call_transmit_status().
The default: arm of the switch statement in call_transmit_status()
handles the 0 case. default: was moved close to the top of the switch
statement in call_transmit_status() under the theory that the compiler
places object code for the earliest arms of a switch statement first,
making the CPU do less work.
The default: arm of a switch statement, however, is executed only
after all the other cases have been checked. Even if the compiler
rearranges the object code, the default: arm is the "last resort",
meaning all of the other cases have been explicitly exhausted. That
makes the current arrangement about as inefficient as it gets for the
common case.
To fix this, add an explicit check for zero before the switch
statement. That forces the compiler to do the zero check first, no
matter what optimizations it might try to do to the switch statement.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Recent changes to snprintf() introduced the %pI6c formatter, which can
display an IPv6 address with standard shorthanding. Using a
shorthanded address can save us a few bytes of memory for each stored
presentation address, or a few bytes on the wire when sending these in
a universal address.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It is possible for rpcauth_destroy_credcache() to cause the rpc credentials
to be unhashed while put_rpccred is waiting for the rpc_credcache_lock on
another cpu. Should this happen, then we can end up calling
hlist_del_rcu(&cred->cr_hash) a second time in put_rpccred, thus causing
list corruption.
Should the credential actually be hashed, it is also possible for
rpcauth_lookup_credcache to find and reference it before we get round to
unhashing it. In this case, the call to rpcauth_unhash_cred will fail, and
so we should just exit without destroying the cred.
Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the XPRT_CLOSE_WAIT flag is set, we need to ensure that we call
xprt->ops->close() while holding xprt_lock_write() before we can
start reconnecting.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Not including net/atm/
Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 59a252ff8c.
This helps in an entirely cached workload but not necessarily in
workloads that require waiting on disk.
Conflicts:
include/linux/sunrpc/svc.h
net/sunrpc/svc_xprt.c
Reported-by: Simon Kirby <sim@hostway.ca>
Tested-by: Jesper Krogh <jesper@krogh.cc>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
For consistency drop & in front of every proc_handler. Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The size of buf[] must account for the string termination needed for
the first strict_strtoul() call. Introduced in commit a02d6926.
Fábio Olivé Leite points out that strict_strtoul() requires _either_
'\n\0' _or_ '\0' termination, so use the simpler '\0' here instead.
See http://bugzilla.kernel.org/show_bug.cgi?id=14546 .
Reported-by: argp@census-labs.com
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Fábio Olivé Leite <fleite@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.
In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.
Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Conflicts:
drivers/net/usb/cdc_ether.c
All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.
Signed-off-by: David S. Miller <davem@davemloft.net>