SFQ enqueue algo puts a new flow _behind_ all pre-existing flows in the
circular list. In fact this is probably an old SFQ implementation bug.
100 Mbits = ~8333 full frames per second, or ~8 frames per ms.
With 50 flows, it means your "new flow" will have to wait 50 packets
being sent before its own packet. Thats the ~6ms.
We certainly can change SFQ to give a priority advantage to new flows,
so that next dequeued packet is taken from a new flow, not an old one.
Reported-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The get operation was not sending the message that was built to
user-space. This patch also includes the appropriate handling for
the return value of netlink_unicast().
Moreover, fix error codes on error (for example, for non-existing
entry was uncorrect).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The sanity check (timeout < 0) never works; the dividend is unsigned
and so is the division, which should have been a signed division.
long timeout = (ct->timeout.expires - jiffies) / HZ;
if (timeout < 0)
timeout = 0;
This patch converts the time values to signed for the division.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We should not forget to try for real server with port 0
in the backup server when processing the sync message. We should
do it in all cases because the backup server can use different
forwarding method.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 10f6dfcfde (Revert "sch_netem: Remove classful functionality")
reintroduced classful functionality to netem, but broke basic netem
behavior :
netem uses an t(ime)fifo queue, and store timestamps in skb->cb[]
If qdisc is changed, time constraints are not respected and other qdisc
can destroy skb->cb[] and block netem at dequeue time.
Fix this by always using internal tfifo, and optionally attach a child
qdisc to netem (or a tree of qdiscs)
Example of use :
DEV=eth3
tc qdisc del dev $DEV root
tc qdisc add dev $DEV root handle 30: est 1sec 8sec netem delay 20ms 10ms
tc qdisc add dev $DEV handle 40:0 parent 30:0 tbf \
burst 20480 limit 20480 mtu 1514 rate 32000bps
qdisc netem 30: root refcnt 18 limit 1000 delay 20.0ms 10.0ms
Sent 190792 bytes 413 pkt (dropped 0, overlimits 0 requeues 0)
rate 18416bit 3pps backlog 0b 0p requeues 0
qdisc tbf 40: parent 30: rate 256000bit burst 20Kb/8 mpu 0b lat 0us
Sent 190792 bytes 413 pkt (dropped 6, overlimits 10 requeues 0)
backlog 0b 5p requeues 0
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During some debugging I needed to look into how /proc/net/ipv6_route
operated and in my digging I found its calling fib6_clean_all() which uses
"write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
found this on 2.6.32, but reading the code I believe the same basic idea
exists currently. Looking at the rtnetlink code they are only calling
"read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
reading from proc isn't the recommended way of fetching the ipv6 route
table; taking a write lock seems unnecessary and would probably cause
network performance issues.
To verify this I loaded up the ipv6 route table and then ran iperf in 3
cases:
* doing nothing
* reading ipv6 route table via proc
(while :; do cat /proc/net/ipv6_route > /dev/null; done)
* reading ipv6 route table via rtnetlink
(while :; do ip -6 route show table all > /dev/null; done)
* Load the ipv6 route table up with:
* for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
* iperf commands:
* client: iperf -i 1 -V -c <ipv6 addr>
* server: iperf -V -s
* iperf results - 3 runs each (in Mbits/sec)
* nothing: client: 927,927,927 server: 927,927,927
* proc: client: 179,97,96,113 server: 142,112,133
* iproute: client: 928,927,928 server: 927,927,927
lock_stat shows taking the write lock is causing the slowdown. Using this
info I decided to write a version of fib6_clean_all() which replaces
write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
this new function I see the same results as with my rtnetlink iperf test.
Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While it's not too late fix the recently added RQLEN diag extension
to report rqlen and wqlen in the same way as TCP does.
I.e. for listening sockets the ack backlog length (which is the input
queue length for socket) in rqlen and the max ack backlog length in
wqlen, and what the CINQ/OUTQ ioctls do for established.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently tcp diag reports rqlen and wqlen values similar to how
the CINQ/COUTQ iotcls do. To make unix diag report these values
in the same way move the respective code into helpers.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[ Fix indentation of sock_diag*() calls. -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a routine that dumps memory-related values of a socket.
It's made as an array to make it possible to add more stuff
here later without breaking compatibility.
Since v1: The SK_MEMINFO_ constants are in userspace
visible part of sock_diag.h, the rest is under __KERNEL__.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of the "s" to indicate pluralization is intentional,
since the struct actually contains two name variants.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This changes both the struct bcbearer and struct bcbearer_pair to
have the "tipc_" prefix. Runtime behaviour is unchanged.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Give it a meaningful prefix, as suggested by DaveM, so that it
is consistent with things like struct tipc_bearer, and so it isn't
confused with anything else. This has no impact on the actual
runtime code behaviour.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In some of the rt6_bind_neighbour() call sites, it hasn't hooked
up the rt->dst.dev pointer yet, so we'd deref a NULL pointer when
obtaining dev->ifindex for the neighbour hash function computation.
Just pass the netdevice explicitly in to fix this problem.
Reported-by: Bjarke Istrup Pedersen <gurligebis@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
warning: (NETFILTER_XT_MATCH_NFACCT) selects NETFILTER_NETLINK_ACCT which has
unmet direct dependencies (NET && INET && NETFILTER && NETFILTER_ADVANCED)
and then
ERROR: "nfnetlink_subsys_unregister" [net/netfilter/nfnetlink_acct.ko] undefined!
ERROR: "nfnetlink_subsys_register" [net/netfilter/nfnetlink_acct.ko] undefined!
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.
And it makes grepping for dst_entry member uses much harder too.
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, create and use an rt6_bind_neighbour() in net/ipv6/route.c to
consolidate some common logic.
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to perform a proper universal hash on a vector of integers,
we have to use different universal hashes on each vector element.
Which means we need 4 different hash randoms for ipv6.
Signed-off-by: David S. Miller <davem@davemloft.net>
Check setsockopt arguments to avoid overflows and return -EINVAL for
too large arguments.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit be639ac6 ("NET: AX.25: Check ioctl arguments to avoid overflows
further down the road") rejects very large arguments, but doesn't
completely fix overflows on 64-bit systems. Consider the AX25_T2 case.
int opt;
...
if (opt < 1 || opt > ULONG_MAX / HZ) {
res = -EINVAL;
break;
}
ax25->t2 = opt * HZ;
The 32-bit multiplication opt * HZ would overflow before being assigned
to 64-bit ax25->t2. This patch changes "opt" to unsigned long.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When testing L2TP support, I discovered that the l2tp module is not autoloaded
as are other netlink interfaces. There is because of lack of hook in genetlink to call
request_module and load the module.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The route we have here is for the address being added to the interface,
ie. for input packet processing.
Therefore using that route to determine whether an output nexthop gateway
is known and resolved doesn't make any sense.
So, simply remove this test, it never triggered anyways.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
If bind is fail when bind is called after set PACKET_FANOUT
sock option, the dev refcnt will leak.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using /proc/net/nf_conntrack has been deprecated in favour of the
conntrack(8) tool.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
References: http://www.spinics.net/lists/netfilter-devel/msg18875.html
Augment xt_ecn by facilities to match on IPv6 packets' DSCP/TOS field
similar to how it is already done for the IPv4 packet field.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use the new macro and struct names in xt_ecn.h, and put the old
definitions into a definition-forwarding ipt_ecn.h.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Prepare the ECN match for augmentation by an IPv6 counterpart. Since
no symbol dependencies to ipv6.ko are added, having a single ecn match
module is the more so welcome.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Migrates the buf_seqno() helper routine from broadcast link level to
unicast link level so that it can be used both types of TIPC links.
This is a cosmetic change only, and does not affect the operation of TIPC.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds checks to TIPC's broadcast link so that it ignores any
acknowledgement message containing a sequence number that does not
correspond to an unacknowledged message currently in the broadcast
link's transmit queue.
This change prevents the broadcast link from becoming stalled if a
newly booted node receives stale broadcast link acknowledgement
information from another node that has not yet fully synchronized
its end of the broadcast link to reflect the current state of the
new node's end.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds code to release any unsent broadcast messages in the broadcast link
transmit queue if TIPC loses contact with its only neighboring node.
Previously, a broadcast link that was in the congested state would hold
on to the unsent messages, even though the messages were now undeliverable.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The two broadcast link statistics fields that are used to derive the
average length of that link's transmit queue are now updated only after
a successful attempt to send a broadcast message, since there is no need
to update these values when an unsuccessful send attempt leaves the
queue unchanged.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Adds a check to detect when an attempt is made to send a message
via the broadcast link and no neighboring nodes are currently available
to receive it. Rather than wasting effort passing the message to the
broadcast link and broadcast bearer, who will only throw it away,
TIPC now frees the message immediately and reports success (i.e. the
message has been delivered to all available destinations).
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Fixes oversight that allowed broadcast link node map to be updated without
first taking the broadcast link spinlock that protects the map. As part
of this fix the node map has been incorporated into the broadcast link
structure to make the need for such protection more evident.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Creates global variables to hold the broadcast link's pseudo-bearer and
pseudo-link structures, rather than allocating them dynamically. There
is only a single instance of each structure, and changing over to static
allocation allows elimination of code to handle the cases where dynamic
allocation was unsuccessful.
The memset in the teardown code may look like they aren't used, but
the same teardown code is run when there is a non-fatal error at
init-time, so that stale data isn't present when the user fixes the
cause of the soft error.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Gets rid of an unnecessary check in the routine that updates the port id
of a node's name publications when the node is assigned a network address,
since the routine is only invoked if the new address is different from
the existing one.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies TIPC's module unloading logic to switch itself into "single
node" mode before starting to terminate networking support. This helps
to ensure that no operations that require TIPC to be in "networking"
mode can initiate once unloading starts.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Gets rid of two pointless operations that zero out the array used to
record information about TIPC's Ethernet bearers. There is no need to
initialize the array on start up since it is a global variable that is
already zero'd out, and there is no need to zero it out on exit because
the array is never referenced again.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Modifies Ethernet bearer disable logic to break the association between
the bearer and its device driver at the time the bearer is disabled,
rather than when the TIPC module is unloaded. This allows the array
entry used by the disabled bearer to be re-used if the same bearer (or
a different one) is subsequently enabled.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>