Commit Graph

427489 Commits

Author SHA1 Message Date
Geert Uytterhoeven
a93c125604 packet: doc: Spelling s/than/that/
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 16:15:17 -04:00
David S. Miller
598f2e2bfb Merge branch 'mlx4'
Or Gerlitz says:

====================
mlx4 fixes

These short series fixes two bugs related to the vxlan support and a
missing req module call for the IB driver which is needed to support
IB/RDMA over Ethernet.

Pathes done over the net tree, commit dd38743 "vlan: Set correct
source MAC address with TX VLAN offload enabled"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 16:13:17 -04:00
Or Gerlitz
7855bff42e net/mlx4_core: Load the IB driver when the device supports IBoE
When checking what protocol drivers to load, the IB driver should be
requested also over Ethernet ports, if the device supports IBoE (RoCE).

Fixes: b046ffe 'net/mlx4_core: Load higher level modules according to ports type'
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 16:12:42 -04:00
Or Gerlitz
2a2083f7f3 net/mlx4_en: Handle vxlan steering rules for mac address changes
When the device mac address is changed, we must deregister the vxlan
steering rule associated with the previous mac, and register a new
steering rule using the new mac.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 16:12:41 -04:00
Or Gerlitz
56cb456746 net/mlx4_core: Fix wrong dump of the vxlan offloads device capability
Fix the value used to dump the vxlan offloads device capability to align
with the MLX4_DEV_CAP_FLAG2_yyy definition. While on that, add dump to
the IPoIB flow-steering device capability and fix small typo.

The vxlan cap value wasn't fully handled when a conflict was resolved
between MLX4_DEV_CAP_FLAG2_DMFS_IPOIB coming from the IB tree to
MLX4_DEV_CAP_FLAG2_VXLAN_OFFLOADS coming from net-next.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 16:12:41 -04:00
Wei Liu
836fbaf459 xen-netback: use skb_is_gso in xenvif_start_xmit
In 5bd076708 ("Xen-netback: Fix issue caused by using gso_type wrongly")
we use skb_is_gso to determine if we need an extra slot to accommodate
the SKB. There's similar error in interface.c. Change that to use
skb_is_gso as well.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 15:36:32 -04:00
hayeswang
f75761b6b5 r8169: fix the incorrect tx descriptor version
The tx descriptor version of RTL8111B belong to RTL_TD_0.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 00:11:43 -04:00
Markos Chandras
406764622f tools/net/Makefile: Define PACKAGE to fix build problems
Fixes the following build problem with binutils-2.24

gcc -Wall -O2   -c -o bpf_jit_disasm.o bpf_jit_disasm.c
In file included from bpf_jit_disasm.c:25:0:
/usr/include/bfd.h:35:2: error: #error config.h must be included
before this header
 #error config.h must be included before this header

This is similar to commit 3ce711a6ab
"perf tools: bfd.h/libbfd detection fails with recent binutils"

See: https://sourceware.org/bugzilla/show_bug.cgi?id=14243

CC: David S. Miller <davem@davemloft.net>
CC: Daniel Borkmann <dborkman@redhat.com>
CC: netdev@vger.kernel.org
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-12 00:07:55 -04:00
Alexei Starovoitov
fdfaf64e75 x86: bpf_jit: support negative offsets
Commit a998d43423 claimed to introduce negative offset support to x86 jit,
but it couldn't be working, since at the time of the execution
of LD+ABS or LD+IND instructions via call into
bpf_internal_load_pointer_neg_helper() the %edx (3rd argument of this func)
had junk value instead of access size in bytes (1 or 2 or 4).

Store size into %edx instead of %ecx (what original commit intended to do)

Fixes: a998d43423 ("bpf jit: Let the x86 jit handle negative offsets")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jan Seiffert <kaffeemonster@googlemail.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 23:25:22 -04:00
Linus Lüssing
20a599bec9 bridge: multicast: enable snooping on general queries only
Without this check someone could easily create a denial of service
by injecting multicast-specific queries to enable the bridge
snooping part if no real querier issuing periodic general queries
is present on the link which would result in the bridge wrongly
shutting down ports for multicast traffic as the bridge did not learn
about these listeners.

With this patch the snooping code is enabled upon receiving valid,
general queries only.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 23:22:10 -04:00
Linus Lüssing
9ed973cc40 bridge: multicast: add sanity check for general query destination
General IGMP and MLD queries are supposed to have the multicast
link-local all-nodes address as their destination according to RFC2236
section 9, RFC3376 section 4.1.12/9.1, RFC2710 section 8 and RFC3810
section 5.1.15.

Without this check, such malformed IGMP/MLD queries can result in a
denial of service: The queries are ignored by most IGMP/MLD listeners
therefore they will not respond with an IGMP/MLD report. However,
without this patch these malformed MLD queries would enable the
snooping part in the bridge code, potentially shutting down the
according ports towards these hosts for multicast traffic as the
bridge did not learn about these listeners.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 23:22:10 -04:00
Eric Dumazet
c3f9b01849 tcp: tcp_release_cb() should release socket ownership
Lars Persson reported following deadlock :

-000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock
-001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec()
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async()
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)

Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
it is running from softirq context.

Lars trace involved a NIC without RX checksum support but other points
are problematic as well, like the prequeue stuff.

Problem is triggered by a timer, that found socket being owned by user.

tcp_release_cb() should call tcp_write_timer_handler() or
tcp_delack_timer_handler() in the appropriate context :

BH disabled and socket lock held, but 'owned' field cleared,
as if they were running from timer handlers.

Fixes: 6f458dfb40 ("tcp: improve latencies of timer triggered events")
Reported-by: Lars Persson <lars.persson@axis.com>
Tested-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:45:59 -04:00
David S. Miller
c7b76f854e Merge branch 'skb_frags'
Michael S. Tsirkin says:

====================
skbuff: fix skb_segment with zero copy skbs

This fixes a bug in skb_segment where it moves frags
between skbs without orphaning them.
This causes userspace to assume it's safe to
reuse the buffer, and receiver gets corrupted data.
This further might leak information from the
transmitter on the wire.

To fix track which skb does a copied frag belong
to, and orphan frags when copying them.

As we are tracking multiple skbs here, using
short names (skb,nskb,fskb,skb_frag,frag) becomes confusing.
So before adding another one, I refactor these names
slightly.

Patch is split out to make it easier to
verify that all trasformations are trivially correct.

The problem was observed in the field,
so I think that the patch is necessary on stable
as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-03-11 16:26:46 -04:00
Michael S. Tsirkin
1fd819ecb9 skbuff: skb_segment: orphan frags before copying
skb_segment copies frags around, so we need
to copy them carefully to avoid accessing
user memory after reporting completion to userspace
through a callback.

skb_segment doesn't normally happen on datapath:
TSO needs to be disabled - so disabling zero copy
in this case does not look like a big deal.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:26:38 -04:00
Michael S. Tsirkin
1a4cedaf65 skbuff: skb_segment: s/fskb/list_skb/
fskb is unrelated to frag: it's coming from
frag_list. Rename it list_skb to avoid confusion.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:26:38 -04:00
Michael S. Tsirkin
df5771ffef skbuff: skb_segment: s/skb/head_skb/
rename local variable to make it easier to tell at a glance that we are
dealing with a head skb.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:26:38 -04:00
Michael S. Tsirkin
4e1beba12d skbuff: skb_segment: s/skb_frag/frag/
skb_frag can in fact point at either skb
or fskb so rename it generally "frag".

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:26:38 -04:00
Michael S. Tsirkin
8cb19905e9 skbuff: skb_segment: s/frag/nskb_frag/
frag points at nskb, so name it appropriately

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:26:38 -04:00
David S. Miller
9d79b3c7aa Merge branch 'stmmac'
Giuseppe Cavallaro says:

====================
stmmac fixes: EEE and chained mode

These patches are to fix some new problems in the STMMAC driver.

Mandatory changes are for EEE that needs to be disabled if not supported
and for the chain mode that is broken and the kernel panics if this mode
is enabled.

v3: removed a patch from my previous set that touched the stmmac_tx path
    that has not to be applied. Other patches for cleaning-up will be
    sent on top of net-next git repo.

v4: do not surround the defaul buffer selection using Koption and adopt
    a default to 1536bytes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:15:29 -04:00
Giuseppe CAVALLARO
c5e9103dc3 stmmac: dwmac-sti: fix broken STiD127 compatibility
This is to fix the compatibility to the STiD127 SoC.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:14:31 -04:00
Giuseppe CAVALLARO
29896a674c stmmac: fix chained mode
This patch is to fix the chain mode that was broken
and generated a panic. This patch reviews the chain/ring
modes now shaing the same structure and taking care
about the pointers and callbacks.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:14:31 -04:00
Giuseppe CAVALLARO
d916701c67 stmmac: fix and better tune the default buffer sizes
This patch is to fix and tune the default buffer sizes.
It reduces the default bufsize used by the driver from
4KiB to 1536 bytes.

Patch has been tested on both ARM and SH4 platform based.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:14:31 -04:00
Giuseppe CAVALLARO
83bf79b6bb stmmac: disable at run-time the EEE if not supported
This patch is to disable the EEE (so HW and timers)
for example when the phy communicates that the EEE
can be supported anymore.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:14:31 -04:00
Neil Horman
d25f06ea46 vmxnet3: fix netpoll race condition
vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames

Tested by myself, successfully.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shreyas Bhatewara <sbhatewara@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: stable@vger.kernel.org
Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:13:55 -04:00
Peter Boström
dd38743b4c vlan: Set correct source MAC address with TX VLAN offload enabled
With TX VLAN offload enabled the source MAC address for frames sent using the
VLAN interface is currently set to the address of the real interface. This is
wrong since the VLAN interface may be configured with a different address.

The bug was introduced in commit 2205369a31
("vlan: Fix header ops passthru when doing TX VLAN offload.").

This patch sets the source address before calling the create function of the
real interface.

Signed-off-by: Peter Boström <peter.bostrom@netrounds.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 22:21:51 -04:00
Annie Li
5bd0767086 Xen-netback: Fix issue caused by using gso_type wrongly
Current netback uses gso_type to check whether the skb contains
gso offload, and this is wrong. Gso_size is the right one to
check gso existence, and gso_type is only used to check gso type.

Some skbs contains nonzero gso_type and zero gso_size, current
netback would treat these skbs as gso and create wrong response
for this. This also causes ssh failure to domu from other server.

V2: use skb_is_gso function as Paul Durrant suggested

Signed-off-by: Annie Li <annie.li@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 21:57:50 -04:00
Eric Dumazet
2818fa0fa0 pkt_sched: fq: do not hold qdisc lock while allocating memory
Resizing fq hash table allocates memory while holding qdisc spinlock,
with BH disabled.

This is definitely not good, as allocation might sleep.

We can drop the lock and get it when needed, we hold RTNL so no other
changes can happen at the same time.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 16:17:52 -04:00
Ben Hutchings
bc48bc8064 bna: Replace large udelay() with mdelay()
udelay() does not work on some architectures for values above
2000, in particular on ARM:

ERROR: "__bad_udelay" [drivers/net/ethernet/brocade/bna/bna.ko] undefined!

Reported-by: Vagrant Cascadian <vagrant@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 15:56:59 -04:00
Eric Dumazet
37314363cd pkt_sched: move the sanity test in qdisc_list_add()
The WARN_ON(root == &noop_qdisc)) added in qdisc_list_add()
can trigger in normal conditions when devices are not up.
It should be done only right before the list_add_tail() call.

Fixes: e57a784d8c ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()")
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Tested-by: Mirco Tischler <mt-ml@gmx.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 15:44:21 -04:00
David S. Miller
92f092d16c Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Please pull this batch of fixes intende for the 3.14 stream...

For the mac80211 bits, Johannes says:

"Here I have a fix from Eliad for the minimal channel width calculation
in the mac80211 code which lead to monitor mode not working at all for
drivers using that. One of my fixes is for an issue noticed by Michal,
we clear an already cleared value but do it without locking, so just
remove that. The other is for a data leak - we leak two bytes of kernel
memory out over the air in QoS NULL frames because those don't get a
sequence number assigned in the TX path."

For the iwlwifi bits, Emmanuel says:

"One more fix and an update for device IDs.
There is a bugzilla reported for the fix which is mentioned in the commit message."

Along with those...

Amitkumar Karwar provides two mwifiex fixes, both correcting some
data transcription problems.

Ivaylo Dimitrov uses skb_trim in the wl1251 driver to avoid
HAVE_EFFICIENT_UNALIGNED_ACCESS problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 14:10:13 -04:00
Michael Chan
a8d9bc2e9f bnx2: Fix shutdown sequence
The pci shutdown handler added in:

    bnx2: Add pci shutdown handler
    commit 25bfb1dd4b

created a shutdown down sequence without chip reset if the device was
never brought up.  This can cause the firmware to shutdown the PHY
prematurely and cause MMIO read cycles to be unresponsive.  On some
systems, it may generate NMI in the bnx2's pci shutdown handler.

The fix is to tell the firmware not to shutdown the PHY if there was
no prior chip reset.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-09 19:02:27 -04:00
John W. Linville
97bd5f0054 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2014-03-07 15:09:32 -05:00
Sabrina Dubroca
c88507fbad ipv6: don't set DST_NOCOUNT for remotely added routes
DST_NOCOUNT should only be used if an authorized user adds routes
locally. In case of routes which are added on behalf of router
advertisments this flag must not get used as it allows an unlimited
number of routes getting added remotely.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 17:29:27 -05:00
Amir Vadai
97989356af net/mlx4_core: mlx4_init_slave() shouldn't access comm channel before PF is ready
Currently, the PF call to pci_enable_sriov from the PF probe function
stalls for 10 seconds times the number of VFs probed on the host. This
happens because the way for such VFs to determine of the PF
initialization finished, is by attempting to issue reset on the
comm-channel and get timeout (after 10s).

The PF probe function is called from a kenernel workqueue, and therefore
during that time, rcu lock is being held and kernel's workqueue is
stalled. This blocks other processes that try to use the workqueue
or rcu lock.  For example, interface renaming which is calling
rcu_synchronize is blocked, and timedout by systemd.

Changed mlx4_init_slave() to allow VF probed on the host to immediatly
detect that the PF is not ready, and return EPROBE_DEFER instantly.

Only when the PF finishes the initialization, allow such VFs to
access the comm channel.

This issue and fix are relevant only for probed VFs on the hypervisor,
there is no way to pass this information to a VM until comm channel is
ready, so in a VM, if PF is not ready, the first command will be timedout
after 10 seconds and return EPROBE_DEFER.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 17:09:04 -05:00
Amir Vadai
57352ef4f5 net/mlx4_core: Fix memory access error in mlx4_QUERY_DEV_CAP_wrapper()
Fix a regression introduced by [1]. outbox was accessed instead of
outbox->buf. Typo was copy-pasted to [2] and [3].

[1] - cc1ade9 mlx4_core: Disable memory windows for virtual functions
[2] - 4de6580 mlx4_core: Add support for steerable IB UD QPs
[3] - 7ffdf72 net/mlx4_core: Add basic support for TCP/IP offloads under
      tunneling

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 17:09:04 -05:00
Sasha Levin
5bd4e4c158 bonding: correctly handle out of range parameters for lp_interval
We didn't correctly check cases where the value for lp_interval is not
within the legal range due to a missing table terminator.

This would let userspace trigger a kernel panic by specifying a value out
of range:

	echo -1 > /sys/devices/virtual/net/bond0/bonding/lp_interval

Introduced by commit 4325b374f8 ("bonding: convert lp_interval to use
the new option API").

Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 17:06:17 -05:00
Anton Nayshtut
d2d273ffab ipv6: Fix exthdrs offload registration.
Without this fix, ipv6_exthdrs_offload_init doesn't register IPPROTO_DSTOPTS
offload, but returns 0 (as the IPPROTO_ROUTING registration actually succeeds).

This then causes the ipv6_gso_segment to drop IPv6 packets with IPPROTO_DSTOPTS
header.

The issue detected and the fix verified by running MS HCK Offload LSO test on
top of QEMU Windows guests, as this test sends IPv6 packets with
IPPROTO_DSTOPTS.

Signed-off-by: Anton Nayshtut <anton@swortex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 16:35:55 -05:00
Anton Blanchard
d746ca9561 ibmveth: Fix endian issues with MAC addresses
The code to load a MAC address into a u64 for passing to the
hypervisor via a register is broken on little endian.

Create a helper function called ibmveth_encode_mac_addr
which does the right thing in both big and little endian.

We were storing the MAC address in a long in struct ibmveth_adapter.
It's never used so remove it - we don't need another place in the
driver where we create endian issues with MAC addresses.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 16:26:41 -05:00
Anton Blanchard
0a13404dd3 net: unix socket code abuses csum_partial
The unix socket code is using the result of csum_partial to
hash into a lookup table:

	unix_hash_fold(csum_partial(sunaddr, len, 0));

csum_partial is only guaranteed to produce something that can be
folded into a checksum, as its prototype explains:

 * returns a 32-bit number suitable for feeding into itself
 * or csum_tcpudp_magic

The 32bit value should not be used directly.

Depending on the alignment, the ppc64 csum_partial will return
different 32bit partial checksums that will fold into the same
16bit checksum.

This difference causes the following testcase (courtesy of
Gustavo) to sometimes fail:

#include <sys/socket.h>
#include <stdio.h>

int main()
{
	int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

	int i = 1;
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);

	struct sockaddr addr;
	addr.sa_family = AF_LOCAL;
	bind(fd, &addr, 2);

	listen(fd, 128);

	struct sockaddr_storage ss;
	socklen_t sslen = (socklen_t)sizeof(ss);
	getsockname(fd, (struct sockaddr*)&ss, &sslen);

	fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);

	if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
		perror(NULL);
		return 1;
	}
	printf("OK\n");
	return 0;
}

As suggested by davem, fix this by using csum_fold to fold the
partial 32bit checksum into a 16bit checksum before using it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 16:19:33 -05:00
Andrew Lutomirski
adca476782 net: Improve SO_TIMESTAMPING documentation and fix a minor code bug
The original documentation was very unclear.

The code fix is presumably related to the formerly unclear
documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on
__sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops
from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is
set is pointless.  This should have no user-observable effect.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 16:18:01 -05:00
Bjorn Helgaas
4ae6e50c76 phy: fix compiler array bounds warning on settings[]
With -Werror=array-bounds, gcc v4.7.x warns that in phy_find_valid(), the
settings[] "array subscript is above array bounds", I think because idx is
a signed integer and if the caller supplied idx < 0, we pass the guard but
still reference out of bounds.

Fix this by making idx unsigned here and elsewhere.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 16:07:25 -05:00
Florian Westphal
e588e2f286 inet: frag: make sure forced eviction removes all frags
Quoting Alexander Aring:
  While fragmentation and unloading of 6lowpan module I got this kernel Oops
  after few seconds:

  BUG: unable to handle kernel paging request at f88bbc30
  [..]
  Modules linked in: ipv6 [last unloaded: 6lowpan]
  Call Trace:
   [<c012af4c>] ? call_timer_fn+0x54/0xb3
   [<c012aef8>] ? process_timeout+0xa/0xa
   [<c012b66b>] run_timer_softirq+0x140/0x15f

Problem is that incomplete frags are still around after unload; when
their frag expire timer fires, we get crash.

When a netns is removed (also done when unloading module), inet_frag
calls the evictor with 'force' argument to purge remaining frags.

The evictor loop terminates when accounted memory ('work') drops to 0
or the lru-list becomes empty.  However, the mem accounting is done
via percpu counters and may not be accurate, i.e. loop may terminate
prematurely.

Alter evictor to only stop once the lru list is empty when force is
requested.

Reported-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Reported-by: Alexander Aring <alex.aring@gmail.com>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 15:28:45 -05:00
David S. Miller
409e145643 Merge branch 'tipc'
Eric Hugne says:

====================
tipc: refcount and memory leak fixes

v3: Remove error logging from data path completely. Rebased on top of
    latest net merge.

v2: Drop specific -ENOMEM logging in patch #1 (tipc: allow connection
    shutdown callback to be invoked in advance) And add a general error
    message if an internal server tries to send a message on a
    closed/nonexisting connection.

In addition to the fix for refcount leak and memory leak during
module removal, we also fix a problem where the topology server
listening socket where unexpectedly closed. We also eliminate an
unnecessary context switch during accept()/recvmsg() for nonblocking
sockets.

It might be good to include this patchset in stable aswell. After the
v3 rebase on latest merge from net all patches apply cleanly on that
tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:46:38 -05:00
Erik Hugne
2892505ea1 tipc: don't log disabled tasklet handler errors
Failure to schedule a TIPC tasklet with tipc_k_signal because the
tasklet handler is disabled is not an error. It means TIPC is
currently in the process of shutting down. We remove the error
logging in this case.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:46:24 -05:00
Erik Hugne
1bb8dce57f tipc: fix memory leak during module removal
When the TIPC module is removed, the tasklet handler is disabled
before all other subsystems. This will cause lingering publications
in the name table because the node_down tasklets responsible to
clean up publications from an unreachable node will never run.
When the name table is shut down, these publications are detected
and an error message is logged:
tipc: nametbl_stop(): orphaned hash chain detected
This is actually a memory leak, introduced with commit
993b858e37 ("tipc: correct the order
of stopping services at rmmod")

Instead of just logging an error and leaking memory, we free
the orphaned entries during nametable shutdown.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:46:24 -05:00
Erik Hugne
edcc0511b5 tipc: drop subscriber connection id invalidation
When a topology server subscriber is disconnected, the associated
connection id is set to zero. A check vs zero is then done in the
subscription timeout function to see if the subscriber have been
shut down. This is unnecessary, because all subscription timers
will be cancelled when a subscriber terminates. Setting the
connection id to zero is actually harmful because id zero is the
identity of the topology server listening socket, and can cause a
race that leads to this socket being closed instead.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:46:23 -05:00
Ying Xue
fe8e464939 tipc: avoid to unnecessary process switch under non-block mode
When messages are received via tipc socket under non-block mode,
schedule_timeout() is called in tipc_wait_for_rcvmsg(), that is,
the process of receiving messages will be scheduled once although
timeout value passed to schedule_timeout() is 0. The same issue
exists in accept()/wait_for_accept(). To avoid this unnecessary
process switch, we only call schedule_timeout() if the timeout
value is non-zero.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:46:23 -05:00
Ying Xue
4652edb70e tipc: fix connection refcount leak
When tipc_conn_sendmsg() calls tipc_conn_lookup() to query a
connection instance, its reference count value is increased if
it's found. But subsequently if it's found that the connection is
closed, the work of sending message is not queued into its server
send workqueue, and the connection reference count is not decreased.
This will cause a reference count leak. To reproduce this problem,
an application would need to open and closes topology server
connections with high intensity.

We fix this by immediately decrementing the connection reference
count if a send fails due to the connection being closed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:46:23 -05:00
Ying Xue
6d4ebeb4df tipc: allow connection shutdown callback to be invoked in advance
Currently connection shutdown callback function is called when
connection instance is released in tipc_conn_kref_release(), and
receiving packets and sending packets are running in different
threads. Even if connection is closed by the thread of receiving
packets, its shutdown callback may not be called immediately as
the connection reference count is non-zero at that moment. So,
although the connection is shut down by the thread of receiving
packets, the thread of sending packets doesn't know it. Before
its shutdown callback is invoked to tell the sending thread its
connection has been closed, the sending thread may deliver
messages by tipc_conn_sendmsg(), this is why the following error
information appears:

"Sending subscription event failed, no memory"

To eliminate it, allow connection shutdown callback function to
be called before connection id is removed in tipc_close_conn(),
which makes the sending thread know the truth in time that its
socket is closed so that it doesn't send message to it. We also
remove the "Sending XXX failed..." error reporting for topology
and config services.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:46:23 -05:00
Guillaume Nault
9e9cb6221a l2tp: fix userspace reception on plain L2TP sockets
As pppol2tp_recv() never queues up packets to plain L2TP sockets,
pppol2tp_recvmsg() never returns data to userspace, thus making
the recv*() system calls unusable.

Instead of dropping packets when the L2TP socket isn't bound to a PPP
channel, this patch adds them to its reception queue.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06 14:25:39 -05:00