When building using clang, always specify -EB or -EL in order to ensure
we target the desired endianness.
Since clang cross compiles using a single compiler build with multiple
targets, our -dumpmachine tests which don't specify clang's --target
argument check output based upon the build machine rather than the
machine our build will target. This means our detection of whether to
specify -EB fails miserably & we never do. Providing the endianness flag
unconditionally for clang resolves this issue & simplifies the clang
path somewhat.
Signed-off-by: Paul Burton <paul.burton@mips.com>
llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.
Close this race condition by checking if refcnt is zero
or not in llc_sap_find(), if it is zero then it is being
removed so we can just treat it as gone.
Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
can lead to undefined behavior [1].
In order to fix this use a gradual shift of the window with a 'while'
loop, similar to what tcp_cwnd_restart() is doing.
When comparing delta and RTO there is a minor difference between TCP
and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
'cwnd' if delta equals RTO. That case is preserved in this change.
[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503] dump_stack+0xf1/0x17b
[40851.451331] ? show_regs_print_info+0x5/0x5
[40851.503555] ubsan_epilogue+0x9/0x7c
[40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796] ? xfrm4_output_finish+0x80/0x80
[40851.739827] ? lock_downgrade+0x6d0/0x6d0
[40851.789744] ? xfrm4_prepare_output+0x160/0x160
[40851.845912] ? ip_queue_xmit+0x810/0x1db0
[40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412] dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833] ? sched_clock+0x5/0x10
[40852.298508] ? sched_clock+0x5/0x10
[40852.342194] ? inet_create+0xdf0/0xdf0
[40852.388988] sock_sendmsg+0xd9/0x160
...
Fixes: 113ced1f52 ("dccp ccid-2: Perform congestion-window validation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 32 bit the kernel sections are not huge-page aligned. When we clone
them on PMD-level we unevitably map some areas that are normal kernel
memory and may contain secrets to user-space. To prevent that we need to
clone the kernel-image on PTE-level for 32 bit.
Also make the page-table cloning code more general so that it can handle
PMD and PTE level cloning. This can be generalized further in the future to
also handle clones on the P4D-level.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533637471-30953-4-git-send-email-joro@8bytes.org
The function sets the global-bit on cloned PMD entries, which only makes
sense when the permissions are identical between the user and the kernel
page-table. Further, only write-permissions are cleared for entry-text and
kernel-text sections, which are not writeable at the end of the boot
process.
The reason why this RW clearing exists is that in the early PTI
implementations the cloned kernel areas were set up during early boot
before the kernel text is set to read only and not touched afterwards.
This is not longer true. The cloned areas are still set up early to get the
entry code working for interrupts and other things, but after the kernel
text has been set RO the clone is repeated which copies the RO PMD/PTEs
over to the user visible clone. That means the initial clearing of the
writable bit can be avoided.
[ tglx: Amended changelog ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533637471-30953-3-git-send-email-joro@8bytes.org
I am currently running a large bare metal instance (i3.metal)
on EC2 with 72 cores, 512GB of RAM and NVME drives, with a
4.18 kernel. I have a workload that simulates a database
workload and I am running into lockup issues when writeback
throttling is enabled,with the hung task detector also
kicking in.
Crash dumps show that most CPUs (up to 50 of them) are
all trying to get the wbt wait queue lock while trying to add
themselves to it in __wbt_wait (see stack traces below).
[ 0.948118] CPU: 45 PID: 0 Comm: swapper/45 Not tainted 4.14.51-62.38.amzn1.x86_64 #1
[ 0.948119] Hardware name: Amazon EC2 i3.metal/Not Specified, BIOS 1.0 10/16/2017
[ 0.948120] task: ffff883f7878c000 task.stack: ffffc9000c69c000
[ 0.948124] RIP: 0010:native_queued_spin_lock_slowpath+0xf8/0x1a0
[ 0.948125] RSP: 0018:ffff883f7fcc3dc8 EFLAGS: 00000046
[ 0.948126] RAX: 0000000000000000 RBX: ffff887f7709ca68 RCX: ffff883f7fce2a00
[ 0.948128] RDX: 000000000000001c RSI: 0000000000740001 RDI: ffff887f7709ca68
[ 0.948129] RBP: 0000000000000002 R08: 0000000000b80000 R09: 0000000000000000
[ 0.948130] R10: ffff883f7fcc3d78 R11: 000000000de27121 R12: 0000000000000002
[ 0.948131] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[ 0.948132] FS: 0000000000000000(0000) GS:ffff883f7fcc0000(0000) knlGS:0000000000000000
[ 0.948134] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.948135] CR2: 000000c424c77000 CR3: 0000000002010005 CR4: 00000000003606e0
[ 0.948136] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.948137] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.948138] Call Trace:
[ 0.948139] <IRQ>
[ 0.948142] do_raw_spin_lock+0xad/0xc0
[ 0.948145] _raw_spin_lock_irqsave+0x44/0x4b
[ 0.948149] ? __wake_up_common_lock+0x53/0x90
[ 0.948150] __wake_up_common_lock+0x53/0x90
[ 0.948155] wbt_done+0x7b/0xa0
[ 0.948158] blk_mq_free_request+0xb7/0x110
[ 0.948161] __blk_mq_complete_request+0xcb/0x140
[ 0.948166] nvme_process_cq+0xce/0x1a0 [nvme]
[ 0.948169] nvme_irq+0x23/0x50 [nvme]
[ 0.948173] __handle_irq_event_percpu+0x46/0x300
[ 0.948176] handle_irq_event_percpu+0x20/0x50
[ 0.948179] handle_irq_event+0x34/0x60
[ 0.948181] handle_edge_irq+0x77/0x190
[ 0.948185] handle_irq+0xaf/0x120
[ 0.948188] do_IRQ+0x53/0x110
[ 0.948191] common_interrupt+0x87/0x87
[ 0.948192] </IRQ>
....
[ 0.311136] CPU: 4 PID: 9737 Comm: run_linux_amd64 Not tainted 4.14.51-62.38.amzn1.x86_64 #1
[ 0.311137] Hardware name: Amazon EC2 i3.metal/Not Specified, BIOS 1.0 10/16/2017
[ 0.311138] task: ffff883f6e6a8000 task.stack: ffffc9000f1ec000
[ 0.311141] RIP: 0010:native_queued_spin_lock_slowpath+0xf5/0x1a0
[ 0.311142] RSP: 0018:ffffc9000f1efa28 EFLAGS: 00000046
[ 0.311144] RAX: 0000000000000000 RBX: ffff887f7709ca68 RCX: ffff883f7f722a00
[ 0.311145] RDX: 0000000000000035 RSI: 0000000000d80001 RDI: ffff887f7709ca68
[ 0.311146] RBP: 0000000000000202 R08: 0000000000140000 R09: 0000000000000000
[ 0.311147] R10: ffffc9000f1ef9d8 R11: 000000001a249fa0 R12: ffff887f7709ca68
[ 0.311148] R13: ffffc9000f1efad0 R14: 0000000000000000 R15: ffff887f7709ca00
[ 0.311149] FS: 000000c423f30090(0000) GS:ffff883f7f700000(0000) knlGS:0000000000000000
[ 0.311150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.311151] CR2: 00007feefcea4000 CR3: 0000007f7016e001 CR4: 00000000003606e0
[ 0.311152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.311153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.311154] Call Trace:
[ 0.311157] do_raw_spin_lock+0xad/0xc0
[ 0.311160] _raw_spin_lock_irqsave+0x44/0x4b
[ 0.311162] ? prepare_to_wait_exclusive+0x28/0xb0
[ 0.311164] prepare_to_wait_exclusive+0x28/0xb0
[ 0.311167] wbt_wait+0x127/0x330
[ 0.311169] ? finish_wait+0x80/0x80
[ 0.311172] ? generic_make_request+0xda/0x3b0
[ 0.311174] blk_mq_make_request+0xd6/0x7b0
[ 0.311176] ? blk_queue_enter+0x24/0x260
[ 0.311178] ? generic_make_request+0xda/0x3b0
[ 0.311181] generic_make_request+0x10c/0x3b0
[ 0.311183] ? submit_bio+0x5c/0x110
[ 0.311185] submit_bio+0x5c/0x110
[ 0.311197] ? __ext4_journal_stop+0x36/0xa0 [ext4]
[ 0.311210] ext4_io_submit+0x48/0x60 [ext4]
[ 0.311222] ext4_writepages+0x810/0x11f0 [ext4]
[ 0.311229] ? do_writepages+0x3c/0xd0
[ 0.311239] ? ext4_mark_inode_dirty+0x260/0x260 [ext4]
[ 0.311240] do_writepages+0x3c/0xd0
[ 0.311243] ? _raw_spin_unlock+0x24/0x30
[ 0.311245] ? wbc_attach_and_unlock_inode+0x165/0x280
[ 0.311248] ? __filemap_fdatawrite_range+0xa3/0xe0
[ 0.311250] __filemap_fdatawrite_range+0xa3/0xe0
[ 0.311253] file_write_and_wait_range+0x34/0x90
[ 0.311264] ext4_sync_file+0x151/0x500 [ext4]
[ 0.311267] do_fsync+0x38/0x60
[ 0.311270] SyS_fsync+0xc/0x10
[ 0.311272] do_syscall_64+0x6f/0x170
[ 0.311274] entry_SYSCALL_64_after_hwframe+0x42/0xb7
In the original patch, wbt_done is waking up all the exclusive
processes in the wait queue, which can cause a thundering herd
if there is a large number of writer threads in the queue. The
original intention of the code seems to be to wake up one thread
only however, it uses wake_up_all() in __wbt_done(), and then
uses the following check in __wbt_wait to have only one thread
actually get out of the wait loop:
if (waitqueue_active(&rqw->wait) &&
rqw->wait.head.next != &wait->entry)
return false;
The problem with this is that the wait entry in wbt_wait is
define with DEFINE_WAIT, which uses the autoremove wakeup function.
That means that the above check is invalid - the wait entry will
have been removed from the queue already by the time we hit the
check in the loop.
Secondly, auto-removing the wait entries also means that the wait
queue essentially gets reordered "randomly" (e.g. threads re-add
themselves in the order they got to run after being woken up).
Additionally, new requests entering wbt_wait might overtake requests
that were queued earlier, because the wait queue will be
(temporarily) empty after the wake_up_all, so the waitqueue_active
check will not stop them. This can cause certain threads to starve
under high load.
The fix is to leave the woken up requests in the queue and remove
them in finish_wait() once the current thread breaks out of the
wait loop in __wbt_wait. This will ensure new requests always
end up at the back of the queue, and they won't overtake requests
that are already in the wait queue. With that change, the loop
in wbt_wait is also in line with many other wait loops in the kernel.
Waking up just one thread drastically reduces lock contention, as
does moving the wait queue add/remove out of the loop.
A significant drop in lockdep's lock contention numbers is seen when
running the test application on the patched kernel.
Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 9faa89d4ed ("tipc: make function tipc_net_finalize() thread
safe") tries to make it thread safe to set node address, so it uses
node_list_lock lock to serialize the whole process of setting node
address in tipc_net_finalize(). But it causes the following interrupt
unsafe locking scenario:
CPU0 CPU1
---- ----
rht_deferred_worker()
rhashtable_rehash_table()
lock(&(&ht->lock)->rlock)
tipc_nl_compat_doit()
tipc_net_finalize()
local_irq_disable();
lock(&(&tn->node_list_lock)->rlock);
tipc_sk_reinit()
rhashtable_walk_enter()
lock(&(&ht->lock)->rlock);
<Interrupt>
tipc_disc_rcv()
tipc_node_check_dest()
tipc_node_create()
lock(&(&tn->node_list_lock)->rlock);
*** DEADLOCK ***
When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't
disable BH. So if an interrupt happens after the lock, it can create
an inverse lock ordering between ht->lock and tn->node_list_lock. As
a consequence, deadlock might happen.
The reason causing the inverse lock ordering scenario above is because
the initial purpose of node_list_lock is not designed to do the
serialization of node address setting.
As cmpxchg() can guarantee CAS (compare-and-swap) process is atomic,
we use it to replace node_list_lock to ensure setting node address can
be atomically finished. It turns out the potential deadlock can be
avoided as well.
Fixes: 9faa89d4ed ("tipc: make function tipc_net_finalize() thread safe")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.
The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.
Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.
Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.
PRE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
POST:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
0xffffffff817be9a5 <+53>: xchg %ax,%ax
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
NATIVE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: movb $0x0,(%rdi)
0xffffffff817be9a3 <+51>: nopl 0x0(%rax)
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
Fixes: 63f70270cc ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663f ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
syzbot reported that we reinitialize an active delayed
work in vsock_stream_connect():
ODEBUG: init active (active state 0) object type: timer_list hint:
delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
debug_print_object+0x16a/0x210 lib/debugobjects.c:326
The pattern is apparently wrong, we should only initialize
the dealyed work once and could repeatly schedule it. So we
have to move out the initializations to allocation side.
And to avoid confusion, we can split the shared dwork
into two, instead of re-using the same one.
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
Cc: Andy king <acking@vmware.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The allocation of lmac->dmacs is not being checked for allocation
failure. Add the check.
Fixes: 3a34ecfd9d ("net: thunderx: add MAC address filter tracking for LMAC")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike fs.val.lport and fs.val.fport, cxgb4_process_flow_match()
sets fs.val.{l,f}ip to net-endian values without conversion - they come
straight from flow_dissector_key_ipv4_addrs ->dst and ->src resp. So
the assignment in mk_act_open_req() ought to be a straight copy.
As far as I know, T4 PCIe cards do exist, so it's not as if that
thing could only be found on little-endian systems...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_CIFS_STATS2 is enabled keep counters for slow
commands (ie server took longer than 1 second to respond)
by SMB2/SMB3 command code. This can help in diagnosing
whether performance problems are on server (instead of
client) and which commands are causing the problem.
Sample output (the new lines contain words "slow responses ...")
$ cat /proc/fs/cifs/Stats
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Total Large 10 Small 490 Allocations
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 67 maximum at one time: 2
4 slow responses from localhost for command 5
1 slow responses from localhost for command 6
1 slow responses from localhost for command 14
1 slow responses from localhost for command 16
1) \\localhost\test
SMBs: 243
Bytes read: 1024000 Bytes written: 104857600
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 40 total 0 failed
Closes: 39 total 0 failed
...
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
server->secmech.sdeschmacsha256 is not properly initialized before
smb2_shash_allocate(), set shash after that call.
also fix typo in error message
Fixes: 8de8c4608f ("cifs: Fix validation of signed data in smb2")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
An earlier commit had a typo which prevented the
optimization from working:
commit 18dd8e1a65 ("Do not send SMB3 SET_INFO request if nothing is changing")
Thank you to Metze for noticing this. Also clear a
reserved field in the FILE_BASIC_INFO struct we send
that should be zero (all the other fields in that
struct were set or cleared explicitly already in
cifs_set_file_info).
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> # 4.9.x+
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
CONFIG_CIFS_STATS is now always enabled (to simplify the
code and since the STATS are important for some common
customer use cases and also debugging), but needed one
minor change so that STATS shows as enabled in the debug
output in /proc/fs/cifs/DebugData, otherwise it could
get confusing with STATS no longer showing up in the
"Features" list in /proc/fs/cifs/DebugData when basic
stats were in fact available.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
If responses take longer than one second from the server,
we can optionally log them to dmesg in current cifs.ko code
(CONFIG_CIFS_STATS2 must be configured and a
/proc/fs/cifs/cifsFYI flag must be set), but can be more useful
to log these via ftrace (tracepoint is smb3_slow_rsp) which
is easier and more granular (still requires CONFIG_CIFS_STATS2
to be configured in the build though).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
These are used for SMB3 encryption and compounded requests.
Update these functions and the other functions related to SMB3 encryption to
take an array of requests.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
echo 0 > /proc/fs/cifs/Stats is supposed to reset the stats
but there were four (see example below) that were not reset
(bytes read and witten, total vfs ops and max ops
at one time).
...
0 session 0 share reconnects
Total vfs operations: 100 maximum at one time: 2
1) \\localhost\test
SMBs: 0
Bytes read: 502092 Bytes written: 31457286
TreeConnects: 0 total 0 failed
TreeDisconnects: 0 total 0 failed
...
This patch fixes cifs_stats_proc_write to properly reset
those four.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
We were only displaying bytes_read and bytes_written in cifs
stats, fix smb3 stats to also display them. Sample output
with this patch:
cat /proc/fs/cifs/Stats:
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 94 maximum at one time: 2
1) \\localhost\test
SMBs: 214
Bytes read: 502092 Bytes written: 31457286
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 52 total 3 failed
Closes: 48 total 0 failed
Flushes: 0 total 0 failed
Reads: 17 total 0 failed
Writes: 31 total 0 failed
...
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CONFIG_CIFS_STATS should always be enabled as Pavel recently
noted. Simple statistics are not a significant performance hit,
and removing the ifdef simplifies the code slightly.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: <stable@vger.kernel.org>
Add tracepoints for reconnecting an smb3 session
Example output (from trace-cmd) with the patch
(showing the session marked for reconnect, the stat failing, and then
the subsequent SMB3 commands after the server comes back up).
The "smb3_reconnect" event is the new one.
cifsd-25993 [000] .... 29635.368265: smb3_reconnect: server=localhost current_mid=0x1e
stat-26200 [001] .... 29638.516403: smb3_enter: cifs_revalidate_dentry_attr: xid=22
stat-26200 [001] .... 29648.723296: smb3_exit_err: cifs_revalidate_dentry_attr: xid=22 rc=-112
kworker/0:1-22830 [000] .... 29653.850947: smb3_cmd_done: sid=0x0 tid=0x0 cmd=0 mid=0
kworker/0:1-22830 [000] .... 29653.851191: smb3_cmd_err: sid=0x8ae4683c tid=0x0 cmd=1 mid=1 status=0xc0000016 rc=-5
kworker/0:1-22830 [000] .... 29653.855254: smb3_cmd_done: sid=0x8ae4683c tid=0x0 cmd=1 mid=2
kworker/0:1-22830 [000] .... 29653.855482: smb3_cmd_done: sid=0x8ae4683c tid=0x8084f30d cmd=3 mid=3
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
In debugging reconnection problems, want to be able to more easily
trace cases in which the server has marked the SMB3 session
expired or deleted (to distinguish from timeout cases).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
These timers were a good idea but weren't used in current code,
and the idea was cifs specific. Future patch will add similar timers
for SMB2/SMB3, but no sense using memory for cifs timers that
aren't used in current code.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Fixes problem pointed out by Pavel in discussions about commit
729c0c9dd5
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org> # 3.18.x+
Remove counters from the per-tree connection /proc/fs/cifs/Stats
output that will always be zero (since they are not per-tcon ops)
ie SMB3 Negotiate, SessionSetup, Logoff, Echo, Cancel.
Also clarify "sent" to be "total" per-Pavel's suggestion
(since this "total" includes total for all operations that we try to
send whether or not succesffully sent). Sample output below:
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
1 session 2 share reconnects
Total vfs operations: 23 maximum at one time: 2
1) \\localhost\test
SMBs: 45
TreeConnects: 2 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 13 total 2 failed
Closes: 9 total 0 failed
Flushes: 0 total 0 failed
Reads: 0 total 0 failed
Writes: 1 total 0 failed
Locks: 0 total 0 failed
IOCTLs: 3 total 1 failed
QueryDirectories: 4 total 2 failed
ChangeNotifies: 0 total 0 failed
QueryInfos: 10 total 0 failed
SetInfos: 3 total 0 failed
OplockBreaks: 0 sent 0 failed
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
For SMB2/SMB3 the number of requests sent was not displayed
in /proc/fs/cifs/Stats unless CONFIG_CIFS_STATS2 was
enabled (only number of failed requests displayed). As
with earlier dialects, we should be displaying these
counters if CONFIG_CIFS_STATS is enabled. They
are important for debugging.
e.g. when you cat /proc/fs/cifs/Stats (before the patch)
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 690 maximum at one time: 2
1) \\localhost\test
SMBs: 975
Negotiates: 0 sent 0 failed
SessionSetups: 0 sent 0 failed
Logoffs: 0 sent 0 failed
TreeConnects: 0 sent 0 failed
TreeDisconnects: 0 sent 0 failed
Creates: 0 sent 2 failed
Closes: 0 sent 0 failed
Flushes: 0 sent 0 failed
Reads: 0 sent 0 failed
Writes: 0 sent 0 failed
Locks: 0 sent 0 failed
IOCTLs: 0 sent 1 failed
Cancels: 0 sent 0 failed
Echos: 0 sent 0 failed
QueryDirectories: 0 sent 63 failed
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
snapshot mounts were not marked as read-only and did not display the snapshot
time (in /proc/mounts) specified on mount
With this patch - note that can not write to the snapshot mount (see "ro" in
/proc/mounts line) and also the missing snapshot timewarp token time is
dumped. Sample line from /proc/mounts with the patch:
//127.0.0.1/scratch /mnt2 smb3 ro,relatime,vers=default,cache=strict,username=testuser,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=127.0.0.1,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,noperm,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=1234567,actimeo=1 0 0
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Some servers, like Samba, don't support the fsctl for
query_network_interface_info so don't log a noisy warning
message on mount for this by default unless the error is more serious.
Lower the error to an FYI level so it does not get logged by
default.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We really, really want to be encouraging use of secure dialects,
and SMB3.1.1 offers useful security features, and will soon
be the recommended dialect for many use cases. Simplify the code
by removing the CONFIG_CIFS_SMB311 ifdef so users don't disable
it in the build, and create compatibility and/or security issues
with modern servers - many of which have been supporting this
dialect for multiple years.
Also clarify some of the Kconfig text for cifs.ko about
SMB3.1.1 and current supported features in the module.
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
/proc/fs/cifs/DebugData displays the features (Kconfig options)
used to build cifs.ko but it was missing some, and needed comma
separator. These can be useful in debugging certain problems
so we know which optional features were enabled in the user's build.
Also clarify them, by making them more closely match the
corresponding CONFIG_CIFS_* parm.
Old format:
Features: dfs fscache posix spnego xattr acl
New format:
Features: DFS,FSCACHE,SMB_DIRECT,STATS,DEBUG2,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
CC: Stable <stable@vger.kernel.org>
Following up on a suggestion by Matthew Wilcox ...
The cifs CHANGES documentation file is out of date, and more
current information is in the wiki. Delete the old version
information that is of little use to make this documentation
file more readable.
CC: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Output now matches expected stat -f output for all fields
except for Namelen and ID which were addressed in a companion
patch (which retrieves them from existing SMB3 mechanisms
and works whether POSIX enabled or not)
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Fil in the correct namelen (typically 255 not 4096) in the
statfs response and also fill in a reasonably unique fsid
(in this case taken from the volume id, and the creation time
of the volume).
In the case of the POSIX statfs all fields are now filled in,
and in the case of non-POSIX mounts, all fields are filled
in which can be.
Signed-off-by: Steve French <stfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Check if every data page is signed correctly in sigining helper.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
also fixes error code in smb311_posix_mkdir() (where
the error assignment needs to go before the goto)
a typo that Dan Carpenter and Paulo and Gustavo
pointed out.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the
config for the build of cifs.ko if want to always prevent mounting
with these less secure dialects.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
If user specifies "posix" on an SMB3.11 mount, then fail the mount
if server does not return the POSIX negotiate context indicating
support for posix.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
In the fscache, we just need the timestamps as cookies to check for
changes, so we don't really care about the overflow, but it's better
to stop using the deprecated timespec so we don't have to go through
explicit conversion functions.
To avoid comparing uninitialized padding values that are copied
while assigning the timespec values, this rearranges the members of
cifs_fscache_inode_auxdata to avoid padding, and assigns them
individually.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
In cifs, the timestamps are stored in memory in the cifs_fattr structure,
which uses the deprecated 'timespec' structure. Now that the VFS code
has moved on to 'timespec64', the next step is to change over the fattr
as well.
This also makes 32-bit and 64-bit systems behave the same way, and
no longer overflow the 32-bit time_t in year 2038.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is not really a runtime issue but Smatch complains that:
fs/cifs/smb2ops.c:1740 smb2_query_symlink()
error: uninitialized symbol 'resp_buftype'.
The warning is right that it can be uninitialized... Also "err_buf"
would be NULL at this point and we're not supposed to pass NULLs to
free_rsp_buf() or it might trigger some extra output if we turn on
debugging.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Y Soft is headquartered in the Czech Republic and it is a worldwide
provider of enterprise office solutions for print management.
Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
Signed-off-by: Rob Herring <robh@kernel.org>
When the OF code was originally made common by Grant in commit
51975db0b7 ("of/flattree: merge early_init_dt_scan_memory() common
code") (Feb 2010), the common code inherited a hack to handle
PPC "longtrail" machines, which had a "memory@0" node with no
device_type.
That check was then made to only apply to PPC32 in b44aa25d20 ("of:
Handle memory@0 node on PPC32 only") (May 2014).
But according to Paul Mackerras the "longtrail" machines are long
dead, if they were ever seen in the wild at all. If someone does still
have one, we can handle this firmware wart in powerpc platform code.
So remove the hack once and for all.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rob Herring <robh@kernel.org>
Colin Ian King reports that commit 82ff27bc52 ("xfs: automatic dfops
buffer relogging") leaves around some dead error handling code in
xfs_dquot_disk_alloc(). This was discovered via Coverity scan.
Since the associated commit eliminates the act of joining a buffer
to a dfops, this intermediate error state is no longer possible and
the error handling code can be removed. Since the caller cancels the
transaction on error, which cancels the dfops, eliminate the
unnecessary xfs_defer_cancel() call and error handling labels.
Fixes: 82ff27bc52 ("xfs: automatic dfops buffer relogging")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This adds ordering of the updates and makes sure we always see the if_seq
update before the extent tree is modified.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The code in __write_64bit_c0_split() is used by MIPS32 kernels running
on MIPS64 CPUs to write a 64-bit value to a 64-bit coprocessor 0
register using a single 64-bit dmtc0 instruction. It does this by
combining the 2x 32-bit registers used to hold the 64-bit value into a
single register, which in the existing code involves three steps:
1) Zero extend register A which holds bits 31:0 of our data, since it
may have previously held a sign-extended value.
2) Shift register B which holds bits 63:32 of our data in bits 31:0
left by 32 bits, such that the bits forming our data are in the
position they'll be in the final 64-bit value & bits 31:0 of the
register are zero.
3) Or the two registers together to form the 64-bit value in one
64-bit register.
From MIPS r2 onwards we have a dins instruction which can effectively
perform all 3 of those steps using a single instruction.
Add a path for MIPS r2 & beyond which uses dins to take bits 31:0 from
register B & insert them into bits 63:32 of register A, giving us our
full 64-bit value in register A with one instruction.
Since we know that MIPS r2 & above support the sel field for the dmtc0
instruction, we don't bother special casing sel==0. Omiting the sel
field would assemble to exactly the same instruction as when we
explicitly specify that it equals zero.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Commit c22c804310 ("MIPS: Fix input modify in
__write_64bit_c0_split()") modified __write_64bit_c0_split() constraints
such that we have both an input & an output which we hope to assign to
the same registers, and modify the output rather than incorrectly
clobbering an input.
The way in which we use both an output & an input parameter with the
input constrained to share the output registers is a little convoluted &
also problematic for clang, which complains if the input & output values
have different widths. For example:
In file included from kernel/fork.c:98:
./arch/mips/include/asm/mmu_context.h:149:19: error: unsupported
inline asm: input with type 'unsigned long' matching output with
type 'unsigned long long'
write_c0_entryhi(cpu_asid(cpu, next));
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/mmu_context.h:93:2: note: expanded from macro
'cpu_asid'
(cpu_context((cpu), (mm)) & cpu_asid_mask(&cpu_data[cpu]))
^
./arch/mips/include/asm/mipsregs.h:1617:65: note: expanded from macro
'write_c0_entryhi'
#define write_c0_entryhi(val) __write_ulong_c0_register($10, 0, val)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
./arch/mips/include/asm/mipsregs.h:1430:39: note: expanded from macro
'__write_ulong_c0_register'
__write_64bit_c0_register(reg, sel, val); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
./arch/mips/include/asm/mipsregs.h:1400:41: note: expanded from macro
'__write_64bit_c0_register'
__write_64bit_c0_split(register, sel, value); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
./arch/mips/include/asm/mipsregs.h:1498:13: note: expanded from macro
'__write_64bit_c0_split'
: "r,0" (val)); \
^~~
We can both fix this build failure & simplify the code somewhat by
assigning the __tmp variable with the input value in C prior to our
inline assembly, and then using a single read-write output operand (ie.
a constraint beginning with +) to provide this value to our assembly.
Signed-off-by: Paul Burton <paul.burton@mips.com>