kernel_optimize_test/arch/arm/lib
Vincent Whitchurch f441882a52 ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS
ARMv6+ processors do not use CONFIG_CPU_USE_DOMAINS and use privileged
ldr/str instructions in copy_{from/to}_user.  They are currently
unnecessarily using single ldr/str instructions and can use ldm/stm
instructions instead like memcpy does (but with appropriate fixup
tables).

This speeds up a "dd if=foo of=bar bs=32k" on a tmpfs filesystem by
about 4% on my Cortex-A9.

before:134217728 bytes (128.0MB) copied, 0.543848 seconds, 235.4MB/s
before:134217728 bytes (128.0MB) copied, 0.538610 seconds, 237.6MB/s
before:134217728 bytes (128.0MB) copied, 0.544356 seconds, 235.1MB/s
before:134217728 bytes (128.0MB) copied, 0.544364 seconds, 235.1MB/s
before:134217728 bytes (128.0MB) copied, 0.537130 seconds, 238.3MB/s
before:134217728 bytes (128.0MB) copied, 0.533443 seconds, 240.0MB/s
before:134217728 bytes (128.0MB) copied, 0.545691 seconds, 234.6MB/s
before:134217728 bytes (128.0MB) copied, 0.534695 seconds, 239.4MB/s
before:134217728 bytes (128.0MB) copied, 0.540561 seconds, 236.8MB/s
before:134217728 bytes (128.0MB) copied, 0.541025 seconds, 236.6MB/s

 after:134217728 bytes (128.0MB) copied, 0.520445 seconds, 245.9MB/s
 after:134217728 bytes (128.0MB) copied, 0.527846 seconds, 242.5MB/s
 after:134217728 bytes (128.0MB) copied, 0.519510 seconds, 246.4MB/s
 after:134217728 bytes (128.0MB) copied, 0.527231 seconds, 242.8MB/s
 after:134217728 bytes (128.0MB) copied, 0.525030 seconds, 243.8MB/s
 after:134217728 bytes (128.0MB) copied, 0.524236 seconds, 244.2MB/s
 after:134217728 bytes (128.0MB) copied, 0.523659 seconds, 244.4MB/s
 after:134217728 bytes (128.0MB) copied, 0.525018 seconds, 243.8MB/s
 after:134217728 bytes (128.0MB) copied, 0.519249 seconds, 246.5MB/s
 after:134217728 bytes (128.0MB) copied, 0.518527 seconds, 246.9MB/s
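
As a rough cross-check of the figures above, the following minimal sketch averages the ten "before" and ten "after" runs and derives the relative change. The timings are copied from the dd output; the function and key names are illustrative only and are not part of the patch. It assumes dd's "MB" means 1024*1024 bytes, which matches the printed throughput values.

```python
from statistics import mean

# Elapsed seconds copied from the dd runs listed above.
BEFORE_S = [0.543848, 0.538610, 0.544356, 0.544364, 0.537130,
            0.533443, 0.545691, 0.534695, 0.540561, 0.541025]
AFTER_S = [0.520445, 0.527846, 0.519510, 0.527231, 0.525030,
           0.524236, 0.523659, 0.525018, 0.519249, 0.518527]

BYTES_COPIED = 134217728  # bytes copied per run (128 * 1024 * 1024)


def throughput_mb_s(seconds):
    """Throughput of one run, assuming 1 MB = 1024 * 1024 bytes."""
    return BYTES_COPIED / (1024 * 1024) / seconds


def summarize(before, after):
    """Compare mean elapsed times of two sets of runs (hypothetical helper)."""
    mean_before = mean(before)
    mean_after = mean(after)
    return {
        "mean_before_s": mean_before,
        "mean_after_s": mean_after,
        # Share of the original elapsed time that was saved.
        "time_saved_pct": (1 - mean_after / mean_before) * 100,
        # Same data volume in less time, expressed as a throughput gain.
        "throughput_gain_pct": (mean_before / mean_after - 1) * 100,
    }


if __name__ == "__main__":
    for key, value in summarize(BEFORE_S, AFTER_S).items():
        print(f"{key}: {value:.4f}")
```

On these twenty samples the mean throughput gain works out to a little over 3%, in the same range as the rounded "about 4%" quoted above.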

Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
2018-11-12 10:51:59 +00:00
ashldi3.S
ashrdi3.S
backtrace.S
bitops.h
bswapsdi2.S
call_with_stack.S
changebit.S
clear_user.S
clearbit.S
copy_from_user.S ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS 2018-11-12 10:51:59 +00:00
copy_page.S
copy_template.S
copy_to_user.S ARM: 8812/1: Optimise copy_{from/to}_user for !CPU_USE_DOMAINS 2018-11-12 10:51:59 +00:00
csumipv6.S
csumpartial.S
csumpartialcopy.S
csumpartialcopygeneric.S
csumpartialcopyuser.S
delay-loop.S
delay.c
div64.S
ecard.S
findbit.S
floppydma.S
getuser.S
io-acorn.S
io-readsb.S
io-readsl.S
io-readsw-armv3.S
io-readsw-armv4.S
io-writesb.S
io-writesl.S
io-writesw-armv3.S
io-writesw-armv4.S
lib1funcs.S
lshrdi3.S
Makefile
memchr.S
memcpy.S
memmove.S
memset.S
muldi3.S
putuser.S
setbit.S
strchr.S
strrchr.S
testchangebit.S
testclearbit.S
testsetbit.S
uaccess_with_memcpy.c ARM: 8797/1: spectre-v1.1: harden __copy_to_user 2018-10-05 10:51:15 +01:00
ucmpdi2.S
xor-neon.c