forked from luck/tmp_suning_uos_patched
f441882a52
ARMv6+ processors do not use CONFIG_CPU_USE_DOMAINS and use privileged ldr/str instructions in copy_{from/to}_user. They are currently unnecessarily using single ldr/str instructions and can use ldm/stm instructions instead like memcpy does (but with appropriate fixup tables).

This speeds up a "dd if=foo of=bar bs=32k" on a tmpfs filesystem by about 4% on my Cortex-A9.

before:134217728 bytes (128.0MB) copied, 0.543848 seconds, 235.4MB/s
before:134217728 bytes (128.0MB) copied, 0.538610 seconds, 237.6MB/s
before:134217728 bytes (128.0MB) copied, 0.544356 seconds, 235.1MB/s
before:134217728 bytes (128.0MB) copied, 0.544364 seconds, 235.1MB/s
before:134217728 bytes (128.0MB) copied, 0.537130 seconds, 238.3MB/s
before:134217728 bytes (128.0MB) copied, 0.533443 seconds, 240.0MB/s
before:134217728 bytes (128.0MB) copied, 0.545691 seconds, 234.6MB/s
before:134217728 bytes (128.0MB) copied, 0.534695 seconds, 239.4MB/s
before:134217728 bytes (128.0MB) copied, 0.540561 seconds, 236.8MB/s
before:134217728 bytes (128.0MB) copied, 0.541025 seconds, 236.6MB/s

after:134217728 bytes (128.0MB) copied, 0.520445 seconds, 245.9MB/s
after:134217728 bytes (128.0MB) copied, 0.527846 seconds, 242.5MB/s
after:134217728 bytes (128.0MB) copied, 0.519510 seconds, 246.4MB/s
after:134217728 bytes (128.0MB) copied, 0.527231 seconds, 242.8MB/s
after:134217728 bytes (128.0MB) copied, 0.525030 seconds, 243.8MB/s
after:134217728 bytes (128.0MB) copied, 0.524236 seconds, 244.2MB/s
after:134217728 bytes (128.0MB) copied, 0.523659 seconds, 244.4MB/s
after:134217728 bytes (128.0MB) copied, 0.525018 seconds, 243.8MB/s
after:134217728 bytes (128.0MB) copied, 0.519249 seconds, 246.5MB/s
after:134217728 bytes (128.0MB) copied, 0.518527 seconds, 246.9MB/s

Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
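The core idea of this change is to move several words per instruction (ldm/stm) rather than one word per instruction (ldr/str). In the kernel, that work is done in ARM assembly. The sketch below is only a portable C analogy, and its function names are hypothetical. It contrasts a loop that copies one 32-bit word per iteration with one that copies eight words (32 bytes) per iteration through local temporaries. It assumes word-aligned, non-overlapping buffers and a length given in words. It also leaves out the fault fixups and unaligned handling that the real user-copy routines need. Whether a compiler actually turns the second loop into ldm/stm depends on the target and optimisation level.

```c
#include <stddef.h>
#include <stdint.h>

/* Word-at-a-time copy: one load and one store per 32-bit word,
 * analogous to a loop of single ldr/str instructions. */
void copy_words_single(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}

/* Block copy: move eight words (32 bytes) per iteration via eight
 * temporaries, analogous to an ldm/stm pair transferring eight registers.
 * A tail loop copies the remaining 0..7 words one at a time.
 * Assumes dst and src do not overlap (memcpy semantics). */
void copy_words_block8(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t i = 0;

    for (; i + 8 <= nwords; i += 8) {
        uint32_t r0 = src[i + 0], r1 = src[i + 1], r2 = src[i + 2], r3 = src[i + 3];
        uint32_t r4 = src[i + 4], r5 = src[i + 5], r6 = src[i + 6], r7 = src[i + 7];

        dst[i + 0] = r0; dst[i + 1] = r1; dst[i + 2] = r2; dst[i + 3] = r3;
        dst[i + 4] = r4; dst[i + 5] = r5; dst[i + 6] = r6; dst[i + 7] = r7;
    }

    for (; i < nwords; i++)
        dst[i] = src[i];
}
```

Both functions produce the same result. The second one simply issues fewer, wider memory operations per loop iteration, and that is where the measured gain comes from.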
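The commit message points out that moving to ldm/stm needs "appropriate fixup tables". The general idea works like this:

- Every instruction that may fault on a user address gets an entry that pairs its address with the address of recovery code.
- When such a fault happens, the kernel looks up the faulting instruction address in a table sorted by that address.

The C model below shows that lookup. The `extable_entry` struct and `find_fixup` function are hypothetical simplifications, not the kernel's own definitions. In the kernel, the table entries are emitted from the assembly sources rather than written by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified exception-table entry: address of an
 * instruction that may fault on a user pointer, and where to resume. */
struct extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* bsearch comparator: compares a faulting address against entry->insn. */
static int cmp_insn(const void *key, const void *elem)
{
    uintptr_t addr = *(const uintptr_t *)key;
    const struct extable_entry *e = elem;

    if (addr < e->insn)
        return -1;
    return addr > e->insn;
}

/* Return the fixup address for fault_addr, or 0 if the faulting
 * instruction has no entry. Requires the table sorted by insn. */
uintptr_t find_fixup(const struct extable_entry *table, size_t n,
                     uintptr_t fault_addr)
{
    const struct extable_entry *e =
        bsearch(&fault_addr, table, n, sizeof(*table), cmp_insn);

    return e ? e->fixup : 0;
}
```

Keeping the table sorted keeps the lookup at O(log n), no matter how many user-access instructions the copy routines contain.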
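The benchmark figures can be checked with a quick calculation. The mean of the ten reported throughputs is about 236.9 MB/s before the change and about 244.7 MB/s after, a gain of roughly 3.3%. Over the same runs, the mean copy time falls by about 3.2%. The helper below repeats that arithmetic using the MB/s values copied from the dd output. The names `mean` and `speedup_percent` are illustrative only.

```c
#include <stddef.h>

/* Throughput (MB/s) from each of the ten dd runs in the commit message. */
static const double before_mbps[] = {
    235.4, 237.6, 235.1, 235.1, 238.3, 240.0, 234.6, 239.4, 236.8, 236.6,
};
static const double after_mbps[] = {
    245.9, 242.5, 246.4, 242.8, 243.8, 244.2, 244.4, 243.8, 246.5, 246.9,
};

/* Arithmetic mean of n values. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Percentage change in mean throughput, "after" relative to "before". */
double speedup_percent(void)
{
    size_t nb = sizeof before_mbps / sizeof before_mbps[0];
    size_t na = sizeof after_mbps / sizeof after_mbps[0];

    return (mean(after_mbps, na) / mean(before_mbps, nb) - 1.0) * 100.0;
}
```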
ashldi3.S
ashrdi3.S
backtrace.S
bitops.h
bswapsdi2.S
call_with_stack.S
changebit.S
clear_user.S
clearbit.S
copy_from_user.S
copy_page.S
copy_template.S
copy_to_user.S
csumipv6.S
csumpartial.S
csumpartialcopy.S
csumpartialcopygeneric.S
csumpartialcopyuser.S
delay-loop.S
delay.c
div64.S
ecard.S
findbit.S
floppydma.S
getuser.S
io-acorn.S
io-readsb.S
io-readsl.S
io-readsw-armv3.S
io-readsw-armv4.S
io-writesb.S
io-writesl.S
io-writesw-armv3.S
io-writesw-armv4.S
lib1funcs.S
lshrdi3.S
Makefile
memchr.S
memcpy.S
memmove.S
memset.S
muldi3.S
putuser.S
setbit.S
strchr.S
strrchr.S
testchangebit.S
testclearbit.S
testsetbit.S
uaccess_with_memcpy.c
ucmpdi2.S
xor-neon.c