kernel_optimize_test/crypto
Arnd Bergmann 7d6e910502 crypto: improve gcc optimization flags for serpent and wp512
An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
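
When collecting frame sizes across many compilers, it helps to pull the numbers out of warnings like the one above automatically. The following is a minimal sketch, not part of the patch. The function name parse_frame_warning is hypothetical, and the pattern is modelled only on the diagnostic quoted here, so other GCC releases may word it differently.

```python
import re

# Pattern modelled on the -Wframe-larger-than= diagnostic quoted above;
# other GCC releases may phrase the message differently.
FRAME_WARNING = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): warning: "
    r"the frame size of (?P<size>\d+) bytes is larger than (?P<limit>\d+) bytes"
)


def parse_frame_warning(text):
    """Return (file, line, frame_size, limit), or None for any other line."""
    m = FRAME_WARNING.match(text)
    if m is None:
        return None
    return (m.group("file"), int(m.group("line")),
            int(m.group("size")), int(m.group("limit")))


line = ("crypto/wp512.c:987:1: warning: the frame size of 1112 bytes "
        "is larger than 1024 bytes [-Wframe-larger-than=]")
print(parse_frame_warning(line))  # ('crypto/wp512.c', 987, 1112, 1024)
```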

Testing in different configurations, I see large variations in stack
frame size, up to 1500 bytes for a function that should need around
300 bytes at most. I also checked the reference implementation, which
is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem by enabling -fsched-pressure by
default, but even with that fix they still suffer from the issue to some
degree. Some testing on arm64 shows that the time needed to hash a given
amount of data is roughly proportional to the stack frame size here.
That makes sense: the wp512 implementation already does lots of loads
for table lookups, and the overly large stack frame is the result of
many additional loads and stores for spilled registers (as seen from
inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512:
across my collection of cross-compilers, the stack frame size of this
function is better or identical in every case except am33 (4 bytes
larger), though some architectures (notably x86) have -fschedule-insns
disabled by default anyway.

The four columns give the stack frame size in bytes with these flags:
default: -O2
press:	 -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -fno-schedule-insns (disables sched-pressure)
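
As a rough illustration of how the four columns map to compiler invocations, here is a sketch that only builds the argument lists. The helper compile_command and the choice of -Wframe-larger-than= to report frame sizes are assumptions for illustration. A real kernel object such as wp512.c also needs the full set of kbuild flags and include paths, which are omitted here.

```python
# Extra scheduler flags for each column; every configuration starts from -O2.
CONFIGS = {
    "default": [],
    "press": ["-fsched-pressure"],
    "nopress": ["-fschedule-insns", "-fno-sched-pressure"],
    "nosched": ["-fno-schedule-insns"],
}


def compile_command(cc, source, column, limit=1024):
    """Build a hypothetical argv compiling `source` with the flags for `column`.

    -Wframe-larger-than= makes GCC warn about every function whose frame
    exceeds `limit` bytes; lowering `limit` also reports smaller frames.
    """
    return [cc, "-O2", *CONFIGS[column],
            f"-Wframe-larger-than={limit}", "-c", source, "-o", "/dev/null"]


for column in CONFIGS:
    print(" ".join(compile_command("mips-linux-gcc", "wp512.c", column)))
```

Keeping the flags in one table makes it easy to loop over every cross-compiler and every column with the same driver.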

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1136	848	1136	176
am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
cris-linux-gcc-4.9.3		272	272	272	272
frv-linux-gcc-4.9.3		1128	1000	1128	280
hppa64-linux-gcc-4.9.3		1128	336	1128	184
hppa-linux-gcc-4.9.3		644	308	644	276
i386-linux-gcc-4.9.3		352	352	352	352
m32r-linux-gcc-4.9.3		720	656	720	268
microblaze-linux-gcc-4.9.3	1108	604	1108	256
mips64-linux-gcc-4.9.3		1328	592	1328	208
mips-linux-gcc-4.9.3		1096	624	1096	240
powerpc64-linux-gcc-4.9.3	1088	432	1088	160
powerpc-linux-gcc-4.9.3		1080	584	1080	224
s390-linux-gcc-4.9.3		456	456	624	360
sh3-linux-gcc-4.9.3		292	292	292	292
sparc64-linux-gcc-4.9.3		992	240	992	208
sparc-linux-gcc-4.9.3		680	592	680	312
x86_64-linux-gcc-4.9.3		224	240	272	224
xtensa-linux-gcc-4.9.3		1152	704	1152	304

aarch64-linux-gcc-7.0.0		224	224	1104	208
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
mips-linux-gcc-7.0.0		1120	648	1120	272
x86_64-linux-gcc-7.0.1		240	240	304	240

arm-linux-gnueabi-gcc-4.4.7	840	-	-	392
arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
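
The claim that -fno-schedule-insns is better or identical for wp512 can be checked mechanically against the table. This sketch copies a handful of rows verbatim and lists the compilers where a given column produces a larger frame than plain -O2. The function name worse_than_default is hypothetical.

```python
# A few wp512 rows copied from the table above (frame sizes in bytes).
WP512 = {
    "alpha-linux-gcc-4.9.3":       {"default": 1136, "press": 848, "nopress": 1136, "nosched": 176},
    "am33_2.0-linux-gcc-4.9.3":    {"default": 2100, "press": 2076, "nopress": 2100, "nosched": 2104},
    "arm-linux-gnueabi-gcc-4.9.3": {"default": 848, "press": 848, "nopress": 1048, "nosched": 352},
    "mips-linux-gcc-4.9.3":        {"default": 1096, "press": 624, "nopress": 1096, "nosched": 240},
    "x86_64-linux-gcc-4.9.3":      {"default": 224, "press": 240, "nopress": 272, "nosched": 224},
    "aarch64-linux-gcc-7.0.0":     {"default": 224, "press": 224, "nopress": 1104, "nosched": 208},
}


def worse_than_default(table, column):
    """Return the compilers whose frame under `column` exceeds the -O2 default."""
    return [cc for cc, sizes in table.items() if sizes[column] > sizes["default"]]


print(worse_than_default(WP512, "nosched"))  # ['am33_2.0-linux-gcc-4.9.3']
print(worse_than_default(WP512, "press"))    # ['x86_64-linux-gcc-4.9.3']
```

Running the same check over every row of the wp512 table again flags only am33 for the nosched column.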

Running the same test for serpent-generic gives a somewhat different
picture: while -fno-schedule-insns is generally better than the default
here, -fsched-pressure wins overall, so I picked that flag instead.

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1392	864	1392	960
am33_2.0-linux-gcc-4.9.3	536	524	536	528
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
cris-linux-gcc-4.9.3		528	528	528	528
frv-linux-gcc-4.9.3		536	400	536	504
hppa64-linux-gcc-4.9.3		524	208	524	480
hppa-linux-gcc-4.9.3		768	472	768	508
i386-linux-gcc-4.9.3		564	564	564	564
m32r-linux-gcc-4.9.3		712	576	712	532
microblaze-linux-gcc-4.9.3	724	392	724	512
mips64-linux-gcc-4.9.3		720	384	720	496
mips-linux-gcc-4.9.3		728	384	728	496
powerpc64-linux-gcc-4.9.3	704	304	704	480
powerpc-linux-gcc-4.9.3		704	296	704	480
s390-linux-gcc-4.9.3		560	560	592	536
sh3-linux-gcc-4.9.3		540	540	540	540
sparc64-linux-gcc-4.9.3		544	352	544	496
sparc-linux-gcc-4.9.3		544	344	544	496
x86_64-linux-gcc-4.9.3		528	536	576	528
xtensa-linux-gcc-4.9.3		752	544	752	544

aarch64-linux-gcc-7.0.0		432	432	656	480
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
mips-linux-gcc-7.0.0		720	464	720	488
x86_64-linux-gcc-7.0.1		536	528	600	536

arm-linux-gnueabi-gcc-4.4.7	592	-	-	440
arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
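
One way to make "-fsched-pressure wins overall" concrete is a head-to-head count plus column totals. This sketch uses only six serpent rows copied from the table, so its numbers describe that subset, not the whole table. The helper names head_to_head and total are hypothetical.

```python
# A few serpent rows copied from the table above (frame sizes in bytes).
SERPENT = {
    "alpha-linux-gcc-4.9.3":       {"default": 1392, "press": 864, "nopress": 1392, "nosched": 960},
    "arm-linux-gnueabi-gcc-4.9.3": {"default": 552, "press": 552, "nopress": 776, "nosched": 536},
    "mips-linux-gcc-4.9.3":        {"default": 728, "press": 384, "nopress": 728, "nosched": 496},
    "powerpc64-linux-gcc-4.9.3":   {"default": 704, "press": 304, "nopress": 704, "nosched": 480},
    "x86_64-linux-gcc-4.9.3":      {"default": 528, "press": 536, "nopress": 576, "nosched": 528},
    "aarch64-linux-gcc-7.0.0":     {"default": 432, "press": 432, "nopress": 656, "nosched": 480},
}


def head_to_head(table, a, b):
    """Count rows where column `a` gives the smaller frame, where `b` does, and ties."""
    wins_a = sum(1 for s in table.values() if s[a] < s[b])
    wins_b = sum(1 for s in table.values() if s[b] < s[a])
    return wins_a, wins_b, len(table) - wins_a - wins_b


def total(table, column):
    """Sum of the frame sizes in `column` over all rows."""
    return sum(s[column] for s in table.values())


print(head_to_head(SERPENT, "press", "nosched"))           # (4, 2, 0)
print(total(SERPENT, "press"), total(SERPENT, "nosched"))  # 3072 3480
```

In this subset nosched still wins on arm and x86_64, which matches the text: -fno-schedule-insns helps in places, but -fsched-pressure gives the smaller frames more often and by larger margins.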

I did not do any runtime tests with serpent, so it is possible that stack
frame size does not correlate directly with runtime performance there
and that the change actually makes things slower. It is more likely to
help, though, and the reduced stack frame size alone is probably enough
reason to apply the patch, especially since crypto code is often called
from deep call chains.

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11 17:52:26 +08:00
asymmetric_keys Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-12-15 11:41:37 -08:00
async_tx async_pq_val: fix DMA memory leak 2016-10-05 06:18:09 +05:30
.gitignore
842.c crypto: acomp - add support for 842 via scomp 2016-10-25 11:08:33 +08:00
ablk_helper.c crypto: ablk_helper - Fix cryptd reordering 2016-06-23 18:29:53 +08:00
ablkcipher.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
acompress.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
aead.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
aes_generic.c crypto: aes-generic - drop alignment requirement 2017-02-11 17:50:43 +08:00
aes_ti.c crypto: aes - add generic time invariant AES cipher 2017-02-11 17:50:43 +08:00
af_alg.c
ahash.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
akcipher.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
algapi.c crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg 2017-01-23 22:41:32 +08:00
algboss.c crypto: testmgr - Do not test internal algorithms 2016-11-28 21:23:20 +08:00
algif_aead.c crypto: algif_aead - Fix kernel panic on list_del 2017-02-03 17:45:48 +08:00
algif_hash.c crypto: algif_hash - avoid zero-sized array 2016-12-27 17:50:52 +08:00
algif_rng.c
algif_skcipher.c Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-12-14 13:31:29 -08:00
ansi_cprng.c
anubis.c
api.c crypto: api - Do not clear type bits in crypto_larval_lookup 2016-11-28 21:23:18 +08:00
arc4.c
authenc.c crypto: skcipher - Get rid of crypto_spawn_skcipher2() 2016-11-01 08:37:17 +08:00
authencesn.c crypto: skcipher - Get rid of crypto_spawn_skcipher2() 2016-11-01 08:37:17 +08:00
blkcipher.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
blowfish_common.c
blowfish_generic.c
camellia_generic.c
cast_common.c
cast5_generic.c
cast6_generic.c
cbc.c crypto: cbc - Export CBC implementation 2016-11-28 21:23:21 +08:00
ccm.c crypto: ccm - switch to separate cbcmac driver 2017-02-11 17:50:45 +08:00
chacha20_generic.c crypto: chacha20 - convert generic and x86 versions to skcipher 2016-12-27 17:47:31 +08:00
chacha20poly1305.c crypto: skcipher - Get rid of crypto_spawn_skcipher2() 2016-11-01 08:37:17 +08:00
cipher.c crypto: api - Remove no-op exit_ops code 2016-10-21 11:03:42 +08:00
cmac.c crypto: cmac - fix alignment of 'consts' 2016-10-21 11:03:42 +08:00
compress.c crypto: api - Remove no-op exit_ops code 2016-10-21 11:03:42 +08:00
crc32_generic.c
crc32c_generic.c
crct10dif_common.c
crct10dif_generic.c crypto: squash lines for simple wrapper functions 2016-09-13 20:27:26 +08:00
cryptd.c crypto: cryptd - Add support for skcipher 2016-11-28 21:23:18 +08:00
crypto_engine.c crypto: engine - Handle the kthread worker using the new API 2016-10-25 11:08:25 +08:00
crypto_null.c crypto: null - Remove default null blkcipher 2016-07-18 17:35:44 +08:00
crypto_user.c crypto: acomp - add asynchronous compression api 2016-10-25 11:08:30 +08:00
crypto_wq.c
ctr.c crypto: skcipher - Get rid of crypto_spawn_skcipher2() 2016-11-01 08:37:17 +08:00
cts.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
deflate.c crypto: acomp - add support for deflate via scomp 2016-10-25 11:08:36 +08:00
des_generic.c
dh_helper.c crypto: dh - Add DH software implementation 2016-06-23 18:29:56 +08:00
dh.c crypto: dh - Consistenly return negative error codes 2016-11-13 17:45:04 +08:00
drbg.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2016-11-30 19:53:12 +08:00
ecb.c
ecc_curve_defs.h crypto: ecdh - Add ECDH software support 2016-06-23 18:29:57 +08:00
ecc.c crypto: ecdh - make ecdh_shared_secret unique 2016-06-24 21:24:59 +08:00
ecc.h crypto: ecdh - make ecdh_shared_secret unique 2016-06-24 21:24:59 +08:00
ecdh_helper.c crypto: ecdh - Add ECDH software support 2016-06-23 18:29:57 +08:00
ecdh.c crypto: ecdh - make ecdh_shared_secret unique 2016-06-24 21:24:59 +08:00
echainiv.c crypto: echainiv - Replace chaining with multiplication 2016-09-13 18:44:57 +08:00
fcrypt.c
fips.c
gcm.c crypto: skcipher - Get rid of crypto_spawn_skcipher2() 2016-11-01 08:37:17 +08:00
gf128mul.c crypto: gf128mul - Zero memory when freeing multiplication table 2016-11-17 23:34:59 +08:00
ghash-generic.c crypto: ghash-generic - move common definitions to a new header file 2016-10-02 22:26:40 +08:00
hash_info.c
hmac.c
internal.h crypto: api - Remove no-op exit_ops code 2016-10-21 11:03:42 +08:00
jitterentropy-kcapi.c crypto: jitterentropy - drop duplicate header module.h 2016-11-17 23:34:52 +08:00
jitterentropy.c
Kconfig crypto: ccm - switch to separate cbcmac driver 2017-02-11 17:50:45 +08:00
keywrap.c
khazad.c
kpp.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
lrw.c crypto: lrw - Convert to skcipher 2016-11-28 21:23:17 +08:00
lz4.c crypto: acomp - add support for lz4 via scomp 2016-10-25 11:08:32 +08:00
lz4hc.c crypto: acomp - add support for lz4hc via scomp 2016-10-25 11:08:32 +08:00
lzo.c crypto: acomp - add support for lzo via scomp 2016-10-25 11:08:31 +08:00
Makefile crypto: improve gcc optimization flags for serpent and wp512 2017-02-11 17:52:26 +08:00
mcryptd.c crypto: mcryptd - Check mcryptd algorithm compatibility 2016-12-07 19:55:37 +08:00
md4.c
md5.c
memneq.c
michael_mic.c
pcbc.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
pcrypt.c
poly1305_generic.c crypto: poly1305 - Use unaligned access where required 2016-11-13 17:45:03 +08:00
proc.c
ripemd.h
rmd128.c
rmd160.c
rmd256.c
rmd320.c
rng.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
rsa_helper.c crypto: rsa - allow keys >= 2048 bits in FIPS mode 2016-08-24 21:07:10 +08:00
rsa-pkcs1pad.c crypto: rsa-pkcs1pad - Handle leading zero for decryption 2016-09-22 17:42:08 +08:00
rsa.c crypto: rsa - Generate fixed-length output 2016-07-01 23:45:18 +08:00
rsaprivkey.asn1 crypto: rsa - Store rest of the private key components 2016-07-05 23:05:26 +08:00
rsapubkey.asn1
salsa20_generic.c
scatterwalk.c crypto: scatterwalk - Remove unnecessary aliasing check in map_and_copy 2016-11-22 15:02:25 +08:00
scompress.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
seed.c
seqiv.c crypto: skcipher - Remove top-level givcipher interface 2016-07-18 17:35:46 +08:00
serpent_generic.c
sha1_generic.c
sha3_generic.c crypto: sha3 - Add missing ULL suffixes for 64-bit constants 2016-08-08 23:43:46 +08:00
sha256_generic.c
sha512_generic.c
shash.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
simd.c crypto: simd - Add simd skcipher helper 2016-11-28 21:23:18 +08:00
skcipher.c crypto: Replaced gcc specific attributes with macros from compiler.h 2017-01-13 00:24:39 +08:00
tcrypt.c crypto: tcrypt - Add debug prints 2017-01-23 22:50:24 +08:00
tcrypt.h
tea.c
testmgr.c crypto: testmgr - add test cases for cbcmac(aes) 2017-02-11 17:50:44 +08:00
testmgr.h crypto: testmgr - add test cases for cbcmac(aes) 2017-02-11 17:50:44 +08:00
tgr192.c
twofish_common.c
twofish_generic.c
vmac.c
wp512.c
xcbc.c
xor.c crypto: xor - Fix warning when XOR_SELECT_TEMPLATE is unset 2016-08-31 23:00:48 +08:00
xts.c crypto: xts - Convert to skcipher 2016-11-28 21:23:18 +08:00