vectorization

  - Most versions of the Intel compiler do not vectorize at -O3 when no -x option is specified; the exception is 14.0, which apparently does.
  - Running Intel-compiled code on AMD CPUs is a little complicated, but there is a workaround. The AMD 6136 reports SSE3 and SSE4a support, where SSE4a is an AMD extension that the Intel compiler does not recognize. The problem is that Intel-compiled executables do not use SSE3 on AMD: at run time the executable checks the CPUID vendor string and takes different code paths on different processors. The reason for this behavior is disputed [[http://www.agner.org/optimize/blog/read.php?i=49#49]]. For executables built with Intel compiler versions up to 13, the check can be defeated with a binary patch [[https://github.com/jimenezrick/patch-AuthenticAMD]] that changes the string the CPUID result is compared against from GenuineIntel to AuthenticAMD. The patch does not work on binaries from version 14 onward, and where it does work it cannot enable instruction sets the CPU lacks, so on the 6136 it enables SSE3 only. Fortunately, judging by execution time, the 14.0 compiler with -O3 and no -x option appears to use SSE3 without the GenuineIntel check. Also judging by execution time, the 16.0 compiler with -O3 and no -x behaves like versions up to 13, i.e. it does not vectorize.
  - **Recommended** best single executable that runs on AHPCC **AMD and Intel systems**: <code>Intel 14 compiler with -O3 -axsse4.2 (don't set -x)</code>
  - **Recommended** best single executable that runs on AHPCC **Intel systems only**: <code>Intel 14+ compiler with -O3 -xsse4.2 -axavx</code>
  
vectorization.txt · Last modified: 2017/02/07 21:16 by root