← Back to team overview

kicad-developers team mailing list archive



Hi Wayne,

On 29.04.19 20:15, Wayne Stambaugh wrote:

> Rather than remove it completely, how about build option to disable the
> SIMD on platforms that don't support it.  Even better, there is probably
> a way to query the compiler to check if it supports SIMD and
> enable/disable accordingly.

GLM already has a list which platforms, compilers and versions support
which vector intrinsics, but for several compilers, the intrinsics
themselves are not constexpr, so e.g.

    constexpr vec4 max(vec4 const &lhs, vec4 const &rhs)
        return vec4(__builtin_ia32_maxps(lhs, rhs));

fails to compile, while

    constexpr vec4 max(vec4 const &lhs, vec4 const &rhs)
        return vec4(max(lhs[0], rhs[0]),
                    max(lhs[1], rhs[1]),
                    max(lhs[2], rhs[2]),
                    max(lhs[3], rhs[3]));

would work as long as the "max" function itself was constexpr. The
compiler should also generate a "maxps" instruction for this, but that
depends on whether the vectorizer recognizes the pattern.

GLM knows this and drops the "constexpr" if it uses SIMD intrinsics. The
GLM_FORCE_PURE define forbids intrinsics, which reenables "constexpr",
which then causes a problem in C++11 mode, as constexpr member functions
are implicitly const, so

    constexpr T &operator[](unsigned int);
    constexpr T const &operator[](unsigned int) const;

only differ in return type, so this is an invalid overload. In C++14
mode, this becomes valid again.

So this isn't really a problem of whether or not we use vector
instructions, as we pretty much always do on amd64 anyway (the
architecture baseline has SSE2), but a tradeoff between "using more
vector instructions" (gains dozens of CPU cycles) and "resolving more
stuff at compile time" (also gains dozens of CPU cycles).

I'd expect the performance impact to be so minimal that it doesn't even
matter which way it goes, the main question is how we can make it build
on the most platforms.

Deleting the #define is in the Works For Me™ category (i.e. succeeds on
Linux x86, Linux amd64, msys2 x86, msys2 amd64, MSVC x86, MSVC amd64)
and fixes builds with unpatched GLM

Removing the check allows building with patched GLM (in Debian
buster, i.e. the next stable release), so that is a strong incentive. We
could also turn this into a compile test to reject only broken
installations if we wanted to keep the #define for some reason, but I
haven't found a good one.

Our main problem on arm is OpenGL anyway, most devices support the EGL
subset only, so whether we use NEON there is probably not that relevant
either. I haven't found time to do anything I wanted to in the last
weeks so I'm unsure if I can set up a sensible ARM autobuild to actually
verify what happens there (and CMake is rather bad at cross-compiling).

The #define was introduced in commit 80f8e74797b


and contains no explanation.


Attachment: signature.asc
Description: OpenPGP digital signature

Follow ups