kicad-developers team mailing list archive



Hi Mário,

On 29.04.19 23:49, Mário Luzeiro wrote:

> I was checking the commit. It is a commit by Cirilo based on some of my suggestions for includes, and he copied the file at that state.
> I checked my log where those changes come from and there was also nothing useful there.
> So it may have been some test or something else at that moment.
> I hope SIMD performs better, but it can also be profiled.

I've asked the compiler and performance geeks on Twitter a kind of
backhanded question on the performance impact:


In summary, "it's complicated."

From a performance point of view, scalarizing everything and then
autovectorizing after loop unrolling is way better than trusting the
programmer on vectorization. GLM's use of intrinsics forces operands
into xmm registers in a particular layout, which in turn requires gcc
to either use vector instructions matching that layout or shuffle the
operands.

vec3 is particularly unsuitable for xmm instructions: there is no
three-element dot product, and a quarter of the lanes goes unused all
the time but still impacts performance if it goes denormal or
encounters a domain error.
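
To make the "scalarize, then let the compiler vectorize" idea concrete,
here is a minimal sketch. Vec3Array, dot3 and dot_all are hypothetical
names made up for illustration; they are neither GLM nor KiCad code.
The components live in separate arrays, so there is no padding lane,
and the loop body is plain scalar arithmetic. Whether a given compiler
actually vectorizes the loop depends on flags and target, and none of
this has been measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container: one array per component,
// so no fourth lane is carried around.
struct Vec3Array
{
    std::vector<float> x, y, z;
};

// Scalar three-element dot product: three multiplies and two adds.
inline float dot3( float ax, float ay, float az, float bx, float by, float bz )
{
    return ax * bx + ay * by + az * bz;
}

// Computes one dot product per element pair; out is resized to match a.
// The loop contains only scalar operations, which leaves the choice of
// vector layout to the compiler.
inline void dot_all( const Vec3Array& a, const Vec3Array& b, std::vector<float>& out )
{
    const std::size_t n = a.x.size();

    assert( a.y.size() == n && a.z.size() == n );
    assert( b.x.size() == n && b.y.size() == n && b.z.size() == n );

    out.resize( n );

    for( std::size_t i = 0; i < n; ++i )
        out[i] = dot3( a.x[i], a.y[i], a.z[i], b.x[i], b.y[i], b.z[i] );
}
```

With gcc, -fopt-info-vec reports which loops were vectorized, which
might be a cheap first check before real profiling.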

Intel's OpenCL implementation for GPUs begins with a scalarize pass for
precisely that reason, and I expect others to do that as well. OpenCL
is special, though, in that the topmost loop is external to the compiled
code, which is a luxury we don't have.

So I guess we need to profile this to make a good decision, but we also
need to be able to offer something to people compiling from source on
Debian buster.

> I found also this on the mailing list that may be helpful for you:
> https://www.mail-archive.com/kicad-developers@xxxxxxxxxxxxxxxxxxx/msg32827.html

Yes, that is consistent with the current thread. C++11's constexpr is
slightly different from C++14's, and GLM not taking this into account is
a GLM bug. They've fixed it in later versions, and the fix has also been
backported to Debian buster, so buster has a version that works, which
means our current test is too strict.
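
As an illustration of how the two standards differ, here is a small
sketch. It shows one well-known difference (what a constexpr function
body may contain). I am not claiming this is the exact construct GLM
tripped over. sum_to and sum_to_loop are hypothetical names.

```cpp
// Valid in C++11 and later: the body is a single return statement,
// so recursion stands in for a loop.
constexpr int sum_to( int n )
{
    return n <= 0 ? 0 : n + sum_to( n - 1 );
}

#if __cplusplus >= 201402L
// Valid only from C++14 on: local variables and loops are allowed
// inside a constexpr function body.
constexpr int sum_to_loop( int n )
{
    int total = 0;

    for( int i = 1; i <= n; ++i )
        total += i;

    return total;
}

static_assert( sum_to_loop( 4 ) == 10, "evaluated at compile time" );
#endif

static_assert( sum_to( 4 ) == 10, "evaluated at compile time" );
```

Another difference is that a constexpr non-static member function is
implicitly const in C++11 but not in C++14, so a header written with
only one of the two standards in mind can fail to compile under the
other.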

Avenues I could see:

 - the patch as is

unclear performance impact, might be positive or negative or most likely

 - switching to C++14

likely no performance impact, also avoids the problem

 - detecting broken GLM at configure time, rejecting

the minimal change

 - detecting broken GLM at configure time, setting GLM_FORCE_PURE
globally there

might cause two different build configurations with different bugs, so
it will make debugging harder, but at least compiling from source works
for everyone

 - repeatedly explaining to people how to update their GLM if kicad
fails to configure

in Brexit terms, the "no-deal" option

