kicad-developers team mailing list archive
Re: GLM 0.9.9.3 and GLM_FORCE_PURE
> - detecting broken GLM at configure time, setting GLM_FORCE_PURE globally there
> might cause two different build configurations with different bugs, so
> it will make debugging harder, but at least compiling from source works
> for everyone
I vote for that one.
I feel there won't be any issues caused by GLM using intrinsics; in any case, they would be confined to 3D-related code.
Regarding performance:
SIMD performs best when the implementation is cache-friendly and the data is processed in batches, which is not the case in the 3D Viewer.
So any impact would be limited to internal GLM functions (e.g. matrix operations) or to some compile-time optimizations.
Given how the 3D Viewer uses GLM, I would say the possible intrinsics optimization is likely unnoticeable from the user's perspective.
From: Simon Richter <Simon.Richter@xxxxxxxxxx>
Sent: 30 April 2019 00:54
To: Mário Luzeiro; kicad-developers@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Kicad-developers] GLM 0.9.9.3 and GLM_FORCE_PURE
On 29.04.19 23:49, Mário Luzeiro wrote:
> I checked the commit; it is a commit by Cirilo, based on some of my suggestions for includes, and he copied the file as it was at that point.
> I checked my log where those changes came from, and there was nothing useful there either.
> So it may have been a test or something else at the time.
> I hope SIMD performs better, but it can also be profiled.
I've asked the compiler and performance geeks on Twitter a kind of
backhanded question on the performance impact:
In summary, "it's complicated."
From a performance point of view, scalarizing everything and then
autovectorizing after loop unrolling is way better than trusting the
programmer on vectorization, and GLM using intrinsics forces operands
into xmm registers in a particular layout, which then in turn requires
gcc to use vector instructions matching that layout or shuffling them
vec3 is particularly unsuitable for xmm instructions, because there is
no three-element dot product, a quarter of the lanes goes unused all the
time but still impacts performance if it goes denormal or encounters a
Intel's OpenCL implementation for GPUs begins with a scalarize pass for
precisely that reason, I expect others to do that as well — but OpenCL
is special in that the topmost loop is external to the compiled code,
which is a luxury we don't have.
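To make the vec3 point concrete, here is a minimal C++ sketch. It is not taken from GLM or KiCad: `Vec3`, `Lanes4`, `pad`, `dot_scalar`, `dot_padded` and `dot_batch` are hypothetical names used only for illustration, and `Lanes4` merely imitates a four-lane register layout in plain C++ without using any intrinsics.

```cpp
#include <cstddef>

// Hypothetical 3-component vector, standing in for a vec3 (not GLM's type).
struct Vec3 { float x, y, z; };

// Scalar form: three multiplies and two adds, no padding lane involved.
constexpr float dot_scalar(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical 4-lane layout, imitating how a vec3 occupies a 128-bit register.
struct Lanes4 { float v[4]; };

// The fourth lane carries no data but still has to be given a value.
constexpr Lanes4 pad(const Vec3& a) {
    return Lanes4{{a.x, a.y, a.z, 0.0f}};
}

// Four-lane form: the unused lane is multiplied and summed like the others.
constexpr float dot_padded(const Lanes4& a, const Lanes4& b) {
    return a.v[0] * b.v[0] + a.v[1] * b.v[1]
         + a.v[2] * b.v[2] + a.v[3] * b.v[3];
}

// Batch form over separate x/y/z arrays: a plain scalar loop like this one
// is a candidate for compiler autovectorization, with every lane doing
// useful work. Whether a given compiler vectorizes it is not guaranteed.
inline void dot_batch(const float* ax, const float* ay, const float* az,
                      const float* bx, const float* by, const float* bz,
                      float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i];
}
```

Both single-vector forms give the same result; the difference is only in how much of the register does useful work. Which form is actually faster in a given build is exactly what would need profiling.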
So I guess we need to profile this to make a good decision, but we also
need to be able to offer something to people compiling from source on
> I found also this on the mailing list that may be helpful for you:
Yes, that is consistent with the current thread. C++11's constexpr is
slightly different from C++14's, and GLM not taking this into account is
a GLM bug, which they've fixed in later versions, and the fix has also
been backported to Debian buster, so they have a 0.9.9.3 version that
works, which means our current test is too strict.
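As a rough illustration of how the two constexpr dialects differ, here is a small C++ sketch. It does not reproduce the GLM bug; which rule GLM tripped over is not stated in this thread. The function names `sum_to_cxx11` and `sum_to_cxx14` are made up for the example.

```cpp
// C++11: a constexpr function body is essentially limited to a single
// return statement, so repetition has to be written as recursion.
constexpr int sum_to_cxx11(int n) {
    return n <= 0 ? 0 : n + sum_to_cxx11(n - 1);
}

#if __cplusplus >= 201402L
// C++14 relaxed the rules: local variables and loops are allowed.
// The same body is rejected by a compiler in strict C++11 mode.
constexpr int sum_to_cxx14(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}
#endif
```

A header that uses the second style without checking the language level compiles under C++14 but fails under C++11, which is the general shape of problem a configure-time compile test can catch regardless of the reported version number.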
Avenues I could see:
- the patch as is
  unclear performance impact, might be positive or negative or most likely
- switching to C++14
  likely no performance impact, also avoids the problem
- detecting broken GLM at configure time, rejecting
  the minimal change
- detecting broken GLM at configure time, setting GLM_FORCE_PURE
  might cause two different build configurations with different bugs, so
  it will make debugging harder, but at least compiling from source works
  for everyone
- repeatedly explaining to people how to update their GLM if kicad
  fails to configure
  in Brexit terms, the "no-deal" option