Model bugs with latest GCC 14 and Clang 19 #1312

illwieckz · 2024-09-17T18:11:27Z

I remember @DolceTriade said one day he seen bugs while compiling with GCC 14.

I confirm I see bugs affecting models in a weird way when compiling with GCC 14 (no issue with GCC 13):

First stable version of Clang 19 was released today and then out of curiosity I test it, and I got another bug but affecting the models too, but in a different way (no issue with Clang 18):

Maybe we actually do something wrong if two compilers fail on it.

illwieckz · 2024-09-17T18:14:07Z

An easy way to test both GCC 14 and Clang 19 is to use Ubuntu Noble (it has GCC 14 in repositories) and to install Clang 19 from the LLVM upstream repository for Ubuntu: https://apt.llvm.org

slipher · 2024-09-17T18:24:12Z

As usual this doesn't happen with GCC 14.2 on Windows.

Does it matter what vm.cgame.type you use? Does it matter if you disable r_vboModels?

illwieckz · 2024-09-17T19:14:25Z

I get the same results with both DLL and EXE cgame.

I get the same results with r_vboVertexSkinning 0 and r_vboModel 0 (actually r_vboVertexSkinning 0 has to be preferred over r_vboModel 0).

illwieckz · 2024-09-17T19:21:18Z

Interesting, with NEXE cgame, I reproduce the Clang 19 bug, but not the GCC 14 bug.

DolceTriade · 2024-10-15T14:28:15Z

Removing -ffast-math fixes issue on GCC14 for me.

illwieckz · 2024-11-19T22:46:51Z

Engine build with GCC13 (unaffected) with game built with current Saigo (based on Clang 19, affected):

So that looks to be a game code bug.

illwieckz · 2024-11-19T23:11:36Z

The bug affects both the engine and the cgame, but it looks different if both engine and games are compiled with affected compilers than only one of them.

engine	game	result
GCC 13 no fast math	Saigo/Clang 19 no fast math	✅️
GCC 13 fast math	Saigo/Clang 19 no fast math	✅️
GCC 13 fast math	Saigo/Clang 19 fast math	❌️
Clang 19 no fast math	Saigo/Clang 19 no fast math	✅️
Clang 19 no fast math	Saigo/Clang 19 fast math	❌️
Clang 19 fast math	Saigo/Clang 19 no fast math	❌️
Clang 19 fast math	Saigo/Clang 19 fast math	❌️

The weirdness seen with non-fast-math Clang 19 engine with fast-math Saigo is the same as fast-math GCC 13 engine with fast-math Saigo.

illwieckz · 2024-11-27T22:55:44Z

I have no idea what is causing this, I was hoping to catch some division by zero.

I have noticed that using a debug build workarounds the bug too, even when enabling fast math.

illwieckz · 2024-11-27T22:59:36Z

I wonder if some underflow I catch on model loading is related or not:

Fix some divisions by zero Unvanquished/Unvanquished#3204 (comment)

illwieckz · 2024-11-28T01:09:30Z

According to https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html the -ffast-math option does:

Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

So with GCC 14 (known to reproduce the bug) I replaced -ffast-math with all of them and disabled some of them to see when I reproduced or not the bug. I noticed I can enable all of them but -funsafe-math-optimizations and -ffinite-math-only and get no bug.

But, I could also enable either one of -funsafe-math-optimizations or -ffinite-math-only and get no bug. Only the combination of both reproduces the bug.

illwieckz · 2024-11-28T01:15:11Z

That GCC documentation page also says that -funsafe-math-optimizations does:

Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.

But when I replace -funsafe-math-optimizations with all of them, the bug is not reproduced.

illwieckz · 2024-11-28T01:20:52Z

Interestingly, using -funsafe-math-optimizations but with -fsigned-zeros, -ftrapping-math, -fno-associative-math and -fnoreciprocal-math (so the reverse of what it enables), the bug disappears.

illwieckz · 2024-11-28T01:33:59Z

It looks like what makes the difference is to use -fsigned-zeros or not, using -fno-signed-zeros reproduces the bug on GCC 14.

illwieckz · 2024-11-28T01:40:16Z

But using -ffast-math with -fsigned-zeros or -fno-signed-zeros always reproduce the bug on GCC 14.

It should be noted that using -fsigned-zeros forces the use of -fno-fassociative-math, but using this one without -fsigned-zeros doesn't reproduce the bug.

So the flag combination is not totally reversible, but the bug has strong chance to be related to using -fno-signed-zeros anyway (and maybe -fassociative-math).

illwieckz · 2024-11-28T01:58:42Z

Hmm, good to know, using -ftrapping-math with -ffast-math makes the bug go away (-ftrapping-math also forces -fno-associative-math).

So, that may still be some division by zero or things like that, the problem is that we cannot catch it with feenableexcept() when when -ffast-math and then -fno-trapping-math is set because:

Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation.

illwieckz · 2024-11-28T02:02:06Z

But when I use -ffast-math with -ftrapping-math, the bug disappears and I catch no division by zero. If that's due to some other errors (like underflow), I can't catch them because of the already known ones being caught at load time.

illwieckz · 2024-11-28T02:06:15Z

Also using -ffast-math with -fsignaling-nans makes the but go away (but it is says -fno-trapping-math requires -fno-trapping-math to be used instead).

illwieckz · 2024-11-29T07:45:19Z

It works without bug on CLANG 19 with -ffast-math -fhonor-nans, so that's likely a NaN occurring somewhere.

It's likely not a division by zero because feenableexcept(FE_DIVBYZERO); doesn't catch it, and -ffast-math -fhonor-infinities -ftrapping-math doesn't as well.

illwieckz · 2024-11-29T08:35:54Z

R_TBNtoQtangents is full of NaN.

illwieckz · 2024-11-29T08:55:16Z

Disabling custom SSE code workarounds the bug.

DolceTriade · 2024-11-29T23:16:25Z

Nice find!

We should also see if -ffast-math actually makes a noticeable difference in performance.

DolceTriade · 2024-11-29T23:23:34Z

A quick gut check for plat23 as spec in A base (/setviewpos 2966 1514 292 127 17) with release build and LTO on shows:
no -ffast-math: 577 (stable)
with -ffast-math: 578-579 (unstable)

Based on this single datapoint, it might not even be worth the hassle.

slipher · 2024-11-30T02:37:42Z

TransAddRotationQuat doesn't seem to work correctly with Clang 19. I have a failing unit test:

static void ExpectTransformEqual(const transform_t &t1, const transform_t &t2)
{
    for (int i = 0; i < 8; i++)
    {
        float n1, n2;
        memcpy(&n1, reinterpret_cast<const char*>(&t1) + 4 * i, 4);
        memcpy(&n2, reinterpret_cast<const char*>(&t2) + 4 * i, 4);
        EXPECT_FLOAT_EQ(n1, n2) << "transform_t's unequal at offset " << 4 * i;
    }
}

TEST(QMathTest, TransAddRotationQuat)
{
    transform_t t1, t2;
    TransInit(&t1);
    TransInit(&t2);

    quat_t q;
    QuatFromAngles(q, 33, 59, 124);

    TransAddRotationQuat(q, &t1);
    TransInsRotationQuat(q, &t2);

    ExpectTransformEqual(t1, t2);
}

This unit test might not even be correct if there is more than 1 way to represent a quat or something, but it passes with another compiler.

slipher · 2024-11-30T03:09:19Z

I added __attribute__((noinline)) to TransAddRotationQuat so I could disassemble it. The last 3 instructions set the entire sseTransScale part of the transform to 0. All of our coordinates being multiplied by 0 could certainly explain models not showing up.

Dump of assembler code for function TransAddRotationQuat(float const*, transform_t*):
   0x0000000000039e40 <+0>:	movups (%rdi),%xmm1
   0x0000000000039e43 <+3>:	movaps %xmm1,%xmm0
   0x0000000000039e46 <+6>:	shufps $0xff,%xmm1,%xmm0
   0x0000000000039e4a <+10>:	movaps (%rsi),%xmm2
   0x0000000000039e4d <+13>:	mulps  %xmm2,%xmm0
   0x0000000000039e50 <+16>:	movaps %xmm1,%xmm3
   0x0000000000039e53 <+19>:	shufps $0x24,%xmm1,%xmm3
   0x0000000000039e57 <+23>:	movaps %xmm2,%xmm4
   0x0000000000039e5a <+26>:	shufps $0x3f,%xmm2,%xmm4
   0x0000000000039e5e <+30>:	mulps  %xmm3,%xmm4
   0x0000000000039e61 <+33>:	movaps 0xad6e8(%rip),%xmm3        # 0xe7550
   0x0000000000039e68 <+40>:	xorps  %xmm3,%xmm4
   0x0000000000039e6b <+43>:	movaps %xmm1,%xmm5
   0x0000000000039e6e <+46>:	shufps $0x49,%xmm1,%xmm5
   0x0000000000039e72 <+50>:	movaps %xmm2,%xmm6
   0x0000000000039e75 <+53>:	shufps $0x52,%xmm2,%xmm6
   0x0000000000039e79 <+57>:	mulps  %xmm5,%xmm6
   0x0000000000039e7c <+60>:	xorps  %xmm3,%xmm6
   0x0000000000039e7f <+63>:	addps  %xmm4,%xmm6
   0x0000000000039e82 <+66>:	shufps $0x89,%xmm2,%xmm2
   0x0000000000039e86 <+70>:	shufps $0x92,%xmm1,%xmm1
   0x0000000000039e8a <+74>:	mulps  %xmm2,%xmm1
   0x0000000000039e8d <+77>:	subps  %xmm1,%xmm0
   0x0000000000039e90 <+80>:	addps  %xmm6,%xmm0
   0x0000000000039e93 <+83>:	movaps %xmm0,(%rsi)
   0x0000000000039e96 <+86>:	xorps  %xmm0,%xmm0
   0x0000000000039e99 <+89>:	movaps %xmm0,0x10(%rsi)
   0x0000000000039e9d <+93>:	ret

slipher · 2024-11-30T03:55:04Z

The problem with Clang is that it now can't correctly handle SSE bit mask intrinsics on floating point vectors when -ffinite-math is on: llvm/llvm-project#118152. The arguments to _mm_and_ps, _mm_or_ps, etc., may be bit masks, which Clang wrongly interprets as floats. When some component of the mask is all ones, it interprets it as NaN and invokes undefined behavior.

We can work around this by using shuffle intrinsics instead of bit masks. These are presumably faster anyway: if you put the code with masks into (a non-bugged configuration of) Clang, it transforms them into shuffles.

Fixes the Clang part of DaemonEngine#1312. The problem is that LLVM wrongly considers __m128 to have floating-point semantics at all times. When it sees an all-ones mask in some component it wrongly sees it as a NaN and replaces it with an undefined value when in -ffinite-math mode. In the cases where the new first_XYZ_second_W is used, that is actually what Clang itself outputs after optimizing the original intrinsics code (with a non-bugged flag configuration). So hopefully it is faster anyway.

illwieckz · 2024-12-02T13:21:06Z

Disabling SSE doesn't fix the GCC 14 bug, so that's a different bug than the Clang one.

Fixes the Clang part of #1312. The problem is that LLVM wrongly considers __m128 to have floating-point semantics at all times. When it sees an all-ones mask in some component it wrongly sees it as a NaN and replaces it with an undefined value when in -ffinite-math mode. In the cases where the new first_XYZ_second_W is used, that is actually what Clang itself outputs after optimizing the original intrinsics code (with a non-bugged flag configuration). So hopefully it is faster anyway.

illwieckz · 2024-12-19T22:44:48Z

I wonder what is that remaining GCC bug. I noticed the distortion is affected by the orientation of the model on vertical axis. With some angles there is no distortion, with some there is a few, with some others there is much.

illwieckz · 2024-12-19T22:45:45Z

This can be seen while playing with a dretch and cg_thirdPerson enabled and moving around.

illwieckz · 2024-12-20T01:17:52Z

I now have a branch with divby0, overflow and invalid float exceptions and all of them fixed, but I still get the model distortion while not getting any of those float exceptions…

I now suspect a compiler bug in some optimization.

illwieckz · 2024-12-20T03:01:43Z

So, as a summary, the -ftrapping-math flag is making the bug go away.

flags	result
`-ffast-math -fno-associative-math`	❌️ bug
`-ffast-math -fno-associative-math -ftrapping-math`	✅️ no bug

illwieckz added T-Bug A-Compiler Compiler bugs labels Sep 17, 2024

This was referenced Nov 19, 2024

Migrating NaCl toolchain From PNaCl to Saigo Unvanquished/Unvanquished#3197

Open

Early support of the Saigo toolchain #981

Merged

slipher mentioned this issue Nov 30, 2024

[clang] -ffast-math in 19.1.0 prevents function from returning intended __m128 bitmask llvm/llvm-project#118152

Open

illwieckz mentioned this issue Dec 2, 2024

Fix Clang model bug, improve SSE intrinsics code and add tests for q_math transform functions #1452

Merged

illwieckz mentioned this issue Dec 3, 2024

Report what compiler is used to build the game #982

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model bugs with latest GCC 14 and Clang 19 #1312

Model bugs with latest GCC 14 and Clang 19 #1312

illwieckz commented Sep 17, 2024 •

edited

Loading

illwieckz commented Sep 17, 2024 •

edited

Loading

slipher commented Sep 17, 2024

illwieckz commented Sep 17, 2024

illwieckz commented Sep 17, 2024

DolceTriade commented Oct 15, 2024

illwieckz commented Nov 19, 2024 •

edited

Loading

illwieckz commented Nov 19, 2024

illwieckz commented Nov 27, 2024 •

edited

Loading

illwieckz commented Nov 27, 2024

illwieckz commented Nov 28, 2024 •

edited

Loading

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024 •

edited

Loading

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 29, 2024

illwieckz commented Nov 29, 2024

illwieckz commented Nov 29, 2024 •

edited

Loading

DolceTriade commented Nov 29, 2024

DolceTriade commented Nov 29, 2024

slipher commented Nov 30, 2024

slipher commented Nov 30, 2024

slipher commented Nov 30, 2024

illwieckz commented Dec 2, 2024

illwieckz commented Dec 19, 2024 •

edited

Loading

illwieckz commented Dec 19, 2024

illwieckz commented Dec 20, 2024

illwieckz commented Dec 20, 2024

Model bugs with latest GCC 14 and Clang 19 #1312

Model bugs with latest GCC 14 and Clang 19 #1312

Comments

illwieckz commented Sep 17, 2024 • edited Loading

illwieckz commented Sep 17, 2024 • edited Loading

slipher commented Sep 17, 2024

illwieckz commented Sep 17, 2024

illwieckz commented Sep 17, 2024

DolceTriade commented Oct 15, 2024

illwieckz commented Nov 19, 2024 • edited Loading

illwieckz commented Nov 19, 2024

illwieckz commented Nov 27, 2024 • edited Loading

illwieckz commented Nov 27, 2024

illwieckz commented Nov 28, 2024 • edited Loading

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024 • edited Loading

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 28, 2024

illwieckz commented Nov 29, 2024

illwieckz commented Nov 29, 2024

illwieckz commented Nov 29, 2024 • edited Loading

DolceTriade commented Nov 29, 2024

DolceTriade commented Nov 29, 2024

slipher commented Nov 30, 2024

slipher commented Nov 30, 2024

slipher commented Nov 30, 2024

illwieckz commented Dec 2, 2024

illwieckz commented Dec 19, 2024 • edited Loading

illwieckz commented Dec 19, 2024

illwieckz commented Dec 20, 2024

illwieckz commented Dec 20, 2024

illwieckz commented Sep 17, 2024 •

edited

Loading

illwieckz commented Sep 17, 2024 •

edited

Loading

illwieckz commented Nov 19, 2024 •

edited

Loading

illwieckz commented Nov 27, 2024 •

edited

Loading

illwieckz commented Nov 28, 2024 •

edited

Loading

illwieckz commented Nov 28, 2024 •

edited

Loading

illwieckz commented Nov 29, 2024 •

edited

Loading

illwieckz commented Dec 19, 2024 •

edited

Loading