Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model bugs with latest GCC 14 and Clang 19 #1312

Open
illwieckz opened this issue Sep 17, 2024 · 30 comments
Open

Model bugs with latest GCC 14 and Clang 19 #1312

illwieckz opened this issue Sep 17, 2024 · 30 comments
Labels
A-Compiler Compiler bugs T-Bug

Comments

@illwieckz
Copy link
Member

illwieckz commented Sep 17, 2024

I remember @DolceTriade said one day he seen bugs while compiling with GCC 14.

I confirm I see bugs affecting models in a weird way when compiling with GCC 14 (no issue with GCC 13):

unvanquished_2024-09-13_181819_000

unvanquished_2024-09-13_181834_000

First stable version of Clang 19 was released today and then out of curiosity I test it, and I got another bug but affecting the models too, but in a different way (no issue with Clang 18):

unvanquished_2024-09-17_200312_000

unvanquished_2024-09-17_200930_000

Maybe we actually do something wrong if two compilers fail on it.

@illwieckz illwieckz added T-Bug A-Compiler Compiler bugs labels Sep 17, 2024
@illwieckz
Copy link
Member Author

illwieckz commented Sep 17, 2024

An easy way to test both GCC 14 and Clang 19 is to use Ubuntu Noble (it has GCC 14 in repositories) and to install Clang 19 from the LLVM upstream repository for Ubuntu: https://apt.llvm.org

@slipher
Copy link
Member

slipher commented Sep 17, 2024

As usual this doesn't happen with GCC 14.2 on Windows.

Does it matter what vm.cgame.type you use? Does it matter if you disable r_vboModels?

@illwieckz
Copy link
Member Author

I get the same results with both DLL and EXE cgame.

I get the same results with r_vboVertexSkinning 0 and r_vboModel 0 (actually r_vboVertexSkinning 0 has to be preferred over r_vboModel 0).

@illwieckz
Copy link
Member Author

Interesting, with NEXE cgame, I reproduce the Clang 19 bug, but not the GCC 14 bug.

@DolceTriade
Copy link
Contributor

Removing -ffast-math fixes issue on GCC14 for me.

@illwieckz
Copy link
Member Author

illwieckz commented Nov 19, 2024

Engine build with GCC13 (unaffected) with game built with current Saigo (based on Clang 19, affected):

unvanquished_2024-11-19_234514_000

unvanquished_2024-11-19_234532_000

So that looks to be a game code bug.

@illwieckz
Copy link
Member Author

The bug affects both the engine and the cgame, but it looks different if both engine and games are compiled with affected compilers than only one of them.

engine game result
GCC 13 no fast math Saigo/Clang 19 no fast math ✅️
GCC 13 fast math Saigo/Clang 19 no fast math ✅️
GCC 13 fast math Saigo/Clang 19 fast math ❌️
Clang 19 no fast math Saigo/Clang 19 no fast math ✅️
Clang 19 no fast math Saigo/Clang 19 fast math ❌️
Clang 19 fast math Saigo/Clang 19 no fast math ❌️
Clang 19 fast math Saigo/Clang 19 fast math ❌️

The weirdness seen with non-fast-math Clang 19 engine with fast-math Saigo is the same as fast-math GCC 13 engine with fast-math Saigo.

@illwieckz
Copy link
Member Author

illwieckz commented Nov 27, 2024

I have no idea what is causing this, I was hoping to catch some division by zero.

I have noticed that using a debug build workarounds the bug too, even when enabling fast math.

@illwieckz
Copy link
Member Author

I wonder if some underflow I catch on model loading is related or not:

@illwieckz
Copy link
Member Author

illwieckz commented Nov 28, 2024

According to https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html the -ffast-math option does:

Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

So with GCC 14 (known to reproduce the bug) I replaced -ffast-math with all of them and disabled some of them to see when I reproduced or not the bug. I noticed I can enable all of them but -funsafe-math-optimizations and -ffinite-math-only and get no bug.

But, I could also enable either one of -funsafe-math-optimizations or -ffinite-math-only and get no bug. Only the combination of both reproduces the bug.

@illwieckz
Copy link
Member Author

That GCC documentation page also says that -funsafe-math-optimizations does:

Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.

But when I replace -funsafe-math-optimizations with all of them, the bug is not reproduced.

@illwieckz
Copy link
Member Author

Interestingly, using -funsafe-math-optimizations but with -fsigned-zeros, -ftrapping-math, -fno-associative-math and -fnoreciprocal-math (so the reverse of what it enables), the bug disappears.

@illwieckz
Copy link
Member Author

It looks like what makes the difference is to use -fsigned-zeros or not, using -fno-signed-zeros reproduces the bug on GCC 14.

@illwieckz
Copy link
Member Author

illwieckz commented Nov 28, 2024

But using -ffast-math with -fsigned-zeros or -fno-signed-zeros always reproduce the bug on GCC 14.

It should be noted that using -fsigned-zeros forces the use of -fno-fassociative-math, but using this one without -fsigned-zeros doesn't reproduce the bug.

So the flag combination is not totally reversible, but the bug has strong chance to be related to using -fno-signed-zeros anyway (and maybe -fassociative-math).

@illwieckz
Copy link
Member Author

Hmm, good to know, using -ftrapping-math with -ffast-math makes the bug go away (-ftrapping-math also forces -fno-associative-math).

So, that may still be some division by zero or things like that, the problem is that we cannot catch it with feenableexcept() when when -ffast-math and then -fno-trapping-math is set because:

Compile code assuming that floating-point operations cannot generate user-visible traps. These traps include division by zero, overflow, underflow, inexact result and invalid operation.

@illwieckz
Copy link
Member Author

But when I use -ffast-math with -ftrapping-math, the bug disappears and I catch no division by zero. If that's due to some other errors (like underflow), I can't catch them because of the already known ones being caught at load time.

@illwieckz
Copy link
Member Author

Also using -ffast-math with -fsignaling-nans makes the but go away (but it is says -fno-trapping-math requires -fno-trapping-math to be used instead).

@illwieckz
Copy link
Member Author

It works without bug on CLANG 19 with -ffast-math -fhonor-nans, so that's likely a NaN occurring somewhere.

It's likely not a division by zero because feenableexcept(FE_DIVBYZERO); doesn't catch it, and -ffast-math -fhonor-infinities -ftrapping-math doesn't as well.

@illwieckz
Copy link
Member Author

R_TBNtoQtangents is full of NaN.

@illwieckz
Copy link
Member Author

illwieckz commented Nov 29, 2024

Disabling custom SSE code workarounds the bug.

@DolceTriade
Copy link
Contributor

Nice find!

We should also see if -ffast-math actually makes a noticeable difference in performance.

@DolceTriade
Copy link
Contributor

A quick gut check for plat23 as spec in A base (/setviewpos 2966 1514 292 127 17) with release build and LTO on shows:
no -ffast-math: 577 (stable)
with -ffast-math: 578-579 (unstable)

Based on this single datapoint, it might not even be worth the hassle.

@slipher
Copy link
Member

slipher commented Nov 30, 2024

TransAddRotationQuat doesn't seem to work correctly with Clang 19. I have a failing unit test:

static void ExpectTransformEqual(const transform_t &t1, const transform_t &t2)
{
    for (int i = 0; i < 8; i++)
    {
        float n1, n2;
        memcpy(&n1, reinterpret_cast<const char*>(&t1) + 4 * i, 4);
        memcpy(&n2, reinterpret_cast<const char*>(&t2) + 4 * i, 4);
        EXPECT_FLOAT_EQ(n1, n2) << "transform_t's unequal at offset " << 4 * i;
    }
}

TEST(QMathTest, TransAddRotationQuat)
{
    transform_t t1, t2;
    TransInit(&t1);
    TransInit(&t2);

    quat_t q;
    QuatFromAngles(q, 33, 59, 124);

    TransAddRotationQuat(q, &t1);
    TransInsRotationQuat(q, &t2);

    ExpectTransformEqual(t1, t2);
}

This unit test might not even be correct if there is more than 1 way to represent a quat or something, but it passes with another compiler.

@slipher
Copy link
Member

slipher commented Nov 30, 2024

I added __attribute__((noinline)) to TransAddRotationQuat so I could disassemble it. The last 3 instructions set the entire sseTransScale part of the transform to 0. All of our coordinates being multiplied by 0 could certainly explain models not showing up.

Dump of assembler code for function TransAddRotationQuat(float const*, transform_t*):
   0x0000000000039e40 <+0>:	movups (%rdi),%xmm1
   0x0000000000039e43 <+3>:	movaps %xmm1,%xmm0
   0x0000000000039e46 <+6>:	shufps $0xff,%xmm1,%xmm0
   0x0000000000039e4a <+10>:	movaps (%rsi),%xmm2
   0x0000000000039e4d <+13>:	mulps  %xmm2,%xmm0
   0x0000000000039e50 <+16>:	movaps %xmm1,%xmm3
   0x0000000000039e53 <+19>:	shufps $0x24,%xmm1,%xmm3
   0x0000000000039e57 <+23>:	movaps %xmm2,%xmm4
   0x0000000000039e5a <+26>:	shufps $0x3f,%xmm2,%xmm4
   0x0000000000039e5e <+30>:	mulps  %xmm3,%xmm4
   0x0000000000039e61 <+33>:	movaps 0xad6e8(%rip),%xmm3        # 0xe7550
   0x0000000000039e68 <+40>:	xorps  %xmm3,%xmm4
   0x0000000000039e6b <+43>:	movaps %xmm1,%xmm5
   0x0000000000039e6e <+46>:	shufps $0x49,%xmm1,%xmm5
   0x0000000000039e72 <+50>:	movaps %xmm2,%xmm6
   0x0000000000039e75 <+53>:	shufps $0x52,%xmm2,%xmm6
   0x0000000000039e79 <+57>:	mulps  %xmm5,%xmm6
   0x0000000000039e7c <+60>:	xorps  %xmm3,%xmm6
   0x0000000000039e7f <+63>:	addps  %xmm4,%xmm6
   0x0000000000039e82 <+66>:	shufps $0x89,%xmm2,%xmm2
   0x0000000000039e86 <+70>:	shufps $0x92,%xmm1,%xmm1
   0x0000000000039e8a <+74>:	mulps  %xmm2,%xmm1
   0x0000000000039e8d <+77>:	subps  %xmm1,%xmm0
   0x0000000000039e90 <+80>:	addps  %xmm6,%xmm0
   0x0000000000039e93 <+83>:	movaps %xmm0,(%rsi)
   0x0000000000039e96 <+86>:	xorps  %xmm0,%xmm0
   0x0000000000039e99 <+89>:	movaps %xmm0,0x10(%rsi)
   0x0000000000039e9d <+93>:	ret    

@slipher
Copy link
Member

slipher commented Nov 30, 2024

The problem with Clang is that it now can't correctly handle SSE bit mask intrinsics on floating point vectors when -ffinite-math is on: llvm/llvm-project#118152. The arguments to _mm_and_ps, _mm_or_ps, etc., may be bit masks, which Clang wrongly interprets as floats. When some component of the mask is all ones, it interprets it as NaN and invokes undefined behavior.

We can work around this by using shuffle intrinsics instead of bit masks. These are presumably faster anyway: if you put the code with masks into (a non-bugged configuration of) Clang, it transforms them into shuffles.

slipher added a commit to slipher/Daemon that referenced this issue Dec 2, 2024
Fixes the Clang part of
DaemonEngine#1312. The problem is that
LLVM wrongly considers __m128 to have floating-point semantics at all
times. When it sees an all-ones mask in some component it wrongly sees
it as a NaN and replaces it with an undefined value when in
-ffinite-math mode.

In the cases where the new first_XYZ_second_W is used, that is actually
what Clang itself outputs after optimizing the original intrinsics code
(with a non-bugged flag configuration). So hopefully it is faster
anyway.
@illwieckz
Copy link
Member Author

Disabling SSE doesn't fix the GCC 14 bug, so that's a different bug than the Clang one.

slipher added a commit that referenced this issue Dec 5, 2024
Fixes the Clang part of
#1312. The problem is that
LLVM wrongly considers __m128 to have floating-point semantics at all
times. When it sees an all-ones mask in some component it wrongly sees
it as a NaN and replaces it with an undefined value when in
-ffinite-math mode.

In the cases where the new first_XYZ_second_W is used, that is actually
what Clang itself outputs after optimizing the original intrinsics code
(with a non-bugged flag configuration). So hopefully it is faster
anyway.
@illwieckz
Copy link
Member Author

illwieckz commented Dec 19, 2024

I wonder what is that remaining GCC bug. I noticed the distortion is affected by the orientation of the model on vertical axis. With some angles there is no distortion, with some there is a few, with some others there is much.

@illwieckz
Copy link
Member Author

This can be seen while playing with a dretch and cg_thirdPerson enabled and moving around.

@illwieckz
Copy link
Member Author

I now have a branch with divby0, overflow and invalid float exceptions and all of them fixed, but I still get the model distortion while not getting any of those float exceptions…

I now suspect a compiler bug in some optimization.

@illwieckz
Copy link
Member Author

So, as a summary, the -ftrapping-math flag is making the bug go away.

flags result
-ffast-math -fno-associative-math ❌️ bug
-ffast-math -fno-associative-math -ftrapping-math ✅️ no bug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Compiler Compiler bugs T-Bug
Projects
None yet
Development

No branches or pull requests

3 participants