Skip to content

Commit

Permalink
middle-end: rework vectorizable_store to iterate over single index [P…
Browse files Browse the repository at this point in the history
…R117557]

The testcase

#include <stdint.h>
#include <string.h>

#define N 8
#define L 8

void f(const uint8_t * restrict seq1,
       const uint8_t *idx, uint8_t *seq_out) {
  for (int i = 0; i < L; ++i) {
    uint8_t h = idx[i];
    memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
  }
}

compiled at -O3 -mcpu=neoverse-n1+sve

miscompiles to:

    ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
    ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
    st1w    z29.s, p7, [x24, z12.s, sxtw]
    st1w    z31.s, p7, [x24, z12.s, sxtw]

rather than

    ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
    ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
    st1w    z29.s, p7, [x24, z12.s, sxtw]
    addvl   x3, x24, #2
    st1w    z31.s, p3, [x3, z12.s, sxtw]

Where two things go wrong, the wrong mask is used and the address pointers to
the stores are wrong.

This issue is happening because the codegen loop in vectorizable_store is a
nested loop where in the outer loop we iterate over ncopies and in the inner
loop we loop over vec_num.

For SLP ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is
determined by only the outerloop index and the pointer address is only updated
in the outer loop.

As such for SLP we always use the same predicate and the same memory location.
This patch flattens the two loops and instead iterates over ncopies * vec_num
and simplified the indexing.

This does not fully fix the gcc_r miscompile error in SPECCPU 2017 as the error
moves somewhere else.  I will look at that next but fixes some other libraries
that also started failing.

gcc/ChangeLog:

	PR tree-optimization/117557
	* tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
	vec_num loops.

gcc/testsuite/ChangeLog:

	PR tree-optimization/117557
	* gcc.target/aarch64/pr117557.c: New test.
  • Loading branch information
TamarChristinaArm committed Nov 28, 2024
1 parent aa9f12e commit 1b3bff7
Show file tree
Hide file tree
Showing 2 changed files with 281 additions and 252 deletions.
29 changes: 29 additions & 0 deletions gcc/testsuite/gcc.target/aarch64/pr117557.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
/* { dg-do compile } */
/* { dg-options "-O3 -mcpu=neoverse-n1+sve -fdump-tree-vect" } */
/* { dg-final { check-function-bodies "**" "" } } */

#include <stdint.h>
#include <string.h>

#define N 8
#define L 8

/*
**f:
** ...
** ld1w z[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
** ld1w z[0-9]+.s, p([0-9]+)/z, \[x[0-9]+, z[0-9]+.s, sxtw\]
** st1w z[0-9]+.s, p\1, \[x[0-9]+, z[0-9]+.s, sxtw\]
** incb x([0-9]+), all, mul #2
** st1w z[0-9]+.s, p\2, \[x\3, z[0-9]+.s, sxtw\]
** ret
** ...
*/
void f(const uint8_t * restrict seq1,
const uint8_t *idx, uint8_t *seq_out) {
for (int i = 0; i < L; ++i) {
uint8_t h = idx[i];
memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
}
}

Loading

0 comments on commit 1b3bff7

Please sign in to comment.