Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Data race in tracer/payload.go #2631

Open
jared-mackey opened this issue Mar 22, 2024 · 4 comments
Open

[BUG] Data race in tracer/payload.go #2631

jared-mackey opened this issue Mar 22, 2024 · 4 comments
Assignees
Labels
bug unintended behavior that has to be fixed waiting-for-info waiting for answer from issue creator

Comments

@jared-mackey
Copy link

jared-mackey commented Mar 22, 2024

Version of dd-trace-go
v1.61.0

Describe what happened:
It appears that there is a data race in reporting traces to DD. Might be related to #330. However, we are seeing it in our staging environment where we have data race detection enabled and sending traces to real DD servers.

Apologies for the poor formatting. Got it from a CSV export from DD logs.

==================
WARNING: DATA RACE
Read at 0x00c002f35a58 by goroutine 644368:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).Read()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/payload.go:144 +0x72
  io.copyBuffer()
  /usr/local/go/src/io/io.go:429 +0x29a
  io.CopyBuffer()
  /usr/local/go/src/io/io.go:402 +0x8f
  net/http.(*transferWriter).doBodyCopy()
  /usr/local/go/src/net/http/transfer.go:416 +0x144
  net/http.(*transferWriter).writeBody()
  /usr/local/go/src/net/http/transfer.go:360 +0xee9
  net/http.(*Request).write()
  /usr/local/go/src/net/http/request.go:755 +0x1413
  net/http.(*persistConn).writeLoop()
  /usr/local/go/src/net/http/transport.go:2447 +0x379
  net/http.(*Transport).dialConn.gowrap3()
  /usr/local/go/src/net/http/transport.go:1800 +0x33

Previous write at 0x00c002f35a58 by goroutine 644622:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).updateHeader()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/payload.go:125 +0x39b
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).reset()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/payload.go:99 +0x2e
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.func1()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/writer.go:121 +0x204
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.gowrap1()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/writer.go:126 +0x41

Goroutine 644368 (running) created at:
  net/http.(*Transport).dialConn()
  /usr/local/go/src/net/http/transport.go:1800 +0x27fe
  net/http.(*Transport).dialConnFor()
  /usr/local/go/src/net/http/transport.go:1485 +0x124
  net/http.(*Transport).queueForDial.gowrap1()
  /usr/local/go/src/net/http/transport.go:1449 +0x44

Goroutine 644622 (running) created at:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/writer.go:91 +0x2e4
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).worker()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/tracer.go:367 +0x284
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func2()
  /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.61.0/ddtrace/tracer/tracer.go:318 +0x16d

Some other logs that are relevant

Mar 22 07:13:14.692 Datadog Tracer v1.61.0 ERROR: lost 2 traces: timeout
Mar 22 07:13:14.692 (Status: Request Timeout) (occurred: 22 Mar 24 13:12 UTC)
Mar 22 07:13:14.692 Datadog Tracer v1.61.0 ERROR: failure sending traces (attempt 1), will retry: timeout
Mar 22 07:13:14.692 (Status: Request Timeout) (occurred: 22 Mar 24 13:12 UTC)

Describe what you expected:
No data races :)

Additional environment details (Version of Go, Operating System, etc.):
Golang 1.22.1
Binaries compiled with -race
Running inside EKS on x86 instances
Agent (“DD servers”) running as a daemonset. Should be a local pod to this pod.

@jared-mackey jared-mackey added the bug unintended behavior that has to be fixed label Mar 22, 2024
@github-actions github-actions bot added the needs-triage New issues that have not yet been triaged label Mar 22, 2024
@darccio
Copy link
Member

darccio commented Mar 27, 2024

@jared-mackey Thanks for reaching out! We are going to review it.

@darccio darccio removed the needs-triage New issues that have not yet been triaged label Mar 27, 2024
@SvenGasterstaedt
Copy link
Contributor

We are also experiencing some data races regarding tracer/payload.go

this is similar to the one mentioned above (but read/write are switched)

WARNING: DATA RACE
Write at 0x00c00197fc98 by goroutine 95651:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).updateHeader()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:129 +0x1c8
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).reset()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:99 +0x33
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.func1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:121 +0x93e
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.gowrap1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:126 +0x46

Previous read at 0x00c00197fc98 by goroutine 93651:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).Read()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:144 +0x9e
  io.copyBuffer()
      /usr/local/go/src/io/io.go:429 +0x596
  io.CopyBuffer()
      /usr/local/go/src/io/io.go:402 +0x145
  net/http.(*transferWriter).doBodyCopy()
      /usr/local/go/src/net/http/transfer.go:416 +0x247
  net/http.(*transferWriter).writeBody()
      /usr/local/go/src/net/http/transfer.go:360 +0x530
  net/http.(*Request).write()
      /usr/local/go/src/net/http/request.go:755 +0x2445
  net/http.(*persistConn).writeLoop()
      /usr/local/go/src/net/http/transport.go:2447 +0x478
  net/http.(*Transport).dialConn.gowrap3()
      /usr/local/go/src/net/http/transport.go:1800 +0x38

Goroutine 95651 (running) created at:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:91 +0x32c
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).worker()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:379 +0x6ad
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func2()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:326 +0x2af

Goroutine 93651 (running) created at:
  net/http.(*Transport).dialConn()
      /usr/local/go/src/net/http/transport.go:1800 +0x45d7
  net/http.(*Transport).dialConnFor()
      /usr/local/go/src/net/http/transport.go:1485 +0x195
  net/http.(*Transport).queueForDial.gowrap1()
      /usr/local/go/src/net/http/transport.go:1449 +0x50

and this is one data race regarding the reset of the payload:

WARNING: DATA RACE
Write at 0x00c000a626e0 by goroutine 95651:
  bytes.(*Reader).Seek()
      /usr/local/go/src/bytes/reader.go:117 +0x68
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).reset()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:101 +0x9a
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.func1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:121 +0x93e
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush.gowrap1()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:126 +0x46

Previous write at 0x00c000a626e0 by goroutine 93651:
  bytes.(*Reader).Read()
      /usr/local/go/src/bytes/reader.go:43 +0x132
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*payload).Read()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/payload.go:153 +0x3c5
  io.copyBuffer()
      /usr/local/go/src/io/io.go:429 +0x596
  io.CopyBuffer()
      /usr/local/go/src/io/io.go:402 +0x145
  net/http.(*transferWriter).doBodyCopy()
      /usr/local/go/src/net/http/transfer.go:416 +0x247
  net/http.(*transferWriter).writeBody()
      /usr/local/go/src/net/http/transfer.go:360 +0x530
  net/http.(*Request).write()
      /usr/local/go/src/net/http/request.go:755 +0x2445
  net/http.(*persistConn).writeLoop()
      /usr/local/go/src/net/http/transport.go:2447 +0x478
  net/http.(*Transport).dialConn.gowrap3()
      /usr/local/go/src/net/http/transport.go:1800 +0x38

Goroutine 95651 (running) created at:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*agentTraceWriter).flush()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/writer.go:91 +0x32c
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).worker()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:379 +0x6ad
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newTracer.func2()
      /app/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/tracer.go:326 +0x2af

Goroutine 93651 (running) created at:
  net/http.(*Transport).dialConn()
      /usr/local/go/src/net/http/transport.go:1800 +0x45d7
  net/http.(*Transport).dialConnFor()
      /usr/local/go/src/net/http/transport.go:1485 +0x195
  net/http.(*Transport).queueForDial.gowrap1()
      /usr/local/go/src/net/http/transport.go:1449 +0x50

@darccio darccio self-assigned this Apr 30, 2024
@darccio
Copy link
Member

darccio commented Jul 19, 2024

@jared-mackey @SvenGasterstaedt Are you still observing this kind of issue in your services? We tried a few times since April to debug this but no luck yet.

Did you reproduce it in higher versions? Thanks!

@darccio darccio added the waiting-for-info waiting for answer from issue creator label Jul 31, 2024
@darccio
Copy link
Member

darccio commented Jul 31, 2024

@jared-mackey @SvenGasterstaedt If you are still experiencing this issue, we recommend to open a support ticket so we can gather more information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unintended behavior that has to be fixed waiting-for-info waiting for answer from issue creator
Projects
None yet
Development

No branches or pull requests

3 participants