[Bug]: PHP message: [ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" } #2991

Open
piotrekkr opened this issue Dec 11, 2024 · 21 comments
Labels
🐛 bug Something isn't working

Comments

@piotrekkr

piotrekkr commented Dec 11, 2024

Bug report

It seems this bug has reappeared in 1.5.1 on Cloud Run in GCP. After today's deploy we saw a lot of these errors. Reverting to 1.1.0 helped.

Errors:

PHP message: [ddtrace] [error] Failed sending remote config data: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

PHP message: [ddtrace] [error] Failed signaling lifecycle end: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

Docker image we use

FROM php:8.2.26-fpm-bookworm@sha256:f9bee1c4e181e5d8f3ba06b00763210982c927dcd4df86435b24b9bcc10aac1c 

# ......

COPY --from=datadog/serverless-init:1.5.1 /datadog-init /app/datadog-init
ADD https://github.com/DataDog/dd-trace-php/releases/download/1.5.1/datadog-setup.php /datadog-setup.php
RUN php /datadog-setup.php --php-bin=all --enable-profiling

ARG APP_VERSION
ENV APP_VERSION="${APP_VERSION}"

ENV DD_VERSION="${APP_VERSION}"
ENV DD_LOGS_ENABLED=true
ENV DD_TRACE_ENABLED=false
ENV DD_TRACE_PROPAGATION_STYLE=datadog
ENV DD_APM_ENABLED=false
ENV DD_SERVICE=my-service
ENV DD_SITE=datadoghq.eu
ENV DD_DOGSTATSD_PORT=8125

ENTRYPOINT ["/app/datadog-init"]

CMD ["/usr/local/sbin/php-fpm"]

When running the container we set:

DD_ENV=production
DD_APM_ENABLED=true
DD_TRACE_ENABLED=true

PHP version

8.2.26

Tracer or profiler version

1.5.1

Installed extensions

bcmath
bz2
Core
ctype
curl
datadog-profiling
date
ddappsec
ddtrace
dom
fileinfo
filter
gd
gmp
hash
iconv
imagick
intl
json
libxml
mbstring
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
Phar
posix
random
readline
Reflection
session
SimpleXML
sodium
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache
datadog-profiling
ddappsec
ddtrace

Output of phpinfo()



Datadog PHP tracer extension
For help, check out the documentation at https://docs.datadoghq.com/tracing/languages/php/
(c) Datadog 2020

Datadog tracing support => enabled
Version => 1.5.1
DATADOG TRACER CONFIGURATION => {
    "date": "2024-12-11T17:56:29Z",
    "os_name": "Linux d68a4bc6aca3 6.8.0-47-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct  2 16:16:55 UTC 2 x86_64",
    "os_version": "6.8.0-47-generic",
    "version": "1.5.1",
    "lang": "php",
    "lang_version": "8.2.25",
    "env": null,
    "enabled": true,
    "service": "api-core",
    "enabled_cli": true,
    "agent_url": "http:\/\/localhost:8126",
    "debug": false,
    "analytics_enabled": false,
    "sample_rate": -1,
    "sampling_rules": [],
    "tags": [],
    "service_mapping": [],
    "distributed_tracing_enabled": true,
    "dd_version": "1.1862.0",
    "architecture": "x86_64",
    "instrumentation_telemetry_enabled": false,
    "sapi": "cli",
    "datadog.trace.sources_path": "\/opt\/datadog\/dd-library\/1.5.1\/dd-trace-sources\/src",
    "open_basedir_configured": false,
    "uri_fragment_regex": null,
    "uri_mapping_incoming": null,
    "uri_mapping_outgoing": null,
    "auto_flush_enabled": true,
    "generate_root_span": true,
    "http_client_split_by_domain": false,
    "measure_compile_time": true,
    "report_hostname_on_root_span": false,
    "traced_internal_functions": null,
    "enabled_from_env": false,
    "opcache.file_cache": null,
    "sidecar_trace_sender": false,
    "agent_error": "Failed to connect to localhost port 8126 after 0 ms: Couldn't connect to server"
}

Upgrading from

Upgrading from 1.1.0

@piotrekkr piotrekkr added the 🐛 bug Something isn't working label Dec 11, 2024
@piotrekkr
Author

Could it be because I set DD_DOGSTATSD_PORT=8125, which is not used by ddtrace, and it then fails with "Failed to connect to localhost port 8126 after 0 ms: Couldn't connect to server"?

@piotrekkr
Author

Just happened on some other services we deploy. Same serverless-init and ddtrace version. I was forced to revert to 1.1.0.

@doctenahasib

I have the same issue on 1.5.1. I reverted back to 1.1.0 as suggested by @piotrekkr and the error is gone.

@gael-donat

Is there any news on this? Maybe @bwoebi has an idea?

@rdab

rdab commented Jan 6, 2025

same issue with php8.0 and tracer 1.5.1. Any suggestions?

@piotrekkr
Author

same issue with php8.0 and tracer 1.5.1. Any suggestions?

@rdab Downgrade the tracer. 1.1.0 works for me, but maybe some version between 1.1.0 and 1.5.1 also works. You can test if you have time to waste :)

@nina9753

nina9753 commented Jan 9, 2025

Judging by the error message "Failed sending remote config data", the issue seems to be related to remote config. Remote config needs to be disabled with DD_REMOTE_CONFIG_ENABLED=False. Can you please try setting this environment variable either inside your Dockerfile or in the GCP UI?
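
For reference, a minimal sketch of the Dockerfile variant of this suggestion. It assumes the base image and install steps from the original report (the image digest is dropped here for brevity), and it uses the lowercase value `false` as in the later comments in this thread. Only the last ENV line is new.

```dockerfile
# Sketch only: base image and tracer install copied from the original report.
FROM php:8.2.26-fpm-bookworm

ADD https://github.com/DataDog/dd-trace-php/releases/download/1.5.1/datadog-setup.php /datadog-setup.php
RUN php /datadog-setup.php --php-bin=all --enable-profiling

# Workaround suggested in this thread: turn remote config off for the tracer.
ENV DD_REMOTE_CONFIG_ENABLED=false
```

Baking the variable into the image means it is always set, regardless of how the service is deployed.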

@piotrekkr
Author

@nina9753 The docs mention that remote config needs to be enabled via an env var or the agent config file to work. I did not enable any remote config at all. I'll try it today to see if it helps, though.

@piotrekkr
Author

piotrekkr commented Jan 10, 2025

Okay, so my finding so far is that DD_REMOTE_CONFIG_ENABLED=false seems to fix the issue. Without this flag, my prod app started generating those errors right after deploy. After redeploying with the flag set to false, it no longer seems to produce any such errors. So far it has been running for about 30 minutes without issues.

Not sure why it is enabled by default, but it probably should not be, since it breaks right after deploying with serverless. Maybe this was not tested that well before release 🤷. I've found some code setting this up:

https://github.com/DataDog/dd-trace-php/blame/9607a5fb34cecb38bf846949aaefdcb27405e5a8/ext/configuration.h#L236

It was added 4 months ago, so versions before that will probably work without this flag.
DD devs should consider disabling this by default if possible, or the docs should be updated to flag this as a possible issue with serverless and explain how to fix it.

@247software-harshal-ringe

Hi, I'm facing the same issue in an AWS Lambda that uses PHP:8.1, /datadog/lambda-extension:68, dd-trace-php:1.6.2.
I tried DD_REMOTE_CONFIG_ENABLED=false but it has not resolved my issue.

Any suggestions?

@piotrekkr
Author

@247software-harshal-ringe Are you sure it is set properly at runtime? Maybe there is a wrong config somewhere, or the deployment did not fully complete. For me, it failed immediately after deploy and was fixed right after redeploying with this env var set to false. Maybe try setting it inside the Dockerfile, like ENV DD_REMOTE_CONFIG_ENABLED=false, so you can be sure that it is always set.

@doctenahasib

@piotrekkr Hello, are you sure it is DD_REMOTE_CONFIG_ENABLED and not DD_REMOTE_CONFIGURATION_ENABLED, as mentioned in the documentation? Or maybe the documentation is wrong?

doc: https://docs.datadoghq.com/agent/remote_config/?tab=environmentvariable

@247software-harshal-ringe

Hi @piotrekkr ,
I have added the environment variables below in Lambda.

DD_REMOTE_CONFIG_ENABLED=false
DD_REMOTE_CONFIGURATION_ENABLED=false

DD_EXTENSION_VERSION=compatibility

DD_SITE=datadoghq.eu

DD_APM_ENABLED=true
DD_DISTRIBUTED_TRACING=true
DD_LOGS_INJECTION=true
DD_TRACE_CLI_ENABLED=true
DD_TRACE_CURL_ENABLED=true
DD_TRACE_ELOQUENT_ENABLED=true
DD_TRACE_ENABLED=true
DD_TRACE_GUZZLE_ENABLED=true
DD_TRACE_LARAVEL_ENABLED=true
DD_TRACE_LARAVELQUEUE_ENABLED=true
DD_TRACE_MONGO_ENABLED=true
DD_TRACE_MONGODB_ENABLED=true
DD_TRACE_MYSQLI_ENABLED=true
DD_TRACE_PDO_ENABLED=true
DD_TRACE_PREDIS_ENABLED=true

@piotrekkr
Author

piotrekkr commented Jan 13, 2025

@247software-harshal-ringe Can you add a log line that prints DD_REMOTE_CONFIG_ENABLED when handling some requests? Something like $this->log->info("DD_REMOTE_CONFIG_ENABLED=".getenv('DD_REMOTE_CONFIG_ENABLED')). Just to be sure that everything is loaded and applied at the app level. Is the error exactly the same as mine?

This error message does not block the app from running and just pollutes the logs, so other logs should still be sent to DD.

@piotrekkr
Author

piotrekkr commented Jan 13, 2025

@piotrekkr Hello, are you sure it is DD_REMOTE_CONFIG_ENABLED and not DD_REMOTE_CONFIGURATION_ENABLED, as mentioned in the documentation? Or maybe the documentation is wrong?

doc: docs.datadoghq.com/agent/remote_config?tab=environmentvariable

Hi @doctenahasib. I trust the code more than the docs in the case of DD. The docs may sometimes be outdated or misleading. In this repo's code I found only DD_REMOTE_CONFIG_ENABLED (the other one was only set in a docker compose file). Maybe the other one is used by the DD agent app 🤷.

@247software-harshal-ringe

@247software-harshal-ringe Can you add a log line that prints DD_REMOTE_CONFIG_ENABLED when handling some requests? Something like $this->log->info("DD_REMOTE_CONFIG_ENABLED=".getenv('DD_REMOTE_CONFIG_ENABLED')). Just to be sure that everything is loaded and applied at the app level. Is the error exactly the same as mine?

This error message does not block the app from running and just pollutes the logs, so other logs should still be sent to DD.

Hi @piotrekkr, I logged the DD_REMOTE_CONFIG_ENABLED value and it shows the correct value, i.e. DD_REMOTE_CONFIG_ENABLED=false

Any suggestions?

@piotrekkr
Author

@247software-harshal-ringe I'm out of ideas right now. If you could provide a minimal hello world app that you deploy on AWS Lambda and that shows the same issue, DD devs could probably debug this. For now you are stuck with downgrading ddtrace to an earlier version.

Is it really the identical error as in my case?

@247software-harshal-ringe

Thanks, @piotrekkr for the help.

Yes, the error message is identical & I have raised the issue with the support team

@bwoebi
Collaborator

bwoebi commented Jan 15, 2025

Hey everyone,

We managed to reproduce the issue a few days ago, and there's good news: DD_SPAWN_WORKER_USE_EXEC=1 will fix it.
It's not optimal, and we found that the root cause of the issue is actually DataDog/libdatadog@8d6b9cc. This code snippet causes serverless environments to break. (I don't fully understand why it is unhappy with /proc/ instead of /proc/self, but it is.)

We'll work on a proper fix for this.
Additionally we found that logging of the sidecar was broken on serverless environments, because they use special paths (e.g. host:[4] for standard output and error, which the sidecar did not understand). We're going to add some workaround for this soon.

But just to be sure, can you all confirm that the DD_SPAWN_WORKER_USE_EXEC=1 indeed fixes it?

Thank you!
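
For anyone testing this on a container image like the one in the original report, the workaround can be sketched as a single extra line in the Dockerfile. Placing it next to the other DD_* ENV lines is an assumption; the variable can equally be set in the Cloud Run or Lambda configuration.

```dockerfile
# Workaround from this thread, to be placed alongside the existing DD_* ENV lines.
# Sketch only: it does not address the root cause described above.
ENV DD_SPAWN_WORKER_USE_EXEC=1
```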

@247software-harshal-ringe

Hi @bwoebi, It did not resolve my issue.

@bwoebi
Collaborator

bwoebi commented Jan 16, 2025

Hm. @247software-harshal-ringe Could you please install 1.6.3 and set DD_TRACE_LOG_FILE=/proc/1/fd/2 (this should make the log output of sidecar go to the container log) and DD_TRACE_DEBUG=1.
This may contain useful information for getting to the root cause.
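
A rough sketch of that debugging setup, assuming a container image build like the one in the original report and assuming the 1.6.3 release uses the same download URL pattern as the 1.5.1 line there. On Lambda, the two variables can instead be set in the function configuration.

```dockerfile
# Assumption: 1.6.3 is published under the same URL pattern as 1.5.1 in the original report.
ADD https://github.com/DataDog/dd-trace-php/releases/download/1.6.3/datadog-setup.php /datadog-setup.php
RUN php /datadog-setup.php --php-bin=all --enable-profiling

# Send sidecar log output to the container's stderr and enable debug logging,
# as requested in the comment above.
ENV DD_TRACE_LOG_FILE=/proc/1/fd/2
ENV DD_TRACE_DEBUG=1
```

Debug logging is verbose, so it is probably best removed again once the relevant output has been captured.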
