The 'spring gateway' client frequently reports a "get services from nacos server fail Waited 3000 milliseconds Client not connected, current status:UNHEALTHY" exception #12999

Open
Robert-QiLin opened this issue Dec 27, 2024 · 3 comments

Comments

@Robert-QiLin

Spring Gateway service intermittently reports a timeout exception.

  • Version [nacos-server 2.4.3, nacos-client 2.4.3]
  • SDK [spring cloud 2023.0.3, spring-boot 3.2.11, spring-cloud-alibaba 2023.0.1.2, spring 6.1.14]

[exception behavior]:
2024-12-26T19:58:27.622+08:00 ERROR 3431369 --- [cbsc-gateway] [ndedElastic-789] c.a.c.n.d.r.NacosReactiveDiscoveryClient : get services from nacos server fail,

com.alibaba.nacos.api.exception.NacosException: java.util.concurrent.TimeoutException: Waited 3000 milliseconds (plus 71976 nanoseconds delay) for com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$GrpcFuture@4c30f251[status=PENDING, info=[GrpcFuture{clientCall=ClientCallImpl{method=MethodDescriptor{fullMethodName=Request/request, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@7ad1f4e4, responseMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@10210d44, schemaDescriptor=com.alibaba.nacos.api.grpc.auto.RequestGrpc$RequestMethodDescriptorSupplier@65f07418}}}]]
at com.alibaba.nacos.common.remote.client.grpc.GrpcConnection.request(GrpcConnection.java:82) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:647) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:624) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.requestToServer(NamingGrpcClientProxy.java:447) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.getServiceList(NamingGrpcClientProxy.java:368) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.NamingClientProxyDelegate.getServiceList(NamingClientProxyDelegate.java:159) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.NacosNamingService.getServicesOfServer(NacosNamingService.java:515) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.NacosNamingService.getServicesOfServer(NacosNamingService.java:503) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.cloud.nacos.discovery.NacosServiceDiscovery.getServices(NacosServiceDiscovery.java:70) ~[spring-cloud-starter-alibaba-nacos-discovery-2023.0.1.2.jar!/:2023.0.1.2]
at com.alibaba.cloud.nacos.discovery.reactive.NacosReactiveDiscoveryClient.lambda$getServices$3(NacosReactiveDiscoveryClient.java:87) ~[spring-cloud-starter-alibaba-nacos-discovery-2023.0.1.2.jar!/:2023.0.1.2]
at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:46) ~[reactor-core-3.6.11.jar!/:3.6.11]
at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194) ~[reactor-core-3.6.11.jar!/:3.6.11]
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) ~[reactor-core-3.6.11.jar!/:3.6.11]
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) ~[reactor-core-3.6.11.jar!/:3.6.11]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: java.util.concurrent.TimeoutException: Waited 3000 milliseconds (plus 71976 nanoseconds delay) for com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$GrpcFuture@4c30f251[status=PENDING, info=[GrpcFuture{clientCall=ClientCallImpl{method=MethodDescriptor{fullMethodName=Request/request, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@7ad1f4e4, responseMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@10210d44, schemaDescriptor=com.alibaba.nacos.api.grpc.auto.RequestGrpc$RequestMethodDescriptorSupplier@65f07418}}}]]
at com.alibaba.nacos.shaded.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:531) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.common.remote.client.grpc.GrpcConnection.request(GrpcConnection.java:79) ~[nacos-client-2.4.3.jar!/:na]
... 18 common frames omitted

2024-12-26T19:58:27.622+08:00 ERROR 3431369 --- [cbsc-gateway] [ask-Scheduler-1] c.a.c.n.discovery.NacosDiscoveryClient : get service name from nacos server failed.

com.alibaba.nacos.api.exception.NacosException: java.util.concurrent.TimeoutException: Waited 3000 milliseconds (plus 107677 nanoseconds delay) for com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$GrpcFuture@11536a85[status=PENDING, info=[GrpcFuture{clientCall=ClientCallImpl{method=MethodDescriptor{fullMethodName=Request/request, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@7ad1f4e4, responseMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@10210d44, schemaDescriptor=com.alibaba.nacos.api.grpc.auto.RequestGrpc$RequestMethodDescriptorSupplier@65f07418}}}]]
at com.alibaba.nacos.common.remote.client.grpc.GrpcConnection.request(GrpcConnection.java:82) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:647) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:624) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.requestToServer(NamingGrpcClientProxy.java:447) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.getServiceList(NamingGrpcClientProxy.java:368) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.NamingClientProxyDelegate.getServiceList(NamingClientProxyDelegate.java:159) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.NacosNamingService.getServicesOfServer(NacosNamingService.java:515) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.NacosNamingService.getServicesOfServer(NacosNamingService.java:503) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.cloud.nacos.discovery.NacosServiceDiscovery.getServices(NacosServiceDiscovery.java:70) ~[spring-cloud-starter-alibaba-nacos-discovery-2023.0.1.2.jar!/:2023.0.1.2]
at com.alibaba.cloud.nacos.discovery.NacosDiscoveryClient.getServices(NacosDiscoveryClient.java:80) ~[spring-cloud-starter-alibaba-nacos-discovery-2023.0.1.2.jar!/:2023.0.1.2]
at org.springframework.cloud.client.discovery.composite.CompositeDiscoveryClient.getServices(CompositeDiscoveryClient.java:68) ~[spring-cloud-commons-4.1.4.jar!/:4.1.4]
at com.github.xiaoymin.knife4j.spring.gateway.discover.ServiceChangeListener.discover(ServiceChangeListener.java:49) ~[knife4j-gateway-spring-boot-starter-4.5.0.jar!/:na]
at jdk.internal.reflect.GeneratedMethodAccessor101.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
at org.springframework.context.event.ApplicationListenerMethodAdapter.doInvoke(ApplicationListenerMethodAdapter.java:365) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.event.ApplicationListenerMethodAdapter.processEvent(ApplicationListenerMethodAdapter.java:237) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.event.ApplicationListenerMethodAdapter.onApplicationEvent(ApplicationListenerMethodAdapter.java:168) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:185) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:178) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:156) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:452) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:385) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.cloud.gateway.route.RouteRefreshListener.reset(RouteRefreshListener.java:73) ~[spring-cloud-gateway-server-4.1.5.jar!/:4.1.5]
at org.springframework.cloud.gateway.route.RouteRefreshListener.resetIfNeeded(RouteRefreshListener.java:68) ~[spring-cloud-gateway-server-4.1.5.jar!/:4.1.5]
at org.springframework.cloud.gateway.route.RouteRefreshListener.onApplicationEvent(RouteRefreshListener.java:62) ~[spring-cloud-gateway-server-4.1.5.jar!/:4.1.5]
at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:185) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:178) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:156) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:452) ~[spring-context-6.1.14.jar!/:6.1.14]
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:385) ~[spring-context-6.1.14.jar!/:6.1.14]
at com.alibaba.cloud.nacos.discovery.NacosDiscoveryHeartBeatPublisher.publishHeartBeat(NacosDiscoveryHeartBeatPublisher.java:104) ~[spring-cloud-starter-alibaba-nacos-discovery-2023.0.1.2.jar!/:2023.0.1.2]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-6.1.14.jar!/:6.1.14]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: java.util.concurrent.TimeoutException: Waited 3000 milliseconds (plus 107677 nanoseconds delay) for com.alibaba.nacos.shaded.io.grpc.stub.ClientCalls$GrpcFuture@11536a85[status=PENDING, info=[GrpcFuture{clientCall=ClientCallImpl{method=MethodDescriptor{fullMethodName=Request/request, type=UNARY, idempotent=false, safe=false, sampledToLocalTracing=true, requestMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@7ad1f4e4, responseMarshaller=com.alibaba.nacos.shaded.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller@10210d44, schemaDescriptor=com.alibaba.nacos.api.grpc.auto.RequestGrpc$RequestMethodDescriptorSupplier@65f07418}}}]]
at com.alibaba.nacos.shaded.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:531) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.common.remote.client.grpc.GrpcConnection.request(GrpcConnection.java:79) ~[nacos-client-2.4.3.jar!/:na]
... 38 common frames omitted

2024-12-26T19:58:28.026+08:00 ERROR 3431369 --- [cbsc-gateway] [ndedElastic-789] c.a.c.n.d.r.NacosReactiveDiscoveryClient : get services from nacos server fail,

com.alibaba.nacos.api.exception.NacosException: Client not connected, current status:UNHEALTHY
at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:645) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.common.remote.client.RpcClient.request(RpcClient.java:624) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.requestToServer(NamingGrpcClientProxy.java:447) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.gprc.NamingGrpcClientProxy.getServiceList(NamingGrpcClientProxy.java:368) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.remote.NamingClientProxyDelegate.getServiceList(NamingClientProxyDelegate.java:159) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.NacosNamingService.getServicesOfServer(NacosNamingService.java:515) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.nacos.client.naming.NacosNamingService.getServicesOfServer(NacosNamingService.java:503) ~[nacos-client-2.4.3.jar!/:na]
at com.alibaba.cloud.nacos.discovery.NacosServiceDiscovery.getServices(NacosServiceDiscovery.java:70) ~[spring-cloud-starter-alibaba-nacos-discovery-2023.0.1.2.jar!/:2023.0.1.2]
at com.alibaba.cloud.nacos.discovery.reactive.NacosReactiveDiscoveryClient.lambda$getServices$3(NacosReactiveDiscoveryClient.java:87) ~[spring-cloud-starter-alibaba-nacos-discovery-2023.0.1.2.jar!/:2023.0.1.2]
at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:46) ~[reactor-core-3.6.11.jar!/:3.6.11]
at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194) ~[reactor-core-3.6.11.jar!/:3.6.11]
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) ~[reactor-core-3.6.11.jar!/:3.6.11]
at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) ~[reactor-core-3.6.11.jar!/:3.6.11]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

@KomachiSion
Collaborator

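For timeout problems, please check the following three points yourself:

  1. Whether nacos-server is suffering from Full GC, high CPU usage, exhausted thread pools, or similar problems.
  2. Whether nacos-client (the application reporting the error) is suffering from Full GC, high CPU usage, or exhausted thread pools.
  3. Network problems (including DNS resolution, sending and receiving TCP packets, VPN, SLB, and so on).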
@Robert-QiLin
Author

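For timeout problems, please check the following three points yourself:

  1. Whether nacos-server is suffering from Full GC, high CPU usage, exhausted thread pools, or similar problems.

  2. Whether nacos-client (the application reporting the error) is suffering from Full GC, high CPU usage, or exhausted thread pools.

  3. Network problems (including DNS resolution, sending and receiving TCP packets, VPN, SLB, and so on).

CPU usage on both the server and the client is around 8%, and the thread counts are not high either.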

@KomachiSion
Collaborator

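If you have confirmed that the client and the server themselves are healthy, then the cause is most likely the network. I have resolved the following scenarios before:

  1. A DNS failure that caused domain name resolution to time out.
  2. A VPN/SLB dropping packets outright, so the server never received the request and the client never got a reply.
  3. High latency in the network itself, so the client received the response packet too late and timed out.

You can try raising the timeout to see whether that resolves the problem: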
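
A minimal sketch for probing scenarios 1 and 3 from the gateway host: it times DNS resolution and a plain TCP connect. The class name `NacosNetworkProbe` is hypothetical, and the default port 9848 assumes the Nacos 2.x convention of exposing gRPC on the main port plus 1000 with the default main port 8848; adjust both arguments to your deployment. A TCP connect only shows reachability and handshake latency, so it cannot detect packets dropped later by a VPN or SLB.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: times DNS resolution and a TCP handshake.
final class NacosNetworkProbe {

    /** Milliseconds spent resolving the host name (the JVM may cache repeat lookups). */
    static long resolveMillis(String host) throws IOException {
        long start = System.nanoTime();
        InetAddress.getByName(host);
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Milliseconds spent completing a TCP connect, bounded by timeoutMs. */
    static long connectMillis(String host, int port, int timeoutMs) throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // Assumption: default Nacos 2.x gRPC port (8848 + 1000).
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9848;
        try {
            System.out.println("dns ms: " + resolveMillis(host));
            System.out.println("tcp connect ms: " + connectMillis(host, port, 3000));
        } catch (IOException e) {
            System.out.println("probe failed: " + e);
        }
    }
}
```

Running the probe repeatedly, each time in a fresh JVM so the DNS cache does not hide slow lookups, and comparing the numbers against the 3000 ms budget in the stack trace can help narrow down which hop is slow.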

https://nacos.io/docs/latest/manual/user/java-sdk/properties/?spm=5238cd80.2ef5001f.0.0.3f613b7co4QHHZ#24-%E8%BF%9E%E6%8E%A5%E7%9B%B8%E5%85%B3
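
As a sketch of what raising the timeout could look like: the snippet below sets a JVM system property before the application starts. The key `nacos.remote.client.grpc.timeout` is an assumption based on my reading of the connection-related section of the page linked above; please confirm the exact name and default there for your client version. The class name `NacosGrpcTimeout` and the value 10000 ms are illustrative only.

```java
// Hypothetical helper: raises the Nacos client gRPC request timeout via a system property.
final class NacosGrpcTimeout {

    // Assumed property key; verify against the Nacos Java SDK properties page.
    static final String TIMEOUT_KEY = "nacos.remote.client.grpc.timeout";

    /**
     * Sets the timeout only when nothing was supplied already, so a
     * -D flag on the command line still wins. Returns the effective value.
     */
    static String applyIfAbsent(long timeoutMs) {
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("timeoutMs must be positive: " + timeoutMs);
        }
        if (System.getProperty(TIMEOUT_KEY) == null) {
            System.setProperty(TIMEOUT_KEY, Long.toString(timeoutMs));
        }
        return System.getProperty(TIMEOUT_KEY);
    }

    public static void main(String[] args) {
        // Must run before the Nacos client is created, i.e. before the
        // gateway's own SpringApplication.run(...) call.
        System.out.println(TIMEOUT_KEY + "=" + applyIfAbsent(10_000));
    }
}
```

Under the same assumption, passing `-Dnacos.remote.client.grpc.timeout=10000` on the `java` command line would have the same effect without a code change. A longer timeout only hides latency; if requests are being dropped, the error will return after the longer wait.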
