k8s version: 1.20.4
OS: CentOS 7
Using the master branch.
The cluster has 3 worker nodes; only one has an NVIDIA T4 device.
scheduler error log:
I0222 08:58:21.218509 1 main.go:44] priority algorithm: binpack
I0222 08:58:21.283400 1 controller.go:57] Creating event broadcaster
I0222 08:58:21.285126 1 controller.go:104] begin to wait for cache
I0222 08:58:21.385693 1 controller.go:109] init the node cache successfully
I0222 08:58:21.586647 1 controller.go:115] init the pod cache successfully
I0222 08:58:21.586679 1 controller.go:118] end to wait for cache
I0222 08:58:21.586769 1 main.go:97] server starting on the port: 39999
I0222 08:58:21.586806 1 controller.go:128] Starting GPU Sharing Controller.
I0222 08:58:21.586825 1 controller.go:129] Waiting for informer caches to sync
I0222 08:58:21.586830 1 controller.go:131] Starting 1 workers.
I0222 08:58:21.586845 1 controller.go:136] Started workers
E0222 08:58:21.784264 1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 99 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x16888c0, 0xc000197950})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00007c000})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x16888c0, 0xc000197950})
/usr/local/go/src/runtime/panic.go:1038 +0x215
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.GPUs.Transact({0xc000010038, 0x177041a, 0x1}, 0xc0005ab880)
/go/src/elastic-gpu-scheduler/pkg/scheduler/gpu.go:166 +0x396
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*NodeAllocator).Add(0xc0002ff1a0, 0xc000d0d0b0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/node.go:154 +0xd7
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*GPUUnitScheduler).AddPod(0xc0003cac80, 0xc000d0d0b0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/scheduler.go:248 +0x132
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).assignPod(0x0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:330 +0x49
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).syncPod(0xc0003ca600, {0xc000e06000, 0x26})
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:177 +0x4ca
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).processNextWorkItem(0xc0003ca600)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:197 +0x227
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).runWorker(...)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:147
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fd07c1a8610)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1499200, {0x193fd20, 0xc000d5dc80}, 0x1, 0xc00009d6e0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x4409a5)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0xc00009d6e0, 0x0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).Run
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:133 +0x23a
panic: runtime error: index out of range [0] with length 0 [recovered]
panic: runtime error: index out of range [0] with length 0
goroutine 99 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00007c000})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x16888c0, 0xc000197950})
/usr/local/go/src/runtime/panic.go:1038 +0x215
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.GPUs.Transact({0xc000010038, 0x177041a, 0x1}, 0xc0005ab880)
/go/src/elastic-gpu-scheduler/pkg/scheduler/gpu.go:166 +0x396
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*NodeAllocator).Add(0xc0002ff1a0, 0xc000d0d0b0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/node.go:154 +0xd7
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*GPUUnitScheduler).AddPod(0xc0003cac80, 0xc000d0d0b0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/scheduler.go:248 +0x132
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).assignPod(0x0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:330 +0x49
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).syncPod(0xc0003ca600, {0xc000e06000, 0x26})
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:177 +0x4ca
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).processNextWorkItem(0xc0003ca600)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:197 +0x227
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).runWorker(...)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:147
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fd07c1a8610)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1499200, {0x193fd20, 0xc000d5dc80}, 0x1, 0xc00009d6e0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x4409a5)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0xc00009d6e0, 0x0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).Run
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:133 +0x23a
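The panic comes from GPUs.Transact at pkg/scheduler/gpu.go:166 hitting "index out of range [0] with length 0", i.e. indexing into an empty GPU list, which seems plausible here because two of the three worker nodes have no NVIDIA device. Below is a minimal Go sketch of that failure mode and the guard that avoids it; the names (GPU, pickGPU) are hypothetical, not the scheduler's actual code:

```go
// Minimal sketch of the failure mode behind "index out of range [0] with length 0":
// indexing the first element of a GPU slice that can legitimately be empty on
// nodes without a device. GPU and pickGPU are placeholder names.
package main

import (
	"errors"
	"fmt"
)

type GPU struct {
	MemoryFree int
}

// pickGPU indexes gpus[0] only after checking the slice is non-empty,
// returning an error instead of panicking on GPU-less nodes.
func pickGPU(gpus []GPU) (GPU, error) {
	if len(gpus) == 0 {
		return GPU{}, errors.New("no GPUs registered on this node")
	}
	return gpus[0], nil
}

func main() {
	var empty []GPU // what a worker node without a T4 would report
	if _, err := pickGPU(empty); err != nil {
		fmt.Println("skip node:", err)
	}
}
```

Returning an error for the empty case would let a node simply be skipped instead of crashing the whole controller worker loop.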
agent log:
I0222 06:58:56.893624 1 main.go:31] start to run elastic gpu agent
I0222 06:58:56.893667 1 manager.go:146] start to run gpu manager
I0222 06:58:56.893701 1 manager.go:150] polling if the sitter has done listing pods:false
I0222 06:58:56.994223 1 manager.go:150] polling if the sitter has done listing pods:true
I0222 06:58:56.994290 1 base.go:237] start plugin elasticgpu.io/gpu-memory
I0222 06:58:56.994325 1 base.go:237] start plugin elasticgpu.io/gpu-core
E0222 06:59:01.935174 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:16.933856 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:28.932641 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:40.932691 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:55.933196 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
I0222 06:59:56.995228 1 base.go:250] gpushare plugin starts to GC
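Since the scheduler panicked before it could bind the pod, the elasticgpu.io/assumed annotation was presumably never written, which matches the repeated agent error above. A minimal client-go sketch to confirm whether the annotation exists on the pod (the kubeconfig path is a placeholder and the pod name is copied from the log, not verified):

```go
// Sketch: fetch the pending pod and report whether the elasticgpu.io/assumed
// annotation is present. Assumes a kubeconfig at the default path.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	pod, err := client.CoreV1().Pods("default").Get(context.TODO(),
		"cuda-gpu-test-69d586f88d-dpzxj", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	if v, ok := pod.Annotations["elasticgpu.io/assumed"]; ok {
		fmt.Println("assumed annotation present:", v)
	} else {
		fmt.Println("assumed annotation missing: the scheduler never bound this pod")
	}
}
```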
How should I deal with this error? Any help would be appreciated, thanks.