
schedule cannot be run #22

Open
robin-2016 opened this issue Feb 22, 2023 · 0 comments

k8s version: 1.20.4
OS: CentOS 7

Built from the master branch.

The cluster has 3 worker nodes; only one of them has an NVIDIA T4 device.
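
(Side note, not from the original report: a quick way to confirm which worker actually advertises the shared GPU resources is to check the node's allocatables, e.g. kubectl describe node <gpu-node-name> | grep elasticgpu.io — the resource names elasticgpu.io/gpu-core and elasticgpu.io/gpu-memory are taken from the agent log below.)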

Scheduler error log:
I0222 08:58:21.218509 1 main.go:44] priority algorithm: binpack
I0222 08:58:21.283400 1 controller.go:57] Creating event broadcaster
I0222 08:58:21.285126 1 controller.go:104] begin to wait for cache
I0222 08:58:21.385693 1 controller.go:109] init the node cache successfully
I0222 08:58:21.586647 1 controller.go:115] init the pod cache successfully
I0222 08:58:21.586679 1 controller.go:118] end to wait for cache
I0222 08:58:21.586769 1 main.go:97] server starting on the port: 39999
I0222 08:58:21.586806 1 controller.go:128] Starting GPU Sharing Controller.
I0222 08:58:21.586825 1 controller.go:129] Waiting for informer caches to sync
I0222 08:58:21.586830 1 controller.go:131] Starting 1 workers.
I0222 08:58:21.586845 1 controller.go:136] Started workers
E0222 08:58:21.784264 1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 99 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x16888c0, 0xc000197950})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00007c000})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x16888c0, 0xc000197950})
/usr/local/go/src/runtime/panic.go:1038 +0x215
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.GPUs.Transact({0xc000010038, 0x177041a, 0x1}, 0xc0005ab880)
/go/src/elastic-gpu-scheduler/pkg/scheduler/gpu.go:166 +0x396
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*NodeAllocator).Add(0xc0002ff1a0, 0xc000d0d0b0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/node.go:154 +0xd7
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*GPUUnitScheduler).AddPod(0xc0003cac80, 0xc000d0d0b0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/scheduler.go:248 +0x132
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).assignPod(0x0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:330 +0x49
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).syncPod(0xc0003ca600, {0xc000e06000, 0x26})
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:177 +0x4ca
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).processNextWorkItem(0xc0003ca600)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:197 +0x227
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).runWorker(...)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:147
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fd07c1a8610)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1499200, {0x193fd20, 0xc000d5dc80}, 0x1, 0xc00009d6e0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x4409a5)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0xc00009d6e0, 0x0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).Run
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:133 +0x23a
panic: runtime error: index out of range [0] with length 0 [recovered]
panic: runtime error: index out of range [0] with length 0

goroutine 99 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00007c000})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x16888c0, 0xc000197950})
/usr/local/go/src/runtime/panic.go:1038 +0x215
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.GPUs.Transact({0xc000010038, 0x177041a, 0x1}, 0xc0005ab880)
/go/src/elastic-gpu-scheduler/pkg/scheduler/gpu.go:166 +0x396
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*NodeAllocator).Add(0xc0002ff1a0, 0xc000d0d0b0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/node.go:154 +0xd7
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*GPUUnitScheduler).AddPod(0xc0003cac80, 0xc000d0d0b0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/scheduler.go:248 +0x132
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).assignPod(0x0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:330 +0x49
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).syncPod(0xc0003ca600, {0xc000e06000, 0x26})
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:177 +0x4ca
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).processNextWorkItem(0xc0003ca600)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:197 +0x227
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).runWorker(...)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:147
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fd07c1a8610)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1499200, {0x193fd20, 0xc000d5dc80}, 0x1, 0xc00009d6e0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x4409a5)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0xc00009d6e0, 0x0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).Run
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:133 +0x23a
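
From the stack trace, the panic originates in GPUs.Transact (pkg/scheduler/gpu.go:166), reached from NodeAllocator.Add while the controller assigns a pod. Given that two of the three workers have no NVIDIA device, a plausible cause is that Transact indexes into an empty per-node GPU list. The sketch below is a hypothetical minimal reconstruction, not the repository's actual code: the type and method names come from the stack trace, but the fields and bodies are assumptions, and the length guard illustrates the kind of check that would turn the panic into a handled error.

// Hypothetical sketch. GPU, GPUs, and Transact are guessed from the
// stack trace; the real gpu.go source may differ substantially.
package main

import "fmt"

// GPU models one physical device's remaining shareable capacity.
type GPU struct {
	CoreAvailable   int
	MemoryAvailable int
}

// GPUs is the per-node device list; empty on workers without a T4.
type GPUs []*GPU

// Transact tries to reserve a request on some device. An unconditional
// access like g[0] would panic with "index out of range [0] with
// length 0" on a GPU-less node, matching the trace at gpu.go:166.
func (g GPUs) Transact(core, memory int) (int, error) {
	if len(g) == 0 {
		// The guard that appears to be missing: reject GPU-less
		// nodes with an error instead of panicking.
		return -1, fmt.Errorf("no GPU devices on this node")
	}
	for i, dev := range g {
		if dev.CoreAvailable >= core && dev.MemoryAvailable >= memory {
			dev.CoreAvailable -= core
			dev.MemoryAvailable -= memory
			return i, nil
		}
	}
	return -1, fmt.Errorf("insufficient GPU resources on this node")
}

func main() {
	var empty GPUs // a worker node without a T4
	if _, err := empty.Transact(10, 1024); err != nil {
		fmt.Println("handled gracefully:", err) // instead of a crash
	}
}

If that is indeed the failure mode, a possible workaround until it is fixed might be to keep GPU pods off the GPU-less workers (for example with a nodeSelector), so the scheduler never walks an empty device list.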

Agent log:
I0222 06:58:56.893624 1 main.go:31] start to run elastic gpu agent
I0222 06:58:56.893667 1 manager.go:146] start to run gpu manager
I0222 06:58:56.893701 1 manager.go:150] polling if the sitter has done listing pods:false
I0222 06:58:56.994223 1 manager.go:150] polling if the sitter has done listing pods:true
I0222 06:58:56.994290 1 base.go:237] start plugin elasticgpu.io/gpu-memory
I0222 06:58:56.994325 1 base.go:237] start plugin elasticgpu.io/gpu-core
E0222 06:59:01.935174 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:16.933856 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:28.932641 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:40.932691 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:55.933196 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
I0222 06:59:56.995228 1 base.go:250] gpushare plugin starts to GC
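
The agent errors look like a downstream symptom rather than a separate bug: judging from the two logs, the scheduler panics inside assignPod before it ever annotates the pod, so the elasticgpu.io/assumed annotation the agent keeps checking for is never written. (This reading is an inference from the logs above, not confirmed against the source.) The annotation's absence can be verified with kubectl get pod cuda-gpu-test-69d586f88d-dpzxj -o jsonpath='{.metadata.annotations}'.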

How can I fix this error? Any help would be appreciated. Thanks!
