
schedule cannot be run #22

Open
robin-2016 opened this issue Feb 22, 2023 · 0 comments

k8s version: 1.20.4
OS: CentOS 7

Built from the master branch.

The cluster has 3 worker nodes; only one of them has an NVIDIA T4 device.
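
(Side note, not from the original report: a quick way to confirm which worker actually advertises the shared GPU resources is to check the node's allocatables, e.g. kubectl describe node <gpu-node-name> | grep elasticgpu.io — the resource names elasticgpu.io/gpu-core and elasticgpu.io/gpu-memory are taken from the agent log below.)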

Scheduler error log:
I0222 08:58:21.218509 1 main.go:44] priority algorithm: binpack
I0222 08:58:21.283400 1 controller.go:57] Creating event broadcaster
I0222 08:58:21.285126 1 controller.go:104] begin to wait for cache
I0222 08:58:21.385693 1 controller.go:109] init the node cache successfully
I0222 08:58:21.586647 1 controller.go:115] init the pod cache successfully
I0222 08:58:21.586679 1 controller.go:118] end to wait for cache
I0222 08:58:21.586769 1 main.go:97] server starting on the port: 39999
I0222 08:58:21.586806 1 controller.go:128] Starting GPU Sharing Controller.
I0222 08:58:21.586825 1 controller.go:129] Waiting for informer caches to sync
I0222 08:58:21.586830 1 controller.go:131] Starting 1 workers.
I0222 08:58:21.586845 1 controller.go:136] Started workers
E0222 08:58:21.784264 1 runtime.go:78] Observed a panic: runtime.boundsError{x:0, y:0, signed:true, code:0x0} (runtime error: index out of range [0] with length 0)
goroutine 99 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x16888c0, 0xc000197950})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x7d
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00007c000})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x75
panic({0x16888c0, 0xc000197950})
/usr/local/go/src/runtime/panic.go:1038 +0x215
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.GPUs.Transact({0xc000010038, 0x177041a, 0x1}, 0xc0005ab880)
/go/src/elastic-gpu-scheduler/pkg/scheduler/gpu.go:166 +0x396
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*NodeAllocator).Add(0xc0002ff1a0, 0xc000d0d0b0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/node.go:154 +0xd7
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*GPUUnitScheduler).AddPod(0xc0003cac80, 0xc000d0d0b0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/scheduler.go:248 +0x132
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).assignPod(0x0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:330 +0x49
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).syncPod(0xc0003ca600, {0xc000e06000, 0x26})
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:177 +0x4ca
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).processNextWorkItem(0xc0003ca600)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:197 +0x227
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).runWorker(...)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:147
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fd07c1a8610)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1499200, {0x193fd20, 0xc000d5dc80}, 0x1, 0xc00009d6e0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x4409a5)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0xc00009d6e0, 0x0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).Run
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:133 +0x23a
panic: runtime error: index out of range [0] with length 0 [recovered]
panic: runtime error: index out of range [0] with length 0

goroutine 99 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00007c000})
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0xd8
panic({0x16888c0, 0xc000197950})
/usr/local/go/src/runtime/panic.go:1038 +0x215
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.GPUs.Transact({0xc000010038, 0x177041a, 0x1}, 0xc0005ab880)
/go/src/elastic-gpu-scheduler/pkg/scheduler/gpu.go:166 +0x396
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*NodeAllocator).Add(0xc0002ff1a0, 0xc000d0d0b0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/node.go:154 +0xd7
elasticgpu.io/elastic-gpu-scheduler/pkg/scheduler.(*GPUUnitScheduler).AddPod(0xc0003cac80, 0xc000d0d0b0)
/go/src/elastic-gpu-scheduler/pkg/scheduler/scheduler.go:248 +0x132
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).assignPod(0x0, 0x0)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:330 +0x49
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).syncPod(0xc0003ca600, {0xc000e06000, 0x26})
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:177 +0x4ca
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).processNextWorkItem(0xc0003ca600)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:197 +0x227
elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).runWorker(...)
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:147
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7fd07c1a8610)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x1499200, {0x193fd20, 0xc000d5dc80}, 0x1, 0xc00009d6e0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x4409a5)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0xc00009d6e0, 0x0)
/go/src/elastic-gpu-scheduler/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x25
created by elasticgpu.io/elastic-gpu-scheduler/pkg/controller.(*Controller).Run
/go/src/elastic-gpu-scheduler/pkg/controller/controller.go:133 +0x23a
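
From the stack trace, the panic originates in GPUs.Transact (pkg/scheduler/gpu.go:166), reached from NodeAllocator.Add while the controller assigns a pod. Given that two of the three workers have no NVIDIA device, a plausible cause is that Transact indexes into an empty per-node GPU list. The sketch below is a hypothetical minimal reconstruction, not the repository's actual code: the type and method names come from the stack trace, but the fields and bodies are assumptions, and the length guard illustrates the kind of check that would turn the panic into a handled error.

// Hypothetical sketch. GPU, GPUs, and Transact are guessed from the
// stack trace; the real gpu.go source may differ substantially.
package main

import "fmt"

// GPU models one physical device's remaining shareable capacity.
type GPU struct {
	CoreAvailable   int
	MemoryAvailable int
}

// GPUs is the per-node device list; empty on workers without a T4.
type GPUs []*GPU

// Transact tries to reserve a request on some device. An unconditional
// access like g[0] would panic with "index out of range [0] with
// length 0" on a GPU-less node, matching the trace at gpu.go:166.
func (g GPUs) Transact(core, memory int) (int, error) {
	if len(g) == 0 {
		// The guard that appears to be missing: reject GPU-less
		// nodes with an error instead of panicking.
		return -1, fmt.Errorf("no GPU devices on this node")
	}
	for i, dev := range g {
		if dev.CoreAvailable >= core && dev.MemoryAvailable >= memory {
			dev.CoreAvailable -= core
			dev.MemoryAvailable -= memory
			return i, nil
		}
	}
	return -1, fmt.Errorf("insufficient GPU resources on this node")
}

func main() {
	var empty GPUs // a worker node without a T4
	if _, err := empty.Transact(10, 1024); err != nil {
		fmt.Println("handled gracefully:", err) // instead of a crash
	}
}

If that is indeed the failure mode, a possible workaround until it is fixed might be to keep GPU pods off the GPU-less workers (for example with a nodeSelector), so the scheduler never walks an empty device list.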

Agent log:
I0222 06:58:56.893624 1 main.go:31] start to run elastic gpu agent
I0222 06:58:56.893667 1 manager.go:146] start to run gpu manager
I0222 06:58:56.893701 1 manager.go:150] polling if the sitter has done listing pods:false
I0222 06:58:56.994223 1 manager.go:150] polling if the sitter has done listing pods:true
I0222 06:58:56.994290 1 base.go:237] start plugin elasticgpu.io/gpu-memory
I0222 06:58:56.994325 1 base.go:237] start plugin elasticgpu.io/gpu-core
E0222 06:59:01.935174 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:16.933856 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:28.932641 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:40.932691 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
E0222 06:59:55.933196 1 gpushare.go:229] annotation elasticgpu.io/assumed does not on pod default/cuda-gpu-test-69d586f88d-dpzxj:cuda
I0222 06:59:56.995228 1 base.go:250] gpushare plugin starts to GC
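
The agent errors look like a downstream symptom rather than a separate bug: judging from the two logs, the scheduler panics inside assignPod before it ever annotates the pod, so the elasticgpu.io/assumed annotation the agent keeps checking for is never written. (This reading is an inference from the logs above, not confirmed against the source.) The annotation's absence can be verified with kubectl get pod cuda-gpu-test-69d586f88d-dpzxj -o jsonpath='{.metadata.annotations}'.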

How can I fix this error? Any help would be appreciated. Thanks!
