Support with vLLM #17
Comments
We are not very familiar with vLLM and its internal mechanism. We will check its compatibility with SelfExtend. Thanks for your suggestion!
+1, would love to see this in vLLM!
+1, would love to see this in vLLM too!
+1, would love to see this in vLLM, since lots of online services are built on vLLM! It would be ideal if we could easily use the self-extend trick in our online services!
Can this be used in vLLM now?
Hello!
Thank you for your great work; it's amazing how much hard work you put into this algorithm. I just have one question: is it possible to integrate this with vLLM serving?
This would really boost inference speed in limited-resource settings once you cross the 8192-token mark. Is there a way? Thank you in advance for your help!
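For reference, the core change a vLLM port would need is SelfExtend's bi-level position remapping: exact relative positions inside a neighbor window, floor-divided (grouped) positions beyond it. Below is a minimal PyTorch sketch of the merged relative-position matrix based on the paper's description; the function name and parameters are illustrative, and this is a sketch of the idea rather than vLLM-ready code (a real integration would have to apply this inside vLLM's attention kernels and paged KV cache).

```python
import torch

def self_extend_rel_pos(seq_len: int, group_size: int, neighbor_window: int) -> torch.Tensor:
    """Merged relative-position matrix per the SelfExtend paper (illustrative)."""
    q = torch.arange(seq_len).view(-1, 1)  # query positions
    k = torch.arange(seq_len).view(1, -1)  # key positions
    rel = q - k                            # standard relative positions

    # Grouped positions: floor-divide by the group size, then shift so the
    # grouped region lines up with the neighbor window at the boundary.
    shift = neighbor_window - neighbor_window // group_size
    grouped = q // group_size - k // group_size + shift

    # Keep exact positions inside the neighbor window; use grouped ones beyond it.
    return torch.where(rel < neighbor_window, rel, grouped)

# Example: with group_size=2 and neighbor_window=4, relative distances past the
# window grow by 1 only every 2 tokens, so no position exceeds the trained range.
print(self_extend_rel_pos(8, group_size=2, neighbor_window=4)[-1])
# tensor([5, 5, 4, 4, 3, 2, 1, 0])
```

Because the remapping only changes which relative positions are fed to RoPE, it needs no retraining; the open question for vLLM is wiring this into its fused attention paths rather than the math itself.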