Support with vLLM #17
Comments
We are not very familiar with vLLM and its internal mechanism. We will check its compatibility with SelfExtend. Thanks for your suggestion!
+1, would love to see this in vLLM!
+1, would love to see this in vLLM too!
+1, would love to see this in vLLM, since lots of online services are built on vLLM! It would be ideal if we could easily use the self-extend trick in our online services!
Can this be used in vLLM now?
Hello!
Thank you for your great work; it's amazing how much hard work you put into this algorithm. I just have one question: is it possible to integrate this with vLLM serving?
This would really boost inference speed in limited-resource settings once you cross the 8192-token mark. Is there a way? Thank you in advance for your help!
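For reference, the core change a vLLM port would need is SelfExtend's bi-level position remapping: exact relative positions inside a neighbor window, floor-divided (grouped) positions beyond it. Below is a minimal PyTorch sketch of the merged relative-position matrix based on the paper's description; the function name and parameters are illustrative, and this is a sketch of the idea rather than vLLM-ready code (a real integration would have to apply this inside vLLM's attention kernels and paged KV cache).

```python
import torch

def self_extend_rel_pos(seq_len: int, group_size: int, neighbor_window: int) -> torch.Tensor:
    """Merged relative-position matrix per the SelfExtend paper (illustrative)."""
    q = torch.arange(seq_len).view(-1, 1)  # query positions
    k = torch.arange(seq_len).view(1, -1)  # key positions
    rel = q - k                            # standard relative positions

    # Grouped positions: floor-divide by the group size, then shift so the
    # grouped region lines up with the neighbor window at the boundary.
    shift = neighbor_window - neighbor_window // group_size
    grouped = q // group_size - k // group_size + shift

    # Keep exact positions inside the neighbor window; use grouped ones beyond it.
    return torch.where(rel < neighbor_window, rel, grouped)

# Example: with group_size=2 and neighbor_window=4, relative distances past the
# window grow by 1 only every 2 tokens, so no position exceeds the trained range.
print(self_extend_rel_pos(8, group_size=2, neighbor_window=4)[-1])
# tensor([5, 5, 4, 4, 3, 2, 1, 0])
```

Because the remapping only changes which relative positions are fed to RoPE, it needs no retraining; the open question for vLLM is wiring this into its fused attention paths rather than the math itself.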