Hi,
I'm trying to use your wonderful framework for inference only. However, I'm not familiar with the serving-related settings in your code. How can I remove them, or what small code change would be needed?
By the way, after dumping the HLO graph, I found that the datatype is still fp32 even though I changed the datatype option.
I'm not sure I understand which part you want to remove. The serving script basically implements the inference methods defined in the LMServer class. If you don't want to use the HTTP server, you can easily modify llama_serve.py to call those methods directly without spinning up an HTTP server.
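For reference, here is a minimal self-contained sketch of that pattern. The tiny `generate` function below is a stand-in for the real inference methods in llama_serve.py, not the actual EasyLM API; the point is only that you call the method directly instead of wrapping it in an LMServer:

```python
import jax
import jax.numpy as jnp

def generate(params, x):
    # Stand-in "inference method": one dense layer plus greedy decoding.
    # In practice this would be the method the serving script registers
    # with the LMServer.
    logits = x @ params["w"]
    return jnp.argmax(logits, axis=-1)

if __name__ == "__main__":
    params = {"w": jax.random.normal(jax.random.PRNGKey(0), (8, 16))}
    batch = jax.random.normal(jax.random.PRNGKey(1), (2, 8))
    # Call the inference method directly -- no LMServer, no HTTP server.
    print(generate(params, batch))
```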
By the way, how do I change the datatype of the whole model? As I said before, after setting the option --dtype='fp16', I still see some gemm ops running in fp32.
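For reference, a minimal JAX sketch (not EasyLM-specific) of casting a parameter tree to fp16 and then inspecting the op dtypes. One way fp32 gemms can reappear is dtype promotion: if the activations are still fp32, a matmul against fp16 weights is promoted back to fp32 even after the weights are cast:

```python
import jax
import jax.numpy as jnp

# Toy parameter tree standing in for real model weights.
params = {
    "dense": {"kernel": jnp.ones((4, 4), dtype=jnp.float32),
              "bias": jnp.zeros((4,), dtype=jnp.float32)},
}

# Cast every floating-point leaf to fp16.
params_fp16 = jax.tree_util.tree_map(
    lambda x: x.astype(jnp.float16)
    if jnp.issubdtype(x.dtype, jnp.floating) else x,
    params,
)

def forward(p, x):
    # If x were fp32 here, the matmul would be promoted back to fp32,
    # which is one way fp32 gemms survive a weights-only cast.
    return x @ p["dense"]["kernel"] + p["dense"]["bias"]

x16 = jnp.ones((2, 4), dtype=jnp.float16)
# Inspect the traced ops and their dtypes (cheaper than a full HLO dump).
print(jax.make_jaxpr(forward)(params_fp16, x16))
```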