running a model other than CoPilot crashes #5
Comments
Same issue here.
Seems like there should be another server running on port 8000, but there isn't.
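A quick way to confirm that is to probe the port directly. This is just a standard-library sketch, not part of localpilot:

# Minimal sketch (not project code): check whether anything is listening
# on localhost:8000 before blaming the proxy.
import socket

def port_open(host="127.0.0.1", port=8000):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    print("port 8000 open:", port_open())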
Okay, I tried running the model call directly, which should open port 8000:

python3 -m llama_cpp.server --model '/Users/me/models/codellama-7b.Q5_K_S.gguf' --n_gpu_layers 1 --n_ctx 4096
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: mem required = 4435.68 MB
..................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 2048.00 MB
llama_new_context_with_model: ggml_metal_init() failed
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/surak/Devel/localpilot/venv/lib/python3.11/site-packages/llama_cpp/server/__main__.py", line 96, in <module>
app = create_app(settings=settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/surak/Devel/localpilot/venv/lib/python3.11/site-packages/llama_cpp/server/app.py", line 343, in create_app
llama = llama_cpp.Llama(
^^^^^^^^^^^^^^^^
File "/Users/surak/Devel/localpilot/venv/lib/python3.11/site-packages/llama_cpp/llama.py", line 377, in __init__
assert self.ctx is not None
^^^^^^^^^^^^^^^^^^^^
AssertionError

So in my case the error comes from llama.cpp itself on my machine.
The way to see this is to stop hiding the output of the llama.cpp subprocess call. Just edit proxy.py and change this:

diff --git a/proxy.py b/proxy.py
index 09d488c..177989b 100644
--- a/proxy.py
+++ b/proxy.py
@@ -22,7 +22,7 @@ def start_local_server(model_filename):
"--n_gpu_layers", "1", "--n_ctx", "4096"] # TODO: set this more correctly
logging.debug('Running: %s' % ' '.join(cmd))
local_server_process = subprocess.Popen(
- cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+ cmd)#, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
@app.route('/set_target', methods=['POST'])
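If you would rather keep capturing the output instead of letting it go straight to the terminal, here is a minimal sketch (illustrative only, not localpilot's actual code) that forwards the child process's output into the Python logger:

# Sketch: relay the llama.cpp server's combined stdout/stderr to logging
# instead of discarding it in a PIPE nobody reads.
import logging
import subprocess
import threading

def start_and_log(cmd):
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

    def pump():
        for line in proc.stdout:
            logging.info("llama.cpp: %s", line.rstrip())

    threading.Thread(target=pump, daemon=True).start()
    return proc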
In my case (Intel MacBook Pro), the error is that there is no GPU for Metal to use, so removing --n_gpu_layers from the llama.cpp call seems to work:

diff --git a/proxy.py b/proxy.py
index 09d488c..1e39b98 100644
--- a/proxy.py
+++ b/proxy.py
@@ -19,7 +19,7 @@ def start_local_server(model_filename):
local_server_process.terminate()
local_server_process.wait()
cmd = ["python3", "-m", "llama_cpp.server", "--model", model_filename,
- "--n_gpu_layers", "1", "--n_ctx", "4096"] # TODO: set this more correctly
+ "--n_ctx", "4096"] # TODO: set this more correctly
logging.debug('Running: %s' % ' '.join(cmd))
local_server_process = subprocess.Popen(
cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
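A less drastic option is to build the command conditionally. This is only a sketch, under the assumption that GPU offload is wanted only on Apple Silicon Macs:

# Sketch (assumption: enable Metal offload only on Apple Silicon).
import platform

def build_server_cmd(model_filename):
    cmd = ["python3", "-m", "llama_cpp.server",
           "--model", model_filename, "--n_ctx", "4096"]
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        # Offload one layer to Metal; Intel Macs skip this and run CPU-only.
        cmd += ["--n_gpu_layers", "1"]
    return cmd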
Another way of testing your local llama.cpp server is simply to open http://localhost:8000/docs in a browser.
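The same check can be scripted. A minimal standard-library sketch that just confirms the docs page answers:

# Sketch: confirm the local llama.cpp server responds on port 8000.
from urllib.request import urlopen

with urlopen("http://localhost:8000/docs", timeout=5) as resp:
    print("HTTP", resp.status)  # expect 200 if the server is up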