divide by 0 error #48

Closed
bendichter opened this issue Aug 8, 2024 · 14 comments · Fixed by #53

@bendichter
Contributor

Running the test files on an M1 Mac, at step 3 (Model Creation), I get this traceback in the log:

[LOGGING STARTED AT: 2024-08-08 16-33-41]
2024-08-08 16:33:41.664 INFO --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 378 : Train Variational Autoencoder - model name: VAME
2024-08-08 16:33:41.665 INFO --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 392 : warning, a GPU was not found... proceeding with CPU (slow!)
2024-08-08 16:33:41.665 INFO --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 437 : Latent Dimensions: 30, Time window: 30, Batch Size: 256, Beta: 1, lr: 0.0005
2024-08-08 16:33:41.680 INFO --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 487 : Scheduler step size: 100, Scheduler gamma: 0.20
2024-08-08 16:33:41.680 INFO --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 493 : Start training...
2024-08-08 16:33:41.681 INFO --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 87 : Training Model: 0%| | 0/499 [00:00<?, ?epoch/s]
2024-08-08 16:33:42.637 INFO --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 87 : Training Model: 0%| | 0/499 [00:00<?, ?epoch/s]
2024-08-08 16:33:42.637 ERROR --- [Thread-150 (process_request_thread)] vame.model.rnn_vae : 566 : An error occurred: float division by zero
Traceback (most recent call last):
  File "vame/model/rnn_vae.py", line 502, in train_model
  File "vame/model/rnn_vae.py", line 351, in test
ZeroDivisionError: float division by zero
@bendichter
Contributor Author

@vinicvaz, any idea why this error might be occurring?

@vinicvaz
Contributor

Hey @bendichter
What is the size of the data you are using?
If your data is small, the error may be caused by a batch_size that is too large for it.
Can you try reducing the batch_size to see if it works?

@bendichter
Contributor Author

It was the test data with default params

@bendichter
Contributor Author

@luiztauffer can you please look into this

@vinicvaz
Contributor

@bendichter by the testing files, do you mean the raw files or the cropped ones in the tests folder of the vame repository? Can you share the link to the files you are using so I can reproduce it here?
Thanks

@bendichter
Contributor Author

@vinicvaz
Contributor

Got it. This is the cropped data, and it's quite small, so a batch_size=256 is too big. Could you please test with smaller values and let me know if it still breaks?

@luiztauffer
Collaborator

luiztauffer commented Aug 16, 2024

I can reproduce the error. It is indeed caused by using a large batch size with a small dataset.
Since this is a vame-py related error, not desktop app, I moved it here: EthoML/VAME#75
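
For anyone reading along, here is a minimal sketch of how a batch size larger than the dataset can end in a division by zero. It assumes the loader drops incomplete batches and that the test loss is averaged over the number of batches; neither assumption is confirmed against the VAME source. The function names and the sample count of 100 are hypothetical, chosen only for illustration.

```python
def count_full_batches(num_samples: int, batch_size: int) -> int:
    """Number of batches when an incomplete trailing batch is dropped (assumed behavior)."""
    return num_samples // batch_size


def mean_loss_per_batch(batch_losses: list[float]) -> float:
    """Average the per-batch losses; raises ZeroDivisionError when there are no batches."""
    return sum(batch_losses) / float(len(batch_losses))


num_samples = 100  # hypothetical size of a small, cropped test dataset
batch_size = 256   # the default reported in the log above

n_batches = count_full_batches(num_samples, batch_size)
batch_losses = [0.5] * n_batches  # empty list, because no full batch fits

try:
    mean_loss_per_batch(batch_losses)
    failed = False
except ZeroDivisionError:
    failed = True  # same class of error as in the traceback
```

With 100 samples and a batch size of 256, no full batch is produced, so the average has nothing to divide by.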

@bendichter
Contributor Author

bendichter commented Aug 16, 2024

@luiztauffer could you fix this by adjusting the batch size?

@bendichter bendichter reopened this Aug 16, 2024
@luiztauffer
Collaborator

Yes, it needs to be small. Try it with 10, for example.
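
One way to avoid picking the value by hand is to cap the batch size at the dataset size before training. This is only a sketch: `safe_batch_size` is a hypothetical helper, not part of VAME, and the sample counts are made up for illustration.

```python
def safe_batch_size(num_samples: int, requested: int) -> int:
    """Return a batch size that never exceeds the number of samples.

    Hypothetical helper for illustration; not a VAME function.
    """
    if num_samples <= 0:
        raise ValueError("dataset is empty")
    if requested <= 0:
        raise ValueError("batch size must be positive")
    return min(requested, num_samples)


# Small dataset: the requested 256 is reduced to the dataset size.
small = safe_batch_size(num_samples=100, requested=256)

# Large dataset: the requested value is kept unchanged.
large = safe_batch_size(num_samples=10_000, requested=256)
```

Capping this way guarantees at least one full batch, so a per-batch average always has a nonzero divisor.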

@luiztauffer
Collaborator

@bendichter
Contributor Author

Could you create a test config with proper settings? I understand that mechanically this is not an issue with the desktop app, but practically it is, because we aren't sufficiently communicating to a naive user how to run the app all the way through. Creating a config file that works for the test data would go a long way.

@luiztauffer
Collaborator

@bendichter maybe it's better to point to this dataset, instead? That's what people should use for testing themselves: https://ethoml.github.io/VAME/docs/getting_started/running/#1-download-the-necessary-resources
The data you're using is the one we use only for the GitHub Actions.

@bendichter
Contributor Author

OK, so far so good. The training step says it could take 6.5 hours, so I won't be able to test the whole thing for a while, but it is running now. Let's add this to the README.
