Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: add Dockerfile and .yml for building conda env #528

Merged
merged 13 commits into from
Dec 21, 2023

Conversation

gcroci2
Copy link
Collaborator

@gcroci2 gcroci2 commented Nov 27, 2023

The Dockerfile proposed here downloads the executable file from dssp repo, and installs it in the container context. It also uses a yml file (together with a txt file) for installing conda and pip dependencies, respectively.

@DaniBodor for reviewing this PR you can follow the new docker instructions and the new installation instructions.

I opened a separate issue for creating an action for testing the Dockerfile #529.

When this PR is going to be merged, remember to update the JOSS' reviewers (openjournals/joss-reviews#5983)

@gcroci2 gcroci2 self-assigned this Nov 27, 2023
@gcroci2 gcroci2 linked an issue Nov 27, 2023 that may be closed by this pull request
@coveralls
Copy link

coveralls commented Nov 27, 2023

Pull Request Test Coverage Report for Build 7277606585

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 0.0%

Totals Coverage Status
Change from base Build 7277589910: 0.0%
Covered Lines: 0
Relevant Lines: 3682

💛 - Coveralls

Copy link
Collaborator

@dsmits dsmits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, couple of remarks

Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Dockerfile Outdated
WORKDIR /home/deeprank2

# Define default command
CMD ["bash"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CMD ["bash"]
CMD ["/opt/conda/envs/deeprank2/bin/jupyter", "notebook"]

Copy link
Collaborator Author

@gcroci2 gcroci2 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't working (no jupyter server was started)
I used the following instead:
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--NotebookApp.token=''","--NotebookApp.password=''", "--allow-root"]
Without disabling the password, the browser was asking for the token (and the token from the terminal wasn't working).
Anyway, I'd avoid using this if it's unsafe and there are no safe ways to disable the token request. The "bash" version seems easier to me from a user perspective in that case. @dsmits

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge soon the PR, but will leave this conversation unsolved so you can comment when you come back :) @dsmits

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
docs/installation.md Outdated Show resolved Hide resolved
tutorials/data_generation_ppi.ipynb Show resolved Hide resolved
Copy link
Collaborator

@DaniBodor DaniBodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done reviewing, but will already release these comments, as we will need some changes already. I will continue from here with a new review

README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@DaniBodor DaniBodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the rest of my review.

I also tested installing the env from yml file, as that is new too, and works fine (ran all tests and they passed)

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
docs/installation.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
@gcroci2 gcroci2 requested review from DaniBodor and dsmits November 28, 2023 22:14
Copy link
Collaborator

@DaniBodor DaniBodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting an error in the trainer notebook if I run it now (in the cell where GNN is tested). I get the same error if I run it locally from this branch and the latest env, but not from my working repo/env.

UnboundLocalError Traceback (most recent call last)
Cell In[9], line 7
4 earlystop_maxgap = 0.1
5 min_epoch = 10
----> 7 trainer.train(
8 nepoch = epochs,
9 batch_size = batch_size,
10 earlystop_patience = earlystop_patience,
11 earlystop_maxgap = earlystop_maxgap,
12 min_epoch = min_epoch,
13 validate = True,
14 filename = os.path.join(output_path, f"gnn_{task}", "model.pth.tar"))
16 epoch = trainer.epoch_saved_model
17 print(f"Model saved at epoch {epoch}")

File /opt/conda/envs/deeprank2/lib/python3.10/site-packages/deeprank2/trainer.py:640, in Trainer.train(self, nepoch, batch_size, shuffle, earlystop_patience, earlystop_maxgap, min_epoch, validate, num_workers, best_model, filename)
638 # Now that the training loop is over, save the model
639 if filename:
--> 640 torch.save(checkpoint_model, filename)
641 self.opt_loaded_state_dict = checkpoint_model["optimizer_state"]
642 self.model_load_state_dict = checkpoint_model["model_state"]

UnboundLocalError: local variable 'checkpoint_model' referenced before assignment

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
tutorials/data_generation_ppi.ipynb Outdated Show resolved Hide resolved
tutorials/data_generation_srv.ipynb Outdated Show resolved Hide resolved
Dockerfile Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
@gcroci2
Copy link
Collaborator Author

gcroci2 commented Dec 12, 2023

I'm getting an error in the trainer notebook if I run it now (in the cell where GNN is tested). I get the same error if I run it locally from this branch and the latest env, but not from my working repo/env.

UnboundLocalError Traceback (most recent call last)
Cell In[9], line 7
4 earlystop_maxgap = 0.1
5 min_epoch = 10
----> 7 trainer.train(
8 nepoch = epochs,
9 batch_size = batch_size,
10 earlystop_patience = earlystop_patience,
11 earlystop_maxgap = earlystop_maxgap,
12 min_epoch = min_epoch,
13 validate = True,
14 filename = os.path.join(output_path, f"gnn_{task}", "model.pth.tar"))
16 epoch = trainer.epoch_saved_model
17 print(f"Model saved at epoch {epoch}")
File /opt/conda/envs/deeprank2/lib/python3.10/site-packages/deeprank2/trainer.py:640, in Trainer.train(self, nepoch, batch_size, shuffle, earlystop_patience, earlystop_maxgap, min_epoch, validate, num_workers, best_model, filename)
638 # Now that the training loop is over, save the model
639 if filename:
--> 640 torch.save(checkpoint_model, filename)
641 self.opt_loaded_state_dict = checkpoint_model["optimizer_state"]
642 self.model_load_state_dict = checkpoint_model["model_state"]
UnboundLocalError: local variable 'checkpoint_model' referenced before assignment

The issue is that if the loss is nan (such as in this case using very few datapoints), checkpoint_model is never created, thus the error. It has been fixed in #535

@gcroci2 gcroci2 merged commit 5cde0bb into main Dec 21, 2023
6 of 7 checks passed
@gcroci2 gcroci2 deleted the 527_add_dockerfile_gcroci2 branch December 21, 2023 10:51
@gcroci2 gcroci2 mentioned this pull request Jan 4, 2024
@gcroci2 gcroci2 added the JOSS label Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a Dockerfile for trying out the package
5 participants