Adding a quickstart-on-Secotrec doc #190

Closed
wants to merge 1 commit into from

julia326
Member

@julia326 julia326 commented Jun 8, 2017

Not sure if this needs to live here or in a different repo. What do you guys think?

Addresses #182

@julia326 julia326 requested review from armish and smondet June 8, 2017 23:33
@smondet smondet self-assigned this Jun 9, 2017
@julia326 julia326 force-pushed the add-quickstart-doc branch from 058a453 to 3bd0410 June 10, 2017 14:43
@julia326 julia326 force-pushed the add-quickstart-doc branch from 3bd0410 to cb0c013 June 10, 2017 14:45

@armish armish left a comment

👏

Optional: since this document concerns everyone in the lab, you might want to ask them to read it once and give feedback, but it's OK if you don't want to complicate things.

@@ -0,0 +1,244 @@
# Running Epidisco With Ketrew/Secotrec

Thanks for writing this up, Julia! During the Disco -> Secotrec transition, everybody was so busy with the pipelines that we completely abandoned this documentation :)

I know we already discussed some of these points, but I'm writing them down for our own records, so apologies for the repetition.


This is an overview of the GCloud setup system in the Epidisco universe:

![Overview Diagram](https://cloud.githubusercontent.com/assets/617111/25453955/d099d7ee-2a98-11e7-8118-5222cb845c3e.png)

I think this diagram should stay only on the repo summary page (i.e., the README.md). Since this is meant to be a quick-start tutorial, screenshots and examples of things in action would work better (if you eventually want to add something to give it richer context/references).

That diagram is really great and should make people realize the huge effort put into all of this, but people coming to this document from the summary are most likely looking for more detailed/technical material (I guess?)


Optionally set your preferred zone before you get started:

```shell

I think I heard you talking to @smondet about this part, which suggests setting up a new GCE instance to be used for operational purposes. As you know, the old epidisco setup was like this, but there the discobox was responsible for running Ketrew and Coclobas (and the DB), so it had a purpose. With what you are suggesting here, the only purpose of this pgv machine is to provide a shell from which you submit tasks to the server, which seems like overkill.

Shareability of the environment is great, but our experience with the old epidisco setup was not that great, since people ssh-ing into this machine can easily mess with the shared files. Again, the old setup had a bunch of important server folders/logs/config that we shared through the Docker image in the operation room, but now the only thing you truly need is an env file describing your setup.

The rest of the files (e.g. biokepi, kc.d) are all disposable and can be recreated at any point, which is great, because keeping just the env file in the related repo is the easiest thing to do. Of course, these are personal preferences, and it all depends on how frequently you will be submitting jobs and customizing the pipeline.


# We need to open the box's firewall to let HTTPS traffic through.
gcloud compute firewall-rules create https-on-$boxname \
--allow tcp:443 \

Agree with Seb on this one (cf. our Slack discussion).

gcloud compute ssh --zone $GCLOUD_ZONE $MAIN_NFS_SERVER -- sudo zpool add -f $ZPOOL_NAME /dev/disk/by-id/google-$NEW_DISK_NAME
```
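The review above argues that an env file describing the setup is the only piece that really needs to be kept. As a minimal sketch of that idea (the file location and every value below are hypothetical placeholders; only the variable names `GCLOUD_ZONE`, `MAIN_NFS_SERVER`, and `ZPOOL_NAME` come from the excerpt), such a file could also carry the "preferred zone" setting:

```shell
#!/usr/bin/env bash
# Sketch of a per-setup env file; the path and all values are hypothetical placeholders.
set -euo pipefail

# In practice this file would live (and be versioned) in the related repo;
# a temporary file keeps the example self-contained.
env_file="$(mktemp)"

cat > "$env_file" <<'EOF'
export GCLOUD_ZONE=us-east1-c
export MAIN_NFS_SERVER=my-nfs-server
export ZPOOL_NAME=my-pool
EOF

# Load the setup into the current shell. Other local files (e.g. biokepi, kc.d)
# are disposable per the review comment, so only this file needs to be kept.
# shellcheck disable=SC1090
source "$env_file"

# Optionally make the same zone the gcloud default:
#   gcloud config set compute/zone "$GCLOUD_ZONE"
echo "zone=$GCLOUD_ZONE server=$MAIN_NFS_SERVER pool=$ZPOOL_NAME"
```

Any fresh shell (a laptop, a throwaway VM) then only needs `source path/to/file.env` before running the other commands.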

### Secotrec Pre-requisites

Again, this gives a really great walkthrough of the major components and options, but I think you should decide who the target audience for this document is. If it is public, the PGV material won't really help them; if it is internal, going into too much detail risks making the document short-lived, since these things tend to change and maintaining such a document might become a headache at some point. One alternative is to defer the explanation of specific steps to each tool's own documentation and instead focus on things the individual tools don't cover on their own (e.g. integration, customization, useful hacks...).

Sorry for the long comment, but consider the shell trick for adding new disks to an NFS server: it started out in the documentation, then moved into disco.sh since we were always using it, and now it is back in the document because disco.sh is no more.

If you end up using that NFS trick often and find it useful, maybe it is time for it to graduate from the disco world and get a shout-out in the gcloud community. Do you think it is worth adding this to the gcloudnfs tool, since they are pretty much coupled at this point?
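To make the NFS-trick discussion concrete, here is a minimal sketch of the full add-a-disk sequence. It assumes the pool lives on a GCE instance and that the new disk is attached with a device name, so that it shows up under `/dev/disk/by-id/google-<device-name>` (the path the excerpt's `zpool add` command relies on). The function name `add_nfs_disk`, the `DRY_RUN` switch, and the default size are hypothetical; only the final `zpool add` step comes from the document under review.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching the "add a disk to the NFS zpool" trick.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

add_nfs_disk() {
  local new_disk="$1" size="${2:-500GB}"   # default size is an arbitrary placeholder
  : "${GCLOUD_ZONE:?set GCLOUD_ZONE}" "${MAIN_NFS_SERVER:?set MAIN_NFS_SERVER}" "${ZPOOL_NAME:?set ZPOOL_NAME}"

  # 1. Create a new persistent disk in the same zone as the NFS server.
  run gcloud compute disks create "$new_disk" --size="$size" --zone="$GCLOUD_ZONE"

  # 2. Attach it; --device-name controls the /dev/disk/by-id/google-* name.
  run gcloud compute instances attach-disk "$MAIN_NFS_SERVER" \
    --disk="$new_disk" --device-name="$new_disk" --zone="$GCLOUD_ZONE"

  # 3. Grow the pool with the new device (the command from the excerpt).
  run gcloud compute ssh --zone "$GCLOUD_ZONE" "$MAIN_NFS_SERVER" -- \
    sudo zpool add -f "$ZPOOL_NAME" "/dev/disk/by-id/google-$new_disk"
}
```

For example, `DRY_RUN=1 add_nfs_disk extra-disk-1` prints the three commands for review before anything touches the pool. Keeping the steps in one function is also roughly what packaging the trick into a tool such as gcloudnfs would look like.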


Clone the PGV001 repo:
```shell
git clone https://github.com/hammerlab/pgv001.git

PGV is just an instance of Epidisco with some specific configuration to handle patient-specific issues, right? If so, it is still worth focusing on Epidisco and its generic setup instead of PGV, which is bound by certain restrictions and policies and is less flexible.

@smondet can correct me if I am wrong, but I feel like most of the configuration options and specific setup choices are there because our internal clusters and data management utilities don't leave room for alternatives.

But all of this just helps show that a well-thought-out infrastructure allows you to easily resolve issues that might otherwise have been a huge deal. And this kind of end-to-end explanation, with all the flexibility it offers, could be of interest to some people (a blog post, maybe?).

# Find the container ID for coclotest_kserver_1, e.g. 123:
sudo docker exec -i -t 123 /bin/bash
```
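The manual container-ID lookup in the excerpt can be scripted with the name filter of `docker ps`. A minimal sketch, assuming the container is still named `coclotest_kserver_1` as in the excerpt and that `docker` needs `sudo` on that host; the function name `kserver_shell` and the `DOCKER` override variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: open an interactive shell in the coclotest_kserver_1 container.
set -euo pipefail

kserver_shell() {
  local name="${1:-coclotest_kserver_1}"
  local docker_cmd="${DOCKER:-sudo docker}"   # override, e.g. for rootless Docker or testing
  local cid
  # -q prints only container IDs; the name filter can match more than one
  # container (it is not an exact match), so take the first hit.
  cid="$($docker_cmd ps -q --filter "name=$name" | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no running container matching '$name'" >&2
    return 1
  fi
  # $docker_cmd is intentionally unquoted so "sudo docker" splits into two words.
  $docker_cmd exec -i -t "$cid" /bin/bash
}
```

Calling `kserver_shell` with no argument targets the container from the excerpt; passing another name reuses the same lookup for other containers.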


Thanks again for the write-up, Julia!

But I do have a specific request if you would like to keep this the way it is: adding a case-study section about "going from idly browsing web pages on a vanilla laptop to managing Kubernetes clusters of tens of machines to answer research questions" would be really helpful. It would show what this whole thing is about and illustrate the most basic but common use case.

@smondet

smondet commented Aug 17, 2017

This should keep going in https://github.com/hammerlab/wobidisco.

hammerlab/wobidisco#6

@smondet smondet closed this Aug 17, 2017

3 participants