<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Christian White, Chris Paciorek, and Clint Hamilton" />
<title>Savio introductory training: Basic usage of the Berkeley Savio high-performance computing cluster</title>
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
.display.math{display: block; text-align: center; margin: 0.5rem auto;}
</style>
<link rel="stylesheet" type="text/css" media="screen, projection, print"
href="https://www.w3.org/Talks/Tools/Slidy2/styles/slidy.css" />
<script src="https://www.w3.org/Talks/Tools/Slidy2/scripts/slidy.js"
charset="utf-8" type="text/javascript"></script>
</head>
<body>
<div class="slide titlepage">
<h1 class="title">Savio introductory training: Basic usage of the
Berkeley Savio high-performance computing cluster</h1>
<p class="author">Christian White, Chris Paciorek, and Clint
Hamilton</p>
<p class="date">
October 14, 2022
</p>
</div>
<div id="upcoming-events-and-hiring" class="slide section level1">
<h1>Upcoming events and hiring</h1>
<ul>
<li><p><a href="https://www.meetup.com/ucberkeley_cloudmeetup/">Cloud
Computing Meetup</a> (monthly)</p></li>
<li><p>We offer platforms and services for researchers working with <a
href="https://docs-research-it.berkeley.edu/services/srdc/">sensitive
data</a></p></li>
<li><p>Get paid to develop your skills in research data and computing!
Berkeley Research Computing is hiring several graduate student Domain
Consultants for flexible appointments, 10% to 25% effort (4-10
hours/week). Email your cover letter and CV to:
research-it@berkeley.edu.</p></li>
</ul>
</div>
<div id="introduction" class="slide section level1">
<h1>Introduction</h1>
<p>We’ll do this mostly as a demonstration. We encourage you to log in to
your account and try out the various examples yourself as we go through
them.</p>
<p>Much of this material is based on the extensive Savio documentation we
have prepared and continue to prepare, available at <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/">https://docs-research-it.berkeley.edu/services/high-performance-computing/</a>.</p>
<p>The materials for this tutorial are available using git at the short
URL (<a href="https://tinyurl.com/brc-oct22">tinyurl.com/brc-oct22</a>),
the GitHub URL (<a
href="https://github.com/ucb-rit/savio-training-intro-fall-2022">https://github.com/ucb-rit/savio-training-intro-fall-2022</a>),
or simply as a <a
href="https://github.com/ucb-rit/savio-training-intro-fall-2022/archive/main.zip">zip
file</a>.</p>
</div>
<div id="outline" class="slide section level1">
<h1>Outline</h1>
<p>This training session will cover the following topics:</p>
<ul>
<li>Introductory content
<ul>
<li>Basic parallel computing concepts</li>
<li>High-level overview of the system</li>
</ul></li>
<li>System capabilities and hardware
<ul>
<li>Getting access to the system - FCA, condo, ICA</li>
<li>Login nodes, compute nodes, and DTN nodes</li>
<li>Savio computing nodes</li>
<li>Disk space options (home, scratch, group, condo storage)</li>
</ul></li>
<li>Logging in, data transfer, and software
<ul>
<li>Logging in</li>
<li>Data transfer
<ul>
<li>SCP/SFTP</li>
<li>Globus</li>
<li>Box &amp; bDrive (Google Drive)</li>
</ul></li>
<li>Software modules</li>
</ul></li>
<li>Submitting and monitoring jobs
<ul>
<li>Accounts and partitions</li>
<li>Basic job submission</li>
<li>Parallel jobs</li>
<li>Interactive jobs</li>
<li>Low-priority queue</li>
<li>HTC jobs</li>
<li>Monitoring jobs and cluster status</li>
</ul></li>
<li>Basic use of standard software: Python
<ul>
<li>Jupyter notebooks using OOD</li>
<li>Parallelization in Python with ipyparallel</li>
<li>Dask for parallelization in Python</li>
</ul></li>
<li>More information
<ul>
<li>How to get additional help</li>
<li>Upcoming events</li>
</ul></li>
</ul>
</div>
<div id="basic-parallel-computing-concepts"
class="slide section level1">
<h1>Basic Parallel Computing Concepts</h1>
<ul>
<li>What is Savio?
<ul>
<li>In layman’s terms:
<ul>
<li>A collection of really powerful computers (nodes)</li>
<li>Some really big, fast hard drives</li>
</ul></li>
</ul></li>
<li>Two types of parallel computing
<ul>
<li>Shared memory (e.g., OpenMP)
<ul>
<li>All computation on the same node</li>
<li>Can have shared objects in RAM in some cases</li>
</ul></li>
<li>Distributed memory (e.g., MPI)
<ul>
<li>Computation on multiple nodes</li>
<li>Special attention to passing information between nodes</li>
</ul></li>
</ul></li>
</ul>
</div>
<div id="getting-access-to-the-system---fca-and-condo"
class="slide section level1">
<h1>Getting access to the system - FCA and condo</h1>
<ul>
<li>All regular Berkeley faculty can request 300,000 service units
(roughly core-hours) per year through the <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/getting-account/faculty-computing-allowance/">Faculty
Computing Allowance (FCA)</a></li>
<li>Researchers can also purchase nodes for their own priority access
and gain access to the shared Savio infrastructure and to the ability to
<em>burst</em> to additional nodes through the <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/condos/condo-cluster-service/">condo
cluster program</a></li>
<li>Instructors can request an <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/getting-account/instructional-computing-allowance/">Instructional
Computing Allowance (ICA)</a>.</li>
<li>The application process has gotten even easier with the introduction
of <a href="https://mybrc.brc.berkeley.edu/">MyBRC</a>, the Berkeley
Research Computing Access Management System</li>
<li>Please bear in mind that applications have to be manually reviewed
before they can be approved.</li>
</ul>
<p>Faculty/principal investigators can allow researchers working with
them to get user accounts with access to the FCA or condo resources
available to the faculty member.</p>
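<p>As a rough illustration of how far an allowance goes, the sketch below
estimates what a single job would charge. It assumes a rate of 1 service unit
per core per hour, which is only an approximation: the actual rate depends on
the partition. The core count and run time are hypothetical values chosen for
the example.</p>
<pre><code>#!/usr/bin/env bash
# Assumption: 1 service unit (SU) per core-hour; real rates vary by partition.
allowance=300000   # SUs per year under an FCA (figure from this slide)
cores=20           # hypothetical number of cores used by the job
hours=48           # hypothetical run time of the job, in hours

job_cost=$(( cores * hours ))               # SUs charged for one such job
jobs_per_year=$(( allowance / job_cost ))   # whole jobs the allowance covers

echo "One job: ${job_cost} SUs; jobs per allowance: ${jobs_per_year}"</code></pre>
<p>Under these assumptions one job costs 960 SUs, so the allowance covers
about 312 such jobs per year.</p>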
</div>
<div id="system-capabilities-and-hardware" class="slide section level1">
<h1>System capabilities and hardware</h1>
<ul>
<li>Savio is a >600-node, >15,000-core Linux cluster rated at
nearly 540 peak teraFLOPS.
<ul>
<li>about 40% of the compute nodes provided by the institution for
general access</li>
<li>about 60% compute nodes contributed by researchers in the Condo
program</li>
</ul></li>
</ul>
</div>
<div id="the-savio-cluster" class="slide section level1">
<h1>The Savio cluster</h1>
<p>Savio is a Linux cluster - by cluster we mean a set of computers
networked together</p>
<ul>
<li>Savio has 3 kinds of nodes:
<ul>
<li>Login nodes</li>
<li>Data transfer nodes</li>
<li>Compute nodes</li>
</ul></li>
</ul>
</div>
<div id="login-nodes" class="slide section level1">
<h1>Login nodes</h1>
<ul>
<li>Login nodes
<ul>
<li>Used to access the system when logging in</li>
<li>For login and non-intensive interactive work such as:
<ul>
<li>job submission and monitoring</li>
<li>basic compilation</li>
<li>managing your disk space</li>
</ul></li>
</ul></li>
</ul>
</div>
<div id="data-transfer-nodes" class="slide section level1">
<h1>Data transfer nodes</h1>
<ul>
<li>Data transfer nodes
<ul>
<li>For transferring data to/from Savio</li>
<li>This is a notable difference from many other clusters
<ul>
<li>Login node: <code>hpc.brc.berkeley.edu</code></li>
<li>Data transfer node: <code>dtn.brc.berkeley.edu</code></li>
<li>Some applications may look for SFTP via login node</li>
</ul></li>
</ul></li>
<li>Note: you can access your files on the system from any of the
computers</li>
</ul>
</div>
<div id="compute-nodes" class="slide section level1">
<h1>Compute nodes</h1>
<ul>
<li>Compute nodes
<ul>
<li>For computational tasks</li>
<li>Your work might use parallelization to do computation on more than
one CPU</li>
<li>You can also run “serial” jobs that use a single CPU</li>
</ul></li>
</ul>
</div>
<div id="savio-computing-node-types" class="slide section level1">
<h1>Savio computing node types</h1>
<ul>
<li><p>There are multiple types of computation nodes with different
hardware specifications <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/hardware-config/">(see
the <em>Hardware Configuration</em> page)</a>.</p></li>
<li><p>The nodes are divided into several pools, called
<em>partitions</em></p></li>
<li><p>These partitions have different restrictions and costs associated
with them</p>
<ul>
<li><a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/scheduler-config/">see
the <em>Scheduler Configuration</em> page</a></li>
<li>and <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/service-units-savio/">the
associated costs in Service Units</a></li>
</ul></li>
<li><p>Any job you submit must be submitted to a partition to which you
have access.</p></li>
</ul>
</div>
<div id="conceptual-diagram-of-savio" class="slide section level1">
<h1>Conceptual diagram of Savio</h1>
<center>
<img src="savio_diagram.jpeg" alt="Conceptual diagram of Savio" />
</center>
</div>
<div id="disk-space-options-home-scratch-group-condo-storage"
class="slide section level1">
<h1>Disk space options (home, scratch, group, condo storage)</h1>
<ul>
<li>You have access to multiple kinds of disk space, described <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/storing-data/">here
in the <em>Storing Data</em> page</a>.</li>
<li>There are 3 directories:
<ul>
<li><code>/global/home/users/SAVIO_USERNAME</code>
<ul>
<li>10 GB per user, backed up</li>
</ul></li>
<li><code>/global/home/groups/SAVIO_GROUPNAME</code>
<ul>
<li>Per group: 30 GB for FCA, 200 GB for Condo</li>
</ul></li>
<li><code>/global/scratch/users/SAVIO_USERNAME</code>
<ul>
<li>Connected via Infiniband (very fast)</li>
<li>Primary data storage during computation</li>
</ul></li>
</ul></li>
<li>All 3 are available from any of the nodes and changes to files on
one node will be seen on all the other nodes</li>
<li>Large amounts of disk space are available for purchase from the <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/condos/condo-storage-service/"><em>condo
storage</em> offering</a>.
<ul>
<li>The minimum purchase is about $5,750, which provides roughly 112 TB
for five years.</li>
</ul></li>
</ul>
</div>
<div id="using-disk-space" class="slide section level1">
<h1>Using disk space</h1>
<ul>
<li>When reading/writing data to/from disk, put the data in your scratch
space at <code>/global/scratch/users/SAVIO_USERNAME</code></li>
<li>The system is set up so that disk access for all users is optimized
when users are doing input/output (I/O) off of scratch rather than off
of their home directories</li>
<li>Doing I/O with files on your home directory can impact the ability
of others to access their files on the filesystem</li>
</ul>
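<p>A minimal sketch of working from scratch rather than home. The username is
a placeholder, and the <code>du</code> call is guarded so that the snippet
also runs on a machine where the directory does not exist.</p>
<pre><code>#!/usr/bin/env bash
# SAVIO_USERNAME is a placeholder; substitute your own account name.
savio_user="SAVIO_USERNAME"
scratch_dir="/global/scratch/users/${savio_user}"
home_dir="/global/home/users/${savio_user}"

echo "Read and write job data under: ${scratch_dir}"
echo "Keep small, backed-up files under: ${home_dir}"

# Report how much space the scratch directory uses, if it is present.
if [ -d "${scratch_dir}" ]; then
    du -sh "${scratch_dir}"
fi</code></pre>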
</div>
<div id="sensitive-data-on-savio" class="slide section level1">
<h1>Sensitive Data on Savio</h1>
<ul>
<li>Savio (and AEoD) is <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/getting-account/sensitive-accounts/">certified
for moderately sensitive data</a>
<ul>
<li>P2, P3 (formerly PL1) and NIH dbGaP (non-“notice-triggering”
data).</li>
</ul></li>
<li>PIs/faculty must request a P2/P3 project alongside requests for a
new FCA/condo allocation
<ul>
<li>Existing projects can’t be converted to P2/P3 projects.</li>
</ul></li>
<li>BRC has a new platform for highly sensitive data (P4) called
SRDC.</li>
</ul>
<p>More info is available in <a
href="https://docs-research-it.berkeley.edu/services/srdc/">our
documentation</a> or on <a
href="https://research-it.berkeley.edu/services-projects/secure-research-data-computing">our
website</a>.</p>
</div>
<div id="logging-in-getting-set-up" class="slide section level1">
<h1>Logging in: Getting Set Up</h1>
<ul>
<li>To log in, you need software on your own machine that gives
you access to the SSH command
<ul>
<li>This comes built in on a Mac (see
<code>Applications -> Utilities -> Terminal</code>).</li>
<li>For Windows, you can use PowerShell (or Command Prompt)
<ul>
<li>Other applications such as <a
href="https://mobaxterm.mobatek.net/">MobaXterm</a> may offer more
functionality</li>
</ul></li>
</ul></li>
<li>You also need to set up your smartphone or tablet with <em>Google
Authenticator</em> to generate one-time passwords for you.</li>
<li>Here are instructions for <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/logging-brc-clusters/">doing
this setup, and for logging in</a>.</li>
</ul>
</div>
<div id="logging-in" class="slide section level1">
<h1>Logging in</h1>
<p>Then, to log in:</p>
<pre><code>ssh SAVIO_USERNAME@hpc.brc.berkeley.edu</code></pre>
<ul>
<li><p>Then enter XXXXXXYYYYYY, where XXXXXX is your PIN and YYYYYY is the
one-time password. YYYYYY will be shown when you open the <em>Google
Authenticator</em> app on your phone/tablet.</p></li>
<li><p>One can then navigate around and get information using standard
UNIX commands such as <code>ls</code>, <code>cd</code>, <code>du</code>,
<code>df</code>, etc.</p>
<ul>
<li>There is a lot of material online about using the UNIX command line
<ul>
<li>Also called the shell; ‘bash’ is one common variation</li>
</ul></li>
<li>Here is one <a
href="https://swcarpentry.github.io/shell-novice">basic tutorial from
Software Carpentry</a> and <a
href="https://berkeley-scf.github.io/tutorial-unix-basics">another one
from the Berkeley Statistical Computing Facility</a>.</li>
</ul></li>
</ul>
</div>
<div id="graphical-interface" class="slide section level1">
<h1>Graphical Interface</h1>
<p>If you want to be able to open programs with graphical user
interfaces:</p>
<pre><code>ssh -Y SAVIO_USERNAME@hpc.brc.berkeley.edu</code></pre>
<ul>
<li>To display the graphical windows on your local machine, you’ll need
X server software on your own machine to manage the graphical windows
<ul>
<li>For Windows, your options include <em>MobaXterm</em>,
<em>eXceed</em>, or <em>Xming</em></li>
<li>For Mac, there is <em>XQuartz</em></li>
</ul></li>
</ul>
</div>
<div id="editing-files" class="slide section level1">
<h1>Editing files</h1>
<p>You are welcome to edit your files on Savio (rather than copying
files back and forth from your laptop and editing them on your laptop).
To do so you’ll need to use some sort of editor. Savio has
<code>vim</code>, <code>emacs</code>, and <code>nano</code> installed.
Just start the editor from a login node.</p>
<pre><code>## To use vim
vim myfile.txt
## To use emacs
emacs myfile.txt
## To use nano
module load nano
nano myfile.txt</code></pre>
</div>
<div id="data-transfer-with-examples-tofrom-laptop-box-google-drive-aws"
class="slide section level1">
<h1>Data transfer with examples to/from laptop, Box, Google Drive,
AWS</h1>
<p>To do any work on the system, you’ll usually need to transfer files
(data files, code files, etc.) to the Savio filesystem, either into your
home directory, your scratch directory or a group directory.</p>
<p>And once you’re done with a computation, you’ll generally need to
transfer files back to another place (e.g., your laptop).</p>
<p>Let’s see how we would transfer files/data to/from Savio using a few
different approaches.</p>
</div>
<div id="data-transfer-for-smaller-files-scp"
class="slide section level1">
<h1>Data transfer for smaller files: SCP</h1>
<ul>
<li><p>The most common command line protocol for file transfer is
<em>SCP</em></p></li>
<li><p>You need to use the Savio data transfer node,
<code>dtn.brc.berkeley.edu</code>.</p></li>
<li><p>The example file <code>bayArea.csv</code> is too large to store
on GitHub; you can obtain it <a
href="https://www.stat.berkeley.edu/share/paciorek/bayArea.csv">here</a>.</p></li>
<li><p>SCP is available in the Terminal on Mac/Linux and in
PowerShell/Command Prompt on Windows.</p></li>
</ul>
<div class="sourceCode" id="cb4"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># to Savio, while on your local machine</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">scp</span> bayArea.csv caw87@dtn.brc.berkeley.edu:~/.</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="fu">scp</span> bayArea.csv caw87@dtn.brc.berkeley.edu:~/data/newName.csv</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="fu">scp</span> bayArea.csv caw87@dtn.brc.berkeley.edu:/global/scratch/users/caw87/.</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="co"># from Savio, while on your local machine</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="fu">scp</span> caw87@dtn.brc.berkeley.edu:~/data/newName.csv ~/Documents/.</span></code></pre></div>
<p>If you can ssh to your local machine, or want to transfer files to
other systems that you can ssh to, you can log in to the DTN and
run the scp commands from there:</p>
<pre><code>ssh SAVIO_USERNAME@dtn.brc.berkeley.edu
[SAVIO_USERNAME@dtn ~]$ scp ~/file.csv OTHER_USERNAME@other.domain.edu:~/data/.</code></pre>
<p>If you’re already connected to a Savio login node, you can use
<code>ssh dtn</code> to log in to the DTN.</p>
<p>Pro tip: You can package multiple files (including directory
structure) together using <code>tar</code>:</p>
<pre><code>tar -cvzf files.tgz dir_to_zip
# to untar later:
tar -xvzf files.tgz</code></pre>
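<p>To check an archive before and after copying it, you can list its contents
and unpack it into a separate directory. The sketch below runs entirely in a
temporary directory; the file names and contents are made up for
illustration.</p>
<pre><code>#!/usr/bin/env bash
set -e
workdir=$(mktemp -d)            # temporary sandbox for the example
cd "${workdir}"

# Build a small example directory tree (hypothetical names and contents).
mkdir -p dir_to_zip/sub
printf 'x,y\n1,2\n' > dir_to_zip/data.csv
printf 'notes\n' > dir_to_zip/sub/readme.txt

tar -czf files.tgz dir_to_zip   # create the compressed archive
tar -tzf files.tgz              # list the contents without extracting

# Unpack into a separate directory and compare with the original tree.
mkdir restored
tar -xzf files.tgz -C restored
diff -r dir_to_zip restored/dir_to_zip
echo "archive round trip OK"</code></pre>
<p>The <code>-C</code> option tells <code>tar</code> where to unpack, which
avoids overwriting files in the current directory.</p>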
</div>
<div id="data-transfer-for-smaller-files-sftp"
class="slide section level1">
<h1>Data transfer for smaller files: SFTP</h1>
<ul>
<li>Another common method for file transfer is <em>SFTP</em></li>
<li>A multi-platform program for doing transfers via SFTP is <a
href="https://filezilla-project.org/">FileZilla</a>.</li>
<li>After logging in to most <em>SFTP</em> applications, you’ll see
windows for the Savio filesystem and your local filesystem on your
machine. You can drag files back and forth.</li>
</ul>
</div>
<div id="data-transfer-for-larger-files-globus-intro"
class="slide section level1">
<h1>Data transfer for larger files: Globus, Intro</h1>
<ul>
<li>You can use Globus Connect to transfer data to/from Savio (and
between other resources) quickly and unattended
<ul>
<li>This is a better choice for large transfers</li>
<li>Here are some <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/transferring-data/using-globus-connect-savio/">instructions</a>.</li>
</ul></li>
<li>Globus transfers data between <em>endpoints</em>
<ul>
<li>Possible endpoints include
<ul>
<li>Savio</li>
<li>your laptop or desktop</li>
<li>Other clusters like NERSC and XSEDE</li>
<li>bDrive</li>
<li>Collaborators &amp; other researchers not on Savio</li>
</ul></li>
</ul></li>
</ul>
</div>
<div id="data-transfer-for-larger-files-globus-requirements"
class="slide section level1">
<h1>Data transfer for larger files: Globus, requirements</h1>
<ul>
<li>If you are transferring to/from your laptop, you’ll need
<ol style="list-style-type: decimal">
<li>Globus Connect Personal set up,</li>
<li>your machine established as an endpoint, and</li>
<li>Globus Connect Personal actively running on your machine. At that
point you can proceed as below.</li>
</ol></li>
<li>Savio’s endpoint is named <code>ucb#brc</code>.</li>
</ul>
</div>
<div id="data-transfer-for-larger-files-globus-setup"
class="slide section level1">
<h1>Data transfer for larger files: Globus, Setup</h1>
<ul>
<li>To transfer files, you open Globus at <a
href="https://globus.org">globus.org</a> and authenticate to the
endpoints you want to transfer between.
<ul>
<li>This means that you only need to authenticate once, whereas you
might need to authenticate multiple times with scp and sftp.</li>
<li>You can then start a transfer and it will proceed in the background,
including restarting if interrupted.</li>
</ul></li>
<li>Globus also provides a <a
href="https://docs.globus.org/cli/">command line interface</a> that will
allow you to do transfers programmatically
<ul>
<li>Thus a transfer could be embedded in a workflow script.</li>
</ul></li>
</ul>
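<p>As a hedged sketch of what such a script might contain, the function below
assembles a <code>globus transfer</code> command and prints it rather than
running it. The endpoint IDs, paths, and label are placeholders; real endpoint
IDs can be looked up with <code>globus endpoint search</code>, and you would
need to run <code>globus login</code> before any real transfer.</p>
<pre><code>#!/usr/bin/env bash
# Placeholders: replace with real endpoint UUIDs and paths.
src_endpoint="SOURCE_ENDPOINT_ID"
dst_endpoint="DEST_ENDPOINT_ID"
src_path="/global/scratch/users/SAVIO_USERNAME/results/"
dst_path="/path/on/destination/"

# Print the command for a recursive directory transfer (dry run only).
build_transfer_cmd() {
    printf 'globus transfer --recursive --label %s %s:%s %s:%s\n' \
        "savio-results" "${src_endpoint}" "${src_path}" \
        "${dst_endpoint}" "${dst_path}"
}

transfer_cmd=$(build_transfer_cmd)
echo "${transfer_cmd}"</code></pre>
<p>Once the printed command looks right, it can be run directly. Globus then
reports a task ID, which <code>globus task wait</code> can block on inside a
workflow script.</p>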
</div>
<div id="data-transfer-box-bdrive" class="slide section level1">
<h1>Data transfer: Box &amp; bDrive</h1>
<ul>
<li>Box and bDrive (the Cal-branded Google Drive) both provide free,
secured, and encrypted content storage of files to Berkeley affiliates
<ul>
<li>They are both good options for backup and long-term storage of data
that you plan to shuttle in and out of Savio</li>
<li>Box quotas
<ul>
<li>50 GB for new individual accounts</li>
<li>500 GB for new Special Purpose Accounts (“SPAs”)</li>
<li>Existing accounts will be allowed up to 10% above current storage
amount</li>
</ul></li>
<li>bDrive provides unlimited storage (for now)
<ul>
<li>Similar limits to Box are likely for bDrive in the near future</li>
<li>bDrive has a maximum file size of 5 TB; Box has a maximum file size
of 15 GB</li>
</ul></li>
<li>These changes reflect service provider price increases which may
increasingly fall on researchers for <strong>large</strong>
datasets</li>
</ul></li>
<li>Alternative paid options are also available
<ul>
<li>Cloud storage options include Amazon, Google, Microsoft Azure, and
Wasabi
<ul>
<li>See the <a
href="https://technology.berkeley.edu/services/cloud">bCloud web
page</a> for more information</li>
</ul></li>
<li>As mentioned earlier, Condo computing contributors can also buy into
the condo storage program</li>
</ul></li>
</ul>
</div>
<div id="data-transfer-bdrive-access" class="slide section level1">
<h1>Data transfer: bDrive Access</h1>
<ul>
<li>You can interact with both services via a web browser, and both
services provide a desktop app you can use to move and sync files
between your computer and the cloud.
<ul>
<li><a href="http://bdrive.berkeley.edu/">bDrive web app</a></li>
<li><a href="https://www.google.com/drive/download/">Drive desktop
app</a></li>
<li><a href="http://box.berkeley.edu">Box web app</a></li>
<li><a href="https://www.box.com/resources/downloads">Box desktop
app</a></li>
</ul></li>
</ul>
<p>For more ambitious users, Box has a Python-based SDK that can be used
to write scripts for file transfers. For more information on how to do
this, check out the <code>BoxAuthenticationBootstrap.ipynb</code> and
<code>TransferFilesFromBoxToSavioScratch.ipynb</code> from BRC’s
cyberinfrastructure engineer on <a
href="https://github.com/ucberkeley/brc-cyberinfrastructure/tree/dev/analysis-workflows/notebooks">GitHub</a>.</p>
</div>
<div id="data-transfer-box-bdrive-with-rclone-setup"
class="slide section level1">
<h1>Data transfer: Box &amp; bDrive with rclone setup</h1>
<p><a href="https://rclone.org/"><em>rclone</em></a> is a command line
program that you can use to sync files between both services and Savio.
You can read instructions for using rclone on Savio <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/transferring-data/rclone-box-bdrive/">with
Box or bDrive here</a>.</p>
<p>Briefly, the steps to set up <em>rclone</em> on Savio to interact with
Box are as follows:</p>
<ul>
<li>Configuration (on dtn): <code>rclone config</code></li>
<li>Use auto config? -> n</li>
<li>For Box: install rclone on your PC, then run
<code>rclone authorize "box"</code></li>
<li>Paste the link into your browser and log in to your CalNet
account</li>
<li>Copy the authentication token and paste into the
<code>rclone config</code> prompt on Savio</li>
</ul>
<p>Finally you can set up <a
href="https://calnetweb.berkeley.edu/calnet-departments/special-purpose-accounts-spa">special
purpose accounts</a> so files are owned at a project level rather than
by individuals.</p>
</div>
<div id="data-transfer-box-bdrive-with-rclone-practice"
class="slide section level1">
<h1>Data transfer: Box &amp; bDrive with rclone practice</h1>
<p><em>rclone</em> basics:</p>
<ul>
<li>If you are on a login node, switch to the DTN before using <em>rclone</em>
<ul>
<li>Use the command <code>ssh dtn</code></li>
<li>If using <em>rclone</em> on another node, you need to load
<em>rclone</em> before use
<ul>
<li>Run command <code>module load rclone</code></li>
</ul></li>
</ul></li>
<li>All <em>rclone</em> commands begin with <code>rclone</code> and are
then followed by a command
<ul>
<li>The commands are different from bash (i.e., <code>cp</code> in
<em>bash</em> vs. <code>copy</code> in rclone)</li>
</ul></li>
<li>To reference a file on the remote, use the configured remote name
followed by a colon and then the file path
<ul>
<li>For example <code>clint_bdrive:project_folder</code></li>
<li>To access the main folder leave nothing after the colon (e.g.,
<code>clint_bdrive:</code>)</li>
</ul></li>
<li>For more tips and tricks see <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/transferring-data/rclone-box-bdrive/">our
docs</a></li>
</ul>
<p><em>rclone</em> example:</p>
<pre><code>rclone listremotes # Lists configured remotes.
rclone lsd remote_name: # Lists directories, but not files. Note the trailing colon.
rclone size remote_name:home # Prints size and number of objects in remote "home" directory. This can take a very long time when tallying terabytes of files.
rclone copy /global/home/users/hannsode remote_name:savio_home/hannsode # Copies my entire home directory to a new directory on the remote.
rclone copy /global/scratch/users/hannsode/genomes remote_name:genome_sequences # Copies entire directory contents to a directory on the remote with a new name.</code></pre>
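<p>Before a large transfer it can be worth previewing what would be copied.
Here is a sketch using rclone's <code>--dry-run</code> and
<code>--progress</code> flags, with the same placeholder remote name and
paths as above:</p>
<pre><code>rclone copy --dry-run /global/scratch/users/hannsode/genomes remote_name:genome_sequences # Reports what would be copied without transferring anything.
rclone copy --progress /global/scratch/users/hannsode/genomes remote_name:genome_sequences # Runs the copy and shows live transfer statistics.</code></pre>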
</div>
<div id="software-modules" class="slide section level1">
<h1>Software modules</h1>
<p>A lot of software is available on Savio but needs to be loaded from
the relevant software module before you can use it.</p>
<p>(We do this not to confuse you but to avoid clashes between
incompatible software and allow multiple versions of a piece of software
to co-exist on the system.)</p>
<pre><code>module list # what's loaded?
module avail # what's available</code></pre>
<p>One thing that trips people up is that some of the modules are arranged in
a hierarchical (nested) fashion, so you only see some of the modules as
being available <em>after</em> you load the parent module (e.g., MKL,
FFT, and HDF5/NetCDF software are nested within the gcc module). Here’s
how we see and load MPI.</p>
<pre><code>module load openmpi # this fails if gcc not yet loaded
module load gcc
module avail
module load openmpi</code></pre>
<p>Note that a variety of Python packages are available simply by
loading the python module. For R this is not the case, but you can load
the <em>r-packages</em> module (as well as the <em>r-spatial</em> module
for GIS/spatial-related packages).</p>
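<p>As a small illustration, here is one way to check for and load a
specific version of a module. The version shown is the one used in the job
script later in these slides; the versions actually installed may differ,
so check the output of <code>module avail</code> first.</p>
<pre><code>module avail python          # list the python modules that can be loaded
module load python/3.9.12    # load a specific version (assumed to be installed)
module list                  # confirm what is now loaded
module unload python         # unload it again when no longer needed
module purge                 # or unload all loaded modules at once</code></pre>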
</div>
<div id="submitting-jobs-overview" class="slide section level1">
<h1>Submitting jobs: overview</h1>
<p>All computations are done by submitting jobs to the scheduling
software that manages jobs on the cluster, called SLURM.</p>
<p>Why is this necessary? Without a scheduler, your jobs would be slowed down by
other people’s jobs running on the same node. This also allows everyone
to share Savio in a fair way.</p>
<p>The basic workflow is:</p>
<ul>
<li>log in to Savio; you’ll end up on one of the login nodes in your home
directory</li>
<li>use <code>cd</code> to go to the directory from which you want to
submit the job</li>
<li>submit the job using <code>sbatch</code> (or an interactive job
using <code>srun</code>, discussed later)
<ul>
<li>when your job starts, the working directory will be the one from
which the job was submitted</li>
<li>the job will be running on a compute node, not the login node</li>
</ul></li>
</ul>
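<p>Put together, a session following this workflow might look like the
sketch below. The username, login hostname, directory, and script name are
placeholders (assumptions for illustration), so substitute your own.</p>
<pre><code>ssh your_username@hpc.brc.berkeley.edu              # log in; hostname assumed here
cd /global/scratch/users/your_username/my_project   # hypothetical project directory
sbatch job.sh                                       # submit the batch job from this directory
squeue -u $USER                                     # check the status of your jobs</code></pre>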
</div>
<div id="submitting-jobs-accounts-and-partitions"
class="slide section level1">
<h1>Submitting jobs: accounts and partitions</h1>
<p>When submitting a job, the main things you need to indicate are the
project account you are using and the partition. Note that there is a
default value for the project account, but if you have access to
multiple accounts such as an FCA and a condo, it’s good practice to
specify it.</p>
<p>You can see what accounts you have access to and which partitions
within those accounts as follows:</p>
<pre><code>sacctmgr -p show associations user=$USER</code></pre>
<p>Here’s an example of the output for a user who has access to an FCA
and a condo:</p>
<pre><code>Cluster|Account|User|Partition|Share|GrpJobs|GrpTRES|GrpSubmit|GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|
brc|fc_paciorek|paciorek|savio3_gpu|1|||||||||||||gtx2080_gpu3_normal,savio_lowprio,v100_gpu3_normal|gtx2080_gpu3_normal||
brc|fc_paciorek|paciorek|savio3_htc|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio3_bigmem|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio3|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_1080ti|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_knl|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_gpu|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_htc|1|||||||||||||savio_debug,savio_long,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2_bigmem|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio2|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|fc_paciorek|paciorek|savio_bigmem|1|||||||||||||savio_debug,savio_normal|savio_normal||
brc|co_stat|paciorek|savio3_htc|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio3_bigmem|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio3|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_1080ti|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_knl|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_bigmem|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2_gpu|1|||||||||||||savio_lowprio,stat_gpu2_normal|stat_gpu2_normal||
brc|co_stat|paciorek|savio2_htc|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio_bigmem|1|||||||||||||savio_lowprio|savio_lowprio||
brc|co_stat|paciorek|savio2|1|||||||||||||savio_lowprio,stat_savio2_normal|stat_savio2_normal||</code></pre>
<p>If you are part of a condo, you’ll notice that you have
<em>low-priority</em> access to certain partitions. For example, user
‘paciorek’ is part of the statistics condo <em>co_stat</em>, which
purchased some savio2 nodes and savio2_gpu nodes and therefore has
normal access to those, but he can also burst beyond the condo and use
other partitions at low-priority (see below).</p>
<p>In contrast, through his FCA, ‘paciorek’ has access to the savio,
savio2, and savio3 partitions as well as various big memory, HTC, and
GPU partitions, all at normal priority.</p>
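<p>The full output is wide, so it can help to restrict it to the columns
you care about. The sketch below assumes that your version of
<code>sacctmgr</code> accepts a <code>format=</code> list with these field
names; the helper function <code>list_assoc</code> is hypothetical and
simply reformats pipe-delimited lines read from standard input.</p>
<pre><code># Print "account partition qos" from pipe-delimited lines, skipping the header line.
list_assoc() {
  awk -F'|' 'NR > 1 && NF >= 3 { print $1, $2, $3 }'
}

# Usage on Savio (requires sacctmgr):
#   sacctmgr -p show associations user=$USER format=Account,Partition,QOS | list_assoc</code></pre>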
</div>
<div id="submitting-a-batch-job" class="slide section level1">
<h1>Submitting a batch job</h1>
<p>Let’s see how to submit a simple job. If your job will only use the
resources on a single node, you can do the following.</p>
<p>Here’s an example job script that I’ll run. You’ll need to modify the
<code>--account</code> value and possibly the <code>--partition</code> value.</p>
<div class="sourceCode" id="cb12"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/bin/bash</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Job name:</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --job-name=test</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Account:</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --account=fc_paciorek</span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Partition:</span></span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --partition=savio2</span></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Wall clock limit (5 minutes here):</span></span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --time=00:05:00</span></span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a><span class="co">## Command(s) to run:</span></span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a><span class="ex">module</span> load python/3.9.12</span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a><span class="ex">python</span> calc.py <span class="op">>&</span> calc.out</span></code></pre></div>
<p>Now let’s submit and monitor the job:</p>
<pre><code>sbatch job.sh
squeue -j <JOB_ID>
wwall -j <JOB_ID></code></pre>
<p>After a job has completed (or been terminated/cancelled), you can
review the maximum memory used via the sacct command.</p>
<pre><code>sacct -j <JOB_ID> --format=JobID,JobName,MaxRSS,Elapsed</code></pre>
<p>MaxRSS will show the maximum amount of memory that the job used in
kilobytes.</p>
<p>You can also log in to the node where your job is running and use commands
like <em>top</em> and <em>ps</em>:</p>
<pre><code>srun --jobid=<JOB_ID> --pty /bin/bash</code></pre>
<p>NOTE: except for the partitions named *_htc and *_gpu, all jobs are
given exclusive access to the entire node or nodes assigned to the job
(and your account is charged for all of the cores on the node(s)).</p>
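<p>The submit-and-monitor steps above can also be scripted. This sketch
assumes a single-cluster setup, where <code>sbatch --parsable</code> prints
only the job ID; the script name <code>job.sh</code> is the one used
above.</p>
<pre><code>JOB_ID=$(sbatch --parsable job.sh)   # capture the job ID at submission
squeue -j "$JOB_ID"                  # check whether the job is pending or running

# Once the job has finished, review memory use and run time:
sacct -j "$JOB_ID" --format=JobID,JobName,MaxRSS,Elapsed

# If the job is no longer needed, cancel it instead:
# scancel "$JOB_ID"</code></pre>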
</div>
<div id="parallel-job-submission" class="slide section level1">
<h1>Parallel job submission</h1>
<p>If you are submitting a job that uses multiple nodes, you’ll need to
carefully specify the resources you need. The key flags for use in your
job script are:</p>
<ul>
<li><code>--nodes</code> (or <code>-N</code>): indicates the number of
nodes to use</li>
<li><code>--ntasks-per-node</code>: indicates the number of tasks (i.e.,
processes) one wants to run on each node</li>
<li><code>--cpus-per-task</code> (or <code>-c</code>): indicates the
number of CPUs to be used for each task</li>
</ul>
<p>In addition, in some cases it can make sense to use the
<code>--ntasks</code> (or <code>-n</code>) option to indicate the total
number of tasks and let the scheduler determine how many nodes and tasks
per node are needed. In general <code>--cpus-per-task</code> will be one
except when running threaded code.</p>
<p>Here’s an example job script for a job that uses MPI for
parallelizing over multiple nodes:</p>
<div class="sourceCode" id="cb16"><pre
class="sourceCode bash"><code class="sourceCode bash"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/bin/bash</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Job name:</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --job-name=test</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Account:</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --account=account_name</span></span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Partition:</span></span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --partition=partition_name</span></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Number of MPI tasks needed for use case (example):</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --ntasks=40</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Processors per task:</span></span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --cpus-per-task=1</span></span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a><span class="co"># Wall clock limit:</span></span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a><span class="co">#SBATCH --time=00:00:30</span></span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a><span class="co">#</span></span>
<span id="cb16-20"><a href="#cb16-20" aria-hidden="true" tabindex="-1"></a><span class="co">## Command(s) to run (example):</span></span>
<span id="cb16-21"><a href="#cb16-21" aria-hidden="true" tabindex="-1"></a><span class="ex">module</span> load intel openmpi</span>
<span id="cb16-22"><a href="#cb16-22" aria-hidden="true" tabindex="-1"></a><span class="ex">mpirun</span> ./a.out</span></code></pre></div>
<p>When you write your code, you may need to specify information about
the number of cores to use. SLURM will provide a variety of variables
that you can use in your code so that it adapts to the resources you
have requested rather than being hard-coded.</p>
<p>Here are some of the variables that may be useful: SLURM_NTASKS,
SLURM_CPUS_PER_TASK, SLURM_NODELIST, SLURM_NNODES.</p>
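<p>For instance, a job script might read these values rather than
hard-coding core counts. This is only a sketch: the function name
<code>slurm_cpu_summary</code>, the fallback value of 1 when a variable is
unset, and the use of <code>OMP_NUM_THREADS</code> for threaded code are
assumptions for illustration.</p>
<pre><code># Summarize the resources SLURM has allocated, falling back to 1 outside of a job.
slurm_cpu_summary() {
  local ntasks=${SLURM_NTASKS:-1}
  local cpus_per_task=${SLURM_CPUS_PER_TASK:-1}
  echo "tasks=$ntasks cpus_per_task=$cpus_per_task total_cpus=$(( ntasks * cpus_per_task ))"
}

# Threaded libraries that honor OMP_NUM_THREADS will use this many threads per task.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
slurm_cpu_summary</code></pre>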
<p>NOTE: when submitting GPU jobs <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/submitting-jobs/#gpu-jobs">you
need to request multiple CPUs per GPU</a> (usually 2 CPUs, but for some
of the GPU types in savio3_gpu, 4 or 8 CPUs).</p>
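<p>A minimal sketch of the relevant job script lines for one GPU with two
CPUs, using SLURM's <code>--gres</code> option. The partition and CPU count
are assumptions for illustration; the required CPU-to-GPU ratio and any GPU
type specifier depend on the partition, so check the linked
documentation.</p>
<pre><code>#SBATCH --partition=savio2_gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=2</code></pre>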
</div>
<div id="parallel-job-submission-patterns" class="slide section level1">
<h1>Parallel job submission patterns</h1>
<p>Some common paradigms are:</p>
<ul>
<li>1 node, many CPUs
<ul>
<li>OpenMP/threaded jobs - 1 task, <em>c</em> CPUs for the task</li>
<li>Python/R/GNU parallel - many tasks, 1 per CPU at any given time</li>
</ul></li>
<li>many nodes, many CPUs
<ul>
<li>MPI jobs that use 1 CPU per task for each of <em>n</em> tasks,
spread across multiple nodes</li>
<li>Python/R/GNU parallel - many tasks, 1 per CPU at any given time</li>
</ul></li>
<li>hybrid jobs that use <em>c</em> CPUs for each of <em>n</em> tasks
<ul>
<li>e.g., MPI+threaded code</li>
</ul></li>
</ul>
<p>We have lots more <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/scheduler-examples">examples
of job submission scripts</a> for different kinds of parallelization
(multi-node (MPI), multi-core (OpenMP), hybrid, etc.).</p>
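<p>As an illustration of the hybrid pattern, the sketch below requests
<em>n</em> = 8 tasks with <em>c</em> = 4 CPUs each. The account, partition,
task counts, time limit, and executable name are placeholders. It assumes
the program uses OpenMP threads within each MPI task; depending on the MPI
library, additional <code>mpirun</code> binding options may be needed.</p>
<pre><code>#!/bin/bash
#SBATCH --job-name=hybrid_test
#SBATCH --account=account_name
#SBATCH --partition=partition_name
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=4
#SBATCH --time=00:30:00

module load gcc openmpi
# One OpenMP thread per CPU allocated to each MPI task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./a.out</code></pre>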
</div>
<div id="interactive-jobs" class="slide section level1">
<h1>Interactive jobs</h1>
<p>You can also do work interactively. This simply moves you from a
login node to a compute node.</p>
<pre><code>srun -A fc_paciorek -p savio2_htc -c 1 -t 10:0 --pty bash
# note that you end up in the same working directory as when you submitted the job
# now execute on the compute node:
env | grep SLURM
module load matlab
matlab -nodesktop -nodisplay</code></pre>
<p>To end your interactive session (and prevent accrual of additional
charges to your FCA), simply enter <code>exit</code> in the terminal
session.</p>
<p>NOTE: you are charged for the entire node when running interactive
jobs (as with batch jobs) except in the HTC and GPU (*_htc and *_gpu)
partitions.</p>
</div>
<div id="running-graphical-interfaces-interactively"
class="slide section level1">
<h1>Running graphical interfaces interactively</h1>
<p>If you need to run a graphical interface, we recommend you use <a
href="https://ood.brc.berkeley.edu">Savio’s Open OnDemand interface</a>
(more in a later slide) for tools such as:</p>
<ul>
<li>Jupyter Notebooks</li>
<li>RStudio</li>
<li>the MATLAB GUI</li>
<li>VS Code</li>
<li>remote desktop</li>
</ul>
</div>
<div id="low-priority-queue" class="slide section level1">
<h1>Low-priority queue</h1>
<p>Condo users can also use compute resources beyond their condo,
limited only by the size of the partitions, under the <em>savio_lowprio</em>
QoS (queue). However, jobs under this QoS have lower priority than jobs under the
general QoSs, such as <em>savio_normal</em> and <em>savio_debug</em>, and under
all the condo QoSs, and they are subject to preemption when all the other
QoSs become busy.</p>
<p>More details can be found <a
href="https://docs-research-it.berkeley.edu/services/high-performance-computing/user-guide/running-your-jobs/submitting-jobs/#low-priority">in
the <em>Low Priority Jobs</em> section of the user guide</a>.</p>
<p>Suppose I wanted to burst beyond the Statistics condo to run on 20
nodes. I’ll illustrate here with an interactive job though usually this
would be for a batch job.</p>
<pre><code>## First I'll see if there are that many nodes even available.
sinfo -p savio2
srun -A co_stat -p savio2 --qos=savio_lowprio --nodes=20 -t 10:00 --pty bash
## now look at environment variables to see my job can access 20 nodes:
env | grep SLURM</code></pre>
<p>The low-priority queue is also quite useful for accessing specific
GPU types in the <code>savio3_gpu</code> partition.</p>
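<p>Since bursting would usually be done with a batch job, here is a sketch
of a corresponding job script. The job name and the command run are
placeholders, and the <code>--requeue</code> flag is an assumption about
what you want to happen on preemption: it asks SLURM to put the job back in
the queue, which only helps if your code can restart or resume safely.</p>
<pre><code>#!/bin/bash
#SBATCH --job-name=lowprio_test
#SBATCH --account=co_stat
#SBATCH --partition=savio2
#SBATCH --qos=savio_lowprio
#SBATCH --nodes=20
#SBATCH --time=00:10:00
#SBATCH --requeue

## Command(s) to run (placeholder): print the name of each allocated node.
srun hostname</code></pre>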
</div>
<div id="htc-jobs-and-long-running-jobs" class="slide section level1">
<h1>HTC jobs (and long-running jobs)</h1>
<p>There are multiple “HTC” partitions (savio2_htc, savio3_htc,
savio4_htc [coming soon]) that allow you to request cores individually
rather than an entire node at a time. In some cases the nodes in these
partitions are faster than the other nodes. Here is an example SLURM
script:</p>
<pre><code>#!/bin/bash
# Job name:
#SBATCH --job-name=test
#
# Account:
#SBATCH --account=account_name
#