This information is meant primarily for the Slurm developers.
System administrators should read the instructions at
http://www.llnl.gov/linux/slurm/quickstart_admin.html
(also found in the file doc/html/quickstart_admin.shtml).
The "INSTALL" file contains generic Linux build instructions.
Simple build/install on Linux:
./configure --enable-debug \
--prefix=<install-dir> --sysconfdir=<config-dir>
make
make install
To build the files in the contribs directory:
make contrib
make install-contrib
(The RPMs are built by default)
If you make changes to any auxdir/* or Makefile.am file, then run
_snowflake_ (where there are recent versions of autoconf, automake
and libtool installed):
./autogen.sh
then check in the new Makefile.am and Makefile.in files.
Here is a step-by-step HOWTO for creating a new release of SLURM on a
Linux cluster (See BlueGene and AIX specific notes below for some differences).
0. svn co https://eris.llnl.gov/svn/slurm/trunk slurm
svn co https://eris.llnl.gov/svn/chaos/private/buildfarm/trunk buildfarm
put the buildfarm directory in your search path
1. Update NEWS and META files for the new release. In the META file,
the API, Major, Minor, Micro, Version, and Release fields must all
be up-to-date. **** DON'T UPDATE META UNTIL RIGHT BEFORE THE TAG ****
The Release field should always be 1 unless one of
the following is true:
- Changes were made to the spec file, documentation, or example
files, but not to code.
- This is a prerelease (Release = 0.preX).
2. Tag the repository with the appropriate name for the new version.
svn copy https://eris.llnl.gov/svn/slurm/trunk \
https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3 \
-m "description"
3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros
(.rpmrc for newer versions of rpmbuild) file containing:
%_slurm_sysconfdir /etc/slurm
%_with_debug 1
%_with_sgijob 1
%_with_elan 1 (ONLY ON SYSTEMS WITH ELAN SWITCH)
I usually build using the following syntax:
build -s https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3
4. Remove the RPMs that we don't want:
rm -f slurm-perlapi*rpm slurm-torque*rpm
5. Move the RPMs to
/usr/local/admin/rpms/llnl/RPMS-RHEL4/x86_64 (odevi, or gauss)
/usr/local/admin/rpms/llnl/RPMS-RHEL4/i386/ (adevi)
/usr/local/admin/rpms/llnl/RPMS-RHEL4/ia64/ (tdevi)
Then send an announcement email (with the latest entry from the NEWS
file) to linux-admin@lists.llnl.gov.
6. Copy tagged bzip file (e.g. slurm-0.6.0-0.pre3.bz2) to FTP server
for external SLURM users.
7. Copy bzip file and rpms (including src.rpm) to sourceforge.net:
ncftp upload.sf.net
cd upload
put filename
Use SourceForge admin tool to add new release, including changelog.
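The tag name used in step 2 looks like the META Version and Release values joined with dashes (1.2.0 and 0.pre3 give slurm-1-2-0-0-pre3). The sketch below assumes that naming pattern holds for every release. The function names slurm_tag_name and print_tag_command are hypothetical helpers, not part of the buildfarm tools, and the second one only prints the svn command so it can be reviewed before it is run by hand.

```shell
#!/bin/sh
# Hypothetical helper: build a tag name from the META Version and Release
# values, ASSUMING tags are "slurm-" plus both fields with every dot
# turned into a dash (as in the slurm-1-2-0-0-pre3 example in step 2).
slurm_tag_name() {
    printf 'slurm-%s-%s\n' "$1" "$2" | tr '.' '-'
}

# Print (do not run) the svn copy command for that tag.
# $1 = Version, $2 = Release, $3 = commit message.
print_tag_command() {
    tag=$(slurm_tag_name "$1" "$2")
    printf 'svn copy %s %s -m "%s"\n' \
        "https://eris.llnl.gov/svn/slurm/trunk" \
        "https://eris.llnl.gov/svn/slurm/tags/$tag" \
        "$3"
}
```

Example: print_tag_command 1.2.0 0.pre3 "description"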
BlueGene build notes:
0. If on a BGP system and you want sview, export these variables:
export CFLAGS="-I/opt/gnome/lib/gtk-2.0/include -I/opt/gnome/lib/glib-2.0/include $CFLAGS"
export LIBS="-L/usr/X11R6/lib64 $LIBS"
export CMD_LDFLAGS='-L/usr/X11R6/lib64'
export PKG_CONFIG_PATH="/opt/gnome/lib64/pkgconfig/:$PKG_CONFIG_PATH"
1. Use the rpm make target to create the new RPMs. This requires a .rpmmacros
(.rpmrc for newer versions of rpmbuild) file containing:
%_prefix /usr
%_slurm_sysconfdir /etc/slurm
%_with_bluegene 1
%_without_pam 1
%_with_debug 1
Build on the Service Node using the following syntax:
rpmbuild -ta slurm-...bz2
The RPM files get written to the directory
/usr/src/packages/RPMS/ppc64
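Before running rpmbuild it can help to confirm that the macros file really defines every macro listed in step 1. A minimal sketch follows; missing_macros is a hypothetical helper name, and the check only looks for a line that begins with "%<name>" followed by whitespace, so it does not validate the values.

```shell
#!/bin/sh
# Hypothetical helper: print each macro name from the argument list that
# is NOT defined in the given rpm macros file. A macro counts as defined
# when some line starts (after optional blanks) with "%<name>" followed
# by whitespace.
# $1 = macros file, remaining arguments = macro names without the "%".
missing_macros() {
    macros_file=$1
    shift
    for name in "$@"; do
        if ! grep -Eq "^[[:space:]]*%${name}[[:space:]]" "$macros_file"; then
            printf '%s\n' "$name"
        fi
    done
}
```

Example: missing_macros "$HOME/.rpmmacros" _prefix _slurm_sysconfdir _with_bluegene _without_pam _with_debug
No output means all of the listed macros were found.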
To build and run on AIX:
0. svn co https://eris.llnl.gov/svn/slurm/trunk slurm
svn co https://eris.llnl.gov/svn/buildfarm/trunk buildfarm
Put the buildfarm directory in your search path
Also, you will need several commands to appear FIRST in your PATH:
/usr/local/tools/gnu/aix_5_64_fed/bin/install
/usr/local/gnu/bin/tar
/usr/bin/gcc
I do this by making symlinks to those commands in the buildfarm directory,
then making the buildfarm directory the first one in my PATH.
Also, make certain that the "proctrack" rpm is installed.
1. export OBJECT_MODE=32
export PKG_CONFIG="/usr/bin/pkg-config"
2. Build with:
./configure --enable-debug --prefix=/opt/freeware \
--sysconfdir=/opt/freeware/etc/slurm \
--with-ssl=/opt/freeware --with-munge=/opt/freeware \
--with-proctrack=/opt/freeware
make
make uninstall # remove old shared libraries, aix caches them
make install
3. To build RPMs (NOTE: GNU tools early in PATH as described above in #0):
Create a .rpmmacros file specifying system specific files:
#
# RPM Macros for use with SLURM on AIX
# The system-wide macros for RPM are in /usr/lib/rpm/macros
# and this overrides a few of them
#
%_prefix /opt/freeware
%_slurm_sysconfdir %{_prefix}/etc/slurm
%_defaultdocdir %{_prefix}/doc
%_with_debug 1
%_with_aix 1
%with_ssl "--with-ssl=/opt/freeware"
%with_munge "--with-munge=/opt/freeware"
%with_proctrack "--with-proctrack=/opt/freeware"
Log in to the machine "uP". uP is currently the lowest-common-denominator
AIX machine.
CC=/usr/bin/gcc build -s https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3
4. export MP_RMLIB=./slurm_ll_api.so
export CHECKPOINT=yes
5. poe hostname -rmpool debug
6. To debug, set SLURM_LL_API_DEBUG=3 before running poe - will create a file
/tmp/slurm.*
It can also be helpful to use poe options "-ilevel 6 -pmdlog yes"
There will be a log file created named /tmp/mplog.<jobid>.<taskid>
7. If you update proctrack, be sure to run "slibclean" to clear cached
version.
8. Remove the RPMs that we don't want:
rm -f slurm-perlapi*rpm slurm-torque*rpm
and install the other RPMs into /usr/admin/inst.images/slurm/aix5.3 on an
OCF AIX machine (pdev is a good choice).
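Step 0 above depends on install, tar and gcc being resolved from the buildfarm directory before any other copy in PATH. A small sketch of how that could be checked before building; first_in_path_is is a hypothetical helper name, and "$HOME/buildfarm" in the example is only an assumed checkout location.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first match for command $1 in
# PATH lives in directory $2 (for example the buildfarm directory that
# holds the symlinks). $2 must be written exactly as it appears in PATH,
# without a trailing slash.
first_in_path_is() {
    found=$(command -v "$1") || return 1
    [ "$(dirname "$found")" = "$2" ]
}
```

Example:
for tool in install tar gcc; do
    first_in_path_is "$tool" "$HOME/buildfarm" || echo "$tool is not picked up from buildfarm first"
done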
Debian build notes:
Since Debian doesn't have RPMs, the rpmbuild program cannot locate
dependencies, so build without them by patching the build program:
Index: build
===================================================================
--- build (revision 173)
+++ build (working copy)
@@ -798,6 +798,7 @@
$cmd .= " --define \"_tmppath $rpmdir/TMP\"";
$cmd .= " --define \"_topdir $rpmdir\"";
$cmd .= " --define \"build_bin_rpm 1\"";
+ $cmd .= " --nodeps";
if (defined $conf{rpm_dist}) {
my $dist = length $conf{rpm_dist} ? $conf{rpm_dist} : "%{nil}";
$cmd .= " --define \"dist $dist\"";
AIX/Federation switch window problems
To clean switch windows: ntblclean -w 8 -a sni0
To get switch window status: ntblstatus
BlueGene bglblock boot problem diagnosis
- Log on to the Service Node (bglsn, ubglsn)
- Execute /admin/bglscripts/fatalras
This will produce a list of failures including Rack and Midplane number
<date> R<rack> M<midplane> <failure details>
- Translate the Rack and Midplane to SLURM node id: smap -R r<rack><midplane>
- Drain only the bad SLURM node, return others to service using scontrol
Configuration file update procedures:
- cd /usr/bgl/dist/slurm (on bgli)
- co -l <filename>
- vi <filename>
- ci -u <filename>
- make install
- then run "dist_local slurm" on SN and FENs to update /etc/slurm
Some RPM commands:
rpm -qa | grep slurm (determine what is installed)
rpm -qpl slurm-1.1.9-1.rpm (check contents of an rpm)
rpm -e slurm-1.1.8-1 (erase an rpm)
rpm --upgrade slurm-1.1.9-1.rpm (replace existing rpm with new version)
rpm -i --ignoresize slurm-1.1.9-1.rpm (install a new rpm)
For main SLURM plugin installation on BGL service node:
rpm -i --force --nodeps --ignoresize slurm-1.1.9-1.rpm
rpm -U --force --nodeps --ignoresize slurm-1.1.9-1.rpm (upgrade option)
To clear a wedged job:
/bgl/startMMCSconsole
> delete bgljob ####
> free RMP###
Starting and stopping daemons on Linux:
/etc/init.d/slurm stop
/etc/init.d/slurm start
Patches:
- cd to the top level src directory
- Run the patch command with epilog_complete.patch as stdin:
patch -p[path_level_to_filter] [--dry-run] < epilog_complete.patch
To get the process and job IDs with proctrack/sgi_job:
- jstat -p
CVS and gnats:
Include "gnats:<id>" (e.g. "(gnats:123)") as part of the cvs commit
message to automatically record that update in the gnats database.
NOTE: This does not change the gnats bug state, but records the source
files associated with the bug.
For memory leaks (for AIX use zerofault, zf; for Linux use valgrind):
- run configure with the option --enable-memory-leak-debug
- valgrind --tool=memcheck --leak-check=yes --num-callers=6 --leak-resolution=med \
./slurmctld -Dc >ctld.out 2>&1 (and similarly for slurmd)
Before new major release:
- Test on ia64, i386, x86_64, BGL, AIX, OSX, XCPU
- Test on Elan and IB switches
- Test fail-over of slurmctld
- Test for memory leaks in slurmctld, slurmd and slurmdbd with various plugins
- Change API version number
- Review and release web pages
- Review and release code
- Run "make check"
- Test that the prolog and epilog run
- Run the test suite with SlurmUser NOT being self