| This information is meant primarily for the Slurm developers. |
| System administrators should read the instructions at |
| http://www.llnl.gov/linux/slurm/quickstart_admin.html |
| (also found in the file doc/html/quickstart_admin.shtml). |
| The "INSTALL" file contains generic Linux build instructions. |
| |
| Simple build/install on Linux: |
| ./configure --enable-debug \ |
| --prefix=<install-dir> --sysconfdir=<config-dir> |
| make |
| make install |
| |
| To build the files in the contribs directory: |
| make contrib |
| make install-contrib |
| (The RPMs are built by default) |
| |
| If you make changes to any auxdir/* or Makefile.am file, then run the |
| following on _snowflake_ (which has recent versions of autoconf, automake, |
| and libtool installed): |
| ./autogen.sh |
| then check in the new Makefile.am and Makefile.in files. |
| |
| Here is a step-by-step HOWTO for creating a new release of SLURM on a |
| Linux cluster (See BlueGene and AIX specific notes below for some differences). |
| 0. svn co https://eris.llnl.gov/svn/slurm/trunk slurm |
| svn co https://eris.llnl.gov/svn/chaos/private/buildfarm/trunk buildfarm |
| put the buildfarm directory in your search path |
| 1. Update NEWS and META files for the new release. In the META file, |
| the API, Major, Minor, Micro, Version, and Release fields must all |
| be up-to-date. **** DON'T UPDATE META UNTIL RIGHT BEFORE THE TAG **** |
| The Release field should always be 1 unless one of |
| the following is true: |
| - Changes were made to the spec file, documentation, or example |
| files, but not to code. |
| - This is a prerelease (Release = 0.preX). |
| 2. Tag the repository with the appropriate name for the new version. |
| svn copy https://eris.llnl.gov/svn/slurm/trunk \ |
| https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3 \ |
| -m "description" |
| 3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros |
| (.rpmrc for newer versions of rpmbuild) file containing: |
| %_slurm_sysconfdir /etc/slurm |
| %_with_debug 1 |
| %_with_sgijob 1 |
| %_with_elan 1 (ONLY ON SYSTEMS WITH ELAN SWITCH) |
| I usually build using the following syntax: |
| build -s https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3 |
| 4. Remove the RPMs that we don't want: |
| rm -f slurm-perlapi*rpm slurm-torque*rpm |
| 5. Move the RPMs to |
| /usr/local/admin/rpms/llnl/RPMS-RHEL4/x86_64 (odevi, or gauss) |
| /usr/local/admin/rpms/llnl/RPMS-RHEL4/i386/ (adevi) |
| /usr/local/admin/rpms/llnl/RPMS-RHEL4/ia64/ (tdevi) |
| Then send an announcement email (with the latest entry from the NEWS |
| file) to linux-admin@lists.llnl.gov. |
| 6. Copy tagged bzip file (e.g. slurm-0.6.0-0.pre3.bz2) to FTP server |
| for external SLURM users. |
| 7. Copy bzip file and rpms (including src.rpm) to sourceforge.net: |
| ncftp upload.sf.net |
| cd upload |
| put filename |
| Use SourceForge admin tool to add new release, including changelog. |
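| |
| The tag name in step 2 and the version in the tarball name in step 6 seem |
| to differ only in their separators. A minimal sketch, assuming the tag is |
| "slurm-" plus the version string with "." replaced by "-" (an inference |
| from the example names above, not a documented rule). slurm_tag_name is a |
| hypothetical helper and is not part of buildfarm: |

```shell
#!/bin/sh
# Hypothetical helper: derive an svn tag name from a version string.
# Assumption: the tag is "slurm-" plus the version with dots turned into dashes.
slurm_tag_name() {
    printf 'slurm-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

slurm_tag_name "1.2.0-0.pre3"
```

| The result could be appended to the tags URL used in steps 2 and 3. |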
| |
| BlueGene build notes: |
| 3. Use the rpm make target to create the new RPMs. This requires a .rpmmacros |
| (.rpmrc for newer versions of rpmbuild) file containing: |
| %_slurm_sysconfdir /etc/slurm |
| %_with_debug 1 |
| %_with_bluegene 1 |
| %with_cflags CFLAGS=-m64 |
| Build on the Service Node using the following syntax: |
| rpmbuild -ta slurm-...bz2 |
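| |
| The macro file above could be generated rather than typed by hand. A |
| minimal sketch, assuming the caller picks the target file; |
| write_bluegene_rpmmacros is a hypothetical helper, and the macro lines are |
| copied from the list above: |

```shell
#!/bin/sh
# Hypothetical helper: write the BlueGene macro set listed above to a file.
# Assumption: the caller chooses the target path (e.g. "$HOME/.rpmmacros")
# and has already backed up any existing file, since this overwrites it.
write_bluegene_rpmmacros() {
    cat > "$1" <<'EOF'
%_slurm_sysconfdir /etc/slurm
%_with_debug 1
%_with_bluegene 1
%with_cflags CFLAGS=-m64
EOF
}
```

| Writing to a scratch file first and comparing it with the existing macro |
| file avoids clobbering local settings. |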
| |
| To build and run on AIX: |
| 0. svn co https://eris.llnl.gov/svn/slurm/trunk slurm |
| svn co https://eris.llnl.gov/svn/buildfarm/trunk buildfarm |
| Put the buildfarm directory in your search path |
| Also, you will need several commands to appear FIRST in your PATH: |
| |
| /usr/local/tools/gnu/aix_5_64_fed/bin/install |
| /usr/local/gnu/bin/tar |
| /usr/bin/gcc |
| |
| I do this by making symlinks to those commands in the buildfarm directory, |
| then making the buildfarm directory the first one in my PATH. |
| Also, make certain that the "proctrack" rpm is installed. |
| 1. export OBJECT_MODE=32 |
| 2. Build with: |
| ./configure --enable-debug --prefix=/opt/freeware \ |
| --sysconfdir=/opt/freeware/etc/slurm \ |
| --with-ssl=/opt/freeware --with-munge=/opt/freeware |
| make |
| make uninstall # remove old shared libraries, AIX caches them |
| make install |
| 3. To build RPMs (NOTE: GNU tools early in PATH as described above in #0): |
| Create a file specifying system specific files: |
| # |
| # RPM Macros for use with SLURM on AIX |
| # The system-wide macros for RPM are in /usr/lib/rpm/macros |
| # and this overrides a few of them |
| # |
| %_prefix /opt/freeware |
| %_slurm_sysconfdir %{_prefix}/etc/slurm |
| #%_defaultdocdir %{_prefix}/doc |
| %_with_debug 1 |
| %_with_aix 1 |
| %with_ssl "--with-ssl=/opt/freeware" |
| %with_munge "--with-munge=/opt/freeware" |
| %with_proctrack "--with-proctrack=/admin/llnl/include" |
| Log in to the machine "uP". uP is currently the lowest-common-denominator |
| AIX machine. |
| CC=/usr/bin/gcc build -s https://eris.llnl.gov/svn/slurm/tags/slurm-1-2-0-0-pre3 |
| 4. export MP_RMLIB=./slurm_ll_api.so |
| export CHECKPOINT=yes |
| 5. poe hostname -rmpool debug |
| 6. To debug, set SLURM_LL_API_DEBUG=3 before running poe. This will create a |
| file named /tmp/slurm.* |
| It can also be helpful to use the poe options "-ilevel 6 -pmdlog yes". |
| A log file named /tmp/mplog.<jobid>.<taskid> will be created. |
| 7. If you update proctrack, be sure to run "slibclean" to clear the cached |
| version. |
| 8. Remove the RPMs that we don't want: |
| rm -f slurm-perlapi*rpm slurm-torque*rpm |
| and install the other RPMs into /usr/admin/inst.images/slurm/aix5.3 on an |
| OCF AIX machine (pdev is a good choice). |
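| |
| The symlink approach from step 0 can be sketched as below. This is an |
| illustration only: link_tools_first is a hypothetical helper, and the |
| buildfarm location in the commented example is an assumption: |

```shell
#!/bin/sh
# Hypothetical helper: symlink each given tool into a directory so that
# putting that directory first in PATH makes those tools win the lookup.
link_tools_first() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for tool in "$@"; do
        # -f replaces a stale link left over from an earlier setup
        ln -sf "$tool" "$dir/$(basename "$tool")" || return 1
    done
}

# Example (tool paths from step 0; the buildfarm location is an assumption):
# link_tools_first "$HOME/buildfarm" \
#     /usr/local/tools/gnu/aix_5_64_fed/bin/install \
#     /usr/local/gnu/bin/tar /usr/bin/gcc
# PATH="$HOME/buildfarm:$PATH"; export PATH
```

| Running "command -v install tar gcc" afterwards shows which copies the |
| shell will pick up. |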
| |
| AIX/Federation switch window problems |
| To clean switch windows: ntblclean -w 8 -a sni0 |
| To get switch window status: ntblstatus |
| |
| BlueGene bglblock boot problem diagnosis |
| - Log on to the Service Node (bglsn, ubglsn) |
| - Execute /admin/bglscripts/fatalras |
| This will produce a list of failures including Rack and Midplane number |
| <date> R<rack> M<midplane> <failure details> |
| - Translate the Rack and Midplane to SLURM node id: smap -R r<rack><midplane> |
| - Drain only the bad SLURM node, return others to service using scontrol |
| |
| Configuration file update procedures: |
| - cd /usr/bgl/dist/slurm (on bgli) |
| - co -l <filename> |
| - vi <filename> |
| - ci -u <filename> |
| - make install |
| - then run "dist_local slurm" on SN and FENs to update /etc/slurm |
| |
| Some RPM commands: |
| rpm -qa | grep slurm (determine what is installed) |
| rpm -qpl slurm-1.1.9-1.rpm (check contents of an rpm) |
| rpm -e slurm-1.1.8-1 (erase an rpm) |
| rpm -i --ignoresize slurm-1.1.9-1.rpm (install a new rpm) |
| For main SLURM plugin installation on BGL service node: |
| rpm -i --force --nodeps --ignoresize slurm-1.1.9-1.rpm |
| |
| |
| To clear a wedged job: |
| /bgl/startMMCSconsole |
| > delete bgljob #### |
| > free RMP### |
| |
| Starting and stopping daemons on Linux: |
| /etc/init.d/slurm stop |
| /etc/init.d/slurm start |
| |
| Patches: |
| - cd to the top level src directory |
| - Run the patch command with epilog_complete.patch as stdin: |
| patch -p[path_level_to_filter] [--dry-run] < epilog_complete.patch |
| |
| To get the process and job IDs with proctrack/sgi_job: |
| - jstat -p |
| |
| CVS and gnats: |
| Include "gnats:<id>", e.g. "(gnats:123)", as part of the cvs commit message |
| to automatically record that update in the gnats database. NOTE: This does |
| not change the gnats bug state, but records the source files associated |
| with the bug. |
| |
| For memory leaks (for AIX use zerofault, zf; for Linux use valgrind): |
| valgrind --tool=memcheck --leak-check=yes --num-callers=6 --leak-resolution=med ./slurmctld |
| |
| Before new major release: |
| - Test on ia64, i386, x86_64, BGL, AIX, OSX, XCPU |
| - Test on Elan and IB switches |
| - Test fail-over of slurmctld |
| - Test for memory leaks in slurmctld and slurmd |
| - Change API version number |
| - Review and release web pages |
| - Review and release code |
| - Run "make check" |
| - Test that the prolog and epilog run |
| - Run the test suite with SlurmUser NOT being self |