LLNL Security Tests
11 December 2008
Unless otherwise noted, run all tests as a normal user (not user root or
SlurmUser).
For computers running the slurmdbd daemon, run tests 1.x and 4.x only.
For computers running the slurmctld or slurmd daemon, run tests 1.x, 2.x,
and 3.x only.
NOTE: If SLURM has no pdebug partition (e.g. it cannot run jobs without
Moab), then you will need to enable normal user jobs to run in the pbatch
partition and explicitly start them by setting their priority to a non-zero
value. For example (as user root):
# scontrol update PartitionName=pbatch RootOnly=NO < ENABLE JOB SUBMIT WITHOUT MOAB
<submit jobs as desired here>
# scontrol update JobID=<slurm_job_id> Priority=1 < ENABLE JOB TO RUN
# scontrol update PartitionName=pbatch RootOnly=YES < FORCE JOB SUBMIT THROUGH MOAB
Test 1.1 Verify file permissions
====================================
Execute "security_1_1.py" and check for errors. If the installation
directory is not "/usr" or the configuration directory is not "/etc/slurm",
specify the appropriate locations using the "--prefix" and "--sysconfdir"
options. The last line of output will be "SUCCESS" if there were no errors and
"FAILURE" otherwise. Files which do not exist or whose directories cannot
be read are noted as "WARNING:". Some warnings are expected unless a complete
SLURM installation (all RPMs) is performed and the test is executed as user
root. The output near the top with a prefix of "NOTE:" describes these cases.
Example:
$ ./security_1_1.py --prefix=/usr --sysconfdir=/etc/slurm
NOTE: slurm_epilog and slurm_prolog only exist on BlueGene systems
NOTE: federation.conf only exists on AIX systems
NOTE: sview, slurmdbd and slurmdbd.conf exists only on selected systems
NOTE: JobCredentialPrivateKey, SlurmctldLogFile, and StateSaveLocation only on control host
NOTE: SlurmdLogFile and SlurmdSpoolDir only exist on compute servers
Ensuring the following are not world writable:
OK: 755 /etc/slurm
OK: 444 /etc/slurm/slurm.conf
WARNING: Unable to stat /etc/slurm/bluegene.conf
...
OK: 755 /usr/lib/slurm/proctrack_sgi_job.so
OK: 755 /usr/lib/slurm/checkpoint_ompi.so
Ensuring the following are not world readable:
WARNING: Unable to stat /etc/slurm/slurm.private.key < only on control hosts
WARNING: Unable to stat /etc/slurm/slurmdbd.conf < only on slurmdbd host
WARNING: Unable to stat /etc/slurm/wiki.conf < only on control host
SUCCESS
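The permission checks reported above can be approximated by hand. The
following is a minimal sketch, not the contents of security_1_1.py: the
function names and the "FAILURE:" line format are assumptions made for
illustration. Only the "OK:" and "WARNING: Unable to stat" formats are taken
from the example output.

```python
import os
import stat


def check_mode(path, forbidden_bit):
    """Report whether `path` has `forbidden_bit` set in its permission bits."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except OSError:
        # Missing file or unreadable directory: a warning, not a failure.
        return "WARNING: Unable to stat %s" % path
    if mode & forbidden_bit:
        # Hypothetical failure format; the real script's wording may differ.
        return "FAILURE: %o %s" % (mode, path)
    return "OK: %o %s" % (mode, path)


def not_world_writable(path):
    """Check that 'other' users have no write permission on `path`."""
    return check_mode(path, stat.S_IWOTH)


def not_world_readable(path):
    """Check that 'other' users have no read permission on `path`."""
    return check_mode(path, stat.S_IROTH)


if __name__ == "__main__":
    print(not_world_writable("/etc/slurm/slurm.conf"))
    print(not_world_readable("/etc/slurm/slurm.private.key"))
```

As in the real test, a file that cannot be examined produces a warning rather
than a failure, so the result still needs to be read with the "NOTE:" cases in
mind.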
Test 1.2 Verify that SlurmUser is unique
============================================
Execute "security_1_2.bash" and check for errors. Make sure that the
"scontrol" program is in your default search path or modify the script.
The last line of output will be "SUCCESS" if there were no errors and
"FAILURE" otherwise.
NOTE: On systems without a full slurm installation (e.g. the slurmdbd server),
use the second example below.
Example for systems running slurmctld:
$ ./security_1_2.bash < Gets UID from "scontrol show config"
slurm:x:97:97:Slurm user:/var/slurm:/bin/false < If not full slurm install, see below
SUCCESS
Example for systems running only slurmdbd:
$ grep slurm /etc/passwd
slurm:x:101:101:slurm:/var/slurm:/sbin/nologin < Note the UID
$ grep ":101:" /etc/passwd
slurm:x:101:101:slurm:/var/slurm:/sbin/nologin < No other users/groups with ID of 101
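The manual check for slurmdbd-only systems can also be scripted. This is a
minimal sketch, assuming the standard colon-separated passwd format; the
function name "entries_with_id" is hypothetical and the code is not part of
security_1_2.bash. Like the grep for ":101:" above, it reports every entry
whose UID or GID field matches.

```python
import sys


def entries_with_id(passwd_text, ident):
    """Return login names whose UID or GID field equals `ident`."""
    target = str(ident)
    names = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # passwd format: name:password:UID:GID:gecos:home:shell
        if len(fields) >= 4 and target in (fields[2], fields[3]):
            names.append(fields[0])
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the SlurmUser UID as the only argument, e.g. 101.
    with open("/etc/passwd") as passwd:
        print(entries_with_id(passwd.read(), sys.argv[1]))
```

The check passes when the returned list contains only the SlurmUser account.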
Test 2.1 Verify that unauthenticated requests are rejected
==============================================================
Execute "security_2_1.bash" and check for errors. Make sure that the
"scontrol" and "srun" programs are in your default search path or modify
the script. Its last line of output will be "SUCCESS" if there were no
errors and "FAILURE" otherwise. Note that srun errors are expected due
to authentication failure.
NOTE: Ensure the event is logged in SlurmctldLogFile for test 2.4 verification.
Example:
For AIX only:
$ export PATH=/opt/freeware/bin:/opt/freeware/sbin:$PATH
For all systems:
$ ./security_2_1.bash
srun: warning: auth/dummy plugin selected
srun: warning: auth/dummy plugin selected
srun: warning: auth/dummy plugin selected
srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
srun: error: Unable to allocate resources: Zero Bytes were transmitted or received
srun errors above are expected < ABOVE ERRORS ARE EXPECTED
SUCCESS
Test 2.2.1 Verify that normal user can't reconfigure the system
=================================================================
Execute "scontrol reconfig".
NOTE: Ensure the event is logged in SlurmctldLogFile for test 2.4 verification.
Example:
$ scontrol reconfig
slurm_reconfigure error: Invalid user id < ERROR IS EXPECTED
Test 2.2.2 Verify that jobs can't be run as another user
==========================================================
First run a job as yourself, then try to do so as another user. On some
systems the pdebug partition should be specified. On BlueGene systems a
batch script must be submitted (use "sbatch --partition=pdebug security_2_2_2.sh").
If there is no pdebug partition and the pbatch partition can only be used by
user root, you will need to change that before and after the test. See the
example below for BlueGene. The script security_2_2_2.sh just invokes
"/usr/bin/id".
NOTE: Ensure the event is logged in SlurmctldLogFile for test 2.4 verification.
Linux/AIX example:
$ srun --partition=pdebug security_2_2_2.sh
uid=5136(jette) gid=5136(jette) groups=902(pcs),1153(bgldev),1310(aixdev),5136(jette)
$ srun --partition=pdebug --uid=0 security_2_2_2.sh
srun: error: Unable to allocate resources: Invalid user id < ERROR IS EXPECTED
BlueGene example:
# scontrol update PartitionName=pbatch RootOnly=NO < AS USER ROOT
$ sbatch security_2_2_2.sh < AS REGULAR USER
sbatch: Submitted batch job 1784597
# scontrol update JobID=1784597 Priority=1 < ENABLE JOB TO RUN
$ sbatch --uid=0 security_2_2_2.sh
sbatch: error: Batch job submission failed: Invalid user id < ERROR IS EXPECTED
$ /usr/bin/id >out.2.2.2
$ diff out.2.2.2 slurm.out.1784597 < AFTER JOB COMPLETES
IDs SHOULD MATCH
# scontrol update PartitionName=pbatch RootOnly=YES < AS USER ROOT
Test 2.2.3 Verify that one user can't modify another user's job
=================================================================
Submit a job as one user, then try to cancel or modify it as another user.
See the note at the top of this document if there is no pdebug partition or
the pbatch partition is accessible only to root.
NOTE: Ensure the event is logged in SlurmctldLogFile for test 2.4 verification.
Example:
# scontrol update PartitionName=pbatch RootOnly=NO < AS USER ROOT
$ sbatch --partition=pdebug security_2_2_3.sh < AS REGULAR USER
sbatch: Submitted batch job 1784601
FROM A DIFFERENT USER (NOT root or SlurmUser):
$ scancel 1784601 < THE JOB ID FROM ABOVE
scancel: error: Kill job error on job id 1784601: Access denied
$ scontrol update timelimit=4 jobid=1784601 < THE JOB ID FROM ABOVE
slurm_update error: Invalid user id
# scontrol update PartitionName=pbatch RootOnly=YES < AS USER ROOT
Test 2.2.4 Verify SLURM's wiki interface is secure
====================================================
SLURM's wiki interface is used by Moab to start and modify user jobs and is
protected by a digital signature. This test confirms that attempts to use the
wiki interface without the proper signature will fail. First we need to build
the test then execute it. The build lines vary by architecture.
NOTE: Ensure the event is logged in SlurmctldLogFile for test 2.4 verification.
For BlueGene:
$ gcc -m64 -L/usr/lib64 -lslurm -osecurity_2_2_4 security_2_2_4.c
For peleton:
$ cc -L/usr/lib64 -lslurm -osecurity_2_2_4 security_2_2_4.c
For other Linux systems:
$ cc -lslurm -osecurity_2_2_4 security_2_2_4.c
Then execute thus:
$ ./security_2_2_4
Bad checksum reported < ERROR IS EXPECTED
SUCCESS
For AIX only (I am having linking problems. This work-around will work):
$ export OBJECT_MODE=32
$ gcc -DAIX=1 -I/opt/freeware/include -L/opt/freeware/lib -lslurm -osecurity_2_2_4 -pthread security_2_2_4.c
$ scontrol show config | grep ControlAddr (substitute the value in execution below)
Then execute thus:
$ ./security_2_2_4 <ControlAddr>
Bad checksum reported < ERROR IS EXPECTED
SUCCESS
Test 2.3 Verify that trigger program runs as the proper user
===============================================================
Since SlurmUser is not root at LLNL, this is a no-op for now. Test by
executing "security_2_3.sh".
Example:
$ ./security_2_3.sh
Executing:
strigger --set --idle --offset=0 --program=/g/g0/jette/slurm-1.3.way/testsuite/slurm_unit/slurmctld/security_2_3_in
slurm_set_trigger: Operation not permitted < ERROR IS EXPECTED
If this failure is a security violation, that's fine
Test 2.4 Verify security violations are logged
==================================================
NOTE: You will need to view the log file as user root.
From test 2.1 in SlurmctldLogFile (/var/log/slurm/slurmctld.log):
[Dec 01 11:41:23] error: authentication: authentication type mismatch
[Dec 01 11:41:23] error: slurm_receive_msg: Header lengths are longer than data received
[Dec 01 11:41:23] error: slurm_receive_msg: Header lengths are longer than data received
From test 2.2.1 in SlurmctldLogFile (/var/log/slurm/slurmctld.log):
[Dec 01 11:41:45] error: Security violation, RECONFIGURE RPC from uid=5136
[Dec 01 11:41:45] error: _slurm_rpc_reconfigure_controller: Invalid user id
From test 2.2.2 in SlurmctldLogFile (/var/log/slurm/slurmctld.log):
[Dec 01 11:47:06] error: Security violation, RESOURCE_ALLOCATE from uid=5136
From test 2.2.3 in SlurmctldLogFile (/var/log/slurm/slurmctld.log):
[Dec 01 11:56:21] error: Security violation, JOB_CANCEL RPC from uid 7558
[Dec 01 11:57:35] error: Security violation, JOB_UPDATE RPC from uid 7558
From test 2.2.4 in SlurmctldLogFile (/var/log/slurm/slurmctld.log):
[Dec 01 12:24:55] error: wiki: message checksum error
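Checking the log by eye can be supplemented with a small scan. The sketch
below is an illustration only and not part of the test suite: the "EXPECTED"
table and the "missing_entries" name are hypothetical, and the search strings
are copied from the sample log lines above, so they may differ between SLURM
versions.

```python
import sys

# Test number -> text expected in SlurmctldLogFile (taken from the samples above).
EXPECTED = {
    "2.1": "authentication type mismatch",
    "2.2.1": "Security violation, RECONFIGURE RPC",
    "2.2.2": "Security violation, RESOURCE_ALLOCATE",
    "2.2.3": "Security violation, JOB_CANCEL RPC",
    "2.2.4": "wiki: message checksum error",
}


def missing_entries(log_text, expected=EXPECTED):
    """Return the test numbers whose expected message is absent from the log."""
    return sorted(test for test, text in expected.items() if text not in log_text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the log path as the only argument; reading it normally requires root.
    with open(sys.argv[1]) as log:
        print(missing_entries(log.read()))
```

An empty list means every expected message was found. A simple substring
match does not confirm that the entry came from the current test run, so the
timestamps should still be checked manually.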
Test 3.1 Verify slurm only accepts requests with valid credential
=====================================================================
Submit a job with a fabricated credential (lacking a proper digital
signature).
NOTE: Ensure the event is logged in SlurmdLogFile for test 3.3 verification.
Example:
$ srun -Z -w adev7 hostname
srun: do not allocate resources
srun: error: Task launch failed on node adev7(0): Invalid job credential
srun: error: 1 launch request failed
srun: Job Failed < ERROR EXPECTED HERE
Test 3.2 Verify slurm only accepts attach requests from valid user
======================================================================
Start a job as one user, then attempt attaching to it from a different user.
NOTE: This test is not applicable for BlueGene systems.
NOTE: Ensure the event is logged in SlurmdLogFile for test 3.3 verification.
Example:
$ srun -w adev7 security_3_2.csh
Mon Dec 1 15:03:56 PST 2008
Mon Dec 1 15:04:06 PST 2008
Mon Dec 1 15:04:16 PST 2008
From a different window for the same user:
[jette@adevi etc]$ sattach 97700.0
Mon Dec 1 15:03:56 PST 2008
Mon Dec 1 15:04:06 PST 2008
Mon Dec 1 15:04:16 PST 2008
From a different window as a different user:
$ sattach 97700.0
sattach: error: Could not get job step info: Access denied
Test 3.3 Verify security violations are logged
==================================================
NOTE: You will need to view the log file as user root.
From test 3.1 in SlurmdLogFile on node where job tried to run (/var/log/slurm/slurmd.log):
[Dec 01 14:58:00] error: Invalid job credential from 5136@192.168.17.198: Invalid job credential
From test 3.2 in SlurmctldLogFile (/var/log/slurm/slurmctld.log):
[Dec 01 15:15:50] error: Security violation, REQUEST_STEP_LAYOUT for JobId=97700 from uid=7558
NOTE: The above error message was added to SLURM version 1.3.12.
Earlier versions of SLURM will not report this message.
Test 4.1 Verify that valid credential is required for SlurmDBD access
=========================================================================
The basic idea here is to try using a private munge program without the secure
munge key (we use our own key). This validates that munge authentication
is being used as desired. You'll need two windows for this, both running as
a regular user.
NOTE: Ensure the event is logged in SlurmdbdLogFile for test 4.3 verification.
Example, window 1, start munged:
$ ./security_4_1a.bash
1024+0 records in
...
munged: Info: Created 2 work threads
munged: Info: Found 2067 users with supplementary groups in 0.958 seconds
Example, window 2, try running sacctmgr with new munged:
$ ./security_4_1b.bash
sacctmgr: error: slurmdbd: DBD_RC is 2002 from DBD_INIT(1400): Failed to unpack DBD_INIT message
sacctmgr: error: slurmdbd: Sending DdbInit msg: Resource temporarily unavailable
Now kill the munged running in window 1 (just use ctrl-c).
Test 4.2 Verify that only authorized users can access the SlurmDBD database
===============================================================================
Try to add a dummy cluster name as a non-privileged user on the computer's master node.
Examples (three):
$ sacctmgr add cluster foo_bar
Adding Cluster(s)
Name = foo_bar
sacctmgr: error: slurmdbd(2002): from 1405: Your user doesn't have privilege to preform this action
Problem adding clusters
sacctmgr: slurmdbd: reopening connection
$ sacctmgr add account foo_bar
Adding Account(s)
foo_bar
...
sacctmgr: error: slurmdbd(2002): from 1402: Your user doesn't have privilege to preform this action
Problem adding accounts
sacctmgr: slurmdbd: reopening connection
$ sacctmgr add coordinator foo_bar
Adding Coordinator(s)
foo_bar
...
sacctmgr: error: slurmdbd(2002): from 1402: Your user doesn't have privilege to preform this action
Problem adding accounts
sacctmgr: slurmdbd: reopening connection
Test 4.3 Verify SlurmDBD security violations are logged
===========================================================
From test 4.1 in /usr/log/slurmdbd.log:
[Dec 02 09:53:46] error: Munge decode failed: Invalid credential
[Dec 02 09:53:46] error: Bad authentication: authentication credential invalid
[Dec 02 09:53:46] error: Failed to unpack DBD_INIT message
[Dec 02 09:53:46] error: Processing last message from connection 32(134.9.1.222)
From test 4.2 in /usr/log/slurmdbd.log:
[Dec 02 09:58:02] error: Your user doesn't have privilege to preform this action
[Dec 02 09:58:02] error: Processing last message from connection 32(134.9.1.222) uid(5136)