Latest revision as of 23:54, 29 July 2024
The Rocky/Slurm nodes
We converted Beocat from CentOS Linux to Rocky Linux on April 1, 2024. Any applications or libraries from the old system must be recompiled.
Using Modules
If you're using a common code that others may also be using, we may already have it compiled in a module. You can list the modules available and load an application as in the example below for GROMACS.
eos> module avail
eos> module load GROMACS
eos> module list
When a module gets loaded, all the necessary libraries are also loaded and the paths to the libraries and executables are automatically set up. Loading GROMACS, for example, also loads the OpenMPI library needed to run it and adds the path to the MPI commands and GROMACS executables. To see how the path is set up, try executing which gmx. The module system also allows you to easily switch between different versions of applications, libraries, or languages.
If you are using a custom code or one that is not installed in a module, you'll need to recompile it yourself. Modules make this process easier, as some of the work just involves loading the necessary set of modules. The first step is to decide whether to use the Intel compiler toolchain or the GNU toolchain, each of which includes the compilers and math libraries. The module commands for each are below, and you can load these automatically when you log in by adding one of these module load statements to your .bashrc file. See /homes/daveturner/.bashrc as an example of where I put the module load statements.
To load the Intel compiler toolchain including the Intel Math Kernel Library (and OpenMPI):
icr-helios> module load iomkl
To load the GNU compiler toolchain including OpenMPI, OpenBLAS, FFTW, and ScaLAPACK, load foss (free and open source software):
icr-helios> module load foss
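As a minimal sketch of the .bashrc approach, the fragment below loads one toolchain at login. The helper name load_toolchain is hypothetical and not part of Beocat; the guard around module is an assumption added so that the file still works in shells where the module command has not been set up.

```shell
# ~/.bashrc fragment (sketch): load one compiler toolchain at login.
# load_toolchain is a hypothetical helper name, not a Beocat command.
load_toolchain() {
    # "module" is normally provided as a shell function; check before using it
    if type module >/dev/null 2>&1; then
        module load "$1"
    else
        echo "module command not found; skipping load of $1" >&2
        return 1
    fi
}

# Pick ONE of the toolchains described above: foss (GNU) or iomkl (Intel)
load_toolchain foss || true
```

Loading only one toolchain keeps the compilers and MPI libraries consistent with each other; switch the argument to iomkl if you prefer the Intel compilers.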
Modules provide an easy way to set up the compilers and libraries you may need to compile your code. Beyond that, there are many different ways to compile codes, so you'll need to follow the build directions that come with your code. If you need help you can always email us at beocat@cs.ksu.edu.
Submitting jobs to Slurm
You can submit your job script using the sbatch command.
icr-helios> sbatch sbatch_script.sh
icr-helios> kstat --me
This will submit the script and show you a list of your jobs that are running and the jobs you have in the queue. By default the output for each job will go into a slurm-###.out file where ### is the job ID number. If you need to kill a job, you can use the scancel command with the job ID number.
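If you want to capture the job ID in a script, one possible sketch is shown below. It relies on the standard sbatch --parsable option, which prints only the job ID (optionally followed by a semicolon and the cluster name). The function names submit_job and slurm_out_name are hypothetical helpers, not Beocat commands.

```shell
# Name of the default output file Slurm writes for a given job ID
slurm_out_name() {
    printf 'slurm-%s.out\n' "$1"
}

# Submit a script and print only its job ID.
# --parsable prints "jobid" or "jobid;cluster", so keep the first field.
submit_job() {
    sbatch --parsable "$1" | cut -d ';' -f 1
}

# Example use on a login node (commented out so the file can be sourced safely):
#   jobid=$(submit_job sbatch_script.sh)
#   echo "Output will appear in $(slurm_out_name "$jobid")"
#   scancel "$jobid"      # only if the job needs to be killed
```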
Submitting your first job
To submit a job to run under Slurm, we use the sbatch (submit batch) command. The scheduler finds the optimum place for your job to run. With over 300 nodes and 7500 cores to schedule, as well as differing priorities, hardware, and individual resources, the scheduler's job is not trivial and it can take some time for a job to start even when there are empty nodes available.
There are a few things you'll need to know before running sbatch.
- How many cores you need. Note that unless your program is created to use multiple cores (called "threading"), asking for more cores will not speed up your job. This is a common misperception. Beocat will not magically make your program use multiple cores! For this reason the default is 1 core.
- How much time you need. Many users when beginning to use Beocat neglect to specify a time requirement. The default is one hour, and we get asked why their job died after one hour. We usually point them to the FAQ.
- How much memory you need. The default is 1 GB. If your job uses significantly more than you ask, your job will be killed off.
- Any advanced options. See the AdvancedSlurm page for these requests. For our basic examples here, we will ignore these.
So let's now create a small script to test our ability to submit jobs. Create the following file (either by copying it to Beocat or by editing a text file), which we'll name myhost.sh. Both of these methods are documented on our LinuxBasics page.
#!/bin/sh
hostname
Be sure to make it executable
chmod u+x myhost.sh
So, now let's submit it as a job and see what happens. Here I'm going to use five options:
- --mem-per-cpu= tells how much memory I need. In my example, I'm using our system minimum of 512 MB, which is more than enough. Note that your memory request is per core, which doesn't make much difference for this example, but will as you submit more complex jobs.
- --time= tells how much runtime I need. This can be in the form of "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds". This is a very short job, so 1 minute should be plenty. The time limit can't be changed after the job has started, so please make sure you have requested a sufficient amount of time.
- --nodes=1 tells Slurm that this must be run on one machine. The AdvancedSlurm page has much more on the "nodes" switch.
- --ntasks-per-node=16 requests 16 cores on each node.
- --constraint=moles requests that the job only run on the Mole class of compute nodes.
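To double-check how a --time value will be read, the sketch below converts each of the formats listed above into seconds. The function name slurm_time_to_seconds is hypothetical, and the parsing is only an illustration of the formats, not how Slurm itself is implemented.

```shell
# Convert a Slurm --time value to seconds (illustrative sketch only).
# Handles: minutes, minutes:seconds, hours:minutes:seconds,
#          days-hours, days-hours:minutes, days-hours:minutes:seconds
slurm_time_to_seconds() {
    case $1 in
        *-*)
            # With a days- prefix the fields are days, hours, minutes, seconds;
            # missing trailing fields are empty and count as zero in awk
            printf '%s\n' "$1" |
                awk -F'[-:]' '{ print $1*86400 + $2*3600 + $3*60 + $4 }'
            ;;
        *)
            # Without days, the meaning depends on the number of fields
            printf '%s\n' "$1" | awk -F: '
                NF == 1 { print $1*60 }
                NF == 2 { print $1*60 + $2 }
                NF == 3 { print $1*3600 + $2*60 + $3 }'
            ;;
    esac
}

slurm_time_to_seconds 1          # --time=1 is one minute
slurm_time_to_seconds 0:40:00    # forty minutes
slurm_time_to_seconds 1-12       # one day and twelve hours
```

Note that a bare number means minutes, not seconds or hours, which is the most common source of surprise.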
% ls
myhost.sh
% sbatch --time=1 --mem-per-cpu=512M --cpus-per-task=1 --ntasks=1 --nodes=1 ./myhost.sh
salloc: Granted job allocation 1483446
Since this is such a small job, it is likely to be scheduled almost immediately, so a minute or so later, I now see:
% ls
myhost.sh slurm-1483446.out
% cat slurm-1483446.out
mage03
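The same options can also be placed inside the script as #SBATCH lines, the style used by the NetPIPE script later on this page. The sketch below writes such a script; the file name myhost_batch.sh is hypothetical, and the option values simply mirror the sbatch command line used above.

```shell
# Write a batch-script version of myhost.sh with the options embedded.
# The file name myhost_batch.sh is a hypothetical choice for this example.
cat > myhost_batch.sh <<'EOF'
#!/bin/sh
#SBATCH --time=1
#SBATCH --mem-per-cpu=512M
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=1
#SBATCH --nodes=1
hostname
EOF

# On a login node it can then be submitted without any command-line options:
#   sbatch myhost_batch.sh
```

Keeping the options in the script makes the request easy to reproduce, since the resources asked for are recorded alongside the commands that were run.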
Monitoring Your Job
The kstat Perl script has been developed at K-State to provide you with all the available information about your jobs on Beocat. kstat --help will give you a full description of how to use it.
Eos> kstat --help
USAGE: kstat [-q] [-c] [-g] [-l] [-u user] [-p NaMD] [-j 1234567] [--part partition]
       kstat alone dumps all info except for the core summaries
       choose -q -c for only specific info on queued or core summaries.
       then specify any searchables for the user, program name, or job id
kstat                 info on running and queued jobs
kstat -h              list host info only, no jobs
kstat -q              info on the queued jobs only
kstat -c              core usage for each user
kstat -d #            show jobs run in the last # days
                      Memory per node - used/allocated/requested
                      Red is close to or over requested amount
                      Yellow is under utilized for large jobs
kstat -g              Only show GPU nodes
kstat -o Turner       Only show info for a given owner
kstat -o CS_HPC       Same but sub _ for spaces
kstat -l              long list - node features and performance
                      Node hardware and node CPU usage
                      job nodelist and switchlist
                      job current and max memory
                      job CPU utilizations
kstat -u daveturner   job info for one user only
kstat --me            job info for my jobs only
kstat -j 1234567      info on a given job id
kstat --osg           show OSG background jobs also
kstat --nocolor       do not use any color
kstat --name          display full names instead of eIDs
---------------- Graphs and Tables ---------------------------------------
Specify graph/table, CPU or GPU or host, usage or memory, and optional time
kstat --graph-cpu-memory #       gnuplot CPU memory for job #
kstat --table-gpu-usage-5min #   GPU usage table every 5 min for job #
kstat --table-cpu-60min #        CPU usage, memory, swap table every 60 min for job #
kstat --table-node [nodename]    cores, load, CPU usage, memory table for a node
--------------------------------------------------------------------------
Multi-node jobs are highlighted in Magenta
    kstat -l also provides a node list and switch list
    highlighted in Yellow when nodes are spread across multiple switches
Run time is colorized yellow then red for jobs nearing their time limit
Queue time is colorized yellow then red for jobs waiting longer times
--------------------------------------------------------------------------
kstat can be used to give you a summary of your jobs that are running and in the queue:
Eos> kstat --me
Hero43   24 of 24 cores   Load 23.4 / 24   495.3 / 512 GB used
    daveturner   unafold   1234567    1 core   running     4gb req   0 d  5 h 35 m
    daveturner   octopus   1234568   16 core   running   128gb req   8 d 15 h 42 m
################################## BeoCat Queue ###################################
    daveturner   NetPIPE   1234569    2 core   PD   2h   4gb req   0 d  1 h  2 m
kstat produces a separate line for each host. Use kstat -h to see information on all hosts without the jobs. For the example above we are listing our jobs and the hosts they are on.
- Core usage - yellow for empty, red for empty on owned nodes, cyan for partially used, blue for all cores used.
- Load level - yellow or yellow background indicates the node is being inefficiently used. Red just means more threads than cores.
- Memory usage - yellow or red means most memory is used.
If the node is owned the group name will be in orange on the right. Killable jobs can still be run on those nodes.
Each job line will contain the username, program name, job ID, number of cores, the status (which may be colored red for killable jobs), the maximum memory used or memory requested, and the amount of time the job has run. Jobs in the queue may contain information on the requested memory and run time, priority access, constraints, and how long the job has been in the queue. In this case, I have 2 jobs running on Hero43: unafold is using 1 core while octopus is using 16 cores. Slurm did not provide any information on the actual memory use, so the memory request is reported instead.
Detailed information about a single job
kstat can provide a great deal of information on a particular job, including a very rough estimate of when it will run. This estimate is a worst-case scenario and will be revised as other jobs finish early. This is a good way to check for job submission problems before contacting us. kstat colorizes the more important information to make it easier to identify.
Eos> kstat -j 157054
################################## Beocat Queue ###################################
daveturner netpipe 157054 64 cores PD dwarves fabric CS HPC 8gb req 0 d 0 h 0 m
JobId 157054 Job Name netpipe
UserId=daveturner GroupId=daveturner_users(2117) MCS_label=N/A
Priority=11112 Nice=0 Account=ksu-cis-hpc QOS=normal
Status=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=00:40:00 TimeMin=N/A
SubmitTime=2018-02-02T18:18:31 EligibleTime=2018-02-02T18:18:31
Estimated Start Time is 2018-02-03T06:17:49 EndTime=2018-02-03T06:57:49 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partitions killable.q,ksu-cis-hpc.q AllocNode:Sid=eos:1761
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null) SchedNodeList=dwarf[01-02]
NumNodes=2-2 NumCPUs=64 NumTasks=64 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES 2 nodes 64 cores 8192 mem gres/fabric 2
Socks/Node=* NtasksPerN:B:S:C=32:0:*:* CoreSpec=*
MinCPUsNode=32 MinMemoryNode=4G MinTmpDiskNode=0
Constraint=dwarves DelayBoot=00:00:00
Gres=fabric Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Slurm script /homes/daveturner/perf/NetPIPE-5.x/sb.np
WorkDir=/homes/daveturner/perf/NetPIPE-5.x
StdErr=/homes/daveturner/perf/NetPIPE-5.x/0.o157054
StdIn=/dev/null
StdOut=/homes/daveturner/perf/NetPIPE-5.x/0.o157054
Switches=1@00:05:00
#!/bin/bash -l
#SBATCH --job-name=netpipe
#SBATCH -o 0.o%j
#SBATCH --time=0:40:00
#SBATCH --mem=4G
#SBATCH --switches=1
#SBATCH --nodes=2
#SBATCH --constraint=dwarves
#SBATCH --ntasks-per-node=32
#SBATCH --gres=fabric:roce:1
host=`echo $SLURM_JOB_NODELIST | sed s/[^a-z0-9]/\ /g | cut -f 1 -d ' '`
nprocs=$SLURM_NTASKS
openmpi_hostfile.pl $SLURM_JOB_NODELIST 1 hf.$host
opts="--printhostnames --quick --pert 3"
echo "*******************************************************************"
echo "Running on $SLURM_NNODES nodes $nprocs cores on nodes $SLURM_JOB_NODELIST"
echo "*******************************************************************"
mpirun -np 2 --hostfile hf.$host NPmpi $opts -o np.${host}.mpi
mpirun -np 2 --hostfile hf.$host NPmpi $opts -o np.${host}.mpi.bi --async --bidir
mpirun -np $nprocs NPmpi $opts -o np.${host}.mpi$nprocs --async --bidir
Completed jobs and memory usage
kstat -d #
This will provide information on the jobs you have currently running and those that have completed in the last '#' days. This is currently the only reliable way to get the memory used per node for your job. This also provides information on whether the job completed normally, was canceled with scancel, timed out, or was killed because it exceeded its memory request.
Eos> kstat -d 10
########################### sacct -u daveturner for 10 days ###########################
                            max gb used on a node / gb requested per node
193037   ADF      dwarf43   1 n  32 c   30.46gb/100gb   05:15:34   COMPLETED
193289   ADF      dwarf33   1 n  32 c   26.42gb/100gb   00:50:43   CANCELLED
195171   ADF      dwarf44   1 n  32 c   56.81gb/120gb   14:43:35   COMPLETED
209518   matlab   dwarf36   1 n   1 c    0.00gb/  4gb   00:00:02   FAILED
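One way to use the used/requested figures is to work out what fraction of the memory request a job actually needed, which helps when sizing the next request. The sketch below is only arithmetic on two numbers typed in by hand from the kstat -d output; the function name mem_percent_used is hypothetical and it does not parse kstat output itself.

```shell
# Percentage of the requested memory that was actually used.
#   $1 = max GB used on a node, $2 = GB requested per node
mem_percent_used() {
    awk -v used="$1" -v req="$2" 'BEGIN { printf "%.1f\n", 100 * used / req }'
}

# Values taken from the first and third jobs in the listing above
mem_percent_used 30.46 100
mem_percent_used 56.81 120
```

A job that used well under half of its request is a candidate for a smaller request on the next run.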
Summary of core usage
kstat can also provide a listing of the core usage and cores requested for each user.
Eos> kstat -c
############################## Core usage ###############################
antariksh     1512 cores   %25.1 used   41528 cores queued
bahadori       432 cores   % 7.2 used      80 cores queued
eegoetz          0 cores   % 0.0 used       2 cores queued
fahrialkan      24 cores   % 0.4 used      32 cores queued
gowri           66 cores   % 1.1 used      32 cores queued
jeffcomer      160 cores   % 2.7 used       0 cores queued
ldcoates12      80 cores   % 1.3 used     112 cores queued
lukesteg       464 cores   % 7.7 used       0 cores queued
mike5454      1060 cores   %17.6 used     852 cores queued
nilusha        344 cores   % 5.7 used       0 cores queued
nnshan2014     136 cores   % 2.3 used       0 cores queued
ploetz         264 cores   % 4.4 used      60 cores queued
sadish         812 cores   %13.5 used       0 cores queued
sandung         72 cores   % 1.2 used      56 cores queued
zhiguang        80 cores   % 1.3 used     688 cores queued
Producing memory and CPU utilization tables and graphs
kstat can now produce tables or graphs for the memory or CPU utilization for a job. In order to view graphs you must set up X11 forwarding on your ssh connection by using the -X parameter.
If you want to read more, continue on to our AdvancedSlurm page.
kstat is now available to download and install on other clusters
https://gitlab.beocat.ksu.edu/Admin-Public/kstat
This software has been installed and used on several clusters for many years. It should be considered Beta software and may take some slight modifications to install on some clusters. Please contact the author if you want to give it a try (daveturner@ksu.edu).