Beocat - User contributions [en]

Compute Nodes

2017-08-16T00:40:51Z

Sgstrohkorb:

We currently have four classes of compute nodes. Listed from oldest to newest, they are:

== Mages ==
[1,3,5,7,9,11] - Why are these numbered like this? There are actually 12 physical machines, but each pair (1 and 2, 3 and 4, etc.) is tied together with external [http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect QPI], making the pair appear as a single node.

{| class="wikitable"
|Processors
|8x 10-Core Xeon E7-8870
|-
|Ram
|1024GB
|-
|Hard Drive
|2x 300GB Hitachi 10,000rpm SAS
|-
|NIC 0
|Broadcom NetXtreme II BCM5709
|-
|NIC 1
|Broadcom NetXtreme II BCM5709
|-
|NIC 2
|Broadcom NetXtreme II BCM5709
|-
|NIC 3
|Broadcom NetXtreme II BCM5709
|-
| 10GbE and QDR Infiniband
|Mellanox Technologies MT27500 Family [ConnectX-3]
|}

== Elves ==
[1-56]
{| class="wikitable"
|Processors
|2x 8-Core Xeon E5-2690
|-
|Ram
|64GB
|-
|Hard Drive
|1x 250GB 7,200 RPM SATA
|-
|NICs
|4x Intel I350
|-
| 10GbE and QDR Infiniband
|Mellanox Technologies MT27500 Family [ConnectX-3]
|}

[57-72,77]
{| class="wikitable"
|Processors
|2x 10-Core Xeon E5-2690 v2
|-
|Ram
|96GB
|-
|Hard Drive
|1x 250GB 7,200 RPM SATA
|-
|NICs
|4x Intel I350
|-
|10GbE and QDR Infiniband
|Mellanox Technologies MT27500 Family [ConnectX-3]
|}

[73-76,78,79]
{| class="wikitable"
|Processors
|2x 10-Core Xeon E5-2690 v2
|-
|Ram
|384GB
|-
|Hard Drive
|1x 250GB 7,200 RPM SATA
|-
|NICs
|4x Intel I350
|-
|10GbE and QDR Infiniband
|Mellanox Technologies MT27500 Family [ConnectX-3]
|}

[80-85]
{| class="wikitable"
|Processors
|2x 10-Core Xeon E5-2690 v2
|-
|Ram
|64GB
|-
|Hard Drive
|1x 250GB 7,200 RPM SATA
|-
|NICs
|4x Intel I350
|-
| 10GbE and QDR Infiniband
|Mellanox Technologies MT27500 Family [ConnectX-3]
|}

== Heroes ==
[1-36,47-54]
{| class="wikitable"
| Processors
| 2x 12-Core Xeon E5-2680 v3
|-
| Ram
| 128GB
|-
| Hard Drive
|1x 1TB 7,200 RPM SATA
|-
|NICs
|2x Intel I350
|-
|40GbE
| Mellanox Technologies MT27500 Family [ConnectX-3]
|}

[37-46]
{| class="wikitable"
| Processors
| 2x 12-Core Xeon E5-2680 v3
|-
| Ram
| 512GB
|-
| Hard Drive
|1x 1TB 7,200 RPM SATA
|-
|NICs
|2x Intel I350
|-
|40GbE
| Mellanox Technologies MT27500 Family [ConnectX-3]
|-
| Additional Notes
| 2x Xeon Phi FPU
|}

== Dwarves ==
[1-37,39]
{| class="wikitable"
| Processors
| 2x 16-Core Xeon E5-2683 v4
|-
| Ram
| 128GB
|-
| Hard Drive
|1x 1TB 7,200 RPM SATA
|-
|NICs
|4x Broadcom BCM5719
|-
|40GbE
| Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
|}

[38]
{| class="wikitable"
| Processors
| 2x 16-Core Xeon E5-2683 v4
|-
| Ram
| 512GB
|-
| Hard Drive
|1x 1TB 7,200 RPM SATA
|-
|NICs
|4x Broadcom BCM5719
|-
|40GbE
| Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
|}

[40-49]
{| class="wikitable"
| Processors
| 2x 16-Core Xeon E5-2683 v4
|-
| Ram
| 128GB
|-
| Hard Drive
|1x 1TB 7,200 RPM SATA
|-
|NICs
|4x Broadcom BCM5719
|-
|100GbE
| Mellanox Technologies MT27700 Family [ConnectX-4]
|}
[[Category:Information]]
[[Category:Hardware]]

Installed software

2017-05-14T20:23:59Z

Sgstrohkorb: This is an issue that I ran into and I want to post it on the wiki so others can learn from my mistake. If there's a better way to fix this issue, please let me know.

== Drinking from the Firehose ==
For a complete list of all installed software, see [[NodePackageList]]

== Most Commonly Used Software ==
=== [http://www.open-mpi.org/ OpenMPI] ===
Version 2.0.1

=== [http://www.scilab.org Scilab] ===
Version 6.0.0

=== [http://www.r-project.org/ R] ===
Version 3.3.1

==== Modules ====
We provide a small number of R modules installed by default. These are generally modules that are needed by more than one person.

==== Installing your own modules ====
To install your own module, log in to Beocat and start R interactively
<syntaxhighlight lang="bash">
R
</syntaxhighlight>
Then install the package using
<syntaxhighlight lang="rsplus">
install.packages("PACKAGENAME")
</syntaxhighlight>
Follow the prompts. Note that there is a CRAN mirror at KU - it will be listed as "USA (KS)".

After installing you can test before leaving interactive mode by issuing the command
<syntaxhighlight lang="rsplus">
library("PACKAGENAME")
</syntaxhighlight>
==== Running R Jobs ====

You cannot submit an R script directly. '<tt>qsub myscript.R</tt>' will result in an error. Instead, you need to make a bash [[AdvancedSGE#Running_from_a_qsub_Submit_Script|script]] that will call R appropriately. Here is a minimal example. We'll save this as submit-R.qsub

<syntaxhighlight lang="bash">
#!/bin/bash
#$ -l mem=1G
# Now we tell qsub how long we expect our work to take: 15 minutes (H:MM:SS)
#$ -l h_rt=0:15:00

# Now let's do some actual work. This starts R and runs the file myscript.R
R --no-save -q < myscript.R
</syntaxhighlight>

Now, to submit your R job, you would type
<syntaxhighlight lang="bash">
qsub submit-R.qsub
</syntaxhighlight>

=== [http://www.java.com/ Java] ===
We support 4 versions of the Java VM on Beocat. [[wikipedia:IcedTea|IcedTea]] 7 and 8 (based on [[wikipedia:OpenJDK|OpenJDK]]), Oracle JDK 1.7 (Java 7), and Oracle JDK 1.8 (Java 8).

We allow each user to select his or her Java version individually. If you do not select one, we default to IcedTea 8. This was changed from Oracle JDK 1.7 on May 29, 2015 due to an EOL notice from Oracle.

==== Selecting your Java version ====
First, let's list the available versions. This can be done with the command <code>eselect java-vm list</code>
<pre>
% eselect java-vm list
Available Java Virtual Machines:
[1] icedtea-bin-7
[2] icedtea-bin-8 system-vm
[3] oracle-jdk-bin-1.7
[4] oracle-jdk-bin-1.8
</pre>
Note that icedtea-bin-8 (marked "system-vm") is the default for all users. If you have a custom version set, it will be marked with "user-vm". If you wanted to use icedtea-7, you could run the following:
<syntaxhighlight lang="bash">
eselect java-vm set user 1
</syntaxhighlight>
Running the list command again now shows the difference:
<pre>
% eselect java-vm list
Available Java Virtual Machines:
[1] icedtea-bin-7 user-vm
[2] icedtea-bin-8 system-vm
[3] oracle-jdk-bin-1.7
[4] oracle-jdk-bin-1.8
</pre>
To verify you are seeing the correct java, you can run <code>java -version</code>
<pre>
% java -version
java version "1.7.0_121"
OpenJDK Runtime Environment (IcedTea 2.6.8) (Gentoo icedtea-7.2.6.8)
OpenJDK 64-Bit Server VM (build 24.121-b00, mixed mode)
</pre>

=== [http://www.python.org/about/ Python] ===

We have several versions of Python available:
* [http://docs.python.org/2.7/ CPython 2.7]
* [http://docs.python.org/3.4/ CPython 3.4]
* [http://pypy.org/ PyPy 5.4.1] (Python 2.7.10)
* [http://pypy.org/ PyPy3 5.5.0-alpha0] (Python 3.3.5)

For the uninitiated, PyPy provides [[wikipedia:Just-in-time_compilation|just-in-time compilation]] for python code. While it doesn't support all modules, code which does run under PyPy can see a significant performance increase.

If you just need python and its default modules, you can use python2, python3, or pypy as you would any other application.

If, however, you need modules that we do not have installed, you should use [http://www.doughellmann.com/projects/virtualenvwrapper/ virtualenvwrapper] to set up a virtual python environment in your home directory. This will let you install python modules as you please.

==== Setting up your virtual environment ====
* [[LinuxBasics#Shells|Change your shell]] to bash
* Make sure ~/.bash_profile exists
<syntaxhighlight lang="bash">
if [ ! -f ~/.bash_profile ]; then cp /etc/skel/.bash_profile ~/.bash_profile; fi
</syntaxhighlight>
* Add a line like <code>source /usr/bin/virtualenvwrapper.sh</code> to your .bashrc.
<syntaxhighlight lang="bash">
echo "source /usr/bin/virtualenvwrapper.sh" >> ~/.bashrc
</syntaxhighlight>
* '''''CRITICAL:''''' Logout, and then log back in
* Show your existing environments
<syntaxhighlight lang="bash">
workon
</syntaxhighlight>
* Create a virtual environment. Here I will create a python2 virtual environment called 'testp2', a python3 virtual environment called 'testp3', and a pypy environment called 'testpypy'. Note that <code>mkvirtualenv --help</code> has many more useful options.
<syntaxhighlight lang="bash">
mkvirtualenv -p $(which python2) testp2
mkvirtualenv -p $(which python3) testp3
mkvirtualenv -p $(which pypy) testpypy
</syntaxhighlight>
* Let's look at our virtual environments
<pre>
%workon
testp2
testp3
testpypy
</pre>
* Activate one of these
<pre>
%workon testp2
</pre>
* You can now install the python modules you want. This can be done using <tt>pip</tt>.
<syntaxhighlight lang="bash">
pip install numpy biopython
</syntaxhighlight>

==== Using your virtual environment within a job ====
Here is a simple job script using the virtual environment testp2
<syntaxhighlight lang="bash">
#!/bin/bash
source /usr/bin/virtualenvwrapper.sh
workon testp2
~/path/to/your/python/script.py
</syntaxhighlight>
==== A note on [http://mpi4py.scipy.org/docs/usrman/index.html mpi4py] ====
If you want to use MPI with your python script and are using a virtual environment, you will need to send the correct environment variables to all of the MPI processes to make the virtual environment work.
<syntaxhighlight lang="bash">
#!/bin/bash
# sample mpi4py submit script
source /usr/bin/virtualenvwrapper.sh
workon testp2
# figure out the location of the python interpreter in the virtual environment
PYTHON_BINARY=$(which python)
# mpirun the python interpreter within the virtual environment
# if you don't use the interpreter within the virtual environment, i.e. just using 'python'
# the system python interpreter (without access to your other modules) will be used.
mpirun ${PYTHON_BINARY} ~/path/to/your/mpi-enabled/python/script.py
</syntaxhighlight>
If you are using comm.send and comm.recv for communication with python objects and receive an output message like the one below, you will need to use [https://support.beocat.ksu.edu/BeocatDocs/index.php/AdvancedSGE#Infiniband infiniband] to allow MPI to communicate properly.
<syntaxhighlight lang="xml">
--------------------------------------------------------------------------
[[33053,1],52]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
Host: host

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
</syntaxhighlight>

==== A note on [http://www.scipy.org/ scipy] ====
SciPy requires numpy. Unfortunately, it doesn't properly define a dependency on numpy, so you have to install numpy first.
<syntaxhighlight lang="bash">
source /usr/bin/virtualenvwrapper.sh
workon testp2
pip install numpy
# now scipy needs lapack and it doesn't detect the system one. let's fix it
export LAPACK=/usr/lib/libreflapack.so
export BLAS=/usr/lib/libopenblas_openmp.so
pip install scipy
</syntaxhighlight>

=== [http://www.perl.org/ Perl] ===
The system-wide version of perl is tracking the stable releases of perl. Unfortunately there are some features that we do not include in the system distribution of perl, namely threads.
==== Submitting a job with Perl ====
Much like R (above), you cannot simply '<tt>qsub myProgram.pl</tt>', but you must create a [[AdvancedSGE#Running_from_a_qsub_Submit_Script|submit script]] which will call perl. Here is an example:
<syntaxhighlight lang="bash">
#!/bin/bash
#$ -l mem=1G
# Now we tell qsub how long we expect our work to take: 15 minutes (H:MM:SS)
#$ -l h_rt=0:15:00
# Now let's do some actual work.
perl /path/to/myProgram.pl
</syntaxhighlight>
==== Getting Perl with threads ====
* Setup perlbrew
** [[LinuxBasics#Shells|Change your shell]] to bash
** Install perlbrew
<syntaxhighlight lang="bash">
curl -L http://install.perlbrew.pl | bash
</syntaxhighlight>
** Make sure that ~/.bash_profile exists
<syntaxhighlight lang="bash">
if [ ! -f ~/.bash_profile ]; then cp /etc/skel/.bash_profile ~/.bash_profile; fi
</syntaxhighlight>
** Add <code>source ~/perl5/perlbrew/etc/bashrc</code> to ~/.bash_profile
<syntaxhighlight lang="bash">
echo "source ~/perl5/perlbrew/etc/bashrc" >> ~/.bash_profile
</syntaxhighlight>
** Then source your bash profile
<syntaxhighlight lang="bash">
source ~/.bash_profile
</syntaxhighlight>
* Now, install perl with threads within perlbrew
** Find the current Perl version.
<pre>
% perl -version

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux
(with 22 registered patches, see perl -V for more detail)
(...several more lines deleted)
</pre>
** In this case the version is 5.16.3, so we run
<syntaxhighlight lang="bash">
perlbrew install -f -n -D usethreads perl-5.16.3
</syntaxhighlight>
** To temporarily use the new version of perl in the current shell, we now run
<syntaxhighlight lang="bash">
perlbrew use perl-5.16.3
</syntaxhighlight>
** To switch versions of perl for every new login or job, run
<syntaxhighlight lang="bash">
perlbrew switch perl-5.16.3
</syntaxhighlight>
** You can reverse this switch with
<syntaxhighlight lang="bash">
perlbrew switch-off
</syntaxhighlight>
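A quick way to confirm whether the perl you are currently running was built with threads is to ask its build configuration. The following is a minimal sketch, not an official Beocat tool; the function name <tt>perl_has_threads</tt> is our own and is not part of perl or perlbrew.
<syntaxhighlight lang="bash">
#!/bin/bash
# Succeeds (exit status 0) when the perl found first in PATH was built
# with thread support, as recorded in perl's own Config module.
perl_has_threads() {
    perl -MConfig -e 'exit($Config{usethreads} ? 0 : 1)'
}

if perl_has_threads; then
    echo "threads: yes"
else
    echo "threads: no"
fi
</syntaxhighlight>
Running this before and after <tt>perlbrew use</tt> should show whether the switch took effect in the current shell.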

== Installing my own software ==
Installing and maintaining software for the many different users of Beocat would be very difficult, if not impossible. For this reason, we don't generally install user-run software on our cluster. Instead, we ask that you install it into your home directories.

In many cases, the software vendor or support site will incorrectly assume that you are installing the software system-wide or that you need 'sudo' access.

As a quick example of installing software in your home directory, we have a sample video on our [[Training Videos]] page. If you're still having problems or questions, please contact support as mentioned on our [[Main Page]].

OLD DEPRECATED AdvancedSGE

2017-05-14T20:02:25Z

Sgstrohkorb: Fixed the option to use infiniband in sample qsub script. I needed to use infiniband and realized that "ib=True" needed a "-l" option before it, so I added it to the wiki so others don't have to deal with it.

== Resource Requests ==
Aside from the time, RAM, and CPU requirements listed on the [[SGEBasics]] page, we have several other requestable resources. Generally, if you don't know if you need a particular resource, you should use the default. The list below can be generated with the command
<tt>qconf -sc | awk '{ if ($5 != "NO") { print }}'</tt>
{| class="wikitable sortable"
!name
!shortcut
!type
!relop
!requestable
!consumable
!default
!urgency
|-
|arch
|a
|RESTRING
|==
|YES
|NO
|NONE
|0
|-
|avx
|avx
|BOOL
|==
|YES
|NO
|FALSE
|0
|-
|calendar
|c
|RESTRING
|==
|YES
|NO
|NONE
|0
|-
|cpu
|cpu
|DOUBLE
|>=
|YES
|NO
|0
|0
|-
|cpu_flags
|c_f
|STRING
|==
|YES
|NO
|NONE
|0
|-
|cuda
|cuda
|INT
|<=
|YES
|JOB
|0
|0
|-
|display_win_gui
|dwg
|BOOL
|==
|YES
|NO
|0
|0
|-
|exclusive
|excl
|BOOL
|EXCL
|YES
|YES
|0
|1000
|-
|h_core
|h_core
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|h_cpu
|h_cpu
|TIME
|<=
|YES
|NO
|0:0:0
|0
|-
|h_data
|h_data
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|h_fsize
|h_fsize
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|h_rss
|h_rss
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|h_rt
|h_rt
|TIME
|<=
|FORCED
|NO
|0:0:0
|0
|-
|h_stack
|h_stack
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|h_vmem
|h_vmem
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|hostname
|h
|HOST
|==
|YES
|NO
|NONE
|0
|-
|infiniband
|ib
|BOOL
|==
|YES
|NO
|FALSE
|0
|-
|m_core
|core
|INT
|<=
|YES
|NO
|0
|0
|-
|m_socket
|socket
|INT
|<=
|YES
|NO
|0
|0
|-
|m_thread
|thread
|INT
|<=
|YES
|NO
|0
|0
|-
|m_topology
|topo
|RESTRING
|==
|YES
|NO
|NONE
|0
|-
|m_topology_inuse
|utopo
|RESTRING
|==
|YES
|NO
|NONE
|0
|-
|mem_free
|mf
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|mem_total
|mt
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|mem_used
|mu
|MEMORY
|>=
|YES
|NO
|0
|0
|-
|memory
|mem
|MEMORY
|<=
|FORCED
|YES
|0
|0
|-
|num_proc
|p
|INT
|==
|YES
|NO
|0
|0
|-
|qname
|q
|RESTRING
|==
|YES
|NO
|NONE
|0
|-
|s_core
|s_core
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|s_cpu
|s_cpu
|TIME
|<=
|YES
|NO
|0:0:0
|0
|-
|s_data
|s_data
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|s_fsize
|s_fsize
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|s_rss
|s_rss
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|s_rt
|s_rt
|TIME
|<=
|YES
|NO
|0:0:0
|0
|-
|s_stack
|s_stack
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|s_vmem
|s_vmem
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|slots
|s
|INT
|<=
|YES
|YES
|1
|1000
|-
|swap_free
|sf
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|swap_rate
|sr
|MEMORY
|>=
|YES
|NO
|0
|0
|-
|swap_rsvd
|srsv
|MEMORY
|>=
|YES
|NO
|0
|0
|-
|swap_total
|st
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|swap_used
|su
|MEMORY
|>=
|YES
|NO
|0
|0
|-
|virtual_free
|vf
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|virtual_total
|vt
|MEMORY
|<=
|YES
|NO
|0
|0
|-
|virtual_used
|vu
|MEMORY
|>=
|YES
|NO
|0
|0
|}

The good news is that most of these nobody ever uses. There are a couple of exceptions, though:
=== Infiniband ===
First of all, let me state that just because it sounds "cool" doesn't mean you need it or even want it. Infiniband does absolutely no good if running in a 'single' parallel environment. Infiniband is a high-speed host-to-host communication fabric. It is used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested Infiniband, and all the nodes with Infiniband were currently busy. In fact, some of our fastest nodes do not have Infiniband, so by requesting it when you don't need it, you are actually slowing down your job. To request Infiniband, add <tt>-l ib=true</tt> to your qsub command-line.
=== CUDA ===
[[CUDA]] is the resource required for GPU computing. We have a very small number of nodes which have GPUs installed. To request one of these nodes, add <tt>-l cuda=true</tt> to your qsub command-line.
=== Exclusive ===
Some programs just don't play nicely with others. They will attempt to use all available memory or all the cores they can find. The way to be a nice neighbor if your program has this problem is to request exclusive use of a node with <tt>-l excl=true</tt>. This can also be useful for benchmarking, where you want to be sure that no other jobs are interfering with yours.
== Parallel Jobs ==
There are two ways jobs can run in parallel, ''intra''node and ''inter''node. '''Note: Beocat will not automatically make a job run in parallel.''' Have I said that enough? It's a common misperception.
=== Intranode jobs ===
Intranode jobs are easier to code and can take advantage of many common libraries, such as [http://openmp.org/wp/ OpenMP], or Java's threads. Many times, your program will need to know how many cores you want it to use. Many will use all available cores if not told explicitly otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the qsub directive '<tt>-pe single ''n''</tt>', where ''n'' is the number of cores you wish to use. If your command can take an environment variable, you can use $NSLOTS to tell it how many cores you've been allocated.
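As a minimal sketch of passing the allocated core count to a threaded program, the submit script below copies $NSLOTS into <tt>OMP_NUM_THREADS</tt>, the standard OpenMP thread-count variable. The program name <tt>my_openmp_program</tt> is hypothetical, and the fallback to 1 is only there so the script can be tried outside a job.
<syntaxhighlight lang="bash">
#!/bin/bash
#$ -pe single 4
#$ -l mem=1G
#$ -l h_rt=0:15:00

# NSLOTS is set by SGE inside a job; fall back to 1 when run outside one.
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "Running with ${OMP_NUM_THREADS} thread(s)"

# Hypothetical OpenMP program; uncomment and replace with your own.
# ./my_openmp_program
</syntaxhighlight>
Programs that take the thread count as a command-line option instead can be given $NSLOTS directly.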
=== Internode (MPI) jobs ===
"Talking" between nodes is trickier than talking between cores on the same node. The specification for doing so is called "[[wikipedia:Message_Passing_Interface|Message Passing Interface]]", or MPI. We have [http://www.open-mpi.org/ OpenMPI] installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI. You can tell if you have an MPI-enabled program because its directions will tell you to run '<tt>mpirun ''program''</tt>'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '<tt>-pe single ''n''</tt>' for your qsub request, you will use one of the following:
{| class="wikitable sortable"
! Parallel Environment !! Description
|-
|mpi-fill
|This environment will use as many slots on each node as it can until it reaches the number of cores you have requested.
|-
|mpi-spread
|This environment will spread itself out over as many nodes as possible until it reaches the number of cores you have requested.
|-
|mpi-1
|This environment will allocate the slots you've requested 1 per node.
|-
|mpi-2
|This environment will allocate the slots you've requested 2 per node. You must request cores as a multiple of 2
|-
|mpi-4
|This environment will allocate the slots you've requested 4 per node. You must request cores as a multiple of 4
|-
|mpi-8
|This environment will allocate the slots you've requested 8 per node. You must request cores as a multiple of 8
|-
|mpi-10
|This environment will allocate the slots you've requested 10 per node. You must request cores as a multiple of 10
|-
|mpi-12
|This environment will allocate the slots you've requested 12 per node. You must request cores as a multiple of 12
|-
|mpi-16
|This environment will allocate the slots you've requested 16 per node. You must request cores as a multiple of 16
|-
|mpi-20
|This environment will allocate the slots you've requested 20 per node. You must request cores as a multiple of 20
|-
|mpi-80
|This environment will allocate the slots you've requested 80 per node. You must request cores as a multiple of 80
|}
Some quick examples:

<tt>-pe mpi-4 16</tt> will give you 4 chunks of 4 cores apiece. They might all happen to be allocated on the same node (16 cores), on 4 different nodes (4 cores each), on 3 nodes (8 cores on one and 4 cores on the other two), or on 2 nodes (8 cores each).

<tt>-pe mpi-fill 40</tt> will give you 40 cores, but will attempt to get them all on the same node.

<tt>-pe mpi-fill 100</tt> will give you 100 cores, and place them on as few nodes as possible. In this case it's likely you would get a full mage (80 cores) and either part of another mage (the remaining 20 cores) or one of the 20-core elves.

<tt>-pe mpi-spread 40</tt> will give you 40 cores, and will attempt to place each on a separate node.
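The "multiple of N" rule for the fixed-size environments in the table above can be checked before submitting. This is a minimal sketch; <tt>check_pe_request</tt> is a hypothetical helper of our own, not an SGE command, and it only knows the naming pattern shown in the table.
<syntaxhighlight lang="bash">
#!/bin/bash
# Succeeds when core count $2 is a valid request for parallel environment $1.
# mpi-N environments need a multiple of N; the others accept any count.
check_pe_request() {
    local pe=$1 cores=$2 per_node
    case $pe in
        single|mpi-fill|mpi-spread)
            return 0 ;;
        mpi-[1-9]*)
            per_node=${pe#mpi-}
            [ $(( cores % per_node )) -eq 0 ] ;;
        *)
            return 1 ;;
    esac
}

for request in "mpi-4 16" "mpi-8 12"; do
    # Unquoted on purpose so the pair splits into two arguments.
    if check_pe_request $request; then
        echo "$request: ok"
    else
        echo "$request: not a valid multiple"
    fi
done
</syntaxhighlight>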
== Requesting memory for multi-core jobs ==
All memory requests are '''per core'''. One of the more common scenarios is where somebody needs, say, 20 cores and 400 GB of memory, so they make a request like '<tt>-pe single 20 -l mem=400G</tt>'. This will never run, because what you are really requesting is 20 cores and 8000 GB of memory (20 * 400). Since we have no nodes with 8000 GB (8 TB) of memory, the job will never start. Instead, divide the 400 GB total memory request by the number of cores (20); the correct request would be '<tt>-pe single 20 -l mem=20G</tt>'.
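The division can be sketched in a few lines of bash. The helper name <tt>per_core_mem_gb</tt> is hypothetical, and rounding up (so the job never gets less than the total it needs) is our assumption about what you would want.
<syntaxhighlight lang="bash">
#!/bin/bash
# Print the per-core memory request in GB, given the total GB needed ($1)
# and the number of cores ($2). Rounds up to the next whole GB.
per_core_mem_gb() {
    local total=$1 cores=$2
    echo $(( (total + cores - 1) / cores ))
}

cores=20
mem=$(per_core_mem_gb 400 "$cores")
echo "-pe single ${cores} -l mem=${mem}G"
</syntaxhighlight>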
== Other Handy SGE Features ==
=== Email status changes ===
One of the most commonly used options when submitting jobs not related to resource requests is to have SGE email you when a job changes its status. This takes two directives to qsub: '<tt>-M ''someone@somewhere.com''</tt>' will give the email address to which to send status updates. '<tt>-m abe</tt>' is probably the most common directive given for ''when'' to send updates. This will send email messages when a job (a)borts, (b)egins, or (e)nds. Other possibilities are (s)uspended and (n)ever.
=== Job Naming ===
If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '<tt>-N ''JobName''</tt>' qsub directive.
=== Combining Output Streams ===
Normally, SGE will create two files for output. One will be .e''jobnumber'' and the other .o''jobnumber''. If you want both of these to be combined into a single file, you can use the qsub directive '<tt>-j y</tt>'.
=== Running from the Current Directory ===
By default, jobs run from your home directory. Many programs incorrectly assume that you are running the script from the current directory. You can use the '<tt>-cwd</tt>' directive to change to the "current working directory" you used when submitting the job.
=== Running in a specific class of machine ===
If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag "-q \*@@dwarves" to select that queue.
=== SGE Environment Variables ===
Within an actual job, sometimes you need to know specific things about the running environment to set up your scripts correctly. Here is a listing of environment variables that SGE makes available to you. Of course the value of these variables will be different based on many different factors.
<syntaxhighlight lang="bash">
HOSTNAME=titan1.beocat
SGE_TASK_STEPSIZE=undefined
SGE_INFOTEXT_MAX_COLUMN=5000
SHELL=/usr/local/bin/sh
NHOSTS=2
SGE_O_WORKDIR=/homes/mozes
TMPDIR=/tmp/105.1.batch.q
SGE_O_HOME=/homes/mozes
SGE_ARCH=lx24-amd64
SGE_CELL=default
RESTARTED=0
ARC=lx24-amd64
USER=mozes
QUEUE=batch.q
PVM_ARCH=LINUX64
SGE_TASK_ID=undefined
SGE_BINARY_PATH=/opt/sge/bin/lx24-amd64
SGE_STDERR_PATH=/homes/mozes/sge_test.sub.e105
SGE_STDOUT_PATH=/homes/mozes/sge_test.sub.o105
SGE_ACCOUNT=sge
SGE_RSH_COMMAND=builtin
JOB_SCRIPT=/opt/sge/default/spool/titan1/job_scripts/105
JOB_NAME=sge_test.sub
SGE_NOMSG=1
SGE_ROOT=/opt/sge
REQNAME=sge_test.sub
SGE_JOB_SPOOL_DIR=/opt/sge/default/spool/titan1/active_jobs/105.1
ENVIRONMENT=BATCH
PE_HOSTFILE=/opt/sge/default/spool/titan1/active_jobs/105.1/pe_hostfile
SGE_CWD_PATH=/homes/mozes
NQUEUES=2
SGE_O_LOGNAME=mozes
SGE_O_MAIL=/var/mail/mozes
TMP=/tmp/105.1.batch.q
JOB_ID=105
LOGNAME=mozes
PE=mpi-fill
SGE_TASK_FIRST=undefined
SGE_O_HOST=loki
SGE_O_SHELL=/bin/bash
SGE_CLUSTER_NAME=beocat
REQUEST=sge_test.sub
NSLOTS=32
SGE_STDIN_PATH=/dev/null
</syntaxhighlight>
Sometimes it is nice to know what hosts you have access to during a PE job. You would check the PE_HOSTFILE to find out. If your job has been restarted, it is nice to be able to change what happens rather than redoing all of your work. If this is the case, RESTARTED would equal 1. There are lots of useful environment variables there; I will leave it to you to identify the ones you want.

The most commonly used variables we see are $NSLOTS, $HOSTNAME, and $SGE_TASK_ID (used for array jobs, discussed below).
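As a rough sketch of using two of these variables inside a job script: the block below lists the hosts from PE_HOSTFILE and branches on RESTARTED. It assumes the first two whitespace-separated columns of each hostfile line are the host name and its slot count, and <tt>list_pe_hosts</tt> is a hypothetical helper name.
<syntaxhighlight lang="bash">
#!/bin/bash
# Print "host slots" for each line of a PE hostfile ($1).
# Assumption: column 1 is the host name and column 2 is the slot count.
list_pe_hosts() {
    awk '{ print $1, $2 }' "$1"
}

# Inside a parallel job SGE sets PE_HOSTFILE; outside one, skip the listing.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    list_pe_hosts "$PE_HOSTFILE"
fi

# RESTARTED is 1 when the job has been restarted, 0 otherwise.
if [ "${RESTARTED:-0}" = "1" ]; then
    echo "Job was restarted; resume from saved state here"
else
    echo "Fresh start"
fi
</syntaxhighlight>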
== Running from a qsub Submit Script ==
No doubt after you've run a few jobs you get tired of typing something like 'qsub -l mem=2G,h_rt=10:00 -pe single 8 -N MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which outlines all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.
<syntaxhighlight lang="bash">
#!/bin/bash

## A Sample qsub script created by Kyle Hutson
##
## Note: Usually a '#' at the beginning of the line is ignored. However, in
## the case of qsub, lines beginning with #$ are commands for qsub itself, so
## I have taken the convention here of starting *every* line with a '#'. Just
## delete the first one if you want to use that line, and then modify it to
## your own purposes. The only exception here is the first line, which *must*
## be #!/bin/bash (or another valid shell).

## Specify the amount of RAM needed _per_core_. Default is 1G
##$ -l mem=1G

## Specify the maximum runtime. Default is 1 hour (1:00:00)
##$ -l h_rt=1:00:00

## Require the use of infiniband. If you don't know what this is, you probably
## don't need it. Default is "FALSE"
##$ -l ib=TRUE

## CUDA directive. If You don't know what this is, you probably don't need it
## Default is "FALSE"
##$ -l cuda=TRUE

## Parallel environment. Syntax is '-pe Environment NumberOfCores' A list of
## valid environments can be found at
## https://support.beocat.ksu.edu/BeocatDocs/index.php/AdvancedSGE (section 2). One
## quick note here. Jobs requesting 16 or fewer cores tend to get scheduled
## fairly quickly. If you need a job that requires more than that, you might
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in
## getting your job scheduled in a reasonable amount of time. Default is
## "single 1"
##$ -pe single 12
##$ -pe mpi-1 2
##$ -pe mpi-fill 20
##$ -pe mpi-spread 16

## Checkpointing. Options are BLCR or dmtcp. Default is no checkpointing.
##$ -ckpt dmtcp

## Use the current working directory instead of your home directory
##$ -cwd

## Merge output and error text streams into a single stream
##$ -j y

## Name my job, to make it easier to find in the queue
##$ -N MyJobTitle

## And finally, we run the job we came here to do.
## $HOME/ProgramDir/ProgramName ProgramArguments

## OR, for the case of MPI-capable jobs
## mpirun $HOME/path/MpiJobName

## Send email when a job is aborted (a), begins (b), and/or ends (e)
##$ -m abe

## Email address to send the email to based on the above line.
##$ -M myemail@ksu.edu
</syntaxhighlight>

== Array Jobs ==
One of SGE's useful options is the ability to run "Array Jobs".

It can be used with the following option to qsub.

-t n[-m[:s]]
Submits a so called Array Job, i.e. an array of identical tasks being differentiated only by an index number and being treated by Grid
Engine almost like a series of jobs. The option argument to -t specifies the number of array job tasks and the index number which will be
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SGE_TASK_ID. The option arguments
n, m and s will be available through the environment variables SGE_TASK_FIRST, SGE_TASK_LAST and SGE_TASK_STEPSIZE.

Following restrictions apply to the values n and m:

1 <= n <= 1,000,000
1 <= m <= 1,000,000
n <= m

The task id range specified in the option argument may be a single number, a simple range of the form n-m or a range with a step size.
Hence, the task id range specified by 2-10:2 would result in the task id indexes 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each
with the environment variable SGE_TASK_ID containing one of the 5 index numbers.

Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The
number of tasks in an array job is unlimited.

STDOUT and STDERR of array job tasks will be written into different files with the default location

<jobname>.['e'|'o']<job_id>'.'<task_id>
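To see which task IDs a given <tt>-t</tt> range would produce before submitting, the n[-m[:s]] syntax above can be expanded locally. This is only an illustrative sketch using <tt>seq</tt>; <tt>expand_tasks</tt> is a hypothetical helper and does not call SGE or enforce its limits.
<syntaxhighlight lang="bash">
#!/bin/bash
# Expand a task range of the form n, n-m, or n-m:s into the list of
# task IDs, printed on one line separated by spaces.
expand_tasks() {
    local spec=$1 step=1 first last
    case $spec in
        *:*) step=${spec##*:}; spec=${spec%%:*} ;;
    esac
    first=${spec%%-*}
    last=${spec##*-}
    # Unquoted on purpose: joins seq's one-per-line output with spaces.
    echo $(seq "$first" "$step" "$last")
}

expand_tasks 2-10:2
</syntaxhighlight>
The last line prints the five IDs from the example in the text: 2, 4, 6, 8, and 10.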

=== Examples ===
==== Change the Size of the Run ====
Array Jobs have a variety of uses. One of the easiest to comprehend is the following:

I have an application, app1, that I need to run the exact same way, on the same data set, with only the size of the run changing.

My original script looks like this:

<syntaxhighlight lang="bash">
#!/bin/bash
RUNSIZE=50
#RUNSIZE=100
#RUNSIZE=150
#RUNSIZE=200
app1 $RUNSIZE dataset.txt
</syntaxhighlight>
For every run of that job I have to change the RUNSIZE variable, and submit each script. This gets tedious.

With Array Jobs the script can be written like so:

<syntaxhighlight lang="bash">
#!/bin/bash
#$ -t 50-200:50
RUNSIZE=$SGE_TASK_ID
app1 $RUNSIZE dataset.txt
</syntaxhighlight>
I then submit that job, and SGE understands that it needs to run it 4 times, once for each task. It also knows that it can and should run these tasks in parallel.

==== Choosing a Dataset ====
A slightly more complex use of Array Jobs is the following:

I have an application, app2, that needs to be run against every line of my dataset. Every line changes how app2 runs slightly, but I need to compare the runs against each other.

Originally I had to take each line of my dataset and generate a new submit script and submit the job. This was done with yet another script:

<syntaxhighlight lang="bash">
#!/bin/bash
DATASET=dataset.txt
scriptnum=0
while read LINE
do
echo "app2 $LINE" > ${scriptnum}.sh
qsub ${scriptnum}.sh
scriptnum=$(( $scriptnum + 1 ))
done < $DATASET
</syntaxhighlight>
Not only is this needlessly complex, it is also slow, as qsub has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with <tt>wc -l dataset.txt</tt>; in this case, let's call it 5000.

<syntaxhighlight lang="bash">
#!/bin/bash
#$ -t 1-5000
app2 `sed -n "${SGE_TASK_ID}p" dataset.txt`
</syntaxhighlight>
This uses command substitution (the backticks) to have the sed command print only line number $SGE_TASK_ID of the file dataset.txt.

Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so qsub doesn't have to verify as many.

To give you an idea of the time saved: submitting 1 job takes 1-2 seconds. By extension, submitting 5000 jobs takes 5,000-10,000 seconds, or roughly 1.4-2.8 hours.
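
The line-selection step from the array script above can be tried locally before submitting anything. The following is a minimal sketch: the temporary file and its three lines are made-up sample data, and <tt>SGE_TASK_ID</tt> is set by hand because SGE is not involved.

```bash
#!/bin/bash
# Sketch: try the sed line selection locally, outside of SGE.
# The sample file and its contents are made up for illustration.
dataset=$(mktemp)
printf '%s\n' "alpha 1" "beta 2" "gamma 3" > "$dataset"

# Count the lines to find the upper bound for "#$ -t".
# Reading from stdin keeps the file name out of wc's output.
num_lines=$(wc -l < "$dataset")
num_lines=$(( num_lines ))   # normalize away any padding some wc versions add

# SGE sets SGE_TASK_ID for each task; here we set it by hand.
SGE_TASK_ID=2
selected_line=$(sed -n "${SGE_TASK_ID}p" "$dataset")

echo "Lines in dataset: $num_lines"
echo "Task $SGE_TASK_ID would run: app2 $selected_line"

rm -f "$dataset"
```

Running this with a few different values of <tt>SGE_TASK_ID</tt> is a cheap way to confirm that each task picks up the line you expect.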
== Running jobs interactively ==
Some jobs just don't behave the way we expect, or need somebody at the keyboard typing in response to the program's output. Beocat has a facility for this, called <tt>qrsh</tt>. <tt>qrsh</tt> accepts the same command-line arguments as <tt>qsub</tt>. If no node is available that meets your resource requirements, <tt>qrsh</tt> will tell you:
Your "qrsh" request could not be scheduled, try again later.
Note that, as with qsub, your interactive job will time out after your allotted time has passed.
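
Because the arguments match those of qsub, an interactive request can be assembled the same way as a batch one. The sketch below only prints the command rather than running it. The one-hour runtime and 2G memory values are assumptions chosen for illustration; the <tt>h_rt</tt> and <tt>memory</tt> resource names are the ones that appear in the job accounting output on this page.

```bash
#!/bin/bash
# Sketch: assemble an interactive request with the same -l syntax as qsub.
# The one-hour runtime and 2G memory values are illustrative assumptions.
runtime="1:00:00"   # h_rt, as hours:minutes:seconds
memory="2G"         # the "memory" resource seen in Beocat job categories

qrsh_cmd="qrsh -l h_rt=${runtime},memory=${memory}"

# Print the command instead of running it, so this is safe to try anywhere.
echo "$qrsh_cmd"
```

On a Beocat head node, the printed command is what you would type to start the session.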
== Altering Job Requests ==
We generally do not support users modifying job parameters once a job has been submitted. It can be done, but there are numerous catches, and the variations can be problematic; it is normally easier to simply delete the job and resubmit it with the right parameters. '''If your job doesn't start after modifying such parameters (after a reasonable amount of time), delete the job and resubmit it.'''
=== qalter ===
<tt>qalter</tt> is the command that can be used to modify parameters of the job after it has been submitted. '''Note: resource requests (memory, runtime, etc.) can only be modified on jobs that have yet to start running.'''
==== Changing resource requests ====
Syntax:
<syntaxhighlight lang="bash">
qalter -l $all_resources $jobid
</syntaxhighlight>
When modifying resource requests, you '''must''' specify all of the resources your job needs, not just the one you plan to change. If you just specify h_rt, it will drop the memory request. If you just specify memory, it will drop the h_rt. And so on. This leads to jobs failing to start.
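
One way to avoid dropping a request is to keep every resource in one place and rebuild the full list each time. This is a hedged sketch, not an official procedure: the job id and resource values are placeholders, and the command is printed instead of executed.

```bash
#!/bin/bash
# Sketch: keep every resource request together so none is dropped by qalter.
# The job id and the resource values below are placeholders.
jobid=1122334455
h_rt="48:00:00"   # the value being changed
memory="4G"       # unchanged, but it must still be repeated

# Always pass the complete, comma-separated list to -l.
all_resources="h_rt=${h_rt},memory=${memory}"
qalter_cmd="qalter -l ${all_resources} ${jobid}"

# Print the command for review; run it yourself once it looks right.
echo "$qalter_cmd"
```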
==== Changing core requests ====
Syntax:
<syntaxhighlight lang="bash">
qalter -pe $pe_name $number_of_cores $jobid
</syntaxhighlight>
If you request more cores than are available in the parallel environment that you need, the job may fail to start.
: For example, requesting 400 cores in the single environment will fail because we have no machines with 400 cores.
==== Determining why a job is not running ====
Syntax:
<syntaxhighlight lang="bash">
qalter -w v $jobid
</syntaxhighlight>
This will output the scheduler's reasoning as to why the job has not started. Note that lines like:
Job 1122334455 cannot run in PE "single" because it only offers 0 slots
are usually red herrings. Sometimes they indicate that the scheduler cannot meet the resource requests for that job at this moment.

Sometimes you will see output like this:
Job 1122334455 does not request 'forced' resource "memory" of queue instance batch.q@elf73.beocat
In this case the user performed a qalter and forgot to specify the memory request. The job will never run in this state.

Other times it will have lots of lines like this:
verification: found possible assignment with 1 slots
This indicates that the job should be scheduled shortly.
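
The three cases above can be told apart by searching the output for the relevant phrases. The following sketch works on sample text copied from this section; on Beocat you would capture real output from <tt>qalter -w v $jobid</tt> instead. The verdict wording is made up for illustration.

```bash
#!/bin/bash
# Sketch: scan saved "qalter -w v" output for the patterns described above.
# The sample text reuses the example lines from this section.
validation_output=$(cat <<'EOF'
Job 1122334455 cannot run in PE "single" because it only offers 0 slots
Job 1122334455 does not request 'forced' resource "memory" of queue instance batch.q@elf73.beocat
EOF
)

# Check the most serious case first: a missing forced resource never recovers.
if printf '%s\n' "$validation_output" | grep -q "does not request 'forced' resource"; then
    verdict="missing a required resource request; delete and resubmit"
elif printf '%s\n' "$validation_output" | grep -q "found possible assignment"; then
    verdict="schedulable; should start shortly"
else
    verdict="no clear signal; the job is probably waiting for resources"
fi

echo "$verdict"
```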
== Killable jobs ==
There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as "killable." If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.

The way you would designate your job as killable is to add <tt>-l killable</tt> to the '''<tt>qsub</tt> or <tt>qrsh</tt>''' arguments. This could be either on the command-line or in your script file.
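
In a script file, the request goes on a <tt>#$</tt> line like any other qsub argument. The script below is a minimal sketch with a placeholder workload; only the <tt>-l killable</tt> line comes from the instructions above.

```bash
#!/bin/bash
# Sketch of a job script that opts in to killable scheduling.
# Lines starting with "#$" are read by qsub and ignored by bash.
#$ -l killable

# Killable jobs may be re-queued and restarted on another node, so this
# placeholder workload simply reports where it started.
start_message="started on $(uname -n)"
echo "$start_message"
```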

''Note: This is a submit-time only request; it cannot be added by a normal user after the job has been submitted.'' If you would like jobs modified to be '''killable''' after they have been submitted (and it is too much work to <tt>qdel</tt> the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.

== Scheduling Priority ==
The scheduler uses a complex formula to determine the order in which jobs get scheduled. In general, jobs run in the order that they are submitted to the queue, with the following exceptions. Jobs for users in a group that owns nodes will immediately get scheduled on those nodes, even if that means bumping existing jobs off. Users in groups that have contributed funds to Beocat may have higher scheduling priority. You can check the base scheduling priority of each group using <tt>qconf -sst</tt>. If you do not have a group, your jobs are scheduled using BEODEFAULT. The higher the priority, the faster your job will be moved to the front of the queue. A fair scheduling algorithm adjusts this scheduling priority down as users in that group submit more jobs.

Since all users not in a group having higher priority get put into BEODEFAULT, the priority is always very low and each job gets scheduled in the order it was submitted. Groups with a higher priority may jump ahead of the BEODEFAULT jobs, but if these groups are submitting lots of jobs their priority will become low as well. Groups with the highest priority that are submitting the fewest jobs may see those jobs moved to the front of the queue quickly.

When processing cores become available, the scheduler looks at the head of the queue to find jobs that will fit within the resources available. Shorter jobs of 12 hours or less get marked as killable and will be run on nodes owned by other groups. These jobs will jump past longer jobs when resources become available on owned nodes. Many jobs in the queue may require more memory than is available on some nodes, so smaller memory jobs will be scheduled ahead of larger memory jobs on hosts with more limited memory. <tt>kstat -q</tt> will show you the order in the queue and allow you to see jobs marked as "killable" and those that require large memory.

== Job Accounting ==
Some people may find it useful to know what their job did during its run. The qacct tool will read SGE's accounting file and give you summarized or detailed views on jobs that have run within Beocat.
=== qacct ===
This data can usually be used to diagnose two very common job failures.
==== Job debugging ====
It is simplest if you know the job number of the job you are trying to get information on.
<syntaxhighlight lang="bash">
# if you know the jobid, put it here:
qacct -j 1122334455
# if you don't know the job id, you can list your jobs over some number of days (in this case, the past 14 days):
qacct -o $USER -d 14 -j
</syntaxhighlight>

===== My job didn't do anything when it ran! =====
<tt>qname batch.q
hostname mage07.beocat
group some_user_users
owner some_user
project BEODEFAULT
department defaultdepartment
jobname my_job_script.sh
jobnumber 1122334455
...
snipped to save space
...
exit_status 1 </tt>
<tt style="color: red">ru_wallclock 1s</tt>
<tt>ru_utime 0.030s
ru_stime 0.030s
...
snipped to save space
...
arid undefined
category -u some_user -q batch.q,long.q -l h_rt=604800,mem_free=1024.0M,memory=2G</tt>
If you look at the line showing ru_wallclock, you can see that it shows 1s. This means that the job started and then promptly ended, and the exit_status of 1 shows that it ended with an error. This points to something being wrong with your submission script; perhaps there is a typo somewhere in it.
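
The same check can be scripted. This sketch runs on a trimmed copy of the sample output above; on Beocat the text would come from <tt>qacct -j</tt> with your job id. The 5-second threshold and the helper name <tt>get_field</tt> are arbitrary choices for illustration.

```bash
#!/bin/bash
# Sketch: flag jobs that exited almost immediately with a non-zero status.
# The sample reuses fields from the accounting output shown above.
acct=$(cat <<'EOF'
jobnumber    1122334455
exit_status  1
ru_wallclock 1s
EOF
)

# Print the second column of the first line whose first column matches $1.
get_field() {
    printf '%s\n' "$acct" | awk -v key="$1" '$1 == key { print $2; exit }'
}

exit_status=$(get_field exit_status)
wallclock=$(get_field ru_wallclock)
wallclock=${wallclock%s}   # strip the trailing "s" shown in this output

# The 5-second threshold is an arbitrary assumption for illustration.
if [ "$exit_status" -ne 0 ] && [ "${wallclock%.*}" -lt 5 ]; then
    diagnosis="job failed almost immediately; check the submit script"
else
    diagnosis="job ran for a while or exited cleanly"
fi

echo "$diagnosis"
```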

===== My job ran but didn't finish! =====
<tt>qname batch.q
hostname scout59.beocat
group some_user_users
owner some_user
project BEODEFAULT
department defaultdepartment
jobname my_job_script.sh
jobnumber 1122334455
...
snipped to save space
...
slots 1 </tt>
<tt style="color: red">failed 37 : qmaster enforced h_rt, h_cpu, or h_vmem limit</tt>
<tt>exit_status 0 </tt>
<tt style="color: red">ru_wallclock 21600s</tt>
<tt>ru_utime 0.130s
ru_stime 0.020s
...
snipped to save space
...
arid undefined</tt>
<tt style="color: red">category -u some_user -q batch.q,long.q -l h_rt=21600,mem_free=512.0M,memory=1G</tt>
If you look at the lines showing failed, ru_wallclock, and category, you can see some pointers to the issue.
It didn't finish because the scheduler (qmaster) enforced some limit. Of the three limits named in the failed line (h_rt, h_cpu, and h_vmem), the category line shows that only h_rt was requested. So it was a runtime (wallclock) limit.
Comparing ru_wallclock with the h_rt request, we can see that the job ran until the h_rt time was hit, and then the scheduler enforced the limit and killed the job. You will need to resubmit the job and ask for more time.
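
The comparison between ru_wallclock and h_rt can also be done mechanically. The sketch below works on a trimmed copy of the sample output above and assumes the h_rt value in the category line is given in seconds, as it is in that sample.

```bash
#!/bin/bash
# Sketch: compare the wallclock time used against the requested h_rt.
# The sample reuses lines from the accounting output shown above.
acct=$(cat <<'EOF'
failed       37  : qmaster enforced h_rt, h_cpu, or h_vmem limit
ru_wallclock 21600s
category     -u some_user -q batch.q,long.q -l h_rt=21600,mem_free=512.0M,memory=1G
EOF
)

# Wallclock seconds, with the trailing "s" removed.
wallclock=$(printf '%s\n' "$acct" | awk '$1 == "ru_wallclock" { sub(/s$/, "", $2); print $2 }')

# Requested runtime: the digits after "h_rt=" in the category line.
h_rt=$(printf '%s\n' "$acct" | grep -o 'h_rt=[0-9]*' | head -n 1 | cut -d= -f2)

if [ "$wallclock" -ge "$h_rt" ]; then
    diagnosis="hit the runtime limit; resubmit with a larger h_rt"
else
    diagnosis="stopped before the runtime limit"
fi

echo "used ${wallclock}s of ${h_rt}s: $diagnosis"
```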