== Drinking from the Firehose ==
== Module Availability ==
For a complete list of all installed modules, see [[ModuleList]]
Most people will be just fine running 'module avail' to see a list of modules available on Beocat. There are a couple software packages that are only available on particular node types. For those cases, check [https://modules.beocat.ksu.edu/ our modules website.] If you are used to OpenScienceGrid computing, you may wish to take a look at how to use [[OpenScienceGrid#Using_OpenScienceGrid_modules_on_Beocat|their modules.]]


== Toolchains ==
We provide a good number of toolchains and versions of toolchains to make sure your applications will compile and/or run correctly.


These toolchains include (you can run 'module keyword toolchain'):
; GCC:    The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj,...).
; GCCcore:    The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these languages (libstdc++, libgcj,...).
; foss:    GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.
; gcccuda:    GNU Compiler Collection (GCC) based compiler toolchain, along with CUDA toolkit.
; gmvapich2:    GNU Compiler Collection (GCC) based compiler toolchain, including MVAPICH2 for MPI support.
; gompi:    GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.
; gompic:    GNU Compiler Collection (GCC) based compiler toolchain along with CUDA toolkit, including OpenMPI for MPI support with CUDA features enabled.
; goolfc:    GCC based compiler toolchain __with CUDA support__, and including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.
; icc:    C and C++ compiler from Intel
; iccifort:    Intel Cluster Toolkit Compiler Edition provides Intel C,C++ and fortran compilers, Intel MPI and Intel MKL
; ifort:    Fortran compiler from Intel
; iomkl:    Intel Cluster Toolchain Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MKL & OpenMPI.
; iompi:    Intel C/C++ and Fortran compilers, alongside Open MPI.
; intel:    Intel Compiler Suite, providing Intel C/C++ and Fortran compilers, Intel MKL & Intel MPI. Intel only recently made this suite free, so we have less experience with Intel MPI than with OpenMPI.


You can run 'module spider $toolchain/' to see the versions we have:
  $ module spider iomkl/
* iomkl/2017a
* iomkl/2017b


With software we provide, the toolchain used to compile is always specified in the "version" of the software that you want to load.
If you mix toolchains, inconsistent things may happen.
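Because the toolchain is encoded in the module version string, a small shell helper can pull it out so you can check that everything you load was built with the same one. The sketch below is illustrative only: <tt>toolchain_of</tt> is a hypothetical helper, not something provided on Beocat, and it assumes the <tt>Name/version-toolchain</tt> naming pattern seen in module names such as <tt>Python/3.6.3-foss-2017b</tt>. Any extra suffix after the toolchain is kept as-is.
<syntaxhighlight lang="bash">
# Hypothetical helper: print the toolchain part of a module name.
# Assumes names look like Name/version-toolchain (e.g. Python/3.6.3-foss-2017b).
toolchain_of() {
    version="${1#*/}"                  # drop the leading "Name/"
    case "$version" in
        *-*) echo "${version#*-}" ;;   # drop the software version, keep the rest
        *)   echo "none" ;;            # e.g. Java/1.8.0_144 has no toolchain suffix
    esac
}

toolchain_of "Python/3.6.3-foss-2017b"
toolchain_of "OpenMPI/2.1.1-GCC-6.4.0-2.28"
toolchain_of "Java/1.8.0_144"
</syntaxhighlight>
If two modules you plan to load report different toolchains, that is a hint to pick matching versions before loading them together.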
== Most Commonly Used Software ==
Check our [https://modules.beocat.ksu.edu/ modules website] for the most up-to-date software availability.
The versions mentioned below are representations of what was available at the time of writing, not necessarily what is currently available.
=== [http://www.open-mpi.org/ OpenMPI] ===
We provide many versions. You are most likely better off loading a toolchain or application directly to make sure you get the right version, but you can see the versions we have with 'module avail OpenMPI/'.
 
The first step to run an MPI application is to load one of the compiler toolchains that include OpenMPI.  You normally will just need to load the default version as below.  If your code needs access to NVIDIA GPUs, you'll need one of the CUDA-enabled toolchains listed above.  Some codes are also picky about which versions of the underlying GNU or Intel compilers they need.
 
  module load foss
 
If you are working with your own MPI code you will need to start by compiling it.  MPI offers <B>mpicc</B> for compiling codes written in C, <B>mpic++</B> for compiling C++ code, and <B>mpifort</B> for compiling Fortran code.  You can get a complete listing of parameters to use by running them with the <B>--help</B> parameter.  Below are some examples of compiling with each.
 
  mpicc --help
  mpicc -o my_code.x my_code.c
  mpic++ -o my_code.x my_code.cc
  mpifort -o my_code.x my_code.f
 
In each case above, you can name the executable file whatever you want (I chose <I>my_code.x</I>).  It is common to use different optimization levels, for example, but those may depend on which compiler toolchain you choose.  Some are based on the Intel compilers, so you'd need to use optimizations for the underlying icc or ifort compilers they call, and some are GNU based, so you'd use compiler optimizations for gcc or gfortran.
 
We have many MPI codes in our modules that you simply need to load before using.  Below is an example of loading and running Gromacs which is an MPI based code to simulate large numbers of atoms classically.
 
  module load GROMACS
 
This loads the Gromacs modules and sets all the paths so you can run the scalar version <B>gmx</B> or the MPI version <B>gmx_mpi</B>.  Below is a sample job script for running a complete Gromacs simulation.
 
  #!/bin/bash -l
  #SBATCH --mem=120G
  #SBATCH --time=24:00:00
  #SBATCH --job-name=gromacs
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=4
 
  module reset
  module load GROMACS
 
  echo "Running Gromacs on $HOSTNAME"
 
  export OMP_NUM_THREADS=1
  time mpirun -x OMP_NUM_THREADS=1 gmx_mpi mdrun -nsteps 500000 -ntomp 1 -v -deffnm 1ns -c 1ns.pdb -nice 0
 
  echo "Finished run on $SLURM_NTASKS $HOSTNAME cores"
 
<B>mpirun</B> will run your job on all cores requested which in this case is 4 cores on a single node.  You will often just need to guess at the memory size for your code, then check on the memory usage with <B>kstat --me</B> and adjust the memory in future jobs.
 
I prefer to put a <B>module reset</B> in my scripts then manually load the modules needed to ensure each run is using the modules it needs.  If you don't do this, a submitted job script will simply use the modules you currently have loaded, which is fine too.
 
I also like to put a <B>time</B> command in front of each part of the script that can use significant amounts of time.  This way I can track the amount of time used in each section of the job script.  This can prove very useful if your job script copies large data files around at the start, for example, allowing you to see how much time was used for each stage of the job if it runs longer than expected.
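
The same idea can be wrapped in a small function so that every stage of a job script reports its own elapsed time in a consistent format. This is only a sketch: <tt>run_stage</tt> is a hypothetical helper name, and the <tt>sleep</tt> and <tt>true</tt> commands are stand-ins for your real copy and compute steps.
<syntaxhighlight lang="bash">
# Hypothetical helper: run a command and report how long it took.
run_stage() {
    label="$1"; shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "[$label] finished in $((end - start))s with exit status $status"
    return $status
}

# Stand-in commands; replace them with your real copy and compute steps.
run_stage "stage-in" sleep 1
run_stage "compute" true
</syntaxhighlight>
The labelled lines end up in the job's output file, so a run that takes longer than expected can be traced to a specific stage.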
 
The OMP_NUM_THREADS environment variable is set to 1 and passed to the MPI system to ensure that each MPI task only uses 1 thread.  There are some MPI codes that are also multi-threaded, so this ensures that this particular code uses the cores allocated to it in the manner we want.
 
Once you have your job script ready, submit it using the <B>sbatch</B> command as below where the job script is in the file <I>sb.gromacs</I>.
 
  sbatch sb.gromacs
 
You should then monitor your job as it goes through the queue and starts running using <B>kstat --me</B>.  Your code will also generate an output file, usually of the form <I>slurm-#######.out</I> where the 7 # signs are the 7 digit job ID number.  If you need to cancel your job use <B>scancel</B> with the 7 digit job ID number.


  scancel #######


=== [http://www.r-project.org/ R] ===
You can see what versions of R we provide with 'module avail R/'


==== Packages ====
Then install the package using
<syntaxhighlight lang="R">
install.packages("PACKAGENAME")
</syntaxhighlight>

After installing you can test before leaving interactive mode by issuing the command
<syntaxhighlight lang="R">
library("PACKAGENAME")
</syntaxhighlight>
==== Running R Jobs ====

You cannot submit an R script directly. '<tt>sbatch myscript.R</tt>' will result in an error. Instead, you need to make a bash [[AdvancedSlurm#Running_from_a_sbatch_Submit_Script|script]] that will call R appropriately. Here is a minimal example. We'll save this as submit-R.sbatch

<syntaxhighlight lang="bash">
#!/bin/bash -l
#SBATCH --mem-per-cpu=4G
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-HH:MM:SS)
#SBATCH --time=0-00:15:00

# Now let's do some actual work. This starts R and loads the file myscript.R
module reset
module load R
R --no-save -q < myscript.R
</syntaxhighlight>
Then submit the job:
<syntaxhighlight lang="bash">
sbatch submit-R.sbatch
</syntaxhighlight>
You can monitor your jobs using <B>kstat --me</B>.  The output of your job will be in a slurm-#.out file where '#' is the 7 digit job ID number for your job.
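
If you want a script to work out the name of that output file, you can capture the confirmation line that <tt>sbatch</tt> prints, which normally reads "Submitted batch job" followed by the job ID. The sketch below uses a sample line so it can be tried anywhere; <tt>job_id_from</tt> is a hypothetical helper, and the job ID shown is made up.
<syntaxhighlight lang="bash">
# Hypothetical helper: extract the job ID from sbatch's confirmation line,
# which normally looks like "Submitted batch job 1234567".
job_id_from() {
    echo "${1##* }"     # keep everything after the last space
}

# In a real session you would capture the line with:
#   submit_line=$(sbatch submit-R.sbatch)
submit_line="Submitted batch job 1234567"    # sample text for illustration

job_id=$(job_id_from "$submit_line")
echo "Job ID:      $job_id"
echo "Output file: slurm-${job_id}.out"
</syntaxhighlight>
Slurm also offers <tt>sbatch --parsable</tt>, which prints the job ID without the surrounding text and may be simpler if you do not need the full message.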


=== [http://www.java.com/ Java] ===
You can see what versions of Java we support with 'module avail Java/'


=== [http://www.python.org/about/ Python] ===
You can see what versions of Python we support with 'module avail Python/'. Note: Running this does not load a Python module, it just shows you a list of the ones that are available.

If you need libraries that we do not have installed, you should use [https://docs.python.org/3/library/venv.html python -m venv] to set up a virtual Python environment in your home directory. This will let you install Python libraries as you please.


==== Setting up your virtual environment ====
<syntaxhighlight lang="bash">
# Load Python (pick a version from the 'module avail Python/' list)
module load Python/SOME_VERSION_THAT_YOU_PICKED_FROM_THE_LIST
</syntaxhighlight>
(After running this command Python is loaded.  After you log off and log on again, Python will not be loaded, so you must rerun this command every time you log on.)
* Change to the directory that holds your virtual environments:
<syntaxhighlight lang="bash">
cd ~/virtualenvs
</syntaxhighlight>
* Create a virtual environment. Here I will create a default virtual environment called 'test'. Note that the [https://docs.python.org/3/library/venv.html venv documentation] describes many more useful options.
<syntaxhighlight lang="bash">
python -m venv --system-site-packages test
# or you could use 'python -m venv test'
# using the '--system-site-packages' allows the virtual environment to make use of python libraries we have already installed
# particularly useful if you're going to use our SciPy-Bundle, TensorFlow, or Jupyter
# if you don't use '--system-site-packages' then the virtual environment is completely isolated from our other provided packages and everything it needs it will have to build and install within itself.
</syntaxhighlight>
* Let's look at our virtual environments (the virtual environment name should be in the output):
<syntaxhighlight lang="bash">
ls ~/virtualenvs
</syntaxhighlight>
* Activate one of these
<syntaxhighlight lang="bash">
source ~/virtualenvs/test/bin/activate
</syntaxhighlight>
(After running this command your virtual environment is activated.  After you log off and log on again, your virtual environment will not be active, so you must rerun this command every time you log on.)
* You can now install the Python libraries you want. This can be done using <tt>pip</tt>.


==== Using your virtual environment within a job ====
Here is a simple job script using the virtual environment test
<syntaxhighlight lang="bash">
#!/bin/bash
module load Python/THE_SAME_VERSION_YOU_USED_TO_CREATE_YOUR_ENVIRONMENT_ABOVE
source ~/virtualenvs/test/bin/activate
export PYTHONDONTWRITEBYTECODE=1
python ~/path/to/your/python/script.py
</syntaxhighlight>
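A job that starts without its virtual environment can fail in confusing ways, for example by silently using the wrong Python libraries. One way to guard against that, sketched below, is to check that the environment exists before activating it. <tt>activate_venv</tt> is a hypothetical helper, not part of Python or Slurm.
<syntaxhighlight lang="bash">
# Hypothetical helper: activate a virtual environment only if it exists.
activate_venv() {
    venv_dir="$1"
    if [ -f "$venv_dir/bin/activate" ]; then
        . "$venv_dir/bin/activate"
    else
        echo "No virtual environment found at $venv_dir" >&2
        return 1
    fi
}

# In a job script you might stop early when activation fails:
#   activate_venv ~/virtualenvs/test || exit 1
</syntaxhighlight>
Failing early this way puts a clear message in the job's output file instead of an import error later in the run.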
==== Using MPI with Python within a job ====
We're going to load the SciPy-bundle module, as that has mpi4py available within it.
You can check the available versions and load one that uses the Python version you would like.
 module avail SciPy-bundle
Here is a simple job script using MPI with Python
<syntaxhighlight lang="bash">
#!/bin/bash
module load SciPy-bundle
export PYTHONDONTWRITEBYTECODE=1
mpirun python ~/path/to/your/mpi/python/script.py
</syntaxhighlight>
=== [https://www.tensorflow.org/ TensorFlow] ===
TensorFlow provided by pip is often completely broken on any system that is not running a recent version of Ubuntu. Beocat (and most HPC systems) does not use Ubuntu. As such, we provide TensorFlow modules for you to load.
You can see what versions of TensorFlow we support with 'module avail TensorFlow/'. Note: Running this does not load a TensorFlow module, it just shows you a list of the ones that are available.
If you need other python libraries that we do not have installed, you should use [https://docs.python.org/3/library/venv.html python -m venv] to setup a virtual python environment in your home directory. This will let you install python libraries as you please.
We document creating a virtual environment [[#Setting up your virtual environment|above]]. You can skip loading the python module, as loading TensorFlow will load the correct version of python module behind the scenes. The singular change you need to make is to use the '--system-site-packages' when creating the virtual environment.
<syntaxhighlight lang=bash>
python -m venv --system-site-packages test
# using the '--system-site-packages' allows the virtual environment to make use of python libraries we have already installed
# particularly useful if you're going to use our SciPy-Bundle, or TensorFlow
</syntaxhighlight>
=== Jupyter ===
[https://jupyter.org/ Jupyter] is a framework for creating and running reusable "notebooks" for scientific computing. It runs Python code by default. Normally, it is meant to be used in an interactive manner. Interactive codes can be limiting and/or problematic when used in a cluster environment. We have an example submit script available [https://gitlab.beocat.ksu.edu/Admin-Public/ondemand/job_templates/-/tree/master/Jupyter_Notebook here] to help you transition from an OpenOnDemand interactive job using Jupyter to a non-interactive job.
=== [http://spark.apache.org/ Spark] ===
Spark is an engine for large-scale data processing.
It can be used in conjunction with Python, R, Scala, Java, and SQL.
Spark can be run on Beocat interactively or through the Slurm queue.
To run interactively, you must first request a node or nodes from the Slurm queue.
The line below requests 1 node and 1 core for 24 hours and if available will drop
you into the bash shell on that node.
<syntaxhighlight lang=bash>
srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash
</syntaxhighlight>
We have some sample python based Spark code you can try out that came from the
exercises and homework from the PSC Spark workshop. 
<syntaxhighlight lang=bash>
mkdir spark-test
cd spark-test
cp -rp /homes/daveturner/projects/PSC-BigData-Workshop/Shakespeare/* .
</syntaxhighlight>
You will need to set up a python virtual environment and load the <B>nltk</B> package
before you run the first time.
<syntaxhighlight lang=bash>
module load Spark
mkdir -p ~/virtualenvs
cd ~/virtualenvs
python -m venv --system-site-packages spark-test
source ~/virtualenvs/spark-test/bin/activate
pip install nltk
deactivate
</syntaxhighlight>
To run the sample code interactively, load the Python and Spark modules,
source your python virtual environment, change to the sample directory, fire up pyspark,
then execute the sample code.
<syntaxhighlight lang=bash>
module load Spark
source ~/virtualenvs/spark-test/bin/activate
cd ~/spark-test
pyspark
</syntaxhighlight>
<syntaxhighlight lang=python>
exec(open("shakespeare.py").read())
</syntaxhighlight>
You can work interactively from the pyspark prompt (>>>) in addition to running scripts as above.
The Shakespeare directory also contains a sample sbatch submit script that will run the
same shakespeare.py code through the Slurm batch queue. 
<syntaxhighlight lang=bash>
#!/bin/bash -l
#SBATCH --job-name=shakespeare
#SBATCH --mem=10G
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Load Spark and Python (version 3 here)
module load Spark
source ~/virtualenvs/spark-test/bin/activate
spark-submit shakespeare.py
</syntaxhighlight>
When you run interactively, pyspark initializes your spark context <B>sc</B>.
You will need to do this manually as in the sample python code when you want
to submit jobs through the Slurm queue.
<syntaxhighlight lang=python>
# If there is no Spark Context (not running interactive from pyspark), create it
try:
  sc
except NameError:
  from pyspark import SparkConf, SparkContext
  conf = SparkConf().setMaster("local").setAppName("App")
  sc = SparkContext(conf = conf)
</syntaxhighlight>


The system-wide version of perl is tracking the stable releases of perl. Unfortunately there are some features that we do not include in the system distribution of perl, namely threads.

To use perl with threads, or a newer version, you can load it with the module command. To see what versions of perl we provide, you can use 'module avail Perl/'

==== Installing Perl Modules ====

The easiest way to install Perl modules is by using <B>cpanm</B>.
Below is an example of installing the Perl module <I>Term::ANSIColor</I>.
 
<syntaxhighlight lang=bash>
module load Perl
cpanm -i Term::ANSIColor
</syntaxhighlight>
 
CPAN: LWP::UserAgent loaded ok (v6.39)
Fetching with LWP:
http://www.cpan.org/authors/01mailrc.txt.gz
CPAN: YAML loaded ok (v1.29)
Reading '/homes/mozes/.cpan/sources/authors/01mailrc.txt.gz'
CPAN: Compress::Zlib loaded ok (v2.084)
............................................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/02packages.details.txt.gz
Reading '/homes/mozes/.cpan/sources/modules/02packages.details.txt.gz'
  Database was generated on Mon, 09 Mar 2020 20:41:03 GMT
.............
  New CPAN.pm version (v2.27) available.
  [Currently running version is v2.22]
  You might want to try
    install CPAN
    reload cpan
  to both upgrade CPAN.pm and run the new version without leaving
  the current session.
...............................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/03modlist.data.gz
Reading '/homes/mozes/.cpan/sources/modules/03modlist.data.gz'
DONE
Writing /homes/mozes/.cpan/Metadata
Running install for module 'Term::ANSIColor'
Fetching with LWP:
http://www.cpan.org/authors/id/R/RR/RRA/Term-ANSIColor-5.01.tar.gz
CPAN: Digest::SHA loaded ok (v6.02)
Fetching with LWP:
http://www.cpan.org/authors/id/R/RR/RRA/CHECKSUMS
Checksum for /homes/mozes/.cpan/sources/authors/id/R/RR/RRA/Term-ANSIColor-5.01.tar.gz ok
CPAN: CPAN::Meta::Requirements loaded ok (v2.140)
CPAN: Parse::CPAN::Meta loaded ok (v2.150010)
CPAN: CPAN::Meta loaded ok (v2.150010)
CPAN: Module::CoreList loaded ok (v5.20190522)
Configuring R/RR/RRA/Term-ANSIColor-5.01.tar.gz with Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Term::ANSIColor
Writing MYMETA.yml and MYMETA.json
  RRA/Term-ANSIColor-5.01.tar.gz
  /opt/software/software/Perl/5.30.0-GCCcore-8.3.0/bin/perl Makefile.PL -- OK
Running make for R/RR/RRA/Term-ANSIColor-5.01.tar.gz
cp lib/Term/ANSIColor.pm blib/lib/Term/ANSIColor.pm
Manifying 1 pod document
  RRA/Term-ANSIColor-5.01.tar.gz
  /usr/bin/make -- OK
Running make test for RRA/Term-ANSIColor-5.01.tar.gz
PERL_DL_NONLAZY=1 "/opt/software/software/Perl/5.30.0-GCCcore-8.3.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*/*.t
t/docs/pod-coverage.t ....... skipped: POD coverage tests normally skipped
t/docs/pod-spelling.t ....... skipped: Spelling tests only run for author
t/docs/pod.t ................ skipped: POD syntax tests normally skipped
t/docs/spdx-license.t ....... skipped: SPDX identifier tests normally skipped
t/docs/synopsis.t ........... skipped: Synopsis syntax tests normally skipped
t/module/aliases-env.t ...... ok
t/module/aliases-func.t ..... ok
t/module/basic.t ............ ok
t/module/basic256.t ......... ok
t/module/eval.t ............. ok
t/module/stringify.t ........ ok
t/module/true-color.t ....... ok
t/style/coverage.t .......... skipped: Coverage tests only run for author
t/style/critic.t ............ skipped: Coding style tests only run for author
t/style/minimum-version.t ... skipped: Minimum version tests normally skipped
t/style/obsolete-strings.t .. skipped: Obsolete strings tests normally skipped
t/style/strict.t ............ skipped: Strictness tests normally skipped
t/taint/basic.t ............. ok
All tests successful.
Files=18, Tests=430,  7 wallclock secs ( 0.21 usr  0.08 sys +  3.41 cusr  1.15 csys =  4.85 CPU)
Result: PASS
  RRA/Term-ANSIColor-5.01.tar.gz
  /usr/bin/make test -- OK
Running make install for RRA/Term-ANSIColor-5.01.tar.gz
Manifying 1 pod document
Installing /homes/mozes/perl5/lib/perl5/Term/ANSIColor.pm
Installing /homes/mozes/perl5/man/man3/Term::ANSIColor.3
Appending installation info to /homes/mozes/perl5/lib/perl5/x86_64-linux-thread-multi/perllocal.pod
  RRA/Term-ANSIColor-5.01.tar.gz
  /usr/bin/make install  -- OK
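
After an install like the one above, you can confirm from the shell that Perl is able to load the module. The sketch below relies on Perl's standard <tt>-M</tt> and <tt>-e</tt> switches; <tt>perl_has_module</tt> is a hypothetical helper name.
<syntaxhighlight lang="bash">
# Hypothetical helper: succeed if the named Perl module can be loaded.
perl_has_module() {
    perl -M"$1" -e 1 2>/dev/null
}

if perl_has_module "Term::ANSIColor"; then
    echo "Term::ANSIColor is available"
else
    echo "Term::ANSIColor is missing; try: cpanm -i Term::ANSIColor"
fi
</syntaxhighlight>
The same check can go at the top of a job script so that a missing module is reported before any real work starts.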
 
===== When things go wrong =====
Some perl modules fail to realize they shouldn't be installed globally. Usually, you'll notice this when they try to run 'sudo' something. Unfortunately we do not grant sudo access to anyone other than Beocat system administrators. Usually, this can be worked around by putting the following in your <tt>~/.bashrc</tt> file (at the bottom). Once this is in place, you should log out and log back in.
<syntaxhighlight lang="bash">
PATH="/homes/${USER}/perl5/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/homes/${USER}/perl5/lib/perl5${PERL5LIB:+:${PERL5LIB}}";
export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/homes/${USER}/perl5${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}";
export PERL_LOCAL_LIB_ROOT;
PERL_MB_OPT="--install_base \"/homes/${USER}/perl5\""; export PERL_MB_OPT;
</syntaxhighlight>
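The <tt>${VAR:+:${VAR}}</tt> pattern in the lines above adds the colon and the old value only when the variable is already set and non-empty, which avoids leaving a stray colon at the end of the list. The sketch below isolates that behavior; <tt>prepend_to_list</tt> is a hypothetical helper and the paths are placeholders.
<syntaxhighlight lang="bash">
# Hypothetical helper: prepend a directory to a colon-separated list.
prepend_to_list() {
    new_dir="$1"
    current="$2"
    echo "${new_dir}${current:+:${current}}"
}

prepend_to_list "/homes/yourname/perl5/bin" ""               # empty list: no trailing colon
prepend_to_list "/homes/yourname/perl5/bin" "/usr/bin:/bin"  # existing entries are kept after the new one
</syntaxhighlight>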


==== Submitting a job with Perl ====
<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --mem-per-cpu=1G
# Now we tell sbatch how long we expect our work to take: 15 minutes (D-H:MM:SS)
#SBATCH --time=0-0:15:00
# Now let's do some actual work.
</syntaxhighlight>
=== Octave for MatLab codes ===

'module avail Octave/'

The 64-bit version of Octave can be loaded using the command above.  Octave can then be used

everything that MatLab itself does.

<syntaxhighlight lang="bash">
#!/bin/bash -l
#SBATCH --job-name=octave
#SBATCH --output=octave.o%j
#SBATCH --time=1:00:00
#SBATCH --mem=4G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module reset
module load Octave/4.2.1-foss-2017beocatb-enable64

octave < matlab_code.m
</syntaxhighlight>


=== MatLab compiler ===

Beocat also has a <B>single floating user license</B> for the MatLab compiler and the most common toolboxes
including the Parallel Computing Toolbox, Optimization Toolbox, Statistics and Machine Learning Toolbox,
Image Processing Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Symbolic Math Toolbox,
Global Optimization Toolbox, and the Bioinformatics Toolbox.

Since we only have a <B>single floating user license</B>, this means that you will be expected to develop your MatLab code
with Octave or elsewhere on a laptop or departmental server.  Once you're ready to do large runs, then you
move your code to Beocat, compile the MatLab code into an executable, and you can submit as many jobs as
you want to the scheduler.  To use the MatLab compiler, you need to load the MATLAB module to compile code and
load the mcr module to run the resulting MatLab executable.

<syntaxhighlight lang="bash">
module load MATLAB
mcc -m matlab_main_code.m -o matlab_executable_name
</syntaxhighlight>
If you have addpath() commands in your code, you will need to wrap them in an "if ~deployed" block and tell the
compiler to include that path via the -I flag.
<syntaxhighlight lang="MATLAB">
% wrap addpath() calls like so:
if ~deployed
    addpath('./another/folder/with/code/')
end
</syntaxhighlight>
NOTE:  The license manager checks the mcc compiler out for a minimum of 30 minutes, so if another user compiles a code
you unfortunately may need to wait for up to 30 minutes to compile your own code.
Compiling with additional paths:
<syntaxhighlight lang="bash">
module load MATLAB
mcc -m matlab_main_code.m -I ./another/folder/with/code/ -o matlab_executable_name
</syntaxhighlight>
Any directories added with addpath() will need to be added to the list of compile options as -I arguments.  You
can have multiple -I arguments in your compile command.
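
When a project has many such directories, it can help to generate the compile command from a list instead of typing each <tt>-I</tt> flag by hand. The sketch below only prints the command so you can review it before running it. <tt>build_mcc_command</tt> is a hypothetical helper, <tt>./more/code/</tt> is a made-up second directory, and the helper assumes paths without spaces.
<syntaxhighlight lang="bash">
# Hypothetical helper: print an mcc command with one -I flag per directory.
# Usage: build_mcc_command MAIN_FILE OUTPUT_NAME DIR...
build_mcc_command() {
    main_file="$1"
    output_name="$2"
    shift 2
    cmd="mcc -m $main_file"
    for dir in "$@"; do
        cmd="$cmd -I $dir"
    done
    echo "$cmd -o $output_name"
}

build_mcc_command matlab_main_code.m matlab_executable_name ./another/folder/with/code/ ./more/code/
</syntaxhighlight>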
Here is an example job submission script.  Modify time, memory, tasks-per-node, and job name as you see fit:
<syntaxhighlight lang="bash">
#!/bin/bash -l
#SBATCH --job-name=matlab
#SBATCH --output=matlab.o%j
#SBATCH --time=1:00:00
#SBATCH --mem=4G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
module reset
module load mcr
./matlab_executable_name
</syntaxhighlight>
For those who make use of mex files - compiled C and C++ code with matlab bindings - you will need to add these
files to the compiled archive via the -a flag.  See the behavior of this flag in the [https://www.mathworks.com/help/compiler/mcc.html compiler documentation].  You can either target specific .mex files or entire directories.
Because codes often require adding several directories to the Matlab path as well as mex files from several locations,
we recommend writing a script to preserve and help document the steps to compile your Matlab code.  Here is an
abbreviated example from a current user:
<syntaxhighlight lang="bash">
#!/bin/bash -l

module load MATLAB

cd matlabPyrTools/MEX/

# compile mex files
mex upConv.c convolve.c wrap.c edges.c
mex corrDn.c convolve.c wrap.c edges.c
mex histo.c
mex innerProd.c

cd ../..

mcc -m mongrel_creation.m \
  -I ./matlabPyrTools/MEX/ \
  -I ./matlabPyrTools/ \
  -I ./FastICA/ \
  -a ./matlabPyrTools/MEX/ \
  -a ./texturesynth/ \
  -o mongrel_creation_binary
</syntaxhighlight>


Again, we only have a <B>single floating user license</B> for MatLab so the model is to develop and debug your MatLab code
elsewhere or using Octave on Beocat, then you can compile the MatLab code into an executable and run it without
limits on Beocat.   


For more info on the mcc compiler see:  https://www.mathworks.com/help/compiler/mcc.html
=== COMSOL ===
Beocat has no license for COMSOL. If you want to use it, you must provide your own.
module spider COMSOL/
----------------------------------------------------------------------------
  COMSOL: COMSOL/5.3
----------------------------------------------------------------------------
    Description:
      COMSOL Multiphysics software, an interactive environment for modeling
      and simulating scientific and engineering problems
    This module can be loaded directly: module load COMSOL/5.3
    Help:
     
      Description
      ===========
      COMSOL Multiphysics software, an interactive environment for modeling and
simulating scientific and engineering problems
      You must provide your own license.
      export LM_LICENSE_FILE=/the/path/to/your/license/file
      *OR*
      export LM_LICENSE_FILE=$LICENSE_SERVER_PORT@$LICENSE_SERVER_HOSTNAME
      e.g. export LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu
     
      More information
      ================
      - Homepage: https://www.comsol.com/
==== Graphical COMSOL ====
Running COMSOL in graphical mode on a cluster is generally a bad idea. If you choose to run it in graphical mode on a compute node, you will need to do something like the following:
<syntaxhighlight lang="bash">
# Connect to the cluster with X11 forwarding (ssh -Y or mobaxterm)
# load the comsol module on the headnode
module load COMSOL
# export your comsol license as mentioned above, and tell the scheduler to run the software
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 comsol -3drend sw
</syntaxhighlight>
=== .NET Core ===
==== Load .NET ====
mozes@[eunomia] ~ $ module load dotNET-Core-SDK
==== Create an application ====
Following instructions from [https://docs.microsoft.com/en-us/dotnet/core/tutorials/using-with-xplat-cli here], we'll create a simple 'Hello World' application
mozes@[eunomia] ~ $ mkdir Hello
mozes@[eunomia] ~ $ cd Hello
mozes@[eunomia] ~/Hello $ export DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
mozes@[eunomia] ~/Hello $ dotnet new console
The template "Console Application" was created successfully.
Processing post-creation actions...
Running 'dotnet restore' on /homes/mozes/Hello/Hello.csproj...
  Restoring packages for /homes/mozes/Hello/Hello.csproj...
  Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.props.
  Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.targets.
  Restore completed in 358.43 ms for /homes/mozes/Hello/Hello.csproj.
Restore succeeded.
==== Edit your program ====
mozes@[eunomia] ~/Hello $ vi Program.cs
==== Run your .NET application ====
mozes@[eunomia] ~/Hello $ dotnet run
Hello World!
==== Build and run the built application ====
mozes@[eunomia] ~/Hello $ dotnet build
Microsoft (R) Build Engine version 15.8.169+g1ccb72aefa for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.
  Restore completed in 106.12 ms for /homes/mozes/Hello/Hello.csproj.
  Hello -> /homes/mozes/Hello/bin/Debug/netcoreapp2.1/Hello.dll
Build succeeded.
    0 Warning(s)
    0 Error(s)
Time Elapsed 00:00:02.86
mozes@[eunomia] ~/Hello $ dotnet bin/Debug/netcoreapp2.1/Hello.dll
Hello World!


== Installing my own software ==
As a quick example of installing software in your home directory, we have a sample video on our [[Training Videos]] page. If you're still having problems or questions, please contact support as mentioned on our [[Main Page]].
== Loading multiple modules ==
Modules, when loaded, stay loaded for the duration of your session or until they are unloaded.
; You can load multiple pieces of software with one module load command. : module load iompi iomkl
; You can unload all software : module reset
; If you see output from a module load command that looks like ''"The following have been reloaded with a version change"'' you have likely tried to load two pieces of software that have not been tested together. There may be serious issues with using either piece of software while you're in this state, such as missing libraries or non-functional applications. If you encounter issues, you will want to unload all software before switching modules. : 'module reset' and then 'module load'
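
If you want to spot that warning after the fact, for example in the saved output of a job, a small text check is enough. The sketch below assumes the message text quoted above is what appears in the output; the function name <code>has_version_change</code> is made up for this example, and it only inspects text passed on standard input.

```bash
# Succeed if the text on stdin contains the warning quoted above.
has_version_change() {
    grep -q "reloaded with a version change"
}

# Demonstration with sample text. In practice you might feed it a job's
# output file instead, e.g.: has_version_change < slurm-1234567.out
sample="The following have been reloaded with a version change:"
if printf '%s\n' "$sample" | has_version_change; then
    echo "Possible toolchain mix: run 'module reset', then load your modules again"
fi
```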
== Containers ==
More and more science is being done within containers these days. Often associated with tools such as Docker or Kubernetes, containers allow you to package an entire software runtime platform and run that software on another computer or site with minimal fuss.
Unfortunately, Docker and Kubernetes are not particularly well suited to multi-user HPC environments, but that's not to say that you can't make use of these containers on Beocat.
=== Apptainer ===
[https://apptainer.org/docs/user/1.2/index.html Apptainer] is a container runtime that is designed for HPC environments. It can convert docker containers to its own format, and can be used within a job on Beocat. It is a very broad topic and we've made the decision to point you to the upstream documentation, as it is much more likely that they'll have up to date and functional instructions to help you utilize containers. If you need additional assistance, please don't hesitate to reach out to us.

Latest revision as of 19:41, 25 June 2024

Module Availability

Most people will be just fine running 'module avail' to see a list of modules available on Beocat. There are a couple software packages that are only available on particular node types. For those cases, check our modules website. If you are used to OpenScienceGrid computing, you may wish to take a look at how to use their modules.

Toolchains

A toolchain is a set of compilers, libraries and applications that are needed to build software. Some software functions better when using specific toolchains.

We provide a good number of toolchains and versions of toolchains to make sure your applications will compile and/or run correctly.

These toolchains include (you can run 'module keyword toolchain'):

foss
GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.
gompi
GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support.
iomkl
Intel Cluster Toolchain Compiler Edition provides Intel C/C++ and Fortran compilers, Intel MKL & OpenMPI.
intel
Intel Compiler Suite, providing Intel C/C++ and Fortran compilers, Intel MKL & Intel MPI. Intel only recently made it free, so we have less experience with Intel MPI than with OpenMPI.

You can run 'module spider $toolchain/' to see the versions we have:

$ module spider iomkl/
  • iomkl/2017a
  • iomkl/2017b
  • iomkl/2017beocatb

If you load one of those (module load iomkl/2017b), you can see the other modules and versions of software that it loaded with 'module list':

$ module list
Currently Loaded Modules:
  1) icc/2017.4.196-GCC-6.4.0-2.28
  2) binutils/2.28-GCCcore-6.4.0
  3) ifort/2017.4.196-GCC-6.4.0-2.28
  4) iccifort/2017.4.196-GCC-6.4.0-2.28
  5) GCCcore/6.4.0
  6) numactl/2.0.11-GCCcore-6.4.0
  7) hwloc/1.11.7-GCCcore-6.4.0
  8) OpenMPI/2.1.1-iccifort-2017.4.196-GCC-6.4.0-2.28
  9) iompi/2017b
 10) imkl/2017.3.196-iompi-2017b
 11) iomkl/2017b

As you can see, toolchains can depend on each other. For instance, the iomkl toolchain depends on iompi, which depends on iccifort, which depends on icc and ifort, which depend on GCCcore, which depends on GCC. Hence it is very important that the correct versions of all related software are loaded.

With software we provide, the toolchain used to compile is always specified in the "version" of the software that you want to load.

If you mix toolchains, libraries may go missing and applications may stop working.

Most Commonly Used Software

Check our modules website for the most up to date software availability.

The versions mentioned below are representations of what was available at the time of writing, not necessarily what is currently available.

OpenMPI

We provide lots of versions. You are most likely better off directly loading a toolchain or application to make sure you get the right version, but you can see the versions we have with 'module avail OpenMPI/'

The first step to run an MPI application is to load one of the compiler toolchains that include OpenMPI. You will normally just need to load the default version as below. If your code needs access to NVIDIA GPUs, you'll need a CUDA-enabled version. Some codes are also picky about which versions of the underlying GNU or Intel compilers they need.

 module load foss

If you are working with your own MPI code you will need to start by compiling it. MPI offers mpicc for compiling codes written in C, mpic++ for compiling C++ code, and mpifort for compiling Fortran code. You can get a complete listing of parameters to use by running them with the --help parameter. Below are some examples of compiling with each.

 mpicc --help
 mpicc -o my_code.x my_code.c
 mpic++ -o my_code.x my_code.cc
 mpifort -o my_code.x my_code.f

In each case above, you can name the executable file whatever you want (I chose my_code.x). It is common to use different optimization levels, for example, but those may depend on which compiler toolchain you choose. Some are based on the Intel compilers so you'd need to use optimizations for the underlying icc or ifort compilers they call, and some are GNU based so you'd use compiler optimizations for gcc or gfortran.
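
If you compile several kinds of source files, a small helper can pick the wrapper for you. This is only a sketch: the function name mpi_wrapper_for is made up, the extensions beyond .c, .cc and .f are my assumption about common naming, and -O2 is just an illustrative optimization level. It prints the compile command rather than running it.

```bash
# Print the MPI compiler wrapper that matches a source file's extension.
mpi_wrapper_for() {
    case "$1" in
        *.c)              echo mpicc ;;
        *.cc|*.cpp|*.cxx) echo mpic++ ;;
        *.f|*.f90|*.F90)  echo mpifort ;;
        *) echo "unknown source type: $1" >&2; return 1 ;;
    esac
}

src=my_code.c
wrapper=$(mpi_wrapper_for "$src")
# Show the command that would be run (remove 'echo' to actually compile).
echo "$wrapper -O2 -o my_code.x $src"
```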

We have many MPI codes in our modules that you simply need to load before using. Below is an example of loading and running Gromacs which is an MPI based code to simulate large numbers of atoms classically.

 module load GROMACS

This loads the Gromacs modules and sets all the paths so you can run the scalar version gmx or the MPI version gmx_mpi. Below is a sample job script for running a complete Gromacs simulation.

 #!/bin/bash -l
 #SBATCH --mem=120G
 #SBATCH --time=24:00:00
 #SBATCH --job-name=gromacs
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=4
 
 module reset
 module load GROMACS
 
 echo "Running Gromacs on $HOSTNAME"
 
 export OMP_NUM_THREADS=1
 time mpirun -x OMP_NUM_THREADS=1 gmx_mpi mdrun -nsteps 500000 -ntomp 1 -v -deffnm 1ns -c 1ns.pdb -nice 0
 
 echo "Finished run on $SLURM_NTASKS $HOSTNAME cores"

mpirun will run your job on all cores requested which in this case is 4 cores on a single node. You will often just need to guess at the memory size for your code, then check on the memory usage with kstat --me and adjust the memory in future jobs.

I prefer to put a module reset in my scripts then manually load the modules needed to ensure each run is using the modules it needs. If you don't do this when you submit a job script, it will simply use the modules you currently have loaded, which is fine too.

I also like to put a time command in front of each part of the script that can use significant amounts of time. This way I can track the amount of time used in each section of the job script. This can prove very useful if your job script copies large data files around at the start, for example, allowing you to see how much time was used for each stage of the job if it runs longer than expected.
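
As an alternative to prefixing each command with time, you can wrap stages in a small function that labels each timing line. This is a sketch, not what the Gromacs script above does: the name timed_stage is made up, and the sleep commands only stand in for real work such as copying data or running a simulation.

```bash
# Run a command, then report on stderr how many whole seconds it took.
timed_stage() {
    local label=$1
    shift
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "[$label] took $((end - start)) s (exit status $status)" >&2
    return "$status"
}

timed_stage stage-in sleep 1   # stands in for copying input files
timed_stage compute  sleep 1   # stands in for the real computation
```

The timing lines go to stderr, so they end up in the job's output file without mixing into anything your program writes to stdout and you redirect elsewhere.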

The OMP_NUM_THREADS environment variable is set to 1 and passed to the MPI system to ensure that each MPI task only uses 1 thread. There are some MPI codes that are also multi-threaded, so this ensures that this particular code uses the cores allocated to it in the manner we want.
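
For a code that is both MPI and multi-threaded, one option is to derive the thread count from what the job requested instead of hard-coding it. The sketch below assumes the job was submitted with --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK; when that variable is absent it falls back to 1, matching the script above.

```bash
# Use the per-task CPU count from Slurm if it is set, otherwise 1 thread.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Each MPI task will use $OMP_NUM_THREADS thread(s)"
```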

Once you have your job script ready, submit it using the sbatch command as below where the job script is in the file sb.gromacs.

 sbatch sb.gromacs

You should then monitor your job as it goes through the queue and starts running using kstat --me. Your code will also generate an output file, usually of the form slurm-#######.out where the 7 # signs are the 7-digit job ID number. If you need to cancel your job, use scancel with the 7-digit job ID number.

  scancel #######
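
If you want to capture the job ID in a script rather than reading it off the screen, you can pull it out of the message sbatch prints. This sketch assumes the usual single-cluster message "Submitted batch job <ID>"; the function name job_id_from_sbatch is made up, and the sample message stands in for a real submission.

```bash
# Print the job ID, taken as the last word of sbatch's confirmation message.
job_id_from_sbatch() {
    echo "${1##* }"
}

# Sample message; in a real session use: message=$(sbatch sb.gromacs)
message="Submitted batch job 1234567"
job_id=$(job_id_from_sbatch "$message")
echo "Output file: slurm-${job_id}.out"
echo "To cancel:   scancel ${job_id}"
```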

R

You can see what versions of R we provide with 'module avail R/'

Packages

We provide a small number of R packages installed by default. These are generally packages that are needed by more than one person.

Installing your own R Packages

To install your own package, log in to Beocat and start R interactively

module load R
R

Then install the package using

install.packages("PACKAGENAME")

Follow the prompts. Note that there is a CRAN mirror at KU - it will be listed as "USA (KS)".

After installing you can test before leaving interactive mode by issuing the command

library("PACKAGENAME")

Running R Jobs

You cannot submit an R script directly. 'sbatch myscript.R' will result in an error. Instead, you need to make a bash script that will call R appropriately. Here is a minimal example. We'll save this as submit-R.sbatch

#!/bin/bash -l
#SBATCH --mem-per-cpu=4G
# Now we tell Slurm how long we expect our work to take: 15 minutes (D-HH:MM:SS)
#SBATCH --time=0-00:15:00

# Now let's do some actual work. This starts R and loads the file myscript.R
module reset
module load R
R --no-save -q < myscript.R

Now, to submit your R job, you would type

sbatch submit-R.sbatch

You can monitor your jobs using kstat --me. The output of your job will be in a slurm-#.out file where '#' is the 7 digit job ID number for your job.

Java

You can see what versions of Java we support with 'module avail Java/'

Python

You can see what versions of Python we support with 'module avail Python/'. Note: Running this does not load a Python module, it just shows you a list of the ones that are available.

If you need libraries that we do not have installed, you should use python -m venv to set up a virtual python environment in your home directory. This will let you install python libraries as you please.

Setting up your virtual environment

# Load Python (pick a version from the 'module avail Python/' list)
module load Python/SOME_VERSION_THAT_YOU_PICKED_FROM_THE_LIST

(After running this command Python is loaded. After you log off and then log on again, Python will not be loaded, so you must rerun this command every time you log on.)

  • Create a location for your virtual environments (optional, but helps keep things organized)
mkdir ~/virtualenvs
cd ~/virtualenvs
  • Create a virtual environment. Here I will create a default virtual environment called 'test'. Note that their documentation has many more useful options.
python -m venv --system-site-packages test
# or you could use 'python -m venv test'
# using the '--system-site-packages' allows the virtual environment to make use of python libraries we have already installed
# particularly useful if you're going to use our SciPy-Bundle, TensorFlow, or Jupyter
# if you don't use '--system-site-packages' then the virtual environment is completely isolated from our other provided packages and everything it needs it will have to build and install within itself.
  • Let's look at our virtual environments (the virtual environment name should be in the output):
ls ~/virtualenvs
  • Activate one of these
source ~/virtualenvs/test/bin/activate

(After running this command your virtual environment is activated. After you log off and then log on again, your virtual environment will not be activated, so you must rerun this command every time you log on.)

  • You can now install the python modules you want. This can be done using pip.
pip install numpy biopython
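
If you later forget whether an environment was created with '--system-site-packages', you can look at the pyvenv.cfg file that python -m venv writes into the environment directory. The sketch below assumes the environment lives at ~/virtualenvs/test as in the steps above; the function name venv_uses_system_packages is made up for this example.

```bash
# Succeed if the virtual environment in directory $1 can see system packages.
venv_uses_system_packages() {
    grep -Eq '^include-system-site-packages *= *true' "$1/pyvenv.cfg" 2>/dev/null
}

venv_dir="$HOME/virtualenvs/test"
if venv_uses_system_packages "$venv_dir"; then
    echo "$venv_dir can use the centrally installed Python libraries"
else
    echo "$venv_dir is isolated (or does not exist)"
fi
```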

Using your virtual environment within a job

Here is a simple job script using the virtual environment test

#!/bin/bash
module load Python/THE_SAME_VERSION_YOU_USED_TO_CREATE_YOUR_ENVIRONMENT_ABOVE
source ~/virtualenvs/test/bin/activate
export PYTHONDONTWRITEBYTECODE=1
python ~/path/to/your/python/script.py
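
A job that silently runs without its virtual environment can waste queue time, so it may be worth failing early when the environment is missing. This is a sketch under the same assumption as above (an environment at ~/virtualenvs/test); the function name activate_venv is made up.

```bash
# Activate the virtual environment in directory $1, or report that it is missing.
activate_venv() {
    local activate_script="$1/bin/activate"
    if [ ! -f "$activate_script" ]; then
        echo "No virtual environment found at $1" >&2
        return 1
    fi
    . "$activate_script"
}

if activate_venv "$HOME/virtualenvs/test"; then
    echo "Using Python environment: $VIRTUAL_ENV"
fi
```

In a job script you would more likely write `activate_venv ~/virtualenvs/test || exit 1` so the job stops before any Python code runs.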

Using MPI with Python within a job

We're going to load the SciPy-bundle module, as that has mpi4py available within it.

You can check the available versions and load one that uses the python version you would like.

module avail SciPy-bundle

Here is a simple job script using MPI with Python

#!/bin/bash
module load SciPy-bundle

export PYTHONDONTWRITEBYTECODE=1
mpirun python ~/path/to/your/mpi/python/script.py

TensorFlow

TensorFlow provided by pip is often completely broken on any system that is not running a recent version of Ubuntu. Beocat (and most HPC systems) does not use Ubuntu. As such, we provide TensorFlow modules for you to load.

You can see what versions of TensorFlow we support with 'module avail TensorFlow/'. Note: Running this does not load a TensorFlow module, it just shows you a list of the ones that are available.

If you need other python libraries that we do not have installed, you should use python -m venv to set up a virtual python environment in your home directory. This will let you install python libraries as you please.

We document creating a virtual environment above. You can skip loading the python module, as loading TensorFlow will load the correct version of the python module behind the scenes. The only change you need to make is to use the '--system-site-packages' option when creating the virtual environment.

python -m venv --system-site-packages test
# using the '--system-site-packages' allows the virtual environment to make use of python libraries we have already installed
# particularly useful if you're going to use our SciPy-Bundle, or TensorFlow

Jupyter

Jupyter is a framework for creating and running reusable "notebooks" for scientific computing. It runs Python code by default. Normally, it is meant to be used in an interactive manner. Interactive codes can be limiting and/or problematic when used in a cluster environment. We have an example submit script available here to help you transition from an OpenOnDemand interactive job using Jupyter to a non-interactive job.

Spark

Spark is an engine for large-scale data processing. It can be used in conjunction with Python, R, Scala, Java, and SQL. Spark can be run on Beocat interactively or through the Slurm queue.

To run interactively, you must first request a node or nodes from the Slurm queue. The line below requests 1 node and 1 core for 24 hours and if available will drop you into the bash shell on that node.

srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G --pty bash

We have some sample python based Spark code you can try out that came from the exercises and homework from the PSC Spark workshop.

mkdir spark-test
cd spark-test
cp -rp /homes/daveturner/projects/PSC-BigData-Workshop/Shakespeare/* .

You will need to set up a python virtual environment and load the nltk package before you run the first time.

module load Spark
mkdir -p ~/virtualenvs
cd ~/virtualenvs
python -m venv --system-site-packages spark-test
source ~/virtualenvs/spark-test/bin/activate
pip install nltk
deactivate

To run the sample code interactively, load the Python and Spark modules, source your python virtual environment, change to the sample directory, fire up pyspark, then execute the sample code.

module load Spark
source ~/virtualenvs/spark-test/bin/activate
cd ~/spark-test
pyspark
exec(open("shakespeare.py").read())

You can work interactively from the pyspark prompt (>>>) in addition to running scripts as above.

The Shakespeare directory also contains a sample sbatch submit script that will run the same shakespeare.py code through the Slurm batch queue.

#!/bin/bash -l
#SBATCH --job-name=shakespeare
#SBATCH --mem=10G
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Load Spark and Python (version 3 here)
module load Spark
source ~/virtualenvs/spark-test/bin/activate

spark-submit shakespeare.py

When you run interactively, pyspark initializes your spark context sc. You will need to do this manually as in the sample python code when you want to submit jobs through the Slurm queue.

# If there is no Spark Context (not running interactive from pyspark), create it
try:
   sc
except NameError:
   from pyspark import SparkConf, SparkContext
   conf = SparkConf().setMaster("local").setAppName("App")
   sc = SparkContext(conf = conf)

Perl

The system-wide version of perl is tracking the stable releases of perl. Unfortunately there are some features that we do not include in the system distribution of perl, namely threads.

To use perl with threads, or a newer version, you can load it with the module command. To see what versions of perl we provide, you can use 'module avail Perl/'

Installing Perl Modules

The easiest way to install Perl modules is by using cpanm. Below is an example of installing the Perl module Term::ANSIColor.

module load Perl
cpanm -i Term::ANSIColor
CPAN: LWP::UserAgent loaded ok (v6.39)
Fetching with LWP:
http://www.cpan.org/authors/01mailrc.txt.gz
CPAN: YAML loaded ok (v1.29)
Reading '/homes/mozes/.cpan/sources/authors/01mailrc.txt.gz'
CPAN: Compress::Zlib loaded ok (v2.084)
............................................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/02packages.details.txt.gz
Reading '/homes/mozes/.cpan/sources/modules/02packages.details.txt.gz'
  Database was generated on Mon, 09 Mar 2020 20:41:03 GMT
.............
  New CPAN.pm version (v2.27) available.
  [Currently running version is v2.22]
  You might want to try
    install CPAN
    reload cpan
  to both upgrade CPAN.pm and run the new version without leaving
  the current session.
...............................................................DONE
Fetching with LWP:
http://www.cpan.org/modules/03modlist.data.gz
Reading '/homes/mozes/.cpan/sources/modules/03modlist.data.gz'
DONE
Writing /homes/mozes/.cpan/Metadata
Running install for module 'Term::ANSIColor'
Fetching with LWP:
http://www.cpan.org/authors/id/R/RR/RRA/Term-ANSIColor-5.01.tar.gz
CPAN: Digest::SHA loaded ok (v6.02)
Fetching with LWP:
http://www.cpan.org/authors/id/R/RR/RRA/CHECKSUMS
Checksum for /homes/mozes/.cpan/sources/authors/id/R/RR/RRA/Term-ANSIColor-5.01.tar.gz ok
CPAN: CPAN::Meta::Requirements loaded ok (v2.140)
CPAN: Parse::CPAN::Meta loaded ok (v2.150010)
CPAN: CPAN::Meta loaded ok (v2.150010)
CPAN: Module::CoreList loaded ok (v5.20190522)
Configuring R/RR/RRA/Term-ANSIColor-5.01.tar.gz with Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Term::ANSIColor
Writing MYMETA.yml and MYMETA.json
  RRA/Term-ANSIColor-5.01.tar.gz
  /opt/software/software/Perl/5.30.0-GCCcore-8.3.0/bin/perl Makefile.PL -- OK
Running make for R/RR/RRA/Term-ANSIColor-5.01.tar.gz
cp lib/Term/ANSIColor.pm blib/lib/Term/ANSIColor.pm
Manifying 1 pod document
  RRA/Term-ANSIColor-5.01.tar.gz
  /usr/bin/make -- OK
Running make test for RRA/Term-ANSIColor-5.01.tar.gz
PERL_DL_NONLAZY=1 "/opt/software/software/Perl/5.30.0-GCCcore-8.3.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*/*.t
t/docs/pod-coverage.t ....... skipped: POD coverage tests normally skipped
t/docs/pod-spelling.t ....... skipped: Spelling tests only run for author
t/docs/pod.t ................ skipped: POD syntax tests normally skipped
t/docs/spdx-license.t ....... skipped: SPDX identifier tests normally skipped
t/docs/synopsis.t ........... skipped: Synopsis syntax tests normally skipped
t/module/aliases-env.t ...... ok
t/module/aliases-func.t ..... ok
t/module/basic.t ............ ok
t/module/basic256.t ......... ok
t/module/eval.t ............. ok
t/module/stringify.t ........ ok
t/module/true-color.t ....... ok
t/style/coverage.t .......... skipped: Coverage tests only run for author
t/style/critic.t ............ skipped: Coding style tests only run for author
t/style/minimum-version.t ... skipped: Minimum version tests normally skipped
t/style/obsolete-strings.t .. skipped: Obsolete strings tests normally skipped
t/style/strict.t ............ skipped: Strictness tests normally skipped
t/taint/basic.t ............. ok
All tests successful.
Files=18, Tests=430,  7 wallclock secs ( 0.21 usr  0.08 sys +  3.41 cusr  1.15 csys =  4.85 CPU)
Result: PASS
  RRA/Term-ANSIColor-5.01.tar.gz
  /usr/bin/make test -- OK
Running make install for RRA/Term-ANSIColor-5.01.tar.gz
Manifying 1 pod document
Installing /homes/mozes/perl5/lib/perl5/Term/ANSIColor.pm
Installing /homes/mozes/perl5/man/man3/Term::ANSIColor.3
Appending installation info to /homes/mozes/perl5/lib/perl5/x86_64-linux-thread-multi/perllocal.pod
  RRA/Term-ANSIColor-5.01.tar.gz
  /usr/bin/make install  -- OK
When things go wrong

Some perl modules fail to realize they shouldn't be installed globally. Usually, you'll notice this when they try to run 'sudo' something. Unfortunately we do not grant sudo access to anyone other than Beocat system administrators. Usually, this can be worked around by putting the following in your ~/.bashrc file (at the bottom). Once this is in place, you should log out and log back in.

PATH="/homes/${USER}/perl5/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/homes/${USER}/perl5/lib/perl5${PERL5LIB:+:${PERL5LIB}}";
export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/homes/${USER}/perl5${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}";
export PERL_LOCAL_LIB_ROOT;
PERL_MB_OPT="--install_base \"/homes/${USER}/perl5\""; export PERL_MB_OPT;
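
The ${PATH:+:${PATH}} pieces in those lines can look cryptic. They expand to a colon followed by the old value only when the variable is already set and non-empty, which avoids leaving a stray colon at the end. Here is a small sketch of that expansion on its own; the function name prepend_entry and the example directories are made up.

```bash
# Print NEW_ENTRY in front of CURRENT, adding ":" only when CURRENT is non-empty.
prepend_entry() {
    local new_entry=$1 current=$2
    echo "${new_entry}${current:+:${current}}"
}

prepend_entry /homes/alice/perl5/bin ""              # prints /homes/alice/perl5/bin
prepend_entry /homes/alice/perl5/bin /usr/bin:/bin   # prints /homes/alice/perl5/bin:/usr/bin:/bin
```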

Submitting a job with Perl

Much like R (above), you cannot simply 'sbatch myProgram.pl', but you must create a submit script which will call perl. Here is an example:

#!/bin/bash
#SBATCH --mem-per-cpu=1G
# Now we tell sbatch how long we expect our work to take: 15 minutes (D-H:MM:SS)
#SBATCH --time=0-0:15:00
# Now let's do some actual work.
module load Perl
perl /path/to/myProgram.pl

Octave for MatLab codes

'module avail Octave/'

The 64-bit version of Octave can be loaded using the command above. Octave can then be used to work with MatLab codes on the head node and to submit jobs to the compute nodes through the sbatch scheduler. Octave is made to run MatLab code, but it does have limitations and does not support everything that MatLab itself does.

#!/bin/bash -l
#SBATCH --job-name=octave
#SBATCH --output=octave.o%j
#SBATCH --time=1:00:00
#SBATCH --mem=4G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module reset
module load Octave/4.2.1-foss-2017beocatb-enable64

octave < matlab_code.m

MatLab compiler

Beocat also has a single floating user license for the MatLab compiler and the most common toolboxes including the Parallel Computing Toolbox, Optimization Toolbox, Statistics and Machine Learning Toolbox, Image Processing Toolbox, Curve Fitting Toolbox, Neural Network Toolbox, Symbolic Math Toolbox, Global Optimization Toolbox, and the Bioinformatics Toolbox.

Since we only have a single floating user license, this means that you will be expected to develop your MatLab code with Octave or elsewhere on a laptop or departmental server. Once you're ready to do large runs, then you move your code to Beocat, compile the MatLab code into an executable, and you can submit as many jobs as you want to the scheduler. To use the MatLab compiler, you need to load the MATLAB module to compile code and load the mcr module to run the resulting MatLab executable.

module load MATLAB
mcc -m matlab_main_code.m -o matlab_executable_name

If you have addpath() commands in your code, you will need to wrap them in an "if ~deployed" block and tell the compiler to include that path via the -I flag.

% wrap addpath() calls like so:
if ~deployed
    addpath('./another/folder/with/code/')
end

NOTE: The license manager checks the mcc compiler out for a minimum of 30 minutes, so if another user compiles a code you unfortunately may need to wait for up to 30 minutes to compile your own code.

Compiling with additional paths:

module load MATLAB
mcc -m matlab_main_code.m -I ./another/folder/with/code/ -o matlab_executable_name

Any directories added with addpath() will need to be added to the list of compile options as -I arguments. You can have multiple -I arguments in your compile command.
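
When the list of directories grows, you can build the -I arguments in a loop instead of typing each one. The sketch below only prints the resulting mcc command so you can check it before running it; the function name print_mcc_command is made up, and the file and directory names are borrowed from the user example on this page purely for illustration.

```bash
# Print an mcc command with one -I flag per directory.
# Usage: print_mcc_command MAIN_FILE OUTPUT_NAME DIR...
print_mcc_command() {
    local main_file=$1 output_name=$2
    shift 2
    local args=(-m "$main_file")
    local dir
    for dir in "$@"; do
        args+=(-I "$dir")
    done
    args+=(-o "$output_name")
    echo "mcc ${args[*]}"
}

print_mcc_command mongrel_creation.m mongrel_creation_binary \
    ./matlabPyrTools/MEX/ ./matlabPyrTools/ ./FastICA/
```

Because the command is only echoed, directory names containing spaces would not be quoted in the printed line; treat the output as something to read, not to paste blindly.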

Here is an example job submission script. Modify time, memory, tasks-per-node, and job name as you see fit:

#!/bin/bash -l
#SBATCH --job-name=matlab
#SBATCH --output=matlab.o%j
#SBATCH --time=1:00:00
#SBATCH --mem=4G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module reset
module load mcr

./matlab_executable_name

For those who make use of mex files - compiled C and C++ code with matlab bindings - you will need to add these files to the compiled archive via the -a flag. See the behavior of this flag in the compiler documentation. You can either target specific .mex files or entire directories.

Because codes often require adding several directories to the Matlab path as well as mex files from several locations, we recommend writing a script to preserve and help document the steps to compile your Matlab code. Here is an abbreviated example from a current user:

#!/bin/bash -l

module load MATLAB

cd matlabPyrTools/MEX/

# compile mex files
mex upConv.c convolve.c wrap.c edges.c
mex corrDn.c convolve.c wrap.c edges.c
mex histo.c
mex innerProd.c

cd ../..

mcc -m mongrel_creation.m \
  -I ./matlabPyrTools/MEX/ \
  -I ./matlabPyrTools/ \
  -I ./FastICA/ \
  -a ./matlabPyrTools/MEX/ \
  -a ./texturesynth/ \
  -o mongrel_creation_binary

Again, we only have a single floating user license for MatLab so the model is to develop and debug your MatLab code elsewhere or using Octave on Beocat, then you can compile the MatLab code into an executable and run it without limits on Beocat.

For more info on the mcc compiler see: https://www.mathworks.com/help/compiler/mcc.html

COMSOL

Beocat has no license for COMSOL. If you want to use it, you must provide your own.

module spider COMSOL/
----------------------------------------------------------------------------
 COMSOL: COMSOL/5.3
----------------------------------------------------------------------------
   Description:
     COMSOL Multiphysics software, an interactive environment for modeling
     and simulating scientific and engineering problems

   This module can be loaded directly: module load COMSOL/5.3

   Help:
     
     Description
     ===========
     COMSOL Multiphysics software, an interactive environment for modeling and 
simulating scientific and engineering problems
     You must provide your own license.
     export LM_LICENSE_FILE=/the/path/to/your/license/file
     *OR*
     export LM_LICENSE_FILE=$LICENSE_SERVER_PORT@$LICENSE_SERVER_HOSTNAME
     e.g. export LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu
     
     More information
     ================
      - Homepage: https://www.comsol.com/
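
As a concrete illustration of the second form in that help text, the license variable can be assembled from a port and a hostname. The values below are the placeholder ones from the example line above, not a real license server.

```bash
# Placeholder values; replace them with the details of your own license server.
LICENSE_SERVER_PORT=1719
LICENSE_SERVER_HOSTNAME=some.flexlm.server.ksu.edu

export LM_LICENSE_FILE="${LICENSE_SERVER_PORT}@${LICENSE_SERVER_HOSTNAME}"
echo "LM_LICENSE_FILE=$LM_LICENSE_FILE"   # prints LM_LICENSE_FILE=1719@some.flexlm.server.ksu.edu
```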

Graphical COMSOL

Running COMSOL in graphical mode on a cluster is generally a bad idea. If you choose to run it in graphical mode on a compute node, you will need to do something like the following:

# Connect to the cluster with X11 forwarding (ssh -Y or mobaxterm)
# load the comsol module on the headnode
module load COMSOL
# export your comsol license as mentioned above, and tell the scheduler to run the software
srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 comsol -3drend sw
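
The steps above can be wrapped in a small function that refuses to start when a prerequisite is missing. This is a sketch only: the function name launch_comsol_gui is hypothetical, and the checks assume that X11 forwarding sets DISPLAY (as ssh -Y normally does) and that your license is supplied through LM_LICENSE_FILE. The srun line is the one shown above.

```shell
# Hypothetical wrapper: check prerequisites, then start COMSOL on a compute node.
launch_comsol_gui() {
  # ssh -Y (or MobaXterm) normally sets DISPLAY when X11 forwarding is active
  if [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is not set; reconnect with X11 forwarding (ssh -Y)" >&2
    return 1
  fi
  # COMSOL needs a license that you provide yourself
  if [ -z "${LM_LICENSE_FILE:-}" ]; then
    echo "LM_LICENSE_FILE is not set; export your COMSOL license first" >&2
    return 1
  fi
  srun --nodes=1 --time=1:00:00 --mem=1G --pty --x11 comsol -3drend sw
}
```

Run `module load COMSOL` on the headnode first, then call `launch_comsol_gui`. Adjust the time and memory requests to suit your model.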

.NET Core

Load .NET

mozes@[eunomia] ~ $ module load dotNET-Core-SDK

Create an application

Following the instructions from here, we'll create a simple 'Hello World' application.

mozes@[eunomia] ~ $ mkdir Hello
mozes@[eunomia] ~ $ cd Hello
mozes@[eunomia] ~/Hello $ export DOTNET_SKIP_FIRST_TIME_EXPERIENCE=true
mozes@[eunomia] ~/Hello $ dotnet new console
The template "Console Application" was created successfully.

Processing post-creation actions...
Running 'dotnet restore' on /homes/mozes/Hello/Hello.csproj...
 Restoring packages for /homes/mozes/Hello/Hello.csproj...
 Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.props.
 Generating MSBuild file /homes/mozes/Hello/obj/Hello.csproj.nuget.g.targets.
 Restore completed in 358.43 ms for /homes/mozes/Hello/Hello.csproj.

Restore succeeded.

Edit your program

mozes@[eunomia] ~/Hello $ vi Program.cs

Run your .NET application

mozes@[eunomia] ~/Hello $ dotnet run
Hello World!

Build and run the built application

mozes@[eunomia] ~/Hello $ dotnet build
Microsoft (R) Build Engine version 15.8.169+g1ccb72aefa for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

 Restore completed in 106.12 ms for /homes/mozes/Hello/Hello.csproj.
 Hello -> /homes/mozes/Hello/bin/Debug/netcoreapp2.1/Hello.dll

Build succeeded.
   0 Warning(s)
   0 Error(s)

Time Elapsed 00:00:02.86
mozes@[eunomia] ~/Hello $ dotnet bin/Debug/netcoreapp2.1/Hello.dll
Hello World!
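
The build-and-run steps above can be combined into one helper. This is a sketch under assumptions: the function name build_and_run is hypothetical, it must be called from the project directory, and the default framework directory netcoreapp2.1 matches the transcript above but will differ with other SDK versions.

```shell
# Hypothetical helper: build the project in the current directory, then run the DLL.
build_and_run() {
  project="$1"                      # project name, e.g. Hello
  framework="${2:-netcoreapp2.1}"   # output directory name depends on the SDK version
  dotnet build || return 1
  dotnet "bin/Debug/${framework}/${project}.dll"
}
```

From ~/Hello, `build_and_run Hello` would run the same two dotnet commands shown in the transcript.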

Installing my own software

Installing and maintaining software for the many different users of Beocat would be very difficult, if not impossible. For this reason, we don't generally install user-run software on our cluster. Instead, we ask that you install it into your home directories.

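In many cases, the installation instructions from the software vendor or support site will assume that you are installing the software system-wide or that you have 'sudo' access. In most cases neither is required, and the software can be installed into your home directory instead.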

As a quick example of installing software in your home directory, we have a sample video on our Training Videos page. If you're still having problems or questions, please contact support as mentioned on our Main Page.
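
As a rough sketch of a home-directory install, the snippet below picks an install prefix under your home directory and adds it to your environment. It assumes an autotools-style source tree (configure, make, make install). Your software may use CMake, pip, or something else entirely. The directory name mytool and the function name build_into_prefix are placeholders.

```shell
# Hypothetical install location under your home directory ("mytool" is a placeholder).
PREFIX="${PREFIX:-$HOME/software/mytool}"

# Build steps for a typical autotools-style source tree (an assumption).
# Run this from the unpacked source directory; no sudo is involved.
build_into_prefix() {
  ./configure --prefix="$PREFIX" &&
  make &&
  make install
}

# Make the installed programs and libraries visible to your shell and jobs.
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The two export lines also need to be present in any job script that uses the software, or in your shell startup file.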

Loading multiple modules

Modules, once loaded, stay loaded for the duration of your session or until they are unloaded.

You can load multiple pieces of software with one module load command:
module load iompi iomkl
You can unload all software:
module reset
If the output of a module load command includes "The following have been reloaded with a version change", you have likely tried to load two pieces of software that have not been tested together. Using either piece of software in this state can cause serious problems, such as missing libraries or non-functional applications. If you encounter issues, unload all software before switching modules: run 'module reset' and then 'module load' again.
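
The reset-then-load sequence can be wrapped in a small shell function so that every switch starts from a clean state. The function name switch_modules is hypothetical. It calls only the module reset and module load commands shown above.

```shell
# Hypothetical helper: reset the environment, then load the requested modules together.
switch_modules() {
  module reset && module load "$@"
}
```

For example, `switch_modules iompi iomkl` resets your environment and then loads both modules with a single module load command.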

Containers

More and more science is being done within containers these days. Often associated with tools such as Docker or Kubernetes, containers allow you to package an entire software runtime environment and run that software on another computer or site with minimal fuss.

Unfortunately, Docker and Kubernetes are not particularly well suited to multi-user HPC environments, but that's not to say that you can't make use of these containers on Beocat.

Apptainer

Apptainer is a container runtime designed for HPC environments. It can convert Docker containers to its own format and can be used within a job on Beocat. Apptainer is a very broad topic, so we point you to the upstream documentation, which is much more likely to have up-to-date, working instructions for using containers. If you need additional assistance, please don't hesitate to reach out to us.
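
As a minimal sketch of the Docker conversion mentioned above, the helper below pulls a Docker image into Apptainer's SIF format (only if the file does not already exist) and then runs a command inside it. The function name run_in_container is hypothetical. Whether apptainer is on your PATH by default or needs a module on Beocat is not covered here. Consult the upstream documentation for authoritative usage.

```shell
# Hypothetical helper: convert a Docker image to a SIF file once, then run a command in it.
run_in_container() {
  image_uri="$1"   # e.g. docker://ubuntu:22.04
  sif_file="$2"    # local file that will hold the converted image
  shift 2
  if [ ! -f "$sif_file" ]; then
    apptainer pull "$sif_file" "$image_uri" || return 1
  fi
  apptainer exec "$sif_file" "$@"
}
```

For example, `run_in_container docker://ubuntu:22.04 ubuntu.sif cat /etc/os-release` would print the operating system release information from inside the container. Run it inside a job rather than on the headnode.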