== XSEDE Resources ==
== Open Science Grid ==
 
OSG is a high-throughput computing system that gives users essentially unlimited
access to submit large numbers of small single-node jobs, typically using 1-8 cores,
up to about 10 GB of memory, and around 10 GB of I/O, and running for up to 24 hours.
Jobs are submitted to an HTCondor queue and run on supercomputers across the U.S.
and Europe.  To use OSG, you must first obtain an account from the OSG Connect group,
arrange a short Zoom meeting with someone from their support team, upload your ssh
keys through their web page, and then log into their head node.  Below are several
links on OSG, the signup process, and quick start guides for submitting HTCondor scripts.
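
As a minimal sketch of that last step, logging in from a terminal might look like the
line below; the user name and login host are assumptions, so use whatever the OSG
Connect team assigns you.

 > ssh myusername@login.osgconnect.net       # Host name is an assumption; use the one OSG Connect gives you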
 
https://opensciencegrid.org
https://support.opensciencegrid.org/support/home    Full documentation
 
Guidelines for determining if your jobs will work well with OSG
https://opensciencegrid.org/about/computation/
https://support.opensciencegrid.org/support/solutions/articles/5000632058-is-the-open-science-grid-for-you-
 
Below is an example HTCondor submit script to run the molecular dynamics code NAMD.
 
  # HTCondor submit description file for the NAMD example
 
  output = osg.namd.out
  error = osg.namd.error
  log = osg.namd.log
 
  # Requested resources
  request_cpus = 8
  request_memory = 8 GB
  request_disk = 1 GB
  requirements = Arch == "X86_64" && HAS_MODULES == True
 
  transfer_input_files = input_files/        # Slash means all files in that directory
  executable = namd2
  arguments = +p8 test.0.namd
  transfer_output_files = output
  queue 1
 
Below are some common HTCondor commands
> condor_submit htc.sh                      # Submit the condor script to the queue
> condor_q                                  # Check on the status while in the queue
> condor_q netid                            # Check status of all jobs for a given username
> condor_q 1441271                          # Check status of a particular job
> condor_history 1441271                    # Check status of a job that is completed
> condor_history -long 1441271              # Same but report more info
> condor_rm 1441271                         # Remove a particular job by its job ID
> condor_rm daveturner                      # Remove all jobs for the given username
 
The job output will be in the file specified by 'output='.  This is similar
to the slurm-#.out files on Beocat.
 
The log file contains timings for the file transfers and the job execution.
 
One big change from Slurm on Beocat is that your job will not run on a computer that
shares Beocat's file system.  You therefore need to list all of the input files to
transfer to the remote system before the run, and all of the output files to transfer
back after the run.
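
Below is a minimal sketch of those file-transfer lines for a submit script.  The names
run.sh and input_files/ are hypothetical, and should_transfer_files /
when_to_transfer_output are the standard HTCondor submit commands that make the
transfer explicit.

  # File-transfer sketch (run.sh and input_files/ are hypothetical names)
  executable              = run.sh
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  transfer_input_files    = input_files/      # Contents of input_files/ are copied to the execute node
  transfer_output_files   = output            # Copied back to the submit node when the job finishes
  queue 1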
 
=== Modules ===
  Software modules are available.  Use 'module avail' on the login node to get a list,
  then request modules as a resource in your submit script (HAS_MODULES == True, as in
  the NAMD example above) and load what you need inside the job; see the sketch after
  the link below.
https://support.opensciencegrid.org/support/solutions/articles/12000048518
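
A minimal sketch of using a module inside a job is below, assuming the submit script
sets requirements = HAS_MODULES == True as in the NAMD example above.  The wrapper
script, module name, and version are assumptions; check 'module avail' on the login
node for what is actually installed.

  #!/bin/bash
  # Hypothetical wrapper script used as the 'executable' in the submit file
  module load namd/2.13       # Module name and version are assumptions
  namd2 +p8 test.0.namd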
 
=== Equivalent of array jobs ===
  If you end your submit script with 'queue 10', you get 10 identical jobs.  Use
  'output = job.$(Cluster).$(Process).output' to keep the output files separate, and
  'arguments = input_file.$(Process)' to vary the input file for each job; a sketch
  is shown below.
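
A minimal sketch of those array-style submit lines is below; input_file.0 through
input_file.9 are hypothetical input names.

  # Submit 10 jobs; $(Process) runs from 0 to 9 within cluster $(Cluster)
  output    = job.$(Cluster).$(Process).output
  error     = job.$(Cluster).$(Process).error
  log       = job.$(Cluster).$(Process).log
  arguments = input_file.$(Process)
  queue 10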
