== XSEDE Resources ==
== Open Science Grid ==
 
OSG is a high-throughput computing system that gives users essentially unlimited
access to submit large numbers of small single-node jobs, typically using 1-8 cores,
up to about 10 GB of memory, and around 10 GB of I/O, and running for up to 24 hours.
Jobs are submitted to an HTCondor queue and run on supercomputers across the U.S.
and Europe.  To use OSG, you must first obtain an account from the OSG Connect group,
arrange a short Zoom meeting with someone from their support team, upload your ssh
keys through their web page, and then log into their head node.  Below are several
links on OSG, the signup process, and quick start guides for submitting HTCondor scripts.
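
As a minimal sketch of that last step, logging in from a terminal might look like the
line below; the user name and login host are assumptions, so use whatever the OSG
Connect team assigns you.

 > ssh myusername@login.osgconnect.net       # Host name is an assumption; use the one OSG Connect gives you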
 
https://opensciencegrid.org
https://support.opensciencegrid.org/support/home    Full documentation
 
Guidelines for determining if your jobs will work well with OSG
https://opensciencegrid.org/about/computation/
https://support.opensciencegrid.org/support/solutions/articles/5000632058-is-the-open-science-grid-for-you-
 
Below is an example HTCondor submit script to run the molecular dynamics code NAMD.
 
  # HTCondor submit description file for the NAMD example
 
  output = osg.namd.out
  error = osg.namd.error
  log = osg.namd.log
 
  # Requested resources
  request_cpus = 8
  request_memory = 8 GB
  request_disk = 1 GB
  requirements = Arch == "X86_64" && HAS_MODULES == True
 
  transfer_input_files = input_files/        # Slash means all files in that directory
  executable = namd2
  arguments = +p8 test.0.namd
  transfer_output_files = output
  queue 1
 
Below are some common HTCondor commands
> condor_submit htc.sh                      # Submit the condor script to the queue
> condor_q                                  # Check on the status while in the queue
> condor_q netid                            # Check status of all jobs for a given username
> condor_q 1441271                          # Check status of a particular job
> condor_history 1441271                    # Check status of a job that is completed
> condor_history -long 1441271              # Same but report more info
> condor_rm 1441271                         # Remove a particular job by its job ID
> condor_rm daveturner                      # Remove all jobs for the given username
 
The job output will be in the file specified by 'output='.  This is similar
to the slurm-#.out files on Beocat.
 
The log file contains timings for the file transfers and the job execution.
 
One big change from Slurm on Beocat is that your job will not run on a computer that
shares Beocat's file system.  You therefore need to list all of the input files to
transfer to the remote system before the run, and all of the output files to transfer
back after the run.
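
Below is a minimal sketch of those file-transfer lines for a submit script.  The names
run.sh and input_files/ are hypothetical, and should_transfer_files /
when_to_transfer_output are the standard HTCondor submit commands that make the
transfer explicit.

  # File-transfer sketch (run.sh and input_files/ are hypothetical names)
  executable              = run.sh
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  transfer_input_files    = input_files/      # Contents of input_files/ are copied to the execute node
  transfer_output_files   = output            # Copied back to the submit node when the job finishes
  queue 1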
 
=== Modules ===
  Software modules are available.  Use 'module avail' on the login node to get a list,
  then request modules as a resource in your submit script (HAS_MODULES == True, as in
  the NAMD example above) and load what you need inside the job; see the sketch after
  the link below.
https://support.opensciencegrid.org/support/solutions/articles/12000048518
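
A minimal sketch of using a module inside a job is below, assuming the submit script
sets requirements = HAS_MODULES == True as in the NAMD example above.  The wrapper
script, module name, and version are assumptions; check 'module avail' on the login
node for what is actually installed.

  #!/bin/bash
  # Hypothetical wrapper script used as the 'executable' in the submit file
  module load namd/2.13       # Module name and version are assumptions
  namd2 +p8 test.0.namd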
 
=== Equivalent of array jobs ===
  If you end your submit script with 'queue 10', you get 10 identical jobs.  Use
  'output = job.$(Cluster).$(Process).output' to keep the output files separate, and
  'arguments = input_file.$(Process)' to vary the input file for each job; a sketch
  is shown below.
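
A minimal sketch of those array-style submit lines is below; input_file.0 through
input_file.9 are hypothetical input names.

  # Submit 10 jobs; $(Process) runs from 0 to 9 within cluster $(Cluster)
  output    = job.$(Cluster).$(Process).output
  error     = job.$(Cluster).$(Process).error
  log       = job.$(Cluster).$(Process).log
  arguments = input_file.$(Process)
  queue 10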
