From Beocat


Revision as of 21:35, 22 May 2014

How do I connect to Beocat?

Connection Settings
Hostname: beocat.cis.ksu.edu
Port: 22
Username: your eID
Password: your eID password
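
As a small illustration of how the settings above fit together on the command line, the sketch below assembles an ssh command from them. The eID value is a placeholder, not a real account, and the command is only printed, not run.

```shell
# Placeholder eID (hypothetical); substitute your own.
eid="your_eid"
host="beocat.cis.ksu.edu"
port=22

# Build the command first so it can be inspected before running it.
ssh_cmd="ssh -p ${port} ${eid}@${host}"
echo "$ssh_cmd"

# To actually connect, run the printed command, or uncomment:
# ssh -p "$port" "${eid}@${host}"
```

You will be prompted for your eID password when the connection is made.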

How do I compile my programs?

Serial programs

Fortran

ifort or gfortran

C/C++

icc, gcc and g++

Parallel programs

Fortran

mpif77 or mpif90

C/C++

mpicc or mpic++
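
To make the compiler names above concrete, here is a minimal sketch that writes a tiny C program and builds it with gcc when that compiler is on the PATH. The file name hello.c and the -O2 flag are assumptions chosen for the example; the MPI line is left commented out because it needs a source file that actually uses MPI.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
build_dir="$(mktemp -d)"
cd "$build_dir" || exit 1

# hello.c is a hypothetical example program.
cat > hello.c <<'EOF'
#include <stdio.h>

int main(void) {
    printf("hello from Beocat\n");
    return 0;
}
EOF

# Serial build with gcc, one of the compilers listed above.
if command -v gcc >/dev/null 2>&1; then
    gcc -O2 -o hello hello.c || echo "compile failed" >&2
else
    echo "gcc not found on this machine" >&2
fi

# Parallel build (needs a source file that uses MPI, not included here):
# mpicc -O2 -o hello_mpi hello_mpi.c
```

If the build succeeds, the program can be run with ./hello from the build directory.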

How are the filesystems on Beocat set up?

Mountpoint   Local / Shared   Size                      Filesystem                Advice
/home        Shared           210TB total               glusterfs on top of xfs   Good enough for most jobs
/tmp         Local            >30GB (varies per node)   ext2                      Good for I/O intensive jobs

Usage Advice

For most jobs you shouldn't need to worry: your default working directory is your home directory, and it will be fast enough for most tasks. I/O intensive work should use /tmp, but you will need to copy your files to and from this partition as part of your job script. This is made easier by the $TMPDIR environment variable, which is set inside your jobs.

Example usage of $TMPDIR in a job script

#!/bin/bash

# Copy our input file to $TMPDIR to make processing faster
cp ~/experiments/input.data "$TMPDIR"

# Use the input file we copied over to the local system
# and generate the output file in $TMPDIR as well
~/bin/my_program --input-file="$TMPDIR/input.data" --output-file="$TMPDIR/output.data"

# Copy the results back from $TMPDIR, tagged with the job number
# ($JOB_ID is the variable the scheduler sets for each job)
cp "$TMPDIR/output.data" ~/experiments/results."$JOB_ID"

You need to remember to copy over your data from $TMPDIR as part of your job. That directory and its contents are deleted when the job is complete.
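
Because $TMPDIR is deleted when the job ends, it can be worth copying output back even when the program exits with an error. The sketch below shows one way to do that. Everything in it is a stand-in so that it runs outside the scheduler: the two temporary directories play the roles of $TMPDIR and ~/experiments, and my_program is a hypothetical shell function rather than a real binary.

```shell
# Stand-ins (hypothetical) so the sketch runs outside the scheduler.
scratch="$(mktemp -d)"        # plays the role of $TMPDIR
results_dir="$(mktemp -d)"    # plays the role of ~/experiments
printf 'sample input\n' > "$results_dir/input.data"

# Hypothetical stand-in for ~/bin/my_program: upper-cases its input.
my_program() { tr 'a-z' 'A-Z' < "$1" > "$2"; }

# Stage in, run, and remember the program's exit status instead of stopping.
cp "$results_dir/input.data" "$scratch/"
status=0
my_program "$scratch/input.data" "$scratch/output.data" || status=$?

# Copy back whatever output exists, even after a failure.
if [ -f "$scratch/output.data" ]; then
    cp "$scratch/output.data" "$results_dir/results.demo"
fi
echo "program exit status: $status"
```

In a real job script you would drop the stand-ins, use $TMPDIR directly, and finish with exit "$status" so the scheduler still sees the failure.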

Help! When I submit my jobs I get "Warning To stay compliant with standard unix behavior, there should be a valid #! line in your script i.e. #!/bin/tcsh"

Job submission scripts should begin with a line similar to '#!/bin/bash'. We have had problems with people submitting jobs with invalid #! lines: when that happens the job fails and we have to clean it up manually, so we now enforce the rule. This warning simply tells you that the job script should contain such a line, in most cases #!/bin/tcsh or #!/bin/bash, to indicate which program should run the script. When the line is missing, your default shell is used to execute the script (in your case /usr/local/bin/tcsh). This works in most cases, but may not be what you want.

Help! When I submit my jobs I get "A #! line exists, but it is not pointing to an executable. Please fix. Job not submitted."

Like the warning above, this error concerns the #!/bin/bash (or similar) line in your job script. In this case the #! line exists, but the path it names is not an executable file, so the script cannot run. Most likely you wanted #!/bin/bash instead of something else.
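
If you want to catch this before submitting, a check along the following lines can be run on the login node. This is only a sketch of the idea, not the check Beocat itself performs; the function name check_shebang and its return codes are made up for this example.

```shell
# check_shebang FILE
# Returns 0 if FILE starts with "#!" followed by the path of an executable,
# 1 if there is no #! line, and 2 if the named interpreter is not executable.
check_shebang() {
    script="$1"
    # Only the first line of the script matters.
    first_line="$(head -n 1 "$script")"

    case "$first_line" in
        '#!'*) ;;
        *) echo "$script: no #! line" >&2; return 1 ;;
    esac

    # Drop the leading "#!" and keep the first word, which is the interpreter path.
    interp="$(printf '%s\n' "${first_line#??}" | awk '{print $1}')"

    if [ -x "$interp" ]; then
        return 0
    fi
    echo "$script: '$interp' is not an executable" >&2
    return 2
}

# Example use (myjob.sh is a hypothetical file name):
# check_shebang myjob.sh && qsub myjob.sh
```

A script saved with Windows line endings can also fail this check, because the hidden carriage return becomes part of the interpreter path.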

Help! My jobs keep dying after 1 hour and I don't know why

Beocat has a default runtime limit of 1 hour. If you need more than that, or more than 1 GB of memory per core, see the SGEBasics documentation page to learn how to request it.

In short, when you run qsub for your job, you'll want to put something along the lines of '-l h_rt=10:00:00' before the job script if you want your job to run for 10 hours.
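
Spelled out as a full command, the 10-hour request might look like the sketch below. The job script name myjob.sh is hypothetical; the command is printed first, and a real submission is only attempted where qsub is installed and the script exists.

```shell
# myjob.sh is a hypothetical job script name.
job_script="myjob.sh"
runtime="10:00:00"   # hours:minutes:seconds

# The resource request goes before the job script on the qsub command line.
submit_cmd="qsub -l h_rt=${runtime} ${job_script}"
echo "$submit_cmd"

# Only attempt a real submission where qsub exists and the script is present.
if command -v qsub >/dev/null 2>&1 && [ -f "$job_script" ]; then
    qsub -l h_rt="$runtime" "$job_script"
fi
```

The same request can instead be written inside the job script as a '#$ -l h_rt=10:00:00' line, in the style of the Perl wrapper shown further down this page.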

Help! My error file has "Warning: no access to tty"

The warning message "Warning: no access to tty (Bad file descriptor)" is safe to ignore. It typically happens with the tcsh shell.

Help! My job isn't going to finish in the time I specified. Can I change the time requirement?

Generally speaking, no.

Jobs are scheduled based on execution times (among other things). If it were easy to change your time requirement, one could submit a job with a 15-minute run-time, get it scheduled quickly, and then say "whoops - I meant 15 weeks", effectively gaming the job scheduler. In fact, even the administrators cannot change the run-time requirement of a particular job. In extreme circumstances, and depending on the job requirements, we may be able to intervene manually. Doing so prevents other users from using the node(s) you are currently using, so such requests are not routinely approved. Contact Beocat support (below) if you feel your circumstances warrant special consideration.

Help! My Perl job runs fine on the head node, but only runs for a few seconds and then quits when submitted to the queue.

Perl doesn't like being called straight from the scheduler. However, there is a fairly easy workaround: create a shell wrapper script that calls Perl with your program.

For instance, I can create a script called runperl.sh that looks like this:

#!/bin/sh
#$ -l h_rt=1:00:00,mem=1G
/usr/bin/perl /path/to/my/perl_program.pl

Make this wrapper program executable:

chmod 755 runperl.sh

Then submit it with

qsub runperl.sh

Of course, the name of this script isn't important, as long as you change the corresponding chmod and qsub commands.

Help! When using MPI I get 'CMA: no RDMA devices found' or 'A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces'

This message simply means that some, but not all, of the nodes the job is running on have InfiniBand cards. The job will still run, but it will not use the fastest interconnect we have available. This may or may not be an issue, depending on how message-heavy your job is. If you would rather not see this warning, you can request InfiniBand as a resource when submitting your job: -l infiniband=TRUE

How do I get more help?

There are many sources of help for most Linux systems.

Unix man pages

Linux provides man pages (short for manual pages). They are simple to use: for example, if you need information on submitting jobs to Beocat, type 'man qsub' to bring up the manual for qsub.

GNU info system

Not all applications have man pages. Most of the rest have info pages instead. For example, if you need information on finding a file, you can use 'info find'.

This documentation

This documentation is very thoroughly researched, and has been painstakingly assembled for your benefit. Please use it.

Contact support

Support can be contacted at beocat@cis.ksu.edu. Please include detailed information about your problem, including the job number, the applications you are trying to run, and the directory you are currently in.