From Beocat
Revision as of 10:08, 2 January 2018

CUDA Overview

CUDA is a feature set for programming nVidia GPUs. We have 7 CUDA-enabled nodes. dwarf22, dwarf23, dwarf24, dwarf25, and dwarf35 each have two nVidia 1080 Ti graphics cards. dwarf38 and dwarf39 each have a single nVidia 980 Ti graphics card. For users outside the research group that purchased them, the former set of nodes is available only to 'killable' jobs. The latter are available to anybody; however, you should send an email to beocat@cs.ksu.edu with a request to be added to the GPU priority group.

Note that both of these graphics cards are consumer-grade rather than the typical GPUs used in most high-performance computing centers. For single-precision computations, these cards are comparable to the high-end cards (at a fraction of the price); however, double-precision computations are much slower.
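
Because the nodes carry different cards, it can help to confirm which GPU a job actually landed on. The following is a minimal sketch, not part of the original page, that lists the visible devices using the CUDA runtime API. The file name devquery.cu is only an illustration.

```cuda
// devquery.cu (hypothetical file name): list the GPUs visible to this process
#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // Typically means no driver or no CUDA-capable device on this host
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int d = 0; d < count; d++) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess)
            continue;  // skip devices whose properties cannot be read
        printf("Device %d: %s, compute capability %d.%d, %d multiprocessors, %.1f GiB\n",
               d, prop.name, prop.major, prop.minor, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

It should compile with the same nvcc command line described under Compiling CUDA Applications and, like any CUDA program, should be run on a CUDA-enabled node rather than the headnode.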

Training videos

CUDA Programming Model Overview: http://www.youtube.com/watch?v=aveYOlBSe-Y

CUDA Programming Basics Part I (Host functions): http://www.youtube.com/watch?v=79VARRFwQgY

CUDA Programming Basics Part II (Device functions): http://www.youtube.com/watch?v=G5-iI1ogDW4

Compiling CUDA Applications

nvcc is the compiler for CUDA applications. When compiling your applications manually, you will need to keep a few things in mind:

  • The CUDA development headers are located here: /opt/cuda/sdk/common/inc
  • The CUDA architecture is: sm_30
  • The CUDA SDK is currently not available on the headnode. (compile on the nodes with CUDA, either in your jobscript or via qrsh -l cuda=TRUE)
  • Do not run your CUDA applications on the headnode. I cannot guarantee they will run, and they will give you terrible results if they do.

Putting it all together, you can compile CUDA applications as follows:

nvcc -I /opt/cuda/sdk/common/inc -arch sm_30 <source>.cu -o <output>

Example

Create your Application

Copy the following application into Beocat as vecadd.cu:

#include <stdio.h>

#define SIZE 10

// Kernel definition, see also section 4.2.3 of Nvidia Cuda Programming Guide
__global__ void vecAdd(float* A, float* B, float* C)
{
    // threadIdx.x is a built-in variable provided by CUDA at runtime
    int i = threadIdx.x;
    A[i] = 0;
    B[i] = i;
    C[i] = A[i] + B[i];
}

int main()
{
    int N = SIZE;
    // Zero-initialize A and B so the host-to-device copies read defined values
    float A[SIZE] = {0}, B[SIZE] = {0}, C[SIZE];
    float *devPtrA;
    float *devPtrB;
    float *devPtrC;
    int memsize = SIZE * sizeof(float);

    // Allocate device memory for the three vectors
    cudaMalloc((void**)&devPtrA, memsize);
    cudaMalloc((void**)&devPtrB, memsize);
    cudaMalloc((void**)&devPtrC, memsize);
    cudaMemcpy(devPtrA, A, memsize, cudaMemcpyHostToDevice);
    cudaMemcpy(devPtrB, B, memsize, cudaMemcpyHostToDevice);
    // __global__ functions are called: Func<<< Dg, Db, Ns >>>(parameter);
    // Dg = grid size in blocks, Db = block size in threads, Ns = optional shared memory bytes
    vecAdd<<<1, N>>>(devPtrA, devPtrB, devPtrC);
    // Copy the result back to the host
    cudaMemcpy(C, devPtrC, memsize, cudaMemcpyDeviceToHost);

    for (int i = 0; i < SIZE; i++)
        printf("C[%d]=%f\n", i, C[i]);

    // Free each device buffer exactly once
    cudaFree(devPtrA);
    cudaFree(devPtrB);
    cudaFree(devPtrC);

    return 0;
}
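
The example above launches a single block of N threads and does not check any return codes, which is fine for 10 elements but does not scale past the per-block thread limit. The following is a hedged sketch of how the same idea might be extended to larger vectors: it spreads the work over several blocks, guards against out-of-range indices, and checks every runtime call. The kernel name vecAddN, the CHECK macro, and the chosen sizes are illustrative assumptions, not part of the original page or of CUDA itself.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): abort on any runtime error
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Each thread handles one element; the bounds check covers the last,
// possibly partially filled, block.
__global__ void vecAddN(const float* A, const float* B, float* C, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        C[i] = A[i] + B[i];
}

int main()
{
    const int n = 1 << 20;                  // illustrative problem size
    const size_t bytes = n * sizeof(float);

    float* hA = (float*)malloc(bytes);
    float* hB = (float*)malloc(bytes);
    float* hC = (float*)malloc(bytes);
    if (!hA || !hB || !hC) {
        fprintf(stderr, "host allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; i++) {
        hA[i] = (float)i;
        hB[i] = 2.0f * i;
    }

    float *dA, *dB, *dC;
    CHECK(cudaMalloc((void**)&dA, bytes));
    CHECK(cudaMalloc((void**)&dB, bytes));
    CHECK(cudaMalloc((void**)&dC, bytes));
    CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice));

    const int threads = 256;                         // threads per block
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n
    vecAddN<<<blocks, threads>>>(dA, dB, dC, n);
    CHECK(cudaGetLastError());                       // catches launch errors

    // This copy waits for the kernel to finish before transferring
    CHECK(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost));

    int mismatches = 0;
    for (int i = 0; i < n; i++)
        if (hC[i] != hA[i] + hB[i])
            mismatches++;
    printf("%d mismatches out of %d elements\n", mismatches, n);

    CHECK(cudaFree(dA));
    CHECK(cudaFree(dB));
    CHECK(cudaFree(dC));
    free(hA);
    free(hB);
    free(hC);
    return mismatches != 0;
}
```

The block size of 256 is a common starting point rather than a tuned value; the best choice depends on the kernel and the card.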

Gain Access to a CUDA-capable Node

See our advanced scheduler documentation.

Compile Your Application

nvcc -I /opt/cuda/sdk/common/inc -arch sm_30 vecadd.cu -o vecadd

This will create a program with the name 'vecadd' (specified by the '-o' flag).

Run Your Application

Run the program as you usually would, namely:

./vecadd

If you don't want to run the program interactively because it is a large job, you can submit it via qsub; just be sure to add the '-l cuda=true' directive.