BigDataOnBeocat: Difference between revisions

Latest revision as of 09:01, 12 April 2021

This course is now available here: http://people.beocat.ksu.edu/~dan/education/bigdata/

@@ Line 1: / Line 1: @@
-= Big Data course on Beocat =
+This course is now available here: http://people.beocat.ksu.edu/~dan/education/bigdata/
-The Pittsburgh Supercomputing Center hosts 2-day remote Big Data workshops
-several times each year.  The information provided here will allow individual
-users to go through the videos at their own pace and perform the exercises
-on our local Beocat supercomputer.  Each exercise will have data and results
-tailored to each individual to allow instructors to measure the progress of
-students assigned to take this course interactively.
-Use the Agenda website below to access the slides starting with the Welcome slides
-that don't have an associated video.  The '>' sign at the start of lines below
-represents the command line prompt on Beocat, and '>>>' represents the prompt
-you'll get when you start pyspark or python.
-Agenda:  https://www.psc.edu/hpc-workshop-series/big-data
-Videos:  https://www.youtube.com/watch?v=NpapUmGHXyw&list=PLdkRteUOw2X-YKqommnuGWqNfEEUG6P2E
-== Welcome ==
-ssh into Beocat from your computer and copy the workshop data to your
-home directory.
-  > cp -rp ~daveturner/workshops/bigdata_workshop .
-  > cd bigdata_workshop
-PDF versions of the slides are available for each section
-as are directories containing the data for each set of exercises.
-You can copy the PDF files to your local computer for viewing or click
-on the web link for each section.
-Follow along with the Welcome slides from the Agenda website link or the PDF file
-Big_Data_Welcome.pdf as you listen to the video.  Much of this information is specific to
-the Bridges supercomputer at PSC, so just skim these slides.
-Welcome:  https://www.psc.edu/images/xsedetraining/BigData/Big_Data_Welcome.pdf
-== Intro to Big Data ==
-History of Big Data: https://www.psc.edu/images/xsedetraining/BigData/A_Brief_History_of_Big_Data.pdf
-Watch the video 'Intro to Big Data - Big Data Video 1' and follow along with the slides
-(<B>A_Brief_History_of_Big_Data.pdf</B>)
-== Hadoop ==
-Watch the video 'Hadoop - Big Data Video 2'   (slides are <B>Hadoop2019.pdf</B>)
-We do not have Hadoop on Beocat, so the commands they cover will not work locally.
-== Intro to Spark and Spark sections combined ==
-The link below shows how to load the Spark and Python modules on Beocat,
-set up the Python virtual environment, and run Spark code interactively
-or through the Slurm scheduler.
-https://support.beocat.ksu.edu/BeocatDocs/index.php/Installed_software#Spark
-Watch the video 'Spark - Big Data Video 3'   (slides are <B>Intro_To_Spark.pdf</B>)
-Pause the video around the 43-minute mark and do exercises 1-5.
-Try these yourself before they cover the answers.
-You can do demos and exercises interactively by requesting an Elf core,
-or you can submit the job using a script
-(see ~/bigdata_workshop/Shakespeare/sb.shakespeare as an example).
-Request 1 core on an Elf node for interactive use, then load the modules:
-  > srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G -C elves --pty bash
-  > module purge
-  > module load Spark
-  > module load Python
-  > source ~/.virtualenvs/spark-test/bin/activate
-  > pyspark
-  >>>
-Email your solutions to exercises 1-5 to Dan along with a description of
-how well you did on your own.  Also include your solutions to
-homework assignments 1-3 around the 103-minute mark if you want to impress him.
-Dave's answers are in ~/bigdata_workshop/Shakespeare/shakespeare.py.
-== Machine Learning: Recommender System for Spark ==
-If you want to run demos and exercises interactively,
-request 1 core on an Elf node for interactive use, then load the modules
-and activate your Python virtual environment:
-  > srun -J srun -N 1 -n 1 -t 24:00:00 --mem=10G -C elves --pty bash
-  > module purge
-  > module load Spark
-  > module load Python
-  > source ~/.virtualenvs/spark-test/bin/activate
-Watch the video 'Machine Learning Recommender System With Spark - Big Data Video 4'
-(slides are <B>A_Recommender_System.pdf</B>)
-Do the 3 exercises at 1:06 in the video and email Dan your answers and
-a summary of how you did on your own.
-Demos and exercises can be run on your current node using pyspark-submit:
-  > pyspark-submit recommender.py
-You can also start pyspark and use it interactively:
-  > pyspark
-  >>>
-The recommender.py script can also be run through Slurm using the job script sb.recommender:
-  > sbatch sb.recommender
-== Deep Learning with TensorFlow ==
-Watch the video 'Tensorflow - Big Data Video 5'  (slides are <B>Deep_Learning.pdf</B>)
-PSC has a version of TensorFlow that works on GPUs.  The version on
-Beocat is newer but runs on CPUs instead.
-You can do the demos on Beocat if you want.  Expect a warning that the
-mnist data will be deprecated in the future.
-  > module purge
-  > module load TensorFlow
-  > source ~/.virtualenvs/spark-test/bin/activate
-  > python
-  >>>
-== Bridges ==
-Watch the video 'A Big Data Platform - Big Data Video 6'
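The removed revision repeats the same interactive setup (srun, module loads, virtualenv activation) and mentions job scripts such as sb.recommender without showing one. As a rough sketch only, the same steps might look like this in a Slurm batch script. This is not the actual contents of sb.recommender: the job name is an assumption, and the script assumes it is submitted from the directory that contains recommender.py.

```shell
#!/bin/bash
# Hypothetical job script: a sketch only, NOT the real sb.recommender.
# The #SBATCH options are the long forms of the srun flags used in the text.

# Job name is an assumption; the text uses "-J srun" for interactive sessions.
#SBATCH --job-name=recommender
# One task on one node (-N 1 -n 1).
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Wall-clock limit and memory, as in the srun line (-t 24:00:00 --mem=10G).
#SBATCH --time=24:00:00
#SBATCH --mem=10G
# Restrict the job to Elf nodes (-C elves).
#SBATCH --constraint=elves

# Same environment setup as the interactive instructions.
module purge
module load Spark
module load Python
source ~/.virtualenvs/spark-test/bin/activate

# Launcher name copied from the text; stock Spark distributions
# name this launcher spark-submit.
pyspark-submit recommender.py
```

If saved under a name of your choosing (for example sb.sketch, a hypothetical file name), it would be submitted with `sbatch sb.sketch`, and by default Slurm writes the job's output to a slurm-<jobid>.out file in the submission directory.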
