From Beocat
== Big Data course on Beocat ==
The Pittsburgh Supercomputing Center hosts two-day remote Big Data workshops
several times each year.  The information provided here allows individual
users to go through the videos at their own pace and perform the exercises
on our local Beocat supercomputer.  Each exercise has data and results
tailored to the individual user so that instructors can measure the progress
of students assigned to take this course interactively.
Use the Agenda website below to access the slides, starting with the Welcome
slides, which have no associated video.  The '>' sign at the start of lines
below represents the command line prompt on Beocat, and '>>>' represents the
prompt you'll get when you start pyspark or python.
Agenda:  https://www.psc.edu/hpc-workshop-series/big-data
Videos:  https://www.youtube.com/watch?v=NpapUmGHXyw&list=PLdkRteUOw2X-YKqommnuGWqNfEEUG6P2E
Welcome
=======
  ssh into Beocat from your computer and copy the workshop data to your
  home directory.
  > cp -rp ~daveturner/workshops/bigdata_workshop .
  > cd bigdata_workshop
  PDF versions of the slides are available for each section,
  as are directories containing the data for each set of exercises.
  You'll need to copy the PDF files to your local computer for viewing.
  Go through the Welcome slides from the Agenda website link or the PDF file
    Big_Data_Welcome.pdf.  Much of this information is specific to
    the Bridges supercomputer at PSC, so just skim these slides.
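
  The step above about copying the PDF files to your local computer could be
  done with scp.  The following is only a sketch under stated assumptions: the
  username is a placeholder, the login host name is assumed and should be
  checked against the Beocat documentation, and the PDFs are assumed to sit at
  the top level of the bigdata_workshop directory you copied into your home
  directory.  The snippet only builds and prints the command so you can check
  it before running it yourself.

```shell
# Run this on your LOCAL computer, not on Beocat.
BEOCAT_USER="your_username"            # hypothetical: replace with your Beocat login
BEOCAT_HOST="headnode.beocat.ksu.edu"  # assumed login host; confirm before use

# Build the copy command.  The remote path is relative to your Beocat home
# directory (assumed layout), and the single quotes keep the *.pdf wildcard
# from being expanded by your local shell.
PDF_COPY_CMD="scp '${BEOCAT_USER}@${BEOCAT_HOST}:bigdata_workshop/*.pdf' ."

# Print the command for review; paste it into your terminal to run it.
echo "$PDF_COPY_CMD"
```

  If the PDFs turn out to live in per-section subdirectories, adjust the
  remote path accordingly, or copy the whole directory with scp -r.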
Intro to Big Data
=================
  [web link is bad]
  Watch the video 'Intro to Big Data - Big Data Video 1'
      (slides are A_Brief_History_of_Big_Data.pdf)
Hadoop
======
  Watch the video 'Hadoop - Big Data Video 2'  (slides are Hadoop2019.pdf)
  We do not have Hadoop on Beocat, so the commands covered in the video will not work locally.

Revision as of 14:48, 24 March 2020
