From Beocat


Revision as of 09:44, 9 July 2014 by Kylehutson


Hadoop

Hadoop does not integrate well with SGE (or, for that matter, any other HPC scheduling system). So we have created our own separate Cloudera Hadoop cluster to accommodate the increased usage of Hadoop on campus.

To use Hadoop:

  • Log in to Beocat.
  • From there, log in to the Hadoop head node, named 'theia': ssh theia
  • Copy files into or out of the Hadoop filesystem with hadoop fs -put and hadoop fs -get. Note that the Hadoop filesystem is smaller than the Beocat filesystem and is not backed up. Please copy data back out of Hadoop as soon as you are done using it. Data that remains untouched may be deleted without prior notice.
  • Run your Hadoop job: hadoop jar path/to/file.jar
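The copy-in, run, and copy-out steps above can be sketched as a small shell script to run on theia. This is only an illustration under stated assumptions, not a site-provided tool: the file names, the HDFS directory job1 (a relative HDFS path, which resolves under your HDFS home directory), the job arguments, and the idea that the job writes its results to job1/output are all hypothetical and depend on your jar. It uses the hadoop fs -put, hadoop fs -get, and hadoop jar forms of the Hadoop command line, and by default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of one copy-in, run, copy-out cycle, meant to be run on theia.
# Every path, the jar name, and the job arguments are hypothetical placeholders.
# DRY_RUN defaults to 1, which only prints the commands for review.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: hadoop_cycle LOCAL_INPUT HDFS_DIR JAR LOCAL_OUTPUT [JOB_ARGS...]
# Assumes (hypothetically) that the job writes its results to HDFS_DIR/output.
hadoop_cycle() {
    hc_input="$1"; hc_dir="$2"; hc_jar="$3"; hc_output="$4"
    shift 4
    run hadoop fs -mkdir -p "$hc_dir/input" &&        # create a working directory in HDFS
    run hadoop fs -put "$hc_input" "$hc_dir/input/" && # copy input data into HDFS
    run hadoop jar "$hc_jar" "$@" &&                   # run the job; arguments depend on the jar
    run hadoop fs -get "$hc_dir/output" "$hc_output" && # copy results back out of HDFS
    run hadoop fs -rm -r "$hc_dir"                     # free space on the Hadoop filesystem
}

# Example invocation with placeholder names; prints five commands in dry-run mode.
hadoop_cycle data.txt job1 path/to/file.jar results job1/input job1/output
```

Setting DRY_RUN=0 in the environment makes the script execute the commands instead of printing them. The steps are chained with &&, so the cleanup only happens after the results have been copied back, which matters because the Hadoop filesystem is not backed up.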