==== Spark ====

Apache Spark version 1.6.1 is installed in ''/share/apps/spark''. Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. It allows users to combine the memory and CPUs of multiple compute nodes into a single Spark cluster and use the aggregated memory and CPUs to run a single task.

The example PBS script below sets up a standalone Spark cluster on 3 compute nodes on Trestles:

<code bash>
#PBS -l walltime=30:00
#PBS -q q30m32c
#PBS -l nodes=3:ppn=32

module load java/sunjdk_1.8.0_92
module load spark

MASTER=`head -1 $PBS_NODEFILE`;
PORT=7077
spark_master="spark://$MASTER:$PORT"

echo -n "starting spark master on $MASTER... ";
$SPARK_HOME/sbin/start-master.sh
echo "done";
sleep 2;

echo "spark cluster web interface: http://$MASTER:8080" >$HOME/spark-info
echo "           spark master URL: spark://$MASTER:7077" >>$HOME/spark-info

for n in `uniq $PBS_NODEFILE`; do
  echo -n "starting spark slave on $n..."
  if [ "$n" == "$MASTER" ]; then
    $SPARK_HOME/sbin/start-slave.sh $spark_master
  else
    ssh $n "module load java/sunjdk_1.8.0_92 spark; $SPARK_HOME/sbin/start-slave.sh $spark_master";
  fi
  echo "done";
done

sleep $PBS_WALLTIME;
</code>

When the job starts, all 3 nodes run the Spark worker service; the job head node also runs the Spark master service. The log file from the Spark master is saved to:

**''/home/$USER/spark-$USER-org.apache.spark.deploy.master.Master-1-$HOST.out''**

and worker logs are in:

**''/home/$USER/spark-$USER-org.apache.spark.deploy.worker.Worker-1-$HOST.out''**

In addition to the log files, a new file, ''$HOME/spark-info'', contains the URLs of the Spark master and its web interface:

<code>
tres-l1:pwolinsk:$ cat spark-info
spark cluster web interface: http://tres1005:8080
           spark master URL: spark://tres1005:7077
</code>

You can use the **''lynx''** text-based web browser to connect to the Spark cluster web interface and check the status of the cluster:

<code>
tres-l1:pwolinsk:$ lynx http://tres1005:8080

Spark Master at spark://tres1005:7077

   [spark-logo-77x50px-hd.png] 1.6.1 Spark Master at spark://tres1005:7077

     * URL: spark://tres1005:7077
     * REST URL: spark://tres1005:6066 (cluster mode)
     * Alive Workers: 3
     * Cores in use: 96 Total, 0 Used
     * Memory in use: 185.7 GB Total, 0.0 B Used
     * Applications: 0 Running, 0 Completed
     * Drivers: 0 Running, 0 Completed
     * Status: ALIVE

Workers

Worker Id                                 Address            State  Cores        Memory
worker-20160512160005-172.16.10.5-47819   172.16.10.5:47819  ALIVE  32 (0 Used)  61.9 GB (0.0 B Used)
worker-20160512160007-172.16.10.8-53663   172.16.10.8:53663  ALIVE  32 (0 Used)  61.9 GB (0.0 B Used)
worker-20160512160009-172.16.10.9-60317   172.16.10.9:60317  ALIVE  32 (0 Used)  61.9 GB (0.0 B Used)

Running Applications

Application ID  Name  Cores  Memory per Node  Submitted Time  User  State  Duration

Completed Applications

Application ID  Name  Cores  Memory per Node  Submitted Time  User  State  Duration
</code>

The Spark cluster will remain running for the requested duration of the job (i.e. the job's requested walltime), 30 minutes in this example.

=== Submitting Jobs to Spark ===

The **''spark-submit''** script can be used to submit a job to the Spark cluster directly from one of the Trestles login nodes (i.e. tres-l1 or tres-l2). The **''--master''** option must be used to specify the Spark master URL (found in the ''$HOME/spark-info'' file).
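For interactive work, the **''spark-shell''** command accepts the same **''--master''** option. Rather than copying the URL out of ''spark-info'' by hand, it can be extracted with a short shell snippet. The following is a minimal sketch (it assumes the two-line ''spark-info'' format shown above, where the URL is the last field of the second line):

<code bash>
# Minimal sketch: extract the master URL from $HOME/spark-info
# (assumes the two-line format shown above; the URL is the last field).
MASTER_URL=$(awk '/spark master URL/ {print $NF}' $HOME/spark-info)

# Open an interactive Scala shell connected to the running cluster.
module load java/sunjdk_1.8.0_92 spark/1.6.1
spark-shell --master $MASTER_URL
</code>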
Below, the SparkPi example from ''$SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar'' is run using 3 executors (workers), each with 32 cores and 60 GB of memory:

<code>
tres-l1:pwolinsk:$ module load java/sunjdk_1.8.0_92 spark/1.6.1
tres-l1:pwolinsk:$ spark-submit --class org.apache.spark.examples.SparkPi --master spark://tres1005:7077 --num-executors 3 --executor-memory 60g --executor-cores 32 /share/apps/spark/spark-1.6.1-bin-hadoop2.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10000
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/12 16:11:20 INFO SparkContext: Running Spark version 1.6.1
16/05/12 16:11:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 16:11:21 INFO SecurityManager: Changing view acls to: pwolinsk
16/05/12 16:11:21 INFO SecurityManager: Changing modify acls to: pwolinsk
16/05/12 16:11:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pwolinsk); users with modify permissions: Set(pwolinsk)
16/05/12 16:11:21 INFO Utils: Successfully started service 'sparkDriver' on port 45770.
16/05/12 16:11:22 INFO Slf4jLogger: Slf4jLogger started
16/05/12 16:11:22 INFO Remoting: Starting remoting
16/05/12 16:11:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.16.6.14:56961]
...
</code>
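Alternatively, the application can be launched from inside the PBS job itself, so the cluster is torn down as soon as the work finishes. The following is a minimal sketch, not part of the original script: it replaces the final ''sleep $PBS_WALLTIME'' line of the PBS script above and assumes the ''stop-slave.sh''/''stop-master.sh'' scripts shipped in ''$SPARK_HOME/sbin'':

<code bash>
# Instead of sleeping for the full walltime, run the application directly
# (same modules, variables, and paths as in the PBS script above).
spark-submit --class org.apache.spark.examples.SparkPi \
    --master $spark_master \
    --executor-memory 60g --executor-cores 32 \
    $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10000

# Shut down the workers and the master once the application completes.
for n in `uniq $PBS_NODEFILE`; do
  ssh $n "$SPARK_HOME/sbin/stop-slave.sh"
done
$SPARK_HOME/sbin/stop-master.sh
</code>

With this variant the job exits, and releases its nodes, as soon as the application finishes, rather than holding them for the full requested walltime.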