==== Spark ====
Apache Spark version 1.6.1 is installed in /share/apps/spark. Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It allows users to combine the memory and CPUs of multiple compute nodes into a single Spark cluster and use the aggregated resources to run a single application.
The example PBS script below sets up a standalone-mode Spark cluster on 3 compute nodes on Trestles.
#!/bin/bash
#PBS -l walltime=30:00
#PBS -q q30m32c
#PBS -l nodes=3:ppn=32

module load java/sunjdk_1.8.0_92
module load spark

# The first node in the PBS node file becomes the Spark master.
MASTER=`head -1 $PBS_NODEFILE`
PORT=7077
spark_master="spark://$MASTER:$PORT"

echo -n "starting spark master on $MASTER... "
$SPARK_HOME/sbin/start-master.sh
echo "done"
sleep 2

# Save the web interface and master URLs where they can be read later.
echo "spark cluster web interface: http://$MASTER:8080" > $HOME/spark-info
echo "           spark master URL: spark://$MASTER:7077" >> $HOME/spark-info

# Start a worker on every node in the job; the master node starts its
# worker locally, the remaining nodes are reached over ssh.
for n in `uniq $PBS_NODEFILE`; do
    echo -n "starting spark slave on $n..."
    if [ "$n" == "$MASTER" ]; then
        $SPARK_HOME/sbin/start-slave.sh $spark_master
    else
        ssh $n "module load java/sunjdk_1.8.0_92 spark; $SPARK_HOME/sbin/start-slave.sh $spark_master"
    fi
    echo "done"
done

# Keep the job (and with it the Spark cluster) alive for the full walltime.
sleep $PBS_WALLTIME
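Assuming the script above is saved as **''spark.pbs''** (any filename works), submit it from a login node with **''qsub''**:

tres-l1:pwolinsk:$ qsub spark.pbs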
When the job starts, all 3 nodes are running the spark worker service, and the job head node is also running the spark master service. The log file from the spark master is saved to:
**''/home/$USER/spark-$USER-org.apache.spark.deploy.master.Master-1-$HOST.out''**
and worker logs are in:
**''/home/$USER/spark-$USER-org.apache.spark.deploy.worker.Worker-1-$HOST.out''**
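One quick way to verify that the master started cleanly is to tail its log from a login node (the wildcard below stands in for the head node's hostname):

tres-l1:pwolinsk:$ tail /home/$USER/spark-$USER-org.apache.spark.deploy.master.Master-1-*.out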
In addition to the log files, a new file, $HOME/spark-info, contains the URLs of the spark master and its web interface:
tres-l1:pwolinsk:$ cat spark-info
spark cluster web interface: http://tres1005:8080
spark master URL: spark://tres1005:7077
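Since later commands need the master URL, it can be convenient to extract it from this file in the shell. A minimal sketch, assuming the file format shown above:

tres-l1:pwolinsk:$ spark_master=$(grep 'master URL' $HOME/spark-info | awk '{print $NF}')
tres-l1:pwolinsk:$ echo $spark_master
spark://tres1005:7077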
You can use the **''lynx''** text-based web browser to connect to the spark cluster web interface and check the status of the cluster:
tres-l1:pwolinsk:$ lynx http://tres1005:8080
Spark Master at spark://tres1005:7077
[spark-logo-77x50px-hd.png] 1.6.1 Spark Master at spark://tres1005:7077
* URL: spark://tres1005:7077
* REST URL: spark://tres1005:6066 (cluster mode)
* Alive Workers: 3
* Cores in use: 96 Total, 0 Used
* Memory in use: 185.7 GB Total, 0.0 B Used
* Applications: 0 Running, 0 Completed
* Drivers: 0 Running, 0 Completed
* Status: ALIVE
Workers
Worker Id                               Address           State Cores       Memory
worker-20160512160005-172.16.10.5-47819 172.16.10.5:47819 ALIVE 32 (0 Used) 61.9 GB (0.0 B Used)
worker-20160512160007-172.16.10.8-53663 172.16.10.8:53663 ALIVE 32 (0 Used) 61.9 GB (0.0 B Used)
worker-20160512160009-172.16.10.9-60317 172.16.10.9:60317 ALIVE 32 (0 Used) 61.9 GB (0.0 B Used)
Running Applications
Application ID  Name  Cores  Memory per Node  Submitted Time  User  State  Duration
Completed Applications
Application ID  Name  Cores  Memory per Node  Submitted Time  User  State  Duration
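If browsing with lynx is inconvenient, the standalone master's web UI also serves its status in machine-readable form (the ''/json'' path is assumed from Spark's standalone master UI; adjust the hostname to match your spark-info file):

tres-l1:pwolinsk:$ curl -s http://tres1005:8080/json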
The spark cluster will remain running for the job's requested walltime, 30 minutes in this example.
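To release the nodes before the walltime expires, delete the PBS job; the job id below is hypothetical:

tres-l1:pwolinsk:$ qdel 123456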
=== Submitting Jobs to Spark ===
The **''spark-submit''** script can be used to submit a job to the spark cluster directly from one of the trestles login nodes (i.e. tres-l1 or tres-l2). The **''--master''** option must be used to specify the spark master URL (found in the $HOME/spark-info file). Below, the SparkPi example from $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar is run using 3 executors (workers), each using 32 cores and 60GB of memory.
tres-l1:pwolinsk:$ module load java/sunjdk_1.8.0_92 spark/1.6.1
tres-l1:pwolinsk:$ spark-submit --class org.apache.spark.examples.SparkPi --master spark://tres1005:7077 \
    --num-executors 3 --executor-memory 60g --executor-cores 32 \
    /share/apps/spark/spark-1.6.1-bin-hadoop2.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10000
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/05/12 16:11:20 INFO SparkContext: Running Spark version 1.6.1
16/05/12 16:11:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 16:11:21 INFO SecurityManager: Changing view acls to: pwolinsk
16/05/12 16:11:21 INFO SecurityManager: Changing modify acls to: pwolinsk
16/05/12 16:11:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pwolinsk); users with modify permissions: Set(pwolinsk)
16/05/12 16:11:21 INFO Utils: Successfully started service 'sparkDriver' on port 45770.
16/05/12 16:11:22 INFO Slf4jLogger: Slf4jLogger started
16/05/12 16:11:22 INFO Remoting: Starting remoting
16/05/12 16:11:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@172.16.6.14:56961]
...
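Interactive use works the same way: a Scala shell can be attached to the running cluster from a login node by pointing **''spark-shell''** at the same master URL:

tres-l1:pwolinsk:$ module load java/sunjdk_1.8.0_92 spark/1.6.1
tres-l1:pwolinsk:$ spark-shell --master spark://tres1005:7077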