

Apache Spark version 1.6.1 is installed in /share/apps/spark. Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It lets users combine the memory and CPUs of multiple compute nodes into a Spark cluster and use the aggregated memory and CPUs to run a single application.

The example PBS script below sets up a Spark cluster in standalone mode on 3 Trestles compute nodes.

#PBS -l walltime=30:00
#PBS -q q30m32c
#PBS -l nodes=3:ppn=32

module load java/sunjdk_1.8.0_92
module load spark


echo -n "starting spark master on $MASTER... ";
echo "done";
sleep 2;
echo "spark cluster web interface: http://$MASTER:8080"  >$HOME/spark-info
echo "           spark master URL: spark://$MASTER:7077" >>$HOME/spark-info

# start a worker on every unique node in the job: locally on the master node, over ssh elsewhere
for n in $(uniq $PBS_NODEFILE); do
   echo -n "starting spark slave on $n..."
   if [ "$n" == "$MASTER" ]; then
      $SPARK_HOME/sbin/ $spark_master
   else
      ssh $n "module load java/sunjdk_1.8.0_92 spark; $SPARK_HOME/sbin/ $spark_master";
   fi
   echo "done";
done
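
For reference, here is a minimal self-contained sketch of the same setup. It assumes Spark 1.6's stock sbin/start-master.sh and sbin/start-slave.sh scripts (the latter takes the master URL as its argument), Spark's default ports (7077 for the master, 8080 for the web UI), and that the master runs on the node executing the job script, found with hostname -s. SPARK_MASTER_IP is the Spark 1.x variable that sets the master's bind address. The helper names unique_nodes, master_url and start_cluster are hypothetical, and this is not necessarily the script the site uses.

```shell
#!/bin/bash
#PBS -l walltime=30:00
#PBS -q q30m32c
#PBS -l nodes=3:ppn=32

# Sketch only (assumption): a self-contained variant of the launcher above.
# Assumes Spark 1.6's sbin/start-master.sh and sbin/start-slave.sh, default
# ports (7077 master, 8080 web UI), and that the master runs on the node
# executing this script. Helper names are hypothetical.

# Print each node once; PBS lists a node once per requested core.
unique_nodes() {
  sort -u "$1"
}

# Build the master URL for a host, using Spark's default master port 7077.
master_url() {
  echo "spark://$1:7077"
}

start_cluster() {
  module load java/sunjdk_1.8.0_92
  module load spark

  local master spark_master n
  master=$(hostname -s)
  spark_master=$(master_url "$master")

  echo -n "starting spark master on $master... "
  # SPARK_MASTER_IP makes the master advertise the same host name as the URL
  SPARK_MASTER_IP="$master" "$SPARK_HOME/sbin/start-master.sh"
  echo "done"
  sleep 2
  echo "spark cluster web interface: http://$master:8080"  > "$HOME/spark-info"
  echo "           spark master URL: $spark_master"        >> "$HOME/spark-info"

  for n in $(unique_nodes "$PBS_NODEFILE"); do
    echo -n "starting spark worker on $n... "
    if [ "$n" == "$master" ]; then
      "$SPARK_HOME/sbin/start-slave.sh" "$spark_master"
    else
      # \$SPARK_HOME is escaped so it expands on the remote node after module load
      ssh "$n" "module load java/sunjdk_1.8.0_92 spark; \$SPARK_HOME/sbin/start-slave.sh $spark_master"
    fi
    echo "done"
  done

  # Keep the job, and with it the cluster, alive until the walltime runs out
  # (GNU sleep accepts "infinity").
  sleep infinity
}

# Launch only inside a PBS job, so the helpers above can be tried elsewhere.
if [ -n "$PBS_NODEFILE" ]; then
  start_cluster
fi
```

Submitted with qsub, the job starts the master, writes $HOME/spark-info and starts one worker per node. The PBS_NODEFILE guard keeps the file from launching anything outside a job, and sort -u also drops duplicate host entries that are not adjacent, which uniq would miss.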


When the job starts, all 3 nodes run the Spark worker service, and the job's head node also runs the Spark master service. The log file from the Spark master is saved to:


and worker logs are in:


In addition to the log files, the job writes $HOME/spark-info, which contains the URL of the Spark master web interface and the Spark master URL:

tres-l1:pwolinsk:$ cat spark-info 
spark cluster web interface: http://tres1005:8080
           spark master URL: spark://tres1005:7077

You can use the lynx text-based web browser to connect to the Spark cluster web interface and check the status of the cluster:

tres-l1:pwolinsk:$ lynx http://tres1005:8080
Spark Master at spark://tres1005:7077
[spark-logo-77x50px-hd.png] 1.6.1 Spark Master at spark://tres1005:7077

     * URL: spark://tres1005:7077
     * REST URL: spark://tres1005:6066 (cluster mode)
     * Alive Workers: 3
     * Cores in use: 96 Total, 0 Used
     * Memory in use: 185.7 GB Total, 0.0 B Used
     * Applications: 0 Running, 0 Completed
     * Drivers: 0 Running, 0 Completed
     * Status: ALIVE


   Worker Id                State  Cores        Memory
   worker-20160512160005-   ALIVE  32 (0 Used)  61.9 GB (0.0 B Used)
   worker-20160512160007-   ALIVE  32 (0 Used)  61.9 GB (0.0 B Used)
   worker-20160512160009-   ALIVE  32 (0 Used)  61.9 GB (0.0 B Used)

Running Applications

   Application ID  Cores  Memory per Node  Submitted Time  User  State  Duration

Completed Applications

   Application ID  Cores  Memory per Node  Submitted Time  User  State  Duration
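
For scripted checks, the same status page can be dumped without opening lynx interactively: lynx -dump prints the rendered page to standard output. This is a minimal sketch assuming $HOME/spark-info has the two-line format shown above; the function names spark_webui_from_info and spark_cluster_status are hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): print the cluster summary from the Spark
# master web UI without opening lynx interactively.

# Print the http:// web UI URL recorded in a spark-info file
# (default: $HOME/spark-info, as written by the PBS job script).
spark_webui_from_info() {
  local info_file="${1:-$HOME/spark-info}"
  awk '/spark cluster web interface:/ {print $NF}' "$info_file"
}

# Dump the master page as text and keep only the summary lines.
spark_cluster_status() {
  local url
  url=$(spark_webui_from_info "$@") || return 1
  if [ -z "$url" ]; then
    echo "no web UI URL found" >&2
    return 1
  fi
  lynx -dump "$url" | grep -E 'Alive Workers|Cores in use|Memory in use|Status:'
}
```

Running spark_cluster_status on a login node while the cluster job is up should print the Alive Workers, Cores in use, Memory in use and Status lines from the page above.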

The Spark cluster remains running for the job's requested walltime, 30 minutes in this example.

Submitting Jobs to Spark

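The spark-submit script can be used to submit a job to the Spark cluster directly from either Trestles login node (tres-l1 or tres-l2). Use the --master option to give the Spark master URL, which is recorded in the $HOME/spark-info file. The example below runs SparkPi from $SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar with 3 executors (one per worker), each using 32 cores and 60 GB of memory. spark-submit lists --num-executors as a YARN-only option, so on this standalone cluster the executor count follows from --executor-cores 32: each 32-core worker has room for exactly one such executor.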

tres-l1:pwolinsk:$ module load java/sunjdk_1.8.0_92 spark/1.6.1
tres-l1:pwolinsk:$ spark-submit --class org.apache.spark.examples.SparkPi --master spark://tres1005:7077 \
--num-executors 3 --executor-memory 60g --executor-cores 32 \
/share/apps/spark/spark-1.6.1-bin-hadoop2.6/lib/spark-examples-1.6.1-hadoop2.6.0.jar 10000
Using Spark's default log4j profile: org/apache/spark/
16/05/12 16:11:20 INFO SparkContext: Running Spark version 1.6.1
16/05/12 16:11:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 16:11:21 INFO SecurityManager: Changing view acls to: pwolinsk
16/05/12 16:11:21 INFO SecurityManager: Changing modify acls to: pwolinsk
16/05/12 16:11:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pwolinsk); users with modify permissions: Set(pwolinsk)
16/05/12 16:11:21 INFO Utils: Successfully started service 'sparkDriver' on port 45770.
16/05/12 16:11:22 INFO Slf4jLogger: Slf4jLogger started
16/05/12 16:11:22 INFO Remoting: Starting remoting
16/05/12 16:11:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@]
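
Because the master runs on whichever node the PBS job lands on, its URL changes from job to job. This minimal sketch reads the URL from $HOME/spark-info (assuming the two-line format shown earlier) and submits the same SparkPi run; the function names spark_master_from_info and submit_sparkpi are hypothetical, and SPARK_HOME is assumed to be set by module load spark.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): look up the master URL written by the PBS job
# script instead of pasting it into spark-submit by hand.

# Print the spark:// master URL recorded in a spark-info file
# (default: $HOME/spark-info).
spark_master_from_info() {
  local info_file="${1:-$HOME/spark-info}"
  awk '/spark master URL:/ {print $NF}' "$info_file"
}

# Submit the SparkPi example to whichever cluster spark-info points at.
# Assumes "module load java/sunjdk_1.8.0_92 spark/1.6.1" has set SPARK_HOME
# and put spark-submit on PATH.
submit_sparkpi() {
  local master
  master=$(spark_master_from_info) || return 1
  if [ -z "$master" ]; then
    echo "no spark master URL found in $HOME/spark-info" >&2
    return 1
  fi
  # --num-executors is omitted: spark-submit documents it as YARN-only.
  spark-submit --class org.apache.spark.examples.SparkPi \
    --master "$master" \
    --executor-memory 60g --executor-cores 32 \
    "$SPARK_HOME/lib/spark-examples-1.6.1-hadoop2.6.0.jar" 10000
}
```

$HOME/spark-info is overwritten by each new cluster job but is not removed when a job ends, so a leftover file can point at a master that is no longer running; check that the cluster job is still active before calling submit_sparkpi.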
spark.txt · Last modified: 2020/09/21 21:16 by root