Spark on YARN

I tried out Spark on YARN recently, and it is very straightforward. If you plan to use MLlib, make sure gcc-gfortran is installed on every cluster node, since MLlib's native linear algebra libraries depend on the Fortran runtime.
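On a yum-based distribution the package can be installed like this (a sketch; the package name and package manager vary by distribution):

```shell
# Install the Fortran runtime that MLlib's native BLAS/LAPACK bindings need.
# Assumes a RHEL/CentOS-style node; on Debian/Ubuntu the equivalent is:
#   sudo apt-get install gfortran
sudo yum install -y gcc-gfortran
```

Run this on every node, since executors can be scheduled anywhere in the cluster.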

1) First, download and unpack the Spark 1.0.1 binary build for Hadoop 2 (spark-1.0.1-bin-hadoop2):

http://www.apache.org/dyn/closer.cgi/spark/spark-1.0.1/spark-1.0.1-bin-hadoop2.tgz

2) Set the YARN_CONF_DIR environment variable to point to the directory containing your Hadoop configuration files, e.g.:

export YARN_CONF_DIR=/etc/hadoop/conf

You can now start the Spark shell as follows:

cd spark-1.0.1-bin-hadoop2

./bin/spark-shell --master yarn-client --driver-java-options "-Dspark.executor.memory=2g"
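To confirm that executors actually come up on YARN, a quick smoke test is to run a trivial job through the shell. This is a sketch, assuming the shell starts cleanly against your cluster:

```shell
# Non-interactive smoke test: sum the integers 1..1000 on the cluster.
# The output should include 500500 (= 1000 * 1001 / 2).
echo 'println(sc.parallelize(1 to 1000).reduce(_ + _))' \
  | ./bin/spark-shell --master yarn-client
```

If the sum prints, the driver reached the YARN ResourceManager and the executors are running.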
