Wei Shung Chung

Wei Shung Chung – Hadoop, HBase, MapReduce, Spark, Spark ML, Machine Learning, Deep Learning


Running Spark Job in AWS Data Pipeline

Posted on June 27, 2017 by BigData Explorer

To run a Spark job in AWS Data Pipeline, add an EmrActivity and use command-runner.jar to submit the job with spark-submit.

In the Step field of the EmrActivity node, enter the command as a comma-separated list of arguments:

command-runner.jar,spark-submit,--master,yarn,--deploy-mode,cluster,--class,com.yourcompany.yourpackage.YourClass,s3://PATH_TO_YOUR_JAR,YOUR_PROGRAM_ARGUMENT_1,YOUR_PROGRAM_ARGUMENT_2,YOUR_PROGRAM_ARGUMENT_3
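
Typing a long comma-separated step by hand is error-prone, so it can help to generate it. The following is a minimal Python sketch, not part of the original setup: the helper name build_spark_step, the activity id SparkActivity, and the cluster reference MyEmrCluster are all hypothetical placeholders. It defaults to master yarn with an explicit deploy mode, since the yarn-cluster master alias has been deprecated since Spark 2.0. It rejects arguments that contain a comma, because a comma would split the argument in the step string. The sketch does not attempt to escape commas.

```python
import json


def build_spark_step(main_class, jar_path, program_args=(),
                     master="yarn", deploy_mode="cluster"):
    """Return a comma-separated EmrActivity step string for spark-submit.

    Hypothetical helper for illustration; it only joins the arguments.
    """
    parts = [
        "command-runner.jar",
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        jar_path,
        *program_args,
    ]
    # The step field is comma-delimited, so a comma inside an argument
    # would be read as an argument separator. Fail early instead.
    for part in parts:
        if "," in part:
            raise ValueError(f"argument contains a comma: {part!r}")
    return ",".join(parts)


step = build_spark_step(
    "com.yourcompany.yourpackage.YourClass",
    "s3://PATH_TO_YOUR_JAR",
    ["YOUR_PROGRAM_ARGUMENT_1", "YOUR_PROGRAM_ARGUMENT_2"],
)

# Assumed shape of an EmrActivity object in a pipeline definition;
# the id, name, and cluster reference are placeholders.
emr_activity = {
    "id": "SparkActivity",
    "name": "SparkActivity",
    "type": "EmrActivity",
    "runsOn": {"ref": "MyEmrCluster"},
    "step": step,
}

print(json.dumps(emr_activity, indent=2))
```

The printed object can be pasted into the objects list of a pipeline definition file, or the step value alone can be copied into the Step field in the console.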

Some useful resources:
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-commandrunner.html


This entry was posted in AWS Data Pipeline, Spark and tagged Spark, Spark AWS Data Pipeline.
