Wei Shung Chung

StringIndexer transform fails when column contains nulls

Posted on August 14, 2017 by BigData Explorer

If you run into a NullPointerException when using StringIndexer in a Spark version earlier than 2.2.0, your input column most likely contains null values. In those versions you have to remove or impute the null values before using StringIndexer. See the ticket below. The good news is that this issue was fixed in Spark 2.2.0.

https://issues.apache.org/jira/browse/SPARK-11569
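
For Spark versions before 2.2.0, the remove-or-impute workaround could look roughly like the sketch below. It is meant for spark-shell or a Scala script. The sample data, the "id" column and the "UNKNOWN" placeholder label are assumptions made for illustration; only the column names originalCode and originalCodeCategory come from this post.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

// Local session for experimenting; in spark-shell this returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("StringIndexerNulls").getOrCreate()
import spark.implicits._

// Hypothetical sample data with one null in the column to be indexed
val df = Seq((1, "A"), (2, "B"), (3, null), (4, "A")).toDF("id", "originalCode")

// Option 1: remove the rows whose originalCode is null
val dropped = df.na.drop(Seq("originalCode"))

// Option 2: impute nulls with a placeholder label (the label itself is an arbitrary choice)
val imputed = df.na.fill("UNKNOWN", Seq("originalCode"))

// With no nulls left in the input column, StringIndexer can fit and transform as usual
val codeIndexer = new StringIndexer().setInputCol("originalCode").setOutputCol("originalCodeCategory")
val indexed = codeIndexer.fit(imputed).transform(imputed)
indexed.show()
```

Dropping loses rows, while imputing keeps them but turns "missing" into an ordinary category, so which option fits depends on the downstream model.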

With the fix, we can specify how StringIndexer should handle null values through the handleInvalid parameter. Three strategies are available:

handleInvalid=error: Throw an exception, as before (this is the default)
handleInvalid=skip: Drop rows that contain null values or unseen labels
handleInvalid=keep: Assign null values and unseen labels an additional index

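import org.apache.spark.ml.feature.StringIndexer

// "keep" gives nulls and unseen labels their own extra index instead of throwing
val codeIndexer = new StringIndexer().setInputCol("originalCode").setOutputCol("originalCodeCategory").setHandleInvalid("keep")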
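
To see how the strategies differ in practice, here is a minimal sketch for Spark 2.2.0 or later that runs "keep" and "skip" over the same data. The sample rows, the "id" column and the indexerWith helper are hypothetical names introduced for this example.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("HandleInvalidDemo").getOrCreate()
import spark.implicits._

// Hypothetical sample data: two distinct labels and one null
val df = Seq((1, "A"), (2, "B"), (3, null), (4, "A")).toDF("id", "originalCode")

// Hypothetical helper: builds an indexer for a given handleInvalid mode
def indexerWith(mode: String): StringIndexer = new StringIndexer()
  .setInputCol("originalCode")
  .setOutputCol("originalCodeCategory")
  .setHandleInvalid(mode)

// "keep": every row survives; the null row lands in an extra bucket after the real labels
val kept = indexerWith("keep").fit(df).transform(df)
kept.show()

// "skip": the row with the null value is filtered out of the result
val skipped = indexerWith("skip").fit(df).transform(df)
skipped.show()
```

Using "error" on the same data should still fail on the null row, which makes it a reasonable choice when nulls indicate an upstream data problem that you want to surface rather than hide.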
