If you run into a NullPointerException when using StringIndexer on a Spark version earlier than 2.2.0, it usually means your input column contains null values. On those versions you have to remove or impute the nulls before applying StringIndexer. The good news is that this issue was fixed in Spark 2.2.0; see the ticket below.
https://issues.apache.org/jira/browse/SPARK-11569
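For versions before 2.2.0, the two workarounds (removing or imputing nulls) could look roughly like the sketch below. The column name originalCode comes from the snippet in this post; the sample data, the "unknown" placeholder, the helper names, and the local SparkSession are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NullWorkaround {
  // Option 1: drop every row whose originalCode is null.
  def dropNullCodes(df: DataFrame): DataFrame =
    df.na.drop(Seq("originalCode"))

  // Option 2: replace nulls with a placeholder label.
  // The "unknown" default is an assumption; pick a value that cannot
  // collide with a real label in your data.
  def imputeNullCodes(df: DataFrame, placeholder: String = "unknown"): DataFrame =
    df.na.fill(placeholder, Seq("originalCode"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-workaround")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with one null in the column to be indexed.
    val df = Seq((1, "A"), (2, null: String), (3, "B"), (4, "A"))
      .toDF("id", "originalCode")

    dropNullCodes(df).show()
    imputeNullCodes(df).show()

    spark.stop()
  }
}
```

Either result can then be passed to StringIndexer without tripping over nulls. Imputing keeps the row count unchanged, which matters if the rows carry other features you still need.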
With the fix, we can specify how StringIndexer should handle null values through the handleInvalid parameter. Three strategies are available:
handleInvalid=error: Throw an exception, as before (this is the default)
handleInvalid=skip: Drop rows that contain null values or unseen labels
handleInvalid=keep: Assign null values and unseen labels an additional index
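import org.apache.spark.ml.feature.StringIndexer

// "keep" gives null values and unseen labels an extra index instead of failing
val codeIndexer = new StringIndexer()
  .setInputCol("originalCode")
  .setOutputCol("originalCodeCategory")
  .setHandleInvalid("keep")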
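To see the effect end to end on Spark 2.2.0 or later, here is a minimal sketch that fits the indexer on a small DataFrame containing a null. The sample data, the indexCodes helper, and the local SparkSession are assumptions for illustration; only the column names and the "keep" setting come from the snippet above.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.{DataFrame, SparkSession}

object HandleInvalidDemo {
  // Fit a StringIndexer with handleInvalid="keep" and apply it to the same data.
  def indexCodes(df: DataFrame): DataFrame = {
    val codeIndexer = new StringIndexer()
      .setInputCol("originalCode")
      .setOutputCol("originalCodeCategory")
      .setHandleInvalid("keep")
    codeIndexer.fit(df).transform(df)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("handle-invalid-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: row 2 has a null in the indexed column.
    val df = Seq((1, "A"), (2, null: String), (3, "B"), (4, "A"))
      .toDF("id", "originalCode")

    indexCodes(df).show()

    spark.stop()
  }
}
```

With "keep", the null row should survive and land in the extra bucket whose index equals the number of distinct labels seen during fitting. Switching the setting to "skip" would drop that row from the output, and "error" would make the transform fail.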