Deprecated functions in org.apache.spark.sql.functions in Spark 2.0

I just moved some of my Spark code from 1.6.0 to 2.2.0 and discovered that some functions in org.apache.spark.sql.functions._ have been replaced/renamed.

To name a few:

1) rowNumber() is replaced by row_number()

import org.apache.spark.sql.functions._
/**
* @group window_funcs
* @deprecated As of 1.6.0, replaced by `row_number`. This will be removed in Spark 2.0.
*/
@deprecated("Use row_number. This will be removed in Spark 2.0.", "1.6.0")
def rowNumber(): Column = row_number()

2) isNaN is replaced by isnan

/**
   * @group normal_funcs
   * @deprecated As of 1.6.0, replaced by `isnan`. This will be removed in Spark 2.0.
   */
  @deprecated("Use isnan. This will be removed in Spark 2.0.", "1.6.0")
  def isNaN(e: Column): Column = isnan(e)

3) inputFileName() is replaced by input_file_name()

/**
   * @group normal_funcs
   * @deprecated As of 1.6.0, replaced by `input_file_name`. This will be removed in Spark 2.0.
   */
  @deprecated("Use input_file_name. This will be removed in Spark 2.0.", "1.6.0")
  def inputFileName(): Column = input_file_name()

To get the full list of replaced/renamed functions, refer to the Spark 1.6 source:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
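
For reference, here is a minimal migration sketch using the new names (the column names and data are made up for illustration; in 1.6.0 the calls noted in the comments would have used rowNumber(), isNaN() and inputFileName()):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, input_file_name, isnan, row_number}

object MigrationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MigrationExample").master("local[*]").getOrCreate()
    import spark.implicits._

    // Toy DataFrame; any DataFrame works the same way
    val df = Seq(("a", 1.0), ("a", Double.NaN), ("b", 2.0)).toDF("key", "value")

    val windowSpec = Window.partitionBy("key").orderBy("value")
    df.withColumn("rank", row_number().over(windowSpec))  // was rowNumber().over(windowSpec)
      .withColumn("is_nan", isnan(col("value")))          // was isNaN(col("value"))
      .withColumn("source_file", input_file_name())       // was inputFileName(); empty for in-memory data
      .show(false)

    spark.stop()
  }
}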

Scala Enumeration

In Java, we use an enum to represent a fixed set of constants.

For example, we would define a days-of-the-week enum type as follows:

public enum Day {
    SUNDAY, MONDAY, TUESDAY, WEDNESDAY,
    THURSDAY, FRIDAY, SATURDAY 
}

In Scala, we can do the same thing by extending Enumeration, for example:

object Day extends Enumeration {
  type Day = Value
  val SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY = Value
}
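
The values can then be iterated, compared, and looked up by name. A small usage sketch, assuming the Day object defined above (the helper is just for illustration):

object DayExample {
  import Day._

  // Example helper using the enumeration's type alias
  def isWeekend(day: Day): Boolean = day == SATURDAY || day == SUNDAY

  def main(args: Array[String]): Unit = {
    // Iterate over all values in declaration order
    Day.values.foreach(d => println(s"$d weekend=${isWeekend(d)}"))
    // Look a value up by its name; id is the declaration index (SUNDAY is 0)
    println(Day.withName("MONDAY").id)
  }
}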

You can find examples of Scala Enumeration usage in Spark:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TaskState.scala

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/InputMetrics.scala

Render JSON using Jackson in Scala

If you use the Jackson JSON library in Scala, remember to register DefaultScalaModule so that ObjectMapper can serialize Scala collections such as List and Array to JSON correctly. See below.

val objectMapper = new ObjectMapper()
objectMapper.registerModule(DefaultScalaModule)

Simple example:

import com.fasterxml.jackson.annotation.JsonAutoDetect.Visibility
import com.fasterxml.jackson.annotation.{JsonProperty, PropertyAccessor}
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule



object JsonExample {
  case class Car(@JsonProperty("id")  id: Long)
  case class Person(@JsonProperty("name") name: String = null,
                    @JsonProperty("cars") cars: Seq[Car] = null)

  def main(args: Array[String]): Unit = {
    val car1 = Car(12345)
    val car2 = Car(12346)
    val carsOwned = List(car1, car2)
    val person = Person(name="wei", cars=carsOwned)

    val objectMapper = new ObjectMapper()
    objectMapper.registerModule(DefaultScalaModule)
    objectMapper.setVisibility(PropertyAccessor.ALL, Visibility.NONE)
    objectMapper.setVisibility(PropertyAccessor.FIELD, Visibility.ANY)
    println(s"person: ${objectMapper.writeValueAsString(person)}")
  }
}

Output:
person: {"name":"wei","cars":[{"id":12345},{"id":12346}]}
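
The same mapper can also read the JSON back into the case classes. A minimal sketch, assuming the ObjectMapper configured in the example above (e.g., appended at the end of main):

    // Deserialize the JSON string back into a Person; DefaultScalaModule handles the Seq[Car] field
    val json = """{"name":"wei","cars":[{"id":12345},{"id":12346}]}"""
    val parsed = objectMapper.readValue(json, classOf[Person])
    println(s"parsed: $parsed")  // prints the reconstructed Person with its two Cars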