
Answer by WeiChing 林煒清 for How can I change column types in Spark SQL's DataFrame?

First, if you want to cast a type, do this:

import org.apache.spark.sql
// $"..." needs the implicits in scope, e.g. import spark.implicits._

df.withColumn("year", $"year".cast(sql.types.IntegerType))

Since the new column has the same name, it replaces the old one; you don't need separate add-and-drop steps.
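
As a minimal, self-contained illustration (the toy data and the Spark 2.x SparkSession setup are my assumptions, not from the question):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._

// Assumed toy data: both columns start out as strings.
val df = Seq(("2012", "Tesla"), ("1997", "Ford")).toDF("year", "make")
df.printSchema()  // year: string, make: string

// Same column name, so "year" is replaced rather than duplicated.
val casted = df.withColumn("year", $"year".cast(IntegerType))
casted.printSchema()  // year: integer, make: string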

Second, about Scala vs R.
This is the closest to R's style that I can come up with:

import org.apache.spark.sql.functions
import org.apache.spark.sql.types.IntegerType

val df2 = df.select(
  df.columns.map {
    case year @ "year" => df(year).cast(IntegerType).as(year)
    case make @ "make" => functions.upper(df(make)).as(make)
    case other         => df(other)
  }: _*
)

The code is a little longer than R's, but that has nothing to do with the verbosity of the language. In R, mutate is a special-purpose function for R data frames, while in Scala you can easily write an ad-hoc equivalent thanks to the language's expressive power.
In other words, Scala avoids one-off solutions because the language design is good enough for you to quickly and easily build your own domain language, as sketched below.
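
To make that concrete, here is a sketch of an ad-hoc mutate-style helper (the name mutate and its signature are my invention, not part of Spark's API):

import org.apache.spark.sql.{Column, DataFrame, functions}
import org.apache.spark.sql.types.IntegerType

// Apply per-column transformations by name; untouched columns pass through.
def mutate(df: DataFrame)(changes: (String, Column => Column)*): DataFrame = {
  val byName = changes.toMap
  df.select(df.columns.map { name =>
    byName.get(name).map(f => f(df(name)).as(name)).getOrElse(df(name))
  }: _*)
}

// Usage, mirroring the select above:
val df3 = mutate(df)(
  "year" -> (_.cast(IntegerType)),
  "make" -> (functions.upper(_))
)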


Side note: df.columns is, surprisingly, an Array[String] rather than an Array[Column]; maybe they wanted it to look like Python pandas's DataFrame.
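
When an API does want Column objects, the names convert back easily (a trivial sketch):

// Map column names back to Column objects.
val cols: Array[org.apache.spark.sql.Column] = df.columns.map(df.apply)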

