As the cast operation is available for Spark Columns (and as I personally do not favour UDFs, as proposed by @Svend, at this point), how about:
import org.apache.spark.sql.types.IntegerType

df.select( df("year").cast(IntegerType).as("year"), ... )
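to cast to the requested type? As a neat side effect, values that cannot be cast ("converted") in that sense will become null. This holds when ANSI mode is off (spark.sql.ansi.enabled=false, the default before Spark 4.0); with ANSI mode on, an invalid cast raises an error at runtime instead.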
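A minimal sketch of the null-on-failure behaviour, assuming a local Spark session with ANSI mode switched off explicitly (so the result does not depend on the Spark version's default). The object name CastNullDemo and the sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

object CastNullDemo {
  // Local session; ANSI mode is disabled so failed casts yield null
  // instead of raising an error.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("cast-null-demo")
    .config("spark.sql.ansi.enabled", "false")
    .getOrCreate()

  import spark.implicits._

  // "year" arrives as strings; "20x5" is not a valid integer.
  val df = Seq("2015", "20x5", "1999").toDF("year")

  // Same pattern as in the answer: cast, then keep the original column name.
  val casted = df.select(df("year").cast(IntegerType).as("year"))

  // Collect into Option[Int] so that null values show up as None.
  val years: Seq[Option[Int]] =
    casted.collect().toSeq.map(r => if (r.isNullAt(0)) None else Some(r.getInt(0)))

  def main(args: Array[String]): Unit = {
    years.foreach(println)
    spark.stop()
  }
}
```

The unparsable "20x5" row is kept rather than dropped; only its value turns into null, so a later filter on isNull can find such rows.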
In case you need this as a helper method, use:
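import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType

object DFHelper {
  // Replace column `cn` with itself cast to `tpe`; all other columns are left untouched.
  def castColumnTo(df: DataFrame, cn: String, tpe: DataType): DataFrame =
    df.withColumn(cn, df(cn).cast(tpe))
}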
which is used like:
import org.apache.spark.sql.types.IntegerType
import DFHelper._

val df2 = castColumnTo(df, "year", IntegerType)
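An end-to-end sketch of the helper in use, assuming a local Spark session. It repeats DFHelper so the block compiles on its own, and adds a hypothetical castColumnsTo (not part of the original answer) that folds castColumnTo over a map of column names to target types. DFHelperDemo and the sample columns are likewise illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType}

object DFHelper {
  // Replace column `cn` with itself cast to `tpe`.
  def castColumnTo(df: DataFrame, cn: String, tpe: DataType): DataFrame =
    df.withColumn(cn, df(cn).cast(tpe))

  // Hypothetical extension: apply castColumnTo once per (column -> type) entry.
  def castColumnsTo(df: DataFrame, types: Map[String, DataType]): DataFrame =
    types.foldLeft(df) { case (acc, (cn, tpe)) => castColumnTo(acc, cn, tpe) }
}

object DFHelperDemo {
  import DFHelper._

  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("df-helper-demo")
    .getOrCreate()

  import spark.implicits._

  // Every column starts out as a string.
  val df = Seq(("2015", "1.5", "a"), ("1999", "2.0", "b")).toDF("year", "price", "tag")

  // Single column, as in the answer.
  val df2 = castColumnTo(df, "year", IntegerType)

  // Several columns at once; "tag" is not listed and stays a string.
  val df3 = castColumnsTo(df, Map("year" -> IntegerType, "price" -> DoubleType))

  def main(args: Array[String]): Unit = {
    df2.printSchema()
    df3.printSchema()
    spark.stop()
  }
}
```

Because withColumn reuses an existing name, the cast column keeps its position in the schema instead of being appended at the end.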