
Answer by msemelman for How can I change column types in Spark SQL's DataFrame?


Edit: Newest newest version

Since Spark 2.x you should use the Dataset API when working in Scala [1]. Check the docs here:

https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html#withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame
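As a rough sketch of what that looks like for the question's use case: calling withColumn with the name of an existing column replaces that column, so the cast can be done in one step. The object name, the helper castYearToInt, and the sample rows below are made up for illustration and are not part of the Spark API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

object CastYearExample {
  // Hypothetical helper: reusing the existing column name in withColumn
  // replaces the "year" column with its integer-cast version.
  def castYearToInt(df: DataFrame): DataFrame =
    df.withColumn("year", col("year").cast(IntegerType))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-year-example")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data; "year" starts out as a string column.
    val df = Seq(("2012", "Tesla", "S"), ("1997", "Ford", "E350"))
      .toDF("year", "make", "model")

    val df2 = castYearToInt(df)
    df2.printSchema() // "year" should now be reported as integer

    spark.stop()
  }
}
```

Values that cannot be parsed as integers become null after the cast (with the default, non-ANSI settings), so it is worth checking the result on real data.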

If you are working with Python it is even easier, but I leave the link here since this is a very highly voted question:

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.withColumn.html

>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name='Alice', age2=4), Row(age=5, name='Bob', age2=7)]

[1] https://spark.apache.org/docs/latest/sql-programming-guide.html:

In the Scala API, DataFrame is simply a type alias of Dataset[Row]. While, in Java API, users need to use Dataset<Row> to represent a DataFrame.

Edit: Newest version

Since Spark 2.x you can use .withColumn. Check the docs here:

https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.sql.Dataset@withColumn(colName:String,col:org.apache.spark.sql.Column):org.apache.spark.sql.DataFrame

Oldest answer

Since Spark version 1.4 you can apply the cast method with DataType on the column:

import org.apache.spark.sql.types.IntegerType

// Cast into a temporary column, drop the original, then rename the temporary column back.
val df2 = df.withColumn("yearTmp", df("year").cast(IntegerType))
    .drop("year")
    .withColumnRenamed("yearTmp", "year")

If you are using SQL expressions you can also do:

val df2 = df.selectExpr("cast(year as int) year", "make", "model", "comment", "blank")
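Note that selectExpr keeps only the columns you list, so every column has to be spelled out. One possible way around that, sketched here with a hypothetical helper castWithSelectExpr (not part of Spark), is to build the expression list from df.columns and cast only the column of interest. The sample data is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectExprCastExample {
  // Hypothetical helper: casts one column through a SQL expression and
  // passes every other column through unchanged, preserving column order.
  def castWithSelectExpr(df: DataFrame, name: String, sqlType: String): DataFrame = {
    val exprs = df.columns.map { c =>
      if (c == name) s"cast(`$c` as $sqlType) as `$c`" else s"`$c`"
    }
    df.selectExpr(exprs: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("selectexpr-cast-example")
      .getOrCreate()
    import spark.implicits._

    // Assumed sample data with a string "year" column.
    val df = Seq(("2012", "Tesla", "S"), ("1997", "Ford", "E350"))
      .toDF("year", "make", "model")

    val df2 = castWithSelectExpr(df, "year", "int")
    df2.printSchema() // "year" should now be reported as integer

    spark.stop()
  }
}
```

The backticks around column names are there so that names containing spaces or other special characters still parse as SQL identifiers.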

For more info check the docs: http://spark.apache.org/docs/1.6.0/api/scala/#org.apache.spark.sql.DataFrame

