Channel: How can I change column types in Spark SQL's DataFrame? - Stack Overflow

How can I change column types in Spark SQL's DataFrame?

Suppose I'm doing something like:

    val df = sqlContext.load("com.databricks.spark.csv", Map("path" -> "cars.csv", "header" -> "true"))
    df.printSchema()

    root
     |-- year: string (nullable = true)
     |-- make: string (nullable = true)
     |-- model: string (nullable = true)
     |-- comment: string (nullable = true)
     |-- blank: string (nullable = true)

    df.show()

    year make  model comment              blank
    2012 Tesla S     No comment
    1997 Ford  E350  Go get one now th...

But I really wanted the year column as Int (and perhaps to transform some other columns as well).

The best I could come up with was:

    df.withColumn("year2", 'year.cast("Int")).select('year2 as 'year, 'make, 'model, 'comment, 'blank)

    org.apache.spark.sql.DataFrame = [year: int, make: string, model: string, comment: string, blank: string]

which is a bit convoluted.

I'm coming from R, and I'm used to being able to write, e.g.

    df2 <- df %>%
       mutate(year = year %>% as.integer,
              make = make %>% toupper)

I'm likely missing something, since there should be a better way to do this in Spark/Scala...
