Explode data along a column

Exploding an array column of length N will replicate the top level record N times. The ith replicated record will contain a struct (not an array) corresponding to the ith element of the exploded array. Exploding will not promote any fields or otherwise change the schema of the data.

sdf_explode(x, column, is_map = FALSE, keep_all = FALSE)

Arguments

x	An object (usually a `spark_tbl`) coercible to a Spark DataFrame.
column	The field to explode
is_map	Logical. The (Scala) `explode` method works for both `array` and `map` column types. If the column to explode is an array, then `is_map=FALSE` will ensure that the exploded output retains the name of the array column. If, however, the column to explode is a map, then the map will have key/value names that will be used if `is_map=TRUE`.
keep_all	Logical. If `FALSE` then records where the exploded value is empty/null will be dropped.

Details

Two types of exploding are possible. The default method calls the Scala `explode` method, which is supported in Spark versions > 1.6. It will, however, drop records where the exploding field is empty/null. Alternatively, `keep_all=TRUE` will use the `explode_outer` Scala method, introduced in Spark 2, so that no records are dropped.
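
The difference between the two modes can be modelled in plain R without a Spark connection. This is only an illustrative sketch: `explode_rows` and the `records` data are hypothetical and not part of sparklyr.nested, and an `NA` stands in for the null that Spark would produce.

```r
# Toy data: one record per id, with a list column of varying length.
records <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
records$vals <- list(c(1, 2, 3), numeric(0), 4)

# Hypothetical helper that mimics the row replication of an explode.
explode_rows <- function(df, keep_all = FALSE) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    elems <- df$vals[[i]]
    if (length(elems) == 0) {
      if (!keep_all) return(NULL)  # like explode: the record is dropped
      elems <- NA_real_            # like explode_outer: kept with a null value
    }
    # a record with N elements becomes N rows, one element per row
    data.frame(id = df$id[i], vals = elems, stringsAsFactors = FALSE)
  })
  do.call(rbind, Filter(Negate(is.null), pieces))
}

exploded       <- explode_rows(records)                   # record "b" is dropped
exploded_outer <- explode_rows(records, keep_all = TRUE)  # record "b" is kept
```

In this model, record "a" is replicated three times in both results, while record "b" (an empty array) appears only when `keep_all = TRUE`.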

Examples

# NOT RUN {
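# connect to Spark and load the packages used below
# (a local master is assumed here; adjust for your cluster)
library(sparklyr)
library(sparklyr.nested)
library(dplyr)
sc <- spark_connect(master = "local")

# first get some nested data
iris2 <- copy_to(sc, iris, name="iris")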
iris_nst <- iris2 %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key="data") %>%
  group_by(Species) %>%
  summarize(data=collect_list(data))

# then explode it
iris_nst %>% sdf_explode(data)
# }
