Exploding an array column of length N replicates the top-level record N times. The i-th replicated record contains a struct (not an array) corresponding to the i-th element of the exploded array. Exploding does not promote any fields or otherwise change the schema of the data.

sdf_explode(x, column, is_map = FALSE, keep_all = FALSE)

Arguments

x

An object (usually a spark_tbl) coercible to a Spark DataFrame.

column

The field to explode

is_map

Logical. The Scala explode method works for both array and map column types. If the column to explode is an array, then is_map=FALSE will ensure that the exploded output retains the name of the array column. If, however, the column to explode is a map, then the map will have key/value names that will be used if is_map=TRUE.

keep_all

Logical. If FALSE then records where the exploded value is empty/null will be dropped.

Details

Two types of exploding are possible. The default method calls the Scala explode method, which is supported in Spark version > 1.6. It will, however, drop records where the exploding field is empty/null. Alternatively, keep_all=TRUE uses the explode_outer Scala method introduced in Spark 2, which does not drop any records.
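The difference between the two methods can be modelled locally without a Spark connection. The sketch below is only an illustration of the semantics on a plain R data.frame with a list column; explode_local is a hypothetical helper written for this example, not part of sparklyr.nested, and it does not reflect how the Spark methods are implemented.

```r
# Local model of explode semantics (assumption: a list column stands in
# for a Spark array column). explode_local is a hypothetical helper.
explode_local <- function(df, column, keep_all = FALSE) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    values <- df[[column]][[i]]
    if (length(values) == 0) {
      # empty/null array: drop the record unless keep_all is TRUE
      if (!keep_all) return(NULL)
      values <- NA
    }
    # replicate the top-level record once per array element
    out <- df[rep(i, length(values)), setdiff(names(df), column), drop = FALSE]
    out[[column]] <- values
    out
  })
  pieces <- Filter(Negate(is.null), pieces)
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

records <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
records$vals <- list(c(1, 2, 3), numeric(0), c(4, 5))

# like explode: record "b" has an empty array and is dropped (5 rows)
exploded <- explode_local(records, "vals")

# like explode_outer: record "b" is kept with a missing value (6 rows)
exploded_outer <- explode_local(records, "vals", keep_all = TRUE)
```

In both results the vals column holds a single element per row rather than an array, while the other top-level fields are repeated unchanged.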

Examples

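# NOT RUN {
library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# assumes a local Spark installation; adjust master for your cluster
sc <- spark_connect(master = "local")

# first get some nested data
iris2 <- copy_to(sc, iris, name="iris")
iris_nst <- iris2 %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key="data") %>%
  group_by(Species) %>%
  summarize(data=collect_list(data))

# then explode it
iris_nst %>% sdf_explode(data)

# keep records whose array is empty/null (uses explode_outer)
iris_nst %>% sdf_explode(data, keep_all=TRUE)
# }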