This function is similar to tidyr::nest, but it does not aggregate over the other columns: the output has the same number of rows (records) as the input. To reduce the number of rows, aggregate the nested elements afterwards with collect_list, a Spark SQL function, as shown in the examples.

sdf_nest(x, ..., .key = "data")

Arguments

x

A Spark dataframe.

...

Columns to nest.

.key

Character. A name for the new column containing the nested fields.

Examples

# NOT RUN {
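# Assumes sparklyr, sparklyr.nested, dplyr, and a local Spark installation
library(sparklyr)
library(sparklyr.nested)
library(dplyr)
sc <- spark_connect(master = "local")

# Produces a dataframe with an array of characteristics nested under
# each unique species identifier
iris2 <- copy_to(sc, iris, name = "iris")
iris2 %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data") %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))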
# }
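
The row-preserving behavior described above can be checked directly. The following is a minimal sketch, not part of the original documentation. It assumes a local Spark installation reachable via spark_connect(master = "local") and that the sparklyr, sparklyr.nested, and dplyr packages are installed. The variable names (nested, collapsed, rows_before, and so on) are illustrative only.

```r
library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# Assumption: a local Spark installation is available
sc <- spark_connect(master = "local")
iris2 <- copy_to(sc, iris, name = "iris", overwrite = TRUE)

# Nest the four measurement columns into a single column named "data"
nested <- iris2 %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data")

# Nesting alone does not aggregate, so the two counts should be equal
rows_before <- sdf_nrow(iris2)
rows_after <- sdf_nrow(nested)

# Inspect the resulting schema to see how the "data" column is typed
sdf_schema(nested)

# Row reduction happens only in the explicit aggregation step
collapsed <- nested %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))
rows_collapsed <- sdf_nrow(collapsed)

spark_disconnect(sc)
```

Comparing rows_before with rows_after confirms that sdf_nest leaves the row count unchanged, while rows_collapsed should equal the number of distinct Species values, since collect_list gathers the nested records within each group.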