sdf_nest.Rd
This function is like tidyr::nest. Calling this function will not aggregate over other columns. Rather, the output has the same number of rows/records as the input. See the examples for how to achieve row reduction by aggregating elements using collect_list, which is a Spark SQL function.
sdf_nest(x, ..., .key = "data")
Argument | Description |
---|---|
x | A Spark dataframe. |
... | Columns to nest. |
.key | Character. A name for the new column containing nested fields. |
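# NOT RUN {
# Assumes `sc` is an existing Spark connection (for example, one created
# with sparklyr::spark_connect()); it is not defined in this example.
library(sparklyr)
library(sparklyr.nested)
library(dplyr)

# produces a dataframe with an array of characteristics nested under
# each unique species identifier
iris2 <- copy_to(sc, iris, name = "iris")
iris2 %>%
  sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key = "data") %>%
  group_by(Species) %>%
  summarize(data = collect_list(data))
# }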
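
For contrast with the row-preserving behaviour of sdf_nest, the following is a minimal local sketch of what tidyr::nest does on an ordinary data frame. It assumes that dplyr and tidyr (version 1.0.0 or later, for the `data = c(...)` syntax) are installed. The variable name `nested_local` is only illustrative, and no Spark connection is involved.

```r
library(dplyr)
library(tidyr)

# tidyr::nest groups by the columns that are NOT nested (here: Species)
# and collapses each group into a single row holding a nested tibble.
nested_local <- iris %>%
  nest(data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width))

# The local result has one row per species, not one row per input record.
n_input  <- nrow(iris)
n_nested <- nrow(nested_local)
print(c(input = n_input, nested = n_nested))
```

Locally the row count drops to the number of distinct species. With sdf_nest the row count is unchanged, and the reduction only happens in the separate group_by and summarize step with collect_list.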