sdf_unnest.Rd
Unnesting is an (optional) explode operation coupled with a nested select to promote the sub-fields of
the exploded top level array/map/struct to the top level. Hence, given a
, an array with fields
a1, a2, a3
, then codesdf_explode(df, a) will produce output with each record replicated
for every element in the a
array and with the fields a1, a2, a3
(but not a
)
at the top level. Similar to tidyr::unnest
.
sdf_unnest(x, column, keep_all = FALSE)
x | An object (usually a |
---|---|
column | The field to explode |
keep_all | Logical. If |
Note that this is a less precise tool than using sdf_explode
and sdf_select
directly because all fields of the exploded array will be kept and promoted. Direct calls to these
methods allows for more targetted use of sdf_select
to promote only those fields that
are wanted to the top level of the data frame.
Additionally, though sdf_select
allows users to reach arbitrarily far into a nested
structure, this function will only reach one layer deep. It may well be that the unnested fields
are themselves nested structures that need to be dealt with accordingly.
Note that map types are supported, but there is no is_map
argument. This is because the
function is doing schema interrogation of the input data anyway to determine whether an explode
operation is required (it is of maps and arrays, but not for bare structs). Given this the result
of the schema interrogation drives the value o is_map
provided to sdf_explode
.
# NOT RUN { # first get some nested data iris2 <- copy_to(sc, iris, name="iris") iris_nst <- iris2 %>% sdf_nest(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, .key="data") %>% group_by(Species) %>% summarize(data=collect_list(data)) # then explode it iris_nst %>% sdf_unnest(data) # }