I'm trying to expand a DataFrame column with a nested struct type (see below) into multiple columns. The struct schema I'm working with looks something like {"foo": 3, "bar": {"baz": 2}}.
Ideally, I'd like to expand the above into two columns ("foo" and "bar.baz"). However, when I try .select("data.*") (where data is the struct column), I only get the columns foo and bar, and bar is still a struct.
Is there a way to expand the struct at both levels?
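For reference, a minimal way to reproduce this might look like the following (the column and field names are just the ones from the example above):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One row whose single column "data" is a struct containing a nested struct
df = spark.createDataFrame(
    [((3, (2,)),)],
    "data struct<foo: bigint, bar: struct<baz: bigint>>",
)
df.select("data.*").printSchema()  # shows foo and bar, with bar still a struct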
You can select data.bar.baz as bar.baz:
df.show()
+-------+
| data|
+-------+
|[3,[2]]|
+-------+
df.printSchema()
root
|-- data: struct (nullable = false)
| |-- foo: long (nullable = true)
| |-- bar: struct (nullable = false)
| | |-- baz: long (nullable = true)
In PySpark:
import pyspark.sql.functions as F
df.select(F.col("data.foo").alias("foo"), F.col("data.bar.baz").alias("bar.baz")).show()
+---+-------+
|foo|bar.baz|
+---+-------+
| 3| 2|
+---+-------+
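If you'd rather not enumerate the nested fields by hand (for deeper or wider structs), a small helper can walk the schema and build the select list for you. This is only a sketch, not part of the original answer; flatten_struct and its arguments are made up for illustration, and array fields inside the struct would need extra handling:
from pyspark.sql.types import StructType
import pyspark.sql.functions as F

def flatten_struct(df, col_name):
    # Build one column expression per leaf field inside the struct column,
    # aliased with its dotted path relative to the struct (e.g. "bar.baz").
    def leaves(schema, prefix=""):
        for field in schema.fields:
            path = prefix + field.name
            if isinstance(field.dataType, StructType):
                yield from leaves(field.dataType, prefix=path + ".")
            else:
                yield path

    struct_type = df.schema[col_name].dataType
    return [F.col(col_name + "." + p).alias(p) for p in leaves(struct_type)]

df.select(flatten_struct(df, "data")).show()
Note that a column whose name literally contains a dot, such as bar.baz, has to be wrapped in backticks when you refer to it later (e.g. df.select("`bar.baz`")), so you may prefer to alias with underscores instead.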