
How to unwrap nested Struct column into multiple columns?

丁阳炎
2023-03-14
Question

I’m trying to expand a DataFrame column with nested struct type (see below)
to multiple columns. The Struct schema I’m working with looks something like
{"foo": 3, "bar": {"baz": 2}}.

Ideally, I’d like to expand the above into two columns ("foo" and
"bar.baz"). However, when I tried using .select("data.*") (where data is
the Struct column), I only get columns foo and bar, where bar is still a
struct.

Is there a way to expand both layers of the Struct?


Answer:

You can select data.bar.baz as bar.baz:

df.show()
+-------+
|   data|
+-------+
|[3,[2]]|
+-------+

df.printSchema()
root
 |-- data: struct (nullable = false)
 |    |-- foo: long (nullable = true)
 |    |-- bar: struct (nullable = false)
 |    |    |-- baz: long (nullable = true)

In PySpark:

import pyspark.sql.functions as F

# Select each leaf field by its full dotted path and alias it to the desired column name.
df.select(
    F.col("data.foo").alias("foo"),
    F.col("data.bar.baz").alias("bar.baz"),
).show()
+---+-------+
|foo|bar.baz|
+---+-------+
|  3|      2|
+---+-------+
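
If the struct has more fields or deeper nesting, listing every path by hand gets tedious. The following is a minimal sketch of one way to generalise the select above, not part of the original answer: the helper names leaf_paths and flatten_struct are hypothetical, and the code assumes that the column is a struct, that no field name itself contains a dot, and that arrays and maps should be left as they are. It walks the dictionary returned by the struct's jsonValue() method and builds one aliased column per leaf field.

```python
def leaf_paths(struct_json, prefix=""):
    """Return the dotted paths of all non-struct leaf fields.

    struct_json is a dict in the shape produced by StructType.jsonValue():
    {"type": "struct", "fields": [{"name": ..., "type": ...}, ...]},
    where a nested struct's "type" is itself such a dict.
    """
    paths = []
    for field in struct_json["fields"]:
        path = f"{prefix}.{field['name']}" if prefix else field["name"]
        field_type = field["type"]
        if isinstance(field_type, dict) and field_type.get("type") == "struct":
            # Nested struct: recurse and keep extending the dotted path.
            paths.extend(leaf_paths(field_type, path))
        else:
            # Primitive, array, or map: treat it as a leaf.
            paths.append(path)
    return paths


def flatten_struct(df, column):
    """Select every leaf of the struct column `column` as a top-level column.

    Hypothetical helper; assumes `column` is a struct column of `df`.
    Other columns of `df` are dropped, as in the select above.
    """
    # Imported lazily so leaf_paths can be used without Spark installed.
    from pyspark.sql import functions as F

    struct_json = df.schema[column].dataType.jsonValue()
    return df.select(
        *[F.col(f"{column}.{p}").alias(p) for p in leaf_paths(struct_json)]
    )
```

With the DataFrame above, flatten_struct(df, "data") should produce the same foo and bar.baz columns. Because the alias bar.baz contains a dot, later references to that column need backticks, for example df.select("`bar.baz`"); aliasing to a name such as bar_baz avoids this.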

