Outputting the differences between two pandas DataFrames side by side - highlighting the differences

连乐

2023-03-14

Question:

I'm trying to highlight exactly what changed between two dataframes.

Suppose I have two Python pandas dataframes:

```
"StudentRoster Jan-1":
id   Name   score   isEnrolled   Comment
111  Jack   2.17    True         He was late to class
112  Nick   1.11    False        Graduated
113  Zoe    4.12    True
```

```
"StudentRoster Jan-2":
id   Name   score   isEnrolled   Comment
111  Jack   2.17    True         He was late to class
112  Nick   1.21    False        Graduated
113  Zoe    4.12    False        On vacation
```

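My goal is to output an HTML table that:

1. Identifies the rows that have changed (the values can be int, float, boolean, or string).
2. Outputs those rows with the unchanged, OLD, and NEW values (ideally as an HTML table), so that the reader can clearly see what changed between the two dataframes: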
```
"StudentRoster Difference Jan-1 - Jan-2":
id   Name   score                 isEnrolled             Comment
112  Nick   was 1.11 | now 1.21   False                  Graduated
113  Zoe    4.12                  was True | now False   was "" | now "On vacation"
```

I suppose I could do a row-by-row and column-by-column comparison, but is there an easier way?

Answer:

The first part is similar to Constantine's answer: you can get a boolean Series showing which rows have changed*:

```
In [21]: ne = (df1 != df2).any(axis=1)

In [22]: ne
Out[22]:
0    False
1     True
2     True
dtype: bool
```
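As a self-contained sketch of this step, the rosters from the question can be rebuilt with Zoe's empty Jan-1 comment stored as `None` (an assumption) and `id` kept as an ordinary column, so the frames have the default 0-based index shown above. The row mask then selects the old and new versions of only the changed rows; the names `row_changed`, `old_rows` and `new_rows` are illustrative.

```python
import pandas as pd

# Jan-1 roster from the question; Zoe's empty comment is assumed to be None.
df1 = pd.DataFrame({
    "id": [111, 112, 113],
    "Name": ["Jack", "Nick", "Zoe"],
    "score": [2.17, 1.11, 4.12],
    "isEnrolled": [True, False, True],
    "Comment": ["He was late to class", "Graduated", None],
})

# Jan-2 roster: same labels, with the three cells from the question changed.
df2 = df1.copy()
df2.loc[1, "score"] = 1.21
df2.loc[2, "isEnrolled"] = False
df2.loc[2, "Comment"] = "On vacation"

# True for each row where at least one cell differs.
# Passing axis by keyword works on both older and newer pandas releases.
row_changed = (df1 != df2).any(axis=1)

# Use the mask to pull the old and new versions of only the changed rows.
old_rows = df1[row_changed]
new_rows = df2[row_changed]
print(old_rows)
print(new_rows)
```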

Then we can see which entries have changed:

```
In [23]: ne_stacked = (df1 != df2).stack()

In [24]: changed = ne_stacked[ne_stacked]

In [25]: changed.index.names = ['id', 'col']

In [26]: changed
Out[26]:
id  col
1   score         True
2   isEnrolled    True
    Comment       True
dtype: bool
```

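Here the first index level is the row label (the frames' default 0-based index, not the `id` column) and the second is the column that changed.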
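Because `changed` is a Series with a two-level index, its index can be turned into plain (row, column) pairs, which is convenient for driving cell-by-cell highlighting. This sketch assumes the same rebuilt rosters as before (default index, `None` for Zoe's empty Jan-1 comment); the level names `row`/`col` and the variable `changed_cells` are illustrative.

```python
import pandas as pd

# Rosters from the question; Zoe's empty Jan-1 comment is assumed to be None.
df1 = pd.DataFrame({
    "id": [111, 112, 113],
    "Name": ["Jack", "Nick", "Zoe"],
    "score": [2.17, 1.11, 4.12],
    "isEnrolled": [True, False, True],
    "Comment": ["He was late to class", "Graduated", None],
})
df2 = df1.copy()
df2.loc[1, "score"] = 1.21
df2.loc[2, "isEnrolled"] = False
df2.loc[2, "Comment"] = "On vacation"

# stack() folds the columns into a second index level: one boolean per cell,
# indexed by (row label, column name).
ne_stacked = (df1 != df2).stack()

# Boolean indexing with itself keeps only the cells that differ.
changed = ne_stacked[ne_stacked]
changed.index.names = ["row", "col"]

# The remaining index entries are (row, column) tuples.
changed_cells = list(changed.index)
for row, col in changed_cells:
    print(f"row {row}: column {col!r} changed")
```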

```
In [27]: difference_locations = np.where(df1 != df2)

In [28]: changed_from = df1.values[difference_locations]

In [29]: changed_to = df2.values[difference_locations]

In [30]: pd.DataFrame({'from': changed_from, 'to': changed_to}, index=changed.index)
Out[30]:
               from           to
id col
1  score       1.11         1.21
2  isEnrolled  True        False
   Comment     None  On vacation
```
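For comparison, pandas 1.1 and later also provide `DataFrame.compare`, which builds a similar old/new view in a single call. The sketch below rebuilds the question's rosters under the same assumptions as the session above (default 0-based index, `None` for Zoe's empty Jan-1 comment). Like `df1 != df2`, it expects identically labelled frames.

```python
import pandas as pd

# Rosters from the question; Zoe's empty Jan-1 comment is assumed to be None.
df1 = pd.DataFrame({
    "id": [111, 112, 113],
    "Name": ["Jack", "Nick", "Zoe"],
    "score": [2.17, 1.11, 4.12],
    "isEnrolled": [True, False, True],
    "Comment": ["He was late to class", "Graduated", None],
})
df2 = df1.copy()
df2.loc[1, "score"] = 1.21
df2.loc[2, "isEnrolled"] = False
df2.loc[2, "Comment"] = "On vacation"

# DataFrame.compare keeps only the rows and columns that contain at least one
# difference. Each remaining column is split into "self" (df1) and "other" (df2);
# cells that are equal in both frames show up as NaN.
diff = df1.compare(df2)
print(diff)
```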

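* Note: it is important that `df1` and `df2` share the same index here; pandas raises a `ValueError` when `!=` is applied to differently labelled DataFrames. To get around this, you can make sure you only compare the labels they share, e.g. with `df1.index.intersection(df2.index)` (older pandas versions also accepted `df1.index & df2.index`), but I'll leave that as an exercise.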
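One way to do that exercise, and to produce the side-by-side layout the question asks for, is sketched below. It assumes hypothetical rosters keyed by `id`, where Jan-2 also lists a new student (114, "Amy") with no Jan-1 row, and stores Zoe's empty Jan-1 comment as an empty string so that it prints as `""`. The helper `show` and the `was X | now Y` text are illustrative choices, not a pandas feature; `DataFrame.to_html` does the HTML rendering.

```python
import pandas as pd

# Hypothetical rosters keyed by id. Jan-2 has an extra student (114) with no
# Jan-1 counterpart, so comparing the frames directly would raise a ValueError.
jan1 = pd.DataFrame(
    {"Name": ["Jack", "Nick", "Zoe"],
     "score": [2.17, 1.11, 4.12],
     "isEnrolled": [True, False, True],
     "Comment": ["He was late to class", "Graduated", ""]},
    index=pd.Index([111, 112, 113], name="id"),
)
jan2 = pd.DataFrame(
    {"Name": ["Jack", "Nick", "Zoe", "Amy"],
     "score": [2.17, 1.21, 4.12, 3.50],
     "isEnrolled": [True, False, False, True],
     "Comment": ["He was late to class", "Graduated", "On vacation", ""]},
    index=pd.Index([111, 112, 113, 114], name="id"),
)


def show(value):
    # Quote strings so that an empty comment is still visible in the report.
    return f'"{value}"' if isinstance(value, str) else str(value)


# Restrict both frames to the row and column labels they share.
rows = jan1.index.intersection(jan2.index)
cols = jan1.columns.intersection(jan2.columns)
old = jan1.loc[rows, cols]
new = jan2.loc[rows, cols]

# Keep only rows with at least one change; object dtype lets any cell hold text.
changed = old != new
report = old[changed.any(axis=1)].astype(object)

# Overwrite each changed cell with "was OLD | now NEW"; unchanged cells keep
# their original value.
for idx in report.index:
    for col in cols:
        if changed.loc[idx, col]:
            report.loc[idx, col] = f"was {show(old.loc[idx, col])} | now {show(new.loc[idx, col])}"

html = report.to_html()
print(html)
```

Student 114 is simply left out because it has no Jan-1 row to compare against; reporting added or removed rows would need a separate check on the labels outside the intersection.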
