当前位置: 首页 > 工具软件 > Sublime Merge > 使用案例 >

merge合并

柴禄
2023-12-01

merge合并

与concat类似,但主要是用于两组有key column的数据,统一索引的数据

创建数据

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                             'A': ['A0', 'A1', 'A2', 'A3'],
                             'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                              'C': ['C0', 'C1', 'C2', 'C3'],
                              'D': ['D0', 'D1', 'D2', 'D3']})

依据一组key合并

res = pd.merge(left,right,on='key')
  • on 依据的key值

Run:

res:
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

创造数据

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'A': ['A0', 'A1', 'A2', 'A3'],
                      'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                       'key2': ['K0', 'K0', 'K0', 'K0'],
                       'C': ['C0', 'C1', 'C2', 'C3'],
                       'D': ['D0', 'D1', 'D2', 'D3']})

依据两组key合并

res = pd.merge(left,right,on=['key1','key2'],how='inner')
print('how=inner\n',res)

res = pd.merge(left,right,on=['key1','key2'],how='outer')
print('how=outer\n',res)

res = pd.merge(left,right,on=['key1','key2'],how='left')
print('how=left\n',res)

res = pd.merge(left,right,on=['key1','key2'],how='right')
print('how=right\n',res)
  • how 合并方式

Run:

how=inner [交集]
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K1 K0 A2 B2 C1 D1
2 K1 K0 A2 B2 C2 D2

how=outer [并集]
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K0 K1 A1 B1 NaN NaN
2 K1 K0 A2 B2 C1 D1
3 K1 K0 A2 B2 C2 D2
4 K2 K1 A3 B3 NaN NaN
5 K2 K0 NaN NaN C3 D3

how=left [左边的key值]
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K0 K1 A1 B1 NaN NaN
2 K1 K0 A2 B2 C1 D1
3 K1 K0 A2 B2 C2 D2
4 K2 K1 A3 B3 NaN NaN

how=right [右边的key值]
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K1 K0 A2 B2 C1 D1
2 K1 K0 A2 B2 C2 D2
3 K2 K0 NaN NaN C3 D3

Indicator

indicator=True会将合并的记录放在新的一列

df1 = pd.DataFrame({'col1':[0,1],'col_left':['a','b']})
df2 = pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})

Run:

df1
col1 col_left
0 0 a
1 1 b

df2
col1 col_right
0 1 2
1 2 2
2 2 2

res = pd.merge(df1,df2,on='col1',how='outer',indicator=True)
print('indicator=True\n',res)

res = pd.merge(df1,df2,on='col1',how='outer',indicator='indicator_column')
print('indicator=indicator_column\n',res)
  • 自定义indicator column的名称
  • Indicator 合并的记录

Run:

indicator=True
col1 col_left col_right _merge
0 0 a NaN left_only
1 1 b 2.0 both
2 2 NaN 2.0 right_only
3 2 NaN 2.0 right_only

indicator=indicator_column
col1 col_left col_right indicator_column
0 0 a NaN left_only
1 1 b 2.0 both
2 2 NaN 2.0 right_only
3 2 NaN 2.0 right_only

创造数据

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                     index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

Run:

left
A B
K0 A0 B0
K1 A1 B1
K2 A2 B2
right
C D
K0 C0 D0
K2 C2 D2
K3 C3 D3

index

res = pd.merge(left,right,left_index=True,right_index=True,how='outer')
print('res\n',res)

res = pd.merge(left,right,left_index=True,right_index=True,how='inner')
print('res\n',res)

Run:

res
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 NaN NaN
K2 A2 B2 C2 D2
K3 NaN NaN C3 D3

res
A B C D
K0 A0 B0 C0 D0
K2 A2 B2 C2 D2

创造数据

left = pd.DataFrame({'k': ['k0', 'k1', 'k2'], 'age': [1, 2, 3]})
right = pd.DataFrame({'k': ['k0', 'k0', 'k3'], 'age': [4, 5, 6]})

Run:

left
k age
0 k0 1
1 k1 2
2 k2 3
right
k age
0 k0 4
1 k0 5
2 k3 6

解决overlapping

res = pd.merge(boys, girls, on='k', suffixes=['_boy', '_girl'], how='inner')

Run:

k  age_boy  age_girl

0 k0 1 4
1 k0 1 5

 类似资料: