Evaluation metrics for multi-class tasks generally come in two flavors: micro and macro. Micro computes the metric over all samples pooled together, while macro averages the per-class metrics.
Take the F1 score as an example. For binary classification, F1 is defined as:
F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}
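As a quick sanity check on this formula, here is a minimal sketch; the tp, fp, fn counts are made up for illustration, not from any dataset:

```python
# Binary F1 from raw counts (illustrative numbers)
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)          # 8 / 10 = 0.8
recall = tp / (tp + fn)             # 8 / 12 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)
print(f1)                           # ≈ 0.727, same as 2*tp / (2*tp + fp + fn)
```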
For multi-class tasks, the F1 score splits into two variants: micro-F1 and macro-F1.
micro-F1
\text{Micro-}F1 = \frac{2P \times R}{P + R}

P = \frac{\sum_{t \in \mathcal{S}} TP_t}{\sum_{t \in \mathcal{S}} (TP_t + FP_t)}, \quad R = \frac{\sum_{t \in \mathcal{S}} TP_t}{\sum_{t \in \mathcal{S}} (TP_t + FN_t)}, \quad t \text{ denotes the class}
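The pooling above can be sketched directly from per-class TP/FP/FN counts. The three-class counts below are illustrative (they happen to match the worked example later in this post):

```python
import numpy as np

# Illustrative per-class counts for classes t = 0, 1, 2
TP = np.array([2, 0, 0])
FP = np.array([1, 2, 1])
FN = np.array([0, 2, 2])

# Pool the counts over all classes, then compute P and R once
P = TP.sum() / (TP + FP).sum()      # 2 / 6
R = TP.sum() / (TP + FN).sum()      # 2 / 6
micro_f1 = 2 * P * R / (P + R)
print(micro_f1)                     # ≈ 0.3333
```

Because micro-F1 pools all decisions, its P and R (and hence F1) all equal overall accuracy in the single-label multi-class case.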
macro-F1
\text{Macro-}F1 = \frac{1}{|\mathcal{S}|} \sum_{t \in \mathcal{S}} \frac{2 P_t \times R_t}{P_t + R_t}

P_t = \frac{TP_t}{TP_t + FP_t}, \quad R_t = \frac{TP_t}{TP_t + FN_t}
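The macro average can likewise be sketched by hand: per-class F1 first, then a plain mean. The counts are again illustrative:

```python
import numpy as np

# Illustrative per-class counts for classes t = 0, 1, 2
TP = np.array([2, 0, 0])
FP = np.array([1, 2, 1])
FN = np.array([0, 2, 2])

# Per-class precision, recall, F1; np.errstate silences the
# 0/0 warnings that np.where's eagerly evaluated branch triggers
with np.errstate(divide="ignore", invalid="ignore"):
    P_t = np.where(TP + FP > 0, TP / (TP + FP), 0.0)
    R_t = np.where(TP + FN > 0, TP / (TP + FN), 0.0)
    F1_t = np.where(P_t + R_t > 0, 2 * P_t * R_t / (P_t + R_t), 0.0)

macro_f1 = F1_t.mean()              # unweighted mean of per-class F1
print(macro_f1)                     # ≈ 0.2667
```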
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

print('macro:')
print('precision: {}'.format(precision_score(y_true, y_pred, average='macro')))
print('recall: {}'.format(recall_score(y_true, y_pred, average='macro')))
print('f1_score: {}'.format(f1_score(y_true, y_pred, average='macro')))
print('')
print('micro:')
print('precision: {}'.format(precision_score(y_true, y_pred, average='micro')))
print('recall: {}'.format(recall_score(y_true, y_pred, average='micro')))
print('f1_score: {}'.format(f1_score(y_true, y_pred, average='micro')))
print('')
print('per-class metrics:')
for i in range(3):
    print('label: {}'.format(i))
    # binarize: class i vs. the rest
    y_true_new = (y_true == i).astype(int)
    y_pred_new = (y_pred == i).astype(int)
    print('precision: {}'.format(precision_score(y_true_new, y_pred_new, average='binary')))
    print('recall: {}'.format(recall_score(y_true_new, y_pred_new, average='binary')))
    print('f1_score: {}'.format(f1_score(y_true_new, y_pred_new, average='binary')))
    print('')

Output:

macro:
precision: 0.2222222222222222
recall: 0.3333333333333333
f1_score: 0.26666666666666666

micro:
precision: 0.3333333333333333
recall: 0.3333333333333333
f1_score: 0.3333333333333333

per-class metrics:
label: 0
precision: 0.6666666666666666
recall: 1.0
f1_score: 0.8
label: 1
precision: 0.0
recall: 0.0
f1_score: 0.0
label: 2
precision: 0.0
recall: 0.0
f1_score: 0.0
Verification:
P = \frac{2+0+0}{3+2+1} = \frac{1}{3}, \quad R = \frac{2+0+0}{2+2+2} = \frac{1}{3}

\text{Micro-}F1 = \frac{2P \times R}{P + R} = \frac{1}{3}
\text{Macro-}F1 = \frac{1}{3}(0.8 + 0.0 + 0.0) = \frac{4}{15} \approx 0.267
My own thoughts: I looked through several references and found no detailed guidance on when to prefer macro over micro averaging. My take is that macro pays more attention to the class distribution than micro does, since every class contributes equally to the average, so it is better suited to class-imbalanced settings. With imbalanced classes, sklearn's average='weighted' option, which weights each class's metric by its support, can be an even better fit.
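For the weighted option, a minimal sketch on the same toy labels as above (assuming scikit-learn is installed; 'weighted' multiplies each class's F1 by its support, the number of true samples of that class):

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 2, 1, 0, 0, 1])

# 'weighted' averages per-class F1, weighting each class by its support
weighted_f1 = f1_score(y_true, y_pred, average='weighted')
print(weighted_f1)

# By hand: per-class F1 scores are 0.8, 0.0, 0.0 and every class has
# support 2, so here weighted coincides with macro:
# (0.8*2 + 0.0*2 + 0.0*2) / 6 ≈ 0.2667
```

With a skewed class distribution the two diverge: weighted pulls toward the majority classes, macro does not.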