Abstract: My understanding of the paper. Original: Zhang, M.-L., & Wu, L. (2015). LIFT: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 107–120.
Feature extraction tailored to each label.
Existing approaches learn from multi-label data by manipulating with identical feature set, i.e. the very instance representation of each example is employed in the discrimination processes of all class labels. However, this popular strategy might be suboptimal as each label is supposed to possess specific characteristics of its own.
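Paraphrase: existing methods use the full set of features to predict every label, but each label should have its own feature set.
See also: Paper notes: BP-MLL.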
Label-specific features construction and classification models induction.
For each label $l_k$, build the set of positive instances

$$\mathcal{P}_k = \{\mathbf{x}_i \mid (\mathbf{x}_i, Y_i) \in \mathcal{D}, l_k \in Y_i\}, \tag{1}$$
and the set of negative instances

$$\mathcal{N}_k = \{\mathbf{x}_i \mid (\mathbf{x}_i, Y_i) \in \mathcal{D}, l_k \notin Y_i\}. \tag{2}$$
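The two definitions above amount to a simple partition of the training set per label. A minimal sketch in plain Python follows; the function name `split_by_label` and the toy data are illustrative assumptions, not taken from the paper.

```python
def split_by_label(dataset, label):
    """Return (P_k, N_k): instances whose label set contains / lacks `label`."""
    positives = [x for x, labels in dataset if label in labels]
    negatives = [x for x, labels in dataset if label not in labels]
    return positives, negatives


# Toy data set D of pairs (x_i, Y_i); the values are made up for illustration.
dataset = [
    ((0.0, 0.1), {"a"}),
    ((0.2, 0.0), {"a", "b"}),
    ((1.0, 1.1), {"b"}),
    ((0.9, 1.0), set()),
]

pos_a, neg_a = split_by_label(dataset, "a")
```

Every instance lands in exactly one of the two sets, so `len(pos_a) + len(neg_a)` equals the size of the data set.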
Cluster each of the two sets separately with the $k$-means algorithm. The resulting sets of cluster centers are, respectively,

$$\{\mathbf{p}_1^k, \mathbf{p}_2^k, \dots, \mathbf{p}_{m_k^+}^k\}, \tag{3}$$

$$\{\mathbf{n}_1^k, \mathbf{n}_2^k, \dots, \mathbf{n}_{m_k^-}^k\}. \tag{4}$$
To keep the two sides balanced, set

$$m_k^+ = m_k^- = \lceil r \cdot \min\{\vert \mathcal{P}_k \vert, \vert \mathcal{N}_k \vert\}\rceil, \tag{5}$$

where $r \in [0, 1]$ is a ratio parameter. In other words, the positive and negative sets are split into the same number of clusters.
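The clustering step can be sketched as follows. This is only an illustration under stated assumptions: `num_clusters` and `kmeans` are hypothetical helper names, the k-means here is a bare-bones Lloyd's iteration with a fixed number of rounds (any k-means implementation would do), and the sample points are invented.

```python
import math
import random


def num_clusters(n_pos, n_neg, r):
    """Cluster count shared by both sets: ceil(r * min(|P_k|, |N_k|))."""
    return math.ceil(r * min(n_pos, n_neg))


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Bare-bones Lloyd's k-means over tuples; returns k cluster centers.

    Assumes 1 <= k <= len(points).
    """
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
    return centers


# Invented example points for one label.
positives = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 1.0)]
negatives = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]

m = num_clusters(len(positives), len(negatives), r=0.5)  # ceil(0.5 * 3) = 2
pos_centers = kmeans(positives, m)
neg_centers = kmeans(negatives, m)
```

Because the count is driven by the smaller of the two sets, a rare label with few positive instances still gets as many positive centers as negative ones.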
Map each instance from the original feature space to a new, label-specific space:

$$\phi_k(\mathbf{x}) = [d(\mathbf{x}, \mathbf{p}_1^k), \cdots, d(\mathbf{x}, \mathbf{p}_{m_k}^k), d(\mathbf{x}, \mathbf{n}_1^k), \cdots, d(\mathbf{x}, \mathbf{n}_{m_k}^k)], \tag{6}$$

where $d(\cdot, \cdot)$ is the distance between two vectors and $m_k = m_k^+ = m_k^-$ is the shared number of clusters from (5).
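This mapping is the core of the method!
Build one binary classifier for each label.
For every new instance and every label, apply the corresponding feature mapping, then classify.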
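The mapping in (6) is just a vector of distances to the cluster centers, positive centers first. A minimal sketch, assuming Euclidean distance for $d$ and using the hypothetical name `lift_mapping`:

```python
import math


def lift_mapping(x, pos_centers, neg_centers):
    """Label-specific features: distances from x to every positive
    center, followed by distances to every negative center."""
    centers = list(pos_centers) + list(neg_centers)
    return [math.dist(x, c) for c in centers]


# One positive and one negative center, chosen so the distances are easy to check.
features = lift_mapping((0.0, 0.0), [(3.0, 4.0)], [(0.0, 1.0)])
```

The new representation has $2 m_k$ dimensions regardless of the dimensionality of the original feature space.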
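Putting the steps together, training and prediction might look like the sketch below. Several things here are assumptions made for illustration: all function names are hypothetical, the k-means is a bare-bones Lloyd's iteration, Euclidean distance is used for $d$, and the per-label binary classifier is a simple nearest-class-mean rule in the mapped space, chosen only to keep the example dependency-free. Any binary learner can take its place. The sketch also assumes every label has at least one positive and one negative training instance.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Bare-bones Lloyd's k-means over tuples; returns k cluster centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers


def phi(x, centers):
    """Label-specific mapping: distances from x to each stored center."""
    return [math.dist(x, c) for c in centers]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train_lift(dataset, labels, r=0.5):
    """For each label: split, cluster, map, and fit a stand-in classifier."""
    models = {}
    for label in labels:
        pos = [x for x, ys in dataset if label in ys]
        neg = [x for x, ys in dataset if label not in ys]
        m = math.ceil(r * min(len(pos), len(neg)))
        centers = kmeans(pos, m) + kmeans(neg, m)
        # Stand-in classifier: remember the mean mapped vector of each class.
        pos_mean = mean([phi(x, centers) for x in pos])
        neg_mean = mean([phi(x, centers) for x in neg])
        models[label] = (centers, pos_mean, neg_mean)
    return models


def predict_lift(models, x):
    """Map x with each label's own centers, then classify label by label."""
    predicted = set()
    for label, (centers, pos_mean, neg_mean) in models.items():
        z = phi(x, centers)
        if math.dist(z, pos_mean) < math.dist(z, neg_mean):
            predicted.add(label)
    return predicted


# Invented toy data: label "a" lives near the origin.
dataset = [
    ((0.0, 0.0), {"a"}),
    ((0.1, 0.2), {"a"}),
    ((0.2, 0.1), {"a", "b"}),
    ((1.0, 1.0), {"b"}),
    ((1.1, 0.9), {"b"}),
    ((0.9, 1.1), set()),
]
models = train_lift(dataset, ["a", "b"])
near_origin = predict_lift(models, (0.05, 0.05))
near_one = predict_lift(models, (1.0, 1.05))
```

Note that each label keeps its own centers, so the same test instance is represented differently for each label before it reaches that label's classifier.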