Paper notes: the LIFT algorithm for multi-label learning

阎辰钊

2023-12-01

Abstract: Notes on my understanding of the paper. Original paper: Zhang, M.-L., & Wu, L. (2015). LIFT: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 107–120.

1. Contributions

Feature extraction tailored to each individual label (label-specific features).

2. Main idea

Existing approaches learn from multi-label data by manipulating with identical feature set, i.e. the very instance representation of each example is employed in the discrimination processes of all class labels. However, this popular strategy might be suboptimal as each label is supposed to possess specific characteristics of its own.
In other words: existing methods use the same full feature set to predict every label, but each label should have a feature set of its own.

3. Notation

See the paper notes on BP-MLL.

4. Core of the algorithm

Label-specific features construction and classification models induction.

4.1 Label-specific feature construction

For each label $l_k$, build the set of positive instances
$\mathcal{P}_k = \{\mathbf{x}_i \mid (\mathbf{x}_i, Y_i) \in \mathcal{D}, l_k \in Y_i\}, \tag{1}$
and the set of negative instances
$\mathcal{N}_k = \{\mathbf{x}_i \mid (\mathbf{x}_i, Y_i) \in \mathcal{D}, l_k \not\in Y_i\}. \tag{2}$
Each of the two sets is clustered separately with the $k$-means algorithm, giving the following sets of cluster centers, respectively:
$\{\mathbf{p}_1^k, \mathbf{p}_2^k, \dots, \mathbf{p}_{m_k^+}^k\}, \tag{3}$
$\{\mathbf{n}_1^k, \mathbf{n}_2^k, \dots, \mathbf{n}_{m_k^-}^k\}. \tag{4}$
To keep the two sides balanced, set
$m_k^+ = m_k^- = \lceil r \cdot \min\{\vert \mathcal{P}_k \vert, \vert \mathcal{N}_k \vert\}\rceil, \tag{5}$
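so the positive and negative sets get the same number of clusters. Here $r \in [0, 1]$ is a ratio parameter that controls how many clusters are kept.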
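The split in (1) and (2) and the cluster count in (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `split_by_label` and `cluster_count` are hypothetical, the toy data is made up, and the clustering step itself is left to whatever $k$-means implementation is at hand.

```python
import math

def split_by_label(dataset, label):
    """Return (positives, negatives) for one label, following Eqs. (1) and (2)."""
    positives = [x for x, labels in dataset if label in labels]
    negatives = [x for x, labels in dataset if label not in labels]
    return positives, negatives

def cluster_count(r, positives, negatives):
    """Shared number of clusters for both sides, following Eq. (5)."""
    return math.ceil(r * min(len(positives), len(negatives)))

# Toy data (made up for illustration): (feature vector, label set) pairs.
dataset = [
    ((0.0, 0.0), {"a"}),
    ((0.1, 0.2), {"a", "b"}),
    ((1.0, 1.0), {"b"}),
    ((0.9, 1.1), {"b"}),
    ((0.2, 0.1), {"a"}),
]

pos_a, neg_a = split_by_label(dataset, "a")
m_a = cluster_count(0.5, pos_a, neg_a)  # ratio r = 0.5 is an arbitrary choice here
```

With this toy data, label "a" has three positive and two negative instances, so both sides would be clustered into `m_a` clusters, and the count is driven by the smaller of the two sets.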

Each instance is then mapped from the original feature space to a new space:
$\phi_k(\mathbf{x}) = [d(\mathbf{x}, \mathbf{p}_1^k), \cdots, d(\mathbf{x}, \mathbf{p}_{m_k}^k), d(\mathbf{x}, \mathbf{n}_1^k), \cdots, d(\mathbf{x}, \mathbf{n}_{m_k}^k)]. \tag{6}$
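Here $m_k$ denotes the common cluster count $m_k^+ = m_k^-$ from (5), so the new representation has $2 m_k$ dimensions. This mapping is the core of the method!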
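The mapping in (6) is easy to sketch once the cluster centers are available. The example below assumes that $d$ is the Euclidean distance (these notes do not state the metric), and the function name `lift_features` and the centers are hypothetical values chosen for illustration.

```python
import math

def lift_features(x, pos_centers, neg_centers):
    """Map x to its distances to the positive centers, then the negative centers (Eq. (6)).

    Assumes d is the Euclidean distance.
    """
    return [math.dist(x, c) for c in pos_centers] + [math.dist(x, c) for c in neg_centers]

# Hypothetical cluster centers for one label, two per side.
pos_centers = [(0.0, 0.0), (3.0, 4.0)]
neg_centers = [(6.0, 8.0), (0.0, 1.0)]

phi = lift_features((0.0, 0.0), pos_centers, neg_centers)
```

The output has one entry per center, so its length is twice the per-side cluster count regardless of the dimension of the original space.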

4.2 Classification model induction

A binary classifier is built for each label.
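One way to read this step: for each label, every training instance is mapped into that label's feature space and paired with a binary target, and any binary learner can then be fit on the result. The sketch below only builds that training set. The helper name `binary_training_set`, the +1/-1 target encoding, the Euclidean metric, and the centers are all assumptions made for illustration.

```python
import math

def binary_training_set(dataset, label, phi):
    """Pair each mapped instance with +1 if it carries the label, else -1."""
    return [(phi(x), 1 if label in labels else -1) for x, labels in dataset]

# Hypothetical cluster centers for label "a", one per side.
pos_centers = [(0.0, 0.0)]
neg_centers = [(3.0, 4.0)]

def phi_a(x):
    """Label-specific mapping for label "a" (Euclidean distance assumed)."""
    return [math.dist(x, c) for c in pos_centers + neg_centers]

# Toy data (made up for illustration): (feature vector, label set) pairs.
dataset = [
    ((0.0, 0.0), {"a"}),
    ((3.0, 4.0), {"b"}),
    ((0.0, 4.0), {"a", "b"}),
]

train_a = binary_training_set(dataset, "a", phi_a)
```

Each label gets its own training set because the mapping differs per label, even though the underlying instances are the same.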

4.3 Advantages

Flexible
Easy to implement
Performs well

4.4 How to classify

For each new instance and each label, apply that label's feature mapping, then classify with that label's classifier.
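The prediction step can be sketched as a loop over labels. In this minimal example, `predict_label_set` is a hypothetical helper, each classifier is assumed to return a real-valued score with a positive score meaning "label present", and the toy classifier (distance to the negative center minus distance to the positive center) is only a stand-in for a trained model.

```python
import math

def predict_label_set(x, mappings, classifiers):
    """Return the labels whose classifier scores the mapped instance above zero.

    mappings:    dict label -> function mapping x to label-specific features
    classifiers: dict label -> function returning a real-valued score
    """
    return {
        label
        for label, phi in mappings.items()
        if classifiers[label](phi(x)) > 0
    }

def make_mapping(pos_center, neg_center):
    """Build a mapping with one center per side (Euclidean distance assumed)."""
    return lambda x: [math.dist(x, pos_center), math.dist(x, neg_center)]

# Hypothetical centers: label "a" is centered near the origin, label "b" near (1, 1).
mappings = {
    "a": make_mapping((0.0, 0.0), (1.0, 1.0)),
    "b": make_mapping((1.0, 1.0), (0.0, 0.0)),
}

# Stand-in scorer, not a trained model: closer to the positive center gives a higher score.
def toy_score(features):
    return features[1] - features[0]

classifiers = {"a": toy_score, "b": toy_score}

predicted = predict_label_set((0.0, 0.0), mappings, classifiers)
```

The point of the sketch is that the same instance is represented differently for each label before that label's classifier sees it.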

5. Summary

Embedding instances into a new feature space is a common trick.