Paper Reading Notes: A Survey on Graph Neural Networks and Graph Transformers in Computer Vision (GNN Survey)

郁灿
2023-12-01


The survey mainly introduces GNNs and their applications across various areas of computer vision.

2D NATURAL IMAGES

Image Classification

Multi-Label Classification

ML-GCN: builds a directed graph over the label space, where each node stands for an object label (represented by its word embedding) and the edges model the inter-dependencies between labels.

Attention-driven GCN: models label dependencies via a more elaborate GNN architecture.

Hypergraph neural networks: model label dependencies via more elaborate GNN architectures.
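
To make the ML-GCN idea concrete, here is a minimal sketch under our own assumptions. The label graph is built from co-occurrence statistics as a conditional probability P(label j | label i), thresholded at a hypothetical tau = 0.4. This is our reading of ML-GCN; its re-weighting step is omitted. Two GCN layers then map label word embeddings to one classifier vector per label, and image scores are dot products with a pooled image feature. Embedding sizes, weights, and the toy labels are all illustrative.

```python
import numpy as np


def cooccurrence_adjacency(labels: np.ndarray, tau: float = 0.4) -> np.ndarray:
    """Directed label graph from a multi-hot label matrix (num_images x num_labels).

    adj[i, j] = 1 when the estimated P(label j | label i) >= tau.
    """
    counts = labels.T @ labels                    # counts[i, j]: images containing both i and j
    occurrences = np.diag(counts).astype(float)   # occurrences[i]: images containing label i
    cond_prob = counts / np.maximum(occurrences[:, None], 1.0)
    adj = (cond_prob >= tau).astype(float)
    np.fill_diagonal(adj, 0.0)                    # self-loops are added during normalization
    return adj


def normalize(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with D taken from row sums."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def gcn_layer(a_norm, h, w, activate=True):
    """One graph convolution: propagate over the label graph, then transform."""
    out = a_norm @ h @ w
    return np.where(out > 0, out, 0.2 * out) if activate else out  # LeakyReLU


rng = np.random.default_rng(0)
labels = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]])  # 4 images, 3 labels
adj = cooccurrence_adjacency(labels)
a_norm = normalize(adj)

word_emb = rng.normal(size=(3, 300))          # hypothetical word embedding per label
w1, w2 = rng.normal(size=(300, 64)), rng.normal(size=(64, 128))
classifiers = gcn_layer(a_norm, gcn_layer(a_norm, word_emb, w1), w2, activate=False)

image_feature = rng.normal(size=128)          # stand-in for a pooled CNN feature
scores = classifiers @ image_feature          # one logit per label
```

The resulting graph is directed. Label 2 points to label 0 because P(0 | 2) = 0.5, but label 0 does not point to label 2 because P(2 | 0) = 1/3.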

Few-Shot Learning (FSL)

| Paper | Venue | Main idea |
| --- | --- | --- |
| Few-shot learning with graph neural networks | ICLR 2018 | Formulates FSL as a supervised interpolation problem on a densely-connected graph, where the vertices stand for images in the collection and the adjacency is learned with trainable similarity kernels. |
| Learning to propagate labels: Transductive propagation network for few-shot learning | ICLR 2019 | Constructs graphs on top of the embedding space to fully exploit the manifold structure of the novel classes. Label information is propagated from the support set to the query set based on the constructed graphs. |
| Edge-labeling graph neural network for few-shot learning | CVPR 2019 | Proposes an edge-labeling GNN framework that learns to predict edge labels, explicitly constraining the intra- and inter-class similarities. |
| Learning from the past: Continual meta-learning via Bayesian graph modeling | AAAI 2020 | Formulates meta-learning-based FSL as continual learning of a sequence of tasks and resorts to a Bayesian GNN to capture the intra- and inter-task correlations. |
| DPGN: Distribution propagation graph network for few-shot learning | CVPR 2020 | Devises a dual complete graph network to model both distribution-level and instance-level relations. |
| Hierarchical graph neural networks for few-shot learning | TCSVT 2021 | Exploits the hierarchical relationships among graph nodes via bottom-up and top-down reasoning modules. |
| Hybrid graph neural networks for few-shot learning | AAAI 2022 | Introduces an instance GNN and a prototype GNN as feature-embedding task adaptation modules that quickly adapt learned features to new tasks. |
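
The transductive step of the TPN entry can be sketched with classic label propagation. It uses a normalized similarity graph S = D^-1/2 W D^-1/2 and the closed form F* = (I - alpha*S)^-1 Y. This minimal sketch assumes a fixed Gaussian bandwidth `sigma`, whereas TPN learns an example-wise scale. It also uses toy 2-D points in place of CNN embeddings. The optional `k` keeps a kNN-sparsified graph.

```python
import numpy as np


def label_propagation(features, support_labels, num_classes, alpha=0.99, sigma=1.0, k=None):
    """Transductive label propagation; the first len(support_labels) rows are the support set."""
    n = features.shape[0]
    sq_dist = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))      # Gaussian similarity graph
    np.fill_diagonal(w, 0.0)
    if k is not None:                              # keep the k strongest neighbours per node
        idx = np.argsort(-w, axis=1)[:, :k]
        mask = np.zeros_like(w, dtype=bool)
        np.put_along_axis(mask, idx, True, axis=1)
        w = np.where(mask | mask.T, w, 0.0)        # symmetrize the kNN graph
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(w.sum(axis=1), 1e-12))
    s = d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]   # S = D^-1/2 W D^-1/2
    y = np.zeros((n, num_classes))
    y[np.arange(len(support_labels)), support_labels] = 1.0  # one-hot support, zero rows for queries
    f = np.linalg.solve(np.eye(n) - alpha * s, y)  # closed form F* = (I - alpha S)^-1 Y
    return f.argmax(axis=1)


# Two support images (classes 0 and 1) followed by two query images.
features = np.array([[0.0, 0.0], [10.0, 10.0], [0.5, 0.0], [9.5, 10.0]])
preds_dense = label_propagation(features, np.array([0, 1]), num_classes=2)
preds_knn = label_propagation(features, np.array([0, 1]), num_classes=2, k=1)
```

Solving the linear system instead of forming the inverse explicitly is the usual numerically safer choice.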

Zero-Shot Learning (ZSL)

| Paper | Venue | Main idea |
| --- | --- | --- |
| Rethinking knowledge graph propagation for zero-shot learning | CVPR 2019 | Proposes a Dense Graph Propagation (DGP) module to exploit the hierarchical structure of the knowledge graph. It consists of two phases that iteratively propagate knowledge between a node and its ancestors and descendants. |
| Region graph embedding network for zero-shot learning | ECCV 2020 | Represents each input image as a region graph, where each node stands for an attended region in the image and the edges are appearance similarities among these region nodes. |
| Attribute propagation network for graph zero-shot learning | AAAI 2020 | Generates and updates attribute vectors with an attribute propagation network to optimize the attribute space. |
| Isometric propagation network for generalized zero-shot learning | ICLR 2021 | Introduces visual and semantic prototype propagation on auto-generated graphs to enhance inter-class relations and align the corresponding class-wise dependencies in the visual and semantic spaces. |
| Learning graph embeddings for open world compositional zero-shot learning | TPAMI 2022 | Introduces a Compositional Cosine Graph Embedding (Co-CGE) model that learns the relationships between primitives and compositions through a GCN. It quantitatively measures the feasibility score of each state-object composition and incorporates the computed scores into Co-CGE in two ways. |
| GNDAN: Graph navigated dual attention network for zero-shot learning | IEEE TNNLS 2022 | Resorts to GAT to exploit the appearance relations between local regions and the cooperation between local and global features. |
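
The DGP entry connects each class node directly to all of its ancestors and descendants instead of only to immediate neighbours. As we understand DGP, each hop distance k gets its own weight, obtained from a softmax over learnable logits. The sketch below uses fixed zero logits, which gives uniform weights. The descendant-then-ancestor phase order follows our reading of the paper. The hierarchy, feature sizes, and random projections are hypothetical.

```python
import numpy as np


def ancestor_distance(parent):
    """dist[i, j] = k if node j is the k-th ancestor of node i (k = 0 means j == i), else -1."""
    n = len(parent)
    dist = -np.ones((n, n), dtype=int)
    for i in range(n):
        node, k = i, 0
        while node != -1:                      # walk up the hierarchy to the root
            dist[i, node] = k
            node, k = parent[node], k + 1
    return dist


def dense_adjacency(dist, logits):
    """Connect each node to every node at distance k, weighted by softmax(logits)[k]."""
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()
    adj = np.zeros(dist.shape)
    for k, a_k in enumerate(alpha):
        hop_k = (dist == k).astype(float)
        degree = np.maximum(hop_k.sum(axis=1, keepdims=True), 1.0)
        adj += a_k * hop_k / degree            # row-normalized per distance, then weighted
    return adj


parent = [-1, 0, 1, 1]                         # root -> 1 -> {2, 3}
dist = ancestor_distance(parent)
ancestor_adj = dense_adjacency(dist, np.zeros(3))      # node -> its ancestors
descendant_adj = dense_adjacency(dist.T, np.zeros(3))  # node -> its descendants

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                   # e.g. word embeddings of the classes
theta_d, theta_a = rng.normal(size=(16, 16)), rng.normal(size=(16, 8))
hidden = np.maximum(descendant_adj @ x @ theta_d, 0.0)   # descendant phase
classifiers = ancestor_adj @ hidden @ theta_a             # ancestor phase
```

Transposing the distance matrix turns "j is an ancestor of i" into "j is a descendant of i". That lets one helper build both phases.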

Transfer Learning

| Paper | Venue | Main idea |
| --- | --- | --- |
| GCAN: Graph convolutional adversarial network for unsupervised domain adaptation | CVPR 2019 | Proposes a Graph Convolutional Adversarial Network (GCAN) for domain adaptation (DA), where a GCN is built on top of densely-connected instance graphs to encode data-structure information. |
| Heterogeneous graph attention network for unsupervised multiple-target domain adaptation | IEEE TPAMI 2020 | Builds a heterogeneous relation graph and introduces GAT to propagate semantic information and generate reliable pseudo-labels. |
| Curriculum graph co-teaching for multi-target domain adaptation | CVPR 2021 | Introduces a GCN to aggregate information from different domains, together with co-teaching and curriculum learning strategies, to achieve progressive adaptation. |
| Progressive graph learning for open-set domain adaptation | ICML 2020 | Studies open-set DA via a progressive graph learning framework that selects pseudo-labels and thus avoids negative transfer. |
| Prototype-matching graph network for heterogeneous domain adaptation | ACM MM 2020 | Achieves cross-domain prototype alignment based on features learned at different stages of the GNN. |
| Learning to combine: Knowledge aggregation for multi-source domain adaptation | ECCV 2020 | Introduces a knowledge graph built on the prototypes of different domains to propagate information among semantically adjacent representations. |
| Compound domain generalization via meta-knowledge encoding | CVPR 2022 | Builds global prototypical relation graphs and introduces a graph self-attention mechanism. |
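
Several transfer-learning entries share one mechanism. The nodes are class prototypes from different domains, and edges link semantically close prototypes. Message passing then refines them. The sketch below is loosely in the spirit of the "Learning to combine" entry and is not that paper's exact formulation. The Gaussian-kernel affinity, the single propagation step, and the two toy domains are our assumptions.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Mean feature of each class (assumes every class appears at least once)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])


def prototype_graph(prototypes, sigma=1.0):
    """Row-normalized Gaussian affinity between all prototypes (from all domains)."""
    sq_dist = ((prototypes[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    adj = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return adj / adj.sum(axis=1, keepdims=True)


def propagate(adj, nodes, weight):
    """One graph-convolution step: each prototype aggregates its semantic neighbours."""
    return np.maximum(adj @ nodes @ weight, 0.0)


rng = np.random.default_rng(0)
num_classes, dim = 3, 8
labels = np.arange(30) % num_classes
domains = [rng.normal(size=(30, dim)) + shift for shift in (0.0, 0.5)]  # two toy domains
nodes = np.concatenate([class_prototypes(f, labels, num_classes) for f in domains])  # (6, dim)
adj = prototype_graph(nodes, sigma=2.0)
refined = propagate(adj, nodes, rng.normal(size=(dim, dim)))
```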

Current Focus

Current work focuses on extracting ad hoc knowledge graphs from the data for a specific task, which is heuristic and relies on human priors.

Future Directions

(1) develop general and automatic graph construction procedures;

(2) enhance the interactions between abstract graph structures and task-specific classifiers;

(3) mine more fine-grained building blocks (nodes and edges) to increase the capability of the constructed graphs.

Object Detection

| Paper | Venue | Main idea |
| --- | --- | --- |
| Reasoning-RCNN: Unifying adaptive global reasoning into large-scale object detection | CVPR 2019 | Presents an adaptive global reasoning network for large-scale object detection that incorporates commonsense knowledge (a category-wise knowledge graph) and propagates visual information globally. |
| Spatial-aware graph relation network for large-scale object detection | CVPR 2019 | Adaptively discovers semantic and spatial relationships without requiring prior handcrafted linguistic knowledge. |
| Relation networks for object detection | CVPR 2018 | Introduces an adapted attention module into the detection head, explicitly learning information between objects by encoding long-range dependencies. |
| RelationNet++: Bridging visual representations for object detection via transformer decoder | NeurIPS 2020 | Presents a self-attention-based decoder module that combines the strengths of different object/part representations within a single detection framework. |
| GAR: Graph assisted reasoning for object detection | WACV 2020 | Introduces a heterogeneous graph to jointly model object-object and object-scene relations. |
| GraphFPN: Graph feature pyramid network for object detection | ICCV 2021 | Proposes a graph feature pyramid network (GraphFPN) that explores the contextual and hierarchical structures of an input image based on a superpixel hierarchy. |
| Relation matters: Foreground-aware graph-based relational reasoning for domain adaptive object detection | IEEE TPAMI 2022 | First builds intra- and inter-domain relation graphs by virtue of cyclic between-domain consistency, without any prior knowledge about the target distribution. |
| SIGMA: Semantic-complete graph matching for domain adaptive object detection | CVPR 2022 | Formulates DAOD as a graph matching problem by establishing cross-image graphs to model class-conditional distributions on both domains. |
| Semantic relation reasoning for shot-stable few-shot object detection | CVPR 2021 | Introduces a semantic relation reasoning module that integrates semantic information between base and novel classes for novel object detection. |
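
The Relation Networks entry learns relations between detected objects inside the detection head. As we read that paper, the relation weight between objects m and n multiplies a geometric weight by the exponential of a scaled dot-product appearance weight, then normalizes over n. The geometric weight is computed with a ReLU from relative box geometry. This is a simplified single-head sketch. The paper first lifts the four geometry terms through a sinusoidal embedding, but here they go straight into a linear map. The 1e-3 clipping, the shapes, and the random weights are also our assumptions.

```python
import numpy as np


def relative_geometry(boxes):
    """boxes: (N, 4) as (cx, cy, w, h). Returns (N, N, 4) scale-invariant relative geometry."""
    cx, cy, w, h = boxes.T
    dx = np.log(np.maximum(np.abs(cx[:, None] - cx[None, :]), 1e-3) / w[:, None])
    dy = np.log(np.maximum(np.abs(cy[:, None] - cy[None, :]), 1e-3) / h[:, None])
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)


def relation_module(feats, boxes, w_q, w_k, w_v, w_g):
    """Single-head object relation: appearance attention gated by a geometric weight."""
    d_k = w_k.shape[1]
    w_app = (feats @ w_k) @ (feats @ w_q).T / np.sqrt(d_k)      # (N, N) appearance term
    w_geo = np.maximum(relative_geometry(boxes) @ w_g, 0.0)     # (N, N) geometry term, ReLU
    unnorm = w_geo * np.exp(w_app - w_app.max(axis=1, keepdims=True))
    total = unnorm.sum(axis=1, keepdims=True)
    weights = np.divide(unnorm, total, out=np.zeros_like(unnorm), where=total > 0)
    return feats + weights @ (feats @ w_v), weights             # residual relation feature


rng = np.random.default_rng(0)
num_objects, dim = 5, 16
feats = rng.normal(size=(num_objects, dim))
boxes = np.column_stack([rng.uniform(0, 100, size=(num_objects, 2)),
                         rng.uniform(10, 50, size=(num_objects, 2))])
out, weights = relation_module(feats, boxes, rng.normal(size=(dim, 8)),
                               rng.normal(size=(dim, 8)), rng.normal(size=(dim, dim)),
                               rng.normal(size=4))
```

Subtracting the row maximum before `exp` does not change the normalized weights, since the factor cancels, but it avoids overflow.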

Note: DAOD stands for domain adaptive object detection.

Current Focus

Exploiting between-object, cross-scale, or cross-domain relationships, as well as relationships between base and novel classes.

Future Directions

(1) design better region-to-node feature mapping methods;

(2) incorporate Transformer (or pure GNN) encoders to improve the expressive power of the initial node features;

(3) perform reasoning directly in the original feature space to better preserve the intrinsic structure of images.

Image Segmentation

General Segmentation

| Paper | Venue | Main idea |
| --- | --- | --- |
| Dual graph convolutional network for semantic segmentation | BMVC 2019 | Models the global context of input features via a dual GCN framework: a coordinate-space GCN models spatial relationships between pixels, and a feature-space GCN models dependencies along the channel dimension of the network's feature map. |
| Graph-based global reasoning networks | CVPR 2019 | Designs a global reasoning unit that projects features globally aggregated in coordinate space to a node domain and performs relational reasoning on a fully-connected graph. |
| Dynamic graph message passing networks | CVPR 2020 | Dynamically samples the neighborhood of each node and then predicts the node dependencies, filter weights, and affinity matrix for information propagation. |
| Representative graph neural network | ECCV 2020 | Proposes to dynamically sample a few representative nodes for relational modeling. |
| Spatial pyramid based graph reasoning for semantic segmentation | CVPR 2020 | Proposes an improved Laplacian formulation that enables graph reasoning in the original feature space, fully exploiting contextual relations at different feature scales. |
| Class-wise dynamic graph convolution for semantic segmentation | ECCV 2020 | Introduces a class-wise dynamic graph convolution module that performs graph reasoning over pixels belonging to the same class. |
| Bidirectional graph reasoning network for panoptic segmentation | CVPR 2020 | Designs a bidirectional graph reasoning network that bridges the things branch and the stuff branch for panoptic segmentation. |
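
The global reasoning unit from the "Graph-based global reasoning networks" entry can be sketched as follows, following our reading of GloRe. Pixels are projected onto K node features, and a graph convolution Z = ((I - A_g) V) W_g runs on the fully-connected node graph. The result is projected back to pixels and added as a residual. In the paper the projections are 1x1 convolutions. Here they are matrix products on the flattened map, and the ReLU after the graph convolution is our assumption. All shapes and weights are hypothetical.

```python
import numpy as np


def glore_unit(x, w_reduce, w_proj, adj_nodes, w_graph, w_extend):
    """Global reasoning over a feature map flattened to x of shape (L, C)."""
    x_reduced = x @ w_reduce                          # (L, C') reduced pixel features
    proj = (x @ w_proj).T                             # (K, L) projection weights, pixel -> node
    nodes = proj @ x_reduced                          # (K, C') node features in interaction space
    k = adj_nodes.shape[0]
    z = np.maximum((np.eye(k) - adj_nodes) @ nodes @ w_graph, 0.0)  # GCN on the K-node graph
    back = proj.T @ z                                 # (L, C') reverse projection to pixels
    return x + back @ w_extend                        # (L, C) residual output


rng = np.random.default_rng(0)
h, w, c, c_red, k = 8, 8, 32, 16, 4
x = rng.normal(size=(h * w, c))                       # flattened (H*W, C) feature map
params = dict(w_reduce=rng.normal(size=(c, c_red)), w_proj=rng.normal(size=(c, k)),
              adj_nodes=rng.normal(size=(k, k)), w_graph=rng.normal(size=(c_red, c_red)))
y = glore_unit(x, w_extend=rng.normal(size=(c_red, c)), **params)
y_identity = glore_unit(x, w_extend=np.zeros((c_red, c)), **params)
```

Because the unit is residual, a zero extension matrix reduces it to the identity (`y_identity` equals `x`). This makes it easy to drop into an existing backbone.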

One-Shot Semantic Segmentation

| Paper | Venue | Main idea |
| --- | --- | --- |
| Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation | ICCV 2019 | Introduces a pyramid graph attention module to model the connections between query and support feature maps. |

Few-Shot Semantic Segmentation

| Paper | Venue | Main idea |
| --- | --- | --- |
| Scale-aware graph neural network for few-shot semantic segmentation | CVPR 2021 | Proposes a scale-aware GNN to perform cross-scale relational reasoning among support-query images. A self-node collaboration mechanism is introduced to perceive different resolutions of the same object. |

Weakly Supervised Semantic Segmentation

| Paper | Venue | Main idea |
| --- | --- | --- |
| Affinity attention graph neural network for weakly supervised semantic segmentation | IEEE TPAMI 2021 | An image is first converted into a weighted graph via an affinity CNN; an affinity attention layer then captures long-range interactions on the constructed graph and propagates semantic information to the unlabeled pixels. |
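
The weakly supervised entry rests on a generic mechanism: turn the image into a weighted pixel graph, then spread sparse seed labels to unlabeled pixels along high-affinity edges. The sketch below shows only that mechanism. It swaps in a fixed Gaussian affinity for the affinity CNN and a plain random walk for the attention layer, so it illustrates the idea rather than the method in the table. The radius, step count, and toy map are hypothetical.

```python
import numpy as np


def pixel_affinity_graph(feat, radius=1):
    """Weighted graph over the pixels of an (H, W, C) feature map.

    Pixels within `radius` (Chebyshev distance, self included) are linked with
    affinity exp(-||f_i - f_j||^2); all other pairs get zero weight.
    """
    h, w, _ = feat.shape
    flat = feat.reshape(h * w, -1)
    ys, xs = np.divmod(np.arange(h * w), w)           # row / column of each pixel
    near = ((np.abs(ys[:, None] - ys[None, :]) <= radius)
            & (np.abs(xs[:, None] - xs[None, :]) <= radius))
    sq_dist = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(-1)
    return np.where(near, np.exp(-sq_dist), 0.0)


def propagate_seeds(adj, seeds, steps=10):
    """Random-walk propagation of sparse seed scores (num_pixels x num_classes)."""
    transition = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic
    scores = seeds.copy()
    for _ in range(steps):
        scores = transition @ scores
    return scores


# Toy 2x4 map: left half has feature 0, right half feature 5 (two "objects").
feat = np.zeros((2, 4, 1))
feat[:, 2:, 0] = 5.0
seeds = np.zeros((8, 2))
seeds[0, 0] = 1.0       # top-left pixel labeled class 0
seeds[3, 1] = 1.0       # top-right pixel labeled class 1
scores = propagate_seeds(pixel_affinity_graph(feat), seeds)
labels = scores.argmax(axis=1).reshape(2, 4)
```

The weak edges between the two halves (affinity exp(-25)) keep each seed's label inside its own region.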

Current Focus

Exploring contextual information at the local or global level with pyramid pooling, dilated convolutions, or the self-attention mechanism.

Scene Graph Generation (SGG)

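Task overview: scene graph generation detects pairs of objects in an image together with their relationships to produce a visual scene graph. This gives a high-level understanding of the visual scene instead of handling individual objects in isolation.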

| Paper | Venue | Main idea |
| --- | --- | --- |
| Factorizable Net: an efficient subgraph-based framework for scene graph generation | ECCV 2018 | A subgraph-based approach (each subgraph is regarded as a node) with a spatially weighted message passing structure that refines the features of objects and subgraphs by passing messages among them with attention-like schemes. |
| Graph R-CNN for scene graph generation | ECCV 2018 | First obtains a sparse candidate graph by pruning the densely-connected graph of RPN proposals with a relation proposal network; an attentional GCN then aggregates contextual information and updates node features and edge relationships. |
| Attentive relational networks for mapping images to scene graphs | CVPR 2019 | Proposes attentive relational networks, which first transform label word embeddings and visual features into a shared semantic space and then rely on GAT to aggregate features for final relation inference. |
| Bipartite graph network with adaptive message passing for unbiased scene graph generation | CVPR 2021 | Introduces a bipartite GNN to estimate and propagate relation confidence in a multi-stage manner. |
| Energy-based learning for scene graph generation | CVPR 2021 | Proposes an energy-based framework that relies on a graph message passing algorithm to compute the energy of configurations. |
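
The pruning step in the Graph R-CNN entry can be sketched as follows. Every ordered (subject, object) pair gets a relatedness score, sigmoid(phi(p_i) . psi(p_j)), computed from the objects' class distributions. Only the top-K pairs become edges of the sparse candidate graph. As we understand it, Graph R-CNN uses small MLPs for phi and psi. The single linear layer with ReLU used here is a simplification, and `top_k` and the shapes are hypothetical. The attentional GCN that follows is not shown.

```python
import numpy as np


def relation_proposals(obj_logits, w_subj, w_obj, top_k):
    """Score every ordered (subject, object) pair and keep the top_k as graph edges."""
    n = obj_logits.shape[0]
    probs = np.exp(obj_logits - obj_logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)               # per-object class distribution
    phi = np.maximum(probs @ w_subj, 0.0)                   # subject projection
    psi = np.maximum(probs @ w_obj, 0.0)                    # object projection
    relatedness = 1.0 / (1.0 + np.exp(-(phi @ psi.T)))      # sigmoid(phi_i . psi_j)
    np.fill_diagonal(relatedness, -np.inf)                  # no self-relations
    best = np.argsort(-relatedness, axis=None)[:top_k]      # flattened indices, best first
    return [divmod(int(idx), n) for idx in best]            # (subject, object) index pairs


rng = np.random.default_rng(0)
num_objects, num_classes = 6, 10
edges = relation_proposals(rng.normal(size=(num_objects, num_classes)),
                           rng.normal(size=(num_classes, 32)),
                           rng.normal(size=(num_classes, 32)), top_k=8)
```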

VIDEO UNDERSTANDING

Video Action Recognition

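Task overview: video human action recognition is one of the fundamental tasks in video processing and understanding. Its goal is to recognize and classify human actions in RGB/depth videos or skeleton data.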

Action Recognition

| Paper | Venue | Main idea |
| --- | --- | --- |
|  |  | Captures long-range temporal context via graph-based reasoning over human-object and object-object relationships. |
|  |  | Constructs an actor-centric object-level graph and applies GCNs to capture the context among objects in an actor-centric way; a relation-level graph is built to infer the context among relation nodes. |
|  |  | Performs multi-scale reasoning on the temporal graph of a video, where each node is a frame and the pairwise relations between nodes are represented by a learnable adjacency matrix. |
|  |  | Extends GCN-based relation modeling to zero-shot action recognition and leverages knowledge graphs to jointly model the relations among actions and attributes. |
|  |  | Introduces a graph-based high-order relation modeling method for long-term action recognition. |
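
The third action-recognition entry treats each frame as a node with a learnable adjacency and reasons at several temporal scales. A hedged sketch of that idea follows, with several pieces that are our assumptions rather than that paper's architecture. The adjacency adds a data-dependent softmax similarity term to a learnable matrix. Coarser scales are built by average-pooling consecutive frames, and the per-scale outputs are averaged. The learnable matrices are zero-initialized placeholders that would be trained.

```python
import numpy as np


def frame_graph_reasoning(frames, w_theta, w_phi, w_out, learned_adj):
    """One GCN step on a graph whose nodes are frames (T, D)."""
    sim = (frames @ w_theta) @ (frames @ w_phi).T
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    adj = sim / sim.sum(axis=1, keepdims=True) + learned_adj   # data-dependent + learnable part
    return np.maximum(adj @ frames @ w_out, 0.0)


def multi_scale_reasoning(frames, scales, weights, learned_adjs):
    """Pool frames at several temporal strides, reason at each scale, average the results."""
    outputs = []
    for s in scales:
        t = frames.shape[0] // s
        pooled = frames[: t * s].reshape(t, s, -1).mean(axis=1)   # (T // s, D)
        out = frame_graph_reasoning(pooled, *weights, learned_adjs[s])
        outputs.append(out.mean(axis=0))                          # video-level descriptor
    return np.mean(outputs, axis=0)


rng = np.random.default_rng(0)
num_frames, dim, out_dim, scales = 16, 32, 64, (1, 2, 4)
frames = rng.normal(size=(num_frames, dim))                       # per-frame CNN features
weights = (rng.normal(size=(dim, 16)), rng.normal(size=(dim, 16)),
           rng.normal(size=(dim, out_dim)))
learned_adjs = {s: np.zeros((num_frames // s, num_frames // s)) for s in scales}
video_feature = multi_scale_reasoning(frames, scales, weights, learned_adjs)
```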

Skeleton-Based Action Recognition

| Paper | Venue | Main idea |
| --- | --- | --- |
|  |  | Proposes ST-GCN, which first connects the joints within a frame according to the natural connectivity of the human body and then connects the same joint across consecutive frames to preserve temporal information. |
|  |  | Introduces a fully-connected graph with learnable edge weights between joints, together with a data-dependent graph learned from the input skeleton. |
|  |  | Connects physically separated skeleton joints to capture the patterns of joints that move collaboratively. |
|  |  | Improves the joint connections within a single frame by adding edges between the limbs and the head; it uses GCNs to capture joint relations within single frames and an LSTM to capture the temporal dynamics. |
|  |  | Maintains edge features and learns both node and edge feature representations via directed graph convolution. |
|  |  | First constructs multiple dilated windows over the temporal dimension, then applies GCNs separately on multiple graphs at different scales, and finally aggregates the GCN outputs over all graphs and windows to capture multi-scale and long-range dependencies. |
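
The spatio-temporal graph described in the ST-GCN row can be built explicitly. Spatial edges follow the body's bones within each frame, and temporal edges link the same joint in consecutive frames. This sketch uses a hypothetical 3-joint chain. Practical ST-GCN implementations commonly realize the temporal edges as a temporal convolution per joint and use partitioned spatial adjacencies. The explicit graph here only illustrates the construction.

```python
import numpy as np


def st_graph_edges(bones, num_joints, num_frames):
    """Edges of a spatio-temporal skeleton graph; node (t, v) has index t * num_joints + v."""
    edges = []
    for t in range(num_frames):
        base = t * num_joints
        edges += [(base + i, base + j) for i, j in bones]          # spatial: body connectivity
        if t + 1 < num_frames:                                     # temporal: same joint, next frame
            edges += [(base + v, base + num_joints + v) for v in range(num_joints)]
    return edges


def normalized_adjacency(edges, num_nodes):
    """Undirected adjacency with self-loops, normalized as D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


bones = [(0, 1), (1, 2)]                  # toy 3-joint chain, e.g. shoulder-elbow-wrist
num_joints, num_frames = 3, 4
edges = st_graph_edges(bones, num_joints, num_frames)
adj = normalized_adjacency(edges, num_joints * num_frames)

rng = np.random.default_rng(0)
x = rng.normal(size=(num_joints * num_frames, 2))          # (x, y) coordinates per node
h = np.maximum(adj @ x @ rng.normal(size=(2, 8)), 0.0)     # one graph convolution
```

With 4 frames the graph has 2 x 4 = 8 spatial edges and 3 x 3 = 9 temporal edges.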

Temporal Action Localization
