基于图的图像分割(Effective graph-based image segmentation)python实现

柳俊彦

2023-12-01

前言

最近在学习区域卷积神经网络(RCNN)时，候选框产生使用了选择搜索(selective search)，选择搜索中第一步图像分割又使用了基于图的图像分割(Effective graph-based image segmentation)。所以从底层开始研究基于图的图像分割(Effective graph-based image segmentation)。

简介

该图像分割算法的详细介绍可以参见以下博客：
https://blog.csdn.net/surgewong/article/details/39008861

在这里主要结合自己的理解作简要总结和梳理：

首先，将图像(image)表达成图论中的图(graph)。具体说来就是，把图像中的每一个像素点看成一个顶点vi ∈ V（node或vertex），每个像素与相邻8个像素(8-领域)构成图的一条边ei ∈ E，这样就构建好了一个图 G = (V,E)。图每条边的权值是像素与相邻像素的关系(灰度图的话是灰度值差的绝对值，RGB图像为3个通道值差平方和开根号)，表达了相邻像素之间的相似度。
$\left| {grey[x1,y1] - grey[x2,y2]} \right|$
$\sqrt {{{(rgb[0][x1,y1] - rgb[0][x2,y2])}^2} + {{(rgb[1][x1,y1] - rgb[1][x2,y2])}^2} + {{(rgb[2][x1,y1] - rgb[2][x2,y2])}^2}}$
将每个节点（像素点）看成单一的区域，然后进行合并。
(1) 对所有边根据权值从小到大排序，权值越小，两像素的相似度越大。
(2) S[0]是一个原始分割，相当于每个顶点当做是一个分割区域。
(3) 从小到大遍历所有边，如果这条边(vi,vj)的两个顶点属于不同的分割区域，并且权值不大于两个区域的内部差(区域内左右边最大权值)，那么合并这两个区域。更新合并区域的参数和内部差。
最后对所有区域中，像素数都小于min_size的两个相邻区域，进行合并得到最后的分割。

代码实现与解读

图的构建

class Node:
    def __init__(self, parent, rank=0, size=1):
        self.parent = parent
        self.rank = rank
        self.size = size

    def __repr__(self):
        return '(parent=%s, rank=%s, size=%s)' % (self.parent, self.rank, self.size)

定义顶点(node)类，把每个像素定义为节点(顶点)。顶点有三个性质：
(1) parent，该顶点对应分割区域的母顶点，可以认为的分割区域的编号或者索引。后面初始化时，把图像每个像素当成一个分割区域，所以每个像素的母顶点就是他们本身。
(2) rank，母顶点的优先级（每个顶点初始化为0），用来两个区域合并时，确定唯一的母顶点。
(3) size（每个顶点初始化为1），表示每个顶点作为母顶点时，所在分割区域的顶点数量。当它被其他区域合并，不再是母顶点时，它的size不再改变。

class Forest:
    def __init__(self, num_nodes):
        # 节点列表
        self.nodes = [Node(i) for i in range(num_nodes)]
        # 节点数
        self.num_sets = num_nodes

    def size_of(self, i):
        return self.nodes[i].size

    def find(self, n):
        temp = n
        while temp != self.nodes[temp].parent:
            temp = self.nodes[temp].parent
        self.nodes[n].parent = temp
        return temp

    # 节点a和节点b合并
    def merge(self, a, b):
        if self.nodes[a].rank > self.nodes[b].rank:
            self.nodes[b].parent = a
            self.nodes[a].size = self.nodes[a].size + self.nodes[b].size
        else:
            self.nodes[a].parent = b
            self.nodes[b].size = self.nodes[b].size + self.nodes[a].size
            if self.nodes[a].rank == self.nodes[b].rank:
                self.nodes[b].rank = self.nodes[b].rank + 1
        self.num_sets = self.num_sets - 1

    def print_nodes(self):
        for node in self.nodes:
            print(node)

(1) self.nodes初始化forest类的所有顶点列表，初始化时把图像每个像素作为顶点，当成一个分割区域，每个像素的母顶点就是他们本身。forest.num_sets表示该图像当前的分割区域数量。
(2) size_of()，获取某个顶点的size，一般用来获得某个母顶点的size，即为母顶点所在分割区域的顶点数量。
(3) find()，获得该顶点所在区域的母顶点编号(索引)
(4) merge(self, a, b)，合并顶点a所在区域和顶点b所在区域，找到a,b的母顶点，根据其优先级rank确定新的母顶点，更新合并后新区域的顶点数量size，新母顶点的优先级rank，分割区域的数量num_sets。

# 创建边，方向由(x,y)指向(x1,y1)，大小为梯度值
def create_edge(img, width, x, y, x1, y1, diff):
    # lamda:函数输入是x和y，输出是x * width + y
    vertex_id = lambda x, y: x * width + y
    w = diff(img, x, y, x1, y1)
    return (vertex_id(x, y), vertex_id(x1, y1), w)

创建边(顶点1，顶点2，权值)

def build_graph(img, width, height, diff, neighborhood_8=False):
    graph = []
    for x in range(height):
        for y in range(width):
            if x > 0:
                graph.append(create_edge(img, width, x, y, x-1, y, diff))
            if y > 0:
                graph.append(create_edge(img, width, x, y, x, y-1, diff))
            if neighborhood_8:
                if x > 0 and y > 0:
                    graph.append(create_edge(img, width, x, y, x-1, y-1, diff))

                if x > 0 and y < width-1:
                    graph.append(create_edge(img, width, x, y, x-1, y+1, diff))
    return graph

创建图，对每个顶点，←↑↖↗创建四条边，达到8-邻域的效果，自此完成图的构建。

图像分割

def segment_graph(graph, num_nodes, const, min_size, threshold_func):
    weight = lambda edge: edge[2]
    forest = Forest(num_nodes)
    sorted_graph = sorted(graph, key=weight)
    threshold = [threshold_func(1, const)] * num_nodes

    for edge in sorted_graph:
        parent_a = forest.find(edge[0])
        parent_b = forest.find(edge[1])
        a_condition = weight(edge) <= threshold[parent_a]
        b_condition = weight(edge) <= threshold[parent_b]

        if parent_a != parent_b and a_condition and b_condition:
            # print(parent_a)
            # print(parent_b)
            # print(weight(edge))
            # print(threshold[parent_a])
            # print(threshold[parent_b])
            forest.merge(parent_a, parent_b)
            a = forest.find(parent_a)
            # print(a)
            threshold[a] = weight(edge) + threshold_func(forest.nodes[a].size, const)

    return remove_small_components(forest, sorted_graph, min_size)

(1) 首先初始化forest
(2) 对所有边，根据其权值从小到大排序
(3) 初始化区域内部差列表
(4) 从小到大遍历所有边，如果顶点在两个区域，且权值小于两个顶点所在区域的内部差(threshold[])，则合并这两个区域，找到合并后区域的新母顶点，更新该顶点对应的区域内部差（threshold[]）: ${\rm{thre}}shold[i] = Int({C_i}) + \frac{k}{{\left| {{C_i}} \right|}}$
$Int({C_i})$ 为顶点i所在区域的内部差； $\left| {{C_i}} \right|$ 为该区域的顶点数量；k为可调参数，k过大，导致更新时区域内部差过大，导致过多的区域进行合并，最终造成图像分割粗糙，反之，k过小，容易导致图像分割太精细。
因为遍历时是从小到大遍历，所以如果合并，这条边的权值一定是新区域所有边最大的权值，即为该新区域的内部差，因此 $Int({C_i}) = weight(edge)$ 。

def remove_small_components(forest, graph, min_size):
    for edge in graph:
        a = forest.find(edge[0])
        b = forest.find(edge[1])
        if a != b and (forest.size_of(a) < min_size or forest.size_of(b) < min_size):
            forest.merge(a, b)
    return  forest

初次分割后的图像，对于其中定点数均小于min_size的两个相邻区域，进行合并。

def graphbased_segmentation(img_path, neighbor, sigma, K, min_size):
    # neighbor = int(8)
    image_file = Image.open(img_path)
    # sigma = float(0.5)
    # K = float(800)
    # min_size = int(20)

    size = image_file.size
    print('Image info: ', image_file.format, size, image_file.mode)

    # 生成高斯滤波算子
    grid = gaussian_grid(sigma)

    if image_file.mode == 'RGB':
        image_file.load()
        r, g, b = image_file.split()
        # 对r,g,b三个通道分别进行滤波(height,width),x行y列
        r = filter_image(r, grid)
        g = filter_image(g, grid)
        b = filter_image(b, grid)
        # print(r.shape)

        smooth = (r, g, b)
        diff = diff_rgb
    else:
        smooth = filter_image(image_file, grid)
        diff = diff_grey
    # print(smooth[0].shape)
    # 对图像中每个像素建立图。
    graph = build_graph(smooth, size[0], size[1], diff, neighbor == 8)
    forest = segment_graph(graph, size[0]*size[1], K, min_size, threshold)
    image = generate_image(forest, size[0], size[1])
    # image.save("output2.jpg")
    print('Number of components: %d' % forest.num_sets)

    return image

首先图像滤波，然后把图像中每个像素作为一个顶点，建立图，然后进行图像分割。

问题

对于内部差的更新公式： ${\rm{thre}}shold[i] = Int({C_i}) + \frac{k}{{\left| {{C_i}} \right|}}$ 还不是很理解。

基于图的图像分割(Effective graph-based image segmentation)python实现

前言

简介

代码实现与解读

图的构建

图像分割

问题

相关阅读

相关文章

相关问答

相关文档