最近在学习区域卷积神经网络(RCNN)时,候选框产生使用了选择搜索(selective search),选择搜索中第一步图像分割又使用了基于图的图像分割(Effective graph-based image segmentation)。所以从底层开始研究基于图的图像分割(Effective graph-based image segmentation)。
该图像分割算法的详细介绍可以参见以下博客:
https://blog.csdn.net/surgewong/article/details/39008861
在这里主要结合自己的理解作简要总结和梳理:
class Node:
def __init__(self, parent, rank=0, size=1):
self.parent = parent
self.rank = rank
self.size = size
def __repr__(self):
return '(parent=%s, rank=%s, size=%s)' % (self.parent, self.rank, self.size)
定义顶点(node)类,把每个像素定义为节点(顶点)。顶点有三个性质:
(1) parent,该顶点对应分割区域的母顶点,可以认为的分割区域的编号或者索引。后面初始化时,把图像每个像素当成一个分割区域,所以每个像素的母顶点就是他们本身。
(2) rank,母顶点的优先级(每个顶点初始化为0),用来两个区域合并时,确定唯一的母顶点。
(3) size(每个顶点初始化为1),表示每个顶点作为母顶点时,所在分割区域的顶点数量。当它被其他区域合并,不再是母顶点时,它的size不再改变。
class Forest:
def __init__(self, num_nodes):
# 节点列表
self.nodes = [Node(i) for i in range(num_nodes)]
# 节点数
self.num_sets = num_nodes
def size_of(self, i):
return self.nodes[i].size
def find(self, n):
temp = n
while temp != self.nodes[temp].parent:
temp = self.nodes[temp].parent
self.nodes[n].parent = temp
return temp
# 节点a和节点b合并
def merge(self, a, b):
if self.nodes[a].rank > self.nodes[b].rank:
self.nodes[b].parent = a
self.nodes[a].size = self.nodes[a].size + self.nodes[b].size
else:
self.nodes[a].parent = b
self.nodes[b].size = self.nodes[b].size + self.nodes[a].size
if self.nodes[a].rank == self.nodes[b].rank:
self.nodes[b].rank = self.nodes[b].rank + 1
self.num_sets = self.num_sets - 1
def print_nodes(self):
for node in self.nodes:
print(node)
(1) self.nodes初始化forest类的所有顶点列表,初始化时把图像每个像素作为顶点,当成一个分割区域,每个像素的母顶点就是他们本身。forest.num_sets表示该图像当前的分割区域数量。
(2) size_of(),获取某个顶点的size,一般用来获得某个母顶点的size,即为母顶点所在分割区域的顶点数量。
(3) find(),获得该顶点所在区域的母顶点编号(索引)
(4) merge(self, a, b),合并顶点a所在区域和顶点b所在区域,找到a,b的母顶点,根据其优先级rank确定新的母顶点,更新合并后新区域的顶点数量size,新母顶点的优先级rank,分割区域的数量num_sets。
# 创建边,方向由(x,y)指向(x1,y1),大小为梯度值
def create_edge(img, width, x, y, x1, y1, diff):
# lamda:函数输入是x和y,输出是x * width + y
vertex_id = lambda x, y: x * width + y
w = diff(img, x, y, x1, y1)
return (vertex_id(x, y), vertex_id(x1, y1), w)
创建边(顶点1,顶点2,权值)
def build_graph(img, width, height, diff, neighborhood_8=False):
graph = []
for x in range(height):
for y in range(width):
if x > 0:
graph.append(create_edge(img, width, x, y, x-1, y, diff))
if y > 0:
graph.append(create_edge(img, width, x, y, x, y-1, diff))
if neighborhood_8:
if x > 0 and y > 0:
graph.append(create_edge(img, width, x, y, x-1, y-1, diff))
if x > 0 and y < width-1:
graph.append(create_edge(img, width, x, y, x-1, y+1, diff))
return graph
创建图,对每个顶点,←↑↖↗创建四条边,达到8-邻域的效果,自此完成图的构建。
def segment_graph(graph, num_nodes, const, min_size, threshold_func):
weight = lambda edge: edge[2]
forest = Forest(num_nodes)
sorted_graph = sorted(graph, key=weight)
threshold = [threshold_func(1, const)] * num_nodes
for edge in sorted_graph:
parent_a = forest.find(edge[0])
parent_b = forest.find(edge[1])
a_condition = weight(edge) <= threshold[parent_a]
b_condition = weight(edge) <= threshold[parent_b]
if parent_a != parent_b and a_condition and b_condition:
# print(parent_a)
# print(parent_b)
# print(weight(edge))
# print(threshold[parent_a])
# print(threshold[parent_b])
forest.merge(parent_a, parent_b)
a = forest.find(parent_a)
# print(a)
threshold[a] = weight(edge) + threshold_func(forest.nodes[a].size, const)
return remove_small_components(forest, sorted_graph, min_size)
(1) 首先初始化forest
(2) 对所有边,根据其权值从小到大排序
(3) 初始化区域内部差列表
(4) 从小到大遍历所有边,如果顶点在两个区域,且权值小于两个顶点所在区域的内部差(threshold[]),则合并这两个区域,找到合并后区域的新母顶点,更新该顶点对应的区域内部差(threshold[]):
t
h
r
e
s
h
o
l
d
[
i
]
=
I
n
t
(
C
i
)
+
k
∣
C
i
∣
{\rm{thre}}shold[i] = Int({C_i}) + \frac{k}{{\left| {{C_i}} \right|}}
threshold[i]=Int(Ci)+∣Ci∣k
I
n
t
(
C
i
)
Int({C_i})
Int(Ci)为顶点i所在区域的内部差;
∣
C
i
∣
\left| {{C_i}} \right|
∣Ci∣为该区域的顶点数量;k为可调参数,k过大,导致更新时区域内部差过大,导致过多的区域进行合并,最终造成图像分割粗糙,反之,k过小,容易导致图像分割太精细。
因为遍历时是从小到大遍历,所以如果合并,这条边的权值一定是新区域所有边最大的权值,即为该新区域的内部差,因此
I
n
t
(
C
i
)
=
w
e
i
g
h
t
(
e
d
g
e
)
Int({C_i}) = weight(edge)
Int(Ci)=weight(edge)。
def remove_small_components(forest, graph, min_size):
for edge in graph:
a = forest.find(edge[0])
b = forest.find(edge[1])
if a != b and (forest.size_of(a) < min_size or forest.size_of(b) < min_size):
forest.merge(a, b)
return forest
初次分割后的图像,对于其中定点数均小于min_size的两个相邻区域,进行合并。
def graphbased_segmentation(img_path, neighbor, sigma, K, min_size):
# neighbor = int(8)
image_file = Image.open(img_path)
# sigma = float(0.5)
# K = float(800)
# min_size = int(20)
size = image_file.size
print('Image info: ', image_file.format, size, image_file.mode)
# 生成高斯滤波算子
grid = gaussian_grid(sigma)
if image_file.mode == 'RGB':
image_file.load()
r, g, b = image_file.split()
# 对r,g,b三个通道分别进行滤波(height,width),x行y列
r = filter_image(r, grid)
g = filter_image(g, grid)
b = filter_image(b, grid)
# print(r.shape)
smooth = (r, g, b)
diff = diff_rgb
else:
smooth = filter_image(image_file, grid)
diff = diff_grey
# print(smooth[0].shape)
# 对图像中每个像素建立图。
graph = build_graph(smooth, size[0], size[1], diff, neighbor == 8)
forest = segment_graph(graph, size[0]*size[1], K, min_size, threshold)
image = generate_image(forest, size[0], size[1])
# image.save("output2.jpg")
print('Number of components: %d' % forest.num_sets)
return image
首先图像滤波,然后把图像中每个像素作为一个顶点,建立图,然后进行图像分割。
对于内部差的更新公式: t h r e s h o l d [ i ] = I n t ( C i ) + k ∣ C i ∣ {\rm{thre}}shold[i] = Int({C_i}) + \frac{k}{{\left| {{C_i}} \right|}} threshold[i]=Int(Ci)+∣Ci∣k 还不是很理解。