VLAD: Study Notes and a Python Implementation

狄宾实

2023-12-01

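For work I have been studying some classic image retrieval algorithms. I am writing them up one by one, both for my own review and to share with others.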

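This post is about VLAD (vector of locally aggregated descriptors). VLAD takes an image's local descriptors, such as SIFT, SURF, or ORB, aggregates them, and represents the whole image as one long vector. Representing images as vectors is a prerequisite for image retrieval: retrieval measures the similarity between the query image and each image in the database and returns the most similar ones, so mathematically it is a similarity measure between vectors. You could of course compare two images directly, for example by differencing pixel values or by comparing their color histograms. That is fine for a small amount of data, but with massive data, storing the images is already a big problem, retrieval is too slow, and a lot of disk and memory is needed.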
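To make the two direct comparisons mentioned above concrete, here is a minimal sketch using NumPy only. The function names `pixel_difference` and `histogram_intersection`, the 16-bin histogram, and the random test image are illustrative assumptions, not something taken from the paper.

```python
import numpy as np

def pixel_difference(img_a, img_b):
    """Mean absolute per-pixel difference; both images must share a shape."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    return float(np.mean(np.abs(diff)))

def histogram_intersection(img_a, img_b, bins=16):
    """Similarity in [0, 1] between per-channel intensity histograms."""
    channels = img_a.shape[2]
    score = 0.0
    for c in range(channels):
        h_a, _ = np.histogram(img_a[..., c], bins=bins, range=(0, 256))
        h_b, _ = np.histogram(img_b[..., c], bins=bins, range=(0, 256))
        # Normalize counts to probabilities before intersecting.
        h_a = h_a / h_a.sum()
        h_b = h_b / h_b.sum()
        score += np.minimum(h_a, h_b).sum()
    return float(score / channels)

# Hypothetical data: a random image and a copy with its pixels shuffled.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
shuffled = rng.permutation(image.reshape(-1, 3)).reshape(image.shape)

print(pixel_difference(image, shuffled))        # large: pixels no longer line up
print(histogram_intersection(image, shuffled))  # colors are unchanged
```

The shuffled copy shows why neither measure is enough on its own: the pixel difference is sensitive to any rearrangement, while the histogram ignores spatial layout entirely.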

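So we have to condense the image: choose a description that represents the image content well, consumes few hardware resources, and still allows fast retrieval. Aggregated vectors built from local descriptors address all three. (PS: if you don't know what SIFT is, there are plenty of posts explaining it; it is a mature image feature extraction method.)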

I like to study the paper first and then implement the algorithm from it, so I will start with the paper.

Reference: "Aggregating local descriptors into a compact image representation"

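In this paper the authors compare VLAD with two established methods for aggregating local descriptors: BOF (bag of features) and FK (Fisher kernel). I won't go into them here; later posts will cover each separately. VLAD's advantages are that it is fast, accurate, and does not consume a lot of hardware resources. All of these methods share one drawback, though: they cannot be applied directly to massive datasets, because the representation is still too large and needs a further encoding step to compress it. For that, see my post: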

Product Quantization: Study Notes and Practical Summary

That algorithm was proposed specifically for encoding massive datasets.
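A rough back-of-the-envelope calculation shows why uncompressed VLAD vectors become a problem at scale. This is only a sketch: it assumes K=64 centroids and 128-dimensional SIFT (the values used in the code later in this post), 4-byte float32 storage, and a hypothetical database of one million images.

```python
def vlad_storage_bytes(num_images, k=64, dim=128, bytes_per_value=4):
    """Bytes needed to hold one uncompressed VLAD vector per image."""
    return num_images * k * dim * bytes_per_value

per_image = vlad_storage_bytes(1)
one_million = vlad_storage_bytes(1_000_000)

print(per_image)           # 32768 bytes, i.e. 32 KiB per image
print(one_million / 1e9)   # 32.768 GB for one million images
```

Under these assumptions the vectors alone would not fit in the memory of a typical machine, which is the motivation for compressing them further.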

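Back to VLAD. The algorithm can be divided into the following steps:

1. Extract the SIFT descriptors of each image.

2. Train a codebook with K-means on the extracted SIFT descriptors (the descriptors of all training images together).

3. Assign every SIFT descriptor of an image to the codebook by nearest neighbor, i.e. to one of the K cluster centers.

4. For each cluster center, compute the sum of residuals: subtract the center from every SIFT descriptor assigned to it, then sum the results.

5. L2-normalize each residual sum, then concatenate them into one long vector of length K*128, where 128 is the length of a single SIFT descriptor.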
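Steps 3 to 5 can be sketched compactly with NumPy. This is a minimal illustration, not the listing from this post: the function name `vlad_encode` and the tiny 2-D toy data are assumptions made for readability. It follows the per-center normalization described in step 5; note that the paper itself L2-normalizes the whole concatenated vector instead.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Encode one image's descriptors (n, d) against a codebook (k, d)."""
    k, d = centroids.shape
    # Step 3: squared Euclidean distance from every descriptor to every center.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    vlad = np.zeros((k, d), dtype=np.float64)
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) == 0:
            continue  # no descriptor assigned: this block stays all zeros
        # Step 4: sum of residuals for center i.
        residual = (members - centroids[i]).sum(axis=0)
        # Step 5: L2-normalize this block (skip if the residuals cancel out).
        norm = np.linalg.norm(residual)
        if norm > 0:
            residual = residual / norm
        vlad[i] = residual
    # Concatenate the k blocks into one vector of length k*d.
    return vlad.reshape(-1)

# Toy example with 2-D "descriptors" and two centers.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 2.0], [11.0, 10.0]])
vec = vlad_encode(descriptors, centroids)
print(vec)
```

In the toy example the first two descriptors fall to the first center, giving the residual sum (1, 2) before normalization, and the third falls to the second center with residual (1, 0).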

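If massive data is not a concern, this training and database-building procedure is already enough: save the result of step 5 for every image (the image database), apply the same steps to the query image, compute the Euclidean distance between the query vector and every vector in the database, and output the 5 smallest distances. That is one complete retrieval.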
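The retrieval step described above is a brute-force nearest-neighbor search. A minimal sketch follows, assuming the database is an array with one VLAD vector per row; the function name `search` and the random stand-in data are hypothetical.

```python
import numpy as np

def search(query_vec, database, top_k=5):
    """Return indices and distances of the top_k nearest database rows."""
    # Euclidean distance from the query to every database vector.
    dists = np.linalg.norm(database - query_vec, axis=1)
    order = np.argsort(dists)[:top_k]
    return order, dists[order]

# Stand-in data: 20 random 8-D vectors instead of real VLAD vectors.
rng = np.random.default_rng(0)
database = rng.standard_normal((20, 8))
query = database[7] + 0.001  # a near-duplicate of entry 7

indices, distances = search(query, database)
print(indices)    # entry 7 is ranked first
print(distances)  # ascending distances
```

With the real data, `database` would be the array saved to db.pickle and `query_vec` the VLAD vector of the query image, encoded with the same codebook.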

Code:


#!/usr/bin/env python

print("feature")
import pickle

import cv2
import numpy as np
import skimage.io as io
from sklearn.cluster import MiniBatchKMeans

# Mini-batch size used by MiniBatchKMeans in codebook()
batch_size=1000

def getImage(pattern):
    # Load every image matching the glob pattern (e.g. "klboat/*.jpg").
    mat=io.ImageCollection(pattern)
    print(len(mat))
    return mat
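def getOrb(mat):
    # Despite the name, this extracts SIFT descriptors: one (n,128) array per image.
    label={}  # unused, kept so callers can still unpack two values
    orbdic=[]
    # OpenCV >= 4.4 exposes SIFT as cv2.SIFT_create(); older opencv-contrib
    # builds only have cv2.xfeatures2d.SIFT_create().
    if hasattr(cv2,"SIFT_create"):
        siftt=cv2.SIFT_create()
    else:
        siftt=cv2.xfeatures2d.SIFT_create()
    for i in range(len(mat)):
        kp=siftt.detect(mat[i],None)
        des=siftt.compute(mat[i],kp)
        final=des[1]
        if final is None:
            # No keypoints found: use an empty descriptor array.
            final=np.zeros((0,128),dtype=np.float32)
        print(i)
        print(final.shape)
        orbdic.append(final)
    return orbdic,label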
#################################################################################
## Train MiniBatchKMeans on the SIFT descriptors to get a codebook of k codewords (k = n_clusters)
def codebook(orbdic):
    mbk = MiniBatchKMeans(init='k-means++', n_clusters=64,max_iter=1000, batch_size=batch_size,
                      n_init=10, max_no_improvement=10, verbose=0).fit(orbdic)
    label=mbk.labels_
    centroid=mbk.cluster_centers_
    return label,centroid
#########################################################################################

########################################################################################### 
## Assign every SIFT descriptor of an image to its nearest codeword, then describe
## the image with a K*128 VLAD vector.
## See the blog post for details.
## Note: if an image has no SIFT descriptor assigned to codeword i, that block is a 128-d zero vector.
def vlad(locdpt,centroid):
    dis={}
    final=[]
    # Group the descriptors by their nearest codeword.
    for i in range(locdpt.shape[0]):
        des=[]
        for j in range(centroid.shape[0]):
            des.append(np.linalg.norm(locdpt[i]-centroid[j]))
        nearest=int(np.argmin(des))
        if nearest in dis:
            dis[nearest].append(locdpt[i])
        else:
            dis[nearest]=[locdpt[i]]
    # Loop over all codewords instead of a hard-coded 64.
    for i in range(centroid.shape[0]):
        if i in dis:
            # Sum of residuals for codeword i, then L2-normalize it.
            total=np.zeros((centroid.shape[1],))
            for j in range(len(dis[i])):
                total=total+dis[i][j]-centroid[i]
            if np.linalg.norm(total)!=0:
                total=total/np.linalg.norm(total)
        else:
            total=np.zeros((centroid.shape[1],))
        final.append(total)
    print(len(final))
    final=concate2(final,len(final))
    return final
###############################################################################################
def gfinal(mat):
    gf=[]
    orbdic,label=getOrb(mat)
    database=concate(orbdic,len(orbdic))
    label,centroid=codebook(database)
    print(centroid.shape)
    for i in range(len(orbdic)):
        gf.append(vlad(orbdic[i],centroid))
    return gf

def concate2(orbdic,l):
    #print "concate-all-features-vector"
    database=orbdic[0]
    for i in range(1,l):
        #print orbdic[i].shape
        database=np.hstack((database,orbdic[i]))
    return database
        
def concate(orbdic,l):
    #print "concate-all-features-vector"
    database=orbdic[0]
    for i in range(1,l):
        database=np.vstack((database,orbdic[i]))
    return database

def train(database):
    # Fit the codebook and save the centroids for later encoding.
    label,centroid=codebook(database)
    print(centroid.shape)
    with open("codebook.pickle",'wb') as pk:
        pickle.dump(centroid,pk)

def test(orbdic):
    # Encode every image with the saved codebook; returns an (n_images, k*128) array.
    final=[]
    with open("codebook.pickle",'rb') as pk:
        centroid=pickle.load(pk)
    print(centroid.shape)
    for i in range(len(orbdic)):
        final.append(vlad(orbdic[i],centroid))
    final=concate(final,len(final))
    return final

if __name__=="__main__":
    ## image directory
    filename="klboat"
    pattern=filename+"/*.jpg"
    ## get the SIFT descriptors of each image
    mat=getImage(pattern)
    io.imshow(mat[0])
    orbdic,label=getOrb(mat)
    database=concate(orbdic,len(orbdic))
    ## train the codebook
    train(database)
    ## nearest-neighbor assignment and VLAD encoding
    final=test(orbdic)
    ## save the VLAD descriptors, size 86*(k*128)
    print(final.shape)
    with open("db.pickle",'wb') as pk:
        pickle.dump(final,pk)

Bug reports are welcome.
