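This article gives a brief translation of, and introduction to, the tutorials on the official LIRE Documentation site.
I. Creating an index with LIRE
This part contains four examples. They show how to use the available builders to create Lucene documents and add them to a custom index, and how to create an index with one or more features.
Example 1: Create an index, using CEDD feature extraction to build the documents, and index all files.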
/**
* Simple class showing the process of indexing
* @author Mathias Lux, mathias@juggle.at and Nektarios Anagnostopoulos, nek.anag@gmail.com
*/
public class Indexer {
public static void main(String[] args) throws IOException {
// Checking if arg[0] is there and if it is a directory.
boolean passed = false;
if (args.length > 0) {
File f = new File(args[0]);
System.out.println("Indexing images in " + args[0]);
if (f.exists() && f.isDirectory()) passed = true;
}
if (!passed) {
System.out.println("No directory given as first argument.");
System.out.println("Run \"Indexer <directory>\" to index files of a directory.");
System.exit(1);
}
// Getting all images from a directory and its sub directories.
ArrayList<String> images = FileUtils.readFileLines(new File(args[0]), true);
// Creating a CEDD document builder and indexing all files.
GlobalDocumentBuilder globalDocumentBuilder = new GlobalDocumentBuilder(CEDD.class);
// Creating a Lucene IndexWriter
IndexWriter iw = LuceneUtils.createIndexWriter("index", true, LuceneUtils.AnalyzerType.WhitespaceAnalyzer);
// Iterating through images building the low level features
for (Iterator<String> it = images.iterator(); it.hasNext(); ) {
String imageFilePath = it.next();
System.out.println("Indexing " + imageFilePath);
try {
BufferedImage img = ImageIO.read(new FileInputStream(imageFilePath));
Document document = globalDocumentBuilder.createDocument(img, imageFilePath);
iw.addDocument(document);
} catch (Exception e) {
System.err.println("Error reading image or indexing it.");
e.printStackTrace();
}
}
// closing the IndexWriter
LuceneUtils.closeWriter(iw);
System.out.println("Finished indexing.");
}
}
Example 2: Use several feature descriptors at once. Create a GlobalDocumentBuilder object and add the extractors you need; here CEDD, FCTH and AutoColorCorrelogram are extracted.
public class Indexer {
public static void main(String[] args) throws IOException {
// Checking if arg[0] is there and if it is a directory.
boolean passed = false;
if (args.length > 0) {
File f = new File(args[0]);
System.out.println("Indexing images in " + args[0]);
if (f.exists() && f.isDirectory()) passed = true;
}
if (!passed) {
System.out.println("No directory given as first argument.");
System.out.println("Run \"Indexer <directory>\" to index files of a directory.");
System.exit(1);
}
// Getting all images from a directory and its sub directories.
ArrayList<String> images = FileUtils.readFileLines(new File(args[0]), true);
// Creating a CEDD document builder and indexing all files.
GlobalDocumentBuilder globalDocumentBuilder = new GlobalDocumentBuilder(CEDD.class);
// and here we add those features we want to extract in a single run:
globalDocumentBuilder.addExtractor(FCTH.class);
globalDocumentBuilder.addExtractor(AutoColorCorrelogram.class);
// Creating a Lucene IndexWriter
IndexWriter iw = LuceneUtils.createIndexWriter("index", true, LuceneUtils.AnalyzerType.WhitespaceAnalyzer);
// Iterating through images building the low level features
for (Iterator<String> it = images.iterator(); it.hasNext(); ) {
String imageFilePath = it.next();
System.out.println("Indexing " + imageFilePath);
try {
BufferedImage img = ImageIO.read(new FileInputStream(imageFilePath));
Document document = globalDocumentBuilder.createDocument(img, imageFilePath);
iw.addDocument(document);
} catch (Exception e) {
System.err.println("Error reading image or indexing it.");
e.printStackTrace();
}
}
// closing the IndexWriter
LuceneUtils.closeWriter(iw);
System.out.println("Finished indexing.");
}
}
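Example 3: Parallel indexing. If you have more than one CPU you can use the parallel indexing tool. Note that the thread option only configures the number of consumer threads. In addition there are a monitoring thread, the main thread and one producer thread. Only the n consumer threads plus the one producer thread create CPU load, however; the producer merely reads the data and puts it into a queue. Also note that for indexes with a small number of features I/O can become a severe bottleneck, so use SSDs wherever possible.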
/**
* Simple class showing the use of the ParallelIndexer, which uses up as much CPU as it can get.
* @author Mathias Lux, mathias@juggle.at and Nektarios Anagnostopoulos, nek.anag@gmail.com
*/
public class ParallelIndexing {
public static void main(String[] args) throws IOException {
// Checking if arg[0] is there and if it is a directory.
boolean passed = false;
if (args.length > 0) {
File f = new File(args[0]);
System.out.println("Indexing images in " + args[0]);
if (f.exists() && f.isDirectory()) passed = true;
}
if (!passed) {
System.out.println("No directory given as first argument.");
System.out.println("Run \"ParallelIndexing <directory>\" to index files of a directory.");
System.exit(1);
}
// use ParallelIndexer to index all photos from args[0] into "index" ... use 6 threads (actually 7 with the I/O thread).
ParallelIndexer indexer = new ParallelIndexer(6, "index", args[0]);
// use this to add your preferred extractors. For now we go for CEDD, FCTH and AutoColorCorrelogram
indexer.addExtractor(CEDD.class);
indexer.addExtractor(FCTH.class);
indexer.addExtractor(AutoColorCorrelogram.class);
indexer.run();
System.out.println("Finished indexing.");
}
}
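The thread layout described for Example 3 (one producer filling a queue, n consumers doing the CPU-heavy work) can be illustrated without LIRE. The following is a minimal sketch using only the JDK. The class name QueueSketch, the method processAll and the use of the string length as a stand-in for feature extraction are assumptions made for illustration; this is not how ParallelIndexer is implemented.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class QueueSketch {
    // Unique sentinel object telling a consumer that no more work will arrive.
    private static final String POISON = new String("POISON");

    /** Feeds all items through a bounded queue to n consumers; returns the sum of the item lengths. */
    static int processAll(List<String> items, int numConsumers) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        AtomicInteger total = new AtomicInteger();

        // Producer: only moves data into the queue (cheap, I/O-like work).
        Thread producer = new Thread(() -> {
            try {
                for (String item : items) queue.put(item);
                // one sentinel per consumer so that every consumer terminates
                for (int i = 0; i < numConsumers; i++) queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumers: do the "expensive" part; String.length() is just a placeholder.
        Thread[] consumers = new Thread[numConsumers];
        for (int i = 0; i < numConsumers; i++) {
            consumers[i] = new Thread(() -> {
                try {
                    while (true) {
                        String item = queue.take();
                        if (item == POISON) break; // identity check on the sentinel
                        total.addAndGet(item.length());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers[i].start();
        }
        producer.start();

        try {
            producer.join();
            for (Thread c : consumers) c.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total.get();
    }
}
```

Because the queue is bounded, the producer blocks whenever the consumers fall behind, which is why only the consumers (and, to a lesser degree, the producer) generate load.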
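Example 4: Parallel indexing with several different kinds of features. The parallel indexer can create an index with global and local features at the same time. This example extracts Global, Local and SIMPLE features together.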
/**
* Simple class showing the use of the ParallelIndexer, which uses up as much CPU as it can get.
* @author Mathias Lux, mathias@juggle.at and Nektarios Anagnostopoulos, nek.anag@gmail.com
*/
public class ParallelIndexing {
public static void main(String[] args) throws IOException {
// Checking if arg[0] is there and if it is a directory.
boolean passed = false;
if (args.length > 0) {
File f = new File(args[0]);
System.out.println("Indexing images in " + args[0]);
if (f.exists() && f.isDirectory()) passed = true;
}
if (!passed) {
System.out.println("No directory given as first argument.");
System.out.println("Run \"ParallelIndexing <directory>\" to index files of a directory.");
System.exit(1);
}
// use ParallelIndexer to index all photos from args[0] into "index".
int numOfDocsForVocabulary = 500;
Class<? extends AbstractAggregator> aggregator = BOVW.class;
int[] numOfClusters = new int[] {128, 512};
ParallelIndexer indexer = new ParallelIndexer(DocumentBuilder.NUM_OF_THREADS, "index", args[0], numOfClusters, numOfDocsForVocabulary, aggregator);
//Global
indexer.addExtractor(CEDD.class);
indexer.addExtractor(FCTH.class);
indexer.addExtractor(AutoColorCorrelogram.class);
//Local
indexer.addExtractor(CvSurfExtractor.class);
indexer.addExtractor(CvSiftExtractor.class);
//Simple
indexer.addExtractor(CEDD.class, SimpleExtractor.KeypointDetector.CVSURF);
indexer.addExtractor(JCD.class, SimpleExtractor.KeypointDetector.Random);
indexer.run();
System.out.println("Finished indexing.");
}
}
http://www.semanticmetadata.net/wiki/createindex/
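II. Searching with LIRE
Use GenericFastImageSearcher to create an ImageSearcher that retrieves images from the index, for example new GenericFastImageSearcher(30, CEDD.class) for CEDD. The ImageSearcher takes a query image, given either as a BufferedImage or as a Lucene Document, for instance with the method search(BufferedImage, IndexReader) or search(Document, IndexReader).
Note that the ImageSearcher uses a Lucene IndexReader and performs a linear search over the index. The results are returned as an ImageSearchHits object, which is meant to mimic the Lucene Hits object.
Note also that the ImageSearcher only uses image features that are available in the particular document in the index. If documents were indexed only with a fast DocumentBuilder mechanism, no ColorHistogram or EdgeHistogram feature is available in the indexed document, only the ColorLayout feature.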
public class Searcher {
public static void main(String[] args) throws IOException {
// Checking if arg[0] is there and if it is an image.
BufferedImage img = null;
boolean passed = false;
if (args.length > 0) {
File f = new File(args[0]);
if (f.exists()) {
try {
img = ImageIO.read(f);
passed = true;
} catch (IOException e) {
e.printStackTrace();
}
}
}
if (!passed) {
System.out.println("No image given as first argument.");
System.out.println("Run \"Searcher <query image>\" to search for <query image>.");
System.exit(1);
}
// "index" is the directory written by the indexing examples above
IndexReader ir = DirectoryReader.open(FSDirectory.open(Paths.get("index")));
ImageSearcher searcher = new GenericFastImageSearcher(30, CEDD.class);
// searching with an image ...
ImageSearchHits hits = searcher.search(img, ir);
// printing the score and file name of every hit
for (int i = 0; i < hits.length(); i++) {
String fileName = ir.document(hits.documentID(i)).getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0];
System.out.println(hits.score(i) + ": \t" + fileName);
}
}
}
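The linear search mentioned above (compare the query with every indexed document and keep the best maximumHits) can be sketched in plain Java. This is only an illustration under stated assumptions: feature vectors are plain double[] arrays, the L1 distance stands in for whatever distance a real feature defines, and the names LinearSearchSketch, Hit and search are hypothetical and not part of LIRE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class LinearSearchSketch {
    /** One hit: position of the document in the index and its distance to the query. */
    static final class Hit {
        final int docId;
        final double distance;
        Hit(int docId, double distance) { this.docId = docId; this.distance = distance; }
    }

    /** L1 (city block) distance; an assumption, real features define their own measure. */
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Scans the whole index once and returns at most maxHits hits, closest first. */
    static List<Hit> search(double[] query, double[][] index, int maxHits) {
        // Max-heap on distance: the worst of the current best hits sits on top.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((x, y) -> Double.compare(y.distance, x.distance));
        for (int doc = 0; doc < index.length; doc++) {
            heap.add(new Hit(doc, l1(query, index[doc])));
            if (heap.size() > maxHits) heap.poll(); // drop the current worst hit
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((x, y) -> Double.compare(x.distance, y.distance));
        return hits;
    }
}
```

The cost grows linearly with the number of indexed documents, which is why the filters section later talks about combining a fast, coarse search with re-ranking.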
http://www.semanticmetadata.net/wiki/searchindex/
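III. The SIMPLE descriptors
SIMPLE (Searching Images with MPEG-7 (& MPEG-7 like) Powered Localized dEscriptors) began as a collection of four descriptors: Simple-SCD, Simple-CLD, Simple-EHD and Simple-CEDD (or LoCATe). The main idea behind SIMPLE is to use global descriptors as local ones. To do this, the SURF detector is employed to define regions of interest in an image, and instead of the SURF descriptor, one of the MPEG-7 SCD, MPEG-7 CLD, MPEG-7 EHD and CEDD descriptors is used to extract the features of those image patches. Finally, the Bag-Of-Visual-Words framework is used to test the performance of these descriptors in CBIR tasks. More recently, SIMPLE was extended from a collection of descriptors to a scheme (a combination of a detector and a global descriptor). Tests have been carried out with other detectors (the SIFT detector and two random image patch generators; the random generator has produced the best results and is presented as the preferred choice), and the performance of the scheme with more global descriptors is currently being tested. Further details are given on the original wiki page.
To create an index, combine the desired GlobalFeature and SimpleExtractor.KeypointDetector with the ParallelIndexer as follows: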
ParallelIndexer parallelIndexer = new ParallelIndexer(DocumentBuilder.NUM_OF_THREADS, "test-index", "testdata/ferrari");
//parallelIndexer.addExtractor(CEDD.class, SimpleExtractor.KeypointDetector.CVSURF);
//parallelIndexer.addExtractor(CEDD.class, SimpleExtractor.KeypointDetector.CVSIFT);
parallelIndexer.addExtractor(CEDD.class, SimpleExtractor.KeypointDetector.Random);
//parallelIndexer.addExtractor(CEDD.class, SimpleExtractor.KeypointDetector.GaussRandom);
parallelIndexer.run();
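You can switch between the various global descriptors and the following detectors: {CVSURF, CVSIFT, Random, GaussRandom}. After that, you can continue with the search process using GenericFastImageSearcher as follows: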
new GenericFastImageSearcher(10, CEDD.class, SimpleExtractor.KeypointDetector.CVSURF, new BOVW(), numOfClusters,
true, reader, indexPath + ".config")
or for VLAD:
new GenericFastImageSearcher(10, CEDD.class, SimpleExtractor.KeypointDetector.CVSURF, new VLAD(), numOfClusters,
true, reader, indexPath + ".config")
http://www.semanticmetadata.net/wiki/simple/
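IV. The Auto Color Correlogram image feature
This descriptor is based on the publication Image Indexing Using Color Correlograms by Jing Huang et al. from Cornell University, and presents an alternative to the MPEG-7 descriptors. Main characteristics:
1. It is based on color (HSV color space).
2. It includes information about the correlation of colors in the image.
In my subjective opinion it gives better color search results than ScalableColor, but it is slower to extract (see the paper mentioned above for details).
For basic LIRE usage, see the sections on searching and on index creation.
To use the descriptor, try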
new GlobalDocumentBuilder(AutoColorCorrelogram.class)
To create a matching searcher, use
new GenericFastImageSearcher(maximumHits, AutoColorCorrelogram.class)
Important:
Make sure the analyzed images are large enough for the descriptor, otherwise the descriptor cannot be extracted.
public void testCorrelationSearch() throws IOException {
String[] testFiles = new String[]{"img01.JPG", "img02.JPG", "img03.JPG", "img04.JPG", "img05.JPG",
"img06.JPG", "img07.JPG", "img08.JPG", "img08a.JPG"};
String testFilesPath = "./src/test/resources/images/";
String indexPath = "test-index";
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(indexPath)));
int numDocs = reader.numDocs();
System.out.println("numDocs = " + numDocs);
// create the appropriate correlogram searcher:
ImageSearcher searcher = new GenericFastImageSearcher(10, AutoColorCorrelogram.class, true, reader);
FileInputStream imageStream = new FileInputStream(testFilesPath + testFiles[0]);
BufferedImage bimg = ImageIO.read(imageStream);
ImageSearchHits hits = null;
// search for the image in the index
hits = searcher.search(bimg, reader);
for (int i = 0; i < hits.length(); i++) {
System.out.println(hits.score(i) + ": " + reader.document(hits.documentID(i)).getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0]);
}
}
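The "correlation of colors" that this descriptor captures can be made concrete with a small sketch. For a given distance d it estimates, for each color c, how likely a pixel at distance d from a pixel of color c has the same color. The sketch assumes the image has already been quantized to color indices and uses the chessboard (L-infinity) distance; the class name CorrelogramSketch and its method are hypothetical and do not reproduce LIRE's AutoColorCorrelogram implementation.

```java
class CorrelogramSketch {
    /**
     * @param colors    quantized color index per pixel, values in [0, numColors)
     * @param numColors number of quantized colors
     * @param d         pixel distance (chessboard distance) to look at
     * @return for each color c, the fraction of pixels at distance d from a pixel
     *         of color c that also have color c (0 if no such neighbor exists)
     */
    static double[] autoCorrelogram(int[][] colors, int numColors, int d) {
        long[] same = new long[numColors];
        long[] all = new long[numColors];
        int h = colors.length, w = colors[0].length;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int c = colors[y][x];
                for (int dy = -d; dy <= d; dy++) {
                    for (int dx = -d; dx <= d; dx++) {
                        // keep only the ring at exactly distance d
                        if (Math.max(Math.abs(dx), Math.abs(dy)) != d) continue;
                        int ny = y + dy, nx = x + dx;
                        // neighbors outside the image are ignored
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        all[c]++;
                        if (colors[ny][nx] == c) same[c]++;
                    }
                }
            }
        }
        double[] result = new double[numColors];
        for (int c = 0; c < numColors; c++) {
            result[c] = all[c] == 0 ? 0.0 : (double) same[c] / all[c];
        }
        return result;
    }
}
```

The sketch also hints at why small images are a problem: when the image is smaller than the distances examined, most neighbors fall outside the image and the statistics become empty or unreliable.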
http://www.semanticmetadata.net/wiki/autocolorcorrelation/
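V. Filters
Basically, Lire lets you create a ranked list of results based on some similarity measure and low-level image features. Sometimes, however, you need a filtering or re-ranking step. A common use case is a Lire search over a large database with a fast but not very accurate feature (e.g. hashing), where the result list then has to be re-ranked based on a global feature. Re-ranking can also employ extended analysis methods such as LSA (latent semantic analysis, http://en.wikipedia.org/wiki/Latent_semantic_analysis).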
public void testRerankFilter() throws IOException {
// search
System.out.println("---< searching >-------------------------");
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(indexPath)));
Document document = reader.document(0);
ImageSearcher searcher = new GenericFastImageSearcher(100, AutoColorCorrelogram.class, true, reader);
ImageSearchHits hits = searcher.search(document, reader);
// rerank
System.out.println("---< filtering >-------------------------");
RerankFilter filter = new RerankFilter(ColorLayout.class, DocumentBuilder.FIELD_NAME_COLORLAYOUT);
hits = filter.filter(hits, reader, document);
// output
FileUtils.saveImageResultsToHtml("filtertest", hits, document.getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0], reader);
}
public void testLsaFilter() throws IOException {
// search
System.out.println("---< searching >-------------------------");
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(indexPath)));
Document document = reader.document(0);
ImageSearcher searcher = new GenericFastImageSearcher(100, AutoColorCorrelogram.class, true, reader);
ImageSearchHits hits = searcher.search(document, reader);
// rerank
System.out.println("---< filtering >-------------------------");
LsaFilter filter = new LsaFilter(CEDD.class, DocumentBuilder.FIELD_NAME_CEDD);
hits = filter.filter(hits, reader, document);
// output
FileUtils.saveImageResultsToHtml("filtertest", hits, document.getValues(DocumentBuilder.FIELD_NAME_IDENTIFIER)[0], reader);
}
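The re-ranking idea itself is small: take the candidates returned by a first, coarse search and sort only those candidates again by the distance of a second feature. The following plain-Java sketch illustrates this under stated assumptions: documents are identified by integer ids, the second feature is a double[] per document, and the Euclidean distance is used. The names RerankSketch and rerank are hypothetical and are not LIRE's RerankFilter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RerankSketch {
    /**
     * Reorders the candidate ids by the distance between their second-stage
     * feature and the query feature; documents outside the candidate list are never added.
     */
    static List<Integer> rerank(List<Integer> candidates, double[][] features, double[] query) {
        List<Integer> result = new ArrayList<>(candidates);
        result.sort(Comparator.comparingDouble((Integer id) -> euclidean(features[id], query)));
        return result;
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }
}
```

Since only the candidate list is touched, the second feature can be more expensive to compare than the first without affecting the cost of scanning the whole index.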
http://www.semanticmetadata.net/wiki/applyfilters/
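VI. Builders
With DocumentBuilders you can create a Lucene document from an image, using the builder's createDocument(BufferedImage, String) method, in order to add it to a Lucene index. There are three main DocumentBuilders: GlobalDocumentBuilder, LocalDocumentBuilder and SimpleDocumentBuilder. Each of them supports a different type of feature and, most importantly, each can handle multiple extractors before creating the Lucene document, via its addExtractor method. Finally, in both LocalDocumentBuilder and SimpleDocumentBuilder, each extractor is accompanied by a codebook, or a list of codebooks, which an aggregator needs in order to aggregate the local features into a vector representation.
Use GlobalDocumentBuilder to create a DocumentBuilder for global features. This builder can take any GlobalFeature implementation and create a builder class from it.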
GlobalDocumentBuilder globalDocumentBuilder = new GlobalDocumentBuilder(CEDD.class);
You can also add several GlobalFeatures with addExtractor. For example:
GlobalDocumentBuilder globalDocumentBuilder = new GlobalDocumentBuilder();
globalDocumentBuilder.addExtractor(CEDD.class);
globalDocumentBuilder.addExtractor(FCTH.class);
globalDocumentBuilder.addExtractor(AutoColorCorrelogram.class);
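Use LocalDocumentBuilder to create a DocumentBuilder for local features (such as SIFT or SURF). This builder can take any LocalFeatureExtractor implementation accompanied by a codebook or a list of codebooks, and create a builder class from it.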
LocalDocumentBuilder localDocumentBuilder = new LocalDocumentBuilder();
localDocumentBuilder.addExtractor(CvSurfExtractor.class, Cluster.readClusters("./src/test/resources/codebooks/CvSURF32"));
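Finally, there is one more builder available: SimpleDocumentBuilder can take any combination of a GlobalFeature and a SimpleExtractor.KeypointDetector, in order to localize the global descriptor according to SIMPLE.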
SimpleDocumentBuilder simpleDocumentBuilder = new SimpleDocumentBuilder();
simpleDocumentBuilder.addExtractor(CEDD.class, SimpleExtractor.KeypointDetector.CVSURF,
Cluster.readClusters("./src/test/resources/codebooks/SIMPLEdetCVSURFCEDD32"));
http://www.semanticmetadata.net/wiki/builders/
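VII. Aggregators
Lire supports indexing and searching with local features. When local features are extracted from an image, they need to be aggregated in order to create a vector representation of that image. Use the createVectorRepresentation() method of the BOVW or VLAD aggregators from the net.semanticmetadata.lire.aggregators package to create the vector representation according to the BOVW or VLAD model, from the list of local features of an image and a pre-computed codebook. After the vector representation has been created, the methods getVectorRepresentation and getByteVectorRepresentation return the vector as double[] or byte[] respectively.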
public void testAggregate() throws IOException {
String codebookPath = "./src/test/resources/codebooks/";
String imagePath = "./src/test/resources/images/";
LocalFeatureExtractor localFeatureExtractor = new CvSurfExtractor();
Aggregator aggregator = new BOVW();
Cluster[] codebook = Cluster.readClusters(codebookPath + "CvSURF128");
ArrayList<String> images = FileUtils.readFileLines(new File(imagePath), true);
BufferedImage image;
double[] featureVector;
List<? extends LocalFeature> listOfLocalFeatures;
for (String path : images) {
image = ImageIO.read(new FileInputStream(path));
localFeatureExtractor.extract(image);
listOfLocalFeatures = localFeatureExtractor.getFeatures();
aggregator.createVectorRepresentation(listOfLocalFeatures, codebook);
featureVector = aggregator.getVectorRepresentation();
// print only the file name, independent of the platform's path separator
System.out.println(new File(path).getName() + " ~ " + Arrays.toString(featureVector));
}
}
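What a bag-of-visual-words aggregation does with the local features and the codebook can be sketched in a few lines: every local feature votes for its nearest codebook entry, and the vote counts form the image's vector. The sketch below is a simplified illustration with plain double[] arrays and raw counts; the class name BovwSketch is hypothetical, and LIRE's BOVW aggregator may weight or normalize the result differently.

```java
class BovwSketch {
    /**
     * Builds a histogram with one bin per codebook entry; each local feature
     * increments the bin of its nearest centroid (squared Euclidean distance).
     */
    static double[] aggregate(double[][] localFeatures, double[][] codebook) {
        double[] histogram = new double[codebook.length];
        for (double[] feature : localFeatures) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int k = 0; k < codebook.length; k++) {
                double dist = 0;
                for (int i = 0; i < feature.length; i++) {
                    double diff = feature[i] - codebook[k][i];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = k;
                }
            }
            histogram[best]++;
        }
        return histogram;
    }
}
```

The resulting vector always has as many entries as the codebook, regardless of how many local features the image produced, which is what makes images comparable with an ordinary vector distance.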
http://www.semanticmetadata.net/wiki/aggregators/
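VIII. Features
Lire supports global features as well as local features. There is an appropriate interface for each type of feature. Below is a brief description of the available interfaces, which you can follow to implement and integrate your own features.
LireFeature is the basic interface for all content-based features and extends the FeatureVector interface. FeatureVector contains the getFeatureVector() method, which returns the feature vector as a double[] array. LireFeature is in turn extended by two interfaces, GlobalFeature and LocalFeature.
Global features, such as CEDD or AutoColorCorrelogram, should implement the GlobalFeature interface. This interface extends both the LireFeature and the Extractor interface, which means a global feature contains the feature data as well as the extract() method.
Local feature extractors, on the other hand, create many local features, so two interfaces are used in that case: LocalFeatureExtractor, which extends the Extractor interface and contains the extract() method, and LocalFeature, which extends the LireFeature interface.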
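The relationship described for global features (one type that is both a feature and an extractor) can be mirrored in a toy hierarchy. All names below are hypothetical stand-ins prefixed with Sketch, the interfaces are reduced to one method each, and extract() takes a gray-value array instead of an image; LIRE's real interfaces declare more methods and different signatures.

```java
// Hypothetical, simplified stand-ins for the interfaces described above.
interface SketchFeatureVector {
    double[] getFeatureVector();
}

interface SketchExtractor {
    void extract(int[][] grayPixels);
}

interface SketchLireFeature extends SketchFeatureVector {
    String getFeatureName();
}

// A global feature is a feature and an extractor at the same time.
interface SketchGlobalFeature extends SketchLireFeature, SketchExtractor {
}

/** Toy global feature: the mean gray value of the whole image. */
class MeanIntensity implements SketchGlobalFeature {
    private double mean;

    @Override
    public void extract(int[][] grayPixels) {
        long sum = 0;
        long count = 0;
        for (int[] row : grayPixels) {
            for (int value : row) {
                sum += value;
                count++;
            }
        }
        mean = count == 0 ? 0.0 : (double) sum / count;
    }

    @Override
    public double[] getFeatureVector() {
        return new double[]{mean};
    }

    @Override
    public String getFeatureName() {
        return "MeanIntensity";
    }
}
```

A local feature setup would split the two roles instead: one extractor object that yields a list of many small feature objects per image.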
http://www.semanticmetadata.net/wiki/features/