Question:

Error while indexing files

梁丘成和

2023-03-14

package lia.meetlucene;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.FileReader;

public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new IllegalArgumentException("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }
    String indexDir = args[0];         //1
    String dataDir = args[1];          //2

    long start = System.currentTimeMillis();
    Indexer indexer = new Indexer(indexDir);
    int numIndexed;
    try {
      numIndexed = indexer.index(dataDir);
    } finally {
      indexer.close();
    }
    long end = System.currentTimeMillis();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  private IndexWriter writer;

  public Indexer(String indexDir) throws IOException {
    Directory dir = FSDirectory.open(new File(indexDir));
    writer = new IndexWriter(dir,            //3
                 new StandardAnalyzer(       //3
                     Version.LUCENE_30),//3
                 true,                       //3
                             IndexWriter.MaxFieldLength.UNLIMITED); //3
  }

  public void close() throws IOException {
    writer.close();                             //4
  }

  public int index(String dataDir)
    throws Exception {
    try {
      File[] files = new File(dataDir).listFiles();

      for (File f : files) {
        // ---- I added this if block, which is causing the error ----
        if (f.isDirectory()) {
          index(dataDir);
        }
        // ------------------------------------------------------------
        else if (!f.isDirectory() &&
            !f.isHidden() &&
            f.exists() &&
            f.canRead()) {
          indexFile(f);
        }
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
    return writer.numDocs();                     //5
  }


  protected Document getDocument(File f) throws Exception {
    Document doc = new Document();
    doc.add(new Field("contents", new FileReader(f)));      //7
    doc.add(new Field("filename", f.getName(),              //8
                Field.Store.YES, Field.Index.NOT_ANALYZED));//8
    doc.add(new Field("fullpath", f.getCanonicalPath(),     //9
                Field.Store.YES, Field.Index.NOT_ANALYZED));//9
    return doc;
  }

  private void indexFile(File f) throws Exception {
    System.out.println("Indexing " + f.getCanonicalPath());
    Document doc = getDocument(f);
    writer.addDocument(doc);                              //10
  }
}

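This is a program from the book Lucene in Action. It indexes only the files in the parent folder, not the files in its subfolders, so I added an if block to look for files in subfolders recursively. After I run the program, it creates a write.lock file and keeps creating index files even after I close the command prompt. What is wrong with the code?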

I am new to Lucene and Java. Earlier I tried to use Apache Commons IO to find the subfolders, but I got a "package does not exist" error (package org.apache.commons.io does not exist).

1 answer

左丘昊天

2023-03-14

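Yes, it keeps running because you keep walking the same path: the recursive call passes dataDir, so it lists the same directory again and finds the same subfolder again. As a result you never reach the close() method, which is why the write.lock file stays in place.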

Here is your current code:

if(f.isDirectory())           
{
     index(dataDir); // dataDir is the original path
}

You have to do this instead:

if(f.isDirectory())           
{
     index(f.getAbsolutePath());
}
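
To see the corrected recursion on its own, here is a minimal sketch that runs without Lucene on the classpath. The class name RecursiveWalkSketch and the methods walk and visit are hypothetical names made up for this illustration; visit stands in for indexFile(f) and only records the file. The null check on listFiles() is an extra precaution that the original code does not have, since listFiles() returns null when the path is not a directory or cannot be read.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RecursiveWalkSketch {

  // Hypothetical stand-in for indexFile(f): it only records the file,
  // so the sketch does not need Lucene to run.
  static void visit(File f, List<File> seen) {
    seen.add(f);
  }

  // Recurse into each subdirectory f itself, never into the starting dir again.
  static void walk(File dir, List<File> seen) {
    File[] files = dir.listFiles();
    if (files == null) {  // not a directory, or it could not be read
      return;
    }
    for (File f : files) {
      if (f.isDirectory()) {
        walk(f, seen);    // the fix: pass the subdirectory, not the original path
      } else if (!f.isHidden() && f.canRead()) {
        visit(f, seen);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a small temporary tree: root/a.txt and root/sub/b.txt
    Path root = Files.createTempDirectory("walk-sketch");
    Files.createFile(root.resolve("a.txt"));
    Path sub = Files.createDirectory(root.resolve("sub"));
    Files.createFile(sub.resolve("b.txt"));

    List<File> seen = new ArrayList<>();
    walk(root.toFile(), seen);
    System.out.println("Visited " + seen.size() + " files");
  }
}
```

In the Indexer class the same shape applies: index(f.getAbsolutePath()) plays the role of walk(f, seen), and indexFile(f) plays the role of visit. Because the recursion now ends once every subfolder has been visited, control returns to main and the finally block can call indexer.close().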
