Translation of the two configuration files of the regain search tool



| Configuration for the regain crawler (for creating a search index)
| You can find a detailed description of all configuration tags here:
| http://regain.murfman.de/wiki/en/index.php/CrawlerConfiguration.xml
| You can find more configuration examples in CrawlerConfiguration_examples.xml.

| Enter your HTTP proxy settings here (Look at the preferences of your browser)

| The list of URLs where the spidering will start.
| Enter the start page of your web site or a file system folder here.
| NOTE: The examples are in a comment. Thus, if you add your path in one of
| them, then don't forget to uncomment them.
<!-- Directory parsing -->
<start parse="true" index="false">file://c:/Eigene Dateien</start>
Set the location of the documents to be indexed:
file://E:/eclipse 3.2/workspace/SIS/WebRoot/FileDepository ${SEARCHDIR}
<start index="false" parse="true">file://${WORKDIR}FileDepository</start>
<!-- HTML parsing -->
<start parse="true" index="true">http://www.mydomain.de/some/path/</start>

| The whitelist contains the prefixes a URL must have to be processed.
| Enter the domain of your web site here.

| The blacklist contains the prefixes a URL must NOT have to be processed.
| Enter sub directories you don't want to be indexed here.
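The interplay of the two lists can be sketched in Python as follows. This is only an illustration of the prefix filtering described above, assuming a URL is processed when it starts with at least one whitelist prefix and with no blacklist prefix. The function name and the example prefixes are hypothetical and not part of regain.

```python
def is_url_accepted(url, whitelist, blacklist):
    """Return True if url has a whitelisted prefix and no blacklisted prefix."""
    on_whitelist = any(url.startswith(prefix) for prefix in whitelist)
    on_blacklist = any(url.startswith(prefix) for prefix in blacklist)
    return on_whitelist and not on_blacklist


# Hypothetical prefixes, for illustration only.
whitelist = ["http://www.mydomain.de/", "file://"]
blacklist = ["http://www.mydomain.de/some/dynamic/content/"]

accepted = is_url_accepted(
    "http://www.mydomain.de/some/path/index.html", whitelist, blacklist
)
rejected = is_url_accepted(
    "http://www.mydomain.de/some/dynamic/content/page.html", whitelist, blacklist
)
```

A URL from a domain that is not whitelisted is rejected as well, because it matches no whitelist prefix.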

| ==================================================================================
| That's all you have to configure! The rest of this file is advanced configuration.
| ==================================================================================

| The preferences for the search index.
The directory where the index should be located ${SEARCHDIR}
| Specifies the analyzer type to use.
| You may specify the class name of the analyzer or you use one of the
| following aliases:
| * english: For the english language
| (alias for org.apache.lucene.analysis.standard.StandardAnalyzer)
| * german: For the german language
| (alias for org.apache.lucene.analysis.de.GermanAnalyzer)
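The alias rule above amounts to a simple lookup. The sketch below is an assumption about how such a lookup could work, not regain's actual code: known aliases map to the Lucene class names listed above, and any other value is treated as a class name. The function name is hypothetical.

```python
# Aliases listed in the comment above, mapped to their Lucene analyzer classes.
ANALYZER_ALIASES = {
    "english": "org.apache.lucene.analysis.standard.StandardAnalyzer",
    "german": "org.apache.lucene.analysis.de.GermanAnalyzer",
}


def resolve_analyzer(analyzer_type):
    """Return the class name for an alias; other values pass through unchanged."""
    return ANALYZER_ALIASES.get(analyzer_type, analyzer_type)


german_class = resolve_analyzer("german")
custom_class = resolve_analyzer("mypackage.MyAnalyzer")  # hypothetical class name
```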


| Contains all words that should not be indexed.
| Separate the words by a blank.
einer eine eines einem einen der die das dass daß du er sie es was wer wie
wir und oder ohne mit am im in aus auf ist sein war wird ihr ihre ihres als
für von mit dich dir mich mir mein sein kein durch wegen wird
<!-- italian:
di a da in con su per tra fra io tu egli ella essa noi voi essi loro che cui
se e né anche inoltre neanche o ovvero oppure ma però eppure anzi invece
bensì tuttavia quindi dunque perciò pertanto cioè infatti ossia non come
mentre perché quando mio mia miei mie tuo tua tuoi tue suo sua suoi sue
nostro nostre nostri nostre vostro vostre vostri vostre il lo la i gli le un
uno una degli delle alcuno alcuna alcune qualcuno qualcuna nessuno nessuna
molto molte molti molte poco parecchio assai

| Contains all words that should not be changed by an analyser when indexed.
| Separate the words by a blank.

| The names of the fields of which to prefetch the distinct values.
| Separate the field names by a blank.
| Put in the names of the fields you use a search:input_fieldlist tag for.
| The values shown in the list will then be extracted by the crawler and not
| by the search mask, which prevents a slow first loading of a page for huge
| indexes.

| Specifies whether the whole content should be stored in the index for the
| purpose of a content preview


| The preparators in the order they should be applied. Preparators that aren't listed
| here will be applied after the listed ones.
| You can use this list...
| ... to define the priority (= order) of the preparators
| ... to disable preparators
| ... to configure preparators
| Enable this preparator if you want to use the text extractor of
| Microsoft Windows. This preparator is able to read tons of file formats.
| NOTE: Under Windows 2000 you have to make sure that reg.exe is installed
| (It's part of the "Support Tools").
| For details see: http://support.microsoft.com/kb/301423
<preparator enabled="false">

| Enable this preparator if you want to use MS Excel for indexing your Excel
| documents.
<preparator enabled="false">

| Enable this preparator if you want to use MS Word for indexing your Word
| documents.
<preparator enabled="false">

| Enable this preparator if you want to use MS Powerpoint for indexing your
| Powerpoint documents.
<preparator enabled="false">

| This tells regain that it should first try the SimpleRtfPreparator for RTF
| files. Only if this one fails the SwingRtfPreparator is used
| (which is much slower).

| This preparator may be used if you have an external program that can
| extract text. It's disabled by default.
<preparator enabled="false">
<section name="command">
<param name="urlPattern">\.ps$</param>
<param name="commandLine">ps2ascii ${filename}</param>
<param name="checkExitCode">false</param>
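How the two parameters above work together can be sketched in Python. This is an illustration under assumptions, not regain's implementation: the URL is tested against `urlPattern`, and on a match the `${filename}` placeholder in `commandLine` is replaced with the file name. The function name is hypothetical, and quoting of the file name and the actual process launch are left out.

```python
import re

URL_PATTERN = re.compile(r"\.ps$")        # urlPattern from the config above
COMMAND_LINE = "ps2ascii ${filename}"     # commandLine from the config above


def build_command(url, filename):
    """Return the command line for a matching URL, or None if it does not match."""
    if URL_PATTERN.search(url) is None:
        return None
    # Plain placeholder substitution; no shell quoting is applied here.
    return COMMAND_LINE.replace("${filename}", filename)


ps_command = build_command("file://c:/docs/paper.ps", "c:/docs/paper.ps")
pdf_command = build_command("file://c:/docs/paper.pdf", "c:/docs/paper.pdf")
```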

Catch-all preparator based on the EmptyPreparator
<preparator priority="-10">

| The index may be extended with auxiliary fields. These are fields that have
| been generated from the URL of a document.
| Example: If you have a directory with a sub directory for every project,
| then you may create a field with the project's name.
| The following tag will create a field "project" with the value "otto23"
| from the URL "file://c:/projects/otto23/docs/Spez.doc":
| <auxiliaryField name="project" regexGroup="1">
| <regex>^file://c:/projects/([^/]*)</regex>
| </auxiliaryField>
| URLs that don't match will get no "project" field.
| Having done this you may search for "Offer project:otto23" and you will get
| only hits from this project directory.
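The "project" example above can be reproduced with a few lines of Python. The regex and the URL are taken from the comment. The helper name is hypothetical, and the sketch only shows the regex-group extraction, not how regain stores the field in the index.

```python
import re

# Regex from the auxiliaryField example above; group 1 is the project name.
PROJECT_REGEX = re.compile(r"^file://c:/projects/([^/]*)")


def extract_aux_field(url, regex, group=1):
    """Return the captured group, or None when the URL does not match."""
    match = regex.search(url)
    return match.group(group) if match else None


project = extract_aux_field("file://c:/projects/otto23/docs/Spez.doc", PROJECT_REGEX)
no_project = extract_aux_field("file://c:/other/readme.txt", PROJECT_REGEX)
```

The second URL does not match, so it yields no value, which corresponds to a document without a "project" field.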
Don't change these two fields. But you may add your own.
<auxiliaryField name="extension" regexGroup="1" toLowercase="true">
<auxiliaryField name="location" regexGroup="1" store="false" tokenize="true">
<auxiliaryField name="mimetype" regexGroup="1" >

<!-- The regular expressions that identify URLs in HTML. -->
<!-- This configuration part is no longer necessary -->
<pattern parse="true" index="true" regexGroup="1">="([^"]*(/|htm|html|jsp|php\d?|asp))"</pattern>
<pattern parse="false" index="false" regexGroup="1">="([^"]*\.(js|css|jpg|gif|png))"</pattern>
<pattern parse="false" index="true" regexGroup="1">="([^"]*\.[^\."]{3})"</pattern>
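To see what the three patterns above do, the following Python sketch applies them to single HTML attributes. The regexes and the parse/index flags are copied from the pattern entries. The function name is hypothetical, and the rule that patterns are tried in order with the first match winning is an assumption made for this illustration.

```python
import re

# (regex, parse, index) triples copied from the <pattern> entries above.
PATTERNS = [
    (re.compile(r'="([^"]*(/|htm|html|jsp|php\d?|asp))"'), True, True),
    (re.compile(r'="([^"]*\.(js|css|jpg|gif|png))"'), False, False),
    (re.compile(r'="([^"]*\.[^\."]{3})"'), False, True),
]


def classify_link(html):
    """Return (url, parse, index) for the first matching pattern, else None.

    Assumption: patterns are tried in order and the first match wins.
    """
    for regex, parse, index in PATTERNS:
        match = regex.search(html)
        if match:
            return match.group(1), parse, index
    return None


page = classify_link('<a href="docs/index.html">')
image = classify_link('<img src="logo.png">')
pdf = classify_link('<a href="report.pdf">')
```

Under this assumption, HTML pages are both parsed and indexed, scripts, stylesheets and images are skipped, and other files with a three-character extension are indexed without being parsed for further links.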

| Configuration for the regain search mask.
| Normally you only have to specify the directory where the search index is
| located. You do this in the <dir> tag of the <index name="main"> (line 74).

| You can find a detailed description of all configuration tags here:
| http://regain.murfman.de/wiki/en/index.php/SearchConfiguration.xml


<!-- The search indexes -->
| All settings defined in this section are applied to all indexes unless
| they redefine the setting.
1. <defaultSettings>: The cascaded default settings.
2. <index>: The settings for one index.
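The cascade described above behaves like overlaying one dictionary on another. The Python sketch below illustrates that idea only. The function name and all values are hypothetical; "searchFieldList" and "dir" are tag names that appear in this file, but regain's real merging logic is not shown in the source.

```python
def effective_settings(default_settings, index_settings):
    """Overlay one index's settings on the shared defaults."""
    merged = dict(default_settings)   # start from the cascaded defaults
    merged.update(index_settings)     # values redefined by the index win
    return merged


# Hypothetical values, for illustration only.
defaults = {"searchFieldList": "content title headlines location filename"}
main_index = {"dir": "searchindex"}
example_index = {"dir": "c:/Temp/searchindex", "searchFieldList": "content title"}

main_settings = effective_settings(defaults, main_index)
example_settings = effective_settings(defaults, example_index)
```

Here the main index inherits the default field list, while the example index redefines it.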

| The regular expression that identifies URLs that should be opened in
| a new window.

| Specifies whether the file-to-http-bridge should be used for file-URLs.
| Mozilla browsers have a security mechanism that blocks loading file-URLs
| from pages loaded via http. To be able to load files from the search
| results, regain offers the file-to-http-bridge that provides all files that
| are listed in the index via http.

| The index fields to search by default.
| NOTE: The user may search in other fields also using the
| "field:"-operator. Read the lucene query syntax for details:
| http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
<searchFieldList>content title headlines location filename</searchFieldList>
| The SearchAccessController to use.
| This is a part of the access control system that ensures that only those
| documents are shown in the search results that the user is allowed to
| read.
| If you specify a SearchAccessController, don't forget to specify the
| CrawlerAccessController counterpart in the CrawlerConfiguration.xml!
<class jar="myAccess.jar">mypackage.MySearchAccessController</class>
<param name="bla">blubb</param>
| Specifies whether the search terms should be highlighted within the
| search results (summary, title)


<!-- The search index 'main' -->
<index name="main" default="true" isparent="true">
The directory where the index is located
| A child index of 'main'
<index name="main1" default="true" isparent="false" parent="main">

<!-- The search index 'example' -->
<index name="example">
<!-- The directory where the index is located -->

<rule prefix="file://c:/example/www-data" replacement="http://www.mydomain.de"/>
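The effect of such a rewrite rule can be sketched in Python. The prefix and replacement are taken from the rule above. The function name and the example file path are hypothetical, and the sketch assumes a plain prefix substitution that leaves non-matching URLs unchanged.

```python
def rewrite_url(url, prefix, replacement):
    """Replace a leading prefix with the replacement; leave other URLs as they are."""
    if url.startswith(prefix):
        return replacement + url[len(prefix):]
    return url


PREFIX = "file://c:/example/www-data"
REPLACEMENT = "http://www.mydomain.de"

rewritten = rewrite_url("file://c:/example/www-data/docs/index.html", PREFIX, REPLACEMENT)
untouched = rewrite_url("file://c:/other/readme.txt", PREFIX, REPLACEMENT)
```

With this rule, a document indexed from the local file system is shown in the search results under its public http address.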

