Question:

Elasticsearch indexes only the most recently added database record and ignores records added earlier

冀弘济

2023-03-14

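I have installed Logstash and Elasticsearch on my Windows machine with the configuration below (with this configuration, Logstash polls the customer table for records every minute).

#1 Logstash configuration file that loads the customer table data and indexes it (Logstash-config.conf)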

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
        jdbc_driver_library => "C:\\org\\postgresql\\postgresql\\42.2.11\\postgresql-42.2.11.jar"
        jdbc_user => "test"
        jdbc_password => "test"
        jdbc_driver_class => "org.postgresql.Driver"
        schedule => "* * * * *"
        statement => "SELECT * FROM public.customer where id >:sql_last_value"
        tracking_column_type => "numeric"
        use_column_value => true
        tracking_column => id
    }
}
output {
    elasticsearch {
        index => "customer_index"
        document_type => "customer"
        document_id => "%{id}"
        hosts => "localhost:9200"
    }

    stdout {
        codec => rubydebug
    }
}

#2 Create a table with some records in the database

create table customer (id integer,name varchar);

select * from customer;
insert into customer values (1,'test1');
insert into customer values (2,'test2');

#4 Call the GET API

4.1 Returns no records
http://localhost:9200/customer_index/_search?q=1

e.g.

    {
    "took": 86,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

4.2 Returns a record
http://localhost:9200/customer_index/_search?q=2

e.g.

    {
    "took": 371,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "customer_index",
                "_type": "customer",
                "_id": "%{customer_id}",
                "_score": 1.0,
                "_source": {
                    "name": "test2",
                    "id": 17,
                    "@timestamp": "2020-07-06T06:41:00.343Z",
                    "@version": "1"
                }
            }
        ]
    }
}

**Also, what is customer_id here, and how can I get the whole record? I indexed the whole row, i.e. select * from customer, so I should get all columns.**
4.3 It looks like the index contains only the record that was added to it last,
e.g. if I execute this in the database:
insert into customer values (2,'test2');
then
http://localhost:9200/customer_index/_search?q=2
will not return a record.
4.4 However, this returns a record:
http://localhost:9200/customer_index/_search?q=3
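
On the customer_id question: the `_id` in the 4.2 response is the literal text `%{customer_id}`. Logstash leaves a `%{field}` reference unexpanded when the event has no field of that name, so if the pipeline that produced this response had `document_id => "%{customer_id}"` (an assumption, since the config as posted uses `%{id}`), every row would be written under the same document ID and each one would replace the one before it. A rough Python sketch of that effect, where `sprintf` and `index_all` are hypothetical helpers and not Logstash or Elasticsearch APIs:

```python
import re


def sprintf(template, event):
    """Expand %{field} references; keep the reference as-is if the field is missing."""
    def expand(match):
        field = match.group(1)
        return str(event[field]) if field in event else match.group(0)
    return re.sub(r"%\{(\w+)\}", expand, template)


def index_all(events, id_template):
    """Mimic an index keyed by document ID: a repeated ID overwrites the earlier document."""
    index = {}
    for event in events:
        index[sprintf(id_template, event)] = event
    return index


events = [{"id": 1, "name": "test1"}, {"id": 2, "name": "test2"}]

# No event has a customer_id field, so both rows end up under one literal ID.
by_missing_field = index_all(events, "%{customer_id}")
# Every event has an id field, so each row gets its own document.
by_id = index_all(events, "%{id}")

print(by_missing_field)
print(by_id)
```

In this sketch the first index ends up holding only the last row, while the second keeps both rows under the IDs "1" and "2".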

1 answer

薛阳荣

2023-03-14

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
        jdbc_driver_library => "C:\\org\\postgresql\\postgresql\\42.2.11\\postgresql-42.2.11.jar"
        jdbc_user => "test"
        jdbc_password => "test"
        jdbc_driver_class => "org.postgresql.Driver"
        schedule => "* * * * *"
        statement => "SELECT * FROM public.customer where id >:sql_last_value"
        tracking_column_type => "numeric"
        use_column_value => true
        tracking_column => id
        # Added line: file in which the last tracked id is stored between runs
        last_run_metadata_path => "C:\\logstash\\.sch_id_tracker_file"
    }
}

Here last_run_metadata_path points to a file in which Logstash stores the value of the tracking column from its last run. When the schedule fires again, only records whose value is greater than that, i.e. id >:sql_last_value, are selected, processed, and pushed to Elasticsearch.
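
The tracking behaviour described here can be pictured with a small simulation. This is only a sketch in Python, not Logstash's own code: the `poll` function, the JSON state file, and the `test3` row are made up for illustration (Logstash uses its own format for the file at last_run_metadata_path), and the starting value of 0 is assumed for a numeric tracking column.

```python
import json
import os
import tempfile


def poll(rows, state_path):
    """Return rows with id greater than the stored last value, then store the new maximum."""
    last_id = 0  # assumed starting point for a numeric tracking column
    if os.path.exists(state_path):
        with open(state_path) as fh:
            last_id = json.load(fh)["last_id"]

    # Equivalent of: SELECT * FROM customer WHERE id > :sql_last_value
    fresh = [row for row in rows if row["id"] > last_id]

    if fresh:
        with open(state_path, "w") as fh:
            json.dump({"last_id": max(row["id"] for row in fresh)}, fh)
    return fresh


# Hypothetical state file standing in for last_run_metadata_path.
state_path = os.path.join(tempfile.mkdtemp(), "sch_id_tracker_file.json")

table = [{"id": 1, "name": "test1"}, {"id": 2, "name": "test2"}]
first_run = poll(table, state_path)

table.append({"id": 3, "name": "test3"})
second_run = poll(table, state_path)

print([row["id"] for row in first_run])
print([row["id"] for row in second_run])
```

The first run picks up ids 1 and 2, and the second run picks up only id 3. A row inserted later with an id that is not greater than the stored value would not be selected at all, which is worth keeping in mind when testing with repeated ids.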
