当前位置: 首页 > 工具软件 > sql4es > 使用案例 >

使用SQL语言操作Elasticsearch

封德华
2023-12-01

Elasticsearch在Basic授权中支持以SQL语句的形式检索文档,SQL语句在执行时会被翻译为DSL执行。从语法的角度来看,Elastisearch中的SQL语句与RDBMS中的SQL语句基本一致,所以对于有数据库编程基础的人来说大大降低了使用 Elaticsearch的学习成本。

Elasticsearch提供了多种执行SQL语句的方法,可使用类似_search样的REST接口执行也可以通过命令行执行。它甚至还提供了JDBC和ODBC驱动来执行SQL语句,但JDBC和ODBC属于Platinum(白金版)授权需要付费,所以这里只介绍_sql接口。

sql接口

在早期版本中,Elasticsearch执行SQL的REST接口为_xpack/sql,但在版本7以后这个接口已经被废止而推荐使用_sql接口。

例如:

POST _sql?format=txt
{
  "query": """
select DestCountry, OriginCountry,AvgTicketPrice
from kibana_sample_data_flights
where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc limit 3
"""
}
输出结果:
  DestCountry  | OriginCountry |  AvgTicketPrice  
---------------+---------------+------------------
US             |PR             |1199.109130859375 
JP             |IN             |1196.7706298828125
AR             |CO             |1195.72509765625  

在示例中,_sql接口通过query参数接收SQL语句,而SQL语句也包含有select、from、where、order by、limit等子句。_sql接口的URL请求参数format定义了返回结果格式。比如在示例中定义了返回结果格式为txt。除了txt以外,_sqI接口还支持csv、json、tsv、yaml等等格式。示例中的请求会将所有航空公同为Kibana Airines的航班文档检索出来,并以文本表格的形式返回。

对于总量比较大的SOL查询,sql接口还支持以游标的形式实现分页。当_sql接口的请求参数中添加了fetch_size参数,_sql接口在返回结来时就会根据fetch_size参数设置的大小返回相应的条数,并在返回结果中添加游标标识。具体来说,当请求_sql接口时设置的forma为json时,返回结果中会包含cursor属性;而其他情况下则会在响应中添加Cursor报头。

例如还是执行示例中的SOL ,但是加入分页支持:

POST _sql?format=json
{
  "query": """
select DestCountry, OriginCountry,AvgTicketPrice
from kibana_sample_data_flights
where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc
""",
"fetch_size": 3
}
输出结果:
{
  "columns" : [
    {
      "name" : "DestCountry",
      "type" : "keyword"
    },
    {
      "name" : "OriginCountry",
      "type" : "keyword"
    },
    {
      "name" : "AvgTicketPrice",
      "type" : "float"
    }
  ],
  "rows" : [
    [
      "US",
      "PR",
      1199.109130859375
    ],
    [
      "JP",
      "IN",
      1196.7706298828125
    ],
    [
      "AR",
      "CO",
      1195.72509765625
    ]
  ],
  "cursor" : "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBUl9xNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}

在示例的请求中,将format设置为json,这样在返回结果中就能直接看到cursor值,当需要请求下一页的数据时,只需传递cursor值即可进行查询:

POST _sql?format=json
{
  "cursor": "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}
输出结果:
{
  "rows" : [
    [
      "IT",
      "CA",
      1195.3363037109375
    ],
    [
      "KR",
      "NL",
      1194.945556640625
    ],
    [
      "JP",
      "CO",
      1194.386962890625
    ]
  ],
  "cursor" : "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}

反复执行请求,Elasticsearch就会将第一次请求的全部内容以每次10个的数量全部迭代出来。在请求完所有数据后,应该使用_sql/close接口将游标关闭以释放资源。

POST _sql/close?format=json
{
  "cursor": "49itAwFaAXNARFhGMVpYSjVRVzVrUm1WMFkyZ0JBQUFBQUFBU0JQNFdlbFYzTWxWaGFXaFRlV1Z6UlhONGFtNXhOVzlWVVE9Pf8PAwFmC0Rlc3RDb3VudHJ5AQtEZXN0Q291bnRyeQEHa2V5d29yZAEAAAFmDU9yaWdpbkNvdW50cnkBDU9yaWdpbkNvdW50cnkBB2tleXdvcmQBAAABZg5BdmdUaWNrZXRQcmljZQEOQXZnVGlja2V0UHJpY2UBBWZsb2F0AAAAAQc="
}

除了fetch_size以外还有些可以在_sql接口请求体中使用的参数,如下:

  • query:需要执行的SQL语句,必须要设置的参数
  • fetch_size:默认1000,每次返回的行数
  • filter:默认none,使用DSL设置过滤器
  • request_timeout:默认90s,请求超时时间
  • page_timeout:默认45s,分页超时时间
  • tume_zone:默认Z,时区
  • field_multi_value_leniency:默认false,如果一个字段返回多个值时是否忽略

在这些参数中,fiter可以使用DSL对文档做过滤,支持DSL中介绍的所有查询条件。query中的SQL语句在翻译为DSL后,会与filter中的DSL查询语句共同组合到bool查询中。其中SQL语句生成的DSL将出现在must子句,而filter中的DSL则出现在filter子句中。来想要查看SQL语句翻译后的DSL可以使用_sql/translate执行相同的请求,在返回结果中就可以看到翻译后的DSL了。

post _sql/translate
{
  "query": """
select DestCountry, OriginCountry,AvgTicketPrice
from kibana_sample_data_flights
where Carrier = 'Kibana Airlines' order by AvgTicketPrice desc
""",
"fetch_size": 3
}
输出结果:
{
  "size" : 3,
  "query" : {
    "term" : {
      "Carrier" : {
        "value" : "Kibana Airlines",
        "boost" : 1.0
      }
    }
  },
  "_source" : {
    "includes" : [
      "AvgTicketPrice"
    ],
    "excludes" : [ ]
  },
  "docvalue_fields" : [
    {
      "field" : "DestCountry"
    },
    {
      "field" : "OriginCountry"
    }
  ],
  "sort" : [
    {
      "AvgTicketPrice" : {
        "order" : "desc",
        "missing" : "_first",
        "unmapped_type" : "float"
      }
    }
  ]
}

SQL语法

Elasticsearch支持传统关系型数据库SQL语句中的查询语句,但并不支持DML、DCL句。换句话说,它只支持SELECT语句,不支持INSERT、UPDATE、DELETE语句。SELECT语句以外,Elaticsarch还支持DESCRIBE和SHOW语句。

SELECT语句

SELECT语句用于查询文档,基本语法格式如下:

SELECT select_expr,
[ FROM table_name」
[WHERE condtion]
[ GROUP BY grouping_element]
[ HAVING condition J
[ ORDER BY expression [ASC|DESC]
[ LIMIT[ count]]

通过示例可以看出,Elastiesearch的SELECT语句跟普通SQL几乎没有什么区别,支持SELECT、FROM、WHERE、 GROUP BY、HAVING、ORDER BY及LIMTT子句。

SELECT子句中可以使用星号或文档字段名称列表,FROM子句则指定要检索的索引名称,而WHERE子句则设定了检索的条件。一般的SQL查询使用这三个子句就足够了,而GROUP BY和HAVING子句则用于分组,ORDER BY子句用于排
序,而LIMIT一般则可以用于分页。和传统SQL语句非常接近。

DESCRIBE语句

DESCRIBE语句用于查看一个索引的基础信息,在返回结果中一般会包含column、type、mapping三个列,分别对应文档的字段名称、传统数据库类型及文档字段中的类型。

例如要查看索引的基本信息:

POST _sql?format=txt
{
  "query": "describe kibana_sample_data_flights"
}
输出结果:
      column      |     type      |    mapping    
------------------+---------------+---------------
AvgTicketPrice    |REAL           |float          
Cancelled         |BOOLEAN        |boolean        
Carrier           |VARCHAR        |keyword        
Dest              |VARCHAR        |keyword        
DestAirportID     |VARCHAR        |keyword        
DestCityName      |VARCHAR        |keyword        
DestCountry       |VARCHAR        |keyword        
DestLocation      |GEOMETRY       |geo_point      
DestRegion        |VARCHAR        |keyword        
DestWeather       |VARCHAR        |keyword        
DistanceKilometers|REAL           |float          
DistanceMiles     |REAL           |float          
FlightDelay       |BOOLEAN        |boolean        
FlightDelayMin    |INTEGER        |integer        
FlightDelayType   |VARCHAR        |keyword        
FlightNum         |VARCHAR        |keyword        
FlightTimeHour    |VARCHAR        |keyword        
FlightTimeMin     |REAL           |float          
Origin            |VARCHAR        |keyword        
OriginAirportID   |VARCHAR        |keyword        
OriginCityName    |VARCHAR        |keyword        
OriginCountry     |VARCHAR        |keyword        
OriginLocation    |GEOMETRY       |geo_point      
OriginRegion      |VARCHAR        |keyword        
OriginWeather     |VARCHAR        |keyword        
dayOfWeek         |INTEGER        |integer        
timestamp         |TIMESTAMP      |datetime       

SHOW语句

SHOW语句包括三种形式,即SHOW COLUMNS、SHOW FUNCTIONS和SHOW TABLES。

SHOW COLUMNS用于查看一个索引中的字段情况,它的作用与DESCRIBE语句完全一样,甚至连返回结果都是一样的。

POST _sql?format=txt
{
  "query": "show columns in kibana_sample_data_flights"
}
输出结果:
      column      |     type      |    mapping    
------------------+---------------+---------------
AvgTicketPrice    |REAL           |float          
Cancelled         |BOOLEAN        |boolean        
Carrier           |VARCHAR        |keyword        
Dest              |VARCHAR        |keyword        
DestAirportID     |VARCHAR        |keyword        
DestCityName      |VARCHAR        |keyword        
DestCountry       |VARCHAR        |keyword        
DestLocation      |GEOMETRY       |geo_point      
DestRegion        |VARCHAR        |keyword        
DestWeather       |VARCHAR        |keyword        
DistanceKilometers|REAL           |float          
DistanceMiles     |REAL           |float          
FlightDelay       |BOOLEAN        |boolean        
FlightDelayMin    |INTEGER        |integer        
FlightDelayType   |VARCHAR        |keyword        
FlightNum         |VARCHAR        |keyword        
FlightTimeHour    |VARCHAR        |keyword        
FlightTimeMin     |REAL           |float          
Origin            |VARCHAR        |keyword        
OriginAirportID   |VARCHAR        |keyword        
OriginCityName    |VARCHAR        |keyword        
OriginCountry     |VARCHAR        |keyword        
OriginLocation    |GEOMETRY       |geo_point      
OriginRegion      |VARCHAR        |keyword        
OriginWeather     |VARCHAR        |keyword        
dayOfWeek         |INTEGER        |integer        
timestamp         |TIMESTAMP      |datetime       

SHOW FUNCTIONS用于返回在Elastiesearch SQL中支持的所有函数,返回结果中包括MIN、MAX、COUNT等常用的聚集函数。

POST _sql?format=txt
{
  "query": "show functions"
}
输出结果:
      name       |     type      
-----------------+---------------
AVG              |AGGREGATE      
COUNT            |AGGREGATE      
FIRST            |AGGREGATE      
FIRST_VALUE      |AGGREGATE      
LAST             |AGGREGATE      
LAST_VALUE       |AGGREGATE      
MAX              |AGGREGATE      
MIN              |AGGREGATE      
SUM              |AGGREGATE      
KURTOSIS         |AGGREGATE      
MAD              |AGGREGATE      
PERCENTILE       |AGGREGATE      
PERCENTILE_RANK  |AGGREGATE      
SKEWNESS         |AGGREGATE      
STDDEV_POP       |AGGREGATE      
SUM_OF_SQUARES   |AGGREGATE      
VAR_POP          |AGGREGATE      
HISTOGRAM        |GROUPING       
CASE             |CONDITIONAL    
COALESCE         |CONDITIONAL    
GREATEST         |CONDITIONAL    
IFNULL           |CONDITIONAL    
IIF              |CONDITIONAL    
ISNULL           |CONDITIONAL    
LEAST            |CONDITIONAL    
NULLIF           |CONDITIONAL    
NVL              |CONDITIONAL    
CURDATE          |SCALAR         
CURRENT_DATE     |SCALAR         
CURRENT_TIME     |SCALAR         
CURRENT_TIMESTAMP|SCALAR         
CURTIME          |SCALAR         
DAY              |SCALAR         
DAYNAME          |SCALAR         
DAYOFMONTH       |SCALAR         
DAYOFWEEK        |SCALAR         
DAYOFYEAR        |SCALAR         
DAY_NAME         |SCALAR         
DAY_OF_MONTH     |SCALAR         
DAY_OF_WEEK      |SCALAR         
DAY_OF_YEAR      |SCALAR         
DOM              |SCALAR         
DOW              |SCALAR         
DOY              |SCALAR         
HOUR             |SCALAR         
HOUR_OF_DAY      |SCALAR         
IDOW             |SCALAR         
ISODAYOFWEEK     |SCALAR         
ISODOW           |SCALAR         
ISOWEEK          |SCALAR         
ISOWEEKOFYEAR    |SCALAR         
ISO_DAY_OF_WEEK  |SCALAR         
ISO_WEEK_OF_YEAR |SCALAR         
IW               |SCALAR         
IWOY             |SCALAR         
MINUTE           |SCALAR         
MINUTE_OF_DAY    |SCALAR         
MINUTE_OF_HOUR   |SCALAR         
MONTH            |SCALAR         
MONTHNAME        |SCALAR         
MONTH_NAME       |SCALAR         
MONTH_OF_YEAR    |SCALAR         
NOW              |SCALAR         
QUARTER          |SCALAR         
SECOND           |SCALAR         
SECOND_OF_MINUTE |SCALAR         
TODAY            |SCALAR         
WEEK             |SCALAR         
WEEK_OF_YEAR     |SCALAR         
YEAR             |SCALAR         
ABS              |SCALAR         
ACOS             |SCALAR         
ASIN             |SCALAR         
ATAN             |SCALAR         
ATAN2            |SCALAR         
CBRT             |SCALAR         
CEIL             |SCALAR         
CEILING          |SCALAR         
COS              |SCALAR         
COSH             |SCALAR         
COT              |SCALAR         
DEGREES          |SCALAR         
E                |SCALAR         
EXP              |SCALAR         
EXPM1            |SCALAR         
FLOOR            |SCALAR         
LOG              |SCALAR         
LOG10            |SCALAR         
MOD              |SCALAR         
PI               |SCALAR         
POWER            |SCALAR         
RADIANS          |SCALAR         
RAND             |SCALAR         
RANDOM           |SCALAR         
ROUND            |SCALAR         
SIGN             |SCALAR         
SIGNUM           |SCALAR         
SIN              |SCALAR         
SINH             |SCALAR         
SQRT             |SCALAR         
TAN              |SCALAR         
TRUNCATE         |SCALAR         
ASCII            |SCALAR         
BIT_LENGTH       |SCALAR         
CHAR             |SCALAR         
CHARACTER_LENGTH |SCALAR         
CHAR_LENGTH      |SCALAR         
CONCAT           |SCALAR         
INSERT           |SCALAR         
LCASE            |SCALAR         
LEFT             |SCALAR         
LENGTH           |SCALAR         
LOCATE           |SCALAR         
LTRIM            |SCALAR         
OCTET_LENGTH     |SCALAR         
POSITION         |SCALAR         
REPEAT           |SCALAR         
REPLACE          |SCALAR         
RIGHT            |SCALAR         
RTRIM            |SCALAR         
SPACE            |SCALAR         
SUBSTRING        |SCALAR         
UCASE            |SCALAR         
CAST             |SCALAR         
CONVERT          |SCALAR         
DATABASE         |SCALAR         
USER             |SCALAR         
ST_ASTEXT        |SCALAR         
ST_ASWKT         |SCALAR         
ST_DISTANCE      |SCALAR         
ST_GEOMETRYTYPE  |SCALAR         
ST_GEOMFROMTEXT  |SCALAR         
ST_WKTTOSQL      |SCALAR         
ST_X             |SCALAR         
ST_Y             |SCALAR         
ST_Z             |SCALAR         
SCORE            |SCORE          

最后,SHOW TABLES用看Elaticecearch中所有的索引。

POST _sql?format=txt
{
  "query": "show tables"
}
输出结果:
           name           |     type      |     kind      
--------------------------+---------------+---------------
.apm-agent-configuration  |BASE TABLE     |INDEX          
.kibana                   |VIEW           |ALIAS          
.kibana_1                 |BASE TABLE     |INDEX          
.kibana_task_manager      |VIEW           |ALIAS          
.kibana_task_manager_1    |BASE TABLE     |INDEX          
articles                  |BASE TABLE     |INDEX          
colleges                  |BASE TABLE     |INDEX          
employees                 |BASE TABLE     |INDEX          
employees1                |BASE TABLE     |INDEX          
kibana_sample_data_flights|BASE TABLE     |INDEX          
kibana_sample_data_logs   |BASE TABLE     |INDEX          

这三种形式都支持使用LIKE子句过滤返回结果,LIKE子句在用法上与SQL语句中的LIKE类似。

例如,show tables like 'a%'将只返回以a开头的索引。

POST _sql?format=txt
{
  "query": "show tables like 'a%'"
}
输出结果:
     name      |     type      |     kind      
---------------+---------------+---------------
articles       |BASE TABLE     |INDEX          

操作符与函数

Elasticsearch SQL中支持的操作符与函数有100多种,这些操作符大多与普通SQL语言一致,所以这里只介绍一些与普通 SQL语句不一样的地方。

先来看一下比较操作符。一般等于比较在SQL中使用等号“=”,这在ElasticsearchSQL中也成立。但是Elasticseareh SQL 还引人了另一个等号比较“< = >”,这种等号可以在左值为null时不出现异常。

LIKE操作符,在LIKE子句中可以使用%代表任意多个字符,而使用下划线_代表单个字符。Elasticsearch SQL不仅支持 LIKE子句,还支持通过RLIKE子句以正则表达式的形式做匹配,这大大扩展了SQL语句模糊匹配的能力。

尽管使用LIKE和RLIKE可以实现模糊匹配,但它离全文检索还差得很远。SQL语句的WHERE子句一般都是使用字段整体值做比较,而没有使用词项做匹配的能力。为此Elasticsearch SQL提供了MATCH和QUERY两个函数,以实现在SQL做全文检索。

例如下面的两个请求分别使用match和query函数,它们的作用都是检索DestCounty字段为CN的文档:

POST _sql?format=txt
{
  "query": """
select DestCountry, OriginCountry,AvgTicketPrice,score()
from kibana_sample_data_flights
where match(DestCountry,'CN') limit 3
"""
}
输出结果:
  DestCountry  | OriginCountry | AvgTicketPrice  |    SCORE()    
---------------+---------------+-----------------+---------------
CN             |MX             |730.041748046875 |2.4774308      
CN             |MX             |922.4990844726562|2.4774308      
CN             |RU             |277.4297180175781|2.4774308      
POST _sql?format=txt
{
  "query": """
select DestCountry, OriginCountry,AvgTicketPrice,score()
from kibana_sample_data_flights
where query('DestCountry:CN') limit 3
"""
}
输出结果:
  DestCountry  | OriginCountry | AvgTicketPrice  |    SCORE()    
---------------+---------------+-----------------+---------------
CN             |MX             |730.041748046875 |2.4774308      
CN             |MX             |922.4990844726562|2.4774308      
CN             |RU             |277.4297180175781|2.4774308      

在示例中的两个请求的selet子句中都使用了SCORE函数,它的作用是获取检索的相关度评分值。

Elasticsearch SQL支持传统SQL中的聚集函数,这包括MAX、MIN、AVG、COUNT、SUM等。同时,它还支持一些 Elasticsearch特有的聚集函数,这些聚集函数与Elasticsearch聚集查询相对应。这包括FIRST/FIRST_ VALUE和 LAST/LAST_VALUE,可用于查看某个字段首个和最后一个非空值;PERCENTILE和PERCENTILE RANK用于百分位聚集,KURTOSIS、SKEWNESS、STDDEV_ POP、SUM_OF SQUARES和VAR_ POP可用于运算其他统计聚集。除了以上这些函数和操作符,Elasticsearch SQL还定义了一组用于日期、数值以及字符串运算的函数。

 类似资料: