Kythe: Writing a New Indexer

史和泰
2023-12-01

This document is an overview of the steps to take to add support for a new language to Kythe.
We assume that you have the Kythe release package extracted to /opt/kythe. You can also build the tools from source, but you do not need to build Kythe in order to provide it with graph data. Sample code snippets are written in JavaScript, but this document is not about indexing any particular language.

In the Kythe pipeline, a language’s indexer is responsible for building a subgraph that represents a particular program. Complete indexers usually accept .kzip files that contain a program, all of its dependencies, and the arguments necessary for a compiler or interpreter to understand it. This data is packaged by a separate component called an extractor. Depending on the language and build system involved, it may be possible to use a generic extractor to produce these hermetic compilation units. We will not address extraction here.

For development and testing, it’s useful for the indexer to accept program text directly as input; this is how we will proceed in these instructions. First, we’ll write some scripts to insert file content into a small Kythe graph. From there, we’ll see how to encode Kythe nodes and edges into entries, the unit of exchange between many of our tools. We’ll see that certain kinds of nodes are used to represent common sorts of semantic objects in programming languages and that other nodes are used to represent syntactic spans of text. We will then add relationships as edges between these nodes to add cross-reference data to the graph. This allows users to jump between definitions and references in the programs we’ve indexed. Finally, we’ll discuss how to write tests for (and how to debug) Kythe indexers.

Bootstrapping Kythe support

Kythe indexers emit directed graph data as a stream of entries, each of which represents either a node or an edge. Entries have various encodings, but for simplicity we’ll use JSON. To get started, let’s write a script kythe-browse.sh that will turn a stream of JSON-formatted Kythe entries into a format that our example code browser can read. Put it in your Kythe root; it will clobber the directories //graphstore and //tables.

#!/bin/bash -e
set -o pipefail
BROWSE_PORT="${BROWSE_PORT:-8080}"

# Kythe release binaries are available at
# https://github.com/kythe/kythe/releases/tag/v0.0.30.
# This script assumes that they are installed to /opt/kythe.
# If you build the tools yourself or install them to a different location,
# make sure to pass the correct public_resources directory to http_server.
rm -f -- graphstore/* tables/*
mkdir -p graphstore tables

# Read JSON entries from standard input into a graphstore.
/opt/kythe/tools/entrystream \
  --read_format=json | \
  /opt/kythe/tools/write_entries \
  -graphstore graphstore

# Convert the graphstore to serving tables.
/opt/kythe/tools/write_tables \
  -graphstore graphstore \
  -out=tables

echo -e "\nhttp://localhost:${BROWSE_PORT}\n"
# Host the browser UI.
# To allow access from other machines, listen on ":${BROWSE_PORT}" instead.
/opt/kythe/tools/http_server \
  -public_resources /opt/kythe/web/ui \
  -serving_table tables \
  -listen="localhost:${BROWSE_PORT}"

Tip
The protocol buffer encoding of Kythe facts is more efficient than the JSON encoding we’re using here. Kythe supports JSON because some languages do not have good support for protocol buffers. The difference in efficiency only comes into play for languages that emit a large amount of data, like C++. The entrystream tool used in kythe-browse.sh is invoked to read a stream of JSON entries from standard input and emit a varint32-delimited stream of kythe.proto.Entry messages on standard output.
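
To make "varint32-delimited" concrete, here is a minimal sketch of that kind of framing, assuming the usual protocol buffer convention of prefixing each serialized message with its byte length encoded as a base-128 varint. The helper names (encodeVarint32, frame, readFrames) are hypothetical, and the payloads are placeholder bytes rather than real serialized kythe.proto.Entry messages.

```javascript
// Sketch only: length-prefix framing with a base-128 varint.
// Payloads are placeholder bytes, not real kythe.proto.Entry messages.

// Encode a non-negative 32-bit integer as a varint (7 bits per byte,
// high bit set on every byte except the last).
function encodeVarint32(n) {
  const bytes = [];
  let v = n >>> 0;
  while (v > 0x7f) {
    bytes.push((v & 0x7f) | 0x80);
    v >>>= 7;
  }
  bytes.push(v);
  return Buffer.from(bytes);
}

// Prefix an already-serialized message with its length.
function frame(message) {
  return Buffer.concat([encodeVarint32(message.length), message]);
}

// Split a buffer of concatenated frames back into individual messages.
function* readFrames(stream) {
  let offset = 0;
  while (offset < stream.length) {
    let length = 0;
    let shift = 0;
    let byte;
    do {
      byte = stream[offset++];
      length += (byte & 0x7f) * 2 ** shift;
      shift += 7;
    } while (byte & 0x80);
    yield stream.subarray(offset, offset + length);
    offset += length;
  }
}

// Two placeholder messages written back to back, then read apart again.
const stream = Buffer.concat([frame(Buffer.from("ab")), frame(Buffer.from("cde"))]);
for (const message of readFrames(stream)) {
  console.log(message.length, message.toString());
}
```

Because each message carries its own length, a reader can pull messages off a pipe one at a time without any separator characters, which is what lets tools pass entries to each other over standard input and output.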

You can test this with a very short entry stream. The only tricky part here is that Kythe fact values, when serialized to JSON, are base64-encoded. This ensures that they can be properly deserialized later, since fact values may contain arbitrary binary data, but JSON strings permit only UTF-8 characters. For example, ZmlsZQ== decodes to file and SGVsbG8sIHdvcmxkIQ== decodes to Hello, world!.

echo '
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/node/kind","fact_value":"ZmlsZQ=="}
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/text","fact_value":"SGVsbG8sIHdvcmxkIQ=="}
' | ./kythe-browse.sh

You can check that http://localhost:8080/#hello?corpus=example shows ‘Hello, world!’.
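
Rather than base64-encoding fact values by hand, the same two entries can be generated by a short script. The following is a minimal sketch for Node.js; the helper names (vname, fact, fileEntries) are hypothetical and chosen only for illustration, and the JSON field names are copied from the hand-written entries above.

```javascript
// Sketch only: emit the two entries for the "hello" file node.
// vname, fact and fileEntries are hypothetical helper names.

// Identify a node by corpus and path, as in the entries above.
function vname(corpus, path) {
  return { corpus: corpus, path: path };
}

// Build one fact entry; the value is base64-encoded for the JSON encoding.
function fact(source, name, value) {
  return {
    source: source,
    fact_name: "/kythe/" + name,
    fact_value: Buffer.from(value, "utf8").toString("base64"),
  };
}

// A file node needs a node/kind fact and a text fact.
function fileEntries(corpus, path, text) {
  const file = vname(corpus, path);
  return [fact(file, "node/kind", "file"), fact(file, "text", text)];
}

// Print one JSON entry per line on standard output.
for (const entry of fileEntries("example", "hello", "Hello, world!")) {
  console.log(JSON.stringify(entry));
}
```

Saved under a name of your choosing (say emit-hello.js), the script's output can be piped into the browser script in place of the echo command: node emit-hello.js | ./kythe-browse.sh.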

Modeling Kythe entries

File content

Cross-references

Specifying spans of text

Linking anchors to semantic nodes

Testing

Testing for variable definitions and references
