Extractive QA with txtai

韦睿

2023-12-01

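This tutorial series covers the main use cases of txtai, an AI-powered semantic search platform. Each chapter in the series has accompanying code, which can also be run in Colab.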

This article builds on the earlier introduction and extends it to building an extractive question-answering system.

Install dependencies

Install txtai and all of its dependencies.

pip install txtai

Create the Embeddings and Extractor instances

The Embeddings instance is the main entry point for txtai. It defines the method used to tokenize text sections and convert them into embedding vectors.
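
To make the idea of turning text into vectors and comparing them more concrete, here is a minimal sketch that uses plain word-count vectors and cosine similarity in place of a real Transformer model. This is not how txtai computes embeddings; the helper names vectorize and cosine are hypothetical and exist only for this illustration.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a Transformer model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


texts = ["Giants 5 Dodgers 4 final", "Flyers win 4-1"]
query = vectorize("Dodgers - Giants")

# Rank the texts by similarity to the query, most similar first
ranked = sorted(texts, key=lambda t: cosine(query, vectorize(t)), reverse=True)
print(ranked[0])
```

A real embeddings model captures meaning rather than exact word overlap, so it can match texts that share no words with the query. The ranking step, however, follows the same pattern: vectorize, compare, sort.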

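The Extractor instance is the entry point for extractive question answering.

Both the Embeddings and Extractor instances take a path to a Transformer model. Any model on the Hugging Face model hub can be used in place of the models below.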

from txtai.embeddings import Embeddings
from txtai.pipeline import Extractor

# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

# Create extractor instance
extractor = Extractor(embeddings, "distilbert-base-cased-distilled-squad")

data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays beat Red Sox final score 2-1",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Lightning goaltender pulled, lose to Flyers 4-1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]

questions = ["What team won the game?", "What was score?"]

# Ask every question against the texts that best match the query
def execute(query):
    return extractor([(question, query, question, False) for question in questions], data)

for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()

# Ad-hoc questions
question = "What hockey team won?"

print("----", question, "----")
print(extractor([(question, question, question, False)], data))
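
Each item passed to the extractor above is a four-element tuple. According to the txtai Extractor documentation, as best understood here, the fields are (name, query, question, snippet): the name labels the answer, the query selects the most relevant texts, the question is asked against those texts, and the snippet flag controls whether surrounding context is returned with the answer. The sketch below only builds and prints these tuples so the structure is easy to inspect. It does not call txtai, and build_queue is a hypothetical helper, not part of the library.

```python
def build_queue(query, questions, snippet=False):
    """Build extractor input tuples in the assumed (name, query, question, snippet) order."""
    # The question text doubles as the name, mirroring the tutorial code
    return [(question, query, question, snippet) for question in questions]


questions = ["What team won the game?", "What was score?"]
queue = build_queue("Red Sox - Blue Jays", questions)

for name, query, question, snippet in queue:
    print(name, "|", query, "|", question, "|", snippet)
```

With the extractor and data defined earlier, a queue built this way could be passed as extractor(queue, data), which is what the execute helper does for each query.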

References

https://dev.to/neuml/tutorial-series-on-txtai-ibg