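This tutorial series covers the main use cases of txtai, an AI-powered semantic search platform. Each chapter in the series has associated code that can also be run in Colab.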
This article builds on the previous introduction and extends it to building an extractive question-answering system.
Install txtai and all dependencies.
pip install txtai
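The Embeddings instance is the main entry point for txtai. It defines the method used to tokenize text sections and convert them into embedding vectors.
The Extractor instance is the entry point for extractive question answering.
Both the Embeddings and Extractor instances take a path to a transformer model. Any model on the Hugging Face model hub can be used in place of the models below.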
from txtai.embeddings import Embeddings
from txtai.pipeline import Extractor
# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})
# Create extractor instance
extractor = Extractor(embeddings, "distilbert-base-cased-distilled-squad")
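To make the idea of comparing embedding vectors more concrete, here is a minimal sketch that does not use txtai at all. It assumes a toy "embedding" built from word counts and compares texts with cosine similarity. The `embed` and `cosine` helpers are hypothetical names introduced only for illustration. A transformer model produces dense vectors that capture meaning beyond shared words, so treat this as an analogy rather than a description of what txtai does internally.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a sparse word-count vector (not a transformer model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = embed("Dodgers - Giants")
related = cosine(query, embed("Giants 5 Dodgers 4 final"))
unrelated = cosine(query, embed("Flyers win 4-1"))

# The text that shares words with the query scores higher than the one that does not
print(related > unrelated)
```

With real embeddings the same comparison can also rank texts that share no words with the query, which a word-count vector cannot do.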
data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays beat Red Sox final score 2-1",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Lightning goaltender pulled, lose to Flyers 4-1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]
questions = ["What team won the game?", "What was score?"]
# Ask every question against the rows of data that best match the query
execute = lambda query: extractor([(question, query, question, False) for question in questions], data)
for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()
# Ad-hoc questions
question = "What hockey team won?"
print("----", question, "----")
print(extractor([(question, question, question, False)], data))
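Each entry passed to the extractor above is a 4-tuple. As I understand the Extractor interface, the fields are (name, query, question, snippet): the name labels the answer, the query selects the context rows from the data, the question is what gets asked of the question-answering model, and snippet controls whether surrounding text is returned instead of just the answer. The sketch below only builds and unpacks such tuples so the positions are easier to read. The `build_queue` helper is a hypothetical name, not part of txtai.

```python
def build_queue(questions, query, snippet=False):
    """Build one (name, query, question, snippet) tuple per question.

    The question text doubles as the name, as in the tutorial code.
    """
    return [(question, query, question, snippet) for question in questions]

queue = build_queue(["What team won the game?", "What was score?"], "Red Sox - Blue Jays")

# Unpack each tuple to show which position holds which field
for name, query, question, snippet in queue:
    print(name, "|", query, "|", question, "|", snippet)
```

In the ad-hoc example the question is used for all three text fields, so the same string both selects the context and is put to the model.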
https://dev.to/neuml/tutorial-series-on-txtai-ibg