Duckling [1] is an open-source structured-extraction tool from Facebook:
Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
For example, it can parse the following:
"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}
You can try it out on wit.ai [2]. It also recognizes some Chinese, though its ability there is fairly limited.
So here come the questions:
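// Aliases so that Haskell type names can be copied over unchanged
type Bool = Boolean
type Maybe[+A] = Option[A] // on Scala 2 the alias must declare Option's type parameter
type Text = String
// Haskell's Ordering constructors, modelled as Int
val EQ = 0
val LT = -1
val GT = 1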
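To make the idea concrete, here is a minimal sketch of how such aliases let Haskell signatures carry over almost word for word. The object name HaskellAliases and the three helper functions are made up for illustration; they are not taken from Duckling.

```scala
object HaskellAliases {
  type Bool = Boolean
  type Maybe[+A] = Option[A]
  type Text = String

  // Haskell's Ordering constructors, modelled as Int
  val EQ = 0
  val LT = -1
  val GT = 1

  // Haskell: ruleName :: Maybe Text -> Text
  // (hypothetical helper, mirrors `fromMaybe "regex" rule`)
  def ruleName(rule: Maybe[Text]): Text = rule.getOrElse("regex")

  // Haskell: compareInt :: Int -> Int -> Ordering
  // (hypothetical helper; returns one of LT, GT, EQ)
  def compareInt(a: Int, b: Int): Int =
    if (a < b) LT else if (a > b) GT else EQ

  // Haskell: isBlank :: Text -> Bool
  // (hypothetical helper)
  def isBlank(t: Text): Bool = t.trim.isEmpty
}
```

With the aliases in scope, a Scala signature reads nearly the same as the Haskell one beside it, which keeps side-by-side comparison easy while copying.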
Haskell code applies functions partially all the time. The Scala counterpart is to curry [5] the function, which makes the code much easier to copy over. For example:
-- Haskell
ptree :: Text -> Entity -> IO ()
ptree sentence Entity {enode} = pnode sentence 0 enode

pnode :: Text -> Int -> Node -> IO ()
pnode sentence depth Node {children, rule, nodeRange = Range start end} = do
  Text.putStrLn out
  mapM_ (pnode sentence (depth + 1)) children
  where
    out = Text.concat [ Text.replicate depth "-- ", name, " (", body, ")" ]
    name = fromMaybe "regex" rule
    body = Text.drop start $ Text.take end sentence
// Scala, curried: a => b => c
def ptree(sentence: Text)(entity: Entity): Unit = {
  pnode(sentence, 0)(entity.enode)
}

def pnode(sentence: Text, depth: Int)(node: Node): Unit = {
  val name = node.rule.getOrElse("regex")
  val body = sentence.substring(node.nodeRange.start, node.nodeRange.end)
  // Same layout as the Haskell version: indent, rule name, " (", body, ")"
  val out = "%s%s (%s)".format("-- " * depth, name, body)
  println(out)
  // pnode(sentence, depth + 1) is partially applied: it still expects a Node
  node.children.foreach(pnode(sentence, depth + 1))
}
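The snippet above relies on Duckling's own Entity and Node types. For experimenting outside the project, here is a self-contained sketch of the same currying trick. The case classes are minimal stand-ins invented for this example (not the real Duckling definitions), and the function returns its lines instead of printing them so the result is easy to check.

```scala
object PtreeSketch {
  type Text = String

  // Hypothetical, stripped-down stand-ins for Duckling's Range and Node
  final case class Range(start: Int, end: Int)
  final case class Node(nodeRange: Range, rule: Option[Text], children: List[Node])

  // Curried: the first parameter list fixes the context,
  // the second one takes the node to render.
  def render(sentence: Text, depth: Int)(node: Node): List[Text] = {
    val name = node.rule.getOrElse("regex")
    val body = sentence.substring(node.nodeRange.start, node.nodeRange.end)
    val line = ("-- " * depth) + name + " (" + body + ")"
    // Partial application: supply only the first parameter list and
    // get back a plain function Node => List[Text].
    val renderChild: Node => List[Text] = render(sentence, depth + 1)
    line :: node.children.flatMap(renderChild)
  }
}
```

For instance, with the sentence "on Tuesday", a root node covering characters 3 to 10 with a made-up rule name "day-of-week", and a single child without a rule, `render("on Tuesday", 0)(root)` yields the two lines `day-of-week (Tuesday)` and `-- regex (Tuesday)`.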
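Haskell uses monads to handle side effects. That part of the code can mostly be ignored; knowing what the types are is enough. If the mapM_ in the sample above makes no sense, that is fine too: reading it as a plain map does not get in the way of understanding.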
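As a rough mental model, mapM_ runs an action on every element and throws the results away, which is what foreach does in Scala. The sketch below is only an analogy under that assumption: Scala has no IO type here, so the "action" is simply a function returning Unit, and the object name is made up.

```scala
object MapMSketch {
  // Rough counterpart of Haskell's
  //   mapM_ :: (a -> IO b) -> [a] -> IO ()
  // Runs `action` on each item in order and discards the results.
  def mapM_[A](action: A => Unit)(items: List[A]): Unit =
    items.foreach(action)
}
```

So `mapM_ (pnode sentence (depth + 1)) children` reads as "call pnode on each child", the same thing `children.foreach(...)` says in the Scala version.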
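The tooling differs in places. For example, you may not know what PCRE.regex returns, and single-stepping in the debugger only shows a pointer. In that case you can use trace to print some information, for example: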
import Debug.Trace (trace)

-- | Returns all matches matching the first pattern item of `match`,
-- resuming from a Match position
matchFirst :: Document -> Stash -> Match -> Duckling [Match]
matchFirst _ _ (Rule {pattern = []}, _, _) = return []
matchFirst sentence stash (rule@Rule{pattern = p : ps, name = name}, position, route) =
  map (mkMatch route newRule) <$> lookupItem sentence p stash position
  where
    newRule = trace ("matchFirst: rule - " ++ (show name)) (rule { pattern = ps })
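Once the code has been copied to Scala, a similar helper is handy for comparing the two implementations' output line by line. The following is a minimal sketch, not part of Duckling: the Rule case class is a simplified stand-in, and unlike Haskell's lazy trace this version prints right away when it is called.

```scala
object TraceSketch {
  // Prints `msg` to stderr, then returns `value` unchanged.
  // Strict, unlike Debug.Trace.trace, which prints only when the value is forced.
  def trace[A](msg: String)(value: A): A = {
    Console.err.println(msg)
    value
  }

  // Hypothetical, simplified stand-in for Duckling's Rule
  final case class Rule(name: String, pattern: List[String])

  // Mirrors `rule { pattern = ps }`: copy the rule without its first pattern item
  def dropFirstItem(rule: Rule): Rule =
    trace("matchFirst: rule - " + rule.name)(rule.copy(pattern = rule.pattern.drop(1)))
}
```

The Haskell record update `rule { pattern = ps }` maps naturally onto `copy` on a case class: both build a new value and leave the original untouched.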
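Everything above is about small tricks for copying code, but the barrier to entry for Haskell is still fairly high: many of its concepts are quite academic and not easy to grasp. It is worth spending one or two months on a Haskell course before coming back to the code. Forcing your way through probably will not work (at least not for a mediocre coder like me).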
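Study materials:
CIS 194: Introduction to Haskell [6]
Real World Haskell [7]
Learn You a Haskell for Great Good! [8]
Haskell函数式编程入门 [9]
For the first three, the opening chapters are enough. The last one, a book in Chinese, is worth reading cover to cover; just skip the more obscure examples.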
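Overall, Haskell is rigorously designed and its code is clear; most importantly, it is purely functional. If you hope to enter the world of functional languages through Scala, learn some Haskell and you will see that it cannot be done.
There are a few problems:
Is there anything really good about it? Yes. In a purely functional language values are immutable, so there is no intermediate state to track (nothing whose value keeps changing), which is a big help when reading code.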
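Compared with the previous version written in Clojure [10] (a Lisp dialect), the Haskell implementation of Duckling is already far more readable. Still, some parts of the project are rather convoluted:
After converting the code to Scala and fixing the bugs in it, the next step is to reorganize the code in a more OO style:
Why has the code not been released? With this much already explained, go for it, you can do it.
Update: the code is now open source at MiNLP/duckling-fork-chinese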
[1] facebook/duckling: https://github.com/facebook/duckling
[2] Online demo: https://duckling.wit.ai/
[3] "How do you rate the scalaz library?", answer by 阿莱克西斯 on Zhihu: https://www.zhihu.com/question/24845284/answer/124441377
[4] Scalaz: Principled Functional Programming in Scala: https://scalaz.github.io/
[5] "Currying is the technique of transforming a function with multiple arguments into a function with just one argument." http://baddotrobot.com/blog/2013/07/21/curried-functions/
[6] CIS 194: Introduction to Haskell: https://www.seas.upenn.edu/~cis194
[7] Real World Haskell: http://book.realworldhaskell.org/
[8] Learn You a Haskell for Great Good!: http://learnyouahaskell.com/chapters
[9] Haskell函数式编程入门: https://www.amazon.cn/dp/B00TGW03P0
[10] wit-ai/duckling_old: https://github.com/wit-ai/duckling_old