CLASS1 — NLP INTRODUCTION SUMMARY
Applications of NLP
- Information Extraction
- Information Extraction & Sentiment Analysis
- Machine Translation
Three kinds of language technology
A. mostly solved
- spam detection
- part-of-speech (POS) tagging
- named entity recognition (NER)
B. making good progress
- sentiment analysis
- coreference resolution
- word sense disambiguation (WSD)
C. still really hard
- question answering (QA)
- paraphrase
- summarization
- dialog
What makes NLP hard?
Ambiguity, e.g. crash blossoms (news headlines with an unintended second reading)
Why else is NL understanding difficult?
- non-standard English
- segmentation issue
- idioms
- neologisms
- world knowledge
- tricky entity names
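To make the segmentation issue concrete, here is a minimal sketch that lists every way an unspaced string can be split into dictionary words. The function name `segmentations` and the toy vocabulary are assumptions for illustration only; they are not from the lecture.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into a sequence of words from `vocab`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocab:
            # Keep this prefix and segment whatever is left over.
            for rest in segmentations(text[end:], vocab):
                results.append([prefix] + rest)
    return results


# Toy vocabulary, made up for this example.
TOY_VOCAB = {"no", "now", "here", "where", "nowhere"}

for words in segmentations("nowhere", TOY_VOCAB):
    print(" ".join(words))
```

Even this seven-letter string has three valid readings, so a segmenter needs extra knowledge (for example word probabilities) to choose among them.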
What tools do we need?
- knowledge about language
- knowledge about the world
- a way to combine knowledge sources
How do we generally do this?
- probabilistic models built from language data
- rough text features can often do half the job.
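As a rough illustration of both points, the sketch below trains a tiny probabilistic model on word counts and uses it for spam detection. It assumes a naive Bayes classifier with add-one smoothing and whitespace tokens as the "rough text features"; the notes do not name a specific model, and the training sentences and function names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Rough feature: lowercased whitespace-separated tokens.
    return text.lower().split()


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: fraction of training documents with this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, made up for this example.
TOY_DATA = [
    ("win free money now", "spam"),
    ("free prize click now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

model = train(TOY_DATA)
print(classify("free money", *model))
print(classify("lunch meeting tomorrow", *model))
```

With only four training sentences this is a toy, but it shows the pattern: count features in labeled data, turn the counts into probabilities, and pick the most probable label.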