Sentiment Analysis with TextBlob
by Arun Mathew Kurian
This blog is based on the video Twitter Sentiment Analysis — Learn Python for Data Science #2 by Siraj Raval. In this challenge, we will be building a sentiment analyzer that checks whether tweets about a subject are negative or positive. We will be making use of the Python library textblob for this.
Sentiment Analysis, also called opinion mining or emotion AI, is the process of determining whether a piece of writing is positive, negative, or neutral. A common use case for this technology is to discover how people feel about a particular topic. Sentiment analysis is widely applied to reviews and social media for a variety of applications.
Sentiment analysis can be performed in many different ways. Many brands and marketers use keyword-based tools that classify data (e.g. social, news, review, blog) as positive, negative, or neutral.
Automated sentiment tagging is usually achieved through word lists. For example, mentions of ‘hate’ would be tagged negatively.
There can be two approaches to sentiment analysis.
1. Lexicon-based methods
2. Machine Learning-based methods
In this problem, we will be using a Lexicon-based method.
Lexicon-based methods define a list of positive and negative words, each with a valence (e.g. ‘nice’: +2, ‘good’: +1, ‘terrible’: -1.5). The algorithm scans a text for all known words, then combines their individual valences by summing or averaging. Some extensions also apply grammatical rules, such as negation or sentiment modifiers (for example the word “but”, which weights the sentiment values in the text differently to emphasize the end of the text).
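To make the lookup-and-combine idea concrete, here is a minimal sketch of a lexicon-based scorer. It is not how textblob works internally; the tiny lexicon reuses the illustrative valences above, and the negation list and the function name lexicon_score are assumptions made up for this example.

```python
import re

# Hypothetical toy lexicon; the valences are the illustrative ones from the text.
LEXICON = {"nice": 2.0, "good": 1.0, "terrible": -1.5}
# Hypothetical negation words, chosen only for this sketch.
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text):
    """Average the valence of known words, flipping the sign after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        if token in LEXICON:
            value = LEXICON[token]
            scores.append(-value if negate else value)
        negate = False  # a negation only affects the word right after it
    # Text with no known words is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


print(lexicon_score("The food was good and the staff were nice"))
print(lexicon_score("The weather was not good"))
```

Averaging rather than summing keeps the score independent of text length, which makes short tweets and long reviews comparable.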
Let’s build the analyzer now.
Before we start coding, we need to register for the Twitter API at https://apps.twitter.com/. There we register an app to generate the keys and tokens associated with it. The Twitter API can be used to perform many actions, such as creating and searching tweets.
Now after creating the app we can start coding.
We need to install two packages:
pip install tweepy
This package will be used for handling the Twitter API.
pip install textblob
This package will be used for the sentiment analysis.
sentiment_analyzer.py
import tweepy
from textblob import TextBlob
We need to declare the variables to store the various keys associated with the Twitter API.
consumer_key = '[consumer_key]'
consumer_key_secret = '[consumer_key_secret]'
access_token = '[access_token]'
access_token_secret = '[access_token_secret]'
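Hard-coding keys in a script makes them easy to leak, for instance when the file is pushed to GitHub. One possible alternative, sketched below, reads them from environment variables. The variable names and the helper load_twitter_keys are hypothetical and not part of the original code.

```python
import os

# Hypothetical environment variable names; pick whatever names suit your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_KEY_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(environ=None):
    """Return the four keys as a dict, or raise if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

With the variables exported in the shell, the four assignments could then become lookups such as consumer_key = load_twitter_keys()["TWITTER_CONSUMER_KEY"].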
The next step is to create a connection with the Twitter API using tweepy with these tokens.
Tweepy supports OAuth authentication. Authentication is handled by the tweepy.OAuthHandler class.
An OAuthHandler instance must be created by passing a consumer token and secret.
On this auth instance, we will call a function set_access_token by passing the access_token and access_token_secret.
Finally, we create our tweepy API instance by passing this auth instance into the API function of tweepy.
auth = tweepy.OAuthHandler(consumer_key, consumer_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
We can now search Twitter for any topic using the search method of the API.
public_tweets = api.search('Dogs')  # in Tweepy 4.0 and later this method is named search_tweets
This returns a batch of recent tweets matching the query ‘Dogs’. We can now perform sentiment analysis on them using the library textblob.
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
A textblob can be created in the following way (example, and not part of the original code):
example = TextBlob("Python is a high-level, general-purpose programming language.")
Tokenization is available through the following properties:
words: returns the words of text
usage:
example.words
sentences: returns the sentences of text
usage:
example.sentences
Part-of-speech tags can be accessed through the tags property.
example.tags
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]
The sentiment property returns a named tuple of the form Sentiment (polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.
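The polarity range can be turned into a label with a small helper. The sketch below uses a stand-in named tuple with the same two fields so that it runs without textblob installed. The function name label and the threshold parameter are assumptions for illustration, not part of textblob.

```python
from collections import namedtuple

# Stand-in with the same field names as the tuple described above,
# so this sketch runs without textblob installed.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def label(sentiment, threshold=0.0):
    """Map a polarity in [-1.0, 1.0] to 'Positive', 'Negative', or 'Neutral'."""
    if sentiment.polarity > threshold:
        return "Positive"
    if sentiment.polarity < -threshold:
        return "Negative"
    # Anything within [-threshold, threshold] counts as neutral.
    return "Neutral"


print(label(Sentiment(polarity=0.5, subjectivity=0.6)))   # Positive
print(label(Sentiment(polarity=-0.2, subjectivity=0.9)))  # Negative
print(label(Sentiment(polarity=0.05, subjectivity=0.1), threshold=0.1))  # Neutral
```

With the default threshold of 0.0, only a polarity of exactly zero is neutral. A small positive threshold widens the neutral band, which may suit noisy text such as tweets.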
Now back to the code.
We can iterate over the public_tweets results and check the sentiment of each tweet's text based on its polarity.
for tweet in public_tweets:
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis.sentiment)
    # sentiment[0] is the polarity score
    if analysis.sentiment[0] > 0:
        print('Positive')
    elif analysis.sentiment[0] < 0:
        print('Negative')
    else:
        print('Neutral')
Now we run the code using the following:
python sentiment_analyzer.py
For each tweet, the script prints the tweet text, its sentiment tuple, and a Positive, Negative, or Neutral label.
This is an example of how sentiment analysis can be done on data from social media like Twitter. I hope you find it useful!
Find the code at https://github.com/amkurian/twitter_sentiment_challenge