Basic Data Analysis on Twitter with Python

麻书
2023-12-01

by Lucas Kohorst

After creating the Free Wtr bot using Tweepy and Python and this code, I wanted a way to see how Twitter users were perceiving the bot and what their sentiment was. So I created a simple data analysis program that takes a given number of tweets, analyzes them, and displays the data in a scatter plot.

Setup

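I had to install a few packages to create this: Tweepy, Tkinter, TextBlob and matplotlib. Tweepy, TextBlob and matplotlib can each be installed with the pip package manager; Tkinter ships with most Python installations (on some Linux distributions it comes from a separate system package, such as python3-tk). For example: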

pip install tweepy

Alternatively, you can install TextBlob from source by cloning its GitHub repository:

git clone https://github.com/sloria/textblob
cd textblob
pip install .
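
Before writing the script, it can help to confirm that every package is visible to the Python interpreter you plan to use. The following is a small optional sketch, assuming Python 3: it only checks whether each module can be located (via importlib.util.find_spec), without importing it, and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Module names as imported in Python 3. tkinter normally ships with Python;
# tweepy, textblob and matplotlib are installed with pip.
REQUIRED_MODULES = ("tweepy", "tkinter", "textblob", "matplotlib")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All packages are installed.")
```

Note that find_spec only locates a module; a tkinter package can be found even when the underlying Tk library is absent, in which case the import itself may still fail.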

Next you will need to create a new Python file and import the following packages.

import tweepy  # The Twitter API
from tkinter import *  # For the GUI (the module was named Tkinter in Python 2)
from time import sleep
from datetime import datetime
from textblob import TextBlob  # For sentiment analysis
import matplotlib.pyplot as plt  # For graphing the data

Twitter Credentials

Now we need to link a Twitter account to our script. If you don’t have one already, create one.

Go to apps.twitter.com and sign in with your account. Create a Twitter application and generate a Consumer Key, Consumer Secret, Access Token, and Access Token Secret.

Under your import statements, store your credentials in variables and then use the second block of code to authenticate your account with Tweepy.

consumer_key = 'consumer key'
consumer_secret = 'consumer secret'
access_token = 'access token'
access_token_secret = 'access token secret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
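
Hard-coding credentials works for a quick experiment, but it is easy to leak them by committing the script. One common alternative is to read them from environment variables. A minimal sketch, assuming you export the four values before running the script; the variable names and the load_credentials helper are hypothetical choices, not part of Tweepy.

```python
import os

# Hypothetical environment variable names; any names work as long as they match.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Read the four Twitter credentials from environment variables.

    Raises KeyError listing every variable that is missing or empty,
    so a typo in one name is easy to spot.
    """
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned dictionary can then be passed to tweepy.OAuthHandler and auth.set_access_token in place of the hard-coded strings, for example creds["consumer_key"] and creds["consumer_secret"].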

To check that your account is properly authenticated, you can print your username to the console.

user = api.verify_credentials()  # Tweepy 4.x; Tweepy 3.x used api.me()
print(user.name)

Creating the GUI

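For the interface, we will use two labels, each paired with a text entry field: one for the search term and the other for the sample size, or number of tweets to analyze. We will also need a Submit button that calls our getData function when clicked. Because the button is created with command=getData, getData must already be defined above this code when the script runs.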

root = Tk()

label1 = Label(root, text="Search")
E1 = Entry(root, bd=5)

label2 = Label(root, text="Sample Size")
E2 = Entry(root, bd=5)

submit = Button(root, text="Submit", command=getData)

Next, we pack each widget, which places it in the window, and then call root.mainloop(), which starts Tkinter's event loop and keeps the window open and responsive until it is closed.

label1.pack()
E1.pack()
label2.pack()
E2.pack()
submit.pack(side=BOTTOM)
root.mainloop()

When you run this code, you should see a small window with a Search field, a Sample Size field and a Submit button.

However, when text is typed into the fields or the Submit button is clicked, nothing happens yet. We still have to collect the data.

Analyzing Tweets

First, we need two small functions that return the text typed into the entry fields.

def getE1():
    return E1.get()

def getE2():
    return E2.get()

Now we are ready to code the getData function. From now on, all code is in this function:

def getData():
    # Code

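Inside getData, we call getE1() and getE2() and store the search keyword and the sample size in variables. The sample size arrives as a string, so we convert it to an integer before using it as the number of tweets to loop over.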

    keyword = getE1()
    numberOfTweets = int(getE2())
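
The int() conversion raises ValueError if the Sample Size field is empty or contains anything other than a whole number, and Tkinter then just prints a traceback to the console. A minimal sketch of a more forgiving conversion; parse_sample_size is a hypothetical helper, not part of the original program.

```python
def parse_sample_size(text, default=None):
    """Convert the Sample Size entry text to a positive int.

    Returns `default` when the text is not a positive whole number,
    instead of raising ValueError the way a bare int() call would.
    """
    try:
        value = int(text.strip())
    except ValueError:
        return default
    return value if value > 0 else default
```

Inside getData, numberOfTweets = parse_sample_size(getE2()) could be followed by a check that returns early (for example after printing a message) when the result is None.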

To store our data, we can use lists: one for the polarity (sentiment) of each tweet, and another for the tweet's number, which serves as its x-coordinate on the graph.

    polarity_list = []
    numbers_list = []
    number = 1

The counter number starts at 1 so that the first tweet is plotted at x = 1 rather than at 0.
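
In Python, the same 1-based numbering can also come from enumerate with start=1 instead of a manually incremented counter. A minimal sketch that operates on a list of already-computed polarity scores; the function name number_polarities is hypothetical.

```python
def number_polarities(polarities):
    """Pair each polarity score with a 1-based tweet number.

    Equivalent to a manual counter that starts at 1 and is incremented
    after each tweet: the first score gets x = 1, the second x = 2, etc.
    """
    numbers_list = []
    polarity_list = []
    for number, polarity in enumerate(polarities, start=1):
        numbers_list.append(number)
        polarity_list.append(polarity)
    return numbers_list, polarity_list


numbers, scores = number_polarities([0.5, -0.2, 0.0])
print(numbers)  # [1, 2, 3]
```

Using enumerate removes the risk of forgetting the increment, which would otherwise plot every tweet at the same x position.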

We can now iterate through the tweets and analyze them. Using TextBlob, we find the sentiment of each tweet and store its polarity, a value between -1 (negative) and 1 (positive), in a variable named polarity. We then append it to polarity_list, append the tweet's number to numbers_list, and increment the counter.

analysis = TextBlob(tweet.text)
analysis = analysis.sentiment
polarity = analysis.polarity
polarity_list.append(polarity)
numbers_list.append(number)
number = number + 1

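We then place this code inside a for loop over a tweepy.Cursor, which fetches up to numberOfTweets English-language tweets matching the search keyword. A try statement around the loop catches errors raised by the Twitter API, such as rate limits or invalid credentials, and prints them instead of crashing.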

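    # API errors are raised while the Cursor fetches results, so the try wraps the whole loop.
    # Tweepy 4.x names: api.search_tweets and tweepy.TweepyException
    # (Tweepy 3.x used api.search and tweepy.TweepError).
    try:
        for tweet in tweepy.Cursor(api.search_tweets, q=keyword, lang="en").items(numberOfTweets):
            analysis = TextBlob(tweet.text)
            analysis = analysis.sentiment
            polarity = analysis.polarity
            polarity_list.append(polarity)
            numbers_list.append(number)
            number = number + 1
    except tweepy.TweepyException as e:
        print(e)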
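
tweepy.Cursor(...).items(numberOfTweets) hides the API's paging: it keeps requesting pages of results and stops once it has yielded the requested number of tweets. The idea can be illustrated with the standard library. This is a rough analogy, not Tweepy's actual implementation, and fake_pages is a hypothetical stand-in for pages returned by the API.

```python
from itertools import chain, islice


def fake_pages():
    """Hypothetical stand-in for paginated API results (three pages)."""
    yield ["tweet 1", "tweet 2", "tweet 3"]
    yield ["tweet 4", "tweet 5", "tweet 6"]
    yield ["tweet 7"]


def first_n_items(pages, n):
    """Flatten an iterable of pages lazily and stop after n items.

    Because both chain.from_iterable and islice are lazy, no page is
    requested after the n-th item has been produced.
    """
    return islice(chain.from_iterable(pages), n)


print(list(first_n_items(fake_pages(), 4)))
# ['tweet 1', 'tweet 2', 'tweet 3', 'tweet 4']
```

If fewer results exist than requested, the loop simply ends early, which is why the collected lists can be shorter than the sample size.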

Graphing the Scatter Plot

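To graph our scatter plot with matplotlib, we first get the current axes and set the y-axis limits. Polarity ranges from -1 to 1; extending the upper limit to 2 leaves room above the data points for the summary box.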

    axes = plt.gca()
    axes.set_ylim([-1, 2])

and then plot our lists of data.

plt.scatter(numbers_list, polarity_list)

Key information is shown in a text box on the graph. To show the overall sentiment of the tweets we gathered, we calculate the average polarity across all collected tweets. Since the result reflects sentiment at a specific moment, we also display the date and time.

    averagePolarity = sum(polarity_list) / len(polarity_list)
    averagePolarity = "{0:.0f}%".format(averagePolarity * 100)
    time = datetime.now().strftime("At: %H:%M\nOn: %m-%d-%y")

    plt.text(0, 1.25, "Average Sentiment:  " + str(averagePolarity) + "\n" + time,
             fontsize=12,
             bbox=dict(facecolor='none', edgecolor='black', boxstyle='square,pad=1'))
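
The average is computed by dividing by len(polarity_list), which raises ZeroDivisionError if the search returned no tweets. A minimal sketch of the same summary text with a guard for that case; summary_text is a hypothetical helper, and the "n/a" placeholder is an assumed choice. It keeps the tutorial's formatting (whole-number percentage, then time and date).

```python
from datetime import datetime


def summary_text(polarity_list, now=None):
    """Build the text shown in the plot's summary box.

    The mean polarity is shown as a whole-number percentage, followed by
    the time and date. Returns "n/a" for the average when no tweets were
    collected, instead of raising ZeroDivisionError.
    """
    if polarity_list:
        average = sum(polarity_list) / len(polarity_list)
        average_text = "{0:.0f}%".format(average * 100)
    else:
        average_text = "n/a"
    if now is None:
        now = datetime.now()
    stamp = now.strftime("At: %H:%M\nOn: %m-%d-%y")
    return "Average Sentiment:  " + average_text + "\n" + stamp


print(summary_text([0.5, 0.0, -0.25], datetime(2018, 3, 1, 14, 5)))
```

Passing a fixed datetime, as in the usage line, makes the output reproducible; in getData the default of the current time would be used.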

For the title and axis labels, we can use:

    plt.title("Sentiment of " + keyword + " on Twitter")
    plt.xlabel("Number of Tweets")
    plt.ylabel("Sentiment")

Finally, call plt.show() to display the graph.

Example

When I tested this on my Free Wtr bot, the sentiment was sky high!

As for Donald Trump, I cannot say the same.

Here is the full source code on GitHub.

Translated from: https://www.freecodecamp.org/news/basic-data-analysis-on-twitter-with-python-251c2a85062e/