当前位置: 首页 > 知识库问答 >
问题:

从初始加载时不可见的页面正文中删除数据

何超英
2023-03-14

我正试图使用美丽的汤刮从这个网站的数据。如果您向下滚动到单个播放部分,请单击“共享和更多”

我试图使用python包漂亮的汤来刮取这些数据。我目前正在做的是

nfl_url = #the url I have linked above
driver = webdriver.Chrome(executable_path=r'C:/path/to/chrome/driver') 
driver.get(nfl_url)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
print(soup.find(id="csv_all_plays"))

这只会导致打印“无”。我知道,因为在加载页面时没有显示这些数据,这意味着我不能使用Requests包,我必须使用一些实际获取整个页面源的东西(我使用的是Selenium)。这不是我在这里做的吗?我无法获取CSV数据的原因是否不同?

共有2个答案

高勇
2023-03-14

你可以用熊猫

import pandas as pd

table = pd.read_html('https://www.pro-football-reference.com/play-index/play_finder.cgi?request=1&match=summary_all&year_min=2018&year_max=2018&game_type=R&game_num_min=0&game_num_max=99&week_num_min=0&week_num_max=99&quarter%5B%5D=4&minutes_max=15&seconds_max=00&minutes_min=00&seconds_min=00&down%5B%5D=0&down%5B%5D=1&down%5B%5D=2&down%5B%5D=3&down%5B%5D=4&field_pos_min_field=team&field_pos_max_field=team&end_field_pos_min_field=team&end_field_pos_max_field=team&type%5B%5D=PUNT&no_play=N&turnover_type%5B%5D=interception&turnover_type%5B%5D=fumble&score_type%5B%5D=touchdown&score_type%5B%5D=field_goal&score_type%5B%5D=safety&rush_direction%5B%5D=LE&rush_direction%5B%5D=LT&rush_direction%5B%5D=LG&rush_direction%5B%5D=M&rush_direction%5B%5D=RG&rush_direction%5B%5D=RT&rush_direction%5B%5D=RE&pass_location%5B%5D=SL&pass_location%5B%5D=SM&pass_location%5B%5D=SR&pass_location%5B%5D=DL&pass_location%5B%5D=DM&pass_location%5B%5D=DR&order_by=yards')[4]
table.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8-sig',index = False )
窦宏旷
2023-03-14

您可以使用selenium将鼠标悬停在“共享”按钮上

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from bs4 import BeautifulSoup as soup
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://www.pro-football-reference.com/play-index/play_finder.cgi?request=1&match=summary_all&year_min=2018&year_max=2018&game_type=R&game_num_min=0&game_num_max=99&week_num_min=0&week_num_max=99&quarter%5B%5D=4&minutes_max=15&seconds_max=00&minutes_min=00&seconds_min=00&down%5B%5D=0&down%5B%5D=1&down%5B%5D=2&down%5B%5D=3&down%5B%5D=4&field_pos_min_field=team&field_pos_max_field=team&end_field_pos_min_field=team&end_field_pos_max_field=team&type%5B%5D=PUNT&no_play=N&turnover_type%5B%5D=interception&turnover_type%5B%5D=fumble&score_type%5B%5D=touchdown&score_type%5B%5D=field_goal&score_type%5B%5D=safety&rush_direction%5B%5D=LE&rush_direction%5B%5D=LT&rush_direction%5B%5D=LG&rush_direction%5B%5D=M&rush_direction%5B%5D=RG&rush_direction%5B%5D=RT&rush_direction%5B%5D=RE&pass_location%5B%5D=SL&pass_location%5B%5D=SM&pass_location%5B%5D=SR&pass_location%5B%5D=DL&pass_location%5B%5D=DM&pass_location%5B%5D=DR&order_by=yards')
scroll = ActionChains(d).move_to_element(d.find_element_by_id('all_all_plays'))
scroll.perform()
spans = [i for i in d.find_elements_by_tag_name('span') if 'Share & more' in i.text]
hover = ActionChains(d).move_to_element(spans[-1])
hover.perform()
b = [i for i in d.find_elements_by_tag_name('button') if 'get table as csv' in i.text.lower()][0]
b.send_keys('\n')
csv_data = soup(d.page_source, 'html.parser').find('pre', {'id':'csv_all_plays'}).text

输出(由于SO的字符限制而缩短):

"\nDate,Tm,Opp,Quarter,Time,Down,ToGo,Location,Score,Detail,Yds,EPB,EPA,Diff,PYds,PRYds\n2018-09-09,Texans,Patriots,4,4:41,4,8,HTX 36,13-27,Trevor Daniel punts 47 yards muffed catch by Riley McCarron recovered by Johnson Bademosi and returned for no gain,0,-0.980,4.510,5.49,47,\n2018-09-09,Jaguars,Giants,4,0:54,4,6,JAX 40,20-15,Logan Cooke punts 41 yards muffed catch by Kaelin Clay recovered by Donald Payne and returned for no gain,0,-0.720,4.170,4.89,41,\n2018-09-09,Chiefs,Chargers,4,10:35,4,6,KAN 27,31-20,Dustin Colquitt punts 59 yards returned by JJ Jones for no gain. JJ Jones fumbles (forced by De'Anthony Thomas) recovered by James Winchester at LAC-2,0,-1.570,6.740,8.31,59,\n2018-09-23,Dolphins,Raiders,4,12:33,4,5,MIA 39,14-17,Matt Haack punts 42 yards muffed catch by Jordy Nelson recovered by Jordy Nelson and returned for no gain,0,-0.780,0.060,.84,42,\n2018-09-30,Jets,Jaguars,4,8:59,4,10,NYJ 14,12-25,Lac Edwards punts 46 yards muffed catch by Jaydon Mickens ball out of bounds at JAX-41,0,-2.470,-1.660,.81,46,\n2018-10-11,Giants,Eagles,4,12:27,4,17,NYG 33,13-34,Riley Dixon punts 50 yards muffed catch by DeAndre Carter recovered by DeAndre Carter and returned for no gain,0,-1.180,-0.040,1.14,50,\n2018-10-28,Jets,Bears,4,5:37,4,13,NYJ 37,10-24,Lac Edwards punts 48 yards muffed catch by Tarik Cohen recovered by Tarik Cohen and returned for no gain,0,-0.910,0.320,1.23,48,\n2018-11-25,Vikings,Packers,4,6:00,4,13,GNB 37,24-14,Matt Wile punts 21 yards muffed catch by Tramon Williams recovered by Marcus Sherels and returned for no gain,0,0.790,4.580,3.79,21,\n2018-12-13,Chiefs,Chargers,4,2:47,4,15,KAN 6,28-21,Dustin Colquitt punts 55 yards muffed catch by Desmond King recovered by Desmond King and returned for no gain,0,-2.490,-1.600,.89,55,

要将csv数据写入文件,请执行以下操作:

import csv
with open('individual_stats.csv', 'w') as f:
  write = csv.writer(f)
  write.writerows([list(filter(None, i.split(','))) for i in filter(None, csv_data.split('\n'))])

输出(前16行):

Date,Tm,Opp,Quarter,Time,Down,ToGo,Location,Score,Detail,Yds,EPB,EPA,Diff,PYds,PRYds
2018-09-09,Texans,Patriots,4,4:41,4,8,HTX 36,13-27,Trevor Daniel punts 47 yards muffed catch by Riley McCarron recovered by Johnson Bademosi and returned for no gain,0,-0.980,4.510,5.49,47
2018-09-09,Jaguars,Giants,4,0:54,4,6,JAX 40,20-15,Logan Cooke punts 41 yards muffed catch by Kaelin Clay recovered by Donald Payne and returned for no gain,0,-0.720,4.170,4.89,41
2018-09-09,Chiefs,Chargers,4,10:35,4,6,KAN 27,31-20,Dustin Colquitt punts 59 yards returned by JJ Jones for no gain. JJ Jones fumbles (forced by De'Anthony Thomas) recovered by James Winchester at LAC-2,0,-1.570,6.740,8.31,59
2018-09-23,Dolphins,Raiders,4,12:33,4,5,MIA 39,14-17,Matt Haack punts 42 yards muffed catch by Jordy Nelson recovered by Jordy Nelson and returned for no gain,0,-0.780,0.060,.84,42
2018-09-30,Jets,Jaguars,4,8:59,4,10,NYJ 14,12-25,Lac Edwards punts 46 yards muffed catch by Jaydon Mickens ball out of bounds at JAX-41,0,-2.470,-1.660,.81,46
2018-10-11,Giants,Eagles,4,12:27,4,17,NYG 33,13-34,Riley Dixon punts 50 yards muffed catch by DeAndre Carter recovered by DeAndre Carter and returned for no gain,0,-1.180,-0.040,1.14,50
2018-10-28,Jets,Bears,4,5:37,4,13,NYJ 37,10-24,Lac Edwards punts 48 yards muffed catch by Tarik Cohen recovered by Tarik Cohen and returned for no gain,0,-0.910,0.320,1.23,48
2018-11-25,Vikings,Packers,4,6:00,4,13,GNB 37,24-14,Matt Wile punts 21 yards muffed catch by Tramon Williams recovered by Marcus Sherels and returned for no gain,0,0.790,4.580,3.79,21
2018-12-13,Chiefs,Chargers,4,2:47,4,15,KAN 6,28-21,Dustin Colquitt punts 55 yards muffed catch by Desmond King recovered by Desmond King and returned for no gain,0,-2.490,-1.600,.89,55
2018-12-16,Bears,Packers,4,2:51,4,6,CHI 12,24-14,Pat O'Donnell punts 51 yards muffed catch by Josh Jackson recovered by Josh Jackson and returned for no gain,0,-2.490,-1.660,.83,51
2018-12-16,Eagles,Rams,4,3:03,4,12,PHI 15,30-23,Cameron Johnston punts 52 yards returned by Jojo Natson for 3 yards. Jojo Natson fumbles recovered by D.J. Alexander at LAR-36,0,-2.440,3.180,5.62,52,3
2018-12-02,Giants,Bears,4,12:46,4,18,NYG 12,24-14,Riley Dixon punts 53 yards returned by Tarik Cohen for 8 yards (tackle by Rhett Ellison). Tarik Cohen fumbles (forced by Rhett Ellison) recovered by Tarik Cohen at CHI-45. Penalty on Josh Bellamy: Illegal Block Above the Waist 10 yards,-2,-2.490,-0.670,1.82,53,8
2018-11-25,Jaguars,Bills,4,13:33,4,25,JAX 15,14-21,Logan Cooke punts 55 yards returned by Isaiah McKenzie for 9 yards (tackle by Jarrod Wilson). Isaiah McKenzie fumbles (forced by Jarrod Wilson) recovered by Isaiah McKenzie at BUF-43. Penalty on Marcus Murphy: Illegal Block Above the Waist 10 yards,-4,-2.440,-0.670,1.77,55,9
2018-09-06,Eagles,Falcons,4,7:42,4,14,PHI 21,10-12,Cameron Johnston punts 46 yards out of bounds,-1.960,-1.140,.82,46
2018-09-06,Falcons,Eagles,4,5:04,4,14,ATL 29,12-10,Matthew Bosher punts 52 yards returned by Darren Sproles for 12 yards (tackle by Eric Saubert). Penalty on Eric Saubert: Face Mask (15 Yards) 15 yards,-1.440,-1.990,-0.55,52,12
 类似资料:
  • 问题内容: 链接到pdf 当我尝试从上面的pdf中提取文本时,我混合了在evince查看器中不可见的文本和可见的文本。此外,某些所需的文本缺少查看器中未缺少的字符,例如“ FALCONS”中的“ S”和许多缺少的“ 1/2”字符。我认为这是由于来自不可见文本的干扰,因为在查看器中突出显示pdf时,可以看到不可见文本与可见文本重叠。 有没有办法删除不可见的文字?还是有其他解决方案? 码: 输出(粗体

  • 当通过数据库加载csv时,第2行下面的第4列没有加载。CSV的列数每行不同。 null 但是没有将上面的rdd转换为dataframe会导致错误 df2=sqlcontext.read.format(“com.databricks.spark.csv”).schema(模式).load(“sample_files/test_01.csv”) df2.show() df2.show() 但是,当no

  • 链接到pdf 当我尝试从上面的pdf中提取文本时,我得到了在evince viewer中不可见的文本和可见的文本的混合。此外,一些所需的文本缺少查看器中没有缺少的字符,例如,“FALCONS”中的“S”和许多缺少的“½”字符。我认为这是由于不可见文本的干扰,因为在查看器中突出显示pdf时,可以看到不可见文本与可见文本重叠。 有没有办法去掉不可见的文字?还是有别的解决办法? 代码: 输出(粗体文本为

  • 问题内容: 我正在使用我使用android studio Tabbed Activity 创建的应用程序上工作,我选择了此活动,以便在用户滑动时从json url加载一些数据,并且我创建了另一个类,该类可以在方法上获取JSON数据,并且所有这些都可以正常工作,除非在应用程序时从调用方法的主要活动开始,并且当我调用布局时未填充任何数据时 ,我想要的是加载应用程序MainActivity时要显示的数据

  • 问题内容: 这是这里的代码: 我遇到的问题是Hello World!中的’o’和’r’之间插入了额外的空格。该样式设置为无,所以有人请告诉我怎样才能得到,而没有空间句话? 问题答案: 找不到特定的帖子,但是聪明的人提出了3种解决方法: 将所有内容放在一行中(删除所有换行符) 注释掉换行符,例如 –> rld 将容器设置为并将其重新设置为-s 更新 至少还有2种方法: 破坏标签,例如 </span

  • 我已经使用了主变体颜色作为整个应用程序的背景,但仍然在UI最初加载时看到了一个白色屏幕。有什么办法吗? 编辑:我创建了一个新的空项目,并应用了@Philip Dukhov的bellow建议。还是一样的结果。在Surface开始加载之前,白屏首先出现,并在屏幕上停留至少两秒钟。 感谢您的帮助!