Question:

Waiting for a data table to load after a click / Selenium

艾仲渊
2023-03-14

I am trying to read a data table from the Central Pollution Control Board of India using selenium/python. Here is an example of the output. I am basically following the approach described here: https://github.com/RachitKamdar/Python-Scraper.

Thanks to @Prophet, I was able to read the data from the first page (Selecting elements with XPATH using Python?), but I cannot get selenium to wait for the data table to reload when switching to page 2. I tried adding a WebDriverWait instruction, but it does not seem to work. Any help would be greatly appreciated. Thanks.

This is what I am trying to do:

browser.find_element_by_tag_name("select").send_keys("100")
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")))

maxpage = int(browser.find_elements(By.XPATH,"//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)

i = 1
while i < maxpage + 1:
    browser.find_element(By.XPATH,"//*[@id='DataTables_Table_0_paginate']/span/a[contains(text(),'{}')]".format(i)).click()


    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID,"DataTables_Table_0_wrapper"))) 

    #this works ok for page 1
    #this does not wait after the click for the data table to update. As a result res is wrong for page 2 [empty].

    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id = 'DataTables_Table_0')
    ...
    i = i + 1

Update 1: Following Prophet's suggestion, I made the following modifications:

browser.find_element_by_tag_name("select").send_keys("100")
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID,"DataTables_Table_0_wrapper")))
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")))
maxpage = int(browser.find_elements(By.XPATH,"//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)
print(maxpage)
i = 1
while i < maxpage + 1:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID,"DataTables_Table_0_wrapper")))
    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id = 'DataTables_Table_0')

    if i == 1:
        data = getValsHtml(soup)
    else:
        data = data.append(getValsHtml(soup))
    print(i)
    print(data)
    i = i + 1
    browser.find_element(By.XPATH,'//a[@class="paginate_button next"]').click()

This still crashes on page 2 (data is empty). Also, data should contain the 100 items from page 1 but only contains 10. The maxpage number is correct (15).

Update 2:

Below is the whole script after incorporating Prophet's suggestions [the original script is from https://github.com/RachitKamdar/Python-Scraper]. It only retrieves 10 points from the first page and fails to switch to the next page.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select
import pandas as pd

def getValsHtml(table):
    data = []
    heads = table.find_all('th')
    data.append([ele.text.strip() for ele in heads])
    rows = table.find_all('tr')

    for row in rows:

        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append(cols)
    data.pop(1)  # the header <tr> contributes an empty row; drop it
    data = pd.DataFrame(data[1:],columns = data[0])
    return data


def parameters(br,param):
    br.find_element_by_class_name("list-filter").find_element_by_tag_name("input").send_keys(param)
    br.find_elements_by_class_name("pure-checkbox")[1].click()
    br.find_element_by_class_name("list-filter").find_element_by_tag_name("input").clear()


timeout = 60
url = 'https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing/data'
chdriverpath="/net/f1p/my_soft/chromedriver"
option = webdriver.ChromeOptions()
browser = webdriver.Chrome(executable_path="{}".format(chdriverpath), chrome_options=option)
browser.get(url)

station="Secretariat, Amaravati - APPCB"
state="Andhra Pradesh"
city="Amaravati"

sd=['01', 'Jan', '2018']
ed=['31', 'Dec', '2021']
duration="24 Hours"


WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.CLASS_NAME,"toggle")))

browser.find_elements_by_class_name("toggle")[0].click()
browser.find_element_by_tag_name("input").send_keys(state)
browser.find_element_by_class_name("options").click()
browser.find_elements_by_class_name("toggle")[1].click()
browser.find_element_by_tag_name("input").send_keys(city)
browser.find_element_by_class_name("options").click()
browser.find_elements_by_class_name("toggle")[2].click()
browser.find_element_by_tag_name("input").send_keys(station)
browser.find_element_by_class_name("options").click()
browser.find_elements_by_class_name("toggle")[4].click()
browser.find_element_by_class_name("filter").find_element_by_tag_name("input").send_keys(duration)
browser.find_element_by_class_name("options").click()
browser.find_element_by_class_name("c-btn").click()
for p in ['NH3']:
    print(p)
    try:
        parameters(browser,p)
    except:
        print("miss")
        browser.find_element_by_class_name("list-filter").find_element_by_tag_name("input").clear()
        pass
browser.find_element_by_class_name("wc-date-container").click()
browser.find_element_by_class_name("month-year").click()
browser.find_element_by_id("{}".format(sd[1].upper())).click()
browser.find_element_by_class_name("year-dropdown").click()
browser.find_element_by_id("{}".format(int(sd[2]))).click()
browser.find_element_by_xpath('//span[text()="{}"]'.format(int(sd[0]))).click()
browser.find_elements_by_class_name("wc-date-container")[1].click()
browser.find_elements_by_class_name("month-year")[1].click()
browser.find_elements_by_id("{}".format(ed[1].upper()))[1].click()
browser.find_elements_by_class_name("year-dropdown")[1].click()
browser.find_element_by_id("{}".format(int(ed[2]))).click()
browser.find_elements_by_xpath('//span[text()="{}"]'.format(int(ed[0])))[1].click()
browser.find_elements_by_tag_name("button")[-1].click()


next_page_btn_xpath = '//a[@class="paginate_button next"]'
actions = ActionChains(browser)

#This is how you should treat the Select drop down
select = Select(browser.find_element_by_tag_name("select"))
select.select_by_value('100')

WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH,'//div[@class="dataTables_wrapper no-footer"]')))
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")))

maxpage = int(browser.find_elements(By.XPATH,"//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)


i = 1
while i < maxpage + 1:
    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id = 'DataTables_Table_0')

    if i == 1:
        data = getValsHtml(soup)
    else:
        data = data.append(getValsHtml(soup))
    print(i)
    print(data)
    i = i + 1

    #scroll to the next page btn and then click it
    next_page_btn = browser.find_element_by_xpath(next_page_btn_xpath)
    actions.move_to_element(next_page_btn).perform()
    next_page_btn.click()

browser.quit()
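Independently of the Selenium issue, the table-parsing logic of `getValsHtml` can be sanity-checked offline against a small hand-written HTML fragment (the sample values below are made up):

```python
import pandas as pd
from bs4 import BeautifulSoup


def getValsHtml(table):
    # same logic as in the script above
    data = [[th.text.strip() for th in table.find_all('th')]]
    for row in table.find_all('tr'):
        data.append([td.text.strip() for td in row.find_all('td')])
    data.pop(1)  # the header <tr> contributes an empty row; drop it
    return pd.DataFrame(data[1:], columns=data[0])


HTML = """
<table id="DataTables_Table_0">
  <tr><th>Date</th><th>NH3</th></tr>
  <tr><td>01-Jan-2018</td><td>12.5</td></tr>
  <tr><td>02-Jan-2018</td><td>10.1</td></tr>
</table>
"""
table = BeautifulSoup(HTML, 'html.parser').find(id='DataTables_Table_0')
df = getValsHtml(table)
print(df)  # two rows, columns Date and NH3
```

If this check passes, the "10 rows instead of 100" problem is in the page state at the moment `browser.page_source` is read, not in the parsing.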

1 Answer

淳于健
2023-03-14

Instead of

browser.find_element(By.XPATH,"//*[@id='DataTables_Table_0_paginate']/span/a[contains(text(),'{}')]".format(i)).click()

try clicking this element:

browser.find_element(By.XPATH,'//a[@class="paginate_button next"]').click()

This is simply the Next page button; it does not change from page to page.
Similarly, instead of

WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID,"DataTables_Table_0_wrapper"))) 

try this:

WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH,'//div[@class="dataTables_wrapper no-footer"]'))) 

This element will be the same on all the pages, whereas the one you were trying to use is defined for the first page only.
UPD
The correct code should look like this:

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select

next_page_btn_xpath = '//a[@class="paginate_button next"]'
actions = ActionChains(browser)

#This is how you should treat the Select drop down
select = Select(browser.find_element_by_tag_name("select"))
select.select_by_value('100')

WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH,'//div[@class="dataTables_wrapper no-footer"]')))

maxpage = int(browser.find_elements(By.XPATH,"//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)

i = 1
while i < maxpage + 1:
    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id = 'DataTables_Table_0')

    if i == 1:
        data = getValsHtml(soup)
    else:
        data = data.append(getValsHtml(soup))
    print(i)
    print(data)
    i = i + 1
    
    #scroll to the next page btn and then click it
    next_page_btn = browser.find_element_by_xpath(next_page_btn_xpath)
    actions.move_to_element(next_page_btn).perform()
    next_page_btn.click()
Similar questions:
  • When I run the following code, execution ends abruptly unless I uncomment Thread.sleep(). As a result, my code in the withdraw url servlet does not get executed. The click is a submit button click that loads another page. What is the correct way to make selenium wait until the page has loaded? I am using the following selenium version

  • Question: Does anyone know how to wait for a page to load? I tried every possible variant I found online, but nothing works at all. After firing the click() command I need to wait; there are internal scripts on the web server that fool checks such as (I omit the code that imports the required modules and uses standard naming conventions): or or None of the above checks work, in the sense that they return True even while the page is still loading. As a result the text I am reading is incomplete, because after the click() command the page had not fully

  • I want to get the page source of a page after a click, and then go back with the browser.back() function. But Selenium does not let the page fully load after the click, and content generated by JavaScript is not included in that page's page source.

  • I use Python3 and Selenium firefox to submit a form and then grab the URL it lands on. I do it like this This works most of the time, but sometimes the page takes more than 5 seconds to load and I get the old URL because the new one has not loaded yet. What should I do?

  • I am trying to create a method that waits for the page to load, but I get an error. Probably I am not using the method correctly. The error is:

  • I am stuck in a situation with a selenium wait. I am using selenium Java with cucumber. On clicking a button a new page loads, but its content is not yet clickable. While the page loads, a gray screen blocker is displayed so that nothing is editable until the whole page has loaded. So I cannot use waitforpageload or wait-for-element-visible, because they both return true since the element is available in the background. I tried using a condition that checks whether the element is clickable to make sure the page is fully loaded. But that did not work either