I am trying to read a data table from the India Central Pollution Control Board using selenium/python. Here is a sample of the output. I am basically following the approach described here: https://github.com/RachitKamdar/Python-Scraper.
Thanks to @Prophet, I was able to read the data from the first page (Selecting elements with XPath in Python?), but I cannot get selenium to wait for the data table to reload when switching to page 2. I tried adding a WebDriverWait instruction, but that does not seem to work. Any help would be greatly appreciated. Thanks.
This is what I am trying to do:
browser.find_element_by_tag_name("select").send_keys("100")
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")))
maxpage = int(browser.find_elements(By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)
i = 1
while i < maxpage + 1:
    browser.find_element(By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a[contains(text(),'{}')]".format(i)).click()
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID, "DataTables_Table_0_wrapper")))
    # this works ok for page 1
    # this does not wait after the click for the data table to update;
    # as a result res is wrong (empty) for page 2
    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id='DataTables_Table_0')
    ...
    i = i + 1
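(For context on what the wait is doing: `WebDriverWait(...).until(condition)` simply re-evaluates the condition at a fixed poll interval until it returns something truthy or the timeout expires, at which point it raises a TimeoutException. A rough standard-library sketch of that behavior, illustrative only and not selenium's actual implementation:)

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` expires."""
    end = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() > end:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)

# toy condition: becomes truthy on the third poll
calls = {"n": 0}
def table_reloaded():
    calls["n"] += 1
    return calls["n"] >= 3

print(wait_until(table_reloaded, timeout=5, poll=0.01))  # True
```

This is why a wait on an element that is *already visible* (like the table wrapper) returns immediately and does not block until the new page's rows arrive.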
Update 1: following Prophet's suggestion, I made the following modifications:
browser.find_element_by_tag_name("select").send_keys("100")
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID, "DataTables_Table_0_wrapper")))
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")))
maxpage = int(browser.find_elements(By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)
print(maxpage)
i = 1
while i < maxpage + 1:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID, "DataTables_Table_0_wrapper")))
    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id='DataTables_Table_0')
    if i == 1:
        data = getValsHtml(soup)
    else:
        data = data.append(getValsHtml(soup))
    print(i)
    print(data)
    i = i + 1
    browser.find_element(By.XPATH, '//a[@class="paginate_button next"]').click()
This still crashes on page 2 (data is empty). Also, data should contain the 100 items from page 1 but only contains 10. The maxpage number is correct (15).
Update 2:
Here is the whole script after incorporating Prophet's suggestions [the original is from https://github.com/RachitKamdar/Python-Scraper]. It only retrieves 10 points from the first page and fails to switch to the next page.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select
import pandas as pd

def getValsHtml(table):
    data = []
    heads = table.find_all('th')
    data.append([ele.text.strip() for ele in heads])
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols])  # get rid of empty values
    data.pop(1)
    data = pd.DataFrame(data[1:], columns=data[0])
    return data

def parameters(br, param):
    br.find_element_by_class_name("list-filter").find_element_by_tag_name("input").send_keys(param)
    br.find_elements_by_class_name("pure-checkbox")[1].click()
    br.find_element_by_class_name("list-filter").find_element_by_tag_name("input").clear()

timeout = 60
url = 'https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing/data'
chdriverpath = "/net/f1p/my_soft/chromedriver"
option = webdriver.ChromeOptions()
browser = webdriver.Chrome(executable_path="{}".format(chdriverpath), chrome_options=option)
browser.get(url)

station = "Secretariat, Amaravati - APPCB"
state = "Andhra Pradesh"
city = "Amaravati"
sd = ['01', 'Jan', '2018']
ed = ['31', 'Dec', '2021']
duration = "24 Hours"

WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.CLASS_NAME, "toggle")))
browser.find_elements_by_class_name("toggle")[0].click()
browser.find_element_by_tag_name("input").send_keys(state)
browser.find_element_by_class_name("options").click()
browser.find_elements_by_class_name("toggle")[1].click()
browser.find_element_by_tag_name("input").send_keys(city)
browser.find_element_by_class_name("options").click()
browser.find_elements_by_class_name("toggle")[2].click()
browser.find_element_by_tag_name("input").send_keys(station)
browser.find_element_by_class_name("options").click()
browser.find_elements_by_class_name("toggle")[4].click()
browser.find_element_by_class_name("filter").find_element_by_tag_name("input").send_keys(duration)
browser.find_element_by_class_name("options").click()
browser.find_element_by_class_name("c-btn").click()

for p in ['NH3']:
    print(p)
    try:
        parameters(browser, p)
    except:
        print("miss")
        browser.find_element_by_class_name("list-filter").find_element_by_tag_name("input").clear()
        pass

browser.find_element_by_class_name("wc-date-container").click()
browser.find_element_by_class_name("month-year").click()
browser.find_element_by_id("{}".format(sd[1].upper())).click()
browser.find_element_by_class_name("year-dropdown").click()
browser.find_element_by_id("{}".format(int(sd[2]))).click()
browser.find_element_by_xpath('//span[text()="{}"]'.format(int(sd[0]))).click()

browser.find_elements_by_class_name("wc-date-container")[1].click()
browser.find_elements_by_class_name("month-year")[1].click()
browser.find_elements_by_id("{}".format(ed[1].upper()))[1].click()
browser.find_elements_by_class_name("year-dropdown")[1].click()
browser.find_element_by_id("{}".format(int(ed[2]))).click()
browser.find_elements_by_xpath('//span[text()="{}"]'.format(int(ed[0])))[1].click()
browser.find_elements_by_tag_name("button")[-1].click()

next_page_btn_xpath = '//a[@class="paginate_button next"]'
actions = ActionChains(browser)
# This is how you should treat the Select drop down
select = Select(browser.find_element_by_tag_name("select"))
select.select_by_value('100')
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="dataTables_wrapper no-footer"]')))
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")))
maxpage = int(browser.find_elements(By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)

i = 1
while i < maxpage + 1:
    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id='DataTables_Table_0')
    if i == 1:
        data = getValsHtml(soup)
    else:
        data = data.append(getValsHtml(soup))
    print(i)
    print(data)
    i = i + 1
    # scroll to the next page btn and then click it
    next_page_btn = browser.find_element_by_xpath(next_page_btn_xpath)
    actions.move_to_element(next_page_btn).perform()
    next_page_btn.click()
browser.quit()
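As a sanity check of the extraction step, the logic of getValsHtml (header from th cells, one row per tr, first header row dropped from the body) can be reproduced with only the standard library's html.parser; the sample HTML below is made up for illustration and only stands in for the real DataTables markup:

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect th texts as the header and td texts grouped by tr as rows."""
    def __init__(self):
        super().__init__()
        self.header, self.rows = [], []
        self._row, self._cell = None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a fresh row
        elif tag in ("td", "th"):
            self._cell = ""         # start collecting cell text

    def handle_data(self, data):
        if self._cell is not None:  # only keep text inside a cell
            self._cell += data

    def handle_endtag(self, tag):
        if tag == "th":
            self.header.append(self._cell.strip()); self._cell = None
        elif tag == "td":
            self._row.append(self._cell.strip()); self._cell = None
        elif tag == "tr" and self._row:  # skip the header-only row
            self.rows.append(self._row)

# hypothetical sample of what one page of the table might look like
html = """<table id="DataTables_Table_0">
<tr><th>Date</th><th>NH3</th></tr>
<tr><td>01-01-2018</td><td>12.3</td></tr>
<tr><td>02-01-2018</td><td>11.8</td></tr>
</table>"""

p = TableScraper()
p.feed(html)
print(p.header)  # ['Date', 'NH3']
print(p.rows)    # [['01-01-2018', '12.3'], ['02-01-2018', '11.8']]
```

Running the parsing step in isolation like this makes it easy to confirm that an empty result means the page source itself lacked the rows, i.e. the wait failed, not the parse.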
Instead of
browser.find_element(By.XPATH,"//*[@id='DataTables_Table_0_paginate']/span/a[contains(text(),'{}')]".format(i)).click()
try clicking this element:
browser.find_element(By.XPATH,'//a[@class="paginate_button next"]').click()
This is simply the Next page button, so its locator does not change from page to page.
Similarly, instead of
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.ID,"DataTables_Table_0_wrapper")))
try this:
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH,'//div[@class="dataTables_wrapper no-footer"]')))
This element is the same on all the pages, whereas the element you were using is only defined for the first page.
UPD
The correct code should look like this:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import Select

next_page_btn_xpath = '//a[@class="paginate_button next"]'
actions = ActionChains(browser)
# This is how you should treat the Select drop down
select = Select(browser.find_element_by_tag_name("select"))
select.select_by_value('100')
WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="dataTables_wrapper no-footer"]')))
maxpage = int(browser.find_elements(By.XPATH, "//*[@id='DataTables_Table_0_paginate']/span/a")[-1].text)
i = 1
while i < maxpage + 1:
    res = browser.page_source
    soup = BeautifulSoup(res, 'html.parser')
    soup = soup.find(id='DataTables_Table_0')
    if i == 1:
        data = getValsHtml(soup)
    else:
        data = data.append(getValsHtml(soup))
    print(i)
    print(data)
    i = i + 1
    # scroll to the next page btn and then click it
    next_page_btn = browser.find_element_by_xpath(next_page_btn_xpath)
    actions.move_to_element(next_page_btn).perform()
    next_page_btn.click()
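One more note on the accumulation step: DataFrame.append returns a new frame on every call (and is deprecated in recent pandas), so the usual idiom is to collect each page's result in a plain list and combine once after the loop. Sketched here without pandas or a browser; fetch_page is a made-up stand-in for scraping one page:

```python
def fetch_page(page):
    """Stand-in for scraping one page: returns that page's rows."""
    return ["row-{}-{}".format(page, j) for j in range(3)]

pages = []
for page in range(1, 4):            # pretend maxpage == 3
    pages.append(fetch_page(page))  # one scrape per page, appended to a list

# single combine at the end instead of repeated append
all_rows = [row for page_rows in pages for row in page_rows]
print(len(all_rows))  # 9
```

With pandas the same shape would be a list of per-page DataFrames followed by one pd.concat after the loop.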