Question:

Print output in list form

潘宪
2023-03-14

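The code below runs fine. It collects information from a listing on LinkedIn.

(The account credentials are included and free to use, since it is a test account.)

However, the output comes out as joined data, rather than each field getting its own column.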

import time
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
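from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
test1=[]
options = Options()
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)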

url = "https://www.linkedin.com/uas/login?session_redirect=https%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fsearch%2Fresults%2Fpeople%2F%3FcurrentCompany%3D%255B%25221252860%2522%255D%26geoUrn%3D%255B%2522103644278%2522%255D%26keywords%3Dsales%26origin%3DFACETED_SEARCH%26page%3D2&fromSignIn=true&trk=cold_join_sign_in"
driver.get(url)
time.sleep(2)

# find_element_by_* helpers were removed in Selenium 4.3; use find_element(By...)
username = driver.find_element(By.ID, 'username')
username.send_keys('kbradons04@gmail.com')
password = driver.find_element(By.ID, 'password')

password.send_keys('Applesauce1')
password.submit()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

time.sleep(3)

elementj=(WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".subline-level-2.t-12.t-black--light.t-normal.search-result__truncate"))))
place1=[j.text for j in elementj]


elementk=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".subline-level-1.t-14.t-black.t-normal.search-result__truncate")))
compan=[c.text for c in elementk]


element1 = driver.find_elements(By.CLASS_NAME, "actor-name")
title=[t.text for t in element1]


diction={"Location":place1,"Company":compan,"Title":title}
test1.append(diction)
print(test1)
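
For reference, the shape of that printout can be reproduced without a browser. The following is a minimal sketch, not the scraper itself: the three lists hold made-up sample values standing in for the scraped text, and `records` is a hypothetical name. It shows that `test1` ends up holding a single dict whose values are whole lists, and how `zip` could instead pair the values up into one dict per person.

```python
# Sample values standing in for the text scraped by Selenium (assumed, not real output).
place1 = ["Dayton, Ohio Area", "Vandalia, Ohio, United States"]
compan = ["Account Manager at LexisNexis", "Cintas"]
title = ["LinkedIn Member", "LinkedIn Member"]

# What the script does: one dict, each value is an entire list.
test1 = [{"Location": place1, "Company": compan, "Title": title}]
print(len(test1))  # a single element, so the fields print joined together

# One possible alternative: one dict per person.
# zip stops at the shortest list, so unequal lengths drop trailing items.
records = [
    {"Location": p, "Company": c, "Title": t}
    for p, c, t in zip(place1, compan, title)
]
for record in records:
    print(record)
```

Each printed line then carries exactly one location, one company, and one title.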

1 answer

卢鸿博
2023-03-14

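I was able to run your code,

and here is what I got, with the help of an efficient method for unnesting (exploding) multiple list columns in a pandas DataFrame: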

import time
import pandas as pd
import numpy as np
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
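from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
test1=[]
options = Options()
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)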

url = "https://www.linkedin.com/uas/login?session_redirect=https%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fsearch%2Fresults%2Fpeople%2F%3FcurrentCompany%3D%255B%25221252860%2522%255D%26geoUrn%3D%255B%2522103644278%2522%255D%26keywords%3Dsales%26origin%3DFACETED_SEARCH%26page%3D2&fromSignIn=true&trk=cold_join_sign_in"
driver.get(url)
time.sleep(2)

# find_element_by_* helpers were removed in Selenium 4.3; use find_element(By...)
username = driver.find_element(By.ID, 'username')
username.send_keys('kbradons04@gmail.com')
password = driver.find_element(By.ID, 'password')

password.send_keys('Applesauce1')
password.submit()
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

time.sleep(3)

elementj=(WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".subline-level-2.t-12.t-black--light.t-normal.search-result__truncate"))))
place1=[j.text for j in elementj]


elementk=WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,".subline-level-1.t-14.t-black.t-normal.search-result__truncate")))
compan=[c.text for c in elementk]


element1 = driver.find_elements(By.CLASS_NAME, "actor-name")
title=[t.text for t in element1]


diction={"Location":place1,"Company":compan,"Title":title}
test1.append(diction)
print(test1)

df = pd.DataFrame(test1)

def explode(df, lst_cols, fill_value=''):
    # make sure `lst_cols` is a list
    if lst_cols and not isinstance(lst_cols, list):
        lst_cols = [lst_cols]
    # all columns except `lst_cols`
    idx_cols = df.columns.difference(lst_cols)

    # calculate lengths of lists
    lens = df[lst_cols[0]].str.len()

    if (lens > 0).all():
        # ALL lists in cells aren't empty
        return pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols}) \
          .loc[:, df.columns]
    else:
        # at least one list in cells is empty
        res = pd.DataFrame({
            col:np.repeat(df[col].values, df[lst_cols[0]].str.len())
            for col in idx_cols
        }).assign(**{col:np.concatenate(df[col].values) for col in lst_cols})
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        return pd.concat([res, df.loc[lens==0, idx_cols]]).fillna(fill_value) \
          .loc[:, df.columns]

explode(df,['Location','Company','Title'])

and the result:

   Location                       Company                                            Title
0  Dayton, Ohio Area              National Account Executive                         LinkedIn Member
1  Dayton, Ohio Area              Currently seeking permanent employment             LinkedIn Member
2  Dayton, Ohio Area              Account Manager at LexisNexis                      LinkedIn Member
3  Greater Denver Area            Currently seeking new opportunities in managem...  LinkedIn Member
4  Dayton, Ohio Area              Advertising Sales Representative at AMOS MEDIA     LinkedIn Member
5  Dayton, Ohio Area              Territory Manager at Huntington Outdoor, LLC       LinkedIn Member
6  Vandalia, Ohio, United States  Cintas                                             LinkedIn Member
7  Dayton, Ohio Area              Outside Sales Representative at Carter Lumber.     LinkedIn Member
8  Dayton, Ohio Area              Actively Searching                                 LinkedIn Member
9  Corpus Christi, Texas Area     Currently looking for sales position               LinkedIn Member
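
As a possible shorter route to the same table, recent pandas versions may make the custom helper unnecessary. The sketch below assumes pandas 1.3 or later (where `DataFrame.explode` accepts a list of columns) and assumes the three scraped lists have equal length. The sample values are copied from the first three rows of the result above, and the names `direct`, `nested`, and `exploded` are illustrative only.

```python
import pandas as pd

# Sample values copied from the first three rows of the result above.
place1 = ["Dayton, Ohio Area"] * 3
compan = [
    "National Account Executive",
    "Currently seeking permanent employment",
    "Account Manager at LexisNexis",
]
title = ["LinkedIn Member"] * 3

diction = {"Location": place1, "Company": compan, "Title": title}

# Option 1: pass the dict of equal-length lists straight to DataFrame,
# which yields one row per person with no unnesting step.
direct = pd.DataFrame(diction)

# Option 2: keep the original one-row frame of lists and unnest it with the
# built-in explode. Every listed column must hold the same number of items
# in a given row, otherwise pandas raises a ValueError.
nested = pd.DataFrame([diction])
exploded = nested.explode(["Location", "Company", "Title"], ignore_index=True)

print(direct)
print(exploded)
```

Both frames should contain the same three rows. Option 1 is the simpler choice when the lists are built in the same script, while Option 2 fits data that already arrives as list-valued cells.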