Question:

Extract IMDb IDs for the titles in a search result list

暴博远

2023-03-14

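Is it possible to get the IMDb IDs of all titles that match a set of search criteria (such as number of votes, language, release year, etc.)?

My first priority is to compile a list of the IMDb IDs of all titles classified as feature films with more than 25,000 votes (i.e., those eligible to appear on the Top 250 list). At the time of posting, 4,296 films met this criterion.

(If you're not familiar with IMDb IDs: an IMDb ID is a unique 7-digit code associated with every movie, person, character, etc. in the database. For example, the IMDb ID for the movie Drive (2011) is 0780504.)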
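A minimal sketch of turning the bare digits into the `tt`-prefixed form IMDb uses in its title URLs (for example `https://www.imdb.com/title/tt0780504/` for Drive). The helper names `to_imdb_title_id` and `title_url` are hypothetical. `zfill(7)` only pads, so newer titles with 8-digit IDs pass through unchanged.

```python
def to_imdb_title_id(raw: str) -> str:
    """Normalize '780504', '0780504' or 'tt0780504' to 'tt0780504' (hypothetical helper)."""
    digits = raw.strip().lower()
    if digits.startswith("tt"):
        digits = digits[2:]
    if not digits.isdigit():
        raise ValueError(f"not an IMDb title ID: {raw!r}")
    # Pad to at least 7 digits; longer (newer) IDs are left as they are
    return "tt" + digits.zfill(7)


def title_url(raw: str) -> str:
    """Build the title page URL for an IMDb ID (hypothetical helper)."""
    return f"https://www.imdb.com/title/{to_imdb_title_id(raw)}/"


print(title_url("0780504"))  # https://www.imdb.com/title/tt0780504/
```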

Later on, though, it would be useful to set other search criteria as well, since I can enter them in the URL (as with …
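A minimal sketch of encoding search criteria in the URL. It only uses the query parameters that appear in the answer's URL (`num_votes`, `title_type`, `sort`, `page`); a value of `25000,` appears to mean "at least 25,000" with no upper bound. Parameters for language or release year would need to be looked up on IMDb's advanced search page, and IMDb may change its URL scheme at any time. `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Base URL of IMDb's advanced title search, as used in the answer
SEARCH_BASE = "http://www.imdb.com/search/title"


def build_search_url(criteria: dict, page: int = 1) -> str:
    """Turn a dict of search criteria into a search-results URL (hypothetical helper)."""
    params = dict(criteria, page=page)  # page is appended last
    # safe="," keeps commas literal, matching values like "25000," and "num_votes,desc"
    return SEARCH_BASE + "?" + urlencode(params, safe=",")


criteria = {"num_votes": "25000,", "title_type": "feature", "sort": "num_votes,desc"}
print(build_search_url(criteria))
# http://www.imdb.com/search/title?num_votes=25000,&title_type=feature&sort=num_votes,desc&page=1
```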

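I have been using IMDbPY with great success to pull information for individual movie titles, and I would be glad if the search feature I described were accessible through that library.

So far I have been generating random 7-digit strings and testing whether they meet my criteria, but this is inefficient because I waste processing time on irrelevant IDs.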

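from imdb import IMDb, IMDbError
import random

i = IMDb(accessSystem='http')

# Generate 11,000 random 7-digit candidate IDs (zero-padded)
movies = []
for _ in range(11000):
    randID = str(random.randint(0, 7221897)).zfill(7)
    movies.append(randID)

matches = []  # IDs that pass all the checks below
for m in movies:
    try:
        movie = i.get_movie(m)
    except IMDbError as err:
        print(err)
        continue  # without this, `movie` is undefined or left over from the previous ID

    # Skip IDs that returned an empty record
    if str(movie) == '':
        continue

    # Keep feature films only
    kind = movie.get('kind')
    if kind != 'movie':
        continue

    # Skip titles with no vote count
    votes = movie.get('votes')
    if votes is None:
        continue

    if votes >= 25000:
        matches.append(m)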
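A rough back-of-the-envelope check of why the random-ID approach is slow, using only the numbers from the question and assuming all 4,296 qualifying IDs fall inside the sampled range. Each run of 11,000 lookups should find only about six or seven matches, and because draws can repeat, collecting all 4,296 would effectively mean looking up every ID in the range.

```python
# Figures from the question: 4,296 qualifying films, random IDs drawn
# from randint(0, 7221897) (inclusive on both ends), 11,000 draws per run.
qualifying = 4296
id_space = 7221897 + 1
draws = 11000

# Assumes every qualifying ID lies inside the sampled range
hit_rate = qualifying / id_space
expected_hits = draws * hit_rate

print(f"hit rate: {hit_rate:.4%}")                     # about 0.06%
print(f"expected hits per run: {expected_hits:.1f}")  # about 6.5
```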

2 Answers

胡昊

2023-03-14

I found a solution using Beautiful Soup, based on a tutorial by Alexandru Olteanu.

Here is my code:

from requests import get
from bs4 import BeautifulSoup
import re
import math
from time import time, sleep
from random import randint
from IPython.display import clear_output
from warnings import warn

# Advanced-search URL: feature films with at least 25,000 votes, sorted by vote count
url = "http://www.imdb.com/search/title?num_votes=25000,&title_type=feature&view=simple&sort=num_votes,desc&page=1&ref_=adv_nxt"
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

# Read the total number of matching titles from the "of N titles" text in the results header
num_films_text = html_soup.find_all('div', class_='desc')
num_films = re.search(r'of (\d.+) titles', str(num_films_text[0])).group(1)
num_films = int(num_films.replace(',', ''))
print(num_films)

# 50 results per page
num_pages = math.ceil(num_films / 50)
print(num_pages)

ids = []
start_time = time()
num_requests = 0

# For every page in the interval
for page in range(1, num_pages + 1):
    # Make a GET request
    url = "http://www.imdb.com/search/title?num_votes=25000,&title_type=feature&view=simple&sort=num_votes,desc&page=" + str(page) + "&ref_=adv_nxt"
    response = get(url)

    # Pause the loop to avoid hammering the server
    sleep(randint(8, 15))

    # Monitor the requests
    num_requests += 1
    sleep(randint(1, 3))
    elapsed_time = time() - start_time
    print('Request: {}; Frequency: {} requests/s'.format(num_requests, num_requests / elapsed_time))
    clear_output(wait=True)

    # Throw a warning for non-200 status codes
    if response.status_code != 200:
        warn('Request: {}; Status code: {}'.format(num_requests, response.status_code))

    # Break the loop if the number of requests is greater than expected
    if num_requests > num_pages:
        warn('Number of requests was greater than expected.')
        break

    # Parse the content of the request with BeautifulSoup
    page_html = BeautifulSoup(response.text, 'html.parser')

    # Select all the 50 movie containers from a single page
    # (class names reflect IMDb's markup at the time and may have changed since)
    movie_containers = page_html.find_all('div', class_='lister-item mode-simple')

    # Scrape the ID from the first link in each container, e.g. /title/tt0780504/
    for container in movie_containers:
        movie_id = re.search(r'tt(\d+)/', str(container.a)).group(1)
        ids.append(movie_id)

print(ids)
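A dependency-free sketch of the two parsing steps in that answer (page count and ID extraction), handy for testing against a saved page without hitting IMDb. It swaps Beautiful Soup for Python's standard `html.parser` and collects every link whose `href` matches `/title/tt<digits>/`, deduplicating because a single result may link to the same title more than once (poster and title text). The function names are hypothetical. Unlike the answer's code, it does not filter by the `lister-item` container class, so on a real page it may also pick up unrelated title links; the trade-off is that it does not depend on class names that IMDb may change.

```python
import math
import re
from html.parser import HTMLParser

# Matches hrefs such as "/title/tt0780504/" or "/title/tt0780504/?ref_=adv_li_tt"
TITLE_HREF = re.compile(r"/title/tt(\d+)/")


class TitleIdCollector(HTMLParser):
    """Collect unique IMDb title IDs from <a href="/title/tt.../"> links, in page order."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = TITLE_HREF.search(href)
        if match and match.group(1) not in self._seen:
            self._seen.add(match.group(1))
            self.ids.append(match.group(1))


def extract_title_ids(html: str) -> list:
    """Return the digits of every distinct title ID linked from the page."""
    collector = TitleIdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


def page_count(total_titles: int, per_page: int = 50) -> int:
    """Number of result pages needed, as in the answer's math.ceil(num_films / 50)."""
    return math.ceil(total_titles / per_page)


sample = (
    '<div class="lister-item mode-simple">'
    '<a href="/title/tt0780504/"><img alt="Drive"></a>'
    '<a href="/title/tt0780504/?ref_=adv_li_tt">Drive</a>'
    "</div>"
)
print(extract_title_ids(sample))  # ['0780504']
print(page_count(4296))           # 86
```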

况唯

2023-03-14

Take a look at the OMDb API (http://www.omdbapi.com/): you can query it directly to search by title or by IMDb ID.

In Python 3:

import urllib.request
urllib.request.urlopen("http://www.omdbapi.com/?apikey=27939b55&s=moana").read()
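A slightly fuller sketch of the same idea: it builds the request URL with `urlencode` and pulls the IDs out of the JSON reply. It assumes OMDb's search parameters (`s` for the title, `type`, `page`) and its search-response fields (`Search`, `imdbID`, `Response`); the function names and the sample response are made up for illustration. As far as I can tell from OMDb's parameter list, search matches on title and does not filter by vote count, so on its own it would not produce the 25,000-vote list from the question.

```python
import json
from urllib.parse import urlencode

OMDB_BASE = "http://www.omdbapi.com/"


def omdb_search_url(api_key: str, title: str, kind: str = "movie", page: int = 1) -> str:
    """Build an OMDb title-search URL (hypothetical helper)."""
    query = urlencode({"apikey": api_key, "s": title, "type": kind, "page": page})
    return OMDB_BASE + "?" + query


def ids_from_search_response(body) -> list:
    """Extract imdbID values from an OMDb search response (str or bytes)."""
    data = json.loads(body)
    # OMDb reports failures (e.g. no matches) with Response == "False"
    if data.get("Response") != "True":
        return []
    return [item["imdbID"] for item in data.get("Search", [])]


# Made-up response in OMDb's search format, for illustration only
sample = '{"Search": [{"Title": "Drive", "Year": "2011", "imdbID": "tt0780504", "Type": "movie"}], "totalResults": "1", "Response": "True"}'
print(omdb_search_url("YOUR_KEY", "drive"))
print(ids_from_search_response(sample))  # ['tt0780504']

# Live usage (needs network access and a real key):
# from urllib.request import urlopen
# with urlopen(omdb_search_url("YOUR_KEY", "drive")) as resp:
#     print(ids_from_search_response(resp.read()))
```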
