Fetching IP2Location Geolocation Data with a Python Crawler

苏季同

2023-12-01

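IP2Location is a handy site for looking up IP geolocation information. It offers the following data services:
1. Paid data packages: valid for one year, priced according to how detailed the data is
2. LITE package: the data is only accurate to the C segment (/24 block) of an IPv4 address
3. Sample package: covers only the address space 0.0.0.0~99.255.255.255, and the information in it is fairly out of date
4. Web lookup: unregistered users get 50 queries per day, registered users get 200, and each result contains nineteen fields

I am only doing this for fun and have no budget for a data package, and the free packages are not very usable, so I decided to write a Python crawler that collects the results of the web lookup.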

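The main problems I ran into while writing the crawler, and how I solved them:

1. The page passes its parameters via POST
Capturing the lookup request with Firefox + Burp shows that the POSTed data consists mainly of two fields: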

ipAddress=8.8.8.8
btnLookup=search

Once the POST data is known, the parameters can be sent with Python's urllib.
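As a minimal sketch of that step, the two captured fields can be turned into a form-encoded request body like this. It assumes Python 3, and build_post_body is a hypothetical helper name used only for illustration; it sends nothing over the network.

```python
from urllib.parse import urlencode


def build_post_body(ip_address):
    """Encode the two form fields seen in the captured request."""
    fields = {
        'ipAddress': ip_address,
        'btnLookup': 'search',
    }
    # urlencode joins the pairs as "key=value&key=value";
    # in Python 3 a POST body has to be bytes, hence the encode().
    return urlencode(fields).encode('ascii')


body = build_post_body('8.8.8.8')
print(body)  # b'ipAddress=8.8.8.8&btnLookup=search'
```

The resulting bytes are what gets passed as the data argument of a urllib request, which is what makes urllib issue a POST instead of a GET.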

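2. Parsing the HTML to extract the information
I first tried regular expressions, which always felt awkward; switching to BeautifulSoup made everything much cleaner.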
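To give a feel for the work BeautifulSoup takes care of, here is a rough sketch of the same kind of extraction written with only the standard library's html.parser. This is not the approach used in this post, CellTextParser is a hypothetical name, and the SAMPLE markup is hand-written for illustration rather than copied from the real page.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {'img', 'br', 'hr', 'input', 'meta', 'link'}


class CellTextParser(HTMLParser):
    """Collect the text of every element whose style attribute matches."""

    def __init__(self, style):
        super().__init__()
        self.style = style
        self.depth = 0      # > 0 while inside a matching element
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1                 # nested tag inside a match
        elif dict(attrs).get('style') == self.style:
            self.depth = 1                  # entering a matching element
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.cells[-1] += data


# Hand-written sample; the second cell mixes an <img> with text.
SAMPLE = (
    '<table>'
    '<tr><td style="vertical-align:middle;">8.8.8.8</td></tr>'
    '<tr><td style="vertical-align:middle;"><img src="us.png"> United States</td></tr>'
    '</table>'
)

parser = CellTextParser('vertical-align:middle;')
parser.feed(SAMPLE)
parser.close()
print(parser.cells)  # ['8.8.8.8', ' United States']
```

With BeautifulSoup all of this bookkeeping collapses into a single find_all call with a style filter, which is the main reason it feels so much cleaner.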

The code is given below.


import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

def get_html(IPv4):
    #POST the lookup form and return the raw HTML of the result page
    address = IPv4
    url = 'https://www.ip2location.com/demo'
    headers = {'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    value = {
    'ipAddress':address,
    'btnLookup':'search',
    }
    #In Python 3 the request body must be bytes
    data = urllib.parse.urlencode(value).encode('ascii')
    req = urllib.request.Request(url,data,headers)
    response = urllib.request.urlopen(req)
    the_page = response.read()
    return the_page

def get_information(html):
    infolist = []
    temp = []
    #Name the parser explicitly so bs4 does not have to guess one
    soup = BeautifulSoup(html, 'html.parser')
    #Each result value sits in an element with this inline style
    items = soup.find_all(style = 'vertical-align:middle;')
    for item in items:
        infolist.append(item.string)
    #The Location cell's structure is unique: it also contains an <img>,
    #so item.string is None there; take the text node that follows the image
    for item in items[1].descendants:
        temp.append(item)
    infolist[1] = temp[1]
    #for item in infolist:
        #print(item)
    return infolist

    #infolist
    #--------
    #IP Address
    #Location
    #Latitude & Longitude
    #ISP
    #Local Time
    #Domain
    #Net Speed
    #IDD & Area Code
    #ZIP Code
    #Weather Station
    #Mobile Country Code (MCC)
    #Mobile Network Code (MNC)
    #Carrier Name
    #Elevation
    #Usage Type
    #Anonymous Proxy
    #Shortcut
    #Twitterbot
    #Slackbot

#function call
html = get_html('8.8.8.8')
get_information(html)

Calling the function returns a list named infolist; the retrieved content looks like this:

8.8.8.8
 United States, California, Mountain View
37.405992, -122.078515 (37°24'22"N   122°4'43"W)
Google Inc.
12 Nov, 2016 08:38 PM (UTC -07:00)
google.com
-
(1) 650
94043
Mountain View (USCA0746)
-
-
-
31m
(SES) Search Engine Spider
No
http://www.ip2location.com/8.8.8.8
@ip2location 8.8.8.8
/ip2location 8.8.8.8
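
Since infolist is purely positional, one possible follow-up is to pair each value with the field name listed in the comments of the code above. The sketch below assumes Python 3; label_fields is a hypothetical helper, and treating "-" as "no data" is an assumption drawn from the sample output, not something the site documents here.

```python
# Field names in the order given in the comment block of the crawler code.
FIELD_NAMES = [
    'IP Address', 'Location', 'Latitude & Longitude', 'ISP', 'Local Time',
    'Domain', 'Net Speed', 'IDD & Area Code', 'ZIP Code', 'Weather Station',
    'Mobile Country Code (MCC)', 'Mobile Network Code (MNC)', 'Carrier Name',
    'Elevation', 'Usage Type', 'Anonymous Proxy', 'Shortcut', 'Twitterbot',
    'Slackbot',
]


def label_fields(infolist):
    """Pair each value with its field name; '-' placeholders become None."""
    if len(infolist) != len(FIELD_NAMES):
        raise ValueError('expected %d fields, got %d'
                         % (len(FIELD_NAMES), len(infolist)))
    record = {}
    for name, value in zip(FIELD_NAMES, infolist):
        text = value.strip() if value is not None else None
        # Assumption: a bare "-" means the site has no data for this field.
        record[name] = None if text in (None, '-') else text
    return record


# The values shown in the output above, copied into a list.
sample = [
    '8.8.8.8',
    ' United States, California, Mountain View',
    '37.405992, -122.078515 (37°24\'22"N   122°4\'43"W)',
    'Google Inc.',
    '12 Nov, 2016 08:38 PM (UTC -07:00)',
    'google.com',
    '-',
    '(1) 650',
    '94043',
    'Mountain View (USCA0746)',
    '-', '-', '-',
    '31m',
    '(SES) Search Engine Spider',
    'No',
    'http://www.ip2location.com/8.8.8.8',
    '@ip2location 8.8.8.8',
    '/ip2location 8.8.8.8',
]

record = label_fields(sample)
print(record['ISP'])        # Google Inc.
print(record['Net Speed'])  # None
```

The length check is there because the mapping silently breaks if the page layout ever gains or loses a field.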
