
Setting Up the Crawlab Crawler System

桓喜
2023-12-01

Crawlab Setup

Environment configuration on Ubuntu

1. Update the package lists
sudo apt-get update

2. Deploy Docker

2.1. If an older version of Docker is installed, remove it first
sudo apt-get remove docker docker-engine docker.io containerd runc

2.2. Install prerequisite packages
sudo apt-get install ca-certificates curl gnupg lsb-release

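2.3. Add Docker's GPG key from the Aliyun mirror (apt-key is deprecated on newer Ubuntu releases and prints a warning there)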
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

2.4. Add the Docker CE package repository (Aliyun mirror)
sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"

2.5. Install Docker
sudo apt-get install docker-ce docker-ce-cli containerd.io

2.6. Start Docker
sudo systemctl start docker

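2.7. Install supporting tools (software-properties-common provides the add-apt-repository command used in step 2.4, so if that command was not found, install these packages before step 2.4)
sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common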

2.8. Restart Docker
sudo service docker restart

2.9. Run the hello-world test image
sudo docker run hello-world

Check the Docker version
sudo docker version

List local images
sudo docker images
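Before moving on, it can help to confirm that the tools the remaining steps rely on are present. Below is a small sketch, not part of Docker or Crawlab. The function names `require_cmds` and `docker_daemon_up` are hypothetical. The first function reports any command missing from PATH. The second succeeds only if the Docker daemon answers `docker info`.

```shell
# require_cmds CMD...: print each command not found on PATH; fail if any is missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# docker_daemon_up: succeed only if the Docker daemon responds.
# `docker info` also fails when the current user cannot access the Docker socket.
docker_daemon_up() {
  docker info >/dev/null 2>&1
}
```

Usage: `require_cmds docker curl && docker_daemon_up && echo "Docker is ready"`. If `docker_daemon_up` fails only for your normal user, the cause is usually socket permissions. In that case, run Docker with sudo or add the user to the docker group.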

Pull the images

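Make sure Docker is installed and can pull the Crawlab and MongoDB images. The commands below omit sudo. Either prefix them with sudo, or add your user to the docker group (sudo usermod -aG docker $USER) and log in again. Pull mongo:4.2 rather than the untagged (latest) image, so that it matches the tag used in docker-compose.yml.

docker pull crawlabteam/crawlab
docker pull mongo:4.2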
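If one of the pulls fails on a slow connection, it is convenient to see at a glance which image still needs pulling. Below is a minimal sketch with a hypothetical helper, `pull_images`. Its default image list is an assumption that mirrors the tags used in docker-compose.yml, so keep it in sync with your own file.

```shell
# pull_images [IMAGE...]: pull each image, then report the ones that failed.
# With no arguments it pulls the two images used by docker-compose.yml (assumed tags).
pull_images() {
  local failed="" img
  [ "$#" -gt 0 ] || set -- crawlabteam/crawlab mongo:4.2
  for img in "$@"; do
    if ! docker pull "$img"; then
      failed="$failed $img"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed to pull:$failed" >&2
    return 1
  fi
}
```

Running `pull_images` again after a failure only costs time for images that are already present, because Docker skips layers it already has.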

If you have not installed Docker Compose yet, you can install it with pip (this requires the python3-pip package):

pip3 install docker-compose

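If the installation fails with a read timeout like the one below (the download from files.pythonhosted.org timed out), upgrade pip3 and run the install again: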

 Downloading websocket_client-0.59.0-py2.py3-none-any.whl (67 kB)
     |███████████████████▌            | 40 kB 9.1 kB/s eta 0:00:03ERROR: Exception:
Traceback (most recent call last):
  File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 507, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/usr/share/python-wheels/CacheControl-0.12.6-py2.py3-none-any.whl/cachecontrol/filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "/usr/lib/python3.8/http/client.py", line 459, in read
    n = self.readinto(b)
  File "/usr/lib/python3.8/http/client.py", line 503, in readinto
    n = self.fp.readinto(b)
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/base_command.py", line 186, in _main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/_internal/commands/install.py", line 357, in run
    resolver.resolve(requirement_set)
  File "/usr/lib/python3/dist-packages/pip/_internal/legacy_resolve.py", line 177, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "/usr/lib/python3/dist-packages/pip/_internal/legacy_resolve.py", line 333, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/usr/lib/python3/dist-packages/pip/_internal/legacy_resolve.py", line 282, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 480, in prepare_linked_requirement
    local_path = unpack_url(
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 282, in unpack_url
    return unpack_http_url(
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 158, in unpack_http_url
    from_path, content_type = _download_http_url(
  File "/usr/lib/python3/dist-packages/pip/_internal/operations/prepare.py", line 303, in _download_http_url
    for chunk in download.chunks:
  File "/usr/lib/python3/dist-packages/pip/_internal/utils/ui.py", line 160, in iter
    for x in it:
  File "/usr/lib/python3/dist-packages/pip/_internal/network/utils.py", line 15, in response_chunks
    for chunk in response.raw.stream(
  File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 564, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 529, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/share/python-wheels/urllib3-1.25.8-py2.py3-none-any.whl/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

sudo pip3 install --upgrade pip
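The error above is a network timeout, so simply rerunning the install often succeeds. Below is a minimal sketch of a generic retry helper, assuming bash or another POSIX-style shell. The name `retry` is hypothetical, and the attempt count and delay you pass are your own choice.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping between tries.
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "failed after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Usage: `retry 3 10 sudo pip3 install --default-timeout=100 docker-compose`. pip's `--default-timeout` option raises its socket timeout (in seconds), which targets the same read timeout shown in the traceback.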

Configure docker-compose.yml

Create a configuration file named docker-compose.yml with the following content.

version: '3.3'
services:
  master:
    image: crawlabteam/crawlab
    container_name: crawlab_master
    environment:
      CRAWLAB_NODE_MASTER: "Y"
      CRAWLAB_MONGO_HOST: "mongo"
    ports:
      - "8080:8080"
    depends_on:
      - mongo
  mongo:
    image: mongo:4.2
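If you prefer to create the file from the shell instead of an editor, the sketch below writes the same configuration with a heredoc. It also adds an optional named volume, `mongo-data`, mounted at MongoDB's data directory `/data/db`. This volume is an assumption that is not in the original config; it lets the database survive `docker-compose down`. `write_compose_file` is a hypothetical helper name.

```shell
# write_compose_file [PATH]: write the Crawlab + MongoDB compose file (default: ./docker-compose.yml).
# The mongo-data volume is an optional addition for persisting MongoDB data.
write_compose_file() {
  local target=${1:-docker-compose.yml}
  cat > "$target" <<'EOF'
version: '3.3'
services:
  master:
    image: crawlabteam/crawlab
    container_name: crawlab_master
    environment:
      CRAWLAB_NODE_MASTER: "Y"
      CRAWLAB_MONGO_HOST: "mongo"
    ports:
      - "8080:8080"
    depends_on:
      - mongo
  mongo:
    image: mongo:4.2
    volumes:
      - mongo-data:/data/db
volumes:
  mongo-data:
EOF
}
```

After writing the file, `docker-compose config` parses it and prints the resolved configuration, which is a quick way to catch indentation mistakes.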

Start Crawlab

Run the following command to start Crawlab and MongoDB.

docker-compose up -d
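Newer Docker releases ship Compose v2 as a CLI plugin, invoked as `docker compose` (two words) and provided by the docker-compose-plugin package in Docker's apt repository. The pip install above provides the older `docker-compose` command instead. The sketch below picks whichever is available, preferring the plugin. `compose_cmd` is a hypothetical helper name.

```shell
# compose_cmd: print the Compose command available on this machine.
# Prefers the v2 plugin ("docker compose"), falls back to v1 ("docker-compose").
compose_cmd() {
  if command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "no Docker Compose found" >&2
    return 1
  fi
}
```

Usage: `$(compose_cmd) up -d`. The substitution is left unquoted on purpose, so that `docker compose` splits into two words.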

You can now open a browser, go to http://localhost:8080, and start using Crawlab.
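Crawlab may need a short while after `docker-compose up -d` before it answers on port 8080. Below is a sketch of a polling loop, assuming `curl` is installed (it is one of the packages from step 2.2). `wait_for_url` is a hypothetical name, and the default retry count and delay are arbitrary.

```shell
# wait_for_url URL [MAX_TRIES] [DELAY_SECONDS]
# Polls URL with curl until it returns a successful HTTP status or the tries run out.
wait_for_url() {
  local url=$1 tries=${2:-30} delay=${3:-2} i=1
  while [ "$i" -le "$tries" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "up: $url"
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

Usage: `wait_for_url http://localhost:8080 60 2 && echo "open http://localhost:8080 in your browser"`.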


Crawlab documentation (Chinese): https://docs.crawlab.cn/zh/
