Python Crawler(5)Deployment on RaspberryPi
胡安怡
2023-12-01
Check the Python version
>python -V
Python 2.7.13
Install pip on the Raspberry Pi
>sudo apt-get install python-pip
>pip -V
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)
pip worked before, but today pip -V hangs. Reinstall it from the official bootstrap script instead:
>curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
>python get-pip.py
>pip -V
pip 9.0.1 from /usr/local/lib/python2.7/dist-packages (python 2.7)
Install the Scrapy Environment
>sudo pip install scrapy
Exception
No package 'libffi' found
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
Solution:
>sudo apt-get install libxml2-dev libxslt1-dev
>sudo pip install lxml
Exceptions:
Could not import setuptools which is required to install from a source distribution.
Please install setuptools.
src/lxml/etree.c:91:20: fatal error: Python.h: No such file or directory
Running setup.py install for cffi ... error
Running setup.py install for cryptography ... error
Solution:
>sudo apt-get install python-dev
>sudo pip install -U setuptools
>sudo apt-get install python-cffi
>sudo apt-get install gcc libffi-dev libssl-dev python-dev
Compiling and installing from source does not work, so fall back to the prebuilt Debian packages:
>sudo apt-get install python-cryptography
>sudo apt-get install python-crypto
>sudo apt-get install -y python-lxml
>sudo pip install scrapy
Scrapy itself does not build on the Raspberry Pi 1 and 2, so I only install scrapyd there.
>sudo pip install scrapyd
>scrapyd --version
twistd (the Twisted daemon) 17.5.0
Copyright (c) 2001-2016 Twisted Matrix Laboratories.
See LICENSE for details.
Even on my Raspberry Pi 1, I hit issues when I run this command:
>scrapy shell 'http://quotes.toscrape.com/page/1'
Exceptions:
'module' object has no attribute 'OP_NO_TLSv1_1'
Solution:
https://github.com/scrapy/scrapy/issues/2473
>sudo pip install --upgrade scrapy
>sudo pip install --upgrade twisted
>sudo pip install --upgrade pyopenssl
Install the Client Tools
>sudo pip install scrapyd-client
The scrapyd-client package also provides the scrapyd-deploy command, so no separate install is needed.
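To push a project to the Pi with scrapyd-deploy, the project's scrapy.cfg needs a deploy target; a hypothetical example (the target name, URL, and project name are placeholders):

```ini
[deploy:pi]
url = http://raspberrypi.local:6800/
project = quotes
```

Then deploy with >scrapyd-deploy pi -p quotes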
Install Selenium Support
>sudo pip install selenium
Start the Server
>scrapyd
This looks like a bind issue: I can reach port 6800 on the server itself via localhost:6800, but not from a remote machine. Recent scrapyd binds to 127.0.0.1 by default, so add a config file in /opt/scrapyd:
cat scrapyd.conf
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 100
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 20
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
>nohup scrapyd &
Command to install all dependencies
>pip install -r requirements.txt
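A hypothetical requirements.txt matching the stack used in this post (package names only; pin versions as needed for your Pi):

```
scrapy
scrapyd
scrapyd-client
selenium
lxml
```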
The available Resin base images are listed here:
https://docs.resin.io/runtime/resin-base-images/?ref=dockerhub
I also dockerize the application.
start.sh starts the service:
#!/bin/sh -ex
#start the service
cd /tool/scrapyd/
scrapyd
The Makefile builds the image and opens port 6800:
IMAGE=sillycat/public
TAG=raspberrypi-scrapyd
NAME=raspberrypi-scrapyd
docker-context:
build: docker-context
docker build -t $(IMAGE):$(TAG) .
run:
docker run -d -p 6800:6800 --name $(NAME) $(IMAGE):$(TAG)
debug:
docker run -ti -p 6800:6800 --name $(NAME) $(IMAGE):$(TAG) /bin/bash
clean:
docker stop ${NAME}
docker rm ${NAME}
logs:
docker logs ${NAME}
publish:
docker push ${IMAGE}:${TAG}
fetch:
docker pull ${IMAGE}:${TAG}
The Dockerfile has all the installation steps
#Set up scrapyd in Docker
#Prepare the OS
FROM resin/raspberrypi3-python
MAINTAINER Carl Luo <luohuazju@gmail.com>
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get -y update
RUN apt-get -y dist-upgrade
#install the software
RUN pip install scrapyd
#copy the config
RUN mkdir -p /tool/scrapyd/
ADD conf/scrapyd.conf /tool/scrapyd/
#set up the app
EXPOSE 6800
RUN mkdir -p /app/
ADD start.sh /app/
WORKDIR /app/
CMD [ "./start.sh" ]
conf/scrapyd.conf carries the same configuration shown above:
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 100
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 20
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root
[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
References:
http://sillycat.iteye.com/blog/2391523
http://sillycat.iteye.com/blog/2391524
http://sillycat.iteye.com/blog/2391685
http://sillycat.iteye.com/blog/2391926
https://stackoverflow.com/questions/33785755/getting-could-not-find-function-xmlcheckversion-in-library-libxml2-is-libxml2
https://github.com/fredley/play-pi/issues/22
https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory