Lambda Container Scraper
The key point of this article is how to build a container image for Lambda.
1. Scraping images and uploading them to S3
import scraper
import aws_s3 as s3s

def handler(event, context):
    scr = scraper.ImageScraper()
    urls = scr.get_image_urls(query=event['query'], max_urls=event['count'], sleep_between_interactions=1)
    files = []
    for url in urls:
        img_obj, img_hash = scr.get_in_memory_image(url, 'jpeg')
        files.append(img_hash)
        s3s.upload_object(img_obj, event['bucket'], event['folder_path'] + img_hash, 'jpeg')
    scr.close_connection()
    return "Successfully loaded {} images to bucket {}. Folder path {} and file names {}.".format(event['count'],
                                                                                                event['bucket'],
                                                                                                event['folder_path'],
                                                                                                files)
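The article does not show the `aws_s3` module. As a minimal sketch, assuming it wraps boto3's `upload_fileobj`, `upload_object` could look roughly like this. The `s3_client` parameter and the `RecordingClient` stub are hypothetical additions so that the helper can be exercised without AWS access; the helper called in the handler takes four arguments and presumably creates its own client.

```python
import io


def upload_object(s3_client, file_obj, bucket, key, extension):
    """Upload an in-memory file to s3://bucket/key (hypothetical helper)."""
    file_obj.seek(0)  # rewind in case the buffer was just written
    s3_client.upload_fileobj(
        file_obj,
        bucket,
        key,
        ExtraArgs={"ContentType": "image/" + extension},
    )
    return "s3://{}/{}".format(bucket, key)


class RecordingClient:
    """Stand-in for an S3 client (assumption); it only records the calls."""

    def __init__(self):
        self.calls = []

    def upload_fileobj(self, Fileobj, Bucket, Key, ExtraArgs=None):
        self.calls.append((Bucket, Key, Fileobj.read(), ExtraArgs))


client = RecordingClient()
uri = upload_object(client, io.BytesIO(b"fake-jpeg-bytes"),
                    "my-dogs-youtube", "local/" + "a154ddd833.jpeg", "jpeg")
```

Against real S3, a client created with `boto3.client("s3")` would be passed in place of the stub.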
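The crux is understanding the "trigger mechanism", that is, how the container starts the handler.
As the final stage of the Dockerfile shows, three container keywords matter: WORKDIR, ENTRYPOINT, and CMD.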
#
# Stage 4 - final runtime image
#
# Grab a fresh copy of the Python image
FROM python-alpine
ARG FUNCTION_DIR
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image2 ${FUNCTION_DIR} ${FUNCTION_DIR}
RUN apk add jpeg-dev zlib-dev libjpeg-turbo-dev \
    && apk add chromium chromium-chromedriver
# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
RUN chmod 755 /usr/bin/aws-lambda-rie
# Copy handler function
COPY app/* ${FUNCTION_DIR}
COPY entry.sh /
ENTRYPOINT [ "/entry.sh" ]
CMD [ "app.handler" ]
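To make the trigger mechanism concrete: ENTRYPOINT starts the runtime interface client, CMD passes it the string `app.handler`, and WORKDIR is the directory in which the module `app` is found. The sketch below imitates that lookup in plain Python. It is a simplified illustration, not the actual `awslambdaric` source; `resolve_handler` and the in-memory `app` module are hypothetical.

```python
import importlib
import sys
import types


def resolve_handler(handler_string):
    """Split "module.function" and return the function object."""
    module_name, function_name = handler_string.rsplit(".", 1)
    module = importlib.import_module(module_name)  # looked up via sys.modules / sys.path
    return getattr(module, function_name)


# Simulate app.py in WORKDIR by registering an in-memory module named "app".
app = types.ModuleType("app")
app.handler = lambda event, context: "handled " + event["query"]
sys.modules["app"] = app

handler = resolve_handler("app.handler")
result = handler({"query": "beagle puppy"}, None)
```

Changing CMD to another `module.function` string would therefore select a different entry point without rebuilding the rest of the image.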
The full Dockerfile below has four stages. Keep in mind that a global ARG must be re-declared in every stage that uses it.
# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.9"
ARG DISTRO_VERSION="3.12"
#
# Stage 1 - bundle base image + runtime
#
# Grab a fresh copy of the image and install GCC
# A minimal Docker image based on Alpine Linux with a complete package index and only 5 MB in size!
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine
# Install GCC (Alpine uses musl but we compile and link dependencies with GCC)
RUN apk add --no-cache \
    libstdc++
#
# Stage 2 - build function and dependencies
#
FROM python-alpine AS build-image
# Install aws-lambda-cpp build dependencies
RUN apk add --no-cache \
    build-base \
    libtool \
    autoconf \
    automake \
    libexecinfo-dev \
    make \
    cmake \
    libcurl
# Include global args in this stage of the build
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# *** Install Lambda Runtime Interface Client for Python
RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}
#
# Stage 3 - Add app related dependencies (the first two stages are boilerplate; add the dependencies your app needs here)
#
FROM python-alpine AS build-image2
ARG FUNCTION_DIR
# RUNTIME_VERSION is used by the pip command below, so it must be re-declared in this stage too
ARG RUNTIME_VERSION
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
# Copy over and install requirements
RUN apk update \
    && apk add gcc python3-dev musl-dev \
    && apk add jpeg-dev zlib-dev libjpeg-turbo-dev
COPY requirements.txt .
RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
#
# Stage 4 - final runtime image (mainly copies in the main code)
#
# Grab a fresh copy of the Python image
FROM python-alpine
ARG FUNCTION_DIR
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image2 ${FUNCTION_DIR} ${FUNCTION_DIR}
RUN apk add jpeg-dev zlib-dev libjpeg-turbo-dev \
    && apk add chromium chromium-chromedriver
# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
RUN chmod 755 /usr/bin/aws-lambda-rie
# Copy handler function
COPY app/* ${FUNCTION_DIR}
COPY entry.sh /
ENTRYPOINT [ "/entry.sh" ]
CMD [ "app.handler" ]
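The article does not list `entry.sh`. A common pattern, sketched here in Python rather than shell, is to put the emulator in front of the runtime interface client only when `AWS_LAMBDA_RUNTIME_API` is unset, which is the case outside of Lambda. The function name `build_command` is hypothetical; the paths are taken from the Dockerfile and from the log line printed during the local test.

```python
def build_command(env, handler):
    """Return the argv an entry script would exec for the given environment."""
    runtime_client = ["/usr/local/bin/python", "-m", "awslambdaric", handler]
    if not env.get("AWS_LAMBDA_RUNTIME_API"):
        # Local run: no Lambda runtime API available, so start the emulator first.
        return ["/usr/bin/aws-lambda-rie"] + runtime_client
    # Inside Lambda: the service provides the runtime API endpoint.
    return runtime_client


local_cmd = build_command({}, "app.handler")
lambda_cmd = build_command({"AWS_LAMBDA_RUNTIME_API": "127.0.0.1:9001"}, "app.handler")
```

The same image can then be used unchanged for local runs and for deployment, since the decision is made at container start.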
2. Local testing
The container shares the host's AWS account credentials through the mounted ~/.aws/ directory.
image-scraper-lambda-container$ docker run -p 9000:8080 -v ~/.aws/:/root/.aws/ lambda/image-scraper:1.0
time="2021-01-21T13:03:12.648" level=info msg="exec '/usr/local/bin/python' (cwd=/home/app, handler=app.handler)"
With the credentials mounted, the function can work with S3 and other AWS services.
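Invoking the running container means POSTing the event JSON to the emulator's invocation endpoint on the mapped port 9000. As a minimal sketch using only the standard library, the request can be built as follows; the helper names `build_request` and `invoke_local` and the timeout value are assumptions.

```python
import json
import urllib.request

RIE_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_request(event, url=RIE_URL):
    """Build the POST request carrying the event as a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def invoke_local(event, url=RIE_URL, timeout=300):
    """Send the event to the running container and return the decoded reply."""
    with urllib.request.urlopen(build_request(event, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


event = {"query": "beagle puppy", "count": 3,
         "bucket": "my-dogs-youtube", "folder_path": "local/"}
request = build_request(event)
# invoke_local(event) performs the call; it needs the container to be running.
```
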
image-scraper-lambda-container$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"query":"beagle puppy", "count":3, "bucket":"my-dogs-youtube", "folder_path":"local/"}'
"Successfully loaded 3 images to bucket my-dogs-youtube. Folder path local/ and file names ['a154ddd833.jpeg', '952693f107.jpeg', 'ecd5070ed4.jpeg']."
3. Testing on AWS
Create the Lambda function from the container image.
Then test it with the following event:
{"query": "beagle puppy","count": 3,"bucket": "tmp-my-dogs-youtube","folder_path": "local/"}
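The same test event can also be sent programmatically. This is a minimal sketch assuming boto3's Lambda client and its `invoke` call; the function name `image-scraper` is hypothetical, and `FakeLambdaClient` is a stub so the sketch runs without AWS access.

```python
import io
import json


def invoke_function(lambda_client, function_name, event):
    """Invoke the function synchronously and return its decoded result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    return json.loads(response["Payload"].read())


class FakeLambdaClient:
    """Stand-in for a Lambda client (assumption); echoes the requested count."""

    def invoke(self, FunctionName, InvocationType, Payload):
        event = json.loads(Payload)
        reply = "Successfully loaded {} images".format(event["count"])
        return {"StatusCode": 200,
                "Payload": io.BytesIO(json.dumps(reply).encode("utf-8"))}


event = {"query": "beagle puppy", "count": 3,
         "bucket": "tmp-my-dogs-youtube", "folder_path": "local/"}
result = invoke_function(FakeLambdaClient(), "image-scraper", event)
```

With real credentials, `boto3.client("lambda")` would replace the stub.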
Lambda supports container images of up to 10 GB
This is no different from the previous example; the main addition is API Gateway.
/* implement */
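One possible shape for the API Gateway variant, assuming the Lambda proxy integration: the request parameters arrive as a JSON string in `event["body"]`, and the function must return a `statusCode` and a string `body`. The names `api_handler` and `scrape` are hypothetical, and `scrape` is only a stub standing in for the scraping handler shown in section 1.

```python
import json

REQUIRED_KEYS = ("query", "count", "bucket", "folder_path")


def scrape(event, context):
    """Stub for the scraping handler from section 1 (hypothetical)."""
    return "Successfully loaded {} images to bucket {}.".format(
        event["count"], event["bucket"])


def api_handler(event, context):
    """Unwrap an API Gateway proxy event, call the scraper, wrap the reply."""
    try:
        params = json.loads(event.get("body") or "{}")
        for key in REQUIRED_KEYS:
            if key not in params:
                raise KeyError(key)
    except (ValueError, KeyError, TypeError) as exc:
        error = {"error": "bad request: {}".format(exc)}
        return {"statusCode": 400, "body": json.dumps(error)}
    message = scrape(params, context)
    return {"statusCode": 200, "body": json.dumps({"message": message})}


ok = api_handler({"body": json.dumps({"query": "beagle puppy", "count": 3,
                                      "bucket": "my-dogs-youtube",
                                      "folder_path": "local/"})}, None)
bad = api_handler({"body": "{}"}, None)
```

With this wrapper, CMD in the Dockerfile would point at `app.api_handler` instead of `app.handler`, assuming both live in `app.py`.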
End.