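I want to run a Selenium script on AWS Lambda using a Docker container.
I build the container on AWS EC2 and then test it locally with the AWS Lambda RIE (Runtime Interface Emulator). Once the test passes, the image is pushed to ECR so that AWS Lambda can use it.
Although local testing with the RIE on EC2 always succeeds, I can't get it to work on Lambda. The Lambda test currently always fails with the following error message: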
{
"errorMessage": "Message: session not created\nfrom tab crashed\n (Session info: headless chrome=93.0.4577.63)\n",
"errorType": "SessionNotCreatedException",
"stackTrace": [
" File \"/var/task/app.py\", line 32, in handler\n driver = webdriver.Chrome(\n",
" File \"/var/task/selenium/webdriver/chrome/webdriver.py\", line 76, in __init__\n RemoteWebDriver.__init__(\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 157, in __init__\n self.start_session(capabilities, browser_profile)\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 252, in start_session\n response = self.execute(Command.NEW_SESSION, parameters)\n",
" File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 321, in execute\n self.error_handler.check_response(response)\n",
" File \"/var/task/selenium/webdriver/remote/errorhandler.py\", line 242, in check_response\n raise exception_class(message, screen, stacktrace)\n"
]
}
Here is all the code I'm actually using:
Dockerfile
FROM public.ecr.aws/lambda/python:3.8
#Download and install Chrome
RUN curl https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm > ./google-chrome-stable_current_x86_64.rpm
RUN yum install -y ./google-chrome-stable_current_x86_64.rpm
RUN rm ./google-chrome-stable_current_x86_64.rpm
#Download and install chromedriver
RUN yum install -y unzip
RUN curl http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip > /tmp/chromedriver.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN rm /tmp/chromedriver.zip
RUN yum remove -y unzip
#Upgrade pip and install Python dependencies
RUN pip3 install --upgrade pip
RUN pip3 install selenium --target "${LAMBDA_TASK_ROOT}"
#Copy app.py
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]
app.py
import json  # needed for json.dumps in the response below
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
def handler(event, context):
    chrome_options = Options()
    chrome_options.add_argument("--allow-running-insecure-content")
    chrome_options.add_argument("--ignore-certificate-errors")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-dev-tools")
    chrome_options.add_argument("--no-zygote")
    chrome_options.add_argument("--v=99")
    chrome_options.add_argument("--single-process")
    chrome_options.binary_location = '/usr/bin/google-chrome-stable'
    # copy() so the shared DesiredCapabilities.CHROME dict is not modified in place
    capabilities = webdriver.DesiredCapabilities.CHROME.copy()
    capabilities['acceptSslCerts'] = True
    capabilities['acceptInsecureCerts'] = True
    driver = webdriver.Chrome(
        executable_path='/usr/local/bin/chromedriver',
        options=chrome_options,
        desired_capabilities=capabilities)
    if driver:
        response = {
            "statusCode": 200,
            "body": json.dumps("Selenium Driver Initiated")
        }
        return response
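The Dockerfile above installs whatever Chrome is currently stable and, in a separate step, whatever chromedriver LATEST_RELEASE points to, so the two can drift apart between builds. The error above reports Chrome 93 actually starting, so a version mismatch may not be the cause here, but it is cheap to rule out. A minimal sketch, assuming the binary paths from the Dockerfile and that both binaries print a four-part version number for --version; the helper names are hypothetical:

```python
import re
import subprocess

# First dotted four-part version number, e.g. "93.0.4577.63".
_VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output: str) -> int:
    """Extract the major version from a --version line."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and chromedriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def check_installed(chrome: str = "/usr/bin/google-chrome-stable",
                    driver: str = "/usr/local/bin/chromedriver") -> bool:
    """Run both binaries with --version (paths taken from the Dockerfile)."""
    outputs = [
        subprocess.run([path, "--version"], capture_output=True,
                       text=True, check=True).stdout
        for path in (chrome, driver)
    ]
    return versions_match(*outputs)
```

Running check_installed() inside the built image returns False when the two major versions differ.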
Local container testing with the RIE
$ docker run -p 9000:8080 aws-scraper
results in > time="2021-09-03T15:24:13.269" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"
$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
results in > {"statusCode": 200, "body": "\"Selenium Driver Initiated\""}
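For scripting the same local check, the curl call can be reproduced with the standard library. A minimal sketch, assuming the container was started with the docker run command above (host port 9000 mapped to the RIE's port 8080); the function names are hypothetical:

```python
import json
import urllib.request

# Invocation path exposed by the Lambda Runtime Interface Emulator (same as the curl call).
RIE_PATH = "/2015-03-31/functions/function/invocations"


def build_invoke_request(event: dict, host: str = "localhost",
                         port: int = 9000) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    url = f"http://{host}:{port}{RIE_PATH}"
    data = json.dumps(event).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def invoke(event: dict, **kwargs) -> dict:
    """Send the event to a running RIE container and decode the JSON reply."""
    request = build_invoke_request(event, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, invoke({}) should return the same JSON object the curl command printed.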
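I really can't figure this out. I also tried following "Selenium works on AWS EC2 but not on AWS Lambda", but it didn't help.
Any help would be greatly appreciated. Thanks in advance.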
Solved by borrowing the Dockerfile and the Selenium WebDriver Chrome options from this repo: https://github.com/rchauhan9/image-scraper-lambda-container.git
The Dockerfile now looks like this:
# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.9"
ARG DISTRO_VERSION="3.12"
# Stage 1
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine
RUN apk add --no-cache \
libstdc++
# Stage 2
FROM python-alpine AS build-image
RUN apk add --no-cache \
build-base \
libtool \
autoconf \
automake \
libexecinfo-dev \
make \
cmake \
libcurl
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
RUN mkdir -p ${FUNCTION_DIR}
RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}
# Stage 3
FROM python-alpine as build-image2
ARG FUNCTION_DIR
WORKDIR ${FUNCTION_DIR}
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
RUN apk update \
&& apk add gcc python3-dev musl-dev \
&& apk add jpeg-dev zlib-dev libjpeg-turbo-dev
COPY requirements.txt .
RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
# Stage 4
FROM python-alpine
ARG FUNCTION_DIR
WORKDIR ${FUNCTION_DIR}
COPY --from=build-image2 ${FUNCTION_DIR} ${FUNCTION_DIR}
RUN apk add jpeg-dev zlib-dev libjpeg-turbo-dev \
&& apk add chromium chromium-chromedriver
ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
RUN chmod 755 /usr/bin/aws-lambda-rie
COPY app/* ${FUNCTION_DIR}
COPY entry.sh /
ENTRYPOINT [ "/entry.sh" ]
CMD [ "app.handler" ]
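entry.sh itself is not shown. In the AWS documentation for images built on the awslambdaric runtime client, the entry script typically checks whether AWS_LAMBDA_RUNTIME_API is set and wraps the runtime client in aws-lambda-rie only when it is not, which is what lets one image run both under local docker run and on Lambda. The Python sketch below mirrors that decision; it is an assumption about what entry.sh does, not a copy of it, and the interpreter path /usr/local/bin/python (where the official python images install it) is also assumed:

```python
import os

PYTHON = "/usr/local/bin/python"  # assumption: interpreter path in python:*-alpine images
RIE = "/usr/bin/aws-lambda-rie"   # matches the ADD target in the Dockerfile


def entry_command(handler: str, environ=None) -> list:
    """Return the argv an entry script would exec for the given handler."""
    env = os.environ if environ is None else environ
    runtime = [PYTHON, "-m", "awslambdaric", handler]
    if not env.get("AWS_LAMBDA_RUNTIME_API"):
        # No Runtime API (local docker run): start the runtime client under the emulator.
        return [RIE] + runtime
    # Inside Lambda the service sets AWS_LAMBDA_RUNTIME_API, so run the client directly.
    return runtime
```

A real entry point would then replace itself with that command, for example via os.execv(argv[0], argv).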
And app.py now looks like this:
import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
def handler(event, context):
    chrome_options = Options()
    chrome_options.add_argument('--autoplay-policy=user-gesture-required')
    chrome_options.add_argument('--disable-background-networking')
    chrome_options.add_argument('--disable-background-timer-throttling')
    chrome_options.add_argument('--disable-backgrounding-occluded-windows')
    chrome_options.add_argument('--disable-breakpad')
    chrome_options.add_argument('--disable-client-side-phishing-detection')
    chrome_options.add_argument('--disable-component-update')
    chrome_options.add_argument('--disable-default-apps')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-domain-reliability')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--disable-features=AudioServiceOutOfProcess')
    chrome_options.add_argument('--disable-hang-monitor')
    chrome_options.add_argument('--disable-ipc-flooding-protection')
    chrome_options.add_argument('--disable-notifications')
    chrome_options.add_argument('--disable-offer-store-unmasked-wallet-cards')
    chrome_options.add_argument('--disable-popup-blocking')
    chrome_options.add_argument('--disable-print-preview')
    chrome_options.add_argument('--disable-prompt-on-repost')
    chrome_options.add_argument('--disable-renderer-backgrounding')
    chrome_options.add_argument('--disable-setuid-sandbox')
    chrome_options.add_argument('--disable-speech-api')
    chrome_options.add_argument('--disable-sync')
    chrome_options.add_argument('--disk-cache-size=33554432')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--ignore-gpu-blacklist')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--metrics-recording-only')
    chrome_options.add_argument('--mute-audio')
    chrome_options.add_argument('--no-default-browser-check')
    chrome_options.add_argument('--no-first-run')
    chrome_options.add_argument('--no-pings')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--no-zygote')
    chrome_options.add_argument('--password-store=basic')
    chrome_options.add_argument('--use-gl=swiftshader')
    chrome_options.add_argument('--use-mock-keychain')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--headless')
    # Keep Chrome's profile, data and cache under /tmp (the only writable path on Lambda)
    chrome_options.add_argument('--user-data-dir={}'.format('/tmp/user-data'))
    chrome_options.add_argument('--data-path={}'.format('/tmp/data-path'))
    chrome_options.add_argument('--homedir={}'.format('/tmp'))
    chrome_options.add_argument('--disk-cache-dir={}'.format('/tmp/cache-dir'))
    # executable_path is the Selenium 3 style; Selenium 4 deprecates it in favour of a Service object
    driver = webdriver.Chrome(
        executable_path='/usr/bin/chromedriver',
        options=chrome_options)
    if driver:
        print("Selenium Driver Initiated")
        # `html` was not defined in the posted snippet; use the current page source as a placeholder
        html = driver.page_source
        driver.quit()  # shut Chrome down before returning
        response = {
            "statusCode": 200,
            "body": json.dumps(html, ensure_ascii=False)
        }
        return response
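Comparing the two versions of app.py, the four flags that point Chrome's profile, data and cache directories at /tmp appear only in the working one. Lambda only lets a function write under /tmp, so this is one plausible contributor, although the base image and Chrome build changed at the same time and the post alone cannot separate the two. A minimal sketch that just collects those four flags, with a hypothetical helper name and the root directory as a parameter:

```python
def tmp_state_args(tmp_root: str = "/tmp") -> list:
    """The /tmp-related Chrome flags from the working app.py, rooted at tmp_root."""
    return [
        f"--user-data-dir={tmp_root}/user-data",   # Chrome profile
        f"--data-path={tmp_root}/data-path",
        f"--homedir={tmp_root}",
        f"--disk-cache-dir={tmp_root}/cache-dir",  # HTTP disk cache
    ]
```

In the handler, for arg in tmp_state_args(): chrome_options.add_argument(arg) is equivalent to the four explicit /tmp lines.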
To be honest, I still don't understand why these changes made it work. Any thoughts on this are very welcome!
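One way to investigate rather than guess is to log what each environment actually lets the process write to, since the same image behaves differently under the local RIE and on Lambda. A diagnostic sketch, not a fix; the helper name is hypothetical, and the default directories (/tmp, /dev/shm and HOME) are an assumption about where Chrome may try to write:

```python
import os
import shutil


def describe_paths(paths=None) -> dict:
    """Report existence, writability and free space for each directory."""
    if paths is None:
        # Assumed candidate locations for Chrome's files; HOME only if it is set.
        paths = ["/tmp", "/dev/shm"]
        home = os.environ.get("HOME")
        if home:
            paths.append(home)
    report = {}
    for path in paths:
        exists = os.path.isdir(path)
        report[path] = {
            "exists": exists,
            "writable": exists and os.access(path, os.W_OK),
            "free_mb": shutil.disk_usage(path).free // (1024 * 1024) if exists else None,
        }
    return report
```

Printing json.dumps(describe_paths()) at the top of the handler sends the report to CloudWatch Logs on Lambda and to the container output under the RIE, so the two environments can be compared side by side.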
Thanks again, everyone, for your help and support.