
Installing pandas in Docker Alpine

汪成仁
2023-03-14
Question

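I am really struggling to install a stable data science package configuration in Docker. This should be easier with such mainstream, relevant tools.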

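The following is the Dockerfile that used to work. It relies on a bit of a hack: pandas is removed from the core package list and installed separately, pinned to pandas<0.21.0 (because, allegedly, higher versions conflict with numpy).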

    FROM alpine:3.6

    ENV PACKAGES="\
        dumb-init \
        musl \
        libc6-compat \
        linux-headers \
        build-base \
        bash \
        git \
        ca-certificates \
        freetype \
        libgfortran \
        libgcc \
        libstdc++ \
        openblas \
        tcl \
        tk \
        libssl1.0 \
        "

    ENV PYTHON_PACKAGES="\
        numpy \
        matplotlib \
        scipy \
        scikit-learn \
        nltk \
        "

    RUN apk add --no-cache --virtual build-dependencies python3 \
        && apk add --virtual build-runtime \
        build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
        && ln -s /usr/include/locale.h /usr/include/xlocale.h \
        && python3 -m ensurepip \
        && rm -r /usr/lib/python*/ensurepip \
        && pip3 install --upgrade pip setuptools \
        && ln -sf /usr/bin/python3 /usr/bin/python \
        && ln -sf pip3 /usr/bin/pip \
        && rm -r /root/.cache \
        && pip install --no-cache-dir $PYTHON_PACKAGES \
        # <---------- PANDAS (installed separately, pinned below 0.21.0)
        && pip3 install 'pandas<0.21.0' \
        && apk del build-runtime \
        && apk add --no-cache --virtual build-dependencies $PACKAGES \
        && rm -rf /var/cache/apk/*

    # set working directory
    WORKDIR /usr/src/app

    # add and install requirements
    # (packages other than the data science ones go in requirements.txt)
    COPY ./requirements.txt /usr/src/app/requirements.txt
    RUN pip install -r requirements.txt

    # add entrypoint.sh
    COPY ./entrypoint.sh /usr/src/app/entrypoint.sh

    RUN chmod +x /usr/src/app/entrypoint.sh

    # add app
    COPY . /usr/src/app

    # run server
    CMD ["/usr/src/app/entrypoint.sh"]

The configuration above used to work. What happens now is that the build does go through, but importing pandas fails with the following error:

ImportError: Missing required dependencies ['numpy']

Since numpy 1.16.1 is installed, I don't know which numpy pandas is trying to find...
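
One way to narrow this down is to check, at build time, which numpy the interpreter actually imports. The following is only a sketch; it assumes that `python` on the PATH is the same interpreter the application runs with, and the check itself is a hypothetical addition, not part of the original Dockerfile:

```dockerfile
# Hypothetical sanity check (not in the original Dockerfile):
# the build fails at this step if numpy or pandas cannot be imported,
# and the output shows which numpy installation is being picked up.
RUN python -c "import numpy; print(numpy.__version__, numpy.__file__)" \
    && python -c "import pandas; print(pandas.__version__)"
```

Placed right after the pip install steps, this surfaces a broken numpy build during `docker build` rather than at container start.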

Does anyone know how to get a stable solution for this?

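Note: a solution that consists of pulling a turnkey Docker image for data science, containing at least the packages mentioned above, into the Dockerfile would also be very welcome.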

Edit 1

If I move the installation of the data packages into requirements.txt, as suggested in the comments, like so:

requirements.txt

(...)
numpy==1.16.1 # or numpy==1.16.0
scikit-learn==0.20.2
scipy==1.2.1
nltk==3.4   
pandas==0.24.1 # or pandas== 0.23.4
matplotlib==3.0.2 
(...)

Dockerfile

# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt

pandas breaks again, complaining about numpy:

Collecting numpy==1.16.1 (from -r requirements.txt (line 61))
  Downloading https://files.pythonhosted.org/packages/2b/26/07472b0de91851b6656cbc86e2f0d5d3a3128e7580f23295ef58b6862d6c/numpy-1.16.1.zip (5.1MB)
Collecting scikit-learn==0.20.2 (from -r requirements.txt (line 62))
  Downloading https://files.pythonhosted.org/packages/49/0e/8312ac2d7f38537361b943c8cde4b16dadcc9389760bb855323b67bac091/scikit-learn-0.20.2.tar.gz (10.3MB)
Collecting scipy==1.2.1 (from -r requirements.txt (line 63))
  Downloading https://files.pythonhosted.org/packages/a9/b4/5598a706697d1e2929eaf7fe68898ef4bea76e4950b9efbe1ef396b8813a/scipy-1.2.1.tar.gz (23.1MB)
Collecting nltk==3.4 (from -r requirements.txt (line 64))
  Downloading https://files.pythonhosted.org/packages/6f/ed/9c755d357d33bc1931e157f537721efb5b88d2c583fe593cc09603076cc3/nltk-3.4.zip (1.4MB)
Collecting pandas==0.24.1 (from -r requirements.txt (line 65))
  Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 359, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 361, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-_e5z6o6_/pandas/
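
The traceback shows pandas' setup.py trying to locate numpy's headers while pip is still collecting metadata, so numpy has to be importable before pandas is processed. A minimal sketch of one workaround, assuming the numpy version below matches the pin in requirements.txt and that the working directory layout is the one from the Dockerfile above:

```dockerfile
# Assumption: numpy is installed in its own step, before anything that
# needs it at build time. The version should match requirements.txt.
RUN pip install --no-cache-dir numpy==1.16.1

# pip then finds numpy already installed when it reaches pandas.
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install --no-cache-dir -r /usr/src/app/requirements.txt
```

This only changes the installation order; the build tools (compiler, headers) still need to be present while both steps run.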

Edit 2

This seems to be an open pandas issue. For more details, please refer to:

pandas-dev github

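"Unfortunately, this means that a requirements.txt file is insufficient for setting up a new environment with pandas installed (like in a docker container)".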

  **ImportError**:

  IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

  Importing the multiarray numpy extension module failed.  Most
  likely you are trying to import a failed build of numpy.
  Here is how to proceed:
  - If you're working with a numpy git repository, try `git clean -xdf`
    (removes all files not under version control) and rebuild numpy.
  - If you are simply trying to use the numpy version that you have installed:
    your installation is broken - please reinstall numpy.
  - If you have already reinstalled and that did not fix the problem, then:
    1. Check that you are using the Python you expect (you're using /usr/local/bin/python),
       and that you have no directories in your PATH or PYTHONPATH that can
       interfere with the Python and numpy versions you're trying to use.
    2. If (1) looks fine, you can open a new issue at
       https://github.com/numpy/numpy/issues.  Please include details on:
       - how you installed Python
       - how you installed numpy
       - your operating system
       - whether or not you have multiple versions of Python installed
       - if you built from source, your compiler versions and ideally a build log

Edit 3

requirements.txt -> https://pastebin.com/0icnx0iu

Edit 4

As of January 12, 2020, the accepted solution no longer works.
The build now breaks not at pandas but at scipy, after numpy, while building scipy's wheel. This is the log:

  ----------------------------------------
  ERROR: Failed building wheel for scipy
  Running setup.py clean for scipy
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"'; __file__='"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all
       cwd: /tmp/pip-install-s6nahssd/scipy
  Complete output (9 lines):

  `setup.py clean` is not supported, use one of the following instead:

    - `git clean -xdf` (cleans all files)
    - `git clean -Xdf` (cleans all versioned files, doesn't touch
                        files that aren't checked into the git repo)

  Add `--force` to your command to use it anyway if you must (unsupported).

  ----------------------------------------
  ERROR: Failed cleaning build dir for scipy
Successfully built numpy
Failed to build scipy
ERROR: Could not build wheels for scipy which use PEP 517 and cannot be installed directly

From the error it seems that the build process is using python3.6, while I'm using FROM alpine:3.7

Full log here -> https://pastebin.com/Tw4ubxSA

This is the current Dockerfile:

https://pastebin.com/3SftEufx


Answer:

If you're not bound to Alpine 3.6, use Alpine 3.7 (or later).

On Alpine 3.6, installing matplotlib failed with the following:

Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/26/04/8b381d5b166508cc258632b225adbafec49bbe69aa9a4fa1f1b461428313/matplotlib-3.0.3.tar.gz (36.6MB)
    Complete output from command python setup.py egg_info:
    Download error on https://pypi.org/simple/numpy/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    Couldn't find index page for 'numpy' (maybe misspelled?)
    Download error on https://pypi.org/simple/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    No local packages or working download links found for numpy>=1.10.0

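However, on Alpine 3.7, it worked. This may be due to a numpy versioning issue (see here), but I can't tell for sure. Past that problem, the packages were built and installed successfully, although it took a good while, about 30 minutes (the prebuilt wheels pip would normally download are built against glibc and are not compatible with Alpine's musl libc, so every package installed with pip has to be built from source).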

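Mind that there is one important change: you should remove the build-runtime virtual package (apk del build-runtime) only after pip install. Also, if applicable, you could replace numpy 1.16.1 with 1.16.2, which is the version that ships by default (otherwise 1.16.2 will be uninstalled and 1.16.1 built from source, further increasing build time). I haven't tried this, though.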
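
The ordering described above can be sketched as follows. This is not the Dockerfile from this answer, only an illustration under stated assumptions: the package names are taken from the question, numpy and pandas are left unpinned for brevity, and the split into runtime libraries versus build tools is my own guess at a reasonable minimum.

```dockerfile
FROM alpine:3.7

# Runtime libraries stay in the image (names taken from the question).
RUN apk add --no-cache python3 openblas freetype libstdc++ libgfortran

# Build tools go into a virtual package so they can be removed in one step.
# The removal happens only after every pip install has finished.
RUN apk add --no-cache --virtual build-runtime \
        build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && pip3 install --no-cache-dir numpy \
    && pip3 install --no-cache-dir pandas \
    && apk del build-runtime
```

Keeping the install and the `apk del` in the same RUN instruction also means the compiler toolchain never ends up in a committed layer.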

For reference, this is my slightly modified Dockerfile and docker build output.

Note:

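Alpine is usually chosen as a base to minimize image size (Alpine is also very slick, but it suffers from compatibility issues with mainstream Linux applications because of glibc / musl). Having to build Python packages from source defeats that purpose: you end up with a very bloated image, 900MB before any cleanup, which also takes a long time to build. The image can be compacted considerably by removing all intermediate compilation artifacts, build dependencies, and so on, but still.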

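If you can't get the Python package versions you need to work on Alpine without building them from source, I'd suggest trying another small but more compatible base image, such as debian-slim or ubuntu.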
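
As an illustration of that suggestion, here is a minimal sketch that assumes the official `python:3.7-slim` image (Debian-based, glibc) as the base; the tag is my assumption, not something specified in the question or answer:

```dockerfile
# Assumption: official Debian-based slim Python image instead of Alpine.
FROM python:3.7-slim

WORKDIR /usr/src/app

# On a glibc-based image, pip can typically use prebuilt wheels where
# they exist, so no compiler toolchain is installed in this sketch.
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
```

Any package in requirements.txt that has no prebuilt wheel for this platform would still need build dependencies installed with apt-get.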

Edit:

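Following "Edit 3" with the additional requirements, here are the updated Dockerfile and docker build output. The following packages were added to satisfy build dependencies: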

postgresql-dev libffi-dev libressl-dev libxml2 libxml2-dev libxslt libxslt-dev libjpeg-turbo-dev zlib-dev

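For packages that failed to build because of a particular header, I used Alpine's package contents search to locate the missing package. Specifically, for cffi the ffi.h header was missing, which requires the libffi-dev package: https://pkgs.alpinelinux.org/contents?file=ffi.h&path=&name=&branch=v3.7.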

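Alternatively, when the cause of a package build failure is not obvious, consult the installation instructions of that specific package, for example Pillow.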

The new image size, before any compaction, is 1.04GB. To cut it down a bit, you can remove the Python and pip caches:

RUN apk del build-runtime && \
    find -type d -name __pycache__ -prune -exec rm -rf {} \; && \
    rm -rf ~/.cache/pip

With docker build --squash, the image size can be reduced to 661MB.


