[TLJH] A detailed guide to installing and configuring the-littlest-jupyterhub in mainland China

万英武

2023-12-01

Preface

What is JupyterHub

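The JupyterHub project lives at https://github.com/jupyterhub/jupyterhub. JupyterHub lets you create a multi-user Hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
Jupyter notebook itself is not the focus of this article, and anyone working with Python is probably already familiar with it. If you want to learn more, see the official site at https://jupyter.org/.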

What is TLJH (the-littlest-jupyterhub)

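TLJH (the-littlest-jupyterhub) provides Jupyter notebook instances for 1-100 users on a single server, which makes it the smallest-scale JupyterHub distribution.
It is meant for deploying JupyterHub on a single machine, either in the cloud or on your own hardware. It aims to be a more lightweight and maintainable solution for JupyterHub instances where size, scalability, and cost savings are not major concerns.
If you want to deploy a large cluster, Zero to JupyterHub on Kubernetes lets you deploy JupyterHub on Kubernetes. That allows JupyterHub to scale to thousands of users, to grow or shrink the required resources flexibly, and to use container technology when managing user sessions. It is outside the scope of this article, so set it up on your own if you need it.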

1. Installation requirements

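The machine must run Ubuntu 18.04 or later.
Root access is required.
A dedicated IP address is needed (not required if you only use TLJH on the local machine).
The machine needs at least 1 GB of RAM, a 2-core processor, and 2 GB of disk. Scale these up as the number of users grows.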

2. User permissions

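The Littlest JupyterHub is in beta and should not be used in security-critical situations.
Each JupyterHub user gets their own Unix user account, created the first time they start their server. This protects users from each other, gives them a home directory at a well-known location, and allows sharing based on file system permissions.
The Unix user account TLJH creates for a JupyterHub user named <username> is jupyter-<username>. This prefix helps prevent clashes with users that already exist; otherwise a user named root could trivially gain full root access to your server. If the username (including the jupyter- prefix) is longer than 26 characters, it is truncated to 26 characters and a 5-character hash is appended. This keeps the username within the 32-character Linux username limit while also reducing the chance of collisions.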
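The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here: the function name unix_username, the choice of SHA-256, and the way the hash is joined to the truncated name are assumptions, and TLJH's own implementation may differ in those details.

```python
import hashlib


def unix_username(hub_username, max_len=26, hash_len=5):
    """Map a JupyterHub username to a Unix account name (illustrative only)."""
    name = "jupyter-" + hub_username
    if len(name) <= max_len:
        return name
    # Assumption: SHA-256 over the prefixed name; the real hash and
    # joining format used by TLJH are not specified in this article.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return name[:max_len] + digest[:hash_len]


print(unix_username("yuvipanda"))  # jupyter-yuvipanda
long_name = unix_username("a-very-long-university-username")
print(long_name, len(long_name))   # 26 truncated characters + 5 hash characters
```

Because the hash is derived from the full name, two long usernames that share their first 26 characters still map to different Unix accounts with high probability.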
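JupyterHub admin users are added to the Unix group jupyterhub-admins. This group is granted full root access to the whole server and can use the sudo command in a terminal without a password.
When you delete a user from the JupyterHub admin console, their Unix user account is not deleted. This means they may still be able to access the server even after you remove them from JupyterHub. Admins should manually delete the user from the server and archive their home directory as needed. For example, the following command deletes the Unix user associated with the JupyterHub user yuvipanda.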

sudo userdel jupyter-yuvipanda

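/tmp is shared by all users on most computing systems, and this has long been a source of security problems. The Littlest JupyterHub uses systemd's PrivateTmp feature to give each user their own temporary /tmp.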

Installation steps

Installing within China

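Before anything else: because of network conditions, the downloads rarely succeed from within China, so here is my modified installation script for domestic use. Copy it into a file and run it directly with Python, for example: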

sudo -E python3 <python-script-file> --admin admin

"""
Bootstrap an installation of TLJH.

Sets up just enough TLJH environments to invoke tljh.installer.

This script is run as:

    curl <script-url> | sudo python3 -

Constraints:
  - Entire script should be compatible with Python 3.6 (We run on Ubuntu 18.04+)
  - Script should parse in Python 3.4 (since we exit with useful error message on Ubuntu 14.04+)
  - Use stdlib modules only
"""
import os
from http.server import SimpleHTTPRequestHandler, HTTPServer
import multiprocessing
import subprocess
import sys
import logging
import shutil
import urllib.request

html = """
<html>
<head>
    <title>The Littlest Jupyterhub</title>
</head>
<body>
  <meta http-equiv="refresh" content="30" >
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <meta name="viewport" content="width=device-width">
  <img class="logo" src="https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/docs/images/logo/logo.png">
  <div class="loader center"></div>
  <div class="center main-msg">Please wait while your TLJH is building...</div>
  <div class="center logs-msg">Click the button below to see the logs</div>
  <div class="center tip" >Tip: to update the logs, refresh the page</div>
  <button class="logs-button center" onclick="window.location.href='/logs'">View logs</button>
</body>

  <style>
    button:hover {
      background: grey;
    }

    .logo {
      width: 150px;
      height: auto;
    }
    .center {
      margin: 0 auto;
      margin-top: 50px;
      text-align:center;
      display: block;
    }
    .main-msg {
      font-size: 30px;
      font-weight: bold;
      color: grey;
      text-align:center;
    }
    .logs-msg {
      font-size: 15px;
      color: grey;
    }
    .tip {
      font-size: 13px;
      color: grey;
      margin-top: 10px;
      font-style: italic;
    }
    .logs-button {
      margin-top:15px;
      border: 0;
      color: white;
      padding: 15px 32px;
      font-size: 16px;
      cursor: pointer;
      background: #f5a252;
    }
    .loader {
      width: 150px;
      height: 150px;
      border-radius: 90%;
      border: 7px solid transparent;
      animation: spin 2s infinite ease;
      animation-direction: alternate;
    }
    @keyframes spin {
      0% {
        transform: rotateZ(0deg);
        border-top-color: #f17c0e
      }
      100% {
        transform: rotateZ(360deg);
        border-top-color: #fce5cf;
      }
    }
  </style>
</head>
</html>

"""

logger = logging.getLogger(__name__)

def get_os_release_variable(key):
    """
    Return value for key from /etc/os-release

    /etc/os-release is a bash file, so should use bash to parse it.

    Returns empty string if key is not found.
    """
    return subprocess.check_output([
        '/bin/bash', '-c',
        "source /etc/os-release && echo ${{{key}}}".format(key=key)
    ]).decode().strip()

# Copied into tljh/utils.py. Make sure the copies are exactly the same!
def run_subprocess(cmd, *args, **kwargs):
    """
    Run given cmd with smart output behavior.

    If command succeeds, print output to debug logging.
    If it fails, print output to info logging.

    In TLJH, this sends successful output to the installer log,
    and failed output directly to the user's screen
    """
    logger = logging.getLogger('tljh')
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, *args, **kwargs)
    printable_command = ' '.join(cmd)
    if proc.returncode != 0:
        # Our process failed! Show output to the user
        logger.error('Ran {command} with exit code {code}'.format(
            command=printable_command, code=proc.returncode
        ))
        logger.error(proc.stdout.decode())
        raise subprocess.CalledProcessError(cmd=cmd, returncode=proc.returncode)
    else:
        # This goes into installer.log
        logger.debug('Ran {command} with exit code {code}'.format(
            command=printable_command, code=proc.returncode
        ))
        # This produces multi line log output, unfortunately. Not sure how to fix.
        # For now, prioritizing human readability over machine readability.
        logger.debug(proc.stdout.decode())

def validate_host():
    """
    Make sure TLJH is installable in current host
    """
    # Support only Ubuntu 18.04+
    distro = get_os_release_variable('ID')
    version = float(get_os_release_variable('VERSION_ID'))
    if distro != 'ubuntu':
        print('The Littlest JupyterHub currently supports Ubuntu Linux only')
        sys.exit(1)
    elif float(version) < 18.04:
        print('The Littlest JupyterHub requires Ubuntu 18.04 or higher')
        sys.exit(1)

    if sys.version_info < (3, 5):
        print("bootstrap.py must be run with at least Python 3.5")
        sys.exit(1)

    if not (shutil.which('systemd') and shutil.which('systemctl')):
        print("Systemd is required to run TLJH")
        # Only fail running inside docker if systemd isn't present
        if os.path.exists('/.dockerenv'):
            print("Running inside a docker container without systemd isn't supported")
            print("We recommend against running a production TLJH instance inside a docker container")
            print("For local development, see http://tljh.jupyter.org/en/latest/contributing/dev-setup.html")
        sys.exit(1)

class LoaderPageRequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/logs":
            with open("/opt/tljh/installer.log", "r") as log_file:
                logs = log_file.read()

            self.send_response(200)
            self.send_header('Content-Type', 'text/plain; charset=utf-8')
            self.end_headers()
            self.wfile.write(logs.encode('utf-8'))
        elif self.path == "/index.html":
            self.path = "/var/run/index.html"
            return SimpleHTTPRequestHandler.do_GET(self)
        elif self.path == "/favicon.ico":
            self.path = "/var/run/favicon.ico"
            return SimpleHTTPRequestHandler.do_GET(self)
        elif self.path == "/":
            self.send_response(302)
            self.send_header('Location','/index.html')
            self.end_headers()
        else:
            SimpleHTTPRequestHandler.send_error(self, code=403)

def serve_forever(server):
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass

def main():
    flags = sys.argv[1:]
    temp_page_flag = "--show-progress-page"

    # Check for flag in the argv list. This doesn't use argparse
    # because it's the only argument that's meant for the bootstrap script.
    # All the other flags will be passed to and parsed by the installer.
    if temp_page_flag in flags:
        with open("/var/run/index.html", "w+") as f:
            f.write(html)
        favicon_url="https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/share/jupyterhub/static/favicon.ico"
        urllib.request.urlretrieve(favicon_url, "/var/run/favicon.ico")

        # If the bootstrap is run to upgrade TLJH, then this will raise an "Address already in use" error
        try:
            loading_page_server = HTTPServer(("", 80), LoaderPageRequestHandler)
            p = multiprocessing.Process(target=serve_forever, args=(loading_page_server,))
            # Serves the loading page until TLJH builds
            p.start()

            # Remove the flag from the args list, since it was only relevant to this script.
            flags.remove("--show-progress-page")

            # Pass the server's pid as a flag to the installer
            pid_flag = "--progress-page-server-pid"
            flags.extend([pid_flag, str(p.pid)])
        except OSError:
            # Only serve the loading page when installing TLJH
            pass

    validate_host()
    install_prefix = os.environ.get('TLJH_INSTALL_PREFIX', '/opt/tljh')
    hub_prefix = os.path.join(install_prefix, 'hub')

    # Set up logging to print to a file and to stderr
    os.makedirs(install_prefix, exist_ok=True)
    file_logger_path = os.path.join(install_prefix, 'installer.log')
    file_logger = logging.FileHandler(file_logger_path)
    # installer.log should be readable only by root
    os.chmod(file_logger_path, 0o500)

    file_logger.setFormatter(logging.Formatter('%(asctime)s %(message)s'))
    file_logger.setLevel(logging.DEBUG)
    logger.addHandler(file_logger)

    stderr_logger = logging.StreamHandler()
    stderr_logger.setFormatter(logging.Formatter('%(message)s'))
    stderr_logger.setLevel(logging.INFO)
    logger.addHandler(stderr_logger)
    logger.setLevel(logging.DEBUG)

    logger.info('Checking if TLJH is already installed...')
    if os.path.exists(os.path.join(hub_prefix, 'bin', 'python3')):
        logger.info('TLJH already installed, upgrading...')
        initial_setup = False
    else:
        logger.info('Setting up hub environment')
        initial_setup = True
        # Install software-properties-common, so we can get add-apt-repository
        # That helps us make sure the universe repository is enabled, since
        # that's where the python3-pip package lives. In some very minimal base
        # VM images, it looks like the universe repository is disabled by default,
        # causing bootstrapping to fail.
        run_subprocess(['apt-get', 'update', '--yes'])
        run_subprocess(['apt-get', 'install', '--yes', 'software-properties-common'])
        run_subprocess(['add-apt-repository', 'universe'])

        run_subprocess(['apt-get', 'update', '--yes'])
        run_subprocess(['apt-get', 'install', '--yes',
            'python3',
            'python3-venv',
            'python3-pip',
            'git'
        ])
        logger.info('Installed python & virtual environment')
        os.makedirs(hub_prefix, exist_ok=True)
        run_subprocess(['python3', '-m', 'venv', hub_prefix])
        logger.info('Set up hub virtual environment')

    if initial_setup:
        logger.info('Setting up TLJH installer...')
    else:
        logger.info('Upgrading TLJH installer...')

    pip_flags = ['--upgrade']
    if os.environ.get('TLJH_BOOTSTRAP_DEV', 'no') == 'yes':
        pip_flags.append('--editable')
    tljh_repo_path = os.environ.get(
        'TLJH_BOOTSTRAP_PIP_SPEC',
        'git+https://gitee.com/maple1017/the-littlest-jupyterhub'
    )

    # Upgrade pip
    run_subprocess([
        os.path.join(hub_prefix, 'bin', 'pip'),
        'install',
        '--upgrade',
        'pip==20.0.*'
    ])
    logger.info('Upgraded pip')

    run_subprocess([
        os.path.join(hub_prefix, 'bin', 'pip'),
        'install'
    ] + pip_flags + [tljh_repo_path])
    logger.info('Setup tljh package')

    logger.info('Starting TLJH installer...')
    os.execv(
        os.path.join(hub_prefix, 'bin', 'python3'),
        [
            os.path.join(hub_prefix, 'bin', 'python3'),
            '-m',
            'tljh.installer',
        ] + flags
    )

if __name__ == '__main__':
    main()

The script then prints:

Checking if TLJH is already installed...
TLJH already installed, upgrading...
Upgrading TLJH installer...
Upgraded pip
Setup tljh package
Starting TLJH installer...
Granting passwordless sudo to JupyterHub admins...
Setting up user environment...
Setting up JupyterHub...
Waiting for JupyterHub to come up (1/20 tries)
Done!

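When it finishes, open the server's IP address in a browser and log in as admin with any password you like. Remember that password, because you will use it for every later login.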

Installing outside China

The following command installs TLJH automatically and lets you configure it freely:

curl -L https://tljh.jupyter.org/bootstrap.py \
 | sudo python3 - \
   <parameters>

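One of the available parameters:
--show-progress-page shows a "TLJH is building" progress page.
The page is available in your browser at http://<server-ip>/index.html. Once the TLJH installation completes, the progress page stops and you can reach TLJH as usual at http://<server-ip>.
From the progress page you can also view the installation logs, either by clicking the "View logs" button or by opening http://<server-ip>/logs directly in your browser. Refresh the page to update the logs.
The full command is: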

curl -L https://tljh.jupyter.org/bootstrap.py \
| sudo python3 - \
 --admin admin --show-progress-page

Wait 5-10 minutes for the installation to finish.

Configuration

TLJH's configuration is organized as a nested tree. You can set a specific property with the following command:

sudo tljh-config set <property-path> <value>

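<property-path> is the dot-separated path of the property to set.
<value> is the value you want to set the property to.
For example, to set the password for the DummyAuthenticator, you set the auth.DummyAuthenticator.password property like this: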

sudo tljh-config set auth.DummyAuthenticator.password mypassword

To unset a configuration property, use the following command:

sudo tljh-config unset <property-path>

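Unsetting a configuration property removes it from the configuration file. If you only want to change the value of a property, use set and overwrite it with the desired value.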
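To make the idea of a dot-separated property path concrete, here is a minimal sketch of how such a path can address a value in a nested dictionary. The helper names set_property and unset_property are hypothetical and are not tljh-config's own functions. The sketch also leaves empty parent sections in place after an unset, and whether tljh-config does the same is not covered here.

```python
def set_property(config, property_path, value):
    """Set a value in a nested dict using a dot-separated path."""
    *parents, leaf = property_path.split(".")
    node = config
    for key in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(key, {})
    node[leaf] = value


def unset_property(config, property_path):
    """Remove a value addressed by a dot-separated path."""
    *parents, leaf = property_path.split(".")
    node = config
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    del node[leaf]


config = {}
set_property(config, "auth.DummyAuthenticator.password", "mypassword")
print(config)  # {'auth': {'DummyAuthenticator': {'password': 'mypassword'}}}
unset_property(config, "auth.DummyAuthenticator.password")
print(config)  # {'auth': {'DummyAuthenticator': {}}}
```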
The property categories are listed below.

Ports

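Use http.port and https.port to set the ports TLJH listens on. The defaults are 80 and 443. If you change them, be aware that TLJH does many other things to the system (mostly with user accounts and sudo rules) that may break the security assumptions of other applications, so use this with extra care.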

sudo tljh-config set http.port 8080
sudo tljh-config set https.port 8443
sudo tljh-config reload proxy

User Lists

users.allowed is the list of allowed users.
users.banned is the list of banned users.
users.admin is the list of admin users.

sudo tljh-config add-item users.allowed good-user_1
sudo tljh-config add-item users.allowed good-user_2
sudo tljh-config add-item users.banned bad-user_6
sudo tljh-config add-item users.admin admin-user_0
sudo tljh-config remove-item users.allowed good-user_2
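As a rough mental model, add-item and remove-item treat the property as a list and append to it or remove from it. The sketch below replays the commands above against a plain dictionary. The helper names add_item and remove_item are hypothetical, and the resulting structure is an illustration rather than a dump of a real config.yaml.

```python
def add_item(config, property_path, value):
    """Append a value to the list found at a dot-separated path."""
    *parents, leaf = property_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node.setdefault(leaf, []).append(value)


def remove_item(config, property_path, value):
    """Remove a value from the list found at a dot-separated path."""
    *parents, leaf = property_path.split(".")
    node = config
    for key in parents:
        node = node[key]
    node[leaf].remove(value)  # raises ValueError if the value is absent


config = {}
add_item(config, "users.allowed", "good-user_1")
add_item(config, "users.allowed", "good-user_2")
add_item(config, "users.banned", "bad-user_6")
add_item(config, "users.admin", "admin-user_0")
remove_item(config, "users.allowed", "good-user_2")
print(config)
```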

User Server Limits

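limits.memory is the maximum amount of memory each user can use. By default there is no memory limit. The limit can be given as an absolute byte value, and you can use the suffixes K, M, G, or T for kilobytes, megabytes, gigabytes, or terabytes respectively. Setting it to None disables the memory limit.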

sudo tljh-config set limits.memory 4G

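Even if you want individual users to be able to use as much memory as possible, it is still good practice to set the memory limit to 80-90% of the total physical memory. This prevents a single user from accidentally taking down the machine by running it out of memory (OOM).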
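The sketch below shows one way to turn a suffixed value into bytes and to derive a limit of roughly 85% of physical memory. The function names are hypothetical, the use of binary units (1K = 1024 bytes) is an assumption about how the suffixes are interpreted, and the 85% figure is just one point inside the 80-90% range mentioned above.

```python
import os

# Assumption: binary units, i.e. 1K = 1024 bytes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(spec):
    """Convert a value such as '4G' or '512M' into a number of bytes."""
    spec = str(spec).strip()
    suffix = spec[-1].upper()
    if suffix in UNITS:
        return int(float(spec[:-1]) * UNITS[suffix])
    return int(spec)  # plain byte value without a suffix


def suggested_limit(total_bytes, percent=85):
    """Return the given percentage of total memory, using integer math."""
    return total_bytes * percent // 100


def total_physical_memory():
    """Total RAM in bytes, read via sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


print(parse_memory("4G"))                # 4294967296
print(suggested_limit(8 * 1024**3))      # 85% of an 8 GiB machine
```

On the server itself, suggested_limit(total_physical_memory()) gives a byte value that can be passed to limits.memory.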
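limits.cpu is a floating-point number giving the total number of CPU cores each user can use. By default there is no CPU limit. 1 means one full CPU, 4 means four full CPUs, 0.5 means half of one CPU, and so on. The value is ultimately converted to a percentage and rounded to the nearest whole percent, so for example 1.5 becomes 150% and 0.125 becomes 12%. Setting it to None disables the CPU limit.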

sudo tljh-config set limits.cpu 2
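The conversion from a core count to a percentage can be checked with a few lines of Python. This is a sketch of the arithmetic described above, not TLJH's own code, and the function name cpu_limit_to_percent is hypothetical. Python's round() rounds exact halves to the nearest even integer, which is why 12.5 comes out as 12 here.

```python
def cpu_limit_to_percent(cpu_limit):
    """Convert a core count such as 1.5 into a whole-number percentage."""
    return round(cpu_limit * 100)


for limit in (0.125, 0.5, 1.5, 4):
    print(limit, "->", str(cpu_limit_to_percent(limit)) + "%")
```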

User Environment

user_environment.default_app sets the default application that users see when their server starts. It can currently be set to jupyterlab or nteract.

sudo tljh-config set user_environment.default_app jupyterlab

Extra User Groups

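users.extra_user_groups is a configuration option that can be used to add users to specific groups automatically. By default, no extra groups are defined.
Users can be paired with the desired, already existing groups in one of two ways.

Use tljh-config set if only one user is to be added to the desired group: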
```
tljh-config set users.extra_user_groups.group1 user1
```

Use tljh-config add-item if several users are to be added to the group:

tljh-config add-item users.extra_user_groups.group1 user1
tljh-config add-item users.extra_user_groups.group1 user2
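Conceptually, the option ends up as a mapping from group name to a list of usernames, and a user's extra groups are the ones whose list contains that user. The sketch below illustrates that lookup. The function groups_for_user and the second group, group2, are hypothetical and exist only for the example.

```python
def groups_for_user(extra_user_groups, username):
    """Return the extra groups whose member list contains the user."""
    return sorted(
        group
        for group, members in extra_user_groups.items()
        if username in members
    )


# group1 mirrors the commands above; group2 is made up for illustration.
extra_user_groups = {
    "group1": ["user1", "user2"],
    "group2": ["user2"],
}

print(groups_for_user(extra_user_groups, "user1"))  # ['group1']
print(groups_for_user(extra_user_groups, "user2"))  # ['group1', 'group2']
```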

View current configuration

To view the current configuration, run:

sudo tljh-config show

This prints TLJH's current configuration, which is very useful when asking for support.

Reloading JupyterHub to apply configuration

After changing the configuration, you need to reload JupyterHub for it to take effect. You can do that with:

sudo tljh-config reload

This should not affect any running users. JupyterHub restarts and loads the new configuration.

Advanced: config.yaml

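tljh-config is a simple program that modifies the contents of the config.yaml file located at /opt/tljh/config/config.yaml. tljh-config is the recommended way to edit and view the configuration, because hand-editing YAML in a terminal text editor is very error-prone.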

Installing libraries and kernels

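One advantage of JupyterHub is that all users share the underlying packages while keeping their code in their own space, which saves resources. Once the installation is done, we can use conda or pip to install common dependencies: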

sudo -E conda install -c conda-forge gdal
sudo -E pip install there

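Note that sudo -E is essential: it ensures that what we install is accessible to all users.
As with a single-machine installation, we can also add a kernel to support another language, for example a JavaScript kernel. sudo -E is required here as well.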

sudo -E npm install -g --unsafe-perm ijavascript
sudo -E ijsinstall --install=global

Sharing files

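Sharing files between users is a common need in a multi-user service, and the official documentation covers it in detail. First create the shared folder: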

sudo mkdir -p /srv/data/my_shared_data_folder

Put the files you want to share into the my_shared_data_folder folder, then run the following commands:

cd /etc/skel
sudo ln -s /srv/data/my_shared_data_folder my_shared_data_folder

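From then on, every newly created user can access the shared folder directly, while existing users need to run the ln command manually in their own home directory.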
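For existing users, the manual step amounts to creating the same symbolic link inside each home directory. The following sketch shows that step in Python and demonstrates it in a throwaway temporary directory, so it touches nothing on a real server. The function name link_shared_folder and the demo paths are assumptions for illustration.

```python
import os
import tempfile


def link_shared_folder(home_dir, shared_dir):
    """Create a symlink to shared_dir inside home_dir unless one exists."""
    link_path = os.path.join(home_dir, os.path.basename(shared_dir.rstrip("/")))
    if os.path.lexists(link_path):
        return False  # leave existing files or links untouched
    os.symlink(shared_dir, link_path)
    return True


# Demonstration in a temporary directory that mimics the layout above.
with tempfile.TemporaryDirectory() as root:
    shared = os.path.join(root, "srv", "data", "my_shared_data_folder")
    home = os.path.join(root, "home", "jupyter-yuvipanda")
    os.makedirs(shared)
    os.makedirs(home)
    print(link_shared_folder(home, shared))  # True: link created
    print(link_shared_folder(home, shared))  # False: link already present
```

On a real server the function would be called once per existing home directory, with /srv/data/my_shared_data_folder as the target, and it would need to run with sufficient privileges.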

Allowing users to sign up on their own

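In practice we need to open up sign-up so that more users can register and log in by themselves. First, run the following commands as the admin user: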

sudo tljh-config set auth.type nativeauthenticator.NativeAuthenticator
sudo tljh-config set auth.NativeAuthenticator.open_signup true
sudo tljh-config reload

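After that, new users can visit server-host/signup to register an account. One thing to note: I have found that after this change the passwords of existing users stop working, so all users, including admin, need to sign up again before logging in.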

Adjusting the idle culling time

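For each user's server, the system checks at regular intervals whether the server is active. If a server has been idle for longer than the configured timeout, the system culls it to free up server resources. The defaults are a check every 60 s and a maximum idle time of 600 s. This makes for a poor experience because servers have to be restarted frequently, so we can raise both values as appropriate: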

sudo tljh-config set services.cull.every <number-of-sec-this-check-is-done>
sudo tljh-config set services.cull.timeout <max-idle-sec-before-server-is-culled>
sudo tljh-config reload
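The effect of the timeout can be illustrated with a small sketch of the decision made at each check: any server whose last activity is older than the timeout is selected for culling. This is a simplified model, not the culler's actual code, and the function name servers_to_cull and the usernames are hypothetical.

```python
import time


def servers_to_cull(last_activity, timeout, now=None):
    """Return users whose servers have been idle longer than timeout seconds."""
    now = time.time() if now is None else now
    return sorted(
        user for user, stamp in last_activity.items() if now - stamp > timeout
    )


# Timestamps in seconds; at now=1700 alice has been idle 700 s, bob 150 s.
last_activity = {"alice": 1000.0, "bob": 1550.0}

print(servers_to_cull(last_activity, timeout=600, now=1700.0))   # ['alice']
print(servers_to_cull(last_activity, timeout=3600, now=1700.0))  # []
```

With the default 600 s timeout alice's server would be stopped, while raising the timeout to one hour keeps both servers running.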

Summary

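That covers installing and configuring TLJH from within China. If there is interest in customizing authentication, let me know in the comments and I will write that up in detail. Go and enjoy TLJH!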