Question:

Using FastAPI, how do I add a charset to the Content-Type (media type) of the request shown in the OpenAPI (Swagger) docs?

王翰墨

2023-03-14

@app.post("/")
def post_hello(username: str = Form(...)):
    return {"Hello": username}

The OpenAPI docs (http:///docs) show "application/x-www-form-urlencoded".

I tried changing it like this:

@app.post("/")
def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
    return {"Hello": "World!", "userName": username}

but charset=cp932 is not added.

I want to set "application/x-www-form-urlencoded; charset=cp932" as the Content-Type of the request, and I want username to be decoded using that charset.

那弘

2023-03-14

Using FastAPI, how can I add a charset to the Content-Type request header in the auto-generated OpenAPI (Swagger) docs?

@app.post("/")
def post_hello(username: str = Form(...)):
    return {"Hello": username}

With the path operation above, the OpenAPI docs are generated at http://

I tried this:

@app.post("/")
def post_hello(username: str = Form(..., media_type="application/x-www-form-urlencoded; charset=cp932")):
    return {"Hello": "World!", "userName": username}

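But the docs still show only application/x-www-form-urlencoded.

I want to set application/x-www-form-urlencoded; charset=cp932 as the value of Content-Type in requests to this endpoint/path function, and I want the received form data to be decoded with that encoding scheme.

In general, this doesn't seem like a good idea; I don't think there's an easy, built-in way to do it; and it may not be necessary.

This GitHub issue discusses why appending ; charset=UTF-8 to application/json isn't a good idea, and the same points made there apply here.

The HTTP/1.1 specification says that the Content-Type header lists the media type.

Note: HTTP/2 shares these components with HTTP/1.1.

IANA maintains a registry of all common media types (MIME types).

The entry for application/x-www-form-urlencoded says: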
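The Content-Type value is a media type plus optional parameters such as charset. The sketch below makes that split concrete using Python's standard-library email.message.Message. This is an assumption for illustration: the code later in this answer uses python-multipart's parse_options_header instead, and split_content_type is a hypothetical name.

```python
from email.message import Message
from typing import Dict, Tuple


def split_content_type(header: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: split a Content-Type value into media type and parameters."""
    msg = Message()
    msg["Content-Type"] = header
    # get_content_type() returns the lowercased "type/subtype" part
    media_type = msg.get_content_type()
    # get_params() lists the media type first, then each (name, value) parameter
    params = dict(msg.get_params()[1:])
    return media_type, params


print(split_content_type("application/x-www-form-urlencoded; charset=cp932"))
# ('application/x-www-form-urlencoded', {'charset': 'cp932'})
```

Parameter names are lowercased by the parser, but their values are kept as written.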

Media type name: application
Media subtype name: x-www-form-urlencoded

Required parameters: No parameters

Optional parameters:
No parameters

Encoding considerations: 7bit

Compare that with the entry for text/html:

MIME media type name : Text

MIME subtype name : Standards Tree - html

Required parameters : No required parameters

Optional parameters :
charset
The charset parameter may be provided to definitively specify the document's character encoding, overriding any character encoding declarations in the document. The parameter's value must be one of the labels of the character encoding used to serialize the file.

Encoding considerations : 8bit

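The entry for application/x-www-form-urlencoded does not allow a charset parameter. So how should the bytes be decoded? The URL specification says:

It sounds like UTF-8 should always be used for decoding, whatever the declared encoding.

The current HTML/URL specification also has a note about application/x-www-form-urlencoded:

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

So it sounds like doing anything different isn't a good idea.

Note: the built-in way to implement these solutions would be to use a custom Request class.

When building the /openapi.json object, the current version of FastAPI checks whether a dependency is an instance of Form and, if so, builds the schema from an empty Form instance, even if the actual dependency is a subclass of Form.

The default value of the media_type parameter of Form.__init__() is application/x-www-form-urlencoded, so every endpoint/path function with a Form() dependency shows the same media type in the docs, even though the class's __init__() accepts a media_type parameter.

There are a few ways to change what is listed in /openapi.json, which is what the docs are generated from, and the FastAPI documentation describes one official way.

For the example in the question, this works: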
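Python's standard library takes the same default. urllib.parse.parse_qsl decodes percent-escapes as UTF-8 unless a different encoding is passed. The sketch below uses it as a close analogue of the spec's parser, not as an implementation of it.

```python
from urllib.parse import parse_qsl

body = "username=cp932%E6%96%87%E5%AD%97%E3%82%B3%E3%83%BC%E3%83%89"

# encoding defaults to "utf-8", in line with the spec's choice of UTF-8
pairs = parse_qsl(body)
print(pairs)  # [('username', 'cp932文字コード')]

# Any other encoding has to be requested explicitly; here it misreads the UTF-8 escapes
pairs_cp932 = parse_qsl(body, encoding="cp932")
print(pairs_cp932 == pairs)  # False
```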

from fastapi import FastAPI, Form
from fastapi.openapi.utils import get_openapi

app = FastAPI()


@app.post("/")
def post_hello(username: str = Form(...)):
    return {"Hello": username}


def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema

    app.openapi_schema = get_openapi(
        title=app.title,
        version=app.version,
        openapi_version=app.openapi_version,
        description=app.description,
        terms_of_service=app.terms_of_service,
        contact=app.contact,
        license_info=app.license_info,
        routes=app.routes,
        tags=app.openapi_tags,
        servers=app.servers,
    )

    requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
    content = requestBody["content"]
    new_content = {
        "application/x-www-form-urlencoded;charset=cp932": content[
            "application/x-www-form-urlencoded"
        ]
    }
    requestBody["content"] = new_content

    return app.openapi_schema


app.openapi = custom_openapi
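The custom_openapi function above hard-codes the "/" path and the "post" method. As a variation, the renaming step can be written as a plain function over the OpenAPI dict. That makes it reusable and testable without running FastAPI. This is a sketch: add_form_charset is a hypothetical helper, and it only touches operations whose request body lists application/x-www-form-urlencoded.

```python
from typing import Any, Dict, Iterable, Optional

FORM_MEDIA_TYPE = "application/x-www-form-urlencoded"


def add_form_charset(
    schema: Dict[str, Any], charset: str, paths: Optional[Iterable[str]] = None
) -> Dict[str, Any]:
    """Hypothetical helper: rename the urlencoded media-type key in an OpenAPI dict.

    Operations under `paths` (or under every path when `paths` is None) whose
    request body lists application/x-www-form-urlencoded get that key renamed to
    "application/x-www-form-urlencoded;charset=<charset>". Modifies `schema` in place.
    """
    wanted = set(paths) if paths is not None else None
    new_key = f"{FORM_MEDIA_TYPE};charset={charset}"
    for path, path_item in schema.get("paths", {}).items():
        if wanted is not None and path not in wanted:
            continue
        for operation in path_item.values():
            if not isinstance(operation, dict):
                continue  # e.g. a path-level "parameters" list or "summary" string
            content = operation.get("requestBody", {}).get("content")
            if content and FORM_MEDIA_TYPE in content:
                content[new_key] = content.pop(FORM_MEDIA_TYPE)
    return schema


example = {
    "paths": {
        "/": {"post": {"requestBody": {"content": {FORM_MEDIA_TYPE: {"schema": {}}}}}},
        "/json": {"post": {"requestBody": {"content": {"application/json": {}}}}},
    }
}
add_form_charset(example, "cp932")
print(list(example["paths"]["/"]["post"]["requestBody"]["content"]))
# ['application/x-www-form-urlencoded;charset=cp932']
```

Inside custom_openapi, the requestBody lines could then be replaced with add_form_charset(app.openapi_schema, "cp932", paths=["/"]). Other content types, such as the application/json body above, are left alone.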

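Notably, with this change the docs UI changes how it presents the interactive "Try it out" section:

compared with application/x-www-form-urlencoded without a charset, which looks like this:

The change above only changes the media type listed in the docs. Any form data sent to the endpoint/path function is still:

- parsed by python-multipart (roughly following the same steps described in the spec)
- decoded by starlette, using latin-1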
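The latin-1 decoding step can be mimicked with the standard library to see why percent-encoded data survives it and raw UTF-8 does not. This is a sketch: starlette_style_decode is a hypothetical helper that imitates the latin-1 + unquote_plus combination (the lines that CP932FormParser later marks as changed). It is not starlette's code.

```python
from urllib.parse import unquote_plus


def starlette_style_decode(raw: bytes) -> str:
    """Hypothetical helper imitating the latin-1 + unquote_plus decoding step."""
    # latin-1 maps every byte 0x00-0xFF to exactly one code point, so this never fails
    text = raw.decode("latin-1")
    # unquote_plus turns "+" into a space and decodes %XX escapes as UTF-8 by default
    return unquote_plus(text)


# Percent-encoded UTF-8 round-trips correctly
print(starlette_style_decode(b"cp932%E6%96%87"))  # cp932文
# Raw UTF-8 bytes become one (wrong) character per byte
print(ascii(starlette_style_decode("cp932文".encode("utf-8"))))  # 'cp932\xe6\x96\x87'
```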

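So even if starlette were changed to decode the form data with a different encoding scheme, python-multipart would still follow the steps outlined in the spec, using hard-coded byte values.

Fortunately, most* of the first 128 characters/code points map to the same byte sequences in cp932 and UTF-8, so those hard-coded byte values still match.

*except 0x5C, which is sometimes ¥ and sometimes \

One way to make starlette use cp932 is with middleware: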
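A quick check of the ASCII range against Python's own codecs is below, as a sketch. In Python's cp932 codec every byte 0x00-0x7F, including 0x5C, decodes to the same code point as in UTF-8. The ¥ form of 0x5C is, as far as I know, a matter of how Japanese fonts and legacy environments display that byte, not a different mapping in this codec.

```python
# Compare how Python's built-in codecs decode each single byte 0x00-0x7F
same_in_python = [
    b for b in range(0x80)
    if bytes([b]).decode("cp932") == bytes([b]).decode("utf-8")
]
print(len(same_in_python))  # 128

# The delimiters a urlencoded parser relies on decode identically as well
delimiters = b"&=;+%"
print(delimiters.decode("cp932") == delimiters.decode("utf-8"))  # True

# In Python's cp932 codec, 0x5C decodes to a backslash (U+005C)
print(bytes([0x5C]).decode("cp932"))  # \
```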

import typing
from unittest.mock import patch
from urllib.parse import unquote_plus

import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser, MultiPartParser

app = FastAPI()

form_path = "/"


@app.post(form_path)
async def post_hello(username: str = Form(...)):
    return {"Hello": username}


def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema

    app.openapi_schema = get_openapi(
        title=app.title,
        version=app.version,
        openapi_version=app.openapi_version,
        description=app.description,
        terms_of_service=app.terms_of_service,
        contact=app.contact,
        license_info=app.license_info,
        routes=app.routes,
        tags=app.openapi_tags,
        servers=app.servers,
    )

    requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
    content = requestBody["content"]
    new_content = {
        "application/x-www-form-urlencoded;charset=cp932": content[
            "application/x-www-form-urlencoded"
        ]
    }
    requestBody["content"] = new_content

    return app.openapi_schema


app.openapi = custom_openapi


class CP932FormParser(FormParser):
    async def parse(self) -> FormData:
        """
        copied from:
        https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
        """
        # Callbacks dictionary.
        callbacks = {
            "on_field_start": self.on_field_start,
            "on_field_name": self.on_field_name,
            "on_field_data": self.on_field_data,
            "on_field_end": self.on_field_end,
            "on_end": self.on_end,
        }

        # Create the parser.
        parser = multipart.QuerystringParser(callbacks)
        field_name = b""
        field_value = b""

        items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []

        # Feed the parser with data from the request.
        async for chunk in self.stream:
            if chunk:
                parser.write(chunk)
            else:
                parser.finalize()
            messages = list(self.messages)
            self.messages.clear()
            for message_type, message_bytes in messages:
                if message_type == FormMessage.FIELD_START:
                    field_name = b""
                    field_value = b""
                elif message_type == FormMessage.FIELD_NAME:
                    field_name += message_bytes
                elif message_type == FormMessage.FIELD_DATA:
                    field_value += message_bytes
                elif message_type == FormMessage.FIELD_END:
                    name = unquote_plus(field_name.decode("cp932"))  # changed line
                    value = unquote_plus(field_value.decode("cp932"))  # changed line
                    items.append((name, value))

        return FormData(items)


class CustomRequest(Request):
    async def form(self) -> FormData:
        """
        copied from
        https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
        """
        if not hasattr(self, "_form"):
            assert (
                parse_options_header is not None
            ), "The `python-multipart` library must be installed to use form parsing."
            content_type_header = self.headers.get("Content-Type")
            content_type, options = parse_options_header(content_type_header)
            if content_type == b"multipart/form-data":
                multipart_parser = MultiPartParser(self.headers, self.stream())
                self._form = await multipart_parser.parse()
            elif content_type == b"application/x-www-form-urlencoded":
                form_parser = CP932FormParser(
                    self.headers, self.stream()
                )  # use the custom parser above
                self._form = await form_parser.parse()
            else:
                self._form = FormData()
        return self._form


@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
    if request.scope["path"] != form_path:
        # every other route (including /docs and /openapi.json) is left untouched
        return await call_next(request)

    # starlette creates a new Request object for each middleware/app
    # invocation:
    # https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
    # this temporarily patches the Request object starlette
    # uses with our modified version
    with patch("starlette.routing.Request", new=CustomRequest):
        return await call_next(request)
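The two lines marked # changed line can be tried out without starlette. The sketch below mirrors them. decode_pairs is a hypothetical helper, and its naive splitting on & and = is a simplification of what python-multipart's QuerystringParser does. One detail stands out: urllib.parse.unquote_plus decodes percent-escapes with encoding="utf-8" by default. So in CP932FormParser, cp932 only applies to bytes that arrive unescaped, and escapes are still read as UTF-8.

```python
from typing import List, Tuple
from urllib.parse import quote_plus, unquote_plus


def decode_pairs(
    body: bytes, charset: str, escape_charset: str = "utf-8"
) -> List[Tuple[str, str]]:
    """Hypothetical helper: decode an urlencoded body roughly like CP932FormParser.

    Raw bytes are decoded with `charset`; percent-escapes are then decoded by
    unquote_plus with `escape_charset` (unquote_plus itself defaults to UTF-8).
    """
    pairs = []
    for field in body.split(b"&"):
        if not field:
            continue  # skip empty fields such as a trailing "&"
        name, _, value = field.partition(b"=")
        pairs.append(
            (
                unquote_plus(name.decode(charset), encoding=escape_charset),
                unquote_plus(value.decode(charset), encoding=escape_charset),
            )
        )
    return pairs


# Same behaviour as the two changed lines: escapes are still read as UTF-8
print(decode_pairs(b"username=cp932%E6%96%87", "cp932"))
# [('username', 'cp932文')]

# Raw (unescaped) cp932 bytes are where decoding with cp932 actually matters
print(decode_pairs(b"username=" + "文字".encode("cp932"), "cp932"))
# [('username', '文字')]

# cp932 percent-escapes also need escape_charset="cp932"
escaped = quote_plus("文字", encoding="cp932")
print(decode_pairs(("username=" + escaped).encode("ascii"), "cp932", "cp932"))
# [('username', '文字')]
```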

Then the data has to be encoded manually:

>>> import sys
>>> from urllib.parse import quote_plus
>>> name = quote_plus("username").encode("cp932")
>>> value = quote_plus("cp932文字コード").encode("cp932")
>>> with open("temp.txt", "wb") as file:
...     file.write(name + b"=" + value)
...
59

and sent as binary data:

$ curl -X 'POST' \
  'http://localhost:8000/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
  --data-binary "@temp.txt" \
  --silent \
| jq -C .

{
  "Hello": "cp932文字コード"
}

After the manual encoding step, the file contains:

username=cp932%E6%96%87%E5%AD%97%E3%82%B3%E3%83%BC%E3%83%89
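Those %E6... escapes are UTF-8 bytes. urllib.parse.quote_plus encodes a str with encoding="utf-8" unless told otherwise, so the manual step produces ASCII-only UTF-8 escapes, and .encode("cp932") leaves them unchanged. The short check below is a sketch. It also shows what requesting cp932 explicitly would change.

```python
from urllib.parse import quote_plus, unquote_plus

text = "cp932文字コード"

# Default: the str is encoded as UTF-8 before percent-escaping
utf8_body = "username=" + quote_plus(text)
print(utf8_body)
# username=cp932%E6%96%87%E5%AD%97%E3%82%B3%E3%83%BC%E3%83%89
print(len(utf8_body.encode("cp932")))  # 59, the byte count the REPL printed

# Explicitly escaping cp932 bytes produces a different value...
cp932_value = quote_plus(text, encoding="cp932")
print(cp932_value == quote_plus(text))  # False
# ...which only round-trips if the escapes are also decoded as cp932
print(unquote_plus(cp932_value, encoding="cp932") == text)  # True
```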

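Part of the percent-encoding step replaces any byte above 0x7E (~ in ASCII) with bytes in a reduced ASCII range. Since cp932 and UTF-8 both map those ASCII bytes to the same code points (except 0x5C, which may be ¥ or \), the byte sequence decodes to the same string: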

$ curl -X 'POST' \
  'http://localhost:8000/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded;charset=cp932' \
  --data-urlencode "username=cp932文字コード" \
  --silent \
| jq -C .

{
  "Hello": "cp932文字コード"
}

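This only works for percent-encoded data.

Any data sent without percent-encoding will be processed and interpreted differently than the sender intended. For example, the "Try it out" section of the OpenAPI (Swagger) docs gives an example that uses curl -d (the same as --data), which does not URL-encode the data: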

$ curl -X 'POST' \
  'http://localhost:8000/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data "username=cp932文字コード" \
  --silent \
| jq -C .
{
  "Hello": "cp932æ–‡å—ã‚³ãƒ¼ãƒ‰"
}

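It might still be a good idea to use cp932 only for requests from senders that are configured in a similar way to the server.

One way to do that is to modify the middleware function so that it only handles data when the sender declares that it was encoded with cp932: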

import typing
from unittest.mock import patch
from urllib.parse import unquote_plus

import multipart
from fastapi import FastAPI, Form, Request, Response
from fastapi.openapi.utils import get_openapi
from multipart.multipart import parse_options_header
from starlette.datastructures import FormData, UploadFile
from starlette.formparsers import FormMessage, FormParser, MultiPartParser

app = FastAPI()

form_path = "/"


@app.post(form_path)
async def post_hello(username: str = Form(...)):
    return {"Hello": username}


def custom_openapi():
    if app.openapi_schema:
        return app.openapi_schema

    app.openapi_schema = get_openapi(
        title=app.title,
        version=app.version,
        openapi_version=app.openapi_version,
        description=app.description,
        terms_of_service=app.terms_of_service,
        contact=app.contact,
        license_info=app.license_info,
        routes=app.routes,
        tags=app.openapi_tags,
        servers=app.servers,
    )

    requestBody = app.openapi_schema["paths"]["/"]["post"]["requestBody"]
    content = requestBody["content"]
    new_content = {
        "application/x-www-form-urlencoded;charset=cp932": content[
            "application/x-www-form-urlencoded"
        ]
    }
    requestBody["content"] = new_content

    return app.openapi_schema


app.openapi = custom_openapi


class CP932FormParser(FormParser):
    async def parse(self) -> FormData:
        """
        copied from:
        https://github.com/encode/starlette/blob/0.17.1/starlette/formparsers.py#L72-L110
        """
        # Callbacks dictionary.
        callbacks = {
            "on_field_start": self.on_field_start,
            "on_field_name": self.on_field_name,
            "on_field_data": self.on_field_data,
            "on_field_end": self.on_field_end,
            "on_end": self.on_end,
        }

        # Create the parser.
        parser = multipart.QuerystringParser(callbacks)
        field_name = b""
        field_value = b""

        items: typing.List[typing.Tuple[str, typing.Union[str, UploadFile]]] = []

        # Feed the parser with data from the request.
        async for chunk in self.stream:
            if chunk:
                parser.write(chunk)
            else:
                parser.finalize()
            messages = list(self.messages)
            self.messages.clear()
            for message_type, message_bytes in messages:
                if message_type == FormMessage.FIELD_START:
                    field_name = b""
                    field_value = b""
                elif message_type == FormMessage.FIELD_NAME:
                    field_name += message_bytes
                elif message_type == FormMessage.FIELD_DATA:
                    field_value += message_bytes
                elif message_type == FormMessage.FIELD_END:
                    name = unquote_plus(field_name.decode("cp932"))  # changed line
                    value = unquote_plus(field_value.decode("cp932"))  # changed line
                    items.append((name, value))

        return FormData(items)


class CustomRequest(Request):
    async def form(self) -> FormData:
        """
        copied from
        https://github.com/encode/starlette/blob/0.17.1/starlette/requests.py#L238-L253
        """
        if not hasattr(self, "_form"):
            assert (
                parse_options_header is not None
            ), "The `python-multipart` library must be installed to use form parsing."
            content_type_header = self.headers.get("Content-Type")
            content_type, options = parse_options_header(content_type_header)
            if content_type == b"multipart/form-data":
                multipart_parser = MultiPartParser(self.headers, self.stream())
                self._form = await multipart_parser.parse()
            elif content_type == b"application/x-www-form-urlencoded":
                form_parser = CP932FormParser(
                    self.headers, self.stream()
                )  # use the custom parser above
                self._form = await form_parser.parse()
            else:
                self._form = FormData()
        return self._form


@app.middleware("http")
async def custom_form_parser(request: Request, call_next) -> Response:
    if request.scope["path"] != form_path:
        return await call_next(request)

    content_type_header = request.headers.get("content-type", None)
    if not content_type_header:
        return await call_next(request)

    media_type, options = parse_options_header(content_type_header)
    # charset names are case-insensitive, so "CP932" is treated like "cp932"
    if options.get(b"charset", b"").lower() != b"cp932":
        return await call_next(request)

    # starlette creates a new Request object for each middleware/app
    # invocation:
    # https://github.com/encode/starlette/blob/0.17.1/starlette/routing.py#L59
    # this temporarily patches the Request object starlette
    # uses with our modified version
    with patch("starlette.routing.Request", new=CustomRequest):
        return await call_next(request)
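The header check in this middleware can also be expressed with the standard library, which makes it easy to test on its own. This is a sketch: declares_charset is a hypothetical helper. It also checks the media type and compares the charset name case-insensitively, and it uses email.message.Message rather than python-multipart.

```python
from email.message import Message
from typing import Optional

FORM_MEDIA_TYPE = "application/x-www-form-urlencoded"


def declares_charset(content_type_header: Optional[str], charset: str = "cp932") -> bool:
    """Hypothetical helper: does the header declare an urlencoded body in `charset`?"""
    if not content_type_header:
        return False
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() and get_content_charset() both return lowercased values
    return (
        msg.get_content_type() == FORM_MEDIA_TYPE
        and msg.get_content_charset() == charset.lower()
    )


print(declares_charset("application/x-www-form-urlencoded; charset=CP932"))  # True
print(declares_charset("application/x-www-form-urlencoded"))  # False
```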

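Even with this modification, I think the spec's note about parsing content with percent-decode deserves to be highlighted:

⚠ Warning! Using anything but UTF-8 decode without BOM when input contains bytes that are not ASCII bytes might be insecure and is not recommended.

So I would be cautious about implementing any of these solutions.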
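One concrete illustration of why non-UTF-8 decoders are riskier is below. It is offered as an example, not as the spec's stated rationale. In UTF-8, every byte of a multi-byte character is 0x80 or above, so it can never be mistaken for an ASCII character. In cp932, however, the second byte of a double-byte character can fall in the ASCII range. The katakana ソ is a well-known case: its second byte is 0x5C, the backslash.

```python
# Every byte of a non-ASCII character in UTF-8 is outside the ASCII range
utf8_bytes = "ソ".encode("utf-8")
print(all(b >= 0x80 for b in utf8_bytes))  # True

# In cp932 the trail byte of a double-byte character may be an ASCII value
cp932_bytes = "ソ".encode("cp932")
print(cp932_bytes)  # b'\x83\\'

# A byte-level search for "\" therefore finds a false match inside "ソ"
print(0x5C in cp932_bytes, 0x5C in utf8_bytes)  # True False
```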
