[Learning Django from EvalAI] How to Elegantly Parse and Store Zip Archives | zipfile

司空赞

2023-12-01

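Requirements

As a full-fledged AI competition platform, EvalAI lets organizers create challenges. The challenge definition format is specified in EvalAI-starters, so organizers only need to edit it according to the documentation and upload the result. Two upload methods are offered: fork the repository on GitHub, edit it, and let the server pull the changes; or run run.sh, which packages the locally modified configuration files into a zip archive, and then upload that archive.
Question: how is this archive parsed?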

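Problem Analysis

To create a challenge from an archive, the challenge models must be defined in advance, and the accepted configuration options follow from those models. After the archive is uploaded, it has to be extracted and checked for the required files; if any of them is missing, an error message is returned.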
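
The "check that the required files exist, otherwise return an error" step can be sketched with the standard library alone. This is a minimal illustration and not EvalAI's implementation: the function name `find_yaml_config`, the helper `make_zip`, and the error messages are hypothetical, and the archive is read from memory instead of from disk.

```python
import io
import zipfile


def find_yaml_config(zip_bytes):
    """Return (name, error): the first YAML file in the archive, or an error message."""
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return None, "The uploaded file is not a valid zip archive."
    for name in names:
        # Skip the metadata entries that macOS adds to archives.
        if name.startswith("__MACOSX"):
            continue
        if name.endswith((".yaml", ".yml")):
            return name, None
    return None, "No YAML configuration file found in the archive."


def make_zip(files):
    """Build an in-memory zip from a {name: text} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


good = make_zip({"__MACOSX/._challenge_config.yaml": "", "challenge_config.yaml": "title: demo"})
bad = make_zip({"README.md": "no config here"})
print(find_yaml_config(good))
print(find_yaml_config(bad))
print(find_yaml_config(b"not a zip"))
```

Returning an error string instead of raising keeps the caller free to turn it into whatever HTTP response it prefers.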

Libraries Used

tempfile
requests
zipfile

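Processing Logic

1. Basic Model Definitions

ChallengeConfiguration provides fields such as user, challenge, and zip_configuration.
user is a foreign key to the user model, and challenge is a one-to-one link to the challenge model.

Model definition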

# models.py
import os
import uuid

from django.contrib.auth.models import User
from django.db import models
from django.utils.deconstruct import deconstructible

# TimeStampedModel and Challenge are defined elsewhere in the EvalAI code base.


# @deconstructible lets Django migrations serialize instances of this class.
# The class itself builds a random file name and, when the path template
# contains an id placeholder and the instance is saved, fills in the instance pk.
# https://docs.djangoproject.com/zh-hans/3.0/topics/migrations/#custom-deconstruct-method
@deconstructible
class RandomFileName(object):
    def __init__(self, path):
        self.path = path

    def __call__(self, instance, filename):
        extension = os.path.splitext(filename)[1]
        path = self.path
        if "id" in self.path and instance.pk:
            path = self.path.format(id=instance.pk)
        filename = "{}{}".format(uuid.uuid4(), extension)
        filename = os.path.join(path, filename)
        return filename


# https://github.com/Cloud-CV/EvalAI/blob/187ad4892dbd1f5de0937badfa1f40c09acac8c0/apps/challenges/models.py#L476
class ChallengeConfiguration(TimeStampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    challenge = models.OneToOneField(
        Challenge, null=True, blank=True, on_delete=models.CASCADE
    )
    # upload_to determines where the uploaded file is stored
    zip_configuration = models.FileField(
        upload_to=RandomFileName("zip_configuration_files/challenge_zip")
    )
    ...
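
To see what paths the upload_to callable produces, here is a minimal sketch that repeats the RandomFileName logic without the @deconstructible decorator, so it runs without Django installed. The SimpleNamespace objects stand in for model instances, and the "demo_files/item_{id}" template is a hypothetical example, not a path taken from EvalAI.

```python
import os
import uuid
from types import SimpleNamespace


class RandomFileName:
    # Same logic as the class above, minus @deconstructible,
    # so that it runs without Django.
    def __init__(self, path):
        self.path = path

    def __call__(self, instance, filename):
        extension = os.path.splitext(filename)[1]
        path = self.path
        if "id" in self.path and instance.pk:
            path = self.path.format(id=instance.pk)
        return os.path.join(path, "{}{}".format(uuid.uuid4(), extension))


namer = RandomFileName("zip_configuration_files/challenge_zip")
unsaved = SimpleNamespace(pk=None)  # stand-in for an unsaved model instance
result = namer(unsaved, "challenge_config.zip")
print(result)  # the directory, then a random UUID, then ".zip"

with_id = RandomFileName("demo_files/item_{id}")  # hypothetical path template
result_with_id = with_id(SimpleNamespace(pk=7), "data.zip")
print(result_with_id)
```

The original file name is discarded except for its extension, which avoids collisions between uploads that share a name.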

Serializer

ChallengeConfigSerializer accepts two fields: zip_configuration and user.

# serializers.py
# https://github.com/Cloud-CV/EvalAI/blob/187ad4892dbd1f5de0937badfa1f40c09acac8c0/apps/challenges/serializers.py#L174
from rest_framework import serializers

from .models import ChallengeConfiguration


class ChallengeConfigSerializer(serializers.ModelSerializer):
    """
    Serialize the ChallengeConfiguration Model.
    """

    def __init__(self, *args, **kwargs):
        super(ChallengeConfigSerializer, self).__init__(*args, **kwargs)
        context = kwargs.get("context")
        if context:
            request = context.get("request")
            kwargs["data"]["user"] = request.user.pk

    class Meta:
        model = ChallengeConfiguration
        fields = ("zip_configuration", "user")

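2. Handling the Uploaded Zip File

The value of the uploaded zip_configuration field is the uploaded file itself.

Flow:

Create a temporary directory
Pass the data to the serializer and obtain the address of the saved file
Request that address and save the file under a unique name
Extract the archive into a folder in the same temporary directory
Start parsing the contents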
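
The flow above can be sketched end to end with only the standard library. This is a simplified illustration under stated assumptions: the download step is replaced by writing bytes that are already in memory, `unpack_upload` and the fixed key "demo123" are hypothetical, and the archive contents are made up for the demo.

```python
import io
import os
import shutil
import tempfile
import zipfile


def unpack_upload(zip_bytes, key="demo123"):
    """Save the archive under a temp directory, extract it, and list its entries."""
    base_location = tempfile.mkdtemp()
    zip_path = os.path.join(base_location, "{}.zip".format(key))
    with open(zip_path, "wb") as zip_file:
        zip_file.write(zip_bytes)
    # Extract into a folder named after the key, next to the saved archive.
    extract_dir = os.path.join(base_location, key)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(extract_dir)
        names = zip_ref.namelist()
    return extract_dir, names


# Build a small demo archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("challenge_config.yaml", "title: demo")
    archive.writestr("evaluation_script/main.py", "print('hi')")

extract_dir, names = unpack_upload(buffer.getvalue())
config_exists = os.path.isfile(os.path.join(extract_dir, "challenge_config.yaml"))
print(names)
print(config_exists)

# mkdtemp() does not clean up after itself, so remove the directory when done.
shutil.rmtree(os.path.dirname(extract_dir))
```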

# https://github.com/Cloud-CV/EvalAI/blob/187ad4892dbd1f5de0937badfa1f40c09acac8c0/apps/challenges/views.py#L857
# views.py
import os
import tempfile
import zipfile

import requests
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response


@api_view(["POST"])
...
def create_challenge_using_zip_file(request, challenge_host_team_pk):
    ...
    # request.data = <QueryDict: {'status': ['submitting'], 'zip_configuration': [<InMemoryUploadedFile: challenge_config.zip (application/x-zip-compressed)>]}>
    # Create a temporary directory
    BASE_LOCATION = tempfile.mkdtemp()
    ...  # the path that loads the challenge from a template file is omitted
    # Copy the request data
    data = request.data.copy()
    # Pass the data to the serializer to obtain the saved file's address
    serializer = ChallengeConfigSerializer(data=data, context={"request": request})
    # Validate
    if serializer.is_valid():
        # Save through the serializer
        uploaded_zip_file = serializer.save()
        # Address of the file stored in zip_configuration
        uploaded_zip_file_path = serializer.data["zip_configuration"]
    else:
        # Return the validation errors
        response_data = serializer.errors
        return Response(response_data, status=status.HTTP_400_BAD_REQUEST)
    # Move the archive to a new location
    try:
        # Request the file address returned after the serializer saved the upload
        response = requests.get(uploaded_zip_file_path, stream=True)
        # Generate a unique name
        # https://github.com/Cloud-CV/EvalAI/blob/187ad4892dbd1f5de0937badfa1f40c09acac8c0/apps/challenges/utils.py#L404
        unique_folder_name = get_unique_alpha_numeric_key(10)
        # Build the new file path
        CHALLENGE_ZIP_DOWNLOAD_LOCATION = os.path.join(
            BASE_LOCATION, "{}.zip".format(unique_folder_name))
        # Try to write the response body to the new file path
        try:
            # Write the content only if the request succeeded
            if response and response.status_code == 200:
                with open(CHALLENGE_ZIP_DOWNLOAD_LOCATION, "wb") as zip_file:
                    zip_file.write(response.content)
        except IOError:
            ...  # error Response omitted
    except requests.exceptions.RequestException:
        ...  # error Response omitted
    # Prepare to extract
    try:
        # Open the archive with zipfile in read-only mode
        zip_ref = zipfile.ZipFile(CHALLENGE_ZIP_DOWNLOAD_LOCATION, "r")
        # Extract into the unique_folder_name folder under BASE_LOCATION
        # (same name as the new archive, without the .zip extension)
        zip_ref.extractall(os.path.join(BASE_LOCATION, unique_folder_name))
        # Close the archive
        zip_ref.close()
    except zipfile.BadZipfile:
        ...  # error Response omitted

    ...
    # Iterate over the archive entries: namelist()
    for name in zip_ref.namelist():
        # Use the file name to decide whether a required configuration file is present
        if (name.endswith(".yaml") or name.endswith(".yml")) and (not name.startswith("__MACOSX")):
            yaml_file = name
            ...
    # Read files from the extracted folder by name
    # Check first that the file exists
    if os.path.isfile(evaluation_script_path):
        # Open the file
        with open(evaluation_script_path, "rb") as challenge_evaluation_script:
            ...
    ...
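
The view above requests the file with stream=True but then reads response.content, which loads the whole body into memory at once. As a possible alternative, and not what EvalAI does, the body can be written chunk by chunk. The sketch below assumes a hypothetical helper `write_chunks` and uses a list of byte strings as a stand-in for the streamed response, so it runs without network access.

```python
import os
import tempfile


def write_chunks(chunks, destination):
    """Write an iterable of byte chunks to destination and return the byte count."""
    written = 0
    with open(destination, "wb") as out_file:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out_file.write(chunk)
                written += len(chunk)
    return written


# With requests, the chunks would come from
# response.iter_content(chunk_size=8192) on a response opened with stream=True.
fake_chunks = [b"PK", b"", b"\x03\x04demo"]  # stand-in for a streamed response body
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "challenge.zip")
    size = write_chunks(fake_chunks, target)
    with open(target, "rb") as check:
        saved = check.read()
print(size, saved)
```

For the small configuration archives described here the difference is minor; chunked writing matters mainly when uploads can be large.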
