AWS S3 Bucket Replication and Permission Sync

鲁英卫
2023-12-01

1. Bucket replication

There are two types: SRR (Same-Region Replication) and CRR (Cross-Region Replication).

The setup steps for SRR and CRR are covered in the official AWS documentation, so they are not repeated here:

Replicating objects - Amazon Simple Storage Service

Replicating existing objects with S3 Batch Replication - Amazon Simple Storage Service

Granting permissions for Amazon S3 Batch Operations - Amazon Simple Storage Service
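SRR and CRR share the same replication rule format. The only difference is whether the destination bucket is in the same Region as the source. The following is a minimal sketch that builds a rule for boto3's `put_bucket_replication` in the V2 schema (with `Filter`). The function name, role ARN, and bucket names are hypothetical placeholders. The rule replicates the whole bucket and does not replicate delete markers.

```python
def build_replication_config(role_arn, destination_bucket, prefix="", rule_id="replicate-all"):
    """Build a ReplicationConfiguration dict for put_bucket_replication (hypothetical helper).

    Same Region as the source -> SRR; different Region -> CRR. The rule itself is identical.
    """
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                # Required whenever a Filter is used (V2 rule schema)
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }
```

Pass the result to `boto3.client('s3').put_bucket_replication(Bucket=source_bucket, ReplicationConfiguration=config)`. Versioning must be enabled on both buckets first. A new rule only applies to objects written after it is created. Existing objects need S3 Batch Replication, which the second document above describes.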

SRR replicates object permissions (ACLs) as well, so no separate permission sync is needed.

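CRR does not replicate permissions other than the owner's. The remaining permissions have to be synced separately, with a script that updates them in bulk.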
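In practice, syncing "the other permissions" means finding grants that exist on the source object but are missing from its replica. The helper below is a hypothetical sketch that compares two grant lists in the format returned by GetObjectAcl. In a cross-account setup the owner's own `CanonicalUser` grant differs between the two accounts, so the helper takes an optional `source_owner_id` and skips that grant.

```python
def grants_missing_on_target(source_grants, target_grants, source_owner_id=None):
    """Return grants on the source object that are absent on the target (hypothetical helper).

    Grants are dicts like {"Grantee": {"Type": "Group", "URI": "..."}, "Permission": "READ"}.
    Grantees are compared by type plus their identifying field (ID, URI or EmailAddress);
    display-only fields such as DisplayName are ignored.
    """
    def grant_key(grant):
        grantee = grant["Grantee"]
        ident = grantee.get("ID") or grantee.get("URI") or grantee.get("EmailAddress")
        return (grantee["Type"], ident, grant["Permission"])

    present = {grant_key(g) for g in target_grants}
    missing = []
    for g in source_grants:
        # Skip the source account owner's own grant; it has no meaning in the target account
        if source_owner_id is not None and g["Grantee"].get("ID") == source_owner_id:
            continue
        if grant_key(g) not in present:
            missing.append(g)
    return missing
```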

Below is a sample script that syncs the public READ permission, for reference:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

# Copyright WUZL. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

"""
Purpose
Show how to use AWS SDK for Python (Boto3) with Amazon Simple Storage Service
(Amazon S3) to perform basic object ACL operations: synchronize the public read permissions from the source bucket to the target bucket.
"""

import json
import logging

# Install the AWS SDK for Python (boto3) first: pip3 install boto3
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# File handler: write DEBUG and above to a log file
fh = logging.FileHandler("boto3_s3_object_acl_modi.log")
fh.setLevel(logging.DEBUG)
# Stream handler: print ERROR and above to the console
ch = logging.StreamHandler()
ch.setLevel(logging.ERROR)
# Log format
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(lineno)s %(message)s",datefmt="%Y-%m-%d %H:%M:%S")
ch.setFormatter(formatter)
fh.setFormatter(formatter)
# Attach both handlers to the logger
logger.addHandler(ch)
logger.addHandler(fh)
# Usage examples:
# logger.debug("debug message")
# logger.info("info message")
# logger.warning("warning message")
# logger.error("error message")
# logger.critical("critical message")

# snippet-start:[python.example_code.s3.helper.ObjectWrapper]

# Fill in these values for your environment (source bucket in eu-west-2, target bucket in us-west-2 here)
s_region_name='eu-west-2'
s_aws_access_key_id='xxx'
s_aws_secret_access_key='xxx'
s_bucket='xxx'

t_region_name='us-west-2'
t_awt_accest_key_id='xxx'
t_awt_secret_accest_key='xxx'
target_bucket='xxx'


class ObjectWrapper:
    """Encapsulates S3 object actions."""
    def __init__(self, s3_object):
        """
        :param s3_object: A Boto3 Object resource. This is a high-level resource in Boto3
                          that wraps object actions in a class-like structure.
        """
        self.object = s3_object
        self.key = self.object.key
# snippet-end:[python.example_code.s3.helper.ObjectWrapper]

# snippet-start:[python.example_code.s3.GetObject]
    def get(self):
        """
        Gets the object.

        :return: The object data in bytes.
        """
        try:
            body = self.object.get()['Body'].read()
            logger.info(
                "Got object '%s' from bucket '%s'.",
                self.object.key, self.object.bucket_name)
        except ClientError:
            logger.exception(
                "Couldn't get object '%s' from bucket '%s'.",
                self.object.key, self.object.bucket_name)
            raise
        else:
            return body
# snippet-end:[python.example_code.s3.GetObject]

# snippet-start:[python.example_code.s3.ListObjects]
    @staticmethod
    def list(bucket, prefix=None):
        """
        Lists the objects in a bucket, optionally filtered by a prefix.

        :param bucket: The bucket to query. This is a Boto3 Bucket resource.
        :param prefix: When specified, only objects that start with this prefix are listed.
        :return: The list of objects.
        """
        try:
            if not prefix:
                objects = list(bucket.objects.all())
            else:
                objects = list(bucket.objects.filter(Prefix=prefix))
            # logger.info("Got objects %s from bucket '%s'", [o.key for o in objects], bucket.name)
            logger.info("Got objects from bucket '%s'", bucket.name)
        except ClientError:
            logger.exception("Couldn't get objects for bucket '%s'.", bucket.name)
            raise
        else:
            return objects
# snippet-end:[python.example_code.s3.ListObjects]

# snippet-start:[python.example_code.s3.ListObjectsKeys]
    @staticmethod
    def list_all_keys(bucket, prefix=None):
        """
        Lists the object keys in a bucket, optionally filtered by a prefix.

        :param bucket: The bucket to query. This is a Boto3 Bucket resource.
        :param prefix: When specified, only objects that start with this prefix are listed.
        :return: The list of object keys.
        """
        try:
            if not prefix:
                objects = list(bucket.objects.all())
            else:
                objects = list(bucket.objects.filter(Prefix=prefix))
            all_keys = [o.key for o in objects]
            # logger.info("Got objects %s from bucket '%s'", [o.key for o in objects], bucket.name)
            logger.info("Got objects list from bucket '%s'", bucket.name)
        except ClientError:
            logger.exception("Couldn't get objects for bucket '%s'.", bucket.name)
            raise
        else:
            return all_keys
# snippet-end:[python.example_code.s3.ListObjectsKeys]

# snippet-start:[python.example_code.s3.GetObjectAcl]
    def get_acl(self):
        """
        Gets the ACL of the object.

        :return: The ACL of the object.
        """
        try:
            acl = self.object.Acl()
            # logger.info("Got ACL for object %s owned by %s.", self.object.key, acl.owner['DisplayName'])
        except ClientError:
            logger.exception("Couldn't get ACL for object %s.", self.object.key)
            raise
        else:
            return acl
# snippet-end:[python.example_code.s3.GetObjectAcl]

# snippet-start:[python.example_code.s3.PutObjectAcl]
    def put_acl(self, uri):
        """
        Adds a READ grant for the group identified by uri (for example, the
        AllUsers group for public read) to the object's ACL, keeping existing grants.

        :param uri: The URI of the grantee group.
        """
        try:
            acl = self.object.Acl()
            # Putting an ACL overwrites the existing ACL, so append new grants
            # if you want to preserve existing grants.
            grants = acl.grants if acl.grants else []
            grants.append({'Grantee': {'Type': 'Group', 'URI': uri}, 'Permission': 'READ'})
            acl.put(
                AccessControlPolicy={
                    'Grants': grants,
                    'Owner': acl.owner
                }
            )
            # logger.info("Granted read access to %s.", uri)
        except ClientError:
            logger.exception("Couldn't add ACL to object '%s'.", self.object.key)
            raise
# snippet-end:[python.example_code.s3.PutObjectAcl]


# snippet-start:[python.example_code.s3.Scenario_ObjectManagement]
def usage_demo():
    # print('-'*88)
    # print("Welcome to the Amazon S3 object acl modi demo!")
    # print('-'*88)

    # logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
    # LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"
    # logging.basicConfig(filename='boto3_s3_object_acl_modi.log', level=logging.DEBUG, format=LOG_FORMAT)
    
    # s3_client = boto3.client('s3', region_name=s_region_name, aws_access_key_id=s_aws_access_key_id, aws_secret_access_key=s_aws_secret_access_key)
    # response = s3_client.list_buckets()
    # print('Existing buckets:')
    # for bucket in response['Buckets']:
    #     print(f'  {bucket["Name"]}')

    s3_resource = boto3.resource('s3', region_name=s_region_name, aws_access_key_id=s_aws_access_key_id, aws_secret_access_key=s_aws_secret_access_key)
    bucket = s3_resource.Bucket(s_bucket)
    # print(dir(bucket))

    t_s3_resource = boto3.resource('s3', region_name=t_region_name, aws_access_key_id=t_awt_accest_key_id, aws_secret_access_key=t_awt_secret_accest_key)
    t_bucket = t_s3_resource.Bucket(target_bucket)
    # print(dir(t_bucket))
    # t_objects = ObjectWrapper.list(t_bucket)
    # print(t_objects)

    # Collect the keys of source objects whose ACL grants READ to the AllUsers group (public read)
    objects = ObjectWrapper.list(bucket)
    keys = []
    len_all_keys = len(objects)
    logger.info("len_all_keys: %s", len_all_keys)
    for object_summary in objects:
        len_all_keys = len_all_keys - 1
        logger.info("left_keys: %s", len_all_keys)
        key = str(object_summary.key)
        try:
            object_acl = object_summary.Acl()
            for grant in object_acl.grants:
                # Match only the public grant: READ for the AllUsers group.
                # A READ grant to a single account is not public read.
                if (grant['Permission'] == 'READ'
                        and grant['Grantee'].get('URI') == 'http://acs.amazonaws.com/groups/global/AllUsers'):
                    keys.append(key)
                    break
        except ClientError as error:
            # Log and move on to the next object instead of aborting the whole scan
            logger.error("Couldn't get ACL for source object '%s': %s", key, error)

    logger.info("keys list len: %s", len(keys))
    logger.info("source keys: %s", keys)
    
    logger.info("Modi target bucket object grants:")
    
    # Keep only keys that exist in the target bucket and still lack the public READ grant
    t_all_keys = set(ObjectWrapper.list_all_keys(t_bucket))
    logger.info("t_all_keys list len: %s", len(t_all_keys))
    modi_keys = []
    len_left_t_keys = len(keys)
    logger.info("len_left_t_keys: %s", len_left_t_keys)
    for key in keys:
        len_left_t_keys = len_left_t_keys - 1
        logger.info("len_left_t_keys: %s", len_left_t_keys)
        if key not in t_all_keys:
            # Not replicated yet; putting an ACL on it would fail with NoSuchKey
            logger.warning("Object '%s' not found in target bucket, skipped.", key)
            continue
        try:
            object_acl = t_s3_resource.ObjectSummary(target_bucket, key).Acl()
            already_public = any(
                grant['Permission'] == 'READ'
                and grant['Grantee'].get('URI') == 'http://acs.amazonaws.com/groups/global/AllUsers'
                for grant in object_acl.grants)
        except ClientError as error:
            logger.error("Couldn't get ACL for target object '%s': %s", key, error)
            continue
        if not already_public:
            modi_keys.append(key)
    logger.info("len of modi_keys: %s ,modi_keys: '%s'", len(modi_keys), str(modi_keys))
    
    len_left_modi_keys = len(modi_keys)
    for object_key in modi_keys:
        len_left_modi_keys = len_left_modi_keys - 1
        logger.info("len_left_modi_keys: %s", len_left_modi_keys)
        obj_wrapper = ObjectWrapper(t_bucket.Object(object_key))
        try:
            # Add a READ grant for the AllUsers group while keeping the existing grants
            obj_wrapper.put_acl(uri='http://acs.amazonaws.com/groups/global/AllUsers')
            logger.info("Put ACL grants on object '%s'", obj_wrapper.key)
        except ClientError as error:
            # The grantee is a group URI, not an email address. Typical failures here are
            # AccessDenied (e.g. Block Public Access is enabled on the target bucket) or
            # AccessControlListNotSupported (Object Ownership set to "Bucket owner enforced").
            logger.error("Couldn't put ACL on object '%s': %s",
                         object_key, error.response['Error']['Code'])


# snippet-end:[python.example_code.s3.Scenario_ObjectManagement]


if __name__ == '__main__':
    usage_demo()
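The selection logic in `usage_demo` can be separated from the boto3 calls, so it can be checked without touching a real bucket. This sketch assumes the ACLs have already been fetched into dicts that map each object key to its grant list. `is_public_read`, `keys_to_fix` and `with_public_read` are hypothetical names, not part of boto3.

```python
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def is_public_read(grants):
    """True if the grant list gives READ to the AllUsers group."""
    return any(
        g["Permission"] == "READ" and g["Grantee"].get("URI") == ALL_USERS_URI
        for g in grants
    )


def keys_to_fix(source_acls, target_acls):
    """Keys that are public-read on the source but not yet on the target.

    source_acls / target_acls: dict of object key -> list of grants.
    Keys missing from target_acls (not replicated yet) are skipped.
    """
    result = []
    for key, grants in source_acls.items():
        if not is_public_read(grants) or key not in target_acls:
            continue
        if not is_public_read(target_acls[key]):
            result.append(key)
    return result


def with_public_read(grants):
    """Return a new grant list with AllUsers READ appended, unless it is already present."""
    if is_public_read(grants):
        return list(grants)
    return list(grants) + [{"Grantee": {"Type": "Group", "URI": ALL_USERS_URI}, "Permission": "READ"}]
```

`with_public_read` appends the grant only when it is missing, so re-running the sync does not pile up duplicate grants. The canned ACL (`put_object_acl(ACL='public-read')`) would be shorter, but it replaces the object's whole ACL instead of adding to it.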

Code references:

S3 — Boto3 Docs 1.26.26 documentation

aws-doc-sdk-examples/object_wrapper.py at main · awsdocs/aws-doc-sdk-examples · GitHub
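`ObjectWrapper.list` loads every object in a bucket into memory before the scan starts. For large buckets, the `list_objects_v2` paginator from the Boto3 S3 client API can stream keys page by page instead (up to 1,000 keys per page). Here is a minimal sketch; `iter_keys` is a hypothetical helper name.

```python
def iter_keys(s3_client, bucket, prefix=""):
    """Yield object keys lazily using the list_objects_v2 paginator."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matches
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

For example, `set(iter_keys(boto3.client('s3', region_name=t_region_name), target_bucket))` would build the target key set used for the membership check in the script.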
