AWS Certified Solutions Architect Official Study Guide, Study Notes - S3

高化
2023-12-01

AWS S3

These are my personal study notes for the AWS Certified Solutions Architect Official Study Guide, Associate.

common use cases

  1. backup and archive for on-premises or cloud data
  2. content, media, and software storage and distribution
  3. big data analytics
  4. static website hosting
  5. cloud-native mobile and internet application hosting
  6. disaster recovery

storage classes

S3 offers three broad classes of storage:

  1. general purpose
  2. infrequent access
  3. archive

configurable lifecycle policies

storage services comparison

AWS service | features | traditional equivalent
EBS | block storage: files are stored in blocks, as a byte stream | SAN or local disk
EFS | file storage: files are stored as files | NAS and NFS
S3 | object storage: files are stored as objects |

For the differences between object, file, and block storage, see:
https://www.scality.com/topics/object-storage-vs-nas/
From the user's point of view:
block storage: the user interacts directly with a block storage device, such as a disk or a logical volume (LV).
file storage: the user interacts with a file system, with folders and files. Each file has metadata such as its name, timestamps, and so on.
object storage: the user interacts with objects in a bucket. The bucket is flat, with no folders or other structure. Object storage is designed for shared, multi-user storage, so each object carries more metadata than a typical file does to support that. An object also supports more operations than a common file, much like an object in programming.

S3

buckets

Bucket names are globally unique across all of AWS (every account and every region), because the bucket name is part of the bucket's DNS hostname.
A bucket name can be up to 63 characters long.
Each account can have 100 buckets by default.

Best practice: include your domain name in bucket names.

Buckets are stored in a region selected by the customer.
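The naming points above can be sketched as a small validator. This is a minimal sketch that covers only the common rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starts and ends with a letter or digit; no `..`; not formatted like an IPv4 address). The helper name `is_valid_bucket_name` is hypothetical, and the full AWS rule list has more cases.

```python
import ipaddress
import re

# Hypothetical helper: checks a subset of the S3 bucket-naming rules.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the common bucket-naming rules."""
    if not 3 <= len(name) <= 63:
        return False
    if not _NAME_PATTERN.fullmatch(name):  # lowercase, digits, '.', '-'; alnum at both ends
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    try:
        ipaddress.IPv4Address(name)  # names like "192.168.5.4" are rejected
    except ValueError:
        return True
    return False


# Following the best practice of putting your domain name in the bucket name:
print(is_valid_bucket_name("www.example.com"))  # True
print(is_valid_bucket_name("My_Bucket"))        # False
```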

Objects

common parameters

up to 5 TB each
an object has data and metadata; the data portion is treated as a stream of bytes.
metadata is a set of name/value pairs
metadata consists of system metadata and user metadata

  • system metadata is created and used by S3.
  • user metadata is optional, can only be set by the user when the object is created, and can be used to tag the object.

Keys of Objects

Each object is identified by a key, a UTF-8 string of up to 1024 bytes. You can think of the key as the filename; it is unique within the bucket.

Object URL

example:
bucket-name.s3.amazonaws.com/object-key
Because an object key can contain ‘/’, ‘\’, or other UTF-8 characters, a URL may look like
this-is-a-bucket.s3.amazonaws.com/cn/tj/file1.doc
Here the key is cn/tj/file1.doc.
A bucket is a simple flat namespace with no hierarchy. You can name keys so that they look hierarchical on the web, and you can navigate them like folders in S3, but that does not mean the bucket has folders named cn or tj. This naming trick helps you organize your objects.
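The key-versus-URL relationship above can be checked with a small sketch. It assumes the global virtual-hosted-style endpoint `https://<bucket>.s3.amazonaws.com/<key>` (regional endpoints also exist), and the helper names `object_url` and `key_from_url` are hypothetical. The point it shows: the key is the whole path after the first `/`, slashes included, as one flat string.

```python
from urllib.parse import quote, unquote, urlparse


def object_url(bucket: str, key: str) -> str:
    """Build a virtual-hosted-style URL for an object (global endpoint form)."""
    # '/' is left unescaped, so "cn/tj/file1.doc" still *looks* like a path,
    # but S3 stores it as a single key in a flat namespace.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def key_from_url(url: str) -> str:
    """Recover the object key: the URL path minus its single leading '/'."""
    path = urlparse(url).path
    return unquote(path[1:] if path.startswith("/") else path)


url = object_url("this-is-a-bucket", "cn/tj/file1.doc")
print(url)                # https://this-is-a-bucket.s3.amazonaws.com/cn/tj/file1.doc
print(key_from_url(url))  # cn/tj/file1.doc
```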

S3 Operations

  • Buckets
    • Create a bucket
    • delete a bucket
    • list keys in a bucket
  • Objects
    • write an object
    • read an object
    • delete an object

REST API

Tip: always use HTTPS.
In most cases, people use an AWS SDK rather than calling the REST API directly.
S3 also supports a SOAP API.

Durability and Availability

  • Standard S3 has 99.999999999% durability (eleven 9s)
    and 99.99% availability (less than one hour of downtime per year).

  • RRS (Reduced Redundancy Storage) has 99.99% durability.
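To make these figures concrete, here is a back-of-the-envelope sketch. It assumes durability can be read as the average annual probability of keeping a given object (so "n nines" means a loss probability of 10^-n per object per year) and that losses are independent; the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def expected_objects_lost_per_year(num_objects: int, nines: int) -> float:
    """Expected annual loss if each object is lost with probability 10**-nines."""
    return num_objects * 10.0 ** -nines


def downtime_minutes_per_year(availability_nines: int) -> float:
    """Yearly unavailability allowed by an availability of `availability_nines` nines."""
    return MINUTES_PER_YEAR * 10.0 ** -availability_nines


# 10 million objects: Standard (11 nines) vs RRS (4 nines)
print(expected_objects_lost_per_year(10_000_000, 11))  # ~0.0001, i.e. one object per 10,000 years
print(expected_objects_lost_per_year(10_000_000, 4))   # ~1000 objects per year
# 99.99% availability allows roughly 52.6 minutes of downtime per year
print(round(downtime_minutes_per_year(4), 1))
```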

Data Consistency

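Eventual consistency
For PUTs of new objects (new keys), S3 provides read-after-write consistency.
For updates (overwrite PUTs) and DELETEs, a read may return the stale data or the new data, but never a mix of old and new data.
(This is the model described in the study guide; since December 2020, S3 provides strong read-after-write consistency for object GET, PUT, LIST, and DELETE operations.)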

Access Control

Hosting a static website

We can use S3 to host a static website by following these steps:

  1. create a bucket
  2. upload the files
  3. make the files public (world-readable)
  4. enable static website hosting for the bucket, specifying an index document and an error document
  5. create a DNS record (for example, a CNAME or a Route 53 alias) so that a friendly domain name points to the website
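Steps 3 and 4 above can be expressed as configuration. This sketch builds a world-readable bucket policy and a website configuration in the shapes accepted by boto3's `put_bucket_policy` and `put_bucket_website`; the bucket name and document names are placeholders. Newer buckets have S3 Block Public Access turned on by default, which has to be relaxed before a public policy like this takes effect.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Bucket policy (JSON string) that lets anyone GET any object in the bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }
    return json.dumps(policy)


def website_configuration(index: str = "index.html", error: str = "error.html") -> dict:
    """Index and error documents for static website hosting."""
    return {"IndexDocument": {"Suffix": index}, "ErrorDocument": {"Key": error}}


# Usage with boto3 (not executed here):
#   s3 = boto3.client("s3")
#   s3.put_bucket_policy(Bucket="www.example.com", Policy=public_read_policy("www.example.com"))
#   s3.put_bucket_website(Bucket="www.example.com", WebsiteConfiguration=website_configuration())
```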

Advanced features

prefixes and delimiters
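Prefixes and delimiters are how S3 lists a flat namespace as if it had folders: with a prefix, only keys starting with it are returned; with a delimiter, keys that contain the delimiter after the prefix are rolled up into "common prefixes". The function below is a simplified, in-memory imitation of that behavior (no pagination; the name is hypothetical). With boto3, the real call is `list_objects_v2(Bucket=..., Prefix=..., Delimiter="/")`, which returns `Contents` and `CommonPrefixes`.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(keys: Iterable[str], prefix: str = "",
                        delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (contents, common_prefixes), roughly like an S3 list call."""
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):  # S3 also returns keys in lexicographic order
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut >= 0:
            # Roll everything up to and including the delimiter into one entry.
            group = prefix + rest[:cut + len(delimiter)]
            if group not in common_prefixes:
                common_prefixes.append(group)
        else:
            contents.append(key)
    return contents, common_prefixes


keys = ["cn/tj/file1.doc", "cn/tj/file2.doc", "cn/bj/file3.doc", "readme.txt"]
print(list_with_delimiter(keys))                # (['readme.txt'], ['cn/'])
print(list_with_delimiter(keys, prefix="cn/"))  # ([], ['cn/bj/', 'cn/tj/'])
```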

Service classes

service name | short name | durability | minimum object size | minimum storage duration | retrieval time | cost | scenario
Standard | n/a | eleven 9s | ? | ? | real time | relatively high | general use
Standard-Infrequent Access | Standard-IA | eleven 9s | 128 KB | 30 days | real time | lower than Standard | long-lived, less frequently accessed data
Reduced Redundancy Storage | RRS | four 9s | | | real time | lower than Standard | easily reproducible data
Glacier | n/a | ? | ? | ? | 3-5 hours | 5% of your Glacier data can be retrieved free each month | archived data that is rarely accessed

Tip: if you use Glacier, set a data retrieval policy to avoid unwanted costs.

object lifecycle management:

configured on a bucket; it can move objects from one storage class to another automatically, and eventually delete them.
Commonly: Standard, then Standard-IA, then Glacier, then deletion.
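The common Standard, Standard-IA, Glacier, delete pattern can be written as a single lifecycle rule. This sketch uses the `LifecycleConfiguration` shape of boto3's `put_bucket_lifecycle_configuration`; the day counts are example values, the rule ID is hypothetical, and the ordering check is this sketch's own. The 30-day check reflects S3's requirement that objects stay at least 30 days in Standard before moving to Standard-IA.

```python
def lifecycle_configuration(prefix: str = "", ia_days: int = 30,
                            glacier_days: int = 90, expire_days: int = 365) -> dict:
    """One lifecycle rule: Standard -> Standard-IA -> Glacier -> delete."""
    if ia_days < 30:
        raise ValueError("objects must be at least 30 days old before moving to STANDARD_IA")
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")  # this sketch's own check
    return {
        "Rules": [{
            "ID": "standard-to-ia-to-glacier-then-delete",  # hypothetical rule name
            "Filter": {"Prefix": prefix},                     # "" applies to the whole bucket
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_days, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": expire_days},
        }]
    }


example_rules = lifecycle_configuration("logs/")

# Usage with boto3 (not executed here):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=example_rules)
```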

Encryption

It is strongly recommended to encrypt sensitive data in S3.

In-flight encryption

Use the SSL (HTTPS) API endpoints, so that data is encrypted in transit.
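One common way to enforce in-flight encryption, assumed here rather than taken from the study guide, is a bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key. A sketch that builds such a policy for a placeholder bucket name:

```python
import json


def deny_insecure_transport_policy(bucket: str) -> str:
    """Bucket policy that rejects every request made without TLS (plain HTTP)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            # aws:SecureTransport is "false" when the request came in over plain HTTP
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    return json.dumps(policy, indent=2)


print(deny_insecure_transport_policy("example-bucket"))
```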

At Rest Encryption

Server Side Encryption (SSE)

All of them use 256-bit AES (AES-256).

  1. SSE-S3 (AWS-managed keys)
    AWS manages the keys: every object is encrypted with a unique key, that key is itself encrypted with a master key, and AWS issues a new master key at least monthly.
    The encrypted data, the encryption keys, and the master keys are stored separately.
  2. SSE-KMS (AWS KMS-managed keys)
    Everything in SSE-S3, plus separate permissions for using the master key and auditing of key use (both successful and failed attempts).
  3. SSE-C (customer-provided keys)
    You manage the keys; S3 performs the encryption and decryption.

Client-Side Encryption

You can use an AWS KMS-managed customer master key or a client-side master key.

Tip: SSE-S3 and SSE-KMS are the simplest and easiest options.
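The three server-side options map onto upload request parameters. This sketch (hypothetical helper `sse_put_params`) returns the extra keyword arguments for boto3's `put_object`; the KMS key ID and the 32-byte customer key are placeholders. For SSE-C, the same `SSECustomerAlgorithm`/`SSECustomerKey` pair must also be sent with every `get_object`, because S3 does not store the key; boto3 handles the base64 encoding and the key MD5 header for you.

```python
from typing import Optional


def sse_put_params(mode: str, kms_key_id: Optional[str] = None,
                   customer_key: Optional[bytes] = None) -> dict:
    """Extra put_object keyword arguments for each server-side encryption option."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}    # S3-managed keys
    if mode == "SSE-KMS":
        params = {"ServerSideEncryption": "aws:kms"}  # KMS-managed master key
        if kms_key_id is not None:
            params["SSEKMSKeyId"] = kms_key_id        # omit to use the AWS-managed default key
        return params
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        # You keep this key; S3 uses it for this request but does not store it.
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode!r}")


# Usage with boto3 (not executed here):
#   import os
#   key = os.urandom(32)
#   s3.put_object(Bucket="example-bucket", Key="secret.txt", Body=b"...",
#                 **sse_put_params("SSE-C", customer_key=key))
```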

Versioning

Versioning is turned on at the bucket level.
Once enabled, versioning cannot be removed from a bucket; it can only be suspended.

MFA Delete (Multi-Factor Authentication)

MFA is required to permanently delete an object version or to change the versioning state of a bucket.
MFA Delete can only be enabled by the root account.
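Versioning and MFA Delete are both set through one API call. This sketch (hypothetical helper `versioning_request`) builds the keyword arguments for boto3's `put_bucket_versioning`; note that the only states on offer are "Enabled" and "Suspended". The MFA device serial and code are placeholders, and the call that changes MFA Delete has to be made with root credentials.

```python
from typing import Optional


def versioning_request(bucket: str, status: str, mfa: Optional[str] = None,
                       mfa_delete: Optional[bool] = None) -> dict:
    """Keyword arguments for boto3's put_bucket_versioning."""
    if status not in ("Enabled", "Suspended"):
        # There is no "off" once versioning has been enabled, only "Suspended".
        raise ValueError("status must be 'Enabled' or 'Suspended'")
    config = {"Status": status}
    kwargs: dict = {"Bucket": bucket, "VersioningConfiguration": config}
    if mfa_delete is not None:
        if mfa is None:
            raise ValueError("changing MFA Delete needs the MFA device serial and a code")
        config["MFADelete"] = "Enabled" if mfa_delete else "Disabled"
        kwargs["MFA"] = mfa  # "<device serial number> <current code>"
    return kwargs


enable = versioning_request("example-bucket", "Enabled")
enable_with_mfa_delete = versioning_request(
    "example-bucket", "Enabled",
    mfa="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",  # placeholder
    mfa_delete=True)
```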

Pre-Signed URLS

signed with the creator's (object owner's) own security credentials
time-limited
valid only for a specific HTTP method (e.g., GET)
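To see where the credentials, time limit, and HTTP method end up, here is a hand-rolled sketch of AWS Signature Version 4 query-string signing for a GET URL. It is for illustration only and assumes the global `<bucket>.s3.amazonaws.com` host, no session token, and GET only. In practice, an SDK does this for you, e.g. boto3's `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": ...}, ExpiresIn=3600)`. The example keys below are the placeholder credentials AWS uses in its documentation.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _sign(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket: str, key: str, access_key: str, secret_key: str,
                    region: str = "us-east-1", expires: int = 3600,
                    now: Optional[datetime.datetime] = None) -> str:
    """Sketch of a SigV4 query-string pre-signed GET URL."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",  # whose credentials sign the URL
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),                # time limit in seconds
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
                     for k, v in sorted(params.items()))
    path = "/" + quote(key, safe="/-_.~")
    canonical_request = "\n".join([
        "GET",             # the HTTP method this URL is valid for
        path,
        query,
        f"host:{host}\n",  # canonical headers, each ending in a newline
        "host",
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    signing_key = _sign(_sign(_sign(_sign(
        ("AWS4" + secret_key).encode("utf-8"), datestamp), region), "s3"), "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"


fixed_time = datetime.datetime(2013, 5, 24, tzinfo=datetime.timezone.utc)
example_url = presign_get_url("examplebucket", "test.txt", "AKIAIOSFODNN7EXAMPLE",
                              "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                              expires=86400, now=fixed_time)
print(example_url)
```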

Multipart Upload

S3 Multipart Upload API
Uploads large objects in parts.
It has three steps:

  1. initiation
  2. uploading the parts
  3. completion

Parts are uploaded independently and in any order.
Recommended for objects larger than 100 MB.
Required for objects larger than 5 GB.
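Before uploading, a client has to decide how to cut the object into parts. This planning sketch (hypothetical helper `plan_parts`) uses the S3 limits of at least 5 MiB for every part except the last, at most 10,000 parts, and 5 TB per object; it does not enforce the 5 GiB per-part maximum. Each tuple would become one `upload_part` call (with `PartNumber` and `UploadId`) between boto3's `create_multipart_upload` and `complete_multipart_upload`.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB        # every part except the last must be at least 5 MiB
MAX_PARTS = 10_000             # part numbers run from 1 to 10,000
MAX_OBJECT_SIZE = 5 * 1024**4  # 5 TiB


def plan_parts(size: int, part_size: int = 100 * MIB) -> List[Tuple[int, int, int]]:
    """Split `size` bytes into (part_number, first_byte, last_byte) tuples."""
    if not 0 < size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    part_size = max(part_size, MIN_PART_SIZE)
    while -(-size // part_size) > MAX_PARTS:  # ceiling division
        part_size *= 2                         # grow parts to stay within 10,000
    return [
        (number, start, min(start + part_size, size) - 1)
        for number, start in enumerate(range(0, size, part_size), start=1)
    ]


# A 250 MiB file in 100 MiB parts: two full parts plus a 50 MiB tail.
for part in plan_parts(250 * MIB):
    print(part)
```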

Range GETs

Retrieve only part of an object, specified as a byte range.
Useful when you have poor connectivity to AWS.
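A Range GET is an ordinary GET with an HTTP `Range` header. This sketch (hypothetical helper `range_get_request`, placeholder URL) only builds the request with the standard library and does not send it; with boto3 the equivalent is `get_object(Bucket=..., Key=..., Range="bytes=0-1023")`. S3 answers a satisfiable range request with 206 Partial Content.

```python
import urllib.request


def range_get_request(url: str, first_byte: int, last_byte: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for bytes first_byte..last_byte inclusive."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return urllib.request.Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


# First 1 KiB of a (hypothetical) public object.
req = range_get_request("https://example-bucket.s3.amazonaws.com/big.iso", 0, 1023)
print(req.get_method(), req.get_header("Range"))  # GET bytes=0-1023
```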

Cross Region replicas

Configured at the bucket level.
Object metadata and ACLs are replicated along with the objects.
Versioning must be turned on for both the source and the destination buckets.
Used to reduce latency for users in other regions.
If you turn this on for an existing bucket, only new objects are replicated; existing objects must be copied with separate commands.
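The requirements above show up directly in the replication configuration. This sketch (hypothetical helper `replication_configuration`) uses the simpler prefix-based rule form accepted by boto3's `put_bucket_replication` and refuses to build a rule unless both versioning statuses (for example, as read from `get_bucket_versioning`) are "Enabled". The IAM role ARN and bucket names are placeholders.

```python
def replication_configuration(role_arn: str, dest_bucket: str,
                              src_versioning: str, dest_versioning: str) -> dict:
    """ReplicationConfiguration for boto3's put_bucket_replication (prefix-based rule)."""
    # Cross-region replication only works when BOTH buckets have versioning enabled.
    if src_versioning != "Enabled" or dest_versioning != "Enabled":
        raise ValueError("enable versioning on the source and destination buckets first")
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects for you
        "Rules": [{
            "ID": "replicate-all-new-objects",  # hypothetical rule name
            "Prefix": "",                       # empty prefix: every new object
            "Status": "Enabled",
            "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
        }],
    }


example_config = replication_configuration(
    "arn:aws:iam::123456789012:role/example-replication-role",  # placeholder
    "example-bucket-replica", "Enabled", "Enabled")
```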

Logging

Off by default.
You must choose where the logs will be stored (a target bucket).
Logs are delivered on a best-effort basis, with a slight delay.
Log info includes:

  • requester IP address
  • requester account
  • bucket name
  • action
  • response status or error code
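Turning logging on and reading the fields listed above can both be sketched briefly. `logging_status` returns the `BucketLoggingStatus` shape for boto3's `put_bucket_logging`, and `parse_access_log_line` pulls the first fields out of a server access log record, assuming the documented field order (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, HTTP status, error code). Both helper names and the sample record are hypothetical.

```python
import re


def logging_status(target_bucket: str, target_prefix: str = "") -> dict:
    """BucketLoggingStatus for boto3's put_bucket_logging (logging is off until set)."""
    return {"LoggingEnabled": {"TargetBucket": target_bucket, "TargetPrefix": target_prefix}}


# First fields of an S3 server access log record, in the order they appear.
_LOG_RECORD = re.compile(
    r'(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<http_status>\S+) (?P<error_code>\S+)'
)


def parse_access_log_line(line: str) -> dict:
    """Extract IP, requester, bucket, action, and status/error code from one record."""
    match = _LOG_RECORD.match(line)
    if match is None:
        raise ValueError("not an S3 access log record")
    return match.groupdict()


sample = ('exampleownerid example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
          'arn:aws:iam::123456789012:user/alice 3E57427F3EXAMPLE REST.GET.OBJECT '
          'cn/tj/file1.doc "GET /cn/tj/file1.doc HTTP/1.1" 200 - 1024 1024 10 9 "-" "curl/7.0" -')
record = parse_access_log_line(sample)
print(record["remote_ip"], record["operation"], record["http_status"])  # 192.0.2.3 REST.GET.OBJECT 200
```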

Event Notification

This can be used to:
run workflows
send alerts
trigger other actions

This is configured at the bucket level.
Notifications are sent when objects are:

  • created
  • removed
  • lost (for RRS objects)

Notifications can be filtered by object key prefix and suffix.
Notifications can be sent via SQS or SNS (or delivered directly to AWS Lambda).
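A notification rule combines the event types and the prefix/suffix filter. This sketch builds the `NotificationConfiguration` shape for boto3's `put_bucket_notification_configuration` with an SQS target; SNS uses `TopicConfigurations`/`TopicArn` and Lambda uses `LambdaFunctionConfigurations`/`LambdaFunctionArn` in the same way. The queue ARN is a placeholder (the queue's own policy must also allow S3 to send to it), and `would_notify` is a hypothetical local check of the filter.

```python
from typing import List


def notification_configuration(queue_arn: str, prefix: str, suffix: str,
                               include_rrs_lost: bool = False) -> dict:
    """NotificationConfiguration with one SQS target and a key-name filter."""
    events: List[str] = ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
    if include_rrs_lost:
        events.append("s3:ReducedRedundancyLostObject")
    return {
        "QueueConfigurations": [{
            "QueueArn": queue_arn,
            "Events": events,
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": prefix},
                {"Name": "suffix", "Value": suffix},
            ]}},
        }]
    }


def would_notify(key: str, prefix: str, suffix: str) -> bool:
    """Local check of the prefix/suffix filter against an object key."""
    return key.startswith(prefix) and key.endswith(suffix)


example_config = notification_configuration(
    "arn:aws:sqs:us-east-1:123456789012:example-queue", "images/", ".jpg",  # placeholders
    include_rrs_lost=True)
```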

Best Practices, Patterns, and Performances

pattern 1:

Data lives on-premises and is backed up to S3.

pattern 2:

Use S3 as the bulk data store and keep the index somewhere else; for example, store all the files in S3 and keep an index of them in DynamoDB.

pattern 3:

S3 scales automatically to support high request rates.
For even higher rates (e.g., more than 100 requests per second), refer to the S3 Developer Guide or the S3 best practices.

Tip: for GET-intensive workloads, consider using an Amazon CloudFront distribution as a caching layer.

Amazon Glacier:

archives:

Glacier stores data in archives, not buckets. Each archive can contain up to 40 TB of data, and the number of archives is unlimited.
Each archive has a unique, system-assigned ID (it cannot be customized).
Archives are automatically encrypted.
Archives are immutable.

Vaults and vault locks

By default, one account can have up to 1,000 vaults per region.
Standard S3 uses a bucket-object structure; Glacier uses a vault-archive structure.

logical level (for policies) | files and data
bucket | objects
vault | archives

A vault lock is a policy for the vault; once it is locked, it cannot be changed.

Data Retrieval

  • cost is based on the retrieval rate
  • every month, 5% of your stored data can be retrieved for free, prorated daily
  • extra fees are based on the peak (maximum) retrieval rate, not on the total data size
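The daily proration of the free allowance is simple arithmetic. This sketch (hypothetical helper, decimal GB) only models the 5%-per-month rule as described in these notes; Glacier retrieval pricing has changed since the study guide was written, so treat it as an illustration of the notes rather than current pricing.

```python
def daily_free_retrieval_gb(stored_gb: float, days_in_month: int) -> float:
    """Free retrieval allowance for one day: 5% of stored data per month, prorated daily."""
    return stored_gb * 0.05 / days_in_month


# 75 TB (75,000 GB) stored in a 30-day month -> 3,750 GB free per month, i.e. 125 GB per day
allowance = daily_free_retrieval_gb(75_000, 30)
print(round(allowance, 2))  # 125.0
```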