These are my personal study notes for the AWS Certified Solutions Architect Official Study Guide (Associate).
S3 offers multiple storage classes (compared in a table further below)
configurable lifecycle policies
AWS service | features | traditional equivalent |
---|---|---|
EBS | block storage; data is stored in blocks and accessed as a byte stream | SAN or local disk |
EFS | file storage; data is stored as files in a file system | NAS / NFS |
S3 | object storage; data is stored as objects in buckets | no direct traditional equivalent |
for the difference between object, file, and block storage, see:
https://www.scality.com/topics/object-storage-vs-nas/
from the user's point of view:
block storage: the user interacts with a block device directly, such as a disk or a logical volume.
file storage: the user interacts with a file system of folders and files; each file has metadata such as its name and timestamps.
object storage: the user interacts with objects in a bucket. The bucket is flat, with no folders or other structure. Object storage is designed for multi-user shared storage, so each object carries more metadata than a typical file, and an object exposes more methods than a plain file, much like objects in programming.
bucket names are globally unique across all of AWS, because you may need replicas of an object in other regions
bucket names can be up to 63 characters long
100 buckets per account by default
best practice: include your domain name in bucket names
buckets are stored in a region selected by the customer
objects can be up to 5 TB each
an object has data and metadata; the data portion is treated as a stream of bytes
metadata is a set of name/value pairs
metadata has system metadata and user metadata
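As a quick illustration (my own sketch, not from the book), here is how user metadata and system metadata look with boto3; the bucket and key names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with user metadata (names/values are arbitrary examples).
s3.put_object(
    Bucket="this-is-a-bucket",          # assumed bucket name
    Key="notes/s3-metadata-demo.txt",   # assumed key
    Body=b"hello, object storage",
    Metadata={"project": "study-notes", "owner": "me"},  # user metadata
)

# HEAD the object: system metadata (ContentLength, LastModified, ETag, ...)
# plus the user metadata under the "Metadata" key.
head = s3.head_object(Bucket="this-is-a-bucket", Key="notes/s3-metadata-demo.txt")
print(head["ContentLength"], head["LastModified"], head["Metadata"])
```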
each object is identified by a key, a UTF-8 string up to 1024 bytes long. Think of it as the file name. Keys are unique within a bucket.
example:
bucket-name.s3.amazonaws.com/object-key
since an object key can contain '/' and other UTF-8 characters, a URL may look like
this-is-a-bucket.s3.amazonaws.com/cn/tj/file1.doc
here the key is actually cn/tj/file1.doc (the whole path after the bucket name).
a bucket is a simple, flat namespace with no hierarchy. You can name keys so that they look hierarchical on the web, and tools can navigate them like folders, but that does not mean there are real folders named cn or tj. The naming trick just helps you organize your objects.
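A hedged sketch (mine, not from the book) of how that pseudo-hierarchy is usually consumed: list objects under a key prefix with a '/' delimiter so S3 groups deeper keys as if they were folders. Bucket and prefix names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# List keys that start with "cn/tj/" and group deeper "sub-folders"
# under CommonPrefixes -- S3 itself has no real folders.
resp = s3.list_objects_v2(
    Bucket="this-is-a-bucket",  # assumed bucket name
    Prefix="cn/tj/",            # assumed key prefix
    Delimiter="/",
)
for obj in resp.get("Contents", []):
    print("object:", obj["Key"])
for cp in resp.get("CommonPrefixes", []):
    print("pseudo-folder:", cp["Prefix"])
```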
TIP: always use HTTPS
in most cases, people use the SDKs rather than the REST API directly
AWS also supports SOAP
standard S3 has 99.999999999% durability (eleven 9s)
99.99% availability, i.e. less than about one hour of downtime per year
RRS (Reduced Redundancy Storage) has 99.99% durability
Eventual consistency
for new objects (new keys), S3 provides read-after-write consistency
for updates and deletes of existing objects, a read may return stale data or the new data, but never a mix of old and new data
we can use S3 to host a static website; a sketch of how to enable it follows:
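This is my own minimal sketch (not from the book) of turning a bucket into a static website with boto3; the bucket name and document names are assumptions, and the bucket policy / public-access settings needed to actually serve the content are omitted.

```python
import boto3

s3 = boto3.client("s3")
bucket = "www.example.com"  # assumed: bucket named after the site's domain

# Upload the site's entry page and enable website hosting on the bucket.
s3.put_object(Bucket=bucket, Key="index.html",
              Body=b"<html><body>hello</body></html>",
              ContentType="text/html")
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},  # assumed error page
    },
)
# The site is then served from the region-specific website endpoint,
# e.g. http://<bucket>.s3-website-<region>.amazonaws.com
```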
service name | short name | durability | minimum object size | minimum object lifetime | retrieval time | cost | scenario |
---|---|---|---|---|---|---|---|
Standard | na | 99.999999999% (11 nines) | ? | ? | realtime | relatively high | common usage |
Standard Infrequent Access | Standard-IA | 99.999999999% (11 nines) | 128KB | 30 days | realtime | lower than Standard | long-term storage |
Reduced Redundancy Storage | RRS | 99.99% (4 nines) | na | na | realtime | lower than IA | easily reproducible data |
Glacier | na | ? | ? | ? | 3-5 hours | retrieving up to 5% of your Glacier data is free each month | archived data, rarely accessed |
tip: set a data retrieval policy if you use Glacier, to avoid unwanted retrieval costs
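A hedged sketch (my addition) of setting such a retrieval policy with boto3 so retrievals stay within the free tier; "-" refers to the caller's own account.

```python
import boto3

glacier = boto3.client("glacier")

# Cap retrievals at the free tier so an accidental bulk restore
# cannot generate unexpected retrieval charges.
glacier.set_data_retrieval_policy(
    accountId="-",  # "-" means the credentials' own account
    Policy={"Rules": [{"Strategy": "FreeTier"}]},
)
```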
lifecycle policies are configured on a bucket; they can move objects from one tier to another automatically, and even delete them at the end
commonly: from Standard to Standard-IA to Glacier, and then delete
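A hedged boto3 sketch (my own) of that Standard → IA → Glacier → delete progression; the bucket name, key prefix, and day counts are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under logs/ to IA after 30 days, to Glacier after 90,
# and delete them after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="this-is-a-bucket",  # assumed bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-then-expire",
            "Filter": {"Prefix": "logs/"},   # assumed key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)
```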
it is strongly recommended to encrypt sensitive data in S3
use the SSL/TLS endpoints so data is also encrypted in transit
all of the server-side options use AES-256
you can use an AWS KMS-managed customer master key (SSE-KMS)
or use a client-side master key (client-side encryption)
tip: SSE-S3 or SSE-KMS are the simplest and easiest options
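A hedged sketch (my addition) of requesting server-side encryption per object with boto3; the bucket, keys, and the KMS key alias are assumptions.

```python
import boto3

s3 = boto3.client("s3")
bucket = "this-is-a-bucket"  # assumed bucket name

# SSE-S3: S3 manages the keys, AES-256 at rest.
s3.put_object(Bucket=bucket, Key="secret-sse-s3.txt",
              Body=b"sensitive data",
              ServerSideEncryption="AES256")

# SSE-KMS: encryption with a KMS customer master key (alias is assumed).
s3.put_object(Bucket=bucket, Key="secret-sse-kms.txt",
              Body=b"sensitive data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-s3-key")
```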
versioning is turned on at the bucket level
once enabled, versioning cannot be removed from a bucket, only suspended
to permanently delete an object or change the versioning status of a bucket, you need MFA (MFA Delete)
MFA Delete can only be enabled by the root account
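A hedged boto3 sketch (mine) of enabling versioning on a bucket; the MFA Delete variant is commented out because it needs the root account's MFA device serial and a current token, which are assumptions here.

```python
import boto3

s3 = boto3.client("s3")
bucket = "this-is-a-bucket"  # assumed bucket name

# Turn versioning on (it can later only be suspended, never removed).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# With MFA Delete, the root user's MFA serial + current code must be supplied:
# s3.put_bucket_versioning(
#     Bucket=bucket,
#     MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
#     VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
# )
```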
pre-signed URLs are signed with the accessor's own credentials
they are time limited (the URL expires)
and are tied to a specific HTTP method
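A hedged sketch (my own) of generating a pre-signed GET URL with boto3: it is signed with the caller's credentials, expires after an hour, and only works for that HTTP method. Bucket and key names are assumptions.

```python
import boto3

s3 = boto3.client("s3")  # the URL is signed with this client's own credentials

# One-hour, GET-only URL for a single object.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "this-is-a-bucket", "Key": "cn/tj/file1.doc"},  # assumed
    ExpiresIn=3600,      # seconds until the URL stops working
    HttpMethod="GET",
)
print(url)  # can be handed to someone without AWS credentials
```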
S3 Multipart Upload API
upload large files in multiple parts
it has three steps: initiate the upload, upload the parts, complete (or abort) the upload
parts are uploaded independently, in any order
suggested for objects larger than 100 MB
MUST be used for objects larger than 5 GB
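A hedged sketch (mine): the high-level boto3 transfer manager runs the initiate / upload-parts / complete steps automatically once a file crosses the configured threshold. The file name, bucket, and sizes are assumptions.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart upload for anything over 100 MB, in 25 MB parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=25 * 1024 * 1024,
)

# upload_file initiates the multipart upload, uploads the parts
# (possibly in parallel), and completes it; it aborts on failure.
s3.upload_file("big-backup.tar", "this-is-a-bucket", "backups/big-backup.tar",
               Config=config)
```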
Range GETs: retrieve part of an object by byte range
useful when you have poor connectivity to AWS, or when you only need part of a large object
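A hedged boto3 sketch (mine) of a Range GET that fetches only the first mebibyte of an object; bucket and key names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Fetch only bytes 0..1048575 (the first 1 MiB) of the object.
resp = s3.get_object(
    Bucket="this-is-a-bucket",          # assumed
    Key="backups/big-backup.tar",       # assumed
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()
print(len(chunk), resp["ContentRange"])
```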
cross-region replication is configured at the bucket level
metadata and ACLs are replicated along with the objects
versioning must be turned on for both the source and destination buckets
this is used to reduce latency
if you turn it on for an existing bucket, only new objects are replicated; existing objects must be copied over with separate commands
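A hedged sketch (my own) of a minimal replication configuration with boto3; both buckets must already have versioning enabled, and the bucket names and IAM role ARN are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Replicate everything from the source bucket to a bucket in another region.
s3.put_bucket_replication(
    Bucket="this-is-a-bucket",  # source bucket (versioning already enabled)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # assumed
        "Rules": [{
            "ID": "replicate-all",
            "Prefix": "",          # empty prefix = all objects
            "Status": "Enabled",
            "Destination": {"Bucket": "arn:aws:s3:::this-is-a-bucket-replica"},
        }],
    },
)
```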
access logging is OFF by default
you must choose a bucket where the logs will be stored
logs are delivered on a best-effort basis, with a slight delay
log info includes the requester, bucket name, request time, request action, response status, and error code (if any)
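A hedged boto3 sketch (mine) of turning access logging on; the target bucket must grant the S3 log delivery group write access, and all names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Deliver access logs for this-is-a-bucket into a separate log bucket,
# under the logs/this-is-a-bucket/ prefix.
s3.put_bucket_logging(
    Bucket="this-is-a-bucket",  # assumed source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",          # assumed log bucket
            "TargetPrefix": "logs/this-is-a-bucket/",
        }
    },
)
```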
event notifications can be used for
workflows
sending alerts
triggering other actions
notifications are configured at the bucket level
notifications are sent when objects are created, removed, or (for RRS objects) lost
notifications can be filtered by object key prefix and suffix
notifications can be sent via SQS or SNS
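A hedged sketch (my addition) of an SNS notification on object creation, filtered by prefix and suffix; the topic ARN and names are assumptions, and the topic's policy must allow S3 to publish to it.

```python
import boto3

s3 = boto3.client("s3")

# Notify an SNS topic whenever a .jpg object is created under images/.
s3.put_bucket_notification_configuration(
    Bucket="this-is-a-bucket",  # assumed
    NotificationConfiguration={
        "TopicConfigurations": [{
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:s3-events",  # assumed
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "images/"},
                {"Name": "suffix", "Value": ".jpg"},
            ]}},
        }]
    },
)
```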
back up on-premises data to S3
use S3 as the data store for the real data and keep the index somewhere else, e.g. keep all the files in S3 and keep an index of the files in DynamoDB
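A hedged sketch (mine) of that pattern: the file bytes go to S3 and a small index record goes to a DynamoDB table. The table, its key schema, and all names are assumptions.

```python
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("file-index")  # assumed existing table

bucket, key = "this-is-a-bucket", "docs/report-2017.pdf"  # assumed

# Store the actual bytes in S3...
with open("report-2017.pdf", "rb") as f:
    s3.put_object(Bucket=bucket, Key=key, Body=f)

# ...and the searchable index entry in DynamoDB.
table.put_item(Item={
    "file_id": key,            # assumed partition key name
    "bucket": bucket,
    "title": "Annual report",
    "year": 2017,
})
```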
S3 scales automatically for high request rates
for higher rates, e.g. over 100 requests per second, refer to the developer guide or the S3 best practices
tip: for GET-intensive workloads, consider using an Amazon CloudFront distribution as a caching layer
Glacier stores data in archives, not buckets; each archive can contain up to 40 TB of data, and the number of archives is unlimited
archives have unique, system-generated IDs (they cannot be customized)
archives are automatically encrypted
archives are immutable (they cannot be changed after upload)
by default, one account can have up to 1,000 vaults
standard S3 uses a bucket-object structure; Glacier uses a vault-archive structure:
logical level for policies | files and data |
---|---|
bucket | objects |
vault | archives |
Vault Lock attaches a policy to a vault; once the lock is set, the policy cannot be changed.
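A hedged boto3 sketch (mine) of the vault-archive structure in practice: create a vault and upload an archive. Glacier returns a system-generated archive ID that you must keep yourself (for example in the index store mentioned earlier). The vault name, file, and description are assumptions.

```python
import boto3

glacier = boto3.client("glacier")

# Vaults hold archives the way S3 buckets hold objects.
glacier.create_vault(vaultName="study-notes-vault")  # assumed vault name

# Upload an archive; the returned archiveId is the only handle you get,
# so store it somewhere searchable (Glacier has no listing by name).
with open("big-backup.tar", "rb") as f:
    resp = glacier.upload_archive(
        vaultName="study-notes-vault",
        archiveDescription="2017 backup",  # optional free-text description
        body=f,
    )
print(resp["archiveId"])
```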