These are my personal study notes for the AWS Certified Solutions Architect Official Study Guide (Associate).
S3 offers multiple storage classes (compared in a table further below)
configurable lifecycle policies
AWS service | features | traditional equivalent |
---|---|---|
EBS | block storage; data is stored in blocks and accessed as a byte stream | SAN or local disk |
EFS | file storage; data is stored as files in a file system | NAS / NFS |
S3 | object storage; data is stored as objects in buckets | no direct traditional equivalent |
for the difference between object, file, and block storage, see:
https://www.scality.com/topics/object-storage-vs-nas/
from the user's point of view:
block storage: the user interacts with a block device directly, such as a disk or a logical volume.
file storage: the user interacts with a file system of folders and files; each file has metadata such as its name and timestamps.
object storage: the user interacts with objects in a bucket. The bucket is flat, with no folders or other structure. Object storage is designed for multi-user shared storage, so each object carries more metadata than a typical file, and an object exposes more methods than a plain file, much like objects in programming.
bucket names are globally unique across all of AWS, because you may need replicas of an object in other regions
bucket names can be up to 63 characters long
100 buckets per account by default
best practice: include your domain name in bucket names
buckets are stored in a region selected by the customer
objects can be up to 5 TB each
an object has data and metadata; the data portion is treated as a stream of bytes
metadata is a set of name/value pairs
metadata has system metadata and user metadata
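As a quick illustration (my own sketch, not from the book), here is how user metadata and system metadata look with boto3; the bucket and key names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with user metadata (names/values are arbitrary examples).
s3.put_object(
    Bucket="this-is-a-bucket",          # assumed bucket name
    Key="notes/s3-metadata-demo.txt",   # assumed key
    Body=b"hello, object storage",
    Metadata={"project": "study-notes", "owner": "me"},  # user metadata
)

# HEAD the object: system metadata (ContentLength, LastModified, ETag, ...)
# plus the user metadata under the "Metadata" key.
head = s3.head_object(Bucket="this-is-a-bucket", Key="notes/s3-metadata-demo.txt")
print(head["ContentLength"], head["LastModified"], head["Metadata"])
```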
each object is identified by a key, a UTF-8 string up to 1024 bytes long. Think of it as the file name. Keys are unique within a bucket.
example:
bucket-name.s3.amazonaws.com/object-key
since an object key can contain '/' and other UTF-8 characters, a URL may look like
this-is-a-bucket.s3.amazonaws.com/cn/tj/file1.doc
here the key is actually cn/tj/file1.doc (the whole path after the bucket name).
a bucket is a simple, flat namespace with no hierarchy. You can name keys so that they look hierarchical on the web, and tools can navigate them like folders, but that does not mean there are real folders named cn or tj. The naming trick just helps you organize your objects.
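A hedged sketch (mine, not from the book) of how that pseudo-hierarchy is usually consumed: list objects under a key prefix with a '/' delimiter so S3 groups deeper keys as if they were folders. Bucket and prefix names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# List keys that start with "cn/tj/" and group deeper "sub-folders"
# under CommonPrefixes -- S3 itself has no real folders.
resp = s3.list_objects_v2(
    Bucket="this-is-a-bucket",  # assumed bucket name
    Prefix="cn/tj/",            # assumed key prefix
    Delimiter="/",
)
for obj in resp.get("Contents", []):
    print("object:", obj["Key"])
for cp in resp.get("CommonPrefixes", []):
    print("pseudo-folder:", cp["Prefix"])
```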
TIP: always use HTTPS
in most cases, people use the SDKs rather than the REST API directly
AWS also supports SOAP
standard S3 has 99.999999999% durability (eleven 9s)
99.99% availability, i.e. less than about one hour of downtime per year
RRS (Reduced Redundancy Storage) has 99.99% durability
Eventual consistency
for new objects (new keys), S3 provides read-after-write consistency
for updates and deletes of existing objects, a read may return stale data or the new data, but never a mix of old and new data
we can use S3 to host a static website; a sketch of how to enable it follows:
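This is my own minimal sketch (not from the book) of turning a bucket into a static website with boto3; the bucket name and document names are assumptions, and the bucket policy / public-access settings needed to actually serve the content are omitted.

```python
import boto3

s3 = boto3.client("s3")
bucket = "www.example.com"  # assumed: bucket named after the site's domain

# Upload the site's entry page and enable website hosting on the bucket.
s3.put_object(Bucket=bucket, Key="index.html",
              Body=b"<html><body>hello</body></html>",
              ContentType="text/html")
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},  # assumed error page
    },
)
# The site is then served from the region-specific website endpoint,
# e.g. http://<bucket>.s3-website-<region>.amazonaws.com
```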
service name | short name | durability | minimum object size | minimum object lifetime | retrieval time | cost | scenario |
---|---|---|---|---|---|---|---|
Standard | na | 99.999999999% (11 nines) | ? | ? | realtime | relatively high | common usage |
Standard Infrequent Access | Standard-IA | 99.999999999% (11 nines) | 128KB | 30 days | realtime | lower than Standard | long-term storage |
Reduced Redundancy Storage | RRS | 99.99% (4 nines) | na | na | realtime | lower than IA | easily reproducible data |
Glacier | na | ? | ? | ? | 3-5 hours | retrieving up to 5% of your Glacier data is free each month | archived data, rarely accessed |
tip: set a data retrieval policy if you use Glacier, to avoid unwanted retrieval costs
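A hedged sketch (my addition) of setting such a retrieval policy with boto3 so retrievals stay within the free tier; "-" refers to the caller's own account.

```python
import boto3

glacier = boto3.client("glacier")

# Cap retrievals at the free tier so an accidental bulk restore
# cannot generate unexpected retrieval charges.
glacier.set_data_retrieval_policy(
    accountId="-",  # "-" means the credentials' own account
    Policy={"Rules": [{"Strategy": "FreeTier"}]},
)
```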
lifecycle policies are configured on a bucket; they can move objects from one tier to another automatically, and even delete them at the end
commonly: from Standard to Standard-IA to Glacier, and then delete
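A hedged boto3 sketch (my own) of that Standard → IA → Glacier → delete progression; the bucket name, key prefix, and day counts are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under logs/ to IA after 30 days, to Glacier after 90,
# and delete them after 365 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="this-is-a-bucket",  # assumed bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-then-expire",
            "Filter": {"Prefix": "logs/"},   # assumed key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }]
    },
)
```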
it is strongly recommended to encrypt sensitive data in S3
use the SSL/TLS endpoints so data is also encrypted in transit
all of the server-side options use AES-256
you can use an AWS KMS-managed customer master key (SSE-KMS)
or use a client-side master key (client-side encryption)
tip: SSE-S3 or SSE-KMS are the simplest and easiest options
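A hedged sketch (my addition) of requesting server-side encryption per object with boto3; the bucket, keys, and the KMS key alias are assumptions.

```python
import boto3

s3 = boto3.client("s3")
bucket = "this-is-a-bucket"  # assumed bucket name

# SSE-S3: S3 manages the keys, AES-256 at rest.
s3.put_object(Bucket=bucket, Key="secret-sse-s3.txt",
              Body=b"sensitive data",
              ServerSideEncryption="AES256")

# SSE-KMS: encryption with a KMS customer master key (alias is assumed).
s3.put_object(Bucket=bucket, Key="secret-sse-kms.txt",
              Body=b"sensitive data",
              ServerSideEncryption="aws:kms",
              SSEKMSKeyId="alias/my-s3-key")
```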
versioning is turned on at the bucket level
once enabled, versioning cannot be removed from a bucket, only suspended
to permanently delete an object or change the versioning status of a bucket, you need MFA (MFA Delete)
MFA Delete can only be enabled by the root account
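A hedged boto3 sketch (mine) of enabling versioning on a bucket; the MFA Delete variant is commented out because it needs the root account's MFA device serial and a current token, which are assumptions here.

```python
import boto3

s3 = boto3.client("s3")
bucket = "this-is-a-bucket"  # assumed bucket name

# Turn versioning on (it can later only be suspended, never removed).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# With MFA Delete, the root user's MFA serial + current code must be supplied:
# s3.put_bucket_versioning(
#     Bucket=bucket,
#     MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
#     VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
# )
```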
pre-signed URLs are signed with the accessor's own credentials
they are time limited (the URL expires)
and are tied to a specific HTTP method
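A hedged sketch (my own) of generating a pre-signed GET URL with boto3: it is signed with the caller's credentials, expires after an hour, and only works for that HTTP method. Bucket and key names are assumptions.

```python
import boto3

s3 = boto3.client("s3")  # the URL is signed with this client's own credentials

# One-hour, GET-only URL for a single object.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "this-is-a-bucket", "Key": "cn/tj/file1.doc"},  # assumed
    ExpiresIn=3600,      # seconds until the URL stops working
    HttpMethod="GET",
)
print(url)  # can be handed to someone without AWS credentials
```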
S3 Multipart Upload API
upload large files in multiple parts
it has three steps: initiate the upload, upload the parts, complete (or abort) the upload
parts are uploaded independently, in any order
suggested for objects larger than 100 MB
MUST be used for objects larger than 5 GB
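A hedged sketch (mine): the high-level boto3 transfer manager runs the initiate / upload-parts / complete steps automatically once a file crosses the configured threshold. The file name, bucket, and sizes are assumptions.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart upload for anything over 100 MB, in 25 MB parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=25 * 1024 * 1024,
)

# upload_file initiates the multipart upload, uploads the parts
# (possibly in parallel), and completes it; it aborts on failure.
s3.upload_file("big-backup.tar", "this-is-a-bucket", "backups/big-backup.tar",
               Config=config)
```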
Range GETs: retrieve part of an object by byte range
useful when you have poor connectivity to AWS, or when you only need part of a large object
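A hedged boto3 sketch (mine) of a Range GET that fetches only the first mebibyte of an object; bucket and key names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Fetch only bytes 0..1048575 (the first 1 MiB) of the object.
resp = s3.get_object(
    Bucket="this-is-a-bucket",          # assumed
    Key="backups/big-backup.tar",       # assumed
    Range="bytes=0-1048575",
)
chunk = resp["Body"].read()
print(len(chunk), resp["ContentRange"])
```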
cross-region replication is configured at the bucket level
metadata and ACLs are replicated along with the objects
versioning must be turned on for both the source and destination buckets
this is used to reduce latency
if you turn it on for an existing bucket, only new objects are replicated; existing objects must be copied over with separate commands
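A hedged sketch (my own) of a minimal replication configuration with boto3; both buckets must already have versioning enabled, and the bucket names and IAM role ARN are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Replicate everything from the source bucket to a bucket in another region.
s3.put_bucket_replication(
    Bucket="this-is-a-bucket",  # source bucket (versioning already enabled)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # assumed
        "Rules": [{
            "ID": "replicate-all",
            "Prefix": "",          # empty prefix = all objects
            "Status": "Enabled",
            "Destination": {"Bucket": "arn:aws:s3:::this-is-a-bucket-replica"},
        }],
    },
)
```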
access logging is OFF by default
you must choose a bucket where the logs will be stored
logs are delivered on a best-effort basis, with a slight delay
log info includes the requester, bucket name, request time, request action, response status, and error code (if any)
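A hedged boto3 sketch (mine) of turning access logging on; the target bucket must grant the S3 log delivery group write access, and all names are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Deliver access logs for this-is-a-bucket into a separate log bucket,
# under the logs/this-is-a-bucket/ prefix.
s3.put_bucket_logging(
    Bucket="this-is-a-bucket",  # assumed source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-log-bucket",          # assumed log bucket
            "TargetPrefix": "logs/this-is-a-bucket/",
        }
    },
)
```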
event notifications can be used for
workflows
sending alerts
triggering other actions
notifications are configured at the bucket level
notifications are sent when objects are created, removed, or (for RRS objects) lost
notifications can be filtered by object key prefix and suffix
notifications can be sent via SQS or SNS
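A hedged sketch (my addition) of an SNS notification on object creation, filtered by prefix and suffix; the topic ARN and names are assumptions, and the topic's policy must allow S3 to publish to it.

```python
import boto3

s3 = boto3.client("s3")

# Notify an SNS topic whenever a .jpg object is created under images/.
s3.put_bucket_notification_configuration(
    Bucket="this-is-a-bucket",  # assumed
    NotificationConfiguration={
        "TopicConfigurations": [{
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:s3-events",  # assumed
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "images/"},
                {"Name": "suffix", "Value": ".jpg"},
            ]}},
        }]
    },
)
```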
back up on-premises data to S3
use S3 as the data store for the real data and keep the index somewhere else, e.g. keep all the files in S3 and keep an index of the files in DynamoDB
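A hedged sketch (mine) of that pattern: the file bytes go to S3 and a small index record goes to a DynamoDB table. The table, its key schema, and all names are assumptions.

```python
import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table("file-index")  # assumed existing table

bucket, key = "this-is-a-bucket", "docs/report-2017.pdf"  # assumed

# Store the actual bytes in S3...
with open("report-2017.pdf", "rb") as f:
    s3.put_object(Bucket=bucket, Key=key, Body=f)

# ...and the searchable index entry in DynamoDB.
table.put_item(Item={
    "file_id": key,            # assumed partition key name
    "bucket": bucket,
    "title": "Annual report",
    "year": 2017,
})
```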
S3 scales automatically for high request rates
for higher rates, e.g. over 100 requests per second, refer to the developer guide or the S3 best practices
tip: for GET-intensive workloads, consider using an Amazon CloudFront distribution as a caching layer
Glacier stores data in archives, not buckets; each archive can contain up to 40 TB of data, and the number of archives is unlimited
archives have unique, system-generated IDs (they cannot be customized)
archives are automatically encrypted
archives are immutable (they cannot be changed after upload)
by default, one account can have up to 1,000 vaults
standard S3 uses a bucket-object structure; Glacier uses a vault-archive structure:
logical level for policies | files and data |
---|---|
bucket | objects |
vault | archives |
Vault Lock attaches a policy to a vault; once the lock is set, the policy cannot be changed.
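A hedged boto3 sketch (mine) of the vault-archive structure in practice: create a vault and upload an archive. Glacier returns a system-generated archive ID that you must keep yourself (for example in the index store mentioned earlier). The vault name, file, and description are assumptions.

```python
import boto3

glacier = boto3.client("glacier")

# Vaults hold archives the way S3 buckets hold objects.
glacier.create_vault(vaultName="study-notes-vault")  # assumed vault name

# Upload an archive; the returned archiveId is the only handle you get,
# so store it somewhere searchable (Glacier has no listing by name).
with open("big-backup.tar", "rb") as f:
    resp = glacier.upload_archive(
        vaultName="study-notes-vault",
        archiveDescription="2017 backup",  # optional free-text description
        body=f,
    )
print(resp["archiveId"])
```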