Perl Multithreaded multipart sync to Amazon Glacier service.
Amazon Glacier is an archive/backup service with a very low storage price, but with some caveats in usage and in archive retrieval pricing. Read more about Amazon Glacier.
mt-aws-glacier is a client application for Amazon Glacier, written in the Perl programming language, for *nix.
The script is made for Unix OSes. It is tested under Linux and should work under other POSIX OSes (*BSD, Solaris); it is lightly tested under Mac OS X. It will NOT work under Windows/Cygwin. The minimum Perl version required is 5.8.8 (pretty old; AFAIK no supported distribution ships an older Perl).
NOTE: If you've used manual installation before, please remove the previously installed mtglacier executable from your path.
NOTE: If you've used CPAN installation before, please remove the previously installed module (cpanm can do that).
On Ubuntu, can be installed/updated via the PPA vsespb/mt-aws-glacier:
sudo apt-get update
sudo apt-get install software-properties-common python-software-properties
sudo add-apt-repository ppa:vsespb/mt-aws-glacier
(the GPG key id is D2BFA5E4; the full fingerprint is D7F1BC2238569FC447A8D8249E86E8B2D2BFA5E4)
sudo apt-get update
sudo apt-get install libapp-mtaws-perl
Can be installed/updated via a custom repository (Debian squeeze). Add the GPG key:
wget -O - http://mt-aws.com/vsespb.gpg.key | sudo apt-key add -
(this will add GPG key 2C00 B003 A56C 5F2A 75C4 4BF8 2A6E 0307 D0FF 5699)
Add the repository:
echo "deb http://dl.mt-aws.com/debian/current squeeze main"|sudo tee /etc/apt/sources.list.d/mt-aws.list
sudo apt-get update
sudo apt-get install libapp-mtaws-perl
To use HTTPS you also need:
sudo apt-get install build-essential libssl-dev
and then install/update LWP::UserAgent and LWP::Protocol::https using cpanm, as shown below.
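A sketch of that cpanm step, assuming cpanm itself is not installed yet (the cpanmin.us bootstrap is one common way to get it):
# bootstrap cpanm, then install/upgrade the HTTPS-capable LWP stack
curl -L https://cpanmin.us | perl - --sudo App::cpanminus
sudo cpanm LWP::UserAgent LWP::Protocol::https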
Can be installed/updated via a custom repository (Debian wheezy). Add the GPG key:
wget -O - https://mt-aws.com/vsespb.gpg.key | sudo apt-key add -
(this will add GPG key 2C00 B003 A56C 5F2A 75C4 4BF8 2A6E 0307 D0FF 5699)
Add the repository:
echo "deb http://dl.mt-aws.com/debian/current wheezy main"|sudo tee /etc/apt/sources.list.d/mt-aws.list
sudo apt-get update
sudo apt-get install libapp-mtaws-perl
Can be installed/updated via a custom repository (Debian jessie). Add the GPG key:
wget -O - https://mt-aws.com/vsespb.gpg.key | sudo apt-key add -
(this will add GPG key 2C00 B003 A56C 5F2A 75C4 4BF8 2A6E 0307 D0FF 5699)
Add the repository:
echo "deb http://dl.mt-aws.com/debian/current jessie main"|sudo tee /etc/apt/sources.list.d/mt-aws.list
sudo apt-get update
sudo apt-get install libapp-mtaws-perl
sudo apt-get install libwww-perl libjson-xs-perl
On RHEL/CentOS 5, install the prerequisites:
sudo yum install perl-Digest-SHA
For HTTPS support you also need:
sudo yum groupinstall "Development Tools"
sudo yum install openssl-devel
then install JSON::XS, LWP::UserAgent and LWP::Protocol::https using cpanm. You can also install the mtglacier prerequisites without CPAN, if you have the EPEL repository enabled and don't need HTTPS:
sudo yum install perl-Digest-SHA perl-JSON-XS perl-libwww-perl
On RHEL/CentOS 6, install the prerequisites:
sudo yum install perl-core perl-CGI
For HTTPS support you also need:
sudo yum groupinstall "Development Tools"
sudo yum install openssl-devel
then install JSON::XS, LWP::UserAgent and LWP::Protocol::https using cpanm. You can also install the mtglacier prerequisites without CPAN, if you have the EPEL repository enabled and don't need HTTPS:
sudo yum install perl-core perl-CGI perl-JSON-XS perl-libwww-perl
On Ubuntu/Debian, install the prerequisites:
sudo apt-get install libwww-perl libjson-xs-perl
To use HTTPS you also need:
sudo apt-get install build-essential libssl-dev
and then install/update LWP::UserAgent and LWP::Protocol::https using cpanm (see the example above).
On Fedora / newer RHEL (e.g. CentOS 7), a single package set includes HTTPS support:
sudo yum install perl-core perl-CGI perl-JSON-XS perl-libwww-perl perl-LWP-Protocol-https
On SLES (SUSE Linux Enterprise Server):
sudo zypper install perl-libwww-perl libopenssl-devel
sudo zypper install --type pattern Basis-Devel
You need OpenSSL version 0.9.8r or newer (to check the version, run: openssl version; more info in RT#81575). Upgrade ExtUtils::MakeMaker via cpanm, then install LWP::UserAgent, LWP::Protocol::https and JSON::XS using cpanm.
On Amazon Linux:
sudo yum install perl-core perl-JSON-XS perl-libwww-perl perl-LWP-Protocol-https
On Mac OS X, install the following packages:
LWP::UserAgent (p5-libwww-perl), JSON::XS (p5-json-XS). For HTTPS support you need LWP::Protocol::https; however, on Mac OS X you probably also need Mozilla::CA (it should come with LWP::Protocol::https, but it can be missing). Try HTTPS without Mozilla::CA first - if it does not work, install Mozilla::CA.
git clone https://github.com/vsespb/mt-aws-glacier.git
(or just download and unzip https://github.com/vsespb/mt-aws-glacier/archive/master.zip)
After that you can execute the mtglacier script (found in the root of the repository) from any directory, or create a symlink to it - it will find the other package files by itself (don't forget to remove it later if you decide to switch to a CPAN install).
Can be installed via CPAN:
cpan -i App::MtAws
That's it.
For old Perl < 5.9.3 (e.g. CentOS 5.x), also install Digest::SHA (the Debian package libdigest-sha-perl or the RPM package perl-Digest-SHA).
On some distributions with old Perl modules (examples: Ubuntu 10.04, CentOS 5/6), to use HTTPS you need to upgrade LWP::Protocol::https to version 6+ via CPAN.
Fedora, CentOS 6 etc. decoupled Perl, so the package named perl, which is part of the default installation, is not actually the real, full Perl, which is misleading. perl-core looks much more like a real Perl (I hope so).
On newer RHEL distributions (some Fedora versions) you need install perl-LWP-Protocol-https to use HTTPS.
To install the perl-JSON-XS RPM package on RHEL 5/6 you need to enable the EPEL repository.
If you've used manual installation before switching to CPAN installation, it's probably better to remove the previously installed mtglacier executable from your path.
The CPAN distribution of mt-aws-glacier has a few more dependencies than the manual installation, as it requires additional modules for its test suite.
New releases of mt-aws-glacier usually appear on CPAN within about a week after the official release.
On Fedora and CentOS 6 minimal you need to install perl-core, perl-CPAN and perl-CGI before trying to install via CPAN.
For some distributions with old Perl modules (examples: CentOS 5/6) you need to update CPAN and Module::Build first: cpan -i CPAN, then cpan -i Module::Build.
The CPAN tool asks too many questions during install (but ignores important errors). You can avoid that by running the cpan command and configuring it like this:
o conf build_requires_install_policy yes
o conf prerequisites_policy follow
o conf halt_on_failure on
o conf commit
exit
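Alternatively, a sketch of a non-interactive install: the PERL_MM_USE_DEFAULT environment variable makes the CPAN toolchain accept default answers:
sudo PERL_MM_USE_DEFAULT=1 cpan -i App::MtAws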
Instead of the system cpan tool you might want to try cpanm - it's a bit easier to install and configure.
Installation of LWP::Protocol::https requires C header files (yum groupinstall "Development Tools" on RHEL or build-essential on Debian) and the OpenSSL development library (the openssl-devel RPM or the libssl-dev DEB).
When playing with Glacier, make sure you will be able to delete all your archives; it's currently impossible to delete an archive or a non-empty vault in the Amazon console. Also make sure you have read all of the Amazon Glacier pricing/FAQ.
Read the Amazon Glacier pricing FAQ again, really. Beware of the retrieval fee.
Before using this program, you should read the Amazon Glacier documentation and understand, in general, Amazon Glacier workflows and entities. This documentation does not define any new layer of abstraction over Amazon Glacier entities.
In general, all Amazon Glacier clients store metadata (filenames, file metadata) in their own formats, incompatible with each other. To restore a backup made with mt-aws-glacier you'll need mt-aws-glacier; other software will most likely restore your data but lose the filenames.
With low "partsize" option you pay a bit more (Amazon charges for each upload request)
For backups created with older versions (0.7x) of mt-aws-glacier, the Journal file is required to restore the backup.
Use a Journal file only with the same vault (more info here, here and here).
When working with CD-ROM/CIFS/other non-Unix/non-POSIX filesystems, you might need to set leaf-optimization to 0.
Please read the ChangeLog when upgrading to a new version, and especially when downgrading (see the "Compatibility" sections when downgrading).
Zero-length files and empty directories are ignored (as Amazon Glacier does not support them).
See other limitations
Create a directory containing the files to back up, for example /data/backup.
Create a config file, say, glacier.cfg:
key=YOURKEY
secret=YOURSECRET
# region: eu-west-1, us-east-1 etc
region=us-east-1
# protocol=http (default) or https
protocol=http
(you can omit any config option and specify it directly on the command line; command line options override the same options in the config)
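For example, to override the protocol from the config for a single run:
./mtglacier create-vault myvault --config glacier.cfg --protocol https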
Create a vault in the specified region, using the Amazon Console (myvault) or using mtglacier:
./mtglacier create-vault myvault --config glacier.cfg
(note that Amazon Glacier does not return an error if the vault already exists, etc.)
Choose a filename for the Journal, for example, journal.log
Sync your files
./mtglacier sync --config glacier.cfg --dir /data/backup --vault myvault --journal journal.log --concurrency 3
Add more files and sync again
Check that your local files were not modified since the last sync:
./mtglacier check-local-hash --config glacier.cfg --dir /data/backup --journal journal.log
Delete some files from your backup location
Initiate an archive restore job on the Amazon side:
./mtglacier restore --config glacier.cfg --dir /data/backup --vault myvault --journal journal.log --max-number-of-files 10
Wait 4+ hours for Amazon Glacier to complete archive retrieval
Download the restored files back to the backup location:
./mtglacier restore-completed --config glacier.cfg --dir /data/backup --vault myvault --journal journal.log
Delete all your files from the vault:
./mtglacier purge-vault --config glacier.cfg --vault myvault --journal journal.log
Wait ~24-48 hours, and then you can try deleting your vault:
./mtglacier delete-vault myvault --config glacier.cfg
(note: currently Amazon Glacier does not return an error if the vault does not exist)
In case you lose your journal file, you can restore it from Amazon Glacier metadata:
Run the retrieve-inventory command. This requests Amazon Glacier to prepare the vault inventory:
./mtglacier retrieve-inventory --config glacier.cfg --vault myvault
Wait 4+ hours for Amazon Glacier to complete the inventory retrieval (also note that the inventory you get will be ~24h old).
Download the inventory and export it to a new journal (this can sometimes be pretty slow even if the inventory is small; wait a few minutes):
./mtglacier download-inventory --config glacier.cfg --vault myvault --new-journal new-journal.log
For files created by mt-aws-glacier version 0.8x and higher, the original filenames will be restored. For other files the archive_id will be used as the filename. See the Amazon Glacier metadata format used by mt-aws-glacier here: Amazon Glacier metadata format used by mt-aws-glacier.
The Journal is a file in the local filesystem which contains a list of all files uploaded to Amazon Glacier. Strictly speaking, this file contains a list of operations (a list of records) performed on an Amazon Glacier vault. The main operations are: file creation, file deletion and file retrieval.
A create operation record contains: the local filename (relative to the transfer root, --dir), file size, file last modification time (at 1-second resolution), file TreeHash (Amazon's hashing algorithm, based on SHA256), file upload time, and the Amazon Glacier archive id.
A delete operation record contains the local filename and the corresponding Amazon Glacier archive id.
Having such a list of operations, we can at any time reconstruct the list of files currently stored in Amazon Glacier.
As you can see, Journal records don't contain the Amazon Glacier region, vault, file permissions, last access times or other filesystem metadata. Thus you should always use a separate Journal file for each Amazon Glacier vault. Also, file metadata (except the filename and file modification time) will be lost if you restore files from Amazon Glacier.
It's a text file. You can parse it with grep, awk, cut, tail etc. to extract information in case you need to perform some advanced stuff that mtglacier can't do (NOTE: make sure you know what you're doing).
To view only some files:
grep Majorca Photos.journal
To count creation records:
grep CREATED Photos.journal | wc -l
To compare only the important fields of two journals:
cut journal -f 4,5,6,7,8 |sort > journal.cut
cut new-journal -f 4,5,6,7,8 |sort > new-journal.cut
diff journal.cut new-journal.cut
Each text line in the file represents one record.
It's an append-only file. The file is opened in append-only mode, and new records are only added to the end. This guarantees that you can recover the Journal file to a previous state in case of a bug in the program, a crash, or power/filesystem issues. You can even use chattr +a to set append-only protection on the Journal.
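For example (works on Linux ext* filesystems; requires root):
sudo chattr +a journal.log   # append-only: records can be added, existing ones cannot be rewritten
lsattr journal.log           # the 'a' flag confirms the attribute is set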
As the Journal file is append-only, it's easy to perform incremental backups of it.
The Journal is needed to restore a backup, and we can expect that if you need to restore a backup, you have lost your filesystem, together with the Journal.
However, the Journal is also needed to perform new backups (the sync command), to determine which files are already in Glacier and which are not, and also to check local file integrity (the check-local-hash command). In practice you usually perform new backups every day, while you restore backups (and lose your filesystem) very rarely.
So a fast (local) journal is essential for performing new backups quickly and cheaply (important for users who back up thousands or millions of files).
And if you lose your journal, you can restore it from Amazon Glacier (see the retrieve-inventory command). It's also recommended to back up your journal to another backup system (Amazon S3? Dropbox?) with another tool, because retrieving the inventory from Amazon Glacier is pretty slow.
Also, some users might want to back up the same files from multiple different locations. They will need a synchronization solution for their journal files.
Anyway, I think the problem of putting Journals into the cloud can be automated and solved with a 3-line bash script.
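A sketch of such a script, assuming the AWS CLI is installed and configured; the bucket and vault names are hypothetical:
#!/bin/sh
./mtglacier sync --config glacier.cfg --dir /data/backup --vault Photos --journal Photos.journal
aws s3 cp Photos.journal s3://my-backup-bucket/journals/Photos.journal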
You can give the journal the same name as your vault. Example: the vault name is Photos, so the journal file name is Photos.journal, or eu-west-1-Photos.journal.
(Almost) any command line option can be used in the config file, so you can create myphotos.cfg with the following content:
key=YOURKEY
secret=YOURSECRET
protocol=http
region=us-east-1
vault=Photos
journal=/home/me/.glacier/photos.journal
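With everything in the config, day-to-day commands shrink to something like (the directory is illustrative):
./mtglacier sync --config myphotos.cfg --dir /home/me/photos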
Keeping the journal/vault in the config looks to me more like the Unix way. It can be a bit dangerous, but it is easier to maintain, because:
Let's imagine I decided to put region/vault into the Journal. There are two options:
a. Put it at the beginning of the file, before journal creation.
b. Store the same region/vault in each record of the file. That looks like a waste of disk space.
Option (a) looks better. This way the journal would contain something like:
region=us-east-1
vault=Photos
at the beginning. But the same can be achieved by putting the same lines into the config file (see the previous question).
Also, putting vault/region into the journal would make the command line options --vault and --region useless for general commands, and would require adding another command (something like create-journal-file).
There is a possibility to use a different account id in Amazon Glacier (i.e. a different person's account). It's not supported yet in mtglacier, but when it is, I'll have to store the account id together with region/vault. Also, the default account id is '-' (meaning 'my account'). If one wishes to use the same vault from a different Amazon Glacier account, he'll have to change '-' to the real account id, so there needs to be a way to edit the account id. And region/vault information does not make sense without the account.
Some users may have different permissions for different vaults, so they need to maintain the key/secret/account_id - region/vault - journal relation in the same place (this can only be the config file, because it involves the secret).
Amazon might allow renaming vaults or moving them across regions in the future.
Currently the journal consists of independent records, so it can be split into separate records using grep, or several journals can be merged using cat (but be careful when doing that).
In the future, other features and options may be added, such as compression/encryption, which would again require deciding where to put the new attributes.
Usually there are different policies for backing up config files and journal files (the latter are modifiable). So if you lose your journal file, you won't be sure which config corresponds to which vault (and a journal file can be restored from a vault).
It's better to keep the relation between the vault and the transfer root (the --dir option) in one place, such as the config file.
If you want to store permissions, put your files into archives before backing up to Amazon Glacier. There are lots of different things that could be stored as file metadata, and most of them are not portable. Take a look at archive file formats - different formats allow storing different metadata.
It's possible that in the future mtglacier will support some other metadata features.
sync
Propagates the current local filesystem state to the Amazon Glacier server.
sync accepts one or several of the following mode options: --new, --replace-modified, --delete-removed.
If none of the three mode options above is provided, --new is implied (basically for backward compatibility).
--new
Uploads files which exist in the local filesystem (and have non-zero size) but do not exist in Amazon Glacier (i.e. in the Journal).
--replace-modified
Uploads modified files (i.e. files which exist both in the local filesystem and in Amazon Glacier). After a file is successfully uploaded, the previous version of the file is deleted. The logic for detecting modified files is controlled by the --detect option.
--delete-removed
Deletes files from Amazon Glacier which exist there but are missing in the local filesystem (or have zero size).
--detect
Controls how --replace-modified detects modified files. Possible values are: treehash, mtime, mtime-or-treehash, mtime-and-treehash, always-positive, size-only. The default value is mtime-and-treehash.
A file is always considered modified if its size has changed (to a non-zero value).
treehash - calculates the TreeHash checksum for the file and compares it with the one in the Journal. If the checksums do not match, the file is modified.
mtime - compares the file's last modification time in the local filesystem and in the journal; if it differs, the file is modified.
mtime-or-treehash - compares the file's last modification time; if it differs, the file is modified. If it matches, compares the TreeHash.
mtime-and-treehash - compares the file's last modification time; if it differs, compares the TreeHash. If the modification time has not changed, the file is treated as not modified and the treehash is not checked.
always-positive - always treats files as modified; modification time and TreeHash are ignored. Probably only makes sense together with --filter options.
size-only - treats files as modified only if the size differs.
NOTE: the default detect mode is mtime-and-treehash; it is the more performance-wise choice (the treehash is checked only for files whose modification time has changed), but mtime-or-treehash and treehash are safer in case you're not sure which programs change your files and how.
NOTE: mtime-or-treehash is a mnemonic for 'a file is modified if mtime differs OR treehash differs'; mtime-and-treehash is a mnemonic for 'a file is modified if mtime differs AND treehash differs'. The words AND and OR here are logical operators with short-circuit evaluation, i.e. with mtime-and-treehash the treehash is never checked if mtime does not differ, and with mtime-or-treehash the treehash is never checked if mtime differs.
NOTE: files with zero size are not supported by the Amazon Glacier API and are thus, for consistency, considered non-existing in all sync modes.
NOTE: sync does not upload empty directories; there is no such thing as a directory in Amazon Glacier.
NOTE: with the --dry-run option the TreeHash will not be calculated; instead a 'Will VERIFY treehash and upload...' message will be displayed.
NOTE: TreeHash calculation is performed in parallel, so some of the workers (defined with --concurrency) may be busy calculating treehashes instead of doing network IO.
restore
Initiates an Amazon Glacier RETRIEVE operation for files listed in the Journal which don't exist on the local filesystem and for which RETRIEVE was not initiated during the last 24 hours (that information is obtained from the Journal too - each retrieval is logged into the journal together with a timestamp).
restore-completed
Downloads files listed in the Journal which don't exist on the local filesystem and which were previously RETRIEVED (using the restore command) and are now available for download (i.e. ~4 hours after the retrieve). Unlike with the restore command, the list of retrieved files is requested from the Amazon Glacier servers at runtime using the API, not taken from the journal.
Data is downloaded to unique temporary files (created in the same directory as the destination file). Temp files are renamed to the real files only when the download has finished successfully. In case the program terminates with an error, or after Ctrl-C, temp files with unfinished downloads are removed.
If segment-size is specified (greater than 0) and a particular file's size in megabytes is larger than segment-size, the download for this file is performed in multiple segments, i.e. using the HTTP Range: header (each segment of size segment-size MiB, except the last, which can be smaller). Segments are downloaded in parallel (and different segments from different files can be downloaded at the same time).
Only values that are a power of two are supported for segment-size now.
Currently, if a download breaks due to a network problem, no resumption is performed; the download of the file, or of the current segment, is restarted from the beginning.
For multi-segment downloads, the TreeHash reported by Amazon Glacier for each segment is compared with the actual TreeHash calculated for the segment at runtime. In case of a mismatch an error is thrown and the process is stopped. The final TreeHash for the whole file is not checked yet.
For full-file downloads, the TreeHash reported by Amazon Glacier for the whole file is compared with the one calculated at runtime and with the one found in the Journal file; in case of a mismatch, an error is thrown and the process is stopped.
Unlike the partsize option, segment-size does not allocate in-memory buffers of the specified size, so you can use a large segment-size.
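For example, to download restored files in parallel 16 MiB segments (a power of two, as required):
./mtglacier restore-completed --config glacier.cfg --dir /data/backup --vault myvault --journal journal.log --segment-size 16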
upload-file
Uploads a single file into Amazon Glacier. The file will be tracked in the Journal (just like when using the sync command).
There are several possible combinations of options for upload-file:
--filename and --dir
Uploads what: the file pointed to by filename.
Filename in Journal and Amazon Glacier metadata: the relative path from dir to filename.
./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --dir /data/backup --filename /data/backup/dir1/myfile
(this will upload the content of /data/backup/dir1/myfile to Amazon Glacier and use dir1/myfile as the filename for the Journal)
./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --dir data/backup --filename data/backup/dir1/myfile
(Let's assume the current directory is /home. Then this will upload the content of /home/data/backup/dir1/myfile to Amazon Glacier and use dir1/myfile as the filename for the Journal)
NOTE: the file filename should be inside the directory dir.
NOTE: both --filename and --dir are resolved to full paths before the relative path from --dir to --filename is determined. Thus you'll get an error if parent directories are unreadable. Also, if you have /dir/ds as a symlink to the /dir/d3 directory, then --dir /dir --filename /dir/ds/file will result in the relative filename d3/file, not ds/file.
--filename and --set-rel-filename
Uploads what: the file pointed to by filename.
Filename in Journal and Amazon Glacier metadata: as specified in set-rel-filename.
./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --filename /tmp/myfile --set-rel-filename a/b/c
(this will upload the content of /tmp/myfile to Amazon Glacier and use a/b/c as the filename for the Journal)
(NOTE: set-rel-filename should be a relative filename, i.e. it must not start with /)
--stdin, --set-rel-filename and --check-max-file-size
Uploads what: a file read from STDIN.
Filename in Journal and Amazon Glacier metadata: as specified in set-rel-filename.
Also, as the file size is not known until the very end of the upload, you need to be sure that the file will not exceed the 10,000-parts limit, so you must specify check-max-file-size - the maximum possible file size (in megabytes) that you can expect. All this option does is throw an error if check-max-file-size/partsize > 10,000 parts (in that case it's recommended to adjust partsize). That's all. Remember that you can put this (and any other option) into the config file.
./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --stdin --set-rel-filename path/to/file --check-max-file-size 131
(this will upload the content of the file read from STDIN to Amazon Glacier and use path/to/file as the filename for the Journal)
(NOTE: set-rel-filename should be a relative filename, i.e. it must not start with /)
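As an illustration, streaming a tar archive straight into Glacier (the archive name is hypothetical; with the default partsize of 16, the 10,000-part limit allows files up to 16*10,000 = 160,000 MB, so --check-max-file-size 500 is well within it):
tar czf - /data/backup | ./mtglacier upload-file --config glacier.cfg --vault myvault --journal journal.log --stdin --set-rel-filename archives/backup.tar.gz --check-max-file-size 500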
NOTES:
The current version of mtglacier does not allow storing multiple versions of the same file, i.e. uploading multiple files with the same relative filename to a single Amazon Glacier vault and a single Journal. Simple file versioning will be implemented in future versions.
You can use other optional options with this command (concurrency, partsize).
retrieve-inventory
Issues an inventory retrieval request for --vault.
You can specify the inventory format with --request-inventory-format. Allowed values are json and csv; it defaults to json. It is not recommended to use csv unless you have to: the Amazon CSV format is not documented, has bugs, and the mt-aws-glacier CSV parsing implementation (i.e. the download-inventory command) is ~10 times slower than the JSON one.
See also Restoring journal for examples of the retrieve-inventory and download-inventory commands.
download-inventory
Parses the Amazon Glacier job list (for --vault) taken from the Amazon servers at runtime, finds the latest (by initiation date) inventory retrieval request, downloads it, converts it to a journal file and saves it to --new-journal. Both CSV and JSON jobs are supported.
See also Restoring journal for examples of the retrieve-inventory and download-inventory commands.
list-vaults
Lists all vaults in the region specified by --region (with respect to the IAM permissions for listing vaults) and prints them to the screen. The default format is human readable, not intended for parsing. Use --format=mtmsg for a machine readable, tab-separated format (which is not yet documented here; however, it's self-explanatory, and backward compatibility is guaranteed; one note - LastInventoryDate can be an empty string, as the Amazon API can return it as null).
See usage for examples of the following commands: purge-vault, check-local-hash, create-vault, delete-vault.
The filter, include and exclude options allow you to construct a list of RULES to select only certain files for an operation. They can be used with the commands: sync, purge-vault, restore, restore-completed and check-local-hash.
--filter
Adds one or several RULES to the list of rules. One filter value can contain multiple rules; it has the same effect as multiple filter values with one RULE each.
--filter 'RULE1 RULE2' --filter 'RULE3'
is the same as
--filter 'RULE1 RULE2 RULE3'
RULES should be a sequence of PATTERNS, each prefixed with '+' or '-' and separated by spaces. There can be a space between the '+'/'-' and the PATTERN.
RULES: [+-]PATTERN [+-]PATTERN ...
'+' means INCLUDE PATTERN, '-' means EXCLUDE PATTERN
NOTES:
1. If RULES contain spaces or wildcards, you must quote them when running `mtglacier` from the shell (Example: `mtglacier ... --filter -tmp/` but `mtglacier ... --filter '-log/ -tmp/'`)
2. Although a PATTERN can contain spaces, you cannot use them here, because RULES are separated by space(s).
3. A PATTERN can be empty (Example: `--filter +data/ --filter -` - excludes everything except any directory named `data`; the last pattern is empty)
4. Unlike other options, `filter`, `include` and `exclude` cannot be used in the config file (in order to avoid a mess with the order of rules)
--include
Adds an INCLUDE PATTERN to the list of rules (Example: --include /data/ --filter '+/photos/ -' - include only the photos and data directories)
--exclude
Adds an EXCLUDE PATTERN to the list of rules (Example: --exclude /data/ - include everything except /data and its subdirectories)
NOTES:
1. You can use spaces in PATTERNS here (Example: `--exclude '/my documents/'` - include everything except "/my documents" and its subdirectories)
How PATTERNS work
If the pattern starts with a '/' then it is anchored to a particular spot in the hierarchy of files; otherwise it is matched against the final component of the filename.
/tmp/myfile - matches only /tmp/myfile. But tmp/myfile - matches both /tmp/myfile and /home/john/tmp/myfile.
If the pattern ends with a '/' then it will only match a directory and all files/subdirectories inside that directory. It won't match a regular file. Note that if a directory is empty, it won't be synchronized to Amazon Glacier, as Glacier does not support directories.
log/ - matches only a directory log, but not a file log.
If a pattern does not end with a '/', it won't match a directory (directories are not supported by Amazon Glacier, so it makes no sense to match a directory without its subdirectories). However if, in future versions, we find a way to store empty directories in Amazon Glacier, this behavior may change.
log - matches only a file log, but not a directory log nor the files inside it.
If the pattern contains a '/' (not counting a trailing '/') then it is matched against the full pathname, including any leading directories. Otherwise it is matched only against the final component of the filename.
myfile - matches myfile in any directory (i.e. matches both /home/ivan/myfile and /data/tmp/myfile), but it does not match /tmp/myfile/myfile1. While tmp/myfile matches /data/tmp/myfile and /tmp/myfile/myfile1.
Wildcard '*' matches zero or more characters, but it stops at slashes.
/tmp*/file - matches /tmp/file, /tmp1/file, /tmp2/file, but not tmp1/x/file.
Wildcard '**' matches anything, including slashes.
/tmp**/file - matches /tmp/file, /tmp1/file, /tmp2/file, tmp1/x/file and tmp1/x/y/z/file.
When the wildcard '**' is meant to be a separate path component (i.e. surrounded by slashes/the beginning of the line/the end of the line), it matches 0 or more subdirectories.
/foo/**/bar - matches foo/bar and foo/x/bar. Also, **/file matches /file and x/file.
Wildcard '?' matches any (exactly one) character except a slash ('/').
??.txt - matches 11.txt and xy.txt, but not abc.txt.
If a PATTERN is empty, it matches anything.
mtglacier ... --filter '+data/ -' - the last pattern is an empty string (prefixed with '-').
If a PATTERN starts with '!', it matches only when the rest of the pattern (i.e. without the '!') does not match.
mtglacier ... --filter '-!/data/ +*.gz -' - include only *.gz files inside the data/ directory.
How rules are processed
A file's relative filename (relative to the --dir root) is checked against the rules in the list. Once the filename matches a PATTERN, the file is included or excluded depending on the kind of PATTERN matched. No further rules are checked after the first match.
--filter '+*.txt -file.txt'
The file file.txt is INCLUDED; it matches the 1st pattern, so the 2nd pattern is ignored.
If no rule matches, the file is included (the default rule is an INCLUDE rule).
--filter '+*.jpeg'
The file file.txt is INCLUDED, as it does not match any rule.
When traversing the directory tree (in contrast to the behavior of some tools, like rsync), if a directory (and all its subdirectories) match an exclude pattern, the directory tree is not pruned and traversal goes into the directory. So this will work fine (it will include /tmp/data/a/b/c, but exclude all other files in /tmp/data):
--filter '+/tmp/data/a/b/c -/tmp/data/ +'
However, mtglacier tries to behave smartly and omits traversal of a directory when it is absolutely sure that this won't break behavior (4) described above. Currently it's guaranteed that traversal stops only in the case when: a directory matches an EXCLUDE rule without a '!' prefix, ending with '/' or '**', or an empty rule, AND there are no INCLUDE rules before this EXCLUDE rule.
`--filter '-*.tmp -/media/ -/proc/ +*.jpeg'` - system '/proc' and huge '/media' directory is not traversed.
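Putting it together, a sync that uploads only the photos subdirectory of the transfer root and skips everything else (paths are illustrative):
./mtglacier sync --config glacier.cfg --dir /data/backup --vault myvault --journal journal.log --filter '+/photos/ -'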
NOTE: any command line option can be used in the config file as well, but options specified on the command line override options specified in the config.
concurrency (with the sync, upload-file, restore, restore-completed commands) - number of parallel upload streams to run (default: 4)
--concurrency 4
partsize (with the sync, upload-file commands) - size of the file chunk to upload at once, in megabytes (default: 16)
--partsize 16
segment-size (with the restore-completed command) - size of a download segment, in MiB (default: none)
If segment-size is specified (greater than zero) and a file's size in megabytes is larger than segment-size, the download is performed in multiple segments.
If omitted or zero, multi-segment download is disabled (this is the default).
segment-size should be a power of two.
max-number-of-files (with the sync or restore commands) - limits the number of files to sync/restore. The program will finish when it reaches this limit.
--max-number-of-files 100
key/secret/region/vault/protocol - you can override any option from the config.
dry-run (with the sync, purge-vault, restore, restore-completed and even check-local-hash commands) - do not perform the actual work; print what would happen instead.
--dry-run
leaf-optimization (sync command only). 0 - disable, 1 - enable (default). Similar to the find (findutils) -noleaf option and the File::Find $dont_use_nlink option. When disabled, the number of hardlinks to a directory is ignored during file tree traversal. This slows down the file search, but is more compatible with (some) CIFS/CD-ROM filesystems. For more information see the find and File::Find manuals.
token (all commands that connect to the Amazon Glacier API) - an STS/IAM security token, described in Amazon STS/IAM Using Temporary Security Credentials to Access AWS
timeout (all commands that connect to the Amazon Glacier API)
Sets the timeout value in seconds; the default value is 180 seconds. A request to Amazon Glacier is retried if no activity on the connection to the server is observed for timeout seconds. This means that the complete request might take longer.
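For example, to tolerate up to five minutes of inactivity on a slow link:
--timeout 300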
follow (sync command only)
Follow symbolic links during directory traversal. This option hurts performance and increases memory usage. Similar to find -L.
Autodetection of locale/encodings is not implemented yet, but currently there is the ability to tune encodings manually. Below are 4 options that can be used in the config file and on the command line.
terminal-encoding - encoding of your terminal (STDOUT/STDERR for system messages)
filenames-encoding - encoding of filenames in the filesystem.
Under most *nix filesystems, filenames are stored as byte sequences, not characters, so in theory the application is responsible for managing encodings.
config-encoding - encoding of your config file (glacier.cfg in the examples above)
journal-encoding - encoding to be used for the Journal file (when reading and writing the journals specified with the --journal and --new-journal options)
The default value for all of these options is 'UTF-8'. Under Linux and Mac OS X you usually don't need to change encodings. Under *BSD systems, single-byte encodings are often used; most likely you'll need to change terminal-encoding and filenames-encoding. Optionally you can also change config-encoding and journal-encoding.
Notes:
Before switching config-encoding and journal-encoding, you are responsible for transcoding the file content of the config and journal files manually.
You are responsible for encoding compatibility. For example, don't try to work with a UTF-8 journal containing non-Cyrillic characters on a KOI8-R (Cyrillic) filesystem.
Don't try to use UTF-16 for a *nix filesystem. It's not ASCII compatible and contains \x00 bytes, which can't be stored in the filesystem.
Don't use UTF8 - it does not validate data; use UTF-8 (the one with a dash) instead.
To get a list of the encodings installed with your Perl, run:
perl -MEncode -e 'print join qq{\n}, Encode->encodings(q{:all})'
The config file name (specified with --config) can be in any encoding (it's used as-is). Of course it will only work if your terminal encoding matches your filesystem encoding, or if your config file name consists of 7-bit ASCII characters only.
Additional information about encoding support in the Perl programming language: the CPAN module Encode::Supported.
Amazon Glacier metadata (on the Amazon servers) is always stored in UTF-8; there is no way to override that. You can use a Journal in any encoding with the same metadata without problems, and you can dump metadata to journals with different encodings (using the download-inventory command).
See also convmv tool
Only filenames which consist of octets that can be mapped to a valid character sequence in the desired encoding are supported (i.e. filenames made of random bytes/garbage are not supported; usually this is not a problem).
Filenames with CR (carriage return, code 0x0D), LF (line feed, code 0x0A) or TAB (0x09) are not supported (usually not a problem either).
Length of relative filenames: the current limit is about 700 ASCII characters, or 350 2-byte UTF-8 characters (or 230 3-byte characters).
File modification times should be in the range from year 1000 to year 9999.
(NOTE: if the above requirements are not met, an error will be thrown)
If you uploaded files with file modification dates past Y2038 on a system which supports them, and then restored them on a system which does not (like 32-bit Linux), the resulting file timestamps will (of course) be wrong and also unpredictable (undefined behaviour). The only thing guaranteed is that if you restore the journal from the Amazon servers on an affected (i.e. 32-bit) machine, the journal will contain the correct timestamps (the same as on 64-bit).
The memory usage (for 'sync') is roughly ~ min(NUMBER_OF_FILES_TO_SYNC, max-number-of-files) + partsize*concurrency.
With a high partsize*concurrency there is a risk of network timeouts (HTTP 408/500).
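A quick worked example of that formula (illustrative numbers): with the default partsize of 16 and concurrency of 4, upload buffers alone take about 16*4 = 64 MB; halving both cuts that to 16 MB:
./mtglacier sync --config glacier.cfg --dir /data/backup --vault myvault --journal journal.log --partsize 8 --concurrency 2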
Create an empty directory MYDIR
Set the vault name inside cycletest.sh
Run:
./cycletest.sh init MYDIR
./cycletest.sh retrieve MYDIR
./cycletest.sh restore MYDIR
OR
./cycletest.sh init MYDIR
./cycletest.sh purge MYDIR
Minimum Amazon Glacier permissions needed to test (i.e. to run cycletest.sh or play with the mtglacier code) - something like this (including permissions to create/delete vaults):
{
"Statement": [
{
"Effect": "Allow",
"Resource":["arn:aws:glacier:eu-west-1:*:vaults/test1",
"arn:aws:glacier:us-east-1:*:vaults/test1",
"arn:aws:glacier:eu-west-1:*:vaults/test2",
"arn:aws:glacier:eu-west-1:*:vaults/test3"],
"Action":["glacier:UploadArchive",
"glacier:InitiateMultipartUpload",
"glacier:UploadMultipartPart",
"glacier:UploadPart",
"glacier:DeleteArchive",
"glacier:ListParts",
"glacier:InitiateJob",
"glacier:ListJobs",
"glacier:GetJobOutput",
"glacier:ListMultipartUploads",
"glacier:CompleteMultipartUpload"]
},
{
"Effect": "Allow",
"Resource":["arn:aws:glacier:eu-west-1:*",
"arn:aws:glacier:us-east-1:*"],
"Action":["glacier:CreateVault",
"glacier:DeleteVault", "glacier:ListVaults"]
}
]
}