Amazon Glacier is a storage service optimized for infrequently used data, or cold data. It is an extremely low-cost storage service that provides durable storage with security features for data archiving and backup.
If the archive is large (AWS recommends multipart upload for archives over 100 MB), it is best to upload the file in chunks using multipart upload.
A multipart upload to Glacier is done in three steps:
- Initiate multipart.
- Upload chunks.
- Complete multipart.
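#glacier_multipart.rb
# assumes the aws-sdk-glacier gem (SDK v3); with SDK v2 use require 'aws-sdk'
require 'aws-sdk-glacier'

glacier_client = Aws::Glacier::Client.new(
  region: "us-west-2",
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)
file_path = 'your/file/path'
# read in binary mode so the checksum is computed over the exact bytes uploaded
source_file = File.binread(file_path)
file_size = File.size(file_path)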
Initiate multipart
- This call returns the upload_id, which is used in upload_multipart_part and complete_multipart_upload.
- Be careful with part_size: it must be 1 MB (1048576 bytes) multiplied by a power of two, between 1 MB and 4 GB, and must not be null.
#glacier_multipart.rb
...
response = glacier_client.initiate_multipart_upload({
  account_id: ENV['AWS_ACCOUNT_ID'],
  # 2 MB; the SDK documents part_size as a String
  part_size: 2097152.to_s,
  vault_name: ENV["GLACIER_VAULT_NAME"]
})
upload_id = response.to_h[:upload_id]
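The part_size rule above can be checked before the upload is initiated. The following is a minimal sketch; the helper name valid_part_size? and the constants are hypothetical and not part of the AWS SDK.

```ruby
MIB = 1024 * 1024
MAX_PART_SIZE = 4 * 1024 * MIB # 4 GB upper limit for a Glacier part

# Hypothetical helper: true when size is 1 MB times a power of two, up to 4 GB.
def valid_part_size?(size)
  return false unless size.is_a?(Integer)
  return false if size < MIB || size > MAX_PART_SIZE
  return false unless (size % MIB).zero?

  multiplier = size / MIB
  # a power of two has exactly one bit set
  (multiplier & (multiplier - 1)).zero?
end
```

For example, valid_part_size?(2097152) accepts the 2 MB value used in this post, while a 3 MB value is rejected because 3 is not a power of two.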
Upload chunks
- Reopen Ruby's built-in File class and add a method that reads the file in chunks.
#file.rb
class File
  MEGABYTE = 1024 * 1024

  # yields the file's contents in chunks of at most chunk_size bytes
  def each_chunk(chunk_size = MEGABYTE)
    yield read(chunk_size) until eof?
  end
end
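If you would rather not modify the built-in File class, the same chunked read can be written as a standalone method. This is only a sketch of an alternative, not the approach used in the rest of this post; the two-argument each_chunk shown here is a hypothetical helper, and StringIO stands in for a real file.

```ruby
require 'stringio'

# Yields successive chunks of at most chunk_size bytes from any IO-like object.
# chunk_size must be a positive integer.
def each_chunk(io, chunk_size)
  # IO#read with a positive length returns nil at end of file
  while (chunk = io.read(chunk_size))
    yield chunk
  end
end

chunks = []
each_chunk(StringIO.new('abcdefghij'), 4) { |c| chunks << c }
# chunks is now ["abcd", "efgh", "ij"]
```

Because it accepts any object that responds to read, the same method works with a File opened in "rb" mode.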
- Use the Treehash gem, or any other SHA-256 tree hash implementation, to generate the checksum.
- Pay attention when setting the range: it indicates which bytes of the entire file the chunk covers.
- If the part_size is 2 MB, the ranges must be bytes 0-2097151, 2097152-4194303, and so forth; only the last part may be smaller.
#glacier_multipart.rb
...
upload_id = response.to_h[:upload_id]
part_size = 2097152
# split the file into chunks of part_size bytes
File.open(file_path, "rb") do |f|
  start_byte = 0
  f.each_chunk(part_size) do |chunk|
    # the last chunk can be smaller than part_size, so derive end_byte from the chunk itself
    end_byte = start_byte + chunk.bytesize - 1
    # generate checksum for each chunk
    chunk_checksum = Treehash::calculate_tree_hash chunk
    resp = glacier_client.upload_multipart_part({
      account_id: ENV['AWS_ACCOUNT_ID'],
      vault_name: ENV["GLACIER_VAULT_NAME"],
      upload_id: upload_id,
      checksum: chunk_checksum,
      range: "bytes #{start_byte}-#{end_byte}/*",
      body: chunk
    })
    start_byte += chunk.bytesize
  end
end
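The range bookkeeping is the easiest part to get wrong, so it can help to compute the ranges separately and inspect them before uploading. A minimal sketch, assuming a positive part_size; part_ranges is a hypothetical helper name, not an SDK method.

```ruby
# Returns the range string for every part of a file of file_size bytes.
def part_ranges(file_size, part_size)
  (0...file_size).step(part_size).map do |start_byte|
    # the last part ends at the final byte of the file
    end_byte = [start_byte + part_size, file_size].min - 1
    "bytes #{start_byte}-#{end_byte}/*"
  end
end

ranges = part_ranges(5 * 1024 * 1024, 2097152)
# ranges is ["bytes 0-2097151/*", "bytes 2097152-4194303/*", "bytes 4194304-5242879/*"]
```

A 5 MB file with 2 MB parts gives two full parts and a final 1 MB part, which matches the ranges the upload loop sends.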
Complete multipart
- Generate a tree-hash checksum for the entire file. If the upload succeeds, the response contains the archive_id.
#glacier_multipart.rb
...
total_checksum = Treehash::calculate_tree_hash source_file
resp = glacier_client.complete_multipart_upload({
  account_id: ENV['AWS_ACCOUNT_ID'],
  # the SDK documents archive_size as a String
  archive_size: file_size.to_s,
  checksum: total_checksum,
  upload_id: upload_id,
  vault_name: ENV["GLACIER_VAULT_NAME"]
})
archive_id = resp.to_h[:archive_id]
# present? comes from ActiveSupport; this check also works in plain Ruby
if archive_id && !archive_id.empty?
  puts "Successfully uploaded, Archive id is #{archive_id}"
else
  puts "Please Try again."
end
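The checksums used above are SHA-256 tree hashes: the data is hashed in 1 MB blocks, and the block hashes are combined pairwise until one remains. If you want to see what the gem is doing, or avoid the dependency, the following sketch computes it with Ruby's standard Digest library. The tree_hash method and ONE_MIB constant are hypothetical names, and the assumption that the result matches the Treehash gem's output has not been verified here.

```ruby
require 'digest'

ONE_MIB = 1024 * 1024

# Computes a SHA-256 tree hash of a string and returns it as lowercase hex.
def tree_hash(data)
  data = data.b # treat the input as raw bytes
  # hash each 1 MB block (an empty input is hashed as a single empty block)
  hashes = (0...[data.bytesize, 1].max).step(ONE_MIB).map do |offset|
    Digest::SHA256.digest(data.byteslice(offset, ONE_MIB))
  end
  # combine adjacent pairs until one hash remains; an unpaired hash is carried up unchanged
  while hashes.size > 1
    hashes = hashes.each_slice(2).map do |left, right|
      right ? Digest::SHA256.digest(left + right) : left
    end
  end
  hashes.first.unpack1('H*')
end
```

For data of 1 MB or less the tree hash is simply the SHA-256 digest of the data, which is a quick way to sanity-check an implementation.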
Keep Coding!!!
Ameena