How to upload a file using multipart to Glacier in Ruby | Inkoop Blog

Glacier is a service provided by AWS to store infrequently used files at very low cost. We talk about how to use the Ruby SDK to upload large files to Glacier.

Posted by Ameena on 29 May 2017

Amazon Glacier is a storage service optimized for infrequently used data, or cold data. It is an extremely low-cost storage service that provides durable storage with security features for data archiving and backup.

If the file is large, it is best to upload it in chunks using a multipart upload.

A multipart upload to Glacier is done in three steps:

  • Initiate multipart.
  • Upload chunks.
  • Complete multipart.
#glacier_multipart.rb

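require 'aws-sdk'  # with version 3 of the SDK, 'aws-sdk-glacier' is enough
require 'treehash' # assumed require name for the Treehash gem used below

glacier_client = Aws::Glacier::Client.new(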
  region: "us-west-2",
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)
file_path = 'your/file/path'
# read in binary mode so the bytes are not altered by encoding or newline conversion
source_file = File.binread(file_path)
file_size = File.size(file_path)

Initiate multipart

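  • This call returns the upload_id, which is then passed to every upload_multipart_part call and to complete_multipart_upload.
  • Be careful with part_size: it is required, and it must be 1 MiB (1048576 bytes) multiplied by a power of two, for example 1 MiB, 2 MiB or 4 MiB, up to a maximum of 4 GiB.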
#glacier_multipart.rb

...
response = glacier_client.initiate_multipart_upload({
  account_id: ENV['AWS_ACCOUNT_ID'],
  part_size: "2097152", # 2 MiB; the SDK documents this parameter as a String
  vault_name: ENV["GLACIER_VAULT_NAME"]
})
upload_id = response.to_h[:upload_id]
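
Since a wrong part_size only fails once the request reaches AWS, it can help to check the value locally first. Here is a minimal sketch of such a check; valid_part_size? is a hypothetical helper of ours, not part of the AWS SDK, and it assumes the limits described above (1 MiB times a power of two, up to 4 GiB).

```ruby
# Hypothetical helper (not part of the AWS SDK): returns true when size is
# 1 MiB multiplied by a power of two, between 1 MiB and 4 GiB inclusive.
def valid_part_size?(size)
  mebibyte = 1024 * 1024
  max_part_size = 4096 * mebibyte # 4 GiB

  return false unless size.is_a?(Integer)
  return false if size < mebibyte || size > max_part_size

  # a power of two has exactly one bit set, so size & (size - 1) is zero
  (size & (size - 1)).zero?
end

puts valid_part_size?(2_097_152)       # 2 MiB
puts valid_part_size?(3 * 1024 * 1024) # 3 MiB is not a power of two
```

The first call prints true and the second prints false.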

Upload chunks

  • Reopen Ruby's built-in File class and add a method which reads the file in chunks.
#file.rb

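class File
  # default chunk size; the original constant was never defined
  MEGABYTE = 1024 * 1024
  # yields successive chunks of chunk_size bytes until the end of the file
  def each_chunk(chunk_size = MEGABYTE)
    yield read(chunk_size) until eof?
  end
end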
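
If you would rather not reopen a core class, the same idea can be written as a plain method that takes any IO-like object. This is only a sketch of an alternative, not what the rest of this post uses; the standalone each_chunk(io, chunk_size) signature is our own, and StringIO stands in for a real file so the snippet runs on its own.

```ruby
require 'stringio'

# Yields successive chunks of at most chunk_size bytes from an IO-like object.
# IO#read(length) returns nil at end of file, which ends the loop.
def each_chunk(io, chunk_size = 1024 * 1024)
  while (chunk = io.read(chunk_size))
    yield chunk
  end
end

# Usage with an in-memory IO standing in for a file opened with "rb"
io = StringIO.new("abcdefghij")
chunks = []
each_chunk(io, 4) { |chunk| chunks << chunk }
p chunks # ["abcd", "efgh", "ij"]
```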
  • Use the Treehash gem, or any other SHA-256 tree hash implementation, to generate the checksum.
  • Pay attention when setting the range: it indicates which bytes of the entire file the chunk covers, and both ends are inclusive.
  • If the part_size is set to 2 MB, the ranges are bytes 0-2097151, then 2097152-4194303 and so forth. Every part except the last must be exactly part_size bytes.
#glacier_multipart.rb

...
upload_id = response.to_h[:upload_id]
# must match the part_size passed to initiate_multipart_upload
part_size = 2097152
# split the file into chunks of part_size bytes
File.open(file_path, "rb") do |f|
  start_byte = 0
  f.each_chunk(part_size) do |chunk|
    # the last chunk can be shorter than part_size,
    # so derive end_byte from the chunk that was actually read
    end_byte = start_byte + chunk.bytesize - 1
    # generate checksum for each chunk
    chunk_checksum = Treehash::calculate_tree_hash chunk
    glacier_client.upload_multipart_part({
      account_id: ENV['AWS_ACCOUNT_ID'],
      vault_name: ENV["GLACIER_VAULT_NAME"],
      upload_id: upload_id,
      checksum: chunk_checksum,
      range: "bytes #{start_byte}-#{end_byte}/*",
      body: chunk
    })
    start_byte += chunk.bytesize
  end
end
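
To double check the range arithmetic before sending anything to AWS, the ranges can be computed on their own. The sketch below is illustrative only: part_ranges is a hypothetical helper of ours, and it simply lists the range string each part would get for a given file size and part size.

```ruby
# Hypothetical helper: returns the range string for every part of an archive.
# Both ends of each range are inclusive, and the last part may be shorter.
def part_ranges(file_size, part_size)
  ranges = []
  start_byte = 0
  while start_byte < file_size
    end_byte = [start_byte + part_size, file_size].min - 1
    ranges << "bytes #{start_byte}-#{end_byte}/*"
    start_byte += part_size
  end
  ranges
end

puts part_ranges(5_000_000, 2_097_152)
# bytes 0-2097151/*
# bytes 2097152-4194303/*
# bytes 4194304-4999999/*
```

Note that the final range ends at file_size - 1, not at a multiple of part_size.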

Complete multipart

  • Generate a checksum for the entire file and send it together with the total archive size. If the upload is successful, the response contains the archive_id.
#glacier_multipart.rb

...
total_checksum = Treehash::calculate_tree_hash source_file
resp = glacier_client.complete_multipart_upload({
  account_id: ENV['AWS_ACCOUNT_ID'],
  archive_size: file_size.to_s, # the SDK documents this parameter as a String
  checksum: total_checksum,
  upload_id: upload_id,
  vault_name: ENV["GLACIER_VAULT_NAME"]
})

archive_id = resp.to_h[:archive_id]

# present? comes from ActiveSupport; this check also works in plain Ruby
if archive_id && !archive_id.empty?
  puts "Successfully uploaded, Archive id is #{archive_id}"
else
  puts "Please try again."
end
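
If you are curious what the checksum actually is: Glacier expects a SHA-256 tree hash, where the data is split into 1 MiB leaves, each leaf is hashed, and neighbouring hashes are combined pairwise until a single hash is left. Below is a minimal sketch of that algorithm using only Ruby's standard library. The tree_hash method and its leaf_size parameter are ours, for illustration; we assume the Treehash gem produces the same value but have not compared the two.

```ruby
require 'digest'

# Illustrative SHA-256 tree hash. leaf_size is 1 MiB for Glacier; it is a
# parameter here only so the algorithm can be tried on tiny inputs.
def tree_hash(data, leaf_size = 1024 * 1024)
  bytes = data.b # binary copy, so offsets are byte offsets

  # 1. hash every leaf
  hashes = []
  offset = 0
  while offset < bytes.bytesize
    hashes << Digest::SHA256.digest(bytes.byteslice(offset, leaf_size))
    offset += leaf_size
  end
  hashes << Digest::SHA256.digest("") if hashes.empty?

  # 2. combine neighbouring hashes until one remains;
  #    a hash without a neighbour moves up to the next level unchanged
  while hashes.size > 1
    hashes = hashes.each_slice(2).map do |left, right|
      right ? Digest::SHA256.digest(left + right) : left
    end
  end

  # 3. return the root as a hex string
  hashes.first.unpack('H*').first
end

puts tree_hash("hello") == Digest::SHA256.hexdigest("hello")
```

For data smaller than one leaf the tree hash is just the plain SHA-256 of the data, which is why the last line prints true.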

Keep Coding!!!

Ameena

