The S3 API
Introduction
Building the authentication library was a bit of a grind, but now that you have that done, you can jump into the fun stuff: building a library that talks to S3. The library that we’re going to build is not complete. It does most of the things that you will need when you start using Amazon S3, but doesn’t cover some notable features of S3, including Logging and Query String Request Authentication. Also, we’ll only be talking about the REST API.
Before we begin, let’s talk very briefly about the philosophy of the API we’re building. My goal is to make the API feel like you’re working with standard Ruby classes, and to pretty much hide the fact that you’re working with S3. You may violently disagree with this. That’s okay; you’ll still be able to take a look at the code we’re going to build and create something that shows the plumbing a bit more.
Here’s an example of what using the API will look like. It shows off the three main classes that are implemented: Bucket, S3Object and Acl.
Example 5.1. api_example.rb - An example of using the S3Lib API <<(/code/introduction_api/api_example.rb)
If you’ve used Marcel Molina’s AWS/S3 library, then you’re probably feeling a sense of déjà vu here. That’s totally on purpose: Marcel has implemented a beautiful interface to S3, and I saw no reason to try to redo any of his work, so I purposefully copied his interface. If you are just using a library and feel no need to create one, then I highly recommend using Marcel’s library instead of the one we’re creating here. It’s much more complete than this one, and has been used and tested by many users. It also has great documentation. You can find it at http://amazon.rubyforge.org.
Listing All of Your Buckets
The Problem
You finally have authentication working and you’re chomping at the bit to try it out. You decide to start with the simplest request possible: getting a list of all of your buckets.
The Solution
You get a list of all of your buckets by making an authenticated GET request to the root of the Amazon S3 service’s URL. You can get a sample of the XML response by making that GET request using S3Lib.request:
1 $> irb -r lib/s3lib
2 >> response = S3Lib.request(:get, '')
3 >> puts response.read
4 <?xml version="1.0" encoding="UTF-8"?>
5 <ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
6 <Owner>
7 <ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f</ID>
8 <DisplayName>scottpatten</DisplayName>
9 </Owner>
10 <Buckets>
11 <Bucket>
12 <Name>assets0.plotomatic.com</Name>
13 <CreationDate>2007-09-06T16:25:25.000Z</CreationDate>
14 </Bucket>
15 <Bucket>
16 <Name>assets1.plotomatic.com</Name>
17 <CreationDate>2007-09-06T16:53:18.000Z</CreationDate>
18 </Bucket>
19
20 ...
21
22 <Bucket>
23 <Name>zunior_bucket</Name>
24 <CreationDate>2008-07-27T18:31:07.000Z</CreationDate>
25 </Bucket>
26 </Buckets>
27 </ListAllMyBucketsResult>
Note that the response only includes the Name and CreationDate for each bucket. Doing a GET on a given bucket will give us a lot more information, but we’ll deal with that later. Our goal for now will be to write an S3Lib::Service.buckets method that returns an array of buckets. Since we haven’t written the Bucket class yet, we’ll just stub it out with something that takes a Bucket XML element and parses it to find the bucket’s name. Here’s a first step:
Example 5.2. service.rb <<(code/listing_all_of_your_buckets_api_recipe/service.rb)
So, we make an authenticated GET request to the root URL and pass the result to REXML::Document.new. We then use REXML::XPath to find all of the Bucket elements in the response and create a new Bucket instance for each one, returning an Array with one Bucket instance for each bucket you own.
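In outline, the parsing step looks something like this. Note that BucketStub and buckets_from_xml are made-up names for illustration only; the real Service.buckets makes the authenticated request itself rather than taking an XML string.

```ruby
require 'rexml/document'

module S3Lib
  # BucketStub is a stand-in for the real Bucket class: it just pulls the
  # bucket's name out of a <Bucket> element.
  class BucketStub
    attr_reader :name

    def initialize(bucket_element)
      @name = bucket_element.elements['Name'].text
    end
  end

  class Service
    # Parse a ListAllMyBucketsResult document into an array of bucket
    # stubs. The real Service.buckets would get this XML from
    # S3Lib.request(:get, '').
    def self.buckets_from_xml(xml)
      doc = REXML::Document.new(xml)
      REXML::XPath.match(doc, '//Bucket').collect do |element|
        BucketStub.new(element)
      end
    end
  end
end
```

The XPath pattern '//Bucket' matches every Bucket element no matter how deeply it is nested, which is exactly what we want here.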
Discussion
Let’s try this class out in an irb session.
1 $> irb -r lib/service.rb
2 >> S3Lib::Service.buckets.collect {|bucket| bucket.name}
3 => ["assets0.plotomatic.com", "assets1.plotomatic.com", ..., "zunior_bucket"]
Hey, it works! That wasn’t too much work, and we’ve established a pattern that we’ll see repeated throughout the API when we’re getting information about a list of things: make a request to get the XML representation of that list of things, then use XPath to grab the correct elements from that request.
Finding a Bucket
The Problem
Given the name of a bucket you own or have read access to, you want to be able to get information about the bucket, including the bucket’s name and the objects it contains.
The Solution
To get information about a bucket, you make a GET request to the bucket’s name, like this:
1 GET /spatten_bucket
2 Host: s3.amazonaws.com
3 Content-Length: 0
4 Date: Wed, 13 Feb 2008 12:00:00 GMT
5 Authorization: AWS some_id:some_authentication_string
To make the authenticated request using the s3_authenticator library:
1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3
4 response = S3Lib.request(:get,'/spatten_bucket')
The XML response
When you get a bucket, the body of the response will contain XML describing the bucket.
1 $> irb -r s3_authenticator.rb
2 >> response = S3Lib.request(:get, 'spatten_bucket')
3 => #<StringIO:0x164df88>
4 >> puts response.read
5 <?xml version="1.0" encoding="UTF-8"?>
6 <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>spatte\
7 n_bucket</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTrunc\
8 ated>false</IsTruncated><Contents><Key>file2</Key><LastModified>2008-03-26T22:54\
9 :30.000Z</LastModified><ETag>"1c1c96fd2cf8330db0bfa936ce82f3b9"</ETag>\
10 <Size>5</Size><Owner><ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91\
11 b50c0de0f</ID><DisplayName>scottpatten</DisplayName></Owner><StorageClass>STANDA\
12 RD</StorageClass></Contents><Contents><Key>some_object.txt</Key><LastModified>20\
13 08-02-20T22:39:10.000Z</LastModified><ETag>"964c5260427cee786af075b68828558\
14 c"</ETag><Size>25</Size><Owner><ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72\
15 d769214012f7e91b50c0de0f</ID><DisplayName>scottpatten</DisplayName></Owner><Stor\
16 ageClass>STANDARD</StorageClass></Contents><Contents><Key>test1</Key><LastModifi\
17 ed>2008-03-26T22:52:44.000Z</LastModified><ETag>"5a105e8b9d40e1329780d62ea2\
18 265d8a"</ETag><Size>5</Size><Owner><ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c64\
19 6f72d769214012f7e91b50c0de0f</ID><DisplayName>scottpatten</DisplayName></Owner><\
20 StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
Here’s that XML response formatted a bit more nicely:
1 <?xml version="1.0" encoding="UTF-8"?>
2 <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
3 <Name>spatten_bucket</Name>
4 <Prefix></Prefix>
5 <Marker></Marker>
6 <MaxKeys>1000</MaxKeys>
7 <IsTruncated>false</IsTruncated>
8 <Contents>
9 <Key>file2</Key>
10 <LastModified>2008-03-26T22:54:30.000Z</LastModified>
11 <ETag>"1c1c96fd2cf8330db0bfa936ce82f3b9"</ETag>
12 <Size>5</Size>
13 <Owner>
14 <ID>
15 9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
16 </ID>
17 <DisplayName>scottpatten</DisplayName>
18 </Owner>
19 <StorageClass>STANDARD</StorageClass>
20 </Contents>
21 <Contents>
22 <Key>some_object.txt</Key>
23 <LastModified>2008-02-20T22:39:10.000Z</LastModified>
24 <ETag>"964c5260427cee786af075b68828558c"</ETag>
25 <Size>25</Size>
26 <Owner>
27 <ID>
28 9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
29 </ID>
30 <DisplayName>scottpatten</DisplayName>
31 </Owner>
32 <StorageClass>STANDARD</StorageClass>
33 </Contents>
34 <Contents>
35 <Key>test1</Key>
36 <LastModified>2008-03-26T22:52:44.000Z</LastModified>
37 <ETag>"5a105e8b9d40e1329780d62ea2265d8a"</ETag>
38 <Size>5</Size>
39 <Owner>
40 <ID>
41 9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
42 </ID>
43 <DisplayName>scottpatten</DisplayName>
44 </Owner>
45 <StorageClass>STANDARD</StorageClass>
46 </Contents>
47 </ListBucketResult>
This bucket has three objects in it, with keys of file2, some_object.txt and test1. If it had more, then there would just be more <Contents> tags, along with everything contained in them. If a bucket has no objects, then there will be no <Contents> tags.
The response includes the bucket name and any request parameters you sent, including Prefix, Marker, Delimiter and MaxKeys.
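These list parameters travel as part of the request’s query string. As a sketch (the bucket_list_path helper below is hypothetical, not part of the library), they map onto the GET path like this:

```ruby
require 'cgi'

# Hypothetical illustration: how bucket-listing parameters would be
# encoded into the GET request's query string.
def bucket_list_path(bucket_name, params = {})
  return bucket_name if params.empty?
  # Sort for a stable ordering, then URL-escape each key and value.
  query = params.sort.collect do |key, value|
    "#{CGI.escape(key)}=#{CGI.escape(value.to_s)}"
  end.join('&')
  "#{bucket_name}?#{query}"
end
```

So a request for at most 10 keys under the assets/ prefix would go to a path like spatten_bucket?max-keys=10&prefix=assets%2F.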
Errors
Trying to find a bucket that does not exist will raise an S3Lib::BucketNotFoundError. Trying to find a bucket that you don’t have read permission on will raise an S3Lib::NotYourBucketError.
Processing the XML response
We want the Bucket object to have a getter for each of its attributes; these are set up in the Bucket#initialize method. The objects themselves will be instantiated (and cached) by the first call to the objects instance method.
Since a bucket can have thousands of objects, we don’t want to parse them every time you call the Bucket#objects method. We also don’t want to parse them when the bucket is first instantiated, as that would be wasted work if, for example, you just wanted to find out whether the bucket exists. To deal with this, we cache the objects in the @objects instance variable the first time they are asked for. If you want to refresh the object listing, pass :refresh => true to the Bucket#objects method.
1 # bucket.rb
3 require File.join(File.dirname(__FILE__), 's3_authenticator')
4 require 'rexml/document'
5
6 module S3Lib
7
8 class NotYourBucketError < S3Lib::S3ResponseError
9 end
10
11 class BucketNotFoundError < S3Lib::S3ResponseError
12 end
13
14 class BucketNotEmptyError < S3Lib::S3ResponseError
15 end
16
17 class Bucket
18
19 attr_reader :name, :xml, :prefix, :marker, :max_keys
20
21 # Errors for find
22 # Trying to find a bucket that doesn't exist will raise a
23 # NoSuchBucket error
24 # Trying to find a bucket that you don't have access to will raise a
25 # NotSignedUp error
26 def self.find(name, params = {})
27 begin
28 response = S3Lib.request(:get, name)
29 rescue S3Lib::S3ResponseError => error
30 case error.amazon_error_type
31 when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '\
32 #{name}' does not exist.", error.io, error.s3requester)
33 when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{\
34 name}' is not owned by you", error.io, error.s3requester)
35 else # Re-raise the error if it's not one of the above
36 raise
37 end
38 end
39 doc = REXML::Document.new(response)
40 Bucket.new(doc, params)
41 end
42
43 def initialize(doc, params = {})
44 @xml = doc.root
45 @params = params
46 @name = @xml.elements['Name'].text
47 @max_keys = @xml.elements['MaxKeys'].text.to_i
48 @prefix = @xml.elements['Prefix'].text
49 @marker = @xml.elements['Marker'].text
50 end
51
52 def is_truncated?
53 @xml.elements['IsTruncated'].text == 'true'
54 end
55
56 def objects(params = {})
57 refresh if params[:refresh]
58 @objects || get_objects
59 end
60
61 def refresh
62 refreshed_bucket = Bucket.find(@name, @params)
63 @xml = refreshed_bucket.xml
64 @objects = nil
65 end
66
67 private
68
69 def get_objects
70 @objects = REXML::XPath.match(@xml, '//Contents').collect do |object|
71 key = object.elements['Key'].text
72 S3Lib::S3Object.new(self, key, :lazy_load => true)
73 end
74 end
75
76 end
77
78 end
Discussion
This is a first cut of the Bucket object. In the next two recipes, we’ll add functionality to create and destroy buckets. These additions will create some repetition in the code, so the recipe after that will refactor the Bucket class and clean it up a bit. The use of caching and refreshing is something that we’ll be repeating a few times as we build our S3 API, so take a close look at it now.
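Stripped of the S3 details, the cache-and-refresh idiom can be sketched like this. CachedList and its simulated fetch are made up for illustration; in Bucket the expensive fetch is the GET request and XML parse.

```ruby
# A distilled version of the cache-and-refresh idiom used by Bucket#objects.
class CachedList
  attr_reader :fetch_count

  def initialize
    @fetch_count = 0
  end

  # Return the cached items, fetching them only on the first call
  # (or again after a refresh).
  def items(params = {})
    refresh if params[:refresh]
    @items || fetch_items
  end

  # Throw away the cache so the next call to items re-fetches.
  def refresh
    @items = nil
  end

  private

  # Stands in for the expensive operation (an S3 request in the real class).
  def fetch_items
    @fetch_count += 1
    @items = [:a, :b]
  end
end
```

Calling items twice only fetches once; passing :refresh => true forces a re-fetch, exactly like Bucket#objects.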
Creating a Bucket
The Problem
You want to actually create buckets, not just read them. While you’re at it, you want the ability to set the access control policy of the new bucket to a canned access control policy like ‘Public Read’.
The Solution
To create a bucket, you make a PUT request to the bucket’s name, like this:
1 PUT /my_new_bucket
2 Host: s3.amazonaws.com
3 Content-Length: 0
4 Date: Wed, 13 Feb 2008 12:00:00 GMT
5 Authorization: AWS some_id:some_authentication_string
To make the authenticated request using the s3_authenticator library:
1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3
4 response = S3Lib.request(:put,'/my_new_bucket')
Setting Access control
You can set the access control policy to one of the four canned access-control-policies during bucket creation. You do this by adding a ‘x-amz-acl’ header to the PUT request. So, to make your new bucket publicly readable you would set ‘x-amz-acl’ to ‘public-read’:
1 PUT /my_new_bucket
2 Host: s3.amazonaws.com
3 x-amz-acl: public-read
4 Content-Length: 0
5 Date: Wed, 13 Feb 2008 12:00:00 GMT
6 Authorization: AWS some_id:some_authentication_string
Using the s3_authenticator library:
1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3
4 response = S3Lib.request(:put,'/my_new_bucket',
5 'x-amz-acl' => 'public-read')
To make the Bucket.create method a little more user friendly, we’ll also allow you to set the access control policy using the :access symbol, like this:
1 Bucket.create('my_new_bucket', :access => 'public-read')
Errors
If you try to create a bucket that is already owned by someone else, S3 will respond with a 409 (“Conflict”) error:
1 $> irb -r s3_authenticator
2 >> S3Lib.request(:put, 'test')
3 S3Lib::S3ResponseError: 409 Conflict, BucketAlreadyExists
4 from ./s3_authenticator.rb:39:in `request'
5 from (irb):1
You can catch this in your code with a begin / rescue block.
Creating a bucket and failing should raise an exception. Rather than letting the generic S3Lib::S3ResponseError propagate, the library code will re-raise it as an S3Lib::NotYourBucketError.
1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3
4 begin
5 response = S3Lib.request(:put,name)
6 rescue S3Lib::S3ResponseError => error
7 if error.io.status == ["409", "Conflict"]
8 raise S3Lib::NotYourBucketError, "The bucket '#{name}' is already owned by \
9 somebody else."
10 else
11 raise # re-raise the exception if it's not a 409 conflict
12 end
13 end
The Bucket.create method
Here’s what our Bucket.create method will look like. It takes a bucket name and creates the bucket. The optional params hash is a hash of headers to be sent along with the PUT request.
If you try to create a bucket that is already owned by somebody else, an S3Lib::NotYourBucketError will be raised.
If the bucket is created successfully, the method will return true.
1 # s3_bucket.rb
3 require File.join(File.dirname(__FILE__), 's3_authenticator')
4 module S3Lib
5
6 class NotYourBucketError < S3Lib::S3ResponseError
7 end
8
9 class Bucket
10
11 def self.create(name, params = {})
12 if params[:access] # translate from :access to 'x-amz-acl'
13 params['x-amz-acl'] = params.delete(:access)
14 end
15 begin
16 response = S3Lib.request(:put, name, params)
17 rescue S3Lib::S3ResponseError => error
18 if error.amazon_error_type == "BucketAlreadyExists"
19 raise S3Lib::NotYourBucketError.new("The bucket '#{name}' is already o\
20 wned by somebody else", error.io, error.s3requester)
21 else
22 raise # re-raise the exception if it's not a BucketAlreadyExists error
23 end
24 end
25 response.status[0] == "200" ? true : false
26 end
27
28 end
29
30 end
Let’s try it out in irb:
1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create('my_new_bucket')
3 => true
You can create the bucket ‘virtual hosted style’ by adding a ‘Host’ entry to the params hash.
1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create('/', "host" => "mynewbucket.s3.amazonaws.com")
3 => true
You can add a canned access control policy by sending :access in the params hash.
1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create('/my_readable_bucket', :access => 'public-read')
3 => true
If you try to create a bucket that is owned by someone else, you will get an S3Lib::NotYourBucketError:
1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create("test")
3 S3Lib::NotYourBucketError: The bucket 'test' is already owned by somebody else, \
4 BucketAlreadyExists
5 from ./library/bucket_create.rb:29:in `create'
6 from (irb):1
Discussion
The Bucket class is starting to shape up nicely now: we can both read and create buckets. The next recipe, “Deleting a Bucket”, will talk about deleting buckets, after which we’ll have full functionality. Notice that we’re really focusing on making the interface easy to use by adding helpful shortcuts like the :access parameter. You’ve probably also noticed some of the repetition that we’ll be cleaning up in the “Refactoring the Bucket Class” recipe.
Deleting a Bucket
The Problem
You want to be able to delete buckets that you own.
The Solution
To delete a bucket, you make a DELETE request to the bucket’s name, like this:
1 DELETE /spatten_bucket
2 Host: s3.amazonaws.com
3 Content-Length: 0
4 Date: Wed, 13 Feb 2008 12:00:00 GMT
5 Authorization: AWS some_id:some_authentication_string
To make the authenticated request using the s3_authenticator library:
1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3
4 response = S3Lib.request(:delete,'/spatten_bucket')
Errors
Trying to delete a bucket that is not empty will raise a BucketNotEmpty error. Trying to delete a bucket that does not exist will raise a NoSuchBucket error. Trying to delete a bucket that you do not own will raise a NotSignedUp error.
1 $> irb -r s3_authenticator
2 >> S3Lib.request(:delete, 'spatten_bucket')
3 S3Lib::S3ResponseError: 409 Conflict, BucketNotEmpty
4 from ./library/s3_authenticator.rb:39:in `request'
5 from (irb):15
6 >> S3Lib.request(:delete, 'spatten_bucketasdasdas')
7 S3Lib::S3ResponseError: 404 Not Found, NoSuchBucket
8 from ./library/s3_authenticator.rb:39:in `request'
9 from (irb):16
10 >> S3Lib.request(:delete, 'test')
11 S3Lib::S3ResponseError: 403 Forbidden, NotSignedUp
12 from ./library/s3_authenticator.rb:39:in `request'
13 from (irb):17
Because non-empty buckets cannot be deleted, we will create a Bucket::delete_all class method, and a corresponding instance method. As well, if you pass :force => true in the params hash of Bucket::delete, then the bucket will be deleted even if it is not empty.
1 $> irb -r library/s3lib.rb
2 >> S3Lib::Bucket.delete('spatten_not_empty_bucket')
3 S3Lib::BucketNotEmptyError: The bucket 'spatten_not_empty_bucket' is not empty, \
4 so you can't delete it.
5 Try using Bucket.delete_all('spatten_not_empty_bucket') first, or Bucket.delete(\
6 'spatten_not_empty_bucket', :force => true).
7 from ./library/bucket.rb:45:in `delete'
8 from (irb):1
9 >> S3Lib::Bucket.delete('spatten_not_empty_bucket', :force => true)
10 => #<StringIO:0x167d1e8>
11 >> S3Lib::Bucket.find('spatten_not_empty_bucket')
12 S3Lib::BucketNotFoundError: The bucket 'spatten_not_empty_bucket' does not exist.
13 from ./library/bucket.rb:75:in `find'
14 from (irb):3
Warning
The example above won’t work for you yet, as we haven’t created the S3Object class, which is called by Bucket::delete if params[:force] == true.
Here’s the code that takes care of all the deleting:
1 # bucket.rb
2 require File.join(File.dirname(__FILE__), 's3_authenticator')
3 require 'rexml/document'
4
5 module S3Lib
6
7 class NotYourBucketError < S3Lib::S3ResponseError
8 end
9
10 class BucketNotFoundError < S3Lib::S3ResponseError
11 end
12
13 class BucketNotEmptyError < S3Lib::S3ResponseError
14 end
15
16 class Bucket
17
18 attr_reader :name, :xml, :prefix, :marker, :max_keys
19
20 # passing :force => true will cause the bucket to be deleted even if it is n\
21 ot empty.
22 def self.delete(name, params = {})
23 if params.delete(:force)
24 self.delete_all(name, params)
25 end
26 begin
27 response = S3Lib.request(:delete, name, params)
28 rescue S3Lib::S3ResponseError => error
29 case error.amazon_error_type
30 when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '#\
31 {name}' does not exist.", error.io, error.s3requester)
32 when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{n\
33 ame}' is not owned by you.", error.io, error.s3requester)
34 when "BucketNotEmpty": raise S3Lib::BucketNotEmptyError.new("The bucket \
35 '#{name}' is not empty, so you can't delete it.\nTry using Bucket.delete_all('#{\
36 name}') first, or Bucket.delete('#{name}', :force => true).", error.io, error.s3\
37 requester)
38 else # Re-raise the error if it's not one of the above
39 raise
40 end
41 end
42 end
43
44 def delete(params = {})
45 self.class.delete(@name, @params.merge(params))
46 end
47
48 def self.delete_all(name, params = {})
49 bucket = Bucket.find(name, params)
50 bucket.delete_all
51 end
52
53 def delete_all
54 objects.each do |object|
55 object.delete
56 end
57 end
58
59 end
60
61 end
Discussion
We now have a Bucket class that can find, create and delete buckets. All right! However, the class is getting kind of ugly. We’ll clean that up in the “Refactoring the Bucket Class” recipe.
Refactoring the Bucket Class
The Problem
We now have a fully functional Bucket class, but there’s a lot of repetition. You want to refactor it to remove the duplication, which will also make it easier to add new functionality.
The Solution
Here’s the current state of the Bucket class:
1 require File.join(File.dirname(__FILE__), 's3_authenticator')
2 require 'rexml/document'
3
4 module S3Lib
5
6 class NotYourBucketError < S3Lib::S3ResponseError
7 end
8
9 class BucketNotFoundError < S3Lib::S3ResponseError
10 end
11
12 class BucketNotEmptyError < S3Lib::S3ResponseError
13 end
14
15 class Bucket
16
17 attr_reader :name, :xml, :prefix, :marker, :max_keys
18
19 def self.create(name, params = {})
20 params['x-amz-acl'] = params.delete(:access) if params[:access] # translat\
21 e from :access to 'x-amz-acl'
22 begin
23 response = S3Lib.request(:put, name, params)
24 rescue S3Lib::S3ResponseError => error
25 if error.amazon_error_type == "BucketAlreadyExists"
26 raise S3Lib::NotYourBucketError.new("The bucket '#{name}' is already own\
27 ed by somebody else", error.io, error.s3requester)
28 else
29 raise # re-raise the exception if it's not a BucketAlreadyExists error
30 end
31 end
32 response.status[0] == "200" ? true : false
33 end
34
35 # passing :force => true will cause the bucket to be deleted even if it is n\
36 ot empty.
37 def self.delete(name, params = {})
38 if params.delete(:force)
39 self.delete_all(name, params)
40 end
41 begin
42 response = S3Lib.request(:delete, name, params)
43 rescue S3Lib::S3ResponseError => error
44 case error.amazon_error_type
45 when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '#\
46 {name}' does not exist.", error.io, error.s3requester)
47 when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{n\
48 ame}' is not owned by you.", error.io, error.s3requester)
49 when "BucketNotEmpty": raise S3Lib::BucketNotEmptyError.new("The bucket \
50 '#{name}' is not empty, so you can't delete it.\nTry using Bucket.delete_all('#{\
51 name}') first, or Bucket.delete('#{name}', :force => true).", error.io, error.s3\
52 requester)
53 else # Re-raise the error if it's not one of the above
54 raise
55 end
56 end
57 end
58
59 def delete(params = {})
60 self.class.delete(@name, @params.merge(params))
61 end
62
63 def self.delete_all(name, params = {})
64 bucket = Bucket.find(name, params)
65 bucket.delete_all
66 end
67
68 def delete_all
69 objects.each do |object|
70 object.delete
71 end
72 end
73
74 # Errors for find
75 # Trying to find a bucket that doesn't exist will raise a NoSuchBucket error
76 # Trying to find a bucket that you don't have access to will raise a NotSign\
77 edUp error
78 def self.find(name, params = {})
79 begin
80 response = S3Lib.request(:get, name)
81 rescue S3Lib::S3ResponseError => error
82 case error.amazon_error_type
83 when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '#\
84 {name}' does not exist.", error.io, error.s3requester)
85 when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{n\
86 ame}' is not owned by you", error.io, error.s3requester)
87 else # Re-raise the error if it's not one of the above
88 raise
89 end
90 end
91 doc = REXML::Document.new(response)
92 Bucket.new(doc, params)
93 end
94
95 def initialize(doc, params = {})
96 @xml = doc.root
97 @params = params
98 @name = @xml.elements['Name'].text
99 @max_keys = @xml.elements['MaxKeys'].text.to_i
100 @prefix = @xml.elements['Prefix'].text
101 @marker = @xml.elements['Marker'].text
102 end
103
104 def is_truncated?
105 @xml.elements['IsTruncated'].text == 'true'
106 end
107
108 def objects(params = {})
109 refresh if params[:refresh]
110 @objects || get_objects
111 end
112
113 def refresh
114 refreshed_bucket = Bucket.find(@name, @params)
115 @xml = refreshed_bucket.xml
116 @objects = nil
117 end
118
119 private
120
121 def get_objects
122 @objects = REXML::XPath.match(@xml, '//Contents').collect do |object|
123 key = object.elements['Key'].text
124 S3Lib::S3Object.new(self, key, :lazy_load => true)
125 end
126 end
127
128 end
129
130 end
There’s a lot of repetition between the create, delete and find class methods. I’m going to refactor that out into a single method, bucket_request, to clean things up. To make things easier as more errors are added, I’ll also move the errors into a separate file called s3_errors.rb. The final refactoring will be to create a file, s3lib.rb, that loads up all of the files required by our Bucket class. Whenever you want to use the Bucket class, require s3lib.rb instead.
Here’s the refactored Bucket class:
Example 5.3. bucket.rb <<(code/refactoring_the_bucket_class_recipe/bucket.rb)
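The idea behind bucket_request can be sketched like this. The simplified S3ResponseError and the injectable requester parameter are illustration devices so the sketch can be exercised without talking to S3; the library’s actual method differs in its details.

```ruby
module S3Lib
  # Simplified error classes for this sketch; the real ones also carry
  # the response body and the requester.
  class S3ResponseError < StandardError
    attr_reader :amazon_error_type

    def initialize(message, amazon_error_type = nil)
      super(message)
      @amazon_error_type = amazon_error_type
    end
  end
  class NotYourBucketError < S3ResponseError; end
  class BucketNotFoundError < S3ResponseError; end
  class BucketNotEmptyError < S3ResponseError; end

  class Bucket
    # Shared request wrapper: create, delete and find all funnel their
    # HTTP verb through here, so Amazon error types are mapped to library
    # errors in one place instead of three.
    def self.bucket_request(verb, name, params = {}, requester = S3Lib)
      requester.request(verb, name, params)
    rescue S3ResponseError => error
      case error.amazon_error_type
      when 'NoSuchBucket'
        raise BucketNotFoundError.new("The bucket '#{name}' does not exist.")
      when 'NotSignedUp'
        raise NotYourBucketError.new("The bucket '#{name}' is not owned by you.")
      when 'BucketAlreadyExists'
        raise NotYourBucketError.new("The bucket '#{name}' is already owned by somebody else.")
      when 'BucketNotEmpty'
        raise BucketNotEmptyError.new("The bucket '#{name}' is not empty, so you can't delete it.")
      else
        raise # re-raise anything we don't recognize
      end
    end
  end
end
```

With this in place, each public method shrinks to a one-line call plus whatever response handling is unique to it.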
Ahh, much cleaner. All that repetition was making me itchy. Here are the s3_errors.rb and s3lib.rb files:
Example 5.4. s3_errors.rb <<(code/refactoring_the_bucket_class_recipe/s3_errors.rb)
Example 5.5. s3lib.rb <<(code/refactoring_the_bucket_class_recipe/s3_lib.rb)
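As a rough guide, s3_errors.rb has roughly this shape. This is a sketch based on the error classes and constructor arguments used in the earlier listings, not the file’s exact contents.

```ruby
# s3_errors.rb (sketch) -- all of the library's errors in one place.
# Each error carries the response body IO and the requester that
# produced it, so callers can dig into the failure if they need to.
module S3Lib
  class S3ResponseError < StandardError
    attr_reader :io, :s3requester

    def initialize(message, io = nil, s3requester = nil)
      super(message)
      @io = io
      @s3requester = s3requester
    end
  end

  class NotYourBucketError < S3ResponseError; end
  class BucketNotFoundError < S3ResponseError; end
  class BucketNotEmptyError < S3ResponseError; end
end
```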
Notice that I moved the S3ResponseError class from s3_authenticator.rb into s3_errors.rb where it belongs.
Discussion
We now have the first major class all done and tucked away, with reasonable code. This class will set the pattern for the next two classes we implement: the S3Object and Acl classes.
The S3Object Class
The Problem
You want to be able to read, create and delete the objects that live in your buckets.
The Solution
Now that we have our Bucket class all sorted out, the next obvious step is to create the S3Object class. Having objects and buckets taken care of will give us most of the functionality we want. The HTTP verbs that an object responds to are PUT, GET, DELETE and HEAD. The HEAD verb is one that we haven’t talked about before. It is used to get information about a resource without actually downloading the whole resource. An object on Amazon S3 is a perfect example of why this is necessary: you don’t want to have to download a 200 MB file just to find out when it was created. Instead, you just do a HEAD request to the object and get its metadata.
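To see what a HEAD request looks like outside our library, here’s one built with Ruby’s standard Net::HTTP (the bucket and key names are made up, and the request is constructed but not sent):

```ruby
require 'net/http'

# Build (but don't send) a HEAD request for an object. The response to a
# HEAD carries the same headers a GET would -- Content-Length,
# Content-Type, Last-Modified, ETag -- but no body, so you can inspect
# a 200 MB object without downloading it.
head = Net::HTTP::Head.new('/my_bucket/big_movie.mov')
head['Host'] = 's3.amazonaws.com'
```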
Here are the S3Object::create and S3Object::find class methods. They are pretty simple. The create method just does a :put to the Amazon S3 service with the correct URL and then returns an instance of the S3Object class (I’ll show you what the S3Object.object_request method looks like soon). The find method does even less: it just creates a new instance of the S3Object class. The url method builds the URL to the object from the object’s bucket and key; the bucket can be either a string giving the bucket’s name or a Bucket object.
Example 5.6. s3object_create.rb - creating and finding objects
1 module S3Lib
2
3 class S3Object
4
5 DEFAULT_CONTENT_TYPE = 'binary/octet-stream'
6
7 attr_reader :key, :bucket
8
9 # This is just an alias for S3Object.new
10 def self.find(bucket, key, options = {})
11 S3Object.new(bucket, key, options)
12 end
13
14 def self.create(bucket, key, value = "", options = {})
15 # translate from :access to 'x-amz-acl'
16 options['x-amz-acl'] = options.delete(:access) if options[:access]
17 options.merge!({:body => value || "",
18 'content-type' => DEFAULT_CONTENT_TYPE})
19 response = S3Object.object_request(:put, S3Object.url(bucket, key),
20 options)
21 response.status[0] == "200" ?
22 S3Object.new(bucket, key, options) : false
23 end
24
25 # bucket can be either a Bucket object or a string containing
26 # the bucket's name
27 def self.url(bucket, key)
28 bucket_name = bucket.respond_to?(:name) ? bucket.name : bucket
29 File.join(bucket_name, key)
30 end
31
32 def url
33 S3Object.url(@bucket.name, @key)
34 end
35
36 end
37 end
Avoiding unnecessary or premature downloads and requests to the server is what creates almost all of the complexity in the S3Object class. I’ll be using two techniques to this end. First, if something is downloaded, it will be cached in the object. Second, the object’s value will be loaded lazily: when a new instance of an S3Object is created, only the object’s metadata will be pulled from the Amazon S3 service. If you are loading a large number of objects, you don’t want to make even a metadata request for every object. In that case, it’s best to pass :lazy_load => true in the options hash; this is what the Bucket class does to avoid making hundreds of HTTP calls when loading up a bucket with lots of objects in it.
To make that a bit more concrete, let’s look at the S3Object#initialize method.
Example 5.7. s3_object_initialize.rb
1 module S3Lib
2 class S3Object
3
4 # Both metadata and value are loaded lazily if options[:lazy_load]
5 # is true. This is used by Bucket.find so you don't make a request
6 # for every object in the bucket
7 # The bucket can be either a bucket object or a string containing
8 # the bucket's name.
9 # The key is a string.
10 def initialize(bucket, key, options = {})
11 bucket = Bucket.find(bucket) unless bucket.respond_to?(:name)
12 @bucket = bucket
13 @key = key
14 @options = options
15 get_metadata unless options[:lazy_load]
16 end
17 end
18 end
Notice that if you pass :lazy_load => true and the bucket parameter is already an instance of the Bucket class, no HTTP requests will be made at all.
All requests to the server are made using the S3Object::object_request method. This takes care of all the error handling and makes sure that the :lazy_load option is not sent up to the server.
Example 5.8. s3_object_object_request.rb - S3Object::object_request
<<(code/the_s3object_class_api_recipe/s3_object_object_request.rb)
The value method is used to get the value of an object. It checks whether the value has already been retrieved, returns the cached value if it has, and downloads the value from Amazon S3 if it hasn’t. You can refresh the value by passing :refresh => true when you request the value, or by calling the refresh method.
Example 5.9. s3_object_value.rb - getting the value of an object
1 module S3Lib
2
3 class S3Object
4
5 def value(params = {})
6 refresh if params[:refresh]
7 @value || get_value
8 end
9
10 def refresh
11 get_value
12 end
13
14 def get_value
15 request = S3Object.object_request(:get, url, @options)
16 @metadata = request.meta
17 @value = request.read
18 end
19
20 end
21 end
To change the value of an object, you just re-create the object.
Example 5.10. s3object_set_value.rb - changing the value of an object
1 def value=(value)
2 S3Object.create(@bucket, @key, value, @options)
3 @value = value
4 refresh_metadata
5 end
Here’s the full listing for the S3Object class.
Example 5.11. s3_object.rb
1 module S3Lib
2
3 class S3Object
4
5 DEFAULT_CONTENT_TYPE = 'binary/octet-stream'
6
7 attr_reader :key, :bucket
8
9 # This is just an alias for S3Object.new
10 def self.find(bucket, key, options = {})
11 S3Object.new(bucket, key, options)
12 end
13
14 def self.create(bucket, key, value = "", options = {})
15 # translate from :access to 'x-amz-acl'
16 options['x-amz-acl'] = options.delete(:access) if options[:access]
17 options.merge!({:body => value || "",
18 'content-type' => DEFAULT_CONTENT_TYPE})
19 response = S3Object.object_request(:put, S3Object.url(bucket, key),
20 options)
21 response.status[0] == "200" ?
22 S3Object.new(bucket, key, options) : false
23 end
24
25 # Delete an object given the object's bucket and key.
26 # No error will be raised if the object does not exist.
27 def self.delete(bucket, key, options = {})
28 S3Object.object_request(:delete, S3Object.url(bucket, key), options)
29 end
30
31 def delete
32 S3Object.delete(@bucket, @key, @options)
33 end
34
35 def self.value(bucket, key, options = {})
36 request = S3Object.object_request(:get, S3Object.url(bucket, key),
37 options)
38 request.read
39 end
40
41 # Both metadata and value are loaded lazily if options[:lazy_load]
42 # is true. This is used by Bucket.find so you don't make a request
43 # for every object in the bucket.
44 # The bucket can be either a bucket object or a string containing
45 # the bucket's name.
46 # The key is a string.
47 def initialize(bucket, key, options = {})
48 bucket = Bucket.find(bucket) unless bucket.respond_to?(:name)
49 @bucket = bucket
50 @key = key
51 @options = options
52 get_metadata unless options[:lazy_load]
53 end
54
55 # bucket can be either a Bucket object or a string containing
56 # the bucket's name
57 def self.url(bucket, key)
58 bucket_name = bucket.respond_to?(:name) ? bucket.name : bucket
59 File.join(bucket_name, key)
60 end
61
62 def url
63 S3Object.url(@bucket.name, @key)
64 end
65
66 def metadata
67 @metadata || get_metadata
68 end
69
70 def value(params = {})
71 refresh if params[:refresh]
72 @value || get_value
73 end
74
75 def value=(value)
76 S3Object.create(@bucket, @key, value, @options)
77 @value = value
78 refresh_metadata
79 end
80
81 def refresh
82 get_value
83 end
84
85 def refresh_metadata
86 get_metadata
87 end
88
89 def content_type
90 metadata["content-type"]
91 end
92
93 def etag
94 metadata["etag"]
95 end
96
97 def length
98 metadata["content-length"].to_i
99 end
100
101 private
102
103 def self.object_request(verb, url, options = {})
104 begin
105 options.delete(:lazy_load)
106 response = S3Lib.request(verb, url, options)
107 rescue S3Lib::S3ResponseError => error
108 case error.amazon_error_type
109 when 'NoSuchBucket'
110 raise S3Lib::BucketNotFoundError.new(
111 "The bucket '#{url.split('/').first}' does not exist.",
112 error.io, error.s3requester)
113 when 'NotSignedUp'
114 raise S3Lib::NotYourBucketError.new(
115 "The bucket '#{url.split('/').first}' is owned by somebody else",
116 error.io, error.s3requester)
117 when 'AccessDenied'
118 raise S3Lib::NotYourBucketError.new(
119 "The bucket '#{url.split('/').first}' is owned by someone else.",
120 error.io, error.s3requester)
121 when 'MissingContentLength'
122 raise S3Lib::NoContentError.new(
123 "You must provide a value to put in the object.\nUsage: " +
124 "S3Lib::S3Object.create(bucket, key, value, options)",
125 error.io, error.s3requester)
126 else # Re-raise the error if it's not one of the above
127 raise
128 end
129 end
130 response
131 end
132
133 def get_metadata
134 request = S3Object.object_request(:head, url, @options)
135 @metadata = request.meta
136 end
137
138 def get_value
139 request = S3Object.object_request(:get, url, @options)
140 @metadata = request.meta
141 @value = request.read
142 end
143
144 end
145
146 end
Discussion
To close up this section, let’s look at a few things I haven’t covered yet. First, note that get_metadata makes a :head request to the server, so you don’t have to download the full object just to get the object’s metadata. Also, some convenience methods, content_type, etag and length, have been made to give you slightly cleaner access to the object’s metadata. Finally, the S3Object::delete method makes a :delete request to the object’s URL.
That’s it! This isn’t a feature-complete S3Object class, but it gives you a good start. Notably missing is any ability to change the access control of an object. This will be discussed in the next few recipes.
Reading a Bucket or Object’s Access Control Policy
The Problem
You want to add the ability to get information about a bucket or object’s Access Control Policy to your API.
The Solution
As discussed in “Access Control Policies”, you read a bucket or object’s Access Control List by doing an authenticated GET request to the bucket or object’s URL with ?acl appended to it. For example, a bucket called my_test_bucket has a URL of http://s3.amazonaws.com/my_test_bucket, and you read its ACL by doing an authenticated GET request to http://s3.amazonaws.com/my_test_bucket?acl. To read the ACL of a bucket or object, you must have READ access to the bucket or object’s ACL.
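The URL manipulation itself is simple enough to sketch. Here’s a tiny helper (the name acl_url is hypothetical; the real library builds the URL inline) that strips any trailing slash from a resource’s URL and appends ?acl:

```ruby
# Hypothetical helper: build the ACL URL for a resource.
# Strips a trailing slash, then appends the ?acl query string.
def acl_url(resource_url)
  resource_url.sub(/\/\Z/, '') + '?acl'
end

acl_url('my_test_bucket')          # => "my_test_bucket?acl"
acl_url('my_test_bucket/my_key/')  # => "my_test_bucket/my_key?acl"
```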
To avoid writing ‘bucket or object’ over and over in this section, I’m going to use resource to refer to both buckets and objects.
ACLs, permissions and grants were discussed rather exhaustively in “Access Control Policies”. Here’s a brief refresher. An ACL consists of a list of grants. Each grant gives a permission to a grantee. The permission can be one of READ, WRITE, READ_ACL, WRITE_ACL or FULL_CONTROL. The grantee can be one of four types: user by email, user by canonical representation, all AWS users, or anyone (anonymous access).
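For concreteness, the XML that Amazon S3 returns from a GET on the ?acl URL looks roughly like this (the DisplayName and the grant shown here are invented for illustration; the canonical ID is the one from the examples in this chapter):

```xml
<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f</ID>
    <DisplayName>spatten</DisplayName>
  </Owner>
  <AccessControlList>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="CanonicalUser">
        <ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f</ID>
      </Grantee>
      <Permission>FULL_CONTROL</Permission>
    </Grant>
  </AccessControlList>
</AccessControlPolicy>
```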
To model this in our API, we’ll create two new classes: the Acl class and the Grant class. You’ll never instantiate or interact with the Grant class directly; it will just be used by the Acl class to store and manipulate the ACL’s grants. To create a new instance of an Acl, we’ll make a call to Acl.new, passing in the URL of the resource that we want the ACL for.
1 $> irb -rubygems -r s3lib
2 >> S3Lib::Acl.new('spatten_test_bucket')
The Acl.new method won’t do anything else. Any calls to Amazon S3 will be deferred until a request to see the list of the Acl’s grants is made. This will be done using the Acl#grants method. When Acl#grants is called, a check will be made to see if the ACL XML has been downloaded from Amazon S3 yet. If it hasn’t, then an authenticated GET request will be made to the ACL’s URL and the XML will be parsed and used to create a set of Grant objects. Here’s the code for the Acl object.
Example 5.12. acl.rb <<(code/reading_a_bucket_or_objects_access_control_policy_api_recipe/acl.rb)
The grants method checks to see if the array of Grant objects has been cached in the @grants instance variable or not. If it has, it just returns it. Otherwise, it calls get_grants. Most of the action is in the get_grants method. It GETs the ACL XML response from Amazon S3 and then uses REXML’s XPath support to get a list of all the <Grant> elements in the response. Each <Grant> element is parsed to get the permission and the grantee element, and then used to create a new Grant object. Here’s the code for the Grant object.
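Since acl.rb is pulled in from a file, here’s a stripped-down sketch of the XPath step, runnable against a canned response (the XML here is trimmed for illustration; in the real get_grants it comes back from Amazon S3):

```ruby
require 'rexml/document'
require 'rexml/xpath'

# A canned, trimmed-down ACL response for demonstration purposes.
acl_xml = <<-XML
<AccessControlPolicy>
  <AccessControlList>
    <Grant>
      <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="CanonicalUser">
        <ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f</ID>
      </Grantee>
      <Permission>FULL_CONTROL</Permission>
    </Grant>
  </AccessControlList>
</AccessControlPolicy>
XML

doc = REXML::Document.new(acl_xml)
# Find every <Grant> element, then pull out the permission string
# and the grantee's xsi:type attribute.
grants = REXML::XPath.match(doc, '//Grant').map do |grant|
  permission = grant.elements['Permission'].text
  grantee = grant.elements['Grantee']
  [permission, grantee.attributes['xsi:type']]
end
grants  # => [["FULL_CONTROL", "CanonicalUser"]]
```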
Example 5.13. grant.rb <<(code/reading_a_bucket_or_objects_access_control_policy_api_recipe/grant.rb)
This is pretty straightforward: it takes the permission String and the grantee XML and parses them to assign the proper values to the @permission, @type and @grantee instance variables.
Okay, now all of the pieces are in place to read a resource’s grants. Let’s write a simple script to try it out.
Example 5.14. grant_reader.rb <<(code/reading_a_bucket_or_objects_access_control_policy_api_recipe/grant_reader.rb)
Try it out by passing in the URL of a resource you control:
1 $> ruby grant_reader.rb spatten_test_bucket
2 Grants for spatten_test_bucket
3 full_control, 9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
4 read, all_s3
Discussion
We can now read the grants of a resource, but there’s still more to do before we have full ACL functionality. We’ll need a way to refresh the cached list of grants if it changes. We’ll obviously want to be able to create and delete grants as well. Finally, we’ll want to add an acl method to the S3Object and Bucket classes so that we can do things like some_bucket.acl.grants. These requirements are covered in the next few recipes.
Refreshing the Cached ACL
The Problem
You want to be able to refresh the cached grants of an Acl instance in your API.
The Solution
This is pretty straightforward. We’re going to add the Acl#refresh_grants method to the Acl class and change the Acl#grants method so that it refreshes the @grants instance variable if you pass :refresh => true to it. The changes are in bold below.
Example 5.15. acl.rb <<(code/refreshing_the_cached_acl_api_recipe/acl.rb)
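Since the included listing isn’t reproduced inline, the caching pattern is easy to demonstrate in isolation. Here FakeAcl is a stand-in class (not part of the library); get_grants plays the role of the method that actually downloads and parses the ACL from Amazon S3:

```ruby
# A minimal sketch of the caching-with-refresh pattern used by Acl#grants.
class FakeAcl
  attr_reader :fetch_count

  def initialize
    @fetch_count = 0
  end

  def grants(params = {})
    return get_grants if params[:refresh]
    @grants ||= get_grants
  end

  private

  # Stands in for the method that GETs and parses the ACL XML.
  def get_grants
    @fetch_count += 1
    @grants = [:some_grant]
  end
end

acl = FakeAcl.new
acl.grants                    # first call downloads
acl.grants                    # cached; no new request
acl.grants(:refresh => true)  # forces a new download
acl.fetch_count               # => 2
```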
Discussion
Now your API makes a request to Amazon S3 only the first time you ask it about the ACL of a resource, but still lets you make the call again when you want to be sure you have the latest copy.
Creating a New Grant
The Problem
Your API reads the ACL of a resource, but you also want to be able to add new grants to a resource.
The Solution
Adding a grant to a resource’s ACL is done by PUTting a new ACL to the resource’s ACL URL. The procedure for adding a grant, then, is to:
- GET the resource’s ACL from the resource’s ACL URL
- Add a new grant to the ACL
- PUT the new ACL to the resource’s ACL URL
This is going to involve quite a few changes to the Acl and Grant classes. First, we’re going to have to change the Grant#initialize method to allow the creation of a new Grant with the grantee as a Hash, rather than from the XML you get from Amazon S3. This isn’t strictly necessary, but it would sure be annoying to the users of your API if they had to create the XML representation every time they wanted to add a new grant. We’ll also need another method on Grant that converts a Grant instance to an XML representation.
The Acl class will require a few changes as well. First, we’ll add the Acl#add_grant method. It will take a permission and a Hash containing information about the grantee as arguments, use those arguments to create a new Grant instance, and add that instance to the @grants instance variable. We’ll also need to change the Acl::acl_request method to make sure the content-type is set when you PUT the ACL XML up to Amazon S3. Finally, we’ll need to add an Acl#to_xml method as well to create the XML representation of the ACL.
The Grant Class
Here’s the new Grant class:
1 module S3Lib
2 class Grant
3 attr_reader :acl, :grantee, :type, :permission
4 GRANT_TYPES = {:canonical => 'CanonicalUser',
5 :email => 'AmazonCustomerByEmail',
6 :all_s3 => 'Group',
7 :public => 'Group'}
8 GROUP_URIS = {
9 'http://acs.amazonaws.com/groups/global/AuthenticatedUsers' =>
10 :all_s3,
11 'http://acs.amazonaws.com/groups/global/AllUsers' => :public}
12 PERMISSIONS = [:read, :write, :read_acl, :write_acl, :full_control]
13 NAMESPACE_URI = 'http://www.w3.org/2001/XMLSchema-instance'
14
15 # Create a new grant.
16 # permission is one of the PERMISSIONS defined above
17 # grantee can be either a REXML::Document object or a Hash
18 # The grantee Hash should look like this:
19 # {:type => :canonical|:email|:all_s3|:public,
20 # :grantee => canonical_user_id | email_address}
21 #
22 # The :grantee element of the hash is only required (and meaningful)
23 # for :canonical and :email Grants
24 def initialize(permission, grantee)
25 @type = parse_type(grantee)
26 @permission = parse_permission(permission)
27 @grantee = parse_grantee(grantee)
28 end
29
30 def to_xml
31 builder = Builder::XmlMarkup.new(:indent => 2)
32 xml = builder.Grant do
33 builder.Grantee('xmlns:xsi' => NAMESPACE_URI,
34 'xsi:type' => GRANT_TYPES[@type]) do
35 case type
36 when :canonical then builder.ID(@grantee)
37 when :email then builder.EmailAddress(@grantee)
38 when :all_s3 then builder.URI(group_uri_from_group_type(:all_s3))
39 when :public then builder.URI(group_uri_from_group_type(:public))
40 else
41 end
42 end
43 builder.Permission(@permission.to_s.upcase)
44 end
45 end
46
47 private
48
49 # permission can either be the String provided by S3
50 # or a symbol (see the PERMISSIONS array for allowed values)
51 def parse_permission(permission)
52 if permission.is_a?(String)
53 permission.downcase.to_sym
54 else
55 permission
56 end
57 end
58
59 def parse_type(grantee)
60 if grantee.is_a?(Hash)
61 grantee[:type]
62 else # Assume it's a REXML::Doc object
63 type = grantee.attributes['xsi:type']
64 case type
65 when 'CanonicalUser' then :canonical
66 when 'AmazonCustomerByEmail' then :email
67 when 'Group'
68 group_uri = grantee.elements['URI'].text
69 group_type_from_group_uri(group_uri)
70 else
71 raise BadGrantTypeError
72 end
73 end
74 end
75
76 def parse_grantee(grantee)
77 if grantee.is_a?(Hash)
78 if [:canonical, :email].include?(@type)
79 grantee[:grantee]
80 else
81 @type
82 end
83 else # it's a REXML::Doc object
84 case @type
85 when :canonical
86 grantee.elements['ID'].text
87 when :email
88 grantee.elements['EmailAddress'].text
89 when :all_s3 then :all_s3
90 when :public then :public
91 else
92 nil
93 end
94 end
95 end
96
97 def group_type_from_group_uri(group_uri)
98 GROUP_URIS[group_uri]
99 end
100
101 def group_uri_from_group_type(group_type)
102 GROUP_URIS.invert[group_type]
103 end
104
105 end
106 end
There are a lot of changes here. Let’s break it down a bit. The parse_type and parse_grantee methods have been amended so that the grantee and type information can be passed in as either a Hash or a REXML object. There’s not much to say here: the methods check whether the argument that has been passed in is a Hash and, if it is, parse the Hash. If it’s not, they assume that it’s a REXML object and parse that.
The next change is the addition of the group_type_from_group_uri and group_uri_from_group_type methods. These two methods allow you to get the group type from the group URI and vice versa. Previously, we only needed to get the group type from the group URI, so we used the GROUP_URIS hash to get the group type. Now that we’re reading and writing grants, we need to convert in both directions.
The last change is the biggest and, if you’re new to crazy dynamic languages like Ruby, the strangest looking. The to_xml method uses the Builder library to build the grant’s XML representation. If you’re interested, there’s more information on how Builder works in The XML Builder Library. For now, you can just take it for granted that it creates XML output like this:
1 <Grant>
2 <Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3 xsi:type="CanonicalUser">
4 <ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f</ID>
5 </Grantee>
6 <Permission>FULL_CONTROL</Permission>
7 </Grant>
The Acl Class
Here’s the new Acl class
Example 5.21. acl.rb <<(code/creating_a_new_grant_api_recipe/acl.rb)
Once again, there were a lot of changes. We added the add_grant, add_grant!, set_grants, owner and to_xml methods. We also changed the acl_request method to set the content-type if the request is a PUT.
Let’s start by looking at the add_grant, set_grants and add_grant! methods. The add_grant method is a one-liner: it creates a new instance of a Grant object and pushes it on to the @grants Array. Note that it pushes on to the grants method rather than the @grants instance variable directly, so that it can be sure the grants have been downloaded from S3 first.
The set_grants method takes the XML representation of the ACL and PUTs it up to Amazon S3. Notice that if you use add_grant to add a new Grant to a resource, your changes will be lost unless you make a subsequent call to set_grants.
That’s where add_grant! comes in. It adds a grant and then pushes the change to Amazon S3. The ! is Ruby short-hand for ‘this does something more dangerous than the method without the exclamation mark’ (see http://dablog.rubypal.com/2007/8/15/bang-methods-or-danger-will-rubyist for a deeper discussion on this). So, add_grant makes a local change to the resource’s ACL and add_grant! makes a permanent change.
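The relationship between the three methods can be sketched with a simplified stand-in class (not the real Acl; here set_grants just records that a push happened, where the real one PUTs the ACL XML to S3):

```ruby
# Simplified stand-in for Acl, illustrating add_grant vs. add_grant!.
class AclSketch
  attr_reader :grants, :pushed

  def initialize
    @grants = []
    @pushed = false
  end

  # Local change only: the new grant lives in memory until set_grants runs.
  def add_grant(permission, grantee)
    @grants << [permission, grantee]
  end

  # In the real class this PUTs the ACL XML up to Amazon S3.
  def set_grants
    @pushed = true
  end

  # Local change plus an immediate push to the server.
  def add_grant!(permission, grantee)
    add_grant(permission, grantee)
    set_grants
  end
end

acl = AclSketch.new
acl.add_grant(:read, :type => :public)    # not yet visible on S3
acl.pushed                                # => false
acl.add_grant!(:write, :type => :all_s3)  # pushed immediately
acl.pushed                                # => true
```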
The to_xml method once again uses the Builder library to create the XML representation of the ACL. It uses the owner method to grab the Canonical ID of the Owner from the original ACL’s XML representation. One subtle thing that wasn’t mentioned in The XML Builder Library is that, since the return value from the Builder::XmlMarkup instance is just a string, you can actually just push new data into the string using <<. Acl#to_xml uses this to push the XML representation of the Acl’s Grants into the XML.
Discussion
Wow, that was a lot of work. We’re almost done with the Acl. The last step will be to tie the Acl class to the S3Object and Bucket classes.
Tying the Acl Class to a Bucket or Object
The Problem
You have a nice Acl class, but your Buckets and Objects don’t know anything about it. You want to be able to do things like
1 b = S3Lib::Bucket.find('spatten_test_bucket')
2 b.acl.add_grant(:read, :type => :all_s3)
The Solution
We need to make (small) changes to three classes: Bucket, S3Object and Acl. The changes to Bucket and S3Object will be equivalent. We’ll add two public methods, acl and refresh_acl, and one private method, get_acl. On the Acl class, we’ll change the initialize method so that you can pass in a Bucket or S3Object instance as well as a resource’s URL. We’ll also add a @parent instance variable and make it readable. This is the final code for all three classes, so I’m going to include every line of it rather than just the changes. Let’s start with the Acl class.
Example 5.22. acl.rb <<(code/tieing_the_acl_class_to_a_bucket_or_object_recipe/acl.rb)
Next, the Bucket class
Example 5.23. bucket.rb <<(code/tieing_the_acl_class_to_a_bucket_or_object_recipe/bucket.rb)
and, finally, the S3Object class
Example 5.24. object.rb
1 # s3_object.rb
2
3 module S3Lib
4
5 class S3Object
6
7 DEFAULT_CONTENT_TYPE = 'binary/octet-stream'
8
9 attr_reader :key, :bucket
10
11 # This is just an alias for S3Object.new
12 def self.find(bucket, key, options = {})
13 S3Object.new(bucket, key, options)
14 end
15
16 def self.create(bucket, key, value = "", options = {})
17 # translate from :access to 'x-amz-acl'
18 options['x-amz-acl'] = options.delete(:access) if options[:access]
19 options.merge!({:body => value || "",
20 'content-type' => DEFAULT_CONTENT_TYPE})
21 response = S3Object.object_request(:put,
22 S3Object.url(bucket, key),
23 options)
24 response.status[0] == "200" ?
25 S3Object.new(bucket, key, options) : false
26 end
27
28 # Delete an object given the object's bucket and key.
29 # No error will be raised if the object does not exist.
30 def self.delete(bucket, key, options = {})
31 S3Object.object_request(:delete, S3Object.url(bucket, key), options)
32 end
33
34 def delete
35 S3Object.delete(@bucket, @key, @options)
36 end
37
38 def self.value(bucket, key, options = {})
39 request = S3Object.object_request(:get,
40 S3Object.url(bucket, key),
41 options)
42 request.read
43 end
44
45 # Both metadata and value are loaded lazily if options[:lazy_load]
46 # is true. This is used by Bucket.find so you don't make a request
47 # for every object in the bucket
48 # The bucket can be either a bucket object or a string containing
49 # the bucket's name
50 # The key is a string.
51 def initialize(bucket, key, options = {})
52 bucket = Bucket.find(bucket) unless bucket.respond_to?(:name)
53 @bucket = bucket
54 @key = key
55 @options = options
56 get_metadata unless options[:lazy_load]
57 end
58
59 # bucket can be either a Bucket object or a string containing
60 # the bucket's name
61 def self.url(bucket, key)
62 bucket_name = bucket.respond_to?(:name) ? bucket.name : bucket
63 File.join(bucket_name, key)
64 end
65
66 def url
67 S3Object.url(@bucket.name, @key)
68 end
69
70 def metadata
71 @metadata || get_metadata
72 end
73
74 def value(params = {})
75 refresh if params[:refresh]
76 @value || get_value
77 end
78
79 def value=(value)
80 S3Object.create(@bucket, @key, value, @options)
81 @value = value
82 refresh_metadata
83 end
84
85 def refresh
86 get_value
87 end
88
89 def refresh_metadata
90 get_metadata
91 end
92
93 def refresh_acl
94 get_acl
95 end
96
97 def content_type
98 metadata["content-type"]
99 end
100
101 # strip off the leading and trailing double-quotes
102 def etag
103 metadata["etag"].sub(/\A\"/,'').sub(/\"\Z/, '')
104 end
105
106 def length
107 metadata["content-length"].to_i
108 end
109
110 def acl(params = {})
111 refresh_acl if params[:refresh]
112 @acl || get_acl
113 end
114
115 private
116
117 def self.object_request(verb, url, options = {})
118 begin
119 options.delete(:lazy_load)
120 response = S3Lib.request(verb, url, options)
121 rescue S3Lib::S3ResponseError => error
122 case error.amazon_error_type
123 when 'NoSuchBucket'
124 raise S3Lib::BucketNotFoundError.new(
125 "The bucket '#{url.split('/').first}' does not exist.",
126 error.io, error.s3requester)
127 when 'NotSignedUp'
128 raise S3Lib::NotYourBucketError.new(
129 "The bucket '#{url.split('/').first}' is owned by somebody else",
130 error.io, error.s3requester)
131 when 'AccessDenied'
132 raise S3Lib::NotYourBucketError.new(
133 "The bucket '#{url.split('/').first}' is owned by someone else.",
134 error.io, error.s3requester)
135 when 'MissingContentLength'
136 raise S3Lib::NoContentError.new(
137 "You must provide a value to put in the object.\nUsage: " +
138 "S3Lib::S3Object.create(bucket, key, value, options)",
139 error.io, error.s3requester)
140 else # Re-raise the error if it's not one of the above
141 raise
142 end
143 end
144 response
145 end
146
147 def get_metadata
148 request = S3Object.object_request(:head, url, @options)
149 @metadata = request.meta
150 end
151
152 def get_value
153 request = S3Object.object_request(:get, url, @options)
154 @metadata = request.meta
155 @value = request.read
156 end
157
158 def get_acl
159 @acl = Acl.new(self)
160 end
161
162 end
163
164 end
Discussion
There are a few things worth noting in this code. First, let’s take a closer look at the Acl#initialize method.
1 def initialize(parent_or_url)
2 if parent_or_url.respond_to?(:url)
3 @parent = parent_or_url
4 @url = @parent.url.sub(/\/\Z/,'') + '?acl'
5 else
6 @url = parent_or_url.sub(/\/\Z/,'').sub(/\?acl/, '') + '?acl'
7 end
8 end
Notice that the Acl#initialize method doesn’t care what class the parent is. It just cares whether or not it has a url method. This is known, in Ruby circles, as duck typing: if it looks like a duck and acts like a duck, then treat it like a duck. You could, of course, re-write the method as
1 def initialize(parent_or_url)
2 if parent_or_url.is_a?(S3Lib::Bucket) || parent_or_url.is_a?(S3Lib::S3Object)
3 @parent = parent_or_url
4 @url = @parent.url.sub(/\/\Z/,'') + '?acl'
5 else
6 @url = parent_or_url.sub(/\/\Z/,'').sub(/\?acl/, '') + '?acl'
7 end
8 end
Either one works. The first method is more flexible in the event that Amazon creates another resource type that has an Acl.
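To see the duck typing at work without a real Bucket, anything with a url method will do; OpenStruct makes a convenient stand-in. (acl_url_for is a hypothetical free function mirroring the logic of Acl#initialize, used here purely for illustration.)

```ruby
require 'ostruct'

# Mirrors the branch in Acl#initialize: accept anything with a url method,
# or fall back to treating the argument as a URL string.
def acl_url_for(parent_or_url)
  if parent_or_url.respond_to?(:url)
    parent_or_url.url.sub(/\/\Z/, '') + '?acl'
  else
    parent_or_url.sub(/\/\Z/, '').sub(/\?acl/, '') + '?acl'
  end
end

bucket_like = OpenStruct.new(:url => 'spatten_test_bucket/')
acl_url_for(bucket_like)            # => "spatten_test_bucket?acl"
acl_url_for('spatten_test_bucket')  # => "spatten_test_bucket?acl"
```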
So, that was one way that I made my code a bit more flexible. Now, let’s talk about a way that I could have made it a bit more flexible (but decided not to). Notice that the methods that I added to Bucket and S3Object are exactly the same. Instead of writing them directly in each class, I could have made a module with those methods in it and then mixed it into the Bucket and S3Object classes. Something like this
Example 5.25. acl_access.rb <<(code/tieing_the_acl_class_to_a_bucket_or_object_recipe/acl_access.rb)
Then, in the Bucket and S3Object classes, I could delete the acl, refresh_acl and get_acl methods and add the following line:
1 include S3Lib::AclAccess
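Since acl_access.rb is pulled in from a file, here’s a sketch of what such a module plausibly looks like, with the method bodies lifted from the Bucket and S3Object versions above:

```ruby
module S3Lib
  # Shared ACL accessors, intended to be mixed in to Bucket and S3Object.
  module AclAccess
    def acl(params = {})
      refresh_acl if params[:refresh]
      @acl || get_acl
    end

    def refresh_acl
      get_acl
    end

    private

    # Acl#initialize only needs the parent to respond to url, so any
    # including class with a url method works here.
    def get_acl
      @acl = Acl.new(self)
    end
  end
end
```

Any class that includes S3Lib::AclAccess and responds to url gets lazy, cached access to its ACL.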
My own personal rule of thumb is that I don’t bother extracting functionality the first time I repeat something, so I didn’t bother doing this when I wrote the original code.