The S3 API

Introduction

Building the authentication library was a bit of a grind, but now that you have that done, you can jump into the fun stuff: building a library that talks to S3. The library that we’re going to build is not complete. It does most of the things that you will need when you start using Amazon S3, but it doesn’t cover some notable features of S3, including Logging and Query String Request Authentication. Also, we’ll only be talking about the REST API.

Before we begin, let’s talk very briefly about the philosophy of the API we’re building. My goal is to make the API feel like you’re working with standard Ruby classes, and to pretty much hide the fact that you’re working with S3. You may violently disagree with this. That’s okay; you’ll still be able to take a look at the code we’re going to build and create something that shows the plumbing a bit more.

Here’s an example of what using the API will look like. It shows off the three main classes that are implemented: Bucket, Object and Acl.

Example 5.1. api_example.rb - An example of using the S3Lib API <<(/code/introduction_api/api_example.rb)
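
The listing above is pulled in from an external file. As a rough illustration only (not the actual api_example.rb), here’s the kind of thing the finished classes let you do, assuming a bucket name that isn’t already taken:

 1 require 's3lib'
 2 
 3 # Create a bucket and put an object in it
 4 S3Lib::Bucket.create('my_example_bucket')
 5 S3Lib::S3Object.create('my_example_bucket', 'hello.txt', 'Hello, S3!')
 6 
 7 # Look the bucket up and read the object back
 8 bucket = S3Lib::Bucket.find('my_example_bucket')
 9 puts bucket.objects.collect { |object| object.key }
10 puts bucket.objects.first.value
11 
12 # Inspect the bucket's access control list
13 bucket.acl.grants.each do |grant|
14   puts "#{grant.permission}, #{grant.grantee}"
15 end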

If you’ve used Marcel Molina’s AWS/S3 library, then you’re probably feeling a sense of déjà vu here. That’s totally on purpose: Marcel has implemented a beautiful interface to S3, and I saw no reason to try to redo any of his work. So, I purposefully copied his interface. If you are just using a library and feel no need to create one, then I highly recommend using Marcel’s library instead of the one we’re creating here. It’s much more complete than this one, and has been used and tested by many users. It also has great documentation. You can find it at http://amazon.rubyforge.org.

Listing All of Your Buckets

The Problem

You finally have authentication working and you’re chomping at the bit to try it out. You decide to start with the simplest request possible: getting a list of all of your buckets.

The Solution

You get a list of all of your buckets by making an authenticated GET request to the root of the Amazon S3 service’s URL. You can get a sample of the XML response by making that GET request using S3Lib.request:

 1 $> irb -r lib/s3lib
 2 >> response = S3Lib.request(:get, '')
 3 >> puts response.read
 4 <?xml version="1.0" encoding="UTF-8"?>
 5 <ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
 6 	<Owner>
 7 		<ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f</ID>
 8 		<DisplayName>scottpatten</DisplayName>
 9 	</Owner>
10 	<Buckets>
11 		<Bucket>
12 			<Name>assets0.plotomatic.com</Name>
13 			<CreationDate>2007-09-06T16:25:25.000Z</CreationDate>
14 		</Bucket>
15 		<Bucket>
16 			<Name>assets1.plotomatic.com</Name>
17 			<CreationDate>2007-09-06T16:53:18.000Z</CreationDate>
18 		</Bucket>
19 
20 		...			
21 
22 		<Bucket>
23 			<Name>zunior_bucket</Name>
24 			<CreationDate>2008-07-27T18:31:07.000Z</CreationDate>
25 		</Bucket>
26 	</Buckets>
27 </ListAllMyBucketsResult>

Note that the response only includes the Name and CreationDate for each bucket. Doing a GET on a given bucket will give us a lot more information, but we’ll deal with that later. Our goal for now will be to write an S3Lib::Service.buckets method that returns an array of buckets. Since we haven’t written the Bucket class yet, we’ll just stub it out with something that takes a Bucket XML element and parses that element to find out the name of the bucket. Here’s a first step:

Example 5.2. service.rb <<(code/listing_all_of_your_buckets_api_recipe/service.rb)

So, we make an authenticated GET request to the root URL and pass the result to REXML::Document.new. We then use REXML::XPath to get all of the Bucket elements in the response and create a new Bucket instance for each Bucket element. Finally, we return an Array of Bucket instances, one for each bucket you own.
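
The service.rb listing is also included from an external file. Here’s a minimal sketch of what it might contain, assuming the S3Lib.request helper from the authentication chapter and the stubbed-out Bucket class described above (this stub is not the real Bucket class we build later):

 1 # service.rb (sketch)
 2 require File.join(File.dirname(__FILE__), 's3_authenticator')
 3 require 'rexml/document'
 4 
 5 module S3Lib
 6 
 7   class Service
 8 
 9     # Returns an Array of Bucket objects, one for each bucket you own
10     def self.buckets
11       response = S3Lib.request(:get, '')
12       doc = REXML::Document.new(response)
13       REXML::XPath.match(doc, '//Bucket').collect do |bucket_element|
14         Bucket.new(bucket_element)
15       end
16     end
17 
18   end
19 
20   # Stand-in for the real Bucket class: it just pulls the bucket's
21   # name out of a <Bucket> element from the ListAllMyBucketsResult
22   class Bucket
23     attr_reader :name
24 
25     def initialize(bucket_element)
26       @name = bucket_element.elements['Name'].text
27     end
28   end
29 
30 end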

Discussion

Let’s try this class out in an irb session.

1 $> irb -r lib/service.rb 
2 >> S3Lib::Service.buckets.collect {|bucket| bucket.name}
3 => ["assets0.plotomatic.com", "assets1.plotomatic.com", ..., "zunior_bucket"]

Hey, it works! That wasn’t too much work, and we’ve established a pattern that we’ll see repeated throughout the API when we’re getting information about a list of things: make a request to get the XML representation of that list of things, then use XPath to grab the correct elements from that request.

Finding a Bucket

The Problem

Given the name of a bucket you own or have read access to, you want to be able to get information about the bucket, including the bucket’s name and the objects it contains.

The Solution

To get information about a bucket, you make a GET request to the bucket’s name, like this:

1 GET /spatten_bucket
2 Host: s3.amazonaws.com
3 Content-Length: 0
4 Date: Wed, 13 Feb  2008 12:00:00 GMT
5 Authorization: AWS some_id:some_authentication_string

To make the authenticated request using the s3_authenticator library:

1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3 
4 response = S3Lib.request(:get,'/spatten_bucket')

The XML response

When you get a bucket, the body of the response will contain XML describing the bucket.

 1 $> irb -r s3_authenticator.rb 
 2 >> response = S3Lib.request(:get, 'spatten_bucket')
 3 => #<StringIO:0x164df88>
 4 >> puts response.read
 5 		<?xml version="1.0" encoding="UTF-8"?>
 6 		<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>spatte\
 7 n_bucket</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1000</MaxKeys><IsTrunc\
 8 ated>false</IsTruncated><Contents><Key>file2</Key><LastModified>2008-03-26T22:54\
 9 :30.000Z</LastModified><ETag>&quot;1c1c96fd2cf8330db0bfa936ce82f3b9&quot;</ETag>\
10 <Size>5</Size><Owner><ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91\
11 b50c0de0f</ID><DisplayName>scottpatten</DisplayName></Owner><StorageClass>STANDA\
12 RD</StorageClass></Contents><Contents><Key>some_object.txt</Key><LastModified>20\
13 08-02-20T22:39:10.000Z</LastModified><ETag>&quot;964c5260427cee786af075b68828558\
14 c&quot;</ETag><Size>25</Size><Owner><ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72\
15 d769214012f7e91b50c0de0f</ID><DisplayName>scottpatten</DisplayName></Owner><Stor\
16 ageClass>STANDARD</StorageClass></Contents><Contents><Key>test1</Key><LastModifi\
17 ed>2008-03-26T22:52:44.000Z</LastModified><ETag>&quot;5a105e8b9d40e1329780d62ea2\
18 265d8a&quot;</ETag><Size>5</Size><Owner><ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c64\
19 6f72d769214012f7e91b50c0de0f</ID><DisplayName>scottpatten</DisplayName></Owner><\
20 StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

Here’s that XML response formatted a bit more nicely:

 1 <?xml version="1.0" encoding="UTF-8"?>
 2 <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
 3 	<Name>spatten_bucket</Name>
 4 	<Prefix></Prefix>
 5 	<Marker></Marker>
 6 	<MaxKeys>1000</MaxKeys>
 7 	<IsTruncated>false</IsTruncated>
 8 	<Contents>
 9 		<Key>file2</Key>
10 		<LastModified>2008-03-26T22:54:30.000Z</LastModified>
11 		<ETag>&quot;1c1c96fd2cf8330db0bfa936ce82f3b9&quot;</ETag>
12 		<Size>5</Size>
13 		<Owner>
14 			<ID>
15 			  9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
16 			</ID>
17 			<DisplayName>scottpatten</DisplayName>
18 		</Owner>
19 		<StorageClass>STANDARD</StorageClass>
20 	</Contents>
21 	<Contents>
22 		<Key>some_object.txt</Key>
23 		<LastModified>2008-02-20T22:39:10.000Z</LastModified>
24 		<ETag>&quot;964c5260427cee786af075b68828558c&quot;</ETag>
25 		<Size>25</Size>
26 		<Owner>
27 		<ID>
28 		  9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
29 		</ID>
30 		<DisplayName>scottpatten</DisplayName>
31 		</Owner>
32 		<StorageClass>STANDARD</StorageClass>
33 	</Contents>
34 	<Contents>
35 		<Key>test1</Key>
36 		<LastModified>2008-03-26T22:52:44.000Z</LastModified>
37 		<ETag>&quot;5a105e8b9d40e1329780d62ea2265d8a&quot;</ETag>
38 		<Size>5</Size>
39 		<Owner>
40 			<ID>
41 			  9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
42 			</ID>
43 			<DisplayName>scottpatten</DisplayName>
44 		</Owner>
45 		<StorageClass>STANDARD</StorageClass>
46 	</Contents>
47 </ListBucketResult>

This bucket has three objects in it, with keys of file2, some_object.txt and test1. If it had more, then there would simply be more <Contents> elements, along with everything contained in them. If a bucket has no objects, then there will be no <Contents> elements.

The response includes the bucket name and any parameters sent to it, including Prefix, Marker, Delimiter and MaxKeys.

Errors

Trying to find a bucket that does not exist will raise an S3Lib::BucketNotFoundError. Trying to find a bucket that you don’t have read permission on will raise an S3Lib::NotYourBucketError.

Processing the XML response

We want the Bucket object to have getters for all of its attributes. This is done by the Bucket#initialize method. The objects will be instantiated (and cached) by the first call to the objects instance method.

Since a bucket can have thousands of objects, we don’t want to parse the objects every time you call the Bucket#objects method. We also don’t want to parse them when you first instantiate the bucket, as that would be a waste if you just wanted to find out, for example, whether the bucket already exists. To deal with that, we cache the objects in the @objects instance variable the first time the objects are asked for. If you want to refresh the objects listing, you can pass :refresh => true to the Bucket#objects method.

 1 	# bucket.rb
 2 
 3 	require File.join(File.dirname(__FILE__), 's3_authenticator')
 4 	require 'rexml/document'
 5 
 6 	module S3Lib
 7 
 8 	  class NotYourBucketError < S3Lib::S3ResponseError
 9 	  end
10 
11 	  class BucketNotFoundError < S3Lib::S3ResponseError
12 	  end
13 
14 	  class BucketNotEmptyError < S3Lib::S3ResponseError
15 	  end  
16 
17 	  class Bucket
18 
19 	    attr_reader :name, :xml, :prefix, :marker, :max_keys
20 
21 	    # Errors for find
22 	    # Trying to find a bucket that doesn't exist will raise a 
23 	    # NoSuchBucket error
24 	    # Trying to find a bucket that you don't have access to will raise a 
25 	    # NotSignedUp error
26 	    def self.find(name, params = {})
27 	     begin
28 	        response = S3Lib.request(:get, name)
29 	      rescue S3Lib::S3ResponseError => error
30 	        case error.amazon_error_type
31 	        when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '\
32 #{name}' does not exist.", error.io, error.s3requester)
33 	        when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{\
34 name}' is not owned by you", error.io, error.s3requester)
35 	        else # Re-raise the error if it's not one of the above
36 	          raise
37 	        end
38 	      end
39 	      doc = REXML::Document.new(response)
40 	      Bucket.new(doc, params)
41 	    end
42 
43 	    def initialize(doc, params = {})
44 	      @xml = doc.root
45 	      @params = params
46 	      @name = @xml.elements['Name'].text
47 	      @max_keys = @xml.elements['MaxKeys'].text.to_i
48 	      @prefix = @xml.elements['Prefix'].text
49 	      @marker = @xml.elements['Marker'].text
50 	    end
51 
52 	    def is_truncated?
53 	      @xml.elements['IsTruncated'].text == 'true'
54 	    end
55 
56 	    def objects(params = {})
57 	      refresh if params[:refresh]
58 	      @objects || get_objects
59 	    end
60 
61 	    def refresh
62 	      refreshed_bucket = Bucket.find(@name, @params)
63 	      @xml = refreshed_bucket.xml
64 	      @objects = nil
65 	    end
66 
67 	    private
68 
69 	    def get_objects
70 	      @objects = REXML::XPath.match(@xml, '//Contents').collect do |object|
71 	        key = object.elements['Key'].text
72 	        S3Lib::S3Object.new(self, key, :lazy_load => true)
73 	      end
74 	    end
75 
76 	  end
77 
78 	end

Discussion

This is a first cut of the Bucket object. In the next two recipes, we’ll add functionality to create and destroy buckets. These additions will create some repetition in the code, so the recipe after that will refactor the Bucket class and clean it up a bit. The use of caching and refreshing is something that we’ll be repeating a few times as we build our S3 API, so take a close look at it now.

Creating a Bucket

The Problem

You want to actually create buckets, not just read them. While you’re at it, you want the ability to set the access control policy of the new bucket to a canned access control policy like ‘Public Read’.

The Solution

To create a bucket, you make a PUT request to the bucket’s name, like this:

1 PUT /my_new_bucket
2 Host: s3.amazonaws.com
3 Content-Length: 0
4 Date: Wed, 13 Feb  2008 12:00:00 GMT
5 Authorization: AWS some_id:some_authentication_string

To make the authenticated request using the s3_authenticator library:

1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3 
4 response = S3Lib.request(:put,'/my_new_bucket')

Setting Access control

You can set the access control policy to one of the four canned access control policies during bucket creation. You do this by adding an ‘x-amz-acl’ header to the PUT request. So, to make your new bucket publicly readable you would set ‘x-amz-acl’ to ‘public-read’:

1 PUT /my_new_bucket
2 Host: s3.amazonaws.com
3 x-amz-acl: public-read
4 Content-Length: 0
5 Date: Wed, 13 Feb  2008 12:00:00 GMT
6 Authorization: AWS some_id:some_authentication_string

Using the s3_authenticator library:

1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3 
4 response = S3Lib.request(:put,'/my_new_bucket', 
5                          'x-amz-acl' => 'public-read')

To make the Bucket.create method a little more user-friendly, we’ll also allow you to set the access control policy using the :access symbol, like this:

1 Bucket.create('my_new_bucket', :access => 'public-read')

Errors

If you try to create a bucket that is already owned by someone else, Amazon S3 will respond with a 409 (“Conflict”) error:

1 $> irb -r s3_authenticator
2 >> S3Lib.request(:put, 'test')
3 S3Lib::S3ResponseError: 409 Conflict, BucketAlreadyExists
4         from ./s3_authenticator.rb:39:in `request'
5         from (irb):1

You can catch this in your code with a begin / rescue block.

Trying to create a bucket and failing is something that should raise an exception. Instead of keeping the default S3Lib::S3ResponseError, the library code will re-raise it as an S3Lib::NotYourBucketError.

 1 #!/usr/bin/env ruby
 2 require 's3_authenticator'
 3 
 4 begin
 5   response = S3Lib.request(:put,name)
 6 rescue S3Lib::S3ResponseError => error
 7   if error.io.status == ["409", "Conflict"]
 8     raise S3Lib::NotYourBucketError, "The bucket '#{name}' is already owned \
 9 by somebody else."
10   else
11     raise # re-raise the exception if it's not a 409 conflict
12   end
13 end

The Bucket.create method

Here’s what our Bucket.create method will look like. It takes a bucket name and creates the bucket. The optional params hash is a hash of headers to be sent along with the PUT request.

If you try to create a bucket that is already owned by somebody else, a S3Lib::NotYourBucket error will be raised.

If the bucket is created successfully, the method will return true.

 1 # s3_bucket.rb
 2 
 3 require File.join(File.dirname(__FILE__), 's3_authenticator')
 4 module S3Lib
 5 
 6   class NotYourBucketError < S3Lib::S3ResponseError
 7   end
 8 
 9   class Bucket
10 
11     def self.create(name, params = {})
12       if params[:access] # translate from :access to 'x-amz-acl'
13         params['x-amz-acl'] = params.delete(:access)
14       end
15       begin
16         response = S3Lib.request(:put, name, params)
17       rescue S3Lib::S3ResponseError => error
18         if error.amazon_error_type == "BucketAlreadyExists"
19           raise S3Lib::NotYourBucketError.new("The bucket '#{name}' is already o\
20 wned by somebody else", error.io, error.s3requester)
21         else
22           raise # re-raise the exception if it's not a BucketAlreadyExists error
23         end
24       end    
25       response.status[0] == "200" ? true : false
26     end
27 
28   end
29 
30 end

Let’s try it out in irb

1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create('my_new_bucket')
3 => true

You can create the bucket ‘virtual hosted style’ by adding a ‘Host’ entry to the params hash.

1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create('/', "host" => "mynewbucket.s3.amazonaws.com")
3 => true

You can add a canned access control policy by sending :access in the params hash.

1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create('/my_readable_bucket', :access => 'public-read')
3 => true

If you try to create a bucket that is owned by someone else, you will get an S3Lib::NotYourBucketError:

1 $> irb -r s3_bucket
2 >> S3Lib::Bucket.create("test")
3 S3Lib::NotYourBucketError: The bucket 'test' is already owned by somebody else, \
4 BucketAlreadyExists
5         from ./library/bucket_create.rb:29:in `create'
6         from (irb):1	

Discussion

The Bucket class is starting to shape up nicely now. We can read and create buckets. The next recipe, “Deleting a Bucket”, will talk about deleting buckets, after which we’ll have full functionality. Notice that we’re really focusing on making the interface easy to use by adding helpful shortcuts like the :access parameter. You’ve probably also noticed some of the repetition that we’ll be cleaning up in “Refactoring the Bucket Class”.

Deleting a Bucket

The Problem

You want to be able to delete buckets that you own.

The Solution

To delete a bucket, you make a DELETE request to the bucket’s name, like this:

1 DELETE /spatten_bucket
2 Host: s3.amazonaws.com
3 Content-Length: 0
4 Date: Wed, 13 Feb  2008 12:00:00 GMT
5 Authorization: AWS some_id:some_authentication_string

To make the authenticated request using the s3_authenticator library:

1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3 
4 response = S3Lib.request(:delete,'/spatten_bucket')

Errors

Trying to delete a bucket that is not empty will raise a BucketNotEmpty error. Trying to delete a bucket that does not exist will raise a NoSuchBucket error. Trying to delete a bucket that you do not own will raise a NotSignedUp error.

 1 $> irb -r s3_authenticator
 2 >> S3Lib.request(:delete, 'spatten_bucket')
 3 S3Lib::S3ResponseError: 409 Conflict, BucketNotEmpty
 4        from ./library/s3_authenticator.rb:39:in `request'
 5        from (irb):15
 6 >> S3Lib.request(:delete, 'spatten_bucketasdasdas')
 7 S3Lib::S3ResponseError: 404 Not Found, NoSuchBucket
 8        from ./library/s3_authenticator.rb:39:in `request'
 9        from (irb):16
10 >> S3Lib.request(:delete, 'test')
11 S3Lib::S3ResponseError: 403 Forbidden, NotSignedUp
12        from ./library/s3_authenticator.rb:39:in `request'
13        from (irb):17

Because non-empty buckets cannot be deleted, we will create a Bucket::delete_all class method and a corresponding instance method. As well, if you pass :force => true in the params hash of Bucket::delete, the bucket will be deleted even if it is not empty.

 1 $> irb -r library/s3lib.rb 
 2 >> S3Lib::Bucket.delete('spatten_not_empty_bucket')
 3 S3Lib::BucketNotEmptyError: The bucket 'spatten_not_empty_bucket' is not empty, \
 4 so you can't delete it.
 5 Try using Bucket.delete_all first, or Bucket.delete('spatten_not_empty_bucket', \
 6 :force => true).
 7         from ./library/bucket.rb:45:in `delete'
 8         from (irb):1
 9 >> exit
10 $> irb -r library/s3lib.rb 
11 >> S3Lib::Bucket.delete('spatten_not_empty_bucket')
12 S3Lib::BucketNotEmptyError: The bucket 'spatten_not_empty_bucket' is not empty, \
13 so you can't delete it.
14 Try using Bucket.delete_all('spatten_not_empty_bucket') first, or Bucket.delete(\
15 'spatten_not_empty_bucket', :force => true).
16         from ./library/bucket.rb:45:in `delete'
17         from (irb):1
18 >> S3Lib::Bucket.delete('spatten_not_empty_bucket', :force => true)
19 => #<StringIO:0x167d1e8>
20 >> S3Lib::Bucket.find('spatten_not_empty_bucket')
21 S3Lib::BucketNotFoundError: The bucket 'spatten_not_empty_bucket' does not exist.
22         from ./library/bucket.rb:75:in `find'
23         from (irb):3

Warning

The example above won’t work for you yet, as we haven’t created the S3Object class, which is called by Bucket::delete if params[:force] == true.

Here’s the code that takes care of all the deleting:

 1 # bucket.rb
 2 require File.join(File.dirname(__FILE__), 's3_authenticator')
 3 require 'rexml/document'
 4 
 5 module S3Lib
 6   
 7   class NotYourBucketError < S3Lib::S3ResponseError
 8   end
 9   
10   class BucketNotFoundError < S3Lib::S3ResponseError
11   end
12   
13   class BucketNotEmptyError < S3Lib::S3ResponseError
14   end  
15   
16   class Bucket
17     
18     attr_reader :name, :xml, :prefix, :marker, :max_keys
19     
20     # passing :force => true will cause the bucket to be deleted even if it is n\
21 ot empty.
22     def self.delete(name, params = {})
23       if params.delete(:force)
24         self.delete_all(name, params)
25       end
26       begin
27         response = S3Lib.request(:delete, name, params)  
28       rescue S3Lib::S3ResponseError => error
29         case error.amazon_error_type
30         when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '#\
31 {name}' does not exist.", error.io, error.s3requester)
32         when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{n\
33 ame}' is not owned by you.", error.io, error.s3requester)
34         when "BucketNotEmpty": raise S3Lib::BucketNotEmptyError.new("The bucket \
35 '#{name}' is not empty, so you can't delete it.\nTry using Bucket.delete_all('#{\
36 name}') first, or Bucket.delete('#{name}', :force => true).", error.io, error.s3\
37 requester)
38         else # Re-raise the error if it's not one of the above
39           raise
40         end
41       end            
42     end
43     
44     def delete(params = {})
45       self.class.delete(@name, @params.merge(params))
46     end
47     
48     def self.delete_all(name, params = {})
49       bucket = Bucket.find(name, params)
50       bucket.delete_all
51     end
52     
53     def delete_all
54       objects.each do |object|
55         object.delete
56       end
57     end
58     
59   end
60   
61 end

Discussion

We now have a bucket class that can find, create and delete buckets. All right! However, the class is getting kind of ugly. We’ll clean that up in “Refactoring the Bucket Class”.

Refactoring the Bucket Class

The Problem

We now have a fully functional Bucket class, but there’s a lot of repetition. You want to clean it up, which will also make it easier to add new functionality.

The Solution

Here’s the current state of the Bucket class:

  1 require File.join(File.dirname(__FILE__), 's3_authenticator')
  2 require 'rexml/document'
  3 
  4 module S3Lib
  5 
  6   class NotYourBucketError < S3Lib::S3ResponseError
  7   end
  8 
  9   class BucketNotFoundError < S3Lib::S3ResponseError
 10   end
 11 
 12   class BucketNotEmptyError < S3Lib::S3ResponseError
 13   end  
 14 
 15   class Bucket
 16 
 17     attr_reader :name, :xml, :prefix, :marker, :max_keys
 18 
 19     def self.create(name, params = {})
 20       params['x-amz-acl'] = params.delete(:access) if params[:access] # translat\
 21 e from :access to 'x-amz-acl'
 22       begin
 23         response = S3Lib.request(:put, name, params)
  24       rescue S3Lib::S3ResponseError => error
  25         if error.amazon_error_type == "BucketAlreadyExists"
  26           raise S3Lib::NotYourBucketError.new("The bucket '#{name}' is already o\
  27 wned by somebody else", error.io, error.s3requester)
 28         else
 29           raise # re-raise the exception if it's not a BucketAlreadyExists error
 30         end
 31       end    
 32       response.status[0] == "200" ? true : false
 33     end
 34 
 35     # passing :force => true will cause the bucket to be deleted even if it is n\
 36 ot empty.
 37     def self.delete(name, params = {})
 38       if params.delete(:force)
 39         self.delete_all(name, params)
 40       end
 41       begin
 42         response = S3Lib.request(:delete, name, params)  
 43       rescue S3Lib::S3ResponseError => error
 44         case error.amazon_error_type
 45         when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '#\
 46 {name}' does not exist.", error.io, error.s3requester)
 47         when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{n\
 48 ame}' is not owned by you.", error.io, error.s3requester)
 49         when "BucketNotEmpty": raise S3Lib::BucketNotEmptyError.new("The bucket \
 50 '#{name}' is not empty, so you can't delete it.\nTry using Bucket.delete_all('#{\
 51 name}') first, or Bucket.delete('#{name}', :force => true).", error.io, error.s3\
 52 requester)
 53         else # Re-raise the error if it's not one of the above
 54           raise
 55         end
 56       end            
 57     end
 58 
 59     def delete(params = {})
 60       self.class.delete(@name, @params.merge(params))
 61     end
 62 
 63     def self.delete_all(name, params = {})
 64       bucket = Bucket.find(name, params)
 65       bucket.delete_all
 66     end
 67 
 68     def delete_all
 69       objects.each do |object|
 70         object.delete
 71       end
 72     end
 73 
 74     # Errors for find
 75     # Trying to find a bucket that doesn't exist will raise a NoSuchBucket error
 76     # Trying to find a bucket that you don't have access to will raise a NotSign\
 77 edUp error
 78     def self.find(name, params = {})
 79      begin
 80         response = S3Lib.request(:get, name)
 81       rescue S3Lib::S3ResponseError => error
 82         case error.amazon_error_type
 83         when "NoSuchBucket": raise S3Lib::BucketNotFoundError.new("The bucket '#\
 84 {name}' does not exist.", error.io, error.s3requester)
 85         when "NotSignedUp": raise S3Lib::NotYourBucketError.new("The bucket '#{n\
 86 ame}' is not owned by you", error.io, error.s3requester)
 87         else # Re-raise the error if it's not one of the above
 88           raise
 89         end
 90       end
 91       doc = REXML::Document.new(response)
 92       Bucket.new(doc, params)
 93     end
 94 
 95     def initialize(doc, params = {})
 96       @xml = doc.root
 97       @params = params
 98       @name = @xml.elements['Name'].text
 99       @max_keys = @xml.elements['MaxKeys'].text.to_i
100       @prefix = @xml.elements['Prefix'].text
101       @marker = @xml.elements['Marker'].text
102     end
103 
104     def is_truncated?
105       @xml.elements['IsTruncated'].text == 'true'
106     end
107 
108     def objects(params = {})
109       refresh if params[:refresh]
110       @objects || get_objects
111     end
112 
113     def refresh
114       refreshed_bucket = Bucket.find(@name, @params)
115       @xml = refreshed_bucket.xml
116       @objects = nil
117     end
118 
119     private
120 
121     def get_objects
122       @objects = REXML::XPath.match(@xml, '//Contents').collect do |object|
123         key = object.elements['Key'].text
124         S3Lib::S3Object.new(self, key, :lazy_load => true)
125       end
126     end
127 
128   end
129 
130 end

There’s a lot of repetition between the create, delete and find class methods. I’m going to refactor that out into a single method to clean things up. I’ll call that method bucket_request. Just to make things easier as more errors are added, I’ll also move the errors into a separate file called s3_errors.rb. The final refactoring will be to create a file, s3lib.rb, that will load up all of the files required by our Bucket class. Whenever you want to use the Bucket class, require s3lib.rb instead.
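
Before looking at the included listing, here’s a sketch of what a shared bucket_request helper could look like; the version in the included bucket.rb may differ in its details:

 1 module S3Lib
 2   class Bucket
 3 
 4     # Shared error handling: create, delete and find can all call this
 5     # instead of repeating the same rescue/case logic.
 6     def self.bucket_request(verb, name, params = {})
 7       S3Lib.request(verb, name, params)
 8     rescue S3Lib::S3ResponseError => error
 9       case error.amazon_error_type
10       when "NoSuchBucket"
11         raise S3Lib::BucketNotFoundError.new(
12           "The bucket '#{name}' does not exist.",
13           error.io, error.s3requester)
14       when "NotSignedUp"
15         raise S3Lib::NotYourBucketError.new(
16           "The bucket '#{name}' is not owned by you.",
17           error.io, error.s3requester)
18       when "BucketAlreadyExists"
19         raise S3Lib::NotYourBucketError.new(
20           "The bucket '#{name}' is already owned by somebody else.",
21           error.io, error.s3requester)
22       when "BucketNotEmpty"
23         raise S3Lib::BucketNotEmptyError.new(
24           "The bucket '#{name}' is not empty, so you can't delete it.",
25           error.io, error.s3requester)
26       else # Re-raise anything we don't specifically handle
27         raise
28       end
29     end
30 
31   end
32 end

With a helper like this, find shrinks to a call to bucket_request(:get, name) plus the REXML parsing, and create and delete become a few lines each.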

Here’s the refactored Bucket class:

Example 5.3. bucket.rb <<(code/refactoring_the_bucket_class_recipe/bucket.rb)

Ahh, much cleaner. All that repetition was making me itchy. Here are the s3_errors.rb and s3lib.rb files:

Example 5.4. s3_errors.rb <<(code/refactoring_the_bucket_class_recipe/s3_errors.rb)

Example 5.5. s3lib.rb <<(code/refactoring_the_bucket_class_recipe/s3_lib.rb)

Notice that I moved the S3ResponseError class from s3_authenticator.rb into s3_errors.rb where it belongs.
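
The included s3_errors.rb isn’t reproduced in the text. Based on how the errors are used throughout this chapter, it contains something along these lines; the S3ResponseError shown here is only a sketch of the interface the rest of the library relies on, not the full class from the authentication chapter:

 1 # s3_errors.rb (sketch)
 2 module S3Lib
 3 
 4   # Moved here from s3_authenticator.rb.  The real class also knows how
 5   # to extract the Amazon error code from the response body; only the
 6   # interface used by Bucket and S3Object is shown.
 7   class S3ResponseError < StandardError
 8     attr_reader :io, :s3requester, :amazon_error_type
 9 
10     def initialize(message, io, s3requester, amazon_error_type = nil)
11       super(message)
12       @io = io
13       @s3requester = s3requester
14       @amazon_error_type = amazon_error_type
15     end
16   end
17 
18   class NotYourBucketError < S3ResponseError
19   end
20 
21   class BucketNotFoundError < S3ResponseError
22   end
23 
24   class BucketNotEmptyError < S3ResponseError
25   end
26 
27   class NoContentError < S3ResponseError
28   end
29 
30 end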

Discussion

We now have the first major class all done and tucked away, with reasonable code. This class will set the pattern for the next two classes we implement: the S3Object and Acl classes.

The S3Object Class

The Problem

You want to be able to read, create and delete the objects that live in your buckets.

The Solution

Now that we have our Bucket class all sorted out, the next obvious step is to create the S3Object class. Having objects and buckets taken care of will give us most of the functionality we want. The HTTP verbs that an object responds to are PUT, GET, DELETE and HEAD. The HEAD verb is one that we haven’t talked about before. It is used to get information about a resource without actually downloading the whole resource. An object on Amazon S3 is a perfect example of why this is necessary: you don’t want to have to download a 200 MB file just to find out when it was created. Instead, you just do a HEAD request to the object and get its metadata.
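
As a quick illustration of the difference, here’s a raw HEAD request made with the S3Lib.request helper (the helper is used with the :head verb later in this recipe); the object name is just a placeholder:

1 #!/usr/bin/env ruby
2 require 's3_authenticator'
3 
4 # HEAD returns only the response headers, so nothing is downloaded
5 response = S3Lib.request(:head, 'spatten_bucket/some_object.txt')
6 puts response.meta['content-length']  # the object's size in bytes
7 puts response.meta['content-type']
8 puts response.meta['last-modified']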

Here are the S3Object::create and S3Object::find class methods. They are pretty simple. The create method just does a :put to the Amazon S3 service with the correct URL and then returns an instance of the S3Object class (I’ll show you what the S3Object.object_request method looks like soon). The find method does even less. It just creates a new instance of the S3Object class. The url method creates the URL to the object based on the object’s bucket and key. The bucket can be either a string giving the bucket’s name or a Bucket object.

Example 5.6. s3object_create.rb - creating and finding objects

 1 module S3Lib
 2 
 3   class S3Object
 4 
 5     DEFAULT_CONTENT_TYPE = 'binary/octect-stream'
 6 
 7     attr_reader :key, :bucket
 8 
 9     # This is just an alias for S3Object.new
10     def self.find(bucket, key, options = {})
11       S3Object.new(bucket, key, options)
12     end
13 
14     def self.create(bucket, key, value = "", options = {})    
15       # translate from :access to 'x-amz-acl'
16       options['x-amz-acl'] = options.delete(:access) if options[:access]
17       options.merge!({:body => value || "", 
18                       'content-type' => DEFAULT_CONTENT_TYPE})
19       response = S3Object.object_request(:put, S3Object.url(bucket, key), 
20                                          options)
21       response.status[0] == "200" ? 
22          S3Object.new(bucket, key, options) : false
23     end
24 
25     # bucket can be either a Bucket object or a string containing 
26     # the bucket's name
27     def self.url(bucket, key)
28       bucket_name = bucket.respond_to?(:name) ? bucket.name : bucket
29       File.join(bucket_name, key)
30     end     
31     
32     def url
33       S3Object.url(@bucket.name, @key)
34     end
35 
36   end
37 end

Avoiding unnecessary or premature downloads and requests to the server is what creates almost all of the complexity in the S3Object class. I’ll be using two techniques to this end. First, if something is downloaded, it will be cached in the object. Second, the object’s value will be loaded lazily. When a new instance of an S3Object is created, only the object’s metadata will be pulled from the Amazon S3 service. If you are loading a large number of objects, then you don’t want to make a request to Amazon S3 for every object. In that case, it’s best to pass :lazy_load => true in the options hash. This is done by the Bucket class to avoid making hundreds of HTTP calls when loading up a bucket with lots of objects in it.

To make that a bit more concrete, let’s look at the S3Object#initialize method.

Example 5.7. s3_object_initialize.rb

 1 module S3Lib
 2   class S3Object
 3 
 4     # Both metadata and value are loaded lazily if options[:lazy_load]
 5     # is true.  This is used by Bucket.find so you don't make a request
 6     # for every object in the bucket.
 7     # The bucket can be either a bucket object or a string containing
 8     # the bucket's name.
 9     # The key is a string.
10     def initialize(bucket, key, options = {})
11       options = {:lazy_load => false}.merge(options)
12       bucket = Bucket.find(bucket) unless bucket.respond_to?(:name)
13       @bucket = bucket
14       @key = key
15       @options = options
16       get_metadata unless options.delete(:lazy_load)
17     end
18   end
19 end

Notice that if you pass :lazy_load => true and if the bucket parameter is an instance of the Bucket class, then no HTTP requests will be made.
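
Here’s what that looks like in practice, as a usage sketch assuming a bucket like the spatten_bucket listed earlier:

1 bucket = S3Lib::Bucket.find('spatten_bucket') # one GET for the bucket listing
2 
3 objects = bucket.objects  # S3Objects are built with :lazy_load => true,
4                           # so no further requests are made here
5 first_object = objects.first
6 puts first_object.key     # still no request: the key came from the listing
7 puts first_object.value   # only now is this object actually downloaded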

All requests to the server are made using the S3Object::object_request method. This takes care of all the error handling and makes sure that the :lazy_load option is not sent up to the server.

Example 5.8. s3_object_object_request.rb - S3Object::object_request

<<(code/the_s3object_class_api_recipe/s3_object_object_request.rb)

The value method is used to get the value of an object. It looks to see if the value has been retrieved already, returns the cached value if it has and downloads the value from Amazon S3 if it hasn’t. You can refresh the value by sending :refresh => true when you request the value or by calling the refresh method.

Example 5.9. s3_object_value.rb - getting the value of an object

 1 module S3Lib
 2 
 3   class S3Object
 4 
 5     def value(params = {})
 6       refresh if params[:refresh]
 7       @value || get_value
 8     end
 9 
10     def refresh
11       get_value
12     end
13 
14     def get_value
15       request = S3Object.object_request(:get, url, @options)
16       @metadata = request.meta      
17       @value = request.read
18     end
19 
20   end
21 end

To change the value of an object, you just re-create the object.

Example 5.10. s3object_set_value.rb - changing the value of an object

1 def value=(value)
2   S3Object.create(@bucket, @key, value, @options)
3   @value = value
4   refresh_metadata
5 end

Here’s the full listing for the S3Object class.

Example 5.11. s3_object.rb

  1 module S3Lib
  2 
  3   class S3Object
  4 
  5     DEFAULT_CONTENT_TYPE = 'binary/octect-stream'
  6 
  7     attr_reader :key, :bucket
  8 
  9     # This is just an alias for S3Object.new
 10     def self.find(bucket, key, options = {})
 11       S3Object.new(bucket, key, options)
 12     end
 13 
 14     def self.create(bucket, key, value = "", options = {})    
 15       # translate from :access to 'x-amz-acl'
 16       options['x-amz-acl'] = options.delete(:access) if options[:access]
 17       options.merge!({:body => value || "", 
 18                       'content-type' => DEFAULT_CONTENT_TYPE})
 19       response = S3Object.object_request(:put, S3Object.url(bucket, key), 
 20                                          options)
 21       response.status[0] == "200" ? 
 22                             S3Object.new(bucket, key, options) : false
 23     end
 24 
 25     # Delete an object given the object's bucket and key.
 26     # No error will be raised if the object does not exist.
 27     def self.delete(bucket, key, options = {})
 28       S3Object.object_request(:delete, S3Object.url(bucket, key), options)
 29     end
 30 
 31     def delete
 32       S3Object.delete(@bucket, @key, @options)
 33     end
 34 
 35     def self.value(bucket, key, options = {})
 36       request = S3Object.object_request(:get, S3Object.url(bucket, key), 
 37                                         options)
 38       request.read
 39     end
 40 
 41     # Both metadata and value are loaded lazily if options[:lazy_load] 
 42     # is true.  This is used by Bucket.find so you don't make a request 
 43     # for every object in the bucket.
 44     # The bucket can be either a bucket object or a string containing 
 45     # the bucket's name.
 46     # The key is a string.
 47     def initialize(bucket, key, options = {})
 48       bucket = Bucket.find(bucket) unless bucket.respond_to?(:name)
 49       @bucket = bucket
 50       @key = key
 51       @options = options
 52       get_metadata unless options[:lazy_load]
 53     end  
 54 
 55     # bucket can be either a Bucket object or a string containing 
 56     # the bucket's name
 57     def self.url(bucket, key)
 58       bucket_name = bucket.respond_to?(:name) ? bucket.name : bucket
 59       File.join(bucket_name, key)
 60     end     
 61 
 62     def url
 63       S3Object.url(@bucket.name, @key)
 64     end 
 65 
 66     def metadata
 67       @metadata || get_metadata
 68     end
 69 
 70     def value(params = {})
 71       refresh if params[:refresh]
 72       @value || get_value
 73     end
 74 
 75     def value=(value)
 76       S3Object.create(@bucket, @key, value, @options)
 77       @value = value
 78       refresh_metadata
 79     end
 80 
 81     def refresh
 82       get_value
 83     end
 84 
 85     def refresh_metadata
 86       get_metadata
 87     end
 88 
 89     def content_type
 90       metadata["content-type"]
 91     end
 92 
 93     def etag
 94       metadata["etag"]
 95     end
 96 
 97     def length
 98       metadata["content-length"].to_i
 99     end
100 
101     private
102 
103     def self.object_request(verb, url, options = {})
104       begin
105         options.delete(:lazy_load)
106         response = S3Lib.request(verb, url, options)
107       rescue S3Lib::S3ResponseError => error
108         case error.amazon_error_type
109         when 'NoSuchBucket'
110           raise S3Lib::BucketNotFoundError.new(
111           "The bucket '#{bucket}' does not exist.", 
112           error.io, error.s3requester)
113         when 'NotSignedUp'
114           raise S3Lib::NotYourBucketError.new(
115           "The bucket '#{bucket}' is owned by somebody else", 
116           error.io, error.s3requester)
117         when 'AccessDenied'
118           raise S3Lib::NotYourBucketError.new(
119             "The bucket '#{bucket}' is owned by someone else.", 
120             error.io, error.s3requester)
121         when 'MissingContentLength'
122           raise S3Lib::NoContentError.new(
123             "You must provide a value to put in the object.\nUsage: " + 
124             "S3Lib::S3Object.create(bucket, key, value, options)", 
125             error.io, error.s3requester)          
126         else # Re-raise the error if it's not one of the above
127           raise
128         end
129       end
130       response
131     end
132 
133     def get_metadata
134       request = S3Object.object_request(:head, url, @options)
135       @metadata = request.meta
136     end
137 
138     def get_value
139       request = S3Object.object_request(:get, url, @options)
140       @metadata = request.meta      
141       @value = request.read
142     end
143 
144   end
145 
146 end

Discussion

To close up this section, let’s look at a few things I haven’t covered yet. First, note that the get_metadata method does a :head request to the server, so you don’t have to download the full object just to get the object’s metadata. Also, some convenience methods, content_type, etag and length, have been added to give you slightly cleaner access to the object’s metadata. Finally, the S3Object::delete method makes a :delete request to the object’s URL.

That’s it! This isn’t a feature-complete S3Object class, but it gives you a good start. Notably missing is any ability to change the access control of an object. This will be discussed in the next few recipes.

Reading a Bucket or Object’s Access Control Policy

The Problem

You want to add the ability to get information about a bucket or object’s Access Control Policy to your API.

The Solution

As discussed in the “Access Control Policies”, you read a bucket or object’s Access Control List by doing an authenticated GET request to the bucket or object’s URL with ?acl appended to it. For example, a bucket called my_test_bucket has a URL of http://s3.amazonaws.com/my_test_bucket, and you read its ACL by doing an authenticated GET request to http://s3.amazonaws.com/my_test_bucket?acl. To read the ACL of a bucket or object, you must have READ access to the bucket or object’s ACL.

To avoid writing ‘bucket or object’ over and over in this section, I’m going to use resource to refer to both buckets and objects.

ACLs, permissions and grants were discussed rather exhaustively in “Access Control Policies”. Here’s a brief refresher. An ACL consists of a list of grants. Each grant gives a permission to a grantee. The permission can be one of READ, WRITE, READ_ACL, WRITE_ACL or FULL_CONTROL. The grantee can be one of four types: user by email, user by canonical representation, all AWS users, or anyone (anonymous access).

To model this in our API, we’ll create two new classes: the Acl class and the Grant class. You’ll never instantiate or interact with the Grant class directly; it will just be used by the Acl class to store and manipulate the ACL’s grants. To create a new instance of an Acl, we’ll make a call to Acl.new, passing in the URL of the resource that we want the ACL for.

1 $> irb -rubygems -r s3lib
2 >> S3Lib::Acl.new('spatten_test_bucket')

The Acl.new method won’t do anything else. Any calls to Amazon S3 will be deferred until a request to see the list of the Acl’s grants is made. This will be done using the Acl#grants method. When Acl#grants is called, a check will be made to see if the ACL XML has been downloaded from Amazon S3 yet. If it hasn’t, then an authenticated GET request will be made to the ACL’s URL and the XML will be parsed and used to create a set of Grant objects. Here’s the code for the Acl class:

Example 5.12. acl.rb <<(code/reading_a_bucket_or_objects_access_control_policy_api_recipe/acl.rb)
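
Since acl.rb is included from an external file, here’s a sketch of its two key methods, grants and get_grants, which are described in the next paragraph; it assumes the S3Lib.request helper and the Grant class shown below:

 1 require 'rexml/document'
 2 
 3 module S3Lib
 4   class Acl
 5 
 6     # Returns the cached Array of Grant objects, downloading and parsing
 7     # the ACL XML the first time it is asked for.
 8     def grants
 9       @grants || get_grants
10     end
11 
12     private
13 
14     # GET the ACL XML (e.g. from 'spatten_test_bucket?acl') and turn
15     # each <Grant> element into a Grant object.
16     def get_grants
17       response = S3Lib.request(:get, @url)
18       doc = REXML::Document.new(response)
19       @grants = REXML::XPath.match(doc, '//Grant').collect do |grant|
20         permission = grant.elements['Permission'].text
21         grantee = grant.elements['Grantee']
22         Grant.new(permission, grantee)
23       end
24     end
25 
26   end
27 end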

The grants method checks to see if the array of Grant objects has been cached in the @grants instance variable or not. If it has, it just returns it. Otherwise, it calls get_grants. Most of the action is in the get_grants method. It GETs the ACL XML response from Amazon S3 and then uses REXML’s XPath parser to get a list of all the <Grant> elements in the response. Each <Grant> element is parsed to get the permission and the grantee element, and then used to create a new Grant object. Here’s the code for the Grant class:

Example 5.13. grant.rb <<(code/reading_a_bucket_or_objects_access_control_policy_api_recipe/grant.rb)

This is pretty straightforward: it takes the permission String and the grantee XML and parses them to assign the proper values to the @permission, @type and @grantee instance variables.

Okay, now all of the pieces are in place to read a resource’s grants. Let’s write a simple script to try it out.

Example 5.14. grant_reader.rb <<(code/reading_a_bucket_or_objects_access_control_policy_api_recipe/grant_reader.rb)
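
The grant_reader.rb script is also included from an external file; a minimal version consistent with the output shown below might look something like this:

 1 #!/usr/bin/env ruby
 2 # grant_reader.rb (sketch)
 3 # Usage: ruby grant_reader.rb spatten_test_bucket
 4 require 'rubygems'
 5 require 's3lib'
 6 
 7 url = ARGV[0]
 8 acl = S3Lib::Acl.new(url)
 9 
10 puts "Grants for #{url}"
11 acl.grants.each do |grant|
12   puts "#{grant.permission}, #{grant.grantee}"
13 end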

Try it out by passing the URL of a resource you control into it:

1 $> ruby grant_reader.rb spatten_test_bucket
2 Grants for spatten_test_bucket
3 full_control, 9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f
4 read, all_s3

Discussion

We can now read the grants of a resource, but there’s still more to do before we have full ACL functionality. We’ll need a way to refresh the cached list of grants if it changes. We’ll obviously want to be able to create and delete grants as well. Finally, we’ll want to add an acl method to the S3Object and Bucket classes so that we can do things like some_bucket.acl.grants. These requirements are covered in the next few recipes.

Refreshing the Cached ACL

The Problem

You want to be able to refresh the cached grants of an Acl instance in your API.

The Solution

This is pretty straightforward. We’re going to add the Acl#refresh_grants method to the Acl class and change the Acl#grants method so that it refreshes the @grants instance variable if you pass :refresh => true to it. The changes are in bold below.

Example 5.15. acl.rb <<(code/refreshing_the_cached_acl_api_recipe/acl.rb)
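
Since the changed file is included externally, here’s a sketch of the two methods after the change, following the same caching pattern we used for Bucket#objects:

 1 # Inside the S3Lib::Acl class (sketch)
 2 
 3 # Pass :refresh => true to force a fresh download of the ACL from S3
 4 def grants(params = {})
 5   refresh_grants if params[:refresh]
 6   @grants || get_grants
 7 end
 8 
 9 # Throw away the cached grants and download them again
10 def refresh_grants
11   @grants = nil
12   get_grants
13 end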

Discussion

Now your API will efficiently make a request to Amazon S3 only the first time you ask it about the ACL of a resource, but still allow you to make the call again if you want to make sure you have the latest copy.

Creating a New Grant

The Problem

Your API reads the ACL of a resource, but you also want to be able to add new grants to a resource.

The Solution

Adding a grant to a resource’s ACL is done by PUTting a new ACL to the resource’s ACL URL. The procedure for adding a grant, then, is to:

  • GET the resource’s ACL from the resource’s ACL URL
  • Add a new grant to the ACL
  • PUT the new ACL to the resource’s ACL URL

This is going to involve quite a few changes to the Acl and Grant classes. First, we’re going to have to change the Grant#initialize method to allow the creation of a new Grant with the grantee as a Hash, rather than from the XML you get from Amazon S3. This isn’t strictly necessary, but it would sure be annoying to the users of your API if they had to create the XML representation every time they wanted to add a new grant. We’ll also need another method on Grant that converts a Grant instance to an XML representation.

The Acl class will require a few changes as well. First, we’ll add the Acl#add_grant method. It will take a permission and a Hash containing information about the grantee as arguments, use those arguments to create a new Grant instance, and add that instance to the @grants instance variable. We’ll also need to change the Acl::acl_request method to make sure the content-type is set when you PUT the ACL XML up to Amazon S3. Finally, we’ll need to add an Acl#to_xml method as well to create the XML representation of the ACL.

The Grant Class

Here’s the new Grant class:

  1 module S3Lib
  2   class Grant
  3     attr_reader :acl, :grantee, :type, :permission
  4     GRANT_TYPES = {:canonical => 'CanonicalUser',
  5                    :email => 'AmazonCustomerByEmail', 
  6                    :all_s3 => 'Group', 
  7                    :public => 'Group'}
  8     GROUP_URIS = {
  9       'http://acs.amazonaws.com/groups/global/AuthenticatedUsers' => 
 10          :all_s3,
 11       'http://acs.amazonaws.com/groups/global/AllUsers' => :public}
 12     PERMISSIONS = [:read, :write, :read_acl, :write_acl, :full_control]
 13     NAMESPACE_URI = 'http://www.w3.org/2001/XMLSchema-instance'
 14     
 15     # Create a new grant.  
 16     # permission is one of the PERMISSIONS defined above
 17     # grantee can be either a REXML::Document object or a Hash
 18     # The grantee Hash should look like this:
 19     # {:type => :canonical|:email|:all_s3|:public, 
 20     #  :grantee => canonical_user_id | email_address}
 21     #
 22     # The :grantee element of the hash is only required (and meaningful) 
 23     # for :canonical and :email Grants
 24     def initialize(permission, grantee)
 25       @type = parse_type(grantee)
 26       @permission = parse_permission(permission)
 27       @grantee = parse_grantee(grantee)
 28     end
 29     
 30     def to_xml
 31       builder = Builder::XmlMarkup.new(:indent => 2)
 32       xml = builder.Grant do
 33         builder.Grantee('xmlns:xsi' => NAMESPACE_URI, 
 34                         'xsi:type' => GRANT_TYPES[@type]) do
 35           case type
 36           when :canonical: builder.ID(@grantee)
 37           when :email: builder.EmailAddress(@grantee)
 38           when :all_s3: builder.URI(group_uri_from_group_type(:all_s3))
 39           when :public: builder.URI(group_uri_from_group_type(:public))
 40           else
 41           end
 42         end
 43         builder.Permission(@permission.to_s.upcase)
 44       end
 45     end
 46     
 47     private
 48     
 49     # permission can either be the String provided by S3
 50     # or a symbol (see the PERMISSIONS array for allowed values)
 51     def parse_permission(permission)
 52       if permission.is_a?(String)
 53         permission.downcase.to_sym
 54       else
 55         permission
 56       end
 57     end
 58     
 59     def parse_type(grantee)
 60       if grantee.is_a?(Hash)
 61         grantee[:type]
 62       else # Assume it's a REXML::Doc object
 63         type = grantee.attributes['xsi:type']
 64         case type
 65         when 'CanonicalUser': :canonical
 66         when 'AmazonCustomerByEmail': :email
 67         when 'Group'
 68           group_uri = grantee.elements['URI'].text
 69           group_type_from_group_uri(group_uri)
 70         else
 71           raise BadGrantTypeError
 72         end
 73       end
 74     end
 75     
 76     def parse_grantee(grantee)
 77       if grantee.is_a?(Hash) 
 78         if [:canonical, :email].include?(@type)
 79           grantee[:grantee]
 80         else
 81           @type
 82         end
 83       else # it's a REXML::Doc object
 84         case @type
 85         when :canonical
 86           grantee.elements['ID'].text
 87         when :email
 88           grantee.elements['EmailAddress'].text
 89         when :all_s3: :all_s3
 90         when :public: :public
 91         else
 92           nil
 93         end
 94       end
 95     end
 96     
 97     def group_type_from_group_uri(group_uri)
 98       GROUP_URIS[group_uri]
 99     end
100     
101     def group_uri_from_group_type(group_type)
102       GROUP_URIS.invert[group_type]
103     end
104     
105   end
106 end

There are a lot of changes here. Let’s break it down a bit. The parse_type and parse_grantee methods have been amended so that the grantee and type information can be passed in as either a Hash or a REXML object. There’s not much to say here: the methods check whether the argument that has been passed in is a Hash, and if it is, they parse the Hash. If it’s not, they assume that it’s a REXML object and parse that.

The next change is the addition of the group_type_from_group_uri and group_uri_from_group_type methods. These two methods allow you to get the group type from the group URI and vice versa. Previously, we only needed to get the group type from the group URI, so we used the GROUP_URIS hash to get the group type. Now that we’re reading and writing grants, we need to convert in both directions.

The last change is the biggest and, if you’re new to crazy dynamic languages like Ruby, the strangest looking. The to_xml method uses the Builder library to build the grant’s XML representation. If you’re interested, there’s more information on how Builder works in The XML Builder Library. For now, you can just take it for granted that it creates XML output like this:

1 <Grant>
2 	<Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
3 		xsi:type="CanonicalUser">
4 		<ID>9d92623ba6dd9d7cc06a7b8bcc46381e7c646f72d769214012f7e91b50c0de0f</ID>
5 	</Grantee>
6 	<Permission>FULL_CONTROL</Permission>
7 </Grant>

The Acl Class

Here’s the new Acl class:

Example 5.21. acl.rb <<(code/creating_a_new_grant_api_recipe/acl.rb)

Once again, there were a lot of changes. We added the add_grant, add_grant!, set_grants, owner and to_xml methods. We also changed the acl_request method to set the content-type if the request is a PUT.

Let’s start by looking at the add_grant, set_grants and add_grant! methods. The add_grant method is a one-liner: it creates a new instance of a Grant object and pushes it onto the Array returned by the grants method, which ensures that the grants have been downloaded from S3 first.

The set_grants method takes the XML representation of the ACL and PUTs it up to Amazon S3. Notice that if you use add_grant to add a new Grant to a resource, your changes will be lost unless you make a subsequent call to set_grants.

That’s where add_grant! comes in. It adds a grant and then pushes the change to Amazon S3. The ! is Ruby shorthand for ‘this does something more dangerous than the method without the exclamation mark’ (see http://dablog.rubypal.com/2007/8/15/bang-methods-or-danger-will-rubyist for a deeper discussion). So, add_grant makes a local change to the resource’s ACL and add_grant! makes a permanent change.

The to_xml method once again uses the Builder library to create the XML representation of the ACL. It uses the owner method to grab the Canonical ID of the Owner from the original ACL’s XML representation. One subtle thing that wasn’t mentioned in The XML Builder Library is that, since the return value from the Builder::XmlMarkup instance is just a string, you can push new data into the string using <<. Acl#to_xml uses this to push the XML representation of the Acl’s Grants into the XML.
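
As with the other included listings, here’s a sketch of the three grant-changing methods as just described; the included acl.rb is the authoritative version:

 1 # Inside the S3Lib::Acl class (sketch).  acl_request is the Acl class's
 2 # own request wrapper; its exact signature is in the included acl.rb.
 3 
 4 # Add a Grant to the local copy of the ACL only
 5 def add_grant(permission, grantee)
 6   grants << Grant.new(permission, grantee)
 7 end
 8 
 9 # PUT the current set of grants back up to Amazon S3
10 def set_grants
11   acl_request(:put, :body => to_xml)
12 end
13 
14 # Add a grant and immediately push the change to Amazon S3
15 def add_grant!(permission, grantee)
16   add_grant(permission, grantee)
17   set_grants
18 end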

Discussion

Wow, that was a lot of work. We’re almost done with the Acl. The last step will be to tie the Acl class to the S3Object and Bucket classes.

Tying the Acl Class to a Bucket or Object

The Problem

You have a nice Acl class, but your Buckets and Objects don’t know anything about it. You want to be able to do things like this:

1 b = S3Lib::Bucket.find('spatten_test_bucket')
2 b.acl.add_grant(:read, :type => :public)

The Solution

We need to make (small) changes to three classes: Bucket, S3Object and Acl. The changes to Bucket and S3Object will be equivalent. We’ll add two public methods, acl and refresh_acl, and one private method, get_acl. On the Acl class, we’ll change the initialize method so that you can pass in a Bucket or S3Object instance as well as a resource’s URL. We’ll also add a @parent instance variable and make it readable. This is the final code for all three classes, so I’m going to include every line of it rather than just the changes. Let’s start with the Acl class.

Example 5.22. acl.rb <<(code/tieing_the_acl_class_to_a_bucket_or_object_recipe/acl.rb)

Next, the Bucket class

Example 5.23. bucket.rb <<(code/tieing_the_acl_class_to_a_bucket_or_object_recipe/bucket.rb)

and, finally, the S3Object class

Example 5.24. object.rb

  1 # s3_object.rb
  2 
  3 module S3Lib
  4 
  5   class S3Object
  6     
  7     DEFAULT_CONTENT_TYPE = 'binary/octect-stream'
  8     
  9     attr_reader :key, :bucket
 10         
 11     # This is just an alias for S3Object.new
 12     def self.find(bucket, key, options = {})
 13       S3Object.new(bucket, key, options)
 14     end
 15     
 16     def self.create(bucket, key, value = "", options = {})
 17       # translate from :access to 'x-amz-acl'
 18       options['x-amz-acl'] = options.delete(:access) if options[:access]
 19       options.merge!({:body => value || "", 
 20                       'content-type' => DEFAULT_CONTENT_TYPE})
 21       response = S3Object.object_request(:put, 
 22                                          S3Object.url(bucket, key), 
 23                                          options)
 24       response.status[0] == "200" ? 
 25         S3Object.new(bucket, key, options) : false
 26     end
 27     
 28     # Delete an object given the object's bucket and key.
 29     # No error will be raised if the object does not exist.
 30     def self.delete(bucket, key, options = {})
 31       S3Object.object_request(:delete, S3Object.url(bucket, key), options)
 32     end
 33     
 34     def delete
 35       S3Object.delete(@bucket, @key, @options)
 36     end
 37     
 38     def self.value(bucket, key, options = {})
 39       request = S3Object.object_request(:get, 
 40                                         S3Object.url(bucket, key), 
 41                                         options)      
 42       request.read
 43     end
 44     
 45     # Both metadata and value are loaded lazily if options[:lazy_load] 
 46     # is true.  This is used by Bucket.find so you don't make a request 
 47     # for every object in the bucket
 48     # The bucket can be either a bucket object or a string containing 
 49     # the bucket's name
 50     # The key is a string.
 51     def initialize(bucket, key, options = {})
 52       bucket = Bucket.find(bucket) unless bucket.respond_to?(:name)
 53       @bucket = bucket
 54       @key = key
 55       @options = options
 56       get_metadata unless options[:lazy_load]
 57     end  
 58     
 59     # bucket can be either a Bucket object or a string containing 
 60     # the bucket's name
 61     def self.url(bucket, key)
 62       bucket_name = bucket.respond_to?(:name) ? bucket.name : bucket
 63       File.join(bucket_name, key)
 64     end     
 65     
 66     def url
 67       S3Object.url(@bucket.name, @key)
 68     end 
 69     
 70     def metadata
 71       @metadata || get_metadata
 72     end
 73     
 74     def value(params = {})
 75       refresh if params[:refresh]
 76       @value || get_value
 77     end
 78     
 79     def value=(value)
 80       S3Object.create(@bucket, @key, value, @options)
 81       @value = value
 82       refresh_metadata
 83     end
 84     
 85     def refresh
 86       get_value
 87     end
 88     
 89     def refresh_metadata
 90       get_metadata
 91     end
 92     
 93     def refresh_acl
 94       get_acl
 95     end
 96     
 97     def content_type
 98       metadata["content-type"]
 99     end
100     
101     # strip off the leading and trailing double-quotes
102     def etag
103       metadata["etag"].sub(/\A\"/,'').sub(/\"\Z/, '')
104     end
105     
106     def length
107       metadata["content-length"].to_i
108     end
109     
110     def acl(params = {})
111       refresh_acl if params[:refresh]      
112       @acl || get_acl
113     end  
114     
115     private
116     
117     def self.object_request(verb, url, options = {})
118       begin
119         options.delete(:lazy_load)
120         response = S3Lib.request(verb, url, options)
121       rescue S3Lib::S3ResponseError => error
122         case error.amazon_error_type
123         when 'NoSuchBucket'
124           raise S3Lib::BucketNotFoundError.new(
125             "The bucket '#{bucket}' does not exist.", 
126             error.io, error.s3requester)
127         when 'NotSignedUp'
128           raise S3Lib::NotYourBucketError.new(
129             "The bucket '#{bucket}' is owned by somebody else", 
130             error.io, error.s3requester)
131         when 'AccessDenied'
132           raise S3Lib::NotYourBucketError.new(
133             "The bucket '#{bucket}' is owned by someone else.", 
134             error.io, error.s3requester)
135         when 'MissingContentLength': 
136           raise S3Lib::NoContentError.new(
137             "You must provide a value to put in the object.\nUsage: " + 
138             "S3Lib::S3Object.create(bucket, key, value, options)", 
139             error.io, error.s3requester)          
140         else # Re-raise the error if it's not one of the above
141           raise
142         end
143       end
144       response
145     end
146         
147     def get_metadata
148       request = S3Object.object_request(:head, url, @options)
149       @metadata = request.meta
150     end
151     
152     def get_value
153       request = S3Object.object_request(:get, url, @options)
154       @metadata = request.meta      
155       @value = request.read
156     end
157     
158     def get_acl
159       @acl = Acl.new(self)
160     end    
161     
162   end
163 
164 end

Discussion

There are a few things worth noting in this code. First, let’s take a closer look at the Acl#initialize method.

1     def initialize(parent_or_url)
2       if parent_or_url.respond_to?(:url)
3         @parent = parent_or_url
4         @url = @parent.url.sub(/\/\Z/,'') + '?acl'
5       else
6         @url = parent_or_url.sub(/\/\Z/,'').sub(/\?acl/, '') + '?acl'
7       end
8     end

Notice that the Acl#initialize method doesn’t care what class the parent is. It just cares whether or not it has a url method. This is known, in Ruby circles, as duck typing: if it looks like a duck and acts like a duck, then treat it like a duck. You could, of course, rewrite the method as:

1     def initialize(parent_or_url)
2       if parent_or_url.is_a?(S3Lib::Bucket) || parent_or_url.is_a?(S3Lib::S3Obje\
3 ct)
4         @parent = parent_or_url
5         @url = @parent.url.sub(/\/\Z/,'') + '?acl'
6       else
7         @url = parent_or_url.sub(/\/\Z/,'').sub(/\?acl/, '') + '?acl'
8       end
9     end

Either one works. The first method is more flexible in the event that Amazon creates another resource type that has an Acl.

So, that was one way that I made my code a bit more flexible. Now, let’s talk about a way that I could have made it a bit more flexible (but decided not to). Notice that the methods that I added to Bucket and S3Object are exactly the same. Instead of writing them directly in each class, I could have made a module with those methods in it and then mixed it into the Bucket and S3Object classes. Something like this:

Example 5.25. acl_access.rb <<(code/tieing_the_acl_class_to_a_bucket_or_object_recipe/acl_access.rb)
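
The acl_access.rb listing is included externally; a sketch of such a module, using the same three methods that currently live in both classes, might look like this:

 1 # acl_access.rb (sketch)
 2 module S3Lib
 3 
 4   # Mix this into any class that has a url method (Bucket, S3Object)
 5   # to give it acl, refresh_acl and get_acl.
 6   module AclAccess
 7 
 8     def acl(params = {})
 9       refresh_acl if params[:refresh]
10       @acl || get_acl
11     end
12 
13     def refresh_acl
14       get_acl
15     end
16 
17     private
18 
19     def get_acl
20       @acl = Acl.new(self)
21     end
22 
23   end
24 
25 end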

Then, in the Bucket and S3Object classes, I could delete the acl, refresh_acl and get_acl methods and add the following line:

1 include S3Lib::AclAccess

My own personal rule of thumb is that I don’t bother extracting functionality the first time I repeat something, so I didn’t bother doing this when I wrote the original code.