Monday, April 5, 2010

S3 vs DB Data Validations

Hi,
Relational databases have had decades of feature enhancements and maturing best practices to handle the thorny data integrity problems that arise in enterprise software construction. Cloud storage like Amazon S3 brings some very cool capabilities but hasn't had time to develop secondary features such as schema validation. The scenarios below occurred on a real client project while storing user preference information.

1. Your program posts a JSON structure to S3, but a required field such as zip code is missing and other apps depend on it. A database handles this with a simple "not null" constraint; S3 has no equivalent.

2. Your program posts a JSON structure that references a key from another data store, such as user_id, but that user doesn't exist or is inactive. Databases provide foreign key constraints or arc constraints to deal with this scenario.

3. Your application posts a JSON structure and then immediately does an HTTP GET on the new resource. It's possible the data hasn't replicated across S3 nodes yet (eventual consistency), so you may get a 404 or an older version of the data. Amazon may be implementing a flag that lets you request "block-until-consistent" behavior, but that is not currently available.
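
One way to cope with scenario 3 from the application side is to poll with backoff until the object shows up. This is only a rough sketch, not anything S3 provides: `fetch` is a hypothetical callable wrapping whatever GET your S3 client exposes (assumed to return None on a 404), and `is_fresh`, `get_when_visible` and `NotYetVisible` are names made up for illustration.

```python
import time


class NotYetVisible(Exception):
    """Raised when the object never became readable within the retry budget."""


def get_when_visible(fetch, key, is_fresh=lambda body: True,
                     attempts=5, base_delay=0.2, sleep=time.sleep):
    """Poll fetch(key) until it returns a body that passes is_fresh.

    fetch    -- hypothetical wrapper around an S3 GET; returns None on a 404
    is_fresh -- hypothetical predicate that rejects an older version of the data
    """
    for attempt in range(attempts):
        body = fetch(key)
        if body is not None and is_fresh(body):
            return body
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise NotYetVisible(key)
```

The retry budget is a guess on the caller's part. This narrows the window but is no substitute for a real consistency guarantee, and the caller still has to decide what to do when NotYetVisible is raised.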

There are other shortcomings, such as the lack of session-based commit and rollback, where changes are visible only to the session until the data is committed. I understand that the purpose of RESTful "bucket" storage isn't to re-implement the database, that the two solve different problems, and that it is up to the architect to select the proper persistence mechanism based on specific requirements. However, I predict that standards will emerge allowing XSD-style validity enforcement. If these exist already, great! Please post a comment with links :-)
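
In the meantime, the first two scenarios can be approximated with checks in application code before the write. The sketch below is a minimal illustration under stated assumptions: the field names, `validate_preferences`, `put_validated`, and the `put` and `user_is_active` callables are all hypothetical stand-ins for your own S3 client and user store.

```python
import json

# Hypothetical rule set for the preference document; field names are illustrative.
REQUIRED_FIELDS = ("user_id", "zip_code")


def validate_preferences(doc, user_is_active):
    """Return a list of problems; an empty list means the document may be stored.

    user_is_active is a hypothetical callable that looks the user up in the
    other data store and returns True only for existing, active users.
    """
    errors = []
    # Stand-in for "not null": every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if doc.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")
    # Stand-in for a foreign key: the referenced user must exist and be active.
    user_id = doc.get("user_id")
    if user_id not in (None, "") and not user_is_active(user_id):
        errors.append(f"unknown or inactive user: {user_id}")
    return errors


def put_validated(put, key, doc, user_is_active):
    """Validate doc, then hand the serialized JSON to put (a hypothetical S3 PUT wrapper)."""
    errors = validate_preferences(doc, user_is_active)
    if errors:
        raise ValueError("; ".join(errors))
    put(key, json.dumps(doc))
```

Unlike a database constraint, this check-then-write is not atomic: the user could be deactivated between the lookup and the PUT, and any program that writes to the bucket without going through put_validated bypasses the rules entirely.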

Code well,
Ben

Inevitable First Post

This blog will capture musings, prototypes and research related to advanced technology. The definition of advanced tech will change over time, but it currently consists of cloud computing, mobile application development, distributed computing (e.g. MapReduce) and rapid web application development (e.g. Grails).