Tuesday, 8 September 2009

Django and Google App Engine

Reading Jacob Kaplan-Moss's post "Snakes on the Web", my first thought was: why hasn't he mentioned Google App Engine? To be fair to Jacob, the GAE SDK was pretty underwhelming when it launched, but GAE is an ugly duckling that could very well grow into a swan. If Django is for 'perfectionists with deadlines', then GAE might be said to be for 'perfectionists with aspirations'.

I think the growth of Django, compared with say Zope, is partly due to its reliance on relational databases and their ubiquity in business. Google is such a disruptive force that Bigtable could well become the next data foundation on which enterprises are built, in the same way DB2 and Oracle were for the relational era. Large enterprises are risk-averse with data, but Google is a big enough vendor to threaten the status quo.

Much as Django tries to hide the relational model from the developer, the further we go down the development path, with recent additions such as aggregation and future features like multiple database support, the more apparent it becomes that we're exposing more problems in the ORM pattern than we're solving.

Bigtable makes me think about the financials product we use (Microsoft Dynamics NAV, formerly Navision). Upgrading from the native Navision database to MS SQL Server required at least twice the hardware capacity and actually reduced performance. Why? Well, although the native database didn't scale, it had aspects that remind me of Bigtable. It had an index called SumIndexField Technology (SIFT). This index was dimensioned by time and other optional dimensions. It took ages to build the index, but when filtering by date range and dimension, a sum could be computed from just two records, one at each end of the date range. Contrast this with the relational approach on MS SQL Server, which required an index on the primary key, dimension and value, and needed to add up ALL the relevant records every time the sum was accessed.
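The two-record trick is essentially a running-total (prefix-sum) index. The sketch below is a minimal Python illustration of that general idea, not how Navision actually implements SIFT; the SumIndex class, its method names and the sample postings are all hypothetical. Building the index is the slow part, since totals are accumulated per dimension in date order; after that, any date-range sum is the difference between two stored running totals.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import date


class SumIndex:
    """Hypothetical running-total (prefix-sum) index keyed by dimension and date.

    Illustrates the SIFT-style idea only; it is not Navision's implementation.
    """

    def __init__(self, entries):
        # entries: iterable of (dimension, posting_date, amount) tuples.
        daily = defaultdict(lambda: defaultdict(int))
        for dimension, day, amount in entries:
            daily[dimension][day] += amount

        # The expensive build step: for each dimension, keep the sorted
        # dates and the running total up to and including each date.
        self._dates = {}
        self._totals = {}
        for dimension, amounts in daily.items():
            days = sorted(amounts)
            running, totals = 0, []
            for day in days:
                running += amounts[day]
                totals.append(running)
            self._dates[dimension] = days
            self._totals[dimension] = totals

    def _total_through(self, dimension, day):
        """Running total of all postings on or before `day`."""
        i = bisect_right(self._dates.get(dimension, []), day)
        return self._totals[dimension][i - 1] if i else 0

    def _total_before(self, dimension, day):
        """Running total of all postings strictly before `day`."""
        i = bisect_left(self._dates.get(dimension, []), day)
        return self._totals[dimension][i - 1] if i else 0

    def range_sum(self, dimension, start, end):
        """Sum for start..end inclusive, read from two stored running totals."""
        return self._total_through(dimension, end) - self._total_before(dimension, start)


if __name__ == "__main__":
    postings = [
        ("SALES", date(2009, 1, 5), 100),
        ("SALES", date(2009, 2, 10), 250),
        ("SALES", date(2009, 3, 1), 75),
        ("COSTS", date(2009, 2, 1), 40),
    ]
    index = SumIndex(postings)
    # February to March sales (250 + 75) without re-scanning the postings.
    print(index.range_sum("SALES", date(2009, 2, 1), date(2009, 3, 31)))  # 325
```

The trade-off mirrors the one described above: a slow up-front build (and extra bookkeeping on every new posting) in exchange for reads that only look at two stored totals, whereas a relational SUM over a WHERE clause touches every matching row on each query.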

While prototyping a Django 1.1 business intelligence app for internal use, I started to realise why I just want queues to be a standard feature, and why I actually don't want to use aggregations, no matter how tempting they seem. Stuffing data into memcache like a Christmas turkey is not as smart as I thought it was either, and no matter how great South is, data migrations are just not exciting.

Now that GAE has Task Queues and other parallel computing goodies, there seems to be even less reason not to try Google App Engine. Ironically, the only thing holding me back is that I need to downgrade my Snow Leopard Python interpreter to 2.5, after being delighted to see 2.6 installed as the default.

5 comments:

morais said...

Hi,

Don't downgrade Python; python2.5 should still be available on Snow Leopard.
Open the GoogleAppEngineLauncher preferences and set the Python path to "/usr/bin/python2.5".

Regards,
Pedro Morais

zgoda said...

AppEngine requires so much hand-crafted, messy code to match "traditional" hosting in features and performance that it has no chance of fitting "deadlines", which are crucial to Django's success. I'd even say it's for "wannabe-perfectionists with aspirations but without deadlines". Too much is out of your control and too much depends on Google (which is failing more often these days); that is simply too much. You cannot pretend to be a perfectionist if you host your application on a service that has no SLA of any kind and whose only description of a system failure is "sorry, bro, we screwed".

Malcolm Tredinnick said...

There's a false assumption in this piece. Django doesn't try to hide the relational nature of the data store at all. It exploits it whilst providing a natural, Python-like way to express things (a shift in language expression, not a paradigm shift). Part of the problem with replaceable data stores is determining exactly how much of that useful functionality is worth sacrificing as the trade-off for the differences in the various non-relational data stores. If you are only interested in Django running the same on all data stores, it becomes a bit of a race to the bottom to find some lowest common denominator. You might as well just focus on getting it working with CSV files, and then everything else is just a slight enhancement. That would be a bad plan, but people sometimes miss that it's the obvious generalisation of "just make this change for my particular preference of somewhat different data storage model."

Brett said...

@Malcolm you're right. I have time-based data on the brain, but in most cases GAE would be like trying to drive to the shops in a dragster that only wants to go straight. If I'm really honest about it, I guess I just want multi-db bigtable.models in tandem with db.models, without the App Engine.

Ash said...

@zgoda
Whether code is messy depends on how we write it. I agree that it sometimes becomes very difficult to reach an objective, but there is always an alternative approach, though it may take longer.
