Django: Optimizations Within the Platform
In my experience with both Rails and Django, I have to admit that a lot of things need to be improved at the core of these platforms before developers can deploy a truly fast production site. Let's talk about what we did at kwippy to make it that much faster than the default Django setup.
- Use memcached properly: The trick to getting speed is to cache all logged-out pages and to cache user objects heavily when users are logged in. For example, we recently moved our sessions into the memcached cloud and saw a good speed jump.
- Database structuring: Django does a lot of joins and magic in its ORM, and it has a built-in caching layer. The problem is that we developers structure our tables for our perceived needs, not necessarily for the way the ORM works. One way around this is to start writing custom SQL inside your views; another is to understand the ORM better. Your choice.
- Database connection pooling: It is pretty shocking to realize that Django does not do connection pooling. I have used DBUtils connection pooling for our needs, but IMHO it should be a default feature of the application platform.
- SMTP is slow: Imagine a user filling out a form that is emailed to you while your SMTP server is down. There is a good chance that you will lose that data and the application will return a 500 error. To alleviate that, I have created a command queue: emails are not sent from within the application, and a daemon does all the dirty work. I will release it publicly so that you can make your site that much faster and more robust.
- Pagination: To be honest, I still do not like Django pagination. The problem was that it fetched all the objects, and since our database is on a separate machine, a lot of data was being transferred over the network. So I created my own custom pagination, which was faster than ObjectPaginator or Paginator.
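For the memcached item, here is a minimal sketch of how sessions could be moved into memcached in `settings.py`. It assumes a recent Django (the `PyMemcacheCache` backend appeared in Django 3.2 and needs the `pymemcache` package) and two hypothetical memcached hosts; the post does not publish kwippy's actual settings.

```python
# settings.py (sketch): the memcached host addresses below are hypothetical.
CACHES = {
    "default": {
        # PyMemcacheCache ships with Django 3.2+ and requires pymemcache.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": ["10.0.0.1:11211", "10.0.0.2:11211"],
    }
}

# Keep sessions in the cache instead of the database table, so reading
# a session on every logged-in request never touches the DB.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
```

The pure `cache` engine is the fastest option, but sessions disappear if memcached evicts them or restarts. If that matters, `django.contrib.sessions.backends.cached_db` writes through to the database while still serving reads from the cache.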
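The connection pooling idea can be sketched with the standard library alone. This is not DBUtils' API, just a minimal fixed-size pool built on `queue.Queue`; `ConnectionPool` is a hypothetical name, and `sqlite3` stands in for a networked database so the example runs anywhere.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Minimal fixed-size pool: open N connections once, then reuse them."""

    def __init__(self, connect, size=5):
        # `connect` is any zero-argument factory returning a DB-API connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Block until a pooled connection is free rather than opening a new one;
        # raises queue.Empty if `timeout` seconds pass with none available.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back even if the caller raised.
            self._idle.put(conn)

    def close_all(self):
        # Close idle connections; call only once no requests are in flight.
        while True:
            try:
                self._idle.get_nowait().close()
            except queue.Empty:
                break


if __name__ == "__main__":
    # sqlite3 is a stand-in here (assumption); check_same_thread=False lets
    # a pooled connection be used by whichever thread checks it out.
    pool = ConnectionPool(
        lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
    )
    with pool.connection() as conn:
        print(conn.execute("SELECT 1").fetchone())
    pool.close_all()
```

The point of the design is that the expensive connect step happens once at startup; each request only pays for a queue get and put.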
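The SMTP item describes a queue that a daemon drains outside the web request. The code for that queue is not shown in the post, so the following is only a minimal sketch under stated assumptions: the queue is a hypothetical `outbox` table (SQLite here), the view calls `enqueue_email`, and a separate process calls `drain_outbox` with a `send` function. All of these names are illustrative.

```python
import sqlite3
import time
from email.message import EmailMessage

FROM_ADDR = "noreply@example.com"  # hypothetical sender address


def init_outbox(db):
    db.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY, recipient TEXT NOT NULL, "
        "subject TEXT NOT NULL, body TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )


def enqueue_email(db, recipient, subject, body):
    """Called from the view: a quick local INSERT, no SMTP round trip."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO outbox (recipient, subject, body) VALUES (?, ?, ?)",
            (recipient, subject, body),
        )


def drain_outbox(db, send):
    """Called by the daemon: try every unsent row; failures stay queued."""
    rows = db.execute(
        "SELECT id, recipient, subject, body FROM outbox "
        "WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, recipient, subject, body in rows:
        msg = EmailMessage()
        msg["From"] = FROM_ADDR
        msg["To"] = recipient
        msg["Subject"] = subject
        msg.set_content(body)
        try:
            send(msg)
        except OSError:
            # smtplib errors and refused connections are OSError subclasses:
            # keep the row and retry on the next pass.
            continue
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered


def run_daemon(db, send, interval=30):
    """Poll the outbox forever, outside the web process."""
    while True:
        drain_outbox(db, send)
        time.sleep(interval)
```

In production, `send` might open a connection with `smtplib.SMTP(host)` and call `send_message(msg)`; the host depends on your setup. If the SMTP server is down, the form data is still safe in the outbox and the user never sees a 500.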
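The fix for the pagination problem is to let the database do the slicing so that only one page of rows crosses the network. A minimal sketch with plain SQL, assuming a hypothetical `posts` table (SQLite stands in for the remote database):

```python
import math
import sqlite3


def fetch_page(db, page, per_page=20):
    """Return (rows, num_pages) for a 1-based page number.

    Only `per_page` rows are transferred; the page count comes from COUNT(*).
    """
    total = db.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    num_pages = max(1, math.ceil(total / per_page))
    # Clamp out-of-range page numbers instead of raising.
    page = min(max(page, 1), num_pages)
    rows = db.execute(
        "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()
    return rows, num_pages


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany(
        "INSERT INTO posts (title) VALUES (?)",
        [(f"post {i}",) for i in range(1, 46)],
    )
    rows, num_pages = fetch_page(db, page=3)
    print(num_pages, rows[0])  # 3 (41, 'post 41')
```

As one of the comments below points out, Django's `Paginator` slices the QuerySet, which produces a LIMIT clause, so a hand-rolled version like this is mostly useful for raw SQL outside the ORM.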
Well, those are all I could think of right now. Obviously there are other optimizations that I keep doing, and I will keep writing about them. Until next time, feel free to contact me about these optimizations at me@dipankar.name.
Custom SQL!! Ew... why use a framework then?
Hello,
Nice post.
Just one comment on pagination: ObjectPaginator exists for backwards compatibility, and Paginator uses the existing QuerySet methods when building a page object, so it shouldn't be transferring all objects. Paginator uses slices, and the QuerySet __getitem__ generates LIMIT SQL clauses.
Hmm, when I last checked ObjectPaginator, it was transferring all the data... I guess they have modified Paginator. In the recent Django upgrade for my site, I was too lazy to switch to Paginator, he he...
Although I appreciate Django's focus on simplicity, I would also second Dipankar's point here.
For any web application to be scalable, the DB interface or the ORM has to be strong and simple.
I cannot resist comparisons with Hibernate, which is a good ORM with easily adaptable, testable features, and something Django's ORM could possibly benchmark itself against.
As far as pagination is concerned, I would also raise a small point. Tapestry (another Java technology) has really good features in this area. Another possible benchmark.
@rohitj
At least the database connection is initialized beforehand, ha ha... The truth is, most ORMs are OK at basic stuff; for anything non-trivial, it's bye-bye ORM and in comes the SQL.
I'm setting up an enterprise web hosting environment for my company.
Well, my team usually works with J2EE stuff (for scalability & slowness ;-P), but there's a small pool of web servers that have been running Perl CGI since around 2001. Now we are thinking of offering Python hosting, upgrading the OS and Apache from 1.3 to 2.2.x, and starting to use LDAP authentication, etc.
I would love to get your input...
Good advice on SMTP, connection pooling, and memcached sessions.
It would be nice if you would edit the part about Paginator, since it is wrong and may mislead newer users.
I hear people happily use pgpool for postgres pooling:
http://pgpool.projects.postgresql.org/
I also heard people pooh-pooh SQL Relay, but no details were forthcoming.
About database structuring: I've found that a major improvement in queries comes from using the proper field type in the DB engine instead of relying on Django choices.
For example, in MySQL, take a status field that uses choices. Queries are faster if you change its varchar(140) type in the DB to ENUM('publish', 'spam', 'deleted'), and you get even greater optimizations if you use ENUM or SET with an integer field; small integer field types work as well.
If you DESCRIBE and ANALYZE your tables, you will see that a lot of optimizations can be made. For MySQL, I recommend the describe and analyze table actions in phpMyAdmin to find out what these optimizations would be.
Another great optimization, in Python code, is using the mx.DateTime module instead of the datetime module, which is slower and consumes a lot of RAM on complex date comparisons and calculations.
Regards