We built a new content management system that will be released as open source. The CMS is loosely based on Semantic Web principles and builds on our experience with building web sites and content management systems, and with programming in general. We call it Zotonic.
Zotonic enables hosting very popular sites on simple hardware.
I will write a couple of articles highlighting aspects of Zotonic. Today I focus on the reason we built a new CMS and on performance aspects. In other articles I will highlight the general design of Zotonic, the template system, modules, screen components, form handling, and adding logic (actions) to the templates.
Yet another CMS? Why? Because the current crop of open source content management systems does not do what we want it to do. What do we want to do? We have a simple list of requirements:
Zotonic would not be possible without the contributions of many open source projects:
Zotonic is written in the programming language Erlang. In the next section I explain why we use this programming language.
Once upon a time a website was a collection of manually edited HTML pages and some images. All was fine.
Then search was added to the site, necessitating extra programs running on the server on behalf of a web site visitor.
Then those pages started to be filled with content from a database; not much changed, except that editing and searching got a lot easier.
And then we started to let visitors react, interact and upload information to the server. It was not only visitors who were allowed to interact: machines also got their own interfaces to talk to the server. Web 2.0, with APIs and user-generated content, took off.
Nowadays a web site is talking in many different ways to humans and machines. It is collecting information from multiple sources and actively submitting new articles to other sites or machines. The web server gives the visitor more and more ways to access the stored information. People are using RSS readers, widgets, desktop applications, other web sites and so on.
The web server has evolved from a simple machine serving some pages into a complex switchboard connected to many different servers and services. It constantly sifts through information and stores it on behalf of its users, and it constantly generates different representations of that stored information, be it HTML pages, RDF documents, Atom entries, XMPP messages or API result sets.
This new web server is an always-on machine, not a simple server reacting to requests from a web browser.
This new web server is far more dynamic than yesterday's database-backed HTML generator. It needs to stay connected with other servers and to keep all those connections open and active at all times. It even needs to push information to the web browsers of visitors.
This new web server needs a different approach and programming model: it is becoming more and more like an information switchboard with thousands of open connections to other machines and web browsers.
Erlang was created to program telephone switches. Its programming model and features are a natural match for this new kind of web/information server. Besides that match with the problem domain, Erlang was also designed for building very robust systems that are always on: there is no need to reboot an Erlang system for a program update.
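As a small illustration of that hot code loading (the module name greeter and the message format are made up for this example; this is not Zotonic code), a long-running Erlang process can pick up a freshly compiled version of its own module without the virtual machine being restarted:

```erlang
%% A minimal sketch of Erlang hot code loading, with hypothetical names.
-module(greeter).
-export([start/0, loop/0]).

%% Spawn a long-lived process that keeps answering greeting requests.
start() ->
    register(greeter, spawn(fun ?MODULE:loop/0)).

loop() ->
    receive
        {hello, From} ->
            From ! {greeting, "Hello from version 1"},
            %% The fully qualified call below makes the next iteration run
            %% in the newest loaded version of this module, so reloading the
            %% module takes effect without stopping the process or the VM.
            ?MODULE:loop()
    end.
end_of_sketch_marker_removed.
```

Change the greeting string, run c(greeter). in the running Erlang shell to compile and load the new version, and the already-running process answers with the new text on its next message, without any restart.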
That is why we chose Erlang: to create a robust web server that can sit like a spider in the web of information, and one that is still easy to program and extend.
The current generation of computers is so fast that we can play multi-user video games on a single machine whilst surfing the web, checking our mail and whatnot. Why should we be happy with generating a couple of web pages per second on a machine that is so capable? How green is that?
We have quite a lot of experience building content management systems and web sites in general. What did we learn? For one thing: when you want high performance, flexibility and fewer machines, you are in a lot of trouble.
Our usage scenario is:
So we are not targeting the next Flickr or similar sites; they need custom-built software, not an off-the-shelf CMS. We are targeting the other 99 or so percent of web sites.
PHP is the most popular language for building web sites.
For every request, PHP wakes up and starts finding out where it is, where the code is, what the code is, where the database is, and so on. It does this again, and again, and again, even when nothing has changed. This share-nothing approach is nice when handling a single request, or when scaling out (adding more machines). It is not nice when you want to handle lots of requests on a single machine.
Other scaling problems arise with Python (Django) and Ruby (on Rails). They are typically single-threaded; that is, they use only one of the 32 or more hardware threads available on modern hardware. This is solved by running more independent processes, each handling requests one by one. Those processes need to communicate with each other, which adds yet more processes and complexity for caching systems and the like.
Thanks to Erlang, Zotonic is completely multi-threaded and scales linearly when more cores are added. All information about the web site is kept in memory and is easily shared between requests. When two requests need the same HTML fragment, that fragment can be rendered only once. Note that this is not caching, which Zotonic handles as well, but simply a way to avoid doing the same work twice at the same time.
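As a rough sketch of that "render only once" idea (all module and function names below are hypothetical, not Zotonic's actual API), a small gen_server can track which fragments are currently being rendered and let late arrivals wait for the first result instead of repeating the work:

```erlang
%% A minimal sketch of coalescing concurrent renders; hypothetical names.
-module(render_once).
-behaviour(gen_server).

-export([start_link/0, render/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Render the fragment identified by Key, making sure concurrent callers
%% with the same Key share a single rendering.
render(Key, RenderFun) ->
    case gen_server:call(?MODULE, {claim, Key}) of
        first ->
            %% We are the first caller: do the work, hand it to any waiters.
            Html = RenderFun(),
            gen_server:cast(?MODULE, {done, Key, Html}),
            Html;
        wait ->
            %% Someone else is already rendering this fragment: wait for it.
            receive
                {rendered, Key, Html} -> Html
            after 5000 ->
                RenderFun()   %% give up waiting and render it ourselves
            end
    end.

%% The state maps each in-flight Key to the list of waiting caller pids.
init([]) ->
    {ok, #{}}.

handle_call({claim, Key}, {FromPid, _Tag}, State) ->
    case maps:find(Key, State) of
        error ->
            {reply, first, State#{Key => []}};
        {ok, Waiters} ->
            {reply, wait, State#{Key => [FromPid | Waiters]}}
    end.

handle_cast({done, Key, Html}, State) ->
    %% Notify everyone who was waiting, then forget the Key entirely:
    %% this coalesces concurrent work, it is not a cache.
    [Pid ! {rendered, Key, Html} || Pid <- maps:get(Key, State, [])],
    {noreply, maps:remove(Key, State)}.

handle_info(_Info, State) ->
    {noreply, State}.
```

A request handler would call something like render_once:render({sidebar, PageId}, fun() -> render_sidebar(PageId) end); concurrent requests with the same key then share one rendering, and nothing is kept around once the waiters have been answered.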
Zotonic typically serves 100 to 500 dynamic pages per second on normal off-the-shelf hardware, which is enough for the vast majority of web sites.
Zotonic can serve a complete page before a Joomla/Drupal/WordPress site has even loaded its PHP code. With a Zotonic server you can serve either many more sites, or many more visitors per site.
More later about the architecture of Zotonic and how to use it.