I found a new Linux distro last night, and I am really impressed so far - it’s Linux Mint, and I must say, it’s the most polished, pretty distro that I have ever seen. It’s based on Ubuntu, so it’s pretty much rock solid under the hood as well.
I’d say, if you get a chance - check it out. It’s well worth it. It installed with out a problem on VMWare Fusion as well, and it’s quite snappy.
The install was the super simple ubuntu experience that I have come to expect. Not much different there. It comes with an impressive applications stack set up - so more stuff should “just work” then most linux geeks are used to. Then even include the Ndiswrapper application by default - so if there is not a native linux driver for your wireless card, you should be able to just install the windows version and press on.
All of that can be done on Ubuntu however, so it’s nothing special. It’s nice to have it all set up already. Where Linux Mint shines however is in the time they have taken to polish the look and feel. It’s simply gorgeous out of the box.
They have done a great job of integrating things like slab to provide a excellent experience - you can tell that they have some UI folks working with them.
They have a nice icon set
and their finder is laid out well.
All in all, they simply took away the first task that I had when building a new linux workstation - futzing with the themes until I get one that I like. Well done, Mint!
One of the questions that I am asked often is “how do I scale my web application?” or “we’re expecting 200 times the traffic tomorrow, what do I do?”. The second question is usually too late.
The first however, when asked at an appropriate time, can be what saves you from asking the second. A good knowledge of best practices to start with, and then a rigorous testing and introspection process will do more to inform you “when” then any thing else.
Cache, cache, cache…
You need to cache everything you can. Memcache, disk cache, what ever you can cache should be.
Find your Hotspots!
The first step in any scaling action is finding out what part of your application needs to scale. If you are having issues with your database, and you build a spiffy disk sharding scheme, you just fixed a problem that does not exist. Doing the proper discovery will allow you to allocate your efforts for best effect.
Basically, I subscribe to the “least effort” camp. Do the least that you can, as long as you are 1) fixing the problem and 2) giving yourself breathing room to start on the next problem before it becomes a serious issue. This will allow you to still work on the other parts of the app, rolling out new features and creating new hotspots to find.
Finding your hotspots is very important in this process. Usually, hotspots are where the flames start. Knowing your points of pain allow you to triage correctly, and to know how to best spend your developers time.
One of the things to keep in mind is that this process is ongoing. You might be optimizing disk reads and writes one week, and be neck deep in SQL the next. When you clear out one hotspot generally another one will crop up to take it’s place.
If you have a proper staging setup, you can build estimates against generated traffic. This can give you a (blurry) view into what the next hotspot might be. The best process that I have seen is to capture a hour or so worth of traffic on the live site, and replay it 2, three or more times faster against the staging environment. You want the traffic to be as real as possible.
Using introspection tools on your production environment is invaluable. You want these tools to be as real time as possible.
When to go deep, and when to go long..
After you have killed all of the hotspots you can, and added all the resources you can afford, it’s time to look at the next level. Usually this is when when start to see people thinking about sharding of some manner. There are three major types of sharding at the moment - Filesystem, Database and Application. I will touch on each of these topics, starting with the highest level, and hardest.
Application sharding is the most extreme, provides the most benefit and is the hardest to accomplish. If you can split your users amongst several vertical groups, you can basically install copies of the application for each segment.
You can also look at abstracting any shared logic into a back end API driven application. The rule of thumb there is to have each back end application do one thing, and do one thing very very quickly.
For example, if you can segment your user base into three groups who do not really interact, you can simply provision 3 environments and install 3 separate copies of the application. An example of this might be a site hosting application. Each site hosted will not need much (if any) interaction with the other sites hosted. This is by far the easiest method of sharding your application.
You can also look at this from a business logic view. If you can cut your application into portions (say, photos, chat and games for a social site) you can create smaller applications to handle authentication, user information storage, photos, chat and games. You would have the photos, chat and games applications leverage the back end authentication and user information applications to read and write shared information.
This gives us several advantages. For the back end application remove all unneeded code (i.e., if you are not going to need to use views, then remove ActionView), plugins and gems. Keep the app as light as possible. Optimize the DB access and make use of memcached as much as possible. Host each of the application shards on dedicated resources (i.e., their own Database’s and Compute).
One of the large advantages of this approach is that you can start to optimize your hardware spend. If your chat application is 1/2 as intensive as your photo and games applications, it’s far easier to assign resources in a targeted fashion and maximize returns. In a monolithic application, if the photo application breaks, or needs more resources the entire stack is effected. With sharding, you get some buffering from some site wide issues, and the ability to assign hardware exactly where it’s needed. The big drawback is that it’s not easy.
Another step that can be looked at in certain circumstances. If the amount of data you need to process is so huge, or the number of transactions is really large, you can look into database sharding. Basically, you take your database and break the schema up amongst several DB’s. There are tools in most major RDBMS’s which will allow you to take care of this. Informing the application where the data is might be complex depending on which RDBMS you use.
Look for a in-depth treatment of this subject in a later article.
If your application is filesystem IOPS heavy, filesystem sharding might be the route that you want to look at. Basically you add more hardware disk arrays, and split the reads and writes between them. You need to inject some logic into the save and open functions in your application so that it knows which filesystem each file is to be saved to and opened from. Usually you create a hash of the filename, and key off the first couple of characters in the hash. You can read more on this technique here.
It’s been a while since I had to get my fingers dirty in code, and as such I am a little rusty. There are also some areas of Rails that I never took the time to learn. Luckly I found that the community has grown, and there are some great resources out there for the aspiring coder.
One of the first places that I go when I need some help is GotAPI. It’s a one stop shop for all of your documentation needs. It even includes HTML and CSS docs from w3schools, probably once of the best HTML/CSS resources out there. RailsGuides is another great resource for helping understanding the nuances of Rails.
I have also stumbled across RailsBridge, a group of teachers and coders trying to help folks out. They have a few great tools and projects to help bootstrap people into rails development. For example, they offer online mentoring by a group of volunteers. The project is at an early stage, but they have some great ideas!
IRC is good resource. Who would have thought it, back in 1988 when it was first dropped into production, that 20 years later it would still be used? I personally hang out on freenode as it seems to be the most popular. There are a bunch of channels, and one of the issues that can easily happen is overload via signal to noise ratios. However if you pick your channels carefully you’ll be fine.
Another way of sharpening your chops is to contribute to a OSS or other project. RailsBridge has a few projects for needy organizations under their Builder program. Also projects like RailFrog or Mephisto are good choices if you want to lend a hand.
All of these just scratch the surface, but they are a great beginning. I think that the greatest assets a coder can have is a humble nature and a thirst for knowledge. Heck, if you don’t look at your old code and think “what the hell was I thinking??” then you are doing something wrong.
Um - ok. not better then ever. This blog has had quite the interesting (in the chinese meaning) life. It started out years and years ago as a blogspot site, and soon was moved to WordPress. It then almost migrated to several other platforms, but ended up on Mephisto. Well, we’re back on Wordpress for the time being. Will we stay? Probably not, even tho I like all of the cool toys baked in.