Faster Websites

Following on from last week, I've been working on more ways to make websites faster and save precious bandwidth.

gzip

I've enabled gzip compression in Apache. I've set it to compress text-based files like JavaScript, CSS and HTML. There is a trade-off between CPU and bandwidth, but so far it seems worth it. Most files compress to around 30% of their original size. It is not worth the server load to compress images, as JPEGs, GIFs and PNGs are already compressed formats.

I've also enabled gzip compression in PHP. Because of ShiftLib this was easy to add to multiple sites at the same time.
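A minimal sketch of the PHP side, using the built-in ob_gzhandler callback (the actual ShiftLib hook will look different):

<?php
// Must run before any output is sent. ob_gzhandler checks the
// browser's Accept-Encoding header and falls back to uncompressed
// output if gzip isn't supported.
ob_start('ob_gzhandler');
?>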

To enable gzip in Apache you need to do the following:

# nano /etc/httpd/conf.d/mod_deflate.conf

<IfModule mod_deflate.c>
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

# Don't compress images
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary

# or pdfs
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary

# or binary archives
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar|iso|dia)$ no-gzip dont-vary

<IfModule mod_headers.c>
# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</IfModule>

</IfModule>

Then restart Apache:

# service httpd restart

Expiration headers

I've set Apache to send 30-day cache headers with images, CSS and JavaScript. This should mean that these files are cached in the user's browser so that they don't have to be downloaded on every single page load. This could result in a significant saving in bandwidth, as images especially account for a big proportion of a web page's size.

To enable expiration headers in Apache, you need to do the following:

# nano /etc/httpd/conf.d/mod_expires.conf

<IfModule mod_expires.c>
ExpiresActive on
ExpiresByType image/jpeg "access 1 month"
ExpiresByType image/gif "access 1 month"
ExpiresByType image/png "access 1 month"
# CSS and JavaScript (match the MIME type your server actually sends)
ExpiresByType text/css "access 1 month"
ExpiresByType application/x-javascript "access 1 month"
ExpiresByType application/x-shockwave-flash "access 1 month"
ExpiresByType text/html "access 1 day"
ExpiresDefault "access 2 days"
</IfModule>

Then restart Apache:

# service httpd restart

Caching thumbnails

Up until now my main PHP thumbnail script has generated images on the fly. PHP is so fast that it doesn't take very long, even for a page full of images. However, when you have lots of visitors all those clock cycles add up, and there are significant savings to be made. My thumbnail script works by pointing the image source at a PHP script and passing in the filename as a variable, e.g. something like this (the file name and parameters are illustrative):
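<!-- illustrative file name and parameters -->
<img src="/thumb.php?file=photo.jpg&width=150&height=150" alt="">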

There is a big problem with this. Because the source points at a PHP script, the cache headers set up above won't apply. You could use PHP's header() function to send the correct cache headers, but I suspect it still won't cache as well as a regular image. There is another problem: search engines won't know that it's an image, especially if it's in a link. They will just think it's a regular PHP page, so they will go ahead and pull down a bunch of images without needing to.

So my new approach is a little different. I have a function that checks whether a cached thumbnail already exists and creates one if it doesn't. It then returns the image tag of the resulting thumbnail. A minimal sketch of such a function (the function name, paths and JPEG-only handling are illustrative, not the actual ShiftLib code):
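<?php
// Sketch (not the actual ShiftLib code): return an <img> tag for a
// thumbnail, generating the file on disk only the first time.
function thumbnail($file, $width, $height)
{
    $source = 'uploads/' . $file;
    $cached = 'cache/' . $width . 'x' . $height . '_' . $file;

    if (!file_exists($cached)) {
        list($src_w, $src_h) = getimagesize($source);

        $image = imagecreatefromjpeg($source);
        $thumb = imagecreatetruecolor($width, $height);
        imagecopyresampled($thumb, $image, 0, 0, 0, 0, $width, $height, $src_w, $src_h);
        imagejpeg($thumb, $cached);

        imagedestroy($image);
        imagedestroy($thumb);
    }

    // The browser now fetches a plain image file, so the
    // mod_expires headers above apply to it.
    return '<img src="/' . $cached . '" width="' . $width . '" height="' . $height . '" alt="">';
}
?>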

For example, a call like this (the file name is hypothetical):
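<?php echo thumbnail('photo.jpg', 150, 150); ?>

outputs something like:

<img src="/cache/150x150_photo.jpg" width="150" height="150" alt="">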

Search engines

I checked the web stats of some of the busier websites that I host. There was a significant chunk of bandwidth being used by search engines - around 10%. Search engines will index an entire site several times over a month, but they shouldn't have to pull down every single image. This may be related to the thumbnail issue outlined above, but I have also taken another precaution. I've created a robots.txt file and am now preventing access to uploads folders, thumbnail scripts and anything else that a search engine doesn't need. I may need to add an exception for Google Image Search, but for the time being I will monitor it and see how it goes.
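The robots.txt looks roughly like this (the paths are examples - adjust them to your own upload and thumbnail locations):

# Paths are illustrative
User-agent: *
Disallow: /uploads/
Disallow: /thumb.php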



18/07/2010 | Posted in web development

Faster AJAX Libraries

Server bandwidth has been a tad high recently, so I decided to profile some of the busiest sites using the excellent resource-measuring tools in Google Chrome. The most obvious culprits on the sites I looked at were AJAX libraries like Prototype, jQuery and extJS. I needed to speed this up. I already knew about the Google AJAX Libraries API, so that seemed like a good place to start.

The Google AJAX Libraries API provides an interface to load AJAX libraries from Google's servers, using Google's bandwidth. Google automatically handles caching and minification to make the files load as quickly as possible. This is great, but there are a few limitations.

You have to declare which version of the library to load. I manage a lot of sites, and generally I just want the most efficient, up-to-date version without having to update many different websites.
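For example, loading jQuery from Google means hard-coding a version number into the URL:

<!-- the version number is baked into the URL -->
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>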

So I've added a PHP function to ShiftLib called load_js(). You can pass this function an array of libraries and it will load the most up-to-date versions from the Google AJAX Libraries API.

I've also added support for Lightbox and extJS, which are not hosted by Google AJAX Libraries. The function also detects whether the page is running under SSL and, if it is, loads all the scripts over HTTPS. It loads the scripts in the correct order to avoid conflicts and works out dependencies - e.g. Lightbox requires Prototype to work.

So instead of having a block of code like this (the versions and paths here are illustrative):
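<!-- illustrative: local copies of each library and its dependencies -->
<script type="text/javascript" src="/js/prototype-1.6.1.js"></script>
<script type="text/javascript" src="/js/scriptaculous-1.8.3.js"></script>
<script type="text/javascript" src="/js/lightbox-2.04.js"></script>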


I now have something along these lines:
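<?php
// Illustrative call - the exact ShiftLib signature may differ
load_js(array('prototype', 'scriptaculous', 'lightbox'));
?>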

Much nicer.



12/07/2010 | Posted in web development

Tips for optimising MySQL queries

Database queries can very often be the bottleneck that slows down a web page. If you have a busy server with lots of sites or lots of pages, it's not always apparent which database queries are causing the problems.

Logging slow queries

A good place to start is by logging slow queries. You can do this by adding the following lines to your MySQL config file (usually /etc/my.cnf):

long_query_time = 1
log_slow_queries = /var/log/mysql/mysql-slow.log

Create the log file and give it write access:

touch /var/log/mysql/mysql-slow.log
chmod 777 /var/log/mysql/mysql-slow.log

You will then need to restart MySQL:

service mysqld restart

After a while the log file should start to fill up with a list of queries that have taken longer than 1 second. Not all of these queries will be poorly optimised - it could be that some can't be optimised any further, or that they took a long time because the server was very busy. Even so, it's a good indicator, and you can work through the list and start optimising.

Indexes

Adding indexes can have a dramatic effect on speed. You should look at the table joins and WHERE conditions of the query to see where indexes could be used.

Take these examples.

SELECT * FROM users WHERE name = 'Joe'

Adding an index to the "name" column will significantly improve this query.
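The index can be added like this:

ALTER TABLE users ADD INDEX (name);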

SELECT * FROM users WHERE name = 'Joe' AND surname = 'Schmoe'

Adding one combined index for "name" and "surname" will help here.
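For example:

ALTER TABLE users ADD INDEX (name, surname);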

SELECT * FROM users WHERE email='joe.schmoe@gmail.com'

In this case I would use a unique index on the email column. Unique means that the email address can only appear once in the table. This is faster than a regular index and also enforces database integrity.
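The unique index can be added like this:

ALTER TABLE users ADD UNIQUE INDEX (email);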

Note that indexes take up space and slow down inserts/updates - but this is generally a small price to pay for much faster SELECT statements.

Correct field types

It's worth looking over the database structure and seeing if a TINYINT could be used instead of an INT, or an ENUM instead of a VARCHAR. Also check the size of the VARCHAR columns and adjust them accordingly. Check whether INT fields should be UNSIGNED - an UNSIGNED INT can only be positive (>= 0). Unsigned INTs should always be used for ID columns, and again this helps with your database integrity. You may need to run OPTIMIZE TABLE before you see any benefit. You probably won't see a drastic improvement - but these little changes all add up.
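For example, tightening up a hypothetical users table might look like this:

-- Column names are illustrative
ALTER TABLE users
    MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    MODIFY age TINYINT UNSIGNED,
    MODIFY status ENUM('active', 'inactive') NOT NULL DEFAULT 'active';

OPTIMIZE TABLE users;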

Other tips

Joins are an expensive operation and should only be used when absolutely necessary. Check the fields you are selecting and your WHERE conditions to see if you really need each join.

Only select what you need. So if you only need to fetch an ID, instead of doing:

SELECT * FROM table WHERE name='Joe'

do:

SELECT id FROM table WHERE name='Joe'

And you can use LIMIT to prevent MySQL from searching the entire table:

SELECT id FROM table WHERE name='Joe' LIMIT 1

Try not to run MySQL queries inside PHP loops - you can often get the same outcome using a join or a sub-select in a fraction of the time.
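For example, instead of running a query for every ID in a loop (the table and column names here are illustrative):

<?php
// Illustrative: one query per ID - slow
foreach ($user_ids as $id) {
    $result = mysql_query('SELECT * FROM orders WHERE user_id = ' . (int) $id);
}

// The same data in a single query - much faster
$ids = implode(',', array_map('intval', $user_ids));
$result = mysql_query("SELECT * FROM orders WHERE user_id IN ($ids)");
?>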

This should be enough to get you started. For even better performance you could try tweaking MySQL itself or upgrading your hardware.



07/07/2010 | Posted in web development

