Faster Websites

Following on from last week, I've been working on more ways to make websites faster and save precious bandwidth.

gzip

I've enabled gzip compression in Apache and set it to compress text-based files such as JavaScript, CSS and HTML. There is a trade-off between CPU and bandwidth, but so far it seems worth it: most files compress to around 30% of their original size. It is not worth the server load to compress images, because JPEGs, GIFs and PNGs are already compressed formats.
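As a rough illustration of that trade-off, the PHP sketch below gzips a block of repetitive HTML and a block of random bytes with gzencode(). The random bytes are an assumption standing in for JPEG, GIF or PNG data, which gzip sees as close to random. The markup is artificially repetitive, so it will come out much smaller than the roughly 30% seen on real files.

```php
<?php
// Compare how well gzip shrinks repetitive text versus data that is
// already compressed. random_bytes() stands in for image data here, which
// is an approximation: compressed formats look close to random to gzip.

function gzip_ratio(string $data): float
{
    // Level 6 is zlib's usual default balance between CPU time and size.
    $compressed = gzencode($data, 6);
    return strlen($compressed) / strlen($data);
}

$html = str_repeat("<li class=\"item\"><a href=\"/page\">Example link</a></li>\n", 200);
$binary = random_bytes(20000);

printf("HTML:   %.1f%% of original size\n", gzip_ratio($html) * 100);
printf("Binary: %.1f%% of original size\n", gzip_ratio($binary) * 100);
```

Expect the HTML line to show a small percentage and the binary line to sit at or slightly above 100%, since gzip still adds its own header when it finds nothing to compress.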

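I've also enabled gzip compression in PHP. Thanks to shiftlib, this was easy to add to multiple sites at the same time. The call needs to come before any output is sent, because the handler has to add a Content-Encoding header to the response: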

<?php ob_start("ob_gzhandler"); ?>
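ob_gzhandler only compresses when the browser says it can cope, by looking at the Accept-Encoding request header. The function below is a hypothetical, simplified illustration of that negotiation, not PHP's actual implementation: `negotiate_gzip()` is a made-up name, and it ignores quality values such as `gzip;q=0`.

```php
<?php
// Hypothetical, simplified version of the decision ob_gzhandler makes:
// compress only when the browser says it accepts gzip. Not PHP's real code;
// quality values like "gzip;q=0" are deliberately ignored.

function negotiate_gzip(string $body, string $acceptEncoding): array
{
    if (preg_match('/\bgzip\b/i', $acceptEncoding)) {
        return [
            // Tell the browser how the body is encoded, and tell caches that
            // the response differs depending on Accept-Encoding.
            ['Content-Encoding: gzip', 'Vary: Accept-Encoding'],
            gzencode($body),
        ];
    }
    // The client did not ask for gzip: send the body untouched.
    return [[], $body];
}

[$headers, $out] = negotiate_gzip('<p>Hello</p>', 'gzip, deflate');
print_r($headers);
```

In a real page the second argument would come from `$_SERVER['HTTP_ACCEPT_ENCODING']`.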

To enable gzip in Apache, do the following:

# nano /etc/httpd/conf.d/mod_deflate.conf

<IfModule mod_deflate.c>
# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

# Don't compress images
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary

# or pdfs
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary

# or binary archives
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar|iso|dia)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
# (the Header directive needs mod_headers, so only use it when that module is loaded)
<IfModule mod_headers.c>
Header append Vary User-Agent env=!dont-vary
</IfModule>

</IfModule>

Then restart Apache:

# service httpd restart
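The SetEnvIfNoCase lines in the config decide which requests skip compression purely by matching the URL. As a rough way to check which of your own URLs those rules would catch, here is a PHP sketch that applies the same regular expressions. `would_compress()` is a hypothetical helper for testing only; Apache does not use it. It expects a URL path, not a full URL.

```php
<?php
// Mirrors the SetEnvIfNoCase rules from mod_deflate.conf: returns false for
// paths those rules would mark "no-gzip". A checking aid, not used by Apache.

function would_compress(string $path): bool
{
    $noGzip = [
        '/\.(?:gif|jpe?g|png)$/i',                    // images
        '/\.pdf$/i',                                  // PDFs
        '/\.(?:exe|t?gz|zip|bz2|sit|rar|iso|dia)$/i', // binary archives
    ];
    foreach ($noGzip as $pattern) {
        if (preg_match($pattern, $path)) {
            return false;
        }
    }
    return true;
}

foreach (['/index.php', '/css/site.css', '/uploads/photo.JPG', '/files/backup.tar.gz'] as $path) {
    printf("%-22s %s\n", $path, would_compress($path) ? 'compressed' : 'skipped');
}
```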

Expiration headers

I've set Apache to send 30-day cache headers with images, CSS and JavaScript. These files should then be cached in the user's browser, so they don't have to be downloaded on every page load. This could save a significant amount of bandwidth, as images in particular account for a large proportion of a web page.

To enable expiration headers in Apache, do the following:

# nano /etc/httpd/conf.d/mod_expires.conf

<IfModule mod_expires.c>
ExpiresActive on
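ExpiresByType image/jpeg "access 1 month"
ExpiresByType image/gif "access 1 month"
ExpiresByType image/png "access 1 month"
ExpiresByType text/css "access 1 month"
ExpiresByType application/javascript "access 1 month"
ExpiresByType application/x-javascript "access 1 month"
ExpiresByType application/x-shockwave-flash "access 1 month"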
ExpiresByType text/html "access 1 day"
ExpiresDefault "access 2 days"
</IfModule>

Then restart Apache:

# service httpd restart
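mod_expires sends both an Expires date and a Cache-Control max-age header. The arithmetic behind an "access" rule is simple: the expiry is the request time plus a fixed number of seconds. The sketch below assumes a 30-day month (2,592,000 seconds, matching the 30-day figure above); `expiry_headers()` is a hypothetical helper, useful for instance when a PHP script has to send equivalent headers itself.

```php
<?php
// The arithmetic behind an "access plus N" expiry: the response stays fresh
// for N seconds from the moment it was requested.
// Assumes a 30-day month, matching the 30-day cache period described above.

const THIRTY_DAYS = 30 * 24 * 60 * 60; // 2,592,000 seconds

function expiry_headers(int $requestTime, int $lifetime = THIRTY_DAYS): array
{
    return [
        // HTTP/1.1 caches read the relative lifetime from max-age
        'Cache-Control: max-age=' . $lifetime,
        // Older HTTP/1.0 caches need an absolute date, always in GMT
        'Expires: ' . gmdate('D, d M Y H:i:s', $requestTime + $lifetime) . ' GMT',
    ];
}

print_r(expiry_headers(time()));
```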

Caching thumbnails

Until now, my main PHP thumbnail script has generated images on the fly. PHP is fast enough that this doesn't take very long, even for a page full of images. However, when you have lots of visitors all those clock cycles add up, and there are significant savings to be made. The script works by pointing the image source at a PHP script and passing in the filename as a query-string parameter, e.g.:

<img src="thumb.php?f=example.jpg">

There is a big problem with this. Because the image points to a PHP script, the cache headers won't work. You could use PHP's header() function to send the correct cache headers, but I still suspect it won't cache as well as a regular image. There is another problem: search engines won't know that it's an image, especially when it's used in a link. They will treat it as a regular PHP page and go ahead and download a lot of images they don't need.
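For reference, the header() route might look something like this minimal sketch, assuming a script that serves a JPEG straight from disk. `not_modified()` and `send_thumbnail()` are hypothetical names, not part of the original script. Besides a 30-day Cache-Control header it sends Last-Modified, so a returning browser that sends If-Modified-Since can be answered with a bodiless 304 response.

```php
<?php
// Hypothetical helpers for an on-the-fly thumbnail script such as thumb.php.

// True when the browser's cached copy is at least as new as the file.
function not_modified(int $lastModified, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false; // no cached copy reported by the browser
    }
    $since = strtotime($ifModifiedSince);
    return $since !== false && $lastModified <= $since;
}

function send_thumbnail(string $path): void
{
    $mtime = filemtime($path);
    header('Cache-Control: public, max-age=2592000'); // 30 days
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');

    if (not_modified($mtime, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null)) {
        http_response_code(304); // browser reuses its copy, no body sent
        return;
    }
    header('Content-Type: image/jpeg');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
```

Even so, this still runs PHP on every request that reaches the server, which is part of why a plain cached file is the cheaper option.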

My new approach is a little different. A function checks whether a cached thumbnail already exists and creates one only if necessary. It then outputs the image tag for the resulting thumbnail, e.g.:

<?php image('example.jpg'); ?>

outputs:

<img src="uploads/cache/example.jpg">
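Here is a minimal sketch of that cache-first idea, assuming originals live in an uploads folder and thumbnails in uploads/cache. `cached_image_tag()` and its parameters are hypothetical rather than the actual shiftlib function, and the resizing itself is passed in as a callback (in practice something like GD's imagecopyresampled()), so the caching logic can be shown on its own. Rebuilding the thumbnail when the original is newer than the cached copy is an extra assumption.

```php
<?php
// Cache-first thumbnail helper (hypothetical). Assumes the original file
// exists in $uploadDir; $makeThumb($from, $to) must write the thumbnail.

function cached_image_tag(
    string $file,
    string $uploadDir,
    string $cacheDir,
    string $cacheUrl,
    callable $makeThumb
): string {
    $name = basename($file);            // strip any "../" from the filename
    $source = $uploadDir . '/' . $name;
    $thumb = $cacheDir . '/' . $name;

    // Only build when there is no cached copy or the original is newer.
    if (!is_file($thumb) || filemtime($source) > filemtime($thumb)) {
        if (!is_dir($cacheDir)) {
            mkdir($cacheDir, 0755, true);
        }
        $makeThumb($source, $thumb);
    }

    // Point the browser at the static file, so Apache's cache headers apply.
    $src = htmlspecialchars($cacheUrl . '/' . rawurlencode($name));
    return '<img src="' . $src . '">';
}
```

For example, with a resizing callback in `$resize`, `echo cached_image_tag('example.jpg', 'uploads', 'uploads/cache', 'uploads/cache', $resize);` prints `<img src="uploads/cache/example.jpg">` and only does the resizing work on the first request.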

Search engines

I checked the web stats of some of the busier websites that I host, and search engines were using a significant chunk of bandwidth: around 10%. Search engines will index an entire site several times a month, but they shouldn't have to download every single image. This may be related to the thumbnail issue outlined above, but I have also taken another precaution: I've created a robots.txt file that blocks access to the uploads folders, the thumbnail scripts and anything else a search engine doesn't need. I may need to add an exception for Google Image Search, but for the time being I will monitor it and see how it goes.
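robots.txt Disallow rules are plain path prefixes: any URL path that starts with a listed prefix is off limits to the matching crawlers. The PHP sketch below shows that matching. The robots.txt content, including the /uploads/ and /thumb.php paths, is a hypothetical example rather than the file described here, and only the `User-agent: *` group is handled; real crawlers also honour groups aimed at specific bots, which is where a Google Image Search exception (the Googlebot-Image user agent) would go.

```php
<?php
// How a crawler applies Disallow rules: each one is a path prefix.
// Hypothetical robots.txt; only the "User-agent: *" group is parsed.

const ROBOTS_TXT = <<<'TXT'
User-agent: *
Disallow: /uploads/
Disallow: /thumb.php
TXT;

function disallowed_prefixes(string $robots): array
{
    $prefixes = [];
    $appliesToAll = false;
    foreach (preg_split('/\R/', $robots) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // drop comments
        if (stripos($line, 'User-agent:') === 0) {
            $appliesToAll = trim(substr($line, 11)) === '*';
        } elseif ($appliesToAll && stripos($line, 'Disallow:') === 0) {
            $path = trim(substr($line, 9));
            if ($path !== '') { // an empty Disallow means "allow everything"
                $prefixes[] = $path;
            }
        }
    }
    return $prefixes;
}

function is_blocked(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

$rules = disallowed_prefixes(ROBOTS_TXT);
var_dump(is_blocked('/uploads/cache/example.jpg', $rules)); // bool(true)
```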



18/07/2010 | Posted in web development


About me

Adam Jimenez is a freelance web developer who has been professionally developing websites since 2000.
