Using .htaccess

The .htaccess file can be used on Apache servers running Linux or Unix to increase your web site security, and customize the way your web site behaves. The main uses of the .htaccess files are to redirect visitors to custom error pages, stop directory listings, ban robots gathering email addresses for spam, ban visitors from certain countries and IP addresses, block visitors from bad referring (warez) sites, protect your web site from hot linking images and bandwidth theft, redirect visitors from a requested page to a new web page, and to password protect directories. Use the information in this article as a starting point to optimize and protect your web site.

Every file request on an Apache hosted web site using Unix or Linux requires all .htaccess files in the file path to be read. The limited server resources must be shared by all hosted web sites, and depending on what they do, .htaccess files can slow down or compromise a server. To ensure all hosted web sites remain secure and responsive, some web hosts limit the use of .htaccess. Some web hosts will offer .htaccess password protection but not let web sites use other .htaccess commands. Some web hosts, like www.dewahost.com and www.pair.com, offer quick setup of .htaccess files through their control panels. Some web hosts may have a script you can use to set up user passwords. Other web hosts leave you on your own. Before you actually start using .htaccess, make sure that you are allowed to use it, and find out about any limitations.

Most hosting companies host multiple domains on a server. Your web host uses global configuration files that serve as default settings for all sites hosted on the server. Each web site is usually allowed to use .htaccess files to override these global settings. When you place a .htaccess file in your domain's root directory (the same place as your home page), all commands (directives) in your .htaccess file will affect your entire web site, and override any pre-configured server global settings. If you do not have a .htaccess file in your root directory, your web site is using the preconfigured global settings set up by your web host. Other .htaccess files in subdirectories may change or nullify the effects of those in parent directories.

A typical web site using .htaccess usually has a couple of .htaccess files. The first .htaccess file in the root directory affects all files in the root directory and all directories below the root directory (subdirectories). If you have a directory of images that should only be displayed on your site, you may want to protect those images from "hot linking" by placing a .htaccess file in that directory. If you have a web site subdirectory that you want to password protect, (e.g., a members only directory), you would place another .htaccess file in the directory you are protecting. That .htaccess file would protect that directory and all subdirectories of that directory.

Creating the .htaccess File

The .htaccess file is a simple ASCII text file that you can create and modify using Notepad or any text editor. The .htaccess filename is always entered in lowercase letters. Your .htaccess file will contain a list of .htaccess commands (directives), with one command per line. There must not be any spaces or special characters at the end of any line, and .htaccess commands are case-sensitive. Always make sure your .htaccess file has a line return after the last line or it won't work. If your text editor uses word-wrap, make sure word-wrap is disabled.

The filename for the .htaccess file begins with a . (dot). The file does not have anything before the dot; it actually has no name, only an 8-letter file extension. Because the filename begins with a . (dot), it remains hidden on many operating systems. Because you may have trouble finding or working with hidden files, create your .htaccess file but name it something else (e.g., htaccess.txt), and then upload that file to the server in ASCII mode (not binary). Once you have uploaded the file, you can rename it to .htaccess using an FTP program. Make sure that the file permissions for .htaccess files are set to 644 (rw-r--r--). This setting allows the server to read the .htaccess files, but prevents other users on the server from modifying them. The permission can be set with an administrative control panel or the CHMOD command.

Three .htaccess Warnings

If you are new to using a .htaccess file, I highly recommend experimenting with a .htaccess file in a subdirectory to avoid causing site-wide problems. Always make a backup of your working .htaccess file before you change anything. It actually makes sense to back up your entire web site prior to editing your .htaccess files. If you do happen to load a bad .htaccess file, you can load your backup file(s) to keep your web site available while you look for the source of the problem(s).

If you store backups of your CGI scripts on your own computer or a different server, the permissions may be reset as a security measure by the server when you reload them. For this reason, you should also keep a record of the permissions required by each script so you can quickly reset permissions if you are forced to reload your entire web site.

Web sites using the Microsoft FrontPage Extensions use a custom .htaccess file in the root folder for their own purposes. Microsoft warns users that any changes to this file may corrupt the extensions and render the web site unreachable. If you decide to edit the Microsoft custom .htaccess file, make sure you have a backup, proceed with caution, and add your commands to the beginning of the file. The original commands Microsoft placed in the .htaccess file may conflict with the commands you add to the file.

In addition, do not get carried away and end up creating excessively big .htaccess files. Subdirectories inherit the .htaccess commands from parent directories, so there is no need to have redundant commands. The .htaccess file is processed by the web server for each file request at your web site. Excessively large .htaccess files can slow your server's performance. Keep your .htaccess file organized and use comments (# lines). Snoops, thieves, spammers, and web robots will also continue to search for new ways to get past whatever privacy and blocking methods a web site may use.

Custom Error Pages

Custom error pages allow you to have personal error pages instead of displaying your web host's generic error pages. Personal error pages make your site look more professional and user-friendly, by letting error pages have the same layout as the rest of your site, and by allowing you to create PHP scripts that send you email notification of the errors. Custom error pages also help direct visitors to any page you wish if a visitor mistypes a web address on your web site, your web site has broken links, or you moved a page to a new location.

The five most common error codes returned by a web server are:

400 - Bad Request
401 - Authorization Required
403 - Forbidden
404 - Not Found
500 - Internal Server Error

At a minimum, you should have a custom error page for 404 errors. Obviously, if you do not have a password protected area of your site, you do not need a 401 error page.

First decide where you will place your error pages. The directory containing the error pages must be accessible to everyone. Your error pages can be placed anywhere on your web site, as long as they have a valid URL address. I prefer to keep my custom error pages in a separate subdirectory called errors. This eliminates clutter in the root directory, and requires only one line in the robots.txt file informing spiders not to index these error pages.

Then you need to create your error pages. You can name your error pages anything you want. Links to images, style sheets, and other web pages contained in your custom error pages must be specified using URLs that are either absolute (e.g., starting with "http://") or relative to the document root (starting with "/"). This ensures visitors don't get a broken link when the error page is invoked.

If your error pages are less than 512 bytes in size, Internet Explorer displays its own error page suggesting the visitor use MSN search to "look for information on the Internet." Your goal should be to keep your visitors on your web site. I would suggest including a link to your home page, and a link to your web site map. You should also add the META tag <meta name="robots" content="noindex,nofollow"> to your HTML error pages. You could also include your navigation menu. Once you are through designing your custom error pages, upload the finished HTML files to your web server.
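As a starting point, a minimal custom 404 page might look like the sketch below. The filenames (/index.htm, /sitemap.htm) are placeholders for your own pages:

```html
<html>
<head>
<title>Page Not Found</title>
<!-- Keep search engines from indexing the error page -->
<meta name="robots" content="noindex,nofollow">
</head>
<body>
<h1>Sorry, that page could not be found.</h1>
<!-- Use root-relative links so they work no matter where the error occurred -->
<p><a href="/index.htm">Home page</a> | <a href="/sitemap.htm">Site map</a></p>
</body>
</html>
```

Remember to pad the page with enough content (even HTML comments count) to pass the 512-byte threshold mentioned above.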

Each customized error page requires a separate command on each line of your .htaccess file. To create an error handler for each of the five error codes listed above, you would add the following command lines to your .htaccess file. The initial slash indicates the root directory of your site (where your home page is located). Command lines are also case sensitive.

ErrorDocument 400 /errors/badrequest.htm
ErrorDocument 401 /errors/authreqd.htm
ErrorDocument 403 /errors/forbidden.htm
ErrorDocument 404 /errors/notfound.htm
ErrorDocument 500 /errors/intserver.htm

This .htaccess file should be located in your top-most directory. Then all errors that occur in that directory and all its subdirectories will be redirected to your custom pages. Test it out to make sure it works.

Prevent Directory Listings

Most web servers are configured to automatically find an index file in every web site directory. Your web host may use a global configuration setting that allows the listing of all files in all directories. If you type the URL http://www.wiscocomputing.com, the page that is actually displayed is http://www.wiscocomputing.com/index.htm. If your web site has a cgi-bin directory, and your web host has directory browsing enabled, you have a potential security nightmare. If a hacker enters http://www.yourdomain.com/cgi-bin/ , and the web server cannot find an index page in the cgi-bin directory, all of your executable cgi scripts will be listed. If you have a folder of images in a directory that doesn't have a default file (index.htm), it could also be browsed. Placing an index.htm file in all subdirectories that you want to protect from these hackers and snoops is one way to stop directory listings. Adding the following command to your .htaccess file located in your root directory is another way to prevent directory browsing:

IndexIgnore *
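IndexIgnore * hides all files from the automatic listing, so visitors see an empty index rather than your files. If your web host allows the Options directive in .htaccess (check before relying on it), you can disable directory listings entirely instead, so requests for a directory with no index page return a 403 Forbidden error:

```apache
# Turn off automatic directory listings for this directory and all subdirectories
Options -Indexes
```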

Blocking Visitors by IP Addresses and Domain Names

You can decide to ban certain IP addresses and domains from accessing your entire web site(s) for many different reasons. By using a .htaccess file in a subdirectory, you can ban certain IP addresses and domains from the subdirectory (e.g., keeping disruptive members out of your message boards).

The robots.txt file located in your root directory can be used to indicate certain folders you do not want web robots to visit. When you review your log files, you will learn that not all robots respect the rules in your robots.txt file. If a web robot ignores your robots.txt file, does something on your web site that you do not like (e.g., gathering email addresses), or comes from a country whose traffic you prefer to block, you can use the .htaccess file to ban it. Beware that spammers and bad robots are always coming up with new ways to get around blocking approaches. For example, spammers and bad robots can use zombie machines with new IP addresses, misidentify themselves, and spoof user agents.

You may want to deny access to your web site to bad visitors (fraudsters, spammers). If you know an offender's IP address, you can block them from visiting your site; this forces them to use a different computer or a proxy address to continue to access your web site. Some Internet users have dynamic (changing) IP addresses. In that case, you can block a range of IP addresses, or an entire domain (and its subdomains). When you block an entire domain, you may also be blocking legitimate visitors. You might also decide to block a domain because they are copying your content and placing it on their web sites.

If you are responsible for different web sites which are directed towards different audiences, you should use a combination of techniques. No one approach will be totally effective for all sites.

Banning certain IP addresses from accessing your web site can be achieved by adding command lines to your .htaccess file. You can add as many lines as you want, replacing the incoming address with the IP you want to ban. The following syntax is the most common order for blocking visitors.

<Limit GET POST>
order allow,deny
deny from 100.101.102.103
deny from 123.45.6
deny from spammers.com
allow from all
</Limit>

You can deny access based upon an incoming IP address, an incoming IP block, or an incoming domain name. In the example above, a user from the exact IP number 100.101.102.103 would be blocked. If you only specify 2 or 3 groups of numbers, you can block a whole range. For example, deny from 123.45.6 will deny all users with IP addresses whose first 3 number groups match. All users connecting from spammers.com would be blocked.
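The partial-IP matching described above is easy to illustrate. This Python sketch (the addresses are made-up examples, not real offenders) mimics how a deny from rule compares the leading number groups:

```python
# Mimics Apache's partial-IP matching: "deny from 123.45.6" blocks every
# address whose leading octets are 123.45.6. Addresses are made-up examples.

def matches_deny_rule(ip, rule):
    """Return True if ip falls under a full or partial deny-from rule."""
    rule_octets = rule.split(".")
    return ip.split(".")[:len(rule_octets)] == rule_octets

print(matches_deny_rule("123.45.6.78", "123.45.6"))             # True: first 3 groups match
print(matches_deny_rule("123.45.7.78", "123.45.6"))             # False: third group differs
print(matches_deny_rule("100.101.102.103", "100.101.102.103"))  # True: exact match
```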

Blocking Visitors by Referrer

Blocking visitors that originate from particular domains requires that you have mod_rewrite installed, and that you have privileges to create your own rewrite rules in your own .htaccess file. Mod_rewrite allows your server to examine the referrer before a page on your site is served. If you do not know if you have those privileges, check with your web host. Your web logs include a referrer entry for every file requested on your web site. If you see referrals from other web sites that link directly to your images and CSS files and provide no benefit to your web site (e.g., warez sites), you should ban them.

Block traffic from a single referrer:
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} spammers\.com [NC]
RewriteRule .* - [F]

Block traffic from multiple referrers
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} spammers\.com [NC,OR]
RewriteCond %{HTTP_REFERER} leechers\.com
RewriteRule .* - [F]

The flag [NC] is added to the end of the domain to make the match case insensitive, so no matter how the name is capitalized, it gets blocked. The last line indicates the action to take when a match is found: a 403 Forbidden error is sent to the browser. The only difference between blocking a single referrer and blocking multiple referrers is that the [NC,OR] flag is appended to every RewriteCond line except the last.

The line "Options +FollowSymlinks" above is commented out. This line should be uncommented if your server isn't configured with FollowSymLinks in its <Directory> section in httpd.conf, and you get a 500 Internal Server Error when using the code above as is. Check with your web host.

Blocking Bad Robots and Site Rippers

You want search engine robots to visit your web pages. Unfortunately, this leaves the door open to bad robots, also called spam bots, that gather email addresses for spam lists. A site ripper is designed to crawl and download your entire web site for off-line browsing. Some robots ignore your robots.txt file, and deliberately crawl the parts of your web site you requested they stay away from. Some of these bad robots do not identify themselves, or lie about their identity. You are paying for the bandwidth these bad robots are using.

You can identify bad robots by analyzing your web logs. Many of the webmaster forums have good discussions about the bad web robots, and provide listings that you can add to your .htaccess file. Three excellent threads called 'A close to perfect .htaccess ban list' can be found at http://www.webmasterworld.com/forum13/687.htm, http://www.webmasterworld.com/forum92/205.htm, and http://www.webmasterworld.com/forum92/413.htm

You will never be able to ban all bad robots from searching your site. If you can ban the most common ones, your bandwidth usage will decrease. Bad web robots that you add to your .htaccess file will all receive a 403 Forbidden error when visiting your site.
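A typical ban list entry works by matching the robot's User-Agent string. The sketch below tags matching robots and refuses their requests; the two user-agent names are only examples, to be replaced with entries from the ban lists discussed above:

```apache
# Tag requests whose User-Agent matches a known bad robot (names are examples)
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "WebStripper" bad_bot

<Limit GET POST>
order allow,deny
allow from all
# Requests tagged bad_bot receive a 403 Forbidden error
deny from env=bad_bot
</Limit>
```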

Prevent Hot Linking - Bandwidth Leaching

When an external web site links to non-HTML objects (e.g., images, CSS files, and JavaScript files) not stored on its own server, it is using your bandwidth. Common examples are people linking to your images from forums, message boards, and eBay auctions, or misrepresenting your proprietary images and content as their own property. You may be paying for this extra bandwidth. If the extra bandwidth has no benefit to you, and these people are using your property without permission, you can use .htaccess to stop the "hot linking" and bandwidth theft.

Your web site usually contains images (e.g., product tutorial, company icon) that should only be displayed on your web site. Your web site can also contain images that you want to be displayed on other web sites. (e.g., download sites using your software screen shot listed in your pad file, or an icon image for your newsfeeds). The easiest way to accomplish this is to have two separate directories for your images. These directories should not contain any html files from your web sites. Both directories should have a blank index.htm file to prevent visitors from seeing your directory listing. The first directory would be used for 'allowed' images you share, since this is beneficial to you.

The second image directory would include a separate .htaccess file allowing these 'protected' image files to only be displayed on your web site. This .htaccess file includes commands that require the referring (previous) URL to be a URL from your web site. If someone attempts to link to an image in this 'protected' directory, you can either have their request fail, or display a substitute image that indicates their theft. The substitute image should be placed in your 'allowed' folder.

These .htaccess commands block the image request:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://yourdomain.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.yourdomain.com/.*$ [NC]
RewriteRule .*\.(gif|GIF|jpg|JPG|bmp|BMP)$ - [F]

These .htaccess commands substitute a different image for the requested image:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://yourdomain.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.yourdomain.com/ [NC]
RewriteCond %{REQUEST_URI} !^/stolen.gif [NC]
RewriteRule \.(gif|GIF|jpg|JPG)$ http://yourdomain.com/allowed/stolen.gif [R]

Change "yourdomain.com" to your own URL. The command RewriteCond %{REQUEST_URI} !^/stolen.gif [NC] may not be needed. You can remove the line, and the other commands will still work. To display an image different from the requested image, mod rewrite needs to be enabled by your web host. Checking the HTTP_REFERER does require the server to process an extra command, so you should only use this protection if people hotlinking your images is a problem.

You can test the effectiveness of your web site's hotlink protection with the URL HotLink Checker at http://altlab.com/htaccess_tutorial.html#hotlinkcheck. You enter the complete URL of one of your images to see if it can be loaded and hotlinked by a remote server.

If someone links to display your proprietary images on their web sites, or forums, or EBay, they are violating your copyright. You may want to send the offending webmaster an email and/or a letter explaining that they are violating your copyright and asking them to stop the copyright infringement. If that does not work, you can contact their web host informing them your copyright has been violated. You may need to send the webmaster and web host cease and desist letters. You can also contact a lawyer to protect your property.

Redirection

Redirection is used to send visitors who request a page that no longer exists to another web page. Your .htaccess file can be used to redirect visitors to HTML pages, files, and directories that have been moved. People and web robots can be redirected to a different file or directory on the same or a different web site. Redirection allows the visitor to still have a functioning link, even if you change the filename or location. Another use would be to use a short URL in a newsletter or email that would redirect to a longer URL. The short URL would have less of a chance of breaking with a word wrap. This .htaccess feature can be used for any file type, including .jpg, .pdf, and .exe.

Note that this is search engine-friendly, too. Search engines will change the links in their index to the new location on the basis of the Redirect permanent directive. If the original file location was not in the root directory, you must include the directory path in your command line. The following two commands redirect visitors and search engines to a new page, and to a new page in a different directory.

Redirect permanent /original.html http://new_domain.com/new.html
Redirect permanent /olddirectory/original.html http://new_domain.com/newdirectory/newfile.html
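The short-URL technique mentioned above uses the same directive. Both the short path and the long destination below are placeholders for your own:

```apache
# A short, wrap-proof URL for a newsletter, redirected to the real page
Redirect permanent /news http://www.yourdomain.com/newsletters/current/issue.html
```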

Prevent Viewing of the .htaccess File

If you use .htaccess for password protection, then the location of your password information is included in the .htaccess file. If you have set incorrect permissions, or if your server is not as secure as it could be, a browser can be used to view the .htaccess file. This could compromise your web site and server. If a hacker can find the location of your password file, they can download it and attempt to crack the passwords to gain full access to any part of your web site you had protected. It is possible to prevent a .htaccess file from being viewed in this manner by adding these four lines to your .htaccess file.

<Files .htaccess>
order allow,deny
deny from all
</Files>

Any visitor trying to read the file will receive a 403 Access Forbidden error. For added security, you should also set the permission of the .htaccess file to 644 (rw-r--r--).

Password Protection

One of the most popular and useful uses of the .htaccess file is the ability to reliably password protect directories on web sites. Password protected directories can be used to sell downloadable software, provide premium content (e.g., to newsletter subscribers) to web site members, and hide information from competitors. When a visitor tries to access a protected area of your web site, they will be prompted for a username and password, and will not be allowed access until they can provide the correct username/password combination. Password protection is usually used to protect entire directories, but individual files can also be protected.

Password protecting a directory takes a little more work than any of the other .htaccess functions because you must also create a file to contain the usernames and passwords which are allowed to access the site. Usernames and encrypted passwords are kept in a webmaster-maintained file. Webmasters are responsible for adding and deleting users, and resetting lost passwords. Many web hosts offer password protection directly in their control panels, so you can log into your control panel, set up the password protected directory, and define the usernames and passwords.

To password protect a directory, you need four things: the directory to protect, a .htpasswd file, the location of that password file, and a separate .htaccess file. The password file can be placed anywhere on your web site (as the passwords are encrypted), but most of the time it is placed outside the web root so that it is impossible to access it from the web.

The Password .htaccess File

To protect a directory, the directory needs to exist, so if it doesn't, first create the directory. This .htaccess file is placed in the directory that you want password protected. This directory and all its subdirectories will be protected using the .htaccess and .htpasswd files. This .htaccess file should not be placed in your root directory, because it would password protect your entire site, which probably isn't your exact goal.

This is an example of a very simple .htaccess file to password protect a directory.

AuthUserFile /path/to/your/password/file/.htpasswd
AuthGroupFile /dev/null
AuthName "Members Area"
AuthType Basic

<Limit GET POST>
require valid-user
</Limit>

Following is an explanation of each line of this file:

AuthUserFile

The path to the password file (AuthUserFile) must be the full server pathname. This is not a URL (e.g. http://www.mydomain.com/.htpasswd). The path must also include a '/' at the front. Contact your web host if you do not know what the full path to your web space is.

AuthGroupFile

This is the full server path to the group password file. If there is no group file, use /dev/null.

AuthName

This is the message that is displayed in the password prompt box. The AuthName is the name of the area you are protecting. You can label this anything you want. Common examples are "Restricted Area", "Members Area", and "Private Folder". If you want the AuthName to have spaces in it, surround it with quote marks.

AuthType

Only Basic Authentication is covered here. Basic Authentication allows restricted access by looking up users in a password file. This is not a really secure password system, because the username and password can be sent as plain text (e.g., http://username:password@www.yourdomain.com/privatedirectory/) to a protected URL. A more secure mode (Digest) exists, but is not supported by all browsers.

Require valid-user

Use this command to allow the entire list of users in your .htpasswd file to have access to the protected directory. Creative use of <Limit GET POST> can grant access to specific individuals or specific domains, restrict access to certain times of day, etc.

The .htpasswd File

The .htpasswd file must be located in the location entered in the .htaccess file after AuthUserFile. This is also a plain text file that must be uploaded in ASCII (not binary) mode to work. You can give the password file a different name; .htpasswd is the suggested default filename. Each line of this file contains a username and encrypted password separated by a colon.

The username can be any text or name, but should not contain any spaces. The encrypted password is always 13 characters long, and can come out differently each time the same password is encrypted, but all of the variants will work. There must be no spaces on the line: not before the username, on either side of the colon, or after the password text. You can add as many username/password lines as you wish, making sure each one is on a separate line. Many webmasters use on-line tools to encrypt passwords; one such tool is at http://www.tools.dynamicdrive.com/password/. You can also purchase your own script to encrypt passwords, and place it in your cgi-bin folder.
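These formatting rules can be checked mechanically. The Python sketch below (the sample lines use made-up usernames and password strings, not real credentials) validates a .htpasswd line against the rules just described:

```python
# Validates one .htpasswd line: "username:encryptedpassword", no spaces,
# exactly one colon, and a 13-character crypt()-style password field.
# The sample lines are made-up examples, not real credentials.

def is_valid_htpasswd_line(line):
    if " " in line or line.count(":") != 1:
        return False
    username, encrypted = line.split(":")
    return len(username) > 0 and len(encrypted) == 13

print(is_valid_htpasswd_line("jsmith:rkVJMkNBb3khs"))   # True
print(is_valid_htpasswd_line("j smith:rkVJMkNBb3khs"))  # False: space in username
print(is_valid_htpasswd_line("jsmith:short"))           # False: password not 13 characters
```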

Your web host may provide another way to create the .htpasswd file in the password protection section of your web management console. Create the directory on your web site you want to protect, then tell the console to password protect it. The console should create both the .htaccess file in the protected directory and the .htpasswd files. These files can then be downloaded in binary mode, modified, and then uploaded in ASCII mode. Removing users from your .htpasswd file consists of deleting the line that contains the user you want to delete.
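Deleting a user really is just deleting that user's line. A Python sketch (the usernames and password strings are made up; always back up the file before modifying it):

```python
# Removing a user from .htpasswd means dropping the line whose username
# (the text before the colon) matches. Entries are made-up examples.

def remove_user(htpasswd_lines, username):
    """Return the .htpasswd contents without the named user's line."""
    return [line for line in htpasswd_lines
            if line.split(":")[0] != username]

lines = ["alice:rkVJMkNBb3khs", "bob:Xy9dQm2LpR7wz"]
print(remove_user(lines, "bob"))  # prints ['alice:rkVJMkNBb3khs']
```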

Other Sources of Information

The most complete list of all possible uses of the .htaccess file can be found at http://www.apache.org/docs/mod/directives.html. Several forums and web sites provide a wealth of information and sample .htaccess files. The best are the Apache forum at www.webmasterworld.com, the Apache forum at www.sitepoint.com, www.mod-rewrite.com, http://www.webmaster-toolkit.com, http://www.htaccesstools.com, and http://devshed.com.

ASP member Paul Roberts of TLHouse software offers free software HTAccessible at http://www.tlhouse.co.uk/pc_software.shtml. The software is described as providing "a simple interface for putting together some of the most commonly used Apache directives."

Terry Jepson
www.wiscocomputing.com