Please note that all SIEpedia articles address specific issues or questions raised by IAC users, so they do not attempt to be rigorous or exhaustive, and may or may not be useful or applicable in different or more general contexts.

How to check links in a Web Site

Regularly checking the links on a Web site is, or should be, a routine task for any Web developer or maintainer. Internal links may break because pages are moved or renamed; external links may break because a site moves to a new server, changes its URLs, or shuts down altogether.

A useful and relatively easy-to-use tool for checking both internal and external links is linklint, a Perl program that runs on both Linux and Solaris.

linklint has many options, which may appear complicated and confusing. In this short article we first describe some basic usage by looking at the shell script we run to check the links in our [[SINFIN -> http://www.iac.es/sieinvens/SINFIN/]] site; then we explain which output files to inspect for broken links.

Running linklint

The script is the following:


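#!/bin/csh -f
# Run from the site root, so that "-root ." refers to it
cd /net/marta/si/www/SINFIN
# Write the reports to linkdoc-SINFIN; check local and remote links
linklint -doc ~/public_html/linkdoc-SINFIN -output-index linklint \
-root . /@ -net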

The meaning of the flags is:

  • -doc ~/public_html/linkdoc-SINFIN: directory where output files are stored
  • -output-index linklint: the index file will be called "linklint.html" (and not "index.html"). This is because I prefer to see the file listing in the browser when I type http://marta/~invweb/linkdoc-SINFIN/
  • -root .: the root dir of the site is the current directory
  • /@: examine the entire site
  • -net: check all remote links
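
To reuse the same flags for another site, the command can be assembled from two parameters. The sketch below, written in POSIX sh rather than csh, only prints the command so that it can be reviewed before running it. The names linklint_command, site_dir and doc_dir are hypothetical, and paths without spaces are assumed.

```shell
#!/bin/sh
# Print the linklint command for one site without running it (a dry run).
# site_dir and doc_dir are hypothetical parameters; the flags are the
# ones explained in the list above. Paths are assumed to contain no spaces.
linklint_command() {
    site_dir=$1
    doc_dir=$2
    echo "cd $site_dir && linklint -doc $doc_dir -output-index linklint -root . /@ -net"
}
```

For example, linklint_command /net/marta/si/www/SINFIN ~/public_html/linkdoc-SINFIN prints a command equivalent to the script above; the printed line can then be pasted into a shell or piped to sh.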

For advanced usage, please see the documentation on the linklint Web site.

Inspecting results

linklint generates a lot of files in the doc directory. You may wish to look at each of them to see what kind of information they give; however, the files you absolutely must check are:

  • errorF.html: internal broken links
  • urlfailF.html: failed URLs, that is, broken or non-working external links
  • urlmoved.html: URLs that actually point to a different URL (such as redirections)
  • urlwarnF.html: possibly harmless warnings

Note that for each .html file there is a plain .txt file (useful if you wish to look at the linklint output from a terminal).
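
A quick way to see from a terminal which reports need attention is to count the lines in their .txt counterparts. The following is only a sketch, written in POSIX sh rather than csh: the function name summarise_reports is hypothetical, and it assumes that each .txt file shares the base name of its .html file.

```shell
#!/bin/sh
# Summarise linklint report files in a doc directory.
# Usage: summarise_reports DOCDIR NAME...
# Each NAME is a report base name (without extension); a report that
# does not exist is printed as "missing" rather than treated as an error.
summarise_reports() {
    docdir=$1
    shift
    for name in "$@"; do
        file="$docdir/$name.txt"
        if [ -f "$file" ]; then
            # wc -l may pad its output with blanks, so strip them
            lines=$(wc -l < "$file" | tr -d '[:space:]')
            echo "$name.txt: $lines lines"
        else
            echo "$name.txt: missing"
        fi
    done
}
```

For example: summarise_reports ~/public_html/linkdoc-SINFIN errorF urlmoved urlwarnF (any other report base name can be appended). The line count is only a rough indicator, since the layout of the reports is not parsed.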

Shortcomings:

Apparently linklint is unable to check ftp:// links, and it also fails with anchors inside a tag other than <a>.
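
Since ftp:// links are apparently not checked, one workaround is to list them separately and test them by hand. A minimal sketch, written in POSIX sh rather than csh, assuming a grep that supports the -o option (GNU or BSD grep; the stock Solaris grep may not) and using the hypothetical function name list_ftp_links:

```shell
#!/bin/sh
# List the distinct ftp:// URLs found in the HTML files under a directory.
# A URL is taken to end at the first quote, angle bracket or space.
# Assumes a grep with -o (print only the match) and -h (no file names).
list_ftp_links() {
    root=$1
    find "$root" -type f \( -name '*.html' -o -name '*.htm' \) \
        -exec grep -ohi "ftp://[^\"'<> ]*" {} + | sort -u
}
```

For example, list_ftp_links /net/marta/si/www/SINFIN prints one URL per line, with duplicates removed.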

Other link check packages

There are many link checkers: some free, some not, some only for Windows, and so on. Just search for them on Google. We have tried checkbot; however, running it requires some additional Perl modules not currently installed here at the IAC. We are currently testing linkchecker; if we like it, we will update this page.

Page last modified on March 28, 2011, at 04:27 PM