Good Site Management
Indicators of good site management for preservation—lower risk for loss
- 1) Use of current server software
This practice suggests that site managers are tracking security threats and upgrading software regularly. We found a number of sites using outdated versions, and noted a slow response to Web server upgrade notifications.
- 2) Consistent implementation of HTTP headers
A server responds to requests from a Web browser with HTTP headers. There are many possible headers, but we have found that very few headers are used on the majority of pages. For risk management purposes, the content-length, last-modified, and content-md5 (checksum) headers will support effective monitoring. Here’s one tool for checking headers: http://www.delorie.com/web/headers.html
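As a rough illustration of the header check described above, the following Python sketch reports which of the three monitoring headers are missing from a response. The function names (`missing_monitoring_headers`, `fetch_headers`) and the sample values are hypothetical, chosen only for this example; the sketch assumes that a HEAD request is enough to see the headers a server would send.

```python
from urllib.request import Request, urlopen

# The three headers the text singles out as useful for monitoring.
MONITORING_HEADERS = ("Content-Length", "Last-Modified", "Content-MD5")


def missing_monitoring_headers(headers):
    """Return the monitoring headers absent from a response.

    `headers` is any mapping of header name to value; header names are
    compared case-insensitively, as HTTP requires.
    """
    present = {name.lower() for name in headers}
    return [h for h in MONITORING_HEADERS if h.lower() not in present]


def fetch_headers(url, timeout=10):
    """Issue a HEAD request and return the response headers as a dict.

    Not called in this sketch, so the example runs without network access.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())


# Illustrative header set (made-up values, not data from the test sites).
sample = {
    "content-length": "5120",
    "Last-Modified": "Tue, 04 May 2004 10:00:00 GMT",
}
print(missing_monitoring_headers(sample))  # ['Content-MD5']
```

To check a live page, the result of `fetch_headers(url)` could be passed to `missing_monitoring_headers`.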
- 3) Low incidence (less than 1%) of missing pages (HTTP status code 404)
We found an average of 4% of pages returning 404s across all test sites; one-third of the sites had 1% or less. Regular link checking and comprehensive site maintenance will help avoid missing pages.
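The 1% indicator can be computed from the status codes a link checker collects. The sketch below is a minimal example under that assumption; the function names and the sample list of codes are hypothetical, and the strict "less than" comparison follows the wording of the indicator heading.

```python
def missing_page_rate(status_codes):
    """Fraction of checked URLs that returned HTTP 404."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    return sum(1 for code in codes if code == 404) / len(codes)


def has_low_missing_rate(status_codes, threshold=0.01):
    """True when the 404 rate is below the threshold (1% by default)."""
    return missing_page_rate(status_codes) < threshold


# Made-up crawl results: 200 URLs checked, 8 of them missing.
codes = [200] * 192 + [404] * 8
print(missing_page_rate(codes))     # 0.04
print(has_low_missing_rate(codes))  # False
```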
- 4) Clear labels on redirected pages
Redirection is one way to avoid lost pages. We have found that a substantial number of redirects are labeled ambiguously, i.e., using the 302 “Found” code rather than the 301 “Moved Permanently” code, which lets users know that their links must be updated. On-screen notification for redirects is even better.
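The distinction between the two codes can be checked mechanically. The following sketch uses Python's standard `http.HTTPStatus` to label redirect codes and to list URLs that answered with the ambiguous 302; the helper names and the sample crawl data are assumptions made for illustration.

```python
from http import HTTPStatus


def redirect_label(code):
    """Return the code with its standard reason phrase, e.g. '302 Found'."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        return f"{code} (unrecognized)"


def is_ambiguous_redirect(code):
    """302 'Found' does not say whether the move is permanent."""
    return code == HTTPStatus.FOUND


def ambiguous_redirects(responses):
    """Given (url, status_code) pairs, list the URLs that answered with 302."""
    return [url for url, code in responses if is_ambiguous_redirect(code)]


# Made-up crawl results for illustration.
crawl = [
    ("/old/index.html", 302),
    ("/reports/2003.html", 301),
    ("/about.html", 200),
]
print(redirect_label(301))         # 301 Moved Permanently
print(redirect_label(302))         # 302 Found
print(ambiguous_redirects(crawl))  # ['/old/index.html']
```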
- 5) Use of current version of page encoding
This suggests that Web managers are following W3C recommendations. Our data suggest an overall transition from HTML 4.01 (final version) to XHTML. Tools like HTML Tidy (http://www.w3.org/MarkUp/#tidy) can assist in converting older HTML pages and in maintaining well-formed pages.
- 6) Use of comprehensive, well-documented HTML tags on all pages
Accurate, effective, and consistent use of meta tags on HTML pages supports the reliable retrieval and identification of Web pages, which may help track content as page URLs change over time. For an overview of meta tags, see: http://www.w3.org/TR/html4/struct/global.html#edef-META
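One way to audit meta tag use is to extract the tags from each page and compare them across the site. The sketch below does the extraction step with Python's standard `html.parser`; the class and function names and the sample page are hypothetical, and which tags a site should require is left open.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta> name/content pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # A meta tag is keyed by either `name` or `http-equiv`.
        name = attributes.get("name") or attributes.get("http-equiv")
        if name:
            self.meta[name.lower()] = attributes.get("content") or ""


def collect_meta_tags(html_text):
    """Return a dict mapping lowercased meta names to their content."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


# Made-up page for illustration.
page = """<html><head>
<meta name="Description" content="Site guide">
<meta name="keywords" content="preservation, web" />
</head><body></body></html>"""
print(collect_meta_tags(page))
# {'description': 'Site guide', 'keywords': 'preservation, web'}
```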
- 7) Use of common formats
MIME types identify the type of Web content on a page. In our test sets, we have found that four MIME types (text/html, image/jpeg, image/gif, and application/pdf) accounted for an average of 96% of pages. Common formats are more likely to have recommended preservation approaches and support cost-effective preservation through economies of scale. We have also found small numbers of pages with dozens of other MIME types, many of which are now obsolete or not well supported, and are therefore at risk.
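A site's format profile can be summarized from the Content-Type values its pages return. The sketch below is one possible approach: it computes the share of pages in the four common types named above and lists any other types found. The function names and sample values are hypothetical, and the code assumes parameters such as `charset` should be ignored when comparing types.

```python
from collections import Counter

# The four MIME types the text reports as covering most pages.
COMMON_TYPES = {"text/html", "image/jpeg", "image/gif", "application/pdf"}


def normalize(content_type):
    """Drop parameters such as '; charset=UTF-8' and lowercase the type."""
    return content_type.split(";", 1)[0].strip().lower()


def mime_summary(content_types):
    """Return (share of pages in common formats, sorted list of other types)."""
    counts = Counter(normalize(ct) for ct in content_types)
    total = sum(counts.values())
    common = sum(n for mime, n in counts.items() if mime in COMMON_TYPES)
    share = common / total if total else 0.0
    uncommon = sorted(mime for mime in counts if mime not in COMMON_TYPES)
    return share, uncommon


# Made-up Content-Type values for illustration.
observed = [
    "text/html; charset=UTF-8",
    "image/gif",
    "application/x-director",
    "TEXT/HTML",
]
print(mime_summary(observed))  # (0.75, ['application/x-director'])
```

The list of uncommon types is the part most relevant to risk: those are the pages most likely to need format-specific attention.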
