Site Mapping
Site Mapper Category Description
Summary
Site Mapping tools examine the structure of a Web site and create a map
depicting each page in the site and the links that connect the pages. For
the purposes of VRC, site mapping utilities can potentially be useful in
each of the risk management modules, although they are primarily suited to
generating site characterizations (Analysis module) and detecting structural-level
changes to a site (Detection module).
Introduction
As its name implies, the Site Mapper category is concerned with utilities
that create a “map” of a Web site. Much like a typical map from
which the metaphor is taken, a Web site map provides a representation (graphical
and/or text) of each page that resides in the site and depicts the links
that a user may take to navigate from page to page, and out to the Web.
The origin of site mapping tools can be found in information visualization research. With the advent of the World Wide Web, site maps became a popular tool for navigation. Shareware programs for creating site maps were developed at least as far back as 1998. Using the Internet Archive Wayback Machine, several instances of site maps were found dating back at least to 1996, and a few mentions of using site maps for Web site navigation can be found in Usenet archives from 1995. As sites grew in size and complexity, site mapping software evolved to streamline the process. Early site maps were intended to index a Web site for its visitors, but their popularity decreased as usability studies criticized them and the cost of maintaining them became too high. Instead of only creating a simple user-oriented site map, software began to offer additional features, such as the ability to check links and spelling.
Most site mapping tools are intended for use by Web site developers. Indeed, the ability to create site maps is a common component of modern HTML editors, such as Dreamweaver, GoLive and FrontPage. Mapping tools and services are useful in the planning phase of site design and redesign in order to visualize and optimize the flow of information (e.g. information architecture), especially for very large sites. A site map is also a common entity (a.k.a. index, guide, or table of contents) within an existing site, giving users a navigational overview of the content of the site. Some mapping utilities can also produce navigational aids for off-line distribution, such as on CD-ROM.
Other specialized mapping tools create site maps in the context of knowledge mapping for visual learning (e.g. The Brain, www.thebrain.com; Inspiration, www.inspiration.com; Visio, www.microsoft.com/office/visio). Web visualization tools, complex kin to site mappers, use many graphic styles to capture hierarchical sets of link relationships, often packing in more data by using color (e.g. to distinguish link directionality) or size (e.g. to represent the amount of traffic along a link). These tools may be of more experimental utility, as many are not particularly polished or dependable for everyday use. Many visualization tools are designed to map an entire region of the Web and show connections between websites. This is an ambitious goal, and one that is difficult to put into practical application.
Site mappers are closely related to other VRC tools. Generally speaking, a site mapper is a specialized web crawler. A typical site mapper shows links between pages, acting indirectly as a link checker. If a site is mapped periodically, the mapper may function as a crude Web site monitor capable of detecting change at the page structure level. Site map capabilities are sometimes included as a component of Web management applications.
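To make the relationship between a site mapper, a crawler, and a link checker concrete, the following is a minimal Python sketch rather than a description of any particular tool. The names `LinkExtractor` and `map_site`, and the `fetch` callable (which takes a URL and returns HTML, or None when the page cannot be retrieved), are assumptions introduced for illustration. Passing the retrieval step in as a parameter lets the sketch run against an in-memory set of pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def map_site(start_url, fetch):
    """Breadth-first crawl restricted to the host of start_url.

    fetch is a hypothetical callable: URL -> HTML string, or None if the
    page cannot be retrieved. Returns (site_map, broken): site_map maps each
    retrieved page to the sorted list of links found on it, and broken is
    the set of same-host URLs that could not be retrieved.
    """
    host = urlparse(start_url).netloc
    site_map, broken = {}, set()
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            broken.add(url)  # the link-checker side effect of mapping
            continue
        parser = LinkExtractor()
        parser.feed(html)
        # Resolve relative links and drop #fragments before recording them.
        links = sorted({urldefrag(urljoin(url, h)).url for h in parser.hrefs})
        site_map[url] = links
        for link in links:
            # Only pages on the same host are followed; off-site links
            # are recorded in the map but not crawled.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return site_map, broken
```

In real use, `fetch` would wrap an HTTP client; a dictionary's `get` method is enough to try the sketch on a few hand-written pages.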
Site Mapping for VRC
For the purposes of remote control, site mappers and site visualization
tools may be used to view and understand the full content and structure
of a website. Some mappers can diagram any Web site with a valid URL without
violation of the end user license agreement. This capability is essential
for remote control.
Site mapping and visualization utilities that do allow crawling of a remote site can potentially be useful in each of the risk management modules, although these tools are primarily suitable for generating site characterizations (Analysis module) and detecting structural-level changes to a site (Detection module).
Identification
Examining the links into and out of a site could be beneficial when determining
which Web sites need to be monitored.
Analysis
It can be difficult and time-intensive to explore even a moderately sized
site by hand, following links from page to page. A site map may prove most valuable by
providing a less-cluttered and more succinct image of the extent and layout
of the site.
Appraisal
As a Web site developer might employ site mapping to design effective flow
of information, a preservationist can similarly use a map to gauge the extent
to which the site was established and maintained with sound architectural
principles. Also, the depth and breadth of a site may in some instances be
a factor in assessing the value of the site from a preservation standpoint.
Strategy
Data about the size and complexity of a site may be necessary to develop
effective management strategies.
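As one illustration of the kind of size and complexity data a map can yield, the sketch below summarizes a map held as a Python dictionary from page URL to the list of links on that page. That dictionary format, the function name `summarize_site`, and the choice of measures (page count, link count, click depth from the start page, unreachable pages) are assumptions made for this example, not features of a specific product.

```python
from collections import deque


def summarize_site(site_map, start_url):
    """Summarize a site map given as {page URL: [link URLs]} (assumed format)."""
    # Breadth-first search gives the minimum number of clicks from the
    # start page to every mapped page that can be reached from it.
    depth = {start_url: 0}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for link in site_map.get(page, []):
            if link in site_map and link not in depth:
                depth[link] = depth[page] + 1
                queue.append(link)
    return {
        "pages": len(site_map),
        "links": sum(len(links) for links in site_map.values()),
        "max_depth": max(depth.values()),
        # Mapped pages that no chain of links from the start page reaches.
        "unreachable": sorted(set(site_map) - set(depth)),
    }
```

A deep or sparsely linked site may call for a different crawl schedule than a shallow one, which is the sort of decision these figures could inform.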
Detection
Site mapping utilities can be used periodically to detect changes at the
site structure level. Excessive and frequent changes in structure might
flag a potential risk of information loss. Infrequent changes to structure
may be an indication that a site is at risk of obsolescence. While most
site mapping applications cannot detect a change to the content of a page,
the page level of resolution may be sufficient and even more instructive
in some contexts (for example, detecting a change in the number of pages of a site
that maintains a large number of PDF files).
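Structure-level change detection of this kind amounts to comparing two maps of the same site taken at different times. The Python sketch below assumes each snapshot is a dictionary from page URL to the links found on that page; this format and the name `diff_site_maps` are hypothetical. What counts as "excessive" or "infrequent" change is left to the reader, since no threshold is given here.

```python
def diff_site_maps(old_map, new_map):
    """Compare two snapshots, each {page URL: iterable of link URLs}."""
    old_pages, new_pages = set(old_map), set(new_map)
    # Represent every link as a (source page, target URL) pair.
    old_links = {(page, link) for page, links in old_map.items() for link in links}
    new_links = {(page, link) for page, links in new_map.items() for link in links}
    return {
        "pages_added": sorted(new_pages - old_pages),
        "pages_removed": sorted(old_pages - new_pages),
        "links_added": sorted(new_links - old_links),
        "links_removed": sorted(old_links - new_links),
    }
```

Note that a page whose text changes but whose links stay the same produces an empty diff, which matches the page-level resolution described above.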
Response
The visual display of a site map can be a useful way to document and present
a Web site.
Site Mapper Features
Since site mappers are a specific type of Web crawler, they share many of
the potential uses for VRC—at least to the extent that the mapping
tool generates valuable data about the remote files as a Web crawler. However,
many common core features of site mapping software are less useful for VRC
(e.g., drawing tools, ability to upload maps to the Web site). Currently,
there are two main types of site mapping tools: those that primarily create
maps to be used on websites, and those that create maps to be used by developers.
In the context of remote control, we will concentrate on mapping tools aimed
at developers, since these tools are more fully functional.
Certain features seem to be common among all site mappers and site visualization tools, including the ability to crawl a website or a local directory, the ability to display all HTML files linked within a site, and the ability to display the links between HTML files. The following additional features were determined to be most important for VRC:
- Scheduled mapping
- Change detection
- Record last-modified date
- Color code or other visualization based on length of time since element has changed
- Display ALL files (HTML AND images, etc.)
- Ability to use on any URL – not restricted to one’s own domain or local host
- Ability to restrict depth of map (3 levels, etc.)
- Unlimited mapping capability – able to map large sites
- Works on a variety of platforms, modern operating systems
- Usable interface, clear information architecture
- Report title, author, description, keywords, images, link in/out
- Review file totals, MIME types, sizes
- Various map “styles” (visualization capability)
- Ability to save maps in a variety of formats (HTML, RTF, Excel, etc.)
- Ability to map non-English sites, set of Unicode fonts
- Ability to print maps
- Ability to extract mailto links
- Ability to specify username/password for reading password protected pages
- Ability to map hyperlinks from JavaScript, Flash, ColdFusion, PHP, ASP, etc.
- Enable or disable cookies
- Highly configurable without sacrificing usability or requiring a programmer
- Select TCP/IP port number
- Select response to robots.txt
- Configurable to ignore specific links
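Several of the configuration features listed above (depth restriction, the response to robots.txt, ignoring specific links, and extracting mailto links) can be illustrated with a short Python sketch. The names `MapperConfig`, `load_robots`, `should_follow`, and `mailto_addresses`, the default values, and the user-agent string are all assumptions made for this example; only `urllib.robotparser`, `urllib.parse`, and `fnmatch` come from the Python standard library.

```python
import fnmatch
from dataclasses import dataclass, field
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


@dataclass
class MapperConfig:
    """Hypothetical settings mirroring a few items in the feature list."""
    max_depth: int = 3                # restrict depth of map
    obey_robots: bool = True          # selected response to robots.txt
    ignore_patterns: list = field(default_factory=list)  # links to skip
    user_agent: str = "vrc-sketch"    # assumed name, not a real crawler


def load_robots(robots_txt):
    """Parse the text of a robots.txt file that was already retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def should_follow(url, depth, config, robots=None):
    """Decide whether a link found at the given depth should be mapped."""
    if depth > config.max_depth:
        return False
    if urlparse(url).scheme not in ("http", "https"):
        return False  # e.g. mailto: links are reported, not crawled
    if any(fnmatch.fnmatchcase(url, p) for p in config.ignore_patterns):
        return False
    if config.obey_robots and robots is not None:
        return robots.can_fetch(config.user_agent, url)
    return True


def mailto_addresses(links):
    """Return the sorted, de-duplicated addresses from mailto: links."""
    return sorted({urlparse(link).path for link in links
                   if urlparse(link).scheme == "mailto"})
```

Keeping these decisions in one small configuration object is one way to stay configurable without requiring the user to edit program logic, though a real tool would expose them through its interface.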
Site mapping and visualization tools are most commonly stand-alone applications, or distributed within other software packages. These range from freeware to high-end (> $1000) commercial products. Most, though not all, appear to be well-maintained and supported. Most fall into the $30–150 price range.
Other modes include Perl scripts and Java applets primarily used to generate a server-side (or even client-side) map or index of the Web site as the browser downloads it. Perl-based offerings are generally covered by the GNU General Public License. Java applets are typically commercial products.
Site mapping may also be a component of larger site management services (site maintenance) or information architecture (site development) services. These service providers generally market to large e-commerce sites; features are mostly oriented to local sites. Other subscription services, and even free ones, are available to create site maps and search capability, but these are also restricted to local sites.
Overall, there are a fair number of types of site mapping tools available, although not all are suited for remote monitoring. Nearly all offerings are Windows compatible or OS independent; few products are available for the Mac platform, with the exception of more graphics-oriented software (e.g. ConceptDraw, Inspiration).
Tool Evaluation Selection Process
A list of site mapping and visualization tools and services was generated
through:
- Use of search engines using terms such as “site mapping” and “site mapper”
- Browsing through C|Net shareware offerings
- Lists of compiled links (e.g. http://www.softwareqatest.com/qatweb1.html)
- Directory searches (e.g. Google directory)
- Perusing Usenet and other discussion groups
The list was first narrowed to tools that were currently available. Tools
that did not enable remote mapping were then eliminated. Of the remaining
possibilities, tools were selected based on features that would best suit
them for VRC (see the list of important features above). One aspect that presented
difficulty in tool selection was that site map functionality for the purpose
of remote control may overlap or be combined with tools in other categories.
