Introduction
GlimpseHTTP is a collection of tools that lets you use
Glimpse
to search your files through an HTTP interface. In other words,
it is a gateway between the Glimpse search engine and HTTP. Glimpse indices
are much smaller than, for example, WAIS indices
(2-7% of the size of the text vs. more than 100%),
Glimpse returns the line containing the match
(like grep), and Glimpse lets you search even with misspellings.
Glimpse, however, can be slower than WAIS, and it does not rank the matches
(although there is a custom modification that does).
Furthermore, GlimpseHTTP allows you to integrate search with
browsing. If you have several nested directories which the user may
browse, you can include the glimpse interface in each document such that
only the relevant directories will be included in the search. More
details are given below.
The current version of GlimpseHTTP was
tested with the NCSA httpd 1.2 HTTP server, and
Glimpse currently works on many Unix platforms.
Any HTML browser can be used to search and browse the information,
including NCSA Mosaic (for X-Windows, MS-Windows and
Macintosh), Lynx and other browsers. For maximum convenience
your browser should support forms, although minimal
functionality can be achieved with any browser.
Since GlimpseHTTP uses Glimpse, it offers some unique features:
- A very small index (3-5% of the total text).
- Reasonably fast search.
- Search for approximate match allowing errors.
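To make "approximate match allowing errors" concrete, here is a minimal
Python sketch. It is not Glimpse's implementation; it uses a textbook
dynamic-programming approach, and the function name and parameters are
hypothetical. It reports whether a pattern occurs somewhere in a line
with at most a given number of insertions, deletions or substitutions.

```python
def matches_with_errors(pattern: str, line: str, max_errors: int) -> bool:
    """Return True if some substring of `line` is within `max_errors`
    edits (insertions, deletions, substitutions) of `pattern`.

    Illustration only: a hypothetical helper, not part of Glimpse.
    """
    # prev[i] = fewest edits needed to match pattern[:i] against some
    # substring of the line that ends at the current position.
    prev = list(range(len(pattern) + 1))
    if prev[-1] <= max_errors:
        return True
    for ch in line:
        curr = [0]  # a match may start anywhere in the line at no cost
        for i, pch in enumerate(pattern, start=1):
            cost = 0 if pch == ch else 1
            curr.append(min(prev[i - 1] + cost,  # match or substitution
                            prev[i] + 1,         # extra character in the line
                            curr[i - 1] + 1))    # character missing from the line
        if curr[-1] <= max_errors:
            return True
        prev = curr
    return False
```

With one error allowed, a search for "occurred" would also find a line
containing the misspelling "occured"; with zero errors allowed it would not.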
In addition, GlimpseHTTP provides you with the following
capabilities:
- You can use a combination of browsing and searching:
first you browse to the directory where the relevant
information is likely to be stored, then you use search
to locate specific files.
- The result of the search is a nicely formatted hypertext with
hyperlinks to matching documents.
- Following a hyperlink leads you not only to a particular
file, but also to the exact place where the match occurred.
- URLs that appear in the documents are converted on the fly to
actual hyperlinks, which you can follow immediately. This
makes GlimpseHTTP particularly suitable for searching
meta-information (Internet directories etc.).
- Similar tools are provided for archiving and searching
USENET newsgroups. You can maintain an archive of news articles
and allow people to search your archive using the
same interface. Supported features include a kill file for articles
and fast search for particular posters. Since the news archiver uses
the NNTP interface, you can archive news articles from remote
news servers. (Combined browse and search for news is yet to be
implemented: browsing in this case means selecting the pertinent
newsgroup(s); currently only search within
one newsgroup at a time is supported.)
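The on-the-fly conversion of hyperlinks mentioned in the list above can
be sketched roughly as follows. This is an illustration in Python, not
the code GlimpseHTTP actually uses; the pattern, the set of URL schemes
and the function name are assumptions made for the example.

```python
import html
import re

# Assumed, deliberately simple pattern: a scheme followed by anything up to
# whitespace, angle brackets or a double quote. Real documents may need more.
URL_PATTERN = re.compile(r"(?:https?|ftp|gopher)://[^\s<>\"]+")


def linkify(line: str) -> str:
    """Escape a plain-text line for HTML and wrap each URL in an anchor tag.

    Hypothetical helper for illustration only.
    """
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(line):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:)")
        end = match.start() + len(url)
        parts.append(html.escape(line[last:match.start()]))
        escaped = html.escape(url, quote=True)
        parts.append('<a href="{0}">{0}</a>'.format(escaped))
        last = end
    parts.append(html.escape(line[last:]))
    return "".join(parts)
```

Applied to each matching line before it is sent to the browser, a helper
like this would let a reader follow an address found in a search result
directly.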
Among the possible applications of GlimpseHTTP we envision:
- FTP sites with search possibilities;
- news archiving sites;
- any search application that must be accessed over a local
or global network, where approximate matching and/or
saving disk space for indices matters.
GlimpseHTTP components
- aglimpse - "Archive Glimpse" - a tool for searching file
hierarchies indexed for Glimpse. aglimpse is a CGI-compliant
program that performs the search and formats the output as
an HTML document with hyperlinks to the matches.
- Archive Manager facilitates maintaining and
indexing Glimpse archives. One of its options is
HTML indexing, which prepares hypertext indices for
each searchable directory; this supports the concept
of combined browsing and searching.
- GlimpseNews - a collection of tools for archiving and
searching newsgroup archives.
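As a rough picture of what a search gateway such as aglimpse has to do
with its results, the Python sketch below turns match lines into an HTML
list with hyperlinks. It is not aglimpse's code. It assumes grep-style
input of the form "path:line number:text", and the function name, the
base URL and the "#line" fragment are hypothetical choices for the example.

```python
import html
from urllib.parse import quote


def format_matches(raw_lines, base_url="/archive/"):
    """Group 'path:lineno:text' match lines by file and render them as HTML.

    Hypothetical helper; the input format and link layout are assumptions.
    """
    grouped = {}
    for raw in raw_lines:
        path, sep1, rest = raw.partition(":")
        lineno, sep2, text = rest.partition(":")
        # Skip anything that does not look like a match line.
        if not (sep1 and sep2 and lineno.strip().isdigit()):
            continue
        grouped.setdefault(path, []).append((int(lineno), text.strip()))

    out = ["<ul>"]
    for path, hits in grouped.items():
        href = html.escape(base_url + quote(path), quote=True)
        out.append('<li><a href="{}">{}</a><ul>'.format(href, html.escape(path)))
        for lineno, text in hits:
            # One link per match, pointing at the place where it occurred.
            out.append('<li><a href="{}#line{}">line {}</a>: {}</li>'.format(
                href, lineno, lineno, html.escape(text)))
        out.append("</ul></li>")
    out.append("</ul>")
    return "\n".join(out)
```

A CGI program built along these lines would print an HTTP header and a
page wrapper around this list; those parts are omitted here.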
Documentation
Software
See also
Authors
Paul Klark (GlimpseHTTP)
Udi Manber, Sun Wu, and Burra Gopal (Glimpse)
University of Arizona, Department of Computer Science
To be added to the glimpse mailing list, send mail to
glimpse-request@cs.arizona.edu
Paul Klark
paul@cs.arizona.edu