Subject: Glimpse- Searching entire file system   

X-within-URL: ftp://ftp.cs.arizona.edu/glimpse/README


GLIMPSE 3.5: searching entire file systems
(http://glimpse.cs.arizona.edu/)

Glimpse is a very powerful indexing and query system that allows you to
search through all your files very quickly.  It can be used by
individuals for their personal file systems as well as by organizations
for large data collections.  Glimpse is also the basis of GlimpseHTTP,
which provides search for web sites, and it is the default search engine
in Harvest (see below).

Glimpseindex, which you run by saying "glimpseindex DIR", builds an
index of all text files in the tree rooted at DIR
(e.g., glimpseindex ~ indexes all your files).  With this index, glimpse can
search through all files much the same way as agrep (or any other
grep), except that you don't have to specify file names and the search
is fast.  For example,

        glimpse -1 unbelievable

will find all occurrences (in all your files!) of "unbelievable"
allowing one spelling error;

        glimpse -F mail arizona

will find all occurrences of "arizona" in all files with "mail" somewhere
in their name;

        glimpse  'Arizona desert;windsurfing'

will find all lines that contain both "Arizona desert" and "windsurfing".
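
To make "allowing one spelling error" concrete, here is a minimal Python
sketch of a one-error comparison between two whole words, where an error is
one inserted, deleted, or substituted character.  This only illustrates the
idea; it is not the algorithm glimpse or agrep uses, and the function names
(within_one_error, approximate_hits) are hypothetical.  It also compares
whole words, whereas the real tools match patterns within lines.

```python
import re


def within_one_error(word, target):
    """Return True if word equals target after at most one insertion,
    deletion, or substitution of a single character."""
    if word == target:
        return True
    if abs(len(word) - len(target)) > 1:
        return False
    # Skip the prefix the two strings have in common.
    i = 0
    while i < len(word) and i < len(target) and word[i] == target[i]:
        i += 1
    if len(word) == len(target):
        # One substituted character: the rest must agree.
        return word[i + 1:] == target[i + 1:]
    if len(word) < len(target):
        # word is missing one character that target has.
        return word[i:] == target[i + 1:]
    # word has one extra character that target lacks.
    return word[i + 1:] == target[i:]


def approximate_hits(text, target):
    """List the words in text that are within one error of target."""
    return [w for w in re.findall(r"[A-Za-z]+", text)
            if within_one_error(w, target)]


print(approximate_hits(
    "An unbelieveable, unbelievable but believable tale", "unbelievable"))
# prints ['unbelieveable', 'unbelievable']
```

"believable" is not reported because it differs from the target by two
characters, which is more than the single error allowed here.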

Glimpse supports three types of indexes: a tiny one (2-3% of the
size of all files), a small one (7-9%), and a medium one (20-30%).
The larger the index, the faster the search.
Glimpse supports most of agrep's options (agrep is our powerful version
of grep, and it is part of glimpse) including approximate matching
(e.g., finding misspelled words), Boolean queries, and even some
limited forms of regular expressions.
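
As a rough planning aid, the percentages quoted above can be turned into
expected index sizes for a given collection.  The Python sketch below does
only that arithmetic; the function name and the 1 GB collection size are
illustrative assumptions, and actual index sizes will depend on the data.

```python
# Index size as a percentage of the total size of the indexed files,
# taken from the ranges quoted in the text above.
INDEX_SIZE_PERCENT = {
    "tiny": (2, 3),
    "small": (7, 9),
    "medium": (20, 30),
}


def estimate_index_sizes(total_bytes):
    """Return {index type: (low estimate, high estimate)} in bytes."""
    return {
        kind: (total_bytes * low // 100, total_bytes * high // 100)
        for kind, (low, high) in INDEX_SIZE_PERCENT.items()
    }


# Hypothetical collection of 1 GB (10**9 bytes) of text files.
for kind, (low, high) in estimate_index_sizes(10**9).items():
    print(f"{kind:6s} {low // 10**6:4d} to {high // 10**6:4d} MB")
```

For the assumed 1 GB collection this gives roughly 20 to 30 MB for the tiny
index, 70 to 90 MB for the small one, and 200 to 300 MB for the medium one.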

The WWW home page for glimpse is at
    http://glimpse.cs.arizona.edu/
It includes links to the source, binaries for most UNIX systems,
documentation, articles, and more.

The GlimpseHTTP home page is at
    http://glimpse.cs.arizona.edu/ghttp/

Harvest's WWW home page is
        http://harvest.cs.colorado.edu/
(Harvest is an integrated set of tools to gather, extract,
organize, search, cache, and replicate relevant information
across the Internet.)

Mail glimpse-request@cs.arizona.edu to be added to the glimpse mailing list.
Mail glimpse@cs.arizona.edu to report bugs, ask questions, discuss
tricks for using glimpse, etc.  (This is a moderated mailing list.)

Udi Manber, Burra Gopal, and Sun Wu.




                                    GLIMPSE
                                       
                     A TOOL TO SEARCH ENTIRE FILE SYSTEMS
                                       
   
   
News

     * We are hiring! We have an opening for a full-time position of
       staff scientist as well as part-time research assistant positions.
       Mail me for more information.
       
     * Glimpse version 4.0B1 (beta) was released on Nov. 22, 1996.
       See the announcement; you can download it below. Glimpse now
       supports NOT. (As opposed to NOT supporting it before....) If you
       use 4.0, you will have to reindex.
       
     * WebGlimpse version 1.1b1 is now available.
       
   
   
Introduction

   Glimpse is a very powerful indexing and query system that allows you
   to search through all your files very quickly. It can be used by
   individuals for their personal file systems as well as by
   organizations for large data collections. Glimpse is the default
   search engine in Harvest. Glimpse is now at version 4.0B1.
   
   The Glimpse package contains several programs, the most important of
   which are glimpse, glimpseindex, agrep, and glimpseserver. To index
   all files in the directory tree rooted at DIR, you simply say
        glimpseindex DIR

   (E.g., glimpseindex ~ indexes all your files.) Afterwards, glimpse can
   search through all these files much the same way as agrep (or any
   other grep), except that you don't have to specify file names and the
   search is fast. For example,
         glimpse -1 unbelievable

   will find all occurrences (in all your files!) of "unbelievable"
   allowing one spelling error;
         glimpse -F mail arizona

   will find all occurrences of "arizona" in all files with "mail"
   somewhere in their name;
         glimpse  'Arizona desert;windsurfing'

   will find all lines that contain both "Arizona desert" and
   "windsurfing".
         glimpse  -W 'Arizona;~football'

   will find all lines containing "Arizona" in files that do not contain
   the word "football".
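
   The meaning of the last three queries can be spelled out with a small
   Python sketch that scans an in-memory set of files without any index.
   It is meant only to pin down what each query selects, not to show how
   glimpse evaluates it. The function name, its parameters, and the sample
   files are hypothetical, and the sketch assumes case-sensitive substring
   matching rather than glimpse's own matching rules.

```python
def search(files, all_of, none_of=(), name_contains=None):
    """Return (file name, line number, line) for every matching line.

    files          dict mapping file name to file contents
    all_of         terms that must all appear on the same line
    none_of        terms that must not appear anywhere in the file
    name_contains  if given, only files whose name contains it are searched
    """
    hits = []
    for name in sorted(files):
        text = files[name]
        if name_contains is not None and name_contains not in name:
            continue
        if any(term in text for term in none_of):
            continue  # the whole file is excluded
        for number, line in enumerate(text.splitlines(), start=1):
            if all(term in line for term in all_of):
                hits.append((name, number, line))
    return hits


sample = {
    "mail/inbox": "Greetings from arizona\nArizona desert and windsurfing trip\n",
    "notes.txt": "Arizona desert hike\nArizona football scores\n",
    "trip.txt": "windsurfing in the Arizona desert\n",
}

# Roughly: glimpse -F mail arizona
print(search(sample, ["arizona"], name_contains="mail"))
# Roughly: glimpse 'Arizona desert;windsurfing'
print(search(sample, ["Arizona desert", "windsurfing"]))
# Roughly: glimpse -W 'Arizona;~football'
print(search(sample, ["Arizona"], none_of=["football"]))
```

   In this sample, notes.txt drops out of the third query entirely because
   "football" appears somewhere in the file, even though its first line
   mentions Arizona.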
   
   Glimpse supports three types of indexes: a tiny one (2-3% of the size
   of all files), a small one (7-9%), and a medium one (20-30%). The
   larger the index, the faster the search. For most applications, the
   small index (glimpseindex -o) is the best choice. Glimpse supports
   most of agrep's options (agrep is our powerful version of grep, and it
   is part of glimpse) including approximate matching (e.g., finding
   misspelled words), Boolean queries, and even some limited forms of
   regular expressions.
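
   One way to picture the size and speed trade-off is a toy index that
   records, for each word, which groups ("blocks") of files contain it.
   The Python sketch below is a simplified model under that assumption,
   not a description of glimpse's actual index format; build_index,
   candidate_files, and block_size are hypothetical names. A coarser
   index (larger blocks) stores less detail, so more files remain to be
   scanned at query time.

```python
from collections import defaultdict


def build_index(files, block_size):
    """Map each lowercased word to the set of block numbers containing it.

    A block is a run of block_size consecutive files in sorted name order.
    """
    names = sorted(files)
    index = defaultdict(set)
    for position, name in enumerate(names):
        block = position // block_size
        for word in files[name].lower().split():
            index[word].add(block)
    return names, index


def candidate_files(names, index, block_size, word):
    """Return the files that would still have to be scanned for word."""
    result = []
    for block in sorted(index.get(word.lower(), set())):
        result.extend(names[block * block_size:(block + 1) * block_size])
    return result


files = {
    "a.txt": "desert sun",
    "b.txt": "windsurfing",
    "c.txt": "football",
    "d.txt": "desert",
}

for block_size in (1, 2):
    names, index = build_index(files, block_size)
    print(block_size, candidate_files(names, index, block_size, "desert"))
# prints:
# 1 ['a.txt', 'd.txt']
# 2 ['a.txt', 'b.txt', 'c.txt', 'd.txt']
```

   With one file per block the index points straight at the two matching
   files; with two files per block all four files have to be scanned.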
   
Demos

     * Computer Science Bibliographies Database
     * A list of over 800 sites using GlimpseHTTP
     * A list of over 100 sites using WebGlimpse
       
Documentation

     * Glimpse man pages
     * Glimpseindex man pages
     * Glimpseserver man pages
     * README file
     * Copyright notice
     * An article describing the ideas behind the design of glimpse.
       
Software (version 4.0B1)

     * IMPORTANT: read the Copyright Notice.
     * Glimpse FTP area
     * The complete source code for glimpse and glimpseindex, as well as
       manual pages and other stuff.
     * OSF/1 DEC Alpha executables
     * Sparc Solaris 5.5 executables
     * Sparc Sun OS 4.1.3 executables
     * Linux 2.1.10 executables (courtesy of Nelson Beebe)
     * HP-UX executables (courtesy of Nelson Beebe)
     * SGI IRIX 5.3 executables (courtesy of Nelson Beebe)
     * UnixWare executables (courtesy of Henri Irla)
       
   
   
Software (version 3.6)

     * IMPORTANT: read the Copyright Notice.
     * The complete source code for glimpse and glimpseindex, as well as
       manual pages and other stuff.
     * OSF/1 DEC Alpha executables
     * Sparc Solaris 5.3 executables
     * Sparc Solaris 5.5 executables
     * Sparc Sun OS 4.1.1 executables
     *