Archie (Archive Server Listing Service)


  
  1. Introduction
  2. Archie access modes
  3. Accessing Archie using a Local Archie Client
  4. Accessing Archie by Telnet
  5. Frequently Used Telnet Archie Commands
  6. Accessing Archie by e-mail
  7. Frequently Used e-mail Archie Commands
  8. Maximizing Archie search using 'Patterns'


  
Available CEE UCL software related to this topic:

  • archie (command-line mode)
  • xarchie (X11)





Introduction

"If the hill will not come to Mahomet,
Mahomet will come to the hill."
-- Francis Bacon, Of Boldness

     
Wouldn't it be great if there was some sort of "search" program that would look through hundreds of different anonymous ftp sites and tell us where all of the files that we want are located? Well, such a search program exists. It is called Archie (Archive Server Listing Service).

Archie is actually a collection of servers. Each of these servers is responsible for keeping track of file locations in several different anonymous ftp sites. All of the Archie servers talk to each other, and they pool their information into a huge, global database.

Administrators all over the world register anonymous FTP servers with the archie service. Once a month, the archie service runs a program that scans the directories and filenames on each registered FTP server and generates a grand merged list of all the files and directories they contain. More than 2300 anonymous FTP sites are now represented in this list, which is referred to as the archie database. The archie database currently contains approximately 20,000,000 unique filenames, representing 4 terabytes (that is, 4,000,000,000,000 bytes) of information.

Files made available at anonymous FTP sites contain software packages for various systems (Windows, DOS, Macintosh, Unix, etc.), utilities, information or documentation, mailing lists or Usenet group discussion archives. At most FTP sites, the resources are organized hierarchically in directories and subdirectories. The archie database contains both the directory path and the file names.

You can search this database for file locations simply by giving an Archie client or server a keyword to search for. (The archie database is available to all users of the Internet, and can also be accessed via electronic mail.)

A few minutes ago I did an Archie search using the keyword "linear". Archie sent me back a whole bunch of information in the following format:


Host triples.math.mcgill.ca     (132.206.150.30)
Last updated 22:41 23 Dec 1998

  Location: /pub/rags
  DIRECTORY    drwxr-xr-x           512  01:55 17 Jun 1997  linear

What does all of this tell me? Well, it tells me that the address of the anonymous ftp site is
triples.math.mcgill.ca (132.206.150.30)

and that the match is a directory named linear, located under /pub/rags, so its full path is
/pub/rags/linear

Archie doesn't retrieve the file for me, but it does tell me exactly where the file that I am looking for is located. Once I know the file's location (and its filename), retrieving the file using ftp is easy.
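If you want to feed Archie's answers to another program, the listing format shown above is regular enough to parse. The following is only a rough sketch in Python, not part of any Archie client: the names Match and parse_listing are made up for this example, and it assumes that every entry line ends with a file or directory name that contains no spaces.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    host: str       # FTP site name from the "Host" line
    address: str    # IP address from the "Host" line
    location: str   # directory from the "Location:" line
    kind: str       # "FILE" or "DIRECTORY"
    name: str       # last column of the entry line

    @property
    def path(self) -> str:
        """Full path of the match: location plus name."""
        return self.location.rstrip("/") + "/" + self.name


HOST_RE = re.compile(r"^Host\s+(\S+)\s+\(([\d.]+)\)")
LOCATION_RE = re.compile(r"^Location:\s+(\S+)")
# Entry line: kind, permissions, size, timestamp, name (assumed to have no spaces).
ENTRY_RE = re.compile(r"^(FILE|DIRECTORY)\s+\S+\s+\d+\s+.*\s(\S+)$")


def parse_listing(text):
    """Turn an Archie listing into a list of Match records."""
    matches = []
    host = address = location = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := HOST_RE.match(line):
            # A new host starts; forget the previous location.
            host, address = m.group(1), m.group(2)
            location = None
        elif m := LOCATION_RE.match(line):
            location = m.group(1)
        elif (m := ENTRY_RE.match(line)) and host and location:
            matches.append(Match(host, address, location, m.group(1), m.group(2)))
    return matches


SAMPLE = """\
Host triples.math.mcgill.ca     (132.206.150.30)
Last updated 22:41 23 Dec 1998

  Location: /pub/rags
  DIRECTORY    drwxr-xr-x           512  01:55 17 Jun 1997  linear
"""

for match in parse_listing(SAMPLE):
    print(match.host, match.kind, match.path)
```

Each record carries the host and the full path, which is everything an ftp session needs to get started.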

Archie access modes
     
There are three ways that you can access Archie:

  1. through an Archie client running on your local system,
  2. through a telnet connection directly to an Archie server, or
  3. by sending an e-mail letter directly to an Archie server.

The load on all of the Archie servers is incredible. If your site has its own Archie client, you should use that client instead of telnetting or e-mailing to a distant Archie server.

Accessing Archie using a Local Archie Client
     
To find out if your site is running its own Archie client, type the word archie at your system prompt:

$ archie

and see what happens. If you don't get an error message, you can safely assume that your site has its own Archie client :)

To actually conduct an Archie search using your site's Archie client, type

$ archie <search term>

replacing <search term> with what you want to search for. For example:

What you want Archie to search for                                  You type
Files and directories that have the word "post" in their titles     $ archie post
Files that have the extension .dll                                  $ archie .dll

As a fuller example, to retrieve a list of ftp servers holding files or directories whose names contain the string "archie" (the -s option requests a substring search):
$ archie -s archie

Archie will then send you results like the following (partial listing):

Host ftp.csua.berkeley.edu      (128.32.43.51)
Last updated 02:14  1 Aug 1998
	
Location: /pub
DIRECTORY    drwxr-xr-x           512  20:00 26 Oct 1996  archie
	
Host ftp.nau.edu        (134.114.96.15)
Last updated 02:06 13 Jun 1998
	
Location: /gopher/general/departs/cts/support/netwrkng/tcp/netsoft
FILE    -rwxrwxr-x          7532  05:00 19 Aug 1994  archie
	
Host rcs1.urz.tu-dresden.de     (141.30.61.11)
Last updated 02:36 31 Mar 1998
	
Location: /pub/soft/unix/bsd/FreeBSD/FreeBSD-CVS/ports/net
DIRECTORY    drwxr-xr-x          8192  08:36 22 May 1997  archie
	
Location: /pub/soft/unix/comp.sources.misc/volume22
DIRECTORY    drwxr-xr-x          8192  06:00 24 Oct 1996  archie
	
Location: /pub/soft/unix/comp.sources.misc/volume33
DIRECTORY    drwxr-xr-x          8192  06:00 24 Oct 1996  archie
	
Host ftp.iij.ad.jp      (202.232.2.51)
Last updated 20:25 12 Jun 1998
	
Location: /network
DIRECTORY    drwxr-xr-x          1024  05:00 25 Apr 1996  archie
	
Location: /NetNews/comp.sources.misc/volume22
DIRECTORY    drwxr-xr-x           512  05:00 23 Apr 1996  archie
	
Location: /FreeBSD/ports-2.1.6/news
DIRECTORY    drwxr-xr-x           512  00:25 12 Jun 1997  archie
	
............

There are lots of options available. To read about them, use the 'help' command (no quotes), RTFM, or use "archie -h".

FYI, various public domain clients for Windows, MS-DOS, OS/2, VMS, Unix (including Linux), Macintosh and X-Windows are available from most anonymous ftp sites, usually in the directories /pub/archie/clients or /archie/clients.

Some of you may be wondering why the Anonymous FTP Sitelist exists if archie can find files. There are two reasons. First, archie does not (yet) work with non-Unix sites, the number of which will increase substantially year after year. Second, different servers can give you different answers, depending on the ftp sites they currently have in their memory. Using a European server you might not be able to find a file in the US, whereas a US server might well find the file(s) you need, and vice versa.

Accessing Archie by Telnet
     
Several Archie servers can be accessed using telnet (see the list below). At the login: prompt enter 'archie' (no quotes). The login procedure leaves you at the archie> prompt, which indicates that the server is ready for your requests.

$ telnet <archie host address>
login: archie
archie>

I normally use the archie server at Rutgers University (archie.rutgers.edu), which always seems faster than the others. If possible, though, use the server that is closest to you.

Once connected, to find a file or directory called 'filename' you would type 'prog filename' (no quotes) or 'find filename' (again, no quotes), depending on which archie server you're accessing, at the archie> prompt.

archie> prog linear
# Search type: exact.
working...
.
.
.

It's great to see the list of all the ftp sites that contain the file(s) you're looking for. However, the list scrolls by too quickly, and there seems to be no way to redirect the search result. Don't worry. After Archie has finished its search and printed its results on your screen, you can have archie e-mail the results to you by typing

archie> mail <your e-mail address>

Finally, to quit your telnet archie session, type
archie> quit
# Bye.
Connection closed by foreign host.
$

Some suggestions on using Telnet Archie servers:

  • Avoid connecting during working hours; most of the archie servers are not dedicated machines - they have local functions as well.

  • Make your queries as specific as possible; the response will be quicker and shorter.

  • An Archie client installed on your computer helps to reduce the load on the server sites, so please use the client instead of telnet.

  • Use the archie server closest to you and, in particular, don't overload the TransAtlantic lines.

To get an updated archie server list, telnet to archie.ans.net (or any other archie server), log in as 'archie' (no quotes) and type 'servers' (again, no quotes). Of course you can also try a server somewhat closer to you; the list below is from archie.ans.net.

archie address             Site in
archie.au                  Australia
archie.aco.net             Austria
archie.cs.mcgill.ca        Canada
archie.uqam.ca             Canada
archie.funet.fi            Finland
archie.univ-rennes1.fr     France
archie.th-darmstadt.de     Germany
archie.ac.il               Israel
archie.unipi.it            Italy
archie.kyoto-u.ac.jp       Japan
archie.wide.ad.jp          Japan
archie.hana.nm.kr          Korea
archie.sogang.ac.kr        Korea
archie.nz                  New Zealand
archie.uninett.no          Norway
archie.rediris.es          Spain
archie.luth.se             Sweden
archie.switch.ch           Switzerland
archie.ncu.edu.tw          Taiwan
archie.twnic.net           Taiwan
archie.doc.ic.ac.uk        United Kingdom
archie.hensa.ac.uk         United Kingdom
archie.ans.net             USA
archie.internic.net        USA
archie.rutgers.edu         USA
archie.sura.net            USA
archie.unl.edu             USA

Frequently Used Telnet Archie Commands
     
The following archie commands are available for telnet archie search:

Telnet Archie command and what it does:

exit, quit, bye
exits archie.

help <command-name>
invokes the on-line help. If a command-name is given, the help request is restricted to that command. Pressing the RETURN key exits from the on-line help.

list <pattern>
provides a list of the FTP servers in the database and the time at which they were last updated. The result is a list of site names, with the site IP address and date of the last update in the database. The optional parameter limits the list to sites matching pattern: the command list with no pattern will list all sites in the database (more than 1000 sites!). E.g. list \.kr$ will list all Korean anonymous ftp sites:


archie> list \.kr$
# Your queue position: 1
# Estimated time for completion: 13 seconds.
working... =

cbubbs.chungbuk.ac.kr  203.255.72.254	14:47 23 Feb 2001
ftp.kigam.re.kr	  134.75.144.10	15:11 16 Jul 2001
ftp.kornet.nm.kr       168.126.63.7	02:27  4 Mar 2001
uniboy.dwt.co.kr       165.133.1.2	15:11 16 Jul 2001
  ...............
archie>

site(*) site-name
lists the directories and subdirectories held in the database from a particular site-name. The result may be very long.

whatis string
searches the database of software package descriptions for string. The search is case-insensitive.

If you send the command whatis sparse (=sparse matrix solver) in a Telnet session, then you will get the following results:


archie> whatis sparse
harwell  MA28 sparse linear system  (argonne)
laso     Scott's Lanczos program for eigenvalues of sparse
         matrices  (argonne)
sparse   Kundert + Sangiovanni-Vincentelli, C sparse linear
         algebra  (argonne)
sparspak George + Liu, sparse linear algebra core  (argonne)
y12m     Sparse linear system  (Aarhus)  (argonne)
archie> 
prog string | pattern

find(+) string | pattern
searches the database for string or pattern. Searches may be performed in a number of different ways specified in the variable search, which also determines whether the parameter is treated as a string or as a pattern.

The search produces a list of FTP site addresses which contain filenames matching the pattern or containing the string, the size of the file, its last modification date and its directory path. The number of matches is limited by the maxhits variable.

The list can be sorted in different ways, depending on the value of the sortby variable. By default, the variables search, maxhits and sortby are set to, respectively, exact match search on string, 1000 hits and unsorted resulting list.

A search can be aborted by typing the keyboard interrupt character (Control-C); the list produced at that point will be displayed.

mail <email> <,email2...>
places the result of the last command in a mail message and sends it to the specified e-mail address(es). If no mail address is specified as a parameter, the result is sent to the address specified in the variable mailto.

show <variable>
displays the value of the given variable. If issued with no argument, it displays all variables. The archie variables are shown below with the details of the set command.

set variable value
changes the value of the specified archie variable. The variables specify how other archie commands should operate.

The variables of the telnet archie set command, their values, and what they mean:

compress(+) compress-method
specifies the compression method (none or compress) to be used before mailing a result with the mail command. The default is none.

encode(+) encode-method
specifies the encoding method (none or uuencode) to be used before mailing a result with the mail command. This variable is ignored if compress is not set. The default is none.

mailto email <,email2 ...>
specifies the e-mail address(es) to be used when the mail command is issued with no arguments.

maxhits number
specifies the maximum number of matches prog will generate (within the range 0 to 1000). The default value is 1000.

search search-value
determines the kind of search performed on the database by the command: prog string | pattern. search-values are:

sub
a partial and case insensitive search is performed with string on the database, e.g.:
"is" will match "islington" and "this" and "poison"
subcase
as above but the search is case sensitive, e.g.
"TeX" will match "LaTeX" but not "Latex"
exact
the parameter of prog (string) must EXACTLY match the string in the database (including case). The fastest search method of all, and the default.

regex
pattern is used as a Unix regular expression to match filenames during the database search.

sortby sort-value
describes how to sort the result of prog. sort-values are:

hostname
on the FTP site address in lexical order.
time
by the modification date, most recent first.
size
by the size of the files or directories in the list, largest first.
filename
on file or directory name in lexical order.
none
unsorted (default).

Reverse sorts can be carried out by prepending r to the sortby value given (e.g. rhostname instead of hostname).

set term terminal-type <number-of-rows <number-of-columns>>
tells the archie server what type of terminal you are using, and optionally its size in rows and columns, e.g.
set term xterm 24 100
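
To see how the search, sortby and maxhits variables described above interact, here is a small Python sketch that imitates them on an in-memory list. It is an illustration only and not how an archie server is implemented: the Entry type, the matches and prog functions and the .example host names are invented for this example, Python's re module stands in for Unix regular expressions (the two syntaxes differ in places), and cutting the list at maxhits before sorting is an assumption.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    host: str    # FTP site address
    name: str    # file or directory name
    size: int    # size in bytes
    mtime: str   # modification date as YYYY-MM-DD, so text order is date order


def matches(name, query, search="exact"):
    """Apply one of the four search-values to a single name."""
    if search == "exact":
        return name == query                     # must match exactly, including case
    if search == "sub":
        return query.lower() in name.lower()     # partial, case insensitive
    if search == "subcase":
        return query in name                     # partial, case sensitive
    if search == "regex":
        return re.search(query, name) is not None
    raise ValueError(f"unknown search value: {search}")


# sort-value -> (key function, whether the normal order is descending)
SORT_KEYS = {
    "hostname": (lambda e: e.host, False),
    "filename": (lambda e: e.name, False),
    "time": (lambda e: e.mtime, True),   # most recent first
    "size": (lambda e: e.size, True),    # largest first
}


def prog(entries, query, search="exact", sortby="none", maxhits=1000):
    """Imitate 'prog': filter, cap at maxhits, then sort."""
    hits = [e for e in entries if matches(e.name, query, search)][:maxhits]
    if sortby == "none":
        return hits
    # A leading "r" (e.g. rhostname) asks for the reverse of the normal order.
    reverse_requested = sortby.startswith("r") and sortby[1:] in SORT_KEYS
    key_name = sortby[1:] if reverse_requested else sortby
    if key_name not in SORT_KEYS:
        raise ValueError(f"unknown sortby value: {sortby}")
    key, descending = SORT_KEYS[key_name]
    return sorted(hits, key=key, reverse=(descending != reverse_requested))


entries = [
    Entry("ftp.b.example", "LaTeX", 300, "1997-05-22"),
    Entry("ftp.a.example", "Latex", 100, "1996-10-24"),
    Entry("ftp.c.example", "this", 200, "1994-08-19"),
]

for hit in prog(entries, "tex", search="sub", sortby="hostname"):
    print(hit.host, hit.name)
```

Switching search to subcase with the query "TeX" would keep only "LaTeX", which is the distinction the examples above describe.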

Accessing Archie by e-mail
     
Users whose Internet access is limited to e-mail can still use the archie servers by e-mail. To conduct an Archie search via e-mail, send an e-mail letter to the Archie server closest to you. (The domain addresses of the servers are listed below.)

A typical e-mail archie search looks similar to the one shown below.

find ****
set mailto your_e-mail_address
quit

replacing "****" with what you want the server to search for. Search results will be automatically sent back to you via e-mail.

The e-mail interface to an archie server recognizes a subset of the commands described below. An empty message, or a message containing no valid requests, is treated as a help request.

Archie commands are sent in the body part of the mail message, but the Subject: line is also processed as if it were part of the main body, so be careful! Command lines begin in the first column; all lines that do not match a valid command are ignored.
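
If you send such requests often, you may prefer to build the message with a script. The sketch below uses Python's standard email package to assemble a request in the layout shown above; it only builds the message and does not send it. The function names are invented for this example, and the recipient mailbox "archie@" followed by the server name is an assumption, so check it against the server you use. No Subject: header is set, because the server would process it as a command line.

```python
from email.message import EmailMessage


def quote_pattern(pattern):
    """Quote a pattern that contains spaces, as the e-mail interface requires."""
    if " " not in pattern:
        return pattern
    quote = "'" if '"' in pattern else '"'
    return f"{quote}{pattern}{quote}"


def build_archie_request(server, sender, reply_to, patterns):
    """Build (but do not send) an e-mail archie request."""
    # Every command starts in the first column, one command per line.
    commands = [f"find {quote_pattern(p)}" for p in patterns]
    commands.append(f"set mailto {reply_to}")
    commands.append("quit")  # nothing after this line is interpreted

    message = EmailMessage()
    message["From"] = sender
    message["To"] = f"archie@{server}"  # assumed mailbox name
    # Deliberately no Subject: header; it would be read as a command.
    message.set_content("\n".join(commands) + "\n")
    return message


request = build_archie_request(
    server="archie.rutgers.edu",
    sender="me@example.org",
    reply_to="me@example.org",
    patterns=["linear", "sparse matrix"],
)
print(request.get_content())
```

Because each find sits on its own line, the server would be expected to answer with one message per line, as described for the prog and find commands.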

archie address             Located in
archie.au                  Australia
archie.aco.net             Austria
archie.cs.mcgill.ca        Canada
archie.uqam.ca             Canada
archie.funet.fi            Finland
archie.univ-rennes1.fr     France
archie.th-darmstadt.de     Germany
archie.ac.il               Israel
archie.unipi.it            Italy
archie.kyoto-u.ac.jp       Japan
archie.wide.ad.jp          Japan
archie.hana.nm.kr          Korea
archie.sogang.ac.kr        Korea
archie.nz                  New Zealand
archie.uninett.no          Norway
archie.rediris.es          Spain
archie.luth.se             Sweden
archie.switch.ch           Switzerland
archie.ncu.edu.tw          Taiwan
archie.twnic.net           Taiwan
archie.doc.ic.ac.uk        United Kingdom
archie.hensa.ac.uk         United Kingdom
archie.ans.net             USA
archie.internic.net        USA
archie.rutgers.edu         USA
archie.sura.net            USA
archie.unl.edu             USA

Frequently Used e-mail Archie Commands
     
The following archie commands are available for e-mail archie search:

E-mail Archie command and what it does:

help
sends you the help file. The help command is exclusive, so other commands in the same message are ignored.

path return-address

set mailto(+) return-address
specifies a return e-mail address different from that which is extracted from the message header. If you do not receive a reply from the archie server within several hours, you might need to add a path command to your message request.

list pattern <pattern2 ...>
requests a list of the sites in the database that match pattern, with the time at which they were last updated. The result is a list with site names, site IP addresses and date of each site's last update in the database.

site(*) site-name
lists the directories and subdirectories of site-name in the database.

whatis string <string2 ...>
searches the descriptions of software packages for each string. The search is case insensitive.

prog pattern <pattern2 ...>

find(+) pattern <pattern2>
uses pattern as a Unix regular expression to be matched when searching the database. If multiple patterns are placed on one line, the results will be mailed back in one message. If several lines are sent, each containing a prog command, then multiple messages will be returned, one for each prog line. Results are sorted by FTP site address in lexical order. If pattern contains spaces, it must be quoted with single (') or double (") quotes. The search is case insensitive.

compress(*)
causes the result of the current request to be compressed and uuencoded. When you receive the reply, you should run it through uudecode to produce a .Z file. You can then run uncompress on the .Z file and get the result of your request.

set compress(+) compress-method
specifies the compression method (none or compress) to be used before mailing the result of the current request. The default is none.

set encode(+) encode-method
specifies the encoding method (none or uuencode) to be used before mailing the result of the current request. This variable is ignored if compress is not set. The default is none.

Note: set compress compress and set encode uuencode would produce the same result as the former compress command.

quit
nothing past this point is interpreted. Useful if a signature is automatically appended to the end of your mail messages.

Maximizing Archie search using 'Patterns'
     
A pattern is a specification of a character string, and may include characters which take a special meaning. Putting "\" before a special character removes its special meaning, so that the character is matched literally. The special characters are:


Pattern character and what it means:

. (period)
this is the wildcard character that replaces any single character, e.g. "...." will match any 4-character string.

^ (caret)
if "^" appears at the beginning of the pattern, then only strings which start with the substring following the "^" will match the pattern. If the substring occurs anywhere else in the string it does not match the pattern, e.g.
"^efghi" will match "efghi" or "efghijlk" but not "abcefghi"

$ (dollar)
if "$" appears at the end of the pattern, then the searched string must end with the substring preceding the "$". If the substring occurs anywhere else in the searched string, it is not considered to match, e.g.
"efghi$" will match "efghi" or "abcdefghi" but not "efghijkl"