The Internet has long been predicted to fulfill the dream of the ultimate, universal library “at your fingertips”, but so far this has not become reality. Practically all the information one could wish for is on-line, and if it is not there yet, it is coming on-line “real soon now”; it is “Under construction”. So, assuming that all kinds of information will be available through the World Wide Web within a few years, why is it that we still cannot always find the information we need quickly, when we need it?
The answer is, as it has been for some time now, that we have not found a practical, fast and efficient way to keep track of what information is where on the Net. A number of approaches to this problem have been tried, and the major general search engines such as Infoseek, Alta Vista and Excite, together with the Yahoo! directory, seem to be the most successful ones. Still, even with the best search engines, which claim to have millions of Web pages indexed, most people have trouble finding exactly what they want unless they spend a lot of time browsing through the numerous URLs the search engines suggest.
There are several reasons for this situation. Some search engines perform a more or less raw text search through the millions of Web pages stored in some enormous storage structure. This search usually produces a lot of “hits”, meaning links to pages that the search engine believes, sadly not always correctly, are likely to interest the person seeking information. Other search tools provide an index neatly structured into an intuitive hierarchy of topics, so that people can find the pages they need by locating the right area of interest in the hierarchy. The drawback of this approach is that someone must file every entry in the correct place in the hierarchy. Because of the effort this currently requires, it is practically impossible for a finite number of people to file entries from the Web into the hierarchy as fast as new Web pages appear on the Net. Hence, only a fraction of all documents available on the Web is represented in these indexes.
With this in mind, the core problem is clear: we need a new and elegant way to put information into a context, so that we can search the Net within specific areas of interest. Information without a context is often of no value. In Douglas Adams’ book “The Hitchhiker’s Guide to the Galaxy” [Adams, 1979], humankind is told by a computer (of course), after millions of years of calculation, that the answer to the Question of Life, the Universe and Everything is… *drumroll* “42”. Although the philosophers’ guild is very pleased by this cryptic piece of information, the answer “42” without its context, namely what “the Question of Life, the Universe and Everything” actually is, brings humankind no closer to solving the mysteries of life.
The main goal of this report is to solve this two-fold core problem. First, we must find a way to efficiently gather and sort information about Web pages into suitable contexts; second, we must come up with a new and better user interface for searching, one that also allows contextual searching. As we shall see, achieving this goal will require work on many fronts.
In Chapter 3 we identify what prevents the Web from becoming the ultimate library, namely the lack of librarians. We list the tasks and desired capabilities of Web librarians, and present three scenarios that illustrate how we want our context-based, librarian-assisted search tool to perform.
Chapter 4 is an introduction to agent technology. We regard agents as an important enabling technology, practically unused for our purpose until now, that can help us create librarians for the Internet “library”.
Chapter 5 presents an overview of the first major part of EDDIC, the search tool system we propose: an agent-supported indexing and classification system. Text analysis theory, keyword identification techniques and our reasons for choosing the Dewey Decimal Classification as the basis for a metadata code format are also found here.
Chapter 6 presents and discusses the metadata code format for our solution, describing the code fields we consider important for offering flexible, contextual searching.
Chapter 7 proposes the second major part of the EDDIC search tool system: a user interface that takes advantage of agent technology and of the metadata codes described earlier in the report. We present a typology of four different modes of searching and outline a different user interface for each mode. The possibility of personalizing user interfaces and assisting the user in the search process is also discussed.
Chapter 8 comments on a number of aspects that must be considered carefully before the system is implemented: the need for agents to “speak” a universal agent language, the partners we need to cooperate with to succeed, the personnel needed to build and maintain the service, and some ideas on how to finance the search service. Finally, we identify a number of research disciplines that must be involved in implementing the system.
In the last chapter, Chapter 9, we look at how our proposed system addresses the problems identified at the beginning of the report. We show the new possibilities our system offers compared to traditional search engines and directories, and briefly discuss the next steps that must be taken to implement the system. We conclude the report with a summary of what we have achieved.