Chapter 1.  Introduction

Aether is an application for indexing and searching unstructured collections. Unstructured here means that the collections may overlap, may have been put together haphazardly, or may be too large to go through manually. The collections are meant to consist mainly of books and documents, but music may also work. For items like music, indexing relies solely on metadata. Because the collections may overlap, each item is identified primarily by its (MD5) checksum, and an item whose checksum is already in the index is not indexed again. The application consists of a web client that can connect to a server. Both searching and indexing are initiated through the web client. The server may be a single server or a set of clustered or cloud servers. It is currently configured with a standard analyzer (presumably for English) and can recognize language automatically. Given a bit of manual training with classifier software, the resulting training data can be used to classify documents automatically during indexing.
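The checksum-based handling of overlapping collections can be illustrated with a minimal Python sketch. This is not Aether's actual code: the function names md5_of_file and index_collection are hypothetical, and a plain dictionary stands in for the real index. It only shows the idea of using an MD5 checksum as the main id and skipping items that have already been seen.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_collection(root, index):
    """Add files under root to index, skipping checksums already present.

    index is a dict mapping checksum -> first path seen; it is an assumed
    stand-in for the real index. Returns the list of newly added paths.
    """
    added = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        checksum = md5_of_file(path)
        if checksum in index:
            # Same content was indexed before, possibly from another collection.
            continue
        index[checksum] = str(path)
        added.append(str(path))
    return added
```

Calling index_collection on two directories that share a file, with the same dictionary passed both times, adds the shared file only once, regardless of its name or location in the second collection.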