Specifications for Repository Federations - Part 1
Our company has been providing search and discovery services for repositories for over 10 years. The first project that we were involved with was EDNA (EDucation Network Australia), Australia's gateway to education resources. Edna, as it is now known, still provides search services for the content that it catalogs, and also for other useful collections of content on the Web via its distributed search.

Distributed search, or federated search as it is often called, has a number of challenges. Searching multiple collections in real time can cause performance and scalability problems, and many repositories and collections use different methods for accessing them. Specifications, and the adoption of them, help federated search implementers. However, just as there are many different search solutions with different interfaces, there are a number of relevant standards and specifications to select from. The edna distributed search mentioned above uses open source distributed search software (openDSM) that we have developed, which utilises a number of these specifications to access different collections.

Real-time searching of multiple collections is one way of searching multiple repositories with a single query. Another approach is to 'harvest' information about resources from many repositories into a central repository and to provide a search across that central collection. There are specifications available for this approach, and one in particular is very widely adopted.

When we have a number of repositories to search across (loosely speaking, a federation), it is useful to be able to describe those repositories (what they contain, what protocols and specifications they use, their intended audience, their metadata profiles, etc.) and to store that information in some sort of registry.

This gives us at least three types of specifications to look at:
- federated search
- harvesting
- registries
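
To make the registry and federated search ideas a little more concrete, here is a minimal Python sketch. It is not how openDSM or any particular specification works; every name in it (`RepositoryEntry`, `Registry`, `federated_search`, the `protocol-x` labels, the stub collections) is a hypothetical chosen for illustration. It assumes each repository can be wrapped in a function that takes a query string and returns a list of titles.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RepositoryEntry:
    """One registry record describing a repository (field names are illustrative)."""
    name: str
    description: str        # what the repository contains
    protocols: List[str]    # search/harvest specifications it supports
    audience: str           # intended audience
    metadata_profile: str   # metadata profile its records follow


@dataclass
class Registry:
    """In-memory registry of repository descriptions."""
    entries: Dict[str, RepositoryEntry] = field(default_factory=dict)

    def register(self, entry: RepositoryEntry) -> None:
        self.entries[entry.name] = entry

    def supporting(self, protocol: str) -> List[RepositoryEntry]:
        """Return the repositories that advertise the given protocol."""
        return [e for e in self.entries.values() if protocol in e.protocols]


# A search function takes a query string and returns matching titles.
SearchFn = Callable[[str], List[str]]


def federated_search(query: str, targets: Dict[str, SearchFn]) -> Dict[str, List[str]]:
    """Send one query to every target in parallel and group results by repository.

    A target that raises contributes an empty list instead of failing the
    whole search. Per-target timeouts are deliberately left out of this sketch.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in targets.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []
    return results


def make_stub(titles: List[str]) -> SearchFn:
    """Build a fake repository that does a case-insensitive substring match."""
    return lambda query: [t for t in titles if query.lower() in t.lower()]


# Hypothetical repositories and protocol labels, for demonstration only.
registry = Registry()
registry.register(RepositoryEntry(
    "collection-a", "Lesson plans", ["protocol-x"], "teachers", "profile-1"))
registry.register(RepositoryEntry(
    "collection-b", "Research papers", ["protocol-x", "protocol-y"],
    "researchers", "profile-2"))

targets = {
    "collection-a": make_stub(["Algebra lesson plan", "History quiz"]),
    "collection-b": make_stub(["Algebra in schools: a survey"]),
}
hits = federated_search("algebra", targets)
```

In this sketch the registry only answers "which repositories speak a given protocol?", while the search function handles the fan-out. A harvesting setup would instead copy the records into one central collection ahead of time and search only that.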
