Engineering Te Ara
Software
The sheer volume of material to be housed by Te Ara, combined with a publishing process that involves input from a variety of internal and external contributors, researchers, authors and editors, meant that a robust, scalable and flexible content and workflow management platform was required. In addition, the technology base had to be 'future-proof' and standards-based, given the long-term nature of the project.
The Ministry for Culture and Heritage (MCH) opted for a Microsoft-based solution, with the foundations being Microsoft Content Management Server (CMS) and ASP.NET. SharePoint Portal Server and Microsoft SQL Server are also part of the content management solution. Standards such as XML and the Dublin Core Metadata Initiative were used to ensure interoperability and longevity.
Software company Optimation implemented, configured and integrated these software systems and custom-developed a number of .NET web services to handle specialised tasks such as automated content upload. Optimation also used .NET technology to build the 'engine' that underpins the public website.
One of Optimation's big challenges was designing a framework capable of accommodating and handling the massive volume of data Te Ara will eventually contain. The system also had to be flexible enough to deal with complex hierarchies of information and offer different ways to navigate and drill down through the content. The site's search engine had to cover many media types held not only in the main encyclopedia but also by the additional resources available through the site, such as material from the 1966 Encyclopaedia of New Zealand and the Dictionary of New Zealand Biography. It also had to be effective in either Māori or English, and return results across a number of different categories: 'Te Ara', 'Images and Media', 'Biographies' and '1966 Encyclopedia'.
The content crawling and indexing features of SharePoint Portal Server were used to provide the search facility, but some specialised fine-tuning was needed to provide the kind of search functionality required by Te Ara. Optimation used advanced search techniques within the SharePoint programming library and customised the logic and functionality to enable search features such as Boolean search using 'and', 'or', 'not', 'near' and exact phrases, and the use of keywords to increase the relevance of search results. Optimation also used MS Indexing and SharePoint components to exclude the contents of footers, citations and picture captions from search results. Each result includes an entry synopsis containing the text before and after the search words, so results are presented as short text abstracts.
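The entry synopsis described above can be illustrated with a minimal sketch. This is not Optimation's code, which was built on .NET and SharePoint. It is a generic Python illustration of the idea, and the function name `synopsis` and the parameter `context_words` are assumptions introduced here for the example.

```python
import re


def synopsis(text, terms, context_words=5):
    """Return a short abstract around the first word that matches a search term.

    Hypothetical helper for illustration only; not part of Te Ara's codebase.
    """
    words = text.split()
    wanted = {t.lower() for t in terms}
    for i, word in enumerate(words):
        # Strip punctuation so that "media," still matches the term "media".
        if re.sub(r"\W+", "", word).lower() in wanted:
            start = max(0, i - context_words)
            end = min(len(words), i + context_words + 1)
            # Mark truncated text with an ellipsis on either side.
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    # No search term occurs in the text.
    return ""


if __name__ == "__main__":
    entry = ("The site's search engine had to cover many media types "
             "held in the main encyclopedia")
    print(synopsis(entry, ["media"], context_words=2))
```

A production search engine would normally build such abstracts from its index rather than rescanning the full text, and would highlight every matching term rather than only the first.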
