It’s a jarring juxtaposition. The yellowing, aged pages of 19th-century literature are literally trapped within a casing of black angle iron, surrounded by clear tubes and naked white light while robotic arms perform a mechanical dance across the pages. This is the future of reading and information access, and its name is digitisation.
Deep in the bowels of the British Library in London, shelf after shelf of historical literature is subjected to what looks almost like an assault on literature by the march of the machines, as books are turned into internet artefacts.
But the truth is that no pages are torn and these machines will promote information usage at the same time as they preserve these historic books for the ages to come.
Two pilot runs have been completed and the digitisation process is now in full swing, with 50,000 pages a day being captured for digital access. That is about a million pages a month, swallowing 30 terabytes of storage capacity.
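As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say whether the 30 terabytes is a monthly figure or whether decimal units are meant, so both are assumed here, and the variable names are invented for the example. The 25 million page target is the one quoted later in the article.

```python
# Figures quoted in the article
PAGES_PER_DAY = 50_000
PAGES_PER_MONTH = 1_000_000
TARGET_PAGES = 25_000_000  # first tranche of the 19th-century collection

# Assumption: the 30 TB is consumed per month, in decimal units (1 TB = 1,000,000 MB)
STORAGE_TB_PER_MONTH = 30
MB_PER_TB = 1_000_000

# Scanning days per month implied by the daily and monthly rates
scanning_days_per_month = PAGES_PER_MONTH / PAGES_PER_DAY

# Average storage per captured page under the assumptions above
mb_per_page = STORAGE_TB_PER_MONTH * MB_PER_TB / PAGES_PER_MONTH

# Months needed to reach the target at the quoted monthly rate
months_to_target = TARGET_PAGES / PAGES_PER_MONTH

print(f"Implied scanning days per month: {scanning_days_per_month:.0f}")
print(f"Average storage per page: {mb_per_page:.0f} MB")
print(f"Months to capture the target: {months_to_target:.0f}")
```

On these assumptions the numbers hang together: about 20 scanning days a month, roughly 30 MB per page, and a little over two years to capture the first 25 million pages.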
For the British Library (BL), this link-up with Microsoft, announced in 2005, is the largest and most comprehensive digitisation programme the national library has ever been involved in. Kristian Jensen, head of British and early printed collections, likens it to the first microfilm archiving projects.
“Previously, we chose item by item, which perpetuates those texts that are already well known,” Jensen said.
This project differs in that 19th-century British literature is being digitised, literally shelf by shelf.
It’s a process that excites Jensen. “We are able to open up our collections. Teachers want to be able to open up their curriculums through different texts, but many texts of this age are not known. There is a rigid canon of Dickens and Austen.”
By digitising the collections shelf by shelf, the project is reverting to how the holding was put together in the 19th century, which means there are categories for female poetry of the time and many others. “Dickens is now in association with authors that may have been as famous and popular as he was at the time, but are not today,” Jensen said.
There are clear benefits for the BL’s future. As well as providing electronic access to its collections, the shelf-by-shelf method yields a more comprehensive, non-selective digitised collection that will be easier to add to in future digitisation runs.
Technical demands
Digital files created by the process are backed up at the BL’s Boston Spa premises, and the National Library of Wales has also joined the project as a backup site. It is going to take two years to digitise the first 25 million pages within the 19th-century literature collection. Digitisation is being carried out by an outsourced supplier, CCS, which has to operate to the METS and ALTO digital information standards.
Microsoft and the BL hope this will be an ongoing project. Content from the digitisation will be available on the Books.live.com online service and in the BL from its catalogue system. Microsoft is covering the cost of the digitisation.
Neil Fitzgerald, digitisation project manager, said that the BL was matching the input with services and that the software giant was learning a great deal from the project.
Google’s attempts to digitise libraries have been met with a barrage of derision, a controversy that the BL, as a legal deposit library, could not afford to be embroiled in. Ben White, copyright compliance and licensing manager, told IWR that the library used the much-loved Oxford Dictionary of National Biography and the Authors’ Licensing and Collecting Society to compile a database and a methodology for weeding out books that were still protected by copyright, and found that only 1% of those it wished to digitise were affected. Advertisements were also placed in the Times Literary Supplement. A further database of the titles that were not digitised was also created to help with future digitisation once they are no longer protected. The final digital files will be copyrighted to the BL.
Digitisation will not only increase access to BL holdings, it will also ensure that many of these works can be accessed by future generations. “For most material, wear and tear is becoming an issue,” said Jensen. “Nineteenth-century literature that was popular was printed on poor-quality paper, and these texts have been used heavily. Some material will benefit from not being handled.”
Jensen was quick to point out that access would still be given and that a digitised text had advantages. “The feeling is important and this system allows users to know if a text really is the one they want to handle.”
Geography is likely to be the second subject matter to be fed to the machines in the BL bowels. The organisation is bullish about how much it can digitise and the strength of its relationship with Microsoft.
The age of the machine may already be upon us, but it’s nothing but beneficial. A robotic submarine is bringing the treasures of library collections to the surface that all information users swim on today: the internet.