Scan and Deliver

The biggest challenge of all in the British Library’s vast digitisation programme has been copyright clearance, as Tracey Caldwell explains

By Tracey Caldwell, Information World Review 11 Jul 2008

The airy WiFi-enabled atrium of the British Library befits a modern national library at the beginning of a millennium that is already being called the Information Age. But it is in the smaller, anonymous back-offices that history is being made. Almost 600 years after the advent of the printing press, work is under way on digitising important books, newspapers and sound recordings as a first step to offering unprecedented access to hard-to-access materials.

The British Library has digitisation projects going on all fronts: 19th century newspapers, archive sound recordings, manuscripts from Central Asia (as part of the International Dunhuang Project) and UK theses for the Ethos e-thesis service. With its mass digitisation of 19th century English literature nearing completion, the British Library faces some tough decisions about what to digitise next. Three of its projects are funded by JISC, which is supporting 16 digitisation schemes in the UK to the tune of £10m. Sound, moving pictures, newspapers, census data, journals and parliamentary papers are all in the process of digitisation.

Digitisation projects have to take many factors into account. For book digitisation, for example, the British Library trialled a range of hardware, software and processes for speed, quality and accuracy. But the biggest project challenge of all proved to be copyright clearance.

The intellectual property rights (IPR) of every potential rights holder have to be considered. In many cases, the British Library has to contact them individually for permission to digitise their work. Ben White, copyright compliance and publisher licensing manager at the British Library, says: “If you are going to go up to 1900, as we are, you have to acknowledge that some of that would be in copyright. In the EU the law is that for books, maps and pictures copyright is life plus 70 years, so if you do the maths you know that some of it is going to be in copyright.” For example, if an author of a book published in 1900 lived until 1938, the work would still be in copyright.
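The “life plus 70 years” arithmetic can be sketched in a few lines of Python. This is an illustration only, not a tool used by the British Library: the function names are hypothetical, and the sketch assumes that the term runs to the end of the calendar year that falls 70 years after the author’s death, so a work enters the public domain on 1 January of the following year.

```python
from datetime import date

TERM_YEARS = 70  # "life plus 70 years", as quoted by Ben White


def first_public_domain_year(death_year: int, term_years: int = TERM_YEARS) -> int:
    """Return the first calendar year in which the work is out of copyright.

    Assumption: protection lasts until 31 December of the year that falls
    term_years after the author's death.
    """
    return death_year + term_years + 1


def in_copyright(death_year: int, on: date, term_years: int = TERM_YEARS) -> bool:
    """Return True if the work is still protected on the given date."""
    return on.year < first_public_domain_year(death_year, term_years)


# The article's example: the author died in 1938, checked on the article's date.
print(in_copyright(1938, date(2008, 7, 11)))  # True
# An author who died in 1900, the year the book was published.
print(in_copyright(1900, date(2008, 7, 11)))  # False
```

Note that the publication date never enters the calculation. Only the author’s year of death matters, which is why a book from 1900 cannot be cleared on its date alone.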

White adds: “To digitise historical out-of-print books, we have to go back to the 1860s. Google is blocking access from the EU to anything post-1865 although you can see it in the US and it may well have been written by a European.”

In the US, copyright law is clearer: copyright has expired on all works published before 1923, paving the way for mass digitisation. It has taken a considerable amount of time to identify copyright holders and seek permissions. The American Association of Law Libraries believes the average permission takes 12 hours to research and chase. When authors can’t be identified or found, the materials are known as orphan works, but there is no international agreement or precedent on how to handle works that are often of little commercial value but potentially of great academic value.

“There is little commercial value to much of this,” says White. “We have advertised our intentions in The TLS and The Bookseller to reach publishers and authors, and we have a notice and takedown policy. In the US this has some validity but in Europe this is not the case.”

If it is informed of a rights holder, the British Library removes the digitised materials until permission has been sought and granted. It is lobbying for this notice and takedown approach to be given legal protection for all public bodies.

The British Library estimates that a minimum of 40% of all copyright materials ever produced are orphan works and with the explosion in self-publishing on the internet, it is not just a problem with historical items. White believes the time is ripe for fresh thinking on the differences between types of value and types of usage, and how this should be reflected in the copyright regime.

He explains: “Many online projects we have launched have no economic value, but high research value. As an organisation dedicated to balance, we will continue to make the argument that a successful copyright regime must deal better with the different values that copyright law nurtures.”

The British Library has worked to influence the debate and is using its real-life experiences with clearing copyright in its digitisation projects to lend weight to its arguments. Dame Lynne Brindley, chief executive of the Library, was appointed to the Strategic Advisory Board for Intellectual Property (SABIP), which was set up on the recommendation of the Gowers Review of intellectual property to advise government ministers. The British Library is also working with the European i2010 digital libraries initiative and the Strategic Content Alliance.

The British Library makes no bones about the fact that it has a publisher’s take on copyright issues as well as a public interest view. One of the aims of its digitisation projects is to generate income from products with market appeal that can be exploited commercially.

The deal with Microsoft for the digitisation of Victorian literature ticked just about all these boxes, and the British Library is now having to come to terms with Microsoft’s recent decision to pull the plug on this funding. Microsoft and the British Library partnered in 2005 to digitise 25 million pages contained in 100,000 of the latter’s out-of-copyright printed books, starting with Victorian literature.

Delivery of the search results is through Live Search and the British Library’s integrated library system. Microsoft ended the Live Search Books and Live Search Academic projects and took down both sites. Books and scholarly publications continue to be integrated into Live Search results, but not through separate indexes.

Satya Nadella, senior vice president for search, portal and advertising at Microsoft, says: “We are winding down our digitisation initiatives, including our library scanning and in-copyright book programmes. We recognise that this decision comes as disappointing news to our partners, the publishing and academic communities, and Live Search users.

“The best way for a search engine to make book content available will be by crawling content repositories created by book publishers and libraries. With our investments, the technology to create these repositories is now available at lower costs for those with the commercial interest or public mandate to digitise book content.”

White reckons the costs of the book digitisation project have been shared 50:50 with Microsoft: “Our cost is in selection and appraisal of whether books can be filmed. We have learned a lot in five years through trial and error. We piloted two companies with different workflow and different kit and that process took almost 12 months. We are not using the opportunity to free up physical space but as an opportunity to update catalogue records that have not changed since the 19th century.”

Kirtas Technologies supplies the hardware, while German firm CCS provides the software, the scanner operators, and the image processing and workflow. It takes two to three months’ training for a scanning operator to get up to speed, and operators work in shifts.

“The scanner will work entirely automatically but we get a better result if it is semi-automatic,” says White. Operators keep a close eye on things as a mechanised arm uses suction to turn the page. The British Library had to exclude very small and very large books at first, but the system’s development has allowed it to go back to do those.

It can now also digitise foldouts such as maps to A2 size. Scan resolution is to a “good enough” standard. White describes the digitisation work as “a learning curve” and one of the aims is to create and share best practice for dealing with copyright clearance as part of digitisation. JISC has set up a template that people can use to clear copyright, and projects are feeding back on that.

According to Alistair Dunning, head of JISC’s 16 digitisation projects, JISC will publish what has happened in terms of digitisation and intellectual property (IP) at the end of the projects, and that will be accessible to all.

“People assumed anything created in the 19th century didn’t need copyright clearance, that there was a gentleman’s agreement,” he says. “The difference is the growing professionalism in terms of clearing IPR.”

An evaluation project has already started to report back, Dunning says. “One of the things we have found is it is easy to get people’s agreement regarding their rights over the phone, but less easy to get them to sign a formal document. You need to make sure you have a written agreement and that can take a long time. A lot of our work is providing reassurance; people think they are missing out on the chance of making money. Also, we have to reassure people that when something is commercially valuable they will be protected.”

Naomi Korn is leading efforts to create an IP policy for JISC. She says: “Orphan works will be covered in the new policy but it has not yet been signed off. People need guidance as to due diligence.

“JISC is not unusual in being risk-averse. It doesn’t want a risk associated with projects. Yet it is funding digitisation projects where the rights holder can’t always be contacted.”

Korn believes it is important to take a broader view of IP: “We are taking a holistic approach at a much bigger level such as information management and licences to use and store information, capturing and communicating information too.

“Ideally, the policy will be as relevant to new works as it is to the past. If we only talk about discrete historical projects we may create a black hole in terms of the content generated by everyday internet users. We need to think ahead as we don’t want another black hole in 20 years’ time.”

Korn is pessimistic about the pace of progress: “I fear that in 10 years’ time we will not be much further on and rights holders will still be fighting to protect their business models. We had to fight really hard to get tiny exceptions in the Gowers Review.”

The British Library is reporting increased receptiveness to its lobbying on IPR issues such as orphan works on the part of the government, particularly the UK Intellectual Property Office. “Lynne and I have been involved in the i2010 programme, looking at digitisation en masse of European culture; some of the issues naturally enough relate to copyright,” says White.

It is also contributing to the development of legislation covering the long-term preservation of digitised material. “Electronic legal deposit was introduced as part of the Legal Deposit Libraries Act 2003 and we are working with other legal deposit libraries and rights holders to define the parameters of e-legal deposit with the intention of putting in place secondary legislation around it,” says White. “Internally we are building the infrastructure for this.”

The process of copyright clearance during the British Library’s digitisation of soft targets - works that are mostly out of copyright - is certainly serving to underline the need for secure frameworks within which public and academic institutions can go forward with digitisation initiatives that might include more recent works. Guidance about the exceptions and grey areas is being put together but copyright clearance looks set to be a long hard road for some time to come.

