When Oracle proclaims that its Secure Enterprise Search 10g is “one of our
biggest announcements for many, many years”, then you can be sure that it is
either trying to queer someone else’s pitch or it is onto something good. Or
both.
‘Secure’ is the keyword in Oracle’s offering. By embedding the word in the
product name, it subliminally hints that its competitors’ products are not
secure. Smart move by Oracle.
A few months later, in July this year, IBM announced IBM WebSphere Content Discovery for Business Intelligence. In the battle of buzzwords, IBM chose the word ‘discovery’ rather than ‘search’. This implies so much more than boring old search: it suggests ‘text mining’ and the unearthing of previously hidden information. Smart move by IBM.
Autonomy, for its part, has not been backward in coming forward with its own term: ‘meaning-based computing’. After all, what’s the point of any search or discovery unless it leads to meaning? And meaning suggests that the found information is interpreted and used to give the user answers rather than just lists of documents.
Your place in the new search world
Things are happening to shake up the search/discovery/meaning world. Not least the idea that anyone in the organisation who uses a computer screen to do their job should be able to access the multiple sources of information they need in order to work effectively. This includes both unstructured text and databases, inside and outside the firewall and, increasingly, on the desktop. And users are not interested in searching. They just want to ask a question and get an answer. They want the most relevant information gathered together for them from these multiple sources in response to a single query.
Nick Haddock of Atomic Intelligence, a one-time HP researcher, told IWR, “information is inherently disorganised and not uniformly stored, and it’s better to provide good search than to provide good organisation of the information.” Users will expect their search queries to be more or less effortless and the results to be relevant, comprehensible and navigable. Their expectations have been set by the simplicity of web-based search engines such as Google or Vivisimo’s Clusty.
Where does all this leave you, the information professional? Well, it means that if your department is not already integrated with the rest of the organisation, it soon will be. Whether the initiative comes from you, from IT or from somewhere else, the fact is that your world, the business world and the IT world will merge.
Users who are completely ignorant of search strategies will find themselves wrestling with how best to connect to the information they need. This is a huge opportunity for information professionals to increase their value and relevance to the organisation. As IDC’s Susan Feldman, research vice-president, content technologies, says, “The weak point in any online search is asking the question.” With your knowledge of psychology, subjects and sources, you can provide these users with real leverage.
You may need to move away from the search precision you’ve become used to and towards the altogether fuzzier world of discovery, but it’s a chance to help and train users, to create stored searches and to identify useful categories or entities.
The shape of the search infrastructure
The market for access technologies is fragmented at the moment. Online
search engines and directories, both free and paid for, abound. Actually,
they’re all paid for one way or another; otherwise they’d go out of business. A
dozen or so tools from companies like Copernic, Google, Yahoo!, Microsoft and
Apple are able to search the desktop.
Some enterprise search tools stand alone, including a growing number of
appliances such as Google’s and, slightly to one side, MarkLogic’s XML content
server, which turns all manner of documents into XML and then delivers relevant
pieces rather than whole documents. This is heading into master data management
territory, where a single consistent view of related information is built from
an organisation’s disparate data sources.
Other search functions are embedded inside applications. Some are part of the enterprise computing platform. Some are able to ‘federate’ their searches, calling on existing search and related services. Search is permeating almost all of Microsoft’s developments and SAP already has powerful search capabilities.
Apart from the fact that the industry is going to face more consolidation, it seems inevitable that before long search will become an expected part of the workplace infrastructure: about as unremarkable and commonplace as a browser.
Some examples of integration
When Google announced its OneBox for Enterprise, a number of enterprise software companies, including Oracle, SAS and Cognos, were quick to support it. It gives users the familiar Google interface along with the power to look inside enterprise application data stores, as well as at web-based information inside and beyond the firewall.
Carnegie Mellon spin-out Vivisimo is currently regarded as quite a hot company. Apart from its web-based Clusty service, it also sells Velocity 5 Enterprise Search. This works across multiple information sources and arranges its results into easy-to-navigate themed folders. Version 4.2 was chosen as InfoWorld’s Best Enterprise Search Solution this year.
IBM has made its UIMA (Unstructured Information Management Architecture) open source to enable developers to collaborate on the creation, development and deployment of technologies for discovering information hidden inside unstructured data. This includes documents, images, comment and note fields, e-mail and rich media such as video and audio. It’s a way of combining the underpinning data sources with the components of the UIMA layer and making the results easily accessible to user-level applications.
Guy Creese, a Burton Group IT analyst, wryly observes, “so IBM is now supplying the enabling technology; it will be interesting to see if the search companies can supply the enabling humility to make it work.”
Other vendors, such as Endeca and FAST, are already surfacing both kinds of data as a matter of course, presenting a single access point for a host of data sources. And here’s a wildcard: what about the privately held Swiss/French (and, more recently, Canadian and British) AMI Software? It was founded in 1999 to search by meaning rather than by the more traditional Boolean methods. The MI in its name stands for ‘Meaning Interpreter’ and its first mainstream product was called ‘Enterprise Discovery’. It was somewhat ahead of its time. The system has a web browser front end and has been working across structured and unstructured data sources for donkey’s years.
Getting results
Apart from being able to mingle different sources of information, modern and future search systems will be able to make a reasonable fist of understanding natural language and of tolerating errors and fuzziness in both the query and the information sources. This compensates for human error and helps ensure that the fullest set of results gets delivered.
Some will also be able to work across multiple languages. Many organisations need to be sure that they have unearthed all relevant material. A court case or a drug trial, for example, could be compromised if anything were overlooked. At a more mundane level, a support desk needs to tap into the most up-to-date and appropriate material, regardless of the actual search terms used. Taking that a step further, natural language recognition could lead to semi- or fully automated support call management, with the findings delivered through text-to-speech.
However, for the everyday user, results need to be easy to navigate, which brings into play the idea of facets or named entities. Inxight (a Xerox PARC spin-off), for example, binds structured and unstructured data together through entities such as name, company, city, weapon and vehicle. When the results are displayed, these faceted links can appear beside the main search results, giving rapid access to related key information without the need to key in further search terms. Inxight, by the way, provides the LinguistX natural language analyser found inside a number of competing and complementary products.
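To make the idea of faceted links concrete, here is a minimal Python sketch, assuming each search hit already carries the entities an extractor found in it. The `entities` field, the `build_facets` and `narrow` helpers and the sample documents are hypothetical illustrations, not Inxight’s or LinguistX’s API. The facets are simply counts of entity values across the current result set, and following one narrows the results without a new query.

```python
from collections import Counter, defaultdict


def build_facets(results):
    """Count, per entity type, how many results mention each entity value."""
    facets = defaultdict(Counter)
    for result in results:
        for entity_type, values in result["entities"].items():
            # set(): a value mentioned twice in one document counts once
            for value in set(values):
                facets[entity_type][value] += 1
    return facets


def narrow(results, entity_type, value):
    """Follow a facet link: keep only results that mention the chosen entity."""
    return [r for r in results if value in r["entities"].get(entity_type, [])]


# Hypothetical result set, as if returned for a single query
results = [
    {"title": "Q3 supplier review",
     "entities": {"company": ["Acme", "Globex", "Acme"], "city": ["Leeds"]}},
    {"title": "Acme contract renewal",
     "entities": {"company": ["Acme"], "city": ["Paris"]}},
    {"title": "Travel policy",
     "entities": {"city": ["Paris", "Leeds"]}},
]

facets = build_facets(results)
print(facets["company"].most_common())  # [('Acme', 2), ('Globex', 1)]
print([r["title"] for r in narrow(results, "company", "Acme")])
# ['Q3 supplier review', 'Acme contract renewal']
```

Counting each value once per document, rather than once per mention, keeps a facet’s number in line with the number of results the user will actually see after clicking it.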
Information security is an issue, which is why Oracle put the word in its product name. Indexing engines need global authority in order to do their work. But they have to preserve the access rights associated with individual documents, databases and columns. This has always been the case but, with the widening of access to enterprise information, it is more important than ever that sensitive data is held securely and only released according to up-to-date authority settings.
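One common way to reconcile an indexer that can read everything with results that must respect each user’s rights is to store each document’s access list alongside its index entry and filter hits at query time. The Python sketch below assumes that approach; `SecureIndex`, `IndexedDoc` and the group names are hypothetical, the word matching is deliberately naive, and nothing here describes Oracle’s actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    doc_id: str
    text: str
    # Groups allowed to read this document, copied from the source system
    # when the document was indexed (hypothetical access-list model)
    allowed_groups: frozenset


@dataclass
class SecureIndex:
    docs: list = field(default_factory=list)

    def add(self, doc: IndexedDoc) -> None:
        # The indexer itself can read everything ("global authority"),
        # but it keeps each document's access list alongside the entry.
        self.docs.append(doc)

    def search(self, query: str, user_groups: frozenset) -> list:
        terms = query.lower().split()
        hits = []
        for doc in self.docs:
            text = doc.text.lower()
            # Naive matching: every query word must occur somewhere in the text
            if not all(term in text for term in terms):
                continue
            # Security trimming: only return documents the user may see
            if doc.allowed_groups & user_groups:
                hits.append(doc.doc_id)
        return hits


index = SecureIndex()
index.add(IndexedDoc("hr-001", "Salary review for senior managers", frozenset({"hr"})))
index.add(IndexedDoc("pub-002", "Salary bands explained for all staff", frozenset({"staff", "hr"})))

print(index.search("salary", frozenset({"staff"})))  # ['pub-002']
print(index.search("salary", frozenset({"hr"})))     # ['hr-001', 'pub-002']
```

Because the access lists in this sketch are captured when documents are indexed, they go stale if rights change at the source. Honouring up-to-date authority settings means re-reading them regularly or checking permissions against the source system when results are returned.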
Repetitive searches can be embedded into workflow. The searches can be stored and their results gathered in a just-in-time fashion, ready for when the user needs them. Some products, even desktop programs like Intellext’s Watson, sit there watching what the user is doing and quietly garnering relevant material for ready reference. Watson divides the material according to its source; on my machine, the categories are news, web, blogs and premium. This is just the sort of help, perhaps geared more to the enterprise, that makes users’ lives easier.
In a more sophisticated approach, SAP is working on understanding the context of the user’s work, so that it can be intelligent about what information it delivers. Vishal Sikka, SAP’s Chief Software Architect, talks of intersections such as those between work life and personal life, and between unstructured processes like phone calls and structured activities, which don’t break the work flow. He believes that, by properly understanding the user’s situation at any moment, search will evolve “beyond what is available towards what is needed/relevant to what the user is doing.”
Index all?
Arguments rage between the traditionalists, who suggest that indexing should be restricted to information which is deemed important, and the more modern view that everything should be indexed on the grounds that no-one knows what’s going to be important in the future.
A couple of years ago, in DMReview, Guy Creese noted, “all of a corporation's transactions and musings can be accessed at the touch of a button - it is now cheaper to store everything than spend time deciding what to keep and what to archive.”
So storage costs are no longer an issue, but the naysayers argue that, to keep indexes up to date, data sources would have to be crawled frequently. This eats into computer and network capacity. But there are technological workarounds now. Index Engines, for example, indexes and classifies everything within the organisation as it is backed up, replicated, snapshotted, archived or vaulted, with no crawling, and the resulting indexes are less than eight percent of the original data size.
So the decision, if there’s still one to make, will come down to the nature of the organisation and the perceived risks of not being able to garner every scrap of relevant information in support of its present and future business activities.
Your opportunity
Left to a conventional IT department, the implementation of the above systems
is likely to be sub-optimal. And that’s being polite. The one thing that’s
needed is an understanding of information, of the uses to which it is put and
the psychology of the users. This is where the information professional comes
in. IDC’s Susan Feldman suggests that “a good information professional should
be able to crawl into the skin of the user and understand their thought
processes.” Sometimes the job will be to help users optimise their searching
activities, sometimes to prepare stored searches for them and sometimes to write
the connectors that reach into data stores.
And one has to suspect that, despite the oodles of computer power and program
intelligence thrown at automatic indexing, it will still fall short and lead to
complaints. Someone needs to be tuning the systems and improving their quality.
Someone needs to be looking for popular searches and creating hotlinks to
make users’ lives easier. In short, someone has to keep a professional eye
out for how the organisation is retrieving and deploying its invaluable
information assets.
And that someone could be you.