We’re always hearing about the year of this and that. This year was going to be the year of search, according to the buzz at last year’s Online Information conference. Maybe it was, or still is, but I guess that for many of you search has been fairly central to your lives for many years now.
But what about the filtering and classification side? We all need something to do the heavy lifting for us. Something that pre-sorts the wheat from the chaff without straining our own neurons too much.
The good news is that next year is going to be the year of ‘sentiment analysis’, a bolt-on for search that determines the prevailing mood of document authors. Saul Haydon Rowe of British company Corpora believes that sentiment analysis software is “finally commercially mature” and that someone, not necessarily his own company, will announce a major win later this year.
Once sentiment analysis goes mainstream, we’ll wonder how we managed without it. Machine searches that can grade material according to the author’s overriding sentiment will greatly help us place an appropriate value on each story and track mood trends in general.
The applications are endless, but media analysis in order to track corporate reputation and market moods has real commercial value. It could also be used inside companies to analyse all manner of traditional digital content, as well as forums, blogs and emails. How about a pharmaceutical company that needs to find negative research feedback from a drug trial? Thousands of reports could be sifted in minutes.
I’ve been experimenting, in a small way, with Corpora’s News! to follow the network neutrality debate in America. Most of the discussion has been positive, even though the individual articles take different sides in the debate. Corpora’s software provides graphs of sentiment trends and bar charts from which you can drill down to the articles themselves.
In order to arrive at sensible conclusions, sentiment analysis software has to understand context. For example, “fighting” and “disease” read as negative in a war context, but “fighting disease” is positive in a medical one. This kind of discrimination is something humans are very good at compared with computer software. Now the gap has narrowed. In Corpora’s case, it reckons that its software hits 75% accuracy, compared with an 82% chance of two or more human analysts agreeing with each other.
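Here is a toy sketch of why context matters. The same sentence scores negative or positive depending on which domain lexicon is applied. The lexicons, their scores and the `score` function are hypothetical illustrations, not a description of how Corpora's software works. Real systems learn such distinctions from training data rather than from hand-made lists.

```python
import re

# Hypothetical, hand-made lexicons (illustration only): the same words carry
# different polarity depending on the domain the text is assumed to belong to.
LEXICONS = {
    "war": {"fighting": -1, "disease": -1, "ceasefire": 1},
    "medical": {"fighting disease": 2, "disease": -1, "cure": 1},
}


def score(text: str, domain: str) -> int:
    """Sum lexicon scores, matching two-word phrases before single words."""
    lexicon = LEXICONS[domain]
    tokens = re.findall(r"[a-z]+", text.lower())
    total, i = 0, 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in lexicon:
            total += lexicon[pair]  # phrase match takes priority
            i += 2
        else:
            total += lexicon.get(tokens[i], 0)  # unknown words score 0
            i += 1
    return total


sentence = "The charity is fighting disease across the region"
print(score(sentence, "war"))      # -2
print(score(sentence, "medical"))  # 2
```

Matching phrases before single words is the simplest way to let a multi-word expression override the polarity of its parts.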
On average, humans process six articles per hour against the machine’s throughput of 10 per second. I make that 6,000 times faster (36,000 articles an hour against six), with a relative accuracy of roughly 91.5% (75% divided by 82%). The software gives a confidence rating for each prediction, which can be used either as a filter or as a signal to bring humans into the loop.
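The arithmetic behind these comparisons can be checked in a few lines. The sketch also shows one way a confidence rating might be used to route predictions. The `route` function and its 0.8 threshold are hypothetical choices for illustration, not settings taken from Corpora's software.

```python
# Figures quoted in the article (human vs. machine throughput and accuracy).
HUMAN_ARTICLES_PER_HOUR = 6
MACHINE_ARTICLES_PER_SECOND = 10
MACHINE_ACCURACY = 0.75  # Corpora's claimed accuracy
HUMAN_AGREEMENT = 0.82   # chance that two or more human analysts agree

machine_per_hour = MACHINE_ARTICLES_PER_SECOND * 60 * 60  # 36,000 articles
speedup = machine_per_hour / HUMAN_ARTICLES_PER_HOUR
relative_accuracy = MACHINE_ACCURACY / HUMAN_AGREEMENT

print(f"{speedup:,.0f}x faster, {relative_accuracy:.1%} relative accuracy")
# prints: 6,000x faster, 91.5% relative accuracy


def route(label: str, confidence: float, threshold: float = 0.8) -> str:
    """Accept a confident prediction; otherwise flag it for a human analyst.

    The 0.8 threshold is a hypothetical value chosen for illustration.
    """
    return label if confidence >= threshold else "needs human review"


print(route("positive", 0.93))  # positive
print(route("negative", 0.41))  # needs human review
```

Raising the threshold sends more articles to people and fewer straight through. Where to set it is a trade-off between throughput and trust in the machine's judgement.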
The way Corpora works is to throw out anything that doesn’t relate to sentiment before going into the assessment and machine-learning stage. Apart from obvious positive and negative words, it looks out for clues in sentence structure and the creative use of quotation and exclamation marks.
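The next sketch illustrates the two-step idea described above, assuming simple hand-made word lists. First it discards sentences with no sentiment-bearing words. Then it turns what remains into features: positive and negative word counts, exclamation marks and quoted phrases. The word lists and function names are hypothetical, and sentence-structure clues are left out. In a real system these features would feed a trained machine-learning model rather than being read directly.

```python
import re

# Hypothetical word lists; a real system would use far larger, learned lexicons.
POSITIVE = {"good", "great", "excellent", "support", "welcome"}
NEGATIVE = {"bad", "poor", "terrible", "oppose", "failure"}


def sentiment_sentences(text: str) -> list[str]:
    """Step 1: discard sentences containing no sentiment-bearing words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    keep = []
    for s in sentences:
        words = set(re.findall(r"[a-z]+", s.lower()))
        if words & (POSITIVE | NEGATIVE):
            keep.append(s)
    return keep


def features(text: str) -> dict[str, int]:
    """Step 2: turn the surviving sentences into simple numeric features."""
    kept = sentiment_sentences(text)
    words = [w for s in kept for w in re.findall(r"[a-z]+", s.lower())]
    joined = " ".join(kept)
    return {
        "positive_words": sum(w in POSITIVE for w in words),
        "negative_words": sum(w in NEGATIVE for w in words),
        "exclamations": joined.count("!"),
        # Paired double quotes, e.g. scare quotes around a "solution".
        "quoted_phrases": len(re.findall(r'"[^"]+"', joined)),
    }


doc = ('The hearing ran for two hours. Critics called the plan a "solution" '
       'and a failure! Supporters said it was a great step.')
print(features(doc))
# {'positive_words': 1, 'negative_words': 1, 'exclamations': 1, 'quoted_phrases': 1}
```

Dropping the neutral first sentence before counting keeps irrelevant text from diluting the features.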
There are also some players building text analysis tools on open source platforms, www.jane16.com for example.
In this world of total information overload, we need all the help we can get. Sentiment analysis promises to help us tackle the problem of filtering information by the quality of its source.