NEWS
2 June 2010
UK— Automated sentiment analysis is less accurate than flipping a coin when it comes to determining whether brand mentions in social media are positive or negative, according to a white paper from FreshMinds.
Tests of a range of different social media monitoring tools conducted by the research consultancy found that comments were, on average, correctly categorised only 30% of the time.
FreshMinds’ experiment involved tools from Alterian, Biz360, Brandwatch, Nielsen, Radian6, Scoutlabs and Sysomos. The products were tested on how well they assessed comments made about the coffee chain Starbucks, with the comments also having been manually coded.
On aggregate, the results look good, said FreshMinds. Accuracy levels were between 60% and 80% when the automated tools were reporting whether a brand mention was positive, negative or neutral.
“However, this masks what is really going on here,” writes Matt Rhodes, a director of sister company FreshNetworks, in a blog post. “In our test case on the Starbucks brand, approximately 80% of all comments we found were neutral in nature.
“For brands, the positive and negative conversations are of most importance and it is here that automated sentiment analysis really fails,” Rhodes said.
Excluding the neutral comments, FreshMinds manually coded conversations that the tools judged to be either positive or negative in tone. “We were shocked that, without ‘training the tools’, they could be so wrong,” said the firm. “While positive sentiment was more consistently categorised than negative, not one tool achieved the 60-80% accuracy we saw at the aggregate level.
“To get real value from any social media monitoring tool, ongoing human refinement and interpretation is essential,” said the company.
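The masking effect Rhodes describes is easy to reproduce: if roughly 80% of mentions are neutral, a tool that labels everything neutral scores roughly 80% aggregate accuracy while getting every positive and negative mention wrong. A minimal sketch, using hypothetical counts modelled on the 80%-neutral split described above:

```python
# Hypothetical distribution modelled on the Starbucks test: ~80% neutral.
truth = ["neutral"] * 80 + ["positive"] * 12 + ["negative"] * 8

# A trivial "tool" that calls every mention neutral.
predictions = ["neutral"] * len(truth)

accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)
print(f"aggregate accuracy: {accuracy:.0%}")  # 80%

# Accuracy on the positive/negative mentions brands actually care about.
polar = [(p, t) for p, t in zip(predictions, truth) if t != "neutral"]
polar_accuracy = sum(p == t for p, t in polar) / len(polar)
print(f"accuracy on polar mentions: {polar_accuracy:.0%}")  # 0%
```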
The full white paper can be downloaded online here. Get the lowdown on social media monitoring here.
22 Comments
Annie Pettit
15 years ago
It all comes down to doing the work to get the automated systems working as well as possible. The amount of validation work that must go into creating an accurate automated sentiment analysis system is simply enormous and never-ending. Systems that do not incorporate ongoing validity mechanisms cannot improve and will only worsen over time as speech and language change with the times. What this says to me is buyer beware and buyer do your homework. Ask your vendor if they validate their engines, how they do it, and how often they do it. Annie Pettit, Chief Research Officer www.conversition.com
Mark Westaby
15 years ago
These findings will come as no surprise to companies that use automated analysis properly. Using automated analysis for individual pieces of coverage and without 'training' the software is never going to produce good results; and, in this respect, the FreshMinds study is itself flawed because, frankly, they should understand that. Equally, the companies studied should not be offering generic automated analysis services for exactly this reason, so in that respect the study is valid. In fact, automated analysis used properly can achieve remarkably accurate results. Something the study does not do, of course, is compare properly trained and correctly used automated analysis against humans analysing, say, 1,000 pieces of online coverage in real time, which is increasingly required in today's highly connected world. Had they done so, the automated analysis would have won hands-down. In other words it's 'horses for courses' and this study really should have pointed that out.
Nikki Wright, FreshMinds
15 years ago
Thanks for your comments, Matt. Many of our clients come to us having tried social media monitoring for themselves and discovered the issues we highlighted in our report. Our intention with the research was to see how the tools varied without such 'training', as this is not always consistent and is certainly not always used by clients. We've had some great feedback, particularly from the tool providers, and we plan to update this research shortly.
Jo Shaw
15 years ago
Couldn't agree more. Social media measurement has a LONG way to go, and no number of funky dashboards and black-box algorithms is going to make a difference until some pretty fundamental weaknesses have been addressed. http://tiny.cc/d9ld5
Mike Daniels
15 years ago
As regular visitors to this site may recall, Mark Westaby and I debated this very question - whether automated tools could provide sufficiently accurate sentiment analysis to support critical business decisions - earlier this year. This study supports what is now generally considered a settled view: that automated analysis tools cannot, and generally do not pretend to, deliver the same levels of sentiment accuracy as well-trained, fully briefed human analysts. However, as others have pointed out, there are inadequacies in this study. But I would contend that these research issues do not detract from the central finding that when it comes to sentiment, automated tools are simply not as accurate or consistent as humans.

Faced with this finding, proponents of automated tools, as Mark is, often retreat into justifying their use by virtue of their benefits in "real time" analysis. However, in practice, real-time analysis is really only necessary in crisis or rapid-response situations. And the paradox is that in these situations, whilst actionable results can be achieved by automated tools, there is actually no need for sentiment analysis. Crises are marked by specific topics under discussion – it is these that need to be tracked. Their very presence will indicate where remedial or defensive action may be required – no sentiment required.

In more strategic contexts, where business insight and support for business outcomes are critical, the value of accurate, reliable, consistent and robust sentiment analysis from trained human analysts massively outweighs the constant nagging doubt about the consistency and accuracy of data from an automated platform. In our experience, owners of valuable brands simply cannot, and indeed do not, take the risk of using such potentially inaccurate data in determining the performance of their assets. The noticeable swing back to human-derived analytics from companies previously using automated-only tools is tangible proof of that particular pudding.

As a sidebar, I would strongly dispute the study's view that neutral coverage is somehow less important than positive/negative sentiment, especially in relation to building and sustaining brands and reputation – and even more so in a competitive context. There are plenty of research studies showing that neutral brand visibility helps build awareness and, more importantly, also serves to build up reputational "trust bank" reserves.
Brian Tarran
15 years ago
Thanks for all the comments. Just to pick up on Mike Daniels' reference to his head-to-head debate with Mark Westaby on people vs. machine analysis – that piece can be found here: http://bit.ly/gLHJ8
David Geddes
15 years ago
First, we see automated sentiment scoring as part of our business process to assist analysts, rather than as a standalone tool. Second, the white paper is not especially transparent about the statistical tools and methods used to arrive at its conclusion. It is meaningless to lump all the systems together and lament that only 30% of the posts were scored accurately. Likewise, it is not meaningful to say that the best system achieved around 50% accuracy. How was this calculated? Third, I continue to be amazed by the success achieved by computer scientists in their models using automated sentiment scoring. Fourth, I am surprised by the claim that Twitter is easier to rate due to short text length. All academic and research papers I have read state the opposite. Finally, are we falling into a trap of feeling that we have to provide a sentiment score for everything to achieve the results we need? I am regularly impressed with results reported by academics where they use manual scoring on a small sample of stories (say 1,000). Why do business clients always want scoring of everything? Is this overkill? Should we instead revert to an appropriate sample-based research design to address specific client questions?
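To put a rough number on the sampling point Geddes raises: under simple random sampling, the 95% margin of error for a proportion estimated from 1,000 manually coded stories is about three percentage points. A minimal back-of-envelope sketch (the sample size comes from his comment; the rest is standard arithmetic):

```python
from math import sqrt

n = 1000    # manually coded sample size, per the comment above
p = 0.5     # worst-case proportion, which maximises the margin

# 1.96 is the z-value for a 95% confidence level.
margin = 1.96 * sqrt(p * (1 - p) / n)
print(f"95% margin of error: ±{margin:.1%}")  # ≈ ±3.1%
```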
Mark Evans
15 years ago
Brian, while automated sentiment technology isn't perfect, it is improving steadily as the technology evolves. At the same time, it is important to recognize that technology does a lot of grunt work in processing millions of conversations - something that couldn't be done manually. As well, there is a role for people to play alongside automated sentiment technology to make sure that the results can be edited or tweaked to reflect context, sarcasm, etc. In many respects, social media sentiment works effectively if there is a solid marriage between technology and people.

Cheers,
Mark

Mark Evans
Director of Communications
Sysomos Inc.
Katie Paine
15 years ago
Unfortunately the study tested the most popular, but least reliable, of the systems available. I'm convinced that PR people would rather measure what is easy to measure than measure accurately. Just as an FYI, we routinely test humans against humans to ensure a minimum 90% intercoder reliability score and THEN test automated sentiment analysis against that. The only system that comes close is SAS's Social Media Analysis, but that's in part because they are a client of ours and used our coding instructions to design their system.
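For readers unfamiliar with the intercoder reliability testing Paine describes, Cohen's kappa is a common way to score agreement between two human coders while correcting for chance agreement. A minimal sketch, with invented labels purely for illustration:

```python
# Two coders label the same 10 mentions; kappa discounts the agreement
# that would occur by chance given each coder's label frequencies.
coder_a = ["pos", "neg", "neu", "neu", "pos", "neu", "neg", "neu", "pos", "neu"]
coder_b = ["pos", "neg", "neu", "pos", "pos", "neu", "neg", "neu", "neu", "neu"]

n = len(coder_a)
p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n  # observed agreement

# Expected chance agreement from each coder's marginal label frequencies.
labels = set(coder_a) | set(coder_b)
p_e = sum((coder_a.count(l) / n) * (coder_b.count(l) / n) for l in labels)

kappa = (p_o - p_e) / (1 - p_e)
print(f"raw agreement: {p_o:.0%}, Cohen's kappa: {kappa:.2f}")
```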
Aditi Muralidharan
15 years ago
Trying them out "without training" makes no sense, and if I were a company using this sort of software to analyze my brand I'd make sure to train it first. To anyone who's familiar with the literature on this topic, it's not at all surprising that untrained accuracies would be abysmal. I agree with Mark Westaby, it's been demonstrated over and over again that an automatic sentiment analyzer needs to be trained to avoid being hopelessly bad, so this study is flawed.
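To make concrete what the 'training' discussed throughout this thread involves, here is a toy sketch of fitting a simple bag-of-words classifier on hand-labelled brand mentions. All texts and labels below are invented for illustration; a real deployment needs far more data and the ongoing validation discussed above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented, hand-labelled brand mentions (the "training" step).
train_texts = [
    "love the new latte, great service",
    "queue was slow and my order was wrong",
    "grabbed a coffee on the way to work",
    "best flat white in town",
    "overpriced and the milk was burnt",
    "there is a starbucks near the station",
]
train_labels = ["positive", "negative", "neutral",
                "positive", "negative", "neutral"]

# Bag-of-words features feeding a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Score new, unseen mentions.
print(model.predict(["the barista was lovely", "cold coffee, never again"]))
```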