2 Text Questions * 5 Analytics Experts = 10 Text Analytics Gems
Ezra Steinberg, who interviewed me last month about Text Analytics and OdinText, just published the Q&A below. I think the questions are very relevant for many of us trying to understand social media. I received permission from Text Analytics News (and The 7th Annual Text Analytics Summit) to republish the piece here on NGMR. If you have any comments or questions about text analytics in social media, feel free to leave them here.
In preparation for the Text Analytics Summit I had the pleasure of asking five of the foremost experts in text analytics a couple of questions on one of the sexiest areas in analytics, the intersection of text and social media.
I spoke to:
- Tom H. C. Anderson CEO and Founder Anderson Analytics (Developers of OdinText)
- David Howlett Senior Director, Consumer Insights and Strategy J.D. Power and Associates (Co-founder and VP Insights & Strategy at Umbria, acquired by J.D. Power in 2008)
- Don Springer CEO Collective Intellect
- Lisa Joy Rosner CMO, NetBase
- Themos Kalafatis MSc, respected Predictive Analytics Consultant, LifeAnalytics
I’m sure you’ll find the responses below as interesting as I did.
Social media is very broad. It depends on what part of social media we’re talking about. Generally we think about social media as the most ‘unstructured’ text data projects (pun intended). However, in many ways it is also ideal for leveraging the benefits of text analytics software as it’s theoretically limitless, continuous and free. These same opportunities also pose the biggest challenges.
I’ve found that the work we did, and the analytics best practices we subsequently developed over six years of working with various software and client projects, have proven invaluable in helping us build software that actually provides the insights clients expect. I think that because social media is new, some have approached text analytics software without thinking about proper methodological considerations. I believe this has been one of the biggest shortcomings of the industry.
Part of our own challenge has been helping the client identify what data is most relevant for their purposes. We find that better-defined parameters allow us to customize the analysis or software to ensure key objectives are met. This can be challenging, as many are currently overwhelmed by the sheer availability of data and often think they need to monitor everything.
In using text analytics to analyze social media, I find that there are multiple challenges. One of the biggest is that social media has turned language upside-down. People no longer use proper grammar, or real words for that matter. Social media is an informal, anonymous forum and, as a result, is often filled with what I call “slanguage” and what others call urban speech, emoticons, misspellings and jargon. In order to understand and classify these posts, the text analytics system has to be “taught” this new language and also understand the context in which these expressions are being used.
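To make the idea of “teaching” a system this language a little more concrete, here is a minimal sketch of one very simple approach: a lookup table that maps slang and emoticons to standard tokens before any further analysis. The `SLANG` table and the `normalize` function are hypothetical illustrations, not part of any product mentioned in this interview, and a real system would need a far larger lexicon plus the contextual understanding described above.

```python
# Hypothetical, hand-built lookup table (an assumption for illustration only).
# A production lexicon would be much larger and continuously updated.
SLANG = {
    "gr8": "great",
    "luv": "love",
    "u": "you",
    "thx": "thanks",
    ":)": "<positive_emoticon>",
    ":(": "<negative_emoticon>",
}


def normalize(post: str) -> str:
    """Lower-case a post and replace known slang tokens with standard forms."""
    normalized = []
    for token in post.lower().split():
        # Keep emoticons intact; otherwise strip trailing punctuation.
        core = token if token in SLANG else token.rstrip(".,!?")
        normalized.append(SLANG.get(core, core))
    return " ".join(normalized)


print(normalize("Luv this phone, gr8 battery :)"))
```

A dictionary lookup like this handles only known, unambiguous tokens; it does nothing for the context problem raised in the paragraph above.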
Another challenge lies in understanding at a deeper level whether there are false positives or false negatives, or a negative word used in a positive light. For example, take the sentence “I know Listerine kills germs because it hurts”. Most semantic technologies might interpret the sentence as representing negative sentiment because of the words “kills” and “hurts”. Only through language understanding and diagramming out the sentence to extract meaning can it be determined that “killing germs” is a good thing and that “hurts” is positive in the context of this sentence. The key here is to understand the word “because”.
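The Listerine example can be reproduced with a toy word-counting scorer. The sketch below is an assumption-laden illustration, not how any vendor's engine works: the word lists, `naive_score` and `phrase_aware_score` are all hypothetical. It shows that counting negative words gives the wrong answer, and that even a phrase-level override for “kills germs” only partly helps, because it still cannot interpret “because it hurts”.

```python
# Tiny hypothetical lexicons, chosen only to illustrate the example sentence.
NEGATIVE = {"kills", "hurts", "bad", "awful"}
POSITIVE = {"good", "great", "love"}
# Hypothetical domain override: in oral care, "kills germs" is a benefit.
POSITIVE_PHRASES = {("kills", "germs")}


def tokenize(sentence):
    """Split on whitespace and strip surrounding punctuation."""
    return [w.strip(".,!?\"'").lower() for w in sentence.split()]


def naive_score(sentence):
    """Count positive words minus negative words, ignoring all context."""
    words = tokenize(sentence)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def phrase_aware_score(sentence):
    """Same as naive_score, but known positive two-word phrases win first."""
    words = tokenize(sentence)
    score, i = 0, 0
    while i < len(words):
        if tuple(words[i:i + 2]) in POSITIVE_PHRASES:
            score += 1
            i += 2
            continue
        score += (words[i] in POSITIVE) - (words[i] in NEGATIVE)
        i += 1
    return score


sentence = "I know Listerine kills germs because it hurts"
print(naive_score(sentence))         # -2: reads as clearly negative
print(phrase_aware_score(sentence))  # 0: better, but "hurts" is still misread
```

Getting from 0 to a positive score requires understanding that the clause after “because” is offered as evidence for the benefit, which is exactly the sentence-level analysis described above.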
While there are multiple other challenges still to confront in this ever-growing area, another is filtering posts and disambiguating brand names such as Cheer, Target and Tide. To ensure that analysis is done on relevant messages, spam, “for sale” sites and other unusable social media need to be cleaned out so that the signal can be separated from the noise. A component of filtering is to ascertain whether the mentions are rich in insights or just white noise about your brand (example: “Kicking back tonight with a Coke and a movie rental” vs. a really impactful sound bite such as “I really want to buy Coke because I need my caffeine fix right now!”).
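As a rough illustration of brand-name disambiguation, the sketch below keeps a post only when the brand word appears alongside at least one context cue. The cue lists and the `is_brand_mention` function are hypothetical assumptions made for this example; real systems would likely rely on richer linguistic or statistical models rather than a handful of keywords.

```python
# Hypothetical context cues per brand (assumptions for illustration only).
BRAND_CONTEXT = {
    "tide": {"detergent", "laundry", "stain", "wash"},
    "target": {"store", "shopping", "checkout", "aisle"},
}


def is_brand_mention(post: str, brand: str) -> bool:
    """Return True if the post names the brand and contains a context cue."""
    words = {w.strip(".,!?\"'").lower() for w in post.split()}
    return brand in words and bool(words & BRAND_CONTEXT[brand])


posts = [
    "Tide got the stain out of my shirt",
    "The tide is high at the beach today",
]
for post in posts:
    print(post, "->", is_brand_mention(post, "tide"))
```

A keyword filter of this kind trades recall for precision: it will drop genuine brand mentions that happen not to contain any of the listed cues.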
Lastly, there is much sarcasm expressed in social media and machines are hard pressed to learn and understand that nuance. In fact, humans cannot always agree upon or “get” the intent of these types of comments. Sarcasm is a topic yet to be tackled.
The noise in social media data is a particularly big challenge relative to more topic-focused data sources like enterprise data. Call center comments, inbound emails and even mainstream media are typically centered around one thought or one topic within each document. Social media is particularly difficult in that a single document, such as a blog post, can span many topics, which means that you need the ability to parse the document and identify each of the distinct topics within it. You then need to anchor the sentiment to the right topics, which is another particularly tricky problem to solve.
Short social media content, like microblogs, can present its own challenges, particularly when so many tweets are re-tweets and/or links to other pieces of social media, which can provide greater insight into the topic and context of the tweet. Understanding how to parse the short content and track back to original sources, such as re-tweeted posts and linked pages, enables the user to gain a deeper understanding of the author and the context of their communication. This can then be used in aggregate to better illuminate insights and understand potential trends uncovered using social media text analysis.
Comments processing also presents a difficult challenge. Parsing comments and associating the appropriate topic and sentiment with the appropriate author is essential to understanding the relative viewpoints, both pros and cons, associated with a specific social media post or topic. Comments are helpful in understanding the importance of a social media post or topic (e.g., by looking at the number of comments) as well as the polarity (i.e., by analyzing the polarity of sentiment in the comments).
Fortunately, companies like NetBase and J.D. Power employ specialists to help address these issues and are using the most advanced text mining algorithms and approaches to solve these difficult challenges and ensure that the hidden insights and “aha’s” residing deep within social media sources are uncovered and illuminated for marketers.
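A minimal sketch of the first step in tracking short content back to its source might look like the following. It assumes the manual “RT @user: text” re-tweet convention and plain http/https links; the `parse_tweet` function and its output fields are hypothetical, and following the extracted links to fetch the original content is left out.

```python
import re

# Assumes the manual "RT @user: text" convention for re-tweets.
RT_PATTERN = re.compile(r"^RT @(\w+):\s*(.*)$", re.DOTALL)
URL_PATTERN = re.compile(r"https?://\S+")


def parse_tweet(text: str) -> dict:
    """Extract the original author (if re-tweeted), the body and any links."""
    match = RT_PATTERN.match(text)
    body = match.group(2) if match else text
    return {
        "is_retweet": match is not None,
        "original_author": match.group(1) if match else None,
        "body": body,
        "links": URL_PATTERN.findall(body),
    }


print(parse_tweet("RT @acme: New post http://example.com/a"))
print(parse_tweet("Just had coffee"))
```

With the original author and links separated out, re-tweets can be grouped under their source post before any topic or sentiment analysis is run in aggregate.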
One of the primary challenges is educating management that social analytics should not be treated as a siloed effort, the sole purview of a specific business unit. This shift in perception from siloed social efforts to an enterprise social business represents a new form of data management that blends social knowledge with behavioral, transactional and referral details at the customer level.
Integrating social analytics - real time analysis from a variety of social media platforms and topics - into new or existing data management tools is the foundation for social business and presents its own challenges. The process starts from an outside-in perspective of mapping the social customer to existing consumer details that reside in an enterprise CRM or data management system. Once an enterprise has successfully integrated social analytics, the more advanced clients have systems optimized for social business - Social CRM, Social Business Intelligence or Social Business Management - that benefit the entire organization.
In many ways, social analytics represents another channel of data similar to more traditional or digital information. But the difference is that social analytics manifests true voice of customer, their intentions and preferences, unstructured and in real-time. Understanding the social customer necessitates an enterprise class social listening tool that uses social intelligence as a signal for when a specific social customer should receive branded engagements from an enterprise.
I would say that there are challenges on many levels. The first major challenge is the nature of social media data itself. We have to be aware of the issues associated with social media data (such as sampling bias). Data can be collected, but it’s not generally available to those who wish to analyze this information on a mass scale. Usually we do not have information about the sex or the age of someone expressing his/her opinion. Automatically identifying spam and uninteresting information within social data is also tough work.
The second major challenge is the techniques used for analyzing text. The fact is that no software is able to fully “understand” the subtleties of human language, and knowledge consumers (say, marketers) need to be aware of this. I have no doubt that text analytics can provide ideas to work with and insights into consumer behavior. But all professionals wishing to use text analytics first need to understand, to some extent, how this technology works and what its limitations are.
The key difference that makes Text Analytics in Social Media important is its ability to extract knowledge. Decision makers need insights and quite often they want to have them as soon as possible because they need to act fast. So the third challenge is the ability to set the goals for the analysis, collect / analyze the available information and have results, all in a time-efficient manner. Knowing what the technology can do and how it works can also be of great help in getting results fast.
Knowledge consumers also need to be aware that there will be actionable and non-actionable insights. For example, I recently analyzed tweets from people visiting or having visited a shopping mall. A segment of customers expressed negative sentiment in their tweets because they couldn’t walk at the pace they wanted since the mall was crowded. There is not much you can do about that. However, it is good to know that this thought exists in the minds of some of your customers, and also whether this customer segment is growing or shrinking over time.