Tom H. C. Anderson - Next Gen Market Research™

More Than Market Research - Gain The Information Advantage

The Experts of Text!

March 10th, 2011 · 7 Comments

2 Text Questions * 5 Analytics Experts = 10 Text Analytics Gems

Ezra Steinberg, who interviewed me last month about text analytics and OdinText, just published the Q&A below. I think the questions are very relevant for many of us trying to understand social media. I received permission from Text Analytics News (and the 7th Annual Text Analytics Summit) to republish the piece here on NGMR. If you have any comments or questions about text analytics in social media, feel free to leave them here.

@TomHCAnderson

—————

In preparation for the Text Analytics Summit, I had the pleasure of asking five of the foremost experts in text analytics a couple of questions about one of the sexiest areas in analytics: the intersection of text and social media.

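I spoke to:

  • Tom (Anderson Analytics)
  • Lisa (NetBase)
  • David (J.D. Power and Associates)
  • Don (Collective Intellect)
  • Themos (Analytics Consultant)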

I’m sure you’ll find the responses below as interesting as I did.

Text Analytics is being used more and more to understand and analyze social media. What are the biggest challenges in this area?

Tom (Anderson Analytics)

Social media is very broad, and it depends on which part of social media we’re talking about. Generally we think of social media as the most ‘unstructured’ of text data projects (pun intended). In many ways, however, it is also ideal for leveraging the benefits of text analytics software, since it is theoretically limitless, continuous and free. These same opportunities also pose the biggest challenges.

I’ve found that the work we did, and the analytics best practices we developed over six years of working with various software and client projects, have proven invaluable in helping us build software that actually provides the insights clients expect. Because social media is new, I think some have approached text analytics software without considering proper methodology. I believe this has been one of the industry’s biggest shortcomings.

Anyway, part of our own challenge has been helping clients identify which data is most relevant for their purposes. We find that better-defined parameters allow us to customize the analysis or software so that key objectives are met. This can be challenging, as many clients are currently overwhelmed by the sheer availability of data and often think they need to monitor everything.

Lisa (NetBase)

In using text analytics to analyze social media, I find that there are multiple challenges. One of the biggest is that social media has turned language upside down. People no longer use proper grammar, or real words for that matter. Social media is an informal, anonymous forum and, as a result, is often filled with what I call “slanguage” and what others call urban speech, emoticons, misspellings and jargon. To understand and classify these posts, the text analytics system has to be “taught” this new language and must also understand the context in which these expressions are used.

Another challenge lies in understanding, at a deeper level, whether there are false positives or false negatives, or a negative word used in a positive light. For example, take the sentence “I know Listerine kills germs because it hurts.” Most semantic technologies might interpret this sentence as negative because of the words “kills” and “hurts.” Only by understanding the language and diagramming the sentence to extract meaning can it be determined that “killing germs” is a good thing and that “hurts” is positive in the context of this sentence. The key here is understanding the word “because.”
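To make the failure mode concrete, here is a minimal Python sketch of the naive word-counting approach Lisa warns about. The tiny lexicons and the `bag_of_words_score` function are hypothetical and purely illustrative; they are not NetBase’s method or any vendor’s actual scoring.

```python
# Hypothetical toy lexicons; production systems use far larger, domain-tuned ones.
POSITIVE_WORDS = {"love", "great", "good", "works"}
NEGATIVE_WORDS = {"kills", "hurts", "hate", "bad"}


def bag_of_words_score(text: str) -> int:
    """Positive hits minus negative hits, ignoring word order and context."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    positives = sum(t in POSITIVE_WORDS for t in tokens)
    negatives = sum(t in NEGATIVE_WORDS for t in tokens)
    return positives - negatives


if __name__ == "__main__":
    # Prints -2 (negative), even though the writer is endorsing the product.
    print(bag_of_words_score("I know Listerine kills germs because it hurts"))
```

Because the scorer never looks at “because” or at what “kills” acts on, it rates an endorsement as strongly negative; getting this right takes the kind of sentence-level analysis described above.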

While there are many other challenges still to confront in this ever-growing area, another is filtering posts and disambiguating brand names such as Cheer, Target and Tide. To ensure that analysis is done on relevant messages, spam, “for sale” sites and other unusable social media content need to be filtered out to separate the signal from the noise. Part of filtering is determining whether mentions are rich in insight or just white noise about your brand (for example, “Kicking back tonight with a Coke and a movie rental” vs. a truly impactful sound bite such as “I really want to buy Coke because I need my caffeine fix right now!”).
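As a rough illustration of filtering and brand disambiguation, the Python sketch below drops posts matching simple spam patterns and keeps a “Tide” mention only when a laundry-related context word appears nearby. The context words, spam patterns and function names are all hypothetical; real systems typically use trained classifiers rather than hand-written word lists.

```python
import re

# Hypothetical context words suggesting "tide" refers to the detergent brand.
BRAND_CONTEXT = {"tide": {"detergent", "laundry", "wash", "stain", "pods"}}
# Hypothetical patterns for classified ads and spam.
SPAM_PATTERNS = [
    re.compile(r"\bfor sale\b", re.IGNORECASE),
    re.compile(r"\bbuy now\b", re.IGNORECASE),
]


def is_spam(post: str) -> bool:
    """True if the post matches any spam / 'for sale' pattern."""
    return any(p.search(post) for p in SPAM_PATTERNS)


def mentions_brand(post: str, brand: str) -> bool:
    """True if the brand word appears together with at least one context word."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return brand in words and bool(words & BRAND_CONTEXT[brand])


def relevant_posts(posts: list[str], brand: str) -> list[str]:
    """Drop spam, then keep only posts where the brand sense is plausible."""
    return [p for p in posts if not is_spam(p) and mentions_brand(p, brand)]


if __name__ == "__main__":
    posts = [
        "High tide at the beach today",
        "Tide got the grass stain out of my jeans",
        "Tide detergent for sale, 50% off",
    ]
    print(relevant_posts(posts, "tide"))
    # ['Tide got the grass stain out of my jeans']
```

The two filters are kept separate so that spam rules and brand-sense rules can be tuned independently for each brand.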

Lastly, there is much sarcasm expressed in social media and machines are hard pressed to learn and understand that nuance. In fact, humans cannot always agree upon or “get” the intent of these types of comments. Sarcasm is a topic yet to be tackled.

David (J.D. Power and Associates)

Noise in social media data is a particularly big challenge relative to more topic-focused data sources such as enterprise data. Call center comments, inbound emails and even mainstream media are typically centered on one thought or topic per document. Social media is particularly difficult because posts such as blogs can span many topics within one document, which means you need to be able to parse the document and identify each of the distinct topics within it. Then you must anchor the sentiment to the right topics, which is another particularly tricky problem to solve.
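One simple way to picture “anchoring sentiment to the right topics” is to score each sentence separately and credit its polarity only to topics mentioned in that sentence. The Python sketch below assumes hypothetical topic keywords and a toy lexicon; sentence-level anchoring is a deliberate simplification of the deeper parsing David describes, not J.D. Power’s approach.

```python
import re
from collections import defaultdict

# Hypothetical topic keywords and sentiment lexicon, for illustration only.
TOPIC_KEYWORDS = {
    "battery": {"battery", "charge", "charging"},
    "screen": {"screen", "display"},
}
POSITIVE = {"great", "love", "bright"}
NEGATIVE = {"dies", "poor", "dim"}


def topic_sentiment(document: str) -> dict[str, int]:
    """Split a post into sentences and credit each sentence's polarity
    only to the topics mentioned in that same sentence."""
    scores = defaultdict(int)
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        polarity = len(words & POSITIVE) - len(words & NEGATIVE)
        for topic, keywords in TOPIC_KEYWORDS.items():
            if words & keywords:
                scores[topic] += polarity
    return dict(scores)


if __name__ == "__main__":
    post = "The screen is bright and I love it. The battery dies by noon."
    print(topic_sentiment(post))  # {'screen': 2, 'battery': -1}
```

A document-level score for the same post would be +1 and would hide the battery complaint entirely, which is the problem per-topic anchoring is meant to avoid.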

Also, short-form social media such as microblogs presents its own challenges, particularly since so many tweets are retweets and/or contain links to other pieces of social media, which can provide greater insight into the topic and context of the tweet. Knowing how to parse short content and trace it back to original sources, such as retweeted posts and links, gives the user a deeper understanding of the author and the context of their communication. This can then be used in aggregate to better illuminate insights and understand potential trends uncovered by social media text analysis.
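The first step of tracing tweets back to their sources can be sketched in Python as below: detect the retweet prefix, record the original author, and pull out links for later follow-up. This assumes the manual “RT @user:” convention common on Twitter at the time; the pattern and the `parse_tweet` function are illustrative, and resolving shortened links (which needs network access) is deliberately left out.

```python
import re

# Assumes the manual "RT @user:" retweet convention common on Twitter in 2011.
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*")
URL_PATTERN = re.compile(r"https?://\S+")


def parse_tweet(text: str) -> dict:
    """Split a tweet into retweet source, outbound links and remaining text."""
    match = RT_PATTERN.match(text)
    body = text[match.end():] if match else text
    return {
        "is_retweet": match is not None,
        "original_author": match.group(1) if match else None,
        "links": URL_PATTERN.findall(body),
        "text": URL_PATTERN.sub("", body).strip(),
    }


if __name__ == "__main__":
    print(parse_tweet("RT @acme: New flavor is out http://example.com/x"))
    # {'is_retweet': True, 'original_author': 'acme',
    #  'links': ['http://example.com/x'], 'text': 'New flavor is out'}
```

The extracted `original_author` and `links` are what an aggregation step would follow to reach the original post and its context.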

Comment processing also presents a difficult challenge. Parsing comments and associating the appropriate topic and sentiment with the appropriate author is essential to understanding the relative viewpoints, both pros and cons, associated with a specific social media post or topic. Comments help gauge both the importance of a post or topic (e.g., by looking at the number of comments) and its polarity (i.e., by analyzing the polarity of sentiment in the comments).
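The two comment signals David mentions, volume as a proxy for importance and the balance of positive and negative sentiment as polarity, can be summarized as in the Python sketch below. It assumes each comment already carries an author and a polarity label from some upstream sentiment step; the `Comment` type and the net-polarity formula are hypothetical choices for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    polarity: int  # +1 / 0 / -1, assumed to come from an upstream sentiment step


def summarize_comments(comments: list[Comment]) -> dict:
    """Volume (importance), pro/con counts, net polarity and per-author totals."""
    pros = sum(1 for c in comments if c.polarity > 0)
    cons = sum(1 for c in comments if c.polarity < 0)
    by_author = defaultdict(int)
    for c in comments:
        by_author[c.author] += c.polarity
    return {
        "count": len(comments),
        "pros": pros,
        "cons": cons,
        # Net polarity in [-1, 1]; 0.0 for a post with no comments.
        "net_polarity": (pros - cons) / len(comments) if comments else 0.0,
        "by_author": dict(by_author),
    }
```

Keeping per-author totals makes it possible to tell whether a negative thread reflects many people or one very active commenter.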

Fortunately, companies like NetBase and J.D. Power employ specialists to address these issues and are using the most advanced text mining algorithms and approaches to solve these difficult challenges and ensure that the hidden insights and “aha’s” residing deep within social media sources are uncovered and illuminated for marketers.

Don (Collective Intellect)

One of the primary challenges is educating management that social analytics should not be treated as a siloed effort, the sole purview of a specific business unit. This shift in perception from siloed social efforts to an enterprise social business represents a new form of data management that blends social knowledge with behavioral, transactional and referral details at the customer level.

Integrating social analytics - real time analysis from a variety of social media platforms and topics - into new or existing data management tools is the foundation for social business and presents its own challenges. The process starts from an outside-in perspective of mapping the social customer to existing consumer details that reside in an enterprise CRM or data management system. Once an enterprise has successfully integrated social analytics, the more advanced clients have systems optimized for social business - Social CRM, Social Business Intelligence or Social Business Management - that benefit the entire organization.
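The “outside-in” mapping step can be pictured as a join between social posts and CRM records. The Python sketch below assumes, hypothetically, that the CRM stores a Twitter handle per customer and that an exact handle match is sufficient; in practice this linkage is harder (consent, privacy, multiple accounts), and the `CrmRecord` type and functions are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CrmRecord:
    customer_id: str
    twitter_handle: str  # hypothetical CRM field; may be empty if unknown


def normalize(handle: str) -> str:
    """Lowercase and drop a leading '@' so '@JaneDoe' and 'janedoe' match."""
    return handle.strip().lstrip("@").lower()


def link_posts_to_crm(posts, records):
    """Attach each (handle, text) post to a CRM customer_id by exact handle match.

    Returns (linked, unmatched): linked holds (customer_id, text) pairs,
    unmatched holds the original (handle, text) pairs.
    """
    by_handle = {normalize(r.twitter_handle): r for r in records if r.twitter_handle}
    linked, unmatched = [], []
    for handle, text in posts:
        record = by_handle.get(normalize(handle))
        if record is not None:
            linked.append((record.customer_id, text))
        else:
            unmatched.append((handle, text))
    return linked, unmatched
```

Returning the unmatched posts separately keeps them available for aggregate analysis even when they cannot be tied to a known customer.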

In many ways, social analytics represents another channel of data, similar to more traditional or digital information. The difference is that social analytics captures the true voice of the customer, including their intentions and preferences, unstructured and in real time. Understanding the social customer requires an enterprise-class social listening tool that uses social intelligence as a signal for when a specific social customer should receive branded engagement from an enterprise.

Themos (Analytics Consultant)

I would say that there are challenges on many levels. The first major challenge is the nature of social media data itself. We have to be aware of the issues associated with social media data (such as sampling bias). Data can be collected, but it is not generally available to those who wish to analyze it on a mass scale. Usually we do not have information about the sex or age of someone expressing his or her opinion. Automatically identifying spam and uninteresting information within social data is also tough work.

The second major challenge is the techniques used for analyzing text. The fact is that no software is able to fully “understand” the subtleties of human language, and knowledge consumers (say, marketers) need to be aware of this. I have no doubt that text analytics can provide ideas to work with and insights into consumer behavior. But all professionals wishing to use text analytics first need to understand, to some extent, how this technology works and what its limitations are.

The key difference that makes Text Analytics in Social Media important is its ability to extract knowledge. Decision makers need insights and quite often they want to have them as soon as possible because they need to act fast. So the third challenge is the ability to set the goals for the analysis, collect / analyze the available information and have results, all in a time-efficient manner. Knowing what the technology can do and how it works can also be of great help in getting results fast.

Knowledge consumers also need to be aware that there will be actionable and non-actionable insights. For example, I recently analyzed tweets from people visiting or having visited a shopping mall. A segment of customers expressed negative sentiment in their tweets because they couldn’t walk at the pace they wanted, since the mall was crowded. There is not much you can do about that. However, it is good to know that this thought exists in the minds of some of your customers, and whether this customer segment is growing or shrinking over time.
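Tracking whether such a segment is growing can be as simple as computing, month by month, the share of posts that express the theme. The Python sketch below uses plain substring matching on a hypothetical keyword as a stand-in for a real theme classifier, and the sample posts are invented for illustration.

```python
from collections import Counter
from datetime import date


def monthly_share(posts: list[tuple[date, str]], keyword: str) -> dict[tuple[int, int], float]:
    """Fraction of each month's posts that mention `keyword` (case-insensitive)."""
    totals = Counter()
    hits = Counter()
    for day, text in posts:
        month = (day.year, day.month)
        totals[month] += 1
        if keyword.lower() in text.lower():
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


if __name__ == "__main__":
    sample = [  # invented example posts
        (date(2011, 1, 5), "Too crowded to walk"),
        (date(2011, 1, 9), "Great sale"),
        (date(2011, 2, 3), "So crowded today"),
        (date(2011, 2, 4), "crowded again"),
    ]
    print(monthly_share(sample, "crowded"))  # {(2011, 1): 0.5, (2011, 2): 1.0}
```

Using a share rather than a raw count keeps the trend from simply mirroring changes in overall tweet volume.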

Who are the clients, and what kinds of things are they looking for help with? …

[Feel free to continue reading the interview on the Text Analytics Summit website here. If you have any questions or comments on the interview, feel free to post them here.]

Tags: Analytics · Anderson Analytics · Business Guru · Conferences · Datamining · Interview · Linkedin · Market Research · Market Research Guru · Marketing · Marketing Guru · Marketing Research Guru · Marketing research · NGMR · OdinText · Predictive Analytics · Sentiment · Social Media · Social Media Marketing · Social networks · Text Analytics · Tom H. C. Anderson · Twitter · blog mining · blogging · next gen market research · text mining · tomhcanderson

7 responses so far ↓

  • 1 Shlomo Argamon // Mar 10, 2011 at 6:36 pm

    It seems that one of the things you all agree on is the importance of properly dealing with context in text analytics, and I wholeheartedly agree. Such context arises in many ways. Sometimes it amounts to understanding the flow of a particular document. Sometimes it involves the context of background knowledge (as in the “kills germs” example). Another form of context that has not been deeply explored yet, but will be very important in social media analytics, is understanding each message as a turn in an ongoing conversation. Sarcasm and figurative language can also, I think, be thought of in part as an issue of contextualizing language use, but they require much background knowledge, not just about the world but about human behavior and interaction.

  • 2 Tom H C Anderson // Mar 11, 2011 at 9:28 am

    Yes, I agree, Shlomo. By the way, I like what you guys have been doing with interpreting demographics from text.

    It sounds somewhat similar to some of the work we have done looking for emotion.

  • 3 Jake // Mar 11, 2011 at 9:41 am

    seems as if text analytics will really push social media to the next level…

  • 4 Peter Mancini // Mar 13, 2011 at 5:32 pm

    Sentiment analysis is the snake oil of current text analytics efforts. There is no scientific basis for the scoring systems; they all use their own proprietary methods. There is not even a decent test to compare “success,” yet the simple concept of sentiment analysis drives a desire to have such a capability. I doubt current technological constructs will ever work. If it ever does work, it will require a different approach.

    The example of sarcasm being hard to detect (let’s face it, no system detects it and no one has a plan to mitigate it) is the perfect phase line to cross to prove that sentiment has been both understood and dealt with. How do people distinguish sarcasm? We certainly don’t look at the meaning of individual words. We compare what is said against what we know. A sarcastic statement often requires backtracking in the dialog to find the keys needed to recognize it as sarcasm.

    The biggest issue with text analytics today goes beyond measuring sarcasm to measuring systems against each other. If you are the technologist for a firm and you want to adopt a text analytics platform, you need to compare what is out there. Sure, there are many points of differentiation such as price, features and support. But in the end, how do you compare two systems’ performance against each other? F-measure is one of the only measures people use, and it’s broken (it fails to account for the inputs, which is why any given system will have a wide variety of F-measure results on different corpora).
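    To illustrate how F-measure depends on the input corpus, the Python sketch below computes F1 for a hypothetical classifier with the same recall (80%) and the same false-positive rate (10%) on two invented corpora that differ only in how many documents are actually relevant. The numbers are made up for illustration; the sketch shows the well-known sensitivity of F1 to class balance, not a benchmark of any real system.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written in the equivalent form 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def simulate(n_pos: int, n_neg: int, recall: float, false_pos_rate: float) -> float:
    """F1 for a classifier with fixed recall and false-positive rate on a
    corpus with n_pos relevant and n_neg irrelevant documents."""
    tp = round(n_pos * recall)
    fn = n_pos - tp
    fp = round(n_neg * false_pos_rate)
    return f1_score(tp, fp, fn)


if __name__ == "__main__":
    # Same classifier behaviour, two corpora with different class balance.
    print(round(simulate(500, 500, 0.8, 0.1), 4))  # 0.8421
    print(round(simulate(100, 900, 0.8, 0.1), 4))  # 0.5926
```

    Nothing about the classifier changed between the two runs; only the mix of relevant and irrelevant documents did, which is why F1 figures reported on different corpora are hard to compare.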

    Text Analytics needs to not only produce verifiable results but it also needs measures that allow the comparison of systems. Current measures are weak, prone to being gamed and lack scientific controls.

  • 5 Tom H. C. Anderson // Mar 16, 2011 at 4:39 pm

    There is most definitely room for improvement. I agree with some of what you say, and yes, in sentiment, and beyond that in concept identification, there’s certainly a lot of snake oil out there. That said, my problem is more with how it’s being positioned. If you know what you are doing, there are a lot of useful tools out there already.

  • 6 My Take on Rethink11 // Mar 24, 2011 at 4:05 pm

    [...] as well as panelists from suppliers uSamp, J.D. Power, and NetBase (The latter two of which Dave Howlett and Lisa Rosner, will also both be presenting at the Text Analytics Summit May [...]

  • 7 The Top Blog in Market Research! // Mar 29, 2011 at 10:03 am

    [...] other researchers to think about less traditional market research techniques such as data mining or text analytics, and to take social media more [...]
