There is Room for More in the Text Analytics Tool Case
I thought I’d give a brief summary of the two-day Sentiment Analysis Symposium, which ended last night. The event is just three years old and continues to grow fantastically. As the earliest proponent of text analytics within marketing research, I was very pleased to see other marketing research firms represented at the Symposium this year (Vovici and Maritz). Text analytics (as well as data mining) is the primary reason I founded Anderson Analytics and started the Next Generation Marketing Research (NGMR) networking group.
The Sentiment Symposium, like the Text Analytics Summit, which I’m helping to sponsor again this year, is unique in that industry interest spans government, intelligence, finance, health, IT and, more recently, fields like PR (encouraged by the recent success of companies like Radian6). It is difficult to separate sentiment from text analytics overall, so while the focus of the Symposium is on sentiment, both the Symposium and the Summit are definitely text analytics conferences. That said, because of the tremendous growth and opportunity, this is the one field where I would say we could actually use more conferences and events rather than fewer.
Seth Grimes, especially, continues to do an excellent job of bringing these various experts together. Ezra Steinberg was also present to answer questions about the upcoming Text Analytics Summit.
Day 1 (eBay)
The Practical Sentiment Analysis workshop taught by Yongzheng “Tiger” Zhang, Catherine Baudin, and Nitin Indurkhya of eBay Research Labs was especially well attended. I think a few of us wondered why eBay would be teaching a practical workshop on text analytics, and yet were also probably hoping for a bit more than what was reasonable IP-wise.
Several developers in the audience, myself included, were probably hoping for rules or code we could use directly for various parts (sentiment specifically) of our own text analytics tools. However, while Catherine conducted a hands-on exercise at the end, the workshop was primarily dedicated to summarizing the academic research in the field to date, along with some of eBay’s opinions on what seems to work better.
The workshop was a pretty good general overview and set us up well for day two. A few of my personal takeaways, some of which I was already well aware of and others that definitely warrant further exploration:
- Sentiment and Text Analytics will be playing a greater role in Online advertising soon.
- Simple Polarity Lexicons can be very effective for Sentiment
- Handling negation/reversals via rule based algorithms seems to be a popular and fairly simple approach
- Terminology such as tokens etc. is becoming more standard
- 10-20 rules typically suffice per domain
- Rules are very much domain specific!
- No technique is perfect
- Support Vector Machines seem to work better than Naive Bayes, Maximum Entropy and Logistic Regression models
- PMI & TF-IDF techniques seem to be promising approaches
- Depending on the project, DP, while boring, can add value (Unix Spell Checker)
- Semantic approaches are limited
- Don’t let pretty visualizations fool you into trusting algorithms behind them
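To make the polarity lexicon and negation points above a little more concrete, here is a minimal sketch in Python. To be clear, this is not code from the eBay workshop: the tiny word list, the set of negators, and the two-token negation window are all assumptions I picked purely for illustration, and a real project would need a domain-specific lexicon and rules.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
POLARITY = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}

# Hypothetical negators; each one flips the polarity of the next few tokens.
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, window=2):
    """Sum lexicon polarities, reversing any that fall within
    `window` tokens after a negator (an assumed, simplistic rule)."""
    total = 0
    flips_left = 0  # how many upcoming tokens are still under negation
    for tok in tokenize(text):
        if tok in NEGATORS:
            flips_left = window
            continue
        polarity = POLARITY.get(tok, 0)
        if polarity and flips_left > 0:
            polarity = -polarity
        total += polarity
        if flips_left > 0:
            flips_left -= 1
    return total


if __name__ == "__main__":
    for sentence in ("The service was great",
                     "The service was not great",
                     "I never said it was good"):
        print(sentence, "->", score(sentence))
```

The last sentence shows why no technique is perfect: the negator sits too far from "good" for a two-token window to catch it, so the rule scores the sentence as positive.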
Day 2 (Main Event)
While there were several interesting speakers on Day 2, Katie Paine, CEO of PR firm KDPaine & Partners, resonated most. Like me, she has worked with various vendors for several years to deliver actionable results to end clients, and so she asked “So What!”. Speaking as the skeptic, she had come to several of the same conclusions I had about what text analytics is actually best suited for, and about the degree to which human analysts need to be involved in the process.
Additionally, with PR being such a human-analyst-intensive field (lots and lots of qualitative analysis), she had explored a compelling cost and quality benefit analysis based on volume of text.
I would say my overall takeaway is that while a lot of players are being added to this field all the time, and while they tend to come from various disciplines with various levels of experience, I’m somewhat surprised to see a lot of players approaching the problem the same way.
I’m not at all convinced there is one best way to approach text analytics and strongly believe it’s dependent not just on the domain, but also on the final objectives and needs of the end user. At the risk of using an overly simple example, a carpenter needs different tools depending on the job. A saw, no matter how sharp, will not do the job of a hammer.
I continue to be very excited about the opportunities for text analytics across various industries including social media analytics as well as marketing research.
If you also attended the event I’d love to hear your takeaways. I look forward to seeing some of you at the Text Analytics Summit in May!