Tom H. C. Anderson - Next Gen Market Research™

More Than Market Research - Gain The Information Advantage

Practical Sentiment Analysis and Lies

April 9th, 2012 · 4 Comments

Q&A with Prof. Bing Liu ahead of the Sentiment Analysis Symposium and Pre Symposium Tutorial

The Sentiment Analysis Symposium in NYC is just a month away (May 8th), so I thought I’d check out who was teaching the pre-conference sentiment analysis tutorial this year. For those of us working with text analytics in the New York area, Seth Grimes’ Sentiment Analysis Symposium has definitely made our annual must-attend list. However, what most seem to miss is the half-day workshop held the day before the event each year. I started attending it last year, when researchers from Amazon.com were teaching, and decided it was well worth half a day in the city to get a more tactical POV on sentiment from someone who might have a slightly different use case or experience.

This year, data mining expert Bing Liu, a Professor in the Computer Science Department at the University of Illinois at Chicago, will be teaching the workshop. Some of his work on text analytics and detecting fraud in online ratings was recently covered in the NY Times. Since I noticed we were connected on LinkedIn from a previous text analytics event, I called him up for a quick chat to learn a bit more about his work and what I might expect to learn at his pre-Symposium workshop. We had an interesting talk, and afterward I sent him a few questions, as I thought others would be interested as well.

I plan on being at both the Symposium and the pre-Symposium workshop again this year. Anyone else interested in attending should feel free to use my discount code (OdinText). Do let me know if you’ll be attending so we can meet up; it’s a relatively small and informal group.

Now on to the Q&A…

Tom: Bing, how did you get into text analytics, and sentiment analysis?

Bing: My earlier research interests were in the areas of data mining and machine learning. Around 2000, I started to get interested in Web mining and in machine learning using text data. These two topics led me to text on the Web. Reviews naturally came to mind because they are focused and well organized, which is great for data mining. I also quickly realized that sentiment analysis was a perfect research problem in its own right (I called it opinion mining then, due to my data mining background). It has so many applications, as every individual and organization needs opinions for decision making. There was also a whole range of challenging research problems that had not been addressed by the natural language processing or linguistics communities. We started to work on it in 2003 and published our first paper at KDD-2004 (ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). The paper basically defined the framework of feature-based, or aspect-based, sentiment analysis and opinion summarization, which is now widely used in industry and in research.

Tom: False website reviews are an interesting application, and one that I’ve been keeping my eye on. I noticed the New York Times recently covered some of your work in this area. This type of text analytics research seems to be much more difficult than most people think. Can you tell us a bit about this problem from the text analytics perspective, and how it is different from simpler use cases like identifying spam email for instance?

Bing: Indeed, this is a very difficult problem. My group began to work on it around 2006 or 2007, as we realized this was an important problem and would become more and more important. When we started, we realized it was really hard. The main difficulty is that it is very hard, if not impossible, to recognize fake reviews manually, because it is fairly easy to craft a fake review and pass it off as a genuine one. Email spam detection is a much easier problem because you will immediately recognize a spam mail when you see one. This means that spam and non-spam emails have clear differences, and that it is easy to produce labeled training data for machine learning algorithms, both to build predictive models and to evaluate them.

However, for fake reviews, if one writes them very carefully, it is hard to recognize them just by reading the review text. In the extreme case, this is an impossible task logically. For example, one can write a genuine review for a good restaurant and post it as a fake review for a bad restaurant in order to promote the bad restaurant. There is no way to detect this fake review without considering information beyond the review text itself simply because one review cannot be both truthful and fake at the same time.

Tom: What do you see as some of the applications of this type of research?

Bing: Review hosting sites or any general social media sites all want their reviews and user comments to be trustworthy. They are thus interested in fake review detection algorithms. All text analytics systems that use reviews or any opinion data need to worry about this problem too. Social media is here to stay. Its content is also being used more and more in applications.

Something has to be done to ensure the integrity of this valuable source of information before it becomes full of fake opinions, lies and deceptive information. After all, there are strong motivations for businesses and individuals to post fake reviews for profit and fame. It is also easy and cheap to do so. Writing fake reviews has already become a very cheap way of marketing and product promotion.

Tom: Have you found there are certain approaches that work better than others?

Bing: It is still too early to tell. Researchers currently use both linguistic features and atypical behaviors of reviewers to detect fakes. I feel that algorithms that mine atypical behaviors of reviewers and reviews tend to produce more interpretable and trustworthy results. For example, if all 5-star reviews for a hotel were posted only by people from the surrounding area of the hotel, these reviews are clearly suspicious. This is a simple example. More sophisticated fake reviews need more involved modeling and algorithms to detect them.
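
[A quick illustration of the hotel example above. This is only a minimal sketch under my own assumptions, not Bing’s actual algorithm: the review records, the distance field, and the thresholds (local_km, min_reviews) are all hypothetical. It simply flags any hotel whose 5-star reviews all come from reviewers located close to the hotel.]

```python
from collections import defaultdict

# Hypothetical review records: (hotel, star rating, reviewer's distance from the hotel in km).
# The record layout and the thresholds below are assumptions made for illustration only.
REVIEWS = [
    ("Hotel A", 5, 1.2),
    ("Hotel A", 5, 0.8),
    ("Hotel A", 5, 2.5),
    ("Hotel A", 3, 410.0),
    ("Hotel B", 5, 850.0),
    ("Hotel B", 5, 3.0),
    ("Hotel B", 5, 120.0),
    ("Hotel B", 4, 60.0),
]


def suspicious_hotels(reviews, local_km=10.0, min_reviews=3):
    """Return hotels whose 5-star reviews ALL come from reviewers within local_km."""
    five_star_distances = defaultdict(list)
    for hotel, stars, distance_km in reviews:
        if stars == 5:
            five_star_distances[hotel].append(distance_km)

    flagged = []
    for hotel, distances in five_star_distances.items():
        # Require a minimum number of 5-star reviews so one local fan is not flagged.
        if len(distances) >= min_reviews and all(d <= local_km for d in distances):
            flagged.append(hotel)
    return sorted(flagged)


print(suspicious_hotels(REVIEWS))  # ['Hotel A']
```

[Note that the rule never reads the review text at all; it only looks at who posted what, which is why its output is easy to interpret. As Bing says, more sophisticated fakes would need far more involved modeling than this.]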

Tom: It’s been my observation and experience that we as an industry are moving away from the linguistic approach to text. Sure, some of the basics are useful, but machine learning and statistical approaches seem more powerful. What are your thoughts on this?

Bing: For most tasks, machine learning and statistical approaches are indeed more effective than purely linguistic approaches. Linguistic approaches are mostly based on heuristic rules and patterns (including grammar information). For tasks that can be performed based on words alone, it is very hard for a linguistics-based approach to beat a statistical machine learning algorithm, simply because the signals used by a machine learning algorithm are far more numerous than the rules or patterns that a human can design. Plus, machine learning algorithms optimize performance. That being said, in many tasks, linguistics-based signals and clues are used as features by machine learning algorithms.

Statistical approaches are not without their limits. Going forward, I believe that both linguistic knowledge and statistical modeling are important. We are working on integrating more linguistic knowledge into statistical modeling.

Tom: It seems to me a lot of folks get a little too caught up in differences between languages. My firm for instance has found it rather easy to add other European languages to our approach, and of course machine translation is always a possibility. What are your thoughts on this?

Bing: Yes, I agree. Although every language is different, languages are still similar in that they all consist of words and grammar. European languages have even more similarities due to their common roots. A learning algorithm can capture many types of grammatical regularities from any language if there is a sufficient amount of training data. For tasks that need only word or lexical information, the same algorithm can be used for any language with almost no modification, because an algorithm treats words as symbols. In that sense, it does not matter what language it is.
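
[To make the “words as symbols” point concrete, here is a minimal sketch of a word-based statistical classifier: a small multinomial Naive Bayes model with add-one smoothing. This is my own illustration rather than anything Bing described; the function names and the tiny English and German training sets are hypothetical, and a real system would need far more human-coded training data.]

```python
import math
from collections import Counter


def train(labeled_docs):
    """Fit a multinomial Naive Bayes model from (text, label) pairs coded by humans."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()  # tokens are opaque symbols; no grammar is used
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        total_tokens = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training carry no signal
                score += math.log((counts[token] + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-coded examples in two languages; the code above is identical for both.
ENGLISH = [
    ("great room and friendly staff", "pos"),
    ("friendly staff great breakfast", "pos"),
    ("terrible service and dirty room", "neg"),
    ("dirty bathroom terrible smell", "neg"),
]
GERMAN = [
    ("tolles zimmer und freundliches personal", "pos"),
    ("freundliches personal tolles essen", "pos"),
    ("schrecklicher service und schmutziges zimmer", "neg"),
    ("schmutziges bad schrecklicher geruch", "neg"),
]

english_model = train(ENGLISH)
german_model = train(GERMAN)
print(predict(english_model, "friendly staff"))       # pos
print(predict(german_model, "schmutziges zimmer"))    # neg
```

[Nothing in train or predict knows which language it is looking at; only the labeled examples change. The labels play the role of the target variable, that is, how humans have coded each document.]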

Tom: What will you be covering during the tutorial at the sentiment symposium?

Bing: Sentiment analysis has been studied extensively for the past decade. A huge number of research papers have been published on it (probably more than 1000). It is impossible to cover them all. Therefore, I will try to cover the main threads of research that also contain aspects which can be of immediate use in practice.

In the tutorial, I will start with a short motivation and then go on to define the problem. This will provide an abstraction or statement of the problem, which will naturally introduce the key sub-problems. I will then discuss the current state-of-the-art approaches to solving these problems. Since this is a practical sentiment analysis tutorial, I will also describe how to build a practical sentiment analysis system based on my previous experience in building one. In the final part of the tutorial, I will introduce the problem of fake review detection.

A big thanks to Bing for our talk and the subsequent Q&A. Looking forward to meeting up at the Symposium.

@TomHCAnderson
@OdinText

[For those interested in more info about the sentiment tutorial a syllabus and outline is available here]

Tags: Academia · Conferences · Datamining · Odin Text · OdinText · Sentiment Analysis · Sentiment Analysis Symposium · Text Analytics · Text Analytics Summit · Text Mining Guru · seth grimes · text mining · tomhcanderson

4 responses so far ↓

  • 1 Peter Szekeres // Apr 11, 2012 at 7:58 am

    Great interview. I really agree with Prof. Bing Liu. I think the most effective sentiment analysis methods are those that combine knowledge lexicons, grammar rules, and statistical methods and assessments. By combining these I built a quite good sentiment analysis system for Hungarian webpages (accuracy: 80%).
    I think one important thing wasn’t mentioned above: handling irony. Maybe it can be similar to recognizing fake reviews…

  • 2 chris west // Apr 19, 2012 at 6:47 am

    I’m someone who’s coming into Text Analytics from the Marketing World. Can anyone explain (simply) how a ‘statistical machine learning algorithm’ works: what do you give it as inputs? Does it look for wide variances from ‘typical’ or ‘mean’ to spot possible fakes?
    Any help appreciated
    Chris

  • 3 Tom H C Anderson // Apr 19, 2012 at 8:30 am

    Chris, good question. Yes most typically that approach has to do with the computer ‘learning’ how humans do it. So target variable would be how humans have coded it, be it sentiment (Pos, Neg etc.) or in this case I guess dishonest or honest.

  • 4 Fake Reviews a Growing and Tenacious Problem in Social Media : Beyond Search // Apr 20, 2012 at 12:07 am

    [...] Analysis Symposium in New York City early next month. He has titled his interview, “Practical Sentiment Analysis and Lies.” [...]
