Tom H. C. Anderson - Next Gen Market Research™

More Than Market Research - Gain The Information Advantage

Practical Sentiment Analysis and Lies

April 9th, 2012 · 4 Comments

Q&A with Prof. Bing Liu ahead of the Sentiment Analysis Symposium and Pre Symposium Tutorial

The Sentiment Analysis Symposium in NYC is just a month away (May 8th), so I thought I’d check out who was teaching the pre-conference sentiment analysis tutorial this year. For those of us working with text analytics in the New York area, Seth Grimes’ Sentiment Symposium has definitely made our annual must-attend list. What most people seem to miss, however, is the half-day workshop held the day before the event each year. I started attending it last year, when researchers from Amazon.com were teaching it, and decided it was well worth half a day in the city to get a more tactical POV on sentiment from someone with a slightly different use case or experience.

This year, data mining expert Bing Liu, a Professor in the Computer Science Department at the University of Illinois at Chicago, will be teaching the workshop. Some of his work on text analytics and detecting fraud in online ratings was recently covered by the NY Times. Since I noticed we were connected on LinkedIn from a previous text analytics event, I called him up for a quick chat to learn a bit more about his work and what I might expect to learn at his pre-Symposium workshop. We had an interesting talk, and I subsequently sent him a few questions, as I thought others would be interested as well.

I plan on being at both the Symposium and the pre-Symposium workshop again this year. If you are interested in attending, feel free to use my discount code (OdinText). Do let me know if you’ll be attending so we can meet up; it’s a relatively small and informal group.

Now on to the Q&A…

Tom: Bing, how did you get into text analytics and sentiment analysis?

Bing: My earlier research interests were in the areas of data mining and machine learning. Around 2000, I started to get interested in Web mining and machine learning using text data. These two topics led me to text on the Web. Reviews naturally came to mind because they are focused and well organized, which is great for data mining. I also quickly realized that sentiment analysis was a perfect research problem in its own right (I called it opinion mining then, due to my data mining background). It had so many applications, as every individual and organization needs opinions for decision making. There was also a whole range of challenging research problems that had not been addressed by the natural language processing or linguistics communities. We started to work on it in 2003 and published our first paper at KDD-2004 (ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). The paper basically defined the framework of feature-based (or aspect-based) sentiment analysis and opinion summarization, which is now widely used in industry and in research.

Tom: False website reviews are an interesting application, and one that I’ve been keeping my eye on. I noticed the New York Times recently covered some of your work in this area. This type of text analytics research seems to be much more difficult than most people think. Can you tell us a bit about this problem from the text analytics perspective, and how it is different from simpler use cases like identifying spam email for instance?

Bing: Indeed, this is a very difficult problem. My group began to work on it around 2006 or 2007, as we realized this was an important problem that would only become more important. Once we started, we realized it was really hard. The main difficulty is that it is very hard, if not impossible, to recognize fake reviews manually, because it is fairly easy to craft a fake review and pass it off as a genuine one. Email spam detection is a much easier problem because you immediately recognize a spam mail when you see one. This means that spam and non-spam emails have clear differences, and that it is easy to produce labeled training data for machine learning algorithms, both to build predictive models and to evaluate them.

However, if fake reviews are written very carefully, it is hard to recognize them just by reading the review text. In the extreme case, the task is logically impossible. For example, one can take a genuine review of a good restaurant and post it as a fake review of a bad restaurant in order to promote the bad restaurant. There is no way to detect this fake review without considering information beyond the review text itself, simply because the same text cannot be labeled both truthful and fake.

Tom: What do you see as some of the applications of this type of research?

Bing: Review hosting sites or any general social media sites all want their reviews and user comments to be trustworthy. They are thus interested in fake review detection algorithms. All text analytics systems that use reviews or any opinion data need to worry about this problem too. Social media is here to stay. Its content is also being used more and more in applications.

Something has to be done to ensure the integrity of this valuable source of information before it becomes full of fake opinions, lies and deceptive information. After all, there are strong motivations for businesses and individuals to post fake reviews for profit and fame. It is also easy and cheap to do so. Writing fake reviews has already become a very cheap way of marketing and product promotion.

Tom: Have you found there are certain approaches that work better than others?

Bing: It is still too early to tell. Researchers currently use both linguistic features and atypical behaviors of reviewers to detect fakes. I feel that algorithms that mine atypical behaviors of reviewers and reviews tend to produce more interpretable and trustworthy results. For example, if all 5-star reviews for a hotel were posted only by people from the surrounding area of the hotel, these reviews are clearly suspicious. This is a simple example. More sophisticated fake reviews need more involved modeling and algorithms to detect them.
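To make the hotel example concrete, here is a minimal sketch of that kind of behavioral check. It is only an illustration of the idea, not Bing's algorithm: the record fields (`hotel`, `stars`, `reviewer_city`), the `hotel_city` lookup, the `min_reviews` threshold and the sample data are all assumptions invented for this example.

```python
from collections import defaultdict

def flag_local_only_five_star(reviews, hotel_city, min_reviews=3):
    """Return hotels whose 5-star reviews ALL come from the hotel's own city.

    reviews     -- list of dicts with hypothetical keys: hotel, stars, reviewer_city
    hotel_city  -- dict mapping hotel id to the city it is located in
    min_reviews -- ignore hotels with fewer 5-star reviews than this (assumed cutoff)
    """
    five_star_cities = defaultdict(list)
    for r in reviews:
        if r["stars"] == 5:
            five_star_cities[r["hotel"]].append(r["reviewer_city"])

    flagged = []
    for hotel, cities in five_star_cities.items():
        all_local = all(city == hotel_city[hotel] for city in cities)
        if len(cities) >= min_reviews and all_local:
            flagged.append(hotel)
    return sorted(flagged)

# Made-up sample data, for illustration only.
reviews = [
    {"hotel": "A", "stars": 5, "reviewer_city": "Springfield"},
    {"hotel": "A", "stars": 5, "reviewer_city": "Springfield"},
    {"hotel": "A", "stars": 5, "reviewer_city": "Springfield"},
    {"hotel": "A", "stars": 2, "reviewer_city": "Boston"},
    {"hotel": "B", "stars": 5, "reviewer_city": "Chicago"},
    {"hotel": "B", "stars": 5, "reviewer_city": "Denver"},
    {"hotel": "B", "stars": 5, "reviewer_city": "Shelbyville"},
]
hotel_city = {"A": "Springfield", "B": "Shelbyville"}

suspicious = flag_local_only_five_star(reviews, hotel_city)
print(suspicious)  # hotel A is flagged; hotel B has 5-star reviews from other cities
```

A flag like this only marks a hotel as worth a closer look. As Bing notes, more sophisticated fakes need more involved modeling.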

Tom: It’s been my observation and experience that we as an industry are moving away from a purely linguistic approach to text. Sure, some of the basics are useful, but machine learning and statistical approaches seem more powerful. What are your thoughts on this?

Bing: For most tasks, machine learning and statistical approaches are indeed more effective than purely linguistics-based approaches. Linguistic approaches are mostly based on heuristic rules and patterns (including grammar information). For tasks that can be performed based on words, it is very hard for a linguistics-based approach to beat a statistical machine learning algorithm, simply because the signals used by a machine learning algorithm are far more numerous than the rules or patterns that a person can design. Plus, machine learning algorithms optimize performance. That said, in many tasks, linguistics-based signals and clues are used as features by machine learning algorithms.

Statistical approaches are not without their limits. Going forward, I believe that both linguistic knowledge and statistical modeling are important. We are working on integrating more linguistic knowledge into statistical modeling.
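One common way linguistic knowledge ends up as features is negation marking: words that follow a negation get a prefix, so "clean" and "not clean" become different signals for the learner. The sketch below is an assumed illustration of that general idea, not a description of Bing's system; the negation list, the `NOT_` prefix and the tokenizer are simplifications chosen for the example.

```python
import re
from collections import Counter

NEGATIONS = {"not", "no", "never"}   # illustrative list, an assumption
PUNCTUATION = {".", ",", "!", "?", ";"}

def negation_features(text):
    """Bag-of-words counts in which words after a negation are prefixed with
    NOT_ until the next punctuation mark (a simple hand-written linguistic rule)."""
    tokens = re.findall(r"[a-z']+|[.,!?;]", text.lower())
    counts = Counter()
    negated = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False            # punctuation ends the negation scope
        elif tok in NEGATIONS or tok.endswith("n't"):
            negated = True
            counts[tok] += 1
        else:
            counts["NOT_" + tok if negated else tok] += 1
    return counts

feats = negation_features("The room was not clean, but the staff was friendly.")
print(feats["NOT_clean"], feats["clean"], feats["friendly"])
```

The resulting counts can be handed to any statistical classifier as its input features, which is where the rule-based and statistical sides meet.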

Tom: It seems to me a lot of folks get a little too caught up in differences between languages. My firm, for instance, has found it rather easy to add other European languages to our approach, and of course machine translation is always a possibility. What are your thoughts on this?

Bing: Yes, I agree. Although every language is different, languages are still similar in that they all consist of words and grammar. European languages have even more similarities due to their common roots. A learning algorithm can capture many types of grammatical regularities from any language if there is a sufficient amount of training data. For tasks that need only word or lexical information, the same algorithm can be used for any language with almost no modification, because an algorithm treats words as symbols. In that sense, it does not matter what language it is.
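The "words as symbols" point can be shown with a tiny word-count classifier. The sketch below is a from-scratch multinomial Naive Bayes with add-one smoothing, used here only as a stand-in for "a statistical machine learning algorithm"; the interview does not say which algorithms Bing uses. The training sentences are made up, and the whitespace tokenizer assumes a language that separates words with spaces.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs. Returns word counts per label,
    document counts per label, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()   # words are just opaque symbols here
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok in vocab:            # unseen words are skipped
                smoothed = (word_counts[label][tok] + 1) / (total_words + len(vocab))
                score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# The same code is trained once on English and once on German toy data.
en_model = train([
    ("great friendly staff", "pos"),
    ("lovely clean room", "pos"),
    ("dirty room rude staff", "neg"),
    ("terrible noisy night", "neg"),
])
de_model = train([
    ("sehr freundliches personal", "pos"),
    ("schönes sauberes zimmer", "pos"),
    ("schmutziges zimmer unfreundliches personal", "neg"),
    ("schrecklich laute nacht", "neg"),
])

print(predict(en_model, "clean room friendly staff"))
print(predict(de_model, "sauberes zimmer freundliches personal"))
```

Nothing in `train` or `predict` knows which language it is looking at; only the training data changes. Real systems need far more data than these four sentences per language.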

Tom: What will you be covering during the tutorial at the sentiment symposium?

Bing: Sentiment analysis has been studied extensively for the past decade. A huge number of research papers have been published on it (probably more than 1000). It is impossible to cover them all. Therefore, I will try to cover the main threads of research that also contain aspects which can be of immediate use in practice.

In the tutorial, I will start with a short motivation and then go on to define the problem. This will provide an abstraction or statement of the problem, which will naturally introduce the key sub-problems. I will then discuss the current state-of-the-art approaches to solving these problems. Since this is a practical sentiment analysis tutorial, I will also describe how to build a practical sentiment analysis system based on my previous experience in building one. In the final part of the tutorial, I will introduce the problem of fake review detection.

A big thanks to Bing for our talk and the subsequent Q&A. Looking forward to meeting up at the Symposium.

@TomHCAnderson
@OdinText

[For those interested in more info about the sentiment tutorial a syllabus and outline is available here]

Tags: Academia · Conferences · Datamining · Odin Text · OdinText · Sentiment Analysis · Sentiment Analysis Symposium · Text Analytics · Text Analytics Summit · Text Mining Guru · seth grimes · text mining · tomhcanderson

4 responses so far ↓

  • 1 Peter Szekeres // Apr 11, 2012 at 7:58 am

    Great interview. I really agree with Prof. Bing Liu. I think that the most effective sentiment analysis methods are those which combine knowledge lexicons, grammar rules and statistical methods and assessments. By combining these I built a quite good sentiment analysis system for Hungarian webpages (accuracy: 80%).
    I think one important thing wasn’t mentioned above: handling irony. Maybe it can be similar to recognizing fake reviews…

  • 2 chris west // Apr 19, 2012 at 6:47 am

    I’m someone that’s coming into Text Analytics from the Marketing World. Can anyone explain (simply) how a ‘statistical machine learning algorithm’ works: what do you give it as inputs? Does it look for wide variances from ‘typical’ or ‘mean’ to spot possible fakes?
    Any help appreciated
    Chris

  • 3 Tom H C Anderson // Apr 19, 2012 at 8:30 am

    Chris, good question. Yes, most typically that approach has to do with the computer ‘learning’ how humans do it. So the target variable would be how humans have coded it, be it sentiment (Pos, Neg etc.) or, in this case, I guess dishonest or honest.

  • 4 Fake Reviews a Growing and Tenacious Problem in Social Media : Beyond Search // Apr 20, 2012 at 12:07 am

    [...] Analysis Symposium in New York City early next month. He has titled his interview, “Practical Sentiment Analysis and Lies.” [...]