Dr. Gregory Piatetsky-Shapiro
KDnuggets - Data Mining Guru
If you’ve been working with data mining, knowledge discovery, bioinformatics, or business analytics for the past few years, then you are likely to be familiar with Gregory. The term Data Mining and the name Gregory Piatetsky-Shapiro are inextricably linked. Before starting KDnuggets he led data mining and consulting groups at GTE Laboratories, Knowledge Stream Partners, and Xchange. He has extensive experience developing CRM, customer attrition, cross-sell, segmentation and other models for some of the leading banks, insurance companies, and telcos. He has also worked on clinical trial, microarray, and proteomic data analysis for several leading biotech and pharmaceutical companies. He has served as an expert witness and provided expert opinions in several cases. He serves as the current Chair of ACM SIGKDD, the leading professional organization for Knowledge Discovery and Data Mining. He is also the founder of the Knowledge Discovery in Databases (KDD) conferences, having organized and chaired the first three Knowledge Discovery in Databases workshops in 1989, 1991, and 1993.
Gregory has over 60 publications, including 2 best-selling books and several edited collections on topics related to data mining and knowledge discovery, including the SIGKDD Explorations Special Issue on Microarray Data Mining (Vol 5, Issue 2, Dec 2003). Gregory received the ACM SIGKDD Service Award (2000) and the IEEE ICDM Outstanding Service Award for contributions to the data mining field and community. He is a true Data Mining Guru.
Gregory’s work has had an indelible influence on the scientific approach of Anderson Analytics. That influence began during founder Tom H. C. Anderson’s graduate school years; later, during the development of Anderson Analytics, Gregory was a source of advice and scientific guidance. After meeting in person for the first time last year at the text analytics summit in Boston, Tom and Gregory have kept in touch. Tom’s most recent conversation with Gregory is presented as the third installment of the Anderson Analytics Guru Round-Table discussion. In their conversation, Tom and Gregory discuss the importance of data mining, including how it is unfortunately foreign to most market researchers.
On Data Mining
Tom H. C. Anderson: Gregory, how did you get into data mining?
Gregory Piatetsky-Shapiro: Since I was a child I’ve always been interested in AI [artificial intelligence]. There weren’t really any Ph.D. programs specifically in data mining back in the early ’80s, so I got my M.S. and Ph.D. from NYU in Computer Science, and my dissertation was on Self-Organizing Database Systems.
When I worked at GTE Labs in the 1980s, I was very interested in applying AI to databases, and data mining was a natural application. There were no meetings on data mining, so I decided to organize a workshop myself in 1989. I started the Knowledge Discovery Nuggets newsletter in 1993 as a way for researchers in the area to stay connected and share ideas. When the first web browser Mosaic came along it seemed only natural to start a website (1994).
When I left GTE in 1997, I moved the Knowledge Discovery Mine website to www.kdnuggets.com.
In retrospect I should have chosen a name easier to spell than KDnuggets.
Tom H. C. Anderson: I think there’s nothing wrong with the branding there; it’s quite a memorable name, I would think. How about advice for those just starting to learn about data mining? Are there any tips you can give? Perhaps books you could recommend that you like?
Gregory Piatetsky-Shapiro: There are so many; it depends on whether you are interested in the technical aspects of Data Mining/Knowledge Discovery or in the business aspects. One good, less technical book is Michael Berry’s and Gordon Linoff’s “Mastering Data Mining”. I believe I have this one on my site. It’s a very good nontechnical introduction.
Tom H. C. Anderson: How about websites or blogs?
Gregory Piatetsky-Shapiro: I like Avinash Kaushik’s blog (http://www.kaushik.net/avinash/) and also “Juice Analytics”. I enjoyed the Freakonomics book and read their blog as well. A good list of data mining related blogs is at http://www.kdnuggets.com/websites/blogs.html
Tom H. C. Anderson: Gregory, is Data Mining an art, a science, or both?
Gregory Piatetsky-Shapiro: It has to be both. Otherwise someone will find a correlation between the S&P 500 index and the price of butter in Bangladesh. I’m not kidding; I’ve seen some crazy things. There has to be an art applied.
Tom H. C. Anderson: What do you feel is the current state of data mining, knowledge discovery, bioinformatics, and business analytics in companies today?
Gregory Piatetsky-Shapiro: Companies are certainly aware of data mining/KDD, but most companies are not making effective use of the data collected. They are good at collecting it. And also good at correcting the collected data. But they are not so good at analyzing it or applying these insights to the business.
Tom H. C. Anderson: Yes, that is what we have found as well. There is certainly a lot of data collection going on. It seems that’s where the focus is. There are exceptions of course, like Amazon for instance?
Gregory Piatetsky-Shapiro: Amazon definitely makes good use of their customer data. Their “customers who have seen X also bought Y” recommendations are very effective and, from what I hear, contribute a lot to their bottom line.
Speaking of the broader data mining and analytics industry, I have a unique indicator, which is the number of job postings on KDnuggets. In 1999-2000 it was growing strongly, then dropped sharply after the dot-com bubble burst in 2001, and then started growing again in 2004. 2007 was the best year so far.
I’m seeing a slowdown this quarter, but part of that is the business cycle. I’m very optimistic about the long term outlook for analytics.
Trends, CGM & SNS ~ Link & Text Analysis
Tom H. C. Anderson: What important trends, if any, have you seen emerging during the last year which will be important in 2008 and in the near future?
Gregory Piatetsky-Shapiro: MySpace, Facebook, LinkedIn and other SNS are causing this growth. There is a lot of technology that can be used to analyze links, but we must be careful with privacy and allow users to opt in explicitly to the potential analytics. Facebook’s recent Beacon advertising program was a notable example of violating privacy and caused a big backlash.
Then of course there are video/movie and image databases. These will take some time, but a lot of work is being done now.
Tom H. C. Anderson: Other than Link Analysis, Text Analytics/Text Mining seems to be one of our best hopes for understanding SNS. Would you agree?
Gregory Piatetsky-Shapiro: Link Analysis can be valuable even without the text. For instance, when analyzing phone records, you can learn a lot by seeing who calls who. But yes, text analytics is also a very important trend and complementary to link analysis.
Tom H. C. Anderson: I ask this of myself sometimes. How will SNS such as LinkedIn and Facebook change the world?
Gregory Piatetsky-Shapiro: I think the great part is that these companies make a structure that already exists explicit and amenable to analysis. Some things like Twitter perhaps seem a bit strange to our generation, but to younger generations it’s no stranger than driving and drinking coffee. So this next generation may totally accept a computer-mediated network.
Tom H. C. Anderson: I like that, taking something that is tacit and making it explicit. Yes, I think we are already seeing a change in some of the work we do with GenX2Z, especially among younger girls; it may well change the future of how we network.
Outsourcing/Off-Shoring & KPO
Tom H. C. Anderson: I’ve been hearing more and more about KPO, or Knowledge Process Outsourcing/Off-shoring, from others in the field of market research as well as data mining. It seems to be a growing trend as more and more companies try to leverage skilled, cheap labor. What are your thoughts on this, if any, in terms of data mining? Do you feel it will continue to grow?
Gregory Piatetsky-Shapiro: Clearly it will continue to grow; it’s like an economic force of gravity which drives outsourcing to Mumbai and similar areas. The US should develop skills that are harder to duplicate. Business skills are harder to offshore. Good Analytics are a combination of these skills.
Tom H. C. Anderson: In terms of customer segmentation, 1-to-1 marketing has long been a buzzword in our industry. Do you feel this is still the direction we’re heading in (1-to-1 segmentation), or is this something that has been abandoned for more actionable, broader segmentation strategies?
Gregory Piatetsky-Shapiro: Segmentation sometimes can become even more specific than 1-to-1 because there may be multiple people on an account and each person may have different personas.
For example, there may be one Netflix account for a family, but can we figure out if the movie rating reflects the opinion of the husband, the wife, the children, or the dog?
Segmentation must be actionable; one must find the best way to segment.
Who is Leveraging Analytics for an Information Advantage?
Tom H. C. Anderson: In your experience what companies are leveraging analytics better/more?
Gregory Piatetsky-Shapiro:…To see which companies are leveraging analytics more, as I mentioned before, you just have to look at who is recruiting online for what. Insurance companies, banks, Pharma (Microsoft, Travelers, etc.).
Tom H. C. Anderson: And how about by department (Market Research, Business Intelligence, Competitive Intelligence, Strategy, etc.)? Who is making more/better use of KD/DM techniques currently? For Marketing, do you think CMOs have a good enough understanding of these analytic techniques?
Gregory Piatetsky-Shapiro: In regard to CMOs, I suspect they are aware of the buzzwords, but probably not aware of the potential. While analytics isn’t the ‘Golden Bullet’, it is very important. I see relatively few analytics/data mining job postings within the marketing department.