Topic Modeling Amazon Reviews

by Kevin Davenport · March 7, 2017

This article is originally published at http://kldavenport.com

Adapted from Biel 2011

I found Professor Julian McAuley’s work at UCSD while searching for academic work on identifying the ontology and utility of products on Amazon. Professor McAuley and his students have done impressive work inferring networks of substitutable and complementary items. They constructed a browsable graph of related products and discovered topics, or ‘microcategories’, that are associated with those product relationships. Much of this work relies on topic modeling, and since I’ve never applied it in academia or at work, this post will be a practical intro to Latent Dirichlet Allocation (LDA) through code.

More broadly, what can we do with LDA, and what do we need to know about it?

It is an unsupervised learning technique that assumes each document is produced from a mixture of topics.
LDA extracts key topics and themes from a large corpus of text.
Each topic is an ordered list of representative words (ordered by the importance of each word to the topic).
LDA describes each document in the corpus by its allocation to the extracted topics.
There are many domain-specific methods for creating training datasets.
It is easy to use for exploratory analysis.
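
To make the "mixture of topics" assumption concrete, here is a minimal sketch of the generative story LDA assumes, using only the Python standard library. It is an illustration, not a fitting procedure: the two topics, their word probabilities, the `alpha` values, and the helper names `sample_dirichlet` and `generate_document` are all made up for this example. Fitting LDA means going in the other direction and inferring the topics and mixtures from observed documents.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, n_words, rng):
    """Generate one document under LDA's generative assumptions.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  Dirichlet concentration parameters, one per topic
    """
    # 1. Draw this document's mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Pick a topic for this word position according to the mixture.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. Pick a word from that topic's word distribution.
        vocab = list(topics[k])
        weights = [topics[k][w] for w in vocab]
        words.append(rng.choices(vocab, weights=weights)[0])
    return theta, words

# Hypothetical topics for automotive reviews, listed by word importance.
TOPICS = [
    {"battery": 0.5, "charger": 0.3, "jump": 0.2},
    {"wax": 0.5, "polish": 0.3, "shine": 0.2},
]

rng = random.Random(0)
theta, doc = generate_document(TOPICS, alpha=[0.5, 0.5], n_words=8, rng=rng)
print("topic mixture:", [round(p, 2) for p in theta])
print("document:", " ".join(doc))
```

Each generated document carries its own topic mixture (`theta`), which is the "allocation to the extracted topics" that LDA reports for real documents.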

We’ll be using a subset (reviews_Automotive_5.json.gz) of the 142.8 million reviews spanning May 1996 – July 2014 that Julian and his team have compiled and made conveniently available on their site.
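
As a rough sketch of how the review text might be pulled out of such a file, the snippet below reads a gzipped file with one JSON object per line. It assumes each line is strict JSON and that the review body lives in a `reviewText` field; if the file turns out to be formatted differently, the parsing step would need to change. The helper name `iter_review_texts` and the tiny stand-in file are made up so the example runs without the real download.

```python
import gzip
import json
import os
import tempfile

def iter_review_texts(path, field="reviewText"):
    """Yield the non-empty review text from a gzipped, line-delimited JSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            text = record.get(field)
            if text:
                yield text

# Build a tiny stand-in file (hypothetical records) in place of the real dataset.
sample = [
    {"asin": "A1", "overall": 5.0, "reviewText": "Great jumper cables."},
    {"asin": "A2", "overall": 3.0, "reviewText": ""},
    {"asin": "A3", "overall": 4.0, "reviewText": "Wax leaves a nice shine."},
]
sample_path = os.path.join(tempfile.mkdtemp(), "sample_reviews.json.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    for record in sample:
        handle.write(json.dumps(record) + "\n")

texts = list(iter_review_texts(sample_path))
print(len(texts), "reviews loaded")
```

With the real file, the same generator could be pointed at reviews_Automotive_5.json.gz and the resulting texts passed on to tokenization.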


