Michelle Luo
  • Home
  • NEWS
  • Project
  • LeetCode
  • Travel
  • Entertainment

Yelp Dataset Challenge

10/1/2018

 
Data Loading, Transformation and Feature Extraction:
  1. Load the raw Yelp Dataset Challenge files and transform them into pandas DataFrames. Clean the data and join the separate datasets.
  2. Convert user review text into a vector-space representation for natural language processing, using tokenization with stemming and lemmatization.
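The two steps above can be sketched without any third-party packages. This is only an illustration, not the project's code: the project used pandas, while this sketch uses plain dictionaries for the join so it runs anywhere. The sample records, the field names (`business_id`, `stars`, `text`), and the `crude_stem` helper are all assumptions made for the example; a real pipeline would use a proper stemmer or lemmatizer.

```python
import json
import re
from collections import Counter

# Hypothetical sample records, shaped like lines of JSON (one object per line).
BUSINESS_LINES = [
    '{"business_id": "b1", "name": "Noodle House", "stars": 4.5}',
    '{"business_id": "b2", "name": "Burger Stop", "stars": 2.5}',
]
REVIEW_LINES = [
    '{"business_id": "b1", "user_id": "u1", "text": "Loved the noodles, amazing broth!"}',
    '{"business_id": "b2", "user_id": "u2", "text": "Burgers were cold and the fries were soggy."}',
]


def load_json_lines(lines):
    """Parse one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def join_reviews_to_businesses(reviews, businesses):
    """Inner join on business_id (the role a DataFrame merge would play)."""
    by_id = {b["business_id"]: b for b in businesses}
    joined = []
    for review in reviews:
        business = by_id.get(review["business_id"])
        if business is not None:
            joined.append(
                {**review, "name": business["name"], "business_stars": business["stars"]}
            )
    return joined


def crude_stem(token):
    """Toy suffix stripper; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic runs only, then stem each token."""
    return [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def vectorize(documents):
    """Return (vocabulary, count vectors) for a list of raw texts."""
    token_lists = [tokenize(d) for d in documents]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[t] for t in vocabulary])
    return vocabulary, vectors


rows = join_reviews_to_businesses(
    load_json_lines(REVIEW_LINES), load_json_lines(BUSINESS_LINES)
)
vocabulary, vectors = vectorize([row["text"] for row in rows])
```

Each review becomes one row of term counts over a shared vocabulary, which is the kind of input the classifiers in the next section expect.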
Modeling and Data Product:
  1. Define the success of a business by its rating, then build Naive Bayes, logistic regression, and random forest classifiers that predict it from user tips and reviews.
  2. Use unsupervised learning to cluster users into groups. Identify the common preferences within each group by inspecting its cluster centroid.
  3. Build a restaurant recommender with collaborative filtering, based on users' past visits and ratings.
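As an illustration of the recommender in item 3, here is a minimal sketch of user-based collaborative filtering with cosine similarity. The write-up does not say which collaborative filtering variant the project used, so the choice of a user-based approach is an assumption, and the `RATINGS` data and function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {restaurant: stars}. Not taken from the Yelp data.
RATINGS = {
    "alice": {"noodle_house": 5, "burger_stop": 2, "taco_bar": 4},
    "bob": {"noodle_house": 4, "burger_stop": 1, "sushi_spot": 5},
    "carol": {"burger_stop": 5, "taco_bar": 2, "pizza_place": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank restaurants the user has not rated by similarity-weighted mean stars."""
    weighted_sum, weight_total = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # users with nothing in common contribute no signal
        for item, stars in other_ratings.items():
            if item in ratings[user]:
                continue  # only score restaurants the user has not rated
            weighted_sum[item] = weighted_sum.get(item, 0.0) + similarity * stars
            weight_total[item] = weight_total.get(item, 0.0) + similarity
    predicted = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


recommendations = recommend("alice", RATINGS)
```

Calling `recommend("alice", RATINGS)` returns `(restaurant, predicted_stars)` pairs for places Alice has not rated, ordered by predicted rating. On real data the ratings would be sparse and far larger, so a matrix factorization or library-based approach would likely be a better fit.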
Presentation slides for this project: pdf



 © Michelle Luo