Data Loading, Transformation and Feature Extraction:
Load the raw Yelp Dataset Challenge files into pandas DataFrames, clean the data, and join the separate datasets.
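The following is a minimal sketch of the load, clean, and join step. It assumes the Yelp files are newline-delimited JSON, which is why it uses `pd.read_json(..., lines=True)`. The inline samples, the field names shown, and the cleaning rules (keep only businesses tagged "Restaurants", drop duplicate and empty reviews) are illustrative assumptions, not the project's actual code.

```python
import io

import pandas as pd

# Tiny inline stand-ins for the business and review files (hypothetical sample rows).
BUSINESS_JSON = """\
{"business_id": "b1", "name": "Taco Stand", "stars": 4.5, "categories": "Mexican, Restaurants"}
{"business_id": "b2", "name": "Sushi Bar", "stars": 3.0, "categories": "Japanese, Restaurants"}
{"business_id": "b3", "name": "Hardware Depot", "stars": 4.0, "categories": null}
"""
REVIEW_JSON = """\
{"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": 5, "text": "Great tacos!"}
{"review_id": "r2", "user_id": "u2", "business_id": "b2", "stars": 2, "text": "Fish was not fresh."}
{"review_id": "r2", "user_id": "u2", "business_id": "b2", "stars": 2, "text": "Fish was not fresh."}
{"review_id": "r3", "user_id": "u1", "business_id": "b3", "stars": 4, "text": ""}
"""


def load_json_lines(raw: str) -> pd.DataFrame:
    """Parse newline-delimited JSON (one record per line) into a DataFrame."""
    return pd.read_json(io.StringIO(raw), lines=True)


business = load_json_lines(BUSINESS_JSON)
review = load_json_lines(REVIEW_JSON)

# Clean: keep restaurants only, then drop duplicate reviews and reviews with no text.
restaurants = business[business["categories"].fillna("").str.contains("Restaurants")]
review = review.drop_duplicates(subset="review_id")
review = review[review["text"].fillna("").str.strip() != ""]

# Join: attach business attributes to each review; suffixes keep both "stars" columns.
merged = review.merge(
    restaurants, on="business_id", how="inner", suffixes=("_review", "_business")
)
print(merged[["review_id", "name", "stars_review", "stars_business"]])
```

An inner join is used here so that reviews of non-restaurant businesses fall out automatically. A left join would keep them with missing business fields instead.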
Convert user review text into a vector-space representation for natural language processing, using tokenization with stemming and lemmatization.
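The following sketch illustrates how tokenization, lemmatization, and stemming feed a term-count vector space. It uses only the standard library. The lemma table and suffix list are hypothetical toy stand-ins; a real pipeline would more likely use NLTK's `PorterStemmer` and `WordNetLemmatizer`. The sketch applies dictionary lemmatization first and falls back to suffix stripping, which is one possible ordering among several.

```python
import re
from collections import Counter

# Hypothetical toy resources: a tiny lemma dictionary and a crude suffix list
# standing in for a real lemmatizer (e.g. WordNet) and stemmer (e.g. Porter).
LEMMAS = {"was": "be", "were": "be", "is": "be", "better": "good", "tacos": "taco"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(token: str) -> str:
    """Lemmatize by dictionary lookup; otherwise strip the first matching suffix."""
    if token in LEMMAS:
        return LEMMAS[token]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def vectorize(reviews: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a shared sorted vocabulary and one term-count vector per review."""
    counts = [Counter(normalize(t) for t in tokenize(r)) for r in reviews]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


vocab, vectors = vectorize(["Loved the tacos", "The tacos were better than the burritos"])
print(vocab)
print(vectors)
```

Note that stems need not be real words: the crude stripper maps "loved" to "lov". Lemmatization is what maps irregular forms such as "were" and "better" to their dictionary forms.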
Modeling and Data Product:
Define a business's success by its star rating, and build Naive Bayes, logistic regression, and random forest classifiers to predict success from user tips and reviews.
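The following is a minimal scikit-learn sketch of this modeling step. It makes several assumptions:

- A business counts as "successful" at 4 stars or above (a hypothetical threshold).
- The inputs are TF-IDF features of the business's concatenated tip and review text.
- The six training rows are made up for illustration.

A real evaluation would score the models on held-out data, for example via `sklearn.model_selection.train_test_split`, rather than in-sample.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: concatenated tips/reviews per business and its average stars.
texts = [
    "amazing food friendly staff will come back",
    "great service delicious dishes highly recommend",
    "fantastic atmosphere great food",
    "rude staff cold food never again",
    "terrible service long wait dirty tables",
    "bland food overpriced and slow",
]
stars = [4.5, 5.0, 4.0, 2.0, 1.5, 2.5]

# Assumed success threshold: 4 stars or above is labeled 1, otherwise 0.
labels = [int(s >= 4.0) for s in stars]

models = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

predictions = {}
for name, clf in models.items():
    # Each pipeline converts raw text to TF-IDF features, then fits the classifier.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    predictions[name] = pipe.predict(
        ["great food friendly service", "cold food rude staff"]
    ).tolist()

print(predictions)
```

Wrapping each classifier in a pipeline has a practical benefit: the vectorizer is refit on exactly the same training text for each model, which keeps the three models comparable.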
Use unsupervised learning to cluster users into groups, then identify the preferences shared within each group by inspecting the cluster centroids.
Build a restaurant recommender based on each user's past visits and ratings, using collaborative filtering.
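The following NumPy sketch uses item-based collaborative filtering, which is one common variant; user-based neighborhoods and matrix factorization are alternatives. It assumes a small hypothetical user-by-restaurant rating matrix in which 0 means "not visited". For cosine similarity, unrated entries are treated as zeros, which is a simplification. A user's predicted score for an unvisited restaurant is the similarity-weighted average of that user's own ratings.

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are restaurants,
# values are stars (1-5), and 0 means the user has not visited/rated it.
restaurants = ["Taco Stand", "Sushi Bar", "Pizza Place", "Ramen Shop"]
ratings = np.array([
    [5, 0, 4, 0],
    [4, 1, 5, 2],
    [1, 5, 2, 4],
    [0, 4, 1, 5],
], dtype=float)


def item_similarity(r: np.ndarray) -> np.ndarray:
    """Cosine similarity between restaurant columns (unrated entries count as 0)."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for never-rated restaurants
    normalized = r / norms
    return normalized.T @ normalized


def recommend(user: int, r: np.ndarray, k: int = 1) -> list[int]:
    """Return up to k unvisited restaurants ranked by predicted rating."""
    sim = item_similarity(r)
    rated = r[user] > 0
    scores = {}
    for item in np.flatnonzero(~rated):
        weights = sim[item, rated]
        if weights.sum() == 0:
            continue  # no similarity to anything the user rated; cannot predict
        # Similarity-weighted average of the user's own ratings.
        scores[int(item)] = float(weights @ r[user, rated] / weights.sum())
    return sorted(scores, key=scores.get, reverse=True)[:k]


print([restaurants[i] for i in recommend(0, ratings, k=2)])
```

Because all ratings here are non-negative, all similarities are non-negative too, so the weighted average stays within the user's own rating range.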