Michelle Luo

Yelp Dataset Challenge

10/1/2018

 
Data Loading, Transformation and Feature Extraction:
  1. Load and transform the raw Yelp Dataset Challenge data into Pandas DataFrames; clean the data and join the separate datasets.
  2. Convert user review text into a vector space representation for natural language processing, using tokenization with stemming and lemmatization.
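A minimal sketch of the review-to-vector step in plain Python. The regex tokenizer and the tiny suffix-stripping `stem` function are illustrative assumptions standing in for a real stemmer (for example NLTK's `PorterStemmer`). Lemmatization, which maps words to dictionary forms (for example with NLTK's `WordNetLemmatizer`), is not shown. All function names are hypothetical.

```python
import re
from collections import Counter

# Toy suffix list (assumption): a stand-in for a real stemmer such as NLTK's PorterStemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_vocabulary(docs):
    """Map every distinct stem across all documents to a column index (sorted order)."""
    stems = sorted({stem(t) for doc in docs for t in tokenize(doc)})
    return {s: i for i, s in enumerate(stems)}


def vectorize(doc, vocab):
    """Return the term-count vector of one document over the vocabulary."""
    vector = [0] * len(vocab)
    for s, count in Counter(stem(t) for t in tokenize(doc)).items():
        if s in vocab:  # stems unseen at vocabulary-building time are ignored
            vector[vocab[s]] += count
    return vector
```

For the reviews `"Loved the tacos"` and `"Tacos tasted great, loved it"`, the vocabulary is `great, it, lov, taco, tast, the`, and the first review maps to `[0, 0, 1, 1, 0, 1]`. Truncated stems such as `lov` show why a proper stemmer or lemmatizer is worth the extra dependency.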
Modeling and Data Product:
  1. Define the success of a business by its rating, and build Naive Bayes, logistic regression and random forest classifiers to predict success from user tips and reviews.
  2. Use unsupervised learning to cluster users into groups, then identify the common preferences within each group by inspecting its cluster centroid.
  3. Build a restaurant recommender from users' past visits and ratings using collaborative filtering.
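A sketch of item-based collaborative filtering, one common variant (the page does not say which variant was used). Restaurants are compared by the cosine similarity of their rating vectors, and each unvisited restaurant is scored by the similarity-weighted average of the user's own ratings. The `RATINGS` dictionary and all names are hypothetical toy data, not Yelp data.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {restaurant: stars}.
RATINGS = {
    "ann": {"taqueria": 5, "sushi_bar": 4, "diner": 1},
    "bob": {"taqueria": 4, "sushi_bar": 5, "pizzeria": 2},
    "cat": {"diner": 5, "pizzeria": 4, "sushi_bar": 1},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, rated in ratings.items():
        for item, stars in rated.items():
            items.setdefault(item, {})[user] = stars
    return items


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (dicts keyed by user)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=3):
    """Rank restaurants the user has not rated by predicted rating."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in items:
        if candidate in seen:
            continue
        # Pair each of the user's ratings with that item's similarity to the candidate.
        sims = [(cosine(items[candidate], items[item]), stars) for item, stars in seen.items()]
        total = sum(s for s, _ in sims)
        if total > 0:
            scores[candidate] = sum(s * stars for s, stars in sims) / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

With this toy data, `recommend("ann", RATINGS)` suggests `pizzeria`, the only restaurant Ann has not rated, with a predicted rating of about 2.57.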
Presentation slides for this project (PDF).

Online store monetization experiment design

4/11/2018

 
  1. Identified a potential monetization opportunity to improve buyer conversion. Through exploratory analysis of longitudinal user data in Python, discovered a strong correlation between user conversion and form of payment.
  2. Built an interactive, scalable Python dashboard to measure the impact of an A/B test in the store purchase flow, plotting jackknife confidence intervals and calculating statistical significance.
  3. Recommended running an experiment to incentivize users to add credit cards as a payment method or to purchase gift cards; presented this recommendation and demoed the Python dashboard to an audience of 30 people, including 6 capstone committee members.
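A sketch of a delete-one-bucket jackknife confidence interval for the difference in conversion rates between a treatment and a control arm. It assumes users are split into `k` buckets by position (reasonable only if assignment order is random) and uses a normal approximation for the interval and the two-sided p-value. The function name, bucket count and data layout are assumptions, not the dashboard's actual code.

```python
from math import sqrt
from statistics import NormalDist, fmean


def conversion_rate(outcomes):
    """Share of users who converted (outcomes are 0/1)."""
    return sum(outcomes) / len(outcomes)


def jackknife_diff(control, treatment, k=10, alpha=0.05):
    """Return (estimate, (ci_low, ci_high), p_value) for treatment minus control."""
    estimate = conversion_rate(treatment) - conversion_rate(control)
    replicates = []
    for b in range(k):
        # Leave bucket b out of both arms and recompute the difference.
        c = [y for i, y in enumerate(control) if i % k != b]
        t = [y for i, y in enumerate(treatment) if i % k != b]
        replicates.append(conversion_rate(t) - conversion_rate(c))
    mean_rep = fmean(replicates)
    # Jackknife standard error: sqrt((k - 1) / k * sum of squared deviations).
    se = sqrt((k - 1) / k * sum((r - mean_rep) ** 2 for r in replicates))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if se > 0:
        p_value = 2 * (1 - NormalDist().cdf(abs(estimate) / se))
    else:
        p_value = 1.0 if estimate == 0 else 0.0
    return estimate, (estimate - z * se, estimate + z * se), p_value
```

Bucketing keeps the number of replicates small (here `k = 10`) even when each arm has millions of users, which is what makes the jackknife practical in a dashboard.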
More Details with Slides

Credit Card Fraud Detection

2/9/2018


 
  1. Identified fraudulent transactions with over 98% accuracy on highly imbalanced transaction-level data (0.17% fraudulent), using the Synthetic Minority Over-sampling Technique (SMOTE).
  2. Implemented machine learning approaches, including logistic regression and random forest.
  3. Made recommendations for designing an automatic fraud detection system based on the models built.
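SMOTE creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified, dependency-free version that picks base samples at random. In practice the `SMOTE` class in the imbalanced-learn library (via `fit_resample`) is the usual choice, and oversampling is applied to the training split only so synthetic points do not leak into evaluation. The function name and parameters are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic SMOTE-style points from minority feature vectors.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (squared Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Precompute each sample's k nearest neighbours among the other minority samples.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([xi + gap * (xj - xi) for xi, xj in zip(minority[i], minority[j])])
    return synthetic
```

Interpolating, rather than duplicating rare rows, spreads the minority class over the region between known fraud cases instead of stacking copies on the same points.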

Zillow Prize: Zillow’s Home Value Prediction

1/11/2018

 
  1. Extracted 32 features from raw housing data of different types (categorical, numerical and time series) and imputed missing values using the Multivariate Imputation by Chained Equations (MICE) algorithm.
  2. Performed feature selection through exploratory analysis.
  3. Fitted a linear regression model with regularization to control for multicollinearity, and also built decision tree, random forest and boosted decision tree models to predict housing prices.
  4. Achieved an RMSE of 0.008416363 on the test dataset with the boosted decision tree model.
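A simplified, single-imputation sketch of the chained-equations idea: missing values start at the column mean, then each incomplete column is repeatedly regressed (ordinary least squares) on all the other columns and its missing entries are replaced by the predictions. Full MICE additionally draws imputations with random noise and produces several completed datasets; scikit-learn's `IterativeImputer` is a library implementation inspired by it. All function names here are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def chained_impute(data, n_iter=10):
    """Return a copy of data (list of rows, None = missing) with values imputed."""
    n_cols = len(data[0])
    missing = [(i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v is None]
    filled = [list(row) for row in data]
    # Step 1: initialise every missing entry with its column's observed mean.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    # Step 2: cycle through incomplete columns, regressing each on all the others.
    for _ in range(n_iter):
        for j in sorted({col for _, col in missing}):
            others = [c for c in range(n_cols) if c != j]
            train = [i for i, row in enumerate(data) if row[j] is not None]
            beta = ols([[filled[i][c] for c in others] for i in train],
                       [filled[i][j] for i in train])
            for i, col in missing:
                if col == j:
                    filled[i][j] = beta[0] + sum(b * filled[i][c] for b, c in zip(beta[1:], others))
    return filled
```

Because the models are refit on the current imputed values, columns with missing entries inform each other over the iterations, which is the "chained" part of the name.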

Information Security of Digital Image

8/26/2013

 
Shanghai Innovative Project
--Grant from the National Natural Science Fund: Streaming Media
1. Implemented image pre-processing such as denoising and rotation
2. Implemented text conversion and embedded the data into the least significant bit (LSB) plane of the cover image
3. Developed a user ID authentication interface to meet the requirements of embedded information security and image authentication protection
4. Successfully completed the Excellent Undergraduate Training Program
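A sketch of LSB embedding on a cover image represented as a flat list of 8-bit grayscale pixel values. The text is UTF-8 encoded, converted to bits, and written into the least significant bit of successive pixels after a 32-bit length header. The header layout and function names are assumptions for illustration, and no encryption or authentication is included.

```python
HEADER_BITS = 32  # hypothetical header: message length in bytes, stored before the payload


def to_bits(values, width=8):
    """Expand integers into bits, most significant bit first."""
    return [(value >> shift) & 1 for value in values for shift in range(width - 1, -1, -1)]


def from_bits(bits):
    """Pack a bit list (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed(pixels, text):
    """Return a copy of pixels with the length header and message in their LSBs."""
    payload = text.encode("utf-8")
    bits = to_bits([len(payload)], width=HEADER_BITS) + to_bits(payload)
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract(pixels):
    """Read the length header, then decode that many bytes from the LSBs."""
    length = 0
    for p in pixels[:HEADER_BITS]:
        length = (length << 1) | (p & 1)
    bits = [p & 1 for p in pixels[HEADER_BITS:HEADER_BITS + 8 * length]]
    return from_bits(bits).decode("utf-8")
```

Each pixel changes by at most 1, which is why LSB changes are hard to see; they are also easily destroyed by lossy compression such as JPEG.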

    Design Projects

    Always learn from each other and get great ideas!

 © Michelle Luo