Yelp Me Out

Finding Similar Restaurant Reviewers

All Reviews Are Not Created Equal

A few months ago, my mother (the source of all my best ideas) was complaining about her experience with online review sites such as Yelp and TripAdvisor. Whenever she signed on, she found herself reading a lot of complaints that she found rather trivial, like this Amazon review for the film The Wolf of Wall Street:


OK, so that's an extreme example for comedic effect, but it highlights an important point: many popular online review sites have tens to hundreds of millions of reviewers, so they are bound to reflect a diverse array of interests and values. That means users often find themselves reading rants and raves about aspects of local businesses that they find unimportant. My mom once encountered a one-star TripAdvisor review where the user complained about the china pattern. Sheesh!

It would be nice if we could filter reviews based on each author's similarity to ourselves in terms of tastes and values. But how can online review platforms identify users with similar interests and preferences? And how can review sites such as Yelp or TripAdvisor effectively leverage this information to enhance their products?


Finding Similar Restaurant Reviewers

My approach to finding similar users was straightforward: identify the topics (e.g. food, service, etc.) that users talked about the most in their reviews, and compute user similarity as a function of the topics most discussed. The devil, of course, lies in the details: how do you tag topics in review text without annotating thousands of reviews by hand, and how do you compute user similarity from those tagged topics once you have them?

Enter machine learning. I used a semi-supervised topic modeling approach to identify and segment users based on the topics of their reviews. This process had three main steps:

  1. Train text classifiers on human-labeled review data.
  2. Use trained models to predict topics in unlabeled review data and aggregate to the user level.
  3. Cluster users based on topics most often discussed.

Step (1): Train Text Classifiers on Human-Labeled Review Data

First, I obtained human-labeled restaurant review data from the International Workshop on Semantic Evaluation (SemEval). These data consisted of restaurant reviews broken out by sentence, with each sentence hand-labeled as zero or more of four possible topics: Food, Service, Ambience, and Value.

Using the SemEval data, I trained and tested Naive Bayes, random forest, and linear SVM text classifiers to identify the tagged topics in the corpus of SemEval review sentences. Since each sentence could be tagged with more than one topic, I trained one model per topic using binary labels (1 if the sentence belongs to the topic, 0 otherwise).
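Concretely, this one-model-per-topic setup might look like the following scikit-learn sketch. The `sentences` list and the `labels` dict of per-topic 0/1 flags are assumed placeholders for the parsed SemEval data, not names from the actual project:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

TOPICS = ["food", "service", "ambience", "value"]

# `sentences` is a list of review-sentence strings; `labels[topic]` is a
# parallel list of 0/1 flags. Both are assumed stand-ins for the SemEval data.
models = {}
for topic in TOPICS:
    X_train, X_test, y_train, y_test = train_test_split(
        sentences, labels[topic], test_size=0.2, random_state=42
    )
    # TF-IDF features feeding a linear SVM, one binary classifier per topic.
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(X_train, y_train)
    print(f"{topic}: held-out accuracy {clf.score(X_test, y_test):.2f}")
    models[topic] = clf
```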

Across the four topics, the linear SVM models attained the highest accuracies. After performing a cross-validated grid search to find the optimal SVM parameters, I got the following performance statistics:

Linear SVM Performance Metrics

Topic          Accuracy   Precision   Recall
Food & Drink      87%        89%        79%
Service           92%        86%        76%
Ambience          90%        57%        71%
Value             96%        73%        81%
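For reference, the cross-validated grid search might look something like this; the parameter grid shown is illustrative, since the write-up doesn't list the actual search space, and `X_train`/`y_train` are the training split for one topic from the sketch above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])

# Illustrative search space; the post doesn't say which parameters were tuned.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svm__C": [0.01, 0.1, 1.0, 10.0],
}

# 5-fold cross-validated grid search, run once per topic.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```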


Step (2): Use Trained Models to Predict Topics in Unlabeled Yelp Data and Aggregate to the User Level

Next, I obtained a sample of over 200,000 restaurant reviews from over 60,000 users from the Yelp academic dataset. After splitting each review into its component sentences, I used the models trained in the previous step to predict the topic(s) of each sentence. I then calculated the percentage of sentences on each topic per user, yielding user-level topic distributions like those in the data table below (a code sketch of this aggregation step follows the table):

Proportion of Sentences by User Discussing...

User         Food & Drink   Service   Ambience   Value   Total
Sandra S.        75%          15%        8%        2%     100%
Bob B.           72%          10%       10%       12%     104%
Tamara T.        50%          30%        5%       16%     101%
...              ...          ...       ...       ...      ...

Mock data for illustrative purposes only. Note that percentages can sum to over 100%, as a single sentence may talk about multiple topics.
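Here is a rough sketch of that prediction-and-aggregation step, assuming `reviews` is an iterable of (user_id, review_text) pairs from the Yelp data and `models` is the per-topic classifier dict from Step 1; sentence splitting is shown with NLTK's `sent_tokenize`, one reasonable choice:

```python
from collections import Counter, defaultdict
from nltk.tokenize import sent_tokenize  # needs nltk.download("punkt") once

topic_hits = defaultdict(Counter)  # user_id -> topic -> tagged-sentence count
total_sentences = Counter()        # user_id -> total sentence count

# `reviews` is assumed to be an iterable of (user_id, review_text) pairs.
for user_id, text in reviews:
    for sentence in sent_tokenize(text):
        total_sentences[user_id] += 1
        for topic, clf in models.items():
            if clf.predict([sentence])[0] == 1:
                topic_hits[user_id][topic] += 1

# Per-user proportion of sentences touching each topic; rows can sum to more
# than 100% because one sentence may be tagged with several topics.
user_profiles = {
    user: {t: topic_hits[user][t] / n for t in models}
    for user, n in total_sentences.items()
}
```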


Step (3): Cluster Users Based on Topics Most Discussed

Finally, I used k-means clustering to segment the most talkative users (i.e. users with at least three paragraphs of review text to analyze) into groups based on the topics they discussed most (a code sketch follows the segment list below). Cluster analysis yielded the following four user segments:

  • The Fanatical Foodies, who talk about food and drink in nearly 80% of their review sentences on average, at the expense of all other topics.
  • The Suckers for Service, who value good service over food to some extent, and over ambience and value to a much greater extent.
  • The Feed Me Fasters, who appreciate good food and good service over ambience and value.
  • The Date Nighters, who show a much more even distribution over the topics discussed.
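A minimal k-means sketch of this segmentation, reusing the `user_profiles` and `total_sentences` structures from the previous step; the `MIN_SENTENCES` cutoff is an assumed stand-in for the three-paragraph talkativeness threshold:

```python
import numpy as np
from sklearn.cluster import KMeans

TOPICS = ["food", "service", "ambience", "value"]
MIN_SENTENCES = 15  # assumed cutoff, roughly three paragraphs of text

# Stack each talkative user's topic proportions into a feature matrix.
talkative = [u for u, n in total_sentences.items() if n >= MIN_SENTENCES]
X = np.array([[user_profiles[u][t] for t in TOPICS] for u in talkative])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
# kmeans.labels_ assigns each user to a segment; kmeans.cluster_centers_
# holds each segment's average topic profile.
```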

The graphs below show the average proportion of review sentences spent Yelping about each topic by user segment.

Use Cases

So how could Yelp use these segments to enhance their product? I thought of three potential applications:

  1. Review Ranking: Show reviews from people within a given user's cluster first, so that users see the most relevant reviews more quickly.

  2. Customize Average Star Ratings: Tailor a business's average star rating to the user by upweighting reviews from people within the user's cluster (see the toy sketch after this list).

  3. Provide Market Analytics Services to Yelp Businesses: Provide reports to Yelp businesses to help them better understand the tastes and preferences of their customer base.
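To make idea (2) concrete, here is a toy version of a cluster-weighted star rating; the data shape and the 2x weight are assumptions for illustration only, not anything Yelp actually does:

```python
def personalized_rating(reviews, user_cluster, same_cluster_weight=2.0):
    """Weighted-average star rating that upweights same-cluster reviewers.

    `reviews` is assumed to be a list of (stars, reviewer_cluster) pairs,
    and the 2x weight is purely illustrative.
    """
    total = weight_sum = 0.0
    for stars, reviewer_cluster in reviews:
        w = same_cluster_weight if reviewer_cluster == user_cluster else 1.0
        total += w * stars
        weight_sum += w
    return total / weight_sum
```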

Of course, these ideas are just the beginning; there are many more potential applications!

Thanks for reading! If you are interested in learning more about this project, you can check out my GitHub repo here.