International Journal of Engineering and Techniques - Volume 4, Issue 3, May - June 2018

RESEARCH ARTICLE OPEN ACCESS

Data mining of restaurant review using WEKA

Gayathri T.
Assistant Professor, Department of Computer Science, New Horizon College of Engineering, Bangalore

Abstract:
Many customers choose a restaurant based on food critics and on reviews posted on websites such as Zomato.com. Restaurants thrive in the initial stages after opening, but demand deteriorates once the initial hype fades. Further business for these restaurants is largely based on their reviews. What can a restaurant do to improve its ratings? Food taste is an obvious trigger, but other factors also improve the ratings of a restaurant, such as the cuisines offered, the option of home delivery and the availability of table booking. This paper aims at creating a prediction model for the reviews and at analyzing the trigger events that would improve the ratings.
Keywords- Restaurant review, Zomato, Multilayer Perceptron, Naïve Bayes, J48, Data mining

I. INTRODUCTION
Machine learning sheds light on various domains unexplored by human analysis. It provides viewpoints that are not visible in general. Prediction and classification models that took scientists decades to create are now achieved in days. Data available in huge amounts can be studied through machine learning algorithms to arrive at meaningful information.
Data mining is used in predicting disease diagnosis, weather and customer expectations, and in learning from data to create automation, purchase patterns, etc. There are four steps in the process of data mining: data collection, data pre-processing, machine learning and data visualisation.
Data collection is a predominant and difficult step in data mining. The data collected should be relevant and should cover the whole space of the domain. Concentrating on one sample space would bias the prediction or classification result. After data collection comes data pre-processing. Not all of the collected information is relevant to machine learning. For example, age and date of birth can be two attributes in employee data, yet each is dependent on the other. The presence of such redundant information would lead to decreased accuracy. The next step is machine learning, the process in which the system learns the data. From the learned knowledge it predicts, associates, classifies and clusters the data. For this purpose, various algorithms are used. The information gained from machine learning is better understood through visualization.
The restaurant business is a domain traversed by small as well as big players. Data mining provides a way for both to improve their business with minimum effort. A restaurant business relies on the taste of food, the variety of cuisine provided, ambience, availability of home delivery, online booking, price, etc. When any of these factors is improved or included, it is possible to increase customer attention and thus increase productivity in business.
Zomato is a website and mobile application that provides information about restaurants and reviews of restaurants, and allows online ordering from them. The data was extracted using the Zomato API [1] by Shruti Mehta. This dataset was downloaded from Kaggle, an online repository for datasets [2].

II. LITERATURE SURVEY
A few research papers have been published on restaurant reviews and hotel reviews. A survey of such papers follows. Paper [3] reviews Thai restaurants around the world. It attempts to classify the restaurants

ISSN: 2395-1303 http://www.ijetjournal.org Page 642

based on the reviews. The model proposed in that paper extracts reviews from a social networking site using text processing; an artificial neural network is then used to classify the dataset as positive or negative. The mRMR feature selection technique is used for selecting the features of the data set.
Paper [4] analyses fast-food franchise data to help the franchise reap benefits. Time series data from the stores as well as from the corporate office is used with an ARIMA model to understand the data. Outlier detection is used to identify sales opportunities and risks.
In [5], the Yelp restaurant review dataset is used to model a system to improve restaurants. Here the Latent Dirichlet Allocation (LDA) algorithm is used to find subtopics in the reviews. The ratings for the hidden topics make it possible to understand the reason for a rating.
In paper [6], the reviews are scraped from www.tripadvisor.com using a web crawler. The reviews are divided into positive and negative polarity using SentiWordNet, and various machine learning algorithms are used to check their accuracy.
In most of these research papers, reviews are extracted from one website and a classification model is created. This paper is an attempt to create a trigger model to improve restaurants based on the Zomato dataset.

III. DATA COLLECTION
The Zomato data set is downloaded from the Kaggle data repository. The dataset contains 22 attributes and 9552 instances. The attributes present in the dataset are: Restaurant Id, Restaurant Name, Country Code, City, Address, Locality, Locality Verbose, Longitude, Latitude, Cuisines, Average Cost for two, Currency, Has Table booking, Has Online delivery, Is delivering, Switch to order menu, Price range, Aggregate Rating, Rating color, Rating text, Votes.

IV. DATA PRE-PROCESSING
Data pre-processing can be data cleaning or data transformation. The dataset in Kaggle can be used for classification or association mining. When used in classification or prediction, it is necessary to identify the features that would enable higher accuracy in classification. [7] suggests the use of data pre-processing to improve machine learning. Classification and clustering accuracy is predominantly dependent on the proper representation of data. Correlation-based feature selection is used to reduce the number of features.

V. MACHINE LEARNING
Machine learning literally means making the machine learn; the machine learns by processing the data with various machine learning algorithms [7]. No single algorithm provides high accuracy on every problem; this is known as the No Free Lunch theorem [9]. However, deep learning provides better accuracy in most cases.
For any application it is important to apply a few machine learning algorithms to find the best-suited model. Machine learning algorithms can be grouped under Bayes, rule-based, neural network and decision tree.

A. Naïve Bayes
Naïve Bayes is well suited to problems in which the features are independent of one another [10]. Each instance is considered as a vector. The posterior probability of a class given a predictor is found with
P(h|d) = (P(d|h) * P(h)) / P(d)
P(h|d) - posterior probability of class h given predictor d
P(d|h) - likelihood, i.e. the probability of predictor d given class h
P(h) - prior probability of the class
P(d) - prior probability of the predictor

B. Decision Tree
A decision tree is arrived at by finding the optimum way to arrange the various nodes. There are two ways to identify the best partition of the dataset at a node: information gain or gain ratio. The decision tree model that uses information gain is ID3, and the one that uses gain ratio is J48 [11], WEKA's implementation of C4.5.

C. Multilayer Perceptron
A multilayer perceptron contains a large number of nodes, called neurons, joined together so that they form an input layer, a hidden layer and an output layer. The

instances are supplied through the input layer, weights and biases are applied at the hidden layer, and the output layer supplies the class [12].

VI. EXPERIMENTATION
The dataset acquired from Kaggle first undergoes data pre-processing. From the information about the dataset it was found that some attributes were redundant: restaurant id and restaurant name represented the same information; locality, locality verbose and latitude/longitude represented the same information; rating color and rating text represented the same information. To avoid redundancy of attributes, only one representation of each was kept. Average cost for two is an attribute whose value is not standard; it depends on the currency attribute. Using the currency information, the average price is converted into a standard US dollar format. Correlation-based feature selection with the Ranker algorithm is done to reduce the number of attributes in the dataset.

Machine learning algorithms such as J48, Naïve Bayes and Multilayer Perceptron tend to give good results on most datasets, so these three classifiers are used to learn the data. WEKA is a free data mining tool published by the University of Waikato. The dataset is pre-processed, feature-selected, trained and tested using WEKA. The algorithm found to give the best result is J48.

TABLE II
ACCURACY RESULTS FOR CLASSIFICATION MODEL FOR ZOMATO DATASET

Algorithm               Accuracy
J48                     97.2%
Multilayer Perceptron   78.16%
Naïve Bayes             82.2%

To find the trigger to improve ratings, a sample record with a poor rating is taken and modified to reduce the price range to one. This sample record is tested on the J48 Zomato model. It was found that there was no change in rating, whereas when the country code was changed there was a change in rating.

VII. CONCLUSION AND FUTURE WORK
The Zomato dataset is used to create a classification model for restaurant rating. It was found that the Multilayer Perceptron works well with this dataset. In this paper an attempt is made to predict the trigger which would further enhance the rating. This project can be extended to create a tool to evaluate the triggers to improve the ratings.

ACKNOWLEDGMENT
I thank my college, New Horizon College of Engineering, for providing support and tools for this research. I thank the Head of Department, Dr. B. Rajalakshmi, for her support and guidance.

REFERENCES
1. https://developers.zomato.com/api#headline1
2. https://www.kaggle.com/shrutimehta/zomato-restaurants-data
3. Claypo, Niphat, and Saichon Jaiyen. "Opinion mining for Thai restaurant reviews using neural networks and mRMR feature selection." Computer Science and Engineering Conference (ICSEC), 2014 International. IEEE, 2014.
4. Liu, Lon-Mu, et al. "Data mining on time series: an illustration using fast-food restaurant franchise data." Computational Statistics & Data Analysis 37.4 (2001): 455-476.
5. Huang, James, Stephanie Rogers, and Eunkwang Joo. "Improving restaurants by extracting subtopics from yelp reviews." iConference 2014 (Social Media Expo) (2014).
6. Raut, V. B., and D. D. Londhe. "Opinion Mining and Summarization of Hotel Reviews." 2014 International Conference on Computational Intelligence and Communication Networks, Bhopal, 2014, pp. 556-559. doi: 10.1109/CICN.2014.126
7. Deshmukh, D. H., T. Ghorpade, and P. Padiya. "Improving classification using preprocessing and machine learning algorithms on NSL-KDD dataset." Communication, Information & Computing Technology (ICCICT), 2015 International Conference on. IEEE, 2015, pp. 1-6.
8. Kotsiantis, Sotiris B., I. Zaharakis, and P. Pintelas. "Supervised machine learning: A review of classification techniques." Emerging artificial intelligence applications in computer engineering 160 (2007): 3-24.
9. Wolpert, David H., and William G. Macready. "No free lunch theorems for optimization." IEEE transactions on evolutionary computation 1.1 (1997): 67-82.
10. Lewis, David D. "Naive (Bayes) at forty: The independence assumption in information retrieval." European conference on machine learning. Springer, Berlin, Heidelberg, 1998.
11. Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
12. Goodman, Rodney M., and Zheng Zeng. "A learning algorithm for multi-layer perceptrons with hard-limiting threshold units." Neural Networks for Signal Processing [1994] IV. Proceedings of the 1994 IEEE Workshop. IEEE, 1994.
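
As a worked illustration of the Bayes rule stated in Section V.A, the following minimal Python sketch estimates P(h|d) from counts for a single categorical predictor. The toy records, the attribute meaning (online delivery yes/no) and the rating labels are hypothetical values invented for this example; they are not taken from the Zomato dataset, and the sketch is not the WEKA implementation used in the experiments.

```python
from collections import Counter

# Hypothetical toy records: (has_online_delivery, rating_text).
# These values are invented purely for illustration.
records = [
    ("Yes", "Good"), ("Yes", "Good"), ("Yes", "Poor"),
    ("No", "Good"), ("No", "Poor"), ("No", "Poor"),
]


def posterior(records, d, h):
    """Estimate P(h|d) = P(d|h) * P(h) / P(d) from simple counts."""
    n = len(records)
    class_counts = Counter(label for _, label in records)
    predictor_counts = Counter(value for value, _ in records)
    joint = sum(1 for value, label in records if value == d and label == h)

    p_h = class_counts[h] / n              # prior probability of the class
    p_d = predictor_counts[d] / n          # prior probability of the predictor
    p_d_given_h = joint / class_counts[h]  # likelihood of the predictor given the class
    return p_d_given_h * p_h / p_d


p = posterior(records, d="Yes", h="Good")
print(f"P(Good | online delivery = Yes) = {p:.3f}")
```

With several predictors, a Naïve Bayes classifier would multiply one such likelihood per attribute, which is where the independence assumption enters.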

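
The two split criteria named in Section V.B, information gain and gain ratio, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the price_range and rating values below are hypothetical, and a full C4.5/J48 implementation adds further steps (handling of numeric attributes, pruning) that are not shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by partitioning `labels` on `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the split information of the attribute."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(values, labels) / split_info


# Hypothetical toy data, invented for illustration only.
price_range = [1, 1, 2, 2, 3, 3]
rating = ["Poor", "Poor", "Good", "Poor", "Good", "Good"]

ig = information_gain(price_range, rating)
gr = gain_ratio(price_range, rating)
print(f"information gain = {ig:.4f}, gain ratio = {gr:.4f}")
```

Gain ratio divides by the split information so that attributes with many distinct values, such as an identifier column, are not favoured merely because they fragment the data.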
