Topic: Restaurants
Team: Clarence Cheung, Jin Ruan
Restaurant data for the ten most populous US cities is extracted from Yelp and Yellow Pages.
Two CSV files (tables) are generated for further processing in the next stage.
Using the CSV files, a candidate set of possibly matching restaurant pairs is obtained by blocking.
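As a rough illustration of the blocking step, here is a minimal sketch in plain Python. It is not the project's actual code: the column names (id, name, city), the sample rows, and the blocking key (same city plus same first word of the name) are all assumptions made for the example. The idea is only that pairs sharing a cheap key go into the candidate set, so the matcher never has to compare every Yelp row with every Yellow Pages row.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample tables; the real CSV columns are not described in the text.
YELP_CSV = """id,name,city
y1,Joe's Pizza,New York
y2,Golden Dragon,Chicago
y3,Taco Loco,Houston
"""

YELLOW_PAGES_CSV = """id,name,city
p1,Joes Pizza,New York
p2,Golden Dragon Restaurant,Chicago
p3,Blue Bayou,Houston
"""


def read_table(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def block_key(row):
    """Assumed blocking key: lowercased city plus first word of the cleaned name."""
    # Keep only letters, digits and spaces so "Joe's" and "Joes" agree.
    cleaned = "".join(ch for ch in row["name"].lower() if ch.isalnum() or ch == " ")
    tokens = cleaned.split()
    return (row["city"].strip().lower(), tokens[0] if tokens else "")


def block(table_a, table_b):
    """Return (id_a, id_b) pairs whose rows share the same blocking key."""
    index = defaultdict(list)
    for row in table_b:
        index[block_key(row)].append(row)
    pairs = []
    for row_a in table_a:
        for row_b in index.get(block_key(row_a), []):
            pairs.append((row_a["id"], row_b["id"]))
    return pairs


candidates = block(read_table(YELP_CSV), read_table(YELLOW_PAGES_CSV))
print(candidates)
```

A stricter key gives a smaller candidate set but risks dropping true matches; a looser one keeps more true matches at the cost of more pairs to label and match later.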
Yelp:
Yellow Pages:
Candidate set:
ipython code:
Misc:
Golden Set G:
ipython code:
Misc:
blocking:
matching:
User Report:
User Survey:
Special thanks to Prof. AnHai Doan and Pradap Konda.