Project: Data Integration

Topic: Restaurants

Team: Clarence Cheung, Jin Ruan


Description:

Restaurant data for the ten most populous US cities is extracted from Yelp and Yellow Pages.
Two CSV files (tables) are generated for further processing in the next stage.
Using the CSV files, a candidate set is obtained by blocking.
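
As an illustration of the blocking step described above, here is a minimal sketch of attribute-equivalence blocking in plain Python. It is not the project's actual ipython code: the column names (id, name, city), the sample rows, and the helper names read_table and block_on_attribute are all assumptions made for this example. The idea is to keep only cross-table pairs that agree on a cheap attribute (here, the city), so the later matching stage does not have to compare every Yelp row against every Yellow Pages row.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real tables and their columns are not shown here.
YELP_CSV = """id,name,city
y1,Joe's Pizza,New York
y2,Taco Loco,Houston
y3,Blue Door Cafe,Chicago
"""

YP_CSV = """id,name,city
p1,Joes Pizza,New York
p2,Blue Door Café,Chicago
p3,Harbor Grill,Chicago
"""


def read_table(text):
    """Parse CSV text into a list of dict rows keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def block_on_attribute(left, right, attr="city"):
    """Return (left_id, right_id) pairs whose `attr` values agree
    after trimming whitespace and lowercasing."""
    # Index the right table by the normalized blocking attribute.
    index = defaultdict(list)
    for row in right:
        index[row[attr].strip().lower()].append(row["id"])

    # Pair each left row only with right rows in the same block.
    pairs = []
    for row in left:
        for right_id in index.get(row[attr].strip().lower(), []):
            pairs.append((row["id"], right_id))
    return pairs


candidates = block_on_attribute(read_table(YELP_CSV), read_table(YP_CSV))
print(candidates)
```

With these sample rows, the candidate set holds 3 pairs instead of the 9 in the full cross product. A real blocker would likely combine several attributes (for example, city plus overlapping name tokens) to shrink the set further without dropping true matches.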


part 2: Sources

Yelp:

Yellow Pages:


part 3: Blocking

Blocking explanation can be found here.

Candidate set:

ipython code:

Misc:


part 4: Matching

The Matching explanation is available here.

Golden Set G:

ipython code:

Misc:


part 4.5: Final File Submission

blocking:

matching:


part 5: Bonus

Magellan User Report/Survey

User Report:

User Survey:


Acknowledgement

Special thanks to Prof. AnHai Doan and Pradap Konda.