Master Thesis Project

Git repository for the master thesis code

This project integrates event data from the DBpedia and YAGO knowledge graphs using different blocking methods and matching rules. It builds on an older version of WInte.r, the Web Data Integration Framework, and on the Blocking Framework by Papadakis et al. In addition, the SILK Linked Data Integration Framework was used to learn different matching rules.

All blocking methods were tested on five subsets of the data, each with different or additional attributes, to analyze how important each attribute is.

The following consecutive blocking sub-tasks were tested:

Standard (token) blocking and Attribute Clustering for building the blocks
Block Filtering for cleaning the blocks
Meta-Blocking for cleaning comparisons
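The first two sub-tasks, block building with standard (token) blocking and block cleaning with Block Filtering, can be sketched in a few lines of Python. This is a minimal sketch, not the thesis code. It assumes clean-clean entity resolution between two sources, lower-cased alphanumeric tokens of all attribute values as blocking keys, and Block Filtering in the spirit of Papadakis et al., where each entity is kept only in a fraction `ratio` of its smallest blocks. The ceiling rounding, the use of the comparison count as block size, the function names and the toy entities are all hypothetical.

```python
import math
import re
from collections import defaultdict


def tokens(value):
    """Lower-cased alphanumeric tokens of one attribute value (assumed tokenizer)."""
    return set(re.findall(r"\w+", value.lower()))


def token_blocking(source_a, source_b):
    """Clean-clean token blocking: one block per token, split by source.

    Each source maps an entity id to {attribute: value}. Returns
    {token: (ids_from_a, ids_from_b)}, keeping only blocks that contain
    entities from both sources (other blocks yield no comparisons).
    """
    blocks = defaultdict(lambda: (set(), set()))
    for side, source in enumerate((source_a, source_b)):
        for entity_id, attributes in source.items():
            for value in attributes.values():
                for token in tokens(value):
                    blocks[token][side].add(entity_id)
    return {t: (a, b) for t, (a, b) in blocks.items() if a and b}


def comparisons(blocks):
    """Total number of cross-source comparisons in a block collection."""
    return sum(len(a) * len(b) for a, b in blocks.values())


def block_filtering(blocks, ratio):
    """Keep each entity only in the ceil(ratio * n) smallest of its n blocks.

    Assumption: block size = number of comparisons; ties broken by token.
    """
    size = {t: len(a) * len(b) for t, (a, b) in blocks.items()}
    memberships = defaultdict(list)  # (side, entity id) -> its block keys
    for t, (a, b) in blocks.items():
        for side, ids in enumerate((a, b)):
            for entity_id in ids:
                memberships[(side, entity_id)].append(t)
    filtered = defaultdict(lambda: (set(), set()))
    for (side, entity_id), keys in memberships.items():
        keys.sort(key=lambda key: (size[key], key))  # smallest blocks first
        for t in keys[: math.ceil(ratio * len(keys))]:
            filtered[t][side].add(entity_id)
    return {t: (a, b) for t, (a, b) in filtered.items() if a and b}


# Hypothetical toy data standing in for the DBpedia and YAGO event subsets.
dbpedia = {
    "d1": {"label": "Battle of Hastings", "date": "1066"},
    "d2": {"label": "Battle of Waterloo", "date": "1815"},
}
yago = {
    "y1": {"label": "Battle of Hastings", "date": "1066"},
    "y2": {"label": "Battle of Waterloo", "date": "1815"},
}

blocks = token_blocking(dbpedia, yago)
print(comparisons(blocks))                        # 12
print(comparisons(block_filtering(blocks, 0.5)))  # 4
```

With a ratio of 0.5, every entity keeps 2 of its 4 blocks, so the 8 comparisons contributed by the large blocks "battle" and "of" disappear while the matching pairs still share a block.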

Tested parameters:

Standard (token) blocking: parameter-free
Attribute Clustering: five different representation models
Block Filtering: 20 different ratios in [0.05, 1.0] (step size 0.05), applied to the best configuration of each block-building method
Meta-Blocking: five weighting schemes and four pruning algorithms, applied to the best Block Filtering configuration of each block-building method
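Meta-Blocking turns a block collection into a blocking graph whose edges connect entities that share at least one block, weights every edge, and prunes low-weight edges. The README does not name the five weighting schemes and four pruning algorithms that were tested. As an illustration only, the sketch below implements two weighting schemes from Papadakis et al.'s meta-blocking work, CBS (number of common blocks) and JS (Jaccard similarity of the two entities' block sets), together with Weighted Edge Pruning (WEP), which keeps edges whose weight is at least the mean edge weight. Function names and the toy blocks are hypothetical.

```python
from collections import Counter
from itertools import product


def common_blocks(blocks):
    """Edges of the blocking graph: cross-source pair -> number of shared blocks."""
    common = Counter()
    for ids_a, ids_b in blocks.values():
        for pair in product(ids_a, ids_b):
            common[pair] += 1
    return common


def weight_edges(blocks, scheme="CBS"):
    """Weight every edge with CBS or JS (only two of the possible schemes)."""
    common = common_blocks(blocks)
    if scheme == "CBS":  # Common Blocks Scheme: raw count of shared blocks
        return {pair: float(c) for pair, c in common.items()}
    if scheme == "JS":  # Jaccard Scheme: shared blocks / union of both block sets
        count_a, count_b = Counter(), Counter()
        for ids_a, ids_b in blocks.values():
            count_a.update(ids_a)
            count_b.update(ids_b)
        return {
            (a, b): c / (count_a[a] + count_b[b] - c)
            for (a, b), c in common.items()
        }
    raise ValueError(f"unsupported weighting scheme: {scheme}")


def weighted_edge_pruning(weights):
    """WEP: keep the edges whose weight is at least the mean edge weight."""
    if not weights:
        return {}
    threshold = sum(weights.values()) / len(weights)
    return {pair: w for pair, w in weights.items() if w >= threshold}


# Hypothetical toy blocks: token -> (DBpedia ids, YAGO ids).
blocks = {
    "battle": ({"d1", "d2"}, {"y1", "y2"}),
    "of": ({"d1", "d2"}, {"y1", "y2"}),
    "hastings": ({"d1"}, {"y1"}),
    "waterloo": ({"d2"}, {"y2"}),
}

for scheme in ("CBS", "JS"):
    kept = weighted_edge_pruning(weight_edges(blocks, scheme))
    print(scheme, sorted(kept))  # keeps ('d1', 'y1') and ('d2', 'y2')
```

Here both schemes cut four candidate comparisons down to two; on real data the choice of scheme and pruning algorithm decides how much Pairs Completeness is traded for Reduction Ratio.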

Main result: Blocking only becomes efficient when block- or comparison-refinement methods are applied. For the analyzed data, using all attributes for block building and removing each entity from the largest 50% of its blocks works best in terms of Pairs Completeness and Reduction Ratio. A simple matching rule that compares the entities' stripped URIs using Levenshtein similarity with a high threshold of 0.96 outperforms all learned matching rules in terms of F-measure.
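The evaluation measures and the URI matching rule named in the main result can be made concrete with a small sketch. Pairs Completeness is the share of true matches that remain candidate pairs after blocking, Reduction Ratio is the share of comparisons saved relative to a baseline, and the F-measure is the harmonic mean of precision and recall of the predicted matches. The README does not say how URIs are stripped, how the Levenshtein similarity is normalized, or which baseline the Reduction Ratio uses, so the sketch assumes the local name after the last "/", a similarity of 1 - distance / max(length), and a caller-supplied baseline such as all DBpedia x YAGO pairs. All function names and the example URIs are illustrative.

```python
def strip_uri(uri):
    """Keep only the local name after the last '/' (assumed stripping rule)."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def levenshtein(s, t):
    """Classic dynamic-programming edit distance, computed row by row."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (cs != ct),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(s, t):
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))


def uri_rule(uri_a, uri_b, threshold=0.96):
    """Match if the stripped URIs are at least `threshold` similar."""
    return levenshtein_similarity(strip_uri(uri_a), strip_uri(uri_b)) >= threshold


def pairs_completeness(candidate_pairs, gold):
    return len(set(candidate_pairs) & gold) / len(gold)


def reduction_ratio(comparisons, baseline_comparisons):
    return 1.0 - comparisons / baseline_comparisons


def f_measure(predicted, gold):
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative URIs and gold standard (not taken from the thesis data).
DBR = "http://dbpedia.org/resource/"
YAGO = "http://yago-knowledge.org/resource/"
gold = {
    (DBR + "Battle_of_Hastings", YAGO + "Battle_of_Hastings"),
    (DBR + "2012_Summer_Olympics", YAGO + "2012_Summer_Olympics"),
}
candidates = gold | {(DBR + "2012_Summer_Olympics", YAGO + "2016_Summer_Olympics")}
predicted = {pair for pair in candidates if uri_rule(*pair)}

print(pairs_completeness(candidates, gold))     # 1.0
print(reduction_ratio(len(candidates), 2 * 3))  # 0.5 (2 DBpedia x 3 YAGO entities)
print(f_measure(predicted, gold))               # 1.0
```

With the 0.96 threshold, "2012_Summer_Olympics" and "2016_Summer_Olympics" (similarity 0.95) are rejected, which hints at why a high threshold can help precision when URIs differ only in a year.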

The full thesis is available as thesis.pdf for further details.

Repository contents:

DBpedia
YAGO
createDatasets
webApp
.DS_Store
.gitignore
README.md
convertDBpedia_Event.py
convertDBpedia_Event_with_labels.py
convertGeoNames.py
thesis.pdf

GitHub repository: dringler/WebDataIntegrationOfEventsInSemanticKGs
