Type: Interesting and well-defined problem with broad applicability in data science, knowledge representation and knowledge base exploration
Background info: Knowledge bases such as Freebase, WikiData and DBpedia contain huge amounts of data covering a wide range of domains, from book characters to geographic and political entities. Especially for data for which ontological knowledge such as Germany <is-a> Country is available, it is often useful to represent (parts of) the data in a tabular format. Particularly user-relevant applications are automatically generated and updated tables on wikis, and custom reduced data sets for data science.
Assume we wish to examine the distribution of cities on the globe. If we had a table of the form [City] [Country] [Population Count] [Latitude] [Longitude], this would be an easy task, and as we just showed, this table is trivially specified in a concise, structured format. However, getting this data out of a SPARQL-based Freebase frontend can be quite challenging, even if we are sure that, as in this case, the data is in there. Try writing a SPARQL query for our Qlever instance and you will see. If you don't believe it's possible, send me a mail.
This gap between the availability of a simple, concise description and queryable datasets on the one hand, and the effort necessary to extract the result on the other, is what we aim to close.
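To make the "concise, structured format" concrete, the bracketed definition from the city example can be read mechanically. The following is only a minimal sketch: the function name parse_table_definition is hypothetical, and the bracket syntax is simply the notation used above, not a fixed format (designing the real format is part of Step 0).

```python
import re


def parse_table_definition(definition):
    """Return the column names of a bracketed table definition.

    Assumes the informal notation "[Column A] [Column B] ..." used in
    the city example; this is an illustration, not a fixed format.
    """
    # Each column is the text between one pair of square brackets.
    names = re.findall(r"\[([^\[\]]+)\]", definition)
    columns = [name.strip() for name in names if name.strip()]
    if not columns:
        raise ValueError("table definition contains no [Column] entries")
    return columns


columns = parse_table_definition(
    "[City] [Country] [Population Count] [Latitude] [Longitude]"
)
print(columns)
# ['City', 'Country', 'Population Count', 'Latitude', 'Longitude']
```

Parsing the definition is the easy part. The difficulty described above lies in turning each column name into the right knowledge base query.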
Goal: Design, implement, and evaluate a system that, given a concise, structured table definition, generates the resulting table using a knowledge base as its data source. Note that while the definition is concise and structured, there is some fuzziness in the category names: they should not need to match the exact name of the associated knowledge base entity, since these entities often have non-human-readable or unexpected names and/or can only be reached via a detour through mediator entities.
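One simple way to picture the fuzziness in category names is plain string similarity between the user's column name and candidate knowledge base labels. The sketch below uses difflib from the Python standard library. The labels in kb_labels are invented for illustration and are not real Freebase identifiers, and the helper names normalize and match_category are hypothetical. String similarity alone will not resolve true synonyms or mediator detours, so this is at most a starting point.

```python
import difflib
import re


def normalize(name):
    """Lower-case a name and collapse punctuation into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def match_category(category, kb_labels, cutoff=0.6):
    """Return the knowledge base label most similar to category, or None.

    cutoff is the minimum difflib similarity ratio (between 0 and 1);
    0.6 is an arbitrary value chosen for this illustration.
    """
    by_normalized = {normalize(label): label for label in kb_labels}
    hits = difflib.get_close_matches(
        normalize(category), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[hits[0]] if hits else None


# Hypothetical labels, NOT taken from any real knowledge base.
kb_labels = ["location.citytown", "country", "population_count",
             "geo.latitude", "geo.longitude"]

print(match_category("Population Count", kb_labels))  # population_count
```

A misspelled column such as "Populaton Count" still resolves to the same label, whereas a name with no similar label yields None and would have to be handled by another mechanism.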
Step 0: Search the literature for solutions to this problem, familiarize yourself with the available knowledge base systems and SPARQL, and try your hand at a moderate number of manually designed queries and example tables. Start designing a simple table description format; this is not set in stone, however. DOCUMENT YOUR QUERIES AND DISCOVERED PROBLEMS.
Step 1: Design and implement a baseline version, for example using exact entity names with a rule-based approach. Design a simple but useful benchmark against which your system can be evaluated. This will give you an idea of where you stand and what kinds of errors are still present. It also gives you the opportunity to evaluate if and where your approach may have gone in the wrong direction.
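As a rough illustration of what such a baseline and benchmark could look like, the sketch below builds a SPARQL query from exact column names and scores a result table against a hand-made expected table. Everything here is an assumption made for the example: the function names, the rule that the first column is the row subject, the example.org IRIs (placeholders, not real knowledge base predicates), and the choice of row-level precision and recall as the metric.

```python
import re


def variable_name(column):
    """Turn a column name such as 'Population Count' into 'population_count'."""
    return re.sub(r"\W+", "_", column.strip()).lower()


def build_query(columns, predicate_for, subject_type):
    """Build a SPARQL SELECT query from exact column-to-predicate rules.

    Assumption: the first column names the row subject, and every other
    column is reached from it by exactly one predicate (no mediators).
    An unknown column raises KeyError, which is where the exact-name
    baseline is expected to fail.
    """
    subject = "?" + variable_name(columns[0])
    patterns = [f"  {subject} a <{subject_type}> ."]
    for column in columns[1:]:
        predicate = predicate_for[column]
        patterns.append(f"  {subject} <{predicate}> ?{variable_name(column)} .")
    select = " ".join("?" + variable_name(column) for column in columns)
    return "SELECT " + select + " WHERE {\n" + "\n".join(patterns) + "\n}"


def row_precision_recall(predicted_rows, expected_rows):
    """Score a generated table against an expected table, row by row."""
    predicted, expected = set(predicted_rows), set(expected_rows)
    correct = len(predicted & expected)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Placeholder IRIs for illustration only.
rules = {"Country": "http://example.org/country"}
print(build_query(["City", "Country"], rules, "http://example.org/City"))

predicted = [("Berlin", "Germany"), ("Paris", "Germany")]
expected = [("Berlin", "Germany"), ("Paris", "France")]
print(row_precision_recall(predicted, expected))  # (0.5, 0.5)
```

Exact row matching is deliberately strict: a single wrong cell makes the whole row count as wrong. A real benchmark may want a more forgiving per-cell measure as well.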
Step 2: Using more advanced techniques such as simple machine learning algorithms, tackle the problems discovered in the previous step. Handle synonyms for the categories and possibly allow for additional data filtering. If necessary, improve the performance of query generation. Design and implement a web frontend allowing easy interaction with your system for both human and machine users.