Accurate Word Extraction from Documents with Complex Layouts
Type: An interesting and practical bachelor thesis. A basic understanding of Machine Learning is required; knowledge of Deep Learning is desirable. The preferred programming language is Python.
Background info: TODO
Goal: Merge words that were hyphenated at line breaks by using machine learning techniques, taking into account that a word can be a compound word (for example, "well-known"), in which case the hyphen between the two parts must be retained when the parts are merged.
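To illustrate the decision a merging method has to make, here is a minimal sketch of a simple vocabulary-lookup baseline. It is not the machine learning approach the thesis aims for. The function name `merge_hyphenated`, the `vocabulary` set, and the fallback rule are assumptions made for illustration only.

```python
from typing import Set


def merge_hyphenated(prefix: str, suffix: str, vocabulary: Set[str]) -> str:
    """Merge a word that was split across a line break.

    `prefix` is the part before the line break (ending in a hyphen),
    `suffix` is the part after it. `vocabulary` is a hypothetical set of
    known lowercase words, including compounds such as "well-known".
    """
    stem = prefix.rstrip("-")
    joined = stem + suffix            # hyphen removed: "implementation"
    compound = stem + "-" + suffix    # hyphen retained: "well-known"

    if joined.lower() in vocabulary:
        return joined
    if compound.lower() in vocabulary:
        return compound
    # Assumed fallback for unknown words: treat the hyphen as a
    # line-break artifact and drop it.
    return joined


vocabulary = {"implementation", "well-known"}
print(merge_hyphenated("implemen-", "tation", vocabulary))
print(merge_hyphenated("well-", "known", vocabulary))
```

A lookup like this fails for words missing from the vocabulary, which is one reason a learned model that scores both variants could be preferable.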
Challenge 1: TODO
Challenge 2: TODO
Subgoal 1: TODO
Subgoal 2: TODO
Supervision by Claudius Korzen.