= Accurate Word Extraction from Documents with Complex Layouts =

'''Type''': An interesting and practical bachelor thesis. A basic understanding of Machine Learning is required; knowledge of Deep Learning is desirable. The preferred programming language is Python.

'''Background info''': TODO

'''Goal''': Merge hyphenated words using machine learning techniques. The approach must take into account that a word can be a compound word, in which case the hyphen between the two parts of the word must be retained when the parts are merged.

''Challenge 1'': TODO

''Challenge 2'': TODO

'''Subgoal 1''': TODO

'''Subgoal 2''': TODO
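
To make the decision described under the goal concrete, here is a minimal sketch of a dictionary lookup baseline. It is not a machine learning method and not the approach prescribed for the thesis; it only illustrates the two possible outcomes (drop the hyphen or keep it). It assumes the word was split at a line break with the hyphen ending the first part. The function name `merge_hyphenated` and the tiny `VOCABULARY` are hypothetical and chosen for illustration only.

```python
from typing import Set

# Hypothetical toy vocabulary; a real system would use a large word list
# or a learned model instead.
VOCABULARY: Set[str] = {"extraction", "well-known", "machine", "learning"}


def merge_hyphenated(prefix: str, suffix: str, vocabulary: Set[str]) -> str:
    """Merge a word that was split into `prefix` (ending in "-") and `suffix`.

    Returns the merged word either without the hyphen (ordinary
    hyphenation) or with it (compound word).
    """
    if not prefix.endswith("-"):
        raise ValueError("prefix must end with a hyphen")

    stem = prefix[:-1]
    joined = stem + suffix      # candidate with the hyphen dropped
    compound = prefix + suffix  # candidate with the hyphen kept

    # Prefer whichever candidate is a known word.
    if joined.lower() in vocabulary:
        return joined
    if compound.lower() in vocabulary:
        return compound

    # Neither candidate is known: if both parts are words on their own,
    # guess that this is a compound and keep the hyphen.
    if stem.lower() in vocabulary and suffix.lower() in vocabulary:
        return compound

    # Otherwise assume ordinary hyphenation and drop the hyphen.
    return joined


if __name__ == "__main__":
    for first, second in [("extrac-", "tion"), ("well-", "known")]:
        print(first, second, "->", merge_hyphenated(first, second, VOCABULARY))
```

A lookup like this fails for words that are missing from the vocabulary and for cases where both forms are plausible, which is where a learned model could help.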

Supervision by [[http://ad.informatik.uni-freiburg.de/staff/korzen|Claudius Korzen]].