Teaching

Courses

Natural Language Processing

Lecture · B.Sc. Digital Humanities, B.Sc. Computer Science

This course introduces students to the field of Natural Language Processing. We start with classic NLP tasks, then cover prerequisites to language models such as preprocessing and tokenisation. We move on to transformers and large language models, and finally cover topics from computational linguistics and their application to LLMs.

Foundations of Machine Learning

Lecture · B.Sc. Digital Humanities, B.Sc. Computer Science

For a given task and measure of success, a computer program learns when its performance improves with experience. This course introduces machine learning as a guided search through a space of potential hypotheses. Students gain a broad overview of learning paradigms (including linear regression, decision trees, support vector machines, Bayesian learning, and neural networks) and come to understand the mathematical foundations that determine discrimination power and learning complexity.

Current Topics in Natural Language Processing

Seminar · M.Sc. Digital Humanities, M.Sc. Computer Science, M.Sc. Data Science · Irregular

This seminar covers a different topic from current NLP research each time it is offered. Each student presents a paper and, at the end of the semester, writes a proposal for a new research project that builds on the current state of the topic. The most recent edition focused on Massively Multilingual Language Models.


Open Thesis Positions

We offer B.Sc. and M.Sc. thesis topics to students at Leipzig University in Computer Science, Digital Humanities, or Data Science. We also welcome your own topic proposals if they fit our research interests.

To apply, email us with [THESIS] in the subject line, including: the topic you'd like to work on (with a short explanation if it's not from the list below), why you find it interesting, prior NLP coursework and practical experience, full transcripts, and your CV.

B.Sc. / M.Sc.

Expanding and Improving Universal Dependencies for German

Universal Dependencies (UD) is a framework for consistent annotation of grammar across human languages. The largest UD treebank of any language is UD-HDT, which was created by automatic conversion from the Hamburg Dependency Treebank and has not been actively maintained since 2017. A thesis could integrate additional pre-conversion data and fix known errors and inconsistencies. An M.Sc. thesis would additionally train UD parsers before and after the fixes and evaluate them on German treebanks to demonstrate the impact of the changes.

B.Sc. / M.Sc.

Adopt a UD Treebank

Most UD treebanks have not been actively maintained since their creation and have accumulated validation errors that limit their usefulness. If you can read (natively or with reasonable comprehension) one of the languages in the UD validation report, your thesis could adopt a treebank for that language, fix its errors and warnings, and expand its feature coverage or improve cross-treebank consistency. An M.Sc. thesis would validate the improvements by training and comparing parsers before and after the changes.

B.Sc. / M.Sc.

Unsupervised Discovery of Unaccusative and Unergative Verbs

Computational approaches to argument structure and verb classes. See arxiv.org/pdf/2111.00808 for background.

B.Sc. / M.Sc.

Noun Compound Benchmark

Testing models' understanding of noun compounds (e.g. "child camel jockey slave") by probing whether they can answer relational questions such as "is this a type of X?". The CoBra dataset could serve as a starting point; German data could be gathered from corpus searches and annotated by hand.