Seminar: Large Language Models

Why should I be here?

Understand how LLMs work and where their limitations are!
Learn how to automate tasks with a language model
Learn how to process large data amounts with a language model
Learn how to use a large language model in a product

About the seminar

Roughly divided into 3 parts: theory, training, and application
Theory:
- Learn about important topics in natural language processing
- Topics include tokenization, matching, statistical text analysis, language models, and embeddings
- Coding examples in Python alongside theoretical concepts

Training:
- Hands-on exercises after each topic
- Solve coding exercises to consolidate knowledge
- Utilize Jupyterlab environment for exercises

Application:
- Apply knowledge in own projects
- Teams of 2-3 develop and implement prototypes
- Small application involving a language model

Intended learning outcomes (Part 1)

Understand the basics of natural language processing, including its tasks and challenges
Understand the general concept of LLMs and why it makes the above so much easier
Write simple Python programs / scripts and use basic data and control structures

Intended learning outcomes (Part 2)

Access an LLM via the OpenAI API and how to work with the result
Understand the concept of text embeddings and use them via the OpenAI API
Understand why and how LLMs can be used for process automation
Use an LLM in a small application
Have fun!

Content 0 (optional)

Short introduction to Programming in Python in the context of Natural Language Processing and Large Language Models
- Basics (Syntax, Variables, Data Types, Conditional Statements etc.)
- Lists & Loops
- Dictionaries & Classes

Content 1

Quick overview of classic NLP
- Text processing (Tokenization, Lemmatization, etc.)
- Applications (Classification, Sentiment Analysis, Matching, etc.)
- Challenges
Introduction to LLM
- Text processing with neural networks
- Sequence generation & language modeling

Content 2

Introduction to the OpenAI API
- Prompting
- ~~Parameterization~~
- Tool calling
~~Introduction to embeddings~~
- ~~Similarity~~
- ~~Visualization & Clustering~~
~~(Ethics & Privacy)~~

The schedule

Day 1 (10.04.2025):

data:unplugged

Day 2 (24.04.2024):

Introduction & Getting to know each other
Survey (experiences & expectations) & Learning goals & Evaluation criteria
Review data:unplugged
Introduction to the general topic & Python & Jupyter
Introduction NLP (tokenization, matching, statistical text analysis)

Day 3 (30.04.2024):

Introduction to LLM & OpenAI API
Prompting
Application concept & Group brainstorming
Start: Project work on prototype & mentoring

Day 4 (08.05.2024):

Ctd: Project work on prototype & mentoring
Project presentations & reflections on the seminar
Backup: Left-over topics

Not included this semester or only limited:

~~Embeddings~~
~~Advanced GPT topics (image data, parameterization, tool calling)~~
~~Real-world examples of applications (& implementation) & limitations~~

After the seminar (~1d):

Refine data:unplugged summary
Refine prototype business case & potential applications of prototype
Reflections & lessons learned → Hand in data:unplugged summary and 2-page project summary

Evaluation

Your data:unplugged summary: 35%
Your prototype & presentation on the last day of the seminar: 50%
Your activity during the seminar: 15%

What is the data:unplugged summary?

A short reflection on a session (approx. 300–500 words):

Summary: What key points were made during the session?
Practical Relevance: What specific use cases or problems were addressed?
Personal Assessment: Which aspects did you find particularly interesting or critical? Do you see potential for your own ideas or projects?

What is the summary?

2-3 pages only!
What is your prototype? What can I do?
What could be a business case for your prototype, or where can it be applied?
What are current limitations of your prototype and how could you overcome them?
What have been your main learnings during the creation of your prototype (and/or) the seminar itself?

Code, Exercises & Prototypes

Jupyterlab & Exercises

Jupyterlab

To get started right away, we have prepared a Jupyterlab!

Exercises

All exercises can be solved in the Jupyterlab, all packages and datasets are pre-installed!