Seminar: Large Language Models

Why should I be here?

  • Understand how LLMs work and what their limitations are!
  • Learn how to automate tasks with a language model
  • Learn how to process large amounts of data with a language model
  • Learn how to use a large language model in a product

About the seminar

  • Roughly divided into 3 parts: theory, training, and application
  • Theory:
    • Learn about important topics in natural language processing
    • Topics include tokenization, matching, statistical text analysis, language models, and embeddings
    • Coding examples in Python alongside theoretical concepts
  • Training:
    • Hands-on exercises after each topic
    • Solve coding exercises to consolidate knowledge
    • Use the JupyterLab environment for the exercises
  • Application:
    • Apply your knowledge in your own projects
    • Teams of 2-3 develop and implement prototypes
    • Small application involving a language model

Intended learning outcomes (Part 1)

  • Understand the basics of natural language processing, including its tasks and challenges
  • Understand the general concept of LLMs and why they make the above tasks so much easier
  • Write simple Python programs / scripts and use basic data and control structures

Intended learning outcomes (Part 2)

  • Access an LLM via the OpenAI API and work with the result

  • Understand the concept of text embeddings and use them via the OpenAI API

  • Understand why and how LLMs can be used for process automation

  • Use an LLM in a small application

  • Have fun!

Content 0 (optional)

  • Short introduction to programming in Python in the context of natural language processing and large language models
    • Basics (Syntax, Variables, Data Types, Conditional Statements etc.)
    • Lists & Loops
    • Dictionaries & Classes

Content 1

  • Quick overview of classic NLP
    • Text processing (Tokenization, Lemmatization, etc.)
    • Applications (Classification, Sentiment Analysis, Matching, etc.)
    • Challenges
  • Introduction to LLMs
    • Text processing with neural networks
    • Sequence generation & language modeling
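
As a first taste of the classic text-processing steps listed above, here is a minimal sketch of tokenization followed by a simple word count. The regular expression and the example sentence are illustrative assumptions, not the tokenizer used in the seminar exercises; real NLP libraries handle punctuation, casing, and languages far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    # Assumption: a token is any run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


# Hypothetical example sentence
text = "The seminar covers language models. Language models generate text."

tokens = tokenize(text)   # list of word tokens
counts = Counter(tokens)  # token -> number of occurrences

print(tokens[:4])
print(counts.most_common(2))
```

Counting tokens like this is the simplest form of statistical text analysis; it also shows lists, loops (hidden inside `Counter`), and dictionary-like lookups working together.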

Content 2

  • Introduction to the OpenAI API
    • Prompting
    • Parameterization
    • Tool calling
  • Introduction to embeddings
    • Similarity
    • Visualization & Clustering
  • (Ethics & Privacy)
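
To make the "Similarity" topic more concrete, the following sketch compares embedding vectors with cosine similarity. The three-dimensional vectors are made-up toy values; real embeddings, such as those returned by an embeddings API, have hundreds or thousands of dimensions, and the function and variable names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values, for illustration only)
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.2, 0.9],
}

query = "cat"
# Rank all other words by their similarity to the query word, most similar first
ranked = sorted(
    (word for word in embeddings if word != query),
    key=lambda word: cosine_similarity(embeddings[query], embeddings[word]),
    reverse=True,
)
print(ranked)
```

Words whose vectors point in a similar direction end up at the top of the ranking, which is the basic idea behind embedding-based search and clustering.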

The schedule

Day 1 (10.04.2025):

  • data:unplugged

Day 2 (24.04.2025):

  • Introduction & Getting to know each other
  • Survey (experiences & expectations) & Learning goals & Evaluation criteria
  • Review data:unplugged
  • Introduction to the general topic & Python & Jupyter
  • Introduction to NLP (tokenization, matching, statistical text analysis)

Day 3 (30.04.2025):

  • Introduction to LLMs & the OpenAI API
  • Prompting
  • Application concept & Group brainstorming
  • Start: Project work on prototype & mentoring

Day 4 (08.05.2025):

  • Continued: Project work on prototype & mentoring
  • Project presentations & reflections on the seminar
  • Backup: Left-over topics

Not included this semester, or covered only to a limited extent:

  • Embeddings
  • Advanced GPT topics (image data, parameterization, tool calling)
  • Real-world examples of applications (& implementation) & limitations

After the seminar (~1 day):

  • Refine data:unplugged summary
  • Refine prototype business case & potential applications of prototype
  • Reflections & lessons learned → Hand in data:unplugged summary and 2-page project summary

Evaluation

  • Your data:unplugged summary: 35%
  • Your prototype & presentation on the last day of the seminar: 50%
  • Your activity during the seminar: 15%

What is the data:unplugged summary?

A short reflection on a session (approx. 300–500 words):

  • Summary: What key points were made during the session?
  • Practical Relevance: What specific use cases or problems were addressed?
  • Personal Assessment: Which aspects did you find particularly interesting or critical? Do you see potential for your own ideas or projects?

What is the project summary?

  • 2-3 pages only!
  • What is your prototype? What can it do?
  • What could be a business case for your prototype, or where can it be applied?
  • What are current limitations of your prototype and how could you overcome them?
  • What were your main lessons learned while building your prototype and/or during the seminar itself?

Code, Exercises & Prototypes

JupyterLab & Exercises

JupyterLab

To get started right away, we have prepared a JupyterLab environment!

Exercises

All exercises can be solved in the JupyterLab environment; all packages and datasets are pre-installed!