Seminar: Large Language Models

Why should I be here?

  • Understand how LLMs work and where their limitations are!
  • Learn how to automate tasks with a language model
  • Learn how to process large amounts of data with a language model
  • Learn how to use a large language model in a product

About the seminar

  • Roughly divided into 3 parts: theory, training, and application
  • Theory:
    • Learn about important topics in natural language processing
    • Topics include tokenization, matching, statistical text analysis, language models, and embeddings
    • Coding examples in Python alongside theoretical concepts
  • Training:
    • Hands-on exercises after each topic
    • Solve coding exercises to consolidate knowledge
    • Use the JupyterLab environment for exercises
  • Application:
    • Apply your knowledge in your own projects
    • Teams of 2-3 develop and implement prototypes
    • Small application involving a language model

Intended learning outcomes (Part 1)

  • Understand the basics of natural language processing, including its tasks and challenges
  • Understand the general concept of LLMs and why they make the above so much easier
  • Write simple Python programs / scripts and use basic data and control structures

Intended learning outcomes (Part 2)

  • Access an LLM via the OpenAI API and work with the result

  • Understand the concept of text embeddings and use them via the OpenAI API

  • Understand why and how LLMs can be used for process automation

  • Use an LLM in a small application

  • Have fun!

Content 1

  • Short introduction to Programming in Python in the context of Natural Language Processing and Large Language Models
    • Basics (Syntax, Variables, Data Types, Conditional Statements etc.)
    • Lists & Loops
    • Dictionaries & Classes

Content 2

  • Quick overview of classic NLP
    • Text processing (Tokenization, Lemmatization, etc.)
    • Applications (Classification, Sentiment Analysis, Matching, etc.)
    • Challenges
  • Introduction to LLMs
    • Text processing with neural networks
    • Sequence generation & language modeling

Content 3

  • Introduction to the OpenAI API
    • Prompting
    • Parameterization
    • Function calling
  • Introduction to embeddings
    • Similarity
    • Visualization & Clustering
  • (Ethics & Privacy)
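As a preview of the similarity topic, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors below are made up for illustration; embeddings returned by an API typically have hundreds or thousands of dimensions. The sketch assumes non-zero vectors of equal length.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (not produced by any real model)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1: vectors point in a similar direction
print(cosine_similarity(cat, car))     # close to 0: vectors are nearly orthogonal
```

Values near 1 indicate that two vectors point in almost the same direction, which is usually read as "the texts are similar".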

The schedule

  • Today: Intro & Getting to know each other & Survey (experiences & expectations) & Learning goals & Evaluation criteria
  • Introduction to the general topic & Python & Jupyter
  • Introduction to NLP (tokenization, matching, statistical analysis)
  • Introduction to LLMs & the OpenAI API
  • Prompting
  • Embeddings
  • Advanced GPT topics (image data, parameterization, tool calling)
  • Real-world examples of applications (& implementation) & limitations
  • App concept & Group brainstorming
  • Project work on prototype & mentoring
  • Project presentations & reflections on the seminar
  • Backup: Ethics and data privacy

After the seminar (~1d):

  • Prototype refinement
  • Code review & documentation
  • Refine business case & potential applications of prototype
  • Reflections & lessons learned → Hand in 2-page summary

Evaluation

  • Your presentation on the last day of the seminar: 25%
  • Your prototype: 35%
  • Your summary: 25%
  • Your activity during the seminar: 15%

What is the summary?

  • 2-3 pages only!
  • What is your prototype? What can it do?
  • What could be a business case for your prototype, or where can it be applied?
  • What are current limitations of your prototype and how could you overcome them?
  • What were your main takeaways from building your prototype and/or from the seminar itself?

Code, Exercises & Prototypes

JupyterLab & Exercises

JupyterLab

To get started right away, we have prepared a JupyterLab environment!

Exercises

All exercises can be solved in JupyterLab; all packages and datasets are pre-installed!