Day 0: Troubleshooting and FAQs

  • Kaggle notebook
  • API key page in AI Studio
  • Sign up for a Discord account and join us on the Kaggle Discord server.
    We have the following channels dedicated to this event:
    • #5dgai-announcements: find official course announcements and livestream recordings.
    • #5dgai-introductions: introduce yourself and meet other participants from around the world.
    • #5dgai-question-forum: Discord forum-style channel for asking questions and discussions about the assignments.
    • #5dgai-general-chat: a general channel to discuss course materials and network with other participants.

Day 1: Foundational Large Language Models & Text Generation and Prompt Engineering

  • 📝 My Kaggle Notebooks
  • 🎒Assignment
    1. Complete the Intro Unit – “Foundational Large Language Models & Text Generation”:
      • Listen to the summary podcast episode for this unit.
      • To complement the podcast, read the “Foundational Large Language Models & Text Generation” whitepaper.
    2. Complete Unit 1 – “Prompt Engineering”:
      • Listen to the summary podcast episode for this unit.
      • To complement the podcast, read the “Prompt Engineering” whitepaper.
      • Complete these codelabs on Kaggle:
      • Make sure you phone-verify your Kaggle account before starting; it’s necessary for the codelabs.
      • Want to have an interactive conversation? Try adding the whitepapers to NotebookLM.
  • 💡What You’ll Learn
    • Today you’ll explore the evolution of LLMs, from transformers to techniques like fine-tuning and inference acceleration. You’ll also get trained in the art of prompt engineering for optimal LLM interaction.
    • The codelab will walk you through getting started with the Gemini API, cover several prompting techniques, and show how different parameters affect the output.

Summary of the key points & callouts

Day 1 - Prompting [TP]

  • The examples were built leveraging Gemini 2.0 and the latest google-genai SDK
  • Gemini Models covered are
    • gemini-2.0-flash
    • gemini-2.0-flash-thinking-exp

Thinking model: trained to generate the "thinking process" the model goes through as part of its response. This yields high-quality responses without needing specialised prompting such as CoT or ReAct.

  • Two types of content generation covered are
    • client.models.generate_content
      • Single-turn text-in/text-out structure
    • client.models.generate_content_stream
      • Instead of waiting for the entire response, the model returns chunks of the generated content as they become available, exposed as an iterable.
    • client.chats.create
      • Multi-turn chat structure
  • Configs covered are
    • temperature
    • top_p
    • max_output_tokens

Specifying max_output_tokens does not influence how the output tokens are generated, so the output will not become more stylistically or textually succinct; the model simply stops generating tokens once the specified limit is reached.
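The three generation entry points and the config parameters above can be sketched as follows with the google-genai SDK. The model name, prompts, and parameter values are illustrative, and an API key in the environment is needed to actually run the calls, so the API demo is kept behind a guard:

```python
from typing import Any


def make_config(temperature: float, top_p: float, max_output_tokens: int) -> dict[str, Any]:
    """Generation parameters; note max_output_tokens truncates, it does not summarise."""
    return {
        "temperature": temperature,
        "top_p": top_p,
        "max_output_tokens": max_output_tokens,
    }


def demo() -> None:
    # Requires `pip install google-genai` and a GOOGLE_API_KEY in the environment.
    from google import genai

    client = genai.Client()
    config = make_config(temperature=0.2, top_p=0.95, max_output_tokens=128)

    # Single-turn: one prompt in, one response out.
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Explain transformers in one sentence.",
        config=config,
    )
    print(response.text)

    # Streaming: iterate over chunks as they become available.
    for chunk in client.models.generate_content_stream(
        model="gemini-2.0-flash",
        contents="Write a short poem about embeddings.",
    ):
        print(chunk.text, end="")

    # Multi-turn chat: the chat object keeps the conversation history.
    chat = client.chats.create(model="gemini-2.0-flash")
    print(chat.send_message("Hello!").text)
    print(chat.send_message("What did I just say?").text)


if __name__ == "__main__":
    demo()
```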

  • Prompting techniques covered are
  • Techniques to enforce the LLM output to follow the supplied schema
    • enum mode
    • json mode
      • Can be achieved using typing.TypedDict or a dataclass
  • Code prompting scenarios covered are
    • Generating code
    • Code execution
      • Can automatically run generated code using ToolCodeExecution
    • Explaining code
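A minimal sketch of enum mode and JSON mode, assuming the response_schema/response_mime_type config fields of the google-genai SDK. The Grade and Review schemas are invented for illustration, and the API calls are guarded since they need a key:

```python
import enum
from typing import TypedDict


class Grade(enum.Enum):
    # Enum mode: constrain the model to exactly one of these labels.
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


class Review(TypedDict):
    # JSON mode: the model's reply must parse as this structure.
    grade: str
    summary: str


def demo() -> None:
    # Requires `pip install google-genai` and an API key.
    from google import genai

    client = genai.Client()

    # Enum mode: the schema is the Enum, the mime type is text/x.enum.
    res = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Grade this review: 'The pizza was cold.'",
        config={"response_mime_type": "text/x.enum", "response_schema": Grade},
    )
    print(res.text)

    # JSON mode: the schema is the TypedDict, the mime type is application/json.
    res = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Grade and summarise: 'Great crust, slow service.'",
        config={"response_mime_type": "application/json", "response_schema": Review},
    )
    print(res.text)  # JSON matching the Review shape


if __name__ == "__main__":
    demo()
```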

Day 1 - Evaluation and structured output [TP]

  • Gemini Models covered are
    • gemini-2.0-flash
  • Concepts covered include
    • Summarising a (PDF) document
    • Evaluating question answering quality
      • Gauge the quality of the LLM generated summary for the user question on a document.
      • Criteria used are
        • Instruction following
        • Groundedness
        • Completeness
        • Conciseness
        • Fluency
      • Two popular model-based metrics covered are:
          1. Pointwise evaluation: evaluate a single I/O pair against some criteria (e.g., 5 levels of grading)
          2. Pairwise evaluation: compare two outputs against each other and pick the better one (e.g., compare against a baseline model response).
    • Caching & Memoization
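Pointwise evaluation can be sketched as a judge-model loop: grade one summary against the criteria above and parse a score out of the verdict. The rubric text and the "Rating: N" convention are assumptions for illustration, not the codelab's exact prompt:

```python
import re

# Hypothetical pointwise rubric covering the criteria listed above.
JUDGE_PROMPT = """\
You are grading a summary of a document against the user's question.
Rate it on a 1-5 scale for instruction following, groundedness,
completeness, conciseness, and fluency, then finish with a line
of the form "Rating: N".

Question: {question}
Summary: {summary}
"""


def parse_rating(judge_response: str) -> int:
    """Pull the final 1-5 score out of the judge model's free-text verdict."""
    match = re.search(r"Rating:\s*([1-5])", judge_response)
    if match is None:
        raise ValueError("no rating found in judge response")
    return int(match.group(1))


def grade(question: str, summary: str) -> int:
    # Requires `pip install google-genai` and an API key;
    # gemini-2.0-flash acts as the judge model here.
    from google import genai

    client = genai.Client()
    verdict = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=JUDGE_PROMPT.format(question=question, summary=summary),
    )
    return parse_rating(verdict.text)
```

Pairwise evaluation follows the same pattern, except the prompt presents two candidate answers and asks the judge to name the better one.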

Day 2: Embeddings & Vector Stores

Summary of the key points & callouts

Day 2 - Document Q&A with RAG [TP]

  • Gemini Models covered include
    • text-embedding-004
  • Two embedding task types (official documentation) covered are
    • retrieval_document
    • retrieval_query
  • ChromaDB methods (official documentation) covered are
    • db.add(...)
    • db.count()
    • db.peek(...)
    • db.query(...)
  • RAG Concepts covered
    1. Indexing: creating embeddings
    2. Retrieval: finding relevant documents
    3. Augmented generation: answering the question
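The three RAG steps above can be sketched end to end. This is a minimal outline, not the codelab's exact code: it uses ChromaDB's default embedding function rather than wiring in text-embedding-004, and the documents and query are made up. The API-dependent part is kept behind a function since it needs installed packages and a key:

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Step 3 input: augment the question with the retrieved passages."""
    context = "\n".join(f"PASSAGE: {p}" for p in passages)
    return f"Answer using only the passages below.\n{context}\nQUESTION: {question}"


def demo() -> None:
    # Requires `pip install chromadb google-genai` and an API key.
    import chromadb
    from google import genai

    client = genai.Client()
    db = chromadb.Client().create_collection(name="docs")

    # Step 1: indexing. Chroma embeds the documents on add (here with its
    # default embedding function; the codelab plugs in text-embedding-004
    # with the retrieval_document task type instead).
    db.add(
        documents=["Gemini supports streaming output.",
                   "ChromaDB stores embeddings."],
        ids=["d1", "d2"],
    )
    print(db.count())  # 2

    # Step 2: retrieval. Embed the query (retrieval_query task type in the
    # codelab) and find the most relevant document.
    result = db.query(query_texts=["How do I stream responses?"], n_results=1)
    passages = result["documents"][0]

    # Step 3: augmented generation.
    answer = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=build_prompt("How do I stream responses?", passages),
    )
    print(answer.text)
```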

Day 2 - Embeddings and similarity scores [TP]

  • Gemini Models covered include
    • text-embedding-004
  • Embedding task types (official documentation) covered are
    • semantic_similarity
  • Concepts covered include
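With the semantic_similarity task type, embeddings are compared via a similarity score, typically the cosine of the angle between vectors. A runnable sketch using stand-in vectors (real vectors would come from text-embedding-004 and be 768-dimensional):

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in 4-d vectors; semantically close texts get close vectors.
cat = np.array([0.9, 0.1, 0.0, 0.1])
kitten = np.array([0.85, 0.15, 0.05, 0.1])
car = np.array([0.0, 0.9, 0.4, 0.0])

print(cosine_similarity(cat, kitten))  # close to 1: similar meanings
print(cosine_similarity(cat, car))     # much lower: unrelated meanings
```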

Day 2 - Classifying embeddings with Keras [TP]

  • Gemini Models covered include
    • text-embedding-004
  • Embedding task types (official documentation) covered are
    • classification
  • Concepts covered include
    • Building a simple classification model with 1 hidden layer (and 593,668 trainable params) using keras.Sequential
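The 593,668 figure is consistent with 768-d embedding inputs, a 768-unit hidden layer, and (an assumption here) 4 output classes: 768·768 + 768 weights and biases in the hidden layer plus 768·4 + 4 in the output head. A sketch, with the Keras-dependent part guarded since it needs the library installed:

```python
def param_count(input_dim: int, hidden: int, classes: int) -> int:
    """Trainable parameters of a Dense(hidden) -> Dense(classes) classifier."""
    return (input_dim * hidden + hidden) + (hidden * classes + classes)


def build_model(input_dim: int = 768, hidden: int = 768, classes: int = 4):
    # Requires `pip install keras` (or tensorflow). Layer sizes assume
    # 768-d text-embedding-004 vectors and a 4-class dataset.
    import keras

    return keras.Sequential([
        keras.layers.Input(shape=(input_dim,)),
        keras.layers.Dense(hidden, activation="relu"),
        keras.layers.Dense(classes, activation="softmax"),
    ])


print(param_count(768, 768, 4))  # 593668
```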

Day 3: AI Agents

Summary of the key points & callouts

Day 3 - Function calling with the Gemini API [TP]

Day 3 - Building an agent with LangGraph [TP]


Day 4: Domain-Specific Models

  • 📝 My Kaggle Notebooks
  • 🎒Assignment
  • 💡What You’ll Learn
    • In today’s reading, you’ll delve into the creation and application of specialized LLMs like SecLM and MedLM/Med-PaLM, with insights from the researchers who built them.
    • In the codelabs you will learn how to add real-world data to a model beyond its knowledge cut-off by grounding with Google Search. You will also learn how to fine-tune a custom Gemini model using your own labeled data to solve custom tasks.

Day 5: MLOps for Generative AI


Miscellaneous


To do or clarify