Applied RAG Internship - Genvarsity & COTPOT
Presented by Genvarsity in collaboration with COTPOT

An 8-Week Technical Curriculum. Transition from consuming AI tools to engineering them.

1. What is the Tech?

Retrieval-Augmented Generation (RAG) is an AI framework that improves the quality of Large Language Model (LLM) responses by grounding the model in external sources of knowledge. Instead of relying solely on the static data the LLM was trained on, RAG first retrieves relevant passages from a custom knowledge base, then augments the user's prompt with that retrieved text. Finally, the LLM generates a response based on that specific context.
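The retrieve, augment, generate sequence described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the knowledge base, the word-overlap scoring (a stand-in for vector search), and the `generate` placeholder (a stand-in for a real LLM call) are all assumptions made for the example.

```python
import re

# Toy knowledge base (invented); in a real system these would be document chunks.
KNOWLEDGE_BASE = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Premium plans include priority email support.",
]

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=1):
    """Step 1: rank documents by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def augment(question, context_docs):
    """Step 2: place the retrieved text ahead of the question in the prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

def generate(prompt):
    """Step 3: placeholder for an LLM call; it only reports the prompt size."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

question = "How many days is the refund window?"
docs = retrieve(question, KNOWLEDGE_BASE)
prompt = augment(question, docs)
print(prompt)
print(generate(prompt))
```

Swapping `retrieve` for an embedding-based search and `generate` for a model API call leaves the overall shape unchanged, which is why the three steps are usually built and tested separately.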

2. Why is it Important?

  • Reduces Hallucinations: By requiring the LLM to answer from, and cite, specific retrieved documents, RAG drastically reduces the chances of the AI making up false information.
  • Cost-Effective & Dynamic: Fine-tuning an LLM on new data is expensive and time-consuming. With RAG, you update the database, and the new information becomes available to the very next query without retraining.
  • Data Privacy: RAG allows organizations to keep their proprietary data in vector databases they control, sending only the retrieved passages to the LLM at query time, while still leveraging the reasoning power of advanced LLMs.

3. What are its Uses?

RAG is transforming how businesses handle unstructured data. Common use cases include:

  • Intelligent Assistants: Querying thousands of unstructured documents to find passages that match specific criteria or answer user questions.
  • Enterprise Search & Knowledge Bases: Allowing employees to "chat" with their company's internal wikis, HR policies, and technical documentation.
  • Customer Support Bots: Providing highly accurate, context-aware answers to user queries by pulling from product manuals and past support tickets.

4. Learning Outcomes

Students will move from foundational AI concepts to deploying a full-stack AI application. They will learn:

  • How to preprocess, clean, and chunk unstructured text data.
  • The mathematics and application of text embeddings.
  • How to set up, populate, and query Vector Databases.
  • Prompt engineering techniques for strict context adherence.
  • Orchestration frameworks like LangChain or LlamaIndex.

5. Proof of Build (Capstone Project)

Project: The AI Knowledge Assistant

To prove their technical capability, each student will build and deploy a working RAG application.

  • The Challenge: Ingest a dataset of at least 50 complex PDF documents.
  • The Deliverable: A web-based chat interface (built with Streamlit or Gradio) where a user can ask complex questions. The system must return an accurate answer alongside the specific citations/source documents used.
  • Evaluation Criteria: Retrieval accuracy, latency, UI usability, and the model's ability to say "I don't know" when the information is missing.
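Three of these criteria can be measured with very small helpers. The sketch below is one possible approach under stated assumptions: the metric choices (hit rate at k for retrieval accuracy, wall-clock time for latency, a substring check for "I don't know"), the function names, and the sample data are all illustrative, not the program's official grading method.

```python
import time

def hit_rate_at_k(retrieved_ids, expected_ids, k=3):
    """Fraction of questions whose expected document is in the top k results."""
    if not expected_ids:
        return 0.0
    hits = sum(1 for got, want in zip(retrieved_ids, expected_ids) if want in got[:k])
    return hits / len(expected_ids)

def abstention_rate(answers):
    """Share of answers (to unanswerable questions) that say "I don't know"."""
    if not answers:
        return 0.0
    return sum("i don't know" in a.lower() for a in answers) / len(answers)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Invented sample data: ranked document IDs per question, and the ID that
# actually contains the answer to each question.
retrieved = [
    ["doc3", "doc1", "doc7"],
    ["doc2", "doc9", "doc4"],
    ["doc5", "doc6", "doc8"],
    ["doc1", "doc2", "doc3"],
]
expected = ["doc1", "doc4", "doc2", "doc1"]

# Invented answers given to questions the documents cannot answer.
unanswerable_answers = ["I don't know.", "The limit is 40 hours.", "i don't know"]

score, seconds = timed(hit_rate_at_k, retrieved, expected)
print("hit rate@3:", score)
print("hit rate@1:", hit_rate_at_k(retrieved, expected, k=1))
print("abstention rate:", abstention_rate(unanswerable_answers))
print("elapsed seconds:", seconds)
```

In a real capstone, `timed` would wrap the full question-to-answer call rather than the metric itself, so that latency reflects what the user experiences.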

6. Weekly Curriculum Breakdown

Program Structure: 12 hours of live instruction in total (1.5 hours/week) plus weekly hands-on assignments.

Week 1 The AI Landscape & Intro to RAG

Live Class: The limitations of standard LLMs, what RAG solves, and a high-level overview of the ingestion and retrieval pipelines.

Week 2 Data Ingestion & Preprocessing

Live Class: Handling unstructured data. Reading PDFs, scraping text, and the critical science of "Chunking".
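As a taste of what chunking involves, here is a minimal sketch of fixed-size chunking with overlap. The function name, the character-based windows, and the default sizes are assumptions chosen for illustration; real pipelines often split on sentences, paragraphs, or tokens instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, overlap=3)
print(pieces)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.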

Week 3 Embeddings & Vector Databases

Live Class: What are vector embeddings? Understanding high-dimensional space and semantic similarity. Intro to Vector Databases.
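Semantic similarity between embeddings is commonly measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors, invented purely for illustration (real embedding models produce hundreds or thousands of dimensions), and a plain dictionary in place of a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up vectors: "cat" and "kitten" point in a similar direction,
# "invoice" points somewhere else entirely.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

def nearest(query_vector, store, k=2):
    """Return the k stored items most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, round(score, 3)) for score, name in scored[:k]]

print(nearest(toy_embeddings["cat"], toy_embeddings))
```

A vector database performs the same kind of nearest-neighbour lookup, but with indexing structures that keep it fast across millions of vectors.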

Week 4 The Art of Retrieval

Live Class: Querying the database. Similarity search vs. Maximal Marginal Relevance (MMR). Hybrid search concepts.
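The difference between plain similarity search and MMR shows up when the top results are near-duplicates. The following sketch implements the standard greedy MMR selection (relevance to the query minus redundancy with already selected results); the two-dimensional vectors, function names, and the `lambda_mult` parameter name are assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_similarity(query_vec, doc_vecs, k=2):
    """Plain similarity search: indices of the k most similar documents."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return ranked[:k]

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: balance relevance against similarity to earlier picks."""
    candidates = list(range(len(doc_vecs)))
    selected = []

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                         default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Invented vectors: documents 0 and 1 are near-duplicates, document 2
# is less similar to the query but covers a different direction.
query = [0.8, 0.6]
docs = [
    [1.0, 0.0],
    [0.995, 0.1],
    [0.0, 1.0],
]

print("similarity search:", top_k_similarity(query, docs))
print("MMR:", mmr(query, docs))
# → similarity search: [1, 0]
# → MMR: [1, 2]
```

Setting `lambda_mult=1.0` removes the redundancy penalty, so MMR falls back to the same ordering as similarity search.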

Week 5 LLM Augmentation & Prompt Engineering

Live Class: Feeding retrieved context to the LLM. Designing strict prompts that reduce hallucinations and enforce citation.
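A strict prompt of this kind might look like the sketch below. The wording of the instructions, the function name, and the sample file names are illustrative assumptions; effective phrasing varies by model and should be tested against real questions.

```python
def build_strict_prompt(question, chunks):
    """Build a prompt from (source_label, text) pairs returned by a retriever."""
    numbered = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the numbered context below.\n"
        "After each claim, cite the supporting passage by its number, for example [1].\n"
        'If the context does not contain the answer, reply exactly: "I don\'t know."\n\n'
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved chunks; the file names are made up.
retrieved_chunks = [
    ("handbook.pdf, p. 4", "Employees accrue 1.5 days of leave per month."),
    ("handbook.pdf, p. 9", "Unused leave expires at the end of the calendar year."),
]

strict_prompt = build_strict_prompt("How much leave do employees accrue?", retrieved_chunks)
print(strict_prompt)
```

Numbering the passages gives the model a compact way to cite, and lets the application map each citation back to a source document for display.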

Week 6 Frameworks (LangChain / LlamaIndex)

Live Class: Moving from custom scripts to production frameworks. Using LangChain/LlamaIndex to simplify the RAG pipeline.

Week 7 Evaluation & User Interfaces

Live Class: Evaluating accuracy (RAGAS framework). Building a quick frontend using Streamlit.

Week 8 Project Finalization & Showcase

Live Class: Capstone presentations. Students demo their AI Knowledge Assistants.

Ready to start building?

Join the 8-week intensive program presented by Genvarsity & COTPOT.

1650
Apply Now