MikeGPT - Ibrahim Alam

Project Overview

MikeGPT is an AI assistant designed to provide accurate, contextual responses to LSU community queries by combining large language model capabilities with institution-specific knowledge retrieval. The system serves as a bridge between students, faculty, and the vast repository of university information, making knowledge access more intuitive and efficient.

30,000+ Active Users
38,000+ Indexed Documents
95% Accuracy Rate

System Architecture

The system implements a retrieval-augmented generation (RAG) pipeline that preprocesses, indexes, and retrieves relevant information from LSU's extensive document repository before generating responses. This architecture prioritizes retrieval design over model size, enabling accurate responses while maintaining computational efficiency.

Core Components

1. Document Processing Pipeline

MikeGPT's accuracy starts with how documents are processed before they are indexed:

Multi-format Ingestion: Processes PDFs, web pages, policy documents, and academic resources
OCR Integration: Handles scanned documents and images, converting them into searchable text
Metadata Extraction: Captures document context including source, date, department, and relevance signals
Intelligent Chunking: Strategically segments documents to preserve semantic meaning while optimizing retrieval
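
As a rough illustration of the chunking step, the sketch below splits text into overlapping word windows and attaches basic metadata to each chunk. It is a minimal example and not MikeGPT's actual chunker: the Chunk fields, the chunk_document name, and the window sizes are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str    # hypothetical metadata field: where the text came from
    position: int  # index of this chunk within its document


def chunk_document(text: str, source: str,
                   max_words: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into word windows that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for position, start in enumerate(range(0, len(words), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(" ".join(window), source, position))
        # Stop once a window has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which is one simple way to trade granularity against context.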

2. Vector Database

Semantic search runs on PostgreSQL with the pgvector extension:

Embedding Storage: Documents are converted to high-dimensional vectors capturing semantic meaning
Similarity Search: Enables finding contextually relevant information beyond keyword matching
Scalability: Optimized indexing strategies support the 38,000+ document corpus efficiently
Context Preservation: Chunking strategy balances granularity with contextual coherence
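
To make the similarity search concrete, here is a small pure-Python sketch of cosine-similarity ranking over an in-memory index. It stands in for the database query only for illustration: in pgvector the comparable operation orders rows by the cosine distance operator (<=>), which is 1 minus the cosine similarity computed here. The function names and the dictionary-based index are assumptions, not the production schema.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Return the k (chunk_id, score) pairs most similar to the query vector.

    `index` is a hypothetical in-memory mapping of chunk id -> embedding.
    """
    scored = [(cosine_similarity(query, vector), chunk_id)
              for chunk_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]
```

A linear scan like this is fine for a demonstration; at the scale of the real corpus the ranking would be delegated to the database and its vector index.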

3. Retrieval Algorithm

Our RAG pipeline picks retrieval methods per agent. The main Mike agent uses a self-guided navigational search (from an AISX Lab paper) that iteratively gathers the best context before generation.

Agent-Specific Retrieval: Each agent selects the retrieval style that fits its task (e.g., FAQ keywording, syllabus filters, or navigational search)
Self-Guided Navigation: Mike refines its own lookups step by step to reach the most relevant sources
Grounded Responses: Retrieved context is kept with the answer so users can see what was used
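
The general shape of an iterative lookup can be sketched as below. This is not the algorithm from the AISX Lab paper, whose details are not given here; it is a generic loop written under assumptions. The names iterative_retrieve, keyword_retrieve, and propose_followup are hypothetical, and in a real system the follow-up query would presumably come from the language model rather than a stub.

```python
def keyword_retrieve(query, documents):
    """FAQ-style lookup: documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in documents if terms & set(doc.lower().split())]


def iterative_retrieve(question, retrieve, propose_followup, max_steps=3):
    """Gather context over several lookups, stopping when nothing new turns up.

    retrieve(query) -> list of documents
    propose_followup(question, collected) -> next query string, or None to stop
    """
    collected = []
    query = question
    for _ in range(max_steps):
        new_hits = [doc for doc in retrieve(query) if doc not in collected]
        if not new_hits:
            break  # the last lookup added nothing, so stop refining
        collected.extend(new_hits)
        query = propose_followup(question, collected)
        if query is None:
            break
    return collected
```

Because the retriever is passed in as a function, each agent could supply its own (keyword lookup, a filtered search, or the iterative loop itself) without changing the surrounding code.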

4. Response Generation

The generation step combines retrieved context with large language model prompting:

Source Citation: Responses explicitly reference source documents for transparency
Factual Grounding: Retrieved context constrains generation to reduce hallucination
Guardrails: Multiple layers prevent inappropriate content and maintain LSU community standards
Response Validation: Automated checks screen each response for accuracy before delivery
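
One way to picture source citation and a simple validation check is the sketch below. It numbers the retrieved passages in the prompt and then flags any citation in the answer that points outside that numbered list. The prompt wording, the [n] citation format, and both function names are assumptions for illustration, not MikeGPT's actual prompts or checks.

```python
import re


def build_grounded_prompt(question, passages):
    """Assemble a prompt from (source_label, text) pairs, numbered from 1."""
    lines = ["Answer using only the numbered sources below. Cite sources as [n].", ""]
    for number, (label, text) in enumerate(passages, start=1):
        lines.append(f"[{number}] ({label}) {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def invalid_citations(answer, passage_count):
    """Return cited source numbers that do not match any supplied passage."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= passage_count)
```

A check like this only confirms that citations refer to real sources; it does not verify that the cited passage supports the claim, which would need a separate step.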

Technical Implementation

Python, Django, PostgreSQL, pgvector, Azure Cloud Services, Redis, Docker, OpenAI API

Infrastructure

Cloud Deployment: Azure cloud infrastructure ensures scalability and reliability
Caching Layer: Redis-based caching reduces latency for frequent queries
Load Balancing: Distributed architecture handles traffic from the 30,000+ user base
Monitoring: Real-time performance tracking and error detection
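
The caching idea can be sketched with an in-memory stand-in. The example normalizes a query into a cache key so that trivially different phrasings share an entry, and expires entries after a fixed time. It is a minimal sketch under assumptions: TTLCache and cache_key are hypothetical names, and a Redis deployment would rely on the server's own key expiry rather than this class.

```python
import hashlib
import time


def cache_key(query):
    """Build a key that ignores case and extra whitespace in the query."""
    normalized = " ".join(query.lower().split())
    return "answer:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry on read
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self._clock() + self._ttl)
```

Hashing the normalized query keeps keys a fixed length regardless of how long the question is.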

Design Philosophy

MikeGPT embodies a fundamental principle: retrieval design matters more than model size. Rather than relying on ever-larger language models, we invested in sophisticated information retrieval, resulting in:

Cost Efficiency: Smaller models with better retrieval outperform larger models alone
Accuracy: Grounded responses reduce hallucination and improve trustworthiness
Transparency: Source citation enables users to verify information
Scalability: Efficient architecture supports growing user base without proportional cost increase

This approach aligns with my broader conviction that AI should augment human capabilities rather than replace them, providing accurate information while maintaining human oversight and judgment.

Impact & Outcomes

Community Adoption: Currently serving 30,000+ LSU students, faculty, and staff
Query Success Rate: 95% hit rate demonstrates effective information retrieval
Knowledge Access: Democratized access to university information across 38,000+ documents
User Satisfaction: Positive feedback on response accuracy and helpfulness
Custom Agents: Specialized assistants for different departments and use cases

Research Contributions

Beyond its practical impact, MikeGPT demonstrates important principles for production AI systems:

RAG Architecture: Validates that thoughtful retrieval design can deliver production-grade results without massive models
Privacy by Design: Strict data handling protocols ensure student query privacy
Ethical AI: Transparency and source citation build trust and accountability
Real-world Deployment: Scaled from prototype to a production system serving 30,000+ users

Team & Acknowledgments

Research Advisor: Dr. James Ghawaly Jr., LSU AISX Lab

Collaborators: Jacob Nguyen, Bibushita Baral, Brandon Walton, Chloe Gray

Discover Day Presentation: Alam I, Nguyen J, Baral B. "MikeGPT: Enhancing LSU with AI." LSU Discover Day Undergraduate Research and Creativity Conference, April 25, 2025.

Support: LSU Office of Academic Affairs, LSU AISX Lab

My Role

Machine Learning Software Engineer

Architected and implemented the retrieval-augmented generation pipeline
Designed vector database schema and semantic search algorithms
Optimized retrieval algorithms to achieve a 95% accuracy rate
Led deployment to production serving 30,000+ users
Developed custom agents for specialized university domains
Implemented privacy-preserving data handling protocols
Co-presented research at LSU Discover Day 2025