Project Overview
MikeGPT is an AI assistant designed to provide accurate, contextual responses to LSU community queries by combining large language model capabilities with institution-specific knowledge retrieval. The system serves as a bridge between students, faculty, and the vast repository of university information, making knowledge access more intuitive and efficient.
38,000+ Indexed Documents
System Architecture
The system implements a retrieval-augmented generation (RAG) pipeline that preprocesses, indexes, and retrieves relevant information from LSU's extensive document repository before generating responses. This architecture prioritizes retrieval design over model size, enabling accurate responses while maintaining computational efficiency.
Core Components
1. Document Processing Pipeline
The foundation of MikeGPT's accuracy lies in its sophisticated document processing system:
- Multi-format Ingestion: Processes PDFs, web pages, policy documents, and academic resources
- OCR Integration: Handles scanned documents and images, converting them into searchable text
- Metadata Extraction: Captures document context including source, date, department, and relevance signals
- Intelligent Chunking: Strategically segments documents to preserve semantic meaning while optimizing retrieval
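As a rough illustration of the chunking step, the sketch below splits text into overlapping word windows so that a sentence cut at a boundary still appears whole in a neighboring chunk. It is a minimal stand-in, not MikeGPT's actual chunker: the function name `chunk_text` and the `max_words` and `overlap` defaults are assumptions made for this example.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of at most max_words words.

    Consecutive windows share `overlap` words. Both parameters are
    illustrative defaults, not values taken from the real system.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

A production chunker would more likely split on headings, paragraphs, or sentences before falling back to fixed windows; the overlap idea carries over either way.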
2. Vector Database
The vector store uses PostgreSQL with the pgvector extension for semantic search:
- Embedding Storage: Documents are converted to high-dimensional vectors capturing semantic meaning
- Similarity Search: Enables finding contextually relevant information beyond keyword matching
- Scalability: Optimized indexing strategies support the 38,000+ document corpus efficiently
- Context Preservation: Chunking strategy balances granularity with contextual coherence
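To make the similarity search concrete, here is a small in-memory sketch of ranking chunks by cosine distance, the metric that pgvector exposes through its `<=>` operator. The function names, the `(chunk_id, embedding)` row shape, and the table and column names in the comment are assumptions for illustration; the real schema is not described here.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def top_k(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the ids of the k rows closest to query_vec.

    In PostgreSQL with pgvector, a roughly equivalent query (with a
    hypothetical table and column) would be:
        SELECT id FROM chunks ORDER BY embedding <=> %s LIMIT 3;
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

The brute-force sort is only practical for small collections; at the scale of this corpus the database's vector index does the equivalent work.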
3. Retrieval Algorithm
Our RAG pipeline picks retrieval methods per agent. The main Mike agent uses a self-guided navigational search (from an AISX Lab paper) that iteratively gathers the best context before generation.
- Agent-Specific Retrieval: Each agent selects the retrieval style that fits its task (e.g., FAQ keywording, syllabus filters, or navigational search)
- Self-Guided Navigation: Mike refines its own lookups step by step to reach the most relevant sources
- Grounded Responses: Retrieved context is kept with the answer so users can see what was used
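The two ideas above, per-agent retrieval and step-by-step refinement, can be sketched together. This is a toy under stated assumptions, not the navigational search from the AISX Lab paper: `iterative_retrieve`, `RETRIEVERS`, `keyword_search`, `toy_refine`, and the corpus are all hypothetical, and in a real system the `refine` step would likely be a model call.

```python
from typing import Callable, Optional

Retriever = Callable[[str], list[str]]


def iterative_retrieve(
    query: str,
    search: Retriever,
    refine: Callable[[str, list[str]], Optional[str]],
    max_steps: int = 3,
) -> list[str]:
    """Search, let `refine` propose a follow-up query, stop when nothing new turns up."""
    gathered: list[str] = []
    current = query
    for _ in range(max_steps):
        new_hits = [hit for hit in search(current) if hit not in gathered]
        if not new_hits:
            break
        gathered.extend(new_hits)
        follow_up = refine(query, gathered)
        if follow_up is None:
            break
        current = follow_up
    return gathered


# Toy corpus and helpers, invented for this example.
CORPUS = {
    "parking": ["passage about parking"],
    "permits": ["passage about permit fees"],
}


def keyword_search(query: str) -> list[str]:
    """Return passages whose keyword appears in the query."""
    hits: list[str] = []
    for keyword, passages in CORPUS.items():
        if keyword in query.lower():
            hits.extend(passages)
    return hits


def toy_refine(query: str, gathered: list[str]) -> Optional[str]:
    """Stand-in for a model call: ask one follow-up after the first hit."""
    return "permits" if len(gathered) == 1 else None


# Each agent name maps to the retrieval style that fits its task.
RETRIEVERS: dict[str, Retriever] = {
    "faq": keyword_search,
    "mike": lambda q: iterative_retrieve(q, keyword_search, toy_refine),
}


def retrieve_for(agent: str, query: str) -> list[str]:
    """Dispatch the query to the retriever registered for this agent."""
    return RETRIEVERS[agent](query)
```

With this toy data, the "faq" agent returns only the parking passage for the query "parking", while the "mike" agent follows up once and also collects the permit passage.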
4. Response Generation
The generation step integrates retrieved context into the large language model prompt:
- Source Citation: Responses explicitly reference source documents for transparency
- Factual Grounding: Retrieved context constrains generation to reduce hallucination
- Guardrails: Multiple layers prevent inappropriate content and maintain LSU community standards
- Response Validation: Automated checks screen responses before delivery
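One plausible way to combine source citation with a simple validation check is to number the retrieved passages in the prompt and then verify that the answer cites only numbers that exist. The sketch below assumes that approach; `build_prompt`, `citations_valid`, the prompt wording, and the `[n]` citation format are illustrative choices, not the system's actual prompt or checks.

```python
import re


def build_prompt(question: str, sources: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (title, passage) pairs."""
    lines = [
        "Answer using only the numbered sources below.",
        "Cite sources as [n]. If the sources do not answer the question, say so.",
        "",
    ]
    for n, (title, passage) in enumerate(sources, start=1):
        lines.append(f"[{n}] {title}: {passage}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def citations_valid(answer: str, n_sources: int) -> bool:
    """Check that the answer cites at least one source and no out-of-range ones."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= c <= n_sources for c in cited)
```

A check like this only confirms that citations point at real sources; it does not confirm that the cited passage supports the claim, which would need a separate step.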
Technical Implementation
Python
Django
PostgreSQL
pgvector
Azure Cloud Services
Redis
Docker
OpenAI API
Infrastructure
- Cloud Deployment: Azure cloud infrastructure ensures scalability and reliability
- Caching Layer: Redis-based caching reduces latency for frequent queries
- Load Balancing: Distributed architecture supports a user base of 30,000+
- Monitoring: Real-time performance tracking and error detection
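The caching layer can be pictured as a cache-aside lookup keyed on a normalized form of the query. In the sketch below a plain dict stands in for Redis so the example runs on its own; `cache_key`, `cached_answer`, and the `answer:` key prefix are hypothetical, and a real deployment would presumably use a Redis client with an expiry on each entry.

```python
import hashlib
from typing import Callable


def cache_key(query: str) -> str:
    """Normalize case and whitespace so trivially different queries share a key."""
    normalized = " ".join(query.lower().split())
    return "answer:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cached_answer(query: str, cache: dict[str, str], compute: Callable[[str], str]) -> str:
    """Return a cached answer if present; otherwise compute, store, and return it."""
    key = cache_key(query)
    hit = cache.get(key)
    if hit is not None:
        return hit
    answer = compute(query)  # the expensive path: retrieval plus generation
    cache[key] = answer
    return answer
```

Exact-match keys only help with repeated or near-identical questions, which is where frequent queries tend to cluster.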
Design Philosophy
MikeGPT embodies a fundamental principle: retrieval design matters more than model size. Rather than relying on ever-larger language models, we invested in sophisticated information retrieval, resulting in:
- Cost Efficiency: Smaller models with better retrieval outperform larger models alone
- Accuracy: Grounded responses reduce hallucination and improve trustworthiness
- Transparency: Source citation enables users to verify information
- Scalability: Efficient architecture supports growing user base without proportional cost increase
This approach aligns with my broader conviction that AI should augment human capabilities rather than replace them, providing accurate information while maintaining human oversight and judgment.
Impact & Outcomes
- Community Adoption: Currently serving 30,000+ LSU students, faculty, and staff
- Query Success Rate: 95% hit rate demonstrates effective information retrieval
- Knowledge Access: Democratized access to university information across 38,000+ documents
- User Satisfaction: Positive feedback on response accuracy and helpfulness
- Custom Agents: Specialized assistants for different departments and use cases
Research Contributions
Beyond its practical impact, MikeGPT demonstrates important principles for production AI systems:
- RAG Architecture: Validates that thoughtful retrieval design can deliver production-grade results without massive models
- Privacy by Design: Strict data handling protocols ensure student query privacy
- Ethical AI: Transparency and source citation build trust and accountability
- Real-world Deployment: Successfully scaled from prototype to production serving 30,000+ users
Team & Acknowledgments
Research Advisor: Dr. James Ghawaly Jr., LSU AISX Lab
Collaborators: Jacob Nguyen, Bibushita Baral, Brandon Walton, Chloe Gray
Discover Day Presentation: Alam I, Nguyen J, Baral B. "MikeGPT: Enhancing LSU with AI." LSU Discover Day Undergraduate Research and Creativity Conference, April 25, 2025.
Support: LSU Office of Academic Affairs, LSU AISX Lab
My Role
Machine Learning Software Engineer
- Architected and implemented retrieval-augmented generation pipeline
- Designed vector database schema and semantic search algorithms
- Optimized retrieval algorithms to reach a 95% hit rate
- Led deployment to production serving 30,000+ users
- Developed custom agents for specialized university domains
- Implemented privacy-preserving data handling protocols
- Co-presented research at LSU Discover Day 2025