PECAN

Programming Encoder Classification Analysis Network

Advancing Lightweight AI for Programming Language Identification

AISX Lab, Louisiana State University – Faculty Advisor: Dr. James Ghawaly | Research Collaborator: Jackson Descant

Overview

PECAN (Programming Encoder Classification Analysis Network) is a research initiative focused on improving programming language identification through encoder-only models. The project aims to design efficient, scalable, and accurate language classification systems that advance the field of software engineering and code understanding.

Research Focus: Developing next-generation AI models for accurate, lightweight programming language detection that can handle the complexity and diversity of modern codebases.

Motivation

Existing tools such as GuessLang and GitHub Linguist provide limited accuracy and scalability when faced with modern, diverse codebases. PECAN addresses this gap by leveraging deep learning techniques to enhance generalization, handle multilingual repositories, and maintain lightweight model architectures suitable for real-world deployment.

Key Challenges Addressed

Approach

Model Architecture

We train and evaluate multiple encoder-only transformer models that identify programming languages from raw code snippets. The approach focuses on lightweight architectures that can be deployed efficiently in production environments while targeting state-of-the-art accuracy.
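Because the models classify raw code snippets, each snippet first has to become a fixed-length sequence of token ids that an encoder-only classifier can consume. The sketch below assumes a simple byte-level scheme: UTF-8 bytes shifted past two reserved ids, a prepended classification token, truncation, and padding with an attention mask. The constants MAX_LEN, PAD_ID, CLS_ID, and BYTE_OFFSET and the helper encode_snippet are hypothetical; PECAN's actual tokenizer is not described here.

```python
from typing import List, Tuple

# Hypothetical settings for illustration only; PECAN's real tokenizer and
# maximum sequence length are not specified on this page.
MAX_LEN = 16
PAD_ID = 0
CLS_ID = 1
BYTE_OFFSET = 2  # shift byte values so ids 0 and 1 stay reserved for PAD and CLS


def encode_snippet(code: str, max_len: int = MAX_LEN) -> Tuple[List[int], List[int]]:
    """Map raw source text to fixed-length token ids plus an attention mask."""
    raw = code.encode("utf-8")[: max_len - 1]  # truncate, leaving room for CLS
    ids = [CLS_ID] + [b + BYTE_OFFSET for b in raw]
    mask = [1] * len(ids)  # 1 marks real tokens the encoder should attend to
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [0] * pad
```

Byte-level input keeps the vocabulary tiny and never produces unknown tokens, which suits code written in hundreds of languages, at the cost of longer sequences than a subword tokenizer would produce.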

Dataset Development

We initially use the GuessLang dataset while constructing a much larger custom dataset of more than 42 million code samples, designed to capture language diversity, syntax variability, and real-world code structure.
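With hundreds of languages of very different popularity, one common way to keep a large dataset diverse is to cap how many samples each language contributes and to split every language separately into training and validation sets. This is a minimal sketch under that assumption; stratified_split, cap_per_language, and val_fraction are hypothetical names, not part of a published PECAN pipeline.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (code, language label)


def stratified_split(
    samples: List[Sample],
    val_fraction: float = 0.1,
    cap_per_language: int = 1000,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Cap each language's sample count, then split each language separately."""
    by_lang: Dict[str, List[Sample]] = defaultdict(list)
    for code, lang in samples:
        by_lang[lang].append((code, lang))

    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    for lang in sorted(by_lang):  # fixed iteration order for reproducibility
        group = by_lang[lang]
        rng.shuffle(group)
        group = group[:cap_per_language]  # keep frequent languages from dominating
        # Languages with a single sample go entirely to training.
        n_val = max(1, int(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Splitting per language guarantees that rare languages still appear in validation whenever they have at least two samples, which a single random split over the whole corpus does not.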

42M+ Code Samples

319 Programming Languages (Expanding)

Multi-GPU Distributed Training
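Multi-GPU data-parallel training typically gives each GPU (rank) a disjoint slice of the dataset every epoch. The sketch below reproduces, in plain Python, the index assignment that PyTorch's DistributedSampler performs with shuffle=False and drop_last=False. Whether PECAN uses that sampler is an assumption, and shard_indices is a hypothetical helper written for illustration.

```python
import math
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the dataset indices that one rank processes in an epoch.

    Pads the index list by wrapping around to the start so it divides evenly
    across ranks, then gives each rank every world_size-th index.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    indices = list(range(num_samples))
    padding = total - num_samples
    # Repeat indices from the start to fill the padding (empty when padding == 0).
    indices += (indices * math.ceil(padding / max(num_samples, 1)))[:padding]
    return indices[rank:total:world_size]
```

For example, shard_indices(10, 4, 2) returns [2, 6, 0]: with 10 samples and 4 GPUs, samples 0 and 1 are reused once so that every rank processes the same number of samples and the ranks stay in step during gradient synchronization.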

Training & Evaluation

Technical Stack

PyTorch, Hugging Face Transformers, Weights & Biases, Python, CUDA, Multi-GPU Training, Transformer Models, NLP

Poster & CV

View the PECAN poster draft and download my CV.

PECAN Poster (Draft)


Download CV (PDF) →

Research Goals

Impact & Applications

PECAN has the potential to significantly impact various areas of software engineering and development: