SJTU & Stanford Propose a "Long-Code Compression Powerhouse": 5.6x Extreme Slimming with No Performance Drop

When LLMs face tens of thousands of lines of legacy code, simply enlarging the context window often backfires into "nearsightedness": API costs surge, inference latency spikes, and critical dependencies get drowned out in the noise.

To tackle these challenges, a research team from Shanghai Jiao Tong University, Stanford University, and Chongqing University introduced an innovative code compression framework called LongCodeZip. It acts like a skilled surgeon, precisely “excising” redundant code while retaining only the most essential contextual information. Crucially, this framework is training-free, model-agnostic, and plug-and-play.


The paper has been accepted at ASE 2025, a CCF-A top-tier conference, and topped the Hugging Face Daily Papers ranking on the day of its release.

Core Idea: Retain Code That Reduces "Uncertainty"

Comparison of AMI versus traditional similarity matching methods

LongCodeZip's philosophy is both simple and profound: A code snippet's utility is determined by its ability to help the model better understand the user's instruction.

The researchers use a metric called Approximated Mutual Information (AMI) to measure this. Specifically, they calculate how much the model's perplexity in predicting the user instruction decreases once it has seen a given piece of code. Lower perplexity means less "uncertainty" about the instruction, indicating that the code snippet is more relevant to it. This approach captures deep logical dependencies far better than plain text similarity.
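To make this concrete, here is a minimal sketch of how such a perplexity-drop score could be computed with Hugging Face Transformers; the model choice and the helper names (`instruction_perplexity`, `ami_score`) are illustrative assumptions, not the authors' implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative scorer; the paper reports that even a 0.5B model suffices for scoring.
MODEL_NAME = "Qwen/Qwen2.5-Coder-0.5B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def instruction_perplexity(context: str, instruction: str) -> float:
    """Perplexity of the instruction tokens, conditioned on the given context."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    ins_ids = tokenizer(instruction, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, ins_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100          # score only the instruction tokens
    loss = model(input_ids=input_ids, labels=labels).loss
    return torch.exp(loss).item()

def ami_score(code_chunk: str, instruction: str) -> float:
    """Approximated Mutual Information: how much showing the code chunk
    lowers the model's perplexity on the user instruction."""
    return instruction_perplexity("", instruction) - instruction_perplexity(code_chunk, instruction)
```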

Based on this, LongCodeZip designed a two-stage compression process:

LongCodeZip Two-Stage Code Compression Process

Phase I: Coarse-grained Compression

First, the framework segments the entire codebase along function or class boundaries to ensure that each code block is semantically complete and syntactically valid. Next, it uses the aforementioned AMI metric to score and rank the relevance of each code block relative to the user instruction.

Finally, under a preset “coarse budget,” it greedily selects the highest-ranked functions and classes. Unselected code is replaced with brief placeholders (such as comments or ellipses). This drastically reduces the context length while preserving the overall structure of the code.
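A rough sketch of this coarse stage follows, reusing the illustrative `ami_score` and `tokenizer` from the snippet above; function extraction and budget accounting are heavily simplified, so treat it as an outline rather than the framework's actual code.

```python
def coarse_compress(functions: list[str], instruction: str, coarse_budget: int) -> str:
    """Phase I sketch: rank function/class-level chunks by AMI, greedily keep the
    top-ranked ones until the token budget is spent, and collapse the rest into
    placeholders so the file's overall structure is preserved."""
    order = sorted(range(len(functions)),
                   key=lambda i: ami_score(functions[i], instruction), reverse=True)
    kept, used = set(), 0
    for i in order:
        n_tokens = len(tokenizer(functions[i]).input_ids)
        if used + n_tokens <= coarse_budget:
            kept.add(i)
            used += n_tokens
    return "\n\n".join(fn if i in kept else "# ... omitted ..."
                       for i, fn in enumerate(functions))
```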

Phase II: Fine-grained Compression

Example of the Fine-grained Compression process

Inside the filtered key functions, LongCodeZip performs more refined “pruning.”

1. Perplexity-Based Block Segmentation: Lines of code within a function are treated as atomic units. When the perplexity of a specific line shows a sharp peak compared to its neighboring lines, this usually marks the beginning of a new semantic block (e.g., a new logical segment, an important conditional statement, or an external API call).

2. Adaptive Budget Allocation: Not all selected functions are equally important. The framework assigns an adaptive token budget to each function based on its Phase I AMI score—more important functions retain more detail.

3. Knapsack Algorithm for Optimal Selection: Within each function's budget limit, choosing which semantic blocks to keep so as to maximize retained information is a classic 0/1 knapsack problem. LongCodeZip solves it with dynamic programming, ensuring the selected combination of blocks achieves the highest possible information density within the limited budget (see the sketch after this list).
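Below is a minimal sketch of that knapsack step, assuming each semantic block already comes with a relevance value and a token cost; the function name and inputs are illustrative, not the paper's exact formulation.

```python
def select_blocks(blocks: list[str], values: list[float],
                  costs: list[int], budget: int) -> list[str]:
    """0/1 knapsack via dynamic programming: choose the subset of semantic blocks
    that maximizes total relevance without exceeding the function's token budget."""
    n = len(blocks)
    dp = [0.0] * (budget + 1)                      # dp[c]: best value within cost c
    keep = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        for c in range(budget, costs[i] - 1, -1):  # reverse order keeps items 0/1
            if dp[c - costs[i]] + values[i] > dp[c]:
                dp[c] = dp[c - costs[i]] + values[i]
                keep[i][c] = True
    chosen, c = [], budget                         # backtrack to recover the choice
    for i in range(n - 1, -1, -1):
        if keep[i][c]:
            chosen.append(i)
            c -= costs[i]
    return [blocks[i] for i in sorted(chosen)]     # restore original source order
```

For instance, with token costs [120, 45, 80], relevance values [0.9, 0.6, 0.2], and a 200-token budget, the DP keeps the first two blocks (total cost 165, total value 1.5).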

Experimental Results: 5.6x Compression, Efficiency Meets Performance

The research team comprehensively evaluated LongCodeZip across three typical tasks: code completion, module summarization, and repository question answering (RepoQA). Tested models included mainstream open-source models (e.g., DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, Seed-Coder-8B) and closed-source models (GPT-4o, Claude-3.7-Sonnet).

LongCodeZip Compression Effectiveness

The results were striking:

  • Up to 5.6x compression, with performance improving, not dropping: By filtering out significant noise, models performed even better on the compressed context than they did on the full context.
  • Significant cost and latency reduction: For Qwen2.5-Coder-7B in the code completion task, generation time was reduced from 15.7 seconds to 6.6 seconds, and input token costs were reduced by approximately 77%. The compression process itself only takes 2.6 seconds.
  • Astonishing cross-model generalization: even when a small 0.5B-parameter model performs the compression and the compressed context is then handed to a much stronger main model, performance is virtually unchanged. Users can therefore spend far less compute on the compression step without sacrificing quality (a minimal wiring sketch follows this list).
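In practice, that split could be wired up roughly as below, reusing the illustrative `coarse_compress` helper from the Phase I sketch; the generator model and function names are assumptions, not the project's actual API.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The small 0.5B scorer loaded earlier drives ami_score / coarse_compress;
# the generator below is a separate, stronger model.
GEN_NAME = "Qwen/Qwen2.5-Coder-7B-Instruct"
gen_tok = AutoTokenizer.from_pretrained(GEN_NAME)
gen_model = AutoModelForCausalLM.from_pretrained(GEN_NAME)

def answer_over_repo(functions: list[str], instruction: str, budget: int = 4096) -> str:
    """Compress with the cheap scorer, then generate with the expensive model."""
    context = coarse_compress(functions, instruction, budget)
    prompt = f"{context}\n\n# Task:\n{instruction}\n"
    inputs = gen_tok(prompt, return_tensors="pt")
    out = gen_model.generate(**inputs, max_new_tokens=512)
    return gen_tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
```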

Application Scenarios

LongCodeZip's plug-and-play nature gives it wide application prospects:

  • Cross-file Code Completion: When a developer’s intent spans multiple files and the model's context window cannot accommodate the entire project, LongCodeZip efficiently constructs a “golden context.”
  • Repository-Level Code QA: Quickly locating and understanding a specific implementation within a massive codebase without manually loading and reading thousands of lines of code.
  • Large Module Summarization and Code Review: Preparing a highly condensed context for code reviewers (human or AI) to help them quickly grasp the core module logic.

In summary, LongCodeZip takes a novel, information-theory-inspired approach to the core pain point of large language models handling long code contexts, clearing a major hurdle to their practical deployment in large-scale software engineering projects.

Paper Link: https://arxiv.org/abs/2510.00446

Project Address: https://github.com/YerbaPage/LongCodeZip


