Managing AI Coding with Agent Evaluation: Meituan's 310,000-Line Code Refactoring Practice
Industry News | AI Coding | Refactoring | Software Engineering

Meituan's technical team has described how it refactored 310,000 lines of code with AI. With AI now generating more than 90% of the code in some environments, the team argues that the priority shifts from coding speed to enforcing strict constraints. It applied an 'Agent evaluation' mindset to AI-driven development, combining technical debt analysis, rule construction, Standard Operating Procedures (SOPs), and a Pre-PR mechanism. According to the team, this framework turned large-scale refactoring from a high-cost, specialized project into a continuous part of daily iteration. The stated aim is for AI to improve system quality rather than amplify chaos, and the team presents the approach as a scalable model for long-term AI-native code management.

Meituan Technical Team

Key Takeaways

  • Scale: Refactored 310,000 lines of code using AI-driven methods.
  • Governance Over Speed: When AI generates more than 90% of code, the primary challenge shifts from output velocity to the enforcement of architectural constraints.
  • Agent Evaluation Mindset: Applied evaluation logic typically used for AI Agents to manage and audit the quality of AI-generated code.
  • Systematic Framework: Utilized a combination of technical debt assessment, rule-based governance, and Standard Operating Procedures (SOPs).
  • Continuous Refactoring: Implemented a Pre-PR mechanism that integrates code refactoring into daily development cycles, reducing the need for high-cost specialized projects.

In-Depth Analysis

The Shift from Generation to Governance

Meituan's technical team argues that the bottleneck in software engineering is no longer how fast code can be written, but how effectively it can be governed. When AI is responsible for over 90% of code generation, the absence of uniform standards lets inconsistencies compound quickly, and system chaos grows with every change. AI is highly productive, but it does not inherently understand the long-term architectural goals of a complex system. The technical team's role therefore shifts from manual coding to defining the constraints and rules that guide AI behavior. The goal is for AI-generated code to meet the same quality and consistency standards as human-written code, so that unmanageable technical debt does not accumulate.

The Agent Evaluation Framework for AI Coding

To manage a massive 310,000-line refactoring effort, Meituan adopted an "Agent evaluation" approach. This methodology treats the AI as an autonomous agent whose outputs must be continuously validated against a set of predefined criteria. The process is structured around several key technical pillars:

  1. Technical Debt Assessment: Before refactoring begins, the system identifies existing debt to prioritize areas for AI intervention.
  2. Rule Construction: The team establishes a set of coding rules that the AI must follow to maintain system integrity.
  3. Refactoring SOPs: Standard Operating Procedures provide a repeatable, reliable workflow for AI-driven changes, ensuring that the refactoring process is consistent across different modules.
  4. Pre-PR Mechanism: By introducing a verification stage before the Pull Request (PR), the team can catch and correct AI errors early, ensuring that only high-quality, compliant code enters the main branch.
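The article does not show how the rules or the Pre-PR stage are implemented, so the following is only a minimal sketch of the general idea: coding rules expressed as small checks, run against a candidate change before a PR is opened. The rule names, the 40-line threshold, and the `pre_pr_gate` function are all hypothetical and are not taken from Meituan's system.

```python
import ast
from dataclasses import dataclass


@dataclass
class Violation:
    rule: str
    line: int
    message: str


def check_function_length(tree: ast.AST, max_lines: int = 40) -> list[Violation]:
    """Hypothetical rule: flag functions longer than max_lines."""
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                found.append(Violation("function-length", node.lineno,
                                       f"{node.name} spans {length} lines"))
    return found


def check_bare_except(tree: ast.AST) -> list[Violation]:
    """Hypothetical rule: flag `except:` clauses with no exception type."""
    return [
        Violation("bare-except", node.lineno, "bare except hides errors")
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


# The rule set is an assumption; a real one would encode the team's own standards.
RULES = [check_function_length, check_bare_except]


def pre_pr_gate(source: str) -> list[Violation]:
    """Run every rule on the candidate source; an empty list means the gate passes."""
    tree = ast.parse(source)
    violations = []
    for rule in RULES:
        violations.extend(rule(tree))
    return sorted(violations, key=lambda v: v.line)


# Example candidate change, standing in for AI-generated code.
candidate = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None
'''

for v in pre_pr_gate(candidate):
    print(f"line {v.line}: [{v.rule}] {v.message}")
```

In a setup like this, a non-empty result would be fed back to the AI for correction rather than passed on to human reviewers, which is one plausible reading of catching errors "before the Pull Request."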

Integrating Refactoring into the Daily Workflow

One of the most significant outcomes of this practice is that refactoring changes from a "special project" into a "daily habit." Historically, refactoring hundreds of thousands of lines of code would typically require a dedicated task force and significant downtime. By combining AI with the Pre-PR mechanism, Meituan reports that it can refactor continuously during regular feature iterations. This keeps the codebase healthy and up to date without periodic, high-risk overhauls. Code quality becomes a byproduct of the standard development lifecycle rather than an afterthought.
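One way to picture refactoring as part of a regular iteration is to pick, for each feature change, a small batch of known debt items in the files that change already touches. The sketch below illustrates that idea only. The `DebtItem` fields, the severity scale, the line budget, and the file paths are hypothetical, and the article does not say that Meituan selects work this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    path: str
    severity: int  # 1 (minor) to 5 (severe); hypothetical scale
    effort: int    # estimated lines to change; hypothetical estimate


def pick_daily_batch(backlog, touched_paths, line_budget):
    """Choose debt items in files the current change already touches,
    highest severity first, without exceeding the line budget."""
    candidates = [item for item in backlog if item.path in touched_paths]
    candidates.sort(key=lambda item: (-item.severity, item.effort))
    batch, used = [], 0
    for item in candidates:
        if used + item.effort <= line_budget:
            batch.append(item)
            used += item.effort
    return batch


# Illustrative backlog; paths and numbers are made up for the example.
backlog = [
    DebtItem("order/service.py", severity=5, effort=120),
    DebtItem("order/service.py", severity=3, effort=40),
    DebtItem("pay/client.py", severity=4, effort=60),
    DebtItem("user/profile.py", severity=5, effort=30),
]

batch = pick_daily_batch(
    backlog,
    touched_paths={"order/service.py", "pay/client.py"},
    line_budget=200,
)
for item in batch:
    print(item.path, item.severity, item.effort)
```

Capping each batch with a budget keeps the refactoring small enough to review alongside the feature work, which is the property that separates a daily habit from a dedicated project.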

Industry Impact

Meituan's practice offers a notable precedent for the AI-native era of software engineering. It suggests that scaling AI in development depends less on more powerful models than on better management frameworks. By sharing how it refactored 310,000 lines of code, the team provides a reference that other large technology organizations can draw on as their codebases become AI-heavy. This shift toward "AI-managed AI," in which automated systems and evaluation logic oversee code generation, marks an important change in how software is maintained and scaled in the age of Large Language Models (LLMs).

Frequently Asked Questions

Question: Why is the 90% AI-generated code threshold significant?

At this level, the volume of code produced by AI exceeds what traditional manual review can handle. Without strict constraints and automated governance such as the Agent evaluation mindset, AI can amplify existing inconsistencies in the system and build up technical debt very quickly.

Question: What role does the Pre-PR mechanism play in AI coding?

The Pre-PR mechanism acts as a critical quality gate. It allows the system to evaluate AI-generated refactoring against established rules and SOPs before the code is even submitted for human review, ensuring that refactoring becomes a seamless part of the daily development iteration.

Question: How does Meituan's approach reduce the cost of refactoring?

By using AI to handle the bulk of the work and using SOPs to standardize the process, the team moves away from high-cost, manual refactoring projects. This allows for continuous improvement of the codebase, which is much more cost-effective than performing large-scale, disruptive refactoring every few years.
