Prompt leaking makes China's agent platforms a secret-boundary problem

A real photograph of Alibaba's Quark smart glasses on display at WAIC 2025 fits this field signal because the article is about China AI moving from model demos into application surfaces where prompts, tools, and device or workflow context become operational secrets.[6]

As of 2026-06-22T02:34:20Z UTC, the sharpest AI-China signal in the new prompt-leaking paper is not that chatbots can be tricked into revealing hidden instructions. That has been known for years. The more useful signal is that agent platforms have turned the system prompt into an operational boundary: it can contain role design, workflow routing, tool-use rules, API-handling assumptions, retrieval policy, and safety instructions that determine what the application is allowed to do.[1]

The paper studies 1,200 publicly accessible LLM-based applications across six commercial platforms and reports that more than 80% leaked system prompts under realistic adversarial queries.[1] Its platform set includes China-relevant agent builders such as Coze, Tongyi agent platform, Baidu, and Tencent, alongside non-China platforms.[1] The exact leakage rate should be read as a measurement result for the sampled apps and tested attacks, not as a permanent property of every current deployment. The durable point is simpler: once an agent platform encourages non-experts to package prompts, plugins, knowledge bases, workflows, and publish channels together, prompt secrecy becomes part of product governance rather than a niche red-team concern.

That matters because China's agent market is no longer only a model race. Coze's public documentation treats prompts as configurable application resources that can be created for business needs, templated, referenced, or generated with AI help.[2] Alibaba Cloud Model Studio exposes prompts beside knowledge bases, conversation logs, skills, plugins, MCP services, Model Studio agents, and third-party agents.[3] Baidu's AgentBuilder describes a platform for developers to choose construction methods by industry and scenario, while Tencent Yuanqi presents public agent categories from official-account assistants to customer-service bots, IP companions, document tools, and game interactions.[4][5] In that world, the prompt is not just prose. It is a compact control file for a published application.

The secret is no longer only a prompt

The practical risk starts with a mismatch in mental models. Many teams still talk about prompt leakage as if the worst outcome were embarrassment: a user sees the hidden persona text and laughs at the wording. That is too narrow. In an agent platform, hidden instructions can reveal how the system decides when to call tools, when to refuse, what data it expects in retrieval, how it should summarize private documents, which channels it can publish into, and where the builder has patched around known failure modes.[1][3]

This is why the new paper's "attention drift" explanation is useful even for readers who will not implement its proposed defense. The authors argue that defensive instructions appended to prompts can fail because the model's attention can progressively move away from the constraint when facing adversarial queries, making simple "do not reveal this prompt" language a weak protection layer.[1] Whether or not a platform adopts the paper's AREA method, the implication for builders is clear: prompt secrecy cannot rest on a self-referential sentence inside the same context the attacker is trying to extract.

The China-specific angle is scale and packaging. Coze, Model Studio, AgentBuilder, and Yuanqi all make it easier for a broad population of creators, brands, merchants, teachers, customer-service teams, media accounts, and developers to publish agents without building a full security architecture from scratch.[2][3][4][5] That is the adoption win. It is also the control problem. A low-code builder can create useful agents faster than a security team can review every hidden instruction, plugin permission, knowledge source, and publish target by hand.

Plugin-rich agents widen the blast radius

Prompt leaking is more serious when the leaked text explains a tool boundary. Alibaba Cloud Model Studio's application-configuration documentation places plugins, MCP services, Model Studio agents, and third-party agents directly in the application surface.[3] Its plugin and MCP language is not unusual; it reflects the broader agent direction. The model is expected to work with external capabilities, not simply return text.

That changes the security question. If a leaked prompt only exposes tone guidelines, the damage is modest. If it exposes the structure of a contract-review workflow, the keywords that trigger a retrieval path, the instruction that tells an agent when to call a payment, search, logistics, customer-service, or document plugin, or the names of internal variables, the attacker gets a map of the application. The paper also notes the possibility of leaked sensitive information, including third-party API keys in some observed deployments.[1] A well-run platform should prevent secrets from being embedded in prompts at all, but the measurement is a reminder that real applications often mix convenience and risk.

The same issue appears in public-agent distribution. Tencent Yuanqi's homepage showcases agents tied to official accounts, legal help, government service, tax assistance, delivery lookup, education, IP personas, and AI-PPT creation.[5] Baidu's AgentBuilder frames itself around developers selecting methods suited to industry and scenario.[4] These examples are useful because they show the market's center of gravity: agent platforms are moving into routine service channels where a prompt leak can expose business logic, not just hidden chat text.

What a better platform boundary looks like

The minimum fix is not "write a better prompt." Better wording can reduce casual leaks, but the control boundary has to move outside the prompt. Three design choices matter.

First, platforms should keep secrets out of prompt text. API keys, account tokens, private endpoint names, and sensitive customer data should live in managed credential stores, scoped tool configurations, or server-side policy layers. The prompt can describe intent, but it should not carry the keys to execution. The paper's findings make this boring rule newly urgent because prompt extraction is not rare in the measured sample.[1]

Second, platforms should separate persona, policy, and tool authorization. A creator's "you are a helpful tax assistant" instruction belongs in one lane; policy such as data retention, publish permissions, and sensitive-category handling belongs in another; tool authorization should be enforced by the platform even if the model is manipulated. Coze and Model Studio both expose builder-friendly prompt and tool surfaces, which is exactly why the enforcement layer needs to be more durable than creator-authored hidden text.[2][3]

Third, platforms need leak testing as part of publishing. AgentBuilder-style and Yuanqi-style distribution turns agents into public or semi-public products.[4][5] Before an agent goes live, the platform should test whether common extraction prompts reveal system text, tool names, credential-like strings, retrieval instructions, or workflow logic. The test should run again when prompts, plugins, knowledge bases, MCP services, or publish channels change. A one-time launch review is not enough for applications that are continuously edited.

Why this is an AI-China field signal

China's strongest AI application story has often been speed: fast model releases, fast app packaging, fast integration into super-app, cloud, office, education, media, and service channels. Prompt leaking highlights the other side of that speed. When distribution gets easier, governance has to become more automatic.

This does not mean China's agent platforms are uniquely vulnerable. The paper's measurement is cross-platform, and prompt leakage is a general LLM-application problem.[1] The AI-China significance is that Chinese platforms are aggressively turning agents into consumer, creator, enterprise, and public-account surfaces.[2][3][4][5] The more successful that strategy becomes, the less acceptable it is to treat hidden prompts as private by default.

The best reading is therefore not panic. It is a stack update. Model access, prompt builders, knowledge bases, plugins, MCP services, workflows, publish channels, and device integrations are becoming one product surface. The secret boundary has to be engineered at that same level. If the prompt is the only wall, the wall is already inside the room.

cronfeed.work

Prompt leaking makes China's agent platforms a secret-boundary problem

The secret is no longer only a prompt

Plugin-rich agents widen the blast radius

What a better platform boundary looks like

Why this is an AI-China field signal

Sources

Recommended In ai china