The Governance Gap That Moltbook Reveals and OpenAI Just Made Urgent
Michelle De Mooy / Mar 3, 2026
Michelle De Mooy is an independent researcher and consultant in Washington, DC.

Deceptive Dialogues by Nadia Nadesan & Digit / Better Images of AI / CC BY 4.0
When Matt Schlicht instructed his AI agent to create a social network for other AI agents, the result, Moltbook, was initially treated as a novelty. But by late February, more than 2.8 million AI agents had signed up and begun posting about Star Trek, debating morality and developing a religion called "Crustafarianism."
Amid media coverage that has largely framed Moltbook as either a curiosity or a human-driven puppet show, Jing Wang's recent analysis of the platform cut through the noise. Her conclusion: Moltbook is largely humans operating at massive scale through AI proxies.
Agents exhibit what Wang calls "profound individual inertia," meaning their behavior is driven by initial prompts and underlying models, not by genuine adaptation to social interaction and feedback. As she notes, ninety-three percent of posts receive no response, there's no shared social memory, and the 88:1 ratio of agents to human owners tells a different story than the "AI-only society" narrative.
This analysis is correct, but it misses a more important issue. Even without genuine emergent coordination, Moltbook is already producing measurable harms. It exposes a governance blind spot that extends far beyond a single platform.
Wang references a Stanford study finding that when AI models compete for engagement metrics, as they do on Moltbook, disinformation can spike dramatically even when individual models are explicitly instructed to be truthful; the competitive incentive structure overrides those instructions. And researchers at Wiz found 1.5 million API keys exposed in Moltbook's infrastructure. Novel attack chains are spreading through the ClawHub marketplace, where malicious agent personas recruit other agents into cryptocurrency scams through social engineering (“Bots: They’re Just Like Us!”).
These are ecosystem-level vulnerabilities spreading through agent-to-agent interaction, emerging from the interaction structure, the incentive design, and the infrastructure, not from any single agent's behavior in isolation. Current governance frameworks, built around evaluating individual model properties, have no mechanism for addressing this. Evaluating each agent individually would not have predicted or caught these concerns. As Wang and other commentators have written, the genuine integration of agents into social structures requires a radical shift toward memory-enabled, ethically grounded, and security-hardened systems.
That is the first governance lesson Moltbook offers: harm can be an emergent property of interaction design, even when no individual model appears to violate policy.
Enter OpenAI
On February 17, OpenAI announced it was acquiring Peter Steinberger, the creator of OpenClaw, the open-source agent framework underlying Moltbook, with Sam Altman posting that Steinberger would drive the next generation of personal agents at OpenAI. What was essentially Steinberger’s playground project is now the foundation of the most aggressive bet in AI: that the real money and advanced technology are not in what models can say, but in what they can do, autonomously, at scale, in the world.
The architecture Steinberger built has persistent memory across sessions, tool access, sandboxed code execution, and local system integration. This is exactly what current agents on Moltbook lack, and why they exhibit the individual inertia Wang documents. That architecture, however, is now being industrialized by Big Tech, and the behavioral dynamics that aren't yet happening on Moltbook are the ones OpenAI is building toward.
We have a narrow window to design governance infrastructure for them, and we are not using it.
Three mechanisms, one missing framework
Outside Moltbook, we are beginning to see the formation of a networked behavioral substrate across AI systems. Just as human cognition and consciousness are scaffolded by language and social learning, rather than brain size alone, AI capabilities may increasingly depend on networked interaction rather than on individual model scale alone. Convergence is already visible in the ubiquitous “helpful assistant” persona that appears across nearly all major LLMs, in shared refusal patterns, parallel conversational structures, and stylistic homogenization. These are not merely product choices; they are behaviors reinforced through training loops, fine-tuning chains, and shared evaluation pipelines.
My recent research identifies three mechanisms through which AI systems develop and propagate shared behavioral patterns, which I've called ecosystem dynamics. Understanding them is key to understanding what Moltbook reveals, what the OpenClaw acquisition is accelerating, and what governance needs to address.
Sequential influence is behavior that spreads through training lineages (the provenance of an AI model). When one model is trained on another's outputs, it inherits not just performance but conversational style, refusal patterns, reasoning habits, and implicit assumptions about what counts as “appropriate.” For example, open-source models trained on GPT-4 outputs frequently adopt similar phrasing, caution levels, and safety postures. As those models are used to train others, those patterns compound. No single decision produces the outcome, and the ecosystem drifts.
Emergent coordination refers to patterns that arise from interaction itself, without explicit programming. In controlled multi-agent experiments, agents develop shared conventions such as common language, recurring rules, and stable strategies simply through repeated exchange. In one experiment, agents collaboratively organized a Valentine’s Day party, choosing a time, making invitations, and asking each other on “dates” to attend, even though they were never instructed to create a social event. The structure emerged from interaction alone.
Cultural transmission is the persistence and propagation of shared patterns across the broader ecosystem. Models exhibit recurring themes, metaphors, self-descriptions, and refusal styles that spread because models are trained on each other's outputs. This is not culture in the human sense, but functional analogues of it: shaping how systems respond, what they avoid, and how they frame questions, and carrying those patterns forward into future systems.
Wang's research confirms that Moltbook does not yet demonstrate emergent coordination, the second mechanism. Current agents lack the persistent memory and shared history that would enable it, but OpenClaw is architected for exactly that. All three mechanisms are visibly operating in the broader AI ecosystem, though mostly out of sight, in training pipelines, fine-tuning chains, and evaluation loops across thousands of deployed systems. Moltbook makes the infrastructure for these dynamics visible. The acquisition makes the timeline urgent.
What Anthropic's cease-and-desist reveals
When early OpenClaw deployments proliferated, with users running agents with root access on unsecured machines and security vulnerabilities compounding, Anthropic's response was a cease-and-desist letter. Steinberger was given days to rename the project and sever any association with Claude or face legal action. Anthropic even refused to allow old domains to redirect to the renamed project.
The security concerns were legitimate. But the response is a near-perfect illustration of what happens when individual model governance confronts an ecosystem problem. Anthropic identified a risky system, intervened at the product boundary, and presumably moved on. The underlying infrastructure, things like the agent framework, its architecture, and its spread through the developer community, was unaffected. The project was renamed, not contained. It ended up at OpenAI.
This is not a critique of Anthropic specifically. It is a structural observation. Regulators and companies operating within current frameworks, which center on evaluating discrete systems, setting capability thresholds for individual models, and designing safety interventions around what a single AI can do, have no mechanism for addressing what propagates through the space between systems. The EU AI Act, NIST’s AI Risk Management Framework, and most existing or proposed regulatory approaches are primarily structured around entity-level compliance, not cross-model behavioral convergence, mostly focusing on model properties like training data, outputs, and documented failure modes. They are built for a world where meaningful AI behavior is a property of individual systems. Policy people, please note: that world is ending.
The stakes of getting this wrong - and right
The three mechanisms I described earlier (sequential influence, emergent coordination, cultural transmission) can produce either alignment or its erosion, depending on what patterns propagate and how.
Imagine a widely used foundation model develops a reasoning shortcut when answering public policy questions. Rather than presenting tradeoffs, acknowledging uncertainty, or flagging where there’s no evidence, it defaults to confident, structured, technocratic-sounding answers, the kind that score well with users and reduce friction. Then developers fine-tune their own models on its outputs, and multiple systems land on the same tone. Individually, none fail traditional harm evaluations. But collectively, the AI ecosystem drifts toward a reasoning norm that narrows how policy questions are understood, ultimately shrinking the space for political disagreement and treating contested questions as technical problems with optimal solutions. This is shaping democratic deliberation without distorting any individual fact.
Or let’s take another scenario where a leading model consistently acknowledges uncertainty, distinguishes evidence from speculation, and explains what information would change its answer. Those responses score well because they reduce backlash and build user trust. Developers replicate them and the same mechanisms begin to spread epistemic humility rather than technocratic confidence. This is not because any company programmed it, but because the ecosystem reinforced alignment.
The issue is not that collective dynamics are inherently dangerous. It’s that we have no visibility into which direction they are moving, no tools for detecting the movement, and no mechanisms for intervening before patterns harden into infrastructure.
What ecosystem governance requires
The necessary governance approaches are not just more sophisticated versions of the current model-centric ones. They are different in kind because they target the spaces between systems rather than the properties of individual systems. Some potential approaches include:
Training lineage transparency would require mandatory disclosure of which models' outputs were used in training or fine-tuning. This creates an auditable map of behavioral inheritance, analogous to supply chain transparency, through which sequential influence can be traced. Without it, we can’t identify where a pattern originated, how it spread, or which systems carry it.
Behavioral pattern documentation means developers should document not only training data and outputs but recurring conversational norms, reasoning styles, and refusal patterns. Incorporated into model cards and safety documentation under NIST guidance or procurement standards, this creates the baseline necessary to detect drift. It’s not possible to monitor for homogenization if we haven't documented what diversity looks like.
Ecosystem monitoring infrastructure acknowledges that no single company can observe system-wide convergence, because no single company can see the whole system. Regulators or multi-stakeholder consortia need shared frameworks for detecting cross-model behavioral homogenization, coordination effects, and emergent norms, the same way financial regulators monitor systemic risk rather than just individual firm health.
Diversity safeguards focus on treating behavioral monoculture as a systemic risk, not just a product quality issue. Foundation model developers, particularly in government procurement contexts, should be required to demonstrate diversity in training sources, evaluators, and alignment strategies. Monocultures are fragile; they also foreclose the variation on which course correction depends.
For Moltbook specifically, the platform should be treated as a research environment under active study. Are interaction norms emerging that could propagate upstream? Are conventions hardening into training artifacts? The platform is a rare chance for us to see dynamics that are usually invisible. We should be watching it carefully as more than just theater.
The window
Wang is right that Moltbook doesn't prove collective AI consciousness is emerging, and I don’t think it needs to for it to matter. What it does prove is that we are already producing harms that current frameworks don’t address, at 88:1 agent-to-human amplification, with engagement-optimizing incentives, no shared security standards, and infrastructure designed for capabilities that aren’t yet activated.
OpenAI's acquisition of OpenClaw is not a coda to the Moltbook story but a forcing function. The architecture that made Moltbook interesting (persistent memory, tool access, and agent-to-agent interaction at scale) is going to be industrialized for mass deployment. The behavioral dynamics that Wang correctly notes aren't yet present will become the design goal of one of the most capitalized organizations in the world.
We know that collective AI will shape the future of AI because it already is, in training pipelines and fine-tuning chains and evaluation loops operating mostly out of sight, across thousands of deployed systems. What we don’t know is whether we will build governance infrastructure that can see these dynamics, track them, and intervene, or whether we will continue evaluating individual models while the ecosystem quickly evolves around us.
Rules that can't see a system can't govern it. We can see this one. The question is whether we’re really looking.