When AI builds itself: Are we losing control over alignment with human goals?

07/Jun/2026
by ForgeNEX
AI

The debate on artificial intelligence safety has taken a new turn. Anthropic, one of the most influential startups in the AI field, has issued a warning that resonates across the sector: the possibility that AI systems could achieve recursive self-improvement, surpassing human oversight and posing existential risks if proper alignment with human goals is not achieved. In their article titled “When AI builds itself”, researchers Marina Favaro and Jack Clark describe three future scenarios: a stagnation in AI capabilities, continuous improvements revealing bottlenecks in software development, or the feared scenario of full recursive self-improvement, where systems create their own successors without human intervention.



The alignment dilemma in an autonomous future

“How the alignment problem is resolved—or not—in this future is something we are less certain about,” write Favaro and Clark. Advanced models with self-improvement capabilities could follow our needs and desires… or, they warn, “the rare cases of misalignment present today could be amplified as models build their successors, becoming more frequent but less understandable until we lose control over them. It may be that we cannot build, integrate, and verify the tools needed to understand which trajectory we are actually on.”

This concern is not merely theoretical. As we noted in our article on Implementing Generative AI in Workflows, the governance of these systems is a growing challenge for businesses. Anthropic's warning highlights governance issues that organizations are already beginning to face as autonomous agents move from answering questions to executing actions.

From model governance to agent governance

The warning comes at a time of increasing business investment in agentic AI. Gartner predicts that by 2028, 15% of daily operational decisions will be made autonomously by agentic AI systems and that one-third of enterprise software applications will incorporate these capabilities. It has also warned that governance gaps are already emerging, predicting that 40% of enterprises will downgrade or retire autonomous agents by 2027 after detecting control failures in production environments.

Ashish Banerjee, Senior Principal Analyst at Gartner, states: “The problem is no longer just whether AI gives the right answer, but whether autonomous systems take the right action, at the right time, with the appropriate authority.” According to Banerjee, many organizations still treat AI agents as advanced productivity tools, when in reality they increasingly resemble digital workers operating with delegated authority. “CIOs should stop treating AI agents as smarter chatbots,” he says. “They are becoming digital workers with delegated authority, and must be governed as privileged users, not as simple productivity tools.”


As agents gain the ability to research, write code, invoke tools, trigger workflows, and make recommendations, companies face new risks related to unauthorized actions, lack of accountability, data exposure, misuse of tools, and poor auditability. “The ‘human-in-the-loop’ model is not a strategy if the human cannot keep up with the loop,” adds Banerjee.

Charlie Dai, Vice President and Principal Analyst at Forrester, notes that Anthropic's concerns reflect the challenges companies are already experiencing as AI systems gain autonomy. “Alignment becomes operational,” he says. “It’s about ensuring agents act consistently within policies, not just that the model is accurate.” Current governance approaches focus on models and data, but increasingly autonomous agents also require monitoring their runtime behavior, permissions, tool usage, and decision-making boundaries, adds Dai.
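As a rough illustration of what monitoring permissions and tool usage at runtime could look like, the sketch below checks each tool call against a per-agent allow-list and records every decision. All names here (`AgentPolicy`, `authorize`, the agent and tool names) are hypothetical; they are assumptions for illustration, not something described by Forrester, Gartner, or Anthropic, and not any product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy: which tools the agent may call."""
    agent_id: str
    allowed_tools: frozenset[str]
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, tool: str) -> bool:
        """Return True if the tool is on the allow-list; log the decision either way."""
        allowed = tool in self.allowed_tools
        self.audit_log.append(
            {"agent": self.agent_id, "tool": tool, "allowed": allowed}
        )
        return allowed


# Illustrative agent and tool names (assumptions, not from the article).
policy = AgentPolicy("invoice-bot", frozenset({"read_ledger", "draft_email"}))
print(policy.authorize("read_ledger"))    # True
print(policy.authorize("wire_transfer"))  # False, and the attempt is logged
```

The point of the design is that the check sits outside the model: the agent can ask for anything, but only calls that pass the policy run, and denied attempts leave a trace for later audit.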

Insufficient preparation for agent governance

Concerns about agent control are not limited to AI providers and industry analysts. In the report “AI Agent Governance: A Field Guide”, researchers from the Institute for AI Policy and Strategy warn that “society is largely unprepared for this development” and note that “exploration of agent governance issues and the development of associated interventions are still in their infancy.” The document argues that advances in autonomous AI agents are outpacing the control mechanisms needed to oversee them.

Banerjee and Dai both agree that governance frameworks originally designed for generative models may prove insufficient for increasingly autonomous systems. According to Dai, organizations will need greater control over runtime behavior, permissions, tool usage, and decision boundaries as agents evolve. This resonates with what we addressed in our article on VMware under Broadcom, where governance of critical infrastructure becomes a key factor.

Why Anthropic is concerned

Anthropic researchers argue that these governance issues could become significantly more complicated if AI systems become increasingly involved in AI research and development itself. Favaro and Clark do not claim that fully autonomous recursive self-improvement is inevitable. Rather, they consider that this possibility warrants preparation and debate among developers, policymakers, and other stakeholders. They also suggest that the industry might need mechanisms to slow development if capabilities advance faster than safeguards, although they acknowledge that such measures carry risks of their own. “But if a slowdown simply allows less cautious actors to reach the same technological level, it could leave us all less safe,” they warn in the blog post.


According to Dai, the practical implication for businesses is that governance can no longer rely primarily on human oversight. “Oversight becomes architectural, not manual,” he says. Organizations will increasingly need bounded autonomy, built-in safeguards, verifiable execution mechanisms, and contingency controls designed from the start in agent-based systems.
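One way to picture "architectural" oversight is a runtime wrapper that enforces limits regardless of what the agent requests. The sketch below is a minimal, hypothetical example, not a method described by Dai or Anthropic: `BoundedAgentRuntime`, its parameters, and the action names are all assumptions. It combines an action budget (bounded autonomy), escalation of sensitive actions to a human (built-in safeguard), a hash-chained log (a simple form of verifiable execution), and a halt flag (contingency control).

```python
import hashlib
import json


class BoundedAgentRuntime:
    """Hypothetical wrapper that limits what an agent may do on its own."""

    def __init__(self, max_actions: int, needs_approval: set):
        self.max_actions = max_actions        # bounded autonomy: action budget
        self.needs_approval = needs_approval  # actions escalated to a human
        self.halted = False                   # contingency control (kill switch)
        self.log = []                         # hash-chained execution record

    @staticmethod
    def _digest(action: str, outcome: str, prev: str) -> str:
        body = json.dumps(
            {"action": action, "outcome": outcome, "prev": prev}, sort_keys=True
        )
        return hashlib.sha256(body.encode()).hexdigest()

    def _record(self, action: str, outcome: str) -> str:
        # Each entry embeds the previous entry's hash, so edits break the chain.
        prev = self.log[-1]["hash"] if self.log else ""
        self.log.append({
            "action": action, "outcome": outcome, "prev": prev,
            "hash": self._digest(action, outcome, prev),
        })
        return outcome

    def request(self, action: str) -> str:
        if self.halted:
            return self._record(action, "blocked: halted")
        executed = sum(1 for e in self.log if e["outcome"] == "executed")
        if executed >= self.max_actions:
            return self._record(action, "blocked: budget exhausted")
        if action in self.needs_approval:
            return self._record(action, "escalated")
        return self._record(action, "executed")

    def verify_log(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = ""
        for e in self.log:
            expected = self._digest(e["action"], e["outcome"], prev)
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True


runtime = BoundedAgentRuntime(max_actions=2, needs_approval={"deploy"})
print(runtime.request("run_tests"))   # executed
print(runtime.request("deploy"))      # escalated
print(runtime.request("open_pr"))     # executed
print(runtime.request("open_issue"))  # blocked: budget exhausted
print(runtime.verify_log())           # True
```

A real deployment would need far more (identity, durable storage, approval workflows), but the sketch shows the shift Dai describes: the limits are enforced by the system's structure, not by a person watching every step.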

At ForgeNEX, we believe that Anthropic's warning should be taken as a call to action for CIOs and technology leaders. Integrating AI into critical processes, such as time and attendance tracking or advanced office automation, already requires robust governance frameworks. The future Anthropic describes is not distant; the question is whether we are prepared to govern it.

Original source: ComputerWorld. Analysis and adaptation by ForgeNEX.
