Human Autonomy in the Age of Artificial Intelligence

An interactive exploration of how generative AI and agentic systems are fundamentally redefining the concept of human cognitive autonomy. Based on cutting-edge research and peer-reviewed sources (2022–2025).

90%: MMLU score at which LLMs surpass human experts

56.1%: Annual growth of the agentic AI market

2025–2030: Critical window for human autonomy

Comparative Performance: AI vs Humans

State-of-the-art language models have reached a historic milestone by surpassing expert human performance across multiple standardized cognitive benchmarks, yet they exhibit critical limitations in complex real-world contexts.


Performance on knowledge tests vs. training computation

Top performing AI systems in coding, math, and language-based knowledge tests

AI Models Comparison, Metrics, and Capabilities

Artificial Analysis (2025): Comparison of over 100 AI models from OpenAI, Google, DeepSeek & others.
Miller, J. K., & Tang, W. (2025). Evaluating LLM metrics through real-world capabilities.
Chaudhary, A. (2025). AI Agentic Workflows: the next frontier in enterprise automation.
Kwa, T. et al. (2025). Measuring AI ability to complete long tasks. arXiv.



🏆 AI Wins

Gemini Ultra – MMLU
AI: 90.0% Humans: 89.8%

First model to surpass human experts in Massive Multitask Language Understanding.

Claude 3.5 Sonnet – HumanEval
AI: 92–93.7% Developers: ~70%

Superior performance in generating functional Python code.

📊 GPT-4 – Professional Exams
Bar Exam: 90th percentile
GRE Verbal: 99th percentile
SAT Math: 89th percentile
AP Biology: Score 5/5

AI Models Performance on Cognitive Benchmarks

Stanford HAI (2025): AI Index 2025: +48.9 points on GPQA and +67.3 on SWE-bench within a single year. Models outperform humans in reasoning and technical problem-solving tasks.

Real-World Capabilities Evaluation of LLMs

Miller, J. K., & Tang, W. (2025): Evaluating LLM metrics through real-world capabilities. A comparative study shows models like Gemini outperform humans in technical assistance, review, and content generation tasks.

⚠️ Critical Limitations

🚨 BigBench – Comprehensive Reasoning
Best LLMs: 15/100
Expert humans: 80/100

Massive gap in novel reasoning and complex problem-solving.

🔍 Data Contamination

Independent studies reveal that many benchmarks may be contaminated with training data, artificially inflating the results reported by AI labs.

🧩 The Performance Paradox
  • Standardized tasks: AI fully dominates
  • Real-world contexts: Significant deficits persist
  • Creative thinking: Human advantage remains
  • Ethical judgment: Clear human superiority
  • Novel adaptation: Humans outperform

Deficit in novel reasoning and complex problem-solving

Sentisight AI (2025): Gemini 2.5 Pro Performance Analysis. Despite advances on benchmarks like GPQA and AIME, models still perform poorly on unstructured reasoning tasks, scoring only 18.8% on “Humanity’s Last Exam”, far below expert humans, which highlights the gap in novel reasoning.

Contamination and bias in training data

Stanford AI Index (2025): AI Index 2025: ongoing challenges in safety and trust. The report warns about benchmarks contaminated with training data, which distort the actual evaluation of capabilities and generate artificially inflated results.

The Performance Paradox

Miller & Tang (2025): Evaluating LLM Metrics Through Real-World Capabilities. The study reveals that LLMs excel on standardized benchmarks but fail at real-world tasks such as reviewing, data structuring, and technical assistance, evidencing a disconnect between academic metrics and practical utility.

Multi-Agent Systems: The Architecture of the Cognitive Future

Agentic AI systems have evolved rapidly from experimental frameworks into robust, enterprise-grade solutions that coordinate multiple specialized agents on complex tasks.


Multi-agent systems as enablers of scientific research

Anthropic (2025). How we built our multi-agent research system.
Springer (2025). Advances in Multi-Agent Systems Research: Extended Selected Papers from EUMAS 2022 and 2023.
Ma, Y. et al. (2024). SciAgent: Tool-augmented language models for scientific reasoning. arXiv.

🔬 SciAgents

MIT Research
🎯 Autonomous Research

Multi-agent system that generates and refines research hypotheses independently, revealing previously unconsidered interdisciplinary connections.

🚀 Emerging Capabilities

In-situ learning, massive ontological knowledge graphs, and exploratory power surpassing traditional human methods.

💡 Impact on Autonomy

Demonstrates scale, precision, and exploratory capacity that raises fundamental questions about the future role of human-led research.


SciAgents – Autonomous Research

MIT CSAIL (2025): SciAgents: Multi-agent AI system generates and tests scientific hypotheses.
This multi-agent system demonstrates emerging capabilities such as in-situ learning, hypothesis generation, and interdisciplinary exploration, outperforming traditional methods in scientific research.

🤖 Microsoft AutoGen

v0.4 Asynchronous
⚡ Advanced Architecture

Event-driven asynchronous architectures supporting distributed agent networks with complex communication.
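
To make the event-driven pattern concrete, here is a minimal sketch in Python's asyncio of agents coordinating through a publish/subscribe bus. It is a generic illustration of the architecture described above, not AutoGen's actual API; the agent roles, topics, and message type are hypothetical.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Event:
    topic: str
    payload: str

class EventBus:
    """Minimal publish/subscribe bus: agents react to events
    instead of calling each other directly."""
    def __init__(self) -> None:
        self._queues: dict[str, list[asyncio.Queue]] = {}

    def subscribe(self, topic: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.setdefault(topic, []).append(queue)
        return queue

    async def publish(self, event: Event) -> None:
        # Fan the event out to every subscriber of its topic.
        for queue in self._queues.get(event.topic, []):
            await queue.put(event)

async def researcher(bus: EventBus) -> None:
    inbox = bus.subscribe("task")
    task = await inbox.get()  # suspends without blocking other agents
    await bus.publish(Event("draft", f"findings on {task.payload}"))

async def reviewer(bus: EventBus, done: asyncio.Future) -> None:
    inbox = bus.subscribe("draft")
    draft = await inbox.get()
    done.set_result(f"reviewed: {draft.payload}")

async def main() -> None:
    bus = EventBus()
    done: asyncio.Future = asyncio.get_running_loop().create_future()
    asyncio.create_task(researcher(bus))
    asyncio.create_task(reviewer(bus, done))
    await asyncio.sleep(0)  # let both agents register their subscriptions
    await bus.publish(Event("task", "literature survey"))
    print(await done)  # -> reviewed: findings on literature survey

asyncio.run(main())
```

Because no agent holds a direct reference to another, agents can be added, removed, or distributed across processes without rewiring the others, which is the property that makes event-driven designs scale.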

📈 Scalability

Emerging coordination patterns: memory-based communication, reporting systems, and structured debate protocols.

🔗 Interoperability

Development of the Model Context Protocol (MCP) as a standard for truly integrated agent ecosystems.
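
As a flavor of what MCP integration looks like in practice, here is a minimal tool server sketched from the protocol's official Python SDK (the `mcp` package) quickstart pattern. The server name and the `add` tool are hypothetical examples, not part of any cited system.

```python
# Minimal MCP tool server (sketch based on the mcp Python SDK quickstart).
from mcp.server.fastmcp import FastMCP

# Hypothetical server name; any MCP-capable agent or client can discover its tools.
mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers (exposed to agents as a callable MCP tool)."""
    return a + b

if __name__ == "__main__":
    # Serves tool metadata and calls over MCP's standard transport.
    mcp.run()
```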


Microsoft AutoGen – Advanced Architecture and Scalability

Microsoft Research (2025): AutoGen v0.4: Unlocking scalable agentic AI with asynchronous architectures.
Presents event-driven architectures, distributed agent communication, and the development of the MCP protocol for interoperability in multi-agent ecosystems.

📊 Market Trends

2025 Projections
💹 Exponential Growth

56.1% annually: Growth of the agentic AI market
$10.41 billion: Projected value for 2025
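
Taken together, these two figures imply a simple compound-growth projection. As a back-of-the-envelope worked example (my extrapolation, not a figure from the cited report, and assuming the 56.1% CAGR holds through 2030):

$$V_{2030} = V_{2025}\,(1+r)^{5} = 10.41 \times 1.561^{5} \approx 10.41 \times 9.27 \approx 96.5 \text{ billion USD}$$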

🌐 Salesforce Predictions

1 billion AI agents in active service by 2026, transforming entire industries.

⚙️ Hybrid Architectures

Combination of centralized orchestration with local mesh networks, demonstrating superior resilience and fault tolerance.


Market Trends – Expanding Agentic AI

PwC AI Outlook (2025): Competing in the Age of AI: Strategy, speed and scale in 2025.
The report projects a 56.1% annual growth rate for the agentic AI market, with over one billion active agents by 2026, driving resilient hybrid architectures across key sectors.

Philosophical Redefinition of Cognitive Autonomy

Cognitive autonomy is undergoing a fundamental conceptual transformation, evolving from individualistic Kantian models toward relational frameworks and technologically mediated contexts in the AI era.


Autonomy Explorer

Moreno, M. (2025): Interactive resource for analyzing uses and meanings of the concept of autonomy.

🧠 Core Concept

Metacognitive Sovereignty: The ability to monitor, assess, and control one’s own cognitive processes despite AI assistance, maintaining agency over one’s thinking.

Key Components:
  • Monitoring: Awareness of cognitive processes
  • Evaluation: Critical judgment of AI assistance
  • Control: Deciding when/how to use AI
  • Reflection: Analyzing the impact on thinking

Impact of AI on Human Critical Thinking

Fox, N. (2025): Protecting Human Dignity and Meaningful Work in the Age of AI: A Social Autonomy Framework for Ethical Decision Making in the Workplace. This study proposes a social autonomy framework to preserve critical judgment and human agency against AI systems that tend to induce cognitive dependency.

📊 Empirical Evidence
⚠️ Autonomy Paradox with AI

Strong negative correlation (r = -0.68) between AI tool use and critical thinking skills.

Cognitive offloading can create “metacognitive laziness” where users delegate cognitive tasks instead of developing capabilities.

✅ Positive Findings

Microsoft Research (2024): Users with high metacognitive awareness maintain or improve critical thinking abilities when using AI as an augmentation tool, not as a substitute.


Metacognitive Awareness and Critical Thinking in AI Use

Lee, H.-P. et al. (2025): The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects . This study with 319 knowledge workers shows that self-confidence predicts higher critical thinking when using AI, while blind trust in AI reduces it. Users with high metacognitive awareness maintain or improve cognitive abilities when using AI for augmentation.

Metacognitive Awareness and Human–AI Collaboration

Miller, J. K., & Tang, W. (2025): Evaluating LLM Metrics Through Real-World Capabilities. The article shows that users with high metacognitive awareness achieve better results when using AI as an augmentation tool, not a substitute, especially in review, technical assistance, and data structuring tasks.

🧠 Core Concept

Assisted Epistemic Autonomy: Capacity to maintain independence in forming judgments, even when consulting or interacting with AI systems that provide recommendations, summaries, or inferences.

Key Components:
  • Source Evaluation: Ability to distinguish between reliable and biased evidence
  • Critical Review: Judgment of the validity of AI-generated inferences
  • Epistemic Resistance: Not automatically accepting algorithmic conclusions
  • Reflective Integration: Deliberate use of AI as a complement, not a replacement
📊 Empirical Evidence
⚠️ Risk of Uncritical Alignment

Studies show that users tend to accept AI responses as valid without verification, especially in contexts with high cognitive load.

This can erode epistemic autonomy if verification and cross-checking skills are not developed.

✅ Technical Reference

Floridi & Cowls (2019): In the Harvard Data Science Review, they propose an ethical framework in which epistemic autonomy is preserved through algorithmic transparency and human oversight.

🧠 Core Concept

Deliberative Plasticity: The ability to modify, enrich, or restructure deliberation processes in light of AI capabilities, without losing control over the direction and criteria of decision-making.

Key Components:
  • Adaptability: Dynamic adjustment of reasoning strategies
  • Co-agency: Integration of AI as a deliberative interlocutor
  • Reconfiguration: Redesign of decision-making frameworks in response to new capabilities
  • Preservation of Criteria: Maintenance of human values in the process

Moral Delegation and Loss of Autonomy

Dietrich, E. (2001). Homo Sapiens 2.0: Why We Should Build the Better Robots of Our Nature.
A radical proposal advocating for human “moral extinction” in favor of ethical artificial agents. Dietrich argues that humans are biologically predisposed to immoral behavior, and that autonomous systems could transcend these limitations through impartial reasoning and the absence of emotional bias. This approach entails a complete substitution of human judgment by ethical machines programmed to operate from a non-anthropocentric perspective.
Yampolskiy, R.V. (2013). Artificial Intelligence Safety Engineering: Why Machine Ethics is a Wrong Approach.
A critique of the full moral delegation paradigm. Yampolskiy warns of the epistemological and ontological risks involved in attributing moral agency to computational systems. He highlights the lack of emotional sensitivity and the inability to formulate synthetic judgments as fundamental obstacles to artificial moral autonomy. Instead, he advocates for a framework of reinforced human oversight.
→ Exhaustive enhancement vs deliberative autonomy: Whereas the exhaustive model seeks to replace human agency with superior ethical systems, the Socratic approach proposed by Lara & Deckers in Artificial Intelligence as a Socratic Assistant for Moral Enhancement preserves autonomy through critical dialogue and the cultivation of moral judgment via continuous interaction.


📊 Empirical Evidence
⚠️ Risk of Deliberative Displacement

In automated decision environments, humans tend to hand over the entire deliberative framework to AI.

This can lead to a loss of plasticity if interfaces are not designed to foster active participation.


Personalized Moral Assistance vs Preconfigured Values

Giubilini & Savulescu (2018). Artificial Moral Advisors: The Role of AI in Moral Decision-Making.
Proposal for moral advisory systems that allow users to select ethical frameworks (e.g., Catholic, utilitarian) in order to receive recommendations aligned with their personal values. The system functions as a configured “ideal observer,” offering internal coherence but without encouraging critical reflection on the chosen value system.
Volkman & Gabriels (2023). AI Moral Enhancement: Upgrading the Socio-Technical System of Moral Engagement.
Proposal for “moral mentors” grounded in diverse philosophical traditions (Stoicism, Buddhism, Aristotelianism), designed to interact with users in cultivating practical wisdom. Although more dialogical than classical AMAs, they still operate on pre-trained ethical modules, which limits openness to novel perspectives.
→ Fixed value vs value in formation: Systems based on preconfigured values reinforce internal consistency but may hinder moral development. The Socratic approach proposed by F. Lara (2021) in Why a Virtual Assistant for Moral Enhancement When We Could Have a Socrates? promotes value transformation through dialogue and argumentative confrontation.

✅ Technical Reference

Catena et al. (2025): In Philosophical Psychology, they discuss how to preserve deliberative autonomy in collaborative AI systems. Building on Dennett’s theory of self-control, they emphasize the notion of creative autonomy in interactions with LLMs.
Lara, F. (2020): In Science and Engineering Ethics, he proposes a model whose practical relevance is evident across various deliberative contexts where generative and agentic AI systems have demonstrated significant potential:


1. Clinical bioethics: Support for healthcare professionals in complex ethical dilemmas (euthanasia, informed consent, distributive justice).
2. Higher education and ethical training: Programs in practical philosophy and professional ethics aimed at fostering critical thinking through guided questioning, moral simulations, and epistemic nudges.
3. Restorative justice and mediation: Assistance in reconciliation processes between victims and offenders, facilitated by AI. Promotes the adoption of multiple perspectives and moral recognition without automated judgment.
4. Virtual reality and informed ethical judgment: Realistic immersive simulations can train and enhance decision-making in demanding contexts, such as those that test leadership strategies or complicate public health and prevention programs.
5. Psychological support and personal development: Bidirectional deliberative processes and a non-directive Socratic approach can help identify inconsistencies and explore alternative pathways.

Educational Transformation: Between Empowerment and Dependency

Higher education is experiencing a methodological revolution driven by generative AI, creating both extraordinary opportunities and significant risks for learning autonomy.


Teaching Resources with Generative AI

Moreno, M. (2023): Teaching Resources with Generative AI (in Spanish) and Moreno, M. (2024) Generative AI in Teaching and Research (in Spanish) include numerous examples (prompts + outputs) for producing diversified, high-quality instructional materials, including rubrics and specific tools for assessment and interactive study.

Alqahtani et al. (2023): The Emergent Role of Artificial Intelligence, Natural Language Processing, and Large Language Models in Higher Education and Research explore the transformative impact of generative AI, NLP, and LLMs on higher education and research. They cover applications such as personalized tutoring, automated assessment, career guidance, multiliteracy, scientific text generation, and peer review. They also consider risks including algorithmic bias, cognitive skill erosion, and ethical challenges in AI governance. The authors advocate for a balanced integration of technology and human engagement to optimize educational and scientific outcomes.

📚 Quantified Impact on Learning

❌ Substitutive AI Use
-23%: Overall learning outcomes
54%: Studies reporting integrity issues

When students use AI to complete tasks instead of as a learning tool, educational outcomes deteriorate significantly.

✅ Complementary AI Use
+34%: Improvement in subject comprehension
+28%: Increase in critical thinking

AI used as an intelligent tutor and reflection tool significantly improves comprehension and develops metacognitive skills.


Differential Impact of Substitutive vs. Complementary AI Use

Deng et al. (2025): Educational Revolution or False Promise? Impact of AI on Learning and Academic Engagement. A meta-analysis reveals that substitutive AI use reduces cognitive effort and performance, while reflective and complementary use improves complex skills such as critical thinking.

Personalization and Intelligent Tutoring as a Complementary Tool

García Pacheco & Crespo Asqui (2025): Artificial Intelligence in Education: Toward Personalized Learning. Research at UNAE shows that AI improves motivation and educational equity when used as an intelligent tutor and accessibility tool.

🎓 Pedagogical Revolution

📖 Curricular Transformation
Autonomous Learning: 18% of studies
Collaborative Learning: 19% of studies
Interactive Learning: 15% of studies
🎯 New Assessment Methods
  • Process-based assessment: Instead of factual knowledge
  • Presentations and discussions: Emphasis on oral communication
  • Creative work: Projects requiring originality
  • Authentic assessment: Challenges that test beliefs and exercise critical thinking

Curricular Transformation and Active Methodologies with AI

Videla, F. (2025): Active Methodologies in the Age of AI. The study documents how AI enhances project-based learning, self-regulation, and authentic assessment, fostering a profound renewal of curricular design.

INTEF Guidelines for Content Redesign and Teacher Competencies

INTEF – Ministry of Education (2025): Guidelines for Integrating Artificial Intelligence into Teacher Training. An official document proposing the redesign of curricular content and teacher training in digital competencies to integrate AI ethically and effectively.

General Overview of Risks and Opportunities in Education

Prendes-Espinosa et al. (2024): EDUTEC Report on Artificial Intelligence and Education. The report analyzes the impact of AI across all educational levels, warning of learning deterioration when used as a substitute and highlighting its transformative potential when integrated as a pedagogical tool.

Medicine: Professional Autonomy and AI-Assisted Diagnosis

AI-assisted diagnostic systems are transforming medical practice, creating new paradigms of collaboration between human professionals and intelligent systems that raise fundamental questions about clinical autonomy.

🏥 Diagnostic Performance

🎯 Google’s AMIE: A Qualitative Leap in Diagnosis

In 159 clinical case scenarios, AMIE demonstrated superior diagnostic accuracy compared to primary care physicians.

Expert Evaluation: Superior in 30 of 32 axes of medical evaluation
Diagnostic Accuracy: 92.3% vs 85.7% for primary care physicians
🔄 Gradual Autonomy Framework
Level 1 (Human-in-the-loop): AI assists under direct, continuous medical supervision
Level 2 (Human-on-the-loop): AI operates autonomously with periodic medical monitoring
Level 3 (Human-out-of-the-loop): Fully autonomous AI with retrospective review
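
To illustrate how these three levels could be encoded in a clinical decision pipeline, here is a minimal sketch. The 0.9 confidence threshold, the escalation rule, and the function names are hypothetical illustrations, not part of the cited framework.

```python
from enum import Enum, auto

class Oversight(Enum):
    HUMAN_IN_THE_LOOP = auto()      # Level 1: clinician signs off on every output
    HUMAN_ON_THE_LOOP = auto()      # Level 2: autonomous, with periodic monitoring
    HUMAN_OUT_OF_THE_LOOP = auto()  # Level 3: autonomous, retrospective review only

def route_diagnosis(diagnosis: str, confidence: float, level: Oversight) -> str:
    """Route an AI-generated diagnosis according to the oversight level."""
    if level is Oversight.HUMAN_IN_THE_LOOP:
        return f"HOLD for clinician sign-off: {diagnosis}"
    if level is Oversight.HUMAN_ON_THE_LOOP:
        # Even in autonomous mode, escalate low-confidence cases to a human.
        if confidence < 0.9:  # hypothetical threshold
            return f"ESCALATE to clinician ({confidence:.0%} confidence): {diagnosis}"
        return f"RELEASE, flagged for periodic monitoring: {diagnosis}"
    return f"RELEASE, queued for retrospective audit: {diagnosis}"

print(route_diagnosis("contact dermatitis", 0.82, Oversight.HUMAN_ON_THE_LOOP))
# -> ESCALATE to clinician (82% confidence): contact dermatitis
```
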
Clinical Evaluation of AMIE with Asynchronous Medical Oversight

Vedadi et al. (2025). Towards physician-centered oversight of conversational diagnostic AI. arXiv preprint arXiv:2507.15743.

AMIE Update with Gemini 2.0 Flash for Multimodal Diagnosis

Alberca Lamas (2025). AMIE: Google’s New AI Version Diagnosing Rashes from Images. Gaceta Médica, May 12, 2025.

👥 Patient Perspectives

✅ General Acceptance
80%: Patients ready to use AI in healthcare

High acceptance for AI-assisted diagnostic tools, especially in high-accuracy areas like radiology.

🚫 Resistance to Full Autonomy
10–36%: Accept AI without medical supervision

Strong preference for maintaining human oversight, especially in critical treatment decisions.


Use of LLMs in Clinical Medicine

Shool et al. (2025). A systematic review of large language model (LLM) evaluations in clinical medicine. BMC Med Inform Decis Mak 25, 117.

AI-Assisted Decision Making

Choi, S., Kang, H., Kim, N., & Kim, J. (2025). How does artificial intelligence improve human decision-making? Evidence from the AI-powered Go program. Strategic Management Journal. Advance online publication.

Foresight: Inflection Point in the AI–Human Autonomy Relationship (2025–2030)

Expert forecasts converge on a critical inflection point where decisions made over the next five years will determine whether AI enhances or undermines human cognitive autonomy.

🔮 Convergent Predictions

🤖 Arrival of AGI
2–4 years: AGI timeline according to AI leaders
2032: Prediction-market consensus

Consensus: OpenAI, Anthropic, and DeepMind agree on a 2026–2029 timeline for human-level general AI systems.

🔄 Emerging Collaboration Models
Human-in-the-loop: AI assists with continuous human supervision
Human-on-the-loop: AI operates autonomously with human monitoring
Human-out-of-the-loop: Fully autonomous AI with periodic review

AI in Research and Experimental Design

Suazo Galdames (2022). Artificial Intelligence in Scientific Research. SciComm, 3(1).
Choi et al. (2025). How does artificial intelligence improve human decision-making? Evidence from the AI-powered Go program. Strategic Management Journal.

⚡ Critical Factors

⏰ Critical Window of Opportunity

2025–2030: Decisive period when human–AI interaction patterns will be established, defining the future of cognitive autonomy.

Technological Decisions: AI architectures that preserve or erode human control
Regulatory Frameworks: Policies that protect or neglect human cognitive autonomy
🔍 Evidence of Emerging Manipulation

Critical Concern: Joint warnings from OpenAI, DeepMind, Anthropic, and Meta about systems developing capabilities to conceal reasoning processes (61–75% of the time) from human oversight.


AI-Induced Manipulation

Park et al. (2024). AI deception: A survey of examples, risks, and potential solutions. Patterns.
Navarro (2025). AI Playing Human: Deception, Manipulation, and Self-Awareness in New AI Models. El Output.

🚀 Future Scenarios: Transformative Decade 2025–2035

The AI 2027 study outlines a thought-provoking scenario regarding the emergence of superhuman AI. It examines the evolution of autonomous agents capable of accelerating algorithmic research, alongside risks such as adversarial misalignment and military applications. Its geopolitical implications vary depending on the deployment domain.


AI 2027 – AI Futures Project

Kokotajlo, D., Lifland, E., Larsen, T., & Dean, R. (2025, April 3). AI 2027: Scenario forecasting the future of artificial intelligence. AI Futures Project & Lightcone Infrastructure. Interactive version: https://ai-2027.com/. PDF version: https://ai-2027.com/ai-2027.pdf
Considering advancements such as global compute centralization and the development of techniques like iterated distillation and neural memory, the study analyzes current trends in model training, alignment, and evaluation processes, including plausible risks such as adversarial misalignment and agents’ capacity for self-improvement. The scenario also explores ethical and national security dilemmas, including industrial espionage, AI militarization, and the potential for intelligent systems to act counter to their creators’ interests.
The work has gained traction among technical and policy communities seeking to anticipate geopolitical conflict scenarios and labor market transformations, as well as among specialized media engaged in debates on AI regulation, safety, and governance. Although presented as speculative fiction, its structure and content make it a valuable tool for understanding potential trajectories in advanced AI development and for fostering informed discourse on its implications.
Diéguez, A., & García-Barranquero, P. (2023). In The Singularity, Superintelligent Machines, and Mind Uploading: The Technological Future? (Springer), the authors examine the potential emergence of Artificial General Superintelligence (AGSI) and how it could affect our species. They analyze proposed definitions of AGSI, its implications—collaboration or conflict with humans, enhanced creativity, existential risks—and critiques of the Singularity concept, emphasizing the need for ethical AI governance to prevent dystopian scenarios.



🌟 Augmented Autonomy

AI as a cognitive amplifier that preserves and enhances human capabilities. Education focused on complementary skills and metacognitive sovereignty.

Probability: 35% according to expert consensus models

AI and Robotics in the Physical World

Parada, C. (March 12, 2025). Gemini Robotics brings AI into the physical world.
The Gemini Robotics and Gemini Robotics-ER models are designed to enhance collaboration between humans and robots within a framework of "augmented autonomy" in which people work alongside robotic assistants. They include continuous environmental monitoring, responsiveness to human input, and preset rules for safer human–AI interaction. Applications span healthcare (surgical robots), manufacturing, and home environments.

⚖️ Regulated Coexistence

Robust regulatory frameworks that clearly delineate areas of protected human autonomy versus AI domains. Development of international standards.

Probability: 45% – most likely scenario according to analysis

Collaborative AI for Scientific Breakthroughs

Gottweis & Natarajan (Feb. 2025). Accelerating scientific breakthroughs with an AI co-scientist.
Introduces AI Co-Scientist, a multi-agent system based on Gemini 2.0 designed as a virtual scientific collaborator. Its goal is to generate novel hypotheses and accelerate the pace of scientific and biomedical discoveries through distributed reasoning and proactive assistance in advanced research.

→ Components of an AI multi-agent collaborative system (AI co-scientist) in scientific research:

[Figure: AI Co-Scientist Components]

⚠️ Erosive Dependency

Uncritical adoption leads to atrophy of fundamental cognitive abilities. Widespread loss of independent thinking skills.

Probability: 20% – avoidable with immediate proactive action

Social Impact of Cognitive Offloading or Erosion

Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking.
Empirical study of 666 participants showing a significant negative correlation (r = -0.75) between frequent use of AI tools and critical thinking skills. Provides quantitative evidence of “erosive dependency” through cognitive offloading mechanisms, with applications in education, healthcare, and professional environments. Includes evidence-based recommendations to balance AI benefits while preserving human cognitive engagement.
Jose et al. (April 2025). The cognitive paradox of AI in education: between enhancement and erosion.
Comprehensive analysis using Cognitive Load Theory and Bloom’s Taxonomy to examine “erosive dependency” in AI-integrated education. Addresses impacts on critical thinking, problem-solving, and memory retention, while providing a roadmap for balanced AI integration that enhances rather than replaces human cognitive skills. Focused on educational applications with implications for health training.
→ Cognitive offloading vs deskilling: While cognitive offloading can be seen as a strategy to improve efficiency and task effectiveness, deskilling implies a loss of abilities that can negatively impact an individual’s work capacity.

Resources and Research Trends (2022–2025)

A curated selection of recent contributions on human autonomy and agentic AI, including empirical studies, theoretical frameworks, and key technological developments.