Dawn Song, Meta Platforms’ vice-president of AI research, outlined a vision in which AI agents take on complex, economically valuable tasks across diverse real-world domains, enhancing human productivity without replacing workers.

  • ALE benchmark tests AI on industry-specific, economically meaningful tasks
  • Leading AI models such as OpenAI’s GPT-5.5 pass fewer than 25% of ALE tasks
  • Meta focuses on AI safety and practical augmentation over replacement

What happened

Dawn Song, who recently joined Meta as vice-president of AI research, said the next major frontier in AI is building agents capable of completing economically valuable work across many industries. She made the remarks at the World Economic Forum in Dalian, China, highlighting efforts to move AI beyond experimental use towards practical applications that support human labor.

Song and her team at UC Berkeley’s Centre for Responsible, Decentralised Intelligence introduced the Agents’ Last Exam (ALE) benchmark, which evaluates AI agents on more than 1,500 complex tasks that mimic real-world occupational responsibilities. Leading AI models such as OpenAI’s GPT-5.5 have had limited success on these tests, with pass rates below 25%, underscoring how far agents remain from handling such work reliably.

Why it matters

The focus on AI agents performing economically valuable tasks represents a shift from general AI research towards practical tools that assist humans in complex work settings, in sectors such as video editing, neuroimaging analysis, and cybersecurity. The approach emphasizes augmentation over replacement: the aim is to boost productivity and economic impact rather than displace jobs.

With AI advancing rapidly, safety and security are critical priorities. Song’s leadership at Meta combines expertise in AI security and enterprise AI safety, reflecting the tech giant’s commitment to developing trustworthy AI systems. The ALE benchmark and related cybersecurity tests also provide a transparent measure for the industry to gauge progress and risks associated with frontier AI models.

What to watch next

Meta’s Superintelligence Labs, led by Song, will be instrumental in shaping AI safety frameworks and advancing agent capabilities to tackle even tougher tasks. The ALE benchmark is expected to evolve with increasing difficulty to challenge and guide the development of more sophisticated AI agents.

The broader AI ecosystem, particularly in China, is closely tracking progress as Chinese models such as ByteDance’s Seed2.1 Pro begin to compete globally on these benchmarks. The cybersecurity implications of AI also remain a key concern: open-source AI developments are expected to substantially affect defense and vulnerability management in the near future.

Source assisted: This briefing began from a discovered source item from SCMP China Tech.