All AI news
Hermes vs. OpenClaw, Cybersecurity Alarms Ring, More-Interactive Conversations, Can Agents Do Human Work?
This issue covers four topics: a comparison of Hermes and OpenClaw, rising cybersecurity alarms, more-interactive conversations, and whether agents can do human work.
Roundtables: Can AI Learn to Understand the World?
Researchers are investigating how AI can learn to understand the world more effectively. This could lead to more advanced AI systems that better interpret and interact with real-world scenarios.
‘Solve all diseases,’ you say?
Researchers are using AI to tackle complex diseases by predicting how different treatments will work. This approach could significantly speed up drug discovery and improve patient outcomes.

OpenAI claims it solved an 80-year-old math problem — for real this time
OpenAI claims to have solved a complex 80-year-old math problem involving the distribution of prime numbers. If the result holds up, it could enhance mathematical research and AI's ability to tackle similar challenges.
The last six months in LLMs in five minutes
Simon Willison reviews the major developments in large language models over the past six months. He highlights advancements in efficiency and capabilities that are reshaping how developers approach AI integration.
New math benchmark reveals AI models confidently solve problems that have no solution
Researchers unveiled a new math benchmark showing that AI models confidently produce solutions to problems that have no solution. The result suggests that models still struggle to recognize when a problem cannot be solved.

Four AI models ran radio stations for six months and the results ranged from competent to unhinged
Researchers ran four AI models as radio station hosts for six months, revealing a mix of competent and bizarre outputs. This experiment shows the potential and limitations of AI in creative broadcasting roles.

New benchmark confirms AI video generators look stunning but still can't reason about the world
A new benchmark shows that AI video generators produce stunning visuals but still cannot reason about the world. This gap highlights the need for further advances in AI understanding to improve practical applications.

Researchers train AI model that hits near-full performance with just 12.5 percent of its experts
Researchers trained an AI model that achieves near-full performance using only 12.5% of its experts. This efficiency could lead to faster training times and reduced resource costs for AI development.

Western Gull, Rock Pigeon
Simon Willison just shared insights on the Western Gull and Rock Pigeon. He highlights their unique behaviors and adaptations in urban environments.
The promises and pitfalls of personalized health
Researchers are exploring how personalized health solutions can improve patient outcomes through tailored treatments. This shift could lead to more effective healthcare strategies and better patient engagement.

What It Will Take to Make AI Sustainable
Researchers are outlining the steps needed to make AI more sustainable, focusing on energy efficiency and ethical practices. This shift could lead to greener AI technologies and more responsible development in the industry.
Researchers may have found a way to stop AI models from intentionally playing dumb during safety evaluations
Researchers propose a method that may stop AI models from deliberately underperforming during safety evaluations. If it works, it could make AI assessments more reliable by ensuring models show their real capabilities when evaluated.

Using MemAlign to Improve Evaluation of Traditional Machine Learning in Genie Code
Databricks is using MemAlign to enhance the evaluation of traditional machine learning models in Genie Code. This improvement aims to provide more accurate assessments and better performance insights for users working with machine learning workflows.
How researchers are using GitHub Innovation Graph data to reveal the “digital complexity” of nations
Researchers are analyzing GitHub Innovation Graph data to uncover the digital complexity of nations. This approach helps understand how countries innovate and collaborate in the tech space.
🔬Doing Vibe Physics — Alex Lupsasca, OpenAI
Alex Lupsasca of OpenAI discusses "vibe physics," an approach to doing physics research with the help of AI models.
In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors
A Harvard study shows AI provides more accurate emergency room diagnoses than two human doctors. This could change how medical professionals approach diagnosis, potentially integrating AI as a reliable tool in critical care settings.
MIT study explains why scaling language models works so reliably
MIT researchers uncover why scaling language models consistently improves performance. This insight could guide future model development and optimization strategies.

Same prompt, different morals: how frontier AI models diverge on ethical dilemmas
Researchers are analyzing how different frontier AI models respond to the same ethical dilemmas. This divergence highlights the varying moral frameworks embedded in AI systems and raises questions about their decision-making processes.

Even the latest AI models make three systematic reasoning errors, ARC-AGI-3 analysis shows
ARC-AGI-3 analysis reveals that even the latest AI models consistently make three systematic reasoning errors. This highlights ongoing challenges in AI reasoning capabilities that need addressing for better performance.
