In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors
In a Harvard study, AI gave more accurate emergency room diagnoses than two human doctors. The result could change how medical professionals approach diagnosis, with AI potentially becoming a supporting tool in critical care settings.
More in Research
MIT study explains why scaling language models works so reliably
MIT researchers uncover why scaling language models consistently improves performance. This insight could guide future model development and optimization strategies.

Same prompt, different morals: how frontier AI models diverge on ethical dilemmas
Researchers are analyzing how different frontier AI models respond to the same ethical dilemmas. This divergence highlights the varying moral frameworks embedded in AI systems and raises questions about their decision-making processes.

Even the latest AI models make three systematic reasoning errors, ARC-AGI-3 analysis shows
An ARC-AGI-3 analysis finds that even the latest AI models consistently make three systematic reasoning errors, pointing to ongoing weaknesses in AI reasoning.

Reinforcement fine-tuning with LLM-as-a-judge
AWS has introduced reinforcement fine-tuning that uses LLMs as judges. The approach trains models on feedback from large language models, with the aim of improving performance and adaptability across tasks.