
Anthropic Unveils Enhanced Claude 3.5 Sonnet and Introduces Claude 3.5 Haiku
Anthropic has announced an upgraded Claude 3.5 Sonnet, a new model called Claude 3.5 Haiku, and a public beta of a capability that lets developers direct Claude to use a computer the way a person does.
The upgraded Claude 3.5 Sonnet improves on its predecessor across a range of benchmarks, including graduate-level reasoning, undergraduate-level knowledge, coding, math problem-solving, high school math competitions, visual question answering, agentic coding, and agentic tool use.
According to Anthropic, early user feedback suggests the upgraded Claude 3.5 Sonnet is a considerable step forward for AI-driven coding. GitLab, which evaluated the model on DevSecOps tasks, reported up to a 10% improvement in reasoning across use cases.
Claude 3.5 Haiku, meanwhile, is described as Anthropic’s fastest model. It matches Claude 3 Haiku in cost and speed while improving across all skill areas, and it surpasses Claude 3 Opus, the largest model of the previous generation, on many benchmarks.
Anthropic reports that Claude 3.5 Haiku is particularly strong at coding: it scores 40.6% on SWE-bench Verified, a benchmark that tests whether a model can resolve real GitHub issues, outperforming both the original Claude 3.5 Sonnet and GPT-4o.
“With reduced latency, better instruction adherence, and enhanced tool usage, Claude 3.5 Haiku is ideal for user-centric applications, specialized sub-agent tasks, and creating personalized experiences from large datasets such as purchase histories, pricing, or inventory information,” Anthropic noted.
Claude 3.5 Haiku is expected to become available in the coming weeks through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI, initially as a text-only model, with image input to follow.
In addition to the model enhancements, Anthropic has launched a public beta for a new feature that equips Claude with general computer skills. This API allows the model to perceive and interact with computer interfaces, enabling it to perform tasks such as moving the cursor to open applications, navigating to specific web pages, and filling out forms with data sourced from those pages.
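The perceive-and-act cycle described above can be pictured as a simple loop: the model requests an action, a harness carries it out, and the resulting screen state is sent back as the next observation. The following is a minimal sketch of that idea only, not Anthropic's implementation and not a call to its API. The names FakeDesktop, execute_action, and run_episode are hypothetical, the action names (mouse_move, left_click, type, screenshot) are illustrative assumptions, and a scripted list stands in for the model's decisions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen; it only records what was done."""
    cursor: tuple[int, int] = (0, 0)
    typed: list[str] = field(default_factory=list)
    clicks: list[tuple[int, int]] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image data; a text summary is enough here.
        text = "".join(self.typed)
        return f"cursor={self.cursor} clicks={len(self.clicks)} typed={text!r}"


def execute_action(desktop: FakeDesktop, action: dict[str, Any]) -> str:
    """Apply one requested action and return the new observation."""
    kind = action["action"]  # action names are assumptions for illustration
    if kind == "mouse_move":
        x, y = action["coordinate"]
        desktop.cursor = (x, y)
    elif kind == "left_click":
        desktop.clicks.append(desktop.cursor)
    elif kind == "type":
        desktop.typed.append(action["text"])
    elif kind == "screenshot":
        pass  # no state change; the observation below is the result
    else:
        raise ValueError(f"unsupported action: {kind}")
    return desktop.screenshot()


def run_episode(desktop: FakeDesktop, actions: list[dict[str, Any]]) -> list[str]:
    """Run a scripted sequence of actions, collecting one observation per step.

    In a real agent loop the model would pick each action after seeing the
    previous observation; the fixed list here replaces that decision step.
    """
    return [execute_action(desktop, action) for action in actions]


if __name__ == "__main__":
    desktop = FakeDesktop()
    scripted = [
        {"action": "mouse_move", "coordinate": [120, 80]},
        {"action": "left_click"},
        {"action": "type", "text": "hello"},
    ]
    for observation in run_episode(desktop, scripted):
        print(observation)
```

Keeping the executor separate from whatever chooses the actions is a deliberate choice in this sketch: it makes it easy to reject unsupported actions and to log every step, which fits the advice to begin with low-risk tasks.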
In early assessments on OSWorld, a benchmark that measures how well AI models operate computers the way people do, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, the highest score recorded for any model (the next best was 7.8%). When given more steps to complete each task, Claude scored 22%.
Anthropic acknowledged that the capability still has limitations: Claude struggles with actions such as scrolling, dragging, and zooming, and the company advises users to start by experimenting with low-risk tasks.
“By learning from the initial implementations of this still-nascent technology, we aim to gain a deeper understanding of both the opportunities and the challenges posed by increasingly capable AI systems,” the company stated.