Cosine Achieves Record 30% on SWE-Bench, Outperforming Industry Rivals in AI Software Development

Cosine, a human reasoning lab focused on building artificial general developers (AGD), announced today that it has scored 30% on SWE-Bench, the industry standard for evaluating AI software engineering skills. This represents a 56% improvement over the previous best score of 19% held by Factory, and an astonishing 2196% improvement over OpenAI’s GPT-4 score of 1.31%. The benchmark assesses AI models on real-world tasks like software architecture, debugging, and implementing new features in existing codebases.
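For readers who want to check the comparison figures, relative improvement is (new score minus old score) divided by old score. The short Python sketch below applies that formula to the rounded scores quoted in this article; the function and variable names are illustrative and not from Cosine. With rounded inputs the results come out slightly different from the quoted 56% and 2196%, which suggests those percentages were computed from unrounded scores. That is an inference, not something the announcement states.

```python
def relative_improvement(new_score: float, old_score: float) -> float:
    """Return the percentage by which new_score exceeds old_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return (new_score - old_score) / old_score * 100.0


# Rounded scores as quoted in the article (percent of benchmark tasks resolved).
genie, factory, gpt4 = 30.0, 19.0, 1.31

vs_factory = relative_improvement(genie, factory)
vs_gpt4 = relative_improvement(genie, gpt4)

print(f"Genie vs Factory: {vs_factory:.1f}% improvement")
print(f"Genie vs GPT-4:   {vs_gpt4:.1f}% improvement")
```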

Genie, Cosine’s AI software developer, operates like a highly skilled human developer, capable of autonomously solving bugs, building features, refactoring code, and more. Cosine, which is based between San Francisco and London, fine-tunes models to mimic human reasoning, an approach that has outperformed rivals like AWS’s Amazon Q Developer and Cognition’s Devin, both of which scored below 20% on SWE-Bench. Cognition was outperformed by Genie despite its recent $2 billion valuation after raising from Peter Thiel’s Founders Fund.

Cosine’s recent $2.5 million funding round, led by Uphonest and SOMA Capital, with participation from Lakestar, Focal, and others, underscores the confidence investors have in the company’s vision.

“Our breakthrough in codifying human reasoning allows us to train AI models that operate far beyond the narrow tasks and limited prompts available to current software development teams,” said Cosine CEO Alistair Pullen, who published and monetized his first software application at age nine.

COO Yang Li highlighted Cosine’s ability to outperform competitors in completing complex software tasks quickly and cost-effectively, marking a transformative shift in how software development is approached.

Founded in 2022, Cosine’s mission is to create AI capable of tackling open-ended problems across various domains, with Genie being the first product to demonstrate this capability. CIO Sam Stenner emphasized that Cosine’s focus is on creating a true AI colleague, not just a co-pilot, capable of human-like reasoning and resilience.

Investors and industry experts, such as Ellen Ma from Uphonest Capital and Ben Tossell from Ben’s Bites, praised Cosine’s achievements, noting that the company is not only advancing AI but fundamentally teaching AI to reason like a human, bringing us closer to artificial general intelligence (AGI).