There’s a clear thread running through a lot of this week’s devtools news: testing, evaluation and reliability. These are tools built to make sure what’s being shipped actually holds up. Each of them helps software developers build systems that are less brittle, more predictable, and easier to trust, even as complexity grows. Here is our roundup of news from developer tool startups around the world:

🏆 Funding Wins

Cerebrium Scores $8.5M to Power Real-Time AI Infrastructure

Cerebrium has raised $8.5 million in seed funding to simplify the process of building and scaling multimodal AI apps. The round was led by Gradient, with Y Combinator, Authentic Ventures, and strategic angels joining in. Founded by Michael Louis and Jono Irwin, the company offers serverless infrastructure that handles GPU scaling, keeps cold starts under two seconds, and bills per second. As Michael Louis puts it:

“We built Cerebrium so engineers can focus on building AI products users love that have real business impact instead of hiring an infrastructure team, racking up six-figure cloud bills or worrying about security and compliance.”

🧠 It’s already used by companies like Tavus, Deepgram and Vapi to run compute-heavy workloads without infrastructure headaches. Read more in the full announcement.

Vellum Raises $20M to Set the Standard for AI Development

Vellum has raised a $20M Series A to help teams take AI from demo to production with confidence. The round was led by Leaders Fund, with backing from Y Combinator, Socii Capital, Rebel Fund, Pioneer Fund, and Eastlink Capital. Founders Akash Sharma, Sidd Seethepalli, and Noa Flaherty built the platform to support test-driven development, enabling engineers and product teams to build, evaluate, deploy, and monitor mission-critical AI systems in one place.

🧱 Read the full announcement and see what’s next for Vellum.

🚀 3, 2, 1… Launches

Deepgram Unveils Saga, a Voice OS for Devs in Flow

Deepgram has introduced Saga, a voice-first operating system that embeds directly into developer workflows. Unlike voice assistants that interrupt flow, Saga runs on top of tools like Cursor, Jira, Figma, and Gmail, letting developers control their stack with natural speech. It turns vague ideas into clear prompts, executes multi-step tasks via MCP, and writes real-time documentation as you speak. CEO and co-founder Scott Stephenson said:

“Developers spend too much mental energy switching between tools instead of building. Saga changes that by turning voice into a universal interface — you say what you want to do, and Saga makes it happen across your entire workflow”

🎙️ Read the full announcement here and find out how to start building with Saga.

BootLoop Launches AI Agent for Embedded Firmware

BootLoop is tackling one of hardware’s biggest pain points: firmware. Their newly launched AI agent writes, tests, and debugs embedded software directly on real hardware, turning months of engineering work into minutes. It interacts with datasheets, schematics, and test equipment like oscilloscopes and debuggers to validate code automatically.

The team (Noah Pacik-Nelson and Christopher Markus) previously built firmware for SpaceX’s Raptor engine and brain-computer interfaces at MIT. That experience now powers BootLoop’s ITAR-compliant coding agent, built for aerospace, robotics, and IoT.

🥾 Explore the tool and join the pilot program here.

Stably Ships AI Agent That Auto-Heals Playwright Tests

Stably AI has launched its autonomous QA agent for E2E testing, designed to write, run, and self-heal Playwright tests in minutes. The agent learns from your docs, codebase, and user recordings, spinning up new tests for every PR and fixing brittle ones as your UI evolves. Teams like OpenArt AI and Tofu AI are already seeing results: 99.7%+ test accuracy, reduced testing overhead, and hours reclaimed per week. Stably is led by Jinjing Liang (ex-Chrome infra) and Neil Parker (ex-Uber Safety tech lead), with a team of AI PhDs turning test automation into a product advantage.

⏺️ Find out all you need to know and start with a 14-day free trial.

Autosana Debuts AI QA Agent for Mobile Teams

Autosana (YC S25) has launched an AI QA agent that tests iOS and Android apps like a real user. Just describe flows in natural language and Autosana simulates them with runtime context, removing the need for flaky test scripts or manual regression testing. It integrates directly into your CI/CD pipeline, saves 8+ engineering hours per deployment, and supports frameworks like React Native, Swift, and Kotlin. Founded by Yuvan Sundrani and Jason Steinberg, Autosana is already helping mobile teams ship faster with less fear of breakage.

📱 Start your bug-free build process and book a demo here.

Plug.dev Opens the Gateway for Creator-Led DevTool Growth

Plug.dev is live with a new platform designed for DevTool GTM teams to run creator-led growth. From YouTube partner programs and influencer campaigns to technical launch content, Plug.dev helps companies tap into the voices developers already trust. Founder Neill Gernon announced on LinkedIn last week:

“We’ve been quietly building for a little over a year and a half. In that time Plug has become the default platform for the world’s leading DevTool GTM teams to run creator partnership programs and more.”

🎥 See what Plug.dev can do for your next GTM push here.

🔦 DevTool of the Week: Confident AI

Confident AI Banner (Source: GitHub)

Confident AI (YC W25) has had a very strong 2025 so far. After launching in February through Y Combinator’s Winter batch, the team quickly raised a $2.2M seed round and wrapped up their first Launch Week in April. Founded by Jeffrey Ip and Kritin Vongthongsri, Confident AI gives engineering teams the tools to test and iterate on LLM applications with more control and less overhead.

The platform is powered by DeepEval, their open-source framework for LLM evaluation. It now runs over 5 million evaluations and is used by teams at Microsoft, BCG, and AstraZeneca. Confident AI brings those evaluations to the cloud, helping teams benchmark models, track changes, and monitor performance with production-grade reliability.

Their newest release, DeepTeam, takes things further. It’s an open-source red teaming tool for testing the safety of agentic systems, built to uncover issues like memory leaks, prompt injection, and goal hijacking before they become costly.

“Agentic systems are the future of AI, but they come with new risks. DeepTeam helps teams test for those risks before they cost you a fortune.” (Y Combinator)

For teams building LLM apps that need to perform and hold up under pressure, Confident AI is becoming a go-to platform for both precision and peace of mind.

💪 Explore Confident AI ⭐ Check out DeepTeam on GitHub 🛡️ Try DeepTeam here

And that’s it for this week! As always, we continue to be utterly impressed by the developments in this space and are proud to be working with some of these incredible startups to find talented people to build modern developer tools.

Louise Ogilvy (Recruitment Director) – UK/USA

Becca Combe MIRP CertRP (Lead Recruiter) – USA

Ben Chipchase (Lead Recruiter) – UK

Natalie Harper (Administrator)

Looking for your next DevTools role?

Visit devtoolsjobs.com
