Rapid advances in artificial intelligence (AI) have changed how programming tools are used in software development. AI-driven code editors like Cursor, Windsurf, and GitHub's Copilot long dominated the scene, but the rise of agentic AI and "vibe coding" has quietly changed how AI systems interact with software.
Terminal interfaces, familiar as the black-and-white screens of '90s hacker movies, may not look as sleek as modern code editors, but they are undeniably powerful development tools. AI agents can write and debug code, yet terminal tools are often what it takes to turn that code into usable software.
This shift is most evident in the command-line coding tools released by the major labs. Since February, Anthropic, DeepMind, and OpenAI have launched Claude Code, Gemini CLI, and Codex CLI, respectively, and these tools have quickly become some of the companies' most popular products.
While subtle, this change signifies a fundamental shift in the way AI interacts with computers. Many experts believe this trend is just beginning. Mike Merrill, co-creator of Terminal-Bench, said, "We firmly believe that 95% of interactions between large language models (LLMs) and computers will be through terminal-like interfaces in the future."
Traditional code editing tools also face significant challenges. The AI code editor Windsurf has gone through a series of acquisitions, leaving the company's future uncertain. Recent research also suggests that programmers overestimate the productivity gains from these tools. For instance, a METR study found that developers using Cursor Pro believed it made them 20-30% faster, while observed task completion was nearly 20% slower.
In this context, companies like Warp have quickly risen to prominence, with Warp becoming a leader among terminal tools on the strength of its high Terminal-Bench scores. Warp founder Zach Lloyd is confident in the terminal, which he sees as the ideal place to tackle problems that code editors struggle with.
The new approach also calls for different performance benchmarks. Earlier tools focused largely on resolving GitHub issues, while terminal tools take a broader view that covers both code writing and DevOps tasks. For example, Terminal-Bench sets tasks such as reverse-engineering a compression algorithm and building the Linux kernel from source, which demand the kind of tenacious problem-solving that programmers rely on.
Although today's terminal tools have not yet reached their full potential, Lloyd believes they can already handle much of a developer's non-coding work, which is an exciting prospect.