Why I'm betting on voice agents

January 16, 2026

Five years ago, during my undergrad, I spent hours scouring webpages for articles to learn CS concepts and prepare for tests: every blog I came across, every lecture recording, and god knows what else. Now, the idea of doing that sounds insane - I get annoyed when a webpage takes more than 2-3 seconds to load or when a ChatGPT / Claude prompt fails.

Initially, I thought I was just becoming lazy. But no. It’s the same with the people around me. I think we’re not becoming lazier; it’s just that technology has made us “intolerant of friction.”

Let’s take a look at the history:

  • 1990s: You spend hours searching and reading through books and research papers to find relevant information.
  • 2000s: Google goes mainstream and takes away a lot of the friction of finding relevant information. But you still click through 5-10 sites.
  • 2022: ChatGPT launches, and you get a synthesized answer in seconds. It takes away the friction of searching through sites, to the point that clicking through search results now feels like ancient technology.

If you look closely, each breakthrough in technology didn’t just make things faster - it reset our baseline for acceptable effort. The less friction we are used to, the less patience we have for whatever friction remains.

Every major tech shift removed a layer of translation. Search removed the need to know where information lived. AI removed the need to know how to ask. What remains is translating intent into clicks. Voice agents remove that last translation layer.

Imagine having a real-life Jarvis on your phone / computer. It books your flight tickets, orders food from Zomato or DoorDash, or replies to emails on your behalf, all with just your voice commands. That was not possible a few years back, but it is becoming possible now. Every action is requested through your voice and executed by agents.

We’re not fully there yet. Current voice assistants like Gemini and Siri can set timers and play music but can’t navigate complex workflows across multiple apps. But the pieces are coming together: better speech recognition, standardized tool integrations, and AI that can understand multi-step intent. The voice agents I’m describing - ones that can actually execute across your entire workflow - are emerging now.

But Shlok, what if AI doing things on my behalf messes up?

Of course, there’s a trust problem. Would you really let an agent book a $500 / ₹50,000 flight or send an email / Slack message to your boss without confirmation? Probably not. At least not yet.

Early voice agents will need to show you what they are about to do before executing, i.e., keep a human in the loop. Think of it like a ‘draft’ mode where the agent prepares everything and you approve with a click. But as the technology gets more reliable, and as we see it work hundreds of times without error, that confirmation step will start to feel like unnecessary friction.
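To make the ‘draft’ mode concrete, here is a minimal sketch of how I picture it. Everything in it is hypothetical and made up for illustration: the DraftAction class, the run_with_confirmation function, and the auto_approve_below_usd threshold are my own names, not any real assistant’s API. The agent prepares an action, shows a summary, and only runs it once you say yes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DraftAction:
    """An action the agent has prepared but not yet executed (hypothetical type)."""
    description: str             # human-readable summary shown to the user
    execute: Callable[[], str]   # the real side effect, deferred until approval
    cost_usd: float = 0.0        # assumed field: how much money is at stake


def run_with_confirmation(
    action: DraftAction,
    approve: Callable[[DraftAction], bool],
    auto_approve_below_usd: float = 0.0,
) -> str:
    """Run `action` only if it is under the auto-approve limit or the user approves it."""
    # With the default limit of 0.0, nothing is auto-approved: the user is always asked.
    if action.cost_usd < auto_approve_below_usd or approve(action):
        return action.execute()
    return "cancelled"


# Example: the user declines, so the flight is never booked.
flight = DraftAction("Book flight for $500", lambda: "booked", cost_usd=500.0)
print(run_with_confirmation(flight, approve=lambda a: False))
```

Raising auto_approve_below_usd is one way to model trust growing over time: the more the agent gets right, the fewer actions need a click from you.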

Ten years from now, explaining to someone that you used to manually click through websites to book flights will sound as absurd as researching without AI sounds today.

Voice agents will not make us lazy; they will just make us even more “intolerant of friction.”