As artificial intelligence continues to reshape the landscape of modern technology, one frontier stands out for its potential to fundamentally change human-computer interaction: voice. Mati Staniszewski, co-founder of ElevenLabs, recently joined Nikhil Kamath on the WTF Online podcast to discuss why voice technology is the next major shift in how we navigate the digital world and why it might eventually replace traditional screen-based interfaces.
Key Takeaways
- Voice as the Next Interface: Voice is positioned to become the primary way humans interact with technology, moving computing into the background and allowing for more immersive, natural experiences.
- The Hardware Challenge: While foundational AI models are maturing rapidly, the "holy grail" remains finding the right form factor—be it smart headphones, pendants, or wearables—to make voice agents ubiquitously useful.
- Domain-Specific AI: Entrepreneurs are finding the most success by combining powerful audio models with deep domain expertise in industries like automotive, healthcare, and e-commerce.
- Authenticity and Trust: As AI-generated content grows, building platforms that prioritize human verification and authentic interaction will be critical to breaking through the digital noise.
The Shift Toward Voice-Native Experiences
For decades, human-computer interaction has been dominated by keyboards and touchscreens. Staniszewski argues that this is fundamentally unnatural. We are hard-wired to communicate through speech, yet our technology forces us to stop, look down at a screen, and manually input information.
Building the "Jarvis" of Tomorrow
The goal for many in the AI field is to move technology into the background. Imagine a device that understands tone, emotion, and context, providing real-time assistance without the friction of a display. Staniszewski notes that we are approaching "Iron Man" levels of AI, where a voice assistant could potentially manage our schedules, translate foreign languages in real time, and act as a reliable repository of our personal and professional knowledge.
"The most exciting part is: could you have the technology kind of fold into the background, the phone goes back into the pocket, and you kind of immerse yourself in the world around you?"
Essential Components for Voice Dominance
To reach the level of mass adoption where voice replaces the smartphone, three distinct barriers must be cleared. First, the foundational technology must reach human-level quality; the AI must understand interruptions, display appropriate intonation, and possess high intelligence. Second, knowledge access is paramount—the assistant must have a memory of past conversations and deep integration with personal or corporate data.
Finally, there is the form factor challenge. While the phone is currently the default, Staniszewski believes we are moving toward a multi-device future. Smart headphones, discreet wearables like pendants, or even future neural interfaces will likely work in concert to provide a seamless, ever-present voice experience.
The Entrepreneurial Opportunity
For aspiring founders, Staniszewski cautions against trying to compete directly with massive foundational model companies. Instead, the greatest value lies in the "agentic layer." This involves taking existing models and applying them to specific, traditional industries that have historically lagged in innovation.
Vertical Integration
The automotive, healthcare, and e-commerce sectors offer fertile ground. By building a voice agent that specifically understands the technical jargon of a hospital or the inventory logistics of a car manufacturer, entrepreneurs can create defensible moats that general-purpose models cannot easily replicate. The value is not just in the voice generation itself, but in the integrations and trust built within those specific workflows.
Authenticity and the Future of Social Connection
The conversation also pivoted to the current state of social media, which many believe has become trapped in an algorithmic cycle of negativity. Staniszewski and Kamath discussed the possibility of building platforms that prioritize genuine connection and curiosity over knee-jerk emotional reactions.
"I don't think there is a place or an ecosystem today where you can have that conversation, so we hope to kind of create it."
The future of social media may not look like the static, text-and-image timelines of today. Instead, it could become an interactive companion where voice plays a central role. By incorporating AI that summarizes interesting developments and facilitates nuanced, multilingual discourse, the next generation of social platforms could foster deeper human connections rather than driving polarization through engagement-baiting algorithms.
Conclusion
The promise of voice technology is not merely about convenience; it is about reclaiming the natural rhythm of human communication. Whether it involves re-imagining how we learn, how we conduct business, or how we relate to one another, the transition toward voice-first interfaces appears inevitable. As foundational models continue to improve, the entrepreneurs who succeed will be those who bridge the gap between abstract AI capabilities and the practical needs of users, ultimately humanizing the technology we use every day.