
Speech recognition and artificial intelligence are changing how we interact with technology. Voice as a user interface has been discussed for years, but we believe the technology is finally accurate enough, with sufficiently low latency, to provide an experience comparable to other forms of interaction. A leading indicator is the mainstream adoption of smart assistants from Apple, Google, Amazon, and others: smart speaker penetration has reached 34% in the US. Smartphones already ship with embedded algorithms that enable voice-based interaction.

Around half of adult Internet users now use voice technology in some way, whether through voice assistants on their smartphones, in their vehicles, or in their homes.

The advantages of voice

We see many parallels between the arrival of the voice user interface and earlier UI shifts, such as the changes in how we interact with technology that came with keyboards, graphical interfaces, and touch screens. Voice has a few key advantages over these technologies, and we expect it to become a core user interface that coexists with them.

Productivity: Americans type at an average rate of 40 words per minute but speak at an average of 150 words per minute, more than three times faster. Voice-enabling tasks that would normally require a keyboard can therefore deliver large productivity gains, not to mention simplifying and streamlining numerous daily tasks.
Improved user experience: Enabling voice analytics, for example on customer and sales calls, can significantly improve processes as well as the overall user experience.
Safety: Enabling hands-free engagement is particularly important in environments like industrial work sites and vehicles, where screen engagement is limited or impossible.

Spoken versus conversational language

Key speech and natural language processing technologies include attention detection (also called wake-word detection): a lightweight, locally processed command (e.g., "Alexa" or "Hey Siri") that 'wakes up' the application or device. They also include speech recognition, which recognizes spoken language and transcribes it into text; natural-language understanding (NLU) and/or interpretation (NLI), which deal with processing or comprehending 'intent'; and speech synthesis, the ability to produce human speech.

Currently, even the most sophisticated smart assistants understand only a small subset of user intents. For example, Amazon Alexa’s skills are, for the most part, manually programmed, and each handles one specific command or 'intent' at a time. Conversational language is much more complex, and natural language processing is considered a particularly 'AI-hard' problem.
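To make the division of labor between these components concrete, here is a toy sketch of how the four stages might be chained, with intents written by hand one at a time in the spirit of manually programmed skills. It is an illustration under heavy simplifying assumptions, not how Alexa or Siri is implemented: "audio" is simulated as text, so wake-word detection and speech recognition are string-matching placeholders, and all function names, intents, and replies are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical wake phrases, stored as token tuples. Real assistants detect these
# with a small on-device acoustic model, not by matching text.
WAKE_PHRASES = (("alexa",), ("hey", "siri"))


def _tokens(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def detect_wake_word(audio: str) -> Optional[str]:
    """Attention detection placeholder: return what follows the wake phrase, else None."""
    tokens = _tokens(audio)
    for phrase in WAKE_PHRASES:
        if tuple(tokens[: len(phrase)]) == phrase:
            return " ".join(tokens[len(phrase):])
    return None


def recognize_speech(audio: str) -> str:
    """Speech recognition placeholder: the simulated 'audio' is already text."""
    return " ".join(_tokens(audio))


@dataclass
class Intent:
    """One hand-written intent: trigger keywords plus a handler that builds the reply."""
    name: str
    keywords: tuple[str, ...]
    handler: Callable[[str], str]


# Hypothetical intents, each programmed manually, one command at a time.
INTENTS = [
    Intent("get_weather", ("weather", "forecast"), lambda text: "Here is today's forecast."),
    Intent("set_timer", ("timer",), lambda text: "Timer set."),
]


def understand(text: str) -> Optional[Intent]:
    """NLU placeholder: return the first intent whose keyword appears in the text."""
    words = set(_tokens(text))
    for intent in INTENTS:
        if any(keyword in words for keyword in intent.keywords):
            return intent
    return None


def synthesize(reply: str) -> str:
    """Speech synthesis placeholder: a real system would return audio samples."""
    return f"<speech>{reply}</speech>"


def handle_utterance(audio: str) -> Optional[str]:
    """Chain the four stages; return None when no wake phrase is heard."""
    command = detect_wake_word(audio)  # 1. attention detection (local, lightweight)
    if command is None:
        return None  # stay asleep: the heavier stages never run
    text = recognize_speech(command)  # 2. speech recognition
    intent = understand(text)  # 3. natural-language understanding
    reply = intent.handler(text) if intent else "Sorry, I can't help with that yet."
    return synthesize(reply)  # 4. speech synthesis
```

In this sketch, `handle_utterance("Alexa, what's the weather?")` passes the wake-word check and matches the hypothetical `get_weather` intent, while the same request without "Alexa" is ignored. Every new kind of request needs another hand-written `Intent` entry, which illustrates why this style of system covers only a narrow set of intents.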

Nevertheless, deep learning, together with increased computational power and connectivity, has significantly improved voice recognition. In 2017, the word accuracy of Google's machine-learning speech recognition for US English reached 95%, roughly the threshold for human accuracy. By late 2019, Amazon's Alexa smart assistant had racked up more than 100,000 'skills'. And the number of voice assistants in use is expected to triple over the next few years, to 8 billion by 2023.
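The 95% figure refers to word accuracy. Assuming the usual convention that word accuracy equals 1 minus the word error rate (WER), it can be computed as in the following sketch: WER is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's transcript into a human reference transcript, divided by the number of words in the reference. The function names are illustrative, not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or exact match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed convention: word accuracy = 1 - WER (can go negative if WER > 1)."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For example, if the reference is a 20-word sentence and the recognizer gets exactly one word wrong, WER is 1/20 = 5%, which corresponds to 95% word accuracy. Real evaluations usually normalize casing and punctuation before scoring; this sketch simply splits on whitespace.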

Value-added use cases for voice-enabled technology

Despite the hype around smart speakers and smartphone-embedded voice assistants, we believe most of the value lies in B2B use cases, where voice can unlock significant economic value. The vast majority of consumer use cases remain ‘nice to have’ rather than ‘must have’ and do not meaningfully improve our daily lives. Examples of use cases that we believe have strong value propositions include:

Customer service. A voice interface can provide automatic, conversational responses to a growing subset of customer service calls, such as appointment bookings and common support/helpdesk queries, reducing the need for human agents to handle repetitive tasks and calls. Transparency into customer conversations can also be a powerful tool for discovering customer-driven product recommendations and improving processes, an opportunity that companies like Chorus, Speechmatics, and i2x are addressing.
Integration of voice with AR/VR solutions. This enables even deeper productivity improvements. One example is the Varjo VR display, which is being integrated with voice-assistant technology to provide a fully immersive environment for industrial design and engineering. Another is RealWear, whose flagship product, a head-mounted, wearable, Android-class tablet computer, can be operated safely by voice, freeing a worker’s hands for dangerous jobs.
Workflow augmentation. Consider professions like healthcare and field services. They are very different in nature, but both involve diagnosis followed by treatment or maintenance, and a lot of time is spent diagnosing the problem and then documenting it. Using voice to combine these two workflows, so that the issue is recorded as it is diagnosed, saves considerable time and reduces the risk of human error. Medical professionals currently spend 1-2 hours per day manually entering data into health record systems. Companies like Corti are combining voice with AI in a digital assistant that can analyze patient interviews and provide support during emergency calls. On the consumer side of healthcare, one example of re-engineering an existing application for voice is a feature in the Lifesum Health app that lets Google Assistant users log meals using only their voice (Lifesum is an NGP Capital portfolio company).

Voice – the new future of interaction?

Technology challenges (and opportunities) with voice remain in a number of areas. While training general-purpose speech-recognition systems requires thousands of hours of data, low-resource languages (e.g., Haitian, Zulu, Assamese) and certain domain-specific applications have far less training data available, creating an opportunity to apply deep learning techniques that can build solutions for these scenarios from limited data. Small-footprint devices will need technologies that deliver industry-level accuracy and voice quality on hardware with limited processing power, memory, and energy.

Conversational UIs have the potential to change the way people interact with technology at home and in businesses, but it will take years before we get to a generalized conversational interface.

In the meantime, a host of companies, large and small, are focusing their efforts on the difficult technical challenges that still need to be overcome, and on the specific needs of those vertical and horizontal applications that would benefit most from a voice-enabled user interface.

We believe that voice, as a user interface, is here to stay, alongside touch screens, GUIs, keyboards, and the like. However, the killer apps are still missing. That may be why Alibaba announced a further $1.4B investment in its smart speaker ecosystem last month.

Fastest growing voice tech companies – quantified by Q

Company	Description	HQ	Stage	Funding (USD)
1. Observe.AI	Customer conversation analytics for sales teams	US	Series A	35M
2. Speechmatics	Cloud-based speech recognition services that convert speech to text	UK	Series A	8M
3. Cognigy	Platform for developing conversational AI	DE	Series A	6M
4. Orbita	Virtual AI assistants for healthcare	US	Series A	16M
5. Rasa	Open source machine learning tools for conversational AI applications	DE	Series A	15M
6. CallMiner	Conversational analytics software	US	Series D	149M
7. Gong	Call analytics for inside sales teams	US	Series C	134M
8. Deepgram	Speech analytics API	US	Series A	14M
9. ID R&D	Authentication solution for enterprises	US	Series A	6M
10. Voiceflow	Platform to design, prototype, and publish voice applications	CA	Seed	4M
11. Finn	AI-based virtual assistant for banking and financial services	CA	Series A	14M
12. Mindsay	Conversational marketing solutions	FR	Series A	10M
13. Smith	Voice-based virtual customer service	US	Series A	7M
14. Loris	Solution for navigating hard conversations	US	Series A	7M
15. Tagove	Multi-channel call center solution for businesses	US	Series A	5M
16. OJO	AI-based conversational assistant for the real estate industry	US	Series C	76M
17. Octane	Platform to create and manage e-commerce chatbots	US	Seed	8M
18. Verbit	AI-based speech recognition platform	US	Series B	65M
19. RealWear	Wearable heads-up display that gives industrial workers hands-free access to data	US	Series B	103M
20. HYPR Corp	Multi-modal biometric authentication system for mobile, desktop, and IoT systems	US	Series B	43M

The list covers North America and Europe. We have excluded companies with funding below $3M.

Q is our predictive AI platform, which makes us better and more efficient investors. Q operates at the very core of NGP Capital by automating our global investment workflow. It can scan, rank, and quantify 500,000 companies based on more than 200 different growth indicators. With Q, we surface the most promising companies in real time, reduce decision-making bias, and monitor market developments.


Unlocking the value of voice technology
