Google Invests in Cognitive: Cloud Speech API Reaches General Availability
In a recent blog post, Google announced their Cloud Speech API has reached General Availability (GA). The Cloud Speech API allows developers to include pre-trained machine learning models for cognitive tasks such as video, image and text analysis, in addition to dynamic translation. The Cloud Speech API launched in open beta last summer.
Cloud Speech API takes advantage of Google's neural-network-based speech recognition, which has its roots in Google's own voice offerings, including Google Assistant and Google Home. The Cloud Speech API currently supports more than 80 languages and variants. It is also able to ingest audio in two modes:
- Real-time streaming, which provides immediate text results while a person is speaking
- Batch, for transcribing pre-recorded audio
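As a rough sketch of the batch mode, a synchronous request to the public v1 REST endpoint (`POST https://speech.googleapis.com/v1/speech:recognize`) is just a JSON payload pairing a recognition config with a pointer to the audio; the Cloud Storage URI below is a placeholder, not a real file:

```python
import json

# Batch (synchronous) recognition request body for
# POST https://speech.googleapis.com/v1/speech:recognize
# The gs:// URI is a placeholder for illustration only.
batch_request = {
    "config": {
        "encoding": "LINEAR16",        # raw 16-bit PCM, e.g. from a WAV file
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
    },
    "audio": {"uri": "gs://example-bucket/call-recording.wav"},
}

print(json.dumps(batch_request, indent=2))
```

Real-time streaming, by contrast, goes over the gRPC `StreamingRecognize` method, which interleaves outgoing audio chunks with interim transcription results rather than exchanging a single request and response.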
The service is able to operate in noisy environments by filtering out background noise, and can also learn through word and phrase hints by adding new words or phrases to a dictionary.
As part of this GA launch, Google has added some new features and improved performance in the areas of:
- Transcription accuracy for long-form audio
- Faster processing, typically 3x faster than the previous version for batch scenarios
- Expanded file format support, now including WAV, Opus and Speex
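In the v1 API, these formats correspond to values of the config's `encoding` field; a minimal mapping as a sketch (the WAV entry assumes uncompressed 16-bit linear PCM inside the container):

```python
# RecognitionConfig.encoding values for the supported formats.
# WAV is a container format; 16-bit PCM WAV maps to LINEAR16.
format_to_encoding = {
    "wav (16-bit PCM)": "LINEAR16",
    "opus": "OGG_OPUS",
    "speex": "SPEEX_WITH_HEADER_BYTE",
}

for fmt, enc in format_to_encoding.items():
    print(f"{fmt} -> {enc}")
```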
In a recent presentation at Google Cloud Next '17, Dan Aharon, product manager at Google, described some of the use cases behind Cloud Speech API, including human-computer interactions using mobile, web and IoT applications. The service can also be used to generate speech analytics for businesses in customer service scenarios.
Aharon also discussed the momentum behind speech and why it has reached an inflection point:
- Voice is faster (150 words per minute vs 20-40 for typing)
- Lighter (does not require a hierarchical UI)
- More convenient (allows hands-free operation)
- Over 20% of all Android app searches are now done through voice
- Always-listening devices (Google Home, Google Pixel, Amazon Echo) becoming mainstream
Google has showcased a couple of customer scenarios that demonstrate the capability of the Cloud Speech API. The first example is a mobile chat application called Azar, in which users are able to communicate with others in real time using video chat. In addition to streaming video and audio, a transcript is provided to users in the language of their choice. Thus far, Azar has made more than 15 billion discovery matches and is operating the service at scale.
Another use case that Google is showcasing focuses on customer service. Today, most organizations providing customer service over the phone play a prompt indicating the conversation is about to be recorded for customer satisfaction purposes. But what do organizations do with that data? Gary Graves, CTO of InteractiveTel, indicates those conversations are usually reviewed only after a customer dispute. Graves feels that organizations, including car dealerships, are missing out on many opportunities as a result:
Not only are our car dealership customers making more sales, but it’s causing a shift in mentality because now everyone in the dealership is being held accountable. It’s one thing to have a recording or monitoring solution in place, and people know it’s there. But that’s reactive, meaning the only time that information is ever going to be leveraged is if there’s a situation that calls that into question. Whereas using Cloud Speech we are able to mine these conversations for actionable intelligence to permit us to empower dealers to be proactive and provide a higher level of customer service.
Within InteractiveTel’s offering, they provide car dealerships with a transcription and sentiment analysis solution. As a phone conversation takes place in real time, InteractiveTel is able to run those conversations through their platform, which leverages the Google Cloud Speech API. As a result, car dealerships can deliver actionable insights to their sales force and also determine customer sentiment on a per-call basis.
As part of InteractiveTel’s demo at Google Cloud Next ’17, Graves demonstrated how their technology can be used to provide real-time speech-to-text translation, keyword detection and sentiment analysis. Graves feels that even if customers are unwilling to provide their contact information, there is still a lot of product request information that can be captured without relying upon a salesperson to accurately enter this information in a system.