It’s 1939, the New York World’s Fair is in full swing, and the star attraction is the Voder, a robot and the first device that could generate continuous human speech. It was invented by Homer Dudley of Bell Labs, and while it was intelligible, it was light years away from sounding “human” and was dismissed as a novelty without any real commercial or social use.
Synthesized speech continued to develop over the years in applications such as children’s toys, speech-recognition IVR, GPS systems and basic voice bots, and the quality and realism of the voices improved. While these robotic voices were “OK”, they failed to capture the imagination and attention of ordinary folk. Then came Alexa, Siri and Google Assistant, and user expectations and interactions changed dramatically. Early implementations had stilted speech, limited understanding and a lack of natural language fluency. Consequently, they made limited appearances in IVR but were inadequate for mainstream exposure. This is particularly true where callers want to speak in a natural way, using a variety of expressions, conveying emotions, and all the nuances that are the trademarks of a human-to-human conversation. And if that wasn’t enough, the world has accents and dialects that can often challenge humans, let alone a voice assistant (“VA”), and South Africa has some of the most challenging!
While there are “only” 12 official South African languages, including sign language, the variety of dialects and accents spawned from these means that the linguistic landscape is as wide-ranging as the Karoo and as challenging to navigate as the Orange River. South African English has its own peculiarities and nuances that distinguish it from the English spoken in the rest of the world. It’s said that South Africans communicate in such a unique way that even their accents have accents. Even native South Africans aren’t always able to understand or interpret many of these accents, so expecting a VA to achieve this easily is one of the major issues facing bot designers and developers working on conversational AI solutions. But these aren’t the only issues facing companies wanting to add a VA to their operations, or the innovators seeking to develop these solutions. So where do we start?
This is a familiar refrain in human-to-human discourse, and usually refers to a snappy phrase or a leading question. But we’re not talking about dating here! When designing and building a VA, and before we even talk about technology, the first step is conversational design, and a key part of that is identifying what we want to achieve with a VA, and why. It is vital to understand the “why”; with the “why” at the core, the “how” becomes readily apparent. Elerian works closely with stakeholders and developers to clearly understand the rationale for automation, evolve the designs and bring the experience to life, which is the foundation of successful use-case development.
Communication between VAs and humans requires a delicate balance of natural human language and structured data. Conversational design is based on human conversation: the more a VA can understand and simulate human conversation, the less friction callers face and the more intuitive the interaction. The role of a conversational designer is to consider both the caller’s needs and the enabling technology, providing the flexibility to design and deploy helpful, authentic, customer-centric conversation flows. This isn’t a million miles away from developing a script for a human agent and preparing them for the variety of customer service questions and issues that customers may raise, in all their smorgasbord of linguistic possibilities.
These first steps have been vital elements in the design, development, and deployment of a VA for a leading South African insurance and pension provider. Elerian worked closely with the customer and their front-line team to gain a strong understanding of the types of questions and issues that their clients would call about. There are three key elements in this.
As well as identifying these key steps, the VA also has to deal with the different ways a caller may say common phrases and, of course, the range of accents it will encounter. Elerian accomplished this by training the VA on the customer’s use-case-specific audio data, ensuring exceptional levels of recognition and digital-agent understanding. As the VA handles more calls and is iteratively retrained, it effectively learns to handle more complex conversations.
Being able to clearly understand what a customer might say in any conversation is obviously vital to the successful completion of a call, the ability to provide first-contact resolution, and leaving the client feeling positive about the experience. As noted earlier, even a human may find it difficult to understand every word and every nuance of a conversation, so for a VA to do this effectively and consistently it needs performant technology infrastructure operating in harmony.
Automatic Speech Recognition (ASR), which is effectively the speech-to-text converter. This is trained on caller dialects and needs to recognize the specific words and phrases spoken quickly and accurately (>90%), handing them off to the next piece of the technology stack, which is:

Solomon, a proprietary algorithm that selects the best transcription based on contextual understanding from the Natural Language Understanding models and the Intelligence Model, providing a significant accuracy advantage.

The language model, which is also trained on customer recordings but looks for the content or topic, extracting sentiment and validating the intents and entities. It also has built-in disambiguation that can automatically resolve ambiguity from callers regarding intents and entities.
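The three-stage stack described above can be sketched in miniature. Everything here is illustrative: Solomon’s internals are proprietary, so a simple weighted reranker stands in for it, and a keyword matcher stands in for the trained language model. The function names, weights and intents are assumptions, not Elerian’s actual implementation.

```python
# Hypothetical sketch: ASR produces an n-best list of transcriptions;
# a reranker (standing in for "Solomon") scores each hypothesis by
# combining ASR confidence with the NLU's contextual confidence; the
# winning transcript is then tagged with an intent.

from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    asr_score: float  # acoustic/language-model confidence from the ASR

# Toy NLU: keyword-based intent matcher standing in for a trained model.
INTENT_KEYWORDS = {
    "tax_certificate": ["tax", "certificate"],
    "account_statement": ["statement", "account"],
}

def nlu_confidence(text: str) -> tuple[str, float]:
    """Return (best_intent, confidence) for a transcript."""
    words = set(text.lower().split())
    best_intent, best_score = "unknown", 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = sum(1 for k in keywords if k in words) / len(keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score

def rerank(hypotheses: list[Hypothesis]) -> tuple[str, str]:
    """Pick the transcription whose ASR score, weighted by NLU
    contextual confidence, is highest (illustrative weights)."""
    def combined(h: Hypothesis) -> float:
        _, conf = nlu_confidence(h.text)
        return 0.6 * h.asr_score + 0.4 * conf
    best = max(hypotheses, key=combined)
    intent, _ = nlu_confidence(best.text)
    return best.text, intent

# The acoustically stronger "text certificate" loses to the
# contextually coherent "tax certificate".
n_best = [
    Hypothesis("I need a text certificate", 0.71),
    Hypothesis("I need a tax certificate", 0.69),
]
print(rerank(n_best))  # → ('I need a tax certificate', 'tax_certificate')
```

The point of the sketch is the division of labour: the ASR alone would have picked the wrong hypothesis, and it is the contextual signal from the language model that corrects it.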
The VA must also be able to integrate with the front-end telephony system for swift and accurate call answering, as well as with any back-end database or system from which it can confirm contact details, verify the customer, and retrieve the information needed to complete the interaction. This is where the next key steps, and the engagement with the customer, really come into their own.
Achieving this part of the project meant developing close cooperation with all the customer’s stakeholders. This initially meant reviewing and categorizing operations and, where possible, quantifying the cost, complexity, and volume of each type of interaction to determine which are most suitable for automation. Once that was agreed, the next step was to understand and map the underlying workflows, systems, data, and processes that support a use case and that the VA can deliver accurately and consistently.
A key feature of Elerian’s work with this client has been the robust and collaborative pilot program and Proof of Concept (POC), which provided vital, actionable feedback and led to a successful client deployment.
Much of the early testing was done using the customer’s and Elerian’s internal resources. This testing started by gathering data from existing channels to train the VA under human supervision. As the customer has a large and diverse employee base, this exposed the VA to many dialects, which significantly reduced training time and provided a rich database of voices that continues to grow and enhance the system.
Another vital step was to create a clear and reliable hand-off to a live agent when escalation is needed or requested, one that can supply the context and history of the interaction from the VA to a human. Next came monitoring, tracking, and reporting on the VA’s success, just as we would with a human agent, and identifying opportunities for continuous improvement. As processes and data may change over time, it was vital that both parties conducted regular reviews of customer feedback to identify any pinch points or potential roadblocks to success.
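The hand-off described above has two moving parts: deciding when to escalate, and packaging context so the human agent doesn’t start from zero. A minimal sketch, assuming an invented payload schema and a hypothetical confidence threshold (neither is Elerian’s actual design):

```python
# Illustrative escalation and hand-off logic: escalate when the caller
# asks for a person or when understanding drops, and serialize the
# interaction's context for the receiving agent. Field names and the
# threshold are assumptions made for this sketch.

import json

ESCALATION_PHRASES = {"agent", "human", "operator"}
CONFIDENCE_FLOOR = 0.5  # hypothetical "VA is struggling" threshold

def needs_escalation(turn_text: str, nlu_confidence: float) -> bool:
    """Escalate if the caller asks for a person or understanding is poor."""
    asked = any(w in turn_text.lower().split() for w in ESCALATION_PHRASES)
    return asked or nlu_confidence < CONFIDENCE_FLOOR

def build_handoff(call_id: str, history: list[dict], reason: str) -> str:
    """Serialize the context a live agent needs to continue the call."""
    payload = {
        "call_id": call_id,
        "reason": reason,
        # Most recent turn that carried a recognized intent, if any.
        "last_intent": next(
            (t["intent"] for t in reversed(history) if t.get("intent")), None
        ),
        "transcript": [t["text"] for t in history],
    }
    return json.dumps(payload)

history = [
    {"text": "I need my account statement", "intent": "account_statement"},
    {"text": "Let me speak to a human please", "intent": None},
]
if needs_escalation(history[-1]["text"], nlu_confidence=0.9):
    print(build_handoff("call-001", history, reason="caller_request"))
```

The agent receiving this payload sees what the caller already asked for and why the VA stepped aside, which is what makes the hand-off feel continuous rather than like starting the call over.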
A rich, built-in analytics suite, capable of drilling down into each conversational turn, provides insights into the caller’s journey. Critically, it also includes a full voice recording, allowing a complete audio analysis of the conversation and a clear understanding of where any problems occurred in understanding or responding.
The customer started with after-hours calls for relatively simple transactions, such as requesting a tax certificate or generating an account statement. The VA was then incrementally introduced during business hours, and the response from both the customer and its clients has been positive.
The system is now handling thousands of calls per day and continues to be tweaked and enhanced as new data and voices are incorporated into the various models. Additional capabilities are currently being tested, including voice biometrics, a WhatsApp channel and expansion into other customer service functions. While the driving force behind a conversational AI deployment must be making life better for customers and colleagues, saving time and reducing contact centre costs are ancillary benefits that can’t be overlooked.
And of course, it’s done with South African flair and spirit!