Calling your bank and authenticating transactions by phone without any cumbersome routines – like “difficult” security questions – is something many people are getting used to thanks to voice biometrics.
Enterprises are trying to balance the high level of security required to manage access to customer data with a positive customer experience: user-friendly access to data and services that customers themselves also perceive as sufficiently secure. This perception is key. Growth of the voice biometrics market depends both on the objective security delivered by biometric systems and on how customers perceive their security, ease of use and similar qualities.
Voice biometry is the only biometric authentication method that can be used remotely over the telephone without any additional equipment. Unsurprisingly, it is increasingly regarded by businesses as the most pragmatic and cost-effective authentication procedure.
For customers, it is a matter of positive experience. Voice works smoothly as a password without there being any password to remember: text-independent voice biometrics systems can work in the background throughout the conversation, or in a hybrid mode where the caller is asked to repeat automatically generated combinations of digits or words.
There is another angle to that comforting reality, however. People seem increasingly concerned that Big Tech and governments might eventually obtain complete control over individuals’ access to resources, their ability to travel freely, and many other aspects of life that typically make us feel free and “unsupervised,” so to speak. For many, protecting data privacy and preventing identity theft is becoming a lesser concern than the frightening prospect of being suddenly blacklisted and cut off from banking, travel, and other essential activities.
Voice biometry, just like face recognition, fingerprints and other methods, is bound to become more and more regulated, and that is a good thing. It is therefore critical for enterprises to be confident that they know exactly what they are doing when implementing voice biometry, as new laws and regulations are being adopted and significant differences exist across jurisdictions. There are a number of important issues to consider and questions that need answers. We are glad to contribute with three articles that clarify the most important aspects of voice biometry and answer key questions about its different phases. Here we go with the first ones!
Many banks have started deploying the so-called ‘active’ or ‘passphrase-dependent’ form of voice verification for their customers. It requires the speaker to repeat exactly the passphrase used during enrolment. During the call, the spoken passphrase is matched against a previously created voiceprint of the same passphrase to ensure that the caller’s voice matches.
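To make the matching step more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a voiceprint is stored as a fixed-length numeric vector and that two voiceprints are compared by cosine similarity against a threshold. The function names and the threshold value of 0.8 are hypothetical; a production engine will use its own representation, scoring and calibration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voiceprints must not be all zeros")
    return dot / (norm_a * norm_b)


def voices_match(enrolled, sample, threshold=0.8):
    """Accept the caller if the sample is close enough to the enrolled voiceprint.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(enrolled, sample) >= threshold
```

In practice the threshold is a trade-off: raising it rejects more impostors but also rejects more genuine callers.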
There is also a more sophisticated form of authentication called ‘passive’ or ‘text-independent’ speaker verification. It verifies the speaker’s identity during a natural, free-flowing conversation. It is less intrusive and also more secure by design, because it provides ongoing live verification throughout the entire conversation to ensure that the caller remains the same, which strengthens the security of remote financial transactions.
It is expected that the passive form of voice verification will replace the currently used active form by 2022, as it is better equipped to deal with playback and text-to-speech attacks, to which the active form of voice authentication is particularly vulnerable. Since a natural language conversation contains no set phrases that could be pre-recorded, it is incredibly difficult to gain access to someone’s account or profile by fraudulently recording their voice.
At the moment, though, we are still seeing implementations of different engines for active and passive solutions, even where the same company uses both. It has become clear to most experts in this field, however, that the ‘hybrid approach’, where the same solution is used regardless of the contact channel (e.g. IVR or call centre agents), delivers the greatest level of security. If there is not enough audio for a passive verification (a typical requirement is 7-10 seconds), the system asks an easy-to-answer question to complete the verification process. Alternatively, for less talkative consumers, it can transparently switch to an active form of verification, with no need to enrol, and complete the authentication process via a passphrase.
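The hybrid decision logic described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the names `Step` and `next_step` are hypothetical, the 7-second constant is simply the lower end of the typical 7-10 second requirement, and the rule of asking one question before falling back to a passphrase is an assumed policy rather than a description of any particular product.

```python
from enum import Enum


class Step(Enum):
    PASSIVE_VERIFY = "passive_verify"        # enough free speech collected
    ASK_QUESTION = "ask_question"            # prompt the caller to speak more
    ACTIVE_PASSPHRASE = "active_passphrase"  # fall back to a passphrase

# Assumption: lower bound of the typical 7-10 second requirement.
MIN_PASSIVE_SECONDS = 7.0


def next_step(speech_seconds, prompts_asked=0, max_prompts=1):
    """Choose the next verification step in a hybrid flow."""
    if speech_seconds >= MIN_PASSIVE_SECONDS:
        return Step.PASSIVE_VERIFY
    if prompts_asked < max_prompts:
        # Not enough audio yet: ask an easy-to-answer question.
        return Step.ASK_QUESTION
    # The caller is still too quiet: switch to active verification.
    return Step.ACTIVE_PASSPHRASE
```

For example, `next_step(3.0)` asks a question first, while `next_step(3.0, prompts_asked=1)` switches to the passphrase route.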
The most suitable method of voiceprint enrolment depends on which of the three approaches mentioned above is chosen: active, passive or hybrid. With the active approach, the user has to repeat a passphrase at least three times in a separate process, and must then reproduce it exactly at the time of verification. This can be cumbersome, and many users are simply not willing to enrol in this way and memorise a passphrase, which delays the mass adoption of such a solution.
To achieve the highest accuracy and a better customer experience with a voice biometrics system, it is preferable to collect voiceprints during live customer conversations. In some cases, however, these conversations may be very short and may not occur frequently enough to collect a sufficient amount of audio for voiceprint creation (approximately one minute of speech). In this case, the best option is to enrol voiceprints from archive recordings, following the procedures described in this article. It is mandatory to obtain customers’ consent to enrol and use their voiceprints in voice biometrics systems, irrespective of whether live conversations or archive recordings are used to enrol the voiceprints.
To deploy voice biometrics systems more quickly, it is indeed possible to create voiceprints from archived recordings of the correct user instead of samples from live conversations. Although the accuracy may not be as high as with live samples, the subsequent accumulation of live voice fragments will make the resulting voiceprint more robust.
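As a rough illustration of these enrolment rules, the following Python sketch accumulates speech from live or archived sources until roughly one minute is available, and refuses to collect anything without consent. The class name `EnrolmentBuffer`, its fields and the 60-second constant are assumptions made for this example; they do not describe a specific product API.

```python
from dataclasses import dataclass, field

# Assumption: "approximately one minute of speech" is taken as 60 seconds.
TARGET_SPEECH_SECONDS = 60.0


@dataclass
class EnrolmentBuffer:
    customer_id: str
    has_consent: bool
    segments: list = field(default_factory=list)  # (source, seconds) pairs

    def add_segment(self, source, seconds):
        """Record a speech segment from a live call or an archived recording."""
        if not self.has_consent:
            raise PermissionError("customer consent is required before enrolment")
        if source not in ("live", "archive"):
            raise ValueError("source must be 'live' or 'archive'")
        if seconds <= 0:
            raise ValueError("segment duration must be positive")
        self.segments.append((source, seconds))

    @property
    def total_seconds(self):
        return sum(seconds for _, seconds in self.segments)

    def ready(self):
        """True once enough speech has been collected to create a voiceprint."""
        return self.total_seconds >= TARGET_SPEECH_SECONDS
```

A buffer could, for instance, start with 25 seconds of archived audio and become ready after a further 40 seconds of live conversation.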
To start this process, it is essential to ensure that there is an agreement in place between you and your customers, confirming the customers’ consent to create voiceprints using their archived voice recordings, as mentioned above.
Spitch normally offers consulting services to its clients during the process, in accordance with regulatory frameworks in a given country, including data protection regulations.
Our lawyers help clients formulate customer agreements based on the existing legal requirements in each country that regulate the creation and processing of voiceprints. These agreements should be approved by lawyers on the client side before any project roll-out. In some jurisdictions, e.g. in Switzerland, certain customer engagement methods for obtaining consent may be interpreted as spam or deemed inappropriate. Our consultants help clients avoid such a scenario while adhering to specific project requirements.
Authentication: The process of establishing confidence in the identity of users or information systems. Authentication of users (e.g. customers) implies confirmation of their presence and intent to authenticate.
Authentication Factor: The three types of authentication factors are something you know, something you have, and something you are. Every authenticator has one or more authentication factors.
Authenticator: Something that the user possesses and controls (typically a device – e.g. mobile phone with a known SIM card number, cryptographic module or password) that is used to authenticate the user identity. It is sometimes referred to as a token.
Biometrics: Automated recognition of individuals based on their biological and behavioural characteristics, e.g. a set of unique properties of one’s voice or peculiarities in the pronunciation of phonemes. Biometric authentication may be used to unlock multi-factor authenticators.
Identity: A set of attributes that uniquely describe a person within a given context.
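The relationship between authenticators and authentication factors in the definitions above can be illustrated with a short Python sketch. All names here (`Factor`, `Authenticator`, `is_multi_factor`) are hypothetical and exist only for this example; the rule that two or more distinct factor types count as multi-factor is the usual reading of the term.

```python
from dataclasses import dataclass
from enum import Enum


class Factor(Enum):
    KNOWLEDGE = "something you know"
    POSSESSION = "something you have"
    INHERENCE = "something you are"


@dataclass(frozen=True)
class Authenticator:
    name: str
    factors: frozenset  # one or more Factor values


def distinct_factors(authenticators):
    """Collect the distinct factor types covered by a set of authenticators."""
    covered = set()
    for authenticator in authenticators:
        covered |= authenticator.factors
    return covered


def is_multi_factor(authenticators):
    """True when at least two different factor types are involved."""
    return len(distinct_factors(authenticators)) >= 2


# Illustrative authenticators, not an exhaustive list.
password = Authenticator("password", frozenset({Factor.KNOWLEDGE}))
phone = Authenticator("mobile phone with known SIM", frozenset({Factor.POSSESSION}))
voice = Authenticator("voiceprint", frozenset({Factor.INHERENCE}))
```

Under this sketch, a password on its own is single-factor, while a known phone combined with a voiceprint is multi-factor.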