Detect speech using the Web Speech Recognition API

As the web matures, browsers expose more interfaces and sensors with improving cross-browser and cross-device compatibility. Typically, though, we are only given access to raw data and are expected to do our own processing. This has made detecting speech in audio data computationally intensive, and often impractical on lower-end devices.

The formalising of the Web Speech API opens up a range of possibilities because it moves speech processing out of JavaScript and into native browser code. We can now access speech input far more easily and efficiently.

Let's create a simple example that detects speech and then triggers an action based on the recognised text. This example works in Chrome (desktop and Android) and Firefox.

First, we need to create a speech recognition object, falling back to the prefixed constructor where necessary:

var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition,
    recognition = new SpeechRecognition();
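Not every browser ships a recognition constructor, so it is worth guarding against its absence before calling `new`. A minimal sketch of the lookup as a testable helper (the `getRecognizer` name is my own, not part of the API):

```javascript
// Return whichever SpeechRecognition constructor the environment
// provides (standard or webkit-prefixed), or null if neither exists.
// The helper name is hypothetical; the property names are from the API.
function getRecognizer(globalObj) {
    return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In a browser you would call it with `window`:
// var Recognition = getRecognizer(window);
// if (Recognition) { var recognition = new Recognition(); }
```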

Next we can set some options; in this case we want continuous speech processing and interim (incomplete) results:

recognition.continuous = true;
recognition.interimResults = true;
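The same object exposes a couple of other options worth knowing about; setting the recognition language in particular avoids relying on the browser's default (both properties are part of the same API):

```javascript
recognition.lang = 'en-US';       // language to recognise, as a BCP 47 tag
recognition.maxAlternatives = 1;  // candidate transcripts to return per result
```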

After this, we need to attach handler functions to receive the data:

recognition.onstart = function(e) {
    console.log('onstart', e);
};
recognition.onend = function(e) {
    console.log('onend', e);
};
recognition.onresult = function(e) {
    console.log('onresult', e);
};
recognition.onspeechend = function(e) {
    console.log('onspeechend', e);
};
recognition.onerror = function(e) {
    console.log('onerror', e);
};
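The event passed to `onerror` carries an `error` string; the specification defines values such as `'no-speech'`, `'audio-capture'`, `'not-allowed'` and `'network'`. A sketch of turning those codes into readable messages (the `describeError` helper is my own, not part of the API):

```javascript
// Map Web Speech API error codes to human-readable messages.
// The helper name is hypothetical; the error codes come from the spec.
function describeError(code) {
    var messages = {
        'no-speech': 'No speech was detected.',
        'audio-capture': 'No microphone was found, or audio capture failed.',
        'not-allowed': 'Permission to use the microphone was denied.',
        'network': 'A network error interrupted recognition.'
    };
    return messages[code] || 'Recognition failed: ' + code;
}

// recognition.onerror = function(e) { console.log(describeError(e.error)); };
```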

We can now view console logs of our data by starting recognition:

recognition.start();
If we want to output the text into the page, we can update our onresult function:

recognition.onresult = function(e) {
    console.log('onresult', e);
    var i = 0,
        html = '';
    for (i = 0; i < e.results.length; i += 1) {
        html += e.results[i][0].transcript;
    }
    document.getElementById('output').innerHTML = html;
};
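The concatenation loop above is easy to factor into a pure function, which keeps the handler thin and lets the logic be exercised outside the browser. A sketch (the `buildTranscript` name is mine; the input shape mirrors `e.results`, a list whose entries each hold alternatives with a `transcript` property):

```javascript
// Join the top alternative of each result into one transcript string.
// `results` mirrors e.results from the Web Speech API; the function
// name itself is hypothetical.
function buildTranscript(results) {
    var i, html = '';
    for (i = 0; i < results.length; i += 1) {
        html += results[i][0].transcript;
    }
    return html;
}

// recognition.onresult = function(e) {
//     document.getElementById('output').innerHTML = buildTranscript(e.results);
// };
```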

See a full working version here:
