Speech To Text

Speech to text (Chrome only)

This functionality enables developers to convert audio to text. This page explains how to use this functionality with Inbenta's SDK.

Important

This functionality is only available on the Google Chrome browser.

Note

For more information about this functionality, refer to the Voice Driven Web Apps page on Google Developers.

Initialization

Create a new webkitSpeechRecognition object, which provides the speech interface. Once the object is created, initialize it by setting its attributes and event handlers.

  var recognition = new webkitSpeechRecognition();
  // With `continuous` set to `true`, recognition keeps running when the user pauses, until you stop it.
  recognition.continuous = true;
  // With `interimResults` set to `true`, the recognizer also returns provisional results, so you can see corrections as it works.
  recognition.interimResults = true;
  // Set the language to US English (a BCP 47 language tag).
  recognition.lang = 'en-US';
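
Because webkitSpeechRecognition is not defined outside Chrome, calling the constructor in another browser throws a ReferenceError. The following is a minimal sketch of a guard, not part of Inbenta's SDK. The helper name `getRecognitionConstructor` is hypothetical, and the check for an unprefixed `SpeechRecognition` is an assumption, in case a browser exposes the standard name.

```javascript
// Hypothetical helper: returns the recognition constructor found on `root`,
// or null when the browser does not expose one.
function getRecognitionConstructor(root) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// Usage in a page (runs only where a browser `window` exists).
if (typeof window !== 'undefined') {
  var Recognition = getRecognitionConstructor(window);
  if (Recognition) {
    var recognition = new Recognition();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.lang = 'en-US';
  } else {
    // Fall back to typed input, for example by hiding the microphone button.
    console.log('Speech recognition is not supported in this browser.');
  }
}
```

Passing the root object as a parameter keeps the check easy to exercise outside a browser.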

Start listening

To start listening, call the start function. You can assign a function to the onstart event handler to perform an action when recognition starts.

  recognition.onstart = function() {
    // Do something awesome
  };
  recognition.start();

End listening

To stop listening, call the stop function. You can assign a function to the onend event handler to perform an action when recognition ends.

  recognition.onend = function() {
    // Do something awesome
  };
  recognition.stop();
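
Recognition can also end without a call to stop, for example when the user denies access to the microphone. In that case the onerror handler receives an event whose error property holds a code from the Web Speech API, and onend still fires afterwards. The sketch below maps a few of those codes to messages; the function name `describeRecognitionError` and the message texts are illustrative assumptions.

```javascript
// Hypothetical helper: turns a recognition error code into a readable message.
function describeRecognitionError(code) {
  var messages = {
    'no-speech': 'No speech was detected.',
    'audio-capture': 'No microphone was found.',
    'not-allowed': 'Permission to use the microphone was denied.',
    'network': 'A network error interrupted recognition.'
  };
  if (Object.prototype.hasOwnProperty.call(messages, code)) {
    return messages[code];
  }
  // Any other code is reported as-is.
  return 'Recognition failed: ' + code;
}

// Usage in a page (runs only when a `recognition` object already exists).
if (typeof recognition !== 'undefined') {
  recognition.onerror = function(event) {
    console.log(describeRecognitionError(event.error));
  };
}
```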

Get results

To retrieve results, use the onresult event handler. It fires each time the recognizer identifies a word or phrase, and the event contains the new result together with the results identified before it.

Example: The function gets the results and puts them into an input field:

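  recognition.onresult = function(event) {
    var transcript = '';
    // Start at 0 so that results from earlier in the session are kept;
    // event.resultIndex only marks the first result that changed.
    for (var i = 0; i < event.results.length; ++i) {
      // [0] is the recognizer's best alternative for this result.
      transcript += event.results[i][0].transcript;
    }
    document.getElementById('final_span').value = transcript;
  };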
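
With interimResults enabled, each entry in event.results carries an isFinal flag that tells whether the recognizer may still revise it. The following is a minimal sketch of separating settled text from provisional text; the function name `splitTranscripts` is hypothetical, and writing only the final text into the input field is a design choice made for this example.

```javascript
// Hypothetical helper: collects final and interim text from a result list.
// `results` is array-like; each entry has an `isFinal` flag and its best
// alternative at index 0.
function splitTranscripts(results) {
  var finalText = '';
  var interimText = '';
  for (var i = 0; i < results.length; ++i) {
    if (results[i].isFinal) {
      finalText += results[i][0].transcript;
    } else {
      interimText += results[i][0].transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Usage in a page (runs only when a `recognition` object already exists).
if (typeof recognition !== 'undefined') {
  recognition.onresult = function(event) {
    var parts = splitTranscripts(event.results);
    // Only settled text goes into the field that is later submitted.
    document.getElementById('final_span').value = parts.finalText;
  };
}
```

Keeping interim text out of the input means a submitted query never contains words the recognizer later corrects, at the cost of the field updating less often.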

Demonstration

The following demonstration contains:

  • a button to start and stop the recognition,
  • a form with an input field for the text, and
  • a "Submit" button to submit the text.

When you click "Submit", it sends the input text to Inbenta to retrieve the results.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Speech to text</title>
  <script src="https://sdk.inbenta.io/km/1/inbenta-km-sdk.js"></script>
</head>
<body>
  <form onsubmit="setQuery(); return false;">
    <button id="start_button" onclick="startButton()" type="button">Start</button>
    <input id="final_span" class="final" type="text">
    <input type="submit" value="Submit"/>
  </form>
  <div id="results"></div>
  <script>
    // Replace the placeholders with your own domain key and Inbenta key.
    var sdk = InbentaKmSDK.createFromDomainKey('<domain_key>', '<inbenta_key>');
    var results = sdk.component('results', '#results');

    var recognizing = false;
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.lang = 'en-US';

    recognition.onstart = function() {
      recognizing = true;
      document.getElementById('start_button').innerHTML = 'Stop';
    };
    recognition.onend = function() {
      recognizing = false;
      document.getElementById('start_button').innerHTML = 'Start';
    };
    recognition.onresult = function(event) {
      var transcript = '';
      // Start at 0 so that results from earlier in the session are kept.
      for (var i = 0; i < event.results.length; ++i) {
        transcript += event.results[i][0].transcript;
      }
      document.getElementById('final_span').value = transcript;
    };

    function startButton() {
      if (recognizing) {
        recognition.stop();
        return true;
      }
      recognition.start();
      document.getElementById('final_span').value = '';
    }
    function setQuery(){
      results.setQuery(document.getElementById('final_span').value);
    }
  </script>
</body>