Speech to text (Chrome only)

This functionality enables developers to convert audio to text. This page explains how to use this functionality with Inbenta's SDK.

Important: This functionality is only available on the Google Chrome browser.

Note: For more information about this functionality, refer to the Voice Driven Web Apps page on Google Developers.

Initialization

Create a new webkitSpeechRecognition object, which provides the speech interface, and then set its attributes and event handlers.

  var recognition = new webkitSpeechRecognition();
  // Set `continuous` to `true`: recognition keeps running when the user pauses, until we stop it.
  recognition.continuous = true;
  // Set `interimResults` to `true`: the recognizer returns interim results that can still change, so we can see its corrections as it works.
  recognition.interimResults = true;
  // Set the language to English with a valid language tag, such as 'en-US' or 'en-GB'
  recognition.lang = 'en-US';
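
Because the functionality is only available in Chrome, it can help to check for the constructor before creating the object. The following is a minimal sketch, not part of Inbenta's SDK: the helper name `createRecognition` and its `root` and `options` parameters are hypothetical, and the extra check for an unprefixed `SpeechRecognition` constructor is an assumption in case a browser exposes one.

```javascript
// Hypothetical helper (not part of the Inbenta SDK).
// `root` is the global object to inspect; pass `window` in a browser.
function createRecognition(root, options) {
  var Ctor = root.SpeechRecognition || root.webkitSpeechRecognition;
  if (typeof Ctor !== 'function') {
    return null; // speech recognition is not available here
  }
  var recognition = new Ctor();
  recognition.continuous = options.continuous;
  recognition.interimResults = options.interimResults;
  recognition.lang = options.lang;
  return recognition;
}
```

In a page, `createRecognition(window, { continuous: true, interimResults: true, lang: 'en-US' })` would return `null` when the constructor is missing, so the page can hide its microphone button instead of failing with a ReferenceError.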

Start listening

To start, call the start function. You can assign a function to the onstart event handler to perform an action when recognition starts.

  recognition.onstart = function() {
    // Do something awesome
  };
  recognition.start();

End listening

To finish, call the stop function. You can assign a function to the onend event handler to perform an action when recognition ends.

  recognition.onend = function() {
    // Do something awesome
  };
  recognition.stop();

Get results

To retrieve results, use the onresult event handler. It is called each time the recognizer identifies a word or phrase, and the event it receives contains the new result together with the results identified before it.

Example: The function gets the results and puts them into an input field:

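  recognition.onresult = function(event) {
    // Rebuild the whole transcript on every event: `event.results` holds every result
    // since start(), while `event.resultIndex` only marks the first result that changed.
    var transcript = '';
    for (var i = 0; i < event.results.length; ++i) {
      // [0] is the first recognition alternative for this result
      transcript += event.results[i][0].transcript;
    }
    document.getElementById('final_span').value = transcript;
  };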
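
When `interimResults` is `true`, each result carries an `isFinal` flag that indicates whether the recognizer may still revise it. A minimal sketch of separating the two kinds of text follows. The function name `splitTranscripts` is hypothetical, and it assumes only the event shape used above (`event.results[i][0].transcript`) plus `event.results[i].isFinal`.

```javascript
// Hypothetical helper: splits a result event into settled and provisional text.
function splitTranscripts(event) {
  var finalText = '';
  var interimText = '';
  for (var i = 0; i < event.results.length; ++i) {
    var result = event.results[i];
    if (result.isFinal) {
      finalText += result[0].transcript;   // will not change any more
    } else {
      interimText += result[0].transcript; // may still be corrected
    }
  }
  return { finalText: finalText, interimText: interimText };
}
```

Inside `recognition.onresult`, the page could write `finalText + interimText` into the input field, or display the interim part in a lighter style until it settles.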

Demonstration

The following demonstration contains:

  • a button to start, and stop, the recognition,
  • a form with an input field for the text, and
  • a "Submit" button to submit the text.

When you click "Submit", the page sends the input text to Inbenta and displays the results it returns.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Speech to text</title>
  <script src="https://sdk.inbenta.io/km/1/inbenta-km-sdk.js"></script>
</head>
<body>
  <form onsubmit="setQuery(); return false;">
    <button id="start_button" onclick="startButton()" type="button">Start</button>
    <input id="final_span" class="final">
    <input type="submit" value="Submit"/>
  </form>
  <div id="results"></div>
  <script>
    // Replace the <domain_key> and <inbenta_key> placeholders with your own keys.
    var sdk = InbentaKmSDK.createFromDomainKey('<domain_key>', '<inbenta_key>');
    var results = sdk.component('results', '#results');

    var recognizing = false;
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.lang = 'en-US';

    recognition.onstart = function() {
      recognizing = true;
      document.getElementById('start_button').innerHTML = 'Stop';
    };
    recognition.onend = function() {
      recognizing = false;
      document.getElementById('start_button').innerHTML = 'Start';
    };
    recognition.onresult = function(event) {
      // Rebuild the whole transcript: `event.results` holds every result since start().
      var transcript = '';
      for (var i = 0; i < event.results.length; ++i) {
        transcript += event.results[i][0].transcript;
      }
      document.getElementById('final_span').value = transcript;
    };

    function startButton() {
      if (recognizing) {
        recognition.stop();
        return true;
      }
      recognition.start();
      document.getElementById('final_span').value = '';
    }
    function setQuery(){
      results.setQuery(document.getElementById('final_span').value);
    }
  </script>
</body>
</html>
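
Recognized text can come back empty or with stray spaces between phrases, so it may be worth tidying it before it is sent to Inbenta. This is a sketch under assumptions: `submitTranscript` is a hypothetical helper, and `results` stands for the component created with `sdk.component('results', '#results')` in the demonstration, of which only the `setQuery` method shown there is used.

```javascript
// Hypothetical helper: cleans the transcript and forwards it to the results component.
// Returns true when a query was sent, false when there was nothing to send.
function submitTranscript(results, rawText) {
  // Collapse runs of whitespace and drop leading/trailing spaces.
  var query = String(rawText).replace(/\s+/g, ' ').trim();
  if (query === '') {
    return false;
  }
  results.setQuery(query);
  return true;
}
```

In the demonstration, `setQuery()` could call `submitTranscript(results, document.getElementById('final_span').value)` so that an empty field does not trigger a search.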