Translate Speech To Text With OpenAI’s Whisper

Post author: Adam VanBuskirk
3/24/23 in AI

OpenAI, the company behind GPT-4, DALL-E, and ChatGPT, also has an AI model that converts speech to text with astonishing accuracy. It’s named Whisper. Whisper can also translate speech from several languages into English, and in the future it’s expected to translate speech into languages other than English as well. In this brief post, we’re going to cover how the product works, both from within the OpenAI Playground and by calling the API programmatically. The full product description for Whisper can be found at https://openai.com/research/whisper.

Using Whisper in the OpenAI Playground

If you’re unfamiliar with the playground, read our article titled Learn The OpenAI Playground Step-By-Step, written by wordbot’s cofounder Kevin Sims. In a nutshell, the playground lets you experiment with OpenAI’s models in a sandbox environment.

To demonstrate Whisper in the playground, I’m going to record my voice on my Mac and upload the audio file into the playground. I’m only doing this so I can include the audio file in this post and reuse it in the API call example later. In practice, when playing with Whisper in the playground, I tend to use the microphone and record in real time, which avoids having to upload an audio file. Below is a screenshot of how to use the microphone to capture your audio.

Using the microphone with Whisper in OpenAI Playground

Uploading an Audio File

I created an audio file of me talking about our blog. It’s embedded below.

Next, I uploaded this audio file using the microphone and upload option in the playground.

Uploading audio to the OpenAI Playground

It. Nailed. It.

It even transcribed my last name correctly.

Now let’s see how to do the same thing by calling the API programmatically instead of using the OpenAI Playground.

Using the Transcriptions API to Call Whisper From Your Own App

If you are creating your own website, plugin, or app and would like to include speech-to-text functionality via Whisper, you can do so with the Transcriptions API. The API is straightforward to use. For the full API documentation, visit https://platform.openai.com/docs/guides/speech-to-text/quickstart.

I’m not going to write software here. Instead, I’ll show a simple screenshot from the docs at the link above. For our example, you would replace the audio file in the docs sample with mine, make the call, and get the transcribed text in return. Python, C#, PHP, JavaScript, and most other languages can call RESTful APIs, so you should have no trouble converting the Python example below to your language of choice.

Calling the Whisper Transcriptions API to convert speech to text
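
For readers who would rather copy text than squint at a screenshot, here is a minimal sketch of the same kind of call. It is not the snippet from the docs. To keep it free of extra dependencies, it assumes only Python’s standard library and posts a multipart form to the https://api.openai.com/v1/audio/transcriptions endpoint with the whisper-1 model. The helper names (build_transcription_request, transcribe), the OPENAI_API_KEY environment variable, and the my-recording.m4a file name are my own placeholders, so adjust them to your setup.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, api_key, model="whisper-1"):
    """Build (but do not send) a multipart/form-data POST request.

    Hypothetical helper for illustration. Assumes `filename` contains no
    double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # Form field 1: the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    )
    # Form field 2: the audio file, sent as raw bytes after this header.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    closing = f"\r\n--{boundary}--\r\n"

    body = model_part.encode() + file_header.encode() + audio_bytes + closing.encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, api_key):
    """Send the audio file at `path` and return the transcribed text."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(
            audio_file.read(), os.path.basename(path), api_key
        )
    with urllib.request.urlopen(request) as response:
        # The default response format is JSON with a "text" field.
        return json.load(response)["text"]


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")  # assumed location of your API key
    audio_path = "my-recording.m4a"         # placeholder file name
    if key and os.path.exists(audio_path):
        print(transcribe(audio_path, key))
    else:
        print("Set OPENAI_API_KEY and point audio_path at a real audio file.")
```

Building the request in its own function makes it easy to inspect the URL, headers, and body before anything is sent over the network. To translate non-English speech into English instead of transcribing it, the same form fields should work against https://api.openai.com/v1/audio/translations, though I have only sketched the transcription path here.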

Conclusion

The power of Whisper and the Transcriptions API is unbelievable. I hope this small blog post has given you a solid overview of what it can do and helped point you in the correct direction to learn more and experiment.

© 2024 Menyu LLC