Speech To Text

The Speech to Text template uses speech recognition and natural language processing to generate text transcriptions from uploaded audio files or YouTube video URLs. It lets users extract information from audio-visual content without transcribing it by hand.

Node-RED Flows

This Node-RED flow transcribes audio or video to text and then uses your custom prompt to draw a conclusion from the transcribed text. It consists of the following nodes:

HTTP Input Node (`/convertSpeech`): This node listens for incoming HTTP POST requests at the `/convertSpeech` endpoint. It expects the request to include an audio file or a YouTube video URL, along with an OpenAI API key.
Function Node (`check type`): This function node determines whether the user has provided an audio file or a YouTube video URL. If a URL is provided, it sets up the necessary parameters for downloading the audio from the video. If a file is provided, it prepares the payload for the OpenAI API request.
YouTube-YTDL Node: If a YouTube video URL is provided, this node downloads the audio from the video.
HTTP Request Node (to OpenAI): If an audio file is provided, this node sends a POST request to the OpenAI API (`https://api.openai.com/v1/audio/transcriptions`) with the audio file and the necessary headers, including the API key.
Function Node (`response`): This function node processes the response from the OpenAI API. If the response status code is 200 (successful), it extracts the transcribed text from the response payload and assigns it to `msg.payload`. If there's an error, it constructs an error message and assigns it to `msg.payload`.
HTTP Response Node: This node sends the final response back to the client, containing either the transcribed text or an error message.
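
The branching and response handling described above can be sketched as two plain JavaScript functions, in the style of Node-RED function nodes. This is a minimal sketch, not the template's actual code: the field names `msg.payload.url` and `msg.req.files`, the added `inputType` property, and the error wording are assumptions chosen for illustration. The `text` field is what the OpenAI transcription endpoint returns in its default JSON response.

```javascript
// Sketch of the `check type` logic.
// ASSUMPTION: the URL arrives in msg.payload.url and uploads in msg.req.files.
function checkType(msg) {
  const body = msg.payload || {};
  const files = (msg.req && msg.req.files) || [];

  if (typeof body.url === "string" && body.url.trim() !== "") {
    // YouTube branch: pass the URL on to the download node.
    return { ...msg, inputType: "url", youtubeUrl: body.url.trim() };
  }
  if (files.length > 0) {
    // File branch: keep the first uploaded file for the transcription request.
    return { ...msg, inputType: "file", audio: files[0] };
  }
  // Neither input was supplied: report a client error.
  return {
    ...msg,
    inputType: "none",
    statusCode: 400,
    payload: "Provide an audio file or a YouTube video URL.",
  };
}

// Sketch of the `response` logic.
function handleResponse(msg) {
  if (msg.statusCode === 200) {
    // The payload may arrive as a JSON string or as an already parsed object.
    const data =
      typeof msg.payload === "string" ? JSON.parse(msg.payload) : msg.payload;
    return { ...msg, payload: data.text };
  }
  // Any other status is turned into a readable error message.
  return {
    ...msg,
    payload: `Transcription failed (status ${msg.statusCode}).`,
  };
}
```

Inside a real function node, the body of each function would operate on the node's own `msg` object and end with `return msg;`.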

Key Features

Audio/Video Upload

The template's interface lets users upload audio files in common formats such as WAV, MP3, and FLAC, or provide a YouTube video URL instead.
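
A client could check its input before submitting it. The sketch below classifies a string as either a video URL or an audio file name and rejects file extensions outside the three formats named above. The function name `describeInput` and the returned object shape are hypothetical, and the real template may accept more formats than this list.

```javascript
// Formats named in this README; the real flow may accept more.
const SUPPORTED_EXTENSIONS = ["wav", "mp3", "flac"];

// Classify user input as a video URL or an audio file name.
function describeInput(input) {
  if (/^https?:\/\//i.test(input)) {
    return { kind: "url", value: input };
  }
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = input.lastIndexOf(".");
  const extension = dot === -1 ? "" : input.slice(dot + 1).toLowerCase();
  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    throw new Error(`Unsupported audio format: ${extension || "(none)"}`);
  }
  return { kind: "file", value: input, extension };
}
```

For example, `describeInput("talk.MP3")` yields a `file` result with extension `mp3`, while `describeInput("notes.txt")` throws.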

Text Transcription with Whisper

At the heart of this template is OpenAI's Whisper model, which is designed for speech recognition and transcription. Whisper uses machine learning to transcribe spoken audio into text.

Multiple Language Support

The template supports multiple languages, so users can transcribe audio in a variety of languages and dialects. The available language options are updated regularly to keep coverage broad and accurate.

API Integration

To use the Whisper model, users need an OpenAI API key. The template includes instructions on how to obtain the key and how to supply it securely to the transcription service.
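
As a rough illustration of how the key is used, the sketch below builds the request options for the transcription endpoint. The OpenAI API expects the key in an `Authorization: Bearer` header. The helper name `buildTranscriptionRequest` is hypothetical, and how the template itself passes the key along is not shown in this README.

```javascript
const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

// Build the URL, method, and headers for a transcription request.
function buildTranscriptionRequest(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("An OpenAI API key is required.");
  }
  return {
    url: TRANSCRIPTION_URL,
    method: "POST",
    // Content-Type is left out on purpose: the multipart encoder sets it,
    // including the boundary, when the audio file is attached.
    headers: { Authorization: `Bearer ${apiKey.trim()}` },
  };
}
```

Reading the key from an environment variable, for example `buildTranscriptionRequest(process.env.OPENAI_API_KEY)`, keeps it out of source files and flow exports.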

Fast Processing

Whisper processes audio efficiently, so users can expect short turnaround times when transcribing audio files or videos and can access the resulting text promptly.

Accuracy and Reliability

Whisper is trained on large datasets and updated to maintain accurate, reliable transcription across domains, accents, and noise conditions, so the output text faithfully captures the spoken content.

Customization Options

The template offers customization options that let users tune the output to their requirements, including language settings, formatting options, and specific areas of focus.
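
At the API level, settings like these map onto optional form fields of the OpenAI transcription endpoint: `language` (an ISO 639-1 code), `response_format`, `prompt`, and `temperature`. The sketch below validates such options and assembles the form fields. Whether the template exposes or forwards each of these fields is an assumption, and the helper name `buildTranscriptionFields` and its camelCase option names are hypothetical.

```javascript
// Response formats accepted by the transcription endpoint for whisper-1.
const RESPONSE_FORMATS = ["json", "text", "srt", "verbose_json", "vtt"];

// Turn user-facing options into form fields for the transcription request.
// The audio itself goes in a separate `file` field, not handled here.
function buildTranscriptionFields(options = {}) {
  const fields = { model: "whisper-1" };

  if (options.language !== undefined) {
    // Expect a two-letter lowercase code such as "en" or "de".
    if (!/^[a-z]{2}$/.test(options.language)) {
      throw new Error(`Invalid language code: ${options.language}`);
    }
    fields.language = options.language;
  }
  if (options.responseFormat !== undefined) {
    if (!RESPONSE_FORMATS.includes(options.responseFormat)) {
      throw new Error(`Invalid response format: ${options.responseFormat}`);
    }
    fields.response_format = options.responseFormat;
  }
  if (options.prompt !== undefined) {
    // Optional text that guides the transcription style or vocabulary.
    fields.prompt = String(options.prompt);
  }
  if (options.temperature !== undefined) {
    const temperature = Number(options.temperature);
    if (!(temperature >= 0 && temperature <= 1)) {
      throw new Error("temperature must be between 0 and 1");
    }
    fields.temperature = temperature;
  }
  return fields;
}
```

Note that the endpoint's `prompt` field only steers the transcription itself; it is separate from the custom prompt the flow uses to draw a conclusion from the finished text.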

Benefits

The Speech to Text template gives users an efficient way to extract text from audio-visual content, whether for accessibility, content analysis, data mining, or simply capturing spoken words in written form. By building on the Whisper model, it turns audio-visual data into actionable, meaningful text.
