• Website • Report Bug • Request Feature • Contributing Guidelines
Audio Analyser leverages the power of Microsoft Azure's advanced AI services to transform your audio data into valuable insight reports in no time through automatic speech-to-text, text analysis, and recommendations.
- Solve the pain of manual audio analysis: Manually analyzing audio is time consuming and limited. Audio Analyser automates the process, quickly surfacing key insights through AI-powered speech and language processing.
- Discover Hidden Insights in Minutes: AI-Powered Audio Analysis for Your Call Recordings and Audio Files.
- Streamline call recording and audio file transcription, uncover actionable insights in seconds with advanced text analysis, powered by Microsoft Azure AI services
- Go beyond simple transcription: Discover sentiment, key information, and gain a multi-faceted understanding of your conversations through in-depth analysis and comprehensive reports.
- Audio Analyser leverages the power of Azure's advanced AI services to transform your audio data into valuable insight reports in no time.
- Audio Analyser: Speech-to-Text, Analysis, Recommendations & Translations
- Overview
- Table of Contents
- Key Features
- Built on a Robust Foundation
- Dependencies
- Installation
- Usage
- Configuration
- Modules
- License
- Contribution
- Acknowledgements
- Audio Recording: Record audio files and conversations.
- Speech to Text: Convert spoken language into text using Azure's speech-to-text service.
- Text to Speech: Convert text into spoken language using Azure's text-to-speech service.
- Instant Transcription: Instantly transcribe audio files and recordings into text.
- Text Analysis: Analyze text for various features using Azure's text analytics service.
- Recommendations: Get actionable recommendations based on the results of the analysis.
- Support for outputting results in different formats, including JSON, TXT and SQLite.
- Actionable Insights:
- Analyze text for various features, including Overall Sentiment, Positive/Negative Sentiment Analysis, Identify Key Topics and Entities, Language, Personally Identifiable Information (PII).
- Uncover sentiment and key information within conversations.
- Data-Driven Reports:
- Generate detailed reports for easy sharing and analysis.
- Translations: Translate text to and from a variety of languages using Azure's Translator API.
- Support for Multiple Languages: Supports a wide range of languages, including English, French, German, Spanish, and more.
- Batch Translation: Translate multiple text files simultaneously, saving time and effort.
- Flexible Output Options: Output translation results in various formats, including plain text files, JSON, and SQLite databases.
- Web Server: A CherryPy-based web server to handle incoming requests and process them.
- Azure-powered technology and a secure CherryPy web server ensure accurate analysis and reliable data management.
- Scalable architecture: Adapt seamlessly to your needs, handling large datasets with ease.
Experience the power of Audio Analyser today!
- CherryPy
- Azure Cognitive Services Speech SDK
- Azure AI Text Analytics
- Azure Open AI Services
- Python standard libraries: asyncio, threading, logging, sqlite3, json
- Dotenv for environment variable management
Audio Analyser is built on Azure Cognitive Services for speech and language processing, with a CherryPy web server frontend. Key components include:
- Audio Recorder - record audio clips
- Speech-to-Text - transcribe audio
- Text-to-Speech - convert text to speech
- Text Analytics - analyze transcripts
- Recommendation Generator - suggest actions
- Web Server - handle API requests
We recommend creating a virtual environment to install the Audio Analyser. This will ensure that the package is installed in an isolated environment and will not affect other projects.
python3 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install required Python packages:
pip install cherrypy azure-ai-textanalytics azure-cognitiveservices-speech
-
Set up Azure services and obtain necessary API keys.
-
Configure environment variables for Azure services in a
.env
file.
Install audioanalyser
with just one command:
pip install audioanalyser
- Start the CLI using
audioanalyser
:
python -m audioanalyser
-
Follow the instructions to utilize speech-to-text and text analysis features.
-
Access the generated transcript and report files in the
resources
directory in the root folder.
- Start the server using
audioanalyser
:
python -m audioanalyser -s
- Access the server at the specified host and port to utilize speech-to-text and text analysis features.
To run the application, use the following command:
python server.py
This will start the CherryPy web server, and you can interact with the application through the defined endpoints.
The minimum supported Python version is 3.6.
- Azure Cognitive Services for speech and text processing.
- CherryPy for the web server.
- Open AI Services for summarization.
- Python's standard libraries including asyncio, sqlite3, and threading.
Ensure that your Azure credentials and other configurations are correctly set in a .env
file in the root directory.
Please refer to the env.example
file for the required environment variables.
The Audio Recorder Module in Audio Analyser is a robust tool designed for high-quality audio recording. It integrates seamlessly with the rest of the application, providing a user-friendly interface for capturing audio data, which is essential for the subsequent speech-to-text and analysis processes.
- High-Quality Recording: Capture clear and crisp audio, which is vital for accurate speech-to-text conversion.
- Flexible Configuration: Utilizes a
Config
class to load settings from a.env
file, allowing for easy customization of recording parameters such as duration, format, and quality. - Directory Management: Automatically validates and manages input and output directories, ensuring a smooth and error-free recording experience.
- Advanced Audio Settings Validation: Checks and confirms audio settings before recording begins, thereby minimizing potential issues during the recording process.
- Automated File Path Generation: Dynamically generates file paths for the recorded audio, streamlining the file management process.
- Setup and Configuration: The module reads configurations from the
.env
file, setting up necessary parameters for recording. - Directory Validation: It checks the specified input and output directories to ensure they exist and are accessible.
- Recording Execution: On initiating the recording process, the module captures audio based on predefined settings. This can be triggered manually or automatically as part of a larger workflow.
- File Management: After recording, the audio file is saved to the designated output directory, with a file name generated based on customizable rules.
- To start recording, ensure that the environment variables are set up in the
.env
file. - Run the Audio Recorder Module through the Audio Analyser interface or as a standalone process.
- The module will handle the rest, from validating settings to saving the recorded audio file.
- The module can be customized to record audio for variable durations and in different formats, as required by the user.
- It's designed to be flexible enough to integrate with different audio sources and output requirements.
- Designed to handle both small-scale and large-scale audio recording tasks.
- Implements robust error handling to deal with potential recording issues, ensuring reliability in diverse environments.
The Analyze Text Files Module in Audio Analyser is a sophisticated tool designed for in-depth analysis of text data, utilizing Azure Text Analytics. It’s capable of extracting meaningful insights from text files, such as sentiment, key entities, and more, making it an essential component for understanding and interpreting textual data.
- Advanced Text Analytics: Leverages Azure's AI capabilities for comprehensive analysis including sentiment analysis, entity recognition, and key phrase extraction.
- Configurable Environment: Uses the
Config
class to seamlessly integrate with Azure Language services, ensuring a flexible and customizable setup. - Diverse Output Formats: Capable of saving analysis results in multiple formats, accommodating various data presentation and storage needs.
- Efficient File Processing: Processes text files for analysis efficiently, handling both single files and batches, suitable for different scales of data.
- Environment Setup: The module begins by setting up necessary configurations using environment variables. This includes connecting to Azure Language services.
- File Processing: It reads text files from a specified directory, preparing them for analysis.
- Executing Text Analysis: The
TextAnalysis
class performs various analytics tasks on the text data, extracting insights like overall sentiment, key entities, and phrases. - Storing Results: Analysis results are then stored in the preferred format, be it plain text, JSON, or another format, in the designated output directory.
- Ensure that the Azure service credentials and other settings are correctly configured in the
.env
file. - Place the text files to be analyzed in the specified input directory.
- Execute the Analyze Text Files Module, which will automatically process the files and save the analysis results.
- The module allows for customization of analysis parameters and output formats, catering to specific needs of the analysis task.
- Users can specify particular aspects of text analysis to focus on, such as sentiment analysis or entity extraction, based on their requirements.
- Optimized for performance, the module can handle large volumes of text data without compromising on speed or accuracy.
- Scalable architecture ensures that the module can adapt to increasing amounts of data as the application grows.
This module represents a vital part of the Audio Analyser’s capability to turn textual data into actionable insights, enhancing the overall value of the analysis process.
The Azure Recommendation Module in Audio Analyser is an advanced tool that leverages the power of OpenAI's GPT-3 to generate insightful and relevant recommendations from customer transcripts. This module transforms raw text data into actionable advice, enhancing decision-making processes.
- Intelligent Recommendations: Utilizes OpenAI's GPT-3 for generating smart and contextually relevant recommendations based on the content of customer transcripts.
- Automated Transcript Processing: Automatically reads and processes transcripts from a designated directory, streamlining the workflow.
- Customizable Output: Offers flexibility in saving recommendations to a preferred format and location, tailored to user requirements.
- Configurable Settings: Allows users to configure various parameters like API keys, folder paths, and output preferences through environment variables.
- Reading Transcripts: The module scans a specified directory to load customer transcripts, ensuring that all relevant data is considered for analysis.
- Generating Recommendations: Leverages GPT-3's advanced natural language understanding capabilities to analyze the transcripts and generate recommendations.
- Saving Outputs: The insightful recommendations are then saved in a designated folder, in a format that facilitates easy review and implementation.
- Set up the necessary environment variables, including API keys and directory paths, in the
.env
file. - Place the transcripts in the specified input directory.
- Run the Azure Recommendation Module to automatically process the transcripts and generate recommendations.
- Access the generated recommendations in the specified output directory.
- Users can customize the type of recommendations generated by tweaking the prompt strategy sent to GPT-3, enabling tailored advice for different scenarios.
- The module supports various output preferences, allowing users to choose how and where the recommendations are stored.
- Designed to handle a wide range of transcript volumes, from individual files to large batches, ensuring scalability.
- Represents a cutting-edge application of AI in text analysis, setting a new standard for automated recommendation systems.
This module is a testament to the Audio Analyser's commitment to harnessing the latest in AI technology to provide valuable, data-driven insights and recommendations.
The Speech Text Server Module in Audio Analyser is a robust server-side component designed to handle speech-to-text processing efficiently. This module serves as the backbone of the application, managing the conversion of audio data into text and further analyzing this textual data for insights.
- Comprehensive Speech-to-Text Operations: Employs advanced algorithms to accurately transcribe spoken words into written text, forming the basis for further analysis.
- Integrated Audio Recording and Analysis: Seamlessly records audio, transcribes it, and then analyzes the text to extract meaningful insights.
- Recommendation Generation: Utilizes transcribed text to generate actionable recommendations, adding significant value to the analysis process.
- Efficient Request Handling: Capable of managing various server operations and handling multiple client requests simultaneously, ensuring a smooth user experience.
- Audio Processing: Initially, the module captures and processes audio recordings, preparing them for transcription.
- Speech-to-Text Conversion: Utilizes advanced speech recognition technology to transcribe audio data into text with high accuracy.
- Text Analysis and Recommendations: Once the audio is transcribed, the module analyzes the text data, extracting key insights and generating recommendations based on the content.
- Server Operations: Manages all server-side functionalities, ensuring efficient processing and response to client requests.
- The module is typically used as a part of the Audio Analyser's server-side operations.
- It can handle requests for audio processing, transcription, text analysis, and recommendation generation.
- Ideal for applications requiring real-time speech-to-text conversion and subsequent analysis.
- Customizable to suit various speech-to-text scenarios and can be configured to handle specific analysis requirements.
- Scalable to accommodate a growing number of requests and larger data sets, making it suitable for both small-scale and large-scale applications.
- Integrates state-of-the-art speech recognition and natural language processing technologies to provide fast and accurate transcriptions.
- The module's architecture allows for easy integration with additional AI services and tools for enhanced functionality.
The Speech Text Server Module is crucial for transforming raw audio data into actionable textual information, thereby playing a vital role in the Audio Analyser's capability to deliver comprehensive audio analysis solutions.
The Text-to-Speech Synthesis Module in the application is a highly efficient component crafted to transform text into spoken audio using Azure's cutting-edge Text-to-Speech API. This module stands out as a crucial instrument for generating audible content from textual data, facilitating diverse applications such as audiobook production, voice notifications, or enhancing accessibility features.
- Superior Voice Quality: Employs Azure's Text-to-Speech API to produce clear and natural-sounding voice outputs from text.
- Customizable Voice Attributes: Offers flexibility in choosing voice tones, accents, and languages to suit varied requirements.
- Efficient Error Management: Features advanced error detection and handling to ensure high reliability across different operational scenarios.
- Diverse Output Formats: Supports saving synthesized speech in various audio file formats, accommodating different usage contexts.
- Text Input Processing: Accepts textual data as input, which can range from simple sentences to comprehensive paragraphs.
- Speech Synthesis: Leverages Azure's API to convert text into digital speech with options for customizing voice properties.
- Error Handling: Implements robust mechanisms to manage errors, ensuring smooth and consistent audio output generation.
- Audio File Saving: Outputs the synthesized speech into designated audio formats, ready for playback or integration into other systems.
- Input the desired text into the module via its programming interface.
- Configure the module settings, including voice type and output format preferences.
- Trigger the text-to-speech synthesis process through the module's execution command.
- Retrieve the generated audio file from the specified output location.
- Enables extensive customization of voice characteristics and speech parameters, enhancing the module's adaptability to different text types and use cases.
- Designed to process a wide range of textual inputs, making it versatile for various applications and user needs.
- Scalable architecture allows for handling growing amounts of text inputs efficiently, suitable for both small and extensive text-to-speech conversion tasks.
- Easily integrates with Azure services and other components within the application ecosystem, contributing to a seamless operational flow.
The Transcribe Audio Files Module in Audio Analyser is a specialized component designed to convert spoken language in audio files into accurate text. Utilizing Azure's state-of-the-art Speech-to-Text API, this module is an essential tool for transforming audio data into a format that can be easily analyzed and processed.
- High-Efficiency Transcription: Leverages Azure's powerful Speech-to-Text API to provide fast and accurate transcription of audio files.
- Batch Processing Capability: Capable of processing both individual audio files and large batches, making it versatile for various project sizes.
- Robust Error Handling: Incorporates sophisticated error handling mechanisms to ensure reliability even in cases of challenging audio quality or API issues.
- Flexible Output Options: Transcriptions can be saved in multiple formats, including plain text files, JSON, and SQLite databases, catering to diverse data management needs.
- Audio File Processing: The module accepts audio files as input, processing them individually or in batches based on user requirements.
- Speech-to-Text Conversion: Utilizes Azure's Speech-to-Text API to accurately transcribe the spoken words in the audio files into written text.
- Error Management: During transcription, the module efficiently handles any errors or exceptions, ensuring consistent output quality.
- Saving Transcripts: The transcribed text is then saved in the specified format, allowing for easy integration with other modules or systems.
- Place the audio files in the designated input directory.
- Execute the Transcribe Audio Files Module through the Audio Analyser interface.
- The module will automatically process the audio files and save the transcriptions in the chosen format.
- Users can customize various aspects of the transcription process, including the choice of output format and error handling strategies.
- The module's design allows it to handle different audio formats and qualities, making it adaptable to a wide range of audio data sources.
- Scalable to handle increasing volumes of audio data, suitable for both small-scale and large-scale transcription tasks.
- Seamlessly integrates with other Azure services and modules within the Audio Analyser application, enhancing the overall functionality of the system.
This module plays a pivotal role in the Audio Analyser's ability to extract textual data from audio, laying the foundation for in-depth analysis and insight generation.
The Translations Module in Audio Analyser is specifically designed to handle multilingual text translation tasks, leveraging Azure AI Translator API. This powerful service offers cloud-based neural machine translation, compatible across different operating systems, to provide seamless translation experiences.
- Batch Translation: Process multiple text files simultaneously, offering efficiency and time-saving for large-scale translation tasks.
- Support for Multiple Languages: Capable of translating text to and from a variety of languages, as listed in the Languages Supported section.
- Format Versatility: Output translation results in diverse formats, including plain text files, JSON, and SQLite databases, catering to different use case requirements.
- Seamless Integration with Azure Translator API: Utilizes Azure's robust machine translation capabilities for accurate and context-aware translations.
- Error Handling: Incorporates comprehensive error handling mechanisms to ensure reliable translation processes even in case of unexpected API behavior.
- File Processing: The module takes text files as input. It can process individual files or batches of files, making it adaptable to both small and large-scale translation tasks.
- Translation Execution: Utilizes Azure's Translator API to translate the content of the text files. It supports a wide range of languages, providing versatility for global use cases.
- Output Generation: After translation, the results are outputted in the user-preferred format. The module supports various output formats like JSON, TXT, and SQLite, providing flexibility in how the results are utilized.
- To translate a text file, place it in the specified input directory.
- Run the translation module through the Audio Analyser interface.
- Choose your target language and output format.
- The translated text will be saved in the designated output directory in the chosen format.
Below is a list of languages supported by the Translations Module, along with their respective language codes:
Language | Language code |
---|---|
Afrikaans | af |
Albanian | sq |
Amharic | am |
Arabic | ar |
Armenian | hy |
Assamese | as |
Azerbaijani (Latin) | az |
Bangla | bn |
Bashkir | ba |
Basque | eu |
Bhojpuri | bho |
Bodo | brx |
Bosnian (Latin) | bs |
Bulgarian | bg |
Cantonese (Traditional) | yue |
Catalan | ca |
Chinese (Literary) | lzh |
Chinese Simplified | zh |
Chinese Traditional | zh |
chiShona | sn |
Croatian | hr |
Czech | cs |
Danish | da |
Dari | prs |
Divehi | dv |
Dogri | doi |
Dutch | nl |
English | en |
Estonian | et |
Faroese | fo |
Fijian | fj |
Filipino | fil |
Finnish | fi |
French | fr |
French (Canada) | fr |
Galician | gl |
Georgian | ka |
German | de |
Greek | el |
Gujarati | gu |
Haitian Creole | ht |
Hausa | ha |
Hebrew | he |
Hindi | hi |
Hmong Daw (Latin) | mww |
Hungarian | hu |
Icelandic | is |
Igbo | ig |
Indonesian | id |
Inuinnaqtun | ikt |
Inuktitut | iu |
Inuktitut (Latin) | iu |
Irish | ga |
Italian | it |
Japanese | ja |
Kannada | kn |
Kashmiri | ks |
Kazakh | kk |
Khmer | km |
Kinyarwanda | rw |
Klingon | tlh |
Klingon (plqaD) | tlh |
Konkani | gom |
Korean | ko |
Kurdish (Central) | ku |
Kurdish (Northern) | kmr |
Kyrgyz (Cyrillic) | ky |
Lao | lo |
Latvian | lv |
Lithuanian | lt |
Lingala | ln |
Lower Sorbian | dsb |
Luganda | lug |
Macedonian | mk |
Maithili | mai |
Malagasy | mg |
Malay (Latin) | ms |
Malayalam | ml |
Maltese | mt |
Maori | mi |
Marathi | mr |
Mongolian (Cyrillic) | mn |
Mongolian (Traditional) | mn |
Myanmar | my |
Nepali | ne |
Norwegian | nb |
Nyanja | nya |
Odia | or |
Pashto | ps |
Persian | fa |
Polish | pl |
Portuguese (Brazil) | pt |
Portuguese (Portugal) | pt |
Punjabi | pa |
Queretaro Otomi | otq |
Romanian | ro |
Rundi | run |
Russian | ru |
Samoan (Latin) | sm |
Serbian (Cyrillic) | sr |
Serbian (Latin) | sr |
Sesotho | st |
Sesotho sa Leboa | nso |
Setswana | tn |
Sindhi | sd |
Sinhala | si |
Slovak | sk |
Slovenian | sl |
Somali (Arabic) | so |
Spanish | es |
Swahili (Latin) | sw |
Swedish | sv |
Tahitian | ty |
Tamil | ta |
Tatar (Latin) | tt |
Telugu | te |
Thai | th |
Tibetan | bo |
Tigrinya | ti |
Tongan | to |
Turkish | tr |
Turkmen (Latin) | tk |
Ukrainian | uk |
Upper Sorbian | hsb |
Urdu | ur |
Uyghur (Arabic) | ug |
Uzbek (Latin) | uz |
Vietnamese | vi |
Welsh | cy |
Xhosa | xh |
Yoruba | yo |
Yucatec Maya | yua |
Zulu | zu |
The module is designed to robustly handle various errors, including API connection issues, file reading/writing errors, and unsupported language codes. Detailed logs are generated for troubleshooting and audit purposes.
This module is built with extensibility in mind, allowing for future enhancements such as additional language support, improved translation accuracy, and integration with other translation services or custom models.
The project is licensed under the terms of both the MIT license and the Apache License (Version 2.0).
We welcome contributions to audioanalyser. Please see the contributing instructions for more information.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
We would like to extend a big thank you to all the awesome contributors of audioanalyser for their help and support.