Alibaba Releases New Voice Model Qwen2-Audio, Surpassing OpenAI Whisper

Summary:

Alibaba has released Qwen2-Audio, a new open-source audio-language model that builds on its predecessor, Qwen-Audio. The model performs well in speech recognition, speech translation, and audio analysis, and improves substantially on Qwen-Audio in both functionality and performance.

Key Features:

Dual Versions: Qwen2-Audio is available in both a basic version and an instruction-tuned version.
Language Support: The model supports multiple languages such as Mandarin, Cantonese, French, English, and Japanese.
Advanced Capabilities: It can analyze speaker attributes like age and emotion, and dissect noisy audio clips to identify different sound components.
Enhanced Architecture: Both the architecture and the training pipeline have been optimized. Notably, pre-training uses natural language prompts instead of complex hierarchical labels.
Improved Instruction-Following: The model's ability to understand and follow user instructions has been significantly improved.
Voice Chat and Audio Analysis Modes: Two new interaction modes make spoken conversations more natural and support detailed, accurate analysis of audio input.
Supervised Fine-Tuning and Direct Preference Optimization (DPO): These training techniques help align the model's outputs with human preferences.

Performance:

On mainstream benchmarks, Qwen2-Audio achieved strong results, particularly in speech recognition and speech translation accuracy, where it outperformed OpenAI's Whisper-large-v3.

Industry Impact:

The launch of Qwen2-Audio has drawn wide industry attention and is regarded as a significant advance in speech technology.

Conclusion:

Qwen2-Audio is a powerful, versatile speech model from Alibaba. It improves on its predecessor and outperforms competitors such as Whisper-large-v3 on key benchmarks, promising substantial advances in speech technology.

Source: AIbase News (https://www.aibase.com/news/10964)


AI-NEWS · August 10, 2024
