AliCloud launches AI audio and video transcription tool

June 2, 2023 1:31 pm

Alibaba Cloud, a subsidiary of Alibaba Group, announced the public beta launch of its AI product Tongyi Tingwu on June 1. The application, which specializes in managing audio and video content, is powered by Tongyi Qianwen, Alibaba's ChatGPT-like large language model.

The tool is described as an AI assistant that can transcribe, search, summarize, and organize audio and video content. Specific functions include automated note-taking, interview management, and extraction of key content from presentations. As a launch promotion, users can receive more than 100 hours of free transcription time during the beta period.

"Audio and video content can now be effortlessly read, organized, and shared thanks to Tongyi Tingwu," commented Zhou Jingren, the CTO of AliCloud.

Tongyi Tingwu demonstrated its capabilities during a live session, in which the assistant split audio and video files into chapters, generated summaries, identified the viewpoints of different speakers, and organized to-do lists. According to the company, future updates will add features such as one-click extraction of PowerPoint presentations, summaries of specific paragraphs, and AI-driven question answering across multiple pieces of content.

The application also offers features tailored to different user scenarios. For instance, its Chrome plugin displays bilingual floating subtitles for foreign-language learners and people with hearing impairments. It can even stand in for professionals at meetings, silently recording and summarizing the points discussed.

Tingwu is also integrated with AliCloud Drive, allowing one-click transcription of audio and video files stored there. Beta testers will receive additional AliCloud Drive storage, and subtitles can be generated automatically when videos in the drive are played.

Tingwu also comes in an enterprise version, which the company says is already widely used within Alibaba Group. Tingwu's capabilities rest on Alibaba's speech and language technologies, including an industrial-grade speech recognition model and a proprietary multimodal speaker algorithm that combines voice and language information.

The rollout of AI tools such as Tongyi Tingwu is in line with forecasts from research firms. In a research report, CICC identified office work as one of the most frequent use cases for utility software. Its demand for graphics creation, spreadsheet data processing, and similar capabilities is a natural fit for large language models (LLMs), so office tools are expected to be among the first AI applications to be deployed. Within the office scenario, document creation in particular can benefit directly from the text-generation capabilities of large models. Separately, Guojin Securities believes that AI applications in fields with lower trial-and-error costs are easier to deploy, and expects financial IT, corporate services, and gaming to be the first industries to adopt AI.