Microsoft Azure AI Speech Unveils Text-to-Speech Avatar in Public Preview



Azure AI Speech has launched the public preview of Text-to-Speech (TTS) Avatar, a feature for creating talking avatar videos and real-time interactive bots with avatars trained on human images. The feature targets both prepared video content and live conversational applications.

What is the TTS Avatar?
The TTS Avatar combines text-to-speech with vision to generate synthetic videos of 2D photorealistic avatars speaking. The avatars are produced by Neural TTS Avatar models, which are deep neural networks trained on video recordings of human speakers, while the voice is supplied by a TTS voice model.


Why Avatars?
This feature streamlines video content creation: users can produce videos such as training modules, product introductions, or customer testimonials from text input alone. It also supports more engaging interactive experiences in conversational agents, virtual assistants, and chatbots.

Features in the Release:

  • Prebuilt TTS Avatar: Offers diverse language options, allowing users to create videos or applications with real-time avatar responses.
  • Custom TTS Avatar: Enables the creation of personalized avatars. Users upload video recordings, which are used to train a model that generates synthetic video of the custom avatar speaking. A custom avatar can be paired with either prebuilt or custom neural voices.

Responsible AI Approach:
In adherence to Microsoft's commitment to responsible AI, the custom avatar feature is accessible via a Limited Access registration. This measure aims to safeguard individuals' rights, ensure transparent human-computer interaction, and counteract misleading deepfakes.

Capabilities of TTS Avatar:
Users can create videos for training or presentations and build interactive avatar experiences, such as virtual sales applications. The tool supports multiple languages and can be combined with Azure OpenAI Service for conversational interaction.

Avatar Content Creation:
Creating an avatar video starts with writing the talking script in plain text or SSML. Users then synthesize the video through the Azure TTS 3.1 API, specifying the avatar's character and style, the desired video format, and additional elements such as background music.
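
To make the workflow concrete, here is a minimal Python sketch of the two steps described above: wrapping a script in SSML and collecting the avatar choices into a single request body. The SSML wrapper follows the standard W3C `speak`/`voice` structure, and `en-US-JennyNeural` is used only as an example voice name. The `build_avatar_job` function and every key name in the dictionary it returns are hypothetical placeholders for illustration and are not the actual Azure TTS 3.1 request schema, so consult the official API reference before sending anything.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document for a single voice."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        # escape() protects characters such as & and < inside the script.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


def build_avatar_job(script, is_ssml, character, style, video_format):
    """Gather the options the article lists into one dictionary.

    Hypothetical helper: the key names below are illustrative only and
    do not reflect the real API schema.
    """
    return {
        "script": script,
        "scriptType": "ssml" if is_ssml else "plaintext",
        "avatar": {"character": character, "style": style},
        "videoFormat": video_format,
    }


ssml = build_ssml("Welcome to our product tour & demo.")
job = build_avatar_job(ssml, True, "example-character", "example-style", "mp4")
print(ssml)
print(job["scriptType"], job["videoFormat"])
```

Keeping script construction separate from the job description means the same script can be reused with a different avatar character, style, or video format without touching the SSML.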


Interactive Avatar Experience:
A demonstration shows an interactive avatar acting as a virtual sales agent that engages with customers in real time, answers queries, and helps place product orders using Azure AI Speech and other integrated Azure services.

The Azure AI Speech Text-to-Speech Avatar tool offers extensive possibilities for video content creation and interactive experiences, catering to diverse business needs.