💬 Is there a way to write a script that makes the AI Avatar speak naturally?

Category: Usage Guide
Yes. You can make AI Avatar speech sound more natural and effective by following basic writing rules, such as correct spelling and spacing, and by tailoring the script to the characteristics of each TTS engine. The details are explained below.
The same script may be pronounced differently depending on the TTS engine, so learn the common rules and the characteristics of each engine below and apply them when you write your script.

Common Rules for Writing Scripts

The common rules apply to all TTS engines and serve as a guide for writing scripts that are comfortable for viewers to listen to. Following them will help you produce more effective speech.

Write as you speak

• Avoid Chinese characters and technical terms; choose simple, concise words.
• Avoid formal, written-style language; use conversational, concise vocabulary.
💡 The schedule for this event has been confirmed. → The schedule for this event has been set.
Therefore, this event → So, this event

Short and clear sentences

• Keep each sentence concise, with only one idea.
• Break long sentences with commas or periods to avoid breathing sounds and awkward intonation.
• Mix long and short sentences appropriately.
💡 This release includes performance improvements and security patches, a revised statistics layout for the admin dashboard, and updated guidance documentation. Please check it out. (Long sentence without any break points)
→ This release includes performance improvements and security patches. The statistics layout in the admin dashboard has also changed. We've also updated the documentation. Please be sure to check it out. (Short sentences with clear breaks)
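The "one idea per sentence" rule can be checked roughly before a script is sent to a TTS engine. The sketch below is only an illustration: the function name `long_sentences` and the 15-word threshold are assumptions, not part of any PERSO tool, and word count is just a stand-in for "too long".

```python
import re

def long_sentences(script: str, max_words: int = 15) -> list[str]:
    """Return the sentences in `script` that have more than `max_words` words.

    The threshold is a hypothetical default; tune it to your own scripts.
    """
    # Split after a period, question mark, or exclamation mark followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", script) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]

if __name__ == "__main__":
    draft = (
        "This release includes performance improvements and security patches, "
        "a revised statistics layout for the admin dashboard, and updated "
        "guidance documentation. Please check it out."
    )
    for sentence in long_sentences(draft):
        print("Consider splitting:", sentence)
```

Any sentence the function returns is a candidate for splitting at its commas, as in the example above.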

Use of punctuation marks

• Use commas, periods, question marks, and exclamation marks to control intonation and breathing.
💡 How will the new features be delivered? The new statistics report is provided in beta. Some charts may be unstable and data may take time to reflect, so please check it in a test environment before applying it to the live service. (Punctuation: X)
→ How will the new features be delivered? The new statistics report is available in beta! Some charts may be unstable, and data may take time to reflect. Please test it in a test environment before applying it to the live service! (Punctuation: O)

Removing exclamations and interjections

• Avoid exclamations and interjections, because words that carry subtle emotion may sound awkward.
💡 Wow, important announcement! We'll be performing maintenance starting at midnight today, so please... take note of this when using the service.
→ Important notice! Maintenance will take place starting at midnight today. Please take note when using the service.

Consistent style

• Keep the tone and terminology of your script consistent.
💡 Sign up now. Customers can click the register button.
→ Dear customer, please sign up now. Please click the 'Sign Up' button.

How to write special expressions

• Write expressions that contain special characters and similar notation the way they should be read aloud.
💡 16:15 → sixteen fifteen (as a time) / sixteen to fifteen (as a score)
1kg → 1 kilogram
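Unit abbreviations such as "1kg" can be expanded automatically before the script is pasted in. This is a minimal sketch, assuming a small hand-made table: `UNIT_NAMES` and `expand_units` are hypothetical names, and the table covers only a few units for illustration.

```python
import re

# Illustrative table only; extend it with the units your scripts actually use.
UNIT_NAMES = {"kg": "kilogram", "km": "kilometer", "cm": "centimeter"}

_UNIT_ALTERNATIVES = "|".join(UNIT_NAMES)
_UNIT_PATTERN = re.compile(r"\b(\d+)\s?(" + _UNIT_ALTERNATIVES + r")\b")

def expand_units(script: str) -> str:
    """Rewrite '<number><unit>' so the unit is spelled the way it is read."""
    def replace(match: re.Match) -> str:
        number, unit = match.group(1), match.group(2)
        name = UNIT_NAMES[unit]
        if number != "1":
            name += "s"  # simple English plural; enough for the units above
        return f"{number} {name}"
    return _UNIT_PATTERN.sub(replace, script)

if __name__ == "__main__":
    print(expand_units("The box weighs 1kg."))  # The box weighs 1 kilogram.
```

Ambiguous notation such as "16:15" is deliberately left alone, because only the writer knows whether it is a time or a score.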

Writing rules for each TTS engine

Even scripts with the same content may be pronounced differently depending on the TTS engine and voice used. Please refer to the script writing guidelines for each TTS engine below and create a script appropriate for your chosen engine.

PERSO TTS

Avoid mixing languages

• When a script includes English, acronyms that are spelled out letter by letter (MVP, VIP, TTS, etc.) are pronounced correctly, but avoid full English words (Sentence, Value, etc.) because they sound awkward.

Use numbers

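• Digits written together are read as a single number rather than one by one.
💡 123456789 → one two three four five six seven eight nine (X) / one hundred and twenty-three million four hundred and fifty-six thousand seven hundred and eighty-nine (O)
• Numbers separated by a comma and a space are read individually.
💡 1, 2, 3, 4, 5, 6, 7, 8, 9 → one, two, three, four, five, six, seven, eight, nine (O) / one hundred and twenty-three million four hundred and fifty-six thousand seven hundred and eighty-nine (X)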
• A comma used as a thousands separator does not cause the digits to be read individually.
  ◦ With a comma followed by a space, the numbers are read individually.
💡 12,345 → Twelve thousand three hundred and forty-five (O) / Twelve, three hundred and forty-five (X)
12, 345 → Twelve thousand three hundred and forty-five (X) / Twelve, three hundred and forty-five (O)
• When a unit follows a number, the number is pronounced in the form that fits the unit.
💡 5 people → five people
55 minutes → fifty-five minutes
55 → fifty-five
• When several numbers are listed in a row, pronunciation becomes unstable, so we recommend writing them out in words.
💡 I will announce 1st, 2nd, 3rd, 4th, and 5th place → I will announce first, second, third, fourth, and fifth place
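Because "12,345" and "12, 345" are read differently, a stray space after a comma is easy to miss. The sketch below lists every place where numbers are joined by a comma and a space, so you can confirm each one is meant to be read separately. `find_split_numbers` is a hypothetical helper name, not part of PERSO.

```python
import re

# One or more digit groups joined by "comma + single space", e.g. "12, 345" or "1, 2, 3".
_SPLIT_NUMBERS = re.compile(r"\d+(?:, \d+)+")

def find_split_numbers(script: str) -> list[str]:
    """Return number sequences that will be read as separate numbers."""
    return _SPLIT_NUMBERS.findall(script)

if __name__ == "__main__":
    draft = "Total: 12, 345 won, or 12,345 items."
    for hit in find_split_numbers(draft):
        print("Read separately:", hit)
```

Intentional lists such as "1, 2, 3" are reported as well; the check only points out where separate reading will happen and leaves the decision to the writer.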

Microsoft Azure

Punctuation marks

• Dividing sentences with commas and periods is very important.
• Without commas or periods, all sentences may be read as one continuous sentence.

Avoid mixing languages

• When a Korean script includes English, acronyms that are spelled out letter by letter (MVP, VIP, TTS, etc.) are pronounced correctly, but avoid full English words (Sentence, Value, etc.) because they sound awkward.
• With multilingual TTS, avoid mixing languages in one script: the language that makes up the largest share becomes the base language, so the other languages may be pronounced awkwardly.

Use numbers

• Numbers with seven or more digits are read digit by digit.
💡 123456 → One hundred and twenty-three thousand four hundred and fifty-six
123456789 → one two three four five six seven eight nine
• Numbers separated by a comma and a space are read individually.
💡 1, 2, 3, 4, 5, 6, 7, 8, 9 → one, two, three, four, five, six, seven, eight, nine (O) / one hundred and twenty-three million four hundred and fifty-six thousand seven hundred and eighty-nine (X)
• A comma used as a thousands separator does not cause the digits to be read individually.
  ◦ With a comma followed by a space, the numbers are read individually.
💡 12,345 → Twelve thousand three hundred and forty-five (O) / Twelve, three hundred and forty-five (X)
12, 345 → Twelve thousand three hundred and forty-five (X) / Twelve, three hundred and forty-five (O)
• When a unit follows a number, the number is pronounced in the form that fits the unit.
💡 5 people → five people
55 minutes → fifty-five minutes
55 → fifty-five
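Since unbroken numbers with seven or more digits are read digit by digit on Microsoft Azure, it can help to flag them before generating audio. This is a minimal sketch; `find_long_numbers` is a hypothetical name, and the default limit of 7 simply mirrors the rule stated above.

```python
import re

def find_long_numbers(script: str, limit: int = 7) -> list[str]:
    """Return unbroken digit runs with at least `limit` digits."""
    return [run for run in re.findall(r"\d+", script) if len(run) >= limit]

if __name__ == "__main__":
    draft = "Call 123456789 about order 123456."
    print(find_long_numbers(draft))  # ['123456789']
```

Numbers written with thousands separators, such as "1,234,567", are not flagged because each digit run is short; this matches the comma rule described above.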

Elevenlabs

Mixed languages available

• Even when languages are mixed, each language is pronounced correctly. (Still, avoid mixing too many languages.)

Use numbers

• If you must include numbers, we recommend writing them out in words rather than digits.
💡 1234567 → one million two hundred and thirty-four thousand five hundred and sixty-seven
• Long numbers are not spoken properly.
💡 12345 → Spoken in a foreign language or not spoken properly (errors begin at the hundreds place in Korean and at the thousands place in English)
• When a unit follows a number, the number is pronounced in the form that fits the unit. (This works less reliably than with PERSO TTS and Azure.)
💡 5 people → five people
55 minutes → fifty-five minutes
55 → fifty-five
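Writing numbers out in words, as recommended for Elevenlabs, can be automated for English scripts. The sketch below is an assumption-laden illustration: `number_to_words` and `spell_out_numbers` are hypothetical helpers, they cover only 0 to 999,999, and they use the style without "and" ("three hundred forty-five"), which differs slightly from the examples in this guide. Korean scripts would need a different conversion.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def _below_thousand(n: int) -> str:
    """Spell out an integer from 1 to 999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)

def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999,999 in English."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999,999 is supported in this sketch")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(_below_thousand(thousands) + " thousand")
    if rest:
        parts.append(_below_thousand(rest))
    return " ".join(parts)

def spell_out_numbers(script: str) -> str:
    """Replace digit runs with words; larger numbers are left unchanged."""
    def replace(match: re.Match) -> str:
        value = int(match.group())
        return number_to_words(value) if value < 1_000_000 else match.group()
    return re.sub(r"\d+", replace, script)

if __name__ == "__main__":
    print(spell_out_numbers("The talk lasts 55 minutes."))
```

Numbers of seven or more digits are left as they are so that the writer can decide how they should be read.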

Situation-specific TTS engine recommendations

You can create the optimal AI video by selecting a TTS engine based on the purpose of the video and the content of the script. Please refer to the examples below to select a TTS engine.

General situation

• We recommend the voice with the 'BEST FIT' badge, which is selected by default.

A script with a lot of numbers

• We recommend Microsoft Azure. Among the three engines offered by STUDIO PERSO, it currently reads numbers most reliably.

Scripts including various languages

• We recommend a multilingual TTS, which pronounces text accurately even when languages are mixed. To use it, select a voice marked as multilingual in MS Azure, or select Elevenlabs.

When various situations and tones are required

• We recommend Elevenlabs. It offers over 7,000 voices suited to different job roles and situations, so you can choose the voice that best fits your needs.
💡 Elevenlabs is an Enterprise-only feature.
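The recommendations in this section can be summarized as a small decision function. This is only a sketch: `recommend_engine` and its three flags are hypothetical, and the order in which the conditions are checked is an assumption, since the guide does not say which recommendation wins when several apply.

```python
def recommend_engine(many_numbers: bool = False,
                     mixed_languages: bool = False,
                     varied_tones: bool = False) -> str:
    """Map the situations described in this guide to a suggested TTS choice.

    The priority (numbers, then languages, then tones) is an assumption.
    """
    if many_numbers:
        return "Microsoft Azure"
    if mixed_languages:
        return "Multilingual TTS (multilingual voice in MS Azure, or Elevenlabs)"
    if varied_tones:
        return "Elevenlabs (Enterprise only)"
    return "Voice with the 'BEST FIT' badge"

if __name__ == "__main__":
    print(recommend_engine(many_numbers=True))  # Microsoft Azure
    print(recommend_engine())                   # Voice with the 'BEST FIT' badge
```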

Frequently Asked Questions & Solutions

• How can I use Multilingual?
  ◦ Multilingual is available through MS Azure TTS with voices labeled as multilingual, or through Elevenlabs TTS.

Related documents

Create high-quality, easy-to-understand AI videos with our script writing guide!