WellSaid Studio Tips and Tricks: Optimizing WellSaid For the Best AI Voiceover

Audio by Tilda C. using WellSaid

High-quality voiceovers can be expensive and time-consuming to achieve, but with WellSaid, users can have dynamic, production-ready voiceovers in seconds. Whether you’re new to WellSaid or an experienced user looking for more advanced ways to optimize your Voices, this in-depth tips and tricks guide has something for everyone. From picking a voice, to combining clips into the perfect voiceover for your project, there are so many ways you can make every story WellSaid!

Picking a Voice

With hundreds of voice styles to choose from, it can be nearly impossible to settle on a voice that’s right for your project. Additionally, some audiences respond better to certain styles of voices or prefer certain accents over others. Thankfully, after a few considerations and filters, it can be easy to narrow down your options.

Voice Style, Characteristics, and Accent

It’s important to understand what style you’re going for with your voiceover. Is a narrative or promotional style the best route? Or would a more conversational style be better for your content? Additionally, are you looking for a Voice that is engaging, dramatic, and has a specific United States accent? Or would you rather your voiceover be laidback and calm with a South African accent?

On our Voices page, you can filter your choices by performance style, characteristics, and accents. That way, when you’re looking for a U.S.-based promotional style voice that has an empathetic tone, instead of sorting through fifty options, you will end up with five quality Voices to choose from.

Still having trouble finding the Voice options you’re looking for? Access our straightforward Voice Library Guide here.

Auditioning Voices

After deciding what mood and tone you want and filtering your Voice options, sample each voice using the sample audio player. Then select a handful of voices and create audio clips using the exact same text, so you can compare them directly.

TIP: Try favoriting your top three choices by clicking the vertical ellipsis (⋮ icon) to the right of each voice, so you can refer back to them quickly while in Studio.

Generating Clean Audio

Once you’ve selected your Voice of choice, getting the perfect voiceover can take some finessing.

Break Down Text

The easiest way to achieve the highest-quality voiceover possible is to break down your text into small paragraphs or even sentences. Then, when you stumble on a part of the voiceover that blends words together or needs an extra pause to get the emphasis just right, it’s easy to home in on the problem spot and refine the text in detail, so your message is delivered exactly how you want it!
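If your script lives in a text file, you can prepare the sentence-sized chunks before pasting them into Studio. The following is a minimal Python sketch of that idea, not a WellSaid feature: the function name split_into_clips is hypothetical, and the splitting rule (break after a period, exclamation point, or question mark followed by whitespace) is a simplifying assumption.

```python
import re


def split_into_clips(script: str) -> list[str]:
    """Split a script into sentence-sized chunks, one chunk per clip.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    # Drop empty strings (for example, when the script itself is empty).
    return [part for part in parts if part]


script = "Welcome to the course. Ready to begin? Let's go!"
for clip_text in split_into_clips(script):
    print(clip_text)
```

Note that this simple rule also splits after abbreviations such as "Dr.", so review the chunks before creating clips.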

Combine Clips

Once you’ve broken down your clips, it’s easy to combine the clips into one perfect voiceover. Simply select the checkbox next to the desired takes and combine the clips using our Combine Tool.

That way, you end up with a finely tuned, flawless voiceover that’s production-ready and perfectly fits your project’s needs.

Organizing Clips

Think that a sentence you currently have at the end of the voiceover may sound better at the beginning? Or did you create a clip that may fit better in a different project? It’s easy to organize clips as needed.

For moving clips within a project: Grab the gripper bar on the left side of the take and drag it to the desired location of the voiceover. Then, when you combine clips, they will combine in that exact order.

For moving clips to a different project: If you want to move a clip to an entirely different project, here are a few simple steps:

Select the checkbox on the left side of the clip.
Navigate to the top bar and select Move.
Choose the name of the Project from the drop-down menu and select Move.
Your clip is now in your new project.

For copying clips to a different project: If you created a clip that may be useful for more than one project, it’s easy to copy that clip over:

Select the checkbox on the left side of each clip you want to copy.
Navigate to the top bar and select Copy.
Choose the destination project and click Copy.
Your clips are now copied to your new project.

Pronunciation Best Practices

Our voice model is great at predicting pronunciation, but a few things can trip it up: unfamiliar or rare nouns such as company names or specialized terminology, homonyms (read vs. read), acronyms, and numbers. With the following best practices, you can ensure your selected Voice pronounces your text exactly how you want it.

Emphasizing Words, Phrases, or Syllables

By using certain punctuation marks or respelling certain words, it’s easy to achieve unique emphasis on specific words or phrases in your text.

Using quotation marks

When you place a word in quotation marks, the AI pays particular attention to that word or phrase. You can try placing the syllable you’d like to have emphasized in quotation marks.

If you want the AI to say “PROduce,” as in vegetables, but it’s saying “proDUCE” instead, try entering “pro”duce.

Using a Respelling or Replacement

With our Respellings and Replacements tools, you can now give the AI specific cues to help shape the way all avatars say a word, making pronunciation more predictable as you create with different voices.

Respellings

Respellings let you format a word within the Studio text editor in a way that tells the AI exactly what sound each syllable should make, and which syllables should be emphasized.

Example: Saskatchewan --> ::sask-ACH-oo-when::

These can be saved to your individual library or your team’s library, and can be toggled on and off based on your personal preferences. For a more in-depth guide on Respellings, visit our Respellings page.
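If you keep a list of tricky words outside of Studio, a small helper can assemble Respelling strings in the pattern shown in the Saskatchewan example. This is a rough Python sketch that assumes only what that example shows (double colons around the word, dashes between syllables, capitals for the stressed syllable). The function name respell is hypothetical, and the Respellings page remains the reference for the full syntax.

```python
def respell(syllables: list[str], stressed: int) -> str:
    """Join syllables with dashes, uppercase the stressed one, wrap in '::'.

    Assumption: the format mirrors the ::sask-ACH-oo-when:: example only.
    """
    parts = [
        syllable.upper() if index == stressed else syllable.lower()
        for index, syllable in enumerate(syllables)
    ]
    return "::" + "-".join(parts) + "::"


print(respell(["sask", "ach", "oo", "when"], stressed=1))
# prints ::sask-ACH-oo-when::
```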

Replacements

Replacements allow you to substitute words, terms, or even abbreviations in your script with replacement text. Once you store a Replacement, you can easily reuse it over and over again. Multiple Replacements can be stored for the same original text, and individual Replacements can easily be turned on and off during clip creation.

Example: The word "content" has different pronunciations in different contexts. A Replacement can be created and toggled to have the Voice pronounce it one way or the other.

For a more in-depth guide on Replacements, visit our Replacements page.

Acronym Pronunciation

Acronyms can be used in many different ways, and sometimes, it can be hard for the model to differentiate how the acronym is being used and should be pronounced.

Acronyms as words

For an acronym as a word, try spelling the word the way it sounds.

Example: FEMA → "feema" or "::FEE-ma::"

Acronyms as individual letters

If you want an acronym to be pronounced letter by letter, separate the letters with periods, spaces, or dashes, or use a Respelling for better control.

Example: TBD → "T.B.D." or "T-B-D" or "::TEE-BEE-DEE::"

QUICK TIP: Sometimes, stressing the first and last syllables tends to produce better pronunciation results.

Example: ASAP → "AY-ehs-ay-PEE"
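When a script contains many acronyms, the letter-by-letter forms above can be generated rather than typed by hand. Here is a minimal Python sketch, assuming you pre-process your script text before pasting it into Studio. The function name spell_out and its style parameter are hypothetical.

```python
def spell_out(acronym: str, style: str = "dash") -> str:
    """Rewrite an acronym so each letter stands alone.

    style="dash" gives T-B-D; style="dot" gives T.B.D.
    """
    letters = [ch.upper() for ch in acronym if ch.isalpha()]
    if style == "dot":
        return "".join(f"{letter}." for letter in letters)
    return "-".join(letters)


print(spell_out("TBD"))               # prints T-B-D
print(spell_out("TBD", style="dot"))  # prints T.B.D.
```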

For quick reference on how to phonetically spell certain acronyms, letters, or words, check out these charts.

Number Pronunciation

Just like real voice actors, AI voices need to know if a number is a reference number, a value, an address, a dollar amount, a year, a phone number, and so on.

Reference numbers or pages

You may need to use spaces between numbers to help the AI make accurate predictions. If the reference number 1246 should be read as “one two four six,” enter it as "1 2 4 6".
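For scripts with many reference numbers, the spacing can be applied with a short helper. This is a minimal Python sketch of the rule above. The function name space_digits is hypothetical, and it assumes the input is a plain reference number.

```python
def space_digits(number: str) -> str:
    """Put a space between every digit so each one is read individually."""
    return " ".join(ch for ch in number if ch.isdigit())


print(space_digits("1246"))  # prints 1 2 4 6
```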

Dollar amounts

If $1,200,000 should be read as “one point two million dollars,” enter it as "$1.2 million". If it should be read as "one point two million," enter "1.2 million".
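The same rewrite can be scripted if your amounts come from a spreadsheet or database. Below is a rough Python sketch that only handles amounts you want expressed in millions. The function name dollars_in_millions is hypothetical.

```python
def dollars_in_millions(amount: int, with_symbol: bool = True) -> str:
    """Express a whole-dollar amount in millions, e.g. 1200000 -> $1.2 million."""
    millions = amount / 1_000_000
    # The 'g' format drops trailing zeros, so 2.0 becomes "2".
    text = f"{millions:g} million"
    return f"${text}" if with_symbol else text


print(dollars_in_millions(1_200_000))                     # prints $1.2 million
print(dollars_in_millions(1_200_000, with_symbol=False))  # prints 1.2 million
```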

Years

If you want 2022 to be read as "two thousand and twenty-two", enter it as "2,022".

Phone numbers

A simple way to get the AI to read a number as a phone number ending with double digits is to add spaces between the numbers. For (206) 555-3131, you would enter it as "2 0 6 5 5 5 31 31".
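The phone number pattern above can also be produced programmatically. This minimal Python sketch assumes a 10-digit number in the style of the example, with the first six digits read one at a time and the last four read as two pairs. The function name phone_for_voiceover is hypothetical.

```python
def phone_for_voiceover(phone: str) -> str:
    """Format a 10-digit phone number as six single digits plus two pairs."""
    digits = [ch for ch in phone if ch.isdigit()]
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    singles = digits[:6]
    pairs = ["".join(digits[6:8]), "".join(digits[8:10])]
    return " ".join(singles + pairs)


print(phone_for_voiceover("(206) 555-3131"))  # prints 2 0 6 5 5 5 31 31
```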

Natural Pauses

Some voices are more quickly paced than others, but there are many ways to ensure that natural pauses are emphasized to achieve the perfect voiceover.

Pauses Between Words

Using commas and periods will flag to the model that a pause is desired in that part of the sentence.

Comma

Commas add pauses in places where a human voice actor would make a small, subtle pause. If you find the AI struggling to make a good prediction for adding pauses, insert commas in these places.

Period

Periods create a pause plus a downward inflection. They are best used to break a long sentence up into two pieces, allowing the AI to better predict which words to emphasize.

Pauses Between Sentences

WellSaid Voices bring a natural pacing to language, yet you might want to add longer pauses between sentences for emphasis or clarity. To do this, try the following:

Use ellipses or a combination of punctuation

Using an ellipsis (...) or a combination of punctuation marks (".,.,.,.") can create "breathing room" and add a slight pause between sentences. Simply add the ellipsis or combination of punctuation at the end of a sentence.

Use the return key

Pressing the return or enter key and entering a period, repeated a few times, can create a slightly longer pause. Do this after the period at the end of the sentence.

Use the Combine feature

WellSaid's Combine feature allows you to choose the length of time between each audio file. By combining clips, you can create a longer pause between sentences.

Vocal Inflections

In our beta version of Studio, users are able to adjust both verbal and non-verbal cues.

Verbal Cues

Loudness

Reads a word, sentence, or phrase louder or quieter depending on the slider setting.

Pace

Reads a word, sentence, or phrase faster or slower, depending on the slider setting.

Pitch

Reads a word, sentence, or phrase with a higher or lower tone.

Non-verbal Cues

Pause

Shortens or lengthens an existing pause when applied to a punctuation block in your script.
NOTE: The Pause cue only allows you to adjust existing pauses within your script. These pauses naturally occur at punctuation marks like commas, periods, and dashes. The Pause cue will let you change the length of those natural pauses. However, if you want to add new pauses, we still recommend manually inserting additional punctuation for best results.

For a more in-depth guide to Cues, check out our Guide to Voice Cues page!

Conclusion

Mastering the art of generating high-quality voiceovers with WellSaid can transform your projects, making them more engaging, professional, and precisely suited to your needs at a fraction of the cost and production time of traditional voiceover methods. By leveraging the wide variety of voices, adjusting tones and inflections, and utilizing features like Respellings and Replacements, you can produce tailored, dynamic voiceovers that capture your intended message perfectly.

Also, in case you missed our recent Best Practices webinar, you can see the tips from this article in action below!
