How to synthesise consonant sounds

clowm · November 2020

Hello!

A bit off-topic from Drambo, but I've been searching for examples of consonant sound synthesis and really found nothing. My thought is that it has to be done with noise, fast envelopes and filters. But since there are members with quite some skill in the synth area on this forum, maybe someone can give me links or advice on that topic?
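For anyone who wants to try the noise, fast envelope and filter idea outside Drambo, here is a minimal Python sketch. The sample rate, the simple one-pole high-pass and all parameter values are assumptions chosen for illustration, not a recipe for any specific consonant.

```python
import math
import random

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def noise_burst(duration_ms, decay_ms, cutoff_hz, seed=0):
    """White noise -> one-pole high-pass -> exponentially decaying envelope."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    n = SAMPLE_RATE * duration_ms // 1000
    # One-pole low-pass coefficient; subtracting the low-pass output
    # from the input leaves the high-frequency part of the noise.
    a = math.exp(-2.0 * math.pi * cutoff_hz / SAMPLE_RATE)
    # Per-sample gain factor so the envelope falls to 1/e after decay_ms.
    decay = math.exp(-1000.0 / (decay_ms * SAMPLE_RATE))
    low, env, out = 0.0, 1.0, []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        low = (1.0 - a) * x + a * low
        out.append(env * (x - low))
        env *= decay
    return out


# Hypothetical starting points, to be tuned by ear:
t_like = noise_burst(duration_ms=50, decay_ms=5, cutoff_hz=3000.0)
s_like = noise_burst(duration_ms=300, decay_ms=150, cutoff_hz=4000.0)
```

A short decay gives a click-like burst, a long one gives a sustained hiss; whether either is heard as a particular consonant is a separate question.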

Thanks!

Fedor · November 2020

Making a formant filter out of a few bandpass filters would be a good start.

Where did you get that term from? Do you mean vowel-consonant synthesis?
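As a rough illustration of a formant filter built from a few bandpass filters, here is a minimal Python sketch. It assumes standard biquad band-pass sections fed by a naive sawtooth; the sample rate and the three formant values for an "ah"-like vowel are rough placeholder figures, not measurements.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def bandpass_coeffs(freq_hz, q):
    """Biquad band-pass (unity gain at the centre frequency), normalised by a0."""
    w0 = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return (alpha / a0, 0.0, -alpha / a0,
            -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)


def biquad(signal, coeffs):
    """Run one direct-form biquad section over a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def sawtooth(freq_hz, duration_ms):
    """Naive (aliasing) sawtooth, good enough as a harmonically rich source."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [2.0 * ((i * freq_hz / SAMPLE_RATE) % 1.0) - 1.0 for i in range(n)]


def formant_bank(source, formants):
    """Sum the outputs of parallel band-pass filters, one per formant."""
    out = [0.0] * len(source)
    for freq_hz, q, gain in formants:
        band = biquad(source, bandpass_coeffs(freq_hz, q))
        for i, value in enumerate(band):
            out[i] += gain * value
    return out


# (centre frequency in Hz, Q, gain): placeholder values for an "ah"-like vowel
AH_FORMANTS = [(700.0, 8.0, 1.0), (1200.0, 10.0, 0.5), (2600.0, 12.0, 0.25)]
vowel = formant_bank(sawtooth(110.0, 250), AH_FORMANTS)
```

Parallel band-pass sections are used here because each formant can then be tuned independently; other topologies are possible.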

rs2000 · November 2020

https://forum.beepstreet.com/discussion/998/how-to-synthesise-consonant-sounds

Indeed, consonants leave some room for research, although there are scientific papers that go into detail on speech synthesis.

You basically have several options:

Use samples for the ssh, plop, rrr, b, p, g, k, d, t and so on
Synthesize them by using custom pulses (Graphic shaper to the rescue), filters and Drambo's physical models & FX
Use wavetables (2048 samples per WT frame can be enough for doing limited bandwidth speech synthesis).

clowm · November 2020

https://forum.beepstreet.com/discussion/comment/11739#Comment_11739

Well it's not a term, I just tried to describe what I'm after 🙂

clowm · November 2020

https://forum.beepstreet.com/discussion/comment/11740#Comment_11740

Thanks!

lala · November 2020

vowels are simple (a, e, i, o, u, ü, ö, ä, y)

use something sample based for the rest, as plosives (p, b, d, t) are a nightmare to synthesize.

ch, sh, sch and zzz kind of work

L, n and m are my worst nightmare, use something sample based for those too (it's so tricky: you think if I can make it go "aaaaaaa" I can make it go "LLLLLLLL", or nnnnnnn, or mmmmmm)

nope

;)

Btw, it's easy to fool yourself into thinking it works here,

play it to someone else to see if it works ;)

I wasted months and months on this (analyzing my voice and the news speaker from channel one)

In the end I couldn't make it say something simple like

the bakerman has backen bread ^^

that was halfway understandable, I ended up with hm, interesting frequency garbage

so I simply gave up

better luck to you ;)

lala · November 2020

this is in German, so you will have to use Google Translate

https://www.phonetik.uni-muenchen.de/studium/skripten/SGL/SGLKap2.html

lala · November 2020

Bildschirmfoto 2020-11-26 um 14.21.37.png

Bildschirmfoto 2020-11-26 um 14.21.26.png

if u look at Wolfgang Palm (PPG)

he is using wavetables to do the tricky stuff

to me it looks like that is the only way to go (or something else sample based)

(I tried all kinds of other shit that I could think of and it didn't really work out in the end, it was unintelligible ;) )

above is the frequency table for all formants

m is male, w is female, ch is children

u just need 4 bandpass filters and a rich source

or take 4 sinewaves ... whatever

good luck with the l, m, n stuff

sadly it just doesn't work like that at all ;)
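The four-sinewave variant mentioned above can be sketched in a few lines of Python: one sine per formant, summed and normalised. The frequencies and amplitudes below are placeholder values for an "ah"-like vowel, not the figures from the screenshots, and the sample rate is an assumption.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def sine_formants(formants, duration_ms):
    """Sum one sine per (frequency_hz, amplitude) pair, scaled to stay within +/-1."""
    n = SAMPLE_RATE * duration_ms // 1000
    total = sum(amp for _, amp in formants)
    out = []
    for i in range(n):
        t = i / SAMPLE_RATE
        s = sum(amp * math.sin(2.0 * math.pi * freq * t) for freq, amp in formants)
        out.append(s / total)
    return out


# (frequency in Hz, relative amplitude): placeholder values only
AH = [(700.0, 1.0), (1200.0, 0.6), (2600.0, 0.3), (3500.0, 0.15)]
tone = sine_formants(AH, 200)
```

This gives a static, vowel-coloured tone; it does nothing for l, m or n, which is the hard part described above.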

lala · November 2020

https://soundcloud.com/the_only_real_lala/zuse3

formants here done with graphic eq

(In 3 I show off some of the stuff that didn't work out)

you can basically use anything, it's just about the specific frequencies

https://soundcloud.com/the_only_real_lala/zuse2

sadly I no longer have the examples that clearly show it doesn't work

I tried to make it say simple words like "Ananas" (pineapple)

doesn't work

clowm · November 2020

https://forum.beepstreet.com/discussion/comment/11755#Comment_11755

Wow, that will keep me busy for days! Huge thanks!

lala · November 2020

Have fun

another way to get to formants is the graphic shaper ;)

https://patchstorage.com/formants-through-the-backdoor/

lala · November 2020

on the other hand i may give l,m,n a 2nd try

lala · November 2020

hehe, as I remembered the other stuff doesn’t work

it's not recognizable

(with a little noise we get a little closer but still unintelligible )

i can fool myself into thinking this sounds kind of like L

but nope

it’s an illusion

if I try to make it say “ lala “

I get “ ?a?a “ back

L

L2

lala · November 2020

U can hear L doesn’t work

it still sounds somewhat vowel-like, but nothing recognizable

Morph from a to L or not.zip

(Move morph fader)

lala · November 2020

Pseudo N

oh this works actually a little better than I remembered

so there is room for experiments here

it sure works for musical purposes as some absurd filtering but not as intelligible speech

if someone comes up with something intelligible I will invite him over for dinner ;)

(I expect this won’t happen)

btw, it doesn’t get easier: k is just a click followed by silence ...

hm, the frustrating thing is I find only papers from linguistic research

there must be better papers from sound people about non-vowels???

most of the stuff where you think this is synthesized speech is fake, meaning not really synthesized

it's little samples glued together, or someone talking into a vocoder or frequency shifter and so on

everything except vowels is a ridiculous amount of work with little to show for it

rs2000 · November 2020

Yes, it's a lot of work, no question.

Arbitrary control of custom wavetables for vowels and consonants is possible in Drambo.

I've done it in the wavetable demo song and I would still stick to it and combine with consonant samples where appropriate.

Some consonants like the "L" @lala mentioned are actually vowels too.

BTW @lala, dinner sounds good 😋

lala · November 2020

https://forum.beepstreet.com/discussion/comment/11771#Comment_11771

see, these linguists confuse me ^^

they don't call "L" a vowel

it's a "lateral" or something for them

I can't see much difference between L & A either (except there is something different about "L",

because I just can't make it work ... 3 formants vs 4?) meh

I get kind of close with the formant frequencies from the pictures above but I definitely don't hear "L"

I hear something that is similar to an "Ä" or so

I have only looked at the spectrum in my research on speech; maybe I should start looking at the actual waveforms and see what I can bend, because I am stuck?

I make passable Lasagne. ^^

L, M, N can't be that complex, I can hum them as a constant sound, god damned 🤯

lala · November 2020

I don't want to cheat with samples, I want to really understand what's going on ;)

rs2000 · November 2020

https://forum.beepstreet.com/discussion/comment/11780#Comment_11780

Hehe, sounds like we need a tweakable vocal tract and mouth/tongue/teeth model 😁

orchid · November 2020

I'm just gonna uh… leave this here 😅

https://dood.al/pinktrombone/

rs2000 · November 2020

@orchid This one is excellent indeed, I'm glad it's still online!

lala · November 2020

https://forum.beepstreet.com/discussion/comment/11785#Comment_11785

well it's pretty primitive, isn't it?

while it does some articulations, it doesn't do them all (lol, they left plosives out because they involve too heavy an interaction of things) - say pineapple, say concubine, say F### you,

nor is it able to articulate something ( say lala )

(they are still making endoscope videos of how certain articulations in certain languages are made )

:/

with spoken or sung language the frequencies change depending on what letter came before it (vowels don't change)
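One simple way to experiment with that dependence on the previous sound is to glide the formant targets from one sound to the next instead of switching them. The sketch below assumes plain linear interpolation, and both sets of formant values are hypothetical placeholders; real transitions are unlikely to be this simple.

```python
def formant_glide(start, end, steps):
    """Linearly interpolate formant targets from start to end, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(start) != len(end):
        raise ValueError("start and end need the same number of formants")
    return [
        tuple(a + (b - a) * k / (steps - 1) for a, b in zip(start, end))
        for k in range(steps)
    ]


# Hypothetical formant targets in Hz (F1, F2, F3), for illustration only
N_LIKE = (250.0, 1400.0, 2500.0)
AH = (700.0, 1200.0, 2600.0)

# Five control points for the move from the "n"-like target into "ah"
track = formant_glide(N_LIKE, AH, 5)
```

Each tuple in `track` could then drive the centre frequencies of a filter bank or a set of sine oscillators for one control step.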

rs2000 · November 2020

@lala it's not easy indeed.

Imagine we already had a model that could produce all sounds, how would you control the maybe 40 parameters to produce actual speech?

That old black and white video showing the secretary typing speech might be the way to go, I mean we have pad controllers and MIDI keyboards and the sequencer could record it 😅

lala · November 2020

https://forum.beepstreet.com/discussion/comment/11788#Comment_11788

you mean the Bell Labs demo?

I'm not quite sure how that worked

I think it had the words to say preprogrammed?

oh no, it was all realtime (they say something about 20 sounds, hm, it couldn't do it all, there are more letters in the alphabet ...)

looking at the specs of the Voder, it's just pulse waves, noise and a formant filter,

(that explains why you can't understand shit it is trying to say) ^^

lala · November 2020

thinking about it,

French rolling "RRRRR" should also be achievable

so to do:

convincing

(L is somehow a mixture of vowel and consonant)

L,

M,

N,

F, (V) (F is similar to M & N & S)

S

sh,

ch,

sch

English TH

Z

R

someone remind me to look into this again when we've had a few updates

I grouped them into similar sounds already

hm

if I do M, N, F, S it's kind of like I'm starting low in the spectrum and then going all the way through it,

it starts at the throat and ends at the lips

mmmmmmnnnnnffffffSSSSS

note to myself: 4 or 5 rooms

rs2000 · November 2020

It would be quite helpful to find a way to synthesize all kinds of transitions between any vowel and consonant without bloating the project. Wavetables are a good start but they can't cover them all. Maybe an additional bank of steady noisy samples that can be crossfaded freely? At least for a start; once everything works as expected, one can still rebuild the noise samples with synth modules.
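The crossfading part of that idea can be sketched as follows. This assumes an equal-power (sine/cosine) crossfade between two equally long buffers; the two short lists stand in for real noise samples and are purely illustrative.

```python
import math


def equal_power_crossfade(a, b, position):
    """Blend two equally long buffers; position 0.0 is all a, 1.0 is all b."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0.0 and 1.0")
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    gain_a = math.cos(position * math.pi / 2.0)
    gain_b = math.sin(position * math.pi / 2.0)
    return [gain_a * x + gain_b * y for x, y in zip(a, b)]


# Stand-ins for two steady noisy samples (hypothetical data)
hiss_a = [0.5, -0.5, 0.25]
hiss_b = [0.1, 0.2, -0.3]
halfway = equal_power_crossfade(hiss_a, hiss_b, 0.5)
```

Equal-power gains are chosen here because uncorrelated noise sources tend to dip in loudness halfway through a plain linear crossfade.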

I would use MIDI note numbers to control all that, 128 notes should be enough to control the robot...

Pitch would be controlled independently (pitch bend), as would be vibrato (mod wheel) and maybe even a few CCs for shaping the vocal tract, mouth, lips, teeth and tongue position. Oh my, that sounds like a project eating up the whole Christmas holidays, just to implement something that I can do myself 😅
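A minimal Python sketch of that control scheme, working on raw MIDI bytes. The status bytes (note on 0x90, control change 0xB0, pitch bend 0xE0, mod wheel as CC 1) are standard MIDI; the phoneme list, `BASE_NOTE`, the `VoiceState` class and the bend range are hypothetical choices made only for illustration.

```python
# Hypothetical layout: consecutive notes starting at BASE_NOTE select phonemes
PHONEMES = ["a", "e", "i", "o", "u", "l", "m", "n", "f", "s", "sh", "z", "r"]
BASE_NOTE = 36


class VoiceState:
    """Current control values for the imagined speech synth."""

    def __init__(self):
        self.phoneme = None
        self.pitch_semitones = 0.0
        self.vibrato_depth = 0.0


def handle_midi(state, status, data1, data2, bend_range=2.0):
    """Update the state from one 3-byte MIDI channel message."""
    kind = status & 0xF0  # strip the channel nibble
    if kind == 0x90 and data2 > 0:  # note on selects a phoneme
        index = data1 - BASE_NOTE
        if 0 <= index < len(PHONEMES):
            state.phoneme = PHONEMES[index]
    elif kind == 0xE0:  # pitch bend: 14-bit value, LSB first, centre 8192
        value = (data2 << 7) | data1
        state.pitch_semitones = (value - 8192) / 8192 * bend_range
    elif kind == 0xB0 and data1 == 1:  # mod wheel sets vibrato depth 0..1
        state.vibrato_depth = data2 / 127
    return state


state = VoiceState()
handle_midi(state, 0x90, BASE_NOTE + 6, 100)  # note on -> "m"
handle_midi(state, 0xE0, 0x00, 0x40)          # pitch bend at centre
handle_midi(state, 0xB0, 1, 127)              # mod wheel fully up
```

Further CCs for tract, lip or tongue parameters would follow the same pattern as the mod wheel branch.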

lala · December 2020

https://forum.beepstreet.com/discussion/comment/11805#Comment_11805

Hehe I remember biting my nails.

I think I spent 4 months or so on it (without wavetables/samples) until I realized it was complete trash because you couldn’t understand a word.

i thought hah I found it (it’s really easy to trick yourself here if u work alone) then made some show off demos, and then I realized oh fuck,

it doesn’t work

if I have to write text to the example (this is supposed to say ananas) it clearly doesn’t work

Ananas ;)

nevertheless it is a good exercise in sound design. :)

I don’t think of it as wasted time even if I have nothing amazeballs to show for it. :)

lala · December 2020

https://forum.beepstreet.com/discussion/comment/11805#Comment_11805

Hm, the frequency change seems like a thing that we are able to figure out?

do these frequencies after those frequencies

it must be a small number of changes as we don’t have endless letters in the alphabet

and some combinations of letters are really odd (in languages written in the Latin alphabet), like qk or something

(Say that 5 times quick, gkgkgkgk ... the dogs look at me to check I’m not having a stroke 🤣)

a lot drops out in the first place (and doesn’t need to be figured out)

(I only know about languages written in the Latin alphabet: German, English, French, Italian, Spanish;

languages with other scripts are completely alien to me, like Thai, Japanese, Chinese

Thai has words with 3 k and ish, something like kkkgulasch ...)

I would appreciate some input from a native speaker of one or more of those languages here.

lala · December 2020

Looking at the sonogram

comparing Ana to ama

i guess the „frequency smear“ we see at the start of the M

is the trallala that happens when certain letters follow each other?

I kind of bet it’s resonance frequencies from one or more of the „chambers“ during the transition

rs2000 · December 2020

https://forum.beepstreet.com/discussion/comment/12429#Comment_12429

Who knows. In Drambo, it would be possible to model the vocal tract with seamless parameter control. Maybe wavetables are just a dirty workaround and a proper model would be better?

Wavetables and samples could cover the excitation signals while filters, delays and impulse responses (from the vocal tract using Thafknar 😎) would shape them. Sounds like a nice experiment.

Who's volunteering to place the full-blown IR recording setup into his throat? 😂
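The excitation-plus-impulse-response idea can be tried offline with plain convolution. In this Python sketch the pulse train and the decaying 700 Hz resonance are made-up stand-ins for a real excitation signal and a measured vocal tract impulse response; the sample rate is also an assumption.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def convolve(signal, impulse_response):
    """Direct convolution: every input sample adds a scaled copy of the IR."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        if x == 0.0:
            continue  # silent samples contribute nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


# Excitation: 100 ms pulse train, one pulse every 400 samples (about 110 Hz)
period = SAMPLE_RATE // 110
excitation = [1.0 if i % period == 0 else 0.0 for i in range(4410)]

# Stand-in impulse response: 10 ms of a decaying 700 Hz resonance
ir = [
    math.exp(-i / 100.0) * math.sin(2.0 * math.pi * 700.0 * i / SAMPLE_RATE)
    for i in range(441)
]

shaped = convolve(excitation, ir)
```

Swapping `ir` for a recorded response and `excitation` for a wavetable or noise sample keeps the rest of the code unchanged.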

lala · December 2020

^^
