How to synthesise consonant sounds
Hello!
A bit off-topic from Drambo, but I've been searching for examples of consonant sound synthesis and found nothing. My thought is that it has to be done with noise, fast envelopes and filters. But since there are members with quite some synth skills in this forum, maybe someone can give me links or advice on that topic?
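The noise/envelope/filter idea can be tried with a few lines of Python (standard library only). This is a minimal, hypothetical "hiss" generator: white noise through a one-pole (RC-style) high-pass, shaped by a fast attack/release envelope. The function names and the 4 kHz cutoff, 10 ms attack and 40 ms release are assumptions to tweak by ear, not values from this thread.

```python
import math
import random


def fast_envelope(n, fs, attack_ms, release_ms):
    """Linear attack up to 1.0, then exponential release (one gain per sample)."""
    attack = max(1, int(fs * attack_ms / 1000))
    tau = fs * release_ms / 1000  # release time constant in samples
    return [i / attack if i < attack else math.exp(-(i - attack) / tau)
            for i in range(n)]


def one_pole_highpass(x, cutoff_hz, fs):
    """Discrete RC high-pass: y[n] = k * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    k = rc / (rc + 1.0 / fs)
    out, prev_x, prev_y = [], 0.0, 0.0
    for xn in x:
        prev_y = k * (prev_y + xn - prev_x)
        prev_x = xn
        out.append(prev_y)
    return out


def hiss(seconds=0.2, fs=44100, cutoff_hz=4000.0, attack_ms=10, release_ms=40, seed=1):
    """Hypothetical 's'-like burst: high-passed white noise times a fast envelope."""
    rng = random.Random(seed)
    n = int(seconds * fs)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    env = fast_envelope(n, fs, attack_ms, release_ms)
    return [s * e for s, e in zip(one_pole_highpass(noise, cutoff_hz, fs), env)]


burst = hiss()
```

Moving the cutoff (or swapping the high-pass for a band-pass) is the obvious knob for going from "s" towards "sh"-like colours.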
Thanks!
Comments
Making a formant filter out of a few bandpass filters would be a good start.
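A formant filter built from parallel band-pass filters could look like this minimal Python sketch. The band-pass is the constant-0-dB-peak band-pass from Robert Bristow-Johnson's Audio EQ Cookbook; the naive sawtooth source, the Q values and the formant frequencies/gains in VOWEL_A are illustrative assumptions, not values from this thread (swap in numbers from a formant table).

```python
import math


def bandpass_coeffs(f0, q, fs):
    """RBJ cookbook band-pass (0 dB peak gain), normalized by a0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)               # b0, b1, b2
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)   # a1, a2
    return b, a


def biquad(x, b, a):
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def sawtooth(freq, seconds, fs):
    """Naive (aliasing) sawtooth in [-1, 1): a harmonically rich source."""
    n = int(seconds * fs)
    return [2.0 * ((i * freq / fs) % 1.0) - 1.0 for i in range(n)]


def formant_filter(source, formants, fs):
    """Run the source through parallel band-passes and sum the outputs.

    formants: list of (centre_hz, q, gain) tuples.
    """
    out = [0.0] * len(source)
    for f0, q, gain in formants:
        b, a = bandpass_coeffs(f0, q, fs)
        for i, y in enumerate(biquad(source, b, a)):
            out[i] += gain * y
    return out


FS = 44100
# Illustrative formant set (assumption, not from the thread).
VOWEL_A = [(730, 8, 1.0), (1090, 10, 0.5), (2440, 12, 0.25), (3400, 12, 0.1)]
voice = formant_filter(sawtooth(110, 0.5, FS), VOWEL_A, FS)
```

Parallel rather than cascaded band-passes keep each formant's level independently adjustable, which makes it easy to morph between vowel settings.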
Where did you get that term from? Do you mean vowel-consonant synthesis?
Indeed, consonants leave some room for research, although there are scientific papers that go into detail on speech synthesis.
You basically have several options:
Well it's not a term, I just tried to describe what I'm after 🙂
Thanks!
Vowels are simple (a, e, i, o, u, ü, ö, ä, y).
Use something sample-based for the rest, as plosives (p, b, d, t) are a nightmare to synthesize.
ch, sh, sch and zzz kind of work.
L, n and m are my worst nightmare; use something sample-based there too (it's so tricky: you think if you can make it go "aaaaaaa", you can make it go "LLLLLLLL", or "nnnnnnn", or "mmmmmm")
nope
;)
Btw, it's easy to fool yourself into thinking it works here;
play it to someone else to see if it really works ;)
I wasted months and months on this (analyzing my voice and the newsreader from channel one).
In the end I couldn't make it say something as simple as
"the baker man has baked bread" ^^
in a way that was halfway understandable; I ended up with, hm, interesting frequency garbage.
so I simply gave up
Better luck to you ;)
This is in German, so you will have to use Google Translate:
https://www.phonetik.uni-muenchen.de/studium/skripten/SGL/SGLKap2.html
If you look at Wolfgang Palm (PPG),
he uses wavetables to do the tricky stuff.
To me it looks like that is the only way to go (or something else sample-based).
(I tried all kinds of other shit that I could think of, and in the end it didn't really work out; it was unintelligible ;) )
above is the frequency table for all formants
m is male, w is female, ch is children
You just need 4 bandpass filters and a rich source,
or take 4 sine waves ... whatever.
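The "4 sine waves" variant is even simpler: one sine per formant, summed. A minimal Python sketch; the frequencies and amplitudes in VOWEL_A are illustrative assumptions. Note that this version has no separate voice pitch at all, so it tends to sound whistly rather than voiced.

```python
import math


def sine_formants(formants, seconds=0.5, fs=44100):
    """Sum one sine per formant; formants = [(freq_hz, amplitude), ...].

    The sum is divided by the total amplitude, so the output stays within +/-1.
    """
    total = sum(amp for _, amp in formants)
    n = int(seconds * fs)
    return [sum(amp * math.sin(2 * math.pi * f * i / fs) for f, amp in formants) / total
            for i in range(n)]


# Illustrative values only (assumption): replace with values from a formant table.
VOWEL_A = [(730, 1.0), (1090, 0.5), (2440, 0.25), (3400, 0.1)]
tone = sine_formants(VOWEL_A)
```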
Good luck with the L, M, N stuff;
sadly it just doesn't work like that at all ;)
Formants here are done with a graphic EQ.
(In 3 I show off some of the stuff that didn't work out.)
You can basically use anything; it's just about the specific frequencies.
Sadly I don't have the examples that clearly show it doesn't work anymore.
I tried to make it say simple words like "Ananas" (pineapple).
It doesn't work.
Wow, that will keep me busy for days! Huge thanks!
Have fun
another way to get to formants is the graphic shaper ;)
https://patchstorage.com/formants-through-the-backdoor/
On the other hand, I may give L, M, N a second try.
Hehe, as I remembered, the other stuff doesn't work:
it's not recognizable
(with a little noise we get a little closer, but it's still unintelligible)
I can fool myself into thinking this sounds kind of like L
but nope
it’s an illusion
If I try to make it say "lala",
I get "?a?a" back.
L
L2
You can hear that L doesn't work:
it still sounds somewhat vowel like, but nothing recognizable
(Move morph fader)
Pseudo N
oh this works actually a little better than I remembered
so there is room for experiments here
it sure works for musical purposes as some absurd filtering but not as intelligible speech
if someone comes up with something intelligible I will invite him over for dinner ;)
(I expect this won’t happen)
Btw, it doesn't get easier: k is just a click followed by silence ...
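Taking that description literally, a "k" could be sketched as a very short, fast-decaying noise click followed by silence. This is a hypothetical Python sketch: the function name k_click and the 5 ms click, 1 ms decay and 60 ms silence are assumptions, and on its own it is unlikely to be heard as a "k" without the surrounding vowels.

```python
import math
import random


def k_click(fs=44100, click_ms=5.0, decay_ms=1.0, silence_ms=60.0, seed=0):
    """Hypothetical 'k': a short exponentially decaying noise click, then silence."""
    rng = random.Random(seed)
    n_click = int(fs * click_ms / 1000)
    tau = fs * decay_ms / 1000  # decay time constant in samples
    click = [rng.uniform(-1.0, 1.0) * math.exp(-i / tau) for i in range(n_click)]
    silence = [0.0] * int(fs * silence_ms / 1000)
    return click + silence


k = k_click()
```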
Hm, the frustrating thing is that I only find papers from linguistic research;
there must be better papers from sound people about non-vowels???
Most of the stuff where you think "this is synthesized speech" is fake, meaning not really synthesized:
it's little samples glued together, or someone talking into a vocoder or frequency shifter and so on.
Everything except vowels is a ridiculous amount of work with little results to show for it.
Yes, it's a lot of work, no question.
Arbitrary control of custom wavetables for vowels and consonants is possible in Drambo.
I've done it in the wavetable demo song and I would still stick to it and combine with consonant samples where appropriate.
Some consonants like the "L" @lala mentioned are actually vowels too.
BTW @lala, dinner sounds good 😋
See, these linguists confuse me ^^
they don't call "L" a vowel,
it's a "lateral" or something for them.
I can't see much difference between L & A either (except there is something different about "L",
because I just can't make it work ... 3 formants vs 4?) meh
I get kind of close with the formant frequencies from the pictures above, but I definitely don't hear "L";
I hear something that is similar to an "Ä" or so
I have only looked at the spectrum in my speech research; maybe I should start looking at the actual waveforms and see what I can bend, because I am stuck?
I make passable Lasagne. ^^
L, M, N can't be that complex; I can hum them as a constant sound, goddammit 🤯
I don't want to cheat with samples, I want to really understand what's going on ;)
Hehe, sounds like we need a tweakable vocal tract and mouth/tongue/teeth model 😁
I'm just gonna uh… leave this here 😅
https://dood.al/pinktrombone/
@orchid This one is excellent indeed, I'm glad it's still online!
Well, it's pretty primitive, isn't it?
While it does some articulations, it doesn't do them all (lol, they left plosives out because they involve too much interaction of things): say "pineapple", say "concubine", say "F### you",
nor is it able to articulate something like "lala".
(They are still making endoscope videos of how certain articulations in certain languages are made.)
:/
With spoken or sung language, the frequencies change depending on which letter came before (vowels don't change).
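One way to experiment with that context dependence is to glide the formant frequencies from the previous sound's values to the next sound's values instead of switching them instantly. A minimal Python sketch; the linear shape, the 5 ms control rate and the example frequencies are assumptions (real transitions are not necessarily linear).

```python
def formant_glide(start_hz, end_hz, glide_ms, frame_ms=5.0):
    """Linearly move every formant from start_hz to end_hz.

    Returns one list of formant frequencies per control frame, both endpoints
    included, e.g. to retune band-pass filter centres frame by frame.
    """
    n_frames = max(2, int(round(glide_ms / frame_ms)) + 1)
    return [[s + (e - s) * k / (n_frames - 1) for s, e in zip(start_hz, end_hz)]
            for k in range(n_frames)]


# Hypothetical F1/F2 values for a 20 ms transition between two sounds.
glide = formant_glide([300, 2300], [700, 1200], 20.0)
```

Making glide_ms (and the curve shape) depend on which sound comes before which is where the "depends on the previous letter" part would live.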
@lala it's not easy indeed.
Imagine we already had a model that could produce all sounds: how would you control the 40 or so parameters to produce actual speech?
That old black and white video showing the secretary typing speech might be the way to go, I mean we have pad controllers and MIDI keyboards and the sequencer could record it 😅
you mean the bell lab demo?
im not quite sure how that worked
I think it had the words to say preprogrammed?
Oh no, it was all real-time (they say something about 20 sounds; hm, it couldn't do it all, there are more letters in the alphabet ...).
Looking at the specs of the Voder, it's just pulse waves, noise and a formant filter
(that explains why you can't understand shit of what it is trying to say) ^^
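A source like the one described there (a buzzy pulse wave for voiced sounds, noise for unvoiced ones, both feeding a formant filter) could be sketched like this in Python. The voicing crossfade, the 10% duty cycle and the names are assumptions, and the formant filter itself is left out.

```python
import random


def pulse_wave(freq, n, fs, duty=0.1):
    """Naive (aliasing) pulse train: 1.0 for `duty` of each period, else 0.0."""
    return [1.0 if (i * freq / fs) % 1.0 < duty else 0.0 for i in range(n)]


def excitation(voicing, freq=110.0, seconds=0.3, fs=44100, seed=0):
    """Mix buzz and hiss: voicing=1.0 -> pure pulse wave, 0.0 -> pure noise.

    The result is meant to feed a formant filter (not included here).
    """
    rng = random.Random(seed)
    n = int(seconds * fs)
    buzz = pulse_wave(freq, n, fs)
    return [voicing * b + (1.0 - voicing) * rng.uniform(-1.0, 1.0) for b in buzz]


voiced = excitation(1.0)    # e.g. for vowels, m, n, l
unvoiced = excitation(0.0)  # e.g. for s, f, sh
```

Values between 0 and 1 give a mixed source, which is one (hypothetical) starting point for voiced fricatives like z or v.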
thinking about it,
the French rolling "RRRRR" should also be achievable.
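A rolled R could be approximated by rapidly interrupting a voiced sound. Minimal Python sketch: amplitude-modulate any voiced signal (for example a formant-filtered buzz) with periodic dips. The function name, the 25 Hz tap rate and the 0.8 depth are assumptions to tune by ear.

```python
import math


def trill(carrier, fs, rate_hz=25.0, depth=0.8):
    """Periodically dip the level of a voiced signal to imitate a rolled R.

    rate_hz is the assumed number of taps per second; depth=0.8 means the
    level drops to 20% at the bottom of each dip.
    """
    out = []
    for i, s in enumerate(carrier):
        lfo = 0.5 * (1.0 + math.cos(2 * math.pi * rate_hz * i / fs))  # 1 = open, 0 = tap
        out.append(s * (1.0 - depth * (1.0 - lfo)))
    return out


rolled = trill([1.0] * 200, 1000)  # constant input makes the gain curve visible
```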
So, to do (convincingly):
(L is somehow a mixture of vowel and consonant)
L,
M,
N,
F, (V) (F is similar to M & N & S)
S
sh,
ch,
sch
English TH
Z
R
Someone remind me to look into this again when we've had a few updates.
I grouped them into similar sounds already
hm
If I do M, N, F, S, it's kind of like I'm starting low in the spectrum and then going all the way through it:
it starts at the throat and ends at the lips.
mmmmmmnnnnnffffffSSSSS
Note to myself: 4 or 5 chambers.
It would be quite helpful to find a way to synthesize all kinds of transitions between any vowel and consonant without bloating the project. Wavetables are a good start, but they can't cover them all. Maybe an additional bank of steady noisy samples that can be crossfaded freely? At least for a start; when everything works as expected, one can still rebuild the noise samples with synth modules.
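Crossfading freely between two noisy samples is usually done with an equal-power law, so the level doesn't dip in the middle of the fade. A minimal Python sketch (the function name is hypothetical; zip simply stops at the shorter buffer):

```python
import math


def equal_power_crossfade(a, b, position):
    """Mix two sample buffers; position 0.0 = only a, 1.0 = only b.

    cos/sin gains keep gain_a**2 + gain_b**2 == 1, so the loudness of two
    uncorrelated noise samples stays roughly constant across the fade.
    """
    gain_a = math.cos(position * math.pi / 2)
    gain_b = math.sin(position * math.pi / 2)
    return [gain_a * x + gain_b * y for x, y in zip(a, b)]


start = equal_power_crossfade([1.0, 1.0], [0.0, 0.0], 0.0)
```

Driving `position` from an envelope or a modulation source per block would turn this into the free morph between noise colours described above.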
I would use MIDI note numbers to control all that, 128 notes should be enough to control the robot...
Pitch would be controlled independently (pitch bend), as would be vibrato (mod wheel), and maybe even a few CCs for shaping the vocal tract, mouth, lips, teeth and tongue position. Oh my, that sounds like a project eating up the whole Christmas holidays, just to implement something that I can do myself 😅
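The note-number idea could start as simply as a lookup from MIDI note to sound, with the voice pitch taken from the 14-bit pitch-bend value (0 to 16383, centre 8192). The note layout, the 110 Hz base pitch and the ±2-semitone bend range below are assumptions (±2 is a common default, but synths differ).

```python
# Hypothetical note-to-sound map; any layout would do.
PHONEME_FOR_NOTE = {
    60: "a", 62: "e", 64: "i", 65: "o", 67: "u",
    69: "m", 71: "n", 72: "l", 74: "s", 76: "sh", 77: "k",
}


def pitch_from_bend(base_hz, bend, bend_range_semitones=2.0):
    """Convert a 14-bit MIDI pitch-bend value (0..16383, 8192 = centre) to Hz."""
    semitones = (bend - 8192) / 8192 * bend_range_semitones
    return base_hz * 2 ** (semitones / 12)


def note_on(note, bend=8192, base_hz=110.0):
    """Return (sound to articulate, voice pitch in Hz); unmapped notes give None."""
    return PHONEME_FOR_NOTE.get(note), pitch_from_bend(base_hz, bend)
```

Keeping "what to say" (note number) separate from "at what pitch" (pitch bend) means a sequencer can record the articulation and the melody as independent lanes.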
Hehe I remember biting my nails.
I think I spent 4 months or so on it (without wavetables/samples) until I realized it was complete trash because you couldn't understand a word.
I thought, hah, I found it (it's really easy to trick yourself here if you work alone), then made some show-off demos, and then I realized, oh fuck,
it doesn't work.
If I have to add a caption to the example ("this is supposed to say ananas"), it clearly doesn't work.
Ananas ;)
Nevertheless, it is a good exercise in sound design. :)
I don't think of it as wasted time, even if I have nothing amazeballs to show off. :)
Hm, the frequency change seems like a thing that we are able to figure out?
do these frequencies after those frequencies
it must be a small number of changes as we don’t have endless letters in the alphabet
and some combinations of letters are really odd (in languages using the Roman alphabet), like qk or something, so
(say that 5 times quickly: gkgkgkgk; the dog looks at me to check I'm not having a stroke 🤣)
a lot drops out in the first place (and doesn't need to be figured out).
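If the change really is "these frequencies after those frequencies", the bookkeeping is a table with one entry per ordered pair of sounds. A hypothetical Python sketch: the sound inventory, the 40 ms default and the excluded ("k", "k") pair are made up for illustration, and each entry holds just a transition time.

```python
from itertools import product

# Hypothetical sound inventory; a real one would differ.
SOUNDS = ["a", "e", "i", "o", "u", "m", "n", "l", "s", "k"]


def transition_table(sounds, default_ms=40.0, impossible=()):
    """One entry per ordered (previous, next) pair, holding a transition time.

    Pairs listed in `impossible` are left out, since some combinations
    never occur and don't need to be figured out.
    """
    skip = set(impossible)
    return {pair: default_ms for pair in product(sounds, repeat=2) if pair not in skip}


table = transition_table(SOUNDS, impossible=[("k", "k")])
table[("a", "k")] = 10.0  # hypothetical tweak: cut into the plosive quickly
```

With 10 sounds that is at most 10 × 10 = 100 pairs; even a 40-sound inventory gives at most 1,600, which is a lot to tune by ear but finite.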
(I only know about languages using the Roman alphabet: German, English, French, Italian, Spanish;
languages with non-Roman scripts, like Thai, Japanese or Chinese, are completely alien to me.
Thai has words with 3 k's and such, something like kkkgulasch ...)
I would appreciate some input from a native speaker of a language that doesn't use the Roman alphabet.
Looking at the sonogram,
comparing "ana" to "ama",
I guess the "frequency smear" we see at the start of the M
is the tra-la-la that happens when certain letters follow each other?
I kind of bet it's resonance frequencies from one or more of the "chambers" during the transition.
Who knows. In Drambo, it would be possible to model the vocal tract with seamless parameter control. Maybe wavetables are just a dirty workaround and a proper model would be better?
Wavetables and samples could cover the excitation signals while filters, delays and impulse responses (from the vocal tract using Thafknar 😎) would shape them. Sounds like a nice experiment.
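The "shape an excitation with an impulse response" part boils down to convolution. A minimal Python sketch with a made-up toy IR (a real vocal-tract IR, however anyone manages to record one, would replace TOY_IR); for long IRs an FFT-based convolution would be the usual, much faster choice.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution: every input sample triggers a scaled copy of the IR.

    O(len(signal) * len(impulse_response)); fine for short IRs and offline tests.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # silent samples add nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


# Placeholder IR (made up): a direct sound plus two quieter reflections.
TOY_IR = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]
shaped = convolve([1.0, 0.0, 0.0, 0.0], TOY_IR)
```

Feeding a single impulse returns the IR itself, which is a quick sanity check before using a pulse wave or noise as the excitation.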
Who's volunteering to place the full-blown IR recording setup into his throat? 😂
^^