Vine, D. S. G. and Sahandi, R., 2000. Synthesis of Emotional Speech using RP-PSOLA. In: IEE Seminar on the State of the Art in Speech Synthesis, 13 April 2000, London.
Full text not available from this repository.
Whilst TD-PSOLA remains an adequate solution for neutral speech synthesis, it is less suitable for emotional speech styles, which require more extreme pitch manipulation. Reducing the extent of the necessary pitch manipulation could potentially lessen the distortions and artefacts introduced by TD-PSOLA. To accomplish this, a method for recording concatenative units with f0 values similar to the target intonation has been devised. This technique, termed reference pitch prompted recording, involves a speaker recording concatenative units at a set pitch. The speaker is guided by a 'reference pitch prompt' (RPP), which is a monotonic, hummed note. In RP-PSOLA (reference pitch-PSOLA) synthesis, RPP-recorded units such as syllables are concatenated and an intonation contour is applied using TD-PSOLA. RP-PSOLA can be extended so that several versions of each syllable are recorded, each at a different pitch. In this synthesis technique, termed multiple pitch RP-PSOLA, syllables are selected from an inventory to approximate the target f0 contour and are then concatenated. This paper compares the RP-PSOLA and multiple pitch RP-PSOLA synthesis methods in terms of the perceived distortion in emotional synthetic sentences, via a listening experiment. The results showed that multiple pitch RP-PSOLA is perceived to produce marginally less distorted synthetic speech than RP-PSOLA overall.
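The unit selection step in multiple pitch RP-PSOLA can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the closest unit is chosen, so the nearest-pitch criterion measured in semitones, the function names `semitone_distance` and `select_units`, and the inventory values are all assumptions made for illustration. The sketch picks, for each syllable, the recorded version whose pitch lies nearest the target f0 and reports the residual pitch scaling factor that TD-PSOLA would still have to apply.

```python
import math


def semitone_distance(f_a, f_b):
    """Absolute distance between two frequencies (Hz), in semitones."""
    return abs(12.0 * math.log2(f_a / f_b))


def select_units(targets, inventory):
    """Pick the recorded pitch nearest each target f0 (assumed criterion).

    targets:   list of (syllable, target_f0_hz) pairs
    inventory: dict mapping syllable -> list of recorded pitches in Hz
    Returns a list of (syllable, recorded_f0_hz, pitch_scale_factor).
    """
    selected = []
    for syllable, target_f0 in targets:
        recorded = inventory[syllable]
        best = min(recorded, key=lambda r: semitone_distance(r, target_f0))
        # Residual factor left for the pitch modification stage.
        selected.append((syllable, best, target_f0 / best))
    return selected


# Hypothetical inventory: each syllable recorded at three reference pitches.
inventory = {
    "hel": [100.0, 150.0, 200.0],
    "lo": [100.0, 150.0, 200.0],
}
targets = [("hel", 190.0), ("lo", 110.0)]
units = select_units(targets, inventory)

for syllable, recorded_f0, factor in units:
    print(f"{syllable}: recorded at {recorded_f0} Hz, scale by {factor:.3f}")
```

In this hypothetical case the residual scaling factors are 0.95 and 1.10. With only a single 150 Hz recording per syllable, the first syllable would instead need a factor of about 1.27, which is the kind of larger manipulation the multiple pitch approach aims to avoid.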
|Item Type:||Conference or Workshop Item (Paper)|
|Subjects:||Technology > Engineering > General Engineering|
|Group:||School of Design, Engineering & Computing > Design Simulation Research Centre|
|Deposited By:||Dr Reza Sahandi|
|Deposited On:||31 Aug 2009 18:05|
|Last Modified:||07 Mar 2013 15:12|