Wiseguy Text to Speech

Author: [Generated for Academic Purposes]

Publication Date: April 14, 2026

Journal: Journal of Synthetic Media and Paralinguistics, Vol. 19, Issue 2

Abstract

This paper presents the design, implementation, and evaluation of WiseGuy TTS, a specialized text-to-speech system capable of generating speech in the distinctive prosodic, lexical, and phonemic style of the mid-20th-century American "wise guy" persona. Unlike generic TTS systems that aim for neutral narration, WiseGuy TTS incorporates dynamic pitch contouring, syllable stress patterns, phoneme-level duration adjustments (drawl, clipping), and a custom lexeme substitution engine for vernacular authenticity. We detail a three-component architecture: (1) a prosody-aware grapheme-to-phoneme (G2P) module, (2) a neural vocoder fine-tuned on dialog from post-war crime films, and (3) a rule-based stylistic filter. Subjective evaluation (Likert scale, n=120) shows high recognizability of the "wise guy" character (4.7/5) but moderate naturalness (3.9/5) due to exaggerated rhythmic patterns. Applications include cinematic dubbing, interactive gaming NPCs, and accessibility for dialect preservation.

| Slang | Canonical spelling | Phoneme override (ARPAbet) |
|-------|--------------------|-----------------------------|
| fuggedaboutit | forgetaboutit | F AH G EH D AH B AW T IH T |
| gabagool | capicola | K AA P IH G AA L |
| mook | mook | M UH K |
| yous | yous | Y UW Z |
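A lookup of this kind could be sketched as follows. This is a minimal illustration, assuming the overrides are applied per token before the G2P module runs; the names `SLANG_OVERRIDES` and `apply_overrides` are hypothetical and do not come from the paper, and the entries are copied from the table as given.

```python
import re

# Hypothetical override table: slang -> (canonical spelling, ARPAbet string).
# Entries are taken verbatim from the table above.
SLANG_OVERRIDES = {
    "fuggedaboutit": ("forgetaboutit", "F AH G EH D AH B AW T IH T"),
    "gabagool": ("capicola", "K AA P IH G AA L"),
    "mook": ("mook", "M UH K"),
    "yous": ("yous", "Y UW Z"),
}


def apply_overrides(text):
    """Tokenize text and attach a phoneme override where one exists.

    Returns a list of (token, canonical, phonemes) tuples. `phonemes` is a
    list of ARPAbet symbols for slang entries and None for every other
    token, which a downstream G2P module would then have to handle.
    """
    result = []
    for token in re.findall(r"[A-Za-z']+", text):
        key = token.lower()
        if key in SLANG_OVERRIDES:
            canonical, phones = SLANG_OVERRIDES[key]
            result.append((token, canonical, phones.split()))
        else:
            result.append((token, key, None))
    return result
```

Calling `apply_overrides("Yous want gabagool?")` tags `Yous` and `gabagool` with their overrides and leaves `want` for ordinary G2P conversion. Matching is case-insensitive here; whether the actual system normalizes case is an assumption.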

A higher MCD is expected, because stylistic speech distorts the spectral envelope. The 3.2× higher F0 variation confirms successful prosodic exaggeration.

| Metric | Baseline | WiseGuy | p-value |
|--------|----------|---------|---------|
| Authenticity (1-5) | 1.3 (0.4) | 4.7 (0.5) | <0.001 |
| Naturalness (1-5) | 4.5 (0.6) | 3.9 (0.8) | <0.05 |
| Keyword accuracy (%) | 98.2% | 91.5% | <0.01 |
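For readers who want to reproduce the objective metrics mentioned above, the following is a rough sketch under stated assumptions. It uses the standard mel-cepstral distortion formula over already time-aligned frames, and it assumes that "F0 variation" means the standard deviation of F0 over voiced frames. The paper does not state its exact measurement procedure, so both function names and the voicing convention (F0 of 0 marks an unvoiced frame) are hypothetical.

```python
import math
import statistics


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned mel-cepstral frames.

    Each frame is a sequence of coefficients. Coefficient 0 (energy) is
    skipped, as is conventional. Alignment (e.g. by DTW) is assumed to
    have been done beforehand.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frame sequences must be non-empty and equal in length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


def f0_variation_ratio(baseline_f0, styled_f0):
    """Ratio of F0 standard deviations, computed over voiced frames only.

    Assumption: frames with F0 <= 0 are unvoiced and are excluded.
    """
    base = [f for f in baseline_f0 if f > 0]
    styled = [f for f in styled_f0 if f > 0]
    return statistics.pstdev(styled) / statistics.pstdev(base)
```

Under this definition, a ratio near 3.2 would correspond to the F0 variation figure reported above, although the paper may have used a different statistic, such as a semitone range.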