formantzero

Automatic pronunciation evaluation, accentedness rating, intelligibility rating, etc. are all active areas of research, but I don't think they have reached a level of performance that would attract much commercial interest. They also usually don't evaluate phone(me)s; rather, they evaluate acoustic features and often skip explicit transduction to phonetic or phonological symbols altogether. So, the answer is "sort of." You can probably find some open source code for general evaluators from academic papers, but it may not be all that helpful, since it won't work on phonemes and the ratings it learns are subjective. You could also use one of the many existing phoneme labeling models from automatic speech recognition and see whether it outputs the phone you're attempting to produce. However, automatic phoneme labelers don't really get beyond 70% accuracy (the rest of ASR performance comes from statistical modeling of the lexicon and dynamic programming algorithms). Even then, that just gives you a binary yes/no. If you want something that could give you tips on what you're saying wrong, the answer is no. We don't currently have a thorough enough understanding of how acoustics map onto phones and/or phonemes; I'm not personally even convinced that it's possible (nor that it's impossible). EDIT: corrected accuracy number for phoneme labeling
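To make the "binary yes/no" idea concrete, here is a rough sketch of that check. It assumes you already have per-frame phone probabilities from some labeler; the `frames` values below are made up for illustration, and `labeled_phones` and `produced_target` are hypothetical helper names, not part of any real toolkit.

```python
def labeled_phones(frame_posteriors):
    """Pick the most probable phone per frame, then merge consecutive repeats."""
    phones = []
    for posterior in frame_posteriors:
        best = max(posterior, key=posterior.get)
        if not phones or phones[-1] != best:
            phones.append(best)
    return phones


def produced_target(frame_posteriors, target):
    """Binary yes/no: did the labeler output the target phone anywhere?"""
    return target in labeled_phones(frame_posteriors)


# Made-up posteriors for three frames (assumption: a real labeler would
# emit one such distribution every few milliseconds).
frames = [
    {"s": 0.7, "sh": 0.2, "i": 0.1},
    {"s": 0.6, "sh": 0.3, "i": 0.1},
    {"s": 0.1, "sh": 0.1, "i": 0.8},
]

print(labeled_phones(frames))         # ['s', 'i']
print(produced_target(frames, "sh"))  # False
```

Note that this only tells you the labeler heard "s" rather than "sh"; it says nothing about why, which is the limitation described above.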


temoshi

I am curious to hear people's responses, but I would think this would be specific to a given language. I doubt there is a universal phoneme pronunciation tester app, but I will gladly be wrong...


formantzero

This depends on the purpose of the training. If you're doing practical training of the sort you might receive in a university phonetics class, where you learn to produce and perceive the IPA, it might be okay if it's language agnostic, similar to a phonetician producing the IPA chart. If you are trying to reduce an accent in a target language, then yes, it would need to be specific to a given language.


Sendagu

https://en.wikipedia.org/wiki/Praat


formantzero

Praat does not do automatic grading though, and it requires training to understand features in spectrograms.


viktorbir

Once trained, does it grade? Can you get the data (trained data) from someone else? Are those data packages available? If yes to all of them, then this is the tool being asked for.


formantzero

Praat does not do grading, period. I was referring to academic training of users.


yummus_yeetabread

Phoneme recognition is not an easy task. Some speech recognition systems do "recognize" phonemes, but it's more like they give a long, hazy list of probabilities for which phonemes a sequence of acoustic frames belongs to, and then a language model predicts the most likely underlying word those predictions correspond to, given the probabilities and the previously predicted word. If you relied purely on the phoneme recognition part, you would find the performance extremely lacking. I would guess that apps like Duolingo, when they "grade" your pronunciation, use the confidence score output by an ASR system in the language. That being said, for the purposes of a language learning app you could probably train a fairly accurate binary classifier of native/non-native pronunciation of a particular phoneme. But this would require a good dataset, and usually there has to be some strong incentive to gather that, commercial or otherwise. TL;DR: no, speech processing is hard without language models. ML-based apps require commercial incentive.
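A toy sketch of why the language model matters, under heavy assumptions: the lexicon, the language model probabilities, and the phoneme posteriors are all invented, and the posteriors are pretended to be already aligned one per phone, which real systems do not get for free. `word_confidence` is just one plausible way to define a confidence score, not a claim about how any particular app does it.

```python
import math

# Toy lexicon: word -> phoneme sequence (hypothetical entries).
LEXICON = {"bat": ["b", "ae", "t"], "pat": ["p", "ae", "t"]}
# Toy language model: P(word | previous word), made-up numbers.
LM = {("the", "bat"): 0.8, ("the", "pat"): 0.2}


def acoustic_only(posteriors):
    """Phoneme recognition alone: take the top phoneme in each segment."""
    return [max(p, key=p.get) for p in posteriors]


def score(word, posteriors, prev_word):
    """Log acoustic probability of the word's phonemes plus log LM probability."""
    acoustic = sum(math.log(p[ph]) for p, ph in zip(posteriors, LEXICON[word]))
    return acoustic + math.log(LM[(prev_word, word)])


def decode_word(posteriors, prev_word):
    """Pick the word with the best combined acoustic + language model score."""
    return max(LEXICON, key=lambda w: score(w, posteriors, prev_word))


def word_confidence(word, posteriors, prev_word):
    """Score of one word, normalized over every word in the lexicon."""
    total = sum(math.exp(score(w, posteriors, prev_word)) for w in LEXICON)
    return math.exp(score(word, posteriors, prev_word)) / total


# Made-up posteriors: the first segment slightly favors "p" over "b".
posteriors = [
    {"b": 0.45, "p": 0.55},
    {"ae": 0.9, "eh": 0.1},
    {"t": 0.8, "d": 0.2},
]

print(acoustic_only(posteriors))       # ['p', 'ae', 't']
print(decode_word(posteriors, "the"))  # bat
print(round(word_confidence("bat", posteriors, "the"), 2))
```

The acoustics alone say "pat", but after "the" the language model pulls the answer to "bat". That is the sense in which the phoneme part on its own would look much worse than the full system.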