An Improved Algorithm for the Automatic Segmentation of Speech Corpora

In this paper we describe an improved algorithm for the automatic segmentation of speech corpora. Apart from their usefulness in several speech technology domains, segmentations provide easy access to speech corpora by using time stamps to couple the orthographic transcription to the speech signal. The segmentation tool we propose is based on the Forward-Backward algorithm. The Forward-Backward method not only produces more accurate segmentation results than the traditionally used Viterbi method, but also provides a confidence interval for each of the generated boundaries. These confidence intervals enable advanced post-processing operations that further improve the quality of the automatic segmentations.
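
To illustrate how a Forward-Backward pass can yield both a boundary estimate and an interval around it, the following is a minimal sketch and not the implementation described in the paper. It assumes a strictly left-to-right model with one state per segment, equal stay and advance weights (which cancel out because every path makes the same number of transitions), and unscaled likelihoods suitable only for short inputs. The function names `boundary_posteriors` and `boundary_interval`, and the choice of a central posterior-mass interval, are hypothetical; the abstract does not state how the confidence intervals are actually derived.

```python
def boundary_posteriors(lik):
    """lik[t][s] is the likelihood of frame t under segment s, with segments
    in their known left-to-right order (requires at least as many frames as
    segments). Returns post[s][t], the posterior probability that frame t is
    the last frame of segment s, for every segment except the final one."""
    T, S = len(lik), len(lik[0])

    # Forward pass: alpha[t][s] = p(frames 0..t, frame t in segment s).
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = lik[0][0]  # assumption: the path starts in the first segment
    for t in range(1, T):
        for s in range(S):
            stay = alpha[t - 1][s]
            advance = alpha[t - 1][s - 1] if s > 0 else 0.0
            alpha[t][s] = (stay + advance) * lik[t][s]

    # Backward pass: beta[t][s] = p(frames t+1..T-1 | frame t in segment s).
    beta = [[0.0] * S for _ in range(T)]
    beta[T - 1][S - 1] = 1.0  # assumption: the path ends in the last segment
    for t in range(T - 2, -1, -1):
        for s in range(S):
            stay = lik[t + 1][s] * beta[t + 1][s]
            advance = lik[t + 1][s + 1] * beta[t + 1][s + 1] if s + 1 < S else 0.0
            beta[t][s] = stay + advance

    total = alpha[T - 1][S - 1]  # likelihood summed over all valid alignments

    # Posterior of the transition s -> s+1 happening between frames t and t+1.
    return [
        [alpha[t][s] * lik[t + 1][s + 1] * beta[t + 1][s + 1] / total
         for t in range(T - 1)]
        for s in range(S - 1)
    ]


def boundary_interval(row, mass=0.9):
    """Summarise one boundary posterior as (expected frame, lo, hi), where
    [lo, hi] is a central interval covering at least `mass` of the posterior."""
    mean = sum(t * p for t, p in enumerate(row))
    tail = (1.0 - mass) / 2.0
    cum, lo, hi = 0.0, None, None
    for t, p in enumerate(row):
        cum += p
        if lo is None and cum >= tail:
            lo = t
        if hi is None and cum >= 1.0 - tail:
            hi = t
    if lo is None:
        lo = 0
    if hi is None:  # guard against floating-point shortfall in the total
        hi = len(row) - 1
    return mean, lo, hi


# Toy input: six frames, two segments; frames 0-2 fit segment 0 best.
toy_lik = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3
toy_post = boundary_posteriors(toy_lik)
toy_mean, toy_lo, toy_hi = boundary_interval(toy_post[0])
```

A Viterbi alignment would return only the single most likely boundary frame. In this sketch the full posterior over boundary positions is available, so a wide interval can flag a boundary as uncertain and as a candidate for post-processing.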
Published in 2002