Speech Recognition Test

Speech Recognition (ASR) Evaluation Metrics


    When testing speech recognition, the following metrics are commonly used to measure the accuracy of the recognized text:


    WER (Word Error Rate)


    Definition: To make the recognized word sequence match the standard word sequence, some words must be substituted, deleted, or inserted. WER is the total number of substituted, deleted, and inserted words divided by the total number of words in the standard word sequence, usually expressed as a percentage.


    WER=(S+D+I)/N


    S: Substitution, the number of substituted words


    D: Deletion, the number of deleted words


    I: Insertion, the number of inserted words


    N: the total number of words in the standard word sequence


    S+D+I is the word-level edit distance between the recognized word sequence and the standard word sequence.


    Note: Because inserted words are counted, WER can exceed 1 (100%). A WER above 1 means the recognized word sequence differs greatly from the standard word sequence and the recognition quality is very poor.
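
    The WER formula above can be illustrated with a minimal Python sketch. It assumes both sequences are already tokenized into words (here by splitting on whitespace) and uses a standard dynamic-programming edit distance; the function names edit_ops and wer are chosen for illustration only. When several alignments have the same edit distance, the split between S, D and I may differ, but their sum does not.

```python
def edit_ops(ref, hyp):
    """Return (S, D, I) for one minimum-cost word-level alignment.

    ref: list of words in the standard (reference) sequence
    hyp: list of words in the recognized sequence
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = (distance, S, D, I) for ref[:i] versus hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no cost
                continue
            d, s, dl, ins = dp[i - 1][j - 1]
            substitution = (d + 1, s + 1, dl, ins)
            d, s, dl, ins = dp[i - 1][j]
            deletion = (d + 1, s, dl + 1, ins)
            d, s, dl, ins = dp[i][j - 1]
            insertion = (d + 1, s, dl, ins + 1)
            # Tuples compare by distance first, so this picks a cheapest path.
            dp[i][j] = min(substitution, deletion, insertion)
    _, s, d, i = dp[n][m]
    return s, d, i


def wer(ref, hyp):
    """WER = (S + D + I) / N, where N is the number of words in ref."""
    if not ref:
        raise ValueError("the standard word sequence must not be empty")
    s, d, i = edit_ops(ref, hyp)
    return (s + d + i) / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(edit_ops(ref, hyp), wer(ref, hyp))
    # Many insertions push WER above 1.
    print(wer("hello".split(), "hi there my friend".split()))
```

    In the first example there is one substitution (sat -> sit) and one deletion (the), so WER = 2/6. In the second, one substitution plus three insertions against a one-word standard sequence gives WER = 4.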


    SER (Sentence Error Rate)


    Definition: A sentence counts as incorrectly recognized if it contains at least one word recognition error. SER is the number of incorrectly recognized sentences divided by the total number of sentences.


    SER=SE/N


    SE: the number of incorrectly recognized sentences in the recognized sequence (that is, the number of sentences with WER != 0)


    N: the total number of sentences in the standard sequence
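
    As a rough illustration of the SER formula, the sketch below counts a sentence as wrong whenever its recognized words differ from the standard words, which is equivalent to WER != 0. It assumes the two lists are aligned sentence by sentence and tokenized by whitespace; the function name ser is hypothetical.

```python
def ser(refs, hyps):
    """SER = SE / N for aligned lists of standard and recognized sentences."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of sentences")
    if not refs:
        raise ValueError("at least one sentence is required")
    # A sentence is an error if its word sequence is not identical (WER != 0).
    sentence_errors = sum(
        1 for ref, hyp in zip(refs, hyps) if ref.split() != hyp.split()
    )
    return sentence_errors / len(refs)


if __name__ == "__main__":
    refs = ["turn on the light", "play some music", "what time is it"]
    hyps = ["turn on the light", "play some musics", "what time is it"]
    print(ser(refs, hyps))  # one of three sentences is wrong
```

    A real test set would usually normalize case and punctuation before comparing; that step is left out here to keep the sketch short.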


    (2) Verify the accuracy of the recognized content


    Compute the WER and SER of the recognized text.


    (3) Measure how long speech recognition takes


    ① Time taken to recognize short speech


    ② Time taken to recognize long speech
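
    One possible way to record recognition time for short and long inputs is sketched below. The recognizer itself is not specified in this article, so fake_recognize is a hypothetical stand-in that simply sleeps in proportion to the input length; replace it with a call to the engine under test.

```python
import time


def time_recognition(recognize, audio, repeats=3):
    """Return the elapsed time in seconds of each recognize(audio) call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        recognize(audio)
        timings.append(time.perf_counter() - start)
    return timings


def fake_recognize(audio):
    # Hypothetical stand-in for a real ASR call: sleeps 1 ms per input byte.
    time.sleep(0.001 * len(audio))
    return "placeholder transcript"


if __name__ == "__main__":
    short_audio = b"x" * 5    # stands in for a short utterance
    long_audio = b"x" * 50    # stands in for a long recording
    for name, audio in [("short", short_audio), ("long", long_audio)]:
        timings = time_recognition(fake_recognize, audio)
        print(name, "average seconds:", sum(timings) / len(timings))
```

    Repeating each measurement and averaging reduces the effect of one-off delays such as model loading or network jitter.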