Speech Recognition (ASR) Evaluation Metrics
When testing speech recognition, the following metrics are commonly used to measure the
accuracy of the recognized text:
WER (Word Error Rate)
Definition: To make the recognized word sequence match the standard word sequence, some
words must be substituted, deleted, or inserted. WER is the total number of these
substituted, deleted, and inserted words divided by the total number of words in the
standard word sequence, usually expressed as a percentage.
WER=(S+D+I)/N
S: the number of substituted words (Substitution)
D: the number of deleted words (Deletion)
I: the number of inserted words (Insertion)
N: the total number of words in the standard word sequence
(S+D+I) = the word-level edit distance between the recognized word sequence and the standard word sequence
Note: Because inserted words are counted in the numerator, S+D+I can exceed N, so WER can
be greater than 1. A WER above 1 indicates that the recognized word sequence differs greatly
from the standard word sequence and that recognition quality is very poor.
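The WER formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes words are separated by whitespace and applies no text normalization (case, punctuation), and the function name wer is chosen here for illustration only.

```python
def wer(standard: str, recognized: str) -> float:
    """Word error rate: word-level edit distance (S+D+I) divided by N."""
    ref = standard.split()    # words of the standard sequence (assumes whitespace tokenization)
    hyp = recognized.split()  # words of the recognized sequence
    if not ref:
        raise ValueError("the standard word sequence must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty sequence costs i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # free when the words match
            deletion = prev[j] + 1                 # word of the standard sequence is missing
            insertion = curr[j - 1] + 1            # extra word in the recognized sequence
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
# Three insertions against a single standard word give WER = 3.0, i.e. WER > 1.
print(wer("hello", "hello there my friend"))
```

The second call shows the case described in the note: insertions alone push the numerator above N.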
SER (Sentence Error Rate)
Definition: A sentence is counted as incorrectly recognized if it contains at least one word
recognition error. SER is the number of incorrectly recognized sentences divided by the total
number of sentences.
SER=SE/N
SE: the number of incorrectly recognized sentences in the recognized sequence (that is,
the number of sentences with WER != 0)
N: the total number of sentences in the standard sequence
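As a rough sketch of the SER formula: a sentence has WER != 0 exactly when its recognized word sequence differs from the standard one, so a direct comparison of the word lists is enough. The function name ser, the whitespace tokenization, and the sample sentences are assumptions made for this illustration.

```python
def ser(standard_sentences, recognized_sentences) -> float:
    """Sentence error rate: SE / N over paired sentences."""
    if len(standard_sentences) != len(recognized_sentences):
        raise ValueError("both lists must contain the same number of sentences")
    if not standard_sentences:
        raise ValueError("at least one sentence is required")

    # A sentence is wrong if its word sequence differs at all (WER != 0).
    sentence_errors = sum(
        1
        for std, rec in zip(standard_sentences, recognized_sentences)
        if std.split() != rec.split()
    )
    return sentence_errors / len(standard_sentences)


standard = ["turn on the light", "play some music", "what time is it"]
recognized = ["turn on the light", "play sum music", "what time is it"]
# Only the second sentence contains an error: SE = 1, N = 3.
print(ser(standard, recognized))
```

Because a single wrong word marks the whole sentence as wrong, SER is normally at least as high as WER on the same test set when neither exceeds 1.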
(2) Verify the accuracy of the recognized content
Compute the WER and SER of the recognized text
(3) Measure how long speech recognition takes
① Recognition time for short speech
② Recognition time for long speech
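One possible way to collect these timings is to wrap the recognizer call with a wall-clock timer and average over several runs. In the sketch below, recognize is a hypothetical callable standing in for whatever engine is under test, and the byte strings are placeholders for real short and long audio clips; none of these names come from a real library.

```python
import time
from typing import Callable


def mean_recognition_time(recognize: Callable[[bytes], str], audio: bytes, runs: int = 3) -> float:
    """Return the mean wall-clock time in seconds of one recognize(audio) call."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        recognize(audio)
    return (time.perf_counter() - start) / runs


def fake_recognize(audio: bytes) -> str:
    """Hypothetical stand-in for a real engine: sleeps in proportion to input size."""
    time.sleep(len(audio) / 100_000)
    return "recognized text"


short_audio = b"\x00" * 1_000   # placeholder for a short utterance
long_audio = b"\x00" * 10_000   # placeholder for a long recording
print("short speech:", mean_recognition_time(fake_recognize, short_audio))
print("long speech:", mean_recognition_time(fake_recognize, long_audio))
```

Timing short and long inputs separately keeps fixed start-up costs from being hidden by the longer recordings.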
