Speech Recognition Test

Speech Recognition (ASR) Evaluation Metrics

When testing speech recognition, the accuracy of the recognized content is generally evaluated with the following metrics:

WER (Word Error Rate)

Definition: To make the recognized word sequence match the standard word sequence, some words must be substituted, deleted, or inserted. WER is the total number of these substituted, deleted, and inserted words divided by the total number of words in the standard word sequence, usually expressed as a percentage.

WER=(S+D+I)/N

S: Substitution, the number of substituted words

D: Deletion, the number of deleted words

I: Insertion, the number of inserted words

N: the total number of words in the standard word sequence

(S+D+I) = the word-level edit distance between the recognized word sequence and the standard word sequence

Note: Because insertions are counted, WER can exceed 1. A WER greater than 1 indicates that the recognized word sequence differs greatly from the standard word sequence and that recognition quality is very poor.
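
The WER formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes words are separated by whitespace and applies no text normalization (case, punctuation), and the function names `edit_distance` and `wer` are chosen here for illustration only.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level edit distance: the minimum S + D + I."""
    # At the start of row i, prev[j] is the distance between the first
    # i-1 standard words and the first j recognized words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # empty recognized prefix: delete all i standard words
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER = (S + D + I) / N, where N is the number of standard words."""
    # Assumption: whitespace tokenization, no normalization.
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
# Three insertions against a one-word reference, so WER is greater than 1.
print(wer("hello", "hello there my friend"))
```

The second call shows the case described in the note: with N = 1 and three inserted words, the result is 3.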

SER (Sentence Error Rate)

Definition: If any word in a sentence is recognized incorrectly, the whole sentence counts as incorrectly recognized. SER is the number of incorrectly recognized sentences divided by the total number of sentences.

SER=SE/N

SE: the number of incorrectly recognized sentences in the recognized sequence (that is, the number of sentences with WER != 0)

N: the total number of sentences in the standard sequence
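
A minimal Python sketch of the SER calculation is shown below. It assumes the standard and recognized sentences are given as two lists in the same order, and that words are separated by whitespace with no normalization. The function name `ser` and the sample sentences are made up for illustration.

```python
def ser(references, hypotheses):
    """SER = SE / N, where N is the number of standard sentences."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("at least one sentence is required")
    # A sentence is an error when its word sequence differs from the
    # standard one in any way, which is the same condition as WER != 0.
    sentence_errors = sum(
        ref.split() != hyp.split()
        for ref, hyp in zip(references, hypotheses)
    )
    return sentence_errors / len(references)


# Illustrative data: only the second sentence contains an error.
refs = ["turn on the light", "play some music", "what time is it"]
hyps = ["turn on the light", "play sum music", "what time is it"]
print(ser(refs, hyps))  # 1 incorrect sentence out of 3
```

Because a single wrong word marks the whole sentence as wrong, SER is never lower than the fraction of sentences that contain at least one word error, and it is typically much higher than WER on the same data.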

(2) Verify the accuracy of the recognized content

Compute the WER and SER of the recognized text

(3) Measure how long speech recognition takes

① Time taken to recognize short speech

② Time taken to recognize long speech
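
The timing check for short and long speech could be sketched as follows. This is only an illustration under stated assumptions: `recognize` stands for whatever call the system under test exposes, and `fake_recognize` is a hypothetical stand-in that simply sleeps in proportion to the input length, so the numbers it prints say nothing about a real recognizer.

```python
import time


def time_recognition(recognize, audio, runs=5):
    """Return (mean, max) wall-clock time in seconds over several runs."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        recognize(audio)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations), max(durations)


def fake_recognize(audio):
    # Hypothetical stand-in for a real ASR call: sleeps 1 ms per byte.
    time.sleep(0.001 * len(audio))
    return ""


# Placeholder inputs; in a real test these would be actual audio clips.
short_audio = b"\x00" * 5
long_audio = b"\x00" * 50

for label, audio in (("short", short_audio), ("long", long_audio)):
    mean_s, max_s = time_recognition(fake_recognize, audio)
    print(f"{label} speech: mean {mean_s:.3f}s, max {max_s:.3f}s")
```

Repeating each measurement and reporting both the mean and the maximum helps separate typical latency from occasional slow runs.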