Speech To Text

Quality Assessment and Analysis (Part 1)


Whitepaper: Speech to Text Quality Assessment and Analysis - Part 1


With the rapid improvement of natural language processing (NLP) technology in recent years, demand for speech recognition and analytics services has grown sharply. Common use cases include video subtitle generation, voice-enabled virtual assistants, smart speakers for the home, and customer interaction analysis. Speech to text transcription is at the center of all these applications, and transcription accuracy is arguably the most important factor in the overall quality of these services.

This paper is the first in Macrosoft’s two-part series assessing and analyzing the Speech to Text quality of some of the leading tools in the marketplace. Our focus is on contact center conversations: we took high-quality call recordings and fed them into three leading speech to text platforms:

  • CallMiner
  • GCP (Google Cloud Platform)
  • AWS (Amazon Web Services)

The evaluation metric we use is the BLEU (Bilingual Evaluation Understudy) score, which rates a machine transcript by its n-gram overlap with a reference transcript. The source audio is stereo MP3 with a 44,100 Hz sampling rate and a 128 kbps bitrate, which is at the high end of contact center recording quality.
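
To make the metric concrete, here is a minimal sketch of a sentence-level BLEU score for one transcript against one reference. It is an illustration only and not necessarily the implementation used in the whitepaper: the function names `ngram_counts` and `bleu`, the whitespace tokenization, the lowercasing, the uniform weights over 1- to 4-grams, and the absence of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, hypothesis, max_n=4):
    """Unsmoothed sentence-level BLEU with a single reference (illustrative).

    Assumptions: lowercase whitespace tokenization and uniform n-gram weights.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, one empty precision makes the whole score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    human = "thank you for calling support today"      # hypothetical reference transcript
    machine = "thank you for calling sport today"      # hypothetical platform output
    print(f"BLEU: {bleu(human, machine):.3f}")
```

A score of 1.0 means the transcript matches the reference exactly, and lower scores indicate less n-gram overlap. Comparing platforms at scale would more likely use a corpus-level score over many calls, with consistent text normalization applied to both the reference and the machine transcripts.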

Download the whitepaper to learn more about our Speech to Text quality assessment and analysis of the three providers above.