Deepgram has a 40% worse WER, which forces us to do post-processing using whisper-x.
Also tried AssemblyAI; unfortunately its streaming only works for English, so it's discarded.
Speechmatics is marginally better than AssemblyAI, works with all languages, and has interesting future-proof features.
NOTE: I will run the exact same pipeline in Soniox first, since we already have 10k in credits, but I'm unsure whether to trust their accuracy, as the WER comparison was made by Soniox themselves. They also ran that comparison before the latest model releases. Still, the reason for testing Soniox first is that a good percentage of the pipeline is already integrated, so it shouldn't take long.
Set up a Speechmatics websocket concurrently with the existing Deepgram websocket.
From the app, add a settings dropdown that allows selecting the transcription model (only while testing).
Test both options in 10 scenarios: (Deepgram + post-processing) vs (Speechmatics + post-processing).
Write a script to view a line-by-line comparison between each of them.
Prompt GPT to compare the 3 transcripts for each scenario and judge which has better accuracy.
(Maybe) Use Groq whisper v3 as the source of truth and compute WER against it.
If tests show that Speechmatics comes within 5-10% of the whisper-x results, skip and remove post-processing.
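The WER check in the last two items above could be scripted roughly like this. A minimal sketch in pure Python: the reference string stands in for the Groq whisper v3 transcript (the assumed source of truth), and the candidate names and sample sentences are illustrative only, not real pipeline output.

```python
# Minimal WER + line-by-line comparison sketch for the transcript tests.
# Candidate names and strings below are placeholders, not real output.

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def compare_line_by_line(reference: str, candidates: dict) -> None:
    """Print each candidate transcript next to its WER vs. the reference."""
    for name, text in candidates.items():
        print(f"{name:>14} | WER {wer(reference, text):.2%} | {text}")

reference = "the quick brown fox jumps over the lazy dog"
candidates = {
    "deepgram+wx": "the quick brown fox jumps over a lazy dog",
    "speechmatics": "the quick brown fox jumped over the lazy dog",
}
compare_line_by_line(reference, candidates)
```

For real runs, `reference` and each candidate would be loaded from the saved transcripts per scenario; a library like jiwer would also work, but the hand-rolled version keeps the comparison transparent.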
Important:
Need to double-check scalability.
Need to ask for free credits; Speechmatics is 4x more expensive than Deepgram.
Speechmatics will only be supported for opus; 1.0.2 will continue using Deepgram.
Add-ons:
VAD implementation will be needed. Finish the ticket, especially for opus.
Push more users to migrate: run a "campaign" to help users migrate from 1.0.2 to 1.0.4 in < 30 days so we can deprecate pcm8.
Understand the data (how many are still on pcm8?).
Improve speech recognition: make sure the file is being sent correctly (use the raw .wav audio instead of the saved opus-encoded bytes), and double-check the duration at which it performs well 90% of the time.
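For the VAD item above, a starting point could be a simple energy gate over PCM frames. This is a hedged sketch only, not the actual ticket's implementation (which might use webrtcvad or a model-based detector); the frame size and threshold here are assumptions to be tuned against real opus-decoded audio.

```python
# Energy-based VAD sketch over 16-bit little-endian mono PCM frames.
# Frame size and threshold are illustrative assumptions.
import struct

FRAME_MS = 20
SAMPLE_RATE = 16000
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples

def frame_energy(frame: bytes) -> float:
    """Mean squared amplitude of one frame of 16-bit PCM."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return sum(s * s for s in samples) / max(len(samples), 1)

def is_speech(frame: bytes, threshold: float = 1e6) -> bool:
    # Threshold is a guess; calibrate on silence vs. speech samples.
    return frame_energy(frame) > threshold

# Example: a silent frame vs. a loud synthetic frame.
silent = struct.pack(f"<{SAMPLES_PER_FRAME}h", *([0] * SAMPLES_PER_FRAME))
loud = struct.pack(f"<{SAMPLES_PER_FRAME}h",
                   *([20000, -20000] * (SAMPLES_PER_FRAME // 2)))
print(is_speech(silent), is_speech(loud))  # False True
```

An energy gate like this is cheap enough to run before sending frames to either STT websocket, which also helps with the cost concern above by not streaming silence.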
Refactoring STT system
https://artificialanalysis.ai/speech-to-text points to https://www.speechmatics.com/ as the winner in WER %.