Speech Profile [Part 1] ~ Opus #556

Closed
josancamon19 opened this issue Aug 9, 2024 · 2 comments

@josancamon19
Contributor

Is your feature request related to a problem? Please describe.
https://github.com/BasedHardware/Friend/blob/7539e2eb315ca3fba4ce07b8ed1057b03dfdc09e/backend/routers/transcribe.py#L66

The websocket shift doesn't work with Opus: even after uploading all samples from an Opus recording in the app's Samples tab, we have a 16 kHz WAV file locally, but apparently not an Opus one.

  • Connect your device with the Opus firmware.
  • Set up your speech profile samples with this device.
  • Restart the app / websocket.
  • This starts websocket 1 (where the samples are uploaded), but you will see that the Deepgram util does not receive the text from that transcription.
  • Then, once ws2 is killed and the connection is moved to ws1, it appears as if the websocket is using a different codec, so the Opus bytes stop working once that happens. You'll see that transcription works for the 30 seconds it takes to upload the sample file.
  • If you set the codec to pcm16, you will see the sentences that Deepgram streams being printed, but once the websocket switches, again nothing works.
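One way to read the symptoms above (an assumption, not a confirmed diagnosis): each Deepgram socket is opened for a single raw-audio format, so if the 16 kHz PCM16 sample upload and the live Opus stream end up on the same connection after the switch, one of them is sent in the wrong format. This minimal sketch tracks the format per socket and flags a mismatch before bytes are forwarded. `StreamConfig`, `needs_reconnect`, `SAMPLE_FILE` and `DEVICE_OPUS` are hypothetical names, not code from `transcribe.py`; the query parameters follow Deepgram's `encoding` / `sample_rate` / `channels` options for raw audio, but whether raw Opus frames are accepted that way should be checked against the Deepgram docs.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


@dataclass(frozen=True)
class StreamConfig:
    """Hypothetical: the audio format a live socket was opened with."""
    encoding: str        # e.g. "linear16" for PCM16 bytes, "opus" for Opus frames
    sample_rate: int
    channels: int = 1

    def url(self) -> str:
        # The live endpoint takes the raw-audio format as query parameters.
        query = urlencode({"encoding": self.encoding,
                           "sample_rate": self.sample_rate,
                           "channels": self.channels})
        return f"{DEEPGRAM_LISTEN_URL}?{query}"


# Assumption: stored speech-profile samples are decoded 16 kHz PCM16 WAV,
# while a device running the Opus firmware streams Opus frames.
SAMPLE_FILE = StreamConfig("linear16", 16000)
DEVICE_OPUS = StreamConfig("opus", 16000)


def needs_reconnect(socket_cfg: StreamConfig, bytes_cfg: StreamConfig) -> bool:
    """True if the bytes about to be forwarded don't match the socket's format."""
    return socket_cfg != bytes_cfg


# Moving the device's Opus stream onto a socket opened for the PCM16 sample
# file would send Opus bytes to a connection expecting linear16.
print(needs_reconnect(SAMPLE_FILE, DEVICE_OPUS))  # True
print(SAMPLE_FILE.url())
# wss://api.deepgram.com/v1/listen?encoding=linear16&sample_rate=16000&channels=1
```

The point of the frozen dataclass is that the format becomes a value you can compare when the connection is moved, instead of something implicit in whichever socket happens to survive.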
josancamon19 self-assigned this Aug 9, 2024
josancamon19 changed the title from "Speech Profile [Opus]" to "Speech Profile [Part 1] ~ Opus" Aug 9, 2024
@ebariaux
Contributor

ebariaux commented Aug 9, 2024

I haven't looked at the app or server code, but I wanted to react to your comment that "we have locally a wav file with 16khz, but apparently not in opus".

You need to consider the difference between the container file format and the codec used.
Raw audio (your content) is uncompressed LPCM, just the bytes representing the samples.
WAV is the container: it usually stores uncompressed audio in LPCM (linear PCM) format, just adding a header in front of the raw bytes. It supports some basic compression schemes, but not Opus AFAIK.
Opus is the codec: it takes LPCM and encodes it into a compressed Opus format, or decodes that format back into LPCM.
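To make the container/content split concrete, here is a minimal sketch (not code from the app or server) that wraps raw 16-bit LPCM in a canonical 44-byte WAV header. `wav_bytes` is a hypothetical helper name; the header layout is the standard RIFF/WAVE PCM one, with audio format 1 meaning uncompressed PCM. The sample bytes go into the file unchanged, and Python's standard `wave` module reads them back as a normal 16 kHz WAV.

```python
import io
import struct
import wave


def wav_bytes(pcm: bytes, sample_rate: int = 16000, channels: int = 1, bits: int = 16) -> bytes:
    """Wrap raw little-endian LPCM samples in a canonical 44-byte WAV (RIFF) header."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(pcm), b"WAVE",
        b"fmt ", 16,               # size of the PCM "fmt " chunk body
        1,                         # audio format 1 = uncompressed PCM (no codec)
        channels, sample_rate, byte_rate, block_align, bits,
        b"data", len(pcm),
    )
    return header + pcm            # the sample bytes are appended unchanged


# Four 16-bit samples of raw LPCM, i.e. "the content".
pcm = struct.pack("<4h", 0, 1000, -1000, 32767)
data = wav_bytes(pcm)

# A standard WAV reader sees a 16 kHz PCM file whose frames are exactly our bytes.
with wave.open(io.BytesIO(data), "rb") as w:
    assert w.getframerate() == 16000
    assert w.readframes(w.getnframes()) == pcm
```

Everything after byte 44 is the raw content; there is no Opus anywhere in this file, which is why a locally stored 16 kHz WAV would not be "in Opus".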

A (far from perfect) analogy would be:
You have text as your content.
Your container / file format could be a Word document.
You can zip your text (content), e.g. to send it over the wire, but you don't store a zip in a Word document.
You capture text on your Omi device, zip it, send it over BLE, unzip it on your phone, and store it in a Word document.
Same with audio: capture on your Omi, encode with Opus, send over BLE, decode with Opus (you get LPCM back), and store it in a WAV file.
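The same pipeline as a minimal sketch. There is no Opus library here: `zlib` is swapped in purely as a stand-in codec to keep the example self-contained, and `capture_pcm`, `encode`, `decode` and `store_wav` are hypothetical names. Unlike Opus, zlib is lossless, so the decoded samples match the captured ones exactly; with real Opus they would only be close.

```python
import io
import struct
import wave
import zlib


def capture_pcm(n: int = 160) -> bytes:
    """Hypothetical stand-in for the device mic: n mono 16-bit LPCM samples."""
    return struct.pack(f"<{n}h", *((i * 200) % 32768 - 16384 for i in range(n)))


def encode(pcm: bytes) -> bytes:
    # Stand-in for Opus encoding on the device. zlib is lossless; Opus is lossy.
    return zlib.compress(pcm)


def decode(payload: bytes) -> bytes:
    # Stand-in for Opus decoding on the phone: compressed bytes -> LPCM.
    return zlib.decompress(payload)


def store_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Put decoded LPCM into a WAV container (header + raw samples)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


pcm = capture_pcm()                # capture on the Omi
over_ble = encode(pcm)             # compressed bytes travel over BLE
decoded = decode(over_ble)         # decoded back to LPCM on the phone
wav_file = store_wav(decoded)      # the WAV holds LPCM, not the compressed bytes
```

The compressed payload only exists in transit; what ends up in the WAV file is the decoded LPCM.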

Hope some of this makes sense...

@josancamon19
Contributor Author

Done

Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants