Adds Parameter use_enhanced and model to GoogleCloudSpeech #735

Open, wants to merge 4 commits into base: master

HideyoshiNakazone

Adds the parameters use_enhanced and model to the recognize_google_cloud method, giving users more configuration options and better results in specific cases.
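
As a rough illustration of what the two options map to, here is a minimal sketch. It is not the code from this PR: `build_config_explicit` is a hypothetical helper, and the plain dictionary only mirrors the `use_enhanced` and `model` fields of the Cloud Speech `RecognitionConfig`. The `"phone_call"` model name is used purely as an example value.

```python
def build_config_explicit(sample_rate_hertz, language="en-US",
                          use_enhanced=False, model=None):
    """Hypothetical helper: assemble a recognition config as a plain dict."""
    config = {
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language,
    }
    # Forward the optional fields only when the caller set them,
    # so the API defaults apply otherwise.
    if use_enhanced:
        config["use_enhanced"] = True
    if model is not None:
        config["model"] = model
    return config


phone_config = build_config_explicit(8000, use_enhanced=True, model="phone_call")
default_config = build_config_explicit(16000)
```

With this shape, existing callers that pass neither option get the same config as before, which is why the change only extends the API.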

HideyoshiNakazone (Author) commented Feb 22, 2024

Hello @Uberi and @ftnext, I was wondering whether someone could review my merge request.

Thank you very much,
Vitor Hideyoshi.

HideyoshiNakazone (Author) commented

Hello @ftnext, is there any interest in this feature? It doesn't break any of the GoogleCloudSpeech Python API, it only extends it. I'm already using this implementation at the company I work for, but would love to have this feature merged.
If there is anything blocking the merge, please tell me :)

Uberi (Owner) commented Apr 26, 2024

Hi @HideyoshiNakazone!

Looks good overall, but would it be possible to document these parameters in the docs for that function? If so, happy to merge this!

HideyoshiNakazone (Author) commented

@Uberi, thanks a lot! I added the parameters to the docstring of the method Recognizer.recognize_google_cloud and to the library reference file.
If there are any other places where you'd like me to add documentation, I'll be happy to :)

@@ -238,6 +238,10 @@ The recognition language is determined by ``language``, which is a BCP-47 langua

If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.

The ``use_enhanced`` is a boolean option that sets a flag with the same name on the Google Cloud Speech API, it will make the API uses the enhanced version of the model. More information can be found in the `Google Cloud Speech API documentation <https://cloud.google.com/speech-to-text/docs/enhanced-models>` __.
Collaborator

@HideyoshiNakazone Thanks! Could you remove the space before the trailing underscores?

-<https://cloud.google.com/speech-to-text/docs/enhanced-models>` __
+<https://cloud.google.com/speech-to-text/docs/enhanced-models>`__
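
The space matters because reStructuredText only treats the text as an anonymous hyperlink when the trailing `__` follows the closing backtick directly. A quick way to catch this class of typo is a small regex check like the sketch below. `find_broken_links` and the pattern are illustrative assumptions, not part of this PR or of any docs tooling in the repository, and the pattern only covers the embedded-URI link form.

```python
import re

# Matches an embedded-URI link whose closing backtick is separated from
# the trailing underscore(s) by whitespace, e.g. "`text <url>` __".
BROKEN_LINK = re.compile(r"`[^`]+<[^>]+>`\s+__?")


def find_broken_links(text):
    """Return every link-like span with a stray space before the underscores."""
    return [match.group(0) for match in BROKEN_LINK.finditer(text)]


broken = "See the `docs <https://example.com/docs>` __."
fixed = "See the `docs <https://example.com/docs>`__."

broken_hits = find_broken_links(broken)
fixed_hits = find_broken_links(fixed)
```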

ftnext (Collaborator) commented Apr 29, 2024

@HideyoshiNakazone Thank you very much for this pull request! I'm very sorry for the late response.
@Uberi Thanks for your comment!

In my opinion, it would be better to introduce keyword arguments (a.k.a. **kwargs):
https://docs.python.org/3/tutorial/controlflow.html#keyword-arguments

Adding use_enhanced and model as explicit arguments would certainly implement this feature.
However, if more arguments are needed in the future, the signature would have to be extended again each time, which is not easy to maintain.

I think it would be preferable for Cloud Speech API-specific arguments to be passed as arbitrary keyword arguments.

def recognize_google_cloud(self, audio_data, credentials_json=None, language="en-US", preferred_phrases=None, show_all=False, **api_params):
    """
    If ``preferred_phrases`` is an iterable of phrase strings, ...

    api_params: Cloud Speech API-specific parameters as dict (optional)

        The ``use_enhanced`` is a boolean option ...

        Furthermore, you can use the option ``model`` to set your desired model,

    Returns the most likely transcription if ``show_all`` is False (the default).
    """

    config = {
        'encoding': speech.RecognitionConfig.AudioEncoding.FLAC,
        'sample_rate_hertz': audio_data.sample_rate,
        'language_code': language,
        **api_params,
    }

(preferred_phrases could arguably be moved into api_params too, but that is a separate issue.)
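
To make the merging behaviour of the proposal concrete, here is a minimal self-contained sketch. `build_config_kwargs` is a hypothetical stand-in for the config-building part of the method, and it uses a plain dict without the `encoding` field so that it runs without the Google Cloud client installed.

```python
def build_config_kwargs(sample_rate_hertz, language="en-US", **api_params):
    """Hypothetical helper: merge API-specific options into the base config."""
    return {
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language,
        # Unpacked last, so a key supplied here overrides a base key
        # of the same name instead of raising an error.
        **api_params,
    }


enhanced = build_config_kwargs(8000, use_enhanced=True, model="phone_call")
overridden = build_config_kwargs(16000, language_code="ja-JP")
```

One consequence of this design worth deciding on deliberately: since `api_params` is unpacked last, a caller passing `language_code` directly silently wins over the `language` argument. Unpacking it first, or rejecting reserved keys, would give the opposite behaviour.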
