-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make SAPI5 & MSSP voices use WavePlayer (WASAPI) #17592
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is marvelous work! Great job
@@ -53,41 +55,116 @@ class SpeechVoiceEvents(IntEnum): | |||
Bookmark = 16 | |||
|
|||
|
|||
class SapiSink(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly spoken, this is breaking backwards compatibility. That said, I think it is safe to mark this as such in the changelog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the implementation of SapiSink
should be internal and not considered as part of the public API, so it shouldn't be a breaking change. I couldn't find references to SapiSink
outside sapi5.py
in NVDA's code.
Also, the function parameter list of StartStream
, Bookmark
and EndStream
is mostly the same.
But it should be noted that event notification can be sent on any thread, instead of always on the main thread, which is also the reason why I chose to use ISpNotifySink
directly.
Do I need to mention this in the "Changes for Developers" section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, any removed public code (not prefixed with _
), and other breaking API changes must be listed in API breaking changes under changes for developers. It doesn't seem like the API for SapiSink
(e.g functions you listed) has changed, can you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the if synth is None
checks in them, because the events are now dispatched in ISpNotifySink_Notify
and the check has been done there. The usage of SapiSink
is also changed, although the parameter lists of those functions are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gexgd0419 ! Mostly minor review items
@@ -53,41 +55,116 @@ class SpeechVoiceEvents(IntEnum): | |||
Bookmark = 16 | |||
|
|||
|
|||
class SapiSink(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, any removed public code (not prefixed with _
), and other breaking API changes must be listed in API breaking changes under changes for developers. It doesn't seem like the API for SapiSink
(e.g functions you listed) has changed, can you confirm?
source/synthDrivers/sapi5.py
Outdated
|
||
def StartStream(self, streamNum, pos): | ||
def ISequentialStream_RemoteWrite(self, pv, cb): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add type information. Can more descriptive variable names be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those names are from the original interface definition from Microsoft, which use the Hungarian naming convention. For example, "cb" stands for "count of bytes". But yes, I can make it more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pv
is of type POINTER(c_ubyte)
, but this is not accepted as a type hint because POINTER
is a function. Can I use c_void_p
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would typically use c_ubyte
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But pv
is a pointer, not a single byte. And its type is neither c_ubyte
nor c_void_p
, if you use isinstance
to test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the parameter names to resemble the interface definition as much as possible. The params can be described in a doc string.
Re typing, what should probably work, is defining a type variable at the top of the file and then use that in your annotation. So for example, somewhere at the top:
LP_c_ubyte = POINTER(c_ubyte) # This follows comtypes naming convention for pointer type names
then:
def ISequentialStream_RemoteWrite(self, pv, cb): | |
def ISequentialStream_RemoteWrite(self, pv: LP_c_ubyte, cb: typename): |
That said, there's some comtypes magic under the hoot, especially when using the high level implementation. Have you ever logged the type of pv after it entered the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of pv is comtypes.gen._1EA4DBF0_3C3B_11CF_810C_00AA00389B71_0_1_1.LP_c_ubyte
.
And the type of cb is int
, although defined as c_ulong
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance(pv, POINTER(c_ubyte))
returns True
.
So I think it's safe to assume that they are the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re typing, what should probably work, is defining a type variable at the top of the file and then use that in your annotation.
Pylance still tells me that variables are not allowed in a type expression. Guess that the type has to be "static".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether you could trick it with:
class LP_c_ubyte(POINTER(c_ubyte)): ...
Link to issue number:
#13284
Summary of the issue:
Currently, SAPI5 and MSSP voices use their own audio output mechanisms, instead of using the WavePlayer (WASAPI) inside NVDA.
This may make them less responsive compared to eSpeak and OneCore voices, which are using the WavePlayer, or compared to other screen readers using SAPI5 voices, according to my test result.
This also gives NVDA less control of audio output. For example, audio ducking logic inside WavePlayer cannot be applied to SAPI5 voices, so additional code is required to compensate for this.
Description of user facing changes
SAPI5 and MSSP voices will be changed to use the WavePlayer, which may make them more responsive (have less delay).
According to my test result, this can reduce the delay by at least 50ms.
This haven't trimmed the leading silence yet. If we do that also, we can expect the delay to be even less.
Description of development approach
Instead of setting
self.tts.audioOutput
to a real output device, do the following:SynthDriverAudioStream
to implement COM interfaceIStream
, which can be used to stream in audio data from the voices.SpCustomStream
object to wrapSynthDriverAudioStream
and provide the wave format.SpCustomStream
object toself.tts.AudioOutputStream
, so SAPI will output audio to this stream instead.Each time an audio chunk needs to be streamed in,
ISequentialStream_RemoteWrite
will be called, and we just feed the audio to the player.IStream_RemoteSeek
can also be called when SAPI wants to know the current byte position of the stream (dlibMove
should be zero anddwOrigin
should beSTREAM_SEEK_CUR
in this case), but it is not used to actually "seek" to a new position.IStream_Commit
can be called by MSSP voices to "flush" the audio data, where we do nothing. Other methods are left unimplemented, as they are not used when acting as an audio output stream.Previously,
comtypes.client.GetEvents
was used to get the event notifications. But those notifications will be routed to the main thread via the main message loop. According to the documentation ofISpNotifySource
:Because the audio data is generated and sent via
IStream
on a dedicated thread, receiving events on the main thread can make synchronizing events and audio difficult.So here
SapiSink
is changed to become an implementation ofISpNotifySink
. Notifications received viaISpNotifySink
are "free-threaded", sent on the original thread instead of being routed to the main thread.ISpNotifySource::SetNotifySink
.ISpEventSource::GetEvents
. Events can contain pointers to objects or memory, so they need to be freed manually.Finally, all audio ducking related code are removed. Now WavePlayer should be able to handle audio ducking when using SAPI5 and MSSP voices.
There should be no change to public APIs.
Testing strategy:
Tested the delay of some built-in SAPI5 voices.
Audio ducking seemed to be working.
Stability of this is not proven yet, which needs further tests.
Known issues with pull request:
None yet
Code Review Checklist:
@coderabbitai summary