Make SAPI5 & MSSP voices use WavePlayer (WASAPI) #17592

gexgd0419 · 2025-01-07T04:17:49Z

Link to issue number:

Summary of the issue:

Currently, SAPI5 and MSSP voices use their own audio output mechanisms, instead of using the WavePlayer (WASAPI) inside NVDA.

According to my test results, this may make them less responsive than eSpeak and OneCore voices, which use the WavePlayer, and less responsive than other screen readers using SAPI5 voices.

This also gives NVDA less control of audio output. For example, audio ducking logic inside WavePlayer cannot be applied to SAPI5 voices, so additional code is required to compensate for this.

Description of user facing changes

SAPI5 and MSSP voices will be changed to use the WavePlayer, which may make them more responsive (have less delay).

According to my test results, this can reduce the delay by at least 50ms.

This change does not trim the leading silence yet. If that is done as well, the delay can be expected to drop even further.

Description of development approach

Instead of setting self.tts.audioOutput to a real output device, do the following:

Create an implementation class SynthDriverAudioStream that implements the COM interface IStream, which can be used to stream in audio data from the voices.
Use an SpCustomStream object to wrap SynthDriverAudioStream and provide the wave format.
Assign the SpCustomStream object to self.tts.AudioOutputStream, so that SAPI outputs audio to this stream instead.

Each time an audio chunk needs to be streamed in, ISequentialStream_RemoteWrite will be called, and we just feed the audio to the player. IStream_RemoteSeek can also be called when SAPI wants to know the current byte position of the stream (dlibMove should be zero and dwOrigin should be STREAM_SEEK_CUR in this case), but it is not used to actually "seek" to a new position. IStream_Commit can be called by MSSP voices to "flush" the audio data, where we do nothing. Other methods are left unimplemented, as they are not used when acting as an audio output stream.
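As a rough illustration of the three methods described above, here is a minimal sketch in plain Python and ctypes. It is not the code from this PR: `AudioStreamSketch` and `FakePlayer` are hypothetical names, the COM plumbing from comtypes is left out entirely, and the return conventions are an assumption about how the high-level comtypes implementation hands back out-parameters.

```python
import ctypes
from ctypes import POINTER, c_ubyte, cast

STREAM_SEEK_CUR = 1  # value of STREAM_SEEK_CUR in the Windows STREAM_SEEK enumeration


class FakePlayer:
	"""Hypothetical stand-in for the WavePlayer; it only records what it is fed."""

	def __init__(self):
		self.chunks = []

	def feed(self, data: bytes) -> None:
		self.chunks.append(data)


class AudioStreamSketch:
	"""Write-only stream model: audio goes straight to the player."""

	def __init__(self, player):
		self._player = player
		self._bytesWritten = 0

	def ISequentialStream_RemoteWrite(self, pv, cb: int) -> int:
		# pv points at cb bytes of audio; copy them out and feed the player.
		data = ctypes.string_at(pv, cb)
		self._player.feed(data)
		self._bytesWritten += cb
		return cb  # number of bytes consumed

	def IStream_RemoteSeek(self, dlibMove: int, dwOrigin: int) -> int:
		# Only "where am I?" queries are supported, not real seeking.
		if dlibMove != 0 or dwOrigin != STREAM_SEEK_CUR:
			raise NotImplementedError("only position queries are supported")
		return self._bytesWritten

	def IStream_Commit(self, grfCommitFlags: int) -> None:
		# A "flush" request; nothing is buffered here, so do nothing.
		pass


# Usage: simulate one 4-byte chunk arriving from the voice.
buffer = (c_ubyte * 4)(1, 2, 3, 4)
demoPlayer = FakePlayer()
demoStream = AudioStreamSketch(demoPlayer)
demoStream.ISequentialStream_RemoteWrite(cast(buffer, POINTER(c_ubyte)), 4)
demoStream.IStream_Commit(0)
```

Tracking the byte count in the stream itself is what lets the position query be answered without the stream ever being seekable.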

Previously, comtypes.client.GetEvents was used to get the event notifications. But those notifications will be routed to the main thread via the main message loop. According to the documentation of ISpNotifySource:

Note that both variations of callbacks as well as the window message notification require a window message pump to run on the thread that initialized the notification source. Callback will only be called as the result of window message processing, and will always be called on the same thread that initialized the notify source. However, using Win32 events for SAPI event notification does not require a window message pump.

Because the audio data is generated and sent via IStream on a dedicated thread, receiving events on the main thread can make synchronizing events and audio difficult.

So here SapiSink is changed to become an implementation of ISpNotifySink. Notifications received via ISpNotifySink are "free-threaded", sent on the original thread instead of being routed to the main thread.

To connect the sink, use ISpNotifySource::SetNotifySink.
To get the actual event that triggers the notification, use ISpEventSource::GetEvents. Events can contain pointers to objects or memory, so they need to be freed manually.
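To make the notification flow above concrete, here is a simplified, pure-Python model of it. Everything in it is hypothetical (`SinkSketch`, `FakeEventSource`, `Event`, `FakeSynth`, and the simplified handler parameter lists); no COM calls are made, and the manual freeing of event payloads is skipped because these are ordinary Python objects. The event IDs mirror the SPEI_* values as I remember them and should be treated as illustrative.

```python
import weakref
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class EventId(IntEnum):
	# Illustrative values modelled on SPEI_START_INPUT_STREAM etc.
	START_INPUT_STREAM = 1
	END_INPUT_STREAM = 2
	BOOKMARK = 4


@dataclass
class Event:
	eventId: int
	streamNum: int
	pos: int
	bookmark: Optional[int] = None


class FakeEventSource:
	"""Hypothetical stand-in for the voice's event source."""

	def __init__(self, events):
		self._queue = list(events)

	def getEvents(self):
		# Hand over all pending events and empty the queue.
		events, self._queue = self._queue, []
		return events


class FakeSynth:
	def __init__(self):
		self.log = []


class SinkSketch:
	def __init__(self, synth, source):
		self._synthRef = weakref.ref(synth)
		self._source = source

	def Notify(self) -> None:
		# Runs on whichever thread raised the notification.
		synth = self._synthRef()
		if synth is None:
			return  # the single "synth is gone" check lives here
		for event in self._source.getEvents():
			if event.eventId == EventId.START_INPUT_STREAM:
				self.StartStream(synth, event.streamNum, event.pos)
			elif event.eventId == EventId.BOOKMARK:
				self.Bookmark(synth, event.streamNum, event.pos, event.bookmark)
			elif event.eventId == EventId.END_INPUT_STREAM:
				self.EndStream(synth, event.streamNum, event.pos)

	def StartStream(self, synth, streamNum, pos):
		synth.log.append(("start", streamNum))

	def Bookmark(self, synth, streamNum, pos, bookmark):
		synth.log.append(("bookmark", bookmark))

	def EndStream(self, synth, streamNum, pos):
		synth.log.append(("end", streamNum))
```

Because the notification itself carries no payload, the sink drains the queue each time it is notified and dispatches every pending event in order.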

Finally, all audio ducking related code is removed. WavePlayer should now be able to handle audio ducking when using SAPI5 and MSSP voices.

There should be no change to public APIs.

Testing strategy:

Tested the delay of some built-in SAPI5 voices.

Audio ducking seemed to be working.

Stability is not proven yet and needs further testing.

Known issues with pull request:

None yet

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

@coderabbitai summary

LeonarddeR

This is marvelous work! Great job

LeonarddeR · 2025-01-07T17:15:43Z

source/synthDrivers/sapi5.py

@@ -53,41 +55,116 @@ class SpeechVoiceEvents(IntEnum):
 	Bookmark = 16


-class SapiSink(object):


Strictly speaking, this breaks backwards compatibility. That said, I think it is safe to mark it as such in the changelog.

I think that the implementation of SapiSink should be internal and not considered as part of the public API, so it shouldn't be a breaking change. I couldn't find references to SapiSink outside sapi5.py in NVDA's code.

Also, the function parameter list of StartStream, Bookmark and EndStream is mostly the same.

But it should be noted that event notifications can now be sent on any thread, instead of always on the main thread, which is also the reason why I chose to use ISpNotifySink directly.

Do I need to mention this in the "Changes for Developers" section?

Yes, any removed public code (not prefixed with _) and any other breaking API changes must be listed in API breaking changes under changes for developers. It doesn't seem like the API for SapiSink (e.g. the functions you listed) has changed, can you confirm?

I removed the if synth is None checks in them, because the events are now dispatched in ISpNotifySink_Notify and the check has been done there. The usage of SapiSink is also changed, although the parameter lists of those functions are the same.

source/synthDrivers/sapi5.py

seanbudd

Thanks @gexgd0419 ! Mostly minor review items

seanbudd · 2025-01-09T01:23:53Z

source/synthDrivers/sapi5.py

@@ -53,41 +55,116 @@ class SpeechVoiceEvents(IntEnum):
 	Bookmark = 16


-class SapiSink(object):


Yes, any removed public code (not prefixed with _) and any other breaking API changes must be listed in API breaking changes under changes for developers. It doesn't seem like the API for SapiSink (e.g. the functions you listed) has changed, can you confirm?

seanbudd · 2025-01-09T01:24:28Z

source/synthDrivers/sapi5.py


-	def StartStream(self, streamNum, pos):
+	def ISequentialStream_RemoteWrite(self, pv, cb):


Please add type information. Can more descriptive variable names be used?

Those names are from the original interface definition from Microsoft, which uses the Hungarian naming convention. For example, "cb" stands for "count of bytes". But yes, I can make them more descriptive.

pv is of type POINTER(c_ubyte), but this is not accepted as a type hint because POINTER is a function. Can I use c_void_p instead?

I think we would typically use c_ubyte here

But pv is a pointer, not a single byte. And its type is neither c_ubyte nor c_void_p, if you use isinstance to test it.

I think I prefer the parameter names to resemble the interface definition as much as possible. The params can be described in a doc string.

Re typing, what should probably work, is defining a type variable at the top of the file and then use that in your annotation. So for example, somewhere at the top:

LP_c_ubyte = POINTER(c_ubyte) # This follows comtypes naming convention for pointer type names

then:

Suggested change

-	def ISequentialStream_RemoteWrite(self, pv, cb):
+	def ISequentialStream_RemoteWrite(self, pv: LP_c_ubyte, cb: typename):

That said, there's some comtypes magic under the hood, especially when using the high level implementation. Have you ever logged the type of pv after it entered the function?

The type of pv is comtypes.gen._1EA4DBF0_3C3B_11CF_810C_00AA00389B71_0_1_1.LP_c_ubyte.

And the type of cb is int, although defined as c_ulong.

isinstance(pv, POINTER(c_ubyte)) returns True.

So I think it's safe to assume that they are the same type.

Re typing, what should probably work, is defining a type variable at the top of the file and then use that in your annotation.

Pylance still tells me that variables are not allowed in a type expression. Guess that the type has to be "static".

I wonder whether you could trick it with:
class LP_c_ubyte(POINTER(c_ubyte)): ...
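For anyone following along, the runtime side of this discussion can be reproduced with plain ctypes, without comtypes. This is only a sketch: `readChunk` is a hypothetical helper, and the alias is accepted by the interpreter as an annotation even though a static checker such as Pylance may still reject it, as noted above.

```python
from ctypes import POINTER, c_ubyte, cast

# Module-level alias, following the comtypes naming convention for pointer types.
LP_c_ubyte = POINTER(c_ubyte)


def readChunk(pv: LP_c_ubyte, cb: int) -> bytes:
	"""Copy cb bytes out of the buffer that pv points to.

	:param pv: Pointer to the first byte of the buffer.
	:param cb: Count of bytes to read.
	"""
	# Slicing a c_ubyte pointer yields a list of ints, so wrap it in bytes().
	return bytes(pv[:cb])


raw = (c_ubyte * 3)(10, 20, 30)
ptr = cast(raw, LP_c_ubyte)
# ctypes caches pointer types, so the alias and a fresh POINTER() call agree.
sameType = isinstance(ptr, POINTER(c_ubyte))
chunk = readChunk(ptr, 3)
```

Keeping the interface names pv and cb and describing them in the docstring matches the preference expressed earlier in this thread.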


gexgd0419 and others added 2 commits January 7, 2025 12:15

Make SAPI5 voices use WavePlayer

92f07ef

Pre-commit auto-fix

20f754d

gexgd0419 mentioned this pull request Jan 7, 2025

improve the responsiveness of onecore voices and sapi voices #13284

Open

LeonarddeR reviewed Jan 7, 2025


gexgd0419 added 2 commits January 8, 2025 13:50

Use high-level implementation when implementing COM interfaces

93ebd88

Add changelog entry in Changes

2f7c14e

gexgd0419 marked this pull request as ready for review January 8, 2025 10:43

gexgd0419 requested a review from a team as a code owner January 8, 2025 10:43

gexgd0419 requested a review from seanbudd January 8, 2025 10:43

gexgd0419 mentioned this pull request Jan 8, 2025

Make continuous reading work when using SAPI5 voices without bookmark support #17523

Open

5 tasks

cary-rowen mentioned this pull request Jan 9, 2025

Fix SAPI 4 driver #17599

Open

5 tasks

seanbudd reviewed Jan 9, 2025


seanbudd marked this pull request as draft January 9, 2025 03:04

gexgd0419 added 5 commits January 9, 2025 14:49

Add type hints and docstrings

c55bada

Change parameter names of RemoteWrite back to match the interface definition

266e333

Remove SPAudioState

3d458ce

Add API breaking change entries

50627b4

Merge branch 'master' into sapi5-wasapi

efb8604
