-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Touchpad init fixes 2 #45
Conversation
Regarding commit 3cb8a1b, this sounds like an ordering issue that needs to be fixed in the master's driver. (Doesn't preclude such a workaround of course until it's fixed.) Regarding commit 33e2e87, this seems to be a missing feature in the SPI core, it's indeed admitted in <linux/spi/spi.h>. A lot of chips need a delay between lowering of CS and the first clock cycle, e.g. to come out of a low-power state. Usually this is specified in the datasheet and is so low that it doesn't matter, but in some cases clearly it does. |
For illustrative purposes I've cobbled together (untested) commit l1k/linux@ca6e39f which introduces a per-slave CS-to-CLK delay. However I gather from applespi.c that a per-message delay may be needed since you seem to delay only on reads, not on writes. |
My code is atrocious as usual, 0-day has alerted me that I've missed to replace a few "us" with "usecs", updated branch is here. |
After using this fix for close to a week now, I've just encountered one touchpad init failure while I was rebooting the machine multiple times to investigate the NVMe-related boot problem. The dmesg output has the same Not sure if this is significant though, as I was testing things just now. |
I agree that it's a bug in the spi driver core. However, it can't normally occur. The spi_master_class and spi_slave_class structs are static, i.e. not normally accessible outside spi.c - I had to use a hack to get a pointer to the struct to be able to register a class callback. And nothing in Regarding the clock delay, that looks good - I'll give it whirl shortly. While you're right that currently we don't have that delay on writes, it probably doesn't hurt, and we may just have been lucky (the whole delay stuff is far more fragile than I was anticipating - wish there were some clear documentation on what delays where needed where). |
@peterychuang Thanks for the report. Bummer, looks like I'll need to add that extra delay between read and write. Unless this was due to currently not having a delay between cs and clk for writes. Hmm... |
@l1k I just realized that your clk-delay patch is working quite differently from the applespi driver. In particular:
Of course, these assumptions could be all wrong, and I'm just getting lucky with things... Also, I don't think spi_transfer_one_message is being used here, but instead pxa2xx_spi_transfer_one_message (but I need to double check that). |
@roadrunner2: Oh, so that won't occur with your spi-properties branch, right? Indeed then it's not necessary to submit a patch. As for the proper timings, I've just dug around in The |
@roadrunner2: My comment above is a reply to yours of 5 hours ago, I hadn't read your new comment before sending it away, sorry. You're certainly right about |
@l1k Right: with your apple properties changes that code becomes obsolete, and hence the race is not encountered. Regarding timings, that's interesting: we definitely need that 100µs between transfers, in addition to delays after asserting CS. But I've been reworking things a bit, and now the only multi-transfer message is the write-command + read-status, so maybe the delay in there is the write-read delay you saw in the apple code. The comments about the reset* properties is interesting and make sense - but that then leaves the question of where write-read delay comes from - maybe that's hardcoded? I had picked the spiCSDelay because it was the only one without known units, but the longer I look at things (and the more things get clarified) the worse that choice looks. |
@roadrunner2: The macOS driver lets the SPI master do DMA and the purpose of the delay appears to be that it waits for the DMA to complete. So it's not really comparable to how the Linux driver works, which just lets the SPI core invoke a callback upon completion. The macOS driver calculates the delay in
In cases like this we usually just insert a hardcoded delay, see e.g. the |
@l1k Thanks for that info. The first line is clear:
(clock-period-in-ns is 125) But I can't make sense of that second line; the |
33e2e87
to
df3060f
Compare
Here is a substantially updated set of patches. I have been able to reproduce a number of issues by generating lots of commands to send while at the same time generating a lot of events (mostly by running my finger around the touchpad). As a result I think I've been able to clean up things quite a bit and make them very stable now - I'm unable to produce any failures anymore. As part of this the touchpad-init is now handled/processed like the other commands and should therefore also be robust now. @peterychuang Could you try this out? You seem to have been able to consistently get errors here. Many thanks @l1k for the assistance here in figuring some of the delays. I still haven't been able to make sense of the |
@roadrunner2 sure. Is it the |
@peterychuang Yes; either that or the |
The latest changes roadrunner2/macbook12-spi-driver@1b97804 in the |
@roadrunner2 sorry I've been pretty busy of late so I haven't had the chance to give this project the attention it deserves. I'll try out this branch on my Macbook9,1 - I occasionally had init problems too so it will be interesting to see if it fixes those. Other than that, if you can fix the conflict (caused by me merging #42 which I see this includes) I'd be happy to merge it in. |
@roadrunner2 I've been using Other than that one time, I haven't encountered another touchpad init problem. |
The spi-master-device registration registers the device before it starts the queue and marks the device as running. Because device registration immediately runs all class hooks, we end up with a small window where we are notified that the spi-master device is available but it's not actually ready to receive commands. Therefore we now check if spi master is marked running and wait a bit and retry if not. This fixes some sporadic touchpad ininitialization failures.
In particular we need to ensure the keyboard input handler is registered after all structures are initialized because it registers an event handler and hence we can be called at any point after registration. Also moved the USB check to the very top, since there's no point in doing anything if that succeeds.
After sending any command we will receive a response message (this includes the backlight and caps-lock commands). But that may not be the next message received. However, all received messages are signaled by an interrupt. So the receiving and processing of those response messages is now left to our regular receive handler (currently all we do with the messages is log them).
This avoids occasional status codes of (9c 17 e5 f8) if a new command is sent too early.
This is an attempt to fix the last remaining occasional failures due to insufficient delays. To summarize we now have only two delays: - the spiCSDelay after asserting CS (for both reads and writes) - a fixed, 100µs delay whenever switching between writes and reads In more detail: spiCSDelay appears to be truly what it says, a delay after asserting CS and before clocking any data. This also implies that it appears to be in units of µs. Furthermore this is needed on writes, not just on reads (so this change includes the write equivalent of commit 3d20af5). The delay at the end of messages (i.e. before de-asserting CS) appears to be unnecessary, and was probably an artifact of earlier versions of this code. Hence removed them. The resetXXX properties are for resetting the device only, something we do not currently implement. So stop using them. Lastly, the 100µs delay between writing and reading is also needed when switching from reading to writing (otherwise we get occasional (b3 b3 b3 b3) write statuses). Since there is no property in the DSDT that directly corresponds to this, it is hardcoded for now.
I believe this isn't necessary anymore, with all the fixes that have occurred in the mean time. And it has the potential to cause an event to get dropped (though unlikely).
This simplifies the code overall and ensures the contraints on commands and response are fullfilled for the init command(s) too.
This ensures our callbacks are not called after we've been removed, and it also ensures that writes have been fully processed (including the associated response being received), thereby resolving occasional errors when removing and re-inserting this module.
df3060f
to
1faadac
Compare
@cb22 No problem - I was waiting/hoping for some folks to test this first anyway. I rebased and re-pushed the branch now. @stefan-langenmaier and @peterychuang: thanks to both of you for testing and reporting back. Good to hear it mostly works; sad to hear it only mostly works - I was really hoping to have nailed this once and for all. But the fact that it fixed this on 14,2, and that only one failure occurred under apparently weird conditions, and my much more extensive tests work, does give me hope we're on the right track now and getting close. @peterychuang: do you happen to still have the logs from when the init failed? If so, could you show me what the exact error was? |
@roadrunner2 unfortunately I haven't kep the log. If I remember it right, it was the same I'll let you know if I encounter the problem again. But as I said, that one time could be a fluke. |
@peterychuang FYI: IIRC you're using Arch, and that uses systemd and hence journald - you can show the messages from previous boots with Anyway, the |
@roadrunner2 unfortunately that was too long ago that the log had been rotated out. The good news is, now that a few more days have passed, I haven't encountered any init problem. So it's quite likely that the one init failure I mentioned was a one-off unrelated thing. |
@roadrunner2 merged 👍. On my Macbook9,1 so far it has been looking good. I also experienced #6 a few times. After doing quite a few reboots, it seems like that no longer happens! |
@cb22 Thanks for testing and merging. |
This fixes the issues on the MBP14,1 reported by @peterychuang on #6.