Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Touchpad init fixes 2 #45

Merged
merged 10 commits into from
Sep 12, 2017
Merged

Touchpad init fixes 2 #45

merged 10 commits into from
Sep 12, 2017

Conversation

roadrunner2
Copy link
Contributor

This fixes the issues on the MBP14,1 reported by @peterychuang on #6.

@l1k
Copy link
Contributor

l1k commented Aug 11, 2017

Regarding commit 3cb8a1b, this sounds like an ordering issue that needs to be fixed in the master's driver. (Doesn't preclude such a workaround of course until it's fixed.)

Regarding commit 33e2e87, this seems to be a missing feature in the SPI core, it's indeed admitted in <linux/spi/spi.h>. A lot of chips need a delay between lowering of CS and the first clock cycle, e.g. to come out of a low-power state. Usually this is specified in the datasheet and is so low that it doesn't matter, but in some cases clearly it does.

@l1k
Copy link
Contributor

l1k commented Aug 11, 2017

For illustrative purposes I've cobbled together (untested) commit l1k/linux@ca6e39f which introduces a per-slave CS-to-CLK delay. However I gather from applespi.c that a per-message delay may be needed since you seem to delay only on reads, not on writes.

@l1k
Copy link
Contributor

l1k commented Aug 11, 2017

My code is atrocious as usual, 0-day has alerted me that I've missed to replace a few "us" with "usecs", updated branch is here.

@peterychuang
Copy link

After using this fix for close to a week now, I've just encountered one touchpad init failure while I was rebooting the machine multiple times to investigate the NVMe-related boot problem. The dmesg output has the same Error writing to device: b3 b3 b3 b3 line before modswitch.

Not sure if this is significant though, as I was testing things just now.

@roadrunner2
Copy link
Contributor Author

@l1k

Regarding commit 3cb8a1b, this sounds like an ordering issue that needs to be fixed in the master's driver. (Doesn't preclude such a workaround of course until it's fixed.)

I agree that it's a bug in the spi driver core. However, it can't normally occur. The spi_master_class and spi_slave_class structs are static, i.e. not normally accessible outside spi.c - I had to use a hack to get a pointer to the struct to be able to register a class callback. And nothing in spi.c uses class callbacks (the class is just used to be able to enumerate the devices). So I'm not sure if they'd be willing to accept a patch.

Regarding the clock delay, that looks good - I'll give it whirl shortly. While you're right that currently we don't have that delay on writes, it probably doesn't hurt, and we may just have been lucky (the whole delay stuff is far more fragile than I was anticipating - wish there were some clear documentation on what delays where needed where).

@roadrunner2
Copy link
Contributor Author

@peterychuang Thanks for the report. Bummer, looks like I'll need to add that extra delay between read and write. Unless this was due to currently not having a delay between cs and clk for writes. Hmm...

@roadrunner2
Copy link
Contributor Author

@l1k I just realized that your clk-delay patch is working quite differently from the applespi driver. In particular:

  • The value I'm currently using for cs-to-clk delay is actually resetA2RUsec
  • The spiCSDelay value is being used for delaying between individual transfers of a message
  • I'm assuming spiCSDelay is in units of 10µs

Of course, these assumptions could be all wrong, and I'm just getting lucky with things...

Also, I don't think spi_transfer_one_message is being used here, but instead pxa2xx_spi_transfer_one_message (but I need to double check that).

@l1k
Copy link
Contributor

l1k commented Aug 13, 2017

@roadrunner2: Oh, so that won't occur with your spi-properties branch, right? Indeed then it's not necessary to submit a patch.

As for the proper timings, I've just dug around in AppleIntelLpssSpiController.kext and AppleIntelLpssGspi.kext. Basically it should be sufficient to look where IODelay() is called and the only places of interest appear to be in AppleIntelLpssGspi::csCtl(), which seems to delay after asserting CS, and in AppleIntelLpssGspi::transferMmioDuplexSubr() which delays after write and before read. The duration of the delay is stored in the AppleIntelLpssGspi class and I wasn't immediately able to find out where it's set. The delay after asserting CS is most likely coming from the spiCsDelay property, however this would be in usec then, not 10*usec.

The resetA2RUsec and resetRecUsec properties seem uninteresting, they're used by AppleIntelLpssGspi::reset(), and since we never reset the device, I think we can ignore them.

@l1k
Copy link
Contributor

l1k commented Aug 13, 2017

@roadrunner2: My comment above is a reply to yours of 5 hours ago, I hadn't read your new comment before sending it away, sorry. You're certainly right about pxa2xx_spi_transfer_one_message() being used, I mixed it up with the bcm2835 (RasPi) SPI master driver which doesn't define that callback but uses the one in the core.

@roadrunner2
Copy link
Contributor Author

@l1k Right: with your apple properties changes that code becomes obsolete, and hence the race is not encountered.

Regarding timings, that's interesting: we definitely need that 100µs between transfers, in addition to delays after asserting CS. But I've been reworking things a bit, and now the only multi-transfer message is the write-command + read-status, so maybe the delay in there is the write-read delay you saw in the apple code.

The comments about the reset* properties is interesting and make sense - but that then leaves the question of where write-read delay comes from - maybe that's hardcoded? I had picked the spiCSDelay because it was the only one without known units, but the longer I look at things (and the more things get clarified) the worse that choice looks.

@l1k
Copy link
Contributor

l1k commented Aug 13, 2017

@roadrunner2: The macOS driver lets the SPI master do DMA and the purpose of the delay appears to be that it waits for the DMA to complete. So it's not really comparable to how the Linux driver works, which just lets the SPI core invoke a callback upon completion. The macOS driver calculates the delay in AppleIntelLpssGspi::spiTransferUsecCalc() using a somewhat odd formula:

usecWord = ((spiClockPeriod * wordSize + 999) * 0x10624dd3) >> 0x26
usecDpxDelay = (((usecWord * 1000 - 2656) + 999 >> 0x3) * 0x20c49ba5e353f7cf) >> 4

In cases like this we usually just insert a hardcoded delay, see e.g. the udelay(100) in drivers/platform/x86/apple-gmux.c when accessing the GMUX registers.

@roadrunner2
Copy link
Contributor Author

@l1k Thanks for that info. The first line is clear:

usecWord = ((clock-period-in-ns * wordsize + 999) / 1000 = ceil(clock-period-in-ns * wordsize / 1000) = 1

(clock-period-in-ns is 125)

But I can't make sense of that second line; the usecWord * 1000 - 2656 appears to be in 32-bit signed arithmetic, the rest in unsigned 64-bit; and result truncated to 32 bits. But that results in a value of 1958505086 for usecDpxDelay. Still trying to figure out what I'm missing.

@roadrunner2 roadrunner2 force-pushed the touchpad-init-fixes-2 branch from 33e2e87 to df3060f Compare August 28, 2017 09:07
@roadrunner2
Copy link
Contributor Author

Here is a substantially updated set of patches. I have been able to reproduce a number of issues by generating lots of commands to send while at the same time generating a lot of events (mostly by running my finger around the touchpad). As a result I think I've been able to clean up things quite a bit and make them very stable now - I'm unable to produce any failures anymore. As part of this the touchpad-init is now handled/processed like the other commands and should therefore also be robust now.

@peterychuang Could you try this out? You seem to have been able to consistently get errors here.

Many thanks @l1k for the assistance here in figuring some of the delays. I still haven't been able to make sense of the usecDpxDelay calculation above, so just have a hardcoded 100µs delay now (any time we switch between reads and writes).

@peterychuang
Copy link

@roadrunner2 sure. Is it the touchpad-init-fixes-2 branch?

@roadrunner2
Copy link
Contributor Author

@peterychuang Yes; either that or the touchbar-driver-hid-driver branch (the latter also contains the touchbar driver, which you don't need, but is otherwise identical). TIA.

@stefan-langenmaier
Copy link

The latest changes roadrunner2/macbook12-spi-driver@1b97804 in the touchbar-driver-hid-driver branch made the touchpad on a MacbookPro14,2 usable.

@cb22
Copy link
Owner

cb22 commented Sep 5, 2017

@roadrunner2 sorry I've been pretty busy of late so I haven't had the chance to give this project the attention it deserves.

I'll try out this branch on my Macbook9,1 - I occasionally had init problems too so it will be interesting to see if it fixes those.

Other than that, if you can fix the conflict (caused by me merging #42 which I see this includes) I'd be happy to merge it in.

@peterychuang
Copy link

@roadrunner2 I've been using touchbar-driver-hid-driver branch for a little over a week. I've encountered init problem once pretty soon after I switched. But I'm not sure if that one time counts, as the NVMe problem was hitting my machine pretty bad at that one time, so I had to reboot the machine many times before I booted into the DE successfully, and once the boot was successful, I had wifi init problem too in addition to the touchpad problem.

Other than that one time, I haven't encountered another touchpad init problem.

The spi-master-device registration registers the device before it starts
the queue and marks the device as running. Because device registration
immediately runs all class hooks, we end up with a small window where we
are notified that the spi-master device is available but it's not
actually ready to receive commands. Therefore we now check if spi master
is marked running and wait a bit and retry if not.

This fixes some sporadic touchpad ininitialization failures.
We were already doing this for read-after-write (commit 21155ff) and
async-read (commit a6d8c1a). This now adds that for the flush-read
during touchpad init, and refactors applespi_setup_read_txfr() to set up
the delay.
In particular we need to ensure the keyboard input handler is registered
after all structures are initialized because it registers an event
handler and hence we can be called at any point after registration.

Also moved the USB check to the very top, since there's no point in
doing anything if that succeeds.
After sending any command we will receive a response message (this
includes the backlight and caps-lock commands). But that may not be the
next message received. However, all received messages are signaled by an
interrupt. So the receiving and processing of those response messages is
now left to our regular receive handler (currently all we do with the
messages is log them).
This avoids occasional status codes of (9c 17 e5 f8) if a new command is
sent too early.
This is an attempt to fix the last remaining occasional failures due to
insufficient delays. To summarize we now have only two delays:

- the spiCSDelay after asserting CS (for both reads and writes)
- a fixed, 100µs delay whenever switching between writes and reads

In more detail:

spiCSDelay appears to be truly what it says, a delay after asserting CS
and before clocking any data. This also implies that it appears to be in
units of µs. Furthermore this is needed on writes, not just on reads (so
this change includes the write equivalent of commit 3d20af5).

The delay at the end of messages (i.e. before de-asserting CS) appears
to be unnecessary, and was probably an artifact of earlier versions of
this code. Hence removed them.

The resetXXX properties are for resetting the device only, something we
do not currently implement. So stop using them.

Lastly, the 100µs delay between writing and reading is also needed when
switching from reading to writing (otherwise we get occasional (b3 b3 b3
b3) write statuses). Since there is no property in the DSDT that
directly corresponds to this, it is hardcoded for now.
I believe this isn't necessary anymore, with all the fixes that have
occurred in the mean time. And it has the potential to cause an event
to get dropped (though unlikely).
This simplifies the code overall and ensures the contraints on commands
and response are fullfilled for the init command(s) too.
This ensures our callbacks are not called after we've been removed, and
it also ensures that writes have been fully processed (including the
associated response being received), thereby resolving occasional errors
when removing and re-inserting this module.
@roadrunner2 roadrunner2 force-pushed the touchpad-init-fixes-2 branch from df3060f to 1faadac Compare September 6, 2017 05:25
@roadrunner2
Copy link
Contributor Author

@cb22 No problem - I was waiting/hoping for some folks to test this first anyway. I rebased and re-pushed the branch now.

@stefan-langenmaier and @peterychuang: thanks to both of you for testing and reporting back. Good to hear it mostly works; sad to hear it only mostly works - I was really hoping to have nailed this once and for all. But the fact that it fixed this on 14,2, and that only one failure occurred under apparently weird conditions, and my much more extensive tests work, does give me hope we're on the right track now and getting close.

@peterychuang: do you happen to still have the logs from when the init failed? If so, could you show me what the exact error was?

@peterychuang
Copy link

@roadrunner2 unfortunately I haven't kep the log. If I remember it right, it was the same applespi: Error writing to device: b3 b3 b3 b3 thing.

I'll let you know if I encounter the problem again. But as I said, that one time could be a fluke.

@roadrunner2
Copy link
Contributor Author

@peterychuang FYI: IIRC you're using Arch, and that uses systemd and hence journald - you can show the messages from previous boots with journalctl -b -<n> (for n'th previous boot), and journalctl --list-boots makes it easy to narrow down which boots to look at.

Anyway, the b3 b3 b3 b3 has, in my testing, indicated insufficient delay when switching between reads and writes. So maybe under some circumstances the current 100µs delay is not sufficient. Well, let's see how many others run into this. Though if you feel like experimenting, you could always try a larger value for SPI_RW_CHG_DLY.

@peterychuang
Copy link

@roadrunner2 unfortunately that was too long ago that the log had been rotated out.

The good news is, now that a few more days have passed, I haven't encountered any init problem. So it's quite likely that the one init failure I mentioned was a one-off unrelated thing.

@cb22 cb22 merged commit e0283bc into cb22:master Sep 12, 2017
@cb22
Copy link
Owner

cb22 commented Sep 12, 2017

@roadrunner2 merged 👍. On my Macbook9,1 so far it has been looking good.

I also experienced #6 a few times. After doing quite a few reboots, it seems like that no longer happens!

@roadrunner2
Copy link
Contributor Author

@cb22 Thanks for testing and merging.
@peterychuang Ok - I'm sure you'll let us know if you run into again then 😉

@roadrunner2 roadrunner2 deleted the touchpad-init-fixes-2 branch September 14, 2017 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants