Fix checksum verification #56
Merged
Conversation
edmorley approved these changes (Oct 10, 2024)
To investigate the intermittent checksum errors, I added some code to print out stats on the downloaded package index file whenever a checksum error is encountered. What stood out was that, whenever the checksum failed, the number of bytes written was always slightly off from the number of bytes expected. For example:

```
release_url: http://ports.ubuntu.com/ubuntu-ports/dists/noble-security/InRelease
url: http://ports.ubuntu.com/ubuntu-ports/dists/noble-security/universe/binary-arm64/by-hash/SHA256/184473504b3e97aec550d1bb687b0871e8113f60ab7e8ff136f60f4ecdca9b20
uncompressed size: 1461506
compressed size: 365359
expected size: 365376
```

These errors also often go away when the automation is re-run, which indicates that some bytes are not being consumed during the `async_copy(reader, writer)` process. After several attempts at forcing the remaining bytes to write, two changes seemed to provide stability:

- Removing the `AsyncBufWriter`

  Since we need an `AsyncBufReader` in front of the `GzipDecoder`, we are already buffering the reads into large chunks before sending them to the writer, so why not write directly to the file instead of passing everything through a buffered writer?

- Setting `reader.multiple_members(true)`

  You can gzip-compress multiple files into a single stream, and many gzip readers decode these automatically. Here, though, the gzip content is not made up of multiple files, so why would setting this flag cause all the bytes to be written? I suspect that, because of how the read process works, an extra read is performed when the end of the stream is reached to check whether there are more entries, and that somehow triggers a flush.

Surprisingly, explicitly calling `flush` or `shutdown` on the `AsyncBufWriter` did not provide the expected stability for correcting these checksum issues. I've modified the code to explicitly call `flush()` on the write stream anyway.

I'm still not 100% convinced the instability is completely fixed but, so far, these changes seem like a reasonable thing to try out and observe for failures.
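For reference, here is a minimal sketch of the corrected copy path using `tokio` and `async-compression` (with its `tokio` and `gzip` features). It assumes `AsyncBufReader` and `async_copy` are aliases for tokio's `BufReader` and `tokio::io::copy`; the `download_index` helper and its signature are illustrative, not the crate's actual API:

```rust
use async_compression::tokio::bufread::GzipDecoder;
use std::path::Path;
use tokio::fs::File;
use tokio::io::{copy, AsyncRead, AsyncWriteExt, BufReader};

// Illustrative helper: stream a gzip-compressed response body to disk.
async fn download_index(body: impl AsyncRead + Unpin, destination: &Path) -> std::io::Result<u64> {
    // Buffer the raw response body so the decoder reads in large chunks.
    let mut reader = GzipDecoder::new(BufReader::new(body));

    // Keep reading past the first gzip member; without this, the final bytes
    // of the stream were intermittently left unwritten.
    reader.multiple_members(true);

    // Write straight to the file: the reads are already buffered ahead of the
    // decoder, so the intermediate buffered writer was removed.
    let mut writer = File::create(destination).await?;

    let bytes_written = copy(&mut reader, &mut writer).await?;

    // Flush explicitly, even though this alone did not fix the issue.
    writer.flush().await?;

    Ok(bytes_written)
}
```

The point is only the shape of the change: no buffered writer between the copy and the file, `multiple_members(true)` on the decoder, and an explicit `flush` at the end.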
Fixes #53
Link to test runs showing what appears to be increased stability (all passes for 150 integration test runs):
Prior to this change, a checksum failure would happen during almost every other CI run.
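For completeness, here is a rough illustration of the kind of size/checksum check that surfaces these failures; the `verify_download` helper, its arguments, and the use of the `sha2` crate are illustrative assumptions, not the project's actual verification code:

```rust
use sha2::{Digest, Sha256};

// Hypothetical check: compare what was actually written against the expected
// size and SHA-256 digest published in the package index.
fn verify_download(written: &[u8], expected_size: u64, expected_sha256_hex: &str) -> Result<(), String> {
    if written.len() as u64 != expected_size {
        // The symptom described above: the written size comes up slightly
        // short of the expected size, so the digest can never match.
        return Err(format!(
            "size mismatch: wrote {} bytes, expected {}",
            written.len(),
            expected_size
        ));
    }

    // Hex-encode the SHA-256 digest of the written bytes for comparison.
    let actual_hex: String = Sha256::digest(written)
        .iter()
        .map(|byte| format!("{:02x}", byte))
        .collect();

    if actual_hex != expected_sha256_hex {
        return Err(format!(
            "checksum mismatch: got {}, expected {}",
            actual_hex, expected_sha256_hex
        ));
    }

    Ok(())
}
```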