-
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(target-gh): Check asset MD5 hash by downloading the asset #328
Changes from all commits
469925d
a5a362a
5083340
e016782
4c889a7
477308a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ import { Octokit, RestEndpointMethodTypes } from '@octokit/rest'; | |
import { RequestError } from '@octokit/request-error'; | ||
import { readFileSync, promises, statSync } from 'fs'; | ||
import { basename } from 'path'; | ||
import { createHash } from 'crypto'; | ||
import { BinaryLike, createHash } from 'crypto'; | ||
|
||
import { getConfiguration } from '../config'; | ||
import { | ||
|
@@ -378,12 +378,33 @@ export class GithubTarget extends BaseTarget { | |
); | ||
} | ||
|
||
// XXX: This is a bit hacky as we rely on two things: | ||
// 1. GitHub issuing a redirect to S3, where they store the artifacts, | ||
// or at least pass those request headers unmodified to us | ||
// 2. AWS S3 using the MD5 hash of the file for its ETag cache header | ||
// when we issue a HEAD request. | ||
const response = await this.github.request(`HEAD ${url}`, { | ||
const remoteChecksum = await this.checksumFromUrl(url); | ||
const localChecksum = this.checksumFromData(file); | ||
if (localChecksum !== remoteChecksum) { | ||
throw new Error( | ||
`Uploaded asset checksum does not match local asset checksum for "${name} (${localChecksum} != ${remoteChecksum})` | ||
); | ||
} | ||
uploadSpinner.succeed(`Uploaded asset "${name}".`); | ||
return url; | ||
} catch (e) { | ||
uploadSpinner.fail(`Cannot upload asset "${name}".`); | ||
|
||
throw e; | ||
} | ||
} | ||
|
||
private async checksumFromUrl(url: string): Promise<string> { | ||
// XXX: This is a bit hacky as we rely on various things: | ||
// 1. GitHub issuing a redirect to AWS S3. | ||
// 2. S3 using the MD5 hash of the file for its ETag cache header. | ||
// 3. The file being small enough to fit in memory. | ||
// | ||
// Note that if assets are large (5GB) assumption 2 is not correct. See | ||
// https://github.com/getsentry/craft/issues/322#issuecomment-964303174 | ||
let response; | ||
try { | ||
response = await this.github.request(`HEAD ${url}`, { | ||
headers: { | ||
// WARNING: You **MUST** pass this accept header otherwise you'll | ||
// get a useless JSON API response back, instead of getting | ||
|
@@ -395,32 +416,40 @@ export class GithubTarget extends BaseTarget { | |
Accept: DEFAULT_CONTENT_TYPE, | ||
}, | ||
}); | ||
} catch (e) { | ||
throw new Error( | ||
`Cannot get asset on GitHub. Status: ${(e as any).status}\n` + e | ||
); | ||
} | ||
|
||
const etag = response.headers['etag']; | ||
if (etag && etag.length > 0) { | ||
// ETag header comes in quotes for some reason so strip those | ||
const remoteChecksum = response.headers['etag']?.slice(1, -1); | ||
if (remoteChecksum) { | ||
// XXX: This local checksum calculation only holds for simple cases: | ||
// https://github.com/getsentry/craft/issues/322#issuecomment-964303174 | ||
const localChecksum = createHash('md5').update(file).digest('hex'); | ||
if (localChecksum !== remoteChecksum) { | ||
logger.trace(`Checksum mismatch for "${name}"`, response.headers); | ||
throw new Error( | ||
`Uploaded asset MD5 checksum does not match local asset checksum for "${name} (${localChecksum} != ${remoteChecksum})` | ||
); | ||
} | ||
uploadSpinner.succeed(`Uploaded asset "${name}".`); | ||
} else { | ||
logger.warn(`Response headers:`, response.headers); | ||
uploadSpinner.succeed( | ||
`Uploaded asset "${name}, but failed to find a remote checksum. Cannot verify uploaded file is as expected.` | ||
); | ||
} | ||
return url; | ||
} catch (e) { | ||
uploadSpinner.fail(`Cannot upload asset "${name}".`); | ||
return etag.slice(1, -1); | ||
} | ||
|
||
throw e; | ||
return await this.md5FromUrl(url); | ||
} | ||
|
||
private async md5FromUrl(url: string): Promise<string> { | ||
this.logger.debug('Downloading asset from GitHub to check MD5 hash: ', url); | ||
let response; | ||
try { | ||
response = await this.github.request(`GET ${url}`, { | ||
headers: { | ||
Accept: DEFAULT_CONTENT_TYPE, | ||
}, | ||
}); | ||
} catch (e) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if catching the original exception and throwing a new one is actually good here as you obscure the original issue. In debug mode, we already log the response status etc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @BYK !! In this case here, we were seeing something like "Failed: Not Found" mixed up in the logs. So the The original error is still printed along with a custom message. Do you think we could do better? |
||
throw new Error( | ||
`Cannot download asset from GitHub. Status: ${(e as any).status}\n` + e | ||
chadwhitacre marked this conversation as resolved.
Show resolved
Hide resolved
|
||
); | ||
} | ||
return this.checksumFromData(Buffer.from(response.data)); | ||
} | ||
|
||
private checksumFromData(data: BinaryLike): string { | ||
return createHash('md5').update(data).digest('hex'); | ||
} | ||
|
||
private async handleGitHubUpload( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calculating MD5 hash from a streaming data is quite trivial in Node so I'd encourage you to revisit this approach in a follow up PR as it causes unnecessary memory usage. We may have rather large files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, @iker-barriocanal and I talked extensively about this.
Trade-offs:
request
method seems to buffer the full response payload, so if we wanted to work with streams we'd need to depart from usingrequest
. I believe we're originally using Octokit in this code path to benefit from the different rate limits given to authenticated requests -- @BYK can you confirm that?readSync
!) full asset files from disk into memory before uploading. We use the same bytes in memory to calculate the "local checksum". So the assumption that files are small enough was implicitly already there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Octokit takes care of a bunch of things from authentication to rate limiting to parallelization to automatic retries. So I'd say stick with it as long as it can. If solving this issue requires a departure from Octokit, delay that as long as possible and file an issue with them ahead of time so they might be able to bring this into the library itself.
Great catch. This is again due to Octokit limitations tho as it does not support streams or piping, only buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
References to our future selves (?):
2018 was a fine year, right? :)