Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching git heights #801

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Conversation

AArnott
Copy link
Collaborator

@AArnott AArnott commented Sep 5, 2022

As first seen in #499, which was inadvertently closed prior to merging. Recreating here for review and consideration.

Summary

Adds caching of git heights. As calculating the height of repositories with large volumes of commits is expensive, caching the git heights can save time in the following circumstances:

  • Repetitive invocations of the GetBuildVersion msbuild task- e.g. during dotnet pack'ing a project, the GetBuildVersion task is invoked four times
  • Incremental versioning- in cases where a cached height is available for older commits, this value will be used avoid the cost of recalculating the entire git height (only new commits will need to be traversed)

Implementation details

When can cached results be used?

  • The base version to calculate the height for is the same as the cached base version
  • The relative path of the cached height matches the requested relative path
  • The commit to calculate the height for is either the same OR a child of the cached commit id

Opting out

The caching behavior is enabled by default but can be opted-out by setting the new NerdbankGitVersioningUseHeightCache msbuild property to false

Cache file

A version.cache.json file is created per-project with contents that look like this:

/*Cached commit height, created by Nerdbank.GitVersioning. Do not modify.*/{"BaseVersion":"1.0","Height":1636,"CommitId":"f30bb9e26af1dc80f5c38f43d9d6a7b4f770bb14","RelativeProjectDir":""}

Testing

Automated testing

  • Verified GitHeightCache can serialize + deserialize heights correctly
  • Verified caching has a measurable impact on performance for cases when there are many commits to traverse
  • All existing tests pass

Manual testing

Consumed locally-packed version of Nerdbank.GitVersioning in a C# project and added ~1500 commits, verifying:

  • Height caching takes effect on second build, dramatically decreasing build time (~10s -> ~1s)
  • Adding an additional single commit can leverage the cached version for the previous ~1500 commits but adds the latest
  • Setting the NerdbankGitVersioningUseHeightCache property to false bypasses the caching behavior.

djluck added 10 commits August 27, 2020 10:03
Adds caching of git heights. As calculating the height of repositories with large volumes of commits is expensive, caching the git heights can save time in the following circumstances:
- Repetitive invocations of the `GetBuildVersion` msbuild task- e.g. during `dotnet pack`'ing a project, the `GetBuildVersion` task is invoked four times
- Incremental versioning- in cases where a cached height is available for older commits, this value will be used avoid the cost of recalculating the entire git height (only new commits will need to be traversed)

The caching behavior is enabled by default but can be opted-out by setting the new `NerdbankGitVersioningUseHeightCache` msbuild property to `false`

## Testing
### Automated testing
- Verified `GitHeightCache` can serialize + deserialize heights correctly
- Verified caching has a measurable impact on performance for cases when there are many commits to traverse
- All existing tests pass

### Manual testing
Consumed locally-packed version of `Nerdbank.GitVersioning` in a C# project and added ~1500 commits, verifying:
- Height caching takes effect on second build, dramatically decreasing build time (~10s -> ~1s)
- Adding an additional single commit can leverage the cached version for the previous ~1500 commits but adds the latest
- Setting the `NerdbankGitVersioningUseHeightCache` property to `false` bypasses the caching behavior.
…mmits with multiple parents (any cached height would be used and could prevent the relevant commit graph being walked)
…now cached on disk next to the `version.json`/ `version.txt` file in order that performance improvements can be seen by multiple projects.
@AArnott
Copy link
Collaborator Author

AArnott commented Sep 5, 2022

CC @djluck @TFTomSun

if (versionHeightPosition.HasValue)
{
int height = commit.GetHeight(repoRelativeProjectDirectory, c => CommitMatchesVersion(c, baseSemVer, versionHeightPosition.Value, tracker));
var cache = new GitHeightCache(commit.GetRepository().Info.WorkingDirectory, versionOptions.RelativeFilePath, baseVersion);
Copy link

@djluck djluck Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably be constructing the git height cache only when useHeightCaching is true.

@djluck
Copy link

djluck commented Dec 9, 2022

Hey @AArnott, apologies for the incredibly slow response- work was incredibly stressful for a period and the idea of finding the time to look through an MR I wrote a good year ago was just too much!

So full disclosure: it's been a long time since I've thought about this MR and may not be the most reliable reviewer :) I left a comment on one minor thing I think we could improve. I think we should also provide some updates to the documentation regarding this, informing people about:

  • Why caching can be useful
  • How to opt out
  • What version.cache.json is to add it to their .gitignore files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants