Feature request: git_stat_files() #435

Open
jdblischak opened this issue Jul 16, 2021 · 3 comments

Comments

@jdblischak
Contributor

The gert package has a nice function called git_stat_files(). For each file passed to it, it returns the most recent commit, the creation and modification times, and the number of commits that touched the file. Would it be possible to implement a similar function in git2r?

@stewid
Member

stewid commented Jul 17, 2021

Thanks, that is indeed a nice function. I've sketched a similar function that I can add to git2r:

git_stat_files <- function(files, ref = "HEAD", repo = '.') {
    do.call("rbind", lapply(as.character(files), function(file) {
        created <- NA_character_
        modified <- NA_character_
        commits <- 0L
        authors <- 0L
        head <- NA_character_

        # All commits on `ref` that touched `file`, newest first.
        x <- commits(repo = repo, ref = ref, path = file)
        if (length(x)) {
            created <- when(x[[length(x)]])
            modified <- when(x[[1]])
            commits <- length(x)
            authors <- length(unique(sapply(x, function(y) y$author$name)))
            head <- sha(x[[1]])
        }

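        # when() returns a string such as "2021-07-16 10:00:00 GMT"; parse it
        # as GMT so the time is not reinterpreted in the local time zone.
        data.frame(file = file,
                   created = as.POSIXct(created, tz = "GMT"),
                   modified = as.POSIXct(modified, tz = "GMT"),
                   commits = commits,
                   authors = authors,
                   head = head)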
    }))
}
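Each summary column is a reduction over one file's commit history: `modified` and `head` depend only on the newest commit, while `created`, `commits`, and `authors` need every commit that touched the file. The base-R sketch below isolates that reduction. The helper `summarise_history()` and its input (a data frame with hypothetical columns `sha`, `time`, and `author`, ordered newest first, matching the order the function above assumes) are illustrative only and not part of git2r.

```r
# Hypothetical helper (not part of git2r): reduce one file's history,
# given as a data frame ordered newest commit first, to one summary row.
summarise_history <- function(file, history) {
  n <- nrow(history)
  na_time <- .POSIXct(NA_real_, tz = "GMT")
  data.frame(
    file = file,
    created = if (n) history$time[n] else na_time,   # oldest commit
    modified = if (n) history$time[1] else na_time,  # newest commit
    commits = n,
    authors = length(unique(history$author)),
    head = if (n) history$sha[1] else NA_character_,
    stringsAsFactors = FALSE
  )
}

# Toy history for one file (illustrative data, newest first).
history_df <- data.frame(
  sha = c("c3", "c2", "c1"),
  time = as.POSIXct(c("2021-07-16 12:00:00", "2021-07-10 09:00:00",
                      "2021-07-01 08:00:00"), tz = "GMT"),
  author = c("alice", "bob", "alice"),
  stringsAsFactors = FALSE
)

full <- summarise_history("R/example.R", history_df)         # all 3 commits
newest <- summarise_history("R/example.R", history_df[1, ])  # newest only
```

With the full toy history, `full` reports 3 commits by 2 authors, created on 2021-07-01; with only the newest commit, `newest` reports 1 commit by 1 author and `created` equals `modified`. Any shortcut that truncates the history therefore changes those three columns.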

@jdblischak
Contributor Author

@stewid Thanks for the quick response! I tested the function. One suggestion is to limit the number of commits returned. This reduced the run time from 20 seconds to 2 seconds when I ran git_stat_files() on the 36 R files in git2r/R/:

x <- commits(repo = repo, ref = ref, path = file, n = 1)

There is still a speed difference, though: the gert implementation is about twice as fast (1 s vs 2 s). I only mention this because this step is a bottleneck in my code. Since it gets called a lot, I am investigating how to reduce its computation time.

library(git2r)

r <- clone(
  url = "https://github.com/ropensci/git2r.git",
  local_path = tempfile()
)

git_stat_files <- function(files, ref = "HEAD", repo = '.') {
  do.call("rbind", lapply(as.character(files), function(file) {
    created <- NA_character_
    modified <- NA_character_
    commits <- 0L
    authors <- 0L
    head <- NA_character_
    
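    # n = 1 returns only the newest commit: much faster, but `created`,
    # `commits` and `authors` then describe that one commit, not the history.
    x <- commits(repo = repo, ref = ref, path = file, n = 1)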
    if (length(x)) {
      created <- when(x[[length(x)]])
      modified <- when(x[[1]])
      commits <- length(x)
      authors <- length(unique(sapply(x, function(y) y$author$name)))
      head <- sha(x[[1]])
    }
    
    # when() returns e.g. "2021-07-16 10:00:00 GMT"; parse it as GMT.
    data.frame(file = file,
               created = as.POSIXct(created, tz = "GMT"),
               modified = as.POSIXct(modified, tz = "GMT"),
               commits = commits,
               authors = authors,
               head = head)
  }))
}

files <- Sys.glob(file.path(workdir(r), "R", "*R"))
# fixed = TRUE treats the path literally rather than as a regular expression
files_relative <- sub(paste0(workdir(r), "/"), "", files, fixed = TRUE)

system.time(
  stat_git2r <- git_stat_files(files_relative, repo = r)
)
##   user  system elapsed 
##   1.60    0.47    2.06 

library(gert)

system.time(
  stat_gert <- gert::git_stat_files(files_relative, repo = workdir(r))
)
##   user  system elapsed 
##   0.92    0.17    1.09 

unlink(workdir(r), recursive = TRUE)
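The benchmark above calls commits() once per file, and each call walks the history again. One way to cut that cost, offered here as an assumption rather than what git2r or gert actually do, is to walk the history once, record which files each commit changed, and aggregate per file. The sketch below shows only the aggregation step; `stat_files_from_log()` and the flattened log table `log_df` (one row per commit and changed file, newest commit first) are hypothetical, and building that table (for example by diffing each commit against its parent) is not shown.

```r
# Hypothetical sketch (not git2r's or gert's implementation): summarise
# many files from ONE flattened log instead of one history walk per file.
stat_files_from_log <- function(log_df, files) {
  files <- unique(as.character(files))
  # split() keeps row order within each group; because the factor levels
  # are `files`, files that no commit touched get an empty group.
  by_file <- split(log_df, factor(log_df$file, levels = files))
  na_time <- .POSIXct(NA_real_, tz = "GMT")
  rows <- lapply(files, function(f) {
    h <- by_file[[f]]
    n <- nrow(h)
    data.frame(
      file = f,
      created = if (n) h$time[n] else na_time,   # oldest commit
      modified = if (n) h$time[1] else na_time,  # newest commit
      commits = n,
      authors = length(unique(h$author)),
      head = if (n) h$sha[1] else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy log (illustrative data): commit c3 changed two files.
log_df <- data.frame(
  sha = c("c3", "c3", "c2", "c1"),
  file = c("R/a.R", "R/b.R", "R/a.R", "R/a.R"),
  time = as.POSIXct(c("2021-07-16 12:00:00", "2021-07-16 12:00:00",
                      "2021-07-10 09:00:00", "2021-07-01 08:00:00"),
                    tz = "GMT"),
  author = c("alice", "alice", "bob", "alice"),
  stringsAsFactors = FALSE
)

res <- stat_files_from_log(log_df, c("R/a.R", "R/b.R", "R/c.R"))
```

Here `R/a.R` gets 3 commits by 2 authors with head `c3`, and `R/c.R`, which no commit touched, gets 0 commits and NA times. Any saving would come from walking the history once instead of once per file; diffing every commit has its own cost, so this would need benchmarking against the per-file version.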

@stewid
Member

stewid commented Jul 20, 2021

@jdblischak thanks for the feedback. I'm working on a faster version.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants