mitre/churn
Identify the presence of unusually large commits impacting source code files in a project's history.
Parameter | Type | Explanation |
---|---|---|
langs-file | String | Path to a file specifying how to infer languages. |
churn-freq | Float | Threshold for a Z-score, above which a commit is considered "high churn". |
commit-percentage | Float | Maximum share of "high churn" commits permitted, expressed as a fraction (the default of 0.02 means 2%). |
(lte
  (divz
    (count (filter (gt {config.churn-freq or 3.0}) $))
    (count $))
  {config.commit-percentage or 0.02})
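
For readers less familiar with policy expressions, the default expression above can be read roughly as the following Python sketch. It assumes that `$` is the array of churn Z-scores, that `(gt x)` keeps values strictly greater than `x`, and that `divz` yields zero when the divisor is zero. The function and argument names are illustrative only and are not part of the plugin.

```python
def passes_churn_policy(z_scores, churn_freq=3.0, commit_percentage=0.02):
    """Illustrative reading of the default policy expression (not plugin code)."""
    # (count (filter (gt churn-freq) $)): commits whose Z-score exceeds the threshold
    high_churn = sum(1 for z in z_scores if z > churn_freq)
    # (count $): every commit that received a churn score
    total = len(z_scores)
    # divz: assumed to return 0 instead of failing on a zero divisor
    share = high_churn / total if total else 0.0
    # lte: pass when the high-churn share is at or below the permitted value
    return share <= commit_percentage
```

Under this reading, a repository passes as long as no more than 2% of its scored commits have a Z-score above 3.0.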
mitre/churn
Returns an array of churn Z-scores for all commits identified as modifying source files. This is not all commits, as the analysis uses heuristics based on the provided langs-file to identify which files are likely source files, and excludes commits which do not modify any likely source files.
Churn analysis looks for a high prevalence of very large commits, which may increase the risk of a successful malicious contribution. The reasoning is that malicious content is easier to hide in a large commit than in a small one, because a malicious contribution has to pass through the normal submission and review process (assuming review is performed).
Churn analysis first determines the total number of lines and files changed across all commits that contain changes to code in a repository, then computes each commit's percentage of those totals. For each commit, the file percentage and line percentage are combined into a score: file frequency times line frequency squared, times 1,000,000. The scores are then normalized into Z-scores to produce the final churn value for each commit. Each churn value therefore represents how far the size of a given commit lies from the repository average, measured in standard deviations.
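
The scoring steps described above can be sketched in Python as follows. This is a minimal illustration, not the plugin's implementation. It assumes the score is read as file frequency × (line frequency)² × 1,000,000, that each commit is summarized as a hypothetical `(files_changed, lines_changed)` pair, and that the population standard deviation is used for normalization.

```python
from statistics import mean, pstdev

def churn_z_scores(commits):
    """commits: list of (files_changed, lines_changed) pairs, one per
    commit that modifies source files (hypothetical input shape)."""
    if len(commits) < 2:
        raise ValueError("churn needs at least two commits that modify source files")
    total_files = sum(f for f, _ in commits)
    total_lines = sum(l for _, l in commits)
    if total_files == 0 or total_lines == 0:
        raise ValueError("no file or line changes to compare against")
    # Each commit's share of the repository-wide totals, combined as
    # file frequency * line frequency squared * 1,000,000 (assumed reading).
    scores = [
        (f / total_files) * (l / total_lines) ** 2 * 1_000_000
        for f, l in commits
    ]
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all commits score the same; Z-scores are undefined")
    # Normalize each score into a Z-score relative to the repository average.
    return [(s - mu) / sigma for s in scores]
```

For example, in `churn_z_scores([(1, 10), (2, 20), (1, 10), (40, 5000)])` the last commit touches far more files and lines than the others, so it receives the highest Z-score.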
Churn cannot run on a repository that contains only one commit, or only one commit that affects a source file. In that case the analysis always returns an error.