This page lists Hipcheck's analyses with the names they are given in the configuration file, what their data source is, the details of the analysis performed, what its current limitations are, and how the results of that analysis are thresholded based on the configuration:
analysis.practices.activity
Activity analysis looks at the date of the most recent commit to the branch pointed to by HEAD in the repository. For a local repository source, that may be a branch other than the default. For a remote repository, it will always be the default branch on the remote host.
Hipcheck identifies the committed date of the most recent commit and calculates the number of weeks between that commit and the day Hipcheck performs the analysis. It then compares that duration against the configured threshold (default configuration: 71 weeks). If the elapsed time is greater than the configured threshold, the analysis is marked as a failure.
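The threshold check can be sketched as follows. This is a minimal illustration, not Hipcheck's implementation: the names `weeks_since`, `activity_passes`, and `WEEK_COUNT_THRESHOLD` are hypothetical, and rounding down to whole weeks is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the default threshold described above.
WEEK_COUNT_THRESHOLD = 71


def weeks_since(last_commit: datetime, now: datetime) -> int:
    """Whole weeks elapsed between the most recent commit and `now` (floored)."""
    return (now - last_commit).days // 7


def activity_passes(last_commit: datetime, now: datetime,
                    threshold: int = WEEK_COUNT_THRESHOLD) -> bool:
    """Fail only when the elapsed weeks are greater than the threshold."""
    return weeks_since(last_commit, now) <= threshold


# Example: a last commit 72 weeks before "now" exceeds the default threshold.
example_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(activity_passes(example_now - timedelta(weeks=72), example_now))  # False
```

A commit exactly 71 weeks old still passes, because only a duration *greater* than the threshold fails.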
analysis.attacks.commit.affiliation
Affiliation analysis tries to identify when commit authors or committers may be affiliated or unaffiliated with some list of organizations. This determination is based on the email address associated with authors or committers on each Git commit, compared against a configured list of web hosts associated with organizations of concern.
The construction of the list is based on an "orgs file," whose path is provided in the configuration of this form of analysis. This orgs file defines two things: 1) a list of organizations, including web hosts associated with them, and the name of the country to which they primarily belong, and 2) a "strategy" for how the list of to-be-flagged hosts should be constructed.
The strategy defines which organizations are considered when checking affiliation, and whether the analysis should flag commits from those affiliated with those organizations or from those independent of them (for completeness, it also permits all or none, which flag every commit or no commits, respectively).
If the strategy key is used in the configuration, then all organizations listed in the "orgs file" are implicitly included in the list of organizations to consider.
If the strategy_spec table is used, then the strategy_spec.mode and strategy_spec.list keys must be defined. The strategy_spec.mode key accepts the same set of values (affiliated, independent, all, or none) as the strategy key, while list accepts an array of strings in one of two forms: "country:<country_name>" or "org:<org_name>". The first form includes in the list all organizations associated with the named country, while the second form includes the single organization with the given name.
To illustrate this, imagine the following strategy specification:
```toml
[strategy_spec]
mode = "affiliated"
list = ["country:United States", "org:MITRE"]
```
This strategy spec would flag any commits whose authors or committers can be identified as affiliated with any organization in the orgs file associated with the United States, or with MITRE specifically.
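The selection and flagging logic can be sketched like this. It is a hypothetical model, not Hipcheck's code: the `Org` class, the function names, and the sample orgs-file entries are invented for illustration, and matching an email's host exactly against an organization's hosts is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Org:
    """Hypothetical in-memory form of one orgs-file entry."""
    name: str
    hosts: tuple  # web hosts associated with the organization
    country: str


def select_orgs(orgs, spec_list):
    """Resolve "country:<name>" and "org:<name>" entries into a set of orgs."""
    selected = set()
    for entry in spec_list:
        kind, _, value = entry.partition(":")
        for org in orgs:
            if (kind == "country" and org.country == value) or (
                kind == "org" and org.name == value
            ):
                selected.add(org)
    return selected


def is_affiliated(email, selected):
    """Assumption: exact match between the email's host and an org host."""
    host = email.rsplit("@", 1)[-1].lower()
    return any(host == h for org in selected for h in org.hosts)


def should_flag(email, mode, selected):
    """Apply the strategy mode to one author or committer email."""
    if mode == "all":
        return True
    if mode == "none":
        return False
    affiliated = is_affiliated(email, selected)
    if mode == "affiliated":
        return affiliated
    if mode == "independent":
        return not affiliated
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical orgs-file contents, for illustration only.
ORGS = [
    Org("MITRE", ("mitre.org",), "United States"),
    Org("ExampleCorp", ("example.com",), "United States"),
    Org("BeispielGmbH", ("example.de",), "Germany"),
]
selected = select_orgs(ORGS, ["country:United States", "org:MITRE"])
```

Using a set means an organization matched by both a country entry and an org entry (MITRE here) is only counted once.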
analysis.practices.binary
Binary analysis searches all of the files in the repository for binary files (i.e. files that are not human-readable text) that may contain code. There is a high likelihood that these are deliberately malicious insertions. The presence of such files could indicate malicious code in the repository and is a cause for suspicion.
The analysis searches the entire repository file tree. It identifies all binary files and filters out those that are obviously not code (e.g. image or audio files). If, after filtering, more binary files remain than the configured threshold, the repository fails this analysis.
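The walk, filter, and threshold steps can be sketched as below. Two parts are assumptions rather than Hipcheck's documented behavior: binary detection by looking for a NUL byte near the start of a file, and the specific set of "obviously not code" extensions. The function names are hypothetical.

```python
import os

# Assumption: extensions treated as "obviously not code"; Hipcheck's list may differ.
NON_CODE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".mp3", ".wav", ".ogg"}


def looks_binary(path, sample_size=8192):
    """Heuristic (assumption): a NUL byte in the first block means binary."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(sample_size)


def suspicious_binaries(root):
    """Return repository-relative paths of binary files that may contain code."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip Git's object store
        for name in filenames:
            if os.path.splitext(name)[1].lower() in NON_CODE_EXTENSIONS:
                continue  # filtered out as obviously not code
            path = os.path.join(dirpath, name)
            if looks_binary(path):
                found.append(os.path.relpath(path, root))
    return sorted(found)


def binary_passes(root, threshold):
    """Fail when more suspicious binaries remain than the threshold allows."""
    return len(suspicious_binaries(root)) <= threshold
```

Returning relative paths matches the analysis output described next: the location of each suspicious file within the repository.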
The analysis displays the internal filetree location of each suspicious binary file. The user can then examine each file to determine if it is malicious or not.
Not all binary files are malicious: the repository may use certain binary files (beyond image and audio files) for legitimate purposes. This analysis does not investigate what the files do, only whether they exist.
No additional information on binary files: Hipcheck does not currently return any additional information about the suspicious files, only their locations in the repository file tree. Users who want to learn more about them must inspect the files manually.
analysis.attacks.commit.churn
Churn analysis attempts to identify a high prevalence of very large commits, which may increase the risk of a successful malicious contribution. The reasoning is that it is easier to hide malicious content in a large commit than in a small one, since a malicious contribution must get its changes through the normal submission and review process (assuming review is performed).
Churn analysis works by determining the total number of lines and files changed across all commits containing changes to code in a repository, and from those totals, each commit's share (frequency) of changed files and changed lines. For each commit, the file frequency and line frequency are then combined as file frequency times line frequency squared, times 1,000,000, to produce a score. These scores are then normalized into Z-scores to produce the final churn value for each commit. These churn values therefore represent how much the size of a given commit differs from the average for the repository.
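The scoring formula can be sketched as follows. The `CommitDiff` type and function names are hypothetical, counting changed lines as additions plus deletions is an assumption, and so is normalizing with the population standard deviation (Hipcheck may use a different variant).

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class CommitDiff:
    files_changed: int  # source files touched by the commit
    lines_changed: int  # lines added plus lines deleted (assumption)


def z_scores(values):
    """Normalize raw scores; assumes the population standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two commits that affect source files")
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all commits have identical scores")
    mu = mean(values)
    return [(v - mu) / sigma for v in values]


def churn_values(commits):
    total_files = sum(c.files_changed for c in commits)
    total_lines = sum(c.lines_changed for c in commits)
    raw = []
    for c in commits:
        file_freq = c.files_changed / total_files
        line_freq = c.lines_changed / total_lines
        # file frequency * line frequency^2 * 1,000,000, as described above
        raw.append(file_freq * line_freq ** 2 * 1_000_000)
    return z_scores(raw)
```

Squaring the line frequency weights the number of changed lines more heavily than the number of changed files, so one very large commit stands out with a high positive churn value.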
Churn cannot run if a repository contains only one commit (or only one commit that affects a source file). In that case, churn analysis always returns an error, because Z-score normalization needs more than one value to compare.
analysis.attacks.commit.entropy
Entropy analysis attempts to identify commits that contain a high degree of textual randomness, on the belief that high textual randomness may indicate the presence of packed malware or obfuscated code, which ought to be assessed for possible malicious content.
Entropy analysis works by determining the total number of occurrences of every Unicode grapheme that appears in a repository's Git diffs for commits that include code. It then converts these occurrence counts into frequencies by dividing the count of each grapheme by the total number of graphemes in the combined set of Git diffs. It also determines grapheme frequencies for each commit individually. These individual and total grapheme frequencies are then combined into a score: the individual frequency times the log base 2 of the individual frequency divided by the total frequency. These individual grapheme scores are then summed to produce a per-commit score, which is normalized into a Z-score in the same way as the churn metric. These entropy values therefore represent how much the grapheme frequency map of a given commit differs from the average set of grapheme frequencies across all commits.
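The per-commit score is a sum of f · log2(f / F) over the graphemes in the commit, where f is the commit's frequency and F the repository-wide frequency (this is the Kullback-Leibler divergence of the commit's distribution from the overall one). The sketch below is hypothetical: it treats Unicode code points as stand-ins for graphemes, since Python's standard library has no grapheme segmentation, and it assumes population standard deviation for the Z-scores.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def frequencies(counts):
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


def entropy_values(diffs):
    """diffs: one non-empty diff text per commit that touches code."""
    # Assumption: code points stand in for graphemes; a fuller version would
    # segment each diff into Unicode extended grapheme clusters instead.
    per_commit = [Counter(diff) for diff in diffs]
    overall = frequencies(sum(per_commit, Counter()))
    raw = []
    for counts in per_commit:
        freqs = frequencies(counts)
        # sum of f * log2(f / F) over the graphemes in this commit
        raw.append(sum(f * math.log2(f / overall[g]) for g, f in freqs.items()))
    if len(raw) < 2:
        raise ValueError("need at least two commits that affect source files")
    sigma = pstdev(raw)
    if sigma == 0:
        raise ValueError("all commits have identical scores")
    mu = mean(raw)
    return [(r - mu) / sigma for r in raw]
```

For example, given two commits built from the same few letters and one made of unrelated symbols, the third commit receives the highest entropy value.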
Entropy cannot run if a repository contains only one commit (or only one commit that affects a source file). In that case, entropy analysis always returns an error, for the same reason as churn analysis.
analysis.practices.fuzz
Repositories checked by Hipcheck may receive regular fuzz testing. This analysis checks whether the repository participates in Google's OSS-Fuzz program. If it does, this is considered a signal that the repository is lower risk.
Not all languages supported: robust fuzzing tools do not exist for every language. Fuzz testing may not have been done because no good option existed at the time. Lack of fuzzing in those cases would still indicate a higher risk, but it would not necessarily indicate bad software development practices.
Only OSS-Fuzz checked: at this time, Hipcheck only checks whether the repository participates in OSS-Fuzz. Other fuzz testing programs exist, but a repository that uses one of those instead will not pass this analysis.
analysis.practices.identity
Identity analysis looks at whether the author and committer identities for each commit are the same, as part of gauging the likelihood that commits are receiving some degree of review before being merged into a repository.
When author and committer identity are the same, that may indicate that a commit did not receive review, which could be a cause for concern. At the repository level, a large percentage of commits with the same author and committer identities may indicate a project that lacks code review.
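The repository-level measure can be sketched as a simple fraction. This is a hypothetical illustration: the `Commit` type and function names are invented, comparing the full "name <email>" string for equality is an assumption, and the `threshold` parameter is a hypothetical maximum allowed fraction rather than a documented Hipcheck setting.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    author: str     # e.g. "Jane Doe <jane@example.com>"
    committer: str


def same_identity_fraction(commits):
    """Fraction of commits whose author and committer identities match."""
    if not commits:
        raise ValueError("no commits to analyze")
    same = sum(1 for c in commits if c.author == c.committer)
    return same / len(commits)


def identity_passes(commits, threshold):
    """`threshold` is a hypothetical maximum allowed fraction (0.0 to 1.0)."""
    return same_identity_fraction(commits) <= threshold
```

Comparing exact strings is the simplest choice; a real check might normalize case or treat email and name separately.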
analysis.practices.review
Review analysis looks at whether pull requests on GitHub (currently the only supported remote host for this analysis) receive at least one review prior to being merged.
If too few pull requests receive review prior to merging, then this analysis will flag that as a supply chain risk.
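Assuming the pull request data has already been fetched, the check reduces to counting merged pull requests with no reviews. In this sketch the `PullRequest` type, the function names, and the `max_unreviewed_fraction` parameter are hypothetical; treating a repository with no merged pull requests as having zero unreviewed merges is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    merged: bool
    review_count: int  # reviews submitted before the merge (assumed pre-fetched)


def unreviewed_fraction(prs):
    """Fraction of merged pull requests that received no review."""
    merged = [pr for pr in prs if pr.merged]
    if not merged:
        return 0.0  # assumption: nothing merged means nothing unreviewed
    unreviewed = sum(1 for pr in merged if pr.review_count < 1)
    return unreviewed / len(merged)


def review_passes(prs, max_unreviewed_fraction):
    return unreviewed_fraction(prs) <= max_unreviewed_fraction
```

Unmerged pull requests are ignored, since the analysis is concerned with changes that actually landed in the repository.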
This analysis uses the GitHub API and requires a token in the configuration. Hipcheck only needs permission to access public repository data, so grant your generated token only those permissions.
analysis.attacks.typo
Typo analysis attempts to identify possible typosquatting attacks in the dependency list of any analyzed project that uses a supported language (currently: JavaScript with the npm package manager).
The analysis works by identifying a programming language based on the presence of a dependency file in the root of the repository, then attempting to get the full list of direct and transitive dependencies for that project. It then compares that list against a list of known popular packages for that language to see whether any dependencies are possible typos of a popular package name.
Typo detection is based on the generation of possible typos for known names, according to a collection of typo possibilities, including single-character deletion, substitution, swapping, and more.
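Typo generation and matching can be sketched with the three operations named above. This is a hypothetical illustration, not Hipcheck's typo list: the function names are invented, the substitution alphabet (lowercase letters, digits, and "-") is an assumption, and Hipcheck's "and more" operations are not modeled.

```python
import string


def generate_typos(name, alphabet=string.ascii_lowercase + string.digits + "-"):
    """Single-character deletions, substitutions, and adjacent swaps of `name`."""
    typos = set()
    for i in range(len(name)):
        typos.add(name[:i] + name[i + 1:])  # deletion
        for ch in alphabet:
            if ch != name[i]:
                typos.add(name[:i] + ch + name[i + 1:])  # substitution
    for i in range(len(name) - 1):
        typos.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # swap
    typos.discard(name)
    typos.discard("")
    return typos


def possible_typosquats(dependencies, popular):
    """Map each suspicious dependency to the popular names it may imitate."""
    index = {}
    for pkg in popular:
        for typo in generate_typos(pkg):
            index.setdefault(typo, set()).add(pkg)
    return {
        dep: sorted(index[dep])
        for dep in dependencies
        if dep in index and dep not in popular
    }
```

Precomputing the typos of the popular names into an index lets each dependency be checked with a single dictionary lookup.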