In Hipcheck, the target resolution process finds a repository matching the specifications a user provides on the command line in the form of a target specifier string, optional type hint, and additional arguments. It then copies or clones that repository to the Hipcheck cache and checks it out to a suitable commit. As the Hipcheck team works to implement RFD #4, Hipcheck as a codebase will transition from housing analysis techniques to becoming an engine for orchestrating analyses implemented by plugins. As a result of this transition, the target resolution phase will become a more essential and critical part of what the core Hipcheck repo does.
As the Hipcheck team looks to provide users with more flexibility in repo/commit
specification (such as via the --ref
argument for hc check
), we have found
it difficult to debug the existing resolution code and add new features to it,
which is a good indication that the resolution subsystem needs a rewrite. This
RFD aims to do the following:
An example of a pain point in the existing codebase involves how package
versions are treated by the later remote repo resolution phase. When a user
targets a package for analysis, they may provide a version number in the target
string itself (e.g. node-ipc@10.1.0
), or provide a git reference to checkout
via the --ref
flag. We do perform a sanity-check before we even get to target
resolution that a user hasn't provided both a version string and a --ref
flag.
However, once we get to doing RemoteGitRepo
resolution, the version string if
provided collapses down to the same parameter as the --ref
flag, refspec
.
We've discovered that (unsurprisingly), a package's version numbering scheme
doesn't always map directly to the tags in its repository (e.g. hipcheck
version 3.5.0
has the tag hipcheck-v3.5.0
, not simply 3.5.0
). Therefore,
specifically when we go through Package
resolution, we would like to be able
to do "fuzzy matching" of the version to tags in the repository. We need the
local repository clone to be able to do that matching, but once we get to
cloning/updating a remote repository on the local machine, we have lost the
context that we started resolving from a package and that we would like to
handle the ref checkout differently if an error occurs (e.g. because the version
passed as a ref does not directly match a tag in the repository).
Fixing this problem requires designing the resolution in such a way that it remembers the context of previous resolution steps.
Overarching goals of the refactor are as follows:
A developer's ability to debug the target resolution subsystem will be proportional to how clearly the code itself reflects the business logic we want the resolution to follow. This includes clearly delineating the "phases" of resolution and what class of logic belongs where. For example, in the example above, is version fuzzy-matching logic something that the "package resolution" function should handle when the inner "remote repo resolution" function fails, or should it be a part of the "checkout" function that is aware of the context that it is operating on something that was previously a package?
Knowing what previous resolution steps were taken, being able to access what
they produced, being able to mutate internal state of the yet-unfinished
Target
struct are important to allow custom logic for particular resolution
"paths". As the illustrating example shows, the
current resolution subsystem does not adequately convey context to subsequent
resolution step.
A principled design will make it easy to distinguish between a "finished"
representation of a Target
and one that is actively being modified as part of
resolution.
TargetResolver
struct TargetSeed {
// What was passed on the CLI
pub specifier: String,
// What we resolved from it
pub kind: TargetSeedKind,
// Options related to repo resolution
pub refspec: Option<String>,
}
struct Target {
pub specifier: String,
// All instances must point to a valid repository
pub local: LocalGitRepo,
pub remote: Option<RemoteGitRepo>,
pub package: Option<Package>
}
One of the issues with the current design is that we do a direct transition
from TargetSeed
to Target
, with no intermedate state. Target
was designed
to represent a ground-truth, "finished" resolution, and so the local
field is
non-optional - indicating a Target
instance should always be pointing to a
valid local repository. So although our refactor design goals involve
maintaining state during resolution, Target
itself is not ideal for this role
as we need to put a dummy value for local
until we finish cloning a
repository to the cache when our seed is anything but existing local
repository.
struct TargetResolver {
seed: TargetSeed,
local: Option<LocalGitRepo>,
remote: Option<RemoteGitRepo>,
package: Option<Package>,
maven: Option<MavenPackage>,
sbom: Option<Sbom>,
}
Enter the TargetResolver
. This struct takes over from Target
as the
intermediate resolution state. All of the fields are meant to be mutable except
for seed
, which ensures resolution steps can always refer to exactly what was
originally passed. As intermediate resolution steps are taken, corresponding
optional fields (initialized to None
) are set to Some(_)
. Not all fields
need be filled with a Some
value, but eventually once local.is_some()
we
know the resolution is finished and we can synthesize and return a Target
.
This achieves the design goal of distinguishing finished and unfinished
resolutions; once another piece of code gets a Target
instance, all the
package version, git ref, and other information can be trusted to be accurate.
Note that this contains maven
, and sbom
fields that are not in Target
. We
expect that once a Target
is resolved, these fields will not be of interest
for analysis.
To define how resolution goes from a loose bag of Option
values to a Target
,
we applied the Strategy design pattern. We define a trait
ResolveRepo
for each TargetSeedKind
variant to determine the steps it will
take.
trait ResolveRepo {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo>;
}
Generally, the idea is for a type to produce another TargetSeedKind
type, and
then invoke that type's resolve()
implementation. The function takes a
mutable reference to TargetResolver
, allowing the state to be mutated and
then passed to the next resolve()
.
impl ResolveRepo for Sbom {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo> {
let sbom = t.sbom.as_ref().unwrap();
// sbom_to_package()
let p: Package = todo!();
// Update the resolver context
t.package = Some(p.clone());
// Move onto resolving the package
p.resolve(t)
}
}
Instead of returning a Result<Target>
, we return a Result<LocalGitRepo>
,
allowing the TargetResolver
class itself to produce and return the Target
.
impl TargetResolver {
pub fn resolve(seed: TargetSeed) -> Result<Target> {
let mut resolver = TargetResolver {
seed: seed.clone(),
local: None,
remote: None,
package: None,
maven: None,
sbom: None,
};
use TargetSeedKind::*;
// Resolution logic depends on seed
let local = match seed.kind {
Sbom(sbom) => {
resolver.sbom = Some(sbom.clone());
sbom.resolve(&mut resolver)
},
MavenPackage(maven) => {
resolver.maven = Some(maven.clone());
maven.resolve(&mut resolver)
},
Package(pkg) => {
resolver.package = Some(pkg.clone());
pkg.resolve(&mut resolver)
},
RemoteRepo(repo) => {
resolver.remote = Some(repo.clone());
repo.resolve(&mut resolver)
},
LocalRepo(local) => {
resolver.local = Some(local.clone());
local.resolve(&mut resolver)
},
}?;
// Finally piece together the Target with the non-optional local repo
Ok(Target {
seed: resolver.seed,
local,
remote: resolver.remote,
package: resolver.package,
})
}
}
With this design, we make the choice to organize logic by subphase rather than
by TargetSeedKind
type. Referring back to the illustrating
example, although the fuzzy version matching logic
is specific to resolution paths with a Package
ancestry, instead of in
Package::resolve()
we will implement that logic in LocalGitRepo::resolve()
,
leveraging the context provided by the passed TargetResolver
to allow us to
express this logic. For example:
impl TargetResolver {
pub fn get_checkout_target(&mut self) -> Result<String> {
let res = if let Some(pkg) = &self.package {
// if version != "no version", try fuzzy match against repo, we know
// we already have a local repo to use
// if version fuzzy match fails, and self.seed.ignore_version_errors, use "origin/HEAD"
todo!()
} else if let Some(refspec) = &self.seed.refspec {
// if ref provided on CLI, use that
todo!()
} else {
use TargetSeedKind::*;
match &self.seed.kind {
LocalRepo(_) => "HEAD",
RemoteRepo(_) => "origin/HEAD",
_ => { return Err(anyhow!("please provide --ref")); }
}.to_owned()
};
Ok(res)
}
}
impl ResolveRepo for LocalGitRepo {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo> {
let checkout_tgt: String = t.get_checkout_target()?;
git::checkout(&checkout_tgt)?;
todo!()
}
}
This get_checkout_target()
function returns what ref the local repo should be
changed to point to. See that we are able to express the custom version
fuzzy-matching if the resolution ancestry includes a package, plus custom
default targets if the TargetSeedKind
was a local or remote git repo.
struct TargetSeed {
// What was passed on the CLI
pub specifier: String,
// What we resolved from it
pub kind: TargetSeedKind,
// Options related to repo resolution
pub refspec: Option<String>,
pub ignore_version_errors: bool,
}
struct Target {
pub seed: TargetSeed,
pub local: LocalGitRepo,
pub remote: Option<RemoteGitRepo>,
pub package: Option<Package>
}
struct TargetResolver {
seed: TargetSeed,
local: Option<LocalGitRepo>,
remote: Option<RemoteGitRepo>,
package: Option<Package>,
maven: Option<MavenPackage>,
sbom: Option<Sbom>,
}
impl TargetResolver {
pub fn get_checkout_target(&mut self) -> Result<String> {
let res = if let Some(pkg) = &self.package {
// if version != "no version", try fuzzy match against repo, we know
// we already have a local repo to use
// if version fuzzy match fails, and self.seed.ignore_version_errors, use "origin/HEAD"
todo!()
} else if let Some(refspec) = &self.seed.refspec {
// if ref provided on CLI, use that
todo!()
} else {
use TargetSeedKind::*;
match &self.seed.kind {
LocalRepo(_) => "HEAD",
RemoteRepo(_) => "origin/HEAD",
_ => { return Err(anyhow!("please provide --ref")); }
}.to_owned()
};
Ok(res)
}
pub fn resolve(seed: TargetSeed) -> Result<Target> {
let mut resolver = TargetResolver {
seed: seed.clone(),
local: None,
remote: None,
package: None,
maven: None,
sbom: None,
};
use TargetSeedKind::*;
// Resolution logic depends on seed
let local = match seed.kind {
Sbom(sbom) => {
resolver.sbom = Some(sbom.clone());
sbom.resolve(&mut resolver)
},
MavenPackage(maven) => {
resolver.maven = Some(maven.clone());
maven.resolve(&mut resolver)
},
Package(pkg) => {
resolver.package = Some(pkg.clone());
pkg.resolve(&mut resolver)
},
RemoteRepo(repo) => {
resolver.remote = Some(repo.clone());
repo.resolve(&mut resolver)
},
LocalRepo(local) => {
resolver.local = Some(local.clone());
local.resolve(&mut resolver)
},
}?;
// Finally piece together the Target with the non-optional local repo
Ok(Target {
seed: resolver.seed,
local,
remote: resolver.remote,
package: resolver.package,
})
}
}
trait ResolveRepo {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo>;
}
impl ResolveRepo for Package {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo> {
todo!()
}
}
impl ResolveRepo for MavenPackage {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo> {
todo!()
}
}
impl ResolveRepo for Sbom {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo> {
let sbom = t.sbom.as_ref().unwrap();
// sbom_to_package()
let p: Package = todo!();
// Update the resolver context
t.package = Some(p.clone());
// Move onto resolving the package
p.resolve(t)
}
}
impl ResolveRepo for RemoteGitRepo {
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo> {
todo!()
}
}
impl ResolveRepo for LocalGitRepo {
// logic for refspec/version resolution, can look at t.seed to distinguish
// between the two
fn resolve(self, t: &mut TargetResolver) -> Result<LocalGitRepo> {
let checkout_tgt: String = t.get_checkout_target()?;
// git::checkout(&checkout_tgt)?;
todo!()
}
}