# Adversarial Tutorial

ARLIN can be used from an adversarial standpoint to identify the optimal timing for attacks against a policy. Most adversarial methods attack at a fixed frequency, or measure internal metrics from the model and choose the actions the model sees as "the worst". Using ARLIN, we can instead identify clusters that represent mission failure and use the SAMDP to determine which clusters we should attack in and which actions to take to ensure the policy ends in a failure. Because the SAMDP maps actions the agent has actually taken in the XRLDataset, the resulting attacks are not only effective but also less noticeable to an observer, as the attack looks natural and similar to how the policy would truly react in the scenario. Below is a simple example of ARLIN being used for adversarial attack enhancement.
Figure 1. Example workflow of how ARLIN can be used for adversarial purposes.
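At its core, the SAMDP analysis in Figure 1 asks two questions of the collected data: which clusters have observed transitions into the failure cluster, and which action in each of those clusters most reliably produces that transition. ARLIN's SAMDP performs this analysis internally; the toy sketch below only illustrates the underlying idea on a hand-built cluster-level transition table with made-up numbers, and is not ARLIN's API.

```python
import numpy as np

# Toy cluster-level transition counts: counts[c, a, c2] is how often taking
# action a in cluster c led to cluster c2 in the collected dataset. ARLIN's
# SAMDP builds this kind of model from an XRLDataset; the numbers here are
# invented purely for illustration.
n_clusters, n_actions = 4, 2
counts = np.zeros((n_clusters, n_actions, n_clusters))
counts[0, 0, 1] = 20   # cluster 0, action 0 usually moves to cluster 1
counts[0, 1, 2] = 5
counts[1, 0, 1] = 15
counts[1, 1, 3] = 10   # cluster 1, action 1 often moves into cluster 3
counts[2, 1, 3] = 2

failure_cluster = 3

# Clusters with any observed transition into the failure cluster are the
# places worth attacking in; the best attack action in each is the one the
# dataset shows most reliably moving into the failure cluster.
into_failure = counts[:, :, failure_cluster]             # shape (c, a)
attack_clusters = np.flatnonzero(into_failure.sum(axis=1))
for c in attack_clusters:
    # Per-action next-cluster probabilities (clip avoids dividing by zero
    # for actions never taken in cluster c).
    probs = counts[c] / counts[c].sum(axis=1, keepdims=True).clip(min=1)
    a = int(np.argmax(probs[:, failure_cluster]))
    print(f"In cluster {c}: attack with action {a} "
          f"(P(failure) = {probs[a, failure_cluster]:.2f})")
```

Because the attack actions are drawn from transitions the policy actually produced, the resulting trajectory stays within behavior already seen in the dataset, which is what makes the attack hard to spot.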
Figure 2. GIFs created from baseline performance and traditional adversarial attacks. Baseline is the policy with no attacks, Worst1 takes the worst possible action at every step, Worst10 takes the worst possible action every 10th step, and Pref75 takes the worst possible action when the delta between the probabilities of the most and least probable actions is above a threshold of 0.75 (left to right: Baseline, Worst1, Worst10, Pref75).
Once a terminal failure cluster has been identified (here, Cluster 23), the SAMDP can render the paths into that cluster along with the actions taken along each path:

```python
from arlin.samdp import SAMDP

# `samdp` is the SAMDP instance built earlier in the analysis pipeline.
samdp.save_terminal_paths('./paths_into_23.png', best_path=True, term_cluster_id=23)
```
Figure 3. Neighboring clusters and associated actions for moving into Cluster 23.
Figure 4. ARLIN-influenced attack, showing how the attack results in a reasonable-looking failure as opposed to the obvious ones seen in Figure 2.
Figure 5. Detectability metrics illustrating ARLIN's usage in adversarial attack timing. Left to right: cosine similarity over an episode, cumulative reward over the same episode, and the average KL divergence distribution over 25 episodes.
In Figure 5, attacked plots for cosine similarity and cumulative reward that closely resemble the baseline (non-attacked) plots indicate a less detectable attack. As the attacked plot strays from the non-attacked plot, the attack becomes more visible because the behavior is clearly out of the norm. Attacks that closely follow the path of the non-attacked agent except at limited points look more like an error than an attack. In both plots, ARLIN follows the non-attacked policy for most of the episode and only strays once the target clusters are reached, resulting in a quick movement to the target failure state and a reduction in overall reward received. In the KL divergence plot in Figure 5, distributions closer to 0 are less detectable to an observer: if the action distribution of the attacked policy is similar to that of the non-attacked policy, the adversarial action is seen as a "reasonable" action by the policy. ARLIN's distribution is the closest to 0 of all methods, meaning the adversarial actions identified by ARLIN are more "reasonable" and therefore less detectable.
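The detectability metrics in Figure 5 are straightforward to compute from logged episode data. Below is a sketch assuming you have recorded per-step latent activations (for cosine similarity), per-step rewards, and per-step action distributions (for KL divergence) from both the baseline and attacked runs; none of these helper names come from ARLIN itself.

```python
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import entropy


def per_step_cosine_similarity(baseline_latents, attacked_latents):
    """Cosine similarity between baseline and attacked latent activations
    at each step of an episode (1.0 = indistinguishable)."""
    return np.array([1.0 - cosine(b, a)
                     for b, a in zip(baseline_latents, attacked_latents)])


def cumulative_reward(rewards):
    """Running total of reward over an episode."""
    return np.cumsum(rewards)


def mean_kl_divergence(baseline_dists, attacked_dists, eps=1e-8):
    """Average KL divergence between the baseline and attacked policies'
    action distributions over an episode (closer to 0 = the attacked
    actions look more "reasonable" to the policy)."""
    kls = [entropy(b + eps, a + eps)
           for b, a in zip(baseline_dists, attacked_dists)]
    return float(np.mean(kls))
```

Plotting `per_step_cosine_similarity` and `cumulative_reward` for each attack method against the baseline reproduces the first two panels of Figure 5, while `mean_kl_divergence` computed per episode over 25 episodes yields the distributions in the third.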