We study automated intrusion prevention using reinforcement learning. In a
novel approach, we formulate the problem of intrusion prevention as an optimal
stopping problem. This formulation gives us insight into the structure of the
optimal policies, which turn out to be threshold-based. Since the computation
of the optimal defender policy using dynamic programming is not feasible for
practical cases, we approximate the optimal policy through reinforcement
learning in a simulation environment. To define the dynamics of the simulation,
we emulate the target infrastructure and collect measurements. Our evaluations
show that the learned policies are close to optimal and that they can indeed be
expressed using thresholds.
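
To illustrate what a threshold-based stopping policy might look like, the following is a minimal sketch and not the authors' implementation. It assumes a toy two-state model (intrusion or no intrusion), a binary alert observation, and a defender who tracks the posterior probability of an intrusion and stops once that belief reaches a threshold. All names and numeric parameters (`P_INTRUSION_START`, `P_ALERT_NORMAL`, `P_ALERT_INTRUSION`, `update_belief`, `threshold_policy`, `run_episode`) are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy parameters (assumptions, not taken from the paper).
P_INTRUSION_START = 0.1   # per-step chance that an intrusion begins
P_ALERT_NORMAL = 0.2      # chance of an alert when no intrusion is ongoing
P_ALERT_INTRUSION = 0.7   # chance of an alert during an intrusion


def update_belief(belief, alert):
    """Bayes update of P(intrusion) after one alert / no-alert observation."""
    # Prediction step: an intrusion may start, and once started it persists.
    prior = belief + (1.0 - belief) * P_INTRUSION_START
    # Correction step: weight each state by the likelihood of the observation.
    like_intr = P_ALERT_INTRUSION if alert else 1.0 - P_ALERT_INTRUSION
    like_norm = P_ALERT_NORMAL if alert else 1.0 - P_ALERT_NORMAL
    numerator = like_intr * prior
    return numerator / (numerator + like_norm * (1.0 - prior))


def threshold_policy(belief, threshold):
    """Stop (take the defensive action) once the belief reaches the threshold."""
    return belief >= threshold


def run_episode(threshold, rng, max_steps=200):
    """Simulate one episode; return (stop_time, intrusion_start_time or None)."""
    belief, intrusion, start = 0.0, False, None
    for t in range(1, max_steps + 1):
        if not intrusion and rng.random() < P_INTRUSION_START:
            intrusion, start = True, t
        p_alert = P_ALERT_INTRUSION if intrusion else P_ALERT_NORMAL
        alert = rng.random() < p_alert
        belief = update_belief(belief, alert)
        if threshold_policy(belief, threshold):
            return t, start
    return max_steps, start


if __name__ == "__main__":
    rng = random.Random(0)
    stop_time, start_time = run_episode(threshold=0.9, rng=rng)
    print("stopped at step", stop_time, "intrusion started at", start_time)
```

In this sketch the whole policy is described by a single number, the threshold. Under the stated assumptions, a learning method would only need to search over that one parameter rather than over arbitrary mappings from observations to actions.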

Authors: Kim Hammar, Rolf Stadler

