Using Simulation to Investigate Virus Propagation in Computer Networks

Making the best decisions to respond to a virus threat can be critical in thwarting a quick spread and minimizing negative impacts of an attack. This paper uses simulation to compare two main prevention strategies: patching and quarantine. These strategies are borrowed from epidemiological models and are currently employed to prevent and control the spread of computer viruses throughout networks. Simulation is a powerful decision making tool which can be used to mimic the complex behavior of a spreading virus while testing a range of alternative parameters for different attack scenarios. The proposed simulation model suggests that patching is a better protection strategy than quarantine. A carefully selected patching strategy can be used to enforce the herd immunity effect and place the spread of a virus in an endemic state in the shortest possible amount of time.


Introduction
The rapid development of computer connectivity and the dependence of organizations on the new e-commerce markets have increased the vulnerability of networks. The threat of malware is persistent, and its growth has become exponential (Cobb & Myers, 2009). The large number of existing computer viruses and their highly destructive nature pose an important security risk for both organizations and individuals. Computer viruses are computer programs created to damage computer systems by erasing data, stealing information, and altering normal operations (Piqueira et al., 2008). Establishing appropriate protection policies and implementing realistic plans of action are as important as the antivirus technology used to thwart an attack. The main goal of these policies is to protect the network, guard organizational data, and continue to support organizational transactions.
Several studies have suggested the use of epidemiological models to understand the spread of viruses in computer networks and to design appropriate response strategies (Kephart & White, 1991; Pastor-Satorras & Vespignani, 2002). Recently, Mulligan and Schneider (2011) made the case that cyber security can be viewed as a "public good" and suggested the adoption of mechanisms and strategies derived from public health.
No matter how well prepared a protection plan may be, it cannot be proven effective until it is put to the test or verified. While an actual infection will highlight failures of a given plan, simulation methodology shows high potential for studying and investigating response strategies. Unlike actual infections, simulation models are less expensive, take less time to conduct, and are well suited for testing alternative solutions. The decision makers can modify and analyze the model in order to test and evaluate numerous scenarios and operating parameters.
This paper uses simulation as a decision making tool to replicate the spread of a virus in a computer network. The simulation model is based on mathematical foundations of epidemiological theory. Specifically, the model investigates the impact of different degrees of patching and quarantine on the spread of the virus and suggests optimal parameters which utilize the impact of herd immunity. These two strategies are borrowed from epidemiological models (vaccination and isolation) and are currently employed to prevent and control the spread of computer viruses throughout networks.
The paper is organized as follows: The next section offers a brief discussion of previous research in the areas of epidemiology, its mathematical model, and its potential use to model the spread of computer viruses. The following section explains the simulation approach, input/output variables, and its main algorithm. Once the model is validated, several experiments are conducted to investigate the impact of patching and quarantine. Finally, conclusions and future research are discussed.

Previous Research
The proposed simulation model is based on the assumption that a computer virus spreads throughout a network in the same way that a disease spreads in a population. As such, the first part of this section discusses the theory of epidemiology. The first complete discussion of mathematical epidemiology is offered by McKendrick (1925) in his paper presented at the Edinburgh Mathematical Society. A few years later, Kermack and McKendrick offer the basic compartmental models and mathematically describe the transfer rate of individuals from one compartment to the next using a set of partial differential equations (Kermack & McKendrick, 1927; Kermack & McKendrick, 1932). A more complete treatment of mathematical epidemiology is summarized in the work of Bailey (1975) and Frauenthal (1980).
Focusing on pandemic influenza, Larson and Nigmatulina (2009) use simple mathematical models to discuss courses of action for response to major worldwide health events. Specifically, the authors use the "reproductive number" concept to suggest strategies to control the spread of the disease. They conclude, for example, that any numerical value for the reproductive number has "little meaning outside the social context to which it pertains" and their analysis shows the disease tends to be driven by high-frequency individuals. The model discussed by Larson and Nigmatulina (2009) assumes a homogeneous mixing of the population. Homogeneous mixing is a reasonable assumption to simplify the mathematics of the model. More advanced models assume non-homogeneous mixing of the population. For example, Hill and Longini (2003) describe a mathematical model that optimally allocates vaccines to several subpopulations with potentially heterogeneous mixing of individuals.
Mathematical models are also used to depict the impact of vaccination rate on the spread of disease. Two of the most important theoretical concepts in infectious disease epidemiology are the basic reproduction number and herd immunity. The models following the theory regarding reproduction number come very close to determining the required vaccination coverage for eradication in a randomly mixed population (Anderson & May, 1991; Diekmann & Heesterbeek, 2000). These models were later extended to include such factors as non-homogeneous distribution of population and contacts, contact tracing, and ring vaccination (Fine, 1993), which is the vaccination of all susceptible individuals around an outbreak.
Another aspect of vaccination models is the concept of herd immunity. Due to herd immunity, vaccination can also help protect people who are not vaccinated. The unvaccinated people in the herd community can escape the infection because they are protected by the immunized people who surround them (Anonymous, 2011). Immunity against a disease can be acquired either through natural infection or through artificial inoculation with a vaccine (Garnett, 2005).
Detailed mathematical models of diseases are common in medicine but rare in digital security (Geer & Conway, 2009). Kephart and White published a paper on the topic in 1991 and modeled the spread of viruses or other malware between hosts using the same methodology provided in the epidemiological models (Kephart & White, 1991). Zou, Gong, and Towsley (2003) present a mathematical analysis of three worm propagation models under a dynamic quarantine environment. Worm propagation under quarantine is further investigated in three more recent papers. Chen and Jamil (2006) study the effectiveness of partial quarantine for simple epidemics (without removals) and quarantine for general epidemics (with removals) and derive a critical threshold for networks to have herd immunity. Also, Tao, Weng, and Zhu (2008) propose a worm propagation model with a quarantine strategy and provide mathematical foundations for the study of the global stability of the model's equilibria. Chen and Wei (2009) offer improvements of the classical susceptible-infected-susceptible and susceptible-infected-recovered models with a quarantine strategy, along with thresholds and equilibria for the existence of worm epidemics. More recently, Wang et al. (2010) propose an epidemic model combining both vaccination and quarantine methods to decrease the number of infected hosts and reduce the speed of worm propagation.
Network analysis is another approach to investigate the spread of a disease. In these networks nodes represent people, and edges represent specified relationships or interactions. Studies which use networks to simulate the spread of the disease from one source node to the rest of the population have shown that the "betweenness" and the "farness" of nodes alter disease dynamics (Christley et al., 2005).
The simulation model proposed in this paper is based on the mathematical foundations of epidemiology. The model considers herd immunity and several other key factors, such as reproduction number, transmission period, patching (vaccination) rate and quarantine (isolation) rate. These factors are incorporated into a network-based simulation model. The goal is to translate the complexity of the model into a practical simulation-based decision making tool for evaluating alternative response strategies.

Mathematical Foundations of the Simulation Model
A host is a computer or device which is connected to other computers or devices in a given network. The host is able to forward a virus to other connected hosts in the network. In this paper we assume homogeneous mixing of the hosts, that is, hosts in the network under scrutiny make contact at random and do not mix solely in a smaller subgroup. Denoting the initial number of infected hosts by $h$, the number of connections or reproduction number by $R_0$, and the generation number by $n$, the number of infected hosts increases according to the following series:
$$h,\; hR_0,\; hR_0^2,\; \ldots,\; hR_0^n \tag{1}$$
Starting with a single initial infected host ($h = 1$), the number of infected hosts in the $n$th generation is equal to the number of connections to the power of $n$. This exponential growth assumes that the infection ratio is the same as the number of connections. However, as the virus continues to spread in the network from one generation to the next, infected hosts are no longer susceptible to the virus. The infection ratio, or the number of hosts infected in the next generation from a single infected host, changes as the model progresses through generations and is calculated as:
$$I = R_0 \frac{S}{P} \tag{2}$$
Where:
• $I$ = number of hosts infected from a single infected host in a given generation
• $R_0$ = number of initial contacts in the network group
• $S$ = number of susceptible hosts in the network group in the generation
• $P$ = number of hosts in the network group, also referred to as the size of the network
When $I = 1$ the spread of the virus is considered to be in an endemic state. This means that on average, each infected host is infecting exactly one other host. From a network administrator's perspective, the virus can be held in this state by lowering the number of susceptible hosts, that is, by increasing the number of patched hosts. If $I > 1$ the virus is considered to be in an epidemic state, and the number of hosts infected grows exponentially. If $I < 1$ the virus will die out. The virus is contained when the number of hosts infected from a single infected host is less than or equal to 1. As such:
$$I = R_0 \frac{S}{P} \le 1 \tag{3}$$
Formula (3) indicates that in order to eliminate the virus or keep it in an endemic state, the number of susceptible hosts must be kept lower than or equal to the ratio between network size and the reproduction number, as follows:
$$S \le \frac{P}{R_0} \tag{4}$$
Formula (4) indicates the rationale of a patching program: in order for any course of action to work, enough hosts must be patched so that the number of susceptible hosts ($S$) is kept at or below the threshold. If $V$ represents the number of hosts to be patched before the first infection occurs, then $S = P - V$. Replacing $S$ in (4), the lower boundary for $V$ can be calculated as variable $V_m$:
$$V_m = P - \frac{P}{R_0} = P\left(1 - \frac{1}{R_0}\right) \tag{5}$$
For example, in a network with 900 computers where each computer is connected to and can infect an average of three other computers, the network administrator must patch at least 600 computers to keep the virus from spreading.
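The threshold calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's simulation model: `infection_ratio` and `min_patched` follow formulas (2) and (5) as read here (the infection ratio is R_0 times S/P, and the patching threshold is P minus P/R_0), while `spread_by_generation` is a hypothetical helper that assumes each generation's new infections equal the current infected count times the infection ratio, capped at the remaining susceptible hosts.

```python
import math


def infection_ratio(r0, susceptible, population):
    """Formula (2): hosts infected by one infected host in a generation."""
    return r0 * susceptible / population


def min_patched(population, r0):
    """Formula (5): smallest V for which S = P - V satisfies S <= P / R_0."""
    if r0 <= 1:
        return 0  # the virus is already at or below the endemic threshold
    return math.ceil(population - population / r0)


def spread_by_generation(population, r0, patched,
                         initial_infected=1, max_generations=50):
    """Deterministic generation-by-generation spread (illustrative assumption).

    Returns a list of (generation, infection_ratio, new_infections) tuples.
    """
    susceptible = population - patched - initial_infected
    infected_now = float(initial_infected)
    history = []
    for generation in range(1, max_generations + 1):
        ratio = infection_ratio(r0, susceptible, population)
        # New infections cannot exceed the hosts that are still susceptible.
        new_infections = min(infected_now * ratio, susceptible)
        history.append((generation, ratio, new_infections))
        susceptible -= new_infections
        infected_now = new_infections
        if new_infections < 1:
            break  # fewer than one new host infected: the spread has stalled
    return history


if __name__ == "__main__":
    print(min_patched(900, 3))  # the 900-host example from the text
    for generation, ratio, new in spread_by_generation(900, 3, patched=0):
        print(generation, round(ratio, 3), round(new, 1))
```

For the 900-host example with three connections per host, `min_patched(900, 3)` returns 600, matching the figure in the text. Running `spread_by_generation` with no patched hosts shows the infection ratio falling from one generation to the next as susceptible hosts are used up.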
Besides patching, quarantine is an alternative method to control the spread of the virus. Isolating the infected hosts reduces the number of connections which can spread the virus. Continuing with the above example and assuming that quarantine is the only course of action, and that the isolation of infected computers lowers the average number of actual infectious connections from three to two, the system administrator must quarantine up to 450 computers after they are infected.
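The arithmetic behind the two figures can be written out side by side. This is a sketch under one assumption: quarantine is read as acting only by lowering the effective number of infectious connections, written here as $R_q$ (a symbol introduced for illustration, not used in the text), while the threshold of formula (5), network size minus network size divided by the reproduction number, is otherwise unchanged.

$$
\begin{aligned}
\text{patching only } (R_0 = 3):&\quad P - \frac{P}{R_0} = 900 - \frac{900}{3} = 600 \\
\text{quarantine only } (R_q = 2):&\quad P - \frac{P}{R_q} = 900 - \frac{900}{2} = 450
\end{aligned}
$$

On this reading, the 450 figure is the number of hosts that must be taken out of circulation for the remaining susceptible hosts to stay at or below $P / R_q = 450$.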
The above mathematical explanation is used to identify the minimum number of hosts to be patched or isolated in a deterministic environment. Formula (5) can be used successfully when number of connections is [...]
[...] already expected that no more than 100 hosts are infected. Different quarantine rates provide no significant change in this number. The number of infected hosts varies from 92 to 100 even when the quarantine rate varies from 0% to 100%. Figure 6 also indicates no significant trend in reducing the time for the virus spread to reach the endemic state. It should be noted that further investigation of the quarantine impact is needed. This investigation may include the impact of the quarantine rate for different patching scenarios and for non-homogeneous networks.

Conclusions and Future Research
This paper proposes a simulation model which can be used as a decision making tool to formulate appropriate patching and quarantine strategies before and during a potential virus attack in a computer network. The model can be used by network administrators to identify appropriate responses using information about the network (number of hosts, connection density) and the expected virus (reproduction number, reproduction time, and patching and quarantine effectiveness). The model can also be used to study the impact of herd immunity on the network given a selected patching strategy.
Simulation has several advantages over mathematical or other decision making methods. Simulation uses a logical abstraction of reality through a computer model that "mimics" the behavior of the virus as it arrives in a given target network. Once the computer-based simulation model is validated, the decision maker can test a range of alternative solutions for different scenarios. As such, the simulation model can be used to formulate appropriate response strategies against a network attack. The decision maker is able to evaluate IF-THEN scenarios, which would be difficult, if not impossible, to generate in the real environment.
The paper illustrates the use of the simulation model in the case of a homogeneous network with 1000 hosts and a random connection density of approximately 5 percent. This network is attacked by a virus with an average reproduction time of 3 minutes. The simulation results indicate that patching is a far more efficient protection strategy than quarantine. In fact, patching seems to be the only strategy which utilizes the herd immunity effect to bring the spread to an endemic state. A carefully selected patching strategy, where the most active hosts are patched first, can lead to a significant reduction in the time required to bring the system into an endemic state.
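The kind of experiment described here can be sketched as a small network simulation. The paper's actual algorithm is not reproduced in this text, so the code below is an illustrative stand-in under stated assumptions: hosts are linked at random with a fixed probability (the connection density), patched hosts never become infected, each infected host gets one generation to infect each unpatched neighbor with a hypothetical probability `infect_prob`, and quarantine and reproduction time are left out. All function and parameter names are invented for this sketch.

```python
import random


def build_random_network(num_hosts, density, rng):
    """Undirected random graph: each pair of hosts is linked with probability `density`."""
    neighbors = {host: set() for host in range(num_hosts)}
    for a in range(num_hosts):
        for b in range(a + 1, num_hosts):
            if rng.random() < density:
                neighbors[a].add(b)
                neighbors[b].add(a)
    return neighbors


def choose_patched(neighbors, patch_fraction, targeted, rng):
    """Pick hosts to patch: most-connected first if `targeted`, otherwise at random."""
    hosts = list(neighbors)
    count = int(round(patch_fraction * len(hosts)))
    if targeted:
        hosts.sort(key=lambda h: len(neighbors[h]), reverse=True)
    else:
        rng.shuffle(hosts)
    return set(hosts[:count])


def simulate(neighbors, patched, infect_prob, rng, max_generations=100):
    """Spread a virus generation by generation.

    Returns (total hosts infected, number of generations simulated).
    """
    unpatched = [h for h in neighbors if h not in patched]
    if not unpatched:
        return 0, 0  # every host is patched, so nothing can be infected
    infected = {rng.choice(unpatched)}
    frontier = set(infected)
    generations = 0
    while frontier and generations < max_generations:
        next_frontier = set()
        # Sorted iteration keeps runs reproducible for a given seed.
        for host in sorted(frontier):
            for peer in sorted(neighbors[host]):
                if peer in patched or peer in infected or peer in next_frontier:
                    continue
                if rng.random() < infect_prob:
                    next_frontier.add(peer)
        infected |= next_frontier
        frontier = next_frontier
        generations += 1
    return len(infected), generations


if __name__ == "__main__":
    rng = random.Random(1)
    network = build_random_network(1000, 0.05, rng)  # sizes taken from the text
    for targeted in (False, True):
        patched = choose_patched(network, 0.5, targeted, rng)
        total, generations = simulate(network, patched, 0.1, rng)
        print("targeted" if targeted else "random", total, generations)
```

The 1000 hosts and 5 percent density come from the text; the 50 percent patch fraction and the 0.1 infection probability are placeholders. Comparing the `targeted` and random runs over many seeds is one way to explore the claim that patching the most active hosts first shortens the outbreak, although a single run of this sketch proves nothing by itself.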
As a final note, one should remember that a simulation model is only as good as the assumptions on which it is based. If a model makes predictions which are not supported by observed results, one must go back and change the initial assumptions in order to make the model useful. The proposed model will serve as a basis for future studies where other factors can be incorporated. These factors include, but are not limited to, non-homogeneous networks, costs of quarantine, costs of patching, and costs associated with loss of business due to infected computers. Further, the model can be extended to simulate different network configurations, varying in characteristics such as network size and structure.