To protect a system, we need to place safeguards which are appropriate for the importance of the data they are protecting. To do this, we have to make estimates of two things: the probability that a security problem (such as an intrusion) will occur, and the cost we incur if it does.
Many times we cannot directly know the probability or the cost of an intrusion, but we can estimate them. This is what insurance companies do, based on past data. Once we know our probabilities and costs, we can use that information to decide where to concentrate our security efforts. Sometimes there are more efficient ways of protecting our data than adding security, such as making backups.
The risk or ``exposure'' is how much we stand to lose due to a security problem. To understand risk exposure, we first need some basic notions from probability theory.

Suppose a game has n possible outcomes. For outcome i, we represent the probability as p(i), and the value (or cost) as v(i). Then the formula for the expectation value of the game is:

E(game) = sum(i=1..n, p(i) * v(i))
Let's look at some examples of this in games of chance. We will look at three
games, each of which costs a dollar to play and has some payback distribution.
We will decide whether or not it is wise to play these games based on the
expectation value of the payback.
The first game is a coin flip:

outcome | you win
--------+---------
heads   | $2.00
tails   | $0.10
Should you play this game? The answer is yes. Let's find out why. The expected payback of the game comes from the equation above:

E(coin game) = p(heads) * v(heads) + p(tails) * v(tails)
E(coin game) = 0.5 * $2.00 + 0.5 * $0.10
E(coin game) = $1.05

This means that on average you expect to win $1.05 per game, even though no single game will ever pay exactly $1.05! Since the cost to play is $1.00 per game, you make a net profit of $0.05 per game if you play, so you should play this game.
The second game is a roll of one die:

outcome | you win
--------+---------
1       | $4
2       | $0
3       | $0
4       | $0
5       | $1
6       | $2
Should you play this game? Let's do the math to find the expected payback:

E(1 die game) = p(1)*v(1) + p(2)*v(2) + p(3)*v(3) + p(4)*v(4) + p(5)*v(5) + p(6)*v(6)
E(1 die game) = 1/6 * $4 + 1/6 * $0 + 1/6 * $0 + 1/6 * $0 + 1/6 * $1 + 1/6 * $2
E(1 die game) = 1/6 * ($4 + $0 + $0 + $0 + $1 + $2)
E(1 die game) = 1/6 * $7
E(1 die game) = $1.17 (approximately)

This means that on average you win about $1.17 per game, netting a profit of about $0.17 (exactly $1/6) per game. That's even better than the coin game, so you should play this game.
The third game is a roll of two dice, with the payback based on the sum:

outcome | you win
--------+---------
2       | $3
3       | $0
4       | $2
5       | $0
...     | $0
11      | $0
12      | $3
Should you play this game? We can figure it out with expectation values. There are 6*6 = 36 equally likely outcomes, and we can find the probability of each sum from this table:
 +  |  1 |  2 |  3 |  4 |  5 |  6
----+----+----+----+----+----+----
  1 |  2 |  3 |  4 |  5 |  6 |  7
  2 |  3 |  4 |  5 |  6 |  7 |  8
  3 |  4 |  5 |  6 |  7 |  8 |  9
  4 |  5 |  6 |  7 |  8 |  9 | 10
  5 |  6 |  7 |  8 |  9 | 10 | 11
  6 |  7 |  8 |  9 | 10 | 11 | 12
E(2 dice game) = p(2)*v(2) + p(3)*v(3) + p(4)*v(4) + p(5)*v(5) + ... + p(11)*v(11) + p(12)*v(12)
E(2 dice game) = 1/36 * $3 + 2/36 * $0 + 3/36 * $2 + 4/36 * $0 + ... + 2/36 * $0 + 1/36 * $3
E(2 dice game) = 1/36 * (1*$3 + 3*$2 + 1*$3)
E(2 dice game) = 1/36 * $12
E(2 dice game) = $0.33 (approximately)

In this game, you expect to get back only about $0.33 per game, but you have to pay $1.00 to play. So you expect to lose about $0.67 per game. This is not a game you should play.
Note: sometimes we combine the cost of the game with the payback to compute a ``net'' expected game value: the cost is simply a negative payback that occurs with probability 1.
Attackers will aim for the weakest link in the security: the point whose breaking lets them get at what they're after, namely the assets you are protecting. How much effort -- in terms of money spent on security hardware or software, or person-hours spent improving security -- should be applied to which areas? The goal should be to apply the security-improvement resources so as to minimize the risk exposure for the system. Doing this requires estimating the exposure -- the probability that various attacks will occur (not the same as whether they will succeed!) times the cost if they succeed gives the expected loss for each attack -- as well as the effectiveness and cost of the various security measures that might be taken.
None of this is easy. Let us consider just the problem of choosing the ``right'' anti-virus software.
Can we rely on reports like `Virus protection software A can identify X viruses, while package B can identify only Y viruses' (where X > Y)? Perhaps, but it is important to know how many of the viruses actually `in the wild' each package can detect. If package A detects a large number of viruses that occur only in research labs, then those numbers are meaningless.
How quickly new in-the-wild viruses are identified and new virus-definition databases are released is critical to the effectiveness of virus-detection software. Most software of this type cannot recognize new viruses and requires periodic updates to protect you against new threats. Between the time a new virus starts spreading in the wild and the time that update arrives, the probability of catching the virus can be quite high, making the expected damage -- the risk exposure -- high as well.
We might want to differentiate among viruses (especially new ones) according to how destructive their ``payload'' is. This would certainly give a better estimate of the damage that would occur if the virus ran unchecked. Note, however, that computer viruses, unlike naturally occurring ones, undergo human-directed ``evolution'': a virus that is very successful at propagating itself but has a relatively benign payload might easily be modified to carry a much more destructive payload. This is certainly easier for a virus author than designing a new virus from scratch. Thus, the existence of the first virus is often statistically correlated with the later introduction of a more destructive variant. Furthermore, the new variant might be detected by the virus-detection software in the same way as the ancestral virus, i.e., the recognition is done on parts of the virus that are unchanged by the mutation. It sometimes makes sense to conservatively evaluate the destructive power of viruses by over-estimating the damage, to try to factor in this uncertainty. (Properly, this should be modelled in the risk evaluation as a hypothetical new virus that has a good probability of propagating at least as well as the original and greater -- but as yet unknown -- destructive capabilities, coupled with a higher-than-average probability of detection, which partially mitigates the potential for damage.)
Another thing to consider is how good the software is at avoiding false positives. A false positive occurs when the protection software thinks there is a virus but is wrong. This is another cost of using the software: whenever the detector goes off, the computer's users have to spend time determining that it was a false alarm, leading to a loss of productivity.
Now let's turn to a classic implementation-level vulnerability. Consider this function:

    int f(int j)
    {
        int i;
        char buf[128];
        ...
        gets(buf);
        ...
    }

See the problem here? The function gets() does not know how much storage the variable buf has, because buf is actually passed as a pointer, not as an array. So what can happen as a result of this?
To answer this, we have to know how the C/C++ stack works. It varies a bit by architecture and compiler, but in general the stack grows downward as more memory is pushed onto it. So the stack may look like this by the time we have gotten into gets():
Stack frame:

address | size   | item
--------+--------+------------------------------------
   1000 |      4 | argument j
    996 |      4 | return address of calling function
    992 |      4 | local variable i
    988 |    128 | local variable buf
    860 | (none) | (stack pointer)
Next time we will investigate how this setup can cause problems.
bsy+cse127w02@cs.ucsd.edu, last updated