The Methodology Behind “Average” Rate of Return

When someone quotes the rate of return on a specific investment, or quotes average returns on an index to support a theory that you can achieve similar results with your own investments, do you actually know what they are talking about? Furthermore, what is the probability that you will in fact end up with what everyone else is quoting as “average”? The old joking definition of Statistics is: generally everyone; specifically no one. And this seems pretty fitting at a novice skill level in probability and statistics. However, once we get a little more technical and employ a few more mathematical techniques, we start to see what purpose these numbers really serve. We'll explore some useful applications of these figures in this post.

What Does “Mean” Really Mean?

The mean, or average, is a calculated value that shows us the central tendency of a set of data (I'm being intentionally lazy here; if you have a PhD in Math, I'm fully aware that this isn't a complete and exhaustive explanation, but it works for our practical purposes). We derive this number by adding up the values in a set of data and dividing that total by the number of observations (the data points) we have. For example, the mean of the data set 5, 6, 7, 8, 9 is 7 (5 + 6 + 7 + 8 + 9 = 35, and 35 / 5 = 7). In more formal math settings, this is referred to as the Arithmetic Mean.
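For anyone who would rather let a computer do the adding, here is a minimal Python sketch of the same calculation. It uses only the standard library's `statistics` module, and the variable names are just for illustration.

```python
from statistics import mean

# The example data set from the text
data = [5, 6, 7, 8, 9]

# Arithmetic mean: add up the observations, then divide by how many there are
manual_mean = sum(data) / len(data)
print(manual_mean)   # 7.0

# The standard library does the same thing for you
print(mean(data))
```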

On its own, the mean is generally a pretty useless number, though a lot of people insist on obnoxiously quoting it as if it carries significant meaning. We've all heard that the average rate of return for the S&P 500 index is 10%. The problem (and perhaps the danger) with this sort of information is that the statistic's applicability is extremely limited, and it is often used to infer wildly incorrect things.

Does a 10% average rate of return on the S&P 500 index mean you can expect a 10% rate of return if you invest in a fund that mimics the performance of the index?

The Importance of Context

Like most everything in life, the mean of a set of data is one piece of a larger picture, and when it is combined with other pieces of information, it becomes much more useful to you. The single most important piece of information to have alongside a data set's mean is its standard deviation (at this point I'd like to note that most of the time people work with sample means and sample standard deviations, because knowing the entire population's mean or standard deviation is impossible or prohibitively costly; I'm not going to worry about making that distinction throughout this post). The standard deviation tells us how closely the data clusters around the mean (i.e., is it tightly distributed, or is there a lot of variation?). The standard deviation is critically important in building a probability distribution, which is very helpful in determining the probability that a certain value will be observed in a set of data. Instead of diving into an overly complicated discussion about the construction of a probability distribution and the difference between a cumulative distribution function and a probability density function, let's just understand the following:
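To make the standard deviation a little more concrete, here is a short Python sketch using the standard library's `statistics` module on the same toy data set from the mean example. It also shows the sample versus population distinction mentioned above: `stdev` divides by n - 1 (for a sample), while `pstdev` divides by n (for a full population). The data is purely illustrative, not market data.

```python
from statistics import mean, stdev, pstdev

# Same illustrative data set used for the mean example
data = [5, 6, 7, 8, 9]

avg = mean(data)

# Sample standard deviation (divides by n - 1): what you'd normally use
# when your data is only a sample of a larger population
sample_sd = stdev(data)

# Population standard deviation (divides by n): only appropriate when
# you have every observation in the population
population_sd = pstdev(data)

print(f"mean={avg}, sample sd={sample_sd:.4f}, population sd={population_sd:.4f}")
```

The sum of squared distances from the mean here is 10, so the sample variance is 10 / 4 = 2.5 and the population variance is 10 / 5 = 2; the standard deviations are their square roots.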

On a normally distributed set of data (one that follows the commonly recognized Bell Curve), roughly 95% of all data points lie within two standard deviations of the mean (on either side) and 99.7% of all data points lie within three standard deviations of the mean (again, on either side).
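If you'd like to check those percentages yourself, here is a minimal Python sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later). It measures the area under a standard normal curve within one, two, and three standard deviations of the mean, and shows that "95%" is really a rounded 95.45%.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
z = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Share of the bell curve lying within k standard deviations of the mean
    share = z.cdf(k) - z.cdf(-k)
    print(f"within {k} sd: {share:.2%}")
# within 1 sd: 68.27%
# within 2 sd: 95.45%
# within 3 sd: 99.73%
```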

So, when someone states that the average rate of return on the S&P 500 index is 10%, a helpful follow-up question is: what's the standard deviation? The answer happens to be 15%. Assuming annual returns are normally distributed, this would mean there's a 99.7% chance that any given year's return falls between -35% and 55% (10% minus and plus three standard deviations of 15%).
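The same arithmetic in a few lines of Python, as a sketch that takes the 10% mean and 15% standard deviation from the text and assumes, as the rule does, that annual returns follow a normal distribution. It prints the two-standard-deviation (about 95%) range alongside the three-standard-deviation (99.7%) range.

```python
# Figures quoted in the text: 10% average annual return, 15% standard deviation
mu, sigma = 0.10, 0.15

for k in (2, 3):
    # Mean minus and plus k standard deviations
    low, high = mu - k * sigma, mu + k * sigma
    print(f"{k} sd range: {low:.0%} to {high:.0%}")
# 2 sd range: -20% to 40%
# 3 sd range: -35% to 55%
```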

What's the Probability of that 10%?

Now let's equip you with a question that will be infinitely more helpful in staking out a decision given the statistics that are handed to you: what's the probability of getting a certain value? You might think the answer comes from evaluating the probability density function at a 10% rate of return based on what you know about the data (oh sure thing, just give me a second and I'll fire up MATLAB…). Ok, easier said than done, and in truth the density function is pretty boring and not particularly useful for our purposes: for a continuous distribution, the probability of hitting any one exact value is zero. The cumulative distribution function is way more helpful, because it tells you the probability of getting “at most” a certain value (and subtracting that from 1 gives you the probability of getting more). Now, here's the really good news: you don't need to perform a thorough calculation of this function to get an acceptable answer (the real answer is an integral of the probability density function; if you don't know what that means, don't worry about it, let's just leave it at the fact that it has something to do with Calculus). Instead, we can use a much simpler process that allows us to make predictions about what might happen given the parameters.
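For the curious, modern tools make the cumulative distribution function a one-liner, no MATLAB required. The Python sketch below assumes, as the text does, that annual S&P 500 returns are normal with a 10% mean and a 15% standard deviation, and uses the standard library's `NormalDist.cdf` to get the probability of a losing year and the probability of at least a 10% year.

```python
from statistics import NormalDist

# Assumed model from the text: normal distribution, mean 10%, sd 15%
returns = NormalDist(mu=0.10, sigma=0.15)

# The cumulative distribution function (cdf) gives P(return <= x)
p_loss = returns.cdf(0.0)               # chance of a negative year
p_at_least_10 = 1 - returns.cdf(0.10)   # "or more": subtract from 1

print(f"P(return < 0%):   {p_loss:.1%}")
print(f"P(return >= 10%): {p_at_least_10:.1%}")   # 50.0%
```

Because the normal curve is symmetric, the chance of landing at or above the mean is exactly one half; the chance of a losing year comes out to roughly one in four under these assumptions.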

If your head just exploded during the last paragraph, don't worry; the technical crap is done. I got it out of my system.

The Monte Carlo Simulation

Monte Carlo simulations sound impressive and complicated, but they aren't. The toughest part is randomizing the data, and any good number-crunching software can do it (I used MS Excel for the example I'm about to unleash on you). A Monte Carlo simulation takes the mean and standard deviation of a set of data, considers the distribution of the data (e.g. Normal, Poisson, Bernoulli, etc.), and runs through many scenarios of randomized data generated from those parameters. The results are then tabulated and a probability of “success” is calculated (e.g. the probability of getting at least a 10% rate of return). Monte Carlo simulations are often used in personal finance to calculate the probability of success for a given plan, and the most widely used scenario is the probability of withdrawing a certain percentage of one's savings, given an assumed rate of return, without running out of money during a certain time period. Our example here won't go that far. We're concerned instead with the probability of getting a certain rate of return on the S&P 500 given the distribution we understand it to have.
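To show the idea in miniature, the Python sketch below swaps Excel for the standard library's `random` module (any decent random number generator would do). It simulates many single years of returns under the assumed normal, 10% mean, 15% standard deviation model, counts how often a year is a "success" (here, a hypothetical threshold of any non-negative return), and compares that share with the exact answer from the cumulative distribution function. The trial count and seed are arbitrary choices.

```python
import random
from statistics import NormalDist

MEAN, SD = 0.10, 0.15     # assumed parameters from the text
TARGET = 0.0              # hypothetical "success": a year that doesn't lose money
TRIALS = 100_000          # arbitrary number of simulated years

rng = random.Random(42)   # fixed seed so the run is repeatable

# Simulate many single-year returns and count the successes
successes = sum(1 for _ in range(TRIALS) if rng.gauss(MEAN, SD) >= TARGET)
simulated = successes / TRIALS

# Exact answer from the cumulative distribution function, for comparison
exact = 1 - NormalDist(MEAN, SD).cdf(TARGET)

print(f"simulated: {simulated:.3f}  exact: {exact:.3f}")
```

With enough trials the simulated share settles close to the exact value, which is the whole point: counting random outcomes stands in for the calculus.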

10% Rate of Return: What Does That Mean, and What's the Likelihood We'll See It?

Running a Monte Carlo simulation of 1,000 randomized 30-year periods for the S&P 500, using the parameters above, we find that there's approximately a 50% probability of getting an average rate of return of at least 10% over the 30 years (nothing unusual here; the math works as it's supposed to). This elementary observation might shock people, though: the truth behind that highly touted 10% average is that there's only about a 50% chance of seeing at least a 10% average return on the S&P 500. However, we'd be truly remiss if we closed the book at this point, because when it comes to the money in your account, the Arithmetic Mean means almost nothing, though it's commonly quoted in a way that gets confused with actual rates of return.
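Here is one way to set up that experiment in Python, as a sketch rather than a reproduction of the original spreadsheet. It assumes each year's return is an independent draw from a normal distribution with a 10% mean and 15% standard deviation, builds 1,000 simulated 30-year periods, and reports the share whose arithmetic average return is at least 10%. Because the draws are random, the exact figure will wander a bit around 50% from seed to seed.

```python
import random
from statistics import mean

MEAN, SD = 0.10, 0.15   # assumed annual mean and standard deviation
YEARS = 30              # length of each simulated investment period
TRIALS = 1000           # number of simulated 30-year periods

rng = random.Random(2024)   # fixed seed for repeatability

def average_return_over_period(years: int) -> float:
    """Arithmetic mean of `years` randomly drawn annual returns."""
    return mean(rng.gauss(MEAN, SD) for _ in range(years))

averages = [average_return_over_period(YEARS) for _ in range(TRIALS)]

# Share of simulated periods whose average annual return was at least 10%
p_success = sum(a >= 0.10 for a in averages) / TRIALS
print(f"P(30-year average >= 10%): {p_success:.1%}")
```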

Arithmetic Mean, Meet Geometric Mean

Question: the S&P 500 drops 25% in the first year of your investment and then rises 50% the following year. What is your rate of return (on your money) if you are in a fund that exactly matches the S&P 500's return?

If we take the arithmetic mean we'll get the following answer:

(-25% + 50%) / 2 = 25% / 2 = 12.5%

In terms of the arithmetic mean, this is correct, but it's not the actual year-over-year rate of return (the kind we quote when we talk about inflation and most other investment performance). For that, we need a different mean: the Geometric Mean.

For the Geometric Mean we need a slightly more cumbersome formula:

(y/x)^(1/n) - 1

Where:

y = value at the end of the time period

x = value at the beginning of the time period

n = number of time periods

If we run the scenario above through this calculation, $100 falls to $75 in year one and climbs to $112.50 in year two, so (112.5/100)^(1/2) - 1 ≈ 6.07%. That roughly 6% is the actual year-over-year rate of return, commonly referred to as the Compound Annual Growth Rate (CAGR).
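Here is a small Python sketch that runs the two-year example both ways. It follows a hypothetical $100 balance through the -25% and +50% years, applies the (y/x)^(1/n) - 1 formula, and cross-checks the result with the standard library's `statistics.geometric_mean` applied to the yearly growth factors (0.75 and 1.5).

```python
from statistics import geometric_mean

annual_returns = [-0.25, 0.50]   # the two years from the example

# Arithmetic mean: simple average of the yearly percentages
arithmetic = sum(annual_returns) / len(annual_returns)

# Follow an actual balance through both years
start = 100.0
end = start
for r in annual_returns:
    end *= 1 + r                  # 100 -> 75 -> 112.5

n = len(annual_returns)
cagr = (end / start) ** (1 / n) - 1   # (y/x)^(1/n) - 1

# Same result as the geometric mean of the growth factors, minus 1
cagr_check = geometric_mean([1 + r for r in annual_returns]) - 1

print(f"arithmetic mean: {arithmetic:.2%}")   # 12.50%
print(f"CAGR:            {cagr:.2%}")         # 6.07%
```

The gap between 12.50% and 6.07% is the difference between the number people like to quote and the growth your balance actually experienced.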

And for those who enjoy a little multimedia explanation, here you go: