
Pseudo-Geometric Brownian Motion for Complex Financial Products in R
NOTE: You can watch the content of this post as a video at the bottom of the page!
Geometric Brownian Motion (GBM) is the standard workhorse for modelling most financial instruments that rely on some form of path dependency. While GBM is based on well-founded theory, one should never forget its original purpose – the modelling of particle movements that follow strictly normally distributed impulses. The basic formula is given by:
dS_t = \mu S_t \, dt + \sigma S_t \, dW_t
where the innovations are represented by the standard Wiener process W_t. What works well when modelling gases has some serious drawbacks in financial modelling. The problem is that a Wiener process has two very restrictive conditions:
a) innovations are normally distributed, with mean zero and variance t
b) innovations are independent
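As a side note on implementation (my own addition, but it is exactly the update rule the myGBM function further down uses): in discrete time the SDE above is approximated with a simple Euler step of size dt,
S_{t+dt} = S_t + \mu S_t \, dt + \sigma S_t \sqrt{dt} \, Z, \qquad Z \sim N(0, 1),
where Z is a standard normal draw.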
Now everyone who has had at least some exposure to financial market data knows that stock returns do not satisfy the first condition and sometimes not even the second. The general consensus is that stock returns are skewed, leptokurtic and have unevenly heavy tails. Although stock returns tend to converge closer to a normal distribution with decreasing frequency (i.e. monthly returns are more normal than daily returns), most scholars would agree that a t-distribution or a Cauchy-distribution would be a better fit for returns.
In practice, however, most people just go with the normal distribution when simulating Brownian Motion and accept that the resulting asset prices are not 100% accurate. I, on the other hand, am not content with such half-assed solutions and I will show in the following examples how costly it can be to trust the GBM blindly. I propose a pseudo-Brownian approach where the random innovations are sampled from a kernel density estimate of empirical returns rather than an assumed normal distribution. The benefit of this approach is that it produces results that are closer to what has been observed in the past without fully replicating the past (which would be the result of sampling random innovations directly from past innovations).
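To make the idea concrete before the full implementation: stripped of all housekeeping, the sampling step boils down to two lines of R (a minimal sketch only; rets stands for any vector of historical returns, and the complete simulation function follows further down).
KDE <- density(as.numeric(rets) - mean(rets)) # kernel density estimate of the demeaned returns
innov <- sample(KDE$x, size = 10, replace = TRUE, prob = KDE$y) # draw 10 innovations in proportion to the estimated density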
Introductory Example
Before we get to the fun part, where we show how much money is wasted in the market, we start off with a simple example. We will need to load three packages and their dependencies (you can download the R-Script for this post at the bottom of the page).
install.packages("quantmod")
require(quantmod)
For our first example, we will try to simulate the returns of AT&T. The following commands allow us to download price information from Yahoo Finance and compute monthly log returns.
att <- getSymbols("T", from = "1985-01-01", to = "2015-12-31", auto.assign = FALSE)
attr <- ts(periodReturn(att$T.Adjusted, period = "monthly", type = "log"), frequency = 12, start = c(1985, 02))
To hammer home the point I made in the beginning, we will compare the distribution of returns to the normal distribution.
plot(density(attr), main = "Distribution of AT&T Returns")
curve(dnorm(x, mean = mean(attr), sd = sd(attr)),
      col = "darkblue", lwd = 2, add = TRUE, yaxt = "n")
rug(jitter(attr))
Even without a master’s degree in art history, most people would agree that the two lines do not match. For people who do not rely on such visual methods, the trusty Kolmogorov-Smirnov test provides a more formal check.
set.seed(2013)
ks.test(attr, rnorm(n = length(attr), mean = mean(attr), sd = sd(attr)))
The test returns a p-value of 0.027, so at the conventional 5% level we reject the hypothesis that the two samples come from the same distribution (the smaller the p-value, the stronger the evidence that the distributions differ). Next we set up the standard GBM function. I am fully aware that various GBM functions exist as part of numerous packages. Nevertheless, I decided to create my own function just to make the inner workings more transparent.
myGBM <- function(x, mu, sigma, n, dt = 1) {
  N <- matrix(c(1:n), n)
  y <- matrix(0, n, length(n))
  for (i in N) {
    y[[i]] <- x + ((mu * dt * x) + # drift
      rnorm(1, mean = 0, sd = 1) * sqrt(dt) * sigma * x) # random innovation
    x <- y[[i]]
  }
  return(y)
}
In this simple function (I know there are more elegant ways to do this, but the results remain the same) the rnorm function acts as the Wiener process driver. Needless to say, this does not do justice to what we have seen above. In contrast, my pseudo-Brownian function samples the random innovations from a kernel density estimate of past empirical returns.
pseudoGBM <- function(x, rets, n, ...) {
  N <- matrix(c(1:n), n)
  y <- matrix(0, n, length(n))
  KDE <- density(as.numeric(rets) - mean(rets), ...) # kernel density estimate of demeaned returns
  samp <- sample(KDE$x, size = n, replace = TRUE, prob = KDE$y) # vector of simulated innovations
  for (i in N) {
    y[[i]] <- x + x * (mean(rets) + samp[i]) # drift (mean return) plus sampled innovation
    x <- y[[i]]
  }
  return(y)
}
Admittedly, this function is somewhat spartan, as it assumes static increments (i.e. dt = 1) and requires almost no input by the user. It only requires a starting value (x), a vector of past returns (rets) and the specified path length (n). The ... argument allows the user to pass additional arguments down to the density function. This gives the user control over the smoothness of the kernel density estimate by adding a bandwidth argument (bw = ). Without any further ado, let’s get started with simulations using the above functions. In this first example, we simulate only one price path with each function from a starting value x, i.e. the last price in the series.
x <- as.numeric(tail(att$T.Adjusted, n = 1))
set.seed(2013)
attGBM <- myGBM(x = x, mu = mean(attr), sigma = sd(attr), n = length(attr))
set.seed(2013)
attPGBM <- pseudoGBM(x = x, n = length(attr), rets = attr)
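Before comparing return distributions, it does not hurt to eyeball the two simulated price paths (a purely illustrative plot; the colours, labels and legend are my own choices, not part of the original analysis).
plot(as.numeric(attGBM), type = "l", col = "red", xlab = "Month", ylab = "Simulated price", main = "One Simulated Path: GBM vs. PGBM")
lines(as.numeric(attPGBM), col = "blue")
legend("topleft", legend = c("GBM", "PGBM"), col = c("red", "blue"), lty = 1)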
To see how well the two methods performed, we compute returns from the simulated series and compare their distributions to the empirical distribution.
attGBMr <- diff(log(attGBM))[-1]
attPGBMr <- diff(log(attPGBM))[-1]
d1 <- density(attr)
d2 <- density(attGBMr)
d3 <- density(attPGBMr)
plot(range(d1$x, d3$x), range(d1$y, d3$y), type = "n",
ylab = "Density", xlab = "Returns", main = "Comparison of Achieved Densities")
lines(d1, col = "black", lwd = 1)
lines(d2, col = "red", lty = 2)
lines(d3, col = "blue", lty = 3)
Clearly, we see that the PGBM function (blue line) outperforms the standard GBM function (red line) in producing returns that approximate the empirical distribution of returns (black line). Again, the critical (or visually inept) readers can look at the results of the K-S test.
ks.test(attr, attPGBMr)
ks.test(attr, attGBMr)
Again, we see that the PGBM function (p-value = 0.41) by far outperforms the GBM function (p-value = 0.02): the much higher p-value means we cannot reject the hypothesis that the simulated and empirical returns share the same distribution.
Advanced Example
As promised, our second example will showcase how much money is on the line when one wrongly assumes a normal distribution when it is not representative of the underlying data. Since it awoke from a financial dark age, Europe especially has shown a hunger for structured financial products that offer some participation in the stock market while limiting or eliminating downside risk. Such securities are usually path dependent and therefore are very often modeled using GBM.
We will use one specific product offered by Generali Germany – Rente Chance Plus – which was the initial reason I developed the PGBM function. When I was working in private banking, I was tasked with evaluating this specific security, started with a standard Monte Carlo simulation based on GBM, but quickly realized that this was not enough. Rente Chance Plus offers a 20% participation in the EUROSTOXX 50 index up to a cap of 15%, with no downside on either the initial investment or realized gains. The security is evaluated at the end of each year. Although Generali can change both the participation rate and the cap rate freely over the 20-year investment horizon, for the sake of the argument we will assume that these factors remain constant.
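To make the payoff mechanics concrete, here is a tiny helper function of my own (the name credited and its defaults are purely illustrative; the simulation code below applies the same logic in vectorised form): the annual index return is floored at zero, capped at 15%, and then credited at a 20% participation rate.
credited <- function(r, participation = 0.20, cap = 0.15) {
  pmin(pmax(r, 0), cap) * participation # floor at 0 (no downside), cap at 15%, then apply participation
}
credited(c(-0.10, 0.08, 0.25)) # 0.000, 0.016, 0.030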
Mirroring our procedure from above, we start by downloading EUROSTOXX 50 price information from Yahoo Finance and computing monthly returns.
eu <- getSymbols("^STOXX50E", from = "1990-01-01", to = "2015-12-31", auto.assign = FALSE)
eu.r <- ts(periodReturn(eu$STOXX50E.Adjusted, period = "monthly", type = "arithmetic"), frequency = 12, start = c(1990, 02))
Next we look at how well, or rather how badly, the data fits the normal distribution.
plot(density(eu.r), main = "Distribution of EUROSTOXX 50 Returns")
curve(dnorm(x, mean = mean(eu.r), sd = sd(eu.r)),
      col = "darkblue", lwd = 2, add = TRUE, yaxt = "n")
set.seed(2013)
ks.test(eu.r, rnorm(n = length(eu.r), mean = mean(eu.r), sd = sd(eu.r)))
From a strictly visual perspective, this looks even worse than the AT&T distribution. The EUROSTOXX returns are clearly negatively skewed and somewhat leptokurtic. The K-S test returns a p-value of 0.06, which is borderline at the conventional 5% level but consistent with the visual mismatch. Now that we have established that the normal distribution is not really the best fit, we can look at the consequences of wrongly assuming it. We will run a simulation with 10,000 iterations using both the standard GBM and my PGBM function and compare the results (if you are replicating the following code, please get yourself a cup of coffee while you wait; this will take some time to run).
x <- as.numeric(tail(eu$STOXX50E.Adjusted, n = 1))
set.seed(2013)
SIM1 <- as.data.frame(matrix(replicate(10000, {eu.GBM <- myGBM(x = x, mu = mean(eu.r), sigma = sd(eu.r), n = 240)}), ncol = 10000))
set.seed(2013)
SIM2 <- as.data.frame(matrix(replicate(10000, {eu.PGBM <- pseudoGBM(x = x, n = 240, rets = eu.r)}), ncol = 10000))
sim1 <- ts(as.matrix(rbind(rep(x, 10000), SIM1[seq(0, 240, 12), ])), start = c(2016), frequency = 1)
sim2 <- ts(as.matrix(rbind(rep(x, 10000), SIM2[seq(0, 240, 12), ])), start = c(2016), frequency = 1)
Naturally, we are not interested in the price levels of the EUROSTOXX 50, but rather in the returns evaluated under the constraints of the participation rate and the cap rate. The good news is that the hardest part is behind us. Calculating returns and applying the constraints is rather straightforward. Stomaching the results is not so easy.
s1.r <- diff(log(sim1))
s2.r <- diff(log(sim2))
s1.r[s1.r < 0] <- 0 # no downside: negative index returns are floored at zero
s2.r[s2.r < 0] <- 0
s1.r <- ifelse(s1.r < 0.15, s1.r * 0.2, 0.15 * 0.20) # 20% participation, capped at a 15% index return
s2.r <- ifelse(s2.r < 0.15, s2.r * 0.2, 0.15 * 0.20)
S1 <- colSums(s1.r) # cumulative credited return per simulated path
S2 <- colSums(s2.r)
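As an optional sanity check of my own, every credited annual return should now lie between 0 and 0.03 (20% of the 15% cap):
range(s1.r)
range(s2.r)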
plot(density(S1, bw = 0.0005), col = "red", main = "Comparison of Simulated Distributions")
lines(density(S2, bw = 0.0005), col = "blue")
rug(jitter(S1), side = 3, col = "darkred")
rug(jitter(S2), side = 1, col = "darkblue")
ks.test(S1, S2)
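As an optional supplement to the plot and the test, a quick quantile comparison (the quantile levels are an arbitrary choice of mine) puts figures on the difference between the two simulated distributions.
round(rbind(GBM = quantile(S1, c(0.05, 0.5, 0.95)), PGBM = quantile(S2, c(0.05, 0.5, 0.95))), 4)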
We can clearly see that the cumulative returns simulated by the PGBM function (blue) exhibit a negative skew and a wider range than the returns simulated by the standard GBM function (red). Please note that, due to the no-downside constraint of the security, the distributions do not look that different in the lower tail. The K-S test confirms with overwhelming certainty that the two distributions are different (although the small p-value is partly driven by the large sample size). Now to answer the million dollar question (quite literally, actually): how much money is on the line? Well, if Generali used the normal distribution to forecast returns and re-insured themselves accordingly, they would have…
mean(S1)-mean(S2)
…underestimated cumulative returns by about 0.6%. This might not seem like much, but if we assume a security volume of, say, EUR 1 billion, Generali falls short by EUR 6 million – quite a bunch of money for simply assuming the wrong distribution.
Conclusion and Limitations
So what do we learn from this? The distribution used to model innovations in any path-dependent security pricing model can have a material impact. While this statement alone seems obvious, the magnitude of the difference in distributions is astonishing. Naturally, there are probably smarter people than me working at Generali and other institutions who are well aware that the normal distribution is not always the best choice. However, most people will instead use a more formally adequate (but probably just as inaccurate) distribution like the t-distribution or the Cauchy distribution. Using kernel density estimates is a rather unheard-of method. And there are reasons for that.
First and foremost, there is no guarantee that the kernel density estimate is a more accurate representation of the unknown underlying distribution than the shunned normal distribution. Forecasting the future using past data always leaves a bad taste in any data scientist's mouth, but unfortunately we have no other choice. The normal distribution inherent in the standard GBM does, however, also rely on past information (i.e. the historic mean and standard deviation), but it has huge advantages when it comes to formalizing solutions, due to its central role (pun intended only in hindsight) in probability theory.
Second, kernel density estimates are very sensitive to the bandwidth used. If the bandwidth is too large, you will get a smooth distribution that is, however, not that different from a normal distribution. If the bandwidth is too small, you will get a distribution that puts a lot of emphasis on extreme values, especially if the data sample from which you estimate the kernel density is rather small. In the examples above, we used the automatic bandwidth selector inherent in the density function, but there is little to no way of knowing what the optimal bandwidth is.
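To see this sensitivity for yourself, you can overlay kernel density estimates of the AT&T returns from the first example under different bandwidths (the bandwidth values below are arbitrary and purely illustrative).
plot(density(attr), lwd = 2, main = "Bandwidth Sensitivity of the Kernel Density Estimate") # default bandwidth selector (bw.nrd0)
lines(density(attr, bw = 0.05), col = "red", lty = 2) # large bandwidth: over-smoothed, close to normal
lines(density(attr, bw = 0.005), col = "blue", lty = 3) # small bandwidth: spiky, chases individual observations
legend("topright", legend = c("default", "bw = 0.05", "bw = 0.005"), col = c("black", "red", "blue"), lty = 1:3)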
There are other limitations to the method presented above, as we have made numerous quite unrealistic assumptions. In the Generali example, we assumed that Generali does not change the participation rate and the cap rate, which is highly unlikely. More generally, however, we have made some fundamental assumptions about financial markets. Knowingly (hopefully), we assumed that capital markets are efficient and therefore that there is no autocorrelation in returns, which was the second condition for a Wiener process. But is this representative of the underlying data?
acf(eu.r, main = "Autocorrelation of EUROSTOXX 50 Returns")
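For readers who prefer a formal test over reading an ACF plot, a Ljung-Box test is a quick complementary check (the lag of 12 months is my own arbitrary choice).
Box.test(eu.r, lag = 12, type = "Ljung-Box")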
Lucky us! There appears to be no serial correlation in the return data, as one would expect in developed markets. However, if you are dealing with data that shows clear signs of serial correlation the Pseudo-Geometric Brownian Motion might fall short. The good news is that I am already working on a more sophisticated PGBM function that allows for serial correlation. So stay tuned for more to come.
If you liked this post, feel free to share it with your friends, colleagues, and pets. I always accept invites on LinkedIn (hit the button below), as long as you make clear that you found out about me through this website. If you were not quite content with the information above, feel free to leave a comment down below (you could also leave some praise, if you liked the content) and tell me where I could improve.