## Using Genetic Algorithms in Quantitative Trading

The question you should always ask yourself when using technical indicators is what an objective criterion for selecting indicator parameters would be (e.g., why use a 14-day RSI rather than a 15- or 20-day one?). Genetic algorithms (GA) are well suited to answering that question. In this post I’ll show you how to set up the problem in R. Before I proceed, the usual reminder: what I present in this post is just a toy example and not an invitation to invest. It’s not a finished strategy either, but a research idea that needs to be further researched, developed and tailored to individual needs.

**What are genetic algorithms?**

The best description of GA I have come across comes from Cybernetic Trading Strategies, a book by Murray A. Ruggiero: “Genetic Algorithms were invented by John Holland in the mid-1970s to solve hard optimisation problems. This method uses natural selection, survival of the fittest”. The general process follows the steps below:

1. Encode the problem into chromosomes.
2. Using the encoding, develop a fitness function for use in evaluating each chromosome’s value in solving a given problem.
3. Initialize a population of chromosomes.
4. Evaluate each chromosome in the population.
5. Create new chromosomes by mating two chromosomes. This is done by mutating and recombining two parents to form two children (parents are selected randomly but biased by their fitness).
6. Evaluate the new chromosome.
7. Delete a member of the population that is less fit than the new chromosome and insert the new chromosome in the population.
8. If the stopping criterion is reached (maximum number of generations, fitness is good enough…) then return the best chromosome; otherwise go to step 4.
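The steps above can be sketched in a few lines of base R. This is only a toy illustration, not the rgenoud algorithm: the objective function, the population size, the mutation scale and the fixed number of iterations are all assumptions made for the example, and for brevity each mating produces one child rather than two.

```r
# Toy steady-state GA in base R (illustrative assumptions throughout).
# Objective: maximise f(x) = -(x1 - 3)^2 - (x2 + 1)^2, whose optimum is c(3, -1).
set.seed(42)

fitness <- function(chrom) -(chrom[1] - 3)^2 - (chrom[2] + 1)^2

pop_size <- 30
n_genes  <- 2
lower    <- -10
upper    <- 10

# Encode and initialise: a chromosome is a numeric vector, one row per chromosome
pop <- matrix(runif(pop_size * n_genes, lower, upper), nrow = pop_size)

# Evaluate each chromosome in the population
fit <- apply(pop, 1, fitness)
initial_best <- max(fit)

# Stopping criterion: a fixed number of iterations
for (iter in seq_len(2000)) {
  # Select two parents at random, biased by fitness rank
  probs   <- rank(fit) / sum(rank(fit))
  parents <- sample(pop_size, 2, prob = probs)

  # Recombine (uniform crossover), then mutate (small Gaussian noise)
  mask  <- runif(n_genes) < 0.5
  child <- ifelse(mask, pop[parents[1], ], pop[parents[2], ])
  child <- pmin(pmax(child + rnorm(n_genes, sd = 0.1), lower), upper)

  # Evaluate the new chromosome
  child_fit <- fitness(child)

  # Replace the least fit member if the child is fitter
  worst <- which.min(fit)
  if (child_fit > fit[worst]) {
    pop[worst, ] <- child
    fit[worst]   <- child_fit
  }
}

best <- pop[which.max(fit), ]
print(best)
```

Because a child only ever replaces the current worst member, the best fitness in the population can never decrease from one iteration to the next.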

From a trading perspective GA are very useful because they are good at dealing with highly nonlinear problems. However they exhibit some nasty features that are worth mentioning:

- **Over fitting**: This is the main problem and it’s down to the analyst to set up the problem in a way that minimises this risk.
- **Computing time**: If the problem isn’t properly defined, it can take extremely long to reach a decent solution, and the complexity increases exponentially with the number of variables. Hence the necessity to carefully select the parameters.

There are several R packages dealing with GA. I chose to use the most common one: rgenoud.

**Data & experiment design**

The data are daily closing prices for the most liquid ETFs from Yahoo Finance, going back to January 2000. The in sample period goes from January 2000 to December 2010. The out of sample period starts in January 2011.

The logic is as follows: the fitness function is optimised over the in sample period to obtain a set of optimal parameters for the selected technical indicators. The performance of those indicators is then evaluated in the out of sample period. But before doing so, the technical indicators have to be selected.
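As a minimal sketch of that split, the snippet below cuts a price series at the start of 2011. The prices are synthetic (a random walk standing in for a downloaded ETF series), and the column names and `oosData` are assumptions made for the illustration.

```r
# Synthetic daily closes standing in for real ETF data (assumption).
set.seed(1)
dates <- seq(as.Date("2000-01-03"), as.Date("2012-12-31"), by = "day")
# Keep Monday to Friday only; "%u" is the ISO weekday number (1 = Monday)
dates <- dates[!format(dates, "%u") %in% c("6", "7")]

prices <- data.frame(
  date  = dates,
  close = 100 * exp(cumsum(rnorm(length(dates), mean = 0, sd = 0.01)))
)

# In sample: January 2000 to December 2010; out of sample: from January 2011
splitDate <- as.Date("2011-01-01")
isData    <- prices[prices$date <  splitDate, ]
oosData   <- prices[prices$date >= splitDate, ]

print(range(isData$date))
print(range(oosData$date))
```

The optimiser should only ever see `isData`; `oosData` is kept aside until the parameters are fixed.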

The equity market exhibits two main characteristics that are familiar to anyone with some trading experience: long-term momentum and short-term reversal. Those features can be translated into technical indicators as a moving average crossover and RSI. This gives a set of 4 parameters: the look-back periods for the long- and short-term moving averages, the look-back period for RSI and the RSI threshold. The sets of parameters are the **chromosomes**. The other key element is the **fitness function**. We might want to use something like maximum return, maximum Sharpe ratio or minimum average drawdown. In what follows, I chose to maximise the Sharpe ratio.
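To make the chromosome and the fitness function concrete, here is a hedged base R sketch. It is not the code from the Gist: the entry rule (long when the short moving average is above the long one and RSI is below the threshold), the simple-average RSI variant, the 252-day annualisation, the synthetic prices and all function names are assumptions chosen for illustration.

```r
# Simple moving average over the last n observations (NA until n are available)
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

# RSI based on simple averages of gains and losses (an assumed variant)
rsi <- function(price, n) {
  delta    <- c(NA, diff(price))
  avg_gain <- sma(pmax(delta, 0), n)
  avg_loss <- sma(pmax(-delta, 0), n)
  100 * avg_gain / (avg_gain + avg_loss)
}

# Chromosome: c(RSI look-back, RSI threshold, short MA, long MA)
strategy_returns <- function(par, price) {
  signal <- as.numeric(rsi(price, par[1]) < par[2] &
                       sma(price, par[3]) > sma(price, par[4]))
  signal[is.na(signal)] <- 0               # no position until indicators exist
  position <- c(0, head(signal, -1))       # act on the next bar, no look-ahead
  position * c(0, diff(log(price)))
}

# Annualised Sharpe ratio with a zero risk-free rate (assumption)
sharpe_ratio <- function(ret, periods_per_year = 252) {
  s <- sd(ret)
  if (is.na(s) || s == 0) return(0)
  sqrt(periods_per_year) * mean(ret) / s
}

fitness_function <- function(par, price) {
  sharpe_ratio(strategy_returns(par, price))
}

# Usage on a synthetic random-walk price series
set.seed(1)
price <- 100 * exp(cumsum(rnorm(1000, mean = 0.0003, sd = 0.01)))
fit   <- fitness_function(c(14, 60, 20, 50), price)
print(fit)
```

Shifting the signal by one bar before multiplying by returns is a deliberate choice here: it prevents the strategy from trading on information it could not have had at the time.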

The R implementation is a set of 3 functions:

- **fitnessFunction**: defines the fitness function (e.g., maximum Sharpe ratio) to be used within the GA engine
- **tradingStatistics**: summary of trading statistics for the in and out of sample periods for comparison purposes
- **genoud**: the GA engine from the rgenoud package

The genoud function is rather complex but I’m not going to explain what each parameter means as I want to keep this post short (and the documentation is really good).
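For orientation only, a call to `genoud` might look like the sketch below. The toy fitness function and the parameter bounds are hypothetical stand-ins, not the values used in this post, and the call only runs if rgenoud is installed.

```r
# Hypothetical stand-in for the trading fitness: peaks at c(14, 60, 20, 50)
toy_fitness <- function(par) -sum((par - c(14, 60, 20, 50))^2)

# One row per parameter: lower bound, upper bound (illustrative values)
domains <- matrix(c( 5,  50,    # RSI look-back
                    50,  90,    # RSI threshold
                     5,  50,    # short-term moving average
                    51, 200),   # long-term moving average
                  ncol = 2, byrow = TRUE)

if (requireNamespace("rgenoud", quietly = TRUE)) {
  optimum <- rgenoud::genoud(
    toy_fitness,
    nvars                 = 4,        # four genes per chromosome
    max                   = TRUE,     # maximise rather than minimise
    pop.size              = 30,
    max.generations       = 50,
    wait.generations      = 10,
    hard.generation.limit = TRUE,
    Domains               = domains,
    boundary.enforcement  = 2,        # never evaluate outside the bounds
    data.type.int         = TRUE,     # look-back periods are integers
    print.level           = 0
  )
  print(optimum$par)    # best chromosome found
  print(optimum$value)  # fitness at that chromosome
}
```

Keeping the short and long moving average ranges disjoint in `domains` is one simple way to guarantee that the long look-back is always the larger of the two.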

**Results**

In the table below I present for each instrument the optimal parameters (RSI look-back period, RSI threshold, Short Term Moving Average, and Long Term Moving Average) along with the in and out of sample trading statistics.

| Instrument / Parameters | In Sample | Out Of Sample |
|---|---|---|
| SPY c(31,62,32,76) | Total return = 14.4%; Number of trades = 60; Hit ratio = 60% | Total return = 2.3%; Number of trades = 8; Hit ratio = 50% |
| EFA c(37,60,36,127) | Total return = 27.6%; Number of trades = 107; Hit ratio = 57% | Total return = 2.5%; Number of trades = 11; Hit ratio = 64% |
| EEM c(44,55,28,90) | Total return = 39.1%; Number of trades = 85; Hit ratio = 58% | Total return = 1.0%; Number of trades = 17; Hit ratio = 53% |
| EWJ c(44,55,28,90) | Total return = 15.7%; Number of trades = 93; Hit ratio = 54% | Total return = -13.1%; Number of trades = 31; Hit ratio = 45% |

Before commenting on the above results, I want to explain a few important points. To match the logic defined above, I bounded the parameters to make sure the look-back period for the long-term moving average is always longer than that of the short-term moving average. I also constrained the optimiser to choose only solutions with more than 50 trades in the in sample period (for statistical significance).
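One common way to impose a minimum number of trades is to penalise the fitness of any chromosome that falls short. The sketch below illustrates that idea; the function names, the 0/1 position encoding and the penalty value are assumptions, and the post's own code may enforce the constraint differently.

```r
# Count trades as the number of entries (0 -> 1 transitions) in a 0/1 position vector
count_trades <- function(position) sum(diff(c(0, position)) > 0)

# Return a heavily penalised fitness unless there are more than min_trades trades
penalised_fitness <- function(raw_fitness, position,
                              min_trades = 50, penalty = -1e6) {
  if (count_trades(position) <= min_trades) return(penalty)
  raw_fitness
}

# Usage: three separate entries in this toy position vector
position <- c(0, 1, 1, 0, 1, 0, 0, 1)
print(count_trades(position))                              # 3
print(penalised_fitness(1.2, position, min_trades = 2))    # 1.2
print(penalised_fitness(1.2, position))                    # -1e+06
```

When maximising, a large negative penalty makes such chromosomes lose every comparison, so they are quickly dropped from the population.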

Overall the out of sample results are far from impressive. The returns are low, although the number of trades is too small to make the outcome really significant. There is, however, a significant loss of efficiency between the in and out of sample periods for Japan (EWJ), which very likely means over fitting.

**Conclusion**

This post is intended to give the reader the tools to properly use GA in a quantitative trading framework. Once again, it’s just an example that needs to be further refined. A few potential improvements to explore would be:

- **fitness function**: maximising the Sharpe ratio is very simplistic. A “smarter” function would certainly improve the out of sample trading statistics.
- **pattern**: we try to capture a very straightforward pattern. More in-depth pattern research is definitely needed.
- **optimisation**: there are many ways to improve the way the optimisation is conducted. This would improve both the computation speed and the rationality of the results.

The code used in this post is available on a Gist repository.

As usual, any comments are welcome.

The problem with using such approaches: the chromosomes can’t change the Darwinian rules of their futures, while the Banksters who crash economies (with some regularity) do change the rules to benefit themselves. Taleb is mostly right, in that policy changes (e.g. liar loans) drive the data, not the other way around.

Thank you for the post. Do you have an example of a smarter function to try?

thanks

Hi,

Thank you for reaching out. The choice of the fitness function is really up to you and it depends on what you are trying to achieve. It might be a minimum return, stability of return, minimum drawdown, minimum correlation with other strategies, etc. I don’t think there are any specific rules to follow. However, some functions might be biased. For example, if you try to minimise drawdown you’ll very likely end up with no trades (max DD = 0) if you don’t constrain the optimiser to a minimum number of trades.
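A small sketch of that bias, with a hypothetical `max_drawdown` helper and made-up return series: a strategy that never trades has a drawdown of exactly zero, so an unconstrained minimiser will prefer it to anything that takes risk.

```r
# Maximum drawdown of a series of simple returns, as a fraction of the running peak
max_drawdown <- function(ret) {
  equity <- cumprod(1 + ret)
  peak   <- cummax(c(1, equity))[-1]   # running peak, starting capital of 1
  max(1 - equity / peak)
}

no_trade <- rep(0, 250)                 # never in the market
trading  <- c(0.10, -0.20, 0.05)        # illustrative returns

print(max_drawdown(no_trade))           # 0: unbeatable for a drawdown minimiser
print(max_drawdown(trading))            # 0.2 (up to floating point)
```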

Hope this helps

Robert, great comment! Policies will forever drive data, just as data will drive policies. It is a vice-versa world in the financial industry…even so much more for traders, investors and bankers.

Hi,

How would you go about selecting trading rules rather than optimizing trading parameters, as in the Allen paper (Using genetic algorithms to find technical trading rules)?

Hi Danton,

Thank you for reaching out and sorry for the late answer.

I just use common sense in the selection of parameters and, above all, I tend to use the same set of parameters as much as possible across all instruments traded.

Best,

The R Trader

Hi, R trader

I am a Chinese reader and really appreciate the article “Using Genetic Algorithms in Quantitative Trading”.

May I translate this article into Chinese and post it on my blog? (I will let you know the link and keep your name on it.) I really hope that this article can help more people.

Thank you.

Best Regards,

Yichen, CFA

Hi Yichen,

Thank you for reaching out.

As long as you clearly mention the source (me) and put a link to the original article in the translated article, I have no problem with the post being translated into Chinese.

Best,

The R Trader

Thank you very much for posting this! For the posBuySignal, why is RSI evaluated as being less than or equal to 1 – a number? Wouldn’t 1 – xx[2] always result in a negative number? Isn’t the RSI always between 0-100?

posBuySignal <- which(isData[,"rsi"] isData[,”smal”]) + 1

Thank you!

Jim

Apologies, the code was cut off.

posBuySignal <- which(isData[,"rsi"] isData[,”smal”]) + 1

Hi,

You have just provided a valuable source for me. I have used the algorithm written by you and obtained the following results:

NOTE: HARD MAXIMUM GENERATION LIMIT HIT

Solution Fitness Value: 5.033179e+00

Parameters at the Solution:

X[ 1] : 3.100000e+01

X[ 2] : 6.200000e+01

X[ 3] : 3.200000e+01

X[ 4] : 7.600000e+01

Solution Found Generation 38

Number of Generations Run 50

Sun Jan 24 12:57:50 2016

Total run time : 0 hours 0 minutes and 25 seconds

Warning messages:

1: In genoud(fitnessFunction, nvars = 4, max = TRUE, pop.size = 30, :

‘output.path’ can no longer be changed. Please use ‘sink’. Option is only provided for backward compatibility of the API.

2: In genoud(fitnessFunction, nvars = 4, max = TRUE, pop.size = 30, :

Stopped because hard maximum generation limit was hit.

>

> solution solution

[1] 31 62 32 76

Could you please explain how to interpret the solution? My objective is to predict the future value; how can that be read from this output?

Hi,

Thank you for reaching out. I’m not sure I understand your question but I’ll give it a try.

Your optimal solution is:

X[1]: 31

X[2]: 62

X[3]: 32

X[4]: 76

which matches a fitness function value of: 5.03

Another important point to notice is that you reached the maximum number of iterations. You can easily relax this assumption should you want to explore different solutions.

Besides this, I can’t see exactly what your problem is. More details about your fitness function and what you’re trying to achieve might help.

HTH

Hi,

My objective is to predict the future value: if I give input up to 27th Jan 2016, I should get a prediction for the next date, i.e. the 28th. Can we do this with the methodology you have used?

Please send your email address to lakshmitharunponnam@gmail.com; this would be a great help for me.

Hi,

Yes you can. You only have to adjust the frequency of your data. My example used weekly data but you can use the exact same methodology with daily data. Obviously the variables to use will probably have to be adjusted as well.

HTH

Arnaud

Hi Arnaud,

Thank you for the wonderful post. I have been so excited to go through your code. I am a newbie. Could you kindly let me know how I can relate the readings, viz.,

> optimum

$value

[1] 5.033179

$par

[1] 31 62 32 76

to the “SPY” Open, High, Low and Close predicted value.

Thank you

Sam