When testing trading strategies, a common approach is to divide the initial data set into **in sample** data, the part used to calibrate the model, and **out of sample** data, the part used to validate the calibration and check that the performance obtained in sample is likely to carry over to the real world. As a rule of thumb, around 70% of the initial data is used for calibration (i.e. in sample) and 30% for validation (i.e. out of sample). Comparing the in and out of sample results then helps to decide whether the model is robust enough. This post goes a step further and provides a statistical method to decide whether the out of sample data is in line with what was obtained in sample.
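As a minimal sketch of the 70/30 rule of thumb, the split can be done chronologically on a vector of daily P&L. The data here is simulated and the names `pnl`, `isData` and `osData` are illustrative assumptions, not taken from a real strategy.

```r
# Simulated daily P&L, purely illustrative (assumption: not real strategy data)
set.seed(42)
pnl <- rnorm(1000, mean = 0.05, sd = 1)

# Chronological split: first 70% in sample, remaining 30% out of sample
splitIdx <- round(0.7 * length(pnl))
isData <- pnl[seq_len(splitIdx)]
osData <- pnl[(splitIdx + 1):length(pnl)]

length(isData)
length(osData)
```

The split is kept chronological rather than random so that the out of sample period always comes after the calibration period, as it would in live trading.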

In the chart below the blue area represents the out of sample performance for one of my strategies.

A simple visual inspection reveals a good fit between the in and out of sample performance, but what degree of confidence do I have in this? At this stage not much, and this is the issue. What is truly needed is a measure of similarity between the in and out of sample data sets. In statistical terms this can be translated as the likelihood that the in and out of sample performance figures come from the same distribution. There is a non-parametric statistical test that does exactly this: the **Kruskal-Wallis Test**. A good definition of this test can be found on R-Tutor: “A collection of data samples are independent if they come from unrelated populations and the samples do not affect each other. Using the Kruskal-Wallis Test, we can decide whether the population distributions are identical without assuming them to follow the normal distribution.” The added benefit of this test is that it does not assume a normal distribution.
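To get a feel for how the test behaves, here is a small sketch using `kruskal.test` from base R on simulated data. The samples and their names (`isData`, `osSame`, `osShifted`) are assumptions made up for illustration: one out of sample set is drawn from the same distribution as the in sample set, the other has a shifted mean.

```r
set.seed(1)

# Simulated in sample returns (illustrative only)
isData <- rnorm(700)

# Two hypothetical out of sample sets:
osSame    <- rnorm(300)              # same distribution as isData
osShifted <- rnorm(300, mean = 0.5)  # shifted location

# Null hypothesis: both samples come from the same distribution
pSame    <- kruskal.test(list(isData, osSame))$p.value
pShifted <- kruskal.test(list(isData, osShifted))$p.value

pSame
pShifted
```

A small p-value, as expected for the shifted sample, means the null hypothesis of identical distributions is rejected. A large p-value only means the test found no evidence of a difference.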

There are other tests of the same nature that could fit into this framework. The **Mann-Whitney-Wilcoxon** test or the **Kolmogorov-Smirnov** test would suit the framework described here perfectly well; however, it is beyond the scope of this article to discuss the pros and cons of each of these tests. A good description along with R examples can be found here.
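For reference, all three tests are available in base R as `kruskal.test`, `wilcox.test` and `ks.test`. The sketch below runs them side by side on simulated data in which the two samples share the same centre but have different dispersion; the data and variable names are assumptions chosen for illustration only.

```r
set.seed(2)

# Simulated samples: same centre, different dispersion (illustrative only)
isData <- rnorm(700)
osData <- rnorm(300, sd = 2)

pv <- c(
  kruskal = kruskal.test(list(isData, osData))$p.value,
  wilcox  = wilcox.test(isData, osData)$p.value,  # Mann-Whitney-Wilcoxon
  ks      = ks.test(isData, osData)$p.value       # two-sample Kolmogorov-Smirnov
)
pv
```

With only two groups the Kruskal-Wallis and Mann-Whitney-Wilcoxon tests are closely related rank tests, while the Kolmogorov-Smirnov test compares the whole empirical distribution functions and therefore also reacts to a change in dispersion like the one simulated here.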

Here’s the code used to generate the chart above and the analysis:

```r
################################################
## Making the most of the OOS data
##
## thertrader@gmail.com - Aug. 2016
################################################
library(xts)
library(PerformanceAnalytics)

thePath <- "myPath" # change this
theFile <- "data.csv"
data <- read.csv(file.path(thePath, theFile), header = TRUE, sep = ",")
data <- xts(data[, 2], order.by = as.Date(as.character(data[, 1]), format = "%d/%m/%Y"))

##----- Strategy's Chart
par(mex = 0.8, cex = 1)
thePeriod <- c("2012-02/2016-05")
chart.TimeSeries(cumsum(data),
                 main = "System 1",
                 ylab = "",
                 period.areas = thePeriod,
                 grid.color = "lightgray",
                 period.color = "slategray1")

##----- In and out of sample split
## Not defined in the original listing. Assumption: the shaded period
## is the out of sample data and everything else is in sample.
osData <- as.numeric(data[thePeriod])
isData <- as.numeric(data[!(index(data) %in% index(data[thePeriod]))])

##----- Kruskal tests
nSim <- 1000
pValue <- numeric(nSim)
for (i in seq_len(nSim)) {
  # random in sample subset with the same length as the out of sample data
  isSample <- sample(isData, length(osData))
  pValue[i] <- kruskal.test(list(osData, isSample))$p.value
}

##----- Mean of p-values
mean(pValue)
```

In the example above the in sample period is longer than the out of sample period, therefore I randomly created 1000 subsets of the in sample data, each of them having the same length as the out of sample data. Then I tested each in sample subset against the out of sample data and recorded the p-values. This process creates not a single p-value for the Kruskal-Wallis test but a distribution, making the analysis more robust. In this example the mean of the p-values is 0.478, well above conventional significance levels such as 0.05, so the null hypothesis cannot be rejected: the test finds no evidence that the in and out of sample data come from different distributions.
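Since the resampling yields a whole distribution of p-values, it can be worth looking beyond the mean. The sketch below repeats the procedure on simulated data and summarises the result; the data, the 5% threshold and the names `nSim` and `rejectShare` are assumptions for illustration, not part of the original analysis.

```r
set.seed(123)

# Simulated in and out of sample returns (illustrative only)
isData <- rnorm(1000)
osData <- rnorm(250)

# One Kruskal-Wallis p-value per random in sample subset
nSim <- 1000
pValue <- replicate(nSim, {
  isSample <- sample(isData, length(osData))
  kruskal.test(list(osData, isSample))$p.value
})

summary(pValue)                       # mean, median, quartiles
quantile(pValue, c(0.05, 0.5, 0.95))  # spread of the p-values
rejectShare <- mean(pValue < 0.05)    # share of subsets rejecting at 5%
rejectShare
```

A high share of rejections would suggest that the out of sample data looks different from most in sample subsets, even if the mean p-value alone appears comfortable.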

As usual, what is presented in this post is a toy example that only scratches the surface of the problem and should be tailored to individual needs. However, I think it proposes an interesting and rational statistical framework to evaluate out of sample results.

This post is inspired by the following two papers: