Archive for the ‘Portfolio Management’ Category.

BERT: a newcomer in the R Excel connection

A few months ago a reader pointed me to this new way of connecting R and Excel. I don’t know how long this has been around, but I had never come across it and had never seen any blog post or article about it. So I decided to write a post, as the tool is really worth it, and before anyone asks, I’m not related to the company in any way.

BERT stands for Basic Excel R Toolkit. It’s free (licensed under the GPL v2)  and it has been developed by Structured Data LLC. At the time of writing the current version of BERT is 1.07. More information can be found here. From a more technical perspective, BERT is designed to support running R functions from Excel spreadsheet cells. In Excel terms, it’s for writing User-Defined Functions (UDFs) in R.

In this post I’m not going to show you how R and Excel interact via BERT. There are very good tutorials  here, here and here. Instead I want to show you how I used BERT to build a “control tower” for my trading.

How do I use BERT?

My trading signals are generated using a long list of R files but I need the flexibility of Excel to display results quickly and efficiently. As shown above, BERT can do this for me, but I also want to tailor the application to my needs. By combining the power of XML, VBA, R and BERT I can create a good-looking yet powerful application in the form of an Excel file with minimal VBA code. Ultimately I have a single Excel file gathering all the necessary tasks to manage my portfolio: database update, signal generation, order submission etc. My approach can be broken down into the three steps below:

  1. Use XML to build user defined menus and buttons  in an Excel file.
  2. The above menus and buttons are essentially calls to VBA functions.
  3. Those VBA functions are wrappers around R functions defined using BERT.

With this approach I can keep a clear distinction between the core of my code, kept in R, SQL and Python, and everything used to display and format results, kept in Excel, VBA & XML. In the next sections I present the prerequisites to develop such an approach and a step by step guide that explains how BERT can be used to simply pass data from R to Excel with minimal VBA code.

Prerequisite

1 – Download and install BERT from this link. Once the installation has completed you should have a new Add-Ins menu in Excel with the buttons shown below. This is how BERT materializes in Excel.

[Figure: the BERT buttons in the Excel Add-Ins menu]

2 – Download and install the Custom UI Editor: the Custom UI Editor allows you to create user-defined menus and buttons in the Excel ribbon. A step by step procedure is available here.

Step by step guide

1 – R Code: The below R function is a very simple piece of code for illustration purposes only. It calculates and returns the residuals from a linear regression. This is what we want to retrieve in Excel. Save it in a file called myRCode.R (any other name is fine) in a directory of your choice.

myFunction <- function(){
 aa <- rnorm(200)
 bb <- rnorm(200)
 res <- lm(aa~bb)$res
 return(res)
}

2 – functions.R in BERT: From Excel select Add-Ins -> Home Directory and open the file called functions.R. In this file paste the following code. Make sure you insert the correct path.

source("D:\\myPath\\myRCode.R")

This simply sources into BERT the R file you created above. Then save and close the file functions.R. Should you want to make any change to the R file created in step 1, you will have to reload it using the BERT button “Reload Startup File” from the Add-Ins menu in Excel.

3 – In Excel: Create and save a file called myFile.xlsm (any other name is fine). This is a macro-enabled file that you save in the directory of your choice. Once the file is saved, close it.

4 – Open the file created above in Custom UI editor: Once the file is open, paste the below code.

<customUI xmlns="http://schemas.microsoft.com/office/2009/07/customui">
 <ribbon startFromScratch="false">
 <tabs>
 <tab id="RTrader" label="RTrader">
 <group id="myGroup" label="My Group">
 <button id="button1" label="New Button" size="large" onAction="myRCode" imageMso="Chart3DColumnChart" />
 </group> 
 </tab>
 </tabs>
 </ribbon>
</customUI>

You should have something like this in the XML editor:

[Figure: the XML code above as it appears in the Custom UI Editor]

Essentially this piece of XML code creates an additional tab (RTrader), a new group (My Group) and a user-defined button (New Button) in the Excel ribbon. Once you’re done, save and close the Custom UI Editor, then open myFile.xlsm in Excel. You should see something like this.

[Figure: the RTrader tab and New Button in the Excel ribbon]

5 – Open VBA editor: In myFile.xlsm insert a new module. Paste the code below in the newly created module.

Sub myRCode(control As IRibbonControl)
   Dim a As Variant
   Dim theLength As Integer
   ' Clear any previous results from column B
   ThisWorkbook.Sheets("Sheet1").Range("B1:B10000").ClearContents
   ' Call the R function myFunction via BERT
   a = Application.Run("BERT.Call", "myFunction")
   ' Paste the returned array into column B
   theLength = UBound(a, 1) + 1
   ThisWorkbook.Sheets("Sheet1").Range("B1:B" & theLength).Value = a
End Sub

This erases previous results in the worksheet prior to copying new ones.

6 – Click New Button: Now go back to the spreadsheet and, in the RTrader tab, click the “New Button” button. You should see something like the picture below.

[Figure: the regression residuals returned by myFunction, pasted into column B]

You’re done!

The guide above is a very basic version of what can be achieved with BERT but it shows how to combine the power of several specific tools to build your own custom application. From my perspective, the interest of such an approach is obviously the ability to glue together R and Excel, but also to include, via XML (and batch files), pieces of code from Python, SQL and more. This is exactly what I needed. Finally, I would be curious to know whether anyone has any experience with BERT.

Trading strategy: Making the most of the out of sample data

When testing trading strategies, a common approach is to divide the initial data set into in sample data (the part of the data designed to calibrate the model) and out of sample data (the part of the data used to validate the calibration and ensure that the performance created in sample will be reflected in the real world). As a rule of thumb, around 70% of the initial data can be used for calibration (i.e. in sample) and 30% for validation (i.e. out of sample). A comparison of the in and out of sample data then helps to decide whether the model is robust enough. This post aims at going a step further and provides a statistical method to decide whether the out of sample data is in line with what was created in sample.
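
As an illustration, here is a minimal sketch of such a 70/30 split on a daily P&L series. The returns object and the cut-off are hypothetical and should be adapted to your own data.

library(xts)

# Hypothetical daily P&L series for illustration only
returns <- xts(rnorm(1000, 0.0002, 0.01), order.by = Sys.Date() - 1000:1)

# 70% of the observations for calibration (in sample), the rest for validation (out of sample)
cutOff <- floor(0.7 * nrow(returns))
isData <- returns[1:cutOff]
osData <- returns[(cutOff + 1):nrow(returns)]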

In the chart below the blue area represents the out of sample performance for one of my strategies.

[Figure: System 1 cumulative performance; the shaded area is the out of sample period]

A simple visual inspection reveals a good fit between the in and out of sample performance, but what degree of confidence do I have in this? At this stage not much, and this is the issue. What is truly needed is a measure of similarity between the in and out of sample data sets. In statistical terms this could be translated as the likelihood that the in and out of sample performance figures come from the same distribution. There is a non-parametric statistical test that does exactly this: the Kruskal-Wallis test. A good definition of this test can be found on R-Tutor: “A collection of data samples are independent if they come from unrelated populations and the samples do not affect each other. Using the Kruskal-Wallis Test, we can decide whether the population distributions are identical without assuming them to follow the normal distribution.” The added benefit of this test is that it does not assume a normal distribution.

Other tests of the same nature could fit into this framework. The Mann-Whitney-Wilcoxon test or the Kolmogorov-Smirnov test would suit the framework described here perfectly, but discussing the pros and cons of each of these tests is beyond the scope of this article. A good description along with R examples can be found here.
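
For completeness, here is a minimal sketch of both alternatives using the base R functions wilcox.test and ks.test; the two samples are hypothetical.

# Two hypothetical return samples for illustration only
isSample <- rnorm(500, 0.0002, 0.01)
osSample <- rnorm(200, 0.0002, 0.01)

wilcox.test(isSample, osSample)   # Mann-Whitney-Wilcoxon test
ks.test(isSample, osSample)       # Kolmogorov-Smirnov test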

Here’s the code used to generate the chart above and the analysis:

################################################
## Making the most of the OOS data
##
## thertrader@gmail.com - Aug. 2016
################################################
library(xts)
library(PerformanceAnalytics)

thePath <- "myPath" #change this
theFile <- "data.csv" 
data <- read.csv(paste0(thePath,theFile),header=TRUE,sep=",")
data <- xts(data[,2],order.by=as.Date(as.character(data[,1]),format = "%d/%m/%Y"))

##----- Strategy's Chart
par(mex=0.8,cex=1)
thePeriod <- c("2012-02/2016-05")
chart.TimeSeries(cumsum(data),
 main = "System 1",
 ylab="",
 period.areas = thePeriod,
 grid.color = "lightgray",
 period.color = "slategray1")

##----- In and out of sample data
## Assumption: the out of sample window is the shaded period above (thePeriod),
## the in sample data is everything before it
osData <- as.numeric(data[thePeriod])
isData <- as.numeric(data["/2012-01"])

##----- Kruskal tests on 1000 random in sample subsets
pValue <- NULL
for (i in 1:1000){
 isSample <- sample(isData,length(osData))
 pValue <- rbind(pValue,kruskal.test(list(osData,isSample))$p.value)
}

##----- Mean of p-values 
mean(pValue)

In the example above the in sample period is longer than the out of sample period, therefore I randomly created 1000 subsets of the in sample data, each of them having the same length as the out of sample data. Then I tested each in sample subset against the out of sample data and recorded the p-values. This process creates not a single p-value for the Kruskal-Wallis test but a distribution, making the analysis more robust. In this example the mean of the p-values is well above zero (0.478), indicating that the null hypothesis cannot be rejected: there is strong evidence that the in and out of sample data come from the same distribution.
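
Beyond the mean, the whole distribution of p-values can be inspected. A quick sketch, assuming the pValue vector produced by the loop above:

# Summary and distribution of the simulated p-values
summary(pValue)
quantile(pValue, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
hist(pValue, breaks = 50, main = "Kruskal-Wallis p-values", xlab = "p-value")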

As usual what is presented in this post is a toy example that only scratches the surface of the problem and should be tailored to individual needs. However I think it proposes an interesting and rational statistical framework to evaluate out of sample results.

This post is inspired by the following two papers:

Vigier Alexandre, Chmil Swann (2007), “Effects of Various Optimization Functions on the Out of Sample Performance of Genetically Evolved Trading Strategies”, Forecasting Financial Markets Conference

Vigier Alexandre, Chmil Swann (2010), “An optimization process to improve in/out of sample consistency, a Stock Market case”, JP Morgan Cazenove Equity Quantitative Conference, London, October 2010

 

Introducing fidlr: FInancial Data LoadeR

fidlr is an RStudio addin designed to simplify the financial data downloading process from various providers. This initial version is a wrapper around the getSymbols function in the quantmod package and only Yahoo, Google, FRED and Oanda are supported. I will probably add functionalities over time. As usual with those things just a kind reminder: “THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND…”
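
For reference, here is a minimal sketch of the kind of getSymbols call that fidlr wraps under the hood; the symbols and dates are only examples.

library(quantmod)

# Download a couple of symbols from Yahoo into the global environment
getSymbols(c("SPY", "QQQ"), src = "yahoo",
           from = "2015-01-01", to = Sys.Date(), auto.assign = TRUE)

# Or keep a single instrument in a variable instead of assigning it globally
spy <- getSymbols("SPY", src = "yahoo", from = "2015-01-01", auto.assign = FALSE)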

How to install and use fidlr?

  1. You can get the addin/package from its Github repository here (I will register it on CRAN later on).
  2. Install the addin. There is an excellent tutorial to install RStudio Addins here.
  3. Once the addin is installed it should appear in the Addins menu. Just choose fidlr in the menu and a window as pictured below should appear.
  4. Choose a data provider from the Source dropdown menu.
  5. Select a date range from the Date menu.
  6. Enter the symbol you wish to download in the Instrument text box. To download several symbols just enter the symbols separated by commas.
  7. Use the radio buttons to choose whether you want to download the instrument to a csv file or to the global environment. The csv file will be saved in the working directory and there will be one csv file per instrument.
  8. Press Run to get the data or Close to close down the addin.

[Figure: the fidlr addin window]

Error messages and warnings are handled by the underlying packages (quantmod and Shiny) and can be read from the console.

This is a very first version of the project so do not expect perfection, but hopefully it will get better over time. Please report any comments, suggestions, bugs etc. to: thertrader@gmail.com

Enjoy!

The Rise of the Robots (Advisors…)

The Asset Management industry is on the verge of a major change. Over the last couple of years Robo-Advisors (RA) have emerged as new players. The term itself is hard to define as it encompasses a large variety of services. Some are designed to help traditional advisers to better allocate their clients’ money and some are real “black boxes”. The user enters a few criteria (age, income, children etc.) and the robot proposes a tailor-made allocation. Between those two extremes a full range of offers is available. I found the Wikipedia definition pretty good: “They are a class of financial adviser that provides portfolio management online with minimal human intervention”. More precisely, they use algorithm-based portfolio management to offer the full spectrum of services a traditional adviser would offer: dividend reinvesting, compliance reports, portfolio rebalancing, tax loss harvesting etc. (well, this is what the quantitative investment community has been doing for decades!). The industry is still in its infancy, with most players still managing a small amount of money, but I only realised how profound the change was when I was in NYC a few days ago. When RA get their names on TV ads or on the roofs of NYC cabs, you know something big is happening…

[Photo: robo-advisor advertising spotted in NYC]

It is getting more and more attention from the media and, above all, it makes a lot of sense from an investor’s perspective. There are actually two main advantages to using RA:

  • Significantly lower fees than traditional advisers
  • Investment is made more transparent and simpler, which is more appealing to people with limited financial knowledge

In this post R is just an excuse to present nicely what is a major trend in the asset management industry. The chart below shows the market shares of the most popular RA as of the end of 2014. The code used to generate it can be found at the end of this post and the data is here.

[Figure: robo-advisor market shares as of end 2014]

Those figures are a bit dated given how fast this industry evolves but they are still very informative. Not surprisingly, the market is dominated by US providers like Wealthfront and Betterment, but RA are emerging all over the world: Asia (8Now!), Switzerland (InvestGlass), France (Marie Quantier)… This is starting to significantly affect the way traditional asset managers do business. A prominent example is the partnership between Fidelity and Betterment. Since December 2014, Betterment has passed the $2 billion AUM mark.

Despite all the above, I think the real change is ahead of us. Because they use fewer intermediaries and low-commission products (like ETFs), they charge much lower fees than traditional advisers. RA will certainly gain significant market share, but they will also lower the fees charged by the industry as a whole. Ultimately this will affect the way traditional investment firms do business. Active portfolio management, which has been having a tough time for some years now, will suffer even more. The high fees it charges will be even harder to justify unless it reinvents itself. Another potential impact is the rise of ETFs and low-commission financial products in general. Obviously this started a while ago but I think the effect will be even more pronounced in the coming years. New generations of ETFs track more complex indices and custom-made strategies. This trend will inevitably get stronger.

As usual, any comments are welcome.

#######################################################
# The Rise of the Robots (Advisors...)
#
# thertrader@gmail.com - August 2015
#######################################################
library(ggplot2)
library(stringr)
library(XML)

#######################################################
# STEP 1: Data gathering
#######################################################
tables <- readHTMLTable("http://investorhome.com/robos.htm")
a <- as.data.frame(tables)
b <- a[,c(1:3)]
colnames(b) <- c("company","aum","type")
c <- b[which(b[,"type"] == "Roboadvisor"),]
d <- b[which(b[,"type"] == "RoboAdvisor"),]
e <- rbind(c,d)
f <- as.numeric(str_replace_all(e[,2], "[[:punct:]]",""))
g <- cbind(e[,c("company","type")],f)
colnames(g) <- c("company","type","aum")

#######################################################
# STEP 2: Chart
#######################################################
globalAum <- 19000000000
h <- cbind(g,g[,"aum"]/globalAum)
colnames(h) <- c("company","type","aum","marketShare")
i <- cbind("Others","Roboadvisor",globalAum-sum(h[,"aum"]),1-sum(h[,"aum"]/globalAum))
colnames(i) <- c("company","type","aum","marketShare")
j <- rbind(h,i)
k <- j[order(j[,"marketShare"],decreasing =TRUE),]
k[,"marketShare"] <- round(100*as.numeric(k[,"marketShare"]),1)

# Bar plot with ggplot2
ggplot(data=k, aes(x=company, y=marketShare, fill=company)) +
 geom_bar(stat="identity") +
 theme(axis.ticks = element_blank(), axis.text.y = element_blank(), text = element_text(size = 20)) +
 coord_flip() +
 labs(title = "Robots Advisors \n Market Shares as of end 2014") +
 ylab("Market Share (%)") +
 xlab(" ") +
 scale_fill_brewer(palette="Spectral")

Factor Evaluation in Quantitative Portfolio Management

When it comes to managing a portfolio of stocks versus a benchmark, the problem is very different from defining an absolute return strategy. In the former one has to hold more stocks than in the latter, where no stocks at all can be held if there is no good enough opportunity. The reason for that is the tracking error, defined as the standard deviation of the portfolio return minus the benchmark return. The fewer stocks held versus the benchmark, the higher the tracking error (i.e. the higher the risk).
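
As a reminder, here is a minimal sketch of the tracking error computation on hypothetical monthly return series; the figures are made up for illustration only.

# Hypothetical monthly returns for illustration only
portfolioRtn <- rnorm(60, 0.008, 0.04)
benchmarkRtn <- rnorm(60, 0.007, 0.035)

# Tracking error: standard deviation of the active return, annualised here
activeRtn <- portfolioRtn - benchmarkRtn
trackingError <- sd(activeRtn) * sqrt(12)
trackingError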

The analysis that follows is largely inspired by the book  “Active Portfolio Management” by Grinold & Kahn. This is the bible for anyone interested in running a portfolio against a benchmark. I strongly encourage anyone with an interest in the topic to read the book from the beginning to the end.  It’s very well written and lays the foundations of systematic active portfolio management (I have no affiliation to the editor or the authors).

1 – Factor Analysis

Here we’re trying to rank as accurately as possible the stocks in the investment universe on a forward return basis. Many people have come up with many tools, and countless variants of those tools have been developed to achieve this. In this post I focus on two simple and widely used metrics: Information Coefficient (IC) and Quantiles Return (QR).

1.1 – Information Coefficient

The IC gives an overview of the factor’s forecasting ability. More precisely, it is a measure of how well the factor ranks the stocks on a forward return basis. The IC is defined as the rank correlation (ρ) between the metric (e.g. factor) and the forward return. In statistical terms the rank correlation is a nonparametric measure of dependence between two variables. For a sample of size n, the n raw scores $X_i$, $Y_i$ are converted to ranks $x_i$, $y_i$, and ρ is computed as: $\rho = 1 - \frac{6 \sum_i (x_i - y_i)^2}{n(n^2 - 1)}$
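
A minimal sketch showing that the formula above matches R’s built-in Spearman correlation; the factorValue and fwdReturn vectors are hypothetical (and, being continuous, tie-free).

# Hypothetical factor values and forward returns
factorValue <- rnorm(100)
fwdReturn <- 0.1 * factorValue + rnorm(100)

# Built-in Spearman rank correlation
ic <- cor(factorValue, fwdReturn, method = "spearman")

# Same quantity from the formula, using ranks
d <- rank(factorValue) - rank(fwdReturn)
n <- length(factorValue)
icManual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

c(ic, icManual)   # identical in the absence of ties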

The horizon for the forward return has to be defined by the analyst and it’s a function of  the strategy’s turnover and the alpha decay (this has been the subject of extensive research). Obviously ICs must be as high as possible in absolute terms.

For the keen reader, the book by Grinold & Kahn gives a formula linking the Information Ratio (IR) and the IC: $IR = IC \times \sqrt{breadth}$, with breadth being the number of independent bets (trades). This formula is known as the fundamental law of active management. For example, an IC of 0.05 applied to 100 independent bets per year gives an IR of 0.5. The problem is that defining breadth accurately is often not as easy as it sounds.

1.2 – Quantiles Return

In order to have a more accurate estimate of the factor’s predictive power it’s necessary to go a step further and group stocks by quantile of factor values, then analyse the average forward return (or any other central tendency metric) of each of those quantiles. The usefulness of this tool is straightforward. A factor can have a good IC but its predictive power might be limited to a small number of stocks. This is not good, as a portfolio manager will have to pick stocks within the entire universe in order to meet the tracking error constraint. A good quantiles return profile is characterised by a monotonic relationship between the individual quantiles and forward returns.
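
Here is a minimal sketch of the idea on hypothetical data: bucket the factor into quintiles and look at the median forward return within each bucket.

# Hypothetical factor values and forward returns for illustration only
factorValue <- rnorm(1000)
fwdReturn <- 0.01 * factorValue + rnorm(1000, 0, 0.05)

# Group observations into quintiles of the factor
bucket <- cut(factorValue,
              breaks = quantile(factorValue, probs = seq(0, 1, 0.2)),
              labels = paste0("Q", 1:5),
              include.lowest = TRUE)

# Median forward return per quintile: ideally a monotonic pattern from Q1 to Q5
tapply(fwdReturn, bucket, median)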

2 – Data and code

The investment universe consists of all the stocks in the S&P500 index (at the time of writing). Obviously there is a survivorship bias: the list of stocks in the index has changed significantly between the start and the end of the sample period; however, it’s good enough for illustration purposes.

The code below downloads individual stock prices in the S&P500 between Jan 2005 and today (it takes a while) and turns the raw prices into returns over the last 12 months and over the last month. The former is our factor, the latter will be used as the forward return measure.

#####################################################################
# Factor Evaluation in Quantitative Portfolio Management
#
# thertrader@gmail.com - Mar. 2015
#####################################################################
library(tseries)
library(quantmod)
library(XML)

startDate <- "2005-01-01"
tables <- readHTMLTable("http://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
tickers <- as.matrix(tables[[1]]["Ticker symbol"])

instrumentRtn <- function(instrument, startDate, lag){
 price <- get.hist.quote(instrument, quote="Adj", start=startDate, retclass="zoo")
 monthlyPrice <- aggregate(price, as.yearmon, tail, 1)
 monthlyReturn <- diff(log(monthlyPrice),lag=lag)
 monthlyReturn <- exp(monthlyReturn)-1
 return(monthlyReturn)
}

dataFactor <- list()
dataRtn <- list()

for (i in 1:length(tickers)) {
 print(tickers[i])
 dataFactor[[i]] <- instrumentRtn(tickers[i],startDate,lag=12)
 dataRtn[[i]] <- instrumentRtn(tickers[i],startDate,lag=1)
}

Below is the code to compute the Information Coefficient and Quantiles Return. Note that I used quintiles in this example but any other grouping method (terciles, deciles etc.) can be used. It really depends on the sample size, what you want to capture and whether you want a broad overview or a focus on distribution tails. For estimating returns within each quintile, the median has been used as the central tendency estimator. This measure is much less sensitive to outliers than the arithmetic mean.

theDates <- as.yearmon(seq(as.Date(startDate), to=Sys.Date(), by="month"))

findDateValue <- function(x=x,theDate=theDate){
 pos <- match(as.yearmon(theDate),index(x))
 return(x[pos])
}

factorStats <- NULL

for (i in 1:(length(theDates)-1)){
 factorValue <- unlist(lapply(dataFactor,findDateValue,theDate=as.yearmon(theDates[i])))
 if (length(which(!is.na(factorValue))) > 10){
 print(theDates[i])
 bucket <- cut(factorValue,breaks=quantile(factorValue,probs=seq(0,1,0.2),na.rm=TRUE),labels=c(1:5),include.lowest = TRUE)
 rtnValue <- unlist(lapply(dataRtn,findDateValue,theDate=as.yearmon(theDates[i+1])))

 ##IC
 ic <- cor(factorValue,rtnValue,method="spearman",use="pairwise.complete.obs")

 ##QS
 quantilesRtn <- NULL

 for (j in sort(unique(bucket))){
   pos <- which(bucket == j)
   quantilesRtn <- cbind(quantilesRtn,median(rtnValue[pos],na.rm=TRUE))
 }
 factorStats <- rbind(factorStats,cbind(quantilesRtn,ic))
 }
}

colnames(factorStats) <- c("Q1","Q2","Q3","Q4","Q5","IC")

qs <- apply(factorStats[,c("Q1","Q2","Q3","Q4","Q5")],2,median,na.rm=TRUE)
ic <- round(median(factorStats[,"IC"],na.rm=TRUE),4)

And finally the code to produce the Quantiles Return chart.


par(cex=0.8,mex=0.8)
bplot <- barplot(qs,
  border=NA,
  col="royalblue",
  ylim=c(0,max(qs)+0.005),
  main="S&P500 Universe \n 12 Months Momentum Return - IC and QS")

abline(h=0)

legend("topleft",
   paste("Information Coefficient = ",ic,sep=""),
   bty = "n")

[Figure: 12-month momentum quantiles return (QS) and Information Coefficient (IC), S&P500 universe]

3 – How to exploit the information above?

In the chart above, Q1 is the quintile with the lowest past 12 months return and Q5 the one with the highest. There is an almost monotonic increase in the quantiles return between Q1 and Q5, which clearly indicates that stocks falling into Q5 outperform those falling into Q1 by about 1% per month. This is very significant and powerful for such a simple factor (not really a surprise though…). Therefore there are greater chances of beating the index by overweighting the stocks falling into Q5 and underweighting those falling into Q1 relative to the benchmark.

An IC of 0.0206 might not mean a great deal in itself, but it’s significantly different from 0 and indicates a good predictive power of the past 12 months return overall. Formal significance tests can be run, but a detailed discussion is beyond the scope of this article; a quick sketch is given below.
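
As an illustration, here is a minimal sketch of one such test, assuming the factorStats matrix built above: a one-sample t-test of whether the mean monthly IC differs from zero (with the usual caveats about autocorrelation and non-normality of the IC series).

# Is the average monthly IC significantly different from zero?
icSeries <- na.omit(factorStats[,"IC"])
t.test(icSeries, mu = 0)

# Rough t-statistic computed by hand for comparison
mean(icSeries) / (sd(icSeries) / sqrt(length(icSeries)))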

4 – Practical limitations

The above framework is excellent for evaluating an investment factor’s quality; however, there are a number of practical limitations that have to be addressed for real-life implementation:

  • Rebalancing: In the description above, it’s assumed that at the end of each month the portfolio is fully rebalanced. This means all stocks falling in Q1 are underweighted and all stocks falling in Q5 are overweighted relative to the benchmark. This is not always possible for practical reasons: some stocks might be excluded from the investment universe, there are constraints on industry or sector weights, there are constraints on turnover etc.
  • Transaction costs: These have not been taken into account in the analysis above and they are a serious brake on real-life implementation. Turnover considerations are usually implemented in real life in the form of a penalty on factor quality.
  • Transfer coefficient: This is an extension of the fundamental law of active management and it relaxes the assumption of Grinold’s model that managers face no constraints which preclude them from translating their investment insights directly into portfolio bets.

And finally, I’m amazed by what can be achieved in less than 80 lines of code with R…

As usual, any comments are welcome.