Managing a portfolio of stocks against a benchmark is a very different problem from defining an absolute return strategy. In the former one has to hold more stocks than in the latter, where no stocks at all need be held if there is no good enough opportunity. The reason is the **tracking error**, defined as the standard deviation of the portfolio return minus the benchmark return. The fewer stocks held relative to the benchmark, the higher the tracking error (i.e. the higher the risk relative to the benchmark).
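As a minimal sketch of that definition, the tracking error can be computed in R from two return series. The monthly figures below are made up purely for illustration, and the annualisation by the square root of 12 is a common convention that assumes roughly independent monthly active returns.

```r
# Hypothetical monthly returns, invented for illustration only
portfolioRtn <- c(0.021, -0.013, 0.034, 0.008, -0.022, 0.015)
benchmarkRtn <- c(0.018, -0.010, 0.029, 0.011, -0.025, 0.012)

# Active return: portfolio return minus benchmark return
activeRtn <- portfolioRtn - benchmarkRtn

# Tracking error: standard deviation of the active return (monthly here)
trackingError <- sd(activeRtn)

# Annualised figure (assumption: square-root-of-time scaling, 12 months)
trackingErrorAnn <- trackingError * sqrt(12)

print(c(monthly = trackingError, annualised = trackingErrorAnn))
```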

The analysis that follows is largely inspired by the book “Active Portfolio Management” by Grinold & Kahn. This is the bible for anyone interested in running a portfolio against a benchmark. I strongly encourage anyone with an interest in the topic to read the book from beginning to end. It’s very well written and lays the foundations of systematic active portfolio management (I have no affiliation with the publisher or the authors).

**1 – Factor Analysis**

Here we’re trying to rank the stocks in the investment universe as accurately as possible on a forward return basis. Many tools, and countless variants of those tools, have been developed to achieve this. In this post I focus on two simple and widely used metrics: **Information Coefficient** (IC) and **Quantiles Return** (QR).

**1.1 – Information Coefficient**

The IC gives an overview of the factor’s forecasting ability. More precisely, it measures how well the factor ranks the stocks on a forward return basis. The IC is defined as the rank correlation (*ρ*) between the metric (i.e. the factor) and the forward return. In statistical terms the rank correlation is a nonparametric measure of dependence between two variables. For a sample of size *n*, the *n* raw scores $X_i$, $Y_i$ are converted to ranks $x_i$, $y_i$, and *ρ* is computed from:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i = x_i - y_i$ is the difference between the two ranks of observation *i* (this form assumes there are no tied ranks).
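To make the definition concrete, here is a small sketch on simulated data (the variable names and the numbers are assumptions for illustration, not part of the analysis below). It computes the rank correlation with `cor(..., method = "spearman")` and checks it against the textbook no-ties expression based on squared rank differences, 1 - 6 Σd² / (n(n² - 1)).

```r
set.seed(42)

# Simulated cross-section: a factor weakly related to the forward return
factorValue <- rnorm(50)
fwdRtn <- 0.1 * factorValue + rnorm(50)

# Built-in Spearman rank correlation
icBuiltIn <- cor(factorValue, fwdRtn, method = "spearman")

# Same quantity from rank differences (valid here because there are no ties)
n <- length(factorValue)
d <- rank(factorValue) - rank(fwdRtn)
icManual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(builtIn = icBuiltIn, manual = icManual))
```

With continuous simulated data there are no ties, so the two values agree; with tied ranks only the `cor` version should be relied upon.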

The horizon for the forward return has to be defined by the analyst and it’s a function of the strategy’s turnover and the alpha decay (this has been the subject of extensive research). Obviously ICs must be as high as possible in absolute terms.

For the keen reader, the book by Grinold & Kahn gives a formula linking the Information Ratio (IR) and the IC: $IR = IC \times \sqrt{\text{breadth}}$, with breadth being the number of independent bets (trades). This formula is known as **the fundamental law of active management**. The problem is that defining breadth accurately is often not as easy as it sounds.
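A quick numerical illustration of the fundamental law as it is usually stated, IR = IC × √breadth. The helper name `fundamentalLawIR` and the inputs are hypothetical; in particular, treating every stock-month as an independent bet is an optimistic assumption.

```r
# Hypothetical helper: Information Ratio implied by the fundamental law
fundamentalLawIR <- function(ic, breadth) {
  ic * sqrt(breadth)
}

# An IC of 0.05 applied to 100 independent bets gives an IR of 0.5
irSmall <- fundamentalLawIR(ic = 0.05, breadth = 100)

# The same skill spread over four times as many bets doubles the IR
irLarge <- fundamentalLawIR(ic = 0.05, breadth = 400)

print(c(irSmall = irSmall, irLarge = irLarge))
```

The square root is what makes breadth matter: a modest IC becomes valuable only when it can be applied many times independently.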

**1.2 – Quantiles Return**

To get a more accurate estimate of the factor’s predictive power it’s necessary to go a step further: group stocks by quantile of factor values, then analyse the average forward return (or any other central tendency metric) of each of those quantiles. The usefulness of this tool is straightforward. A factor can have a good IC while its predictive power is limited to a small number of stocks. This is not good, as a portfolio manager has to pick stocks within the entire universe in order to meet their tracking error constraint. Good quantiles returns are characterised by a monotonic relationship between the individual quantiles and forward returns.
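The grouping step can be sketched on a single simulated cross-section as follows. The data, the variable names and the strict-increase check are assumptions made for illustration; the full analysis on real data comes later in the post.

```r
set.seed(1)

# Simulated cross-section of 200 stocks
factorValue <- rnorm(200)
fwdRtn <- 0.02 * factorValue + rnorm(200, sd = 0.05)

# Assign each stock to a quintile of the factor (Q1 = lowest values)
bucket <- cut(factorValue,
              breaks = quantile(factorValue, probs = seq(0, 1, 0.2)),
              labels = paste0("Q", 1:5),
              include.lowest = TRUE)

# Median forward return within each quintile
quantilesRtn <- tapply(fwdRtn, bucket, median)

# One simple (strict) way to check for a monotonic pattern from Q1 to Q5
isMonotonic <- all(diff(as.numeric(quantilesRtn)) > 0)

print(quantilesRtn)
print(isMonotonic)
```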

**2 – Data and code**

The universe consists of all the stocks in the S&P500 index (at the time of writing). Obviously there is a survivorship bias: the list of stocks in the index has changed significantly between the start and the end of the sample period. However, it’s good enough for illustration purposes.

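The code below downloads individual stock prices in the S&P500 between Jan 2005 and today (it takes a while) and turns the raw prices into returns over the last 12 months and over the last month. The former is our factor; the latter is used as the forward return measure. In the evaluation code further down, the factor observed at the end of a given month is paired with the one-month return of the following month, so the two windows do not overlap.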

```r
#####################################################################
# Factor Evaluation in Quantitative Portfolio Management
#
# thertrader@gmail.com - Mar. 2015
#####################################################################
library(tseries)
library(quantmod)
library(XML)

startDate <- "2005-01-01"

# Scrape the list of S&P500 constituents from Wikipedia
tables <- readHTMLTable("http://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
tickers <- as.matrix(tables[[1]]["Ticker symbol"])

# Download adjusted prices for one instrument and compute its simple
# return over `lag` months, sampled on the last price of each month
instrumentRtn <- function(instrument, startDate, lag) {
  price <- get.hist.quote(instrument, quote = "Adj", start = startDate, retclass = "zoo")
  monthlyPrice <- aggregate(price, as.yearmon, tail, 1)
  monthlyReturn <- diff(log(monthlyPrice), lag = lag)
  monthlyReturn <- exp(monthlyReturn) - 1
  return(monthlyReturn)
}

dataFactor <- list()
dataRtn <- list()

for (i in 1:length(tickers)) {
  print(tickers[i])
  dataFactor[[i]] <- instrumentRtn(tickers[i], startDate, lag = 12)  # 12-month return (factor)
  dataRtn[[i]] <- instrumentRtn(tickers[i], startDate, lag = 1)      # 1-month return
}
```

Below is the code to compute Information Coefficient and Quantiles Return. Note that I used quintiles in this example but any other grouping method (terciles, deciles etc…) can be used. It really depends on the sample size, what you want to capture and whether you want to have a broad overview or focus on distribution tails. For estimating returns within each quintile, the median has been used as the central tendency estimator. This measure is much less sensitive to outliers than the arithmetic mean.

```r
# Monthly evaluation dates from the start of the sample to today
theDates <- as.yearmon(seq(as.Date(startDate), to = Sys.Date(), by = "month"))

# Look up the value of series x for a given month
findDateValue <- function(x, theDate) {
  pos <- match(as.yearmon(theDate), index(x))
  return(x[pos])
}

factorStats <- NULL

for (i in 1:(length(theDates) - 1)) {
  # Factor: 12-month return observed at month i
  factorValue <- unlist(lapply(dataFactor, findDateValue, theDate = as.yearmon(theDates[i])))
  if (length(which(!is.na(factorValue))) > 10) {
    print(theDates[i])
    bucket <- cut(factorValue,
                  breaks = quantile(factorValue, probs = seq(0, 1, 0.2), na.rm = TRUE),
                  labels = c(1:5),
                  include.lowest = TRUE)
    # Forward return: 1-month return observed at month i + 1
    rtnValue <- unlist(lapply(dataRtn, findDateValue, theDate = as.yearmon(theDates[i + 1])))

    ## IC
    ic <- cor(factorValue, rtnValue, method = "spearman", use = "pairwise.complete.obs")

    ## QS
    quantilesRtn <- NULL
    for (j in sort(unique(bucket))) {
      pos <- which(bucket == j)
      quantilesRtn <- cbind(quantilesRtn, median(rtnValue[pos], na.rm = TRUE))
    }
    factorStats <- rbind(factorStats, cbind(quantilesRtn, ic))
  }
}

colnames(factorStats) <- c("Q1", "Q2", "Q3", "Q4", "Q5", "IC")

# Summarise across months with the median
qs <- apply(factorStats[, c("Q1", "Q2", "Q3", "Q4", "Q5")], 2, median, na.rm = TRUE)
ic <- round(median(factorStats[, "IC"], na.rm = TRUE), 4)
```

And finally the code to produce the Quantiles Return chart.

```r
par(cex = 0.8, mex = 0.8)
bplot <- barplot(qs,
                 border = NA,
                 col = "royal blue",
                 ylim = c(0, max(qs) + 0.005),
                 main = "S&P500 Universe \n 12 Months Momentum Return - IC and QS")
abline(h = 0)
legend("topleft",
       paste("Information Coefficient = ", ic, sep = ""),
       bty = "n")
```

**3 – How to exploit the information above?**

In the chart above Q1 is the lowest past 12 months return and Q5 the highest. There is an almost monotonic increase in the quantiles return between Q1 and Q5, and stocks falling into Q5 outperform those falling into Q1 by about 1% per month. This is very significant and powerful for such a simple factor (not really a surprise though…). Therefore there are greater chances to beat the index by overweighting the stocks falling into Q5 and underweighting those falling into Q1 relative to the benchmark.

An IC of 0.0206 might not mean a great deal in itself but it’s significantly different from 0 and indicates good overall predictive power of the past 12 months return. Formal significance tests can be carried out but this is beyond the scope of this article.
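For readers who want a starting point, one simple option is a one-sample t-test of the monthly ICs against zero. The sketch below runs on a simulated IC series (the numbers are invented); with the objects built earlier in the post the equivalent call would be `t.test(factorStats[, "IC"], mu = 0)`. It assumes the monthly ICs are roughly independent, which may not hold in practice.

```r
set.seed(123)

# Simulated series of 110 monthly ICs, for illustration only
icSeries <- rnorm(110, mean = 0.02, sd = 0.15)

# One-sample t-test of the null hypothesis that the mean IC is 0
icTest <- t.test(icSeries, mu = 0)

# The t statistic is the mean IC divided by its standard error
tManual <- mean(icSeries) / (sd(icSeries) / sqrt(length(icSeries)))

print(icTest$statistic)
print(icTest$p.value)
```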

**4 – Practical limitations**

The above framework is excellent for evaluating the quality of an investment factor. However, there are a number of practical limitations that have to be addressed for real life implementation:

**Rebalancing**: In the description above, it’s assumed that at the end of each month the portfolio is fully rebalanced. This means all stocks falling in Q1 are underweight and all stocks falling in Q5 are overweight relative to the benchmark. This is not always possible for practical reasons: some stocks might be excluded from the investment universe, there are constraints on industry or sector weight, there are constraints on turnover etc…

**Transaction Costs**: These have not been taken into account in the analysis above and they are a serious obstacle to real life implementation. Turnover considerations are usually implemented in real life in the form of a penalty on factor quality.

**Transfer coefficient**: This is an extension of the fundamental law of active management. It relaxes the assumption of Grinold’s model that managers face no constraints which preclude them from translating their investment insights directly into portfolio bets.
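To give a feel for the last point: the extended law is commonly written as IR ≈ TC × IC × √breadth, where TC is the transfer coefficient (a form usually attributed to Clarke, de Silva and Thorley). The sketch below is illustrative only; the function name `constrainedIR` and the input values are hypothetical.

```r
# Hypothetical helper: fundamental law scaled by a transfer coefficient.
# tc = 1 corresponds to the unconstrained case.
constrainedIR <- function(ic, breadth, tc = 1) {
  tc * ic * sqrt(breadth)
}

# Same skill and breadth, with and without constraints
irUnconstrained <- constrainedIR(ic = 0.05, breadth = 100)
irConstrained <- constrainedIR(ic = 0.05, breadth = 100, tc = 0.6)

print(c(unconstrained = irUnconstrained, constrained = irConstrained))
```

In this toy case the constraints remove 40% of the Information Ratio that the forecasts could otherwise deliver.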

And finally, I’m amazed by what can be achieved in less than 80 lines of code with R…

As usual any comments welcome

Nice article!

You mention a significance test could be performed. Any insight to how this test could be carried out?

Thanks!

Hi David,

You could start with testing whether the IC is significantly different from 0. A T-Test would do the job.

Hope this helps

Hi; I tried running your code. It certainly takes a while but does complete the first section.

I had difficulty with the second section, particularly:

factorStats <- xts(factorStats,order.by=theDates[1:(length(theDates)-1)])

which returned an error:

Error in xts(factorStats, order.by = theDates[1:(length(theDates) - 1)]) :

NROW(x) must match length(order.by)

I found the length(theDates) on the day that I ran the code to be 123 but the number of elements in factorStats to be 110.

So I adjusted xts(factorStats, order.by = theDates[1:(length(theDates) - 1)])

to

xts(factorStats, order.by = theDates[1:(length(theDates) - 13)]).

Unfortunately my programming skills are not such that I can improve the generality of the function to make it applicable to all dates as required by

to=Sys.Date() in the defining function for theDates.

Secondly, I found a small typographic error in the bplot. I think you meant

ylim=c(0,max(qs)+0.005) not ylim=c(0,max(q)+0.005).

Jan

Jan,

Thank you for checking the code in full details.

I simply removed the line: factorStats <- xts(factorStats,order.by=theDates[1:(length(theDates)-1)])

It's not really used in the analysis, I just like putting objects in xts format should I need to use them later. Regarding your second point, yes this was a typo and I updated the post. Feel free to get back to me should you have further comments.

Best,

The R Trader

Hi, thanks for your post.

My question is also on code for portfolio optimisation in R. For example, if I want to compare and analyse a portfolio with long and short stocks, which are weighted against an underlying benchmark, is there code in R I can use?

Hi,

Thx for reaching out. I’m not sure there is something off-the-shelf to do exactly what you want but you can have a look at the PerformanceAnalytics package. There are a lot of functions that deal with benchmark relative performance metrics (Return.Relative, chart.CaptureRatio etc…). This will be a good starting point.

Hope this helps

Nice article.

tables <- readHTMLTable("http://en.wikipedia.org/wiki/List_of_S%26P_500_companies")

Error: failed to load HTTP resource

how can I fix this?

Thanks.

The table isn’t available anymore at this address. The new one is:

https://en.wikipedia.org/wiki/List_of_S%26P_500_companies. However the code doesn’t work with this address either because it’s not recognised as an HTML table. I’ll have a look at it in the coming days.

Arnaud

For anyone who stumbles across this great post, here is how I fixed the wikipedia request issue:

library(XML)

library(httr)

SampleTickers <- GET("http://en.wikipedia.org/wiki/List_of_S%26P_500_companies")

tables <- readHTMLTable(rawToChar(SampleTickers$content), stringsAsFactors=F)

If you are using the past 12 months as your factor and the past 1 month as your forward return measure, wouldn’t it skew the results because your forward return measure is within the 12 months you are using as a predictive factor?

In other words, stocks that performed well last month are obviously more likely to have good past 12-month returns which includes the last month. Is there something in your code that I am missing that separates 12 months of returns from the 1 month you are using to evaluate the predictive power of the factor?

Hi,

Sorry for the late reply. The 2 periods (last 12 months & last month) should be non overlapping periods for the reasons you mentioned. That was my intention to create non overlapping periods when I wrote the post. I have to go back to the code to check this.

Thx

Arnaud