A Simple Shiny App for Monitoring Trading Strategies – Part II

This is a follow-up to my previous post “A Simple Shiny App for Monitoring Trading Strategies”. I added a few improvements that make the app a bit better (at least for me!). Below is the list of new features:

  • A sample .csv file (the one that contains the raw data)
  • An “EndDate” drop-down box allowing the user to specify the end of the period
  • A “Risk” page containing a VaR analysis and a chart of worst performance over various horizons
  • A “How To” page explaining how to use and tailor the app to individual needs

I also made the app totally self-contained. It is now available as a stand-alone product and there is no need to have R/RStudio installed on your computer to run it. It can be downloaded from the R Trader Google Drive account. This version of the app runs using portable R and portable Chrome. For the keen reader, this link explains in full detail how to package a Shiny app into a desktop app (Windows only for now).

1 – How to install & run the app on your computer

  • Create a specific folder
  • Unzip the content of the .zip file into that new folder.
  • Change the paths in the runShinyApp file to match your settings.
  • To run the app, just launch the run.vbs file. I also included an icon (RTraderTradingApp.ico) should you want to create a shortcut on your desktop.

2 – How to use the app as it is?

The app uses as input several csv files (one per strategy). Each file has two columns: date and daily return. There is an example of such a file in the GitHub repository. The code is essentially made of 3 files.
  • ui.R: controls the layout and appearance of the app
  • server.R: contains the instructions needed to build the app. You can load as many strategies as you want as long as the corresponding csv file has the right format (see below).
  • shinyStrategyGeneral.R: loads the required packages and launches the app
Put the ui.R and server.R files in a separate directory. In the server.R file, change the inputPath, inputFile and keepColumns parameters to match your settings. The first two are self-explanatory; the third one is a list of column names within the csv file. Keep only the date and daily return columns (see the sketch below).
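For illustration only, these parameters might look like the lines below (the path, file names and column names are placeholders, not the values shipped with the app):

## hypothetical values - adjust to your own setup
inputPath   <- "C:/RTraderApp/data/"                 # folder containing the csv files
inputFile   <- c("strategy1.csv", "strategy2.csv")   # one file per strategy
keepColumns <- c("date", "dailyReturn")              # columns kept from each csv file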

3 – How to add a trading strategy?

  • Create the corresponding .csv file in the right directory
  • Create a new input in the data reactive function (within the server.R file)
  • Add an extra element to the choices argument of the first selectInput in the sidebarPanel (within the ui.R file). The element’s name should match the name of the new input above (see the sketch below).
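As a hedged illustration (the names below are mine, not the ones used in the app), the two changes might look like this:

library(shiny)

## server.R - add an entry for the new file inside the data reactive function, e.g.
## newStrategy <- read.csv(file.path(inputPath, "newStrategy.csv"))

## ui.R - add the matching element to the choices argument of the first selectInput
selectInput("strategy", "Strategy:",
            choices = c("Strategy 1"   = "strategy1",
                        "New Strategy" = "newStrategy"))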

4 – How to remove a trading strategy?

  • Remove the input in the data reactive function corresponding to the strategy you want to remove (within the server.R file)
  • Remove the element in the choices argument of the first selectInput in the sidebarPanel corresponding to the strategy you want to remove (within the ui.R file).

Please feel free to get in touch should you have any suggestion.

 

 

A Simple Shiny App for Monitoring Trading Strategies

In a previous post I showed how to use R, knitr and LaTeX to build a template strategy report. This post goes a step further by making the analysis interactive. Besides the interactivity, the Shiny app also solves two problems:

  • I can now access all my trading strategies from a single point regardless of the instrument traded. Coupled with the Shiny interactivity, it allows easier comparison.
  • I can focus on a specific time period.

The code used in this post is available on a Gist/GitHub repository. There are essentially 3 files.

  • ui.R:  controls the layout and appearance of the app.
  • server.R: contains the instructions needed to build the app. It loads the data and formats it. There is one csv file per strategy, each containing at least two columns: date and return, with the following format: ("2010-12-22", "0.04%"). You can load as many strategies as you want as long as they have the right format (see the sketch after this list).
  • shinyStrategyGeneral.R: loads the required packages and launches the app.
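For illustration, loading and formatting one such file might look like this (the file name and column handling are assumptions, not the actual server.R code):

strategy <- read.csv("strategy1.csv", stringsAsFactors = FALSE)      # hypothetical file name
colnames(strategy) <- c("date", "return")
strategy$date   <- as.Date(strategy$date, format = "%Y-%m-%d")
strategy$return <- as.numeric(sub("%", "", strategy$return)) / 100   # "0.04%" -> 0.0004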

This app is probably far from perfect and I will certainly improve it in the future. Feel free to get in touch should you have any suggestion.


A big thank you to the RStudio/Shiny team for such a great tool.

 

Date formatting in R

As I often manipulate time series from different sources, I rarely come across the same date format twice. Having to reformat the dates every time is a real waste of time because I never remember the syntax of the as.Date function. I put below a few examples that turn strings into standard R date format.

Besides the usual transformations, two tricks are worth mentioning:

  • When dates are given in a two-digit year format, the century has to be adjusted: with %y, years 69 to 99 are mapped to 19xx and years 00 to 68 to 20xx (example 4 below).
  • When the data comes from Excel as an integer number (I am on Windows; it might be different for Mac users), the origin has to be specified in the as.Date function (example 9 below).

I usually refer to those examples when I have to create R dates. The code below is self-explanatory.

rawDate1 <- "6aug2005"
date1 <- as.Date(rawDate1, format = "%d%b%Y") # %b = abbreviated month name (assumes an English locale)

rawDate2 <- "aug061999"
date2 <- as.Date(rawDate2, format = "%b%d%Y")

rawDate3 <- "12-05-2001"
date3 <- as.Date(rawDate3, format = "%m-%d-%Y")

rawDate4 <- "05/27/25"
## if you mean 2025
date4 <- as.Date(rawDate4, format = "%m/%d/%y")
## if you mean 1925
date4 <- as.Date(format(as.Date(rawDate4, format = "%m/%d/%y"), "19%y/%m/%d"),"%Y/%m/%d")

rawDate5 <- "May 27 1984"
date5 <- as.Date(rawDate5, format = "%B %d %Y")

rawDate6 <- "1998-07-22"
date6 <- as.Date(rawDate6, format = "%Y-%m-%d")

rawDate7 <- "20041024"
date7 <- as.Date(rawDate7, format = "%Y%m%d")

rawDate8 <- "22.10.2004"
date8 <- as.Date(rawDate8, format = "%d.%m.%Y")

## Excel on windows date format (origin as of December 30, 1899)
rawDate9 <- 36529
date9 <- as.Date(rawDate9, origin = "1899-12-30")

For those of you who wish to go further, I recommend the following link: Dates and Times in R. It is also worth mentioning the lubridate package and the date package. Both of them provide advanced functions for handling dates and times.
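As a quick illustration (not part of the original list of examples), lubridate’s parsing helpers remove the need to remember format strings altogether:

library(lubridate)
ymd("1998-07-22")     # same result as example 6
dmy("22.10.2004")     # same result as example 8
mdy("May 27 1984")    # same result as example 5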

Using Genetic Algorithms in Quantitative Trading

The question one should always ask when using technical indicators is what would be an objective criterion to select indicator parameters (e.g., why use a 14-day RSI rather than 15 or 20 days?). Genetic algorithms (GA) are well-suited tools to answer that question. In this post I’ll show you how to set up the problem in R. Before I proceed, the usual reminder: what I present in this post is just a toy example and not an invitation to invest. It’s not a finished strategy either but a research idea that needs to be further researched, developed and tailored to individual needs.

What are genetic algorithms?

The best description of GA I came across comes from Cybernetic Trading, a book by Murray A. Ruggiero: “Genetic Algorithms were invented by John Holland in the mid-1970s to solve hard optimisation problems. This method uses natural selection, survival of the fittest.” The general process follows the steps below:

  1. Encode the problem into chromosomes
  2. Using the encoding, develop a fitness function for use in evaluating each chromosome’s value in solving a given problem
  3. Initialize a population of chromosomes
  4. Evaluate each chromosome in the population
  5. Create new chromosomes by mating two chromosomes. This is done by mutating and recombining two parents to form two children (parents are selected randomly but biased by their fitness)
  6. Evaluate the new chromosome
  7. Delete a member of the population that is less fit than the new chromosome and insert the new chromosome in the population.
  8. If the stopping criterion is reached (maximum number of generations, fitness good enough, …) then return the best chromosome; otherwise go to step 4

From a trading perspective GA are very useful because they are good at dealing with highly nonlinear problems. However they exhibit some nasty features that are worth mentioning:

  • Overfitting: This is the main problem and it’s down to the analyst to set up the problem in a way that minimises this risk.
  • Computing time: If the problem isn’t properly defined, it can be extremely long to reach a decent solution and the complexity increases exponentially with the number of variables. Hence the necessity to carefully select the parameters.

There are several R packages dealing with GA; I chose to use the most common one, rgenoud.

Data & experiment design

Daily closing prices for the most liquid ETFs from Yahoo Finance, going back to January 2000. The in-sample period goes from January 2000 to December 2010. The out-of-sample period starts in January 2011.

The logic is as follows: the fitness function is optimised over the in-sample period to obtain a set of optimal parameters for the selected technical indicators. The performance of those indicators is then evaluated over the out-of-sample period. But before doing so, the technical indicators have to be selected.

The equity market exhibits two main characteristics that are familiar to anyone with some trading experience: long term momentum and short term reversal. These features can be translated into technical indicators: a moving average crossover and the RSI. This represents a set of 4 parameters: look-back periods for the long and short term moving averages, the look-back period for the RSI and the RSI threshold. The sets of parameters are the chromosomes. The other key element is the fitness function. We might want to use something like maximum return, Sharpe ratio or minimum average drawdown. In what follows, I chose to maximise the Sharpe ratio.

The R implementation is a set of 3 functions (a minimal sketch is given after the list):

  1. fitnessFunction: defines the fitness function (e.g., maximum Sharpe ratio) to be used within the GA engine
  2. tradingStatistics: summary of trading statistics for the in and out of sample periods for comparison purposes
  3. genoud: the GA engine from the rgenoud package
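Below is a minimal sketch of how such a set-up might look, assuming SPY daily closes pulled with quantmod and a long-only rule (short MA above long MA and RSI below its threshold). It is my reconstruction of the idea, not the author’s actual code: the parameter bounds, the way the RSI enters the signal and the 50-trade penalty are all assumptions.

library(quantmod)   # also loads TTR and xts
library(rgenoud)

prices <- Cl(getSymbols("SPY", from = "2000-01-01", to = "2010-12-31", auto.assign = FALSE))
rets   <- dailyReturn(prices)

## chromosome x = c(RSI look-back, RSI threshold, short MA look-back, long MA look-back)
fitnessFunction <- function(x) {
  rsi      <- RSI(prices, n = x[1])
  shortMA  <- SMA(prices, n = x[3])
  longMA   <- SMA(prices, n = x[4])
  signal   <- lag(ifelse(shortMA > longMA & rsi < x[2], 1, 0))   # yesterday's signal, no look-ahead
  stratRet <- na.omit(signal * rets)
  nTrades  <- sum(abs(diff(signal)) > 0, na.rm = TRUE)
  if (nTrades <= 50) return(-1e5)                                # crude statistical significance filter
  mean(stratRet) / sd(stratRet) * sqrt(252)                      # annualised Sharpe ratio
}

opt <- genoud(fitnessFunction,
              nvars = 4,
              max = TRUE,                  # maximise the Sharpe ratio
              pop.size = 100,
              data.type.int = TRUE,        # integer parameters only
              Domains = rbind(c(5, 50),    # RSI look-back
                              c(30, 70),   # RSI threshold
                              c(10, 60),   # short MA look-back
                              c(60, 200))) # long MA look-back (bounded above the short MA)
opt$par                                    # optimal parameter set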

The genoud function is rather complex but I’m not going to explain what each parameter means as I want to keep this post short (and the documentation is really good).

Results

In the table below I present for each instrument the optimal parameters (RSI look-back period, RSI threshold, Short Term Moving Average, and Long Term Moving Average) along with the in and out of sample trading statistics.

Instrument | Parameters | In Sample | Out Of Sample
SPY | c(31,62,32,76) | Total return = 14.4%, number of trades = 60, hit ratio = 60% | Total return = 2.3%, number of trades = 8, hit ratio = 50%
EFA | c(37,60,36,127) | Total return = 27.6%, number of trades = 107, hit ratio = 57% | Total return = 2.5%, number of trades = 11, hit ratio = 64%
EEM | c(44,55,28,90) | Total return = 39.1%, number of trades = 85, hit ratio = 58% | Total return = 1.0%, number of trades = 17, hit ratio = 53%
EWJ | c(44,55,28,90) | Total return = 15.7%, number of trades = 93, hit ratio = 54% | Total return = -13.1%, number of trades = 31, hit ratio = 45%

Before commenting on the above results, I want to explain a few important points. To match the logic defined above, I bounded the parameters to make sure the look-back period of the long term moving average is always longer than that of the short term moving average. I also constrained the optimiser to choose only solutions with more than 50 trades in the in-sample period (i.e., statistical significance).

Overall the out-of-sample results are far from impressive. The returns are low, and the number of trades is too small to make the outcome really significant. However, there is a significant loss of efficiency between the in- and out-of-sample periods for Japan (EWJ), which very likely means overfitting.

Conclusion

This post is intended to give the reader the tools to properly use GA in a quantitative trading framework. Once again, it’s just an example that needs to be further refined. A few potential improvements to explore would be:

  • fitness function: maximising the Sharpe ratio is very simplistic. A “smarter” function would certainly improve the out of sample trading statistics
  • pattern: we try to capture a very straightforward pattern. A more in depth pattern research is definitely needed.
  • optimisation: there are many ways to improve the way the optimisation is conducted. This would improve both the computation speed and the rationality of the results.

The code used in this post is available on a Gist repository.

As usual any comments welcome

Using CART for Stock Market Forecasting

There is an enormous body of literature, both academic and empirical, about market forecasting. Most of the time it mixes two market features: magnitude and direction. In this article I want to focus on identifying the market direction only. The goal I set myself is to identify market conditions when the odds are significantly biased toward an up or a down market. This post gives an example of how CART (Classification And Regression Trees) can be used in this context. Before I proceed, the usual reminder: what I present in this post is just a toy example and not an invitation to invest. It’s not a finished strategy either but a research idea that needs to be further researched, developed and tailored to individual needs.

1 – What is CART and why use it?

From statistics.com, CART is a set of techniques for classification and prediction. The technique is aimed at producing rules that predict the value of an outcome (target) variable from known values of predictor (explanatory) variables. There are many different implementations but they all share a general characteristic, and that’s what I’m interested in. From Wikipedia: “Algorithms for constructing decision trees usually work top-down, by choosing a variable at each step that best splits the set of items. Different algorithms use different metrics for measuring ‘best’. These generally measure the homogeneity of the target variable within the subsets. These metrics are applied to each candidate subset, and the resulting values are combined (e.g., averaged) to provide a measure of the quality of the split.”

CART methodology exhibits some characteristics that are very well suited for market analysis:

  • Non-parametric: CART can handle any type of statistical distribution
  • Non-linear: CART can handle a large spectrum of dependencies between variables (e.g., not limited to linear relationships)
  • Robust to outliers

There are various R packages dealing with recursive partitioning; here I use rpart for tree estimation and rpart.plot for tree drawing.

2 – Data & Experiment Design

Daily OHLC prices for the most liquid ETFs from January 2000 to December 2013, extracted from Google Finance. The in-sample period goes from January 2000 to December 2010; the rest of the dataset is the out-of-sample period. Before running any type of analysis, the dataset has to be prepared for the task.

The target variable is the ETF weekly forward return defined as a two-state outcome (UP or DOWN): if the weekly forward return is > 0 then the market is in the UP state, otherwise in the DOWN state.

The explanatory variables are a set of technical indicators derived from the initial daily OHLC dataset. Each indicator represents a well-documented market behaviour. In order to reduce the noise in the data and to try to identify robust relationships, each independent variable is considered to have a binary outcome. A sketch of how these variables might be computed is given after the list.

  • Volatility (VAR1): High volatility is usually associated with a down market and low volatility with an up market. Volatility is defined as the 20-day raw ATR (Average True Range) spread to its moving average (MA). If raw ATR > MA then VAR1 = 1, else VAR1 = -1.
  • Short term momentum (VAR2): The equity market exhibits short term momentum behaviour, captured here by a 5-day simple moving average (SMA). If Price > SMA then VAR2 = 1, else VAR2 = -1.
  • Long term momentum (VAR3): The equity market exhibits long term momentum behaviour, captured here by a 50-day simple moving average (LMA). If Price > LMA then VAR3 = 1, else VAR3 = -1.
  • Short term reversal (VAR4): This is captured by the CRTDR, which stands for Close Relative To Daily Range and is calculated as follows: CRTDR = (Close - Low) / (High - Low). If CRTDR > 0.5 then VAR4 = 1, else VAR4 = -1.
  • Autocorrelation regime (VAR5): The equity market tends to go through periods of negative and positive autocorrelation regimes. If the returns autocorrelation over the last 5 days is > 0 then VAR5 = 1, else VAR5 = -1.
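The sketch below shows one way these five variables and the target could be built with quantmod/TTR. It is my reconstruction under stated assumptions (prices pulled from Yahoo for convenience, a 20-day moving average of the ATR and a 6-observation window for the 5-day autocorrelation), not the author’s code.

library(quantmod)   # loads TTR, xts and zoo

data <- getSymbols("SPY", from = "2000-01-01", to = "2013-12-31", auto.assign = FALSE)
colnames(data) <- c("open", "high", "low", "close", "volume", "adjusted")

atr  <- ATR(data[, c("high", "low", "close")], n = 20)[, "atr"]
var1 <- ifelse(atr > SMA(atr, n = 20), 1, -1)                  # volatility regime (MA period assumed)
var2 <- ifelse(data$close > SMA(data$close, n = 5), 1, -1)     # short term momentum
var3 <- ifelse(data$close > SMA(data$close, n = 50), 1, -1)    # long term momentum
crtdr <- (data$close - data$low) / (data$high - data$low)
var4 <- ifelse(crtdr > 0.5, 1, -1)                             # short term reversal
rets <- dailyReturn(data$close)
ac5  <- rollapply(rets, width = 6, align = "right", fill = NA, by.column = FALSE,
                  FUN = function(x) cor(x[-1], x[-length(x)]))
var5 <- ifelse(ac5 > 0, 1, -1)                                 # autocorrelation regime

## target: sign of the 5-day (weekly) forward return
fwd    <- as.numeric(lag(data$close, k = -5) / data$close) - 1
target <- factor(ifelse(fwd > 0, "UP", "DOWN"))

dataset <- na.omit(data.frame(target = target,
                              var1 = as.numeric(var1), var2 = as.numeric(var2),
                              var3 = as.numeric(var3), var4 = as.numeric(var4),
                              var5 = as.numeric(var5)))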

I put below a tree example with some explanations

(Figure: example of a classification tree)

In the tree above, the path to reach node #4 is: VAR3 >= 0 (long term momentum >= 0) and VAR4 >= 0 (CRTDR > 0.5). The red rectangle indicates this is a DOWN leaf (i.e., terminal node) with a probability of 58% (1 - 0.42). In market terms this means that if long term momentum is up and CRTDR is > 0.5, then the probability of a positive return next week is 42% based on the in-sample data. 18% indicates the proportion of the dataset that falls into that terminal node (i.e., leaf).

There are many ways to use the above approach; I chose to estimate and combine all possible trees. From the in-sample data, I collect all leaves from all possible trees and gather them into a matrix. This is the “rules matrix”, giving the probability of next week being UP or DOWN.
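As an illustration of a single tree (rather than the full combination of trees described above), fitting and drawing one tree from the dataset built in the earlier sketch might look like this (the cp value is an assumption):

library(rpart)
library(rpart.plot)

fit <- rpart(target ~ var1 + var2 + var3 + var4 + var5,
             data = dataset, method = "class",
             control = rpart.control(cp = 0.001))
rpart.plot(fit)                            # draw the tree
fit$frame[fit$frame$var == "<leaf>", ]     # terminal nodes (leaves) with their fitted class and counts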

3 – Results

I apply the rules in the above matrix to the out-of-sample data (Jan 2011 – Dec 2013) and compare the results to the real outcome. The problem with this approach is that a single point (week) can fall into several rules and even belong to UP and DOWN rules simultaneously. Therefore I apply a voting scheme. For a given week I sum up all the rules that apply to that week, giving a +1 for an UP rule and a -1 for a DOWN rule. If the sum is greater than 0 the week is classified as UP, if the sum is negative it’s a DOWN week, and if the sum is equal to 0 no position is taken that week (return = 0). A toy illustration follows.
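A toy, self-contained illustration of the voting step (simulated data, not the actual rules matrix; variable names are mine):

set.seed(1)
nWeeks    <- 10
nRules    <- 5
ruleVotes <- matrix(sample(c(-1, 0, 1), nWeeks * nRules, replace = TRUE),
                    nrow = nWeeks)         # +1 = UP rule fires, -1 = DOWN rule fires, 0 = no signal
votes     <- rowSums(ruleVotes)            # aggregate vote for each week
position  <- sign(votes)                   # +1 = UP, -1 = DOWN, 0 = no position
weeklyRet <- rnorm(nWeeks, 0, 0.02)        # simulated realised weekly returns
stratRet  <- position * weeklyRet          # weekly return of the voting strategy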

The above methodology is applied to a set of very liquid ETFs. I plot below the out of sample equity curves along with the buy and hold strategy over the same period.

(Figure: out of sample equity curves vs. buy and hold for each ETF)

4 – Conclusion

Initial results seem encouraging even if the quality of the outcome varies greatly by instrument. However, there is huge room for improvement. I put below some directions for further analysis:

  • Path optimality: The algorithm used here for defining the trees is optimal at each split but it doesn’t guarantee the optimality of the path. Adding a metric to measure the optimality of the path would certainly improve the above results.
  • Other variables: I chose the explanatory variables solely based on experience. It’s very likely that this choice is neither good nor optimal.
  • Backtest methodology: I used a simple In and Out of sample methodology. In a more formal backtest I would rather use a rolling or expanding window of in and out sample sub-periods (e.g., walk forward analysis)

As usual, any comments welcome

 

A million ways to connect R and Excel

In quantitative finance both R and Excel are staple tools for any type of analysis. Whenever one has to use Excel in conjunction with R, there are many ways to approach the problem and many solutions. It depends on what you really want to do and the size of the dataset you’re dealing with. I list some possible connections in the table below.

I want to… | R function / package
Read Excel spreadsheet in R | gdata, RODBC, XLConnect, xlsx, xlsReadWrite, read.table("clipboard"), RExcel
Read R output in Excel | write.table, RExcel
Execute R code in VBA | custom function, RExcel
Execute R code from an Excel spreadsheet | RExcel
Execute VBA code in R | custom function
Fully integrate R and Excel | RExcel

 

1 – Read Excel spreadsheet in R

  • gdata: it requires you to install additional Perl libraries on Windows platforms but it’s very powerful.
require(gdata)
myDf <- read.xls("myfile.xlsx", sheet = 1, header = TRUE)
  • RODBC: This is reported for completeness only. It’s rather dated; there are better ways to interact with Excel nowadays.
  • XLConnect: It might be slow for large datasets but it’s very powerful otherwise.
require(XLConnect)
wb <- loadWorkbook("myfile.xlsx")
myDf <- readWorksheet(wb, sheet = "Sheet1", header = TRUE)
  • xlsx: Prefer read.xlsx2() over read.xlsx(); it’s significantly faster for large datasets.
require(xlsx)
read.xlsx2("myfile.xlsx", sheetName = "Sheet1")
  • xlsReadWrite: Available for Windows only. It’s rather fast but doesn’t support .xlsx files which is a serious drawback. It has been removed from CRAN lately.
  • read.table(“clipboard”): It allows you to copy data from Excel and read it directly into R. This is the quick and dirty R/Excel interaction, but it’s very useful in some cases.
myDf <- read.table("clipboard")

2 – Read R output in Excel
First create a csv output from an R data.frame, then read this file in Excel. There is one function you need to know: write.table. You might also want to consider write.csv, which uses “.” for the decimal point and a comma as the separator, and write.csv2, which uses a comma for the decimal point and a semicolon as the separator.

x <- cbind(rnorm(20),runif(20))
colnames(x) <- c("A","B")
write.table(x,"your_path",sep=",",row.names=FALSE)
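For completeness, the two convenience wrappers mentioned above would be called as follows (file names are placeholders):

write.csv(x, "myfile_dot.csv", row.names = FALSE)      # decimal point ".", separator ","
write.csv2(x, "myfile_comma.csv", row.names = FALSE)   # decimal point ",", separator ";"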

3 – Execute R code in VBA
RExcel is, from my perspective, the best suited tool but there is at least one alternative: you can run a batch file from within the VBA code. If R.exe is in your PATH, the general syntax for the batch file (.bat) is:

R CMD BATCH [options] myRScript.R

Here’s an example of how to integrate the batch file above within your VBA code.

4 – Execute R code from an Excel spreadsheet
RExcel is the only tool I know of for the task. Generally speaking, once you have installed RExcel you insert the R code within a cell and execute it from the RExcel spreadsheet menu. See the RExcel references below for an example.

5 – Execute VBA code in R 
This is something I came across but have never tested myself. It is a two-step process: first write a VBScript wrapper that calls the VBA code, then run the VBScript from R with the system or shell functions. The method is described in full detail here.
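A hedged sketch of the second step (the path and file name are placeholders; the .vbs wrapper itself is assumed to open the workbook and run the macro):

shell('cscript //nologo "C:/myPath/runMyMacro.vbs"')      # Windows
## system('cscript //nologo "C:/myPath/runMyMacro.vbs"')  # equivalent call via system()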

6 – Fully integrate R and Excel
RExcel is a project developed by Thomas Baier and Erich Neuwirth, “making R accessible from Excel and allowing to use Excel as a frontend to R”. It allows communication in both directions, Excel to R and R to Excel, and covers most of what is described above and more. I’m not going to put any example of RExcel use here as the topic is largely covered elsewhere, but I will show you where to find the relevant information. There is a wiki for installing RExcel and an excellent tutorial available here. I also recommend the following two documents: RExcel – Using R from within Excel and High-Level Interface Between R and Excel. They both give an in-depth view of RExcel’s capabilities.

The list above is probably not exhaustive. Feel free to come back to me with any addition or modification you might find useful. All code snippets have been created by Pretty R at inside-R.org

Overnight vs. Intraday ETF Returns

I haven’t done much “googling” before posting, so this topic might have been covered elsewhere, but I think it’s really worth sharing or repeating anyway.

A lot has been written about the source of ETF returns (some insights might be found here). In a nutshell, some analyses find that the bulk of the return is made overnight (the return between the close price at t and the open price at t+1). This is only partially true, as it hides some major differences across asset classes and regions. The table below displays the sum of daily returns (close to close), intraday returns (open to close) and overnight returns (close to open) for the most liquid ETFs over a period going from today back to January 1st 2000, where data is available. The inception date of the ETF is used when no data is available prior to January 1st 2000.

ETF Daily Rtn Intraday Rtn Overnight Rtn
SPY 53.7% -8.1% 59.2%
QQQ 10.7% -84.3% 93.3%
IWN 81.8% 30.4% 52.1%
EEM 51.5% -42.5% 83.8%
EFA 13.2% 73.3% -61.5%
EWG 77.7% 143.1% -62.6%
EWU 41.2% 132.3% -84.5%
EWL 109.4% 229.9% -110.3%
EWJ 10.4% 115% -107.9%
FXI 72.8% 13.8% 45.3%
EWS 89.7% -83.9% 175.9%
GLD 120.9% 18.7% 101.1%
GDX 29% -270.2% 293.5%
SLV -2.8% -36.6% 39.1%
USO -21.6% 56.7% -79.5%
SHY 4% 10.7% -6.5%
IEF 23.5% 37.4% -13.4%
TLT 37.1% 50.6% -13.5%
LQD 16.7% -36.3% 54.3%

A few obvious features clearly appear:

  • For US equity markets (SPY, QQQ, IWN), emerging equity markets (EEM), metals (GLD, GDX, SLV) and investment grade credit (LQD), the bulk of the return is definitely made overnight. Intraday returns tend to deteriorate the overall performance (intraday return < 0).
  • The exact opposite is true for European equity markets (EFA, EWG, EWU, EWL), US bonds (SHY, IEF, TLT) and oil (USO). Overnight returns detract significantly from the overall performance.

I didn’t manage to come up with a decent explanation of why this is happening, but I’m keen to learn if someone is willing to share! I’m not too sure at this stage how this information can be used, but it has to be taken into account somehow.

Below is the code for generating the analysis above.

####################################################
## OVERNIGHT RETURN IN ETF PRICES
##
## thertrader@gmail.com - Jan 2014
####################################################
library(quantmod)

symbolList <- c("SPY","QQQ","IWN","EEM","EFA","EWG","EWU","EWL","EWJ","FXI","EWS","GLD","GDX","SLV","USO","SHY","IEF","TLT","LQD")

results <- NULL

for (ii in symbolList){
  data <- getSymbols(Symbols = ii, 
                     src = "yahoo", 
                     from = "2000-01-01", 
                     auto.assign = FALSE)

  colnames(data) <- c("open","high","low","close","volume","adj.")

  dailyRtn <- (as.numeric(data[2:nrow(data),"close"])/as.numeric(data[1:(nrow(data)-1),"close"])) - 1
  intradayRtn <- (as.numeric(data[,"close"])/as.numeric(data[,"open"]))-1
  overnightRtn <- (as.numeric(data[2:nrow(data),"open"])/as.numeric(data[1:(nrow(data)-1),"close"])) - 1

  results <- rbind(results,cbind(
    paste(round(100 * sum(dailyRtn,na.rm=TRUE),1),"%",sep=""),
    paste(round(100 * sum(intradayRtn,na.rm=TRUE),1),"%",sep=""),
    paste(round(100 * sum(overnightRtn,na.rm=TRUE),1),"%",sep="")))
} 
colnames(results) <- c("dailyRtn","intradayRtn","overnightRtn")
rownames(results) <- symbolList

As usual any comments welcome

Introduction to R for Quantitative Finance – Book Review

I used some spare time I had over the Christmas break to review a book I came across: Introduction to R for Quantitative Finance. An introduction to the book by the authors can be found here.

Introduction to R for Quantitative Finance - cover picture

The book targets folks with some finance knowledge but little or no experience with R. Each chapter is organised around a quant finance topic. Step by step, financial models are built with the associated R code, allowing the reader to fully understand the transition from theory to implementation. It also includes some real-life examples. The following concepts are covered:

Chap 1: Time Series Analysis

Chap 2: Portfolio Optimisation

Chap 3: Asset Pricing Model

Chap 4: Fixed Income Securities

Chap 5: Estimating the Term Structure of Interest Rates

Chap 6: Derivatives pricing

Chap 7: Credit Risk Management

Chap 8: Extreme Value Theory

Chap 9: Financial Networks

As an experienced R user, I didn’t expect to learn much but I was wrong. I didn’t know about the GUIDE package (a GUI for derivatives pricing) or the evir package, which gathers functions for extreme value theory, and I also learned a few programming tricks.

All in all, this is an excellent book for anyone keen on learning R in a quantitative finance framework. I think it would have benefited from a formal introduction to R and a review of data export/import capabilities, but both topics are extensively covered in many other R resources.

As usual, any comments welcome

Financial Data Accessible from R – part IV

DataMarket is the latest source of financial data accessible from R that I came across. A good tutorial can be found here. I updated the table and the descriptions below.

Source | R Package | Free Access | Available on CRAN | Provider URL
Yahoo, FRED, Oanda, Google | quantmod | Yes | Yes | Quantmod
Quandl | Quandl | Yes | Yes | Quandl
TrueFX | TFX | Yes | Yes | TrueFX
Bloomberg | Rbbg | No | No | findata
Interactive Brokers | IBrokers | No | Yes | InteractiveBrokers
Datastream | rdatastream | No | No | Datastream
Penn World Table | pwt | Yes | Yes | Penn World Table
Yahoo, FRED, Oanda | fImport | Yes | Yes | Rmetrics
ThinkNum | Thinknum | Yes | Yes | ThinkNum
DataMarket | rdatamarket | Yes | Yes | DataMarket

Data Description

  • Yahoo: Free stock quotes, up to date news, portfolio management resources, international market data, message boards, and mortgage rates that help you manage your financial life
  • FRED: Download, graph, and track 149,000 economic time series from 59 sources
  • Oanda: Currency information, tools, and resources for investors, businesses, and travelers
  • Google: Stock market quotes, news, currency conversions & more
  • Quandl: Futures prices, daily. Quandl is a search engine for numerical data. The site offers access to several million financial, economic and social datasets
  • TrueFX: Tick-By-Tick Real-Time And Historical Market Rates, Clean, Aggregated, Dealer Prices
  • Bloomberg: Financial news, business news, economic news, stock quotes, markets quotes, finance stocks, financial markets, stock futures, personal finance, personal finance advice, mutual funds, financial calculators, world business, small business, financial trends, forex trading, technology news, bloomberg financial news
  • Interactive Broker: Interactive Brokers Group, Inc. is an online discount brokerage firm in the United States
  • Datastream: Datastream Professional is a powerful tool that integrates economic research and strategy with cross asset analysis to seamlessly bring together top down and bottom up in one single, integrated application
  • pwt: The Penn World Table provides purchasing power parity and national income accounts converted to international prices for 189 countries/territories for some or all of the years 1950-2010
  • Thinknum: Thinknum brings financial data from a variety of useful sources together on one platform. We use this data to develop applications
  • DataMarket: DataMarket brings complex and diverse data together so you can search, visualize and share data in one place and one format

Package Detail

  • Quantmod: Specify, build, trade, and analyse quantitative financial trading strategies
  • Quandl: This package interacts directly with the Quandl API to offer data in a number of formats usable in R, as well as the ability to upload and search
  • TFX: Connects R to TrueFX(tm) for free streaming real-time and historical tick-by-tick market data for dealable interbank foreign exchange rates with millisecond detail
  • Rbbg: Handles fetching data from the Bloomberg financial data application
  • IBrokers: Provides native R access to Interactive Brokers Trader Workstation API
  • rdatastream: RDatastream is a R interface to the Thomson Dataworks Entreprise SOAP API (non free), with some convenience functions for retrieving Datastream data specifically. This package requires valid credentials for this API
  • pwt: The Penn World Table provides purchasing power parity and national income accounts converted to international prices for 189 countries/territories for some or all of the years 1950-2010
  • fImport: Rmetrics is the premier open source software solution for teaching and training quantitative finance. fImport is the package for Economic and Financial Data Import
  • Thinknum: This package interacts directly with the Thinknum API to offer data in a number of formats usable in R
  • rdatamarket: Fetches data from DataMarket.com, either as timeseries in zoo form (dmseries) or as long-form data frames (dmlist). Metadata including dimension structure is fetched with dminfo, or just the dimensions with dmdims.

 

Evaluating Quandl Data Quality – part II

This post is a more in-depth analysis of Quandl futures data vs. Bloomberg data. Since my last post Quandl has updated its futures database to 200+ contracts from the 68 contracts it had originally. For practical reasons, I limit myself here to the initial list of 60+ contracts. I’m still comparing the “front month” contract between the two sources. When evaluating the differences, I want the following:

  • Evaluate the scale of the differences
  • Evaluate the time localization of the differences (if any)
  • A single number that captures both features above
  • A measure that is comparable across instruments

After a bit of thinking, I came up with the below metric:

D_t = ( P(Quandl)_t - P(Bloomberg)_t ) / Tick Size

As an example, below is the chart of the above formula over time for the E-mini S&P 500 contract.

(Figure: D_t over time for the E-mini S&P 500 front month contract)

I plotted the same chart for each of the 60 contracts in the list of my previous post. Interested readers can find all the charts here.

From my perspective there are essentially two main sources of differences: first, plainly wrong data points, largely off compared to reality; and second, a difference in the data building process (i.e., the construction methodology for the front month contract). A mix of both is very likely here. In order to quantify this, I defined one additional metric: the Mean Absolute Difference (MAD).

MAD = sum_{t=1}^{n} |D_t| / n, computed over the days where D_t ≠ 0
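A minimal, self-contained sketch of both metrics (simulated prices rather than the actual Quandl/Bloomberg series; the tick size used is the E-mini S&P 500’s 0.25):

set.seed(123)
bloombergPrice <- 1500 + cumsum(rnorm(250, 0, 5))    # pretend front month closes
quandlPrice    <- bloombergPrice + sample(c(0, 0, 0.25, -0.25), 250, replace = TRUE)
tickSize       <- 0.25

D   <- (quandlPrice - bloombergPrice) / tickSize     # daily difference expressed in ticks
MAD <- mean(abs(D[D != 0]))                          # mean absolute difference over non-zero days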

Instrument | Quandl Symbol | Bloomberg Ticker | MAD
Soybean Oil | OFDP/FUTURE_BO1 | BO1 Comdty | 12254897
Russian Ruble | OFDP/FUTURE_RU1 | RU1 Curncy | 29653
DJ-UBS Commodity Index | OFDP/FUTURE_AW1 | DNA Index | 3041
S&P500 Volatility Index | OFDP/FUTURE_VX1 | UX1 Index | 2453
Cocoa | OFDP/FUTURE_CC1 | CC1 Comdty | 1552
Lean Hogs | OFDP/FUTURE_LN1 | LH1 Comdty | 391

Ranking the 60+ contracts on MAD allows me to immediately identify the largest differences: Soybean Oil, Russian Ruble, DJ-UBS Commodity Index, S&P500 Volatility Index, Cocoa and Lean Hogs. Those are the obvious candidates for immediate checking.

I put together what I think is the basis for a systematic data checking approach. It can obviously be refined in many ways, but those refinements largely depend on what one wants to do with the data and which contracts are relevant to the analyst. As an example, I assume that it is more relevant for most people to have accurate data for the E-mini S&P 500 contract than for the Milk contract.

As usual any comments welcome