Australian (ASX) Stock Market Forum


Greetings all --


Thanks for the kind words and civilized response -- my views are upsetting to some people.


----


On the question of how to select the in-sample data and how to select the out-of-sample data.


If you are working with a single issue, say a major stock index or a specific commodity, then all of the in-sample data will be coming from one ticker and the question becomes "what periods of time should be used?"  The short answer is to use whatever periods of time give good results.  My feeling is that every market we model is dynamic -- non-stationary.  The characteristics of the market change over time.  The most obvious changes are easily measured -- the slope of a moving average, the average true range, the number of overnight gaps, and so forth.  Any single version of the trading system we build to model that market is static for the entire period we use it.  As long as the underlying market does not "move" too much, the model continues to accurately represent it, and the buy and sell signals continue to be profitable.  Eventually the characteristics of the market change and the model is out-of-synch with the market.  Maybe the market will return to its earlier state, but usually it will not, and a new model is needed, so we must reoptimize.  That is, we must perform the next walk-forward iteration. 
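To make those measurements concrete, here is a small sketch (my code, not from the original post) of the three market characteristics mentioned: moving-average slope, average true range, and overnight gap count.  The function names and the toy inputs are invented for illustration.

```python
# Illustrative sketch: three simple measurements of market character.
# A drift in any of these over time is one hint that the market has
# moved away from the regime the model was fit to.

def sma_slope(closes, length):
    """Slope of a simple moving average: the difference between the
    last two SMA values of the given length."""
    def sma(i):
        return sum(closes[i - length + 1:i + 1]) / length
    return sma(len(closes) - 1) - sma(len(closes) - 2)

def average_true_range(highs, lows, closes, length):
    """Average true range over the last `length` bars."""
    trs = []
    for i in range(1, len(closes)):
        tr = max(highs[i] - lows[i],
                 abs(highs[i] - closes[i - 1]),
                 abs(lows[i] - closes[i - 1]))
        trs.append(tr)
    return sum(trs[-length:]) / length

def overnight_gap_count(opens, closes, threshold):
    """Count bars whose open gaps from the prior close by more than
    `threshold`, expressed as a fraction of the prior close."""
    return sum(1 for i in range(1, len(opens))
               if abs(opens[i] - closes[i - 1]) / closes[i - 1] > threshold)
```

Recomputing these periodically over a rolling window gives a rough, quantitative way to watch for the market "moving" out from under the model.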


The length of time that the model and the market stay in synch determines the maximum length of the out-of-sample period.  While there is no theoretical need that all out-of-sample periods be the same length, it is common for them to be.  The algorithms to perform the walk-forward testing automatically are easiest to implement when reoptimization is done after a set number of bars.  At one extreme, reoptimization can take place after every bar, and that is acceptable, although computationally intensive.  At the other extreme, reoptimization never takes place, and that is acceptable as long as the model and the market stay sufficiently in synch that the trades are profitable. 
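The mechanics of reoptimizing after a set number of bars can be sketched in a few lines.  This is my outline, not code from the post; `optimize` and `evaluate` are hypothetical stand-ins for whatever routines your development platform provides.

```python
# Minimal walk-forward skeleton: refit on a rolling in-sample window,
# then trade the model unchanged over the following out-of-sample
# window, and slide forward.

def walk_forward(bars, is_len, oos_len, optimize, evaluate):
    """Yield one (params, oos_score) pair per walk-forward iteration.

    bars     -- full price history, oldest first
    is_len   -- number of bars in each in-sample window
    oos_len  -- number of bars traded before reoptimizing
    optimize -- callable(in_sample_bars) -> best parameter set
    evaluate -- callable(params, oos_bars) -> objective-function score
    """
    start = 0
    while start + is_len + oos_len <= len(bars):
        in_sample = bars[start:start + is_len]
        out_sample = bars[start + is_len:start + is_len + oos_len]
        params = optimize(in_sample)      # refit on recent history only
        yield params, evaluate(params, out_sample)
        start += oos_len                  # slide forward one OOS period
```

Setting `oos_len` to 1 gives the "reoptimize after every bar" extreme; an `oos_len` longer than the remaining history gives "never reoptimize."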


Which brings up another issue I'll address here briefly, but put off for a later posting.  "Will a trading system that once worked, but is now broken, ever return to profitability?"  My answer is that all trading systems eventually break and that broken systems very seldom return to profitability.  Trading systems are unique as models go (in the sense of statistical models of physical processes), in that the act of modeling changes the process being modeled -- every profitable trade made removes some of the inefficiency that the model has identified.  Every profitable trade actually made (paper trades do not count) makes it less likely that the next trade will be profitable.  For example, the Donchian-style breakout systems that worked so well in the 1970s and 80s no longer work, and will probably never work again.


Back to selecting the length of the in-sample period.  There must be a large enough number of data points that the model can detect some general feature of the market.  The relationship between the number of data points and the number of parameters in the model is very similar to the relationship between the number of data points and the degree of a polynomial being fit to them.  As the degree of the polynomial increases, the goodness of the fit increases, but the accuracy of the prediction of the next data point does not necessarily increase.  The model becomes curve-fit to the data.  In some models that is desirable -- for example, a model of the rotation of two stars around a common center of gravity.  But in trading systems, there is a peak in the fitness to the in-sample data where the general features of the market have been identified and learned, but the noise has not yet been learned.  To deal with this, some model development platforms use three data periods -- the first is the in-sample period used to pick the best parameter values, the second is another in-sample period used to guide the modeling process and determine when the fit has found its peak and stop, and the third is the out-of-sample period used to validate the model.
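The polynomial analogy is easy to demonstrate.  This toy example (mine, with synthetic data) fits polynomials of increasing degree to noisy points drawn from a linear trend: in-sample fit only improves with degree, while prediction of the next points gets worse once the noise is being fit.

```python
# Toy demonstration of curve-fitting: higher polynomial degree always
# fits the in-sample points at least as well, but beyond some degree
# the prediction of the next data points deteriorates badly.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(12.0)
y = 0.5 * x + rng.normal(0.0, 1.0, size=12)   # linear trend plus noise

x_is, y_is = x[:10], y[:10]                    # in-sample: first 10 points
x_oos, y_oos = x[10:], y[10:]                  # "next" points to predict

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_is, y_is, degree)
    fit_err = np.mean((np.polyval(coeffs, x_is) - y_is) ** 2)
    pred_err = np.mean((np.polyval(coeffs, x_oos) - y_oos) ** 2)
    print(f"degree {degree}: in-sample MSE {fit_err:.3f}, "
          f"out-of-sample MSE {pred_err:.3f}")
```

The degree-9 polynomial threads through all ten in-sample points almost exactly, yet its prediction of the two held-out points is far worse than the simple straight line -- the same shape of failure as an over-parameterized trading model.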


What is the practical solution?  For an end-of-day trading system, try out some different lengths for the in-sample period -- say two years, one year, six months, three months.  The best length will depend on the complexity of the model and the stability of that portion of the underlying market that your model is trying to match.  What length for the out-of-sample period?  Any length shorter than the amount of time it usually takes for the market to shift away from the model.  If a model is going to be accurate and profitable, it should immediately be accurate and profitable.  So try out-of-sample periods of one month, one week, one day.


I can hear someone saying "But wait, trying all these combinations and then picking the lengths is using results from out-of-sample testing to determine the model."  And they are correct.  But only the length of the in-sample period is being chosen by optimization, so only four data points are being tested -- two years, one year, six months, three months.  But at that, there is still a contamination of the purity of the out-of-sampleness.  So be careful.


----


Which leads directly to one of the other questions posted -- can a trading system be developed using one ticker and validated using other tickers?  That is, can the in-sample data be one stock and the out-of-sample data be another stock?  Yes, but there is a caution here as well. 


First, my opinion is that there is no requirement that a trading system must work on every data series in order for it to be considered robust -- that will just never happen.  Whatever the fundamental reasons prices vary, I have no reason to expect that the share prices of a gold exploration company, an automobile manufacturer, a bank, a food producer, and a soft drink company all act the same.  But I should see similarities within sectors -- if a model works well for one insurance company, it should work reasonably well for most other insurance companies, and maybe even for banks. 


I can use one company as in-sample data to build a model, then other similar companies as out-of-sample data to validate it. 


But avoid what I call "optimizing the ticker space."  It is not valid to build a trading model using the data series from one ticker, then test it over, say, 500 different tickers and trade only those 10 of the 500 that were profitable.  That is just curve-fitting.  The 10 are probably profitable because they got lucky.  If 400 were profitable, there is hope.
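A quick simulation (my illustration, synthetic data) shows why the 10 survivors are suspect: apply a worthless rule to 500 random-walk price series with no drift and count how many finish profitable purely by chance.

```python
# With no edge at all, roughly half of 500 zero-drift random-walk
# series will finish "profitable" anyway.  Hand-picking the best few
# after the fact produces an impressive-looking but meaningless list.
import random

random.seed(1)
profitable = 0
for ticker in range(500):
    # one synthetic "ticker": 250 daily returns with zero expected drift
    equity = sum(random.gauss(0.0, 1.0) for _ in range(250))
    if equity > 0:
        profitable += 1
print(f"{profitable} of 500 random series finished profitable")
```

That is the benchmark to beat: around 250 of 500 profitable is what pure luck delivers, which is why 10 of 500 is damning and 400 of 500 is encouraging.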


There is a technique that helps determine which tickers to trade.  Develop a model using one ticker over one in-sample period of time, but reserve two out-of-sample periods of time.  Test the model on the universe of say 500 tickers for the first out-of-sample period and count the proportion of the 500 that are profitable.  Make a separate list of those that are profitable, and test them over the second out-of-sample period.  If the proportion that are profitable in the second out-of-sample period is about the same as the proportion that were profitable in the first out-of-sample period, then the model is not robust.  If the proportion that are profitable in the second out-of-sample period is high, then the model is probably robust.
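The two-stage filter can be sketched as follows.  This is my outline with hypothetical names: `score` stands in for running the already-built model over one ticker and one date range and returning its objective-function value.

```python
# Two-stage out-of-sample filter: screen the universe over the first
# OOS period, then re-test only the survivors over the second.

def two_stage_filter(tickers, score, oos1, oos2):
    """Return (p1, p2) where p1 is the proportion of all tickers
    profitable in oos1, and p2 is the proportion of those survivors
    still profitable in oos2."""
    survivors = [t for t in tickers if score(t, oos1) > 0]
    p1 = len(survivors) / len(tickers)
    if not survivors:
        return p1, 0.0
    p2 = sum(1 for t in survivors if score(t, oos2) > 0) / len(survivors)
    return p1, p2
```

If `p2` is about the same as `p1`, the survivors did no better than the original universe -- their first-period profits were luck, and the model is probably not robust.  If `p2` is high, the model probably is.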


----


Let me clarify one point.  Throughout this posting I have been using the term "is profitable."  That is shorthand for saying "score a high value using the objective function."  All measurements of the merit of a trading system are made by computing the score for the objective function -- not just the profitability. 
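As one concrete example of such a function (my example, not necessarily the author's choice), a common style of objective function rewards smooth equity growth by dividing net gain by the worst drawdown along the way:

```python
# Example objective function: net equity gain per unit of maximum
# drawdown.  Higher is better; raw profit alone would ignore how
# painful the equity curve was to hold.

def objective(equity_curve):
    """Net gain divided by maximum drawdown of the equity curve."""
    peak = equity_curve[0]
    max_dd = 0.0
    for e in equity_curve:
        peak = max(peak, e)
        max_dd = max(max_dd, peak - e)
    gain = equity_curve[-1] - equity_curve[0]
    return gain / max_dd if max_dd > 0 else float("inf")
```

Any single number that ranks one test run against another can serve; the point is that every comparison in this posting is a comparison of such scores.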


----


This has become longer, and a little more theoretical, than I intended.  I hope it is useful.


Thanks for listening,

Howard

www.quantitativetradingsystems.com

