Turtle Trading Rules
Chapter 24 The Statistical Basis of History Testing
Chapter 24 The Statistical Basis of History Testing (2)
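Robust measure         Test period:           Test period:           Change
                       Jan 1996 to Jun 2006   Feb 1996 to Apr 2006
RAR                    54.7%                  54.9%                  0.4%
R-cubed                3.31                   3.63                   9.7%
Robust Sharpe ratio    1.58                   1.60                   1.3%

Clearly, the robust measures are less sensitive than the existing ones. R-cubed is still sensitive to the removal of the two large drawdowns at the start and end of the test, but less so than the MAR ratio, because the averaging built into R-cubed softens the impact of any single drawdown. Every robust measure is affected less by changes in the data than its ordinary counterpart. Had the new test not changed the maximum drawdown, R-cubed would have changed by only 0.4%, the same as RAR, and the contrast with the ordinary measures would have been even more dramatic: MAR would have changed by 5.2% (the change in its numerator, CAGR), far more than RAR's 0.4%.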
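The softening effect of averaging can be illustrated with a small sketch. It assumes, as a simplification not spelled out in this passage, that a MAR-style ratio divides the return by the single largest drawdown while an R-cubed-style ratio divides it by the average of the five largest drawdowns. The return and drawdown figures are hypothetical, chosen only for illustration.

```python
def mar_style(ret, drawdowns):
    """Return divided by the single largest drawdown."""
    return ret / max(drawdowns)


def averaged_style(ret, drawdowns, k=5):
    """Return divided by the mean of the k largest drawdowns (assumed simplification)."""
    top = sorted(drawdowns, reverse=True)[:k]
    return ret / (sum(top) / len(top))


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Hypothetical drawdowns, as fractions of equity.
full = [0.40, 0.25, 0.22, 0.20, 0.18, 0.15]
# The same history when the largest drawdown falls outside the test window.
trimmed = [d for d in full if d != 0.40]
ret = 0.50  # hypothetical annual return

mar_shift = pct_change(mar_style(ret, full), mar_style(ret, trimmed))
avg_shift = pct_change(averaged_style(ret, full), averaged_style(ret, trimmed))

print(f"max-drawdown ratio changes by {mar_shift:.1f}%")
print(f"averaged-drawdown ratio changes by {avg_shift:.1f}%")
```

With these made-up numbers, dropping one drawdown moves the ratio based on the maximum far more than the ratio based on an average, which is the behavior the comparison above describes.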
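The performance comparison of the six basic trading systems also shows the advantage of robust measures. Recall that when we added the five months of data from July to November 2006, the performance of all six systems dropped significantly. But as Tables 12-2 and 12-3 show, the robust measures held up much better than the ordinary ones under the relatively unfavorable conditions of those last few months. Table 12-2 compares the changes in RAR and CAGR for these systems.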
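Table 12-2

System                  CAGR                             RAR
                        Jun 2006   Nov 2006   Change     Jun 2006   Nov 2006   Change
ATR Channel Breakout    52.4%      48.7%      -7.0%      54.7%      55.0%       0.5%
Bollinger Breakout      40.7%      36.7%      -9.8%      40.4%      40.7%       0.6%
Donchian Trend          27.2%      25.8%      -5.2%      28.0%      26.7%      -4.6%
Donchian Time           47.2%      4%         -0.4%      45.4%      44.8%      -1.4%
Dual Moving Average     50.3%      42.4%      -15.7%     55.0%      53.6%      -2.6%
Triple Moving Average   41.6%      36.0%      -13.5%     41.3%      40.8%      -1.2%
Average change                                -8.6%                            -1.4%

Over this period, RAR changed less than one-sixth as much as CAGR. This shows that RAR is much more robust than CAGR; in other words, it will be more stable in actual trading. The same is true of R-cubed compared with MAR. Table 12-3 compares the changes in R-cubed and MAR for these systems.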
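Table 12-3

System                  MAR ratio                        R-cubed
                        Jun 2006   Nov 2006   Change     Jun 2006   Nov 2006   Change
ATR Channel Breakout    1.35       1.25       -7.4%      3.72       3.67       -1.4%
Bollinger Breakout      1.29       1.17       -9.3%      3.48       3.31       -4.9%
Donchian Trend          0.76       0.72       -5.3%      1.32       1.17       -11.4%
Donchian Time           1.17       1.17       -0.0%      2.15       2.09       -2.8%
Dual Moving Average     1.29       0.77       -40.3%     4.69       3.96       -15.6%
Triple Moving Average   1.32       0.86       -34.9%     3.27       2.87       -12.2%
Average change                                -16.2%                           -8.0%

Over this period, R-cubed changed about half as much as the MAR ratio.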
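The change columns are relative changes, (new - old) / old, and the bottom row is their simple average. As a check, the following sketch recomputes both from the MAR and R-cubed values in Table 12-3. Because the inputs are already rounded to two decimals, an individual recomputed change may differ from the printed one in the last digit.

```python
# (June 2006 value, November 2006 value) per system, taken from Table 12-3.
mar = {
    "ATR Channel Breakout": (1.35, 1.25),
    "Bollinger Breakout": (1.29, 1.17),
    "Donchian Trend": (0.76, 0.72),
    "Donchian Time": (1.17, 1.17),
    "Dual Moving Average": (1.29, 0.77),
    "Triple Moving Average": (1.32, 0.86),
}
r_cubed = {
    "ATR Channel Breakout": (3.72, 3.67),
    "Bollinger Breakout": (3.48, 3.31),
    "Donchian Trend": (1.32, 1.17),
    "Donchian Time": (2.15, 2.09),
    "Dual Moving Average": (4.69, 3.96),
    "Triple Moving Average": (3.27, 2.87),
}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def average_change(table):
    """Simple average of the per-system relative changes."""
    changes = [pct_change(old, new) for old, new in table.values()]
    return sum(changes) / len(changes)


avg_mar = average_change(mar)
avg_r3 = average_change(r_cubed)

print(f"average MAR change:     {avg_mar:.1f}%")
print(f"average R-cubed change: {avg_r3:.1f}%")
print(f"R-cubed moved {avg_r3 / avg_mar:.2f} times as much as MAR")
```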
Robust measures are also less susceptible to luck than non-robust ones. For example, if a trader happens to avoid a big drawdown because he was away on vacation, his MAR ratio may be higher than his peers', but R-cubed will expose this element of luck, because a single event has far less impact on R-cubed. If you rely on sensitive, non-robust measures, the attractive results you see are more likely to be due to good luck than to a repeatable pattern of market behavior that can be exploited. This is another reason to use robust measures.
Using robust measures can also help you avoid the dangers of overfitting, because they are less prone to large swings caused by small changes in the data. Recall that when we discussed overfitting, we experimented with the Dual Moving Average system and added several rules to improve its performance. The new rule, intended to reduce drawdowns, raised the system's CAGR from 41.4% to 45.7% (a gain of 10.3%) and its MAR ratio from 0.74 to 1.17 (a gain of 60%). In contrast, the robust return measure RAR rose only from 53.5% to 53.75%, a gain of just 0.4%, and the robust risk/reward measure R-cubed rose from 3.29 to 3.86, a gain of 17.3%. Robust measures, then, do not easily show large improvements when only a few trades are altered. Since curve fitting usually benefits only a small number of trades, it is hard to use curve fitting to make a system look significantly better when you judge it by robust measures.
Let us now consider several other factors that affect the predictive value of historical tests.
Representativeness of the sample
How well our sample of trades and test results represents the future is determined by two factors:
Number of markets: The more markets we test, the more likely we are to include various states of the market.
Test time: Tests with longer time spans cover more market states and are more likely to include historical periods that are representative of the future.
I suggest you test with all the data you can get. Data is not expensive to buy, but if you blindly trust a system without adequately testing it across many markets and many years, you are taking a great risk. Wouldn't you feel stupid if your system failed the first time it met a certain market state, when that state had occurred three or four times in the past 20 years and you simply had not tested for it?
Young traders are especially prone to this mistake. They believe that the state they see is representative of the market as a whole, and they often do not realize that the market is cyclical and changeable and frequently returns to states that have appeared in the past. As in life, young people often fail to see the value of history simply because it happened before they were born. It is good to be young, but don't be foolish: study history.
Remember, in the dot-com era everyone was a short-term expert and everyone was a genius. But how many of these geniuses survived when the bubble suddenly burst and the once-successful methods stopped working? Had they done a little testing, they would have known that their methods depended on the particular market conditions of that golden age, and they would have dropped them when those conditions no longer existed. Or perhaps they would have adopted a robust approach that works in all market states from the start.
Sample size
The concept of sample size is simple: you need a large enough sample to make valid statistical inferences. The smaller the sample, the coarser the inference; the larger the sample, the more accurate the inference. There is no magic number: bigger is better and smaller is worse. A sample of fewer than 20 can lead to severe error, a sample of more than 100 has more predictive value, and a sample of several hundred is probably adequate for most tests. There are formulas and methods that specify the necessary sample size explicitly, but unfortunately none of them were designed for the kind of data found in trading, where potential returns do not follow a neat, regular distribution like the distribution of women's heights shown in Figure 4-3.
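To see why small samples give coarse inferences, the sketch below draws many repeated samples of hypothetical trade results and measures how much the sample average varies at each sample size. The trade distribution (frequent small losses, occasional large wins) and all parameter values are assumptions chosen only for illustration, not figures from the text.

```python
import random
import statistics


def simulate_trade(rng):
    """One hypothetical trade result: usually a small loss, sometimes a large win."""
    if rng.random() < 0.35:
        return rng.expovariate(1 / 3.0)  # winning trade, average size 3 units
    return -1.0  # losing trade, fixed 1-unit loss


def spread_of_sample_mean(n_trades, n_samples=1000, seed=42):
    """Standard deviation of the average trade result across many samples of n_trades."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([simulate_trade(rng) for _ in range(n_trades)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


for n in (20, 100, 500):
    print(f"{n:4d} trades per sample: spread of the average = {spread_of_sample_mean(n):.3f}")
```

The spread shrinks as the number of trades per sample grows, so an average measured over 20 trades says much less about the underlying system than one measured over several hundred.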
The real challenge, though, is not determining the necessary sample size. It is judging inferences drawn from past data when the rule you are considering does not come into play very often, because for such a rule you cannot get a large enough sample. Take, for example, market behavior when a big bubble is about to burst. You can devise rules for this market state, and you can even test them, but you cannot gather the large sample needed to make decisions. In such cases we must understand that our test results are not very convincing, because the sample is much smaller than it needs to be. The same problem arises in the analysis of seasonal tendencies mentioned earlier.
When you test a new rule, you have to measure how often the rule comes into play. If a rule takes effect only four times over the entire test period, you have no statistical basis for telling whether it helps, and the effects you see are most likely just random. There is a way around this: you can try to generalize the rule so that it takes effect more often. Doing so increases the sample size, and the statistical power of the test increases accordingly.
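A rough calculation shows why four occurrences say so little. Assume, purely for illustration, that a rule with no real effect would still appear to help on any single occurrence with probability 0.5. The chance of it appearing to help at least k times out of n then follows the binomial distribution. The function name and the 0.5 figure are assumptions of this sketch.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of at least k apparent successes in n independent occurrences."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# A useless rule looks helpful every time it fires, in 4 occurrences.
print(prob_at_least(4, 4))  # 0.0625

# A useless rule looks helpful at least 3 times out of 4.
print(prob_at_least(3, 4))  # 0.3125

# The same 75% apparent success rate over 40 occurrences.
print(prob_at_least(30, 40))
```

Under this assumption, a worthless rule looks good in at least three of four occurrences almost a third of the time, whereas the same success rate sustained over 40 occurrences would be very unlikely to arise by chance.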
Two common practices can make the small-sample problem even worse: single-market optimization and overly complex system design.
Single Market Optimization: Optimization methods applied individually to each market are more difficult to test with a sufficiently large sample because there are far fewer trading opportunities in a single market.
Overly complex systems: A complex system has many rules, and it is sometimes difficult to judge how often, or to what extent, a particular rule takes effect. It is therefore harder to have confidence in the test results of an overly complex system.
For these reasons, I do not recommend optimizing for a single market, and I prefer simple ideas, whose test results carry more statistical weight.
From Virtual Testing to Real Trading
How do you judge what kind of results you are likely to achieve in actual trading? This is perhaps one of the most interesting questions in historical testing.
To get a meaningful answer, you must understand the factors that affect system performance, the need for robust measures, and the importance of a sufficiently large and representative sample. Once you have done that, you can start thinking about the potential impact of market shifts and about why even good systems designed by experienced traders go through ups and downs in performance. The reality is that you cannot know or foresee how a system will perform. At best, you can use sound tools to judge the range of its potential performance and the factors that affect that performance.
Lucky systems
If a system has performed particularly well in the recent past, luck may be the reason: perhaps the market has been in an ideal state for that system. Generally speaking, such a top-performing system tends to run into bad times after the good ones, and it cannot be expected to repeat its lucky performance in the future. It might happen, but you cannot count on luck. You are more likely to experience a decline in performance.