I am baffled by something of a statistical nature and so would appreciate help from mazza or any other stats guru, thanks.
Suppose I have 2000 lines of data in Excel, each line being the daily return/change for some security. I want to put a figure on the volatility based on daily returns over this whole period. I have done that a couple of different ways and I am baffled as to why I am getting different results.
One easy method is to take the SD for the whole sample (or should that be population) using STDEV.P or STDEV.S of lines 1 to 2000, and convert that to HV (historical volatility). Simple enough.
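For anyone who wants to reproduce method one outside Excel, here is a minimal Python sketch. The function name whole_sample_hv and the made-up returns are hypothetical, and the square root of 252 annualisation factor is an assumption, since the post does not say how the SD is converted to HV.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: annualisation basis, not stated in the post


def whole_sample_hv(returns, sample=True, periods_per_year=TRADING_DAYS):
    """Annualised volatility from a single SD taken over all daily returns."""
    # sample=True mirrors STDEV.S (n - 1 divisor); False mirrors STDEV.P (n divisor)
    sd = statistics.stdev(returns) if sample else statistics.pstdev(returns)
    return sd * math.sqrt(periods_per_year)


# Made-up daily returns, purely for illustration
daily = [0.010, -0.020, 0.005, 0.015, -0.010, 0.000, 0.020, -0.005]
print(whole_sample_hv(daily), whole_sample_hv(daily, sample=False))
```

With 2000 rows the two divisors differ by a factor of only sqrt(2000/1999), so the choice between STDEV.S and STDEV.P barely moves the result.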
Another possible method is to break it into 20-day lots, work out HV20 for each 20-day period, and then average the HV of those 100 periods. This gives 'the average HV20 over the last 2000 days'. This always gives a lower figure than method one.
As an experiment I tried a middle way, breaking it into 4 different 500-line lots, taking the resulting HV of each lot and averaging them. This results in a figure somewhere between the other two.
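The two chunked methods are the same calculation with a different lot size, so they can be sketched as one function. This is only a rough illustration: average_chunk_hv is a hypothetical name, the lots are assumed to be consecutive and non-overlapping, the sample SD is used within each lot, and the 252-day annualisation is again an assumption.

```python
import math
import statistics


def average_chunk_hv(returns, chunk, periods_per_year=252):
    """Average of the annualised sample SDs of consecutive, non-overlapping lots."""
    # Any leftover rows that do not fill a whole lot are dropped
    n_chunks = len(returns) // chunk
    if chunk < 2 or n_chunks == 0:
        raise ValueError("need a lot size of at least 2 and at least one full lot")
    hvs = []
    for i in range(n_chunks):
        block = returns[i * chunk:(i + 1) * chunk]
        hvs.append(statistics.stdev(block) * math.sqrt(periods_per_year))
    return sum(hvs) / len(hvs)


# Made-up daily returns, purely for illustration (40 rows, so two 20-day lots)
daily = [0.002, -0.001, 0.003, -0.002] * 5 + [0.02, -0.015, 0.025, -0.02] * 5
print(average_chunk_hv(daily, 20), average_chunk_hv(daily, len(daily)))
```

On the 2000-row data, a lot size of 20 would correspond to the 100-period method and 500 to the middle way, while a lot size of 2000 reduces to the single whole-sample SD.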
The differences are significant enough to matter, e.g. one security gives HV of 21.3%/19.8%/17.9% for methods 1/3/2.
A currency pair gives 13.07%/12.1%/10.56% in the same order.
For the life of me I cannot see why this should be so, since they are measuring the distribution of exactly the same daily returns. I have eliminated the following as causes:
- it isn't that in a strong trend the mean of a shorter sample strays further from zero, making each deviation smaller on average; this does happen, but eliminating it didn't make much difference
- it isn't an issue with using population or sample stdev; again it doesn't make much difference which one is used over 2000 lines, and for the shorter timeframes I am using sample anyway
- I thought it might be the outlier days having a bigger effect on the whole sample versus being lost in a couple out of 100 smaller samples, but eliminating the outlier days made no difference
- using log returns or straight returns does not make the difference
Now if I try the same experiment using random numbers in place of daily returns, the observed effect does not happen; all methods give pretty much the same answer, which is what I expected to happen with stocks, but it didn't. I therefore suspect it must be something to do with the normal-ish distribution characterising security returns, but I can't figure out exactly what.
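In case anyone wants to repeat the random-number check, a rough Python version is below. It assumes 'random numbers' means independent normal draws with one constant daily SD (1% here, an arbitrary choice), and it again assumes a 252-day annualisation; average_hv is a hypothetical helper name.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N_DAYS = 2000
DAILY_SD = 0.01             # assumption: one constant daily SD for every day
ANNUALISE = math.sqrt(252)  # assumption: 252 trading days per year


def average_hv(data, chunk):
    """Mean of the annualised sample SDs of non-overlapping lots."""
    n_chunks = len(data) // chunk
    hvs = [
        statistics.stdev(data[i * chunk:(i + 1) * chunk]) * ANNUALISE
        for i in range(n_chunks)
    ]
    return sum(hvs) / len(hvs)


# Independent normal draws standing in for daily returns
returns = [random.gauss(0.0, DAILY_SD) for _ in range(N_DAYS)]

method1 = average_hv(returns, N_DAYS)  # one SD over all 2000 rows
method3 = average_hv(returns, 500)     # average of 4 x 500-row HVs
method2 = average_hv(returns, 20)      # average of 100 x 20-row HVs

print(f"method 1: {method1:.2%}  method 3: {method3:.2%}  method 2: {method2:.2%}")
```

Swapping the random draws for a column of real daily returns should make it easy to compare the two cases side by side.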
Hoping someone can shed some light on this for me, thanks.