This post is a summarized excerpt of Chapter 5, 'Fractionally Differentiated Features', from Marcos Lopez de Prado's 'Advances in Financial Machine Learning' (2018).
1. Literature Review
Virtually all the financial time series literature is based on the premise of making non-stationary series stationary through "integer transformation."
Indeed, most papers in the literature restore stationarity through first- or second-order differencing without much hesitation and then proceed with their analysis. This raises two questions:
(1) Why would integer 1 differentiation be optimal?
(2) Is over-differentiation one reason why the literature has been so biased in favor of the efficient market hypothesis?
In 1981, Hosking...
A family of ARIMA processes was generalized by permitting the degree of differencing to take fractional values.
...
"Apart from a passing reference by Granger(1978), fractional differencing does not appear to have been previously mentioned in connection with time series analysis."
2. The Method
$ B $ is the backshift operator, such that $B^k X_t = X_{t-k}$ for any integer $k \geq 0 $. For example, $(1-B)^2 X_t = X_t - 2 X_{t-1} + X_{t-2}$.
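For example, the second-difference identity above can be checked numerically with a small sketch (the toy series values below are arbitrary):
import numpy as np
# arbitrary toy series
x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
# (1 - B)^2 X_t via two successive first differences
second_diff = np.diff(x, n=2)
# the expanded form X_t - 2*X_{t-1} + X_{t-2}
expanded = x[2:] - 2 * x[1:-1] + x[:-2]
assert np.allclose(second_diff, expanded)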
Note that, for a real number $d$, $(1+x)^d = \sum^{\infty}_{k=0} \begin{pmatrix} d \\ k \end{pmatrix} x^k$. Thus,
$$
\begin{align}
(1-B)^d = \sum^{\infty}_{k=0} \begin{pmatrix} d \\ k \end{pmatrix} (-B)^k &= \sum^{\infty}_{k=0} \frac{\prod^{k-1}_{i=0}(d-i)}{k!} (-B)^k \\
&= \sum^{\infty}_{k=0} (-B)^k \prod^{k-1}_{i=0}\frac{d-i}{k-i} \\
&= 1-dB + \frac{d(d-1)}{2!}B^2 - \frac{d(d-1)(d-2)}{3!}B^3 + \cdots
\end{align}
$$
2.1 Long Memory
Let us see how a real (non-integer) positive $d$ preserves memory. The fractionally differenced series is a dot product
$$ \tilde{X}_t = \sum^{\infty}_{k=0} \omega_k X_{t-k} $$
with weights $\omega$
$$ \omega = \left \{1, -d, \frac{d(d-1)}{2!}, -\frac{d(d-1)(d-2)}{3!}, \cdots, (-1)^k \prod^{k-1}_{i=0} \frac{d-i}{k-i}, \cdots \right \} $$
and values $X$
$$X = \left \{ X_t, X_{t-1}, X_{t-2}, \cdots, X_{t-k}, \cdots \right \}$$
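As a quick sanity check of this weight formula, the sketch below evaluates $\omega_k = (-1)^k \prod^{k-1}_{i=0} \frac{d-i}{k-i}$ directly (the helper omega is introduced here only for illustration): an integer $d$ truncates the weights after lag $d$, so all older memory is discarded, whereas a fractional $d$ assigns a non-zero weight to every lag.
def omega(d, k):
    # omega_k = (-1)^k * prod_{i=0}^{k-1} (d - i) / (k - i)
    w = 1.0
    for i in range(k):
        w *= (d - i) / (k - i)
    return (-1) ** k * w

print([omega(2.0, k) for k in range(5)])  # effectively [1, -2, 1, 0, 0] -> the (1 - B)^2 weights
print([omega(0.5, k) for k in range(5)])  # [1.0, -0.5, -0.125, -0.0625, -0.0390625] -> every lag keeps some weight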
2.2 Iterative Estimation
With $\omega_0 = 1$, the weights can be generated iteratively as follows.
$$ \omega_k = -\omega_{k-1} \frac{d-k+1}{k} $$
Suppose that $d=0.5$. Then,
\begin{align}
\omega_0 &= 1 \\
\omega_1 &= -\omega_0 \frac{0.5-1+1}{1} = -0.5 \\
\omega_2 &= -\omega_1 \frac{0.5-2+1}{2} = -0.125 \\
\omega_3 &= -\omega_2 \frac{0.5-3+1}{3} = -0.0625 \\
\end{align}
Code Snippets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#------------------------------------------------------------------
def getWeights(d, size):
    # generate the first `size` weights of the (1 - B)^d expansion
    w = [1.]
    for k in range(1, size):
        w_ = -w[-1] / k * (d - k + 1)
        w.append(w_)
    w = np.array(w[::-1]).reshape(-1, 1)  # column vector, oldest-lag weight first
    return w
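Note that getWeights returns the weights as a column vector in reverse order (oldest lag first), which is convenient for the dot product against a price window used in fracDiff_FFD below. Reproducing the hand-computed $d = 0.5$ case:
print(getWeights(0.5, 4).ravel())  # [-0.0625, -0.125, -0.5, 1.0] = omega_3, omega_2, omega_1, omega_0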
#------------------------------------------------------------------
def plotWeights(dRange, nPlots, size):
    # plot the weight profiles for several values of d within dRange
    w = pd.DataFrame()
    for d in np.linspace(dRange[0], dRange[1], nPlots):
        w_ = getWeights(d, size=size)
        w_ = pd.DataFrame(w_, index=range(w_.shape[0])[::-1], columns=[round(d, 3)])
        w = w.join(w_, how='outer')
    ax = w.plot(figsize=(16, 10))
    ax.legend(loc='best')
    plt.show()
    return
#------------------------------------------------------------------
if __name__ == '__main__':
    plotWeights(dRange=[0, 1], nPlots=11, size=6)
    plotWeights(dRange=[1, 2], nPlots=11, size=6)
2.3 Convergence
Let us consider the convergence of the weights. From the recursion above, for $k > d$, if $\omega_{k-1} \neq 0$, then $\left| \frac{\omega_k}{\omega_{k-1}} \right| = \left| \frac{d-k+1}{k} \right| < 1$, and $\omega_k = 0$ otherwise. In other words, once $k$ exceeds $d$, each weight shrinks in absolute value relative to the previous one, and once a weight becomes zero, all subsequent weights are zero.
Consequently, the weights converge asymptotically to zero, as an infinite product of factors within the unit circle. Also, for a positive $d$ and $k < d+1$, we have $\frac{d-k+1}{k} > 0$, which makes the initial weights alternate in sign. For a non-integer $d$, once $k \geq d+1$, $\omega_k$ will be negative if int[$d$] is even, and positive otherwise.
In summary, $\lim_{k \rightarrow \infty} \omega_k = 0^{-}$ if int[$d$] is even, and $\lim_{k \rightarrow \infty} \omega_k = 0^{+}$ if int[$d$] is odd. In particular, for $d \in (0, 1)$, we have $-1 < \omega_k < 0$ for all $k > 0$. This alternation of signs is a necessary condition for making $\tilde{X}_t$ stationary.
The weight profiles for larger values of $d$ can be plotted analogously with plotWeights(dRange=[1, 5], nPlots=5, size=6).
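The same sign behavior can also be checked numerically with getWeights (a small sketch): for $d = 1.5$ (int[$d$] odd) the tail weights are positive, while for $d = 2.5$ (int[$d$] even) they are negative, in both cases shrinking toward zero.
for d in (1.5, 2.5):
    print(d, getWeights(d, 8).ravel()[::-1])  # printed as omega_0, omega_1, ..., omega_7
# d = 1.5 (int[d] = 1, odd):  weights for k >= 2 are positive and shrink toward 0+
# d = 2.5 (int[d] = 2, even): weights for k >= 3 are negative and shrink toward 0-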
3. Conclusion
Most econometric analyses follow one of two paradigms:
1. Box-Jenkins: Returns are stationary, however memoryless.
2. Engle-Granger: Log-prices have memory, however they are non-stationary. Cointegration is the trick that makes regression work on non-stationary series, so that memory is preserved. However the number of cointegrated variables is limited, and the cointegrating vectors are notoriously unstable.
In contrast, the FFD approach shows that there is no need to give up all of the memory in order to gain stationarity, and no need for the cointegration trick as far as ML forecasting is concerned. In practice, the author suggests experimenting with the following transformation of our features:
First, compute a cumulative sum of the time series. This guarantees that some order of differentiation is needed.
Second, compute the FFD($d$) series for various $d \in [0, 1]$.
Third, determine the minimum $d$ such that the p-value of the ADF statistic on the FFD($d$) series falls below 5%.
Fourth, use the FFD($d$) series as our predictive feature.
def fracDiff_FFD(series, d, thres=0.0001):
    """
    Fractional differencing with a constant (fixed-width) window.
    Note 1: thres determines the cut-off weight for the window.
    Note 2: d can be any positive fractional, not necessarily bounded in [0, 1].
    """
    # (1) Compute weights for the longest series
    w = getWeights_FFD(d, thres)
    width = len(w) - 1
    # (2) Apply weights to values
    df = {}
    for name in series.columns:
        seriesF = series[[name]].fillna(method='ffill').dropna()
        df_ = pd.Series(dtype=float)
        for iloc1 in range(width, seriesF.shape[0]):
            loc0 = seriesF.index[iloc1 - width]
            loc1 = seriesF.index[iloc1]
            if not np.isfinite(series.loc[loc1, name]):
                continue  # exclude NAs
            df_[loc1] = np.dot(w.T, seriesF.loc[loc0:loc1])[0, 0]
        df[name] = df_.copy(deep=True)
    df = pd.concat(df, axis=1)
    return df
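Note that fracDiff_FFD calls getWeights_FFD, which is not reproduced in this excerpt. A minimal sketch consistent with the fixed-width window idea (keep generating weights until the next one falls below thres in absolute value) could look like the following; the exact implementation in the book may differ.
def getWeights_FFD(d, thres):
    # keep adding weights until the next one is smaller than thres in absolute value
    w, k = [1.], 1
    while True:
        w_ = -w[-1] / k * (d - k + 1)
        if abs(w_) < thres:
            break
        w.append(w_)
        k += 1
    return np.array(w[::-1]).reshape(-1, 1)  # column vector, oldest-lag weight first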
from statsmodels.tsa.stattools import adfuller
def plotMinFFD():
    # `close` is assumed to be a DataFrame of prices with a 'Close' column, defined beforehand
    path, instName = './', 'ES1_Index_Method12'
    out = pd.DataFrame(columns=['ADF Stat', 'pvalue', 'lags', 'nObs', '95% conf', 'corr'])
    df0 = close[['Close']]
    for col in df0.columns:
        for d in np.linspace(0, 1, 11):
            # df1 = np.log(df0[[col]]).resample('1d').last().dropna()  # optional daily resampling
            df1 = np.log(df0[[col]]).dropna()
            df2 = fracDiff_FFD(df1, d, thres=0.01)
            corr = np.corrcoef(df1.loc[df2.index, col], df2[col])[0, 1]
            result = adfuller(df2[col], maxlag=1, regression='c', autolag=None)
            out.loc[d] = list(result[:4]) + [result[4]['5%']] + [corr]  # with critical value
    out.to_csv(path + instName + '_testMinFFD.csv')
    out[['ADF Stat', 'corr']].plot(figsize=(16, 6), secondary_y='ADF Stat', fontsize=15)
    plt.axhline(out['95% conf'].mean(), linewidth=1, color='r', linestyle='dotted')
    plt.savefig(path + instName + '_testMinFFD.png')
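For plotMinFFD to run, close must already be defined as a DataFrame of prices with a 'Close' column. A hypothetical way to prepare it and run the test (the file name below is an assumption, not from the original post):
# hypothetical input: a CSV of prices with a datetime index and a 'Close' column
close = pd.read_csv('samsung_electronics_close.csv', index_col=0, parse_dates=True)
plotMinFFD()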
Below is the result of a test using Samsung Electronics' stock price as an example. The series passes the ADF test around $d = 0.4$, and the correlation with the original series at that point is about 0.97.