NXS1.0_TAIEX_Feature Engineering

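The notebook performs the following steps:
  • Mount Google Drive: make sure the drive that holds the dataset is mounted.
  • Read the dataset: load the dataset from the specified path.
  • Generate technical indicators: compute moving averages, the relative strength index (RSI), and Bollinger Bands.
  • Generate derived variables: compute the percentage price change and the percentage volume change.
  • Handle missing values: fill gaps with a forward fill followed by a backward fill.
  • Standardize features: apply z-score standardization to the selected features once the gaps are filled.
  • Save the processed data: write the processed data to a new CSV file.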
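The processed dataset contains these groups of features:
  • Original features: Date, ST_Code, ST_Name, Open, High, Low, Close, Adj_Close, Volume, etc.
  • Technical indicators: SMA_10, SMA_20, RSI_14, BBL_20_2.0, BBM_20_2.0, BBU_20_2.0, etc.
  • Derived variables: Price_Change_Percent, Volume_Change_Percent.
  • Standardized features: Adj_Close_TAIEX, SMA_10, SMA_20, RSI_14, Price_Change_Percent, Volume_Change_Percent. The code overwrites these columns with their z-scores in place, so no separate zscore_ columns are created.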
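To make the feature groups above concrete, here is a minimal sketch that rebuilds a few of them with plain pandas on invented toy data. The helper names `add_basic_features` and `zscore_col` are hypothetical and not part of the notebook. The choice of a population standard deviation (`ddof=0`) for the Bollinger Bands is an assumption, so check it against the library you use. RSI is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd


def add_basic_features(close: pd.Series, volume: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: rebuild some of the listed features with plain pandas."""
    out = pd.DataFrame(index=close.index)

    # Simple moving averages: mean of the last N closes (NaN until N rows exist)
    out["SMA_10"] = close.rolling(10).mean()
    out["SMA_20"] = close.rolling(20).mean()

    # Bollinger Bands: 20-period mean plus/minus 2 standard deviations
    # (ddof=0 is an assumption made for this sketch)
    mid = close.rolling(20).mean()
    sd = close.rolling(20).std(ddof=0)
    out["BBM_20_2.0"] = mid
    out["BBL_20_2.0"] = mid - 2 * sd
    out["BBU_20_2.0"] = mid + 2 * sd

    # Derived variables: row-over-row change, in percent
    out["Price_Change_Percent"] = close.pct_change() * 100
    out["Volume_Change_Percent"] = volume.pct_change() * 100
    return out


def zscore_col(s: pd.Series) -> pd.Series:
    """Hypothetical helper: z-score with population std (ddof=0)."""
    # pandas skips NaN when computing mean and std
    return (s - s.mean()) / s.std(ddof=0)


# Toy data, invented for illustration only
close = pd.Series(np.arange(100.0, 130.0))  # 30 steadily rising closes
volume = pd.Series(np.full(30, 1000.0))     # constant volume

features = add_basic_features(close, volume)
z_price_change = zscore_col(features["Price_Change_Percent"])
print(features.tail())
```

The first rows of each rolling indicator are NaN because the window is not yet full. `scipy.stats.zscore` propagates NaN by default, so a column with any NaN comes back entirely NaN, which is why gaps have to be filled before standardizing.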
# Install pandas_ta
!pip install pandas_ta

import pandas as pd
import pandas_ta as ta
from google.colab import drive
from scipy.stats import zscore

# Mount Google Drive
drive.mount('/content/drive', force_remount=True)

# Load the data
file_path = '/content/drive/My Drive/stock_msci20/processed_stock_data_final.csv'
df = pd.read_csv(file_path)

# Check the dataset's column names
print(df.columns)

# Create technical indicators
# Simple moving averages (SMA)
df['SMA_10'] = ta.sma(df['Adj_Close_TAIEX'], length=10)
df['SMA_20'] = ta.sma(df['Adj_Close_TAIEX'], length=20)

# Relative strength index (RSI)
df['RSI_14'] = ta.rsi(df['Adj_Close_TAIEX'], length=14)

# Bollinger Bands (adds the BBL, BBM, BBU, BBB and BBP columns)
bbands = ta.bbands(df['Adj_Close_TAIEX'], length=20, std=2)
df = pd.concat([df, bbands], axis=1)

# Generate derived variables
# Percentage price change
df['Price_Change_Percent'] = df['Adj_Close_TAIEX'].pct_change() * 100

# Percentage volume change
df['Volume_Change_Percent'] = df['Volume'].pct_change() * 100

# Handle missing values: forward-fill, then back-fill the gaps left at the start
# (fillna(method=...) is deprecated in recent pandas releases; ffill()/bfill() are equivalent)
df = df.ffill().bfill()

# Standardize features with z-score (overwrites the listed columns in place)
features_to_standardize = ['Adj_Close_TAIEX', 'SMA_10', 'SMA_20', 'RSI_14', 'Price_Change_Percent', 'Volume_Change_Percent']
df[features_to_standardize] = df[features_to_standardize].apply(zscore)

# Write the processed data to a new CSV file
output_file_path = '/content/drive/My Drive/stock_msci20/processed_stock_data_with_features.csv'
df.to_csv(output_file_path, index=False)

# Show the first few rows of the processed data
df.head()

Index(['Date', 'ST_Code', 'ST_Name', 'Open', 'High', 'Low', 'Close',
       'Adj_Close', 'Volume', 'MA7', 'MA21', 'MA50', 'MA100', 'Middle_Band',
       'Upper_Band', 'Lower_Band', 'Band_Width', 'Aroon_Up', 'Aroon_Down',
       'CCI20', 'CMO14', 'MACD_Line', 'Signal_Line', 'MACD_Histogram', 'RSI7',
       'RSI14', 'RSI21', '%K', '%D', 'WILLR14', 'OBV', 'Market_Return',
       'Stock_Return', 'Beta_60', 'Beta_120', 'Close_TAIEX',
       'Adj_Close_TAIEX'],
      dtype='object')
Date	ST_Code	ST_Name	Open	High	Low	Close	Adj_Close	Volume	MA7	...	SMA_10	SMA_20	RSI_14	BBL_20_2.0	BBM_20_2.0	BBU_20_2.0	BBB_20_2.0	BBP_20_2.0	Price_Change_Percent	Volume_Change_Percent
0	2020/01/02	2317.TW	Hon Hai	91.000000	91.500000	90.300003	90.800003	0.695115	0.166846	0.655992	...	-1.473189	-1.561708	-2.222983	11367.932405	11871.706445	12375.480486	8.486969	0.204594	0.029910	0.083162
1	2020/01/03	2317.TW	Hon Hai	91.400002	92.199997	90.800003	91.599998	0.716532	1.247239	0.655992	...	-1.473189	-1.561708	-2.222983	11367.932405	11871.706445	12375.480486	8.486969	0.204594	0.029910	0.083162
2	2020/01/06	2317.TW	Hon Hai	91.099998	91.099998	90.099998	90.500000	0.687084	0.518659	0.655992	...	-1.473189	-1.561708	-2.222983	11367.932405	11871.706445	12375.480486	8.486969	0.204594	-0.714582	-0.006107
3	2020/01/07	2317.TW	Hon Hai	90.500000	91.000000	88.300003	89.099998	0.649603	1.548578	0.655992	...	-1.473189	-1.561708	-2.222983	11367.932405	11871.706445	12375.480486	8.486969	0.204594	-0.344315	0.026390
4	2020/01/08	2317.TW	Hon Hai	87.900002	88.099998	86.500000	86.500000	0.579997	2.389650	0.655992	...	-1.473189	-1.561708	-2.222983	11367.932405	11871.706445	12375.480486	8.486969	0.204594	-0.301727	0.008148
5 rows × 47 columns
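
The output includes an ST_Code column, which suggests the file may stack rows for several tickers. That layout is an assumption here, since only 2317.TW is visible in the first rows. If it holds, a whole-column `pct_change()` on Volume compares the first row of one ticker with the last row of the previous one, and a whole-frame back fill can copy values across tickers. A minimal sketch of a per-ticker alternative, using invented toy data:

```python
import pandas as pd

# Toy data, invented for illustration: two tickers stacked in one frame
df = pd.DataFrame({
    "ST_Code": ["A", "A", "A", "B", "B", "B"],
    "Volume": [100.0, 110.0, 121.0, 5000.0, 5500.0, 4950.0],
})

# Whole-column version: row 3 compares ticker B's first day with ticker A's last day
df["naive"] = df["Volume"].pct_change() * 100

# Per-ticker version: each ticker's first row has no predecessor, so it stays NaN
df["per_ticker"] = df.groupby("ST_Code")["Volume"].pct_change() * 100

# Fill gaps inside each ticker only, so values never leak across tickers
df["per_ticker_filled"] = df.groupby("ST_Code")["per_ticker"].transform(
    lambda s: s.ffill().bfill()
)

print(df)
```

Even within one ticker, a back fill copies a later value into earlier rows. Keep that in mind if the features feed a model that is evaluated on those early rows.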