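- Mount Google Drive: make sure the dataset is reachable from the Colab runtime.
- Load the dataset: read the CSV file from the specified path.
- Generate technical indicators: compute simple moving averages (SMA), the relative strength index (RSI), and Bollinger Bands.
- Generate derived variables: compute the percentage change in price and the percentage change in volume.
- Handle missing values: fill gaps with a forward fill followed by a backward fill. This runs before standardization because scipy's zscore returns NaN for an entire column that contains any NaN.
- Standardize features: apply z-score standardization to the selected features.
- Save the processed data: write the result to a new CSV file.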
- Original features: Date, ST_Code, ST_Name, Open, High, Low, Close, Adj_Close, Volume, etc.
- Technical indicators: SMA_10, SMA_20, RSI_14, BBL_20_2.0, BBM_20_2.0, BBU_20_2.0, etc.
- Derived variables: Price_Change_Percent, Volume_Change_Percent.
- Standardized features (z-scored in place, keeping their original column names): Adj_Close_TAIEX, SMA_10, SMA_20, RSI_14, Price_Change_Percent, Volume_Change_Percent.
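As a rough illustration of what several of the listed transforms compute, here is a dependency-free sketch of the simple moving average, Bollinger Bands, percentage change, and z-score. It is not the pandas_ta or scipy implementation: the function names and the sample prices are made up for illustration, the population standard deviation (ddof=0) is assumed throughout, and RSI is left out because its smoothing details vary between implementations.

```python
from statistics import fmean, pstdev

def sma(values, length):
    """Simple moving average; None until `length` observations exist."""
    return [
        fmean(values[i + 1 - length:i + 1]) if i + 1 >= length else None
        for i in range(len(values))
    ]

def bollinger(values, length, num_std=2.0):
    """Lower, middle and upper bands as three lists (population std assumed)."""
    lower, middle, upper = [], [], []
    for i in range(len(values)):
        if i + 1 < length:
            # Warm-up period: not enough observations for a full window yet.
            lower.append(None)
            middle.append(None)
            upper.append(None)
            continue
        window = values[i + 1 - length:i + 1]
        mid, spread = fmean(window), num_std * pstdev(window)
        lower.append(mid - spread)
        middle.append(mid)
        upper.append(mid + spread)
    return lower, middle, upper

def pct_change(values):
    """Percentage change from the previous observation; the first entry is None."""
    return [None] + [(cur - prev) / prev * 100 for prev, cur in zip(values, values[1:])]

def zscore(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up closing prices
sma_3 = sma(prices, 3)
lower, middle, upper = bollinger(prices, 3)
changes = pct_change(prices)
scores = zscore(prices)
```

The leading None entries correspond to the NaN rows that the rolling indicators leave at the start of each series, which is why the pipeline needs a missing-value step before standardization.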
# Install pandas_ta (Colab shell command)
!pip install pandas_ta
import pandas as pd
import pandas_ta as ta
from google.colab import drive
from scipy.stats import zscore
# Mount Google Drive
drive.mount('/content/drive', force_remount=True)
# Load the dataset
file_path = '/content/drive/My Drive/stock_msci20/processed_stock_data_final.csv'
df = pd.read_csv(file_path)
# Check the dataset's column names
print(df.columns)
# Build technical indicators
# Simple moving averages (SMA)
df['SMA_10'] = ta.sma(df['Adj_Close_TAIEX'], length=10)
df['SMA_20'] = ta.sma(df['Adj_Close_TAIEX'], length=20)
# Relative strength index (RSI)
df['RSI_14'] = ta.rsi(df['Adj_Close_TAIEX'], length=14)
# Bollinger Bands; ta.bbands returns a DataFrame whose columns are appended to df
bbands = ta.bbands(df['Adj_Close_TAIEX'], length=20, std=2)
df = pd.concat([df, bbands], axis=1)
# Generate derived variables
# Percentage change in price
df['Price_Change_Percent'] = df['Adj_Close_TAIEX'].pct_change() * 100
# Percentage change in volume
df['Volume_Change_Percent'] = df['Volume'].pct_change() * 100
# Handle missing values: forward-fill first, then back-fill whatever remains at the start
# (fillna(method=...) is deprecated in pandas 2.x; ffill()/bfill() are the equivalents)
df = df.ffill().bfill()
# Standardize features with z-scores (the listed columns are overwritten in place)
features_to_standardize = ['Adj_Close_TAIEX', 'SMA_10', 'SMA_20', 'RSI_14', 'Price_Change_Percent', 'Volume_Change_Percent']
df[features_to_standardize] = df[features_to_standardize].apply(zscore)
# Write the processed data to a new CSV file
output_file_path = '/content/drive/My Drive/stock_msci20/processed_stock_data_with_features.csv'
df.to_csv(output_file_path, index=False)
# Show the first few rows of the processed data
df.head()
Index(['Date', 'ST_Code', 'ST_Name', 'Open', 'High', 'Low', 'Close',
'Adj_Close', 'Volume', 'MA7', 'MA21', 'MA50', 'MA100', 'Middle_Band',
'Upper_Band', 'Lower_Band', 'Band_Width', 'Aroon_Up', 'Aroon_Down',
'CCI20', 'CMO14', 'MACD_Line', 'Signal_Line', 'MACD_Histogram', 'RSI7',
'RSI14', 'RSI21', '%K', '%D', 'WILLR14', 'OBV', 'Market_Return',
'Stock_Return', 'Beta_60', 'Beta_120', 'Close_TAIEX',
'Adj_Close_TAIEX'],
dtype='object')
Date ST_Code ST_Name Open High Low Close Adj_Close Volume MA7 ... SMA_10 SMA_20 RSI_14 BBL_20_2.0 BBM_20_2.0 BBU_20_2.0 BBB_20_2.0 BBP_20_2.0 Price_Change_Percent Volume_Change_Percent
0 2020/01/02 2317.TW Hon Hai 91.000000 91.500000 90.300003 90.800003 0.695115 0.166846 0.655992 ... -1.473189 -1.561708 -2.222983 11367.932405 11871.706445 12375.480486 8.486969 0.204594 0.029910 0.083162
1 2020/01/03 2317.TW Hon Hai 91.400002 92.199997 90.800003 91.599998 0.716532 1.247239 0.655992 ... -1.473189 -1.561708 -2.222983 11367.932405 11871.706445 12375.480486 8.486969 0.204594 0.029910 0.083162
2 2020/01/06 2317.TW Hon Hai 91.099998 91.099998 90.099998 90.500000 0.687084 0.518659 0.655992 ... -1.473189 -1.561708 -2.222983 11367.932405 11871.706445 12375.480486 8.486969 0.204594 -0.714582 -0.006107
3 2020/01/07 2317.TW Hon Hai 90.500000 91.000000 88.300003 89.099998 0.649603 1.548578 0.655992 ... -1.473189 -1.561708 -2.222983 11367.932405 11871.706445 12375.480486 8.486969 0.204594 -0.344315 0.026390
4 2020/01/08 2317.TW Hon Hai 87.900002 88.099998 86.500000 86.500000 0.579997 2.389650 0.655992 ... -1.473189 -1.561708 -2.222983 11367.932405 11871.706445 12375.480486 8.486969 0.204594 -0.301727 0.008148
5 rows × 47 columns
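In the preview above, SMA_10, SMA_20, RSI_14 and the Bollinger columns hold the same value in all five rows. That is what back-filling produces: the indicators' warm-up rows receive the first valid value, which comes from a later date. If the file also stacks several tickers (the ST_Code column suggests this, but the source does not confirm it), a plain pct_change on Volume would compare the first row of one ticker with the last row of the previous one. The sketch below shows one possible alternative on a made-up frame: compute changes and rolling means within each ticker, and drop the warm-up rows instead of filling them. The toy data, the window of 3, and the names toy, SMA_3 and clean are all hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with two tickers stacked, mimicking the ST_Code / Volume layout.
toy = pd.DataFrame({
    "ST_Code": ["A"] * 4 + ["B"] * 4,
    "Volume": [100.0, 110.0, 121.0, 133.1, 1000.0, 900.0, 990.0, 1089.0],
})

# Percentage change computed within each ticker, so the first row of "B"
# is not compared against the last row of "A".
toy["Volume_Change_Percent"] = toy.groupby("ST_Code")["Volume"].pct_change() * 100

# Rolling mean computed within each ticker (window of 3 for the toy data).
toy["SMA_3"] = toy.groupby("ST_Code")["Volume"].transform(
    lambda s: s.rolling(3).mean()
)

# Drop the warm-up rows rather than back-filling them with later values.
clean = toy.dropna().reset_index(drop=True)
print(clean)
```

Dropping rows costs a few observations per ticker but keeps every remaining value free of information from later dates. Whether that trade-off is acceptable depends on how the features are used downstream.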