[Project Day 3]

rubus0304 2024. 12. 3. 21:03

1. Preprocessing

2. Derived variables (time information; see the sketch after this list)

3. Derived variables (spatial information)
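
The time-derived variables in step 2 are not shown in this post, so here is a minimal sketch of the usual pattern, assuming a raw accident frame with a '사고일시' (accident datetime) column; the frame and column names are assumptions for illustration only.

import pandas as pd

# Hypothetical raw accident frame; '사고일시' (accident datetime) is an assumed column name
accidents = pd.DataFrame({'사고일시': ['2021-01-01 13:00', '2021-06-15 22:30']})

accidents['사고일시'] = pd.to_datetime(accidents['사고일시'])
accidents['연'] = accidents['사고일시'].dt.year        # year
accidents['월'] = accidents['사고일시'].dt.month       # month
accidents['시간'] = accidents['사고일시'].dt.hour      # hour of day
accidents['요일'] = accidents['사고일시'].dt.dayofweek  # day of week, 0 = Monday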

 

train_df_9 (per-dong counts of Daegu CCTVs, security lights, child protection zones, and parking lots / per-dong deaths, serious injuries, minor injuries, ECLO / expressway presence / road type)

test_df_9 (per-dong counts of Daegu CCTVs, security lights, child protection zones, and parking lots / per-dong deaths, serious injuries, minor injuries, ECLO / expressway presence / road type)
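
The per-dong facility counts above would typically come from grouping each facility table by dong and merging the counts back onto the accident-level frame. A minimal sketch of that pattern with a hypothetical cctv_df and a '동' key column (the real table and column names in this project may differ):

import pandas as pd

# Hypothetical tables; '동' (dong) is the assumed neighbourhood key
cctv_df = pd.DataFrame({'동': ['수성동1가', '수성동1가', '범어동']})
accident_df = pd.DataFrame({'동': ['수성동1가', '범어동', '만촌동']})

# Count CCTVs per dong, then left-join the count onto the accident rows
cctv_counts = cctv_df.groupby('동').size().reset_index(name='동별CCTV수')
accident_df = accident_df.merge(cctv_counts, on='동', how='left').fillna({'동별CCTV수': 0})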

 

4. Modeling

# Drop the ID and non-feature columns from the test set
test_x_1 = test_df_9.drop(columns=['ID','군구','사고유형시']).copy()
 

 

# Align the training features with the test feature columns
train_x_1 = train_df_9[test_x_1.columns].copy()

# Candidate target columns; ECLO (train_y_5) is the one used for modeling below
train_y_1 = train_df_9['동사망자수'].copy()
train_y_2 = train_df_9['동중상자수'].copy()
train_y_3 = train_df_9['동경상자수'].copy()
train_y_4 = train_df_9['동부상자수'].copy()
train_y_5 = train_df_9['ECLO'].copy()
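
For reference, ECLO (Equivalent Casualty Loss Only) is, as far as I understand this competition, a weighted casualty score; the weights below are an assumption, not something stated in this post.

# Assumed ECLO definition per accident: weighted sum of casualty counts
def eclo(deaths, serious, minor, injured):
    return 10 * deaths + 5 * serious + 3 * minor + 1 * injured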
 

 

import numpy as np
from sklearn.preprocessing import LabelEncoder

# Extract the string (object-dtype) feature columns
categorical_features = list(train_x_1.dtypes[train_x_1.dtypes == "object"].index)

for i in categorical_features:
    le = LabelEncoder()
    le = le.fit(train_x_1[i])
    train_x_1[i] = le.transform(train_x_1[i])

    # Append categories that appear only in the test set so transform does not fail on them
    for case in np.unique(test_x_1[i]):
        if case not in le.classes_:
            le.classes_ = np.append(le.classes_, case)
    test_x_1[i] = le.transform(test_x_1[i])

 

 

import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
from sklearn.feature_selection import SelectFromModel
 

 

X = train_x_1
y = train_y_5

# Split the data into training and validation sets, log1p-transforming the ECLO target
X_train, X_test, y_train, y_test = train_test_split(X, np.log1p(y), test_size=0.2, random_state=42)

# Create an XGBoost Regressor
model = XGBRegressor(
            max_depth=8,
            learning_rate=0.01,
            subsample=0.9,
            colsample_bytree=0.9,
            random_state=42,
            min_child_weight=50,
            objective='reg:squarederror',
            eval_metric='rmse')

model.fit(X_train, y_train)

# Display feature importances
feature_importances = model.feature_importances_
feature_names = X.columns
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importances})
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)
 

 

# Keep only the features with non-zero importance
sel_features = feature_importance_df[feature_importance_df['Importance']>0]['Feature']

train_x_1 = train_x_1[sel_features]
test_x_1 = test_x_1[sel_features]

train_x_1
 

 

5. CatBoost fit (handles categorical encoding automatically)

# CatBoost Regression Model
from catboost import CatBoostRegressor
 
# Initialize the CatBoostRegressor with RMSE as the loss function
model = CatBoostRegressor(loss_function='RMSE', iterations=5000, depth=9, l2_leaf_reg=3)

 
# Fit the model on the training data with verbose logging every 100 iterations
model.fit(X_train, y_train, verbose=100)
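
The heading above mentions CatBoost's automatic handling of categorical features, but this fit reuses the already label-encoded X_train. If the raw string columns were kept instead, the usual approach is to pass them via cat_features; a minimal sketch under that assumption:

# Sketch only: let CatBoost encode raw categorical columns itself
# (assumes the object-dtype columns were NOT label-encoded beforehand)
cat_cols = list(X_train.select_dtypes(include='object').columns)
cat_model = CatBoostRegressor(loss_function='RMSE', iterations=5000, depth=9,
                              l2_leaf_reg=3, cat_features=cat_cols)
cat_model.fit(X_train, y_train, verbose=100)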
 

 

# Import the mean squared error (MSE) function from sklearn and alias it as 'mse'
from sklearn.metrics import mean_squared_error as mse
 
# Generate predictions on the training and validation sets using the trained 'model'
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
 
# Calculate and print the Root Mean Squared Error (RMSE) for training and validation sets
print("Training RMSE: ", np.sqrt(mse(y_train, y_train_pred)))
print("Validation RMSE: ", np.sqrt(mse(y_test, y_test_pred)))
 

 

 

 

 
