
Source Code Guide

File Structure

src/
├── predictor.py              # Prediction class
├── feature_engineering.py    # Feature generation
├── train.py                  # Training script
└── README.md                 # This file

File Descriptions

1. predictor.py - Prediction class

Purpose: The main class that loads the trained models and runs predictions.

Main class: EarlyWarningPredictor

Main methods:

# Load the model (Hugging Face style)
model = EarlyWarningPredictor.from_pretrained("models/")

# Single prediction
result = model.predict(store_data)

# Batch prediction
results = model.predict_batch(stores_df)

# Explain a prediction
explanation = model.explain(store_data)

# Model information
info = model.get_model_info()

Return value:

{
    'risk_score': 78.5,           # Risk score, 0-100
    'risk_level': '높음',          # One of 낮음/보통/높음 (low/medium/high)
    'closure_probability': 0.785, # Probability of closure
    'risk_factors': {...},        # Score per risk factor
    'action_items': [...]         # Recommended actions
}
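In the sample return value, risk_score (78.5) is closure_probability (0.785) scaled to 0-100. The sketch below illustrates that relationship and one way a level could be derived from the score. It is not taken from predictor.py: the helper names and the upper cutoff of 70 are assumptions made for illustration, while the lower cutoff of 30 is the existing default noted in the modification section.

```python
def score_from_probability(closure_probability: float) -> float:
    """Scale a 0-1 closure probability to a 0-100 risk score (assumed relationship)."""
    return round(closure_probability * 100, 1)


def level_from_score(risk_score: float, low_cutoff: float = 30, high_cutoff: float = 70) -> str:
    """Map a risk score to 낮음 (low), 보통 (medium) or 높음 (high).

    high_cutoff=70 is a hypothetical value chosen for illustration.
    """
    if risk_score < low_cutoff:
        return '낮음'
    if risk_score < high_cutoff:
        return '보통'
    return '높음'


score = score_from_probability(0.785)
print(score, level_from_score(score))
```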

How to modify:

# 1. Change the risk threshold
def predict(self, store_data, threshold=0.5):  # change the default value
    ...

# 2. Adjust the ensemble weights
# In models/config.json (XGBoost 60%, LightGBM 40%):
{
    "ensemble_weights": [0.6, 0.4]
}

# 3. Change the risk-level cutoffs
if risk_score < 40:  # raised from 30 to 40
    risk_level = '낮음'  # low
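To make the effect of the ensemble weights concrete, here is a minimal sketch of reading them from a config and blending two model probabilities. The function names are hypothetical, the config is inlined as a string so the example is self-contained, and the check that the weights sum to 1 is an added safeguard rather than documented behaviour of predictor.py.

```python
import json

# Inline copy of the config shown above; in practice this would be read from models/config.json.
CONFIG_TEXT = '{"ensemble_weights": [0.6, 0.4]}'


def load_weights(config_text: str) -> list[float]:
    """Parse ensemble weights and check that they are non-negative and sum to 1."""
    weights = json.loads(config_text)["ensemble_weights"]
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError(f"ensemble_weights must be non-negative and sum to 1, got {weights}")
    return weights


def blend(xgb_prob: float, lgb_prob: float, weights: list[float]) -> float:
    """Weighted average of the two models' closure probabilities."""
    w_xgb, w_lgb = weights
    return w_xgb * xgb_prob + w_lgb * lgb_prob


weights = load_weights(CONFIG_TEXT)
print(round(blend(0.80, 0.60, weights), 3))  # 0.6*0.80 + 0.4*0.60 = 0.72
```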

2. feature_engineering.py - Feature generation

Purpose: Automatically generates 47 features from the raw data.

Main class: FeatureEngineer

Generated features:

Sales (15)

  • sales_avg_1m, sales_avg_3m, sales_avg_6m, sales_avg_12m
  • sales_recent_vs_previous, sales_mom_change, sales_yoy_change
  • sales_max, sales_min, sales_range

Customers (12)

  • customer_reuse_rate, customer_reuse_trend
  • customer_new_rate
  • Customer ratios by age and gender (10)

Operations (8)

  • operation_months, operation_avg_amount
  • operation_cancel_rate, operation_delivery_rate

Trend (5)

  • trend_slope, trend_r2, trend_direction
  • trend_consecutive_down, trend_consecutive_up

Volatility (4)

  • volatility_cv, volatility_std, volatility_mad, volatility_recent_std

Seasonality (2)

  • seasonality_detected, seasonality_strength

Context (1)

  • context_industry

Usage example:

from feature_engineering import FeatureEngineer

engineer = FeatureEngineer()

# usage_df and customer_df hold the store's monthly usage and customer data
features = engineer.create_features(
    store_data={'industry': '카페', 'location': '서울'},  # cafe, Seoul
    monthly_usage=usage_df,
    monthly_customers=customer_df
)

How to add a new feature:

import pandas as pd

class FeatureEngineer:
    def _create_custom_features(self, df):
        """Add custom features."""
        features = {}

        # Example: growth-rate indicator (last 3 months vs. first 3 months)
        if 'RC_M1_SAA' in df.columns and len(df) >= 6:
            recent_3m = df['RC_M1_SAA'].tail(3).mean()
            past_3m = df['RC_M1_SAA'].head(3).mean()
            if past_3m != 0:  # avoid division by zero
                features['growth_rate'] = (recent_3m / past_3m - 1) * 100

        return features

    def create_features(self, store_data, monthly_usage, monthly_customers):
        features = {}

        # Existing features...
        features.update(self._create_sales_features(monthly_usage))
        features.update(self._create_customer_features(monthly_customers))

        # Add the new custom features
        features.update(self._create_custom_features(monthly_usage))

        return pd.DataFrame([features])
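As a worked example of the growth_rate arithmetic, the standalone sketch below applies the same formula to a plain list instead of a DataFrame column. The function name is hypothetical, and returning None for short or zero-baseline series is an added safeguard for illustration.

```python
from typing import Optional


def growth_rate(monthly_sales: list[float]) -> Optional[float]:
    """Percent change of the last-3-month mean over the first-3-month mean."""
    if len(monthly_sales) < 6:
        return None  # not enough history; mirrors the len(df) >= 6 check
    past_3m = sum(monthly_sales[:3]) / 3
    recent_3m = sum(monthly_sales[-3:]) / 3
    if past_3m == 0:
        return None  # ratio is undefined for a zero baseline
    return (recent_3m / past_3m - 1) * 100


# Mean sales rise from 100 to 120, so the growth rate is about 20%.
print(round(growth_rate([100, 100, 100, 120, 120, 120]), 2))
```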

3. train.py - Training script

Purpose: Trains the models from the command line.

Usage:

# Basic usage
python src/train.py

# With options
python src/train.py --data data/raw --output models/ --max-stores 1000

# Help
python src/train.py --help

Parameters:

  • --data: path to the data directory (default: data/raw)
  • --output: path where the models are saved (default: models)
  • --max-stores: maximum number of stores, for test runs (optional)

Main functions:

def load_data(data_dir):
    """Load the data."""

def create_features(df_store, df_usage, df_customer):
    """Generate features."""

def preprocess_data(X, y):
    """Preprocess and split."""

def apply_smote(X_train, y_train):
    """Apply SMOTE."""

def train_models(X_train, y_train):
    """Train the models."""

def evaluate_models(xgb_model, lgb_model, X_test, y_test):
    """Evaluate."""

def save_models(...):
    """Save the models."""

How to modify:

# 1. Change the model hyperparameters
def train_models(X_train, y_train):
    xgb_model = xgb.XGBClassifier(
        max_depth=8,           # increased from 6 to 8
        learning_rate=0.05,    # decreased from 0.1 to 0.05
        n_estimators=300,      # increased from 200 to 300
        # ...
    )

# 2. Change the ensemble weights
def evaluate_models(...):
    ensemble_pred = 0.6 * xgb_pred + 0.4 * lgb_pred  # previously 0.5, 0.5

# 3. Change the train/test split ratio
def preprocess_data(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, ...  # changed from 0.25 to 0.2
    )

Common Modification Scenarios

Scenario 1: Training on new data

Step 1: Prepare the data

# Place the three CSV files in data/raw/
data/raw/
├── big_data_set1_f.csv
├── ds2_monthly_usage.csv
└── ds3_monthly_customers.csv
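Before starting a training run, it can help to confirm that all three input files are in place. The check below is a small sketch using the file names listed above; the helper name is hypothetical and it is not part of train.py.

```python
from pathlib import Path

REQUIRED_FILES = (
    "big_data_set1_f.csv",
    "ds2_monthly_usage.csv",
    "ds3_monthly_customers.csv",
)


def missing_inputs(data_dir: str = "data/raw") -> list[str]:
    """Return the required CSV names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_inputs()
if missing:
    print("Missing input files:", ", ".join(missing))
```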

Step 2: Run training

python src/train.py

Step 3: Use the model for prediction

from src.predictor import EarlyWarningPredictor
model = EarlyWarningPredictor.from_pretrained("models/")

Scenario 2: Improving model performance

Method 1: Add features

# Add a new feature in feature_engineering.py
def _create_custom_features(self, df):
    # Compute the new indicator
    pass

Method 2: Tune the hyperparameters

# Adjust the parameters in train.py
xgb_model = xgb.XGBClassifier(
    max_depth=8,
    learning_rate=0.05,
    ...
)

Method 3: Adjust the ensemble weights

# Edit models/config.json
{
    "ensemble_weights": [0.6, 0.4]
}

Scenario 3: Adjusting the prediction threshold

More sensitive (stronger early warning):

result = model.predict(store_data, threshold=0.3)
# Flagged as at risk when the closure probability is 30% or higher

More conservative:

result = model.predict(store_data, threshold=0.7)
# Flagged as at risk only when the closure probability is 70% or higher
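The trade-off between the two settings is easier to see on a handful of numbers. The sketch below counts how many stores would be flagged at each threshold; the probabilities are made up for illustration, and flag_at_risk is a hypothetical helper that only mimics the "at or above the threshold" rule described here.

```python
def flag_at_risk(probabilities: list[float], threshold: float = 0.5) -> list[bool]:
    """Flag each store whose closure probability is at or above the threshold."""
    return [p >= threshold for p in probabilities]


sample = [0.25, 0.35, 0.55, 0.75]  # made-up closure probabilities
for threshold in (0.3, 0.5, 0.7):
    print(threshold, sum(flag_at_risk(sample, threshold)))
# 0.3 flags 3 stores, 0.5 flags 2, 0.7 flags 1
```

A lower threshold catches more stores early at the cost of more false alarms; a higher one flags fewer stores but with more certainty.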

References