Upload folder using huggingface_hub

Files changed (8)

README.md +178 -3
checkpoints/phase2/exp_factor_balanced/best_model.pt +3 -0
configs/features_config.json +108 -0
feature_calculator.py +273 -0
processed_data/stage3/norm_params.json +146 -0
requirements.txt +4 -0
test_wearable_service.py +182 -0
wearable_anomaly_detector.py +428 -0

README.md CHANGED

@@ -1,3 +1,178 @@
----
-license: apache-2.0
----

+# Wearable_TimeSeries_Health_Monitor
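+A multi-user health-monitoring solution for wearable devices: a single model and a single configuration file are enough to build personalized anomaly detection for different users. The model is based on **Phased LSTM + Temporal Fusion Transformer (TFT)** and integrates adaptive baselines, factor features, and second-level sliding-window data handling, so it can be integrated quickly either as a HuggingFace model or as an internal enterprise service.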
+---
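+## 🌟 Model Highlights
+| Capability | Description |
+| --- | --- |
+| **Plug and play** | Ships with the `WearableAnomalyDetector` wrapper: load the model and predict; after a single initialization it can keep monitoring multiple users |
+| **Config-driven features** | `configs/features_config.json` describes every feature, its default value, and its category mapping; adding or removing features such as SpO2 or respiration rate only requires a config change |
+| **Multi-user real-time service** | `FeatureCalculator` plus a lightweight `data_storage` cache provide user-history management, baseline evolution, and batch inference |
+| **Multi-scenario demo** | `test_wearable_service.py` ships with 3 real "customer" cases (full sensor set, missing fields, anonymous device), so you can try it immediately even without raw data |
+| **Adaptive baseline support** | `UserDataManager` can be extended to feed personal or group baselines into the inference pipeline and keep improving per-user sensitivity |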
+---
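+## 📊 Key Metrics (Short-Term Window)
+- **F1**: 0.2819
+- **Precision**: 0.1769
+- **Recall**: 0.6941
+- **Best threshold**: 0.53
+- **Window definition**: 12 records at 5-minute intervals (a 1-hour window, predicting the next 0.5 hours)
+> The model favors recall, which suits workflows where anomalies are flagged first and then reviewed by a human. Precision and recall can be traded off by adjusting the threshold or the sampling strategy.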
+---
+## 🚀 Quick Start
+### 1. Clone or download the model repository
+```bash
+git clone https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor
+cd Wearable_TimeSeries_Health_Monitor
+pip install -r requirements.txt
+```
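+### 2. Run the built-in demo (no extra data required)
+```bash
+# Run the ab60 case by default
+python test_wearable_service.py
+# Run all predefined customers in one batch
+python test_wearable_service.py --case all
+# Sample test data from the raw stage1 CSV
+python test_wearable_service.py --from-raw
+```
+`test_wearable_service.py` automatically:
+- Loads `WearableAnomalyDetector`
+- Reads the config-driven features
+- Builds windows and runs prediction
+- Prints each "customer's" anomaly score, threshold, and prediction details
+### 3. Call it from application code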
+```python
+from wearable_anomaly_detector import WearableAnomalyDetector
+detector = WearableAnomalyDetector(
+    model_dir="checkpoints/phase2/exp_factor_balanced",
+    threshold=0.53,
+)
+result = detector.predict(data_points, return_score=True, return_details=True)
+print(result)
+```
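+> `data_points` holds the 12 most recent 5-minute records; if static features or device information are missing, the system fills them in automatically from the config or the cache.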
+---
+## 🔧 Input and Output
+### Input (a single data point)
+```python
+{
+  "timestamp": "2024-01-01T08:00:00",
+  "deviceId": "ab60",            # optional; an anonymous ID is created automatically when missing
+  "features": {
+    "hr": 72.0,
+    "hrv_rmssd": 30.0,
+    "time_period_primary": "morning",
+    "data_quality": "high",
+    ...
+  }
+}
+```
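+- Each window needs 12 data points (1 hour by default)
+- Whether a feature is required is controlled by `configs/features_config.json`
+- Missing values fall back automatically to the feature's `default` or to the value defined in `category_mapping`
+### Output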
+```python
+{
+  "is_anomaly": True,
+  "anomaly_score": 0.5760,
+  "threshold": 0.5300,
+  "details": {
+     "window_size": 12,
+     "model_output": 0.5760,
+     "prediction_confidence": 0.0460
+  }
+}
+```
+---
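+## 🧱 Model Architecture and Training
+- **Backbone**: Phased LSTM handles irregularly sampled sequences; Temporal Fusion Transformer aggregates temporal context
+- **Anomaly detection head**: enhanced attention, a multi-layer MLP, and optional contrastive-learning / type-auxiliary heads
+- **Feature set**:
+  - Physiological: HR, HRV (RMSSD/SDNN/PNN50…)
+  - Activity: steps, distance, energy expenditure, acceleration, gyroscope
+  - Environment: light, time-of-day labels, data quality
+  - Baseline: adaptive baseline mean/standard deviation plus deviation features
+- **Label sources**: high-confidence questionnaire labels plus low-confidence adaptive-baseline labels
+- **Training pipeline**: Stage1/2/3 data processing ➜ Phase1 self-supervised pretraining ➜ Phase2 supervised fine-tuning ➜ threshold/case calibration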
+---
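+## 📦 Repository Structure (Partial)
+```
+├─ configs/
+│   └─ features_config.json     # Feature definitions & normalization strategy
+├─ wearable_anomaly_detector.py # Core wrapper: loading, prediction, batch processing
+├─ feature_calculator.py        # Config-driven feature construction + user-history cache
+├─ test_wearable_service.py     # HuggingFace demo script (with built-in cases)
+└─ checkpoints/phase2/...       # Model weights & summary
+```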
+---
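+## 📚 Data Source and License
+- The training data is based on **"A continuous real-world dataset comprising wearable-based heart rate variability alongside sleep diaries"** (Baigutanova *et al.*, Scientific Data, 2025) and its Figshare dataset: [doi:10.1038/s41597-025-05801-3](https://www.nature.com/articles/s41597-025-05801-3) / [dataset link](https://springernature.figshare.com/articles/dataset/In-situ_wearable-based_dataset_of_continuous_heart_rate_variability_monitoring_accompanied_by_sleep_diaries/28509740).
+- The dataset is released under **Creative Commons Attribution 4.0 (CC BY 4.0)**: it may be freely used, modified, and distributed, provided attribution is preserved and a link to the license is included.
+- This repository follows the CC BY 4.0 requirements for the original data; if you further process or publish work built on it, please keep the attribution and license notice above.
+- The code and model may use MIT, Apache, or another license as needed, but anything involving the data must still comply with CC BY 4.0.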
+---
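+## 🤝 Contributing and Extending
+Contributions are welcome:
+1. New features or data sources ⇒ update `features_config.json` and submit a PR
+2. New user-data management or baseline strategies ⇒ extend `FeatureCalculator` or contribute a `UserDataManager`
+3. Case reports or real-world deployment experience ⇒ open an Issue or a Discussion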
+---
+## 📄 License
+To be determined (may be replaced as the project requires).
+---
+## 🔖 Citation
+```bibtex
+@software{Wearable_TimeSeries_Health_Monitor,
+  title  = {Wearable\_TimeSeries\_Health\_Monitor},
+  author = {oscarzhang},
+  year   = {2025},
+  url    = {https://huggingface.co/oscarzhang/Wearable_TimeSeries_Health_Monitor}
+}
+```
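
The metrics and the sample output quoted in the README can be cross-checked with a few lines of arithmetic. The sketch below recomputes F1 from the reported precision and recall and reproduces the fields of the sample output. The decision rule (`score >= threshold`) and the reading of `prediction_confidence` as the distance between score and threshold are assumptions inferred from the sample output, not taken from `wearable_anomaly_detector.py`; `f1_score` and `decide` are hypothetical helper names.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def decide(score: float, threshold: float) -> dict:
    """Mimic the shape of the README's sample output.

    Assumption: a score at or above the threshold is flagged, and
    "prediction_confidence" is the absolute margin to the threshold.
    """
    return {
        "is_anomaly": score >= threshold,
        "anomaly_score": score,
        "threshold": threshold,
        "prediction_confidence": round(abs(score - threshold), 4),
    }


if __name__ == "__main__":
    # Precision and recall as reported in the README.
    print(round(f1_score(0.1769, 0.6941), 4))
    # Score and threshold from the README's sample output.
    print(decide(0.5760, 0.53))
```

With precision this low, roughly five of every six alerts would be false positives, which is consistent with the README's recommendation to pair alerts with human review.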

checkpoints/phase2/exp_factor_balanced/best_model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4f2f056ea3cec48902ffda2399e905189dce62826034470bb6514f8739eba9ff
+size 27270610

configs/features_config.json ADDED

	@@ -0,0 +1,108 @@

+{
+  "metadata": {
+    "version": "1.0",
+    "description": "Wearable anomaly detection feature configuration"
+  },
+  "time_series": [
+    {"name": "hr", "enabled": true, "default": 70.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "hr_resting", "enabled": true, "default": 65.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "hrv_rmssd", "enabled": true, "default": 30.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "hrv_sdnn", "enabled": true, "default": 40.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "hrv_pnn50", "enabled": true, "default": 15.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "sdnn", "enabled": true, "default": 35.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "sdsd", "enabled": true, "default": 25.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "rmssd", "enabled": true, "default": 30.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "pnn20", "enabled": true, "default": 25.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "pnn50", "enabled": true, "default": 12.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "ibi", "enabled": true, "default": 0.86, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "lf/hf", "enabled": true, "default": 1.8, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "steps", "enabled": true, "default": 20.0, "normalization": {"type": "minmax", "min": 0.0, "max": 500.0}},
+    {"name": "distance", "enabled": true, "default": 10.0, "normalization": {"type": "minmax", "min": 0.0, "max": 2000.0}},
+    {"name": "calories", "enabled": true, "default": 1.5, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "acc_x_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "acc_y_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "acc_z_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "grv_x_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "grv_y_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "grv_z_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "grv_w_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "gyr_x_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "gyr_y_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "gyr_z_avg", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "light_avg", "enabled": true, "default": 100.0, "normalization": {"type": "minmax", "min": 0.0, "max": 1000.0}},
+    {
+      "name": "time_period_primary",
+      "enabled": true,
+      "default": 2.0,
+      "normalization": {"type": "none"},
+      "category_mapping": {
+        "night": 0,
+        "morning": 1,
+        "day": 2,
+        "evening": 3,
+        "unknown": 4
+      }
+    },
+    {
+      "name": "time_period_secondary",
+      "enabled": true,
+      "default": 7.0,
+      "normalization": {"type": "none"},
+      "category_mapping": {
+        "commute_morning": 0,
+        "breakfast": 1,
+        "work_morning": 2,
+        "lunch": 3,
+        "work_afternoon": 4,
+        "commute_evening": 5,
+        "dinner": 6,
+        "rest_evening": 7,
+        "rest_night": 8,
+        "exercise": 9,
+        "unknown": 10
+      }
+    },
+    {"name": "is_weekend", "enabled": true, "default": 0.0, "normalization": {"type": "none"}},
+    {
+      "name": "data_quality",
+      "enabled": true,
+      "default": 0.9,
+      "normalization": {"type": "minmax", "min": 0.0, "max": 1.0},
+      "category_mapping": {
+        "low": 0.3,
+        "medium": 0.6,
+        "high": 1.0
+      }
+    },
+    {"name": "missingness_score", "enabled": true, "default": 0.0, "normalization": {"type": "minmax", "min": 0.0, "max": 1.0}},
+    {"name": "baseline_hrv_mean", "enabled": true, "default": 30.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "baseline_hrv_std", "enabled": true, "default": 5.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "hrv_deviation_abs", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "hrv_deviation_pct", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}},
+    {"name": "hrv_z_score", "enabled": true, "default": 0.0, "normalization": {"type": "zscore", "use_norm_params": true}}
+  ],
+  "static": [
+    {"name": "age_group", "enabled": true, "default": -1},
+    {"name": "age_normalized", "enabled": true, "default": 0.5},
+    {"name": "sex", "enabled": true, "default": 0.5},
+    {"name": "marriage", "enabled": true, "default": -1},
+    {"name": "exercise", "enabled": true, "default": -1},
+    {"name": "coffee", "enabled": true, "default": -1},
+    {"name": "smoking", "enabled": true, "default": -1},
+    {"name": "drinking", "enabled": true, "default": -1},
+    {"name": "MEQ", "enabled": true, "default": 0.0},
+    {"name": "baseline_commute_morning_mean", "enabled": true, "default": 30.0},
+    {"name": "baseline_commute_morning_std", "enabled": true, "default": 5.0}
+  ],
+  "factor_features": {
+    "enabled": true,
+    "factor_names": ["physio", "activity", "context"],
+    "factor_dim": 4
+  },
+  "known_future": [
+    {"name": "hour_of_day", "enabled": true},
+    {"name": "day_of_week", "enabled": true},
+    {"name": "is_weekend", "enabled": true}
+  ]
+}
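
The README states that adding a signal such as SpO2 only requires a config change. A minimal sketch of what that could look like follows, using a trimmed in-memory stand-in for `configs/features_config.json`. The helper names `add_feature` and `enabled_names`, and the SpO2 default of 96.0, are hypothetical. Whether the shipped checkpoint accepts an extra input channel is not something this diff shows.

```python
# Trimmed stand-in for configs/features_config.json (two entries only).
config = {
    "time_series": [
        {"name": "hr", "enabled": True, "default": 70.0,
         "normalization": {"type": "zscore", "use_norm_params": True}},
        {"name": "steps", "enabled": True, "default": 20.0,
         "normalization": {"type": "minmax", "min": 0.0, "max": 500.0}},
    ]
}


def add_feature(cfg: dict, name: str, default: float, normalization: dict) -> dict:
    """Append a time-series feature entry unless the name already exists."""
    existing = {feat["name"] for feat in cfg["time_series"]}
    if name not in existing:
        cfg["time_series"].append({
            "name": name,
            "enabled": True,
            "default": default,
            "normalization": normalization,
        })
    return cfg


def enabled_names(cfg: dict) -> list:
    """Same filter FeatureCalculator applies: entries are enabled by default."""
    return [f["name"] for f in cfg["time_series"] if f.get("enabled", True)]


# Hypothetical SpO2 entry; 96.0 is an illustrative default, not a project value.
add_feature(config, "spo2", 96.0, {"type": "zscore", "use_norm_params": True})

if __name__ == "__main__":
    print(enabled_names(config))
```

Because `norm_params.json` in this commit already contains a `spo2` entry, a z-score entry with `use_norm_params` set would pick up those statistics under the lookup logic in `feature_calculator.py`.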

feature_calculator.py ADDED

	@@ -0,0 +1,273 @@

+import json
+from pathlib import Path
+from typing import Dict, List, Optional, Any
+from collections import defaultdict
+import numpy as np
+import pandas as pd
+class FeatureCalculator:
+    """
+        Load feature definitions from a single config file and build the window structures needed for inference/training.
+    """
+    def __init__(
+        self,
+        config_path: Optional[Path] = None,
+        norm_params_path: Optional[Path] = None,
+        static_features_path: Optional[Path] = None,
+        storage_dir: Optional[Path] = None,
+    ):
+        base_dir = Path(__file__).parent
+        self.config_path = Path(config_path or base_dir / "configs" / "features_config.json")
+        self.norm_params_path = Path(norm_params_path or base_dir / "processed_data" / "stage3" / "norm_params.json")
+        self.static_features_path = Path(static_features_path or base_dir / "processed_data" / "stage2" / "static_features.csv")
+        self.storage_dir = Path(storage_dir or base_dir / "data_storage")
+        self.storage_dir.mkdir(parents=True, exist_ok=True)
+        self.features_config = self._load_json(self.config_path)
+        self.norm_params = self._load_json(self.norm_params_path) if self.norm_params_path.exists() else {}
+        self.static_features_dict = self._load_static_features(self.static_features_path)
+        self.time_series_features = [f for f in self.features_config.get("time_series", []) if f.get("enabled", True)]
+        self.static_feature_defs = [f for f in self.features_config.get("static", []) if f.get("enabled", True)]
+        self.known_future_defs = [f for f in self.features_config.get("known_future", []) if f.get("enabled", True)]
+        factor_cfg = self.features_config.get("factor_features", {})
+        self.factor_enabled = factor_cfg.get("enabled", False)
+        self.factor_names = factor_cfg.get("factor_names", [])
+        self.factor_dim = factor_cfg.get("factor_dim", 0)
+        # Simple in-memory history cache, kept so personalized features can be added later
+        self.user_histories: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
+    @staticmethod
+    def _load_json(path: Path) -> Dict:
+        if not path.exists():
+            return {}
+        with open(path, "r") as f:
+            return json.load(f)
+    @staticmethod
+    def _load_static_features(static_file: Path) -> Dict[str, Dict]:
+        if not static_file.exists():
+            return {}
+        df = pd.read_csv(static_file)
+        static_dict = {}
+        for _, row in df.iterrows():
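+            raw_id = row.get("deviceId")
+            # Skip rows without a device ID; str(None) / str(nan) would otherwise become dictionary keys
+            if raw_id is not None and not pd.isna(raw_id):
+                device_id = str(raw_id)
+                static_dict[device_id] = {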
+                    col: row[col]
+                    for col in df.columns
+                    if col != "deviceId"
+                }
+        return static_dict
+    @staticmethod
+    def _to_serializable(value):
+        from datetime import datetime
+        if isinstance(value, (np.integer, )):
+            return int(value)
+        if isinstance(value, (np.floating, )):
+            return float(value)
+        if isinstance(value, (pd.Timestamp, datetime)):
+            return value.isoformat()
+        if isinstance(value, (np.ndarray, )):
+            return value.tolist()
+        raise TypeError(f"Object of type {type(value)} is not JSON serializable")
+    def register_data_points(self, user_id: str, data_points: List[Dict]):
+        """
+        Cache the user's data in memory and append it to data_storage/users/{user_id}.jsonl
+        """
+        if not user_id:
+            return
+        user_dir = self.storage_dir / "users"
+        user_dir.mkdir(exist_ok=True, parents=True)
+        history_file = user_dir / f"{user_id}.jsonl"
+        with history_file.open("a", encoding="utf-8") as f:
+            for point in data_points:
+                serializable = dict(point)
+                ts = serializable.get('timestamp')
+                # pd.Timestamp and datetime both expose isoformat()
+                if hasattr(ts, "isoformat"):
+                    serializable['timestamp'] = ts.isoformat()
+                f.write(json.dumps(serializable, ensure_ascii=False, default=self._to_serializable) + "\n")
+        self.user_histories[user_id].extend(data_points)
+        # Keep only the most recent 5,000 records in memory to limit memory usage
+        if len(self.user_histories[user_id]) > 5000:
+            self.user_histories[user_id] = self.user_histories[user_id][-5000:]
+    def normalize_series(self, values: List[float], feature_name: str, cfg: Dict) -> List[float]:
+        arr = np.array(values, dtype=np.float32)
+        norm_cfg = cfg.get("normalization", {"type": "none"})
+        norm_type = norm_cfg.get("type", "none")
+        if norm_type == "zscore":
+            mean, std = self._get_norm_stats(feature_name, norm_cfg)
+            if std == 0:
+                std = 1.0
+            arr = (arr - mean) / std
+        elif norm_type == "minmax":
+            min_v = norm_cfg.get("min", 0.0)
+            max_v = norm_cfg.get("max", 1.0)
+            scale = max(max_v - min_v, 1e-6)
+            arr = (arr - min_v) / scale
+        else:
+            # none
+            pass
+        arr = np.nan_to_num(arr, nan=0.0, posinf=0.0, neginf=0.0)
+        return arr.tolist()
+    @staticmethod
+    def _coerce_value(value, feat_cfg):
+        default = feat_cfg.get("default", 0.0)
+        if value is None or pd.isna(value):
+            return default
+        category_mapping = feat_cfg.get("category_mapping")
+        if isinstance(value, str):
+            if category_mapping:
+                return category_mapping.get(value, default)
+            try:
+                return float(value)
+            except ValueError:
+                return default
+        try:
+            return float(value)
+        except (TypeError, ValueError):
+            return default
+    def _get_norm_stats(self, feature_name: str, norm_cfg: Dict) -> (float, float):
+        if norm_cfg.get("use_norm_params") and feature_name in self.norm_params:
+            stats = self.norm_params[feature_name]
+            return stats.get("mean", 0.0), stats.get("std", 1.0)
+        return norm_cfg.get("mean", 0.0), norm_cfg.get("std", 1.0)
+    def build_window(self, data_points: List[Dict], user_id: Optional[str] = None) -> Dict:
+        if len(data_points) < 12:
+            raise ValueError("Not enough data points: at least 12 are required to build a short-term window")
+        if user_id:
+            self.register_data_points(user_id, data_points)
+        timestamps = []
+        input_features = {feat["name"]: [] for feat in self.time_series_features}
+        for point in data_points:
+            ts = point.get("timestamp")
+            if isinstance(ts, str):
+                ts = pd.to_datetime(ts)
+            timestamps.append(ts)
+            feature_payload = point.get("features", {})
+            for feat_cfg in self.time_series_features:
+                name = feat_cfg["name"]
+                value = feature_payload.get(name)
+                value = self._coerce_value(value, feat_cfg)
+                input_features[name].append(value)
+        # delta_t: seconds elapsed since the previous point (0.0 for the first point)
+        delta_t = [0.0]
+        for i in range(1, len(timestamps)):
+            diff = (timestamps[i] - timestamps[i - 1]).total_seconds()
+            delta_t.append(float(diff))
+        # Normalization
+        normalized_features = {}
+        for feat_cfg in self.time_series_features:
+            name = feat_cfg["name"]
+            normalized_features[name] = self.normalize_series(input_features[name], name, feat_cfg)
+        static_features = self._build_static_features(data_points[0], user_id)
+        factor_features = self._build_factor_features(normalized_features)
+        known_future = self._build_known_future(timestamps[-6:] if len(timestamps) >= 6 else timestamps)
+        return {
+            "input_timestamp": timestamps[:12],
+            "input_delta_t": delta_t[:12],
+            "input_features": normalized_features,
+            "target_timestamp": timestamps[12:] if len(timestamps) > 12 else [],
+            "target_delta_t": delta_t[12:] if len(delta_t) > 12 else [],
+            "static_features": static_features,
+            "known_future_features": known_future,
+            "factor_features": factor_features,
+        }
+    def _build_static_features(self, first_point: Dict, user_id: Optional[str]) -> Dict:
+        static_payload = dict(first_point.get("static_features", {}))
+        device_id = first_point.get("deviceId") or user_id
+        if device_id and str(device_id) in self.static_features_dict:
+            for key, value in self.static_features_dict[str(device_id)].items():
+                static_payload.setdefault(key, value)
+        result = {}
+        for feat_cfg in self.static_feature_defs:
+            name = feat_cfg["name"]
+            result[name] = static_payload.get(name, feat_cfg.get("default", 0.0))
+        return result
+    def _build_factor_features(self, normalized_features: Dict[str, List[float]]) -> Optional[Dict[str, List[float]]]:
+        if not self.factor_enabled or not self.factor_names:
+            return None
+        factor_vectors = {}
+        for factor_name in self.factor_names:
+            # Currently uses simple mean/std/max/min summary statistics, which are easy to replace later
+            merged = []
+            for feat_name, values in normalized_features.items():
+                if factor_name == "physio" and feat_name.startswith("hrv"):
+                    merged.extend(values)
+                elif factor_name == "activity" and feat_name in {"steps", "distance", "calories"}:
+                    merged.extend(values)
+                elif factor_name == "context" and feat_name in {"time_period_primary", "time_period_secondary", "is_weekend"}:
+                    merged.extend(values)
+            if not merged:
+                factor_vectors[factor_name] = [0.0] * self.factor_dim
+            else:
+                arr = np.array(merged, dtype=np.float32)
+                stats = [
+                    float(arr.mean()),
+                    float(arr.std()),
+                    float(arr.max()),
+                    float(arr.min())
+                ]
+                factor_vectors[factor_name] = stats[: self.factor_dim] if len(stats) >= self.factor_dim else stats + [0.0] * (self.factor_dim - len(stats))
+        return factor_vectors
+    def _build_known_future(self, timestamps: List[pd.Timestamp]) -> Dict[str, List[float]]:
+        hours, days, weekends = [], [], []
+        for ts in timestamps:
+            if pd.isna(ts):
+                hours.append(12.0)
+                days.append(3.0)
+                weekends.append(0.0)
+            else:
+                hours.append(float(ts.hour))
+                days.append(float(ts.weekday()))
+                weekends.append(float(1 if ts.weekday() >= 5 else 0))
+        result = {}
+        for cfg in self.known_future_defs:
+            name = cfg["name"]
+            if name == "hour_of_day":
+                result[name] = hours
+            elif name == "day_of_week":
+                result[name] = days
+            elif name == "is_weekend":
+                result[name] = weekends
+        return result
+    def get_enabled_feature_names(self) -> List[str]:
+        return [feat["name"] for feat in self.time_series_features]
+__all__ = ["FeatureCalculator"]
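
The factor aggregation in `_build_factor_features` can be exercised in isolation. The sketch below mirrors that logic as shown in this diff: the values of all member features are pooled and collapsed into `[mean, std, max, min]`, then padded or truncated to `factor_dim`. The function names `summarize_factor` and `physio_members` are hypothetical and exist only for illustration.

```python
import numpy as np


def summarize_factor(values, factor_dim: int = 4) -> list:
    """Collapse pooled values into [mean, std, max, min], sized to factor_dim."""
    if len(values) == 0:
        # No member feature present: the factor is all zeros.
        return [0.0] * factor_dim
    arr = np.asarray(values, dtype=np.float32)
    stats = [float(arr.mean()), float(arr.std()), float(arr.max()), float(arr.min())]
    stats = stats[:factor_dim]
    # Pad with zeros when factor_dim exceeds the four available statistics.
    return stats + [0.0] * (factor_dim - len(stats))


def physio_members(feature_names) -> list:
    """The 'physio' factor selects features by the 'hrv' name prefix."""
    return [name for name in feature_names if name.startswith("hrv")]


if __name__ == "__main__":
    print(summarize_factor([1.0, 2.0, 3.0, 4.0]))
    print(physio_members(["hr", "hrv_rmssd", "rmssd", "hrv_z_score", "baseline_hrv_mean"]))
```

One consequence of the prefix rule is that `hr`, `rmssd`, `sdnn`, and `baseline_hrv_mean` do not contribute to the `physio` factor, while `hrv_deviation_abs` and `hrv_z_score` do.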

processed_data/stage3/norm_params.json ADDED

	@@ -0,0 +1,146 @@

+{
+  "hr_mean": {
+    "mean": 79.88385009765625,
+    "std": 15.546831130981445,
+    "min": 33.0,
+    "max": 200.2244873046875
+  },
+  "hr_std": {
+    "mean": 12.757049560546875,
+    "std": 3.9224278926849365,
+    "min": 0.0,
+    "max": 32.2431755065918
+  },
+  "hr_median": {
+    "mean": 76.4555892944336,
+    "std": 6.908801555633545,
+    "min": 48.0,
+    "max": 104.0
+  },
+  "hr_resting": {
+    "mean": 65.74867248535156,
+    "std": 7.843548774719238,
+    "min": 44.12284469604492,
+    "max": 86.0
+  },
+  "hr_nrem": {
+    "mean": 61.779720306396484,
+    "std": 11.666051864624023,
+    "min": 0.0,
+    "max": 92.5469970703125
+  },
+  "hrv_rmssd": {
+    "mean": 83.4627685546875,
+    "std": 62.30027389526367,
+    "min": 0.0,
+    "max": 855.8391723632812
+  },
+  "hrv_sdnn": {
+    "mean": 100.59049987792969,
+    "std": 43.545467376708984,
+    "min": 0.0,
+    "max": 393.35162353515625
+  },
+  "steps": {
+    "mean": 342.7657470703125,
+    "std": 823.3682861328125,
+    "min": 0.0,
+    "max": 27004.0
+  },
+  "distance": {
+    "mean": 225.4749755859375,
+    "std": 504.8075866699219,
+    "min": 0.0,
+    "max": 10460.2998046875
+  },
+  "calories": {
+    "mean": 104.05133819580078,
+    "std": 211.85128784179688,
+    "min": 0.0,
+    "max": 2962.070068359375
+  },
+  "sleep_duration_total": {
+    "mean": 418.6901550292969,
+    "std": 142.2774200439453,
+    "min": 0.0,
+    "max": 1110.0
+  },
+  "sleep_efficiency": {
+    "mean": 93.89789581298828,
+    "std": 7.327056884765625,
+    "min": 34.0,
+    "max": 100.0
+  },
+  "sleep_deep_ratio": {
+    "mean": 1.00419020652771,
+    "std": 0.3390481770038605,
+    "min": 0.0,
+    "max": 4.310344696044922
+  },
+  "sleep_rem_ratio": {
+    "mean": 1.00448739528656,
+    "std": 0.35869544744491577,
+    "min": 0.0,
+    "max": 3.9259259700775146
+  },
+  "sleep_light_ratio": {
+    "mean": 0.9923003315925598,
+    "std": 0.23265497386455536,
+    "min": 0.0,
+    "max": 3.034313678741455
+  },
+  "spo2": {
+    "mean": 95.9047622680664,
+    "std": 1.04403817653656,
+    "min": 92.4000015258789,
+    "max": 100.0
+  },
+  "stress_score": {
+    "mean": 65.94886779785156,
+    "std": 28.051528930664062,
+    "min": 0.0,
+    "max": 93.0
+  },
+  "ALERT": {
+    "mean": 0.07375683635473251,
+    "std": 0.2613747715950012,
+    "min": 0.0,
+    "max": 1.0
+  },
+  "HAPPY": {
+    "mean": 0.1726546734571457,
+    "std": 0.37794846296310425,
+    "min": 0.0,
+    "max": 1.0
+  },
+  "NEUTRAL": {
+    "mean": 0.1967589408159256,
+    "std": 0.3975485563278198,
+    "min": 0.0,
+    "max": 1.0
+  },
+  "RESTED/RELAXED": {
+    "mean": 0.23211927711963654,
+    "std": 0.42218467593193054,
+    "min": 0.0,
+    "max": 1.0
+  },
+  "SAD": {
+    "mean": 0.018068943172693253,
+    "std": 0.13320080935955048,
+    "min": 0.0,
+    "max": 1.0
+  },
+  "TENSE/ANXIOUS": {
+    "mean": 0.10590820014476776,
+    "std": 0.3077200949192047,
+    "min": 0.0,
+    "max": 1.0
+  },
+  "TIRED": {
+    "mean": 0.20073312520980835,
+    "std": 0.4005488157272339,
+    "min": 0.0,
+    "max": 1.0
+  }
+}
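
How these statistics are consumed follows from `_get_norm_stats` in `feature_calculator.py`: a z-score feature with `use_norm_params` uses the entry whose key equals the feature name, and otherwise falls back to mean 0.0 and std 1.0. The sketch below replays that lookup on two entries copied from this file. The standalone function names are hypothetical. Note that the config names the heart-rate feature `hr` while this file only has `hr_mean`, so under this logic `hr` would pass through unnormalized; the diff does not say whether that is intended.

```python
# Two entries copied from norm_params.json (min/max omitted for brevity).
NORM_PARAMS = {
    "hr_mean": {"mean": 79.88385009765625, "std": 15.546831130981445},
    "hrv_rmssd": {"mean": 83.4627685546875, "std": 62.30027389526367},
}

ZSCORE_CFG = {"type": "zscore", "use_norm_params": True}


def get_norm_stats(name: str, norm_cfg: dict, norm_params: dict):
    """Look up (mean, std) by exact feature name, else fall back to the config."""
    if norm_cfg.get("use_norm_params") and name in norm_params:
        stats = norm_params[name]
        return stats.get("mean", 0.0), stats.get("std", 1.0)
    return norm_cfg.get("mean", 0.0), norm_cfg.get("std", 1.0)


def zscore(value: float, name: str, norm_cfg: dict, norm_params: dict) -> float:
    mean, std = get_norm_stats(name, norm_cfg, norm_params)
    if std == 0:
        std = 1.0  # guard against division by zero, as in normalize_series
    return (value - mean) / std


if __name__ == "__main__":
    # "hrv_rmssd" has an entry, so the value is standardized.
    print(zscore(73.33511196423747, "hrv_rmssd", ZSCORE_CFG, NORM_PARAMS))
    # "hr" has no entry (only "hr_mean" exists), so the value is unchanged.
    print(zscore(91.65860215053765, "hr", ZSCORE_CFG, NORM_PARAMS))
```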

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+torch>=2.1.0
+numpy>=1.24
+pandas>=2.0
+huggingface_hub>=0.23

test_wearable_service.py ADDED

	@@ -0,0 +1,182 @@

+import argparse
+from pathlib import Path
+from typing import List, Dict, Tuple
+import pandas as pd
+from wearable_anomaly_detector import load_detector
+from feature_calculator import FeatureCalculator
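+# Predefined cases (taken from the raw wearable data)
+PREDEFINED_CASES: Dict[str, Dict] = {
+    "ab60_morning_rest": {
+        "description": "User ab60: 12 consecutive windows from early-morning rest until before breakfast, used to quickly verify that the service produces output",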
+        "user_id": "ab60",
+        "data_points": [
+            {"timestamp": "2021-03-04T04:45:20.170000", "deviceId": "ab60", "features": {"hr": 91.65860215053765, "hr_resting": 87.84302108870469, "hrv_rmssd": 73.33511196423747, "hrv_sdnn": 72.35486488414405, "hrv_pnn50": 0.3422818791946309, "sdnn": 72.35486488414405, "sdsd": 55.28945952794972, "rmssd": 73.33511196423747, "pnn20": 0.6677852348993288, "pnn50": 0.3422818791946309, "ibi": 671.2685790942928, "lf/hf": 0.6578861348372742, "acc_x_avg": 4.9294712037533515, "acc_y_avg": -3.1057153652814957, "acc_z_avg": 5.8750820100536005, "grv_x_avg": -0.5497846977211797, "grv_y_avg": 0.0042184631367292, "grv_z_avg": 0.1525969041554961, "grv_w_avg": 0.1771413800268097, "gyr_x_avg": -0.9240750428954416, "gyr_y_avg": 1.1994772238605889, "gyr_z_avg": 0.3142024215817695, "light_avg": 440.066889632107, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.2886646919083789}},
+            {"timestamp": "2021-03-04T05:15:20.497000", "deviceId": "ab60", "features": {"hr": 84.15604988203573, "hr_resting": 87.84302108870469, "hrv_rmssd": 73.48193641157741, "hrv_sdnn": 74.51390542231427, "hrv_pnn50": 0.4143302180685358, "sdnn": 74.51390542231427, "sdsd": 51.36438677767364, "rmssd": 73.48193641157741, "pnn20": 0.7538940809968847, "pnn50": 0.4143302180685358, "ibi": 729.7248964415015, "lf/hf": 1.1884595045473076, "acc_x_avg": 5.264426130609508, "acc_y_avg": -2.4949751527126582, "acc_z_avg": 5.638523847957136, "grv_x_avg": -0.2725604313462818, "grv_y_avg": -0.0417301761553922, "grv_z_avg": 0.0857810462156731, "grv_w_avg": 0.2177219383791028, "gyr_x_avg": -0.6168720663094444, "gyr_y_avg": 1.3517548573342275, "gyr_z_avg": 0.3159611446751512, "light_avg": 121.7391304347826, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1872242091224755}},
+            {"timestamp": "2021-03-04T05:35:21.317000", "deviceId": "ab60", "features": {"hr": 82.22642857142857, "hr_resting": 87.84302108870469, "hrv_rmssd": 73.0761458919254, "hrv_sdnn": 81.71063624330793, "hrv_pnn50": 0.45703125, "sdnn": 81.71063624330793, "sdsd": 47.19876056820636, "rmssd": 73.0761458919254, "pnn20": 0.7578125, "pnn50": 0.45703125, "ibi": 799.9492801995793, "lf/hf": 1.7751635086235489, "acc_x_avg": 4.37106831212324, "acc_y_avg": -0.3996270033489605, "acc_z_avg": 6.891127246483591, "grv_x_avg": -0.6645361294433281, "grv_y_avg": -0.3851286666666666, "grv_z_avg": 0.0854148665325285, "grv_w_avg": 0.189048981220657, "gyr_x_avg": -0.1036394095174265, "gyr_y_avg": 1.4141689068364616, "gyr_z_avg": 0.4626943733243945, "light_avg": 603.2266666666667, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.335979916085374}},
+            {"timestamp": "2021-03-04T05:55:21.428000", "deviceId": "ab60", "features": {"hr": 86.76466621712744, "hr_resting": 87.84302108870469, "hrv_rmssd": 74.59311564233765, "hrv_sdnn": 87.31034842159234, "hrv_pnn50": 0.391304347826087, "sdnn": 87.31034842159234, "sdsd": 52.14310859847846, "rmssd": 74.59311564233765, "pnn20": 0.7418478260869565, "pnn50": 0.391304347826087, "ibi": 709.0165592504526, "lf/hf": 2.4824085488532703, "acc_x_avg": 6.8974065639651725, "acc_y_avg": -0.6532005197588735, "acc_z_avg": 4.533895625586072, "grv_x_avg": -0.606806606831882, "grv_y_avg": -0.1301535706630945, "grv_z_avg": 0.0961370341594106, "grv_w_avg": 0.4334408365706628, "gyr_x_avg": -0.9999732337575332, "gyr_y_avg": 1.1268921553918274, "gyr_z_avg": 0.0987407849966511, "light_avg": 428.3612040133779, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1217623103705546}},
+            {"timestamp": "2021-03-04T06:00:21.434000", "deviceId": "ab60", "features": {"hr": 86.11454484380249, "hr_resting": 87.84302108870469, "hrv_rmssd": 80.21713778627182, "hrv_sdnn": 72.4260069463789, "hrv_pnn50": 0.4117647058823529, "sdnn": 72.4260069463789, "sdsd": 57.42532753452254, "rmssd": 80.21713778627182, "pnn20": 0.7536764705882353, "pnn50": 0.4117647058823529, "ibi": 711.989004966488, "lf/hf": 0.5778707099781514, "acc_x_avg": 2.6462390616208977, "acc_y_avg": -2.0968226182183503, "acc_z_avg": 7.492496608841257, "grv_x_avg": -0.3523352029470862, "grv_y_avg": -0.3171756162089749, "grv_z_avg": 0.0852522665773609, "grv_w_avg": 0.0408305612860013, "gyr_x_avg": -0.7581300395442356, "gyr_y_avg": 1.7399128639410202, "gyr_z_avg": 0.5810656782841818, "light_avg": 719.866220735786, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.3055760776711148}},
+            {"timestamp": "2021-03-04T06:05:21.530000", "deviceId": "ab60", "features": {"hr": 85.55876427132304, "hr_resting": 87.84302108870469, "hrv_rmssd": 61.68777738949315, "hrv_sdnn": 61.78286083667294, "hrv_pnn50": 0.3786127167630058, "sdnn": 61.78286083667294, "sdsd": 40.15746948668274, "rmssd": 61.68777738949315, "pnn20": 0.7283236994219653, "pnn50": 0.3786127167630058, "ibi": 715.325568241176, "lf/hf": 0.9103296457711838, "acc_x_avg": 3.0594024239785633, "acc_y_avg": -2.580148740120562, "acc_z_avg": 7.2776362170127245, "grv_x_avg": -0.2312993507712944, "grv_y_avg": -0.1165045117370892, "grv_z_avg": 0.1689689483568074, "grv_w_avg": 0.1020506572769953, "gyr_x_avg": -0.2617024135388738, "gyr_y_avg": 1.1428016065683635, "gyr_z_avg": 0.0963672908847178, "light_avg": 665.6321070234113, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1561355447930486}},
+            {"timestamp": "2021-03-04T06:15:21.623000", "deviceId": "ab60", "features": {"hr": 82.62592343854936, "hr_resting": 87.84302108870469, "hrv_rmssd": 81.00579057638534, "hrv_sdnn": 73.40584581276266, "hrv_pnn50": 0.4357366771159874, "sdnn": 73.40584581276266, "sdsd": 57.10852463732151, "rmssd": 81.00579057638534, "pnn20": 0.7366771159874608, "pnn50": 0.4357366771159874, "ibi": 746.8545044048512, "lf/hf": 2.2582676863270708, "acc_x_avg": 0.1661593281982583, "acc_y_avg": -1.6491836677829892, "acc_z_avg": 9.736414508372398, "grv_x_avg": -0.6639661513730741, "grv_y_avg": 0.2112439484259877, "grv_z_avg": 0.008919145344943, "grv_w_avg": 0.0059900468854654, "gyr_x_avg": -0.3158673777628935, "gyr_y_avg": 0.9963161399866036, "gyr_z_avg": 0.4727930301406551, "light_avg": 802.180602006689, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.174593188653174}},
+            {"timestamp": "2021-03-04T06:40:21.894000", "deviceId": "ab60", "features": {"hr": 83.86055107526882, "hr_resting": 87.84302108870469, "hrv_rmssd": 79.079102105631, "hrv_sdnn": 71.271849209199, "hrv_pnn50": 0.4221556886227545, "sdnn": 71.271849209199, "sdsd": 54.69084849089656, "rmssd": 79.079102105631, "pnn20": 0.7724550898203593, "pnn50": 0.4221556886227545, "ibi": 732.4375213133641, "lf/hf": 0.6677155405784594, "acc_x_avg": 1.7091933630274596, "acc_y_avg": -2.3761690361687893, "acc_z_avg": 8.209631811788354, "grv_x_avg": -0.3922270288010715, "grv_y_avg": 0.2412905398526454, "grv_z_avg": -0.0233138225050234, "grv_w_avg": 0.1066616838580042, "gyr_x_avg": 0.3474681640991295, "gyr_y_avg": 1.1165304762223718, "gyr_z_avg": 0.186041525117213, "light_avg": 729.1103678929766, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.162896032760479}},
+            {"timestamp": "2021-03-04T06:45:21.969000", "deviceId": "ab60", "features": {"hr": 85.06473364801079, "hr_resting": 87.84302108870469, "hrv_rmssd": 73.21565395106201, "hrv_sdnn": 69.24046192941114, "hrv_pnn50": 0.4137931034482758, "sdnn": 69.24046192941114, "sdsd": 49.87687480659143, "rmssd": 73.21565395106201, "pnn20": 0.768025078369906, "pnn50": 0.4137931034482758, "ibi": 719.8668191606536, "lf/hf": 1.6620124103207512, "acc_x_avg": 2.2839716548257365, "acc_y_avg": -2.790153060991958, "acc_z_avg": 8.004115201072393, "grv_x_avg": -0.4554428230563004, "grv_y_avg": -0.1927247533512062, "grv_z_avg": 0.0215668719839142, "grv_w_avg": 0.1034467037533509, "gyr_x_avg": -0.0257573813672921, "gyr_y_avg": 1.0048659463806948, "gyr_z_avg": 0.3241018806970504, "light_avg": 755.3177257525084, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1959064930123424}},
+            {"timestamp": "2021-03-04T06:50:21.981000", "deviceId": "ab60", "features": {"hr": 84.85642062689585, "hr_resting": 87.84302108870469, "hrv_rmssd": 68.0148861140481, "hrv_sdnn": 76.76458770787994, "hrv_pnn50": 0.3815384615384615, "sdnn": 76.76458770787994, "sdsd": 46.69614203120664, "rmssd": 68.0148861140481, "pnn20": 0.6707692307692308, "pnn50": 0.3815384615384615, "ibi": 729.6592731642268, "lf/hf": 0.9127720565801908, "acc_x_avg": 1.71269043335566, "acc_y_avg": -1.0698834099129273, "acc_z_avg": 9.200909472873422, "grv_x_avg": -0.375676233087742, "grv_y_avg": 0.1822926999330206, "grv_z_avg": 0.0157954038847957, "grv_w_avg": 0.0460910823844608, "gyr_x_avg": -0.1220897675820484, "gyr_y_avg": 0.6891694554588073, "gyr_z_avg": 0.4564768801071655, "light_avg": 847.0335570469799, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1845048437257962}},
+            {"timestamp": "2021-03-04T06:55:21.985000", "deviceId": "ab60", "features": {"hr": 80.75881760161236, "hr_resting": 87.84302108870469, "hrv_rmssd": 66.04701786885973, "hrv_sdnn": 76.17603294638347, "hrv_pnn50": 0.3684210526315789, "sdnn": 76.17603294638347, "sdsd": 45.0541425499126, "rmssd": 66.04701786885973, "pnn20": 0.7151702786377709, "pnn50": 0.3684210526315789, "ibi": 757.0256501417455, "lf/hf": 1.269681575994282, "acc_x_avg": 3.739257345612857, "acc_y_avg": -2.4820750622906904, "acc_z_avg": 7.388303849966525, "grv_x_avg": -0.5472615157401197, "grv_y_avg": 0.3134806992632281, "grv_z_avg": 0.0378557434695244, "grv_w_avg": 0.2093843141326185, "gyr_x_avg": -0.5253990328638486, "gyr_y_avg": 1.6931924748490956, "gyr_z_avg": 0.665633814889338, "light_avg": 703.4496644295302, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1579866815850661}},
+            {"timestamp": "2021-03-04T07:00:22.020000", "deviceId": "ab60", "features": {"hr": 83.36467427803895, "hr_resting": 87.84302108870469, "hrv_rmssd": 85.01833953484277, "hrv_sdnn": 76.7068088047925, "hrv_pnn50": 0.4434782608695652, "sdnn": 76.7068088047925, "sdsd": 62.38011995356233, "rmssd": 85.01833953484277, "pnn20": 0.7536231884057971, "pnn50": 0.4434782608695652, "ibi": 724.6919175240898, "lf/hf": 1.3219549787885163, "acc_x_avg": 0.4397476164658638, "acc_y_avg": -1.638112914323962, "acc_z_avg": 9.6665646124498, "grv_x_avg": -0.3218441425702814, "grv_y_avg": 0.1350831372155288, "grv_z_avg": 0.0047377771084337, "grv_w_avg": 0.0104064123159303, "gyr_x_avg": -0.2149196720214184, "gyr_y_avg": 0.9163253052208852, "gyr_z_avg": 0.5661378821954464, "light_avg": 861.3110367892976, "time_period_primary": "morning", "time_period_secondary": "breakfast", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1291275275920405}}
+        ]
+    },
+    "nd56_low_activity_sparse": {
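+        "description": "nd56: most sensor fields are missing and only basics such as heart rate and time period are provided; verifies the model's default-filling behavior",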
+        "user_id": "nd56",
+        "data_points": [
+            {"timestamp": "2021-03-04T03:40:55.745000", "deviceId": "nd56", "features": {"hr": 73.3681592039801, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T03:55:55.963000", "deviceId": "nd56", "features": {"hr": 70.58150365934797, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T04:30:56.879000", "deviceId": "nd56", "features": {"hr": 79.00866089273818, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T05:11:22.339000", "deviceId": "nd56", "features": {"hr": 76.88147410358566, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T06:21:22.941000", "deviceId": "nd56", "features": {"hr": 79.52276503821868, "steps": None, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T06:36:22.983000", "deviceId": "nd56", "features": {"hr": 75.333, "steps": None, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T06:41:23.077000", "deviceId": "nd56", "features": {"hr": 75.7566401062417, "steps": None, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T06:56:23.275000", "deviceId": "nd56", "features": {"hr": 74.49435590969456, "steps": None, "time_period_primary": "morning", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T07:26:23.418000", "deviceId": "nd56", "features": {"hr": 73.90371845949535, "steps": None, "time_period_primary": "morning", "time_period_secondary": "breakfast", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T07:36:23.463000", "deviceId": "nd56", "features": {"hr": 72.7252911813644, "steps": None, "time_period_primary": "morning", "time_period_secondary": "breakfast", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T07:41:23.508000", "deviceId": "nd56", "features": {"hr": 70.14043824701196, "steps": None, "time_period_primary": "morning", "time_period_secondary": "breakfast", "is_weekend": 0, "data_quality": "medium"}},
+            {"timestamp": "2021-03-04T07:46:23.547000", "deviceId": "nd56", "features": {"hr": 71.33565737051792, "steps": None, "time_period_primary": "morning", "time_period_secondary": "breakfast", "is_weekend": 0, "data_quality": "medium"}}
+        ]
+    },
+    "anon_commuter_minimal": {
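+        "description": "Anonymous commuter: only minimal metrics such as heart rate, HRV, and time period are provided and deviceId is missing; verifies that the service can batch-process different customers",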
+        "user_id": "anon_commuter",
+        "data_points": [
+            {"timestamp": "2021-03-09T00:52:04.300000", "deviceId": None, "features": {"hr": 88.77359119706568, "hrv_rmssd": 68.78080551188198, "hrv_sdnn": 68.55954732717733, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.2340064304816851}},
+            {"timestamp": "2021-03-09T00:57:04.328000", "deviceId": None, "features": {"hr": 80.68233333333333, "hrv_rmssd": 59.2934623277283, "hrv_sdnn": 73.79953723523498, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1442517250480751}},
+            {"timestamp": "2021-03-09T01:02:04.404000", "deviceId": None, "features": {"hr": 75.80266666666667, "hrv_rmssd": 45.96602248249421, "hrv_sdnn": 86.08949225862196, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.0475269119819883}},
+            {"timestamp": "2021-03-09T01:07:04.432000", "deviceId": None, "features": {"hr": 74.08533333333334, "hrv_rmssd": 35.124819470890934, "hrv_sdnn": 76.28600536828999, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.0362464905334389}},
+            {"timestamp": "2021-03-09T01:12:04.441000", "deviceId": None, "features": {"hr": 76.9121411276375, "hrv_rmssd": 72.02894057601205, "hrv_sdnn": 87.11453810833142, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1366772654292948}},
+            {"timestamp": "2021-03-09T01:17:04.485000", "deviceId": None, "features": {"hr": 68.27024325224924, "hrv_rmssd": 36.20840500569033, "hrv_sdnn": 61.29313423988413, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": -0.0048301680504103}},
+            {"timestamp": "2021-03-09T01:22:04.496000", "deviceId": None, "features": {"hr": 71.42038640906063, "hrv_rmssd": 42.73973712175516, "hrv_sdnn": 100.9222381995624, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.0282886513311317}},
+            {"timestamp": "2021-03-09T01:27:04.517000", "deviceId": None, "features": {"hr": 78.02364302364302, "hrv_rmssd": 68.8236455702779, "hrv_sdnn": 91.82364754816273, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.133595953991592}},
+            {"timestamp": "2021-03-09T01:32:04.589000", "deviceId": None, "features": {"hr": 74.2271818787475, "hrv_rmssd": 47.71422420363224, "hrv_sdnn": 80.68140559056071, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.0569492438181573}},
+            {"timestamp": "2021-03-09T01:37:04.619000", "deviceId": None, "features": {"hr": 66.9780146568954, "hrv_rmssd": 35.74818419244297, "hrv_sdnn": 50.71532677029013, "steps": None, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.0026578073089701}},
+            {"timestamp": "2021-03-09T01:47:04.721000", "deviceId": None, "features": {"hr": 74.31312458361093, "hrv_rmssd": 57.44451616196892, "hrv_sdnn": 98.6011860346101, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.1253227425948505}},
+            {"timestamp": "2021-03-09T01:52:04.735000", "deviceId": None, "features": {"hr": 71.42538307794804, "hrv_rmssd": 50.01054170733226, "hrv_sdnn": 73.22227779949023, "steps": 0.0, "time_period_primary": "night", "time_period_secondary": "rest_night", "is_weekend": 0, "data_quality": "medium", "missingness_score": 0.0255565038546025}}
+        ]
+    }
+}
+def load_raw_data_window(
+    stage1_file: Path,
+    feature_names: List[str],
+    window_size: int = 12,
+) -> Tuple[List[Dict], str]:
+    stage1_path = Path(stage1_file)
+    if not stage1_path.exists():
+        raise FileNotFoundError(f"Raw data file not found: {stage1_file}")
+    base_cols = pd.read_csv(stage1_path, nrows=0).columns.tolist()
+    usecols = ['deviceId', 'ts_start'] + [feat for feat in feature_names if feat in base_cols]
+    usecols = list(dict.fromkeys(usecols))
+    buffers: Dict[str, List[Dict]] = {}
+    reader = pd.read_csv(
+        stage1_path,
+        usecols=usecols,
+        parse_dates=['ts_start'],
+        chunksize=10000,
+    )
+    for chunk in reader:
+        chunk = chunk.sort_values(['deviceId', 'ts_start'])
+        for device_id, group in chunk.groupby('deviceId'):
+            records = group.to_dict('records')
+            if device_id not in buffers:
+                buffers[device_id] = []
+            buffers[device_id].extend(records)
+            buffers[device_id] = sorted(buffers[device_id], key=lambda r: r['ts_start'])
+            if len(buffers[device_id]) >= window_size:
+                segment = buffers[device_id][:window_size]
+                data_points: List[Dict] = []
+                for row in segment:
+                    feature_payload = {}
+                    for feat in feature_names:
+                        if feat in row and pd.notna(row[feat]):
+                            feature_payload[feat] = row[feat]
+                    data_points.append({
+                        'timestamp': row['ts_start'].to_pydatetime(),
+                        'deviceId': str(device_id),
+                        'features': feature_payload,
+                    })
+                return data_points, str(device_id)
+            if len(buffers[device_id]) > window_size * 4:
+                buffers[device_id] = buffers[device_id][-window_size*2:]
+    raise ValueError("No user has enough records to fill a window (check that the raw data contains enough consecutive records)")
+def load_predefined_case(case_name: str) -> Tuple[List[Dict], str, str]:
+    if case_name not in PREDEFINED_CASES:
+        raise ValueError(f"Predefined case not found: {case_name}; available: {list(PREDEFINED_CASES.keys())}")
+    case = PREDEFINED_CASES[case_name]
+    data_points = []
+    for point in case["data_points"]:
+        converted = dict(point)
+        converted["timestamp"] = pd.to_datetime(converted["timestamp"])
+        data_points.append(converted)
+    return data_points, case["user_id"], case["description"]
+def main():
+    parser = argparse.ArgumentParser(description="Test the new inference service with raw wearables data")
+    parser.add_argument("--model-dir", type=str, default="checkpoints/phase2/exp_factor_balanced")
+    parser.add_argument("--stage1-file", type=str, default="processed_data/stage1/wearable_processed.csv")
+    parser.add_argument("--window-size", type=int, default=12)
+    parser.add_argument("--case", type=str, nargs="+", default=["ab60_morning_rest"], help="Predefined case name(s); multiple allowed, 'all' runs every case")
+    parser.add_argument("--from-raw", action="store_true", help="Sample from the raw stage1 file instead of the predefined cases")
+    args = parser.parse_args()
+    base_dir = Path(__file__).parent
+    stage1_file = base_dir / args.stage1_file
+    feature_calculator = FeatureCalculator()
+    feature_names = feature_calculator.get_enabled_feature_names()
+    print("\n🚀 Loading anomaly detector...")
+    detector = load_detector(base_dir / args.model_dir)
+    def run_prediction(label: str, data_points: List[Dict], user_id_hint: str):
+        print(f"\n🧪 Running prediction: {label}")
+        print(f"  - User: {user_id_hint}")
+        print(f"  - Window length: {len(data_points)} (5 minutes per point)")
+        result = detector.predict(data_points, return_score=True, return_details=True)
+        print("  ▸ Anomaly:", "yes" if result['is_anomaly'] else "no")
+        print(f"  ▸ Anomaly score: {result.get('anomaly_score', 0.0):.4f} (threshold {result.get('threshold', detector.threshold):.4f})")
+        if result.get('details'):
+            print("  ▸ Details:", result['details'])
+    if args.from_raw:
+        print("📥 Sampling a window from the raw data...")
+        if not stage1_file.exists():
+            raise FileNotFoundError(f"Raw data file not found: {stage1_file}")
+        data_points, device_id = load_raw_data_window(stage1_file, feature_names, window_size=args.window_size)
+        run_prediction("raw_sample", data_points, device_id)
+    else:
+        case_names = args.case
+        if "all" in case_names:
+            case_names = list(PREDEFINED_CASES.keys())
+        for case_name in case_names:
+            if case_name not in PREDEFINED_CASES:
+                print(f"⚠️  Skipping unknown case: {case_name}")
+                continue
+            data_points, user_id, desc = load_predefined_case(case_name)
+            print(f"\n🧾 Using predefined case: {case_name}")
+            print(f"  - Description: {desc}")
+            run_prediction(case_name, data_points, user_id)
+if __name__ == "__main__":
+    main()
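
The predefined cases above all share one shape: a list of data points, each with a `timestamp`, a `deviceId` (which may be `None`), and a `features` dict. As a minimal sketch of building such a window by hand, assuming only heart rate is available, the helper below uses the standard library only. The name `make_window` and the device ID `demo01` are hypothetical and do not appear in the repository.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def make_window(
    start: datetime,
    heart_rates: List[float],
    device_id: Optional[str] = None,
    step_minutes: int = 5,
) -> List[Dict]:
    """Build one window of data points in the same shape as PREDEFINED_CASES entries."""
    points = []
    for i, hr in enumerate(heart_rates):
        points.append({
            # One point every `step_minutes`, starting at `start`
            "timestamp": start + timedelta(minutes=i * step_minutes),
            "deviceId": device_id,
            # Only heart rate here; other features are left out on purpose
            "features": {"hr": float(hr)},
        })
    return points


# Twelve 5-minute points make up the default one-hour window
window = make_window(
    datetime(2021, 3, 4, 6, 0),
    [70.0 + i for i in range(12)],
    device_id="demo01",  # hypothetical device ID
)
print(len(window), window[-1]["timestamp"].isoformat())
```

A window built this way could then be passed to `detector.predict(window)`, in the same way that `run_prediction` does for the predefined cases.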

wearable_anomaly_detector.py ADDED

	@@ -0,0 +1,428 @@

+"""
+Wearable health anomaly detection model - standardized wrapper
+Provides a simple API for real-time anomaly detection
+"""
+import torch
+import numpy as np
+import json
+import pickle
+from pathlib import Path
+from typing import Dict, List, Optional, Union
+from datetime import datetime
+import pandas as pd
+# Add the project root to the import path
+import sys
+sys.path.insert(0, str(Path(__file__).parent.parent))
+from models.phased_lstm_tft import PhasedLSTM_TFT, PhasedLSTM_TFT_WithEnhancedAnomalyDetection
+from feature_calculator import FeatureCalculator
+class WearableAnomalyDetector:
+    """
+    Wearable health anomaly detector
+    Usage example:
+        detector = WearableAnomalyDetector(model_dir="checkpoints/phase2/exp_factor_balanced")
+        result = detector.predict(data_points)
+    """
+    def __init__(
+        self,
+        model_dir: Union[str, Path],
+        device: Optional[str] = None,
+        threshold: Optional[float] = None
+    ):
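+        """
+        Initialize the anomaly detector
+        Args:
+            model_dir: Path to the model directory (contains best_model.pt and config files)
+            device: Device ('cuda' or 'cpu'); selected automatically if None
+            threshold: Anomaly threshold; read from the config if None
+        """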
+        self.model_dir = Path(model_dir)
+        self.device = torch.device(device or ('cuda' if torch.cuda.is_available() else 'cpu'))
+        # Load the configuration
+        self.config = self._load_config()
+        # Determine the threshold
+        if threshold is not None:
+            self.threshold = float(threshold)
+        else:
+            config_threshold = self.config.get('threshold')
+            if config_threshold is not None:
+                self.threshold = float(config_threshold)
+            else:
+                self.threshold = 0.53  # default threshold
+                print(f"  ⚠️  No threshold found in config; using default: {self.threshold:.4f}")
+        # Load the model
+        self.model = self._load_model()
+        self.model.eval()
+        # Load normalization parameters (kept for backward compatibility)
+        self.norm_params = self._load_norm_params()
+        # Config-driven feature calculation
+        self.feature_calculator = FeatureCalculator(
+            config_path=self.config.get('feature_config_path'),
+            norm_params_path=Path(__file__).parent / 'processed_data' / 'stage3' / 'norm_params.json',
+            static_features_path=Path(__file__).parent / 'processed_data' / 'stage2' / 'static_features.csv',
+            storage_dir=Path(self.config.get('storage_dir', Path(__file__).parent / 'data_storage'))
+        )
+        self.features = self.feature_calculator.get_enabled_feature_names()
+        self.static_feature_names = [cfg["name"] for cfg in self.feature_calculator.static_feature_defs]
+        self.known_future_dim = max(len(self.feature_calculator.known_future_defs), 1)
+        self.factor_metadata = {
+            'enabled': self.feature_calculator.factor_enabled,
+            'factor_names': self.feature_calculator.factor_names,
+            'factor_dim': self.feature_calculator.factor_dim
+        }
+        print(f"✅ Model loaded successfully")
+        print(f"  - Device: {self.device}")
+        print(f"  - Threshold: {self.threshold:.4f}")
+        print(f"  - Number of features: {len(self.features)}")
+    def _load_config(self) -> Dict:
+        """Load the model configuration"""
+        config_file = self.model_dir / 'config.json'
+        if config_file.exists():
+            with open(config_file, 'r') as f:
+                config = json.load(f)
+                return config
+        # Fall back to summary.json
+        summary_file = self.model_dir / 'summary.json'
+        if summary_file.exists():
+            with open(summary_file, 'r') as f:
+                summary = json.load(f)
+                config = {
+                    'threshold': summary.get('best_threshold'),
+                    'features': [],  # must be obtained elsewhere
+                }
+                return config
+        # Neither file exists: return an empty config (defaults apply)
+        print(f"  ⚠️  No config file found; using default configuration")
+        return {}
+    def _load_model(self):
+        """Load the model"""
+        # Load the Phase 1 model
+        phase1_model_path = self.model_dir.parent.parent / 'phase1' / 'best_model.pt'
+        if not phase1_model_path.exists():
+            raise FileNotFoundError(f"Phase 1 model not found: {phase1_model_path}")
+        checkpoint_phase1 = torch.load(phase1_model_path, map_location=self.device, weights_only=False)
+        phase1_config = checkpoint_phase1['config']
+        base_model = PhasedLSTM_TFT(phase1_config)
+        base_model.load_state_dict(checkpoint_phase1['model_state_dict'])
+        base_model = base_model.to(self.device)
+        # Load factor_config
+        factor_config = self._load_factor_config()
+        # Build the Phase 2 model
+        model = PhasedLSTM_TFT_WithEnhancedAnomalyDetection(
+            base_model,
+            num_anomaly_types=4,
+            use_enhanced_head=True,
+            use_multi_source_heads=False,
+            use_domain_adversarial=False,
+            factor_config=factor_config
+        )
+        model = model.to(self.device)
+        # Load the Phase 2 weights
+        phase2_model_path = self.model_dir / 'best_model.pt'
+        if not phase2_model_path.exists():
+            raise FileNotFoundError(f"Phase 2 model not found: {phase2_model_path}")
+        checkpoint_phase2 = torch.load(phase2_model_path, map_location=self.device, weights_only=False)
+        model.load_state_dict(checkpoint_phase2['model_state_dict'])
+        return model
+    def _load_factor_config(self) -> Optional[Dict]:
+        """Load the factor feature configuration"""
+        # Option 1: reuse factor metadata already set on this instance (if available)
+        if hasattr(self, 'factor_metadata') and self.factor_metadata:
+            if self.factor_metadata.get('enabled'):
+                return {
+                    'num_factors': len(self.factor_metadata.get('factor_names', [])),
+                    'factor_dim': self.factor_metadata.get('factor_dim', 0),
+                    'factor_names': self.factor_metadata.get('factor_names', []),
+                    'min_weight': 0.2,
+                    'dropout': 0.1,
+                }
+        # Option 2: read it from the window info file
+        window_info_file = Path(__file__).parent / 'processed_data' / 'stage3' / 'window_info_multi_scale.json'
+        if window_info_file.exists():
+            with open(window_info_file, 'r') as f:
+                window_info = json.load(f)
+            factor_metadata = window_info.get('factor_features', {})
+            if factor_metadata and factor_metadata.get('enabled'):
+                return {
+                    'num_factors': len(factor_metadata.get('factor_names', [])),
+                    'factor_dim': factor_metadata.get('factor_dim', 0),
+                    'factor_names': factor_metadata.get('factor_names', []),
+                    'min_weight': 0.2,
+                    'dropout': 0.1,
+                }
+        return None
+    def _load_norm_params(self) -> Optional[Dict]:
+        """Load the normalization parameters"""
+        norm_file = Path(__file__).parent / 'processed_data' / 'stage3' / 'norm_params.json'
+        if norm_file.exists():
+            with open(norm_file, 'r') as f:
+                return json.load(f)
+        return None
+    def predict(
+        self,
+        data_points: List[Dict],
+        return_score: bool = True,
+        return_details: bool = False
+    ) -> Dict:
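+        """
+        Predict whether a window is anomalous
+        Args:
+            data_points: List of data points; each is a dict containing:
+                - timestamp: Timestamp (datetime or string)
+                - features: Feature dict with all required feature values
+                - static_features: Static feature dict (optional)
+            return_score: Whether to return the anomaly score
+            return_details: Whether to return detailed information
+        Returns:
+            {
+                'is_anomaly': bool,  # whether the window is anomalous
+                'anomaly_score': float,  # anomaly score (0-1)
+                'threshold': float,  # threshold used
+                'details': dict (optional)  # detailed information
+            }
+        """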
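+        if not data_points:
+            raise ValueError("data_points must contain at least one data point")
+        user_id = data_points[0].get('deviceId') or data_points[0].get('user_id')
+        window = self.feature_calculator.build_window(data_points, user_id=user_id)
+        # Convert to the model input format
+        model_input = self._prepare_model_input(window)
+        # Run the model
+        with torch.no_grad():
+            # The model's forward method takes positional arguments, so pass them in order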
+            outputs = self.model(
+                model_input['x'],
+                model_input['delta_t'],
+                model_input['static_features'],
+                model_input['known_future_features'],
+                mask=model_input.get('mask'),
+                return_contrastive_features=model_input.get('return_contrastive_features', False),
+                source=None,
+                return_domain_features=False,
+                factor_features=model_input.get('factor_features')
+            )
+            anomaly_score = outputs['anomaly_score'].cpu().item()
+        # Decide whether the window is anomalous
+        is_anomaly = anomaly_score >= self.threshold
+        result = {
+            'is_anomaly': bool(is_anomaly),
+            'threshold': float(self.threshold),
+        }
+        if return_score:
+            result['anomaly_score'] = float(anomaly_score)
+        if return_details:
+            result['details'] = {
+                'window_size': len(data_points),
+                'model_output': float(anomaly_score),
+                'prediction_confidence': float(abs(anomaly_score - self.threshold)),
+            }
+        return result
+    def _prepare_model_input(self, window: Dict) -> Dict:
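+        """Prepare the model input"""
+        # Sequence length of this window (12 for the default one-hour window)
+        seq_len = len(window['input_delta_t'])
+        input_features_list = []
+        for feat in self.features:
+            values = window['input_features'].get(feat, [0.0] * seq_len)
+            input_features_list.append(values)
+        # Convert to tensors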
+        input_features = torch.tensor(
+            np.stack(input_features_list, axis=1),
+            dtype=torch.float32
+        ).unsqueeze(0).to(self.device)  # [1, 12, num_features]
+        delta_t = torch.tensor(
+            window['input_delta_t'],
+            dtype=torch.float32
+        ).unsqueeze(-1).unsqueeze(0).to(self.device)  # [1, 12, 1]
+        # Static features
+        static_feature_values = []
+        static_keys = self.static_feature_names or sorted(window['static_features'].keys())
+        for key in static_keys:
+            value = window['static_features'].get(key, 0.0)
+            static_feature_values.append(float(value))
+        if len(static_feature_values) == 0:
+            static_feature_values = [0.0]
+        static_features = torch.tensor(
+            static_feature_values,
+            dtype=torch.float32
+        ).unsqueeze(0).to(self.device)  # [1, num_static]
+        # Known future features
+        pred_len = len(window.get('target_timestamp', []))
+        if pred_len == 0:
+            pred_len = 6  # default prediction length
+        known_future = torch.zeros(1, pred_len, self.known_future_dim, dtype=torch.float32).to(self.device)
+        if 'known_future_features' in window:
+            kf = window['known_future_features']
+            for idx, cfg in enumerate(self.feature_calculator.known_future_defs):
+                name = cfg['name']
+                if name in kf:
+                    series = kf[name][:pred_len]
+                    if name == 'hour_of_day':
+                        values = torch.tensor([float(h) / 23.0 for h in series], dtype=torch.float32)
+                    elif name == 'day_of_week':
+                        values = torch.tensor([float(d) / 6.0 for d in series], dtype=torch.float32)
+                    else:
+                        values = torch.tensor([float(v) for v in series], dtype=torch.float32)
+                    known_future[0, :len(series), idx] = values
+        # Input mask (assumes every value is valid); sized from the input tensor
+        input_mask = torch.ones(1, input_features.shape[1], len(self.features), dtype=torch.float32).to(self.device)
+        # Factor features
+        factor_features = None
+        if window.get('factor_features'):
+            factor_names = self.factor_metadata.get('factor_names', [])
+            factor_dim = self.factor_metadata.get('factor_dim', 4)
+            factor_vectors = []
+            for name in factor_names:
+                vec = window['factor_features'].get(name, [0.0] * factor_dim)
+                factor_vectors.append(vec[:factor_dim])
+            if factor_vectors:
+                factor_features = torch.tensor(
+                    factor_vectors,
+                    dtype=torch.float32
+                ).unsqueeze(0).to(self.device)  # [1, num_factors, factor_dim]
+        return {
+            'x': input_features,
+            'delta_t': delta_t,
+            'static_features': static_features,
+            'known_future_features': known_future,
+            'mask': input_mask,
+            'factor_features': factor_features,
+            'return_contrastive_features': False,
+            'source': None,
+            'return_domain_features': False,
+        }
+    def batch_predict(
+        self,
+        windows: List[List[Dict]],
+        return_scores: bool = True
+    ) -> List[Dict]:
+        """
+        Batch prediction
+        Args:
+            windows: List of windows; each window is a list of data points
+            return_scores: Whether to return anomaly scores
+        Returns:
+            List of prediction results
+        """
+        results = []
+        for window_data in windows:
+            result = self.predict(window_data, return_score=return_scores)
+            results.append(result)
+        return results
+    def update_threshold(self, threshold: float):
+        """Update the anomaly threshold"""
+        self.threshold = float(threshold)
+        print(f"✅ Threshold updated to: {self.threshold:.4f}")
+def load_detector(model_dir: Union[str, Path], **kwargs) -> WearableAnomalyDetector:
+    """
+    Convenience function: load the anomaly detector
+    Args:
+        model_dir: Path to the model directory
+        **kwargs: Additional arguments (device, threshold, etc.)
+    Returns:
+        A WearableAnomalyDetector instance
+    """
+    return WearableAnomalyDetector(model_dir, **kwargs)
+if __name__ == '__main__':
+    # Usage example
+    print("=" * 80)
+    print("Wearable health anomaly detector - usage example")
+    print("=" * 80)
+    # Load the model
+    model_dir = Path(__file__).parent / 'checkpoints' / 'phase2' / 'exp_factor_balanced'
+    detector = load_detector(model_dir)
+    # Simulated data points (in real use, read them from a live data stream)
+    print("\nSimulating data points...")
+    data_points = []
+    base_time = datetime.now()
+    # Use a real deviceId (if the static feature table exists),
+    # or provide a complete static feature example
+    example_device_id = None
+    static_dict = detector.feature_calculator.static_features_dict
+    if static_dict:
+        example_device_id = list(static_dict.keys())[0]
+        print(f"  Using example user ID: {example_device_id}")
+    for i in range(12):
+        data_point = {
+            'timestamp': base_time.replace(minute=i*5),
+            'deviceId': example_device_id,  # provide deviceId so the full static features can be loaded
+            'features': {
+                'hr': 70.0 + np.random.randn() * 5,
+                'hrv_rmssd': 30.0 + np.random.randn() * 3,
+                # ... other features (simplified example; real use needs all 36 features)
+            },
+            'static_features': {
+                # Provide only some features and the system fills in the rest from the static feature table,
+                # or provide none and load everything from the static feature table
+            }
+        }
+        data_points.append(data_point)
+    # Predict
+    result = detector.predict(data_points, return_score=True, return_details=True)
+    print(f"\nPrediction result:")
+    print(f"  - Anomaly: {result['is_anomaly']}")
+    print(f"  - Anomaly score: {result['anomaly_score']:.4f}")
+    print(f"  - Threshold: {result['threshold']:.4f}")
+    if 'details' in result:
+        print(f"  - Details: {result['details']}")
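
The decision step in `predict` reduces to a threshold comparison: a window is flagged when its score is at or above the threshold, and the `prediction_confidence` field is the absolute distance between the two. The sketch below isolates that rule in plain Python so it can be checked without loading a checkpoint. The function name `decide` and the key `margin` are hypothetical; the default of 0.53 is the fallback threshold used in the class above. The distance is not a calibrated probability, so it is probably best read as a rough margin.

```python
from typing import Dict


def decide(anomaly_score: float, threshold: float = 0.53) -> Dict:
    """Apply the same rule as predict(): flag the window when score >= threshold."""
    return {
        "is_anomaly": anomaly_score >= threshold,
        "anomaly_score": float(anomaly_score),
        "threshold": float(threshold),
        # Distance from the decision boundary, reported as
        # 'prediction_confidence' in the detector's details dict
        "margin": abs(anomaly_score - threshold),
    }


for score in (0.40, 0.53, 0.60):
    print(score, decide(score)["is_anomaly"])
```

Raising the threshold trades recall for precision, which is the knob that `update_threshold` exposes on the detector.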