Repository: shartoo/luna16_multi_size_3dcnn
Branch: master
Commit: f2c35471ab66
Files: 51
Total size: 67.9 MB

Directory structure:
gitextract_77wp8_t7/

├── README.md
├── data/
│   ├── __init__.py
│   ├── dataclass/
│   │   ├── CTData.py
│   │   ├── NoduleCube.py
│   │   └── __init__.py
│   └── preprocessing/
│       ├── __init__.py
│       ├── lidc_process/
│       │   ├── README.md
│       │   ├── __init__.py
│       │   ├── lidc_annotation_process.py
│       │   └── lidc_coordinate_process.py
│       ├── luna16_invalid_nodule_filter.py
│       └── luna16_prepare_cube_data.py
├── deploy/
│   ├── README.md
│   ├── backend/
│   │   ├── app.py
│   │   ├── data/
│   │   │   └── c3d_nodule_detect.pth
│   │   ├── dataclass/
│   │   │   ├── CTData.py
│   │   │   ├── NoduleCube.py
│   │   │   └── __init__.py
│   │   ├── detector.py
│   │   ├── models/
│   │   │   ├── pytorch_c3d_tiny.py
│   │   │   └── pytorch_nodule_detector.py
│   │   ├── preprocessing/
│   │   │   ├── __init__.py
│   │   │   └── luna16_invalid_nodule_filter.py
│   │   ├── util/
│   │   │   ├── __init__.py
│   │   │   ├── dicom_util.py
│   │   │   ├── image_util.py
│   │   │   ├── mhd_util.py
│   │   │   └── seg_util.py
│   │   └── utils.py
│   ├── frontend/
│   │   ├── css/
│   │   │   └── style.css
│   │   ├── index.html
│   │   └── js/
│   │       └── main.js
│   └── run.py
├── inference/
│   ├── __init__.py
│   ├── c3d_classify_result-1.3.6.1.4.1.14519.5.2.1.6279.6001.149041668385192796520281592139.csv
│   ├── classifier.py
│   ├── detector.py
│   ├── negative_sample_selection.py
│   └── pytorch_nodule_detector.py
├── models/
│   ├── __init__.py
│   └── pytorch_c3d_tiny.py
├── training/
│   ├── __init__.py
│   ├── pytorch_logs/
│   │   └── training_20250331_223230.log
│   └── train_c3d_pytorch.py
├── training_metrics.txt
└── util/
    ├── __init__.py
    ├── dicom_util.py
    ├── image_util.py
    ├── mhd_util.py
    ├── progress_watch.py
    └── seg_util.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# 3D Lung Nodule Detection System

## 0. Project structure

```
├── data/               # Data handling
│   ├── dataclass/      # Data classes, including the NoduleCube nodule-cube class
│   └── preprocessing/  # Preprocessing code, including LUNA16 data processing
├── models/             # Model definitions
│   └── pytorch_c3d_tiny.py  # Tiny-C3D model definition
├── training/           # Model training
│   ├── pytorch_logs/   # Training logs
│   ├── pytorch_checkpoints/ # Model checkpoints
│   └── train_c3d_pytorch.py # Training script
├── inference/          # Model inference
│   ├── classifier.py   # Classifier implementation
│   ├── detector.py     # Detector implementation
│   └── pytorch_nodule_detector.py # PyTorch nodule detector
├── deploy/             # Deployment
│   ├── backend/        # Backend code
│   ├── frontend/       # Frontend code
│   └── run.py          # Launch script
└── util/               # Utility functions
```

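## 1. Tiny-C3D model architecture

Tiny-C3D is a lightweight 3D convolutional neural network designed for lung nodule classification. It is a simplified version of the C3D architecture that keeps the core design while greatly reducing the parameter count, which makes it suitable for resource-constrained environments.
The original C3D model was proposed for video classification in [Learning Spatiotemporal Features with 3D Convolutional Networks](https://arxiv.org/abs/1412.0767).

### Model structure

The model contains 4 3D convolutional blocks built from the following components:
- 3D convolution layers
- Batch normalization layers
- ReLU activations
- A max pooling layer
- Dropout layers (p=0.2, applied after blocks 2 to 4 to reduce overfitting)

The input is a 32×32×32 voxel cube. The printed model structure is: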

```commandline
C3dTiny(
  (conv_block1): Sequential(
    (0): Conv3d(1, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2), padding=0, dilation=1, ceil_mode=False)
  )
  (conv_block2): Sequential(
    (0): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool3d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (drop_out1): Dropout(p=0.2, inplace=False)
  (conv_block3): Sequential(
    (0): Conv3d(128, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (4): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): MaxPool3d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (drop_out2): Dropout(p=0.2, inplace=False)
  (conv_block4): Sequential(
    (0): Conv3d(256, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
    (4): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): MaxPool3d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (drop_out3): Dropout(p=0.2, inplace=False)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (fc1): Sequential(
    (0): Linear(in_features=8192, out_features=512, bias=True)
    (1): ReLU()
  )
  (fc2): Linear(in_features=512, out_features=2, bias=True)
)
```
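The `in_features=8192` of `fc1` follows from the pooling layers in the printout. The sketch below traces the spatial shape through the four blocks in plain Python; the helper names `pooled` and `flatten_features` are made up for this illustration and are not part of the repository.

```python
def pooled(shape, kernel):
    """Output (D, H, W) of a max pool with stride == kernel, no padding, floor mode."""
    return tuple(s // k for s, k in zip(shape, kernel))


def flatten_features(input_shape=(32, 32, 32)):
    # The 3x3x3 convolutions use stride 1 and padding 1, so they keep (D, H, W);
    # only the pooling layers shrink the volume.
    pools = [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]  # conv_block1..conv_block4
    shape = input_shape
    for kernel in pools:
        shape = pooled(shape, kernel)
    channels = 512  # output channels of conv_block4
    d, h, w = shape
    return shape, channels * d * h * w


shape, features = flatten_features()
print(shape, features)  # (4, 2, 2) 8192
```

The first pool uses a `(1, 2, 2)` kernel, so the depth axis is halved one time fewer than height and width, which is why the final feature map is 4×2×2 rather than 2×2×2.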


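## 2. Data preparation and preprocessing

### 2.1 Data source

This project uses the [LUNA16 (Lung Nodule Analysis 2016)](https://luna16.grand-challenge.org/) dataset, which contains lung CT scans and nodule location annotations.

### 2.2 Annotation processing

- Coordinates for malignant nodules come from `annotations.csv`. The largest nodule radius is 16, so the extracted cube has an edge length of 32.
- Coordinates for benign nodules come from `candidates_V2.csv`. Note that some of these annotations may be randomly generated and need filtering; in the current code, `luna16_prepare_cube_data.py` runs `get_real_candidate` to filter the annotations and save them again.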


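### 2.3 Cube preprocessing pipeline

1. **Nodule cube extraction**:
   - Extract a 32×32×32 cube centered on the nodule from the original CT image
   - Label nodules as benign or malignant according to the annotations

2. **Normalization**:
   - Normalize the data to the [0,1] range with the `normal_cube_to_tensor` function
   - Repair invalid values such as NaN and Inf

3. **Data augmentation (benign nodules only)**:
   - Random rotation: rotate by a random angle within ±20 degrees about each of the three axes
   - Random flip: randomly flip along a specified axis
   - Gaussian noise: add low-intensity Gaussian noise to improve robustness

4. **Data balancing**:
   - Sample equal numbers of positive and negative samples to avoid class imbalance
   - Sample twice as many negatives as positives to make the model more sensitive to negatives. This issue is not fully solved: the final model still produces many false detections, so the results need further filtering by coordinates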
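The negative-sampling rule in step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function name `sample_negatives`, the `neg_ratio` parameter and the fixed seed are all hypothetical.

```python
import random


def sample_negatives(positives, negatives, neg_ratio=2, seed=0):
    """Keep all positives and draw neg_ratio times as many negatives (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # Never ask for more negatives than are available
    wanted = min(len(negatives), neg_ratio * len(positives))
    return positives, rng.sample(negatives, wanted)


positives = ["pos_a", "pos_b", "pos_c"]
negatives = [f"neg_{i}" for i in range(10)]
kept_pos, kept_neg = sample_negatives(positives, negatives)
print(len(kept_pos), len(kept_neg))  # 3 6
```

Setting `neg_ratio=1` gives the equal-count variant mentioned in the first bullet.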

### 2.4 Data class design

#### 2.4.1 CTData

This class takes raw CT data in DICOM or MHD format, converts it to HU values, and scales it to standard pixel data in the 0-1 range.
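As a rough illustration of the HU-to-[0,1] scaling, the sketch below clips a HU value to a window and rescales it linearly. The window bounds of -1000 and 400 HU are an assumption commonly used for lung CT, and `normalize_hu` is a hypothetical stand-in; the bounds actually used by `normalize_hu_values` in `util/seg_util.py` may differ.

```python
def normalize_hu(value, hu_min=-1000.0, hu_max=400.0):
    """Map a HU value linearly onto [0, 1], clipping outside the window (assumed bounds)."""
    scaled = (value - hu_min) / (hu_max - hu_min)
    return min(1.0, max(0.0, scaled))


print(normalize_hu(-1000))  # 0.0 (air)
print(normalize_hu(-300))   # 0.5
print(normalize_hu(2000))   # 1.0 (clipped)
```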

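#### 2.4.2 NoduleCube

This class extracts a cube of a given edge length from the multi-dimensional array of standard 0-1 pixel data, and saves and loads cubes as npy files and png images.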
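The index arithmetic behind cube extraction can be sketched in a few lines. It mirrors the clamping done in `CTData.extract_cube` (center ± half the edge length, limited to the volume), but `cube_bounds` itself is a hypothetical helper written for this example.

```python
def cube_bounds(center_zyx, volume_shape, side=32):
    """Per-axis (start, stop) slice bounds of a cube, clamped to the volume."""
    half = side // 2
    return tuple(
        (max(0, c - half), min(n, c + half))
        for c, n in zip(center_zyx, volume_shape)
    )


print(cube_bounds((100, 100, 100), (200, 512, 512)))
# ((84, 116), (84, 116), (84, 116))
print(cube_bounds((5, 100, 510), (200, 512, 512)))
# ((0, 21), (84, 116), (494, 512))
```

Near the border the clamped cube is smaller than 32 voxels along the affected axes; `NoduleCube.set_cube_data` resizes such cubes back to `cube_size`.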

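## 3. Model training and parameter settings

### 3.1 Training parameters

- **Batch size**: 64. Set this according to your GPU memory; this value was used on an NVIDIA 4070 GPU
- **Learning rate**: 5e-4
- **Weight decay**: 1e-5
- **Optimizer**: Adam
- **Loss function**: cross-entropy loss
- **Epochs**: 15
- **Learning rate schedule**: ReduceLROnPlateau
- **Gradient clipping**: 1.0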

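### 3.2 Training procedure

The training code is `training/train_c3d_pytorch.py`; set your own data directory in it.

1. Load the preprocessed positive and negative samples
2. Split into training and validation sets at an 8:2 ratio
3. Apply data augmentation to improve generalization
4. Evaluate on the validation set after every epoch
5. Save the model with the best validation accuracy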
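Step 2 can be sketched as a shuffled split. This is an illustrative version only: `split_train_val`, its `val_fraction` parameter and the seed are assumptions, and the training script may split the data differently (for example per class).

```python
import random


def split_train_val(samples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, val) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


train, val = split_train_val(range(100))
print(len(train), len(val))  # 80 20
```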

### Training results

The training log is shown below:

```commandline
2025-03-31 22:33:51,563 - c3d_training - INFO - Epoch [1/15], Train Loss: 0.3146, Train Acc: 89.05%, Val Loss: 0.1623, Val Acc: 93.28%, Time: 80.81s
2025-03-31 22:35:12,255 - c3d_training - INFO - Epoch [2/15], Train Loss: 0.0834, Train Acc: 97.17%, Val Loss: 0.0728, Val Acc: 97.33%, Time: 80.39s
2025-03-31 22:36:33,755 - c3d_training - INFO - Epoch [3/15], Train Loss: 0.0504, Train Acc: 98.30%, Val Loss: 0.0383, Val Acc: 98.61%, Time: 81.20s
2025-03-31 22:37:55,955 - c3d_training - INFO - Epoch [4/15], Train Loss: 0.0368, Train Acc: 98.80%, Val Loss: 0.0522, Val Acc: 98.22%, Time: 81.89s
2025-03-31 22:39:16,050 - c3d_training - INFO - Epoch [5/15], Train Loss: 0.0282, Train Acc: 99.04%, Val Loss: 0.0322, Val Acc: 98.98%, Time: 79.84s
2025-03-31 22:40:37,157 - c3d_training - INFO - Epoch [6/15], Train Loss: 0.0244, Train Acc: 99.13%, Val Loss: 0.0393, Val Acc: 98.83%, Time: 80.80s
2025-03-31 22:41:59,029 - c3d_training - INFO - Epoch [7/15], Train Loss: 0.0204, Train Acc: 99.40%, Val Loss: 0.0383, Val Acc: 98.95%, Time: 81.64s
2025-03-31 22:43:20,016 - c3d_training - INFO - Epoch [8/15], Train Loss: 0.0219, Train Acc: 99.31%, Val Loss: 0.0578, Val Acc: 98.29%, Time: 80.75s
2025-03-31 22:44:41,369 - c3d_training - INFO - Epoch [9/15], Train Loss: 0.0171, Train Acc: 99.52%, Val Loss: 0.0342, Val Acc: 99.17%, Time: 81.13s
2025-03-31 22:46:02,745 - c3d_training - INFO - Epoch [10/15], Train Loss: 0.0173, Train Acc: 99.49%, Val Loss: 0.0279, Val Acc: 99.15%, Time: 81.04s
2025-03-31 22:47:24,128 - c3d_training - INFO - Epoch [11/15], Train Loss: 0.0176, Train Acc: 99.47%, Val Loss: 0.0349, Val Acc: 99.17%, Time: 81.15s
2025-03-31 22:48:44,717 - c3d_training - INFO - Epoch [12/15], Train Loss: 0.0140, Train Acc: 99.53%, Val Loss: 0.0362, Val Acc: 99.27%, Time: 80.36s
2025-03-31 22:50:07,041 - c3d_training - INFO - Epoch [13/15], Train Loss: 0.0126, Train Acc: 99.62%, Val Loss: 0.0497, Val Acc: 98.87%, Time: 82.02s
2025-03-31 22:51:27,896 - c3d_training - INFO - Epoch [14/15], Train Loss: 0.0152, Train Acc: 99.49%, Val Loss: 0.0271, Val Acc: 99.32%, Time: 80.61s
2025-03-31 22:52:49,409 - c3d_training - INFO - Epoch [15/15], Train Loss: 0.0127, Train Acc: 99.58%, Val Loss: 0.0277, Val Acc: 99.15%, Time: 81.20s

```
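To find the epoch with the best validation accuracy (the one the "save best model" step would keep), the log lines can be parsed with a regular expression. The sketch below embeds only the last three lines of the log above; the pattern and the `best_epoch` helper are written for this log format and are not taken from the repository.

```python
import re

# Last three lines of the training log, copied verbatim
LOG_LINES = [
    "2025-03-31 22:50:07,041 - c3d_training - INFO - Epoch [13/15], Train Loss: 0.0126, Train Acc: 99.62%, Val Loss: 0.0497, Val Acc: 98.87%, Time: 82.02s",
    "2025-03-31 22:51:27,896 - c3d_training - INFO - Epoch [14/15], Train Loss: 0.0152, Train Acc: 99.49%, Val Loss: 0.0271, Val Acc: 99.32%, Time: 80.61s",
    "2025-03-31 22:52:49,409 - c3d_training - INFO - Epoch [15/15], Train Loss: 0.0127, Train Acc: 99.58%, Val Loss: 0.0277, Val Acc: 99.15%, Time: 81.20s",
]

PATTERN = re.compile(r"Epoch \[(\d+)/\d+\].*?Val Loss: ([\d.]+), Val Acc: ([\d.]+)%")


def best_epoch(lines):
    """Return (epoch, val_loss, val_acc) for the line with the highest validation accuracy."""
    best = None
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not epoch summaries
        epoch = int(match.group(1))
        val_loss = float(match.group(2))
        val_acc = float(match.group(3))
        if best is None or val_acc > best[2]:
            best = (epoch, val_loss, val_acc)
    return best


print(best_epoch(LOG_LINES))  # (14, 0.0271, 99.32)
```

Epoch 14 is also the best of all 15 epochs in the full log, with a validation accuracy of 99.32%.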

The loss and accuracy curves over the epochs are shown below:

![metric](./training/pytorch_checkpoints/training_metrics.png)

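## 4. Model inference

The inference system is designed as a two-stage pipeline:

### 1. Nodule detection

- Scan the full CT volume with a sliding window
- Detect potential nodule locations
- Merge overlapping detections with non-maximum suppression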
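The scan-and-merge idea can be sketched as below. The stride of 16, the 16-voxel minimum distance and both helper names are assumptions for illustration; the suppression shown is a simplified greedy variant based on center distance rather than the overlap (IoU) criterion of classic non-maximum suppression, and the repository's `detector.py` may work differently.

```python
import math


def window_starts(length, cube=32, stride=16):
    """Start indices along one axis so that the windows cover [0, length)."""
    if length <= cube:
        return [0]
    starts = list(range(0, length - cube + 1, stride))
    if starts[-1] != length - cube:
        starts.append(length - cube)  # last window sits flush with the border
    return starts


def suppress(detections, min_dist=16.0):
    """Greedy suppression: keep the most probable detection, drop neighbours closer than min_dist."""
    kept = []
    for det in sorted(detections, key=lambda d: d["prob"], reverse=True):
        if all(math.dist(det["center"], k["center"]) >= min_dist for k in kept):
            kept.append(det)
    return kept


print(window_starts(70))  # [0, 16, 32, 38]
detections = [
    {"center": (10, 10, 10), "prob": 0.9},
    {"center": (12, 10, 10), "prob": 0.8},  # 2 voxels from the first, suppressed
    {"center": (60, 60, 60), "prob": 0.7},
]
print([d["prob"] for d in suppress(detections)])  # [0.9, 0.7]
```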

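### 2. Nodule classification

- Extract features from the detected candidate regions
- Apply the trained Tiny-C3D model
- Output the nodule probability and a malignancy score

### Inference optimizations

- Batched inference to improve throughput
- Threshold-based filtering of low-confidence detections
- Multithreading for parallel processing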
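Batching and threshold filtering are simple to express. The sketch below is illustrative only: `batches`, `keep_confident`, the batch size of 64 and the threshold of 0.5 are assumed values, not settings read from the repository.

```python
def batches(items, batch_size=64):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def keep_confident(probabilities, threshold=0.5):
    """Return (index, probability) pairs whose probability reaches the threshold."""
    return [(i, p) for i, p in enumerate(probabilities) if p >= threshold]


print(list(batches(list(range(5)), batch_size=2)))  # [[0, 1], [2, 3], [4]]
print(keep_confident([0.1, 0.9, 0.5]))              # [(1, 0.9), (2, 0.5)]
```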

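## 5. Model deployment and results

### 5.1 Deployment architecture

All deployment code lives in the `deploy` directory and can be extracted and deployed on its own. Note that the model trained above must be copied into `deploy/backend/data`; the current code uses `c3d_nodule_detect.pth`, the best model kept from the training run described earlier.

The system uses a decoupled frontend/backend architecture:

- **Backend**: Flask RESTful API service
- **Frontend**: interactive 3D visualization interface based on HTML5 and WebGL

### 5.2 User interface

- Upload of several CT data formats. MHD has been tested so far (compress the `.raw` and `.mhd` files of one case into a single archive before uploading); DICOM has not been tested
- Nodule location annotation and probability display
- Interactive browsing of the different nodule views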
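Since an MHD upload must bundle the `.mhd` header with its `.raw` data, a backend could check the archive before processing it. The sketch below is one possible check using only the standard library; `find_mhd_pair` is a hypothetical helper, and matching the two files by file stem is an assumption (an MHD header actually names its data file in the `ElementDataFile` field).

```python
import io
import os
import zipfile


def find_mhd_pair(zip_bytes):
    """Return (mhd_name, raw_name) for the first .mhd with a same-stem .raw, else None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    by_stem = {}
    for name in names:
        stem, ext = os.path.splitext(name)
        by_stem.setdefault(stem, {})[ext.lower()] = name
    for stem in sorted(by_stem):
        exts = by_stem[stem]
        if ".mhd" in exts and ".raw" in exts:
            return exts[".mhd"], exts[".raw"]
    return None


# Build a tiny in-memory archive to exercise the check
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("case1.mhd", "header")
    archive.writestr("case1.raw", "voxels")
print(find_mhd_pair(buffer.getvalue()))  # ('case1.mhd', 'case1.raw')
```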

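### 5.3 Deployment results

The system runs smoothly in a local environment and processes a single CT scan in about 3 minutes. Inference on one cube takes only 0.2 seconds; most of the time is spent scanning the volume cube by cube. Nodule detection and classification accuracy reaches a level that can serve as a clinical reference, so the system can support radiologists with computer-aided diagnosis.

![Deployment result](./deploy.png)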

================================================
FILE: data/__init__.py
================================================


================================================
FILE: data/dataclass/CTData.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import SimpleITK as sitk
from scipy import ndimage
from enum import Enum
import matplotlib.pyplot as plt
from util.dicom_util import load_dicom_slices, get_pixels_hu, get_dicom_thickness
from util.seg_util import get_segmented_lungs, normalize_hu_values


class CTFormat(Enum):
    DICOM = 1
    MHD = 2
    UNKNOWN = 3

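class CTData:
    """
    Unified CT data class for handling CT image data in different formats.
    Supports loading, processing and analysis of DICOM and MHD data.
    """
    def __init__(self):
        # Basic attributes
        self.pixel_data = None  # Pixel data, 3D voxel array (z,y,x)
        self.lung_seg_img = None    # CT image data restricted to the lung region
        self.lung_seg_mask = None   # Lung mask of the CT
        self.origin = None  # Coordinate origin (x,y,z), in mm
        self.spacing = None  # Voxel spacing (x,y,z), in mm
        self.orientation = None  # Direction matrix
        self.z_axis_flip = False    # Whether the z axis is flipped
        self.size = None  # Image size (z,y,x)
        self.data_format = None  # Data format (DICOM/MHD)
        self.metadata = {}  # Other metadata
        self.hu_converted = False  # Whether values have been converted to HU
        self.preprocessed = False   # Whether the data has been preprocessed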

    @classmethod
    def from_dicom(cls, dicom_path):
        """
        Load CT data from a DICOM folder.

        Args:
            dicom_path: path to the DICOM folder

        Returns:
            CTData object
        """
        ct_data = cls()
        ct_data.data_format = CTFormat.DICOM
        slices = load_dicom_slices(dicom_path)
        ct_data.pixel_data = get_pixels_hu(slices)
        ct_data.z_axis_flip = slices[1].ImagePositionPatient[2] > slices[0].ImagePositionPatient[2]
        ct_data.hu_converted = True
        slice_thickness = get_dicom_thickness(slices)
        # Set the voxel spacing
        try:
            ct_data.spacing = [
                float(slices[0].PixelSpacing[0]),
                float(slices[0].PixelSpacing[1]),
                float(slice_thickness)
            ]
        except Exception:
            print("Warning: could not read the pixel spacing, using the default [1.0, 1.0, 1.0]")
            ct_data.spacing = [1.0, 1.0, 1.0]
        # Set the origin
        try:
            ct_data.origin = [
                float(slices[0].ImagePositionPatient[0]),
                float(slices[0].ImagePositionPatient[1]),
                float(slices[0].ImagePositionPatient[2])
            ]
        except Exception:
            print("Warning: could not read the coordinate origin, using the default [0.0, 0.0, 0.0]")
            ct_data.origin = [0.0, 0.0, 0.0]
        # Set the size
        ct_data.size = ct_data.pixel_data.shape
        return ct_data

    @classmethod
    def from_mhd(cls, mhd_path):
        """
        Load CT data from an MHD/RAW file pair.

        Args:
            mhd_path: path to the MHD file

        Returns:
            CTData object
        """
        ct_data = cls()
        ct_data.data_format = CTFormat.MHD
        try:
            # Load the MHD file with SimpleITK
            itk_img = sitk.ReadImage(mhd_path)
            # Get the pixel data (note: SimpleITK returns arrays in z,y,x order)
            ct_data.pixel_data = sitk.GetArrayFromImage(itk_img)
            # LUNA16 MHD data is already in HU
            ct_data.hu_converted = True
            # Get the origin and voxel spacing
            ct_data.origin = list(itk_img.GetOrigin())  # (x,y,z)
            ct_data.spacing = list(itk_img.GetSpacing())  # (x,y,z)
            # Get the size
            ct_data.size = ct_data.pixel_data.shape
            # Get the direction information
            ct_data.orientation = itk_img.GetDirection()
            ct_data.z_axis_flip = False
        except Exception as e:
            raise ValueError(f"Error while loading the MHD file: {e}") from e

        return ct_data

    def convert_to_hu(self):
        """
        Convert pixel values to HU values (if not converted yet).
        """
        if self.hu_converted:
            print("Data is already in HU")
            return

        if self.data_format == CTFormat.DICOM:
            # Already handled in from_dicom
            self.hu_converted = True
        elif self.data_format == CTFormat.MHD:
            # LUNA16 MHD data is already in HU
            self.hu_converted = True
        else:
            raise ValueError("Unknown data format, cannot convert to HU")


    def resample_pixel(self, new_spacing=(1, 1, 1)):
        """
        Resample the CT volume to the given voxel spacing.

        Args:
            new_spacing: target voxel spacing [x, y, z]

        Returns:
            Resampled CTData object
        """
        # Make sure the data has been converted to HU
        if not self.hu_converted:
            self.convert_to_hu()
        # scipy.ndimage works on the array axes, so reorder spacing to [z,y,x] to match pixel_data
        spacing_zyx = [self.spacing[2], self.spacing[1], self.spacing[0]]
        new_spacing_zyx = [new_spacing[2], new_spacing[1], new_spacing[0]]
        # Compute the new shape
        resize_factor = np.array(spacing_zyx) / np.array(new_spacing_zyx)
        new_shape = np.round(np.array(self.pixel_data.shape) * resize_factor)
        # Compute the effective zoom factor after rounding the shape
        real_resize = new_shape / np.array(self.pixel_data.shape)
        # Resample with trilinear interpolation (order=1)
        resampled_data = ndimage.zoom(self.pixel_data, real_resize, order=1)
        # Create a new CTData object
        resampled_ct = CTData()
        resampled_ct.pixel_data = resampled_data
        resampled_ct.spacing = list(new_spacing)
        resampled_ct.origin = self.origin
        resampled_ct.orientation = self.orientation
        resampled_ct.z_axis_flip = self.z_axis_flip  # keep the flip flag of the source volume
        resampled_ct.size = resampled_data.shape
        resampled_ct.data_format = self.data_format
        resampled_ct.hu_converted = self.hu_converted
        resampled_ct.preprocessed = self.preprocessed
        return resampled_ct

    def filter_lung_img_mask(self):
        """
            Keep only the lung-region pixels and normalize them to the 0-1 range.
        :return:
        """
        pixel_data = self.pixel_data.copy()
        seg_img = []
        seg_mask = []
        # Segment and normalize slice by slice along the z axis
        for index in range(pixel_data.shape[0]):
            one_seg_img, one_seg_mask = get_segmented_lungs(pixel_data[index])
            one_seg_img = normalize_hu_values(one_seg_img)
            seg_img.append(one_seg_img)
            seg_mask.append(one_seg_mask)
        self.lung_seg_img = np.array(seg_img)
        self.lung_seg_mask = np.array(seg_mask)

    def world_to_voxel(self, world_coord):
        """
        Convert world coordinates (mm) to voxel coordinates.

        Args:
            world_coord: world coordinates [x,y,z] (mm)

        Returns:
            Voxel coordinates [x,y,z]
        """
        voxel_coord = np.zeros(3, dtype=int)
        for i in range(3):
            voxel_coord[i] = int(round((world_coord[i] - self.origin[i]) / self.spacing[i]))

        return voxel_coord

    def voxel_to_world(self, voxel_coord):
        """
        Convert voxel coordinates to world coordinates (mm).

        Args:
            voxel_coord: voxel coordinates [x,y,z]

        Returns:
            World coordinates [x,y,z] (mm)
        """
        world_coord = np.zeros(3, dtype=float)
        for i in range(3):
            world_coord[i] = voxel_coord[i] * self.spacing[i] + self.origin[i]

        return world_coord

    def extract_cube(self, center_world_mm, size_mm, if_fixed_radius=False):
        """
        Extract a cube with the given center and size.

        Args:
            center_world_mm:    world coordinates of the cube center [x,y,z] (mm)
            size_mm:            cube edge length as a single number, in mm; in fixed mode it is used directly as the edge length in voxels
            if_fixed_radius:    whether to use a fixed size. Defaults to False, meaning the size varies per nodule and follows the radius in the annotation file

        Returns:
            Cube pixel data
        """
        # Make sure the data is loaded
        if self.pixel_data is None:
            raise ValueError("No data loaded")
        if self.lung_seg_img is None:
            print("Lung region has not been segmented yet, segmenting now..")
            self.filter_lung_img_mask()
        # Convert world coordinates to voxel coordinates (note: SimpleITK arrays are in z,y,x order)
        center_voxel = self.world_to_voxel(center_world_mm)
        # Reorder to z,y,x to match pixel_data
        center_voxel_zyx = [center_voxel[2], center_voxel[1], center_voxel[0]]
        # In fixed mode only the center is needed: size_mm is treated as the edge length in voxels
        # and the cube is cut directly from lung_seg_img
        if if_fixed_radius:
            half_size = [int(size_mm/2), int(size_mm/2), int(size_mm/2)]
        else:
            # Cube edge length in voxels. LUNA16 annotations give a different radius per nodule,
            # so cubes extracted this way vary in size; the fixed mode is preferable
            size_voxel = [int(size_mm / self.spacing[2]),
                          int(size_mm / self.spacing[1]),
                          int(size_mm / self.spacing[0])]
            # Half edge length per axis
            half_size = [s // 2 for s in size_voxel]
        # Cube bounds, clamped to the volume
        z_min = max(0, center_voxel_zyx[0] - half_size[0])
        y_min = max(0, center_voxel_zyx[1] - half_size[1])
        x_min = max(0, center_voxel_zyx[2] - half_size[2])

        z_max = min(self.lung_seg_img.shape[0], center_voxel_zyx[0] + half_size[0])
        y_max = min(self.lung_seg_img.shape[1], center_voxel_zyx[1] + half_size[1])
        x_max = min(self.lung_seg_img.shape[2], center_voxel_zyx[2] + half_size[2])
        # Extract the sub-volume
        cube = self.lung_seg_img[z_min:z_max, y_min:y_max, x_min:x_max]
        return cube

    def visualize_slice(self, slice_idx=None, axis=0, show_lung_only=False):
        """
            Visualize a single slice.
        Args:
            slice_idx:          slice index; the center slice is used if None
            axis:               axis to slice along (0=z, 1=y, 2=x)
            show_lung_only:     whether to show only the lungs, with everything else as black background
        """
        # Make sure the data is loaded
        if self.pixel_data is None:
            raise ValueError("No data loaded")
        # Determine the slice index
        if slice_idx is None:
            slice_idx = self.pixel_data.shape[axis] // 2
        # Extract the slice
        if show_lung_only:
            # Segment the lungs on demand so lung_seg_img is not None here
            if self.lung_seg_img is None:
                self.filter_lung_img_mask()
            if axis == 0:  # z axis
                slice_data = self.lung_seg_img[slice_idx, :, :]
            elif axis == 1:  # y axis
                slice_data = self.lung_seg_img[:, slice_idx, :]
            else:  # x axis
                slice_data = self.lung_seg_img[:, :, slice_idx]
        else:
            if axis == 0:  # z axis
                slice_data = self.pixel_data[slice_idx, :, :]
            elif axis == 1:  # y axis
                slice_data = self.pixel_data[:, slice_idx, :]
            else:  # x axis
                slice_data = self.pixel_data[:, :, slice_idx]
        # Create the figure
        plt.figure(figsize=(10, 8))
        # Show the image
        plt.imshow(slice_data, cmap='gray')
        # Set the title
        axis_name = ['z', 'y', 'x'][axis]
        title = f"Slice {slice_idx} (along the {axis_name} axis)"
        plt.title(title)
        plt.colorbar(label='Pixel value')
        plt.axis('off')
        plt.tight_layout()
        plt.show()

    def visualize_nodule(self, coord_x, coord_y, coord_z, diameter):
        """
             Visualize a nodule in three orthogonal views.
        :param coord_x: world x coordinate of the nodule center (mm)
        :param coord_y: world y coordinate of the nodule center (mm)
        :param coord_z: world z coordinate of the nodule center (mm)
        :param diameter: nodule diameter (mm)
        :return:
        """
        # Extract the nodule cube
        cube_size = max(32, int(diameter * 1.5))  # make sure the cube is large enough
        cube = self.extract_cube([coord_x, coord_y, coord_z], cube_size)
        # Show the three orthogonal planes
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        # Center slices
        center_z = cube.shape[0] // 2
        center_y = cube.shape[1] // 2
        center_x = cube.shape[2] // 2
        # Draw the three orthogonal planes
        axes[0].imshow(cube[center_z, :, :], cmap='gray')
        axes[0].set_title(f'Axial view (z={center_z})')
        axes[0].axis('off')
        axes[1].imshow(cube[:, center_y, :], cmap='gray')
        axes[1].set_title(f'Coronal view (y={center_y})')
        axes[1].axis('off')
        axes[2].imshow(cube[:, :, center_x], cmap='gray')
        axes[2].set_title(f'Sagittal view (x={center_x})')
        axes[2].axis('off')
        fig.suptitle(f"Nodule - position: ({coord_x:.1f}, {coord_y:.1f}, {coord_z:.1f})mm, " +
                     f"diameter: {diameter:.1f}mm", fontsize=14)
        plt.tight_layout()
        plt.show()

    def save_as_nifti(self, output_path):
        """
        Save the CT data in NIfTI format.

        Args:
            output_path: output file path
        """
        # Make sure the data is loaded
        if self.pixel_data is None:
            raise ValueError("No data loaded")

        # Create the SimpleITK image
        # Note: SimpleITK arrays are in z,y,x order
        img = sitk.GetImageFromArray(self.pixel_data)
        img.SetOrigin(self.origin)
        img.SetSpacing(self.spacing)

        if self.orientation is not None:
            img.SetDirection(self.orientation)
        # Save in NIfTI format
        sitk.WriteImage(img, output_path)
        print(f"Saved in NIfTI format: {output_path}")



================================================
FILE: data/dataclass/NoduleCube.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import torch
import numpy as np
import cv2
from typing import Optional
from dataclasses import dataclass
import matplotlib.pyplot as plt
from scipy import ndimage

def normal_cube_to_tensor(cube_data):
    """
        Normalize cube data and convert it to a PyTorch tensor. Used in both training and inference.
    :param cube_data: ndarray of shape [32,32,32]
    :return: float tensor of shape (1, 32, 32, 32)
    """
    cube_data = cube_data.astype(np.float32)
    # Normalize to the [0, 1] range
    min_val = np.min(cube_data)
    max_val = np.max(cube_data)
    data_range = max_val - min_val
    # Avoid division by zero
    if data_range < 1e-10:
        normalized_cube = np.zeros_like(cube_data)
    else:
        normalized_cube = (cube_data - min_val) / data_range
    # Check for invalid values and repair them
    if np.isnan(normalized_cube).any() or np.isinf(normalized_cube).any():
        normalized_cube = np.nan_to_num(normalized_cube, nan=0.0, posinf=1.0, neginf=0.0)
    # Convert to a PyTorch tensor and add the channel dimension: (32, 32, 32) -> (1, 32, 32, 32).
    # The batch dimension is not added here.
    cube_tensor = torch.from_numpy(normalized_cube).float().unsqueeze(0)
    return cube_tensor


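@dataclass
class NoduleCube:
    """
    Lung nodule cube class: the 3D cube of data around a lung nodule.
    Independent of the CT data; it only handles cubes that have already been extracted.
    """
    # Basic attributes
    cube_size: int = 64  # Cube edge length (default 64x64x64)
    pixel_data: Optional[np.ndarray] = None  # Pixel data, shape: [cube_size, cube_size, cube_size]

    # Nodule properties
    center_x: int = 0  # x coordinate of the nodule center
    center_y: int = 0  # y coordinate of the nodule center
    center_z: int = 0  # z coordinate of the nodule center
    radius: float = 0.0  # Nodule radius
    malignancy: int = 0  # Malignancy (0 = benign / 1 = malignant)

    # File paths
    npy_path: str = ""  # Path of the npy file
    png_path: str = ""  # Path of the png file

    def __post_init__(self):
        """Called after initialization."""
        # If npy_path is given but pixel_data is not, try to load from it
        if self.npy_path and self.pixel_data is None:
            self.load_from_npy()
        # If png_path is given but pixel_data is not, try to load from it
        elif self.png_path and self.pixel_data is None:
            self.load_from_png()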

    def load_from_npy(self) -> None:
        """Load the cube data from an NPY file."""
        if not os.path.exists(self.npy_path):
            raise FileNotFoundError(f"File does not exist: {self.npy_path}")

        try:
            self.pixel_data = np.load(self.npy_path)
            # Validate the dimensionality
            if len(self.pixel_data.shape) != 3:
                raise ValueError(f"Pixel data must be a 3D array, current shape: {self.pixel_data.shape}")

            # Resize if the shape does not match cube_size
            if (self.pixel_data.shape[0] != self.cube_size or
                self.pixel_data.shape[1] != self.cube_size or
                self.pixel_data.shape[2] != self.cube_size):
                self.resize(self.cube_size)

        except Exception as e:
            raise ValueError(f"Error while loading the NPY file: {e}") from e

    def save_to_npy(self, output_path: str) -> str:
        """
        Save the cube data as an NPY file.

        Args:
            output_path: output path

        Returns:
            Path of the saved file
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to save")

        # Make sure the target directory exists (dirname is empty for a bare file name)
        out_dir = os.path.dirname(output_path)
        if out_dir:
            os.makedirs(out_dir, exist_ok=True)
        np.save(output_path, self.pixel_data)
        self.npy_path = output_path
        return output_path

    def save_to_png(self, output_path: str) -> str:
        """
        Save the cube as a PNG image with the slices tiled in a grid (8x8 for a 64-voxel cube).

        Args:
            output_path: output PNG file path

        Returns:
            Path of the saved file
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to save")

        # Make sure the target directory exists (dirname is empty for a bare file name)
        out_dir = os.path.dirname(output_path)
        if out_dir:
            os.makedirs(out_dir, exist_ok=True)

        # Grid layout of the slices in the final image (8 rows x 8 columns by default)
        rows, cols = 8, 8
        if self.cube_size != 64:
            # For other sizes, pick a row count that divides the slice count and stays close to square
            total_slices = self.cube_size
            rows = int(np.sqrt(total_slices))
            while total_slices % rows != 0:
                rows -= 1
            cols = total_slices // rows

        # Create the tiled image
        img_height = self.cube_size
        img_width = self.cube_size
        combined_img = np.zeros((rows * img_height, cols * img_width), dtype=np.uint8)

        # Decide the scaling once for the whole cube so every slice is treated the same way
        is_unit_range = self.pixel_data.max() <= 1.0

        # Fill the tiled image
        for i in range(self.cube_size):
            row = i // cols
            col = i % cols

            slice_data = self.pixel_data[i]

            # Bring the data into the 0-255 range
            if is_unit_range:
                slice_data = (slice_data * 255).astype(np.uint8)
            else:
                slice_data = slice_data.astype(np.uint8)

            # Place the slice into the tiled image
            y_start = row * img_height
            x_start = col * img_width
            combined_img[y_start:y_start + img_height, x_start:x_start + img_width] = slice_data

        # Save the tiled image
        cv2.imwrite(output_path, combined_img)
        self.png_path = output_path
        return output_path

    def load_from_png(self) -> None:
        """Load the cube data from a PNG image with the slices tiled in a grid (8x8 for a 64-voxel cube)."""
        if not os.path.exists(self.png_path):
            raise FileNotFoundError(f"File does not exist: {self.png_path}")

        try:
            # Read the PNG image
            img = cv2.imread(self.png_path, cv2.IMREAD_GRAYSCALE)
            # cv2.imread returns None instead of raising when the file cannot be decoded
            if img is None:
                raise ValueError(f"Could not decode image: {self.png_path}")

            # Determine the grid layout
            rows, cols = 8, 8
            if self.cube_size != 64:
                # For other sizes, pick a row count that divides the slice count
                total_slices = self.cube_size
                rows = int(np.sqrt(total_slices))
                while total_slices % rows != 0:
                    rows -= 1
                cols = total_slices // rows

            # Check that the image size is as expected
            expected_height = rows * self.cube_size
            expected_width = cols * self.cube_size
            if img.shape[0] != expected_height or img.shape[1] != expected_width:
                raise ValueError(f"Image size mismatch: expected {expected_height}x{expected_width}, got {img.shape[0]}x{img.shape[1]}")

            # Create the 3D array
            cube_data = np.zeros((self.cube_size, self.cube_size, self.cube_size), dtype=np.float32)

            # Extract each slice from the PNG image
            for i in range(self.cube_size):
                row = i // cols
                col = i % cols

                y_start = row * self.cube_size
                x_start = col * self.cube_size

                slice_data = img[y_start:y_start + self.cube_size, x_start:x_start + self.cube_size]
                cube_data[i] = slice_data.astype(np.float32) / 255.0  # normalize to the [0,1] range

            self.pixel_data = cube_data

        except Exception as e:
            raise ValueError(f"Error while loading the PNG file: {e}") from e

    def set_cube_data(self, pixel_data: np.ndarray) -> None:
        """
        Set the cube pixel data.

        Args:
            pixel_data: 3D pixel data
        """
        if len(pixel_data.shape) != 3:
            raise ValueError(f"Pixel data must be a 3D array, current shape: {pixel_data.shape}")

        self.pixel_data = pixel_data

        # Resize if the shape does not match cube_size
        if (self.pixel_data.shape[0] != self.cube_size or
            self.pixel_data.shape[1] != self.cube_size or
            self.pixel_data.shape[2] != self.cube_size):
            self.resize(self.cube_size)

    def resize(self, new_size: int) -> None:
        """
        Resize the cube.

        Args:
            new_size: new cube edge length
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to resize")

        # Compute the zoom factor per axis
        zoom_factors = [new_size / self.pixel_data.shape[0],
                         new_size / self.pixel_data.shape[1],
                         new_size / self.pixel_data.shape[2]]

        # Resample with scipy's ndimage
        self.pixel_data = ndimage.zoom(self.pixel_data, zoom_factors, mode='nearest')
        self.cube_size = new_size
        
    def augment(self, rotation: bool = True, flip_axis: int = -1, noise: bool = True) -> 'NoduleCube':
        """
        Data augmentation.

        Args:
            rotation: whether to apply random rotation
            flip_axis: axis to flip along; the default -1 means no flip
            noise: whether to add noise

        Returns:
            A new, augmented cube instance
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to augment")

        # Work on a copy
        augmented_cube = self.pixel_data.copy()

        # Rotation
        if rotation:
            # Draw one random angle in [-20, 20] degrees for each of the three rotation planes
            angles = np.random.uniform(-20, 20, 3)
            augmented_cube = ndimage.rotate(augmented_cube, angles[0], axes=(1, 2), reshape=False, mode='nearest')
            augmented_cube = ndimage.rotate(augmented_cube, angles[1], axes=(0, 2), reshape=False, mode='nearest')
            augmented_cube = ndimage.rotate(augmented_cube, angles[2], axes=(0, 1), reshape=False, mode='nearest')

        # Flip
        if flip_axis >= 0:
            augmented_cube = np.flip(augmented_cube, axis=flip_axis)

        # Noise
        if noise:
            # Add random Gaussian noise
            noise_level = np.random.uniform(0.0, 0.05)
            noise_array = np.random.normal(0, noise_level, augmented_cube.shape)
            augmented_cube = augmented_cube + noise_array
            # Keep the values within [0,1]
            augmented_cube = np.clip(augmented_cube, 0, 1)

        # Create the new instance
        new_cube = NoduleCube(
            cube_size=self.cube_size,
            center_x=self.center_x,
            center_y=self.center_y,
            center_z=self.center_z,
            radius=self.radius,
            malignancy=self.malignancy
        )

        new_cube.set_cube_data(augmented_cube)
        return new_cube

    def visualize_3d(self, output_path: Optional[str] = None, show: bool = True) -> None:
        """
        Visualize the cube data.

        Args:
            output_path: optional output path; the figure is saved there if given
            show: whether to display the figure
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to visualize")

        # Create the figure
        fig, axes = plt.subplots(2, 3, figsize=(12, 8))

        # Center slice indices
        center_z = self.pixel_data.shape[0] // 2
        center_y = self.pixel_data.shape[1] // 2
        center_x = self.pixel_data.shape[2] // 2

        # The three orthogonal planes (pixel_data is indexed z,y,x)
        slice_xy = self.pixel_data[center_z, :, :]
        slice_xz = self.pixel_data[:, center_y, :]
        slice_yz = self.pixel_data[:, :, center_x]

        # Show the three orthogonal views.
        # Fixing y gives the coronal plane and fixing x gives the sagittal plane.
        axes[0, 0].imshow(slice_xy, cmap='gray')
        axes[0, 0].set_title(f'Axial view (Z={center_z})')

        axes[0, 1].imshow(slice_xz, cmap='gray')
        axes[0, 1].set_title(f'Coronal view (Y={center_y})')

        axes[0, 2].imshow(slice_yz, cmap='gray')
        axes[0, 2].set_title(f'Sagittal view (X={center_x})')

        # Projection views (MIP: Maximum Intensity Projection)
        mip_xy = np.max(self.pixel_data, axis=0)
        mip_xz = np.max(self.pixel_data, axis=1)
        mip_yz = np.max(self.pixel_data, axis=2)

        axes[1, 0].imshow(mip_xy, cmap='gray')
        axes[1, 0].set_title('Maximum intensity projection (axial)')

        axes[1, 1].imshow(mip_xz, cmap='gray')
        axes[1, 1].set_title('Maximum intensity projection (coronal)')

        axes[1, 2].imshow(mip_yz, cmap='gray')
        axes[1, 2].set_title('Maximum intensity projection (sagittal)')

        # Add the nodule information
        nodule_info = f"Nodule center: ({self.center_x}, {self.center_y}, {self.center_z})\n"
        nodule_info += f"Radius: {self.radius:.1f}\n"
        nodule_info += f"Malignancy: {'malignant' if self.malignancy == 1 else 'benign'}"

        fig.suptitle(nodule_info, fontsize=12)
        plt.tight_layout()

        if output_path:
            plt.savefig(output_path, dpi=200, bbox_inches='tight')

        if show:
            plt.show()
        else:
            plt.close(fig)

            
    @classmethod
    def from_npy(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
        """
        Create a cube instance from an NPY file.
        
        Args:
            file_path: path to the NPY file
            cube_size: edge length of the cube
            
        Returns:
            NoduleCube instance
        """
        cube = cls(cube_size=cube_size, npy_path=file_path)
        cube.load_from_npy()
        return cube
    
    @classmethod
    def from_png(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
        """
        Create a cube instance from a PNG file.
        
        Args:
            file_path: path to the PNG file
            cube_size: edge length of the cube
            
        Returns:
            NoduleCube instance
        """
        cube = cls(cube_size=cube_size, png_path=file_path)
        cube.load_from_png()
        return cube
        
    @classmethod
    def from_array(cls, 
                  pixel_data: np.ndarray, 
                  center_x: int = 0, 
                  center_y: int = 0, 
                  center_z: int = 0,
                  radius: float = 0.0,
                  malignancy: int = 0) -> 'NoduleCube':
        """
        Create a cube instance from a numpy array.
        
        Args:
            pixel_data: 3D pixel data
            center_x: X coordinate of the center
            center_y: Y coordinate of the center
            center_z: Z coordinate of the center
            radius: nodule radius
            malignancy: malignancy label (0 = benign, 1 = malignant)
            
        Returns:
            NoduleCube instance
        """
        if len(pixel_data.shape) != 3:
            raise ValueError(f"Pixel data must be a 3D array, got shape: {pixel_data.shape}")
            
        cube_size = pixel_data.shape[0]
        if pixel_data.shape[1] != cube_size or pixel_data.shape[2] != cube_size:
            raise ValueError(f"Pixel data must be cube-shaped, got shape: {pixel_data.shape}")
            
        cube = cls(
            cube_size=cube_size,
            center_x=center_x,
            center_y=center_y,
            center_z=center_z,
            radius=radius,
            malignancy=malignancy
        )
        
        cube.set_cube_data(pixel_data)
        return cube




================================================
FILE: data/dataclass/__init__.py
================================================



================================================
FILE: data/preprocessing/__init__.py
================================================


================================================
FILE: data/preprocessing/lidc_process/README.md
================================================
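# LIDC-IDRI Data Preprocessing

This directory contains preprocessing scripts for the [LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative)](https://www.cancerimagingarchive.net/collection/lidc-idri/) dataset. LIDC-IDRI is a public lung CT dataset covering more than 1,000 patients, each with chest CT scans and the corresponding radiologist annotations.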

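## LIDC-IDRI Dataset Details

The LIDC-IDRI dataset was created jointly by seven academic centers and eight medical imaging companies and contains 1018 cases. Each case includes clinical chest CT images and annotations made by four experienced thoracic radiologists. Annotation proceeded in two phases:

1. **Blinded read**: each radiologist independently reviewed every CT scan and marked lesions in one of three categories:
   - nodules ≥3 mm in diameter
   - nodules <3 mm in diameter
   - non-nodules ≥3 mm in diameter

2. **Unblinded read**: each radiologist independently reviewed their own marks together with the anonymized marks of the other three radiologists to form a final opinion.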

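### Data Composition

- **CT scans**: chest CT DICOM files for 1018 cases
- **XML annotation files**: XML annotations with nodule location, size, and characteristics
- **Nodule diagnosis information**: malignancy ratings and other nodule characteristics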

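### Annotated Characteristics

Each marked nodule carries the following ratings (most on a 1-5 scale; calcification and internal structure are categorical):

| Characteristic | Range | Meaning |
| ------- | ------- | ---- |
| Malignancy (malignancy) | 1-5 | 1 = highly likely benign, 5 = highly likely malignant |
| Sphericity (sphericity) | 1-5 | 1 = linear, 5 = round |
| Margin (margin) | 1-5 | 1 = poorly defined, 5 = sharp |
| Spiculation (spiculation) | 1-5 | 1 = no spiculation, 5 = marked spiculation |
| Texture (texture) | 1-5 | 1 = non-solid, 5 = solid |
| Calcification (calcification) | 1-6 | type of calcification |
| Internal structure (internal structure) | 1-4 | type of internal structure |
| Lobulation (lobulation) | 1-5 | 1 = no lobulation, 5 = marked lobulation |
| Subtlety (subtlety) | 1-5 | 1 = extremely subtle, 5 = obvious |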

## Data Examples

### XML Annotation Example

```xml
<LidcReadMessage>
  <ResponseHeader>
    <SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789</SeriesInstanceUid>
  </ResponseHeader>
  <readingSession>
    <servicingRadiologistID>Reader1</servicingRadiologistID>
    <unblindedReadNodule>
      <noduleID>Nodule001</noduleID>
      <characteristics>
        <malignancy>4</malignancy>
        <sphericity>5</sphericity>
        <margin>4</margin>
        <spiculation>3</spiculation>
        <texture>5</texture>
        <calcification>1</calcification>
        <internalStructure>1</internalStructure>
        <lobulation>2</lobulation>
        <subtlety>3</subtlety>
      </characteristics>
      <roi>
        <imageZposition>-124.0</imageZposition>
        <edgeMap>
          <xCoord>256</xCoord>
          <yCoord>215</yCoord>
        </edgeMap>
        <!-- more edge points... -->
      </roi>
      <!-- more ROIs... -->
    </unblindedReadNodule>
    <nonNodule>
      <nonNoduleID>NonNodule001</nonNoduleID>
      <imageZposition>-134.0</imageZposition>
      <locus>
        <xCoord>345</xCoord>
        <yCoord>287</yCoord>
      </locus>
    </nonNodule>
  </readingSession>
  <!-- more readingSession elements... -->
</LidcReadMessage>
```
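
A minimal sketch of walking this structure with the standard library. The repository itself parses these files with BeautifulSoup (see `read_nodule_annotation_from_xml`); `xml.etree.ElementTree` is swapped in here only to keep the example dependency-free. The function name `nodule_bounding_boxes` and the inline sample are illustrative, and real LIDC files may declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: one reader, one nodule outlined on two slices.
SAMPLE_XML = """
<LidcReadMessage>
  <ResponseHeader>
    <SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789</SeriesInstanceUid>
  </ResponseHeader>
  <readingSession>
    <servicingRadiologistID>Reader1</servicingRadiologistID>
    <unblindedReadNodule>
      <noduleID>Nodule001</noduleID>
      <characteristics><malignancy>4</malignancy></characteristics>
      <roi>
        <imageZposition>-124.0</imageZposition>
        <edgeMap><xCoord>250</xCoord><yCoord>210</yCoord></edgeMap>
        <edgeMap><xCoord>262</xCoord><yCoord>220</yCoord></edgeMap>
      </roi>
      <roi>
        <imageZposition>-126.5</imageZposition>
        <edgeMap><xCoord>252</xCoord><yCoord>212</yCoord></edgeMap>
        <edgeMap><xCoord>258</xCoord><yCoord>218</yCoord></edgeMap>
      </roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>
"""


def nodule_bounding_boxes(xml_text):
    """Return one bounding box per (reader, nodule) pair found in the XML."""
    root = ET.fromstring(xml_text.strip())
    series_uid = root.findtext("ResponseHeader/SeriesInstanceUid")
    boxes = []
    for session in root.iter("readingSession"):
        reader = session.findtext("servicingRadiologistID")
        for nodule in session.iter("unblindedReadNodule"):
            xs, ys, zs = [], [], []
            for roi in nodule.iter("roi"):
                zs.append(float(roi.findtext("imageZposition")))
                for edge in roi.iter("edgeMap"):
                    xs.append(int(edge.findtext("xCoord")))
                    ys.append(int(edge.findtext("yCoord")))
            if not xs:
                continue  # nodule without any outline points
            boxes.append({
                "series_uid": series_uid,
                "reader": reader,
                "nodule_id": nodule.findtext("noduleID"),
                "malignancy": int(nodule.findtext("characteristics/malignancy")),
                "x": (min(xs), max(xs)),  # pixel columns
                "y": (min(ys), max(ys)),  # pixel rows
                "z": (min(zs), max(zs)),  # slice positions in mm
            })
    return boxes


for box in nodule_bounding_boxes(SAMPLE_XML):
    print(box["reader"], box["nodule_id"], box["x"], box["y"], box["z"])
```

The x and y extents are in pixels while z is in millimeters, which is why the processing script converts z with the scan origin and spacing before normalizing.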

### Processed CSV Examples

**Percent-coordinate CSV (output of `process_lidc_annotations`)**:

```
patient_id,anno_index,servicingRadiologistID,coord_x,coord_y,coord_z,diameter,malscore,sphericiy,margin,spiculation,texture,calcification,internal_structure,lobulation,subtlety
1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader1,0.5242,0.4455,0.3789,0.0521,4,5,4,3,5,1,1,2,3
1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader2,0.5256,0.4478,0.3802,0.0534,3,4,3,2,5,1,1,3,2
```

**Millimeter-coordinate CSV (output of `percent_coordinatecsv_to_mmcsv`)**:

```
patient_id,anno_index,servicingRadiologistID,coord_x,coord_y,coord_z,mm_x,mm_y,mm_z,diameter,malscore,sphericiy,margin,spiculation,texture,calcification,internal_structure,lobulation,subtlety
1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader1,0.5242,0.4455,0.3789,126.5,107.8,-124.0,0.0521,4,5,4,3,5,1,1,2,3
1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader2,0.5256,0.4478,0.3802,127.0,108.5,-124.2,0.0534,3,4,3,2,5,1,1,3,2
```

**CSV with averaged coordinates and malignancy labels (final output)**:

```
patient_id,anno_index,servicingRadiologistID,coord_x,coord_y,coord_z,mm_x,mm_y,mm_z,avg_x,avg_y,avg_z,diameter,malscore,real_mal,sphericiy,margin,spiculation,texture,calcification,internal_structure,lobulation,subtlety
1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader1,0.5242,0.4455,0.3789,126.5,107.8,-124.0,126.75,108.15,-124.1,0.0521,4,1,5,4,3,5,1,1,2,3
1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader2,0.5256,0.4478,0.3802,127.0,108.5,-124.2,126.75,108.15,-124.1,0.0534,3,1,4,3,2,5,1,1,3,2
```
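
The scripts in this directory index CSV columns by position, which breaks quietly if the column order ever changes. A hedged alternative is to read rows by header name. The sketch below uses the standard `csv` module on the two example rows above and groups the readers' scores by averaged nodule position; `scores_by_nodule` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from collections import defaultdict

# The two example rows of the final CSV, embedded so the sketch runs on its own.
FINAL_CSV = (
    "patient_id,anno_index,servicingRadiologistID,coord_x,coord_y,coord_z,"
    "mm_x,mm_y,mm_z,avg_x,avg_y,avg_z,diameter,malscore,real_mal,sphericiy,"
    "margin,spiculation,texture,calcification,internal_structure,lobulation,subtlety\n"
    "1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader1,0.5242,0.4455,0.3789,"
    "126.5,107.8,-124.0,126.75,108.15,-124.1,0.0521,4,1,5,4,3,5,1,1,2,3\n"
    "1.3.6.1.4.1.14519.5.2.1.6279.6001.123456789,Nodule001,Reader2,0.5256,0.4478,0.3802,"
    "127.0,108.5,-124.2,126.75,108.15,-124.1,0.0534,3,1,4,3,2,5,1,1,3,2\n"
)


def scores_by_nodule(csv_text):
    """Group malignancy scores by (patient_id, avg_x, avg_y, avg_z)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["patient_id"], row["avg_x"], row["avg_y"], row["avg_z"])
        groups[key].append(int(row["malscore"]))
    return dict(groups)


for key, scores in scores_by_nodule(FINAL_CSV).items():
    print(key[1:], scores)
```

Both readers share the same averaged position, so they fall into one group with two scores.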

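### Nodule Annotation Statistics

In the LIDC-IDRI dataset:
- 1018 cases in total
- about 2669 nodules ≥3 mm marked by at least one radiologist
- about 928 nodules ≥3 mm marked by all four radiologists
- about 479 nodules rated malignant (mean malignancy score > 3)
- about 591 nodules rated benign (mean malignancy score < 3)
- about 858 nodules of indeterminate malignancy (mean malignancy score = 3)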

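## Data Processing Pipeline

Processing consists of the following main steps:

1. Extract nodule information from the original XML annotation files
2. Convert the percent coordinates to millimeter coordinates
3. Aggregate and merge the nodule annotations of multiple radiologists
4. Compute a malignancy label for each nodule
5. Generate the dataset used for model training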

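## Script Reference

### 1. lidc_annotation_process.py

This script processes the XML annotation files in LIDC-IDRI and extracts each nodule's location, size, and characteristics.

**Main functions**:

- `read_nodule_annotation_from_xml()`: reads a single XML annotation file and extracts nodule information
- `process_lidc_annotations()`: processes all XML annotation files and aggregates every nodule record
- `extract_lidc_every_z_annotations()`: extracts the annotation on each Z-axis slice
- `merge_nodule_annotation_csv_to_one()`: merges multiple nodule annotation CSV files into one

**Extracted information**:

- nodule position as percent coordinates (stored as fractions of the image size along each axis)
- nodule diameter
- nodule malignancy score (1-5)
- other characteristics: sphericity, margin, spiculation, texture, calcification, internal structure, lobulation, and subtlety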
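
How a pixel-space bounding box becomes the percent coordinates listed above can be sketched as follows. The arithmetic mirrors `read_nodule_annotation_from_xml`; the function name `bbox_to_percent` and the scan geometry in the usage lines are made up for illustration.

```python
def bbox_to_percent(x_range, y_range, z_range_mm, shape, origin_z, spacing_z):
    """Convert a nodule bounding box to fractional center coordinates.

    x_range, y_range: (min, max) in pixels.
    z_range_mm:       (min, max) slice positions in millimeters.
    shape:            (width, height, num_z) of the scan in voxels.
    """
    width, height, num_z = shape
    x_diameter = x_range[1] - x_range[0]
    y_diameter = y_range[1] - y_range[0]
    x_center = x_range[0] + x_diameter / 2
    y_center = y_range[0] + y_diameter / 2
    z_center_mm = z_range_mm[0] + (z_range_mm[1] - z_range_mm[0]) / 2
    # z arrives in world millimeters, so turn it into a slice index first
    z_center = (z_center_mm - origin_z) / spacing_z
    return {
        "coord_x": round(x_center / width, 4),
        "coord_y": round(y_center / height, 4),
        "coord_z": round(z_center / num_z, 4),
        # in-plane diameter only, as a fraction of the image size
        "diameter": round(max(x_diameter / width, y_diameter / height), 4),
    }


# Hypothetical 512 x 512 x 200 scan with 2.5 mm slices starting at z = -300 mm.
result = bbox_to_percent((250, 262), (210, 220), (-125.0, -123.0),
                         shape=(512, 512, 200), origin_z=-300.0, spacing_z=2.5)
print(result)
```

Note that the stored diameter is also a fraction of the image size, not a length in millimeters.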

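### 2. lidc_coordinate_process.py

This script post-processes the output of `lidc_annotation_process.py`, focusing on converting and aggregating the annotation coordinates.

**Main functions**:

- `percent_coordinatecsv_to_mmcsv()`: converts percent coordinates to millimeter coordinates
- `avg_coordinates()`: averages the coordinates of marks that belong to the same nodule
- `add_final_mals()`: computes the final malignancy label of each nodule
- `draw_percent_cube_by_csv()`: draws cube regions from percent coordinates
- `draw_all_confirmed_cubes()`: draws the cubes of all confirmed nodules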
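
`percent_coordinatecsv_to_mmcsv()` delegates the per-point conversion to a helper, `percent_coordinate_to_mm`, whose body is not reproduced in this README. The following is therefore only a sketch of the usual conversion for an axis-aligned scan: scale the fraction by the voxel count to get a voxel index, then apply spacing and origin. The formula, the name `percent_to_mm`, and the example geometry are assumptions, not the repository's confirmed implementation.

```python
def percent_to_mm(percent_xyz, shape_xyz, spacing_xyz, origin_xyz):
    """Map fractional (x, y, z) coordinates to world millimeters.

    Assumes an axis-aligned volume: mm = origin + fraction * voxels * spacing.
    """
    return tuple(
        round(origin + fraction * voxels * spacing, 2)
        for fraction, voxels, spacing, origin
        in zip(percent_xyz, shape_xyz, spacing_xyz, origin_xyz)
    )


# Hypothetical scan geometry: 512 x 512 x 200 voxels, 0.7 mm pixels, 2.5 mm slices.
mm = percent_to_mm(
    percent_xyz=(0.5, 0.25, 0.352),
    shape_xyz=(512, 512, 200),
    spacing_xyz=(0.7, 0.7, 2.5),
    origin_xyz=(-180.0, -180.0, -300.0),
)
print(mm)
```

For the z axis this is the exact inverse of the normalization applied when the XML is read (subtract the origin, divide by the spacing, divide by the slice count).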

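## Processing Steps in Detail

1. **Read the original XML annotations**:
   - parse the XML files and extract each radiologist's marks
   - obtain the nodule position coordinates (expressed as a fraction of the image size)
   - extract the nodule characteristics (malignancy score, texture, and so on)

2. **Coordinate conversion**:
   - convert percent coordinates to millimeter coordinates
   - compute the actual physical position from the scan metadata (origin and spacing)

3. **Nodule aggregation**:
   - the same nodule may be marked by several radiologists
   - average the coordinates of marks that lie close together
   - merge the radiologists' annotations of the same nodule

4. **Malignancy computation**:
   - each nodule is rated by several radiologists (1-5)
   - derive the final malignancy label from the individual ratings
   - 0 = benign, 1 = malignant, "unknow" = indeterminate

5. **Visualization**:
   - draw nodule cubes on the original CT images
   - used to verify annotation accuracy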
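
The label rule in step 4 can be sketched as a small voting function. It follows the logic of `add_final_mals()`: scores below 3 vote benign, scores above 3 vote malignant, and a score of 3 abstains. `final_malignancy` is a hypothetical standalone name used only for this illustration.

```python
UNKNOWN = "unknow"  # spelling matches the label written by the repository scripts


def final_malignancy(scores):
    """Combine several readers' 1-5 malignancy scores into one label."""
    benign = sum(1 for s in scores if s < 3)
    malignant = sum(1 for s in scores if s > 3)
    undecided = sum(1 for s in scores if s == 3)

    if benign + malignant == 0:
        return UNKNOWN  # every reader abstained
    if benign == malignant:
        return UNKNOWN  # tie between benign and malignant votes
    if benign > malignant:
        # a benign majority is overruled when abstentions outnumber benign votes
        return UNKNOWN if undecided > benign else "0"
    return "1"


for scores in ([4, 3], [1, 2, 4], [1, 5], [3, 3], [2, 3, 3]):
    print(scores, "->", final_malignancy(scores))
```

The rule is asymmetric: abstentions can veto a benign label but not a malignant one, which biases uncertain cases away from "benign".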

## Usage Example

```python
# Process the XML annotations
process_lidc_annotations("path/to/xml/*.xml", patient_mhd_path_dict, "path/to/save/all_annotations.csv")

# Convert coordinates
percent_coordinatecsv_to_mmcsv("all_annotations.csv", mhd_info_csv, "mm_coordinates.csv")

# Average the coordinates of nearby marks (positional arguments: csv, threshold, csv_save)
avg_coordinates("mm_coordinates.csv", 5, "avg_coordinates.csv")

# Add the final malignancy label
add_final_mals("avg_coordinates.csv", "final_annotations.csv")
```
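
The second argument of `avg_coordinates()` is a distance threshold in the same unit as the `mm_*` columns. A simplified sketch of the grouping it performs: each mark is replaced by the mean of all distinct marks closer than that distance, itself included. `average_neighbors` is an illustrative name, and unlike the script it works on in-memory tuples for a single patient rather than on CSV files.

```python
import math


def average_neighbors(points, threshold):
    """For each (x, y, z) point, average all distinct points within `threshold`."""
    averaged = []
    for current in points:
        neighbors = [current]  # a point always counts as its own neighbor
        for other in points:
            if other not in neighbors and math.dist(current, other) < threshold:
                neighbors.append(other)
        count = len(neighbors)
        averaged.append(tuple(
            round(sum(p[axis] for p in neighbors) / count, 2) for axis in range(3)
        ))
    return averaged


marks = [
    (126.5, 107.8, -124.0),  # Reader1
    (127.0, 108.5, -124.2),  # Reader2, same nodule
    (10.0, 10.0, 10.0),      # unrelated mark far away
]
for mark, avg in zip(marks, average_neighbors(marks, threshold=5)):
    print(mark, "->", avg)
```

Because neighbors are collected per point rather than clustered once, two marks can end up with slightly different averages when a third mark is within the threshold of only one of them.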

## Dataset Reference

LIDC-IDRI dataset: [https://www.cancerimagingarchive.net/collection/lidc-idri/](https://www.cancerimagingarchive.net/collection/lidc-idri/) 

================================================
FILE: data/preprocessing/lidc_process/__init__.py
================================================


================================================
FILE: data/preprocessing/lidc_process/lidc_annotation_process.py
================================================
import pandas as pd
import math
import glob
import os

from bs4 import BeautifulSoup
from constant import luna

# mhd  or dicom file information csv file path
mhd_info_csv = luna.MHD_INFO_CSV
# csv column names of positive nodules
pos_annotation_csv_head = ["anno_index", "coord_x", "coord_y", "coord_z", "diameter", "malscore"]
# csv columns name of negative nodule
neg_annotation_csv_head = ["anno_index", "coord_x", "coord_y", "coord_z", "diameter", "malscore"]
# root path of every patient's annotation information extracted from xml file
extracted_annotation_info_root_path ='/data/LUNA2016/extracted_annotation_infos/'
#columns name of  all information extracted from xml annotations
all_annotation_csv_head = ["patient_id", "anno_index","servicingRadiologistID", "coord_x", "coord_y", "coord_z", "diameter",
                           "malscore","sphericiy", "margin", "spiculation", "texture", "calcification", "internal_structure", "lobulation", "subtlety"]


def merge_nodule_annotation_csv_to_one(nodule_annotation_csv_list,save_file):
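    '''
        merge per-patient nodule annotation csv files into a single csv file

    :param nodule_annotation_csv_list:  list of per-patient annotation csv paths, named <patient_id>_annos_pos.csv
    :param save_file:                   path of the merged csv file
    :return:
    '''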
    annotattion_list = []
    # add patient id to annotation information csv file
    annotation_head = 'patient_id,'+','.join(pos_annotation_csv_head)
    for csv in nodule_annotation_csv_list:  # csv filename like : 1.3.6.1.4.1.14519.5.2.1.6279.6001.106630482085576298661469304872_annos_pos.csv
        with open(csv,'r') as read_csv:
            contents = read_csv.readlines()
            patient_id = os.path.basename(csv).split("_")[0]  # get patient id
            for line in contents[1:]:                         # skip column head
                annotattion_list.append(patient_id+","+line)
    with open(save_file,'w') as save_csv:
        save_csv.write(annotation_head+"\r\n")
        for line in annotattion_list:
            save_csv.write(line)
    print("csv annotation file merged into file ",save_file)

def read_nodule_annotation_from_xml(xml_path,patient_mhd_path_dict,agreement_threshold=0):
    '''
         read nodule annotation information from a single xml file

    :param xml_path:                single xml annotation file
    :param patient_mhd_path_dict:   key is patient id, value is the mhd file path; xml files whose patient id is not in it are skipped
    :param agreement_threshold:     every CT image was marked by up to 4 doctors; minimum number of overlapping marks required to keep a mark (only applied when > 1)
    :return:                        (pos_lines, neg_lines, extended_lines), or (None, None, None) if the file is skipped
    '''
    pos_lines = []
    neg_lines = []
    extended_lines = []
    with open(xml_path, 'r') as xml_file:
        markup = xml_file.read()
    xml = BeautifulSoup(markup, features="xml")
    if xml.LidcReadMessage is None:
        return None, None, None
    patient_id = xml.LidcReadMessage.ResponseHeader.SeriesInstanceUid.text
    src_path = None
    if patient_id in patient_mhd_path_dict:
        src_path = patient_mhd_path_dict[patient_id]
    if src_path is None:
        return None, None, None

    print(patient_id)
    mhd_info_pd = pd.read_csv(mhd_info_csv)
    mhd_info_row = mhd_info_pd[mhd_info_pd['patient_id']==patient_id]
    print("information about this patient is:\t",mhd_info_row)
    num_z, height, width = list(mhd_info_row['shape_2'])[0],list(mhd_info_row['shape_1'])[0],list(mhd_info_row['shape_0'])[0]
    print("num_z,height,width are:\t",num_z, height, width)
    origin_x,origin_y,origin_z = list(mhd_info_row['origin_x'])[0],list(mhd_info_row['origin_y'])[0],list(mhd_info_row['origin_z'])[0]
    spacing_x,spacing_y,spacing_z = list(mhd_info_row['spacing_x'])[0],list(mhd_info_row['spacing_y'])[0],list(mhd_info_row['spacing_z'])[0]

    # a reading session consists of a set of markings done by a single
    # reader at a single phase (for these xml files, the unblinded reading phase).
    reading_sessions = xml.LidcReadMessage.find_all("readingSession")

    for reading_session in reading_sessions:
        # print("Sesion")
        servicingRadiologistID = reading_session.servicingRadiologistID.text
        nodules = reading_session.find_all("unblindedReadNodule")
        for nodule in nodules:
            nodule_id = nodule.noduleID.text
            # print("  ", nodule.noduleID)
            rois = nodule.find_all("roi")
            x_min = y_min = z_min = 999999
            x_max = y_max = z_max = -999999
            if len(rois) < 2:
                continue

            for roi in rois:
                z_pos = float(roi.imageZposition.text)
                z_min = min(z_min, z_pos)
                z_max = max(z_max, z_pos)
                edge_maps = roi.find_all("edgeMap")
                for edge_map in edge_maps:
                    x = int(edge_map.xCoord.text)
                    y = int(edge_map.yCoord.text)
                    x_min = min(x_min, x)
                    y_min = min(y_min, y)
                    x_max = max(x_max, x)
                    y_max = max(y_max, y)
                if x_max == x_min:
                    continue
                if y_max == y_min:
                    continue

            x_diameter = x_max - x_min
            x_center = x_min + x_diameter / 2
            y_diameter = y_max - y_min
            y_center = y_min + y_diameter / 2
            z_diameter = z_max - z_min
            z_center = z_min + z_diameter / 2
            z_center -= origin_z
            z_center /= spacing_z

            x_center_perc = round(x_center / width, 4)
            y_center_perc = round(y_center / height, 4)
            z_center_perc = round(z_center / num_z, 4)
            diameter = max(x_diameter , y_diameter)
            diameter_perc = round(max(x_diameter / width, y_diameter / height), 4)

            if nodule.characteristics is None:
                print("!!!!Nodule:", nodule_id, " has no characteristics")
                continue
            if nodule.characteristics.malignancy is None:
                print("!!!!Nodule:", nodule_id, " has no malignancy")
                continue
            print("nodule in load xml",x_center_perc,y_center_perc,z_center_perc)
            malignacy = nodule.characteristics.malignancy.text
            sphericiy = nodule.characteristics.sphericity.text
            margin = nodule.characteristics.margin.text
            spiculation = nodule.characteristics.spiculation.text
            texture = nodule.characteristics.texture.text
            calcification = nodule.characteristics.calcification.text
            internal_structure = nodule.characteristics.internalStructure.text
            lobulation = nodule.characteristics.lobulation.text
            subtlety = nodule.characteristics.subtlety.text

            line = [nodule_id, x_center_perc, y_center_perc, z_center_perc, diameter_perc, malignacy]
            extended_line = [patient_id, nodule_id, servicingRadiologistID,x_center_perc, y_center_perc, z_center_perc,
                             diameter_perc, malignacy,sphericiy, margin, spiculation, texture, calcification, internal_structure, lobulation, subtlety]

            pos_lines.append(line)
            extended_lines.append(extended_line)

        nonNodules = reading_session.find_all("nonNodule")
        for nonNodule in nonNodules:
            z_center = float(nonNodule.imageZposition.text)
            z_center -= origin_z
            z_center /= spacing_z
            x_center = int(nonNodule.locus.xCoord.text)
            y_center = int(nonNodule.locus.yCoord.text)
            nodule_id = nonNodule.nonNoduleID.text
            x_center_perc = round(x_center / width, 4)
            y_center_perc = round(y_center / height, 4)
            z_center_perc = round(z_center / num_z, 4)
            diameter_perc = round(max(6 / width, 6 / height), 4)
            # print("Non nodule!", z_center)
            line = [nodule_id, x_center_perc, y_center_perc, z_center_perc, diameter_perc, 0]
            neg_lines.append(line)

    if agreement_threshold > 1:
        filtered_lines = []
        for pos_line1 in pos_lines:
            id1 = pos_line1[0]
            x1 = pos_line1[1]
            y1 = pos_line1[2]
            z1 = pos_line1[3]
            d1 = pos_line1[4]
            overlaps = 0
            for pos_line2 in pos_lines:
                id2 = pos_line2[0]
                if id1 == id2:
                    continue
                x2 = pos_line2[1]
                y2 = pos_line2[2]
                z2 = pos_line2[3]
                d2 = pos_line2[4]  # diameter of the second mark (was read from pos_line1 by mistake)
                dist = math.sqrt(math.pow(x1 - x2, 2) + math.pow(y1 - y2, 2) + math.pow(z1 - z2, 2))
                if dist < d1 or dist < d2:
                    overlaps += 1
            if overlaps >= agreement_threshold:
                filtered_lines.append(pos_line1)
            # else:
            #     print("Too few overlaps")
        pos_lines = filtered_lines

    df_annos = pd.DataFrame(pos_lines, columns=pos_annotation_csv_head)
    df_annos.to_csv(extracted_annotation_info_root_path + patient_id + "_annos_pos_lidc.csv", index=False)
    df_neg_annos = pd.DataFrame(neg_lines, columns=neg_annotation_csv_head)
    df_neg_annos.to_csv(extracted_annotation_info_root_path + patient_id + "_annos_neg_lidc.csv", index=False)

    return pos_lines, neg_lines, extended_lines



def process_lidc_annotations(xml_annotation_like,patient_mhd_path_dict,mhd_all_info_save_path,agreement_threshold=0):
    '''
        extract annotation information from all xml files matching a glob pattern

    :param xml_annotation_like:             glob pattern of the xml annotation files
    :param patient_mhd_path_dict:           key is patient id,value is the mhd file path
    :param mhd_all_info_save_path:          where to save the csv of all extracted annotations (not mhd_info_csv)
    :param agreement_threshold:             every nodule was annotated by several doctors; minimum number of overlapping marks required to keep one
    :return:
    '''
    file_no = 0
    pos_count = 0
    neg_count = 0
    all_lines = []

    xml_paths = glob.glob(xml_annotation_like)
    for xml_path in xml_paths:
        pos, neg, extended = read_nodule_annotation_from_xml(xml_path, patient_mhd_path_dict,agreement_threshold=agreement_threshold)
        if pos is not None:
            pos_count += len(pos)
            neg_count += len(neg)
            file_no += 1
            all_lines += extended
    df_annos = pd.DataFrame(all_lines, columns= all_annotation_csv_head)
    df_annos.to_csv(mhd_all_info_save_path, index=False)

def extract_lidc_every_z_annotations(xml_like,every_z_save_csv,patient_mhd_path_dict):
    xml_paths = glob.glob(xml_like)
    with open(every_z_save_csv,"w") as anno_save:
        anno_save.write("seriesuid,coord_percent_x, coord_percent_y,coord_mm_z,percent_diamater,mm_x,mm_y,diameter_mm,malscore")
        anno_save.write("\r\n")
        for xml_path in xml_paths:
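            extended = extract_every_z_from_lidc_xml(xml_path, patient_mhd_path_dict)
            # skip xml files that were rejected (no LidcReadMessage or no matching mhd file)
            if not isinstance(extended, list):
                continue
            # write one csv row per ROI instead of dumping the whole list on one line
            for line in extended:
                anno_save.write(str(line).replace(")","").replace("(","").replace("\'","")+"\r\n")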


def extract_every_z_from_lidc_xml(xml_path,patient_mhd_path_dict):
    '''
        extract every ROI (one per z slice) of every nodule from an xml file; the method above,
        `read_nodule_annotation_from_xml`, only extracts the center z coordinate of each nodule

        this method was used by UNet to produce more nodule masks

    :param xml_path:                xml file of nodule annotation
    :param patient_mhd_path_dict:   key is patient id,value is its full mhd file path
    :return:                        list of every ROI's coordinates
    '''
    extended_lines = []
    with open(xml_path, 'r') as xml_file:
        markup = xml_file.read()
    xml = BeautifulSoup(markup, features="xml")
    if xml.LidcReadMessage is None:
        return None  # the caller expects a single value, not a 3-tuple
    patient_id = xml.LidcReadMessage.ResponseHeader.SeriesInstanceUid.text

    print("patient id is:\t", patient_id)
    if patient_id in patient_mhd_path_dict:
        src_path = patient_mhd_path_dict[patient_id]
    else:
        return None

    print(patient_id)
    mhd_info_pd = pd.read_csv(mhd_info_csv)
    mhd_info_row = mhd_info_pd[mhd_info_pd['patient_id'] == patient_id]
    print("information about this patient is:\t", mhd_info_row)
    num_z, height, width = list(mhd_info_row['shape_2'])[0], list(mhd_info_row['shape_1'])[0], \
                           list(mhd_info_row['shape_0'])[0]
    print("num_z,height,width are:\t", num_z, height, width)
    origin_x, origin_y, origin_z = list(mhd_info_row['origin_x'])[0], list(mhd_info_row['origin_y'])[0], \
                                   list(mhd_info_row['origin_z'])[0]
    spacing_x, spacing_y, spacing_z = list(mhd_info_row['spacing_x'])[0], list(mhd_info_row['spacing_y'])[0], \
                                      list(mhd_info_row['spacing_z'])[0]

    # a reading session consists of a set of markings done by a single
    # reader at a single phase (for these xml files, the unblinded reading phase).
    reading_sessions = xml.LidcReadMessage.find_all("readingSession")

    for reading_session in reading_sessions:
        # print("Sesion")
        servicingRadiologistID = reading_session.servicingRadiologistID.text
        nodules = reading_session.find_all("unblindedReadNodule")
        for nodule in nodules:
            nodule_id = nodule.noduleID.text
            if nodule.characteristics is None:
                print("!!!!Nodule:", nodule_id, " has no characteristics")
                continue
            if nodule.characteristics.malignancy is None:
                print("!!!!Nodule:", nodule_id, " has no malignancy")
                continue
            malignacy = nodule.characteristics.malignancy.text

            rois = nodule.find_all("roi")
            x_min = y_min = z_min = 999999
            x_max = y_max = z_max = -999999
            if len(rois) < 2:
                continue

            for roi in rois:
                z_pos = float(roi.imageZposition.text)
                edge_maps = roi.find_all("edgeMap")
                for edge_map in edge_maps:
                    x = int(edge_map.xCoord.text)
                    y = int(edge_map.yCoord.text)
                    x_min = min(x_min, x)
                    y_min = min(y_min, y)
                    x_max = max(x_max, x)
                    y_max = max(y_max, y)
                if x_max == x_min:
                    continue
                if y_max == y_min:
                    continue

                x_diameter = x_max - x_min
                x_center = x_min + x_diameter / 2
                y_diameter = y_max - y_min
                y_center = y_min + y_diameter / 2

                x_center_perc = round(x_center / width, 4)
                y_center_perc = round(y_center / height, 4)
                diameter_mm = max(x_diameter, y_diameter)
                diameter_perc = round(max(x_diameter / width, y_diameter / height), 4)

                # build one csv row; the stray comma after str(x_center) used to turn this into a tuple
                extended_line = patient_id+","+str(round(x_center_perc,4))+","+str(round(y_center_perc,4))+","+str(z_pos)+","+\
                                 str(round(diameter_perc,4))+","+str(x_center)+","+str(y_center)+","+str(diameter_mm)+"," +malignacy
                extended_lines.append(extended_line)

    return extended_lines




================================================
FILE: data/preprocessing/lidc_process/lidc_coordinate_process.py
================================================
# -*- coding:utf-8 -*-
'''
 a script to process the results produced by lidc_annotation_process.py

 mainly focuses on processing the mm coordinates of annotations
'''

import math
import os
import pandas as pd
from constant import luna
from util import mhd_util,image_util, cube


def draw_percent_cube_by_csv(percent_csv,mhd_info_csv,cube_save_path):
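    '''
       draw a cube for every annotation; coordinates extracted from xml are fractions of the image shape

    :param percent_csv:      csv file with percent coordinates
    :param mhd_info_csv:     csv file with the mhd information of every patient
    :param cube_save_path:   where to save the drawn cubes
    :return:
    '''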
    mhd_pandas_index = mhd_util.read_csv_to_pandas(mhd_info_csv,',')
    percent_pandas = mhd_util.read_csv_to_pandas(percent_csv,',')

    for index, row in percent_pandas.iterrows():
        patient_id = row['patient_id']
        coord_x = float(row['coord_x'])
        coord_y = float(row['coord_y'])
        coord_z = float(row['coord_z'])
        malscore = row['malscore']
        diameter = float(row['diameter'])
        print("read cube coordx,coordy,coordz,diameter,malscore",coord_x,coord_y,coord_z,diameter,malscore)
        png_path = luna.LUNA_EXTRACTED_IMG + '/' + patient_id
        if os.path.exists(png_path):
            cube.draw_percent_cube(png_path, mhd_pandas_index, coord_x, coord_y,
                                   coord_z, diameter, cube_save_path, probility =malscore)
        else:
            print("one patient not exists..", png_path)


def percent_coordinatecsv_to_mmcsv(percent_csv,mhd_info_csv,mmcsv_save):
    '''
       transform percent coordinates to mm coordinates and save into new csv file

    :param percent_csv:     csv file contains all percent coordinates
    :param mhd_info_csv:    csv file with the mhd information (shape, origin, spacing) of every patient
    :param mmcsv_save:      csv file to save transformed coordinates
    :return:
    '''
    with open(percent_csv,'r') as percent_read:
        head = percent_read.readline()
        print(str(head))
        head = str(head).replace("coord_x,coord_y,coord_z","coord_x,coord_y,coord_z,mm_x,mm_y,mm_z")
        extend_mm_coordinate_content = []
        lines = percent_read.readlines()
        for line in lines:
            cols = line.split(",")
            patient_id = cols[0]
            p_x,p_y,p_z = cols[3],cols[4],cols[5] #row['coord_x'],row['coord_y'],row['coord_z']
            print("one line\t",line)
            print("patient id = ",patient_id)
            print("coordinates is:\t ",p_x,p_y,p_z)
            mm_x,mm_y,mm_z = percent_coordinate_to_mm(patient_id,float(p_x),float(p_y),float(p_z),mhd_info_csv)
            print("after transfered ..:\t",mm_x,mm_y,mm_z)
            line = line.replace(str(p_x)+","+str(p_y)+","+str(p_z),
                                str(p_x)+","+str(p_y)+","+str(p_z)+","+str(mm_x)+","+str(mm_y)+","+str(mm_z))
            print(line)
            extend_mm_coordinate_content.append(line)

        with open(mmcsv_save,"w") as mm:
            mm.write(head)
            for line in extend_mm_coordinate_content:
                mm.write(str(line))
    print("transformed mm coordinates finished..")


def avg_coordinates(csv,threshold,csv_save):
    '''
        add average coordinates to source.
     before:
        patient_id0, coord_x0,coord_y0,coord_z0
        patient_id0, coord_x1,coord_y1,coord_z1
     after:
        patient_id0, coord_x0,coord_y0,coord_z0,avg_x,avg_y,avg_z
        patient_id0, coord_x1,coord_y1,coord_z1,avg_x,avg_y,avg_z
    :param csv:             csv file with mm coordinates (output of percent_coordinatecsv_to_mmcsv)
    :param threshold:       coordinates closer than this distance are treated as the same nodule
    :param csv_save:        csv file to save transformed content
    :return:
    '''
    new_content = []
    patient_coords = {}
    with open(csv,'r') as csv_read:
        head = csv_read.readline()
        lines = csv_read.readlines()
        for line in lines:
            cols = line.split(",")
            patient_id = cols[0]
            mm_x,mm_y,mm_z,diam,mals = cols[6],cols[7],cols[8],cols[9],cols[10]
            if patient_id in patient_coords:
                patient_coords[patient_id].append([mm_x,mm_y,mm_z,diam,mals])
            else:
                patient_coords[patient_id]= [[mm_x, mm_y, mm_z, diam,mals]]

    # get average coords
    avg_same_coords = {}

    for key,value in patient_coords.items():
        patient_id = key
        coords = value
        xyzs = []
        for coor in coords: # list of lists
            x,y,z = coor[0],coor[1],coor[2]
            xyzs.append([float(x),float(y),float(z)])

        # find every coordinate's neighbor coordinates (distance smaller than threshold)
        same_coords = {}
        coord_num = len(xyzs)
        i = 0
        while i <coord_num:
            current_x,current_y,current_z =xyzs[i][0],xyzs[i][1],xyzs[i][2]
            same_with_current = str(current_x)+","+str(current_y)+","+str(current_z)
            same_coords[same_with_current] = [[current_x,current_y,current_z]]   # put itself into its neighbor
            j = 0
            while j < coord_num:
                x,y,z = xyzs[j][0],xyzs[j][1],xyzs[j][2]
                dis = math.sqrt((x-current_x)**2+(y-current_y)**2+(z-current_z)**2)            # distance of two coordinates
                if dis< threshold and [x,y,z] not in same_coords[same_with_current]:
                    same_coords[same_with_current].append([x,y,z])
                j = j+1
            i+=1

        # get average of coordinates
        for key,value in same_coords.items():
            cur_x,curr_y,curr_z = key.split(",")
            x,y,z = 0,0,0
            #print("key:  value: ",key,":\t",value)
            if len(value)>0:
                for same_cor in value:
                    #print(same_cor)
                    x = x + same_cor[0]
                    y = y + same_cor[1]
                    z = z + same_cor[2]
                x = round(x /len(value),2)
                y = round(y /len(value),2)
                z = round(z /len(value),2)
                # update dict with average
            avg_same_coords[key] = [x,y,z]

    with open(csv,'r') as csv_read:
        head = csv_read.readline()
        head = head.replace("mm_x,mm_y,mm_z","mm_x,mm_y,mm_z,avg_x,avg_y,avg_z")
        new_content.append(head)
        lines = csv_read.readlines()
        for line in lines:
            cols = line.split(",")
            patient_id = cols[0]
            mm_x,mm_y,mm_z,diam,mals = cols[6],cols[7],cols[8],cols[9],cols[10]
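            # keys of avg_same_coords were built from float values, so normalise the csv strings the same way
            avg_xyz = avg_same_coords[str(float(mm_x))+","+str(float(mm_y))+","+str(float(mm_z))]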
            avg_x,avg_y,avg_z = str(avg_xyz[0]),str(avg_xyz[1]),str(avg_xyz[2])
            new_content.append(line.replace(mm_x+","+mm_y+","+mm_z,
                                           mm_x + "," + mm_y + "," + mm_z+","+avg_x+","+avg_y+","+avg_z))

    with open(csv_save,'w') as info:
        for line in new_content:
            print("line from lidc_coordinate:\t",line)
            info.write(line)

    print("write attachment information to %s finished.."%csv_save)


def add_final_mals(csv,with_real_malsclabel_csv):
    """
        compute the final malignancy label of every nodule. every nodule was labeled by several different readers,
        this step confirms a single final malignancy label from the readers' scores
    :param csv:                         csv file of all mhd information
    :param with_real_malsclabel_csv:    csv file after add real malscore columns
    :return:
    """
    nodule_mals = {}
    with open(csv,'r') as read_csv:
        head = read_csv.readline()
        lines = read_csv.readlines()
        for line in lines:
            cols = line.split(",")
            patient_id = cols[0]
            avg_x,avg_y,avg_z,mals = cols[9],cols[10],cols[11],cols[13]
            key = patient_id+","+avg_x+","+avg_y+","+avg_z
            if key not in nodule_mals:
                nodule_mals[key] = [int(mals)]
            else:
                nodule_mals[key].append(int(mals))

    # compute the real malignancy label
    for key,val in nodule_mals.items():
        mals = val
        print("patient_id and all malscore is:\t",key+"\t:",val)
        non_cancer = 0
        unknow = 0
        cancer = 0
        UNK ="unknow"
        for mal in mals:
            if mal<3:
                non_cancer +=1
            elif mal ==3:
                unknow +=1
            elif mal>3:
                cancer+=1
        real_mal = ""
        if unknow == len(mals):         # all are unknow
            real_mal = UNK
        elif non_cancer/(non_cancer+cancer)>0.5:
            real_mal = "0"
        elif non_cancer/(non_cancer+cancer)==0.5:
            real_mal =UNK
        elif non_cancer/(non_cancer+cancer)<0.5:
            real_mal = "1"

        if real_mal=="0" and unknow>non_cancer:
            real_mal = UNK

        print("non_cancer,unk,cancer,real label", non_cancer, unknow, cancer,real_mal)
        # update the mal label
        nodule_mals[key] = real_mal

    # add real mal columns into csv file
    with_real_mals_content = []
    with open(csv,'r') as read_csv:
        head = read_csv.readline()
        head = head.replace("avg_x,avg_y,avg_z","avg_x,avg_y,avg_z,real_mal")
        with_real_mals_content.append(head)
        lines = read_csv.readlines()
        for line in lines:
            cols = line.split(",")
            patient_id = cols[0]
            avg_x,avg_y,avg_z,mals = cols[9],cols[10],cols[11],cols[13]
            key = patient_id+","+avg_x+","+avg_y+","+avg_z
            real_mal = nodule_mals[key]
            # average coordinates equal to source coordinates
            if avg_x + "," + avg_y +"," + avg_z+","+avg_x + "," + avg_y +"," + avg_z in line:
                with_real_mals_content.append(line.replace(avg_x + "," + avg_y +"," + avg_z + "," + avg_x + "," + avg_y +"," + avg_z,
                                                           avg_x + "," + avg_y + "," + avg_z +","+ avg_x + "," + avg_y + "," + avg_z +","+ real_mal))
            else:
                with_real_mals_content.append(line.replace(avg_x + "," + avg_y +"," + avg_z,
                                                       avg_x + "," + avg_y + "," + avg_z + ","+ real_mal))

    # write the final result with real malscore columns into file
    with open(with_real_malsclabel_csv,'w') as with_mal:
        for line in with_real_mals_content:
            with_mal.write(line)

    print("write result with real malscore columns finished..")



def percent_coordinate_to_mm(patient_id,p_x,p_y,p_z,mhd_info_csv):
    """
      transform percent coordinate to mm coordinates

    :param patient_id:       patient id,used for mapping information from mhd_info_csv
    :param p_x:              x percent coordinate
    :param p_y:
    :param p_z:
    :param mhd_info_csv:    a csv file containing all mhd information, such as shape, spacing, origin
    :return:                transformed mm coordinate x,y,z
    """
    png_path = luna.LUNA_EXTRACTED_IMG + '/'+patient_id
    patient_img = image_util.load_patient_images(png_path, "*_i.png", [])
    mhd_pandas_index = mhd_util.read_csv_to_pandas(mhd_info_csv,',')
    patient_mhd_info = mhd_pandas_index.loc[patient_id]

    z = int(p_z * patient_img.shape[0])
    y = int(p_y * patient_img.shape[1])
    x = int(p_x * patient_img.shape[2])
    orgin_x = float(patient_mhd_info['origin_x'].strip())
    orgin_y = float(patient_mhd_info['origin_y'].strip())
    orgin_z = float(patient_mhd_info['origin_z'].strip())

    right_x = x + orgin_x
    right_y = y + orgin_y
    right_z = z + orgin_z

    return round(right_x,2),round(right_y,2),round(right_z,2)

def draw_all_confirmed_cubes(mm_coordinates_csv,mhd_info_csv,extract_png_path,save_path):
    """
        draw all nodules annotated in the official luna2016 data
    :param mm_coordinates_csv:
    :param mhd_info_csv:
    :param extract_png_path:
    :param save_path:
    :return:
    """
    coordinates = pd.read_csv(mm_coordinates_csv)
    count = 0
    mhd_info = mhd_util.read_csv_to_pandas(mhd_info_csv)
    for df_index, df_row in coordinates.iterrows():
        patient_id = df_row['seriesuid']
        patient_png_path = os.path.join(extract_png_path,patient_id)
        mm_x = df_row['coordX']
        mm_y = df_row['coordY']
        mm_z = df_row['coordZ']
        diameter = df_row['diameter_mm']
        if os.path.exists(patient_png_path):
            cube.draw_percent_cube(patient_png_path, mm_x, mm_y, mm_z, diameter, save_path, mhd_pandas_index =mhd_info)
        else:
            count +=1
    print("draw all cubes finished...%d cubes missed"%count)



================================================
FILE: data/preprocessing/luna16_invalid_nodule_filter.py
================================================
## Remove problematic annotations from the Luna2016 candidate nodule data, as well as wrong nodules produced during prediction
import numpy as np
def nodule_valid(ct_data, voxel_coord_x, voxel_coord_y,voxel_coord_z):
    """
        Decide whether the current nodule can be used as a training cube, or whether a cube obtained by scanning is usable
    :param ct_data:         CTData instance already converted to 0-255 with the lung region extracted
    :param voxel_coord_x:   center position (voxel coordinates) of the cube to check
    :param voxel_coord_y:
    :param voxel_coord_z:
    :return:                whether the current nodule is usable: True (usable) / False (not usable)
    """
    lung_mask = ct_data.lung_seg_mask
    # check that the coordinates lie inside the lung mask volume
    if (voxel_coord_z < 0 or voxel_coord_z >= lung_mask.shape[0] or
            voxel_coord_y < 0 or voxel_coord_y >= lung_mask.shape[1] or
            voxel_coord_x < 0 or voxel_coord_x >= lung_mask.shape[2]):
        return False
    # take the surrounding region with a radius of 5 voxels
    z_min = max(0, voxel_coord_z - 5)
    z_max = min(lung_mask.shape[0], voxel_coord_z + 6)
    y_min = max(0, voxel_coord_y - 5)
    y_max = min(lung_mask.shape[1], voxel_coord_y + 6)
    x_min = max(0, voxel_coord_x - 5)
    x_max = min(lung_mask.shape[2], voxel_coord_x + 6)
    # extract the lung mask of the surrounding region
    neighborhood_mask = lung_mask[z_min:z_max, y_min:y_max, x_min:x_max]
    # compute the fraction of lung tissue
    lung_ratio = np.mean(neighborhood_mask)
    # if the fraction of lung tissue around the point is too low, it is probably a false positive
    if lung_ratio < 0.5:
        return False

    # check whether the point lies on the lung border
    # (only possible when the point is not on the outermost layer of the volume)
    if (0 < voxel_coord_z < lung_mask.shape[0] - 1 and
            0 < voxel_coord_y < lung_mask.shape[1] - 1 and
            0 < voxel_coord_x < lung_mask.shape[2] - 1):
        # count the lung voxels in the 6-neighborhood (up/down, left/right, front/back)
        neighbors = [
            lung_mask[voxel_coord_z - 1, voxel_coord_y, voxel_coord_x],
            lung_mask[voxel_coord_z + 1, voxel_coord_y, voxel_coord_x],
            lung_mask[voxel_coord_z, voxel_coord_y - 1, voxel_coord_x],
            lung_mask[voxel_coord_z, voxel_coord_y + 1, voxel_coord_x],
            lung_mask[voxel_coord_z, voxel_coord_y, voxel_coord_x - 1],
            lung_mask[voxel_coord_z, voxel_coord_y, voxel_coord_x + 1]
        ]
        # too many non-lung voxels among the neighbors means the point is probably on the lung border
        if sum(neighbors) < 4:
            return False
    return True

================================================
FILE: data/preprocessing/luna16_prepare_cube_data.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import pandas as pd
from tqdm import tqdm
import multiprocessing as mp
import time
from data.dataclass.CTData import CTData
from data.dataclass.NoduleCube import NoduleCube
from data.preprocessing.luna16_invalid_nodule_filter import nodule_valid

def get_mhd_file_path(patient_id, luna16_root="H:/luna16"):
    """
    Find the path of the MHD file that belongs to patient_id

    Args:
        patient_id: patient ID in the LUNA16 dataset
        luna16_root: root directory of the LUNA16 dataset

    Returns:
        Full path of the MHD file, or None if it is not found
    """
    # subset folders of the LUNA16 dataset
    subsets = [f"subset{i}" for i in range(10)]
    # search every subset for the MHD file
    for subset in subsets:
        subset_path = os.path.join(luna16_root, subset)
        if os.path.exists(subset_path):
            mhd_file = os.path.join(subset_path, f"{patient_id}.mhd")
            if os.path.exists(mhd_file):
                return mhd_file

    # no matching MHD file was found
    print(f"Warning: no MHD file found for patient {patient_id}")
    return None

def get_real_candidate(mhd_root_dir, annotation_csv, candidate_csv,save_real_candidate_csv):
    """
        The officially provided candidate annotations have problems: some candidates lie close to
        real nodule positions, which can mislabel data. They are removed here by a distance rule.
    :param mhd_root_dir:
    :param annotation_csv:
    :param candidate_csv:
    :param save_real_candidate_csv:
    :return:
    """
    positive_df = pd.read_csv(annotation_csv)
    # .copy() avoids pandas' SettingWithCopyWarning when the "class" column is added
    part_positive_df = positive_df[["seriesuid","coordX", "coordY","coordZ"]].copy()
    part_positive_df["class"] = 1
    negative_df = pd.read_csv(candidate_csv)
    part_negative_df = negative_df[negative_df["class"] == 0].copy()
    concat_df = pd.concat([part_positive_df, part_negative_df],axis=0)
    unique_seriesids = concat_df["seriesuid"].unique().tolist()
    # candidates that are finally kept
    keep_negative_df_list = []
    remove_nodule_num = 0
    # for each patient, collect the real nodule annotations and the candidate annotations
    for seid in unique_seriesids:
        # first check that the corresponding data exists
        mhd_path = get_mhd_file_path(seid, mhd_root_dir)
        if mhd_path is not None and os.path.exists(mhd_path):
            one_serid_df = concat_df[concat_df["seriesuid"] == seid].copy()
            one_seid_postive_df = one_serid_df[one_serid_df["class"] == 1].copy()
            one_seid_negative_df = one_serid_df[one_serid_df["class"] == 0].copy()
            if one_seid_postive_df.shape[0] > 0 and one_seid_negative_df.shape[0] > 0:
                for _, negative_row in one_seid_negative_df.iterrows():
                    one_negative_coord_x = negative_row["coordX"]
                    one_negative_coord_y = negative_row["coordY"]
                    one_negative_coord_z = negative_row["coordZ"]
                    keep_current_nodule = True
                    for _, positive_row in one_seid_postive_df.iterrows():
                        one_positive_coord_x = positive_row["coordX"]
                        one_positive_coord_y = positive_row["coordY"]
                        one_positive_coord_z = positive_row["coordZ"]
                        x_dist = abs(one_negative_coord_x - one_positive_coord_x)
                        y_dist = abs(one_negative_coord_y - one_positive_coord_y)
                        z_dist = abs(one_negative_coord_z - one_positive_coord_z)
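                        # the largest nodule diameter is 32; no candidate may overlap a real nodule.
                        # note: `or` drops a candidate that is within 16 of a real nodule along any single axis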
                        if x_dist < 16 or y_dist < 16 or z_dist < 16:
                            keep_current_nodule = False
                            remove_nodule_num = remove_nodule_num + 1
                            break
                    # the candidate does not overlap any real nodule
                    if keep_current_nodule:
                        one_keep_nodule_df = pd.DataFrame([{"seriesuid": seid,
                                                            "coordX": one_negative_coord_x,
                                                            "coordY": one_negative_coord_y,
                                                            "coordZ": one_negative_coord_z,
                                                            "class": 0}])
                        keep_negative_df_list.append(one_keep_nodule_df)
    print("number of candidate records kept\t", len(keep_negative_df_list))
    print("number of problematic candidates removed\t", remove_nodule_num)
    if len(keep_negative_df_list) > 0:
        keep_negative_df = pd.concat(keep_negative_df_list,axis = 0)
        keep_negative_df.to_csv(save_real_candidate_csv, encoding="utf-8", index = False)
        print("final result saved to\t", save_real_candidate_csv)

def ctdata_annotation2nodule(ct_data, nodule_info, mal_label, cube_size=64):
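    """
    Process a single nodule: extract its cube from the CT volume and wrap it in a NoduleCube

    Args:
        ct_data: CTData instance
        nodule_info: nodule information (Series)
        mal_label:   label of the current nodule
        cube_size: cube size (mm)
    Returns:
        NoduleCube instance, or None if the extracted cube is empty
    """
    # read the nodule information
    patient_id = nodule_info['seriesuid']
    coord_x = nodule_info['coordX']
    coord_y = nodule_info['coordY']
    coord_z = nodule_info['coordZ']
    # extract the nodule cube from the CT data
    nodule_cube_data = ct_data.extract_cube([coord_x, coord_y, coord_z], cube_size, if_fixed_radius=True)
    center_voxel = ct_data.world_to_voxel([coord_x, coord_y, coord_z])
    # make sure the nodule volume is not empty
    if nodule_cube_data.size == 0:
        print(f"Warning: the nodule volume of patient {patient_id} is empty, please check the coordinates")
        # return a single None so that the caller's `is None` check works
        return None
    # create the NoduleCube instance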
    nodule_cube = NoduleCube.from_array(
        pixel_data=nodule_cube_data,
        center_x=int(center_voxel[0]),
        center_y=int(center_voxel[1]),
        center_z=int(center_voxel[2]),
        radius=int(cube_size / 2),
        malignancy=mal_label
    )
    return nodule_cube


def process_nodule(nodule, cube_index,mhd_root_dir, label, if_aug, png_output, npy_output, check_output, check_count_limit):
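    """
    Worker function that processes a single nodule; suitable for multiprocessing

    Args:
        nodule: nodule information (one DataFrame row)
        cube_index: index of the nodule
        mhd_root_dir: root directory of the MHD files
        label: label (0=benign, 1=malignant)
        if_aug: whether to apply data augmentation
        png_output: PNG output directory
        npy_output: NPY output directory
        check_output: output directory for check images
        check_count_limit: maximum number of check images

    Returns:
        Dictionary describing the processing result
    """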
    result = {
        'success': False,
        'patient_id': nodule['seriesuid'],
        'error': None,
        'check_saved': False
    }

    patient_id = nodule['seriesuid']
    # locate the MHD file
    mhd_path = get_mhd_file_path(patient_id, mhd_root_dir)
    if mhd_path is None:
        result['error'] = "MHD file not found"
        return result

    patient_save_png_path = os.path.join(png_output, f"{patient_id}_mal={label}_{cube_index}.png")
    patient_save_npy_path = os.path.join(npy_output, f"{patient_id}_mal={label}_{cube_index}.npy")

    # skip processing if the file already exists
    if os.path.exists(patient_save_png_path):
        result['success'] = True
        result['error'] = "file already exists, skipped"
        return result

    try:
        ct_data = CTData.from_mhd(mhd_path)
        ct_data.resample_pixel()
        ct_data.filter_lung_img_mask()

        # process the nodule
        one_nodule_cube = ctdata_annotation2nodule(ct_data, nodule, mal_label=label, cube_size=32)

        if one_nodule_cube is None:
            result['error'] = "nodule volume is empty"
            return result
        # check whether the candidate annotation of a benign nodule is suitable. Note: resample_pixel and filter_lung_img_mask must run first
        if label == 0:
            voxel_coord_x = one_nodule_cube.center_x
            voxel_coord_y = one_nodule_cube.center_y
            voxel_coord_z = one_nodule_cube.center_z
            this_nodule_valid = nodule_valid(ct_data, voxel_coord_x, voxel_coord_y, voxel_coord_z)
            if not this_nodule_valid:
                result['error'] = "nodule failed the validity check"
                return result
        # save as a tiled PNG image and an NPY file
        one_nodule_cube.save_to_png(patient_save_png_path)
        one_nodule_cube.save_to_npy(patient_save_npy_path)
        if if_aug:
            # 3 rotations
            rotation_nodule_cube1 = one_nodule_cube.augment(rotation=True)
            patient_save_png_path_rotatoion1 = patient_save_png_path.replace(".png", "_rotation1.png")
            patient_save_npy_path_rotatoion1 = patient_save_npy_path.replace(".npy", "_rotation1.npy")
            rotation_nodule_cube1.save_to_png(patient_save_png_path_rotatoion1)
            rotation_nodule_cube1.save_to_npy(patient_save_npy_path_rotatoion1)

            rotation_nodule_cube2 = one_nodule_cube.augment(rotation=True)
            patient_save_png_path_rotatoion2 = patient_save_png_path.replace(".png", "_rotation2.png")
            patient_save_npy_path_rotatoion2 = patient_save_npy_path.replace(".npy", "_rotation2.npy")
            rotation_nodule_cube2.save_to_png(patient_save_png_path_rotatoion2)
            rotation_nodule_cube2.save_to_npy(patient_save_npy_path_rotatoion2)

            rotation_nodule_cube3 = one_nodule_cube.augment(rotation=True)
            patient_save_png_path_rotatoion3 = patient_save_png_path.replace(".png", "_rotation3.png")
            patient_save_npy_path_rotatoion3 = patient_save_npy_path.replace(".npy", "_rotation3.npy")
            rotation_nodule_cube3.save_to_png(patient_save_png_path_rotatoion3)
            rotation_nodule_cube3.save_to_npy(patient_save_npy_path_rotatoion3)
            # 3 flips
            flip_nodule_cube1 = one_nodule_cube.augment(rotation=False, flip_axis = 0)
            patient_save_png_path_flip1 = patient_save_png_path.replace(".png", "_flip1.png")
            patient_save_npy_path_flip1 = patient_save_npy_path.replace(".npy", "_flip1.npy")
            flip_nodule_cube1.save_to_png(patient_save_png_path_flip1)
            flip_nodule_cube1.save_to_npy(patient_save_npy_path_flip1)

            flip_nodule_cube2 = one_nodule_cube.augment(rotation=False, flip_axis=1)
            patient_save_png_path_flip2 = patient_save_png_path.replace(".png", "_flip2.png")
            patient_save_npy_path_flip2 = patient_save_npy_path.replace(".npy", "_flip2.npy")
            flip_nodule_cube2.save_to_png(patient_save_png_path_flip2)
            flip_nodule_cube2.save_to_npy(patient_save_npy_path_flip2)

            flip_nodule_cube3 = one_nodule_cube.augment(rotation=False, flip_axis=2)
            patient_save_png_path_flip3 = patient_save_png_path.replace(".png", "_flip3.png")
            patient_save_npy_path_flip3 = patient_save_npy_path.replace(".npy", "_flip3.npy")
            flip_nodule_cube3.save_to_png(patient_save_png_path_flip3)
            flip_nodule_cube3.save_to_npy(patient_save_npy_path_flip3)
            # 3 noise-only augmentations
            noise_nodule_cube1 = one_nodule_cube.augment(rotation=False, flip_axis=-1, noise=True)
            patient_save_png_path_noise1 = patient_save_png_path.replace(".png", "_noise1.png")
            patient_save_npy_path_noise1 = patient_save_npy_path.replace(".npy", "_noise1.npy")
            noise_nodule_cube1.save_to_png(patient_save_png_path_noise1)
            noise_nodule_cube1.save_to_npy(patient_save_npy_path_noise1)

            noise_nodule_cube2 = one_nodule_cube.augment(rotation=False, flip_axis=-1, noise=True)
            patient_save_png_path_noise2 = patient_save_png_path.replace(".png", "_noise2.png")
            patient_save_npy_path_noise2 = patient_save_npy_path.replace(".npy", "_noise2.npy")
            noise_nodule_cube2.save_to_png(patient_save_png_path_noise2)
            noise_nodule_cube2.save_to_npy(patient_save_npy_path_noise2)

            noise_nodule_cube3 = one_nodule_cube.augment(rotation=False, flip_axis=-1, noise=True)
            patient_save_png_path_noise3 = patient_save_png_path.replace(".png", "_noise3.png")
            patient_save_npy_path_noise3 = patient_save_npy_path.replace(".npy", "_noise3.npy")
            noise_nodule_cube3.save_to_png(patient_save_png_path_noise3)
            noise_nodule_cube3.save_to_npy(patient_save_npy_path_noise3)

        # decide whether a check image should be saved
        if cube_index < check_count_limit:
            viz_path = os.path.join(check_output, f"{patient_id}_nodule_mal={label}_viz_{cube_index}.png")
            one_nodule_cube.visualize_3d(output_path=viz_path, show=False)
            result['check_saved'] = True

        result['success'] = True

    except Exception as e:
        result['error'] = str(e)
    return result

def prepare_cubes_mp(mhd_root_dir, annotation_csv, label, png_output, npy_output, check_output,if_aug = False, num_processes=None, max_samples = 60000):
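    """
    Multiprocess version: create nodule cubes from an annotation csv file and the mhd directory

    Args:
        mhd_root_dir: root directory of the mhd files
        annotation_csv: nodule annotation file
        label: whether the annotation file is benign (0) / malignant (1)
        png_output: directory for the nodule cube images
        npy_output: directory for the nodule cube data
        check_output: directory used to check that nodule extraction is accurate
        if_aug:     whether to augment; malignant nodules are scarce and need augmentation
        num_processes: number of processes, defaults to 80% of the CPU cores
        max_samples:   maximum number of samples, mainly for negatives: positives amount to only 9300 even after augmentation, while there are tens of thousands of negatives
    """
    # create the output directories
    os.makedirs(png_output, exist_ok=True)
    os.makedirs(npy_output, exist_ok=True)
    os.makedirs(check_output, exist_ok=True)

    # determine the number of processes
    if num_processes is None:
        num_processes = max(1, int(mp.cpu_count() * 0.8))

    # load the annotation data
    annotations_df = pd.read_csv(annotation_csv, encoding="utf-8")
    # keep only class=0 records; the candidate set still contains many class=1 rows.
    # the print is guarded as well, so a csv without a "class" column passes through unfiltered
    if "class" in annotations_df.columns:
        print(annotations_df["class"].unique().tolist())
        annotations_df = annotations_df[annotations_df["class"] == 0].copy()

    annotations_df = annotations_df[:max_samples]
    total_nodules = len(annotations_df)

    print(f"Start processing {total_nodules} {'malignant' if label == 1 else 'benign'} nodules with {num_processes} processes")

    # limit on the number of check images
    check_count_limit = min(10, total_nodules)

    # build the argument list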
    args_list = [
        (
            row,  # nodule
            i,  # cube_index
            mhd_root_dir,
            label,
            if_aug,
            png_output,
            npy_output,
            check_output,
            check_count_limit
        )
        for i, row in annotations_df.iterrows()
    ]
    # process the data with a process pool
    start_time = time.time()
    with mp.Pool(processes=num_processes) as pool:
        # starmap blocks until every task has finished and returns the results in input order,
        # so the progress bar only advances once all results are available
        results = list(tqdm(
            pool.starmap(process_nodule, args_list),
            total=total_nodules,
            desc=f"Processing {'malignant' if label == 1 else 'benign'} nodules"
        ))

    # summarize the results
    success_count = sum(1 for r in results if r['success'])
    error_count = sum(1 for r in results if not r['success'])

    # compute the processing time
    elapsed_time = time.time() - start_time
    avg_time_per_nodule = elapsed_time / total_nodules if total_nodules > 0 else 0

    print(
        f"Done! Succeeded: {success_count}, failed: {error_count}, total time: {elapsed_time:.2f}s, average per nodule: {avg_time_per_nodule:.2f}s")

    # if there were errors, print the first 5
    if error_count > 0:
        print("Error samples:")
        error_samples = [r for r in results if not r['success']][:5]
        for i, sample in enumerate(error_samples):
            print(f"  {i + 1}. patient ID: {sample['patient_id']}, error: {sample['error']}")

    return success_count, error_count


def main():
    """Main entry point"""
    # set the paths
    positive_nodule_annotation_file = "H:/luna16/annotations.csv"
    negative_nodule_annotation_file = "H:/luna16/candidates_V2.csv"
    save_real_candidate_csv = "H:/luna16/candidates_clean.csv"
    luna16_root = "H:/luna16"
    # select suitable negative samples from the candidates; do not use the candidate file directly, it contains many problematic records
    get_real_candidate(luna16_root, positive_nodule_annotation_file, negative_nodule_annotation_file, save_real_candidate_csv)
    # output directories
    positive_png_save_dir = "J:/luna16_processed/positive_pngs/"
    positive_npy_save_dir = "J:/luna16_processed/positive_npys/"
    check_save_dir = "J:/luna16_processed/check_pngs/"
    # number of processes, defaults to 80% of the CPU cores
    num_processes = max(1, int(mp.cpu_count() * 0.8))
    print(f"Detected {mp.cpu_count()} CPU cores; using {num_processes} processes")
    # process malignant nodules
    print("\n===== Processing malignant nodules =====")
    # success_pos, error_pos = prepare_cubes_mp(
    #     luna16_root,
    #     positive_nodule_annotation_file,
    #     1,
    #     positive_png_save_dir,
    #     positive_npy_save_dir,
    #     check_save_dir,
    #     True,
    #     num_processes
    # )
    # process benign nodules
    print("\n===== Processing benign nodules =====")
    negative_png_save_dir = "J:/luna16_processed/negative_pngs/"
    negative_npy_save_dir = "J:/luna16_processed/negative_npys/"
    success_neg, error_neg = prepare_cubes_mp(
        luna16_root,
        save_real_candidate_csv,
        0,
        negative_png_save_dir,
        negative_npy_save_dir,
        check_save_dir,
        False,
        num_processes
    )
    # summary of the processing results
    print("\n===== Summary =====")
    # print(f"malignant nodules: succeeded {success_pos}, failed {error_pos}")
    print(f"benign nodules: succeeded {success_neg}, failed {error_neg}")
    # print(f"total: succeeded {success_pos + success_neg}, failed {error_pos + error_neg}")

if __name__ == "__main__":
    # guard against multiprocessing problems on Windows
    mp.freeze_support()
    main()

================================================
FILE: deploy/README.md
================================================
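# CT Image Analysis System

A deep-learning-based lung CT image analysis system that detects lung nodules and estimates their probability of malignancy.

## Features

- Supports several CT data formats (DICOM, MHD/RAW, etc.)
- Full 3D lung visualization (volume view, axial, coronal, and sagittal planes)
- Automatically detects lung nodules and shows their position and size
- Computes the malignancy probability of each nodule and gives medical advice
- Interactive browsing of nodule previews

## System requirements

- Python 3.11 or later
- Browser: Chrome, Firefox, Edge (latest versions)
- Operating system: Windows 10/11, macOS, Linux

## Dependencies

- Flask: web server
- NumPy: numerical computing
- PyTorch: deep learning framework (the bundled model is `c3d_nodule_detect.pth`)
- SimpleITK: medical image processing
- Three.js: 3D rendering (already referenced by the front end)

## Installation

1. Install the dependencies: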

```bash
pip install flask flask-cors numpy torch SimpleITK opencv-python pillow matplotlib
```

2. Run the system:

```bash
python run.py
```
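
Before starting the server it can help to confirm that the backend's third-party imports resolve. The following is a minimal sketch, not part of the repository: the helper name `find_missing_modules` is hypothetical, and the module list is an assumption taken from the imports at the top of `deploy/backend/app.py`.

```python
import importlib.util

# Module names assumed from the imports at the top of deploy/backend/app.py;
# extend the tuple if your checkout imports more.
REQUIRED_MODULES = ("flask", "flask_cors", "numpy", "cv2", "PIL", "matplotlib")


def find_missing_modules(module_names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If the script reports missing modules, install the matching packages with `pip` and run it again before launching `run.py`.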

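## Usage

1. After the system starts, a browser opens automatically at `http://localhost:5000`
2. Click the "上传CT数据" (Upload CT data) button and choose a CT data file
3. The system then performs these steps automatically:
   - Upload the data
   - Preprocess the data
   - Run the model
   - Visualize the results
4. When processing finishes, the 3D lung model and the detected nodules are shown in the main view
5. The nodule preview panel on the right lists every detected nodule; click one to highlight it
6. Use the view buttons to look at the lung model from different angles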
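
The upload handler in `deploy/backend/app.py` only accepts `.zip` files and, after extraction, looks for the data in a folder named after the archive (`extracted/<zip name without extension>`). The sketch below packs a scan directory in that layout. It is an illustration under that assumption, not code from the repository, and the helper name `build_upload_zip` is hypothetical.

```python
import os
import zipfile


def build_upload_zip(scan_dir, zip_path):
    """Zip scan_dir so every entry sits under a folder named after the zip file.

    zip_path should live outside scan_dir, otherwise the archive would be
    walked while it is being written.
    """
    top_folder = os.path.splitext(os.path.basename(zip_path))[0]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(scan_dir):
            for file_name in files:
                full_path = os.path.join(root, file_name)
                relative_path = os.path.relpath(full_path, scan_dir)
                # Zip entry names always use forward slashes
                arcname = "/".join([top_folder] + relative_path.split(os.sep))
                archive.write(full_path, arcname)
    return zip_path
```

The resulting archive can then be chosen in the upload dialog, or sent as the `file` field of a multipart POST to `/api/upload`.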

## File structure

```
deploy/
├── backend/              # backend code
│   ├── models/           # model storage
│   ├── uploads/          # temporary storage for uploaded files
│   └── app.py            # backend main program
├── frontend/             # frontend code
│   ├── css/              # style sheets
│   ├── js/               # JavaScript code
│   └── index.html        # main page
├── run.py                # launch script
└── README.md             # documentation
```

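## Notes

- This system is for research and teaching only and must not be used for actual medical diagnosis
- Processing large CT volumes can take a long time; please be patient
- Nodule detection results are for reference only; actual diagnosis should be made by a qualified physician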

================================================
FILE: deploy/backend/app.py
================================================
import os,cv2
import sys
import numpy as np
from flask import Flask, request, jsonify, send_from_directory, send_file, Response, make_response
from flask_cors import CORS
import logging
import json
import uuid
import threading
import zipfile
from PIL import Image
from datetime import datetime
import io
import struct
from deploy.backend.dataclass.CTData import CTData
from util.dicom_util import is_dicom_file
from io import BytesIO
import matplotlib
# use a non-interactive matplotlib backend
matplotlib.use('Agg')
matplotlib.rcParams['font.sans-serif'] = ['Microsoft YaHei']  # or another Chinese-capable font on your system
matplotlib.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
import matplotlib.pyplot as plt

# add the project root to the system path
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
# import the detector module
from detector import get_detector_instance, STATUS_COMPLETED, STATUS_ERROR

# configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# model parameters
CUBE_SIZE = 32
# configure the upload folder and the model folder
UPLOAD_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'uploads')
MODEL_FOLDER = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'data')
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
os.makedirs(MODEL_FOLDER, exist_ok=True)
# allowed file extensions
ALLOWED_EXTENSIONS = {'mhd', 'raw', 'nii', 'nii.gz', 'dcm', 'dicom', 'zip'}
# session data store: holds the uploaded files and the processing status
SESSION_DATA = {}
# create the Flask app
app = Flask(__name__, static_folder='../frontend')
CORS(app)  # enable cross-origin requests
# configure the app
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
app.config['MAX_CONTENT_LENGTH'] = 500 * 1024 * 1024  # 500MB limit

# default model path
DEFAULT_MODEL_PATH = os.path.join(MODEL_FOLDER, 'c3d_nodule_detect.pth')
# initialize the detector instance
detector = get_detector_instance(DEFAULT_MODEL_PATH)


# callback invoked when detection finishes
def on_detection_completed(session_id, results):
    """Callback run after detection completes; updates the session state"""
    logger.info(f"Detection-completed callback: session {session_id}")
    if session_id in SESSION_DATA:
        # mark the session as completed
        SESSION_DATA[session_id]['status'] = 'completed'
        SESSION_DATA[session_id]['results'] = results
        SESSION_DATA[session_id]['progress'] = 100
        SESSION_DATA[session_id]['message'] = f"Detection finished, found {len(results.get('nodules', []))} nodules"

        # update the state held by the detector
        detector.update_session_state(session_id, SESSION_DATA[session_id])

        # generate the nodule images
        try:
            nodules = results.get('nodules', [])
            if nodules:
                logger.info(f"Detection finished, generating images for {len(nodules)} nodules")
                lung_seg_path = os.path.join(UPLOAD_FOLDER, session_id, 'lung_seg.npy')
                if os.path.exists(lung_seg_path):
                    lung_img = np.load(lung_seg_path)
                    if lung_img is not None and lung_img.size > 0:
                        logger.info(f"Loaded lung segmentation data, shape: {lung_img.shape}")
                        success = generate_nodule_images(session_id, nodules, lung_img)
                        if success:
                            logger.info(f"Nodule image generation finished")
                        else:
                            logger.error(f"Nodule image generation failed")
                    else:
                        logger.error(f"Lung segmentation data is empty or invalid")
                else:
                    logger.error(f"Lung segmentation file does not exist: {lung_seg_path}")
            else:
                logger.info(f"No nodule data, skipping image generation")
        except Exception as e:
            logger.error(f"Error while generating nodule images: {str(e)}", exc_info=True)


# register the callback if the detector supports it
if hasattr(detector, 'set_completion_callback'):
    detector.set_completion_callback(on_detection_completed)
else:
    logger.warning("The detector does not support a completion callback; status updates may be incorrect")

# compatibility shim: make sure the detector supports update_session_state
if not hasattr(detector, 'update_session_state'):
    def update_session_state(session_id, state):
        """
        Compatibility method for updating the session state
        Makes sure the session_states dict exists and updates the state
        """
        if not hasattr(detector, 'session_states'):
            detector.session_states = {}

        if not hasattr(detector, 'session_locks'):
            detector.session_locks = {}

        # create the session lock if it does not exist yet
        if session_id not in detector.session_locks:
            detector.session_locks[session_id] = threading.Lock()

        # update the session state
        with detector.session_locks.get(session_id, threading.Lock()):
            detector.session_states[session_id] = state.copy()  # store a copy to avoid shared references

        logger.info(f"Updated state of session {session_id}: {state['status']}")


    # attach the method to the detector object
    detector.update_session_state = update_session_state

    # if get_session_state is missing as well, add it
    if not hasattr(detector, 'get_session_state'):
        def get_session_state(session_id):
            """Compatibility method for reading the session state"""
            if not hasattr(detector, 'session_states'):
                detector.session_states = {}

            return detector.session_states.get(session_id)


        detector.get_session_state = get_session_state


def allowed_file(filename):
    """Check whether the file extension is allowed"""
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route('/')
def index():
    """Serve the front-end page"""
    return send_from_directory('../frontend', 'index.html')


@app.route('/<path:path>')
def static_files(path):
    """Serve static files"""
    return send_from_directory('../frontend', path)


@app.route('/api/upload', methods=['POST'])
def upload_file():
    """
    Handle a CT upload.
    Accepts a ZIP archive containing DICOM or MHD/RAW files,
    extracts it and detects the file type.
    """
    try:
        # Make sure a file was sent
        if 'file' not in request.files:
            return jsonify({'success': False, 'error': 'No file found in the request'}), 400

        file = request.files['file']

        # The user did not select a file
        if file.filename == '':
            return jsonify({'success': False, 'error': 'No file selected'}), 400

        # Only ZIP archives are accepted
        if not file.filename.lower().endswith('.zip'):
            return jsonify({'success': False, 'error': 'Please upload a ZIP file'}), 400

        # Create a new session ID
        session_id = str(uuid.uuid4())

        # Create the session directory
        session_dir = os.path.join(UPLOAD_FOLDER, session_id)
        os.makedirs(session_dir, exist_ok=True)

        # Save the archive
        zip_path = os.path.join(session_dir, 'upload.zip')
        file.save(zip_path)
        app.logger.info(f"Saved archive to {zip_path}")

        # Extract the archive
        extract_dir = os.path.join(session_dir, 'extracted')
        os.makedirs(extract_dir, exist_ok=True)

        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(extract_dir)
        # The archive is expected to contain a top-level folder named after the ZIP file;
        # fall back to the extraction root when the files were zipped without one.
        zip_filename = os.path.splitext(os.path.basename(file.filename))[0]
        real_extracted_path = os.path.join(extract_dir, zip_filename)
        if not os.path.isdir(real_extracted_path):
            real_extracted_path = extract_dir
        app.logger.info(f"Extracted archive to {real_extracted_path}")
        # Detect the file type
        file_type, file_paths = detect_file_type(real_extracted_path)
        app.logger.info(f"Detected type: {file_type}, {len(file_paths)} file(s)")
        if not file_type:
            return jsonify({'success': False, 'error': 'No supported CT data (DICOM or MHD/RAW) found in the archive'}), 400

        # Record the session information
        session_info = {
            'id': session_id,
            'timestamp': datetime.now().isoformat(),
            'file_type': file_type,
            'files': file_paths,
            'extract_dir': real_extracted_path,
            'status': 'uploaded'
        }

        # Optional: store the patient ID
        if 'patient_id' in request.form:
            session_info['patient_id'] = request.form['patient_id']

        # Persist the session information to disk
        session_info_path = os.path.join(session_dir, 'session_info.json')
        with open(session_info_path, 'w') as f:
            json.dump(session_info, f)

        # Register the session in the global session dictionary
        SESSION_DATA[session_id] = {
            'status': 'uploaded',
            'progress': 0,
            'message': f'Uploaded and extracted {file_type} CT data, waiting for detection',
            'file_type': file_type,
            'files': file_paths
        }

        # Start preprocessing if automatic processing is configured
        auto_detect = app.config.get('AUTO_DETECT', False)
        if auto_detect:
            # Run preprocessing in a background thread
            threading.Thread(target=start_preprocessing, args=(session_id,)).start()
            app.logger.info(f"Started automatic preprocessing, session ID: {session_id}")

        # Return the success response
        return jsonify({
            'success': True,
            'message': f'Upload succeeded, detected {file_type} CT data',
            'session_id': session_id,
            'file_type': file_type,
            'auto_detect': auto_detect
        })

    except Exception as e:
        # Unexpected failures are server-side errors, so report 500 rather than 400
        app.logger.error(f"File upload failed: {str(e)}", exc_info=True)
        return jsonify({
            'success': False,
            'error': f"File upload failed: {str(e)}"
        }), 500
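

# Sketch (not part of the original app): upload_file() above extracts the
# archive without inspecting its contents. A pre-check along these lines could
# reject oversized archives before extraction. The helper name and the default
# limit are assumptions made for illustration only.
import zipfile


def zip_uncompressed_size_ok(zip_path, max_bytes=2 * 1024 ** 3):
    """Return True if the archive's declared uncompressed size is within max_bytes."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        # file_size is the uncompressed size recorded for each member
        total = sum(info.file_size for info in zf.infolist())
    return total <= max_bytes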


def detect_file_type(directory):
    """
    Detect the type of CT data in a directory.
    Supports DICOM and MHD/RAW.
    Returns: (file type, list of file paths)
    """
    # Collect all files recursively
    file_list = []
    for root, _, files in os.walk(directory):
        for file in files:
            file_list.append(os.path.join(root, file))
    app.logger.debug(f"Scanned {directory}: {len(file_list)} file(s)")
    # Look for MHD files
    mhd_files = [f for f in file_list if f.lower().endswith('.mhd')]
    if mhd_files:
        # Check that the matching RAW file exists
        for mhd_file in mhd_files:
            # Derive the RAW file name by swapping the extension
            raw_file = os.path.splitext(mhd_file)[0] + '.raw'
            # Compare case-insensitively
            if any(f.lower() == raw_file.lower() for f in file_list):
                # Found an MHD file with its RAW companion
                return 'MHD/RAW', [mhd_file]

    # Look for DICOM files
    dicom_files = [f for f in file_list if f.lower().endswith(('.dcm', '.dicom')) or
                   (os.path.isfile(f) and is_dicom_file(f))]

    if dicom_files:
        return 'DICOM', dicom_files

    # No supported file type found
    return None, []
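

# Sketch (not part of the original app): detect_file_type() above relies on an
# is_dicom_file() helper that is defined elsewhere in this module. One common
# way to write such a check, assuming DICOM Part 10 files, is to look for the
# "DICM" marker that follows the 128-byte preamble. The name looks_like_dicom
# is hypothetical, and files written without a preamble would not be recognised.
def looks_like_dicom(path):
    """Return True if the file carries the DICOM Part 10 'DICM' marker."""
    try:
        with open(path, 'rb') as f:
            f.seek(128)  # skip the 128-byte preamble
            return f.read(4) == b'DICM'
    except OSError:
        # Unreadable or missing files are treated as non-DICOM
        return False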


def start_preprocessing(session_id):
    """
    Start CT preprocessing.
    Converts the CT data into segmented lung data.
    """
    try:
        # Fetch the session state
        if session_id not in SESSION_DATA:
            logger.error(f"Session {session_id} does not exist")
            return

        state = SESSION_DATA[session_id]
        state['status'] = 'preprocessing'
        state['progress'] = 0
        state['message'] = 'Starting preprocessing...'

        # Sync the session state to the detector
        detector.update_session_state(session_id, state)

        # Locate the session directory
        session_dir = os.path.join(UPLOAD_FOLDER, session_id)
        # Load the session information
        with open(os.path.join(session_dir, 'session_info.json'), 'r') as f:
            session_info = json.load(f)
        logger.debug(f"Loaded session info: {session_info}")

        file_type = session_info['file_type']
        extract_dir = session_info['extract_dir']  # use the stored extraction directory

        # Update the state
        state['progress'] = 10
        state['message'] = f'Loading {file_type} CT data...'
        # Sync the state update
        detector.update_session_state(session_id, state)

        # Load the data according to the file type
        ct_data = None

        logger.debug(f"Using extraction directory: {extract_dir}")

        if file_type == 'MHD/RAW':
            # For MHD, pass the path of the .mhd file itself
            ct_data = CTData.from_mhd(session_info['files'][0])

        elif file_type == 'DICOM':
            # For DICOM, pass the directory that holds all DICOM files
            ct_data = CTData.from_dicom(extract_dir)

        # Check that loading succeeded
        if ct_data is None:
            state['status'] = 'error'
            state['message'] = 'Failed to load CT data'
            SESSION_DATA[session_id] = state
            detector.update_session_state(session_id, state)
            logger.error(f"Failed to load CT data, session ID: {session_id}")
            return

        # Update the state
        state['progress'] = 30
        state['message'] = 'Segmenting lungs...'
        # Sync the state update
        detector.update_session_state(session_id, state)

        # Resample, then run the lung segmentation
        ct_data.resample_pixel()
        ct_data.filter_lung_img_mask()
        lung_seg = ct_data.lung_seg_img

        # Save the lung segmentation result
        lung_seg_path = os.path.join(session_dir, 'lung_seg.npy')
        np.save(lung_seg_path, lung_seg)

        # Update the state: lung segmentation finished
        state['progress'] = 40
        state['message'] = 'Lung segmentation finished, ready for detection'
        state['status'] = 'preprocessed'  # mark as preprocessed as soon as segmentation is done
        # Record the path of the segmentation result
        state['lung_seg_path'] = lung_seg_path
        SESSION_DATA[session_id] = state
        detector.update_session_state(session_id, state)

        # Start detection automatically, passing the path that matches the file type
        if file_type == 'MHD/RAW':
            # For MHD, use the file path
            detector.detect(session_info['files'][0], session_id, patient_id=None)
        else:
            # For DICOM, use the directory path
            detector.detect(extract_dir, session_id, patient_id=None)

        # Save every slice image alongside the segmentation data
        save_lung_segmentation_slices(session_id, lung_seg)

    except Exception as e:
        logger.error(f"Preprocessing failed: {str(e)}", exc_info=True)

        # Mark the session as failed
        if session_id in SESSION_DATA:
            SESSION_DATA[session_id]['status'] = 'error'
            SESSION_DATA[session_id]['message'] = f'Preprocessing failed: {str(e)}'
            detector.update_session_state(session_id, SESSION_DATA[session_id])


@app.route('/api/detect', methods=['POST'])
def start_detection():
    """Start the detection process."""
    try:
        # Read the request parameters; a missing or non-JSON body yields an empty dict
        # instead of raising on data.get(...)
        data = request.get_json(silent=True) or {}
        session_id = data.get('session_id')
        patient_id = data.get('patient_id', session_id)

        if not session_id:
            return jsonify({'success': False, 'error': 'Missing session ID'}), 400

        # Locate the session folder
        session_folder = os.path.join(UPLOAD_FOLDER, session_id)
        if not os.path.exists(session_folder):
            return jsonify({'success': False, 'error': f'Session {session_id} does not exist'}), 404

        # Read session_info.json to get the correct file path
        session_info_path = os.path.join(session_folder, 'session_info.json')
        if not os.path.exists(session_info_path):
            return jsonify({'success': False, 'error': 'Session info file does not exist'}), 404

        # Load the session information
        with open(session_info_path, 'r') as f:
            session_info = json.load(f)

        # Pick the path that matches the file type
        file_type = session_info.get('file_type')
        extract_dir = session_info.get('extract_dir')

        if not file_type or not extract_dir:
            return jsonify({'success': False, 'error': 'Session info is incomplete'}), 400

        # Same logic as start_preprocessing
        if file_type == 'MHD/RAW':
            # For MHD, use the file path
            file_path = session_info['files'][0]
        else:
            # For DICOM, use the directory path
            file_path = extract_dir

        logger.info(f"Starting detection with path {file_path}")

        # Start detection
        if not detector.detect(file_path, session_id, patient_id):
            return jsonify({'success': False, 'error': 'Failed to start detection; make sure the model is loaded'}), 500

        # Return the session ID; the client polls progress with it
        return jsonify({
            'success': True,
            'session_id': session_id,
            'message': 'Detection started'
        })

    except Exception as e:
        logger.error(f"Error starting detection: {str(e)}", exc_info=True)
        return jsonify({'success': False, 'error': str(e)}), 500


@app.route('/api/progress/<session_id>', methods=['GET'])
def get_progress(session_id):
    """Return the detection progress."""
    try:
        # Fetch the session state, trying the detector first
        state = detector.get_session_state(session_id)

        # Fall back to SESSION_DATA if the detector has no entry
        if not state and session_id in SESSION_DATA:
            state = SESSION_DATA[session_id]
            logger.info(f"Read state of session {session_id} from SESSION_DATA")

        # Report an error if neither source knows the session
        if not state:
            return jsonify({'success': False, 'error': f'Session {session_id} does not exist'}), 404

        # Build the response
        response = {
            'success': True,
            'session_id': session_id,
            'status': state['status'],
            'progress': state['progress'],
            'message': state['message'],
            'started_at': state.get('started_at'),
            'completed_at': state.get('completed_at')
        }

        # Add the nodule count once detection has completed
        if state['status'] == STATUS_COMPLETED and 'nodules' in state:
            response['nodules_count'] = len(state['nodules'])

        # Add the error details if detection failed
        if state['status'] == STATUS_ERROR and 'error' in state:
            response['error'] = state['error']

        return jsonify(response)

    except Exception as e:
        logger.error(f"Error fetching progress: {str(e)}", exc_info=True)
        return jsonify({'success': False, 'error': str(e)}), 500


@app.route('/api/results/<session_id>', methods=['GET'])
def get_results(session_id):
    """Return the detection results."""
    try:
        # Fetch the session state
        state = detector.get_session_state(session_id)
        if not state:
            return jsonify({'success': False, 'error': f'Session {session_id} does not exist'}), 404
        logger.debug(f"/api/results status: {state['status']}")
        # Make sure detection has completed
        if state['status'] != STATUS_COMPLETED:
            return jsonify({
                'success': False,
                'error': 'Detection has not completed yet',
                'status': state['status'],
                'progress': state['progress']
            }), 400

        # Handle the case with no nodule data
        if 'nodules' not in state or not state['nodules']:
            return jsonify({
                'success': True,
                'session_id': session_id,
                'nodules_count': 0,
                'nodules': [],
                'message': 'No nodules detected'
            })

        # Load the segmented volume once instead of once per nodule
        lung_seg_path = os.path.join(UPLOAD_FOLDER, session_id, 'lung_seg.npy')
        lung_seg_img = np.load(lung_seg_path) if os.path.exists(lung_seg_path) else None
        if lung_seg_img is None:
            logger.warning(f"Lung segmentation data not found, skipping bounding boxes: {lung_seg_path}")

        # Return the nodule information.
        # To keep the payload small, the full cube data is not returned.
        nodules_info = []
        for nodule in state['nodules']:
            # Copy the nodule fields, leaving out the cube data
            nodule_info = {
                'id': nodule['id'],
                'voxel_coords': nodule['voxel_coords'],
                'world_coords': nodule['world_coords'],
                'diameter_mm': nodule['diameter_mm'],
                'probability': nodule['probability']
            }
            nodules_info.append(nodule_info)
            # Redraw the affected slices with the nodule's bounding box
            if lung_seg_img is not None:
                save_slice_box(session_id, lung_seg_img, nodule['voxel_coords'], radius=32)

        return jsonify({
            'success': True,
            'session_id': session_id,
            'nodules_count': len(nodules_info),
            'nodules': nodules_info,
            'message': f'Detection finished, found {len(nodules_info)} nodule(s)'
        })

    except Exception as e:
        logger.error(f"Error fetching results: {str(e)}", exc_info=True)
        return jsonify({'success': False, 'error': str(e)}), 500


@app.route('/api/nodule/<session_id>/<int:nodule_id>', methods=['GET'])
def get_nodule_data(session_id, nodule_id):
    """Return the data of a single nodule."""
    try:
        # Make sure the session exists
        if session_id not in SESSION_DATA:
            return jsonify({"success": False, "error": "Session does not exist"}), 404

        # Fetch the session data
        session = SESSION_DATA[session_id]
        app.logger.debug(f"/api/nodule status: {session['status']}")
        # Make sure detection has finished
        if session['status'] not in ['preprocessed', 'completed']:
            return jsonify({"success": False, "error": "CT detection has not finished"}), 400
        # Fetch the detection results
        results = session.get('results', {})
        nodules = results.get('nodules', [])
        # Look up the requested nodule
        target_nodule = None
        for nodule in nodules:
            if nodule.get('id') == nodule_id:
                target_nodule = nodule
                break
        if not target_nodule:
            return jsonify({"success": False, "error": "Nodule not found"}), 404
        # Return the nodule data
        return jsonify({
            "success": True,
            "nodule": target_nodule
        })

    except Exception as e:
        app.logger.error(f"Error fetching nodule data: {str(e)}")
        return jsonify({"success": False, "error": f"Error fetching nodule data: {str(e)}"}), 500


# Generate and save images for every nodule once detection has finished
def generate_nodule_images(session_id, nodules, lung_img):
    """
    Generate and save slice images for every nodule.

    Args:
        session_id: session ID
        nodules: list of nodules
        lung_img: segmented lung volume
    """
    try:
        # Make sure the nodule image directory exists
        nodule_images_dir = os.path.join(UPLOAD_FOLDER, session_id, 'nodule_images')
        os.makedirs(nodule_images_dir, exist_ok=True)
        app.logger.info(f"Generating slice images for {len(nodules)} nodule(s)")

        # Log statistics of the lung volume for debugging
        app.logger.info(f"Lung volume stats: shape={lung_img.shape}, dtype={lung_img.dtype}, "
                        f"min={lung_img.min()}, max={lung_img.max()}, "
                        f"mean={lung_img.mean()}, median={np.median(lung_img)}")

        # Check whether the lung volume is all zeros or close to it
        if np.max(lung_img) < 0.01:
            app.logger.warning("Lung volume is almost entirely zero; nodule images may render black")
            # Boost contrast: scale by 100 and clip to [0, 1]
            lung_img = (lung_img * 100).clip(0, 1)
            app.logger.info(
                f"Stats after contrast boost: min={lung_img.min()}, max={lung_img.max()}, mean={lung_img.mean()}")

        # Process each nodule
        for nodule in nodules:
            nodule_id = nodule.get('id')
            # Read the nodule centre
            voxel_coords = nodule.get('voxel_coords', [0, 0, 0])
            app.logger.info(f"Processing nodule {nodule_id}, original coordinates (xyz): {voxel_coords}")
            # Convert the coordinates to integers (still in xyz order)
            x, y, z = [int(coord) for coord in voxel_coords]
            # Reorder to z,y,x to match the axis order of lung_img
            center_voxel_zyx = [z, y, x]
            app.logger.info(f"Converted coordinates (zyx): {center_voxel_zyx}")
            # Make sure the centre lies inside the lung volume
            if (center_voxel_zyx[0] < 0 or center_voxel_zyx[0] >= lung_img.shape[0] or
                    center_voxel_zyx[1] < 0 or center_voxel_zyx[1] >= lung_img.shape[1] or
                    center_voxel_zyx[2] < 0 or center_voxel_zyx[2] >= lung_img.shape[2]):
                app.logger.error(
                    f"Nodule {nodule_id} lies outside the lung volume: {center_voxel_zyx}, volume shape: {lung_img.shape}")
                continue
            # Half the side length of the cube
            half_size = 16  # CUBE_SIZE/2 = 16
            # Compute the cube bounds, clipped to the volume
            z_min = max(0, center_voxel_zyx[0] - half_size)
            y_min = max(0, center_voxel_zyx[1] - half_size)
            x_min = max(0, center_voxel_zyx[2] - half_size)

            z_max = min(lung_img.shape[0], center_voxel_zyx[0] + half_size)
            y_max = min(lung_img.shape[1], center_voxel_zyx[1] + half_size)
            x_max = min(lung_img.shape[2], center_voxel_zyx[2] + half_size)

            # Make sure the extracted range is valid
            if z_min >= z_max or y_min >= y_max or x_min >= x_max:
                app.logger.error(
                    f"Nodule {nodule_id} has an invalid range: z({z_min}-{z_max}), y({y_min}-{y_max}), x({x_min}-{x_max})")
                continue

            # Extract the sub-volume
            cube = lung_img[z_min:z_max, y_min:y_max, x_min:x_max]
            # Make sure the cube is not empty
            if cube.size == 0:
                app.logger.error(f"Nodule {nodule_id}: extracted cube is empty")
                continue

            app.logger.info(f"Extracted cube for nodule {nodule_id}, shape: {cube.shape}")
            app.logger.info(f"Cube stats: min={cube.min()}, max={cube.max()}, mean={cube.mean()}")

            # Generate one image per plane
            for plane_type in ['axial', 'coronal', 'sagittal']:
                try:
                    # Pick the centre slice for the plane
                    if plane_type == 'axial':  # slice along Z
                        if cube.shape[0] == 0:
                            app.logger.error(f"Nodule {nodule_id} has size 0 along the axial axis")
                            continue
                        center_index = min(cube.shape[0] // 2, cube.shape[0] - 1)
                        slice_data = cube[center_index, :, :]
                    elif plane_type == 'coronal':  # slice along Y
                        if cube.shape[1] == 0:
                            app.logger.error(f"Nodule {nodule_id} has size 0 along the coronal axis")
                            continue
                        center_index = min(cube.shape[1] // 2, cube.shape[1] - 1)
                        slice_data = cube[:, center_index, :]
                    elif plane_type == 'sagittal':  # slice along X
                        if cube.shape[2] == 0:
                            app.logger.error(f"Nodule {nodule_id} has size 0 along the sagittal axis")
                            continue
                        center_index = min(cube.shape[2] // 2, cube.shape[2] - 1)
                        slice_data = cube[:, :, center_index]

                    # Make sure the slice is not empty
                    if slice_data.size == 0:
                        app.logger.error(f"Nodule {nodule_id}: {plane_type} slice is empty")
                        continue

                    app.logger.info(
                        f"Slice stats: min={slice_data.min()}, max={slice_data.max()}, mean={slice_data.mean()}")

                    # Boost contrast when every value is close to 0
                    if slice_data.max() < 0.1:
                        # Scale by 100 and clip to [0, 1] to improve visibility
                        slice_data = np.clip(slice_data * 100, 0, 1)
                        app.logger.info(f"After contrast boost: min={slice_data.min()}, max={slice_data.max()}")
                    # Normalise to uint8
                    if np.all(slice_data == 0):  # all zeros
                        app.logger.warning("Nodule slice data is all zeros")
                        normalized_slice = np.zeros(slice_data.shape, dtype=np.uint8)
                    elif slice_data.min() >= 0 and slice_data.max() <= 1:
                        # Data in [0, 1]: scale to 0-255
                        normalized_slice = (slice_data * 255).astype(np.uint8)
                    elif slice_data.min() >= 0 and slice_data.max() <= 255:
                        # Data already in [0, 255]
                        normalized_slice = slice_data.astype(np.uint8)
                    else:
                        # Any other range: min-max scale to 0-255
                        min_val = slice_data.min()
                        max_val = slice_data.max()
                        if max_val > min_val:
                            normalized_slice = ((slice_data - min_val) / (max_val - min_val) * 255).astype(np.uint8)
                        else:
                            normalized_slice = np.zeros_like(slice_data, dtype=np.uint8)

                    app.logger.info(
                        f"Stats after normalisation: min={normalized_slice.min()}, max={normalized_slice.max()}, mean={normalized_slice.mean()}")

                    # Apply extra enhancement if the image is still dark
                    if normalized_slice.max() < 50:
                        # Apply CLAHE (contrast-limited adaptive histogram equalisation)
                        app.logger.info("Applying CLAHE to boost contrast")
                        try:
                            clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
                            normalized_slice = clahe.apply(normalized_slice)
                        except Exception:
                            # A missing cv2 surfaces here as NameError, not ImportError,
                            # so catch broadly and fall back to a simple linear stretch
                            normalized_slice = np.clip(normalized_slice * 5, 0, 255).astype(np.uint8)

                    # Create the figure
                    plt.figure(figsize=(6, 6))
                    plt.imshow(normalized_slice, cmap='gray', vmin=0, vmax=255)

                    # Add the title
                    view_names = {
                        'axial': 'Axial view (XY)',
                        'coronal': 'Coronal view (XZ)',
                        'sagittal': 'Sagittal view (YZ)'
                    }
                    plt.title(f"Nodule {nodule_id} - {view_names.get(plane_type, plane_type)}")

                    # Mark the centre
                    center_y, center_x = normalized_slice.shape[0] // 2, normalized_slice.shape[1] // 2
                    plt.plot(center_x, center_y, 'r+', markersize=10)

                    # Mark the diameter. diameter_mm is used directly as a pixel count,
                    # which assumes the volume was resampled to 1 mm spacing.
                    diameter = nodule.get('diameter_mm', 10)
                    radius_pixels = diameter / 2
                    circle = plt.Circle((center_x, center_y), radius_pixels,
                                        color='r', fill=False, linestyle='--')
                    plt.gca().add_patch(circle)

                    # Add a colour bar to show the pixel values
                    plt.colorbar(label='Pixel value')

                    # Hide the axes
                    plt.axis('off')
                    plt.tight_layout()

                    # Build the output file name and path
                    img_filename = f"nodule_{nodule_id}_{plane_type}.png"
                    img_path = os.path.join(nodule_images_dir, img_filename)

                    # Save the image
                    plt.savefig(img_path, format='png', dpi=100)
                    plt.close('all')  # make sure every figure is closed

                    app.logger.info(f"Saved nodule image: {img_path}")
                except Exception as slice_error:
                    app.logger.error(f"Error generating {plane_type} image for nodule {nodule_id}: {str(slice_error)}",
                                     exc_info=True)

        app.logger.info("Nodule image generation finished")
        return True

    except Exception as e:
        app.logger.error(f"Error generating nodule images: {str(e)}", exc_info=True)
        return False


# Generate nodule images as soon as detection completes
def update_session_state(session_id, state):
    """Update the session status; `state` is a status string such as 'completed'."""
    if session_id in SESSION_DATA:
        SESSION_DATA[session_id]['status'] = state

        # Once detection has completed, generate images for every nodule
        if state == 'completed' and 'results' in SESSION_DATA[session_id]:
            nodules = SESSION_DATA[session_id]['results'].get('nodules', [])
            if nodules:
                app.logger.info("Detection completed, generating nodule images")
                # Load the lung segmentation data used to render the images
                try:
                    lung_seg_path = os.path.join(UPLOAD_FOLDER, session_id, 'lung_seg.npy')
                    if os.path.exists(lung_seg_path):
                        lung_img = np.load(lung_seg_path)
                        # Generate the nodule images
                        generate_nodule_images(session_id, nodules, lung_img)
                    else:
                        app.logger.error("Cannot generate nodule images: lung segmentation data does not exist")
                except Exception as e:
                    app.logger.error(f"Error generating nodule images: {str(e)}", exc_info=True)
    else:
        # Create a new session
        SESSION_DATA[session_id] = {
            'status': state,
            'progress': 0
        }

def save_slice_box(session_id, lung_seg, voxel_coords, radius=32):
    """Redraw the Z slices around a nodule with a red bounding box.

    Despite its name, `radius` is the full side length of the box in voxels.
    """
    x, y, z = [int(coord) for coord in voxel_coords]
    half = int(radius / 2)
    # Clamp the Z range so negative indices cannot wrap around and
    # indices past the last slice cannot raise IndexError
    from_z, end_z = max(0, z - half), min(lung_seg.shape[0] - 1, z + half)
    from_x, end_x = x - half, x + half
    from_y, end_y = y - half, y + half
    slices_dir = os.path.join(UPLOAD_FOLDER, session_id, 'lung_slices')
    # cv2.imwrite fails silently when the directory is missing
    os.makedirs(slices_dir, exist_ok=True)
    for z_index in range(from_z, end_z + 1):
        slice_data = lung_seg[z_index, :, :]
        # Scale to 0-255 (assumes the segmented volume is normalised to [0, 1])
        normalized_slice = (slice_data * 255).astype(np.uint8)
        # Convert grayscale to BGR so the box can be drawn in colour
        color_slice = cv2.cvtColor(normalized_slice, cv2.COLOR_GRAY2BGR)
        # Keep the box corners inside the image
        from_y_safe = max(0, from_y)
        end_y_safe = min(normalized_slice.shape[0] - 1, end_y)
        from_x_safe = max(0, from_x)
        end_x_safe = min(normalized_slice.shape[1] - 1, end_x)
        # Draw a red box (BGR order) with line width 2
        cv2.rectangle(color_slice, (from_x_safe, from_y_safe), (end_x_safe, end_y_safe), (0, 0, 255), 2)
        # Overwrite the plain slice image with the annotated one
        img_filename = f"z_slice_{z_index:04d}.png"
        img_path = os.path.join(slices_dir, img_filename)
        cv2.imwrite(img_path, color_slice)
        app.logger.debug(f"Saved nodule bounding box to {img_path}")
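

# Sketch (not part of the original app): save_slice_box() above draws a box of
# a fixed side length around a voxel centre, which can run past the edges of
# the volume. The clipping can be factored into a small helper like this one.
# The name clamp_box_range and its signature are assumptions for illustration.
def clamp_box_range(center, size, limit):
    """Return (start, stop) for a box of about `size` voxels around `center`, clipped to [0, limit)."""
    half = size // 2
    start = max(0, center - half)
    stop = min(limit, center + half + 1)  # stop is exclusive, as in range()
    return start, stop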

# Save every slice image alongside the lung segmentation data
def save_lung_segmentation_slices(session_id, lung_seg):
    """
    Save all Z-axis slice images of the lung segmentation.

    Args:
        session_id: session ID
        lung_seg: segmented lung volume (3D array)
    """
    try:
        # Create the slice image directory
        slices_dir = os.path.join(UPLOAD_FOLDER, session_id, 'lung_slices')
        os.makedirs(slices_dir, exist_ok=True)
        app.logger.info(f"Saving lung slice images, shape: {lung_seg.shape}")
        # Number of slices along each axis
        z_slices = lung_seg.shape[0]
        y_slices = lung_seg.shape[1]
        x_slices = lung_seg.shape[2]

        # Build the index file that the frontend loads
        slice_info = {
            "dimensions": lung_seg.shape,
            "z_slices": z_slices,
            "y_slices": y_slices,
            "x_slices": x_slices,
            "z_axis": [],
            "y_axis": [],
            "x_axis": []
        }
        # Save the Z-axis (axial) slices
        app.logger.info(f"Saving {z_slices} Z-axis slices...")
        for z in range(z_slices):
            # Take the slice
            slice_data = lung_seg[z, :, :]
            # Scale to 0-255 (assumes the segmented volume is normalised to [0, 1])
            normalized_slice = (slice_data * 255).astype(np.uint8)
            # Build the image file name and path
            img_filename = f"z_slice_{z:04d}.png"
            img_path = os.path.join(slices_dir, img_filename)
            # Save the image
            cv2.imwrite(img_path, normalized_slice)
            # Add the slice to the index
            slice_info["z_axis"].append({
                "index": z,
                "filename": img_filename,
                "path": f"/api/lung_slice/{session_id}/z/{z}"
            })

            # Log progress every 10 slices
            if z % 10 == 0 or z == z_slices - 1:
                app.logger.info(f"Saved Z-axis slice {z + 1}/{z_slices}")
        # Save the index file
        index_path = os.path.join(slices_dir, 'slices_index.json')
        with open(index_path, 'w') as f:
            json.dump(slice_info, f)

        # Only Z-axis slices are written; the Y and X counts are recorded in the index only
        app.logger.info(f"Finished saving lung slice images: {z_slices} Z-axis slices")
        return True

    except Exception as e:
        app.logger.error(f"Error saving lung slice images: {str(e)}", exc_info=True)
        return False

# API endpoint that serves lung slice images
@app.route('/api/lung_slice/<session_id>/z/<int:slice_index>', methods=['GET'])
def get_lung_z_slice(session_id, slice_index):
    """Return a Z-axis slice image of the lungs."""
    try:
        app.logger.info(f"Z-axis slice requested - session ID: {session_id}, slice index: {slice_index}")
        # Make sure the session exists
        if session_id not in SESSION_DATA:
            app.logger.error(f"Session does not exist: {session_id}")
            return jsonify({"success": False, "error": "Session does not exist"}), 404

        # Path of the image file
        img_filename = f"z_slice_{slice_index:04d}.png"
        img_path = os.path.join(UPLOAD_FOLDER, session_id, 'lung_slices', img_filename)
        app.logger.info(f"Slice image path: {img_path}")

        # Return the image if it exists
        if os.path.exists(img_path):
            return send_file(img_path, mimetype='image/png')

        # Without this branch the view would return None and Flask would raise an error
        app.logger.error(f"Slice image not found: {img_path}")
        return jsonify({"success": False, "error": "Slice image not found"}), 404

    except Exception as e:
        app.logger.error(f"Error fetching lung Z-axis slice: {str(e)}", exc_info=True)
        return jsonify({"success": False, "error": f"Error fetching lung Z-axis slice: {str(e)}"}), 500

# API endpoint that returns lung slice metadata
@app.route('/api/lung_slices_info/<session_id>', methods=['GET'])
def get_lung_slices_info(session_id):
    """Return the lung slice index."""
    app.logger.debug(f"/api/lung_slices_info requested for session {session_id}")
    # Make sure the session exists
    if session_id not in SESSION_DATA:
        return jsonify({"success": False, "error": "Session does not exist"}), 404
    # Path of the index file
    index_path = os.path.join(UPLOAD_FOLDER, session_id, 'lung_slices', 'slices_index.json')
    # Return the index file if it exists
    if os.path.exists(index_path):
        with open(index_path, 'r') as f:
            slices_info = json.load(f)
        return jsonify({"success": True, "slices_info": slices_info})
    app.logger.debug(f"Index file {index_path} not found, rebuilding from the npy file")
    # Without an index file, rebuild the basic information from the segmentation data
    lung_seg_path = os.path.join(UPLOAD_FOLDER, session_id, 'lung_seg.npy')
    if not os.path.exists(lung_seg_path):
        return jsonify({"success": False, "error": "Lung segmentation data does not exist"}), 404

    # Load the data and build the basic information
    lung_seg = np.load(lung_seg_path)

    # Number of slices along each axis
    z_slices = lung_seg.shape[0]
    y_slices = lung_seg.shape[1]
    x_slices = lung_seg.shape[2]

    # Use the same format as save_lung_segmentation_slices
    slices_info = {
        "dimensions": lung_seg.shape,
        "z_slices": z_slices,
        "y_slices": y_slices,
        "x_slices": x_slices,
        "z_axis": [],
        "y_axis": [],
        "x_axis": []
    }
    # Add the Z-axis slice entries
    for z in range(z_slices):
        slices_info["z_axis"].append({
            "index": z,
            "filename": f"z_slice_{z:04d}.png",
            "path": f"/api/lung_slice/{session_id}/z/{z}"
        })
    return jsonify({"success": True, "slices_info": slices_info})

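# Start the server
if __name__ == '__main__':
    # Make sure the upload directory exists
    os.makedirs(UPLOAD_FOLDER, exist_ok=True)
    # Enable automatic detection
    app.config['AUTO_DETECT'] = True
    # Start the app. The Werkzeug debugger allows code execution, so it is only
    # enabled on request (FLASK_DEBUG=1) while the server listens on 0.0.0.0.
    app.run(host='0.0.0.0', port=5000, debug=os.environ.get('FLASK_DEBUG', '0') == '1')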


================================================
FILE: deploy/backend/data/c3d_nodule_detect.pth
================================================
[File too large to display: 67.5 MB]

================================================
FILE: deploy/backend/dataclass/CTData.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import SimpleITK as sitk
from scipy import ndimage
from enum import Enum
import matplotlib.pyplot as plt
from util.dicom_util import load_dicom_slices, get_pixels_hu, get_dicom_thickness
from util.seg_util import get_segmented_lungs, normalize_hu_values


class CTFormat(Enum):
    DICOM = 1
    MHD = 2
    UNKNOWN = 3

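class CTData:
    """
    Unified CT data class for handling CT images in different formats.
    Supports loading, processing and analysing DICOM and MHD data.
    """
    def __init__(self):
        # Basic attributes
        self.pixel_data = None  # pixel data, 3D voxel array (z,y,x)
        self.lung_seg_img = None    # CT image data restricted to the lungs
        self.lung_seg_mask = None   # lung mask of the CT
        self.origin = None  # coordinate origin (x,y,z) in mm
        self.spacing = None  # voxel spacing (x,y,z) in mm
        self.orientation = None  # direction matrix
        self.z_axis_flip = False    # whether the z axis is flipped
        self.size = None  # image size (z,y,x)
        self.data_format = None  # data format (DICOM/MHD)
        self.metadata = {}  # other metadata
        self.hu_converted = False  # whether values are already in HU
        self.preprocessed = False   # whether the data has been preprocessed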

    @classmethod
    def from_dicom(cls, dicom_path):
        """
        Load CT data from a DICOM folder.

        Args:
            dicom_path: path of the DICOM folder

        Returns:
            CTData object
        """
        ct_data = cls()
        ct_data.data_format = CTFormat.DICOM
        slices = load_dicom_slices(dicom_path)
        ct_data.pixel_data = get_pixels_hu(slices)
        # Comparing two slices needs at least two of them
        ct_data.z_axis_flip = (len(slices) > 1 and
                               slices[1].ImagePositionPatient[2] > slices[0].ImagePositionPatient[2])
        ct_data.hu_converted = True
        slice_thickness = get_dicom_thickness(slices)
        # Set the voxel spacing as (x, y, z). DICOM PixelSpacing is
        # [row spacing, column spacing], i.e. [y, x].
        try:
            ct_data.spacing = [
                float(slices[0].PixelSpacing[1]),
                float(slices[0].PixelSpacing[0]),
                float(slice_thickness)
            ]
        except Exception:
            print("Warning: could not read the pixel spacing, using the default [1.0, 1.0, 1.0]")
            ct_data.spacing = [1.0, 1.0, 1.0]
        # Set the origin
        try:
            ct_data.origin = [
                float(slices[0].ImagePositionPatient[0]),
                float(slices[0].ImagePositionPatient[1]),
                float(slices[0].ImagePositionPatient[2])
            ]
        except Exception:
            print("Warning: could not read the origin, using the default [0.0, 0.0, 0.0]")
            ct_data.origin = [0.0, 0.0, 0.0]
        # Set the size
        ct_data.size = ct_data.pixel_data.shape
        return ct_data

    @classmethod
    def from_mhd(cls, mhd_path):
        """
        Load CT data from an MHD/RAW file pair.

        Args:
            mhd_path: path of the MHD file

        Returns:
            CTData object
        """
        ct_data = cls()
        ct_data.data_format = CTFormat.MHD
        try:
            # Load the MHD file with SimpleITK
            itk_img = sitk.ReadImage(mhd_path)
            # Read the pixel data (SimpleITK returns the array in z,y,x order)
            ct_data.pixel_data = sitk.GetArrayFromImage(itk_img)
            # LUNA16 MHD data is already in HU
            ct_data.hu_converted = True
            # Read the origin and the voxel spacing
            ct_data.origin = list(itk_img.GetOrigin())  # (x,y,z)
            ct_data.spacing = list(itk_img.GetSpacing())  # (x,y,z)
            # Read the size
            ct_data.size = ct_data.pixel_data.shape
            # Read the direction information
            ct_data.orientation = itk_img.GetDirection()
            ct_data.z_axis_flip = False
        except Exception as e:
            # Chain the original exception so the root cause stays in the traceback
            raise ValueError(f"Error loading MHD file: {e}") from e

        return ct_data

    def convert_to_hu(self):
        """
        Convert pixel values to HU (if not already converted).
        """
        if self.hu_converted:
            print("Data is already in HU")
            return

        if self.data_format == CTFormat.DICOM:
            # Already handled in from_dicom
            self.hu_converted = True
        elif self.data_format == CTFormat.MHD:
            # LUNA16 MHD data is already in HU
            self.hu_converted = True
        else:
            raise ValueError("Unknown data format; cannot convert to HU")


    def resample_pixel(self, new_spacing=(1.0, 1.0, 1.0)):
        """
        Resample the CT volume to the given voxel spacing.

        Args:
            new_spacing: Target voxel spacing [x, y, z]

        Returns:
            The resampled CTData object
        """
        # Make sure the data has been converted to HU
        if not self.hu_converted:
            self.convert_to_hu()
        # pixel_data is stored as [z, y, x], so reorder both spacings to match before calling scipy.ndimage
        spacing_zyx = [self.spacing[2], self.spacing[1], self.spacing[0]]
        new_spacing_zyx = [new_spacing[2], new_spacing[1], new_spacing[0]]
        # Compute the new shape
        resize_factor = np.array(spacing_zyx) / np.array(new_spacing_zyx)
        new_shape = np.round(np.array(self.pixel_data.shape) * resize_factor)
        # Compute the actual resampling factor after rounding
        real_resize = new_shape / np.array(self.pixel_data.shape)
        # Resample with trilinear interpolation (order=1)
        resampled_data = ndimage.zoom(self.pixel_data, real_resize, order=1)
        # Create a new CTData object
        resampled_ct = CTData()
        resampled_ct.pixel_data = resampled_data
        # Copy so the result never shares the caller's (or the default) sequence
        resampled_ct.spacing = list(new_spacing)
        resampled_ct.origin = self.origin
        resampled_ct.orientation = self.orientation
        resampled_ct.z_axis_flip = self.z_axis_flip
        resampled_ct.size = resampled_data.shape
        resampled_ct.data_format = self.data_format
        resampled_ct.hu_converted = self.hu_converted
        resampled_ct.preprocessed = self.preprocessed
        return resampled_ct

    def filter_lung_img_mask(self):
        """
        Keep only the lung-region pixels and normalize them to the range 0-1.
        The results are stored in self.lung_seg_img and self.lung_seg_mask.
        """
        pixel_data = self.pixel_data.copy()
        seg_img = []
        seg_mask = []
        for index in range(pixel_data.shape[0]):
            one_seg_img, one_seg_mask = get_segmented_lungs(pixel_data[index])
            one_seg_img = normalize_hu_values(one_seg_img)
            seg_img.append(one_seg_img)
            seg_mask.append(one_seg_mask)
        self.lung_seg_img = np.array(seg_img)
        self.lung_seg_mask = np.array(seg_mask)

    def world_to_voxel(self, world_coord):
        """
        Convert world coordinates (mm) to voxel coordinates.

        Args:
            world_coord: World coordinates [x,y,z] (mm)

        Returns:
            Voxel coordinates [x,y,z]
        """
        voxel_coord = np.zeros(3, dtype=int)
        for i in range(3):
            voxel_coord[i] = int(round((world_coord[i] - self.origin[i]) / self.spacing[i]))

        return voxel_coord

    def voxel_to_world(self, voxel_coord):
        """
        Convert voxel coordinates to world coordinates (mm).

        Args:
            voxel_coord: Voxel coordinates [x,y,z]

        Returns:
            World coordinates [x,y,z] (mm)
        """
        world_coord = np.zeros(3, dtype=float)
        for i in range(3):
            world_coord[i] = voxel_coord[i] * self.spacing[i] + self.origin[i]

        return world_coord

    def extract_cube(self, center_world_mm, size_mm, if_fixed_radius=False):
        """
        Extract a cubic region with the given center and size.

        Args:
            center_world_mm:    World coordinates of the cube center [x,y,z] (mm)
            size_mm:            Scalar cube size. In mm by default; in voxels when
                                if_fixed_radius is True
            if_fixed_radius:    Whether to use a fixed size. Defaults to False, meaning each
                                nodule has its own size taken from the annotation file

        Returns:
            Pixel data of the cube (smaller than requested if clipped at the volume border)
        """
        # Make sure the data has been loaded
        if self.pixel_data is None:
            raise ValueError("No data loaded")
        if self.lung_seg_img is None:
            print("Lung region has not been segmented yet; segmenting now...")
            self.filter_lung_img_mask()
        # Convert world coordinates to voxel coordinates (note: SimpleITK arrays are in z,y,x order)
        center_voxel = self.world_to_voxel(center_world_mm)
        # Reorder to z,y,x to match pixel_data
        center_voxel_zyx = [center_voxel[2], center_voxel[1], center_voxel[0]]
        # With a fixed size only the center is needed: size_mm is taken as a size in voxels
        # and half of it is cut from lung_seg_img on each side of the center
        if if_fixed_radius:
            half_size = [int(size_mm/2), int(size_mm/2), int(size_mm/2)]
        else:
            # Cube edge length in voxels. LUNA16 annotations give each nodule a different
            # size, so cubes extracted this way vary in shape; a fixed size is preferable
            size_voxel = [int(size_mm / self.spacing[2]),
                          int(size_mm / self.spacing[1]),
                          int(size_mm / self.spacing[0])]
            # Half edge length along each axis
            half_size = [s // 2 for s in size_voxel]
        # Compute the cube bounds, clipped to the volume
        z_min = max(0, center_voxel_zyx[0] - half_size[0])
        y_min = max(0, center_voxel_zyx[1] - half_size[1])
        x_min = max(0, center_voxel_zyx[2] - half_size[2])

        z_max = min(self.lung_seg_img.shape[0], center_voxel_zyx[0] + half_size[0])
        y_max = min(self.lung_seg_img.shape[1], center_voxel_zyx[1] + half_size[1])
        x_max = min(self.lung_seg_img.shape[2], center_voxel_zyx[2] + half_size[2])
        # Extract the sub-volume
        cube = self.lung_seg_img[z_min:z_max, y_min:y_max, x_min:x_max]
        return cube

    def visualize_slice(self, slice_idx=None, axis=0, show_lung_only=False):
        """
        Visualize a single slice.

        Args:
            slice_idx:          Slice index; the center slice is used if None
            axis:               Axis to slice along (0=z, 1=y, 2=x)
            show_lung_only:     Whether to show only the lungs, with everything else as black background
        """
        # Make sure the data has been loaded
        if self.pixel_data is None:
            raise ValueError("No data loaded")
        # Determine the slice index
        if slice_idx is None:
            slice_idx = self.pixel_data.shape[axis] // 2
        # Extract the slice data
        if show_lung_only:
            # Segment the lungs on demand if that has not been done yet
            if self.lung_seg_img is None:
                self.filter_lung_img_mask()
            if axis == 0:  # z axis
                slice_data = self.lung_seg_img[slice_idx, :, :]
            elif axis == 1:  # y axis
                slice_data = self.lung_seg_img[:, slice_idx, :]
            else:  # x axis
                slice_data = self.lung_seg_img[:, :, slice_idx]
        else:
            if axis == 0:  # z axis
                slice_data = self.pixel_data[slice_idx, :, :]
            elif axis == 1:  # y axis
                slice_data = self.pixel_data[:, slice_idx, :]
            else:  # x axis
                slice_data = self.pixel_data[:, :, slice_idx]
        # Create the figure
        plt.figure(figsize=(10, 8))
        # Show the image only
        plt.imshow(slice_data, cmap='gray')
        # Set the title
        axis_name = ['z', 'y', 'x'][axis]
        title = f"Slice {slice_idx} (along the {axis_name} axis)"
        plt.title(title)
        plt.colorbar(label='Pixel value')
        plt.axis('off')
        plt.tight_layout()
        plt.show()

    def visualize_nodule(self, coord_x, coord_y, coord_z, diameter):
        """
        Visualize a nodule as three orthogonal views.

        :param coord_x: World x coordinate of the nodule center (mm)
        :param coord_y: World y coordinate of the nodule center (mm)
        :param coord_z: World z coordinate of the nodule center (mm)
        :param diameter: Nodule diameter (mm)
        :return: None
        """
        # Extract the nodule cube
        cube_size = max(32, int(diameter * 1.5))  # Make sure the cube is large enough
        cube = self.extract_cube([coord_x, coord_y, coord_z], cube_size)
        # Convert to voxel coordinates
        voxel_coord = self.world_to_voxel([coord_x, coord_y, coord_z])
        # Show the three orthogonal planes
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        # Take the center slices
        center_z = cube.shape[0] // 2
        center_y = cube.shape[1] // 2
        center_x = cube.shape[2] // 2
        # Draw the three orthogonal planes
        axes[0].imshow(cube[center_z, :, :], cmap='gray')
        axes[0].set_title(f'Axial view (z={center_z})')
        axes[0].axis('off')
        axes[1].imshow(cube[:, center_y, :], cmap='gray')
        axes[1].set_title(f'Coronal view (y={center_y})')
        axes[1].axis('off')
        axes[2].imshow(cube[:, :, center_x], cmap='gray')
        axes[2].set_title(f'Sagittal view (x={center_x})')
        axes[2].axis('off')
        fig.suptitle(f"Nodule - position: ({coord_x:.1f}, {coord_y:.1f}, {coord_z:.1f})mm, " +
                     f"diameter: {diameter:.1f}mm", fontsize=14)
        plt.tight_layout()
        plt.show()

    def save_as_nifti(self, output_path):
        """
        Save the CT data in NIfTI format.

        Args:
            output_path: Output file path
        """
        # Make sure the data has been loaded
        if self.pixel_data is None:
            raise ValueError("No data loaded")

        # Create a SimpleITK image
        # Note: SimpleITK arrays are in z,y,x order
        img = sitk.GetImageFromArray(self.pixel_data)
        img.SetOrigin(self.origin)
        img.SetSpacing(self.spacing)

        if self.orientation is not None:
            img.SetDirection(self.orientation)
        # Save in NIfTI format
        sitk.WriteImage(img, output_path)
        print(f"Saved in NIfTI format: {output_path}")
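

# --- Illustrative sketch (not part of the original module) -----------------
# CTData.world_to_voxel and CTData.voxel_to_world above convert one axis at a
# time in a loop. A minimal vectorized version might look like the functions
# below. Assumptions: the volume is axis-aligned (identity direction matrix)
# and origin/spacing are given in (x, y, z) order, as CTData stores them.
# The function names are hypothetical helpers, not defined elsewhere in this
# repository.
import numpy as np


def world_to_voxel_xyz(world_xyz, origin_xyz, spacing_xyz):
    """Map world coordinates (mm) to the nearest integer voxel indices (x, y, z)."""
    world = np.asarray(world_xyz, dtype=float)
    origin = np.asarray(origin_xyz, dtype=float)
    spacing = np.asarray(spacing_xyz, dtype=float)
    return np.rint((world - origin) / spacing).astype(int)


def voxel_to_world_xyz(voxel_xyz, origin_xyz, spacing_xyz):
    """Map voxel indices (x, y, z) back to world coordinates (mm)."""
    voxel = np.asarray(voxel_xyz, dtype=float)
    origin = np.asarray(origin_xyz, dtype=float)
    spacing = np.asarray(spacing_xyz, dtype=float)
    return voxel * spacing + origin


def xyz_to_zyx(index_xyz):
    """Reverse (x, y, z) to (z, y, x), the index order of the pixel array."""
    return tuple(int(v) for v in index_xyz[::-1])


# Possible usage:
#   voxel = world_to_voxel_xyz([-90.0, -40.0, -280.0],
#                              origin_xyz=[-100.0, -50.0, -300.0],
#                              spacing_xyz=[0.5, 0.5, 2.0])
#   z, y, x = xyz_to_zyx(voxel)   # index into pixel_data[z, y, x]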



================================================
FILE: deploy/backend/dataclass/NoduleCube.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import torch
import numpy as np
import cv2
from typing import Optional
from dataclasses import dataclass
import matplotlib.pyplot as plt
from scipy import ndimage

def normal_cube_to_tensor(cube_data):
    """
    Normalize cube data and convert it to a PyTorch tensor. Used in both training and inference.

    :param cube_data: ndarray of shape [32,32,32]
    :return: float tensor of shape (1, 1, 32, 32, 32)
    """
    cube_data = cube_data.astype(np.float32)
    # Min-max normalize to the [0, 1] range
    min_val = np.min(cube_data)
    max_val = np.max(cube_data)
    data_range = max_val - min_val
    # Avoid division by zero
    if data_range < 1e-10:
        normalized_cube = np.zeros_like(cube_data)
    else:
        normalized_cube = (cube_data - min_val) / data_range
    # Check for invalid values and repair them
    if np.isnan(normalized_cube).any() or np.isinf(normalized_cube).any():
        normalized_cube = np.nan_to_num(normalized_cube, nan=0.0, posinf=1.0, neginf=0.0)
    # Convert to a PyTorch tensor and add batch and channel dimensions
    cube_tensor = torch.from_numpy(normalized_cube).float().unsqueeze(0).unsqueeze(0)  # (1, 1, 32, 32, 32)
    return cube_tensor
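

# --- Illustrative sketch (not part of the original module) -----------------
# normal_cube_to_tensor above is documented as taking a [32,32,32] array, but
# a cube cropped near the border of a CT volume can come out smaller than
# requested. One possible remedy, shown here as an assumption rather than
# this repository's own method, is to zero-pad the crop symmetrically up to
# the target shape before normalizing. pad_cube_to_shape is a hypothetical
# helper name.
import numpy as np


def pad_cube_to_shape(cube, target_shape=(32, 32, 32), fill_value=0.0):
    """Pad a 3D array with fill_value so that it has exactly target_shape."""
    cube = np.asarray(cube)
    if cube.ndim != 3:
        raise ValueError(f"expected a 3D array, got shape {cube.shape}")
    if any(c > t for c, t in zip(cube.shape, target_shape)):
        raise ValueError(f"cube {cube.shape} is larger than target {target_shape}")
    pad_width = []
    for c, t in zip(cube.shape, target_shape):
        total = t - c
        before = total // 2          # any odd leftover voxel goes to the far side
        pad_width.append((before, total - before))
    return np.pad(cube, pad_width, mode="constant", constant_values=fill_value)


# Possible usage:
#   cube = pad_cube_to_shape(cropped_cube)        # -> shape (32, 32, 32)
#   tensor = normal_cube_to_tensor(cube)          # -> shape (1, 1, 32, 32, 32)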


@dataclass
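class NoduleCube:
    """
    Lung nodule cube: the 3D cube of data around a lung nodule.
    Independent of the CT data; it only handles cubes that have already been extracted.
    """
    # Basic attributes
    cube_size: int = 64  # Cube size (default 64x64x64)
    pixel_data: Optional[np.ndarray] = None  # Pixel data, shape: [cube_size, cube_size, cube_size]

    # Nodule features
    center_x: int = 0  # x coordinate of the nodule center
    center_y: int = 0  # y coordinate of the nodule center
    center_z: int = 0  # z coordinate of the nodule center
    radius: float = 0.0  # Nodule radius
    malignancy: int = 0  # Malignancy (0 = benign / 1 = malignant)

    # File paths
    npy_path: str = ""  # Path to the npy file
    png_path: str = ""  # Path to the png file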

    def __post_init__(self):
        """Called after initialization."""
        # If npy_path is given but pixel_data is not, try to load it
        if self.npy_path and self.pixel_data is None:
            self.load_from_npy()
        # If png_path is given but pixel_data is not, try to load it
        elif self.png_path and self.pixel_data is None:
            self.load_from_png()

    def load_from_npy(self) -> None:
        """Load the cube data from an NPY file."""
        if not os.path.exists(self.npy_path):
            raise FileNotFoundError(f"File does not exist: {self.npy_path}")

        try:
            self.pixel_data = np.load(self.npy_path)
            # Validate the dimensions
            if len(self.pixel_data.shape) != 3:
                raise ValueError(f"Pixel data must be a 3D array, got shape: {self.pixel_data.shape}")

            # Resize if the shape does not match
            if (self.pixel_data.shape[0] != self.cube_size or
                self.pixel_data.shape[1] != self.cube_size or
                self.pixel_data.shape[2] != self.cube_size):
                self.resize(self.cube_size)

        except Exception as e:
            raise ValueError(f"Error while loading the NPY file: {e}") from e

    def save_to_npy(self, output_path: str) -> str:
        """
        Save the cube data as an NPY file.

        Args:
            output_path: Output path

        Returns:
            Path of the saved file
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to save")

        # Make sure the directory exists (dirname is empty for a bare file name)
        output_dir = os.path.dirname(output_path)
        if output_dir:
            os.makedirs(output_dir, exist_ok=True)
        np.save(output_path, self.pixel_data)
        self.npy_path = output_path
        return output_path

    def save_to_png(self, output_path: str) -> str:
        """
        Save the cube data as a PNG image (slices tiled in a grid; 8x8 for a 64-slice cube).

        Args:
            output_path: Output PNG file path

        Returns:
            Path of the saved file
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to save")

        # Make sure the directory exists (dirname is empty for a bare file name)
        output_dir = os.path.dirname(output_path)
        if output_dir:
            os.makedirs(output_dir, exist_ok=True)

        # Work out where each slice goes in the final image (8 rows x 8 columns)
        rows, cols = 8, 8
        if self.cube_size != 64:
            # For sizes other than 64x64x64, pick rows/cols that keep the grid close to square
            total_slices = self.cube_size
            rows = int(np.sqrt(total_slices))
            while total_slices % rows != 0:
                rows -= 1
            cols = total_slices // rows

        # Create the tiled image
        img_height = self.cube_size
        img_width = self.cube_size
        combined_img = np.zeros((rows * img_height, cols * img_width), dtype=np.uint8)

        # Fill the tiled image
        for i in range(self.cube_size):
            row = i // cols
            col = i % cols

            slice_data = self.pixel_data[i]

            # Bring the data into the 0-255 range; clip first so that out-of-range
            # values saturate instead of wrapping around in the uint8 cast
            if slice_data.max() <= 1.0:
                slice_data = (np.clip(slice_data, 0.0, 1.0) * 255).astype(np.uint8)
            else:
                slice_data = np.clip(slice_data, 0, 255).astype(np.uint8)

            # Place the slice into the tiled image
            y_start = row * img_height
            x_start = col * img_width
            combined_img[y_start:y_start + img_height, x_start:x_start + img_width] = slice_data

        # Save the tiled image
        cv2.imwrite(output_path, combined_img)
        self.png_path = output_path
        return output_path

    def load_from_png(self) -> None:
        """Load the cube data from a PNG image (slices tiled in a grid; 8x8 for a 64-slice cube)."""
        if not os.path.exists(self.png_path):
            raise FileNotFoundError(f"File does not exist: {self.png_path}")

        try:
            # Read the PNG image (cv2.imread returns None if decoding fails)
            img = cv2.imread(self.png_path, cv2.IMREAD_GRAYSCALE)
            if img is None:
                raise ValueError("the image could not be decoded")

            # Determine the number of rows and columns
            rows, cols = 8, 8
            if self.cube_size != 64:
                # For sizes other than 64x64x64, compute suitable rows/cols
                total_slices = self.cube_size
                rows = int(np.sqrt(total_slices))
                while total_slices % rows != 0:
                    rows -= 1
                cols = total_slices // rows

            # Check that the image size is as expected
            expected_height = rows * self.cube_size
            expected_width = cols * self.cube_size
            if img.shape[0] != expected_height or img.shape[1] != expected_width:
                raise ValueError(f"Image size mismatch: expected {expected_height}x{expected_width}, got {img.shape[0]}x{img.shape[1]}")

            # Create the 3D array
            cube_data = np.zeros((self.cube_size, self.cube_size, self.cube_size), dtype=np.float32)

            # Extract each slice from the PNG image
            for i in range(self.cube_size):
                row = i // cols
                col = i % cols

                y_start = row * self.cube_size
                x_start = col * self.cube_size

                slice_data = img[y_start:y_start + self.cube_size, x_start:x_start + self.cube_size]
                cube_data[i] = slice_data.astype(np.float32) / 255.0  # Normalize to the [0,1] range

            self.pixel_data = cube_data

        except Exception as e:
            raise ValueError(f"Error while loading the PNG file: {e}") from e

    def set_cube_data(self, pixel_data: np.ndarray) -> None:
        """
        Set the cube pixel data.

        Args:
            pixel_data: 3D pixel data
        """
        if len(pixel_data.shape) != 3:
            raise ValueError(f"Pixel data must be a 3D array, got shape: {pixel_data.shape}")

        self.pixel_data = pixel_data

        # Resize if the shape does not match
        if (self.pixel_data.shape[0] != self.cube_size or
            self.pixel_data.shape[1] != self.cube_size or
            self.pixel_data.shape[2] != self.cube_size):
            self.resize(self.cube_size)

    def resize(self, new_size: int) -> None:
        """
        Resize the cube.

        Args:
            new_size: New cube size
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to resize")

        # Compute the zoom factors
        zoom_factors = [new_size / self.pixel_data.shape[0],
                         new_size / self.pixel_data.shape[1],
                         new_size / self.pixel_data.shape[2]]

        # Resample with scipy's ndimage (mode='nearest' only controls edge handling)
        self.pixel_data = ndimage.zoom(self.pixel_data, zoom_factors, mode='nearest')
        self.cube_size = new_size
        
    def augment(self, rotation: bool = True, flip_axis: int = -1, noise: bool = True) -> 'NoduleCube':
        """
        Data augmentation.

        Args:
            rotation: Whether to apply rotation augmentation
            flip_axis: Axis to flip along; the default -1 means no flip
            noise: Whether to add noise

        Returns:
            A new, augmented cube instance
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to augment")

        # Work on a copy
        augmented_cube = self.pixel_data.copy()

        # Rotation augmentation
        if rotation:
            # Pick random rotation angles, one per plane
            angles = np.random.uniform(-20, 20, 3)
            augmented_cube = ndimage.rotate(augmented_cube, angles[0], axes=(1, 2), reshape=False, mode='nearest')
            augmented_cube = ndimage.rotate(augmented_cube, angles[1], axes=(0, 2), reshape=False, mode='nearest')
            augmented_cube = ndimage.rotate(augmented_cube, angles[2], axes=(0, 1), reshape=False, mode='nearest')

        # Flip augmentation
        if flip_axis >= 0:
            augmented_cube = np.flip(augmented_cube, axis=flip_axis)

        # Add noise
        if noise:
            # Add random Gaussian noise
            noise_level = np.random.uniform(0.0, 0.05)
            noise_array = np.random.normal(0, noise_level, augmented_cube.shape)
            augmented_cube = augmented_cube + noise_array
            # Keep values within the [0,1] range
            augmented_cube = np.clip(augmented_cube, 0, 1)

        # Create the new instance
        new_cube = NoduleCube(
            cube_size=self.cube_size,
            center_x=self.center_x,
            center_y=self.center_y,
            center_z=self.center_z,
            radius=self.radius,
            malignancy=self.malignancy
        )

        new_cube.set_cube_data(augmented_cube)
        return new_cube

    def visualize_3d(self, output_path: Optional[str] = None, show: bool = True) -> None:
        """
        Visualize the cube data.

        Args:
            output_path: Optional output path; the figure is saved if provided
            show: Whether to display the figure
        """
        if self.pixel_data is None:
            raise ValueError("No pixel data to visualize")

        # Create the figure
        fig, axes = plt.subplots(2, 3, figsize=(12, 8))

        # Get the center slices
        center_z = self.pixel_data.shape[0] // 2
        center_y = self.pixel_data.shape[1] // 2
        center_x = self.pixel_data.shape[2] // 2

        # The three orthogonal planes
        slice_xy = self.pixel_data[center_z, :, :]
        slice_xz = self.pixel_data[:, center_y, :]
        slice_yz = self.pixel_data[:, :, center_x]

        # Show the three orthogonal views (a fixed-y plane is coronal,
        # a fixed-x plane is sagittal, matching CTData.visualize_nodule)
        axes[0, 0].imshow(slice_xy, cmap='gray')
        axes[0, 0].set_title(f'Axial view (Z={center_z})')

        axes[0, 1].imshow(slice_xz, cmap='gray')
        axes[0, 1].set_title(f'Coronal view (Y={center_y})')

        axes[0, 2].imshow(slice_yz, cmap='gray')
        axes[0, 2].set_title(f'Sagittal view (X={center_x})')

        # Projection views (MIP: Maximum Intensity Projection)
        mip_xy = np.max(self.pixel_data, axis=0)
        mip_xz = np.max(self.pixel_data, axis=1)
        mip_yz = np.max(self.pixel_data, axis=2)

        axes[1, 0].imshow(mip_xy, cmap='gray')
        axes[1, 0].set_title('Maximum intensity projection (axial)')

        axes[1, 1].imshow(mip_xz, cmap='gray')
        axes[1, 1].set_title('Maximum intensity projection (coronal)')

        axes[1, 2].imshow(mip_yz, cmap='gray')
        axes[1, 2].set_title('Maximum intensity projection (sagittal)')

        # Add the nodule information
        nodule_info = f"Nodule center: ({self.center_x}, {self.center_y}, {self.center_z})\n"
        nodule_info += f"Radius: {self.radius:.1f}\n"
        nodule_info += f"Malignancy: {'malignant' if self.malignancy == 1 else 'benign'}"

        fig.suptitle(nodule_info, fontsize=12)
        plt.tight_layout()

        if output_path:
            plt.savefig(output_path, dpi=200, bbox_inches='tight')

        if show:
            plt.show()
        else:
            plt.close(fig)
            
    @classmethod
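    def from_npy(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
        """
        Create a cube instance from an NPY file.

        Args:
            file_path: Path to the NPY file
            cube_size: Cube size

        Returns:
            A NoduleCube instance
        """
        # __post_init__ already loads the data when npy_path is set,
        # so no explicit load_from_npy() call is needed here
        cube = cls(cube_size=cube_size, npy_path=file_path)
        return cube

    @classmethod
    def from_png(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
        """
        Create a cube instance from a PNG file.

        Args:
            file_path: Path to the PNG file
            cube_size: Cube size

        Returns:
            A NoduleCube instance
        """
        # __post_init__ already loads the data when png_path is set,
        # so no explicit load_from_png() call is needed here
        cube = cls(cube_size=cube_size, png_path=file_path)
        return cube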
        
    @classmethod
    def from_array(cls, 
                  pixel_data: np.ndarray, 
                  center_x: int = 0, 
                  center_y: int = 0, 
                  center_z: int = 0,
                  radius: float = 0.0,
                  malignancy: int = 0) -> 'NoduleCube':
        """
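        Create a cube instance from a numpy array.

        Args:
            pixel_data: 3D pixel data
            center_x: X coordinate of the center
            center_y: Y coordinate of the center
            center_z: Z coordinate of the center
            radius: Nodule radius
            malignancy: Malignancy (0 = benign, 1 = malignant)

        Returns:
            A NoduleCube instance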
        """
        if len(pixel_data.shape) != 3:
            raise ValueError(f"Pixel data must be a 3D array, got shape: {pixel_data.shape}")

        cube_size = pixel_data.shape[0]
        if pixel_data.shape[1] != cube_size or pixel_data.shape[2] != cube_size:
            raise ValueError(f"Pixel data must be cube-shaped, got shape: {pixel_data.shape}")
            
        cube = cls(
            cube_size=cube_size,
            center_x=center_x,
            center_y=center_y,
            center_z=center_z,
            radius=radius,
            malignancy=malignancy
        )
        
        cube.set_cube_data(pixel_data)
        return cube
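

# --- Illustrative sketch (not part of the original module) -----------------
# save_to_png and load_from_png above tile the slices of a cube into a
# near-square grid, slice by slice in a loop. The same layout can be sketched
# with reshape/transpose, which also makes the round trip easy to check.
# Assumption: at least one slice. The function names are hypothetical
# helpers, not defined elsewhere in this repository.
import numpy as np


def grid_shape(num_slices):
    """Return (rows, cols) with rows * cols == num_slices, as close to square as possible."""
    rows = int(np.sqrt(num_slices))
    while num_slices % rows != 0:
        rows -= 1
    return rows, num_slices // rows


def tile_slices(cube):
    """Tile an (n, h, w) volume into one 2D image of shape (rows * h, cols * w)."""
    n, h, w = cube.shape
    rows, cols = grid_shape(n)
    # Slice i lands at grid cell (i // cols, i % cols), as in save_to_png
    return cube.reshape(rows, cols, h, w).transpose(0, 2, 1, 3).reshape(rows * h, cols * w)


def untile_slices(image, num_slices, h, w):
    """Invert tile_slices: recover the (num_slices, h, w) volume from the tiled image."""
    rows, cols = grid_shape(num_slices)
    return image.reshape(rows, h, cols, w).transpose(0, 2, 1, 3).reshape(num_slices, h, w)


# Possible usage:
#   image = tile_slices(cube)                       # 2D mosaic of all slices
#   restored = untile_slices(image, *cube.shape)    # same values as cube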




================================================
FILE: deploy/backend/dataclass/__init__.py
================================================


================================================
FILE: deploy/backend/detector.py
================================================
import os
import sys
import time
import logging
import numpy as np
import pandas as pd
import torch
from threading import Thread, Lock
import json

# Add the project root directory to the system path
sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))

# Import the lung nodule detection module
from inference.pytorch_nodule_detector import (
    load_ct_data, 
    load_model, 
    get_lung_bounds, 
    scan_ct_data, 
    reduce_overlapping_nodules, 
    filter_false_positives, 
    format_results
)

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Detection session state
session_states = {}
session_locks = {}

# Detection status constants
STATUS_LOADING = "loading"       # Loading data
STATUS_PREPROCESSING = "preprocessing"  # Preprocessing
STATUS_SCANNING = "scanning"     # Scanning for nodules
STATUS_FILTERING = "filtering"   # Filtering false positives
STATUS_COMPLETED = "completed"   # Completed
STATUS_ERROR = "error"           # Error
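

# --- Illustrative sketch (not part of the original module) -----------------
# The scanning stage of NoduleDetector in this module reports progress inside
# a fixed band of the overall percentage (40% to 80%), computed from a
# current/total counter. The mapping could be isolated roughly as below.
# map_scan_progress is a hypothetical helper name, and the extra guards
# (non-positive total, clamping of the fraction) are assumptions added for
# the sketch.
def map_scan_progress(current, total, start=40, end=80):
    """Map a current/total counter linearly onto the [start, end] percent band."""
    if total <= 0:
        return start
    fraction = min(max(current / total, 0.0), 1.0)
    return start + int(fraction * (end - start))


# Possible usage:
#   progress = map_scan_progress(current=5, total=10)   # halfway through the band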

class NoduleDetector:
    """Lung nodule detector that wraps the functionality of pytorch_nodule_detector.py."""

    def __init__(self, model_path=None, device='cuda'):
        """Initialize the detector.

        Args:
            model_path: Path to the model
            device: Device to use (cuda or cpu)
        """
        self.model_path = model_path
        self.device = torch.device(device if torch.cuda.is_available() and device == 'cuda' else 'cpu')
        self.model = None
        print("Model path provided:\t", model_path)
        # If a model path was provided, load the model
        if model_path and os.path.exists(model_path):
            # Report success only after loading has actually succeeded
            if self.load_model(model_path):
                print("Model loaded successfully!")

        # Completion callback (set by the caller)
        self.completion_callback = None
            
    def load_model(self, model_path):
        """Load the model.

        Args:
            model_path: Path to the model

        Returns:
            Whether loading succeeded
        """
        try:
            if not os.path.exists(model_path):
                logger.error(f"Model file does not exist: {model_path}")
                return False

            self.model, self.device = load_model(model_path, self.device)
            self.model_path = model_path
            return True
        except Exception as e:
            logger.error(f"Error while loading the model: {e}")
            return False
            
    def detect(self, file_path, session_id, patient_id=None):
        """Start lung nodule detection.

        Args:
            file_path: Path to the CT file or folder
            session_id: Session ID
            patient_id: Patient ID

        Returns:
            Boolean indicating whether detection was started successfully
        """
        # Compare against None explicitly instead of relying on the model's truthiness
        if self.model is None:
            logger.error("Model is not loaded")
            return False

        if session_id in session_states:
            logger.warning(f"Session {session_id} already exists and will be overwritten")
        print("file_path received by detect\t", file_path)
        # Initialize the session state
        session_states[session_id] = {
            "status": STATUS_LOADING,
            "progress": 0,
            "message": "Loading CT data...",
            "started_at": time.time(),
            "patient_id": patient_id,
            "file_path": file_path,
            "ct_data": None,
            "nodules": None,
            "lung_bounds": None,
            "error": None
        }

        # Create the lock
        if session_id not in session_locks:
            session_locks[session_id] = Lock()

        # Start the detection thread
        thread = Thread(target=self._detect_thread, args=(file_path, session_id, patient_id))
        thread.daemon = True
        thread.start()

        return True
        
    def _detect_thread(self, file_path, session_id, patient_id):
        """Detection thread.

        Args:
            file_path: Path to the CT file or folder
            session_id: Session ID
            patient_id: Patient ID
        """
        try:
            # Load the CT data
            self._update_session(session_id, {
                "status": STATUS_LOADING,
                "progress": 0,
                "message": "Loading CT data..."
            })

            ct_data = load_ct_data(file_path)

            # Check that the CT data was loaded successfully
            if ct_data is None:
                raise ValueError("Failed to load CT data")

            # Update the session state
            self._update_session(session_id, {
                "ct_data": ct_data,
                "progress": 10,
                "message": "CT data loaded; starting lung segmentation..."
            })

            # Get the lung bounds
            self._update_session(session_id, {
                "status": STATUS_PREPROCESSING,
                "progress": 20,
                "message": "Segmenting the lungs..."
            })

            # Lung segmentation was already done in load_ct_data; only the bounds are computed here
            lung_bounds = get_lung_bounds(ct_data.lung_seg_mask)
            if lung_bounds is None:
                raise ValueError("No valid lung region found")

            # Update the session state
            self._update_session(session_id, {
                "lung_bounds": lung_bounds,
                "progress": 30,
                "message": "Lung segmentation complete; starting nodule detection..."
            })

            # Start the scan
            self._update_session(session_id, {
                "status": STATUS_SCANNING,
                "progress": 40,
                "message": "Scanning the lung region for nodules..."
            })
            # Create a logger that receives progress updates
            progress_logger = self._create_progress_logger(session_id)
            # Run the scan
            results_df = scan_ct_data(ct_data, self.model, self.device, progress_logger)
            # Update the session state
            self._update_session(session_id, {
                "progress": 80,
                "message": f"Scan complete: {len(results_df)} candidate nodules found; merging and filtering next..."
            })
            
            # Merge overlapping nodules
            self._update_session(session_id, {
                "status": STATUS_FILTERING,
                "progress": 85,
                "message": "Merging overlapping nodules..."
            })
            reduced_df = reduce_overlapping_nodules(results_df)

            # Filter false positives
            self._update_session(session_id, {
                "progress": 90,
                "message": f"{len(reduced_df)} nodules remain after merging; filtering false positives..."
            })
            filtered_df = filter_false_positives(reduced_df, ct_data)

            # Format the results
            self._update_session(session_id, {
                "progress": 95,
                "message": f"Filtering complete: {len(filtered_df)} nodules detected; generating results..."
            })
            final_df = format_results(filtered_df, ct_data, patient_id or session_id)

            # Extract the nodule cubes
            nodules = self._extract_nodule_cubes(ct_data, final_df)

            # Update the session state
            final_results = {
                "status": STATUS_COMPLETED,
                "progress": 100,
                "message": f"Detection complete: {len(nodules)} nodules found",
                "nodules": nodules,
                "completed_at": time.time()
            }

            self._update_session(session_id, final_results)

            # Invoke the completion callback
            if self.completion_callback:
                try:
                    self.completion_callback(session_id, {"nodules": nodules})
                except Exception as callback_error:
                    logger.error(f"Error while running the completion callback: {str(callback_error)}", exc_info=True)

        except Exception as e:
            logger.error(f"Error during detection: {str(e)}", exc_info=True)
            self._update_session(session_id, {
                "status": STATUS_ERROR,
                "message": f"Detection failed: {str(e)}",
                "error": str(e)
            })
    
    def _create_progress_logger(self, session_id):
        """Create a logger that receives progress updates.

        Args:
            session_id: Session ID

        Returns:
            A custom logger object
        """
        class ProgressLogger:
            def __init__(self, outer_self, session_id):
                self.outer_self = outer_self
                self.session_id = session_id
                self.last_progress_time = time.time()

            def info(self, message):
                # Parse the progress information. The Chinese literals below are matched
                # against messages emitted by the scanning code, so they are kept verbatim.
                if "处理进度:" in message:
                    try:
                        # Extract the "current/total" counter that follows the marker
                        progress_part = message.split("处理进度:")[1].split("%")[0].strip()
                        progress_parts = progress_part.split('/')
                        if len(progress_parts) == 2:
                            current, total = map(int, progress_parts)
                            progress = min(40 + int(current / total * 40), 80)  # Scan progress runs from 40% to 80%

                            # Update the progress at most once per second
                            current_time = time.time()
                            if current_time - self.last_progress_time > 1:
                                self.last_progress_time = current_time
                                self.outer_self._update_session(self.session_id, {
                                    "progress": progress,
                                    "message": message
                                })
                    except (ValueError, IndexError, ZeroDivisionError):
                        # Ignore progress messages that do not parse
                        pass

                # Record other important messages ("扫描完成" means "scan complete")
                if "扫描完成" in message:
                    self.outer_self._update_session(self.session_id, {
                        "progress": 80,
                        "message": message
                    })

        return ProgressLogger(self, session_id)
    
    def _extract_nodule_cubes(self, ct_data, nodules_df, cube_size=32):
        """Extract nodule cubes from the CT data.

        Args:
            ct_data: CTData object
            nodules_df: DataFrame of nodules
            cube_size: edge length of the cube in voxels

        Returns:
            List of nodules, each with its cube data and related information
        """
        nodule_list = []

        for _, row in nodules_df.iterrows():
            try:
                # Voxel coordinates of the nodule center
                x = int(row['voxel_x'])
                y = int(row['voxel_y'])
                z = int(row['voxel_z'])

                # Cube bounds, clipped to the volume
                half_size = cube_size // 2
                z_min = max(0, z - half_size)
                y_min = max(0, y - half_size)
                x_min = max(0, x - half_size)
                z_max = min(ct_data.lung_seg_img.shape[0], z + half_size)
                y_max = min(ct_data.lung_seg_img.shape[1], y + half_size)
                x_max = min(ct_data.lung_seg_img.shape[2], x + half_size)

                # Extract the cube
                cube = ct_data.lung_seg_img[z_min:z_max, y_min:y_max, x_min:x_max]

                # Zero-pad cubes that were clipped at the volume border
                if cube.shape != (cube_size, cube_size, cube_size):
                    padded_cube = np.zeros((cube_size, cube_size, cube_size), dtype=cube.dtype)
                    # Shift by the amount clipped on the low side so the nodule stays centered
                    z_off = z_min - (z - half_size)
                    y_off = y_min - (y - half_size)
                    x_off = x_min - (x - half_size)
                    padded_cube[z_off:z_off + cube.shape[0],
                                y_off:y_off + cube.shape[1],
                                x_off:x_off + cube.shape[2]] = cube
                    cube = padded_cube

                # Assemble the nodule information
                nodule_info = {
                    'id': int(row['nodule_id']),
                    'cube': cube.tolist(),  # convert to a list so it can be JSON-serialized
                    'voxel_coords': [x, y, z],
                    'world_coords': [float(row['world_x']), float(row['world_y']), float(row['world_z'])],
                    'diameter_mm': float(row['diameter_mm']),
                    'probability': float(row['prob'])
                }
                
                nodule_list.append(nodule_info)
                
            except Exception as e:
                logger.error(f"提取结节立方体时出错: {str(e)}", exc_info=True)
        
        return nodule_list
    
    def _update_session(self, session_id, updates):
        """Update the session state.

        Args:
            session_id: session ID
            updates: fields to update
        """
        if session_id not in session_states:
            logger.warning(f"会话 {session_id} 不存在")
            return

        # Acquire the lock
        with session_locks[session_id]:
            for key, value in updates.items():
                session_states[session_id][key] = value
                
    def get_session_state(self, session_id):
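        """Get the session state.

        Args:
            session_id: session ID

        Returns:
            Session state dictionary, or None if the session does not exist
        """
        if session_id not in session_states:
            return None

        # Acquire the lock
        with session_locks[session_id]:
            # Work on a copy so the stored state is not modified
            state = session_states[session_id].copy()

            # Remove fields that cannot be serialized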
            if 'ct_data' in state:
                del state['ct_data']
            if 'lung_bounds' in state:
                bounds_info = {}
                if state['lung_bounds']:
                    bounds_info = {
                        'z_min': state['lung_bounds']['z_min'],
                        'z_max': state['lung_bounds']['z_max'],
                        'region_count': len(state['lung_bounds']['regions'])
                    }
                state['lung_bounds'] = bounds_info
            
            return state
            
    def set_completion_callback(self, callback):
        """Set the callback that runs after detection completes.

        Args:
            callback: callback function taking session_id and results
        """
        self.completion_callback = callback
            
def get_detector_instance(model_path=None):
    """Get the detector instance (singleton).

    Args:
        model_path: model path

    Returns:
        NoduleDetector instance
    """
    if not hasattr(get_detector_instance, 'instance'):
        get_detector_instance.instance = NoduleDetector(model_path)
    elif model_path and get_detector_instance.instance.model_path != model_path:
        # A different model path was given, so reload the model
        get_detector_instance.instance.load_model(model_path)
        
    return get_detector_instance.instance 
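

# Illustrative sketch, not part of the original module: one way a progress line of the
# form "处理进度: <current>/<total> (<percent>%), ..." could be mapped onto the 40%..80%
# band that the scanning stage occupies. The helper name `example_scan_progress` and its
# `low`/`high` parameters are hypothetical.
def example_scan_progress(message, low=40, high=80):
    marker = "处理进度:"
    if marker not in message:
        return None
    # Keep only "<current>/<total>", dropping the "(<percent>%)" suffix
    counts = message.split(marker)[1].split("(")[0].strip()
    parts = counts.split("/")
    if len(parts) != 2:
        return None
    try:
        current, total = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    if total <= 0:
        return None
    # Scale into [low, high] and clamp at the upper end
    return min(low + int(current / total * (high - low)), high)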

================================================
FILE: deploy/backend/models/pytorch_c3d_tiny.py
================================================
import torch.nn as nn
import torchvision.transforms as transforms

# my_tranform =transforms.Compose([
#     # transforms.Resize((32,32,32)),
#     transforms.ToTensor(),
#     transforms.Normalize((0.5,0.5,0.5), (0.5, 0.5,0.5))
# ])


class C3dTiny(nn.Module):
    def __init__(self):
        super().__init__()
        # First 3D convolution block
        self.conv_block1 = nn.Sequential(
            nn.Conv3d(in_channels=1, kernel_size=3, padding = 1, out_channels=64),
            # Not in the original network architecture; added here
            nn.BatchNorm3d(64),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1,2,2), stride = (1,2,2))
        )
        #
        self.conv_block2 = nn.Sequential(
            nn.Conv3d(in_channels=64, kernel_size=3, padding = 1, out_channels=128),
            # Not in the original network architecture; added here
            nn.BatchNorm3d(128),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2)
        )
        self.drop_out1 = nn.Dropout(0.2)
        #
        self.conv_block3 = nn.Sequential(
            nn.Conv3d(in_channels = 128, kernel_size=3, padding = 1, out_channels=256),
            nn.BatchNorm3d(256),
            nn.ReLU(),
            nn.Conv3d(in_channels=256, kernel_size=3, padding = 1, out_channels=256),
            nn.BatchNorm3d(256),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2)
        )
        self.drop_out2 = nn.Dropout(0.2)
        #
        self.conv_block4 = nn.Sequential(
            nn.Conv3d(in_channels = 256, kernel_size = 3, padding = 1, out_channels=512),
            nn.BatchNorm3d(512),
            nn.ReLU(),
            nn.Conv3d(in_channels = 512, kernel_size = 3, padding = 1, out_channels = 512),
            nn.BatchNorm3d(512),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2)
        )
        self.drop_out3 = nn.Dropout(0.2)
        self.flatten = nn.Flatten()
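        # Number of input features for the first fully connected layer:
        # the original input is 32x32x32; after pool1 (1,2,2) it is 32x16x16
        # after pool2 (2,2,2) it is 16x8x8
        # after pool3 (2,2,2) it is 8x4x4
        # after pool4 (2,2,2) it is 4x2x2
        # so the final feature map is 4x2x2 with 512 channels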
        self.fc1 = nn.Sequential(
            nn.Linear(512 * 4 * 2 * 2, 512),
            nn.ReLU()
        )
        self.fc2 = nn.Linear(512, 2)

    def forward(self, x):
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = self.drop_out1(x)
        x = self.conv_block3(x)
        x = self.drop_out2(x)
        x = self.conv_block4(x)
        x = self.drop_out3(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
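

# Illustrative sketch, not part of the original model code: the flattened feature count
# fed to fc1 can be re-derived from the pooling kernels alone, because every Conv3d above
# uses kernel_size=3 with padding=1 and therefore keeps the spatial size. The helper name
# `example_flatten_features` is hypothetical.
def example_flatten_features(input_size=32, channels=512):
    # Pooling kernels of conv_block1..conv_block4, in (depth, height, width) order
    pools = [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]
    d = h = w = input_size
    for kd, kh, kw in pools:
        # MaxPool3d uses stride == kernel_size here and floors the output size
        d, h, w = d // kd, h // kh, w // kw
    return channels * d * h * w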

================================================
FILE: deploy/backend/models/pytorch_nodule_detector.py
================================================
import os
import numpy as np
import pandas as pd
import torch
from torch.nn import functional as F
from datetime import datetime
import logging
import time
from scipy import ndimage
from data.dataclass.CTData import CTData
from data.dataclass.NoduleCube import normal_cube_to_tensor
from deploy.backend.preprocessing.luna16_invalid_nodule_filter import nodule_valid
from models.pytorch_c3d_tiny import C3dTiny

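# Inference parameters
CUBE_SIZE = 32  # edge length of the scanning cube (32x32x32)
SCAN_STEP = 10  # scan stride: move 10 voxels per step
PROB_THRESHOLD = 0.8  # threshold: only probabilities above this count as a nodule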

# Logging setup
def setup_logger(log_dir="./inference_logs"):
    """Configure and return the 'nodule_detection' logger."""
    # Create the logger
    logger = logging.getLogger('nodule_detection')
    logger.setLevel(logging.INFO)
    # Reuse existing handlers so repeated calls do not duplicate every log line
    if logger.handlers:
        return logger

    os.makedirs(log_dir, exist_ok=True)
    log_file = os.path.join(log_dir, f"inference_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log")

    # File handler
    file_handler = logging.FileHandler(log_file)
    file_handler.setLevel(logging.INFO)

    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)

    # Formatter
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    file_handler.setFormatter(formatter)
    console_handler.setFormatter(formatter)

    # Attach the handlers
    logger.addHandler(file_handler)
    logger.addHandler(console_handler)

    return logger

def load_ct_data(file_path):
    """
        Load CT data (MHD or DICOM) and preprocess it
    Args:
        file_path: path to a CT file or folder
    Returns:
        CTData object
    """
    # Decide whether the path is a file or a directory
    if os.path.isfile(file_path):
        # A single file is expected to be an MHD file
        if file_path.endswith('.mhd'):
            ct_data = CTData.from_mhd(file_path)
        else:
            raise ValueError(f"不支持的文件类型: {file_path}")
    elif os.path.isdir(file_path):
        # A directory is expected to be a DICOM folder
        ct_data = CTData.from_dicom(file_path)
    else:
        raise ValueError(f"指定路径不存在: {file_path}")
    # Resample to 1 mm spacing
    ct_data = ct_data.resample_pixel(new_spacing=[1, 1, 1])
    # Segment the lung region
    ct_data.filter_lung_img_mask()
    return ct_data

def load_model(evaL_model_path, device='cuda'):
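    """
    Load the PyTorch model

    Args:
        evaL_model_path: path to the model weights file
        device: compute device ('cuda' or 'cpu')

    Returns:
        Tuple of (model with weights loaded and set to eval mode, device)
    """
    model = C3dTiny().to(device)
    # Load the weights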
    model.load_state_dict(torch.load(evaL_model_path, map_location=device))
    model.eval()
    return model, device

def get_lung_bounds(lung_mask):
    """Get the bounding boxes of the lung mask, allowing for separated left and right lungs"""
    if lung_mask.sum() == 0:
        return None
    # Find the lung regions with connected-component analysis
    labeled_mask, num_features = ndimage.label(lung_mask > 0)
    # With too many components, keep only the largest ones (usually the left and right lungs)
    if num_features > 2:
        # Count the voxels in each labeled region
        region_sizes = np.array([(labeled_mask == i).sum() for i in range(1, num_features + 1)])
        # Keep only the 2 largest regions (left and right lungs)
        valid_labels = np.argsort(region_sizes)[-2:] + 1
        # Build a new mask that contains only those regions
        refined_mask = np.zeros_like(labeled_mask)
        for label in valid_labels:
            refined_mask[labeled_mask == label] = 1
    else:
        refined_mask = lung_mask > 0
    
    # Compute the lung region of each z-axis slice
    z_ranges = []
    margin = 5  # in-plane margin around each slice's bounding box
    
    # Iterate over the z-axis slices
    for z in range(refined_mask.shape[0]):
        slice_mask = refined_mask[z]
        if slice_mask.sum() > 100:  # the slice contains enough lung voxels
            y_indices, x_indices = np.where(slice_mask)
            if len(y_indices) > 0:
                y_min = max(0, y_indices.min() - margin)
                y_max = min(refined_mask.shape[1], y_indices.max() + margin)
                x_min = max(0, x_indices.min() - margin)
                x_max = min(refined_mask.shape[2], x_indices.max() + margin)
                z_ranges.append((z, y_min, y_max, x_min, x_max))
    
    if not z_ranges:
        return None
    
    # Overall z-axis range
    z_min = z_ranges[0][0]
    z_max = z_ranges[-1][0] + 1
    
    # Collect the y and x ranges of every slice
    scan_regions = []
    for z_slice, y_min, y_max, x_min, x_max in z_ranges:
        scan_regions.append({
            'z': z_slice,
            'y_min': y_min,
            'y_max': y_max,
            'x_min': x_min,
            'x_max': x_max
        })
    
    return {
        'z_min': z_min,
        'z_max': z_max,
        'regions': scan_regions
    }

def scan_ct_data(ct_data, model, device, logger, step=SCAN_STEP):
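    """
    Scan the whole CT volume and predict nodule positions (optimized version)
    
    Args:
        ct_data: CTData object
        model: PyTorch model
        device: compute device
        logger: logger object
        step: scan stride
        
    Returns:
        DataFrame with the nodule candidates
    """
    logger.info("开始扫描CT数据...")
    
    # Lung-segmented image data
    lung_img = ct_data.lung_seg_img
    lung_mask = ct_data.lung_seg_mask
    
    # Lung boundary information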
    bounds = get_lung_bounds(lung_mask)
    if bounds is None:
        logger.warning("未能找到有效的肺部区域")
        return pd.DataFrame(columns=['voxel_coord_x', 'voxel_coord_y', 'voxel_coord_z', 
                                    'world_coord_x', 'world_coord_y', 'world_coord_z', 'prob'])
    
    logger.info(f"已确定肺部区域: Z轴范围 {bounds['z_min']} 到 {bounds['z_max']}, 共 {len(bounds['regions'])} 个切片")
    
    # List that collects the results
    results = []
    
    # Estimate the total number of scan positions
    total_voxels = 0
    for region in bounds['regions']:
        y_range = region['y_max'] - region['y_min']
        x_range = region['x_max'] - region['x_min']
        total_voxels += (y_range // step + 1) * (x_range // step + 1)
    
    logger.info(f"预计扫描体素数: {total_voxels}")
    
    # Start timing
    start_time = time.time()
    batch_size = 32  # larger batches improve GPU utilization
    batch_inputs = []
    batch_positions = []
    # Progress tracking
    processed_voxels = 0
    skipped_voxels = 0
    # Minimum fraction of lung tissue a cube must contain
    lung_tissue_threshold = 0.1
    # Scan the lung region slice by slice
    for z_idx, region in enumerate(bounds['regions']):
        z = region['z']
        # Skip slices where a full cube no longer fits along z
        if z + CUBE_SIZE > lung_img.shape[0]:
            continue
        # Scan within the current slice
        for y in range(region['y_min'], region['y_max'] - CUBE_SIZE + 1, step):
            for x in range(region['x_min'], region['x_max'] - CUBE_SIZE + 1, step):
                # Lung-mask cube at the current position
                mask_cube = lung_mask[z:z+CUBE_SIZE, y:y+CUBE_SIZE, x:x+CUBE_SIZE]
                # Fraction of lung tissue in the cube
                lung_ratio = np.mean(mask_cube)
                # Skip cubes with too little lung tissue
                if lung_ratio < lung_tissue_threshold:
                    skipped_voxels += 1
                    continue
                # Image cube at the current position
                cube = lung_img[z:z+CUBE_SIZE, y:y+CUBE_SIZE, x:x+CUBE_SIZE]
                # Preprocess the cube
                cube_tensor = normal_cube_to_tensor(cube)
                cube_tensor = cube_tensor.unsqueeze(0)
                # Add it to the batch
                batch_inputs.append(cube_tensor)
                batch_positions.append((z, y, x))
                # Run prediction once the batch is full
                if len(batch_inputs) == batch_size:
                    # Process the current batch
                    process_batch(batch_inputs, batch_positions, model, device, ct_data, results)
                    batch_inputs = []
                    batch_positions = []
                processed_voxels += 1
                # Report progress periodically
                if (processed_voxels + skipped_voxels) % 1000 == 0:
                    elapsed_time = time.time() - start_time
                    progress = processed_voxels / total_voxels * 100 if total_voxels > 0 else 0
                    logger.info(f"处理进度: {processed_voxels}/{total_voxels} ({progress:.2f}%), "
                                f"已跳过: {skipped_voxels}, 耗时: {elapsed_time:.2f}秒")
    
    # Process the last, partially filled batch
    if batch_inputs:
        process_batch(batch_inputs, batch_positions, model, device, ct_data, results)
    
    # Build the DataFrame
    if results:
        results_df = pd.DataFrame(results)
        logger.info(f"扫描完成! 发现 {len(results_df)} 个可能的结节")
    else:
        results_df = pd.DataFrame(columns=['voxel_coord_x', 'voxel_coord_y', 'voxel_coord_z', 
                                          'world_coord_x', 'world_coord_y', 'world_coord_z', 'prob'])
        logger.info("扫描完成! 未发现任何结节")
    
    return results_df
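

# Illustrative sketch, not part of the original pipeline: the in-plane window origins that
# the scan loop above visits for a single slice region. The helper name
# `example_window_origins` is hypothetical; the defaults mirror CUBE_SIZE and SCAN_STEP.
def example_window_origins(y_min, y_max, x_min, x_max, cube_size=32, step=10):
    # A window may start only where a full cube still fits inside the region
    ys = range(y_min, y_max - cube_size + 1, step)
    xs = range(x_min, x_max - cube_size + 1, step)
    return [(y, x) for y in ys for x in xs]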

def process_batch(batch_inputs, batch_positions, model, device, ct_data, results):
    """Run the model on one batch and append detections to results"""
    # Concatenate the batch
    batch_tensor = torch.cat(batch_inputs, dim=0).to(device)
    # Predict
    with torch.no_grad():
        batch_outputs = model(batch_tensor)
        batch_probs = F.softmax(batch_outputs, dim=1)[:, 1]  # probability of class 1
    # Handle each prediction
    for i, prob in enumerate(batch_probs):
        prob_value = prob.item()
        if prob_value > PROB_THRESHOLD:
            z_pos, y_pos, x_pos = batch_positions[i]
            # Center of the cube
            center_z = z_pos + CUBE_SIZE // 2
            center_y = y_pos + CUBE_SIZE // 2
            center_x = x_pos + CUBE_SIZE // 2
            # Convert voxel coordinates to world coordinates (mm)
            world_coord = ct_data.voxel_to_world([center_x, center_y, center_z])
            # Append the result
            results.append({
                'voxel_coord_x': center_x,
                'voxel_coord_y': center_y,
                'voxel_coord_z': center_z,
                'world_coord_x': world_coord[0],
                'world_coord_y': world_coord[1],
                'world_coord_z': world_coord[2],
                'prob': prob_value
            })
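

# Illustrative sketch, not part of the original pipeline: the class-1 probability that
# process_batch reads from the softmax, written out for a single pair of logits. The helper
# name `example_positive_probability` is hypothetical; `threshold` mirrors PROB_THRESHOLD.
import math


def example_positive_probability(logit_neg, logit_pos):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logit_neg, logit_pos)
    e_neg = math.exp(logit_neg - m)
    e_pos = math.exp(logit_pos - m)
    return e_pos / (e_neg + e_pos)


def example_is_candidate(logit_neg, logit_pos, threshold=0.8):
    # A window counts as a nodule candidate only above the threshold
    return example_positive_probability(logit_neg, logit_pos) > threshold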

def reduce_overlapping_nodules(results_df, distance_threshold=15):
    """
    Merge overlapping nodule predictions, using a stricter distance threshold
    
    Args:
        results_df: DataFrame with the nodule predictions
        distance_threshold: merge distance threshold (voxels)
        
    Returns:
        DataFrame of nodules after merging
    """
    if len(results_df) <= 1:
        return results_df
    # Sort by probability, highest first
    sorted_df = results_df.sort_values('prob', ascending=False).reset_index(drop=True)
    # Boolean mask marking the rows to keep
    keep_mask = np.ones(len(sorted_df), dtype=bool)
    # For each row
    for i in range(len(sorted_df)):
        if not keep_mask[i]:
            continue  # this row is already marked for removal
        # Coordinates of the current nodule
        current = sorted_df.iloc[i]
        # Compare against every lower-ranked nodule
        for j in range(i + 1, len(sorted_df)):
            if not keep_mask[j]:
                continue  # the row to compare is already marked for removal
            # Coordinates of the nodule to compare
            compare = sorted_df.iloc[j]
            # 3D Euclidean distance
            distance = np.sqrt(
                (current['voxel_coord_x'] - compare['voxel_coord_x']) ** 2 +
                (current['voxel_coord_y'] - compare['voxel_coord_y']) ** 2 +
                (current['voxel_coord_z'] - compare['voxel_coord_z']) ** 2
            )
            # Closer than the threshold: mark the lower-probability row for removal
            if distance < distance_threshold:
                keep_mask[j] = False
    # Apply the mask and keep only the rows not marked for removal
    reduced_df = sorted_df[keep_mask].reset_index(drop=True)
    return reduced_df
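

# Illustrative sketch, not part of the original pipeline: the same greedy, probability-ordered
# suppression as reduce_overlapping_nodules, written on plain (x, y, z, prob) tuples so it can
# be tried without pandas. The helper name `example_suppress_nearby` is hypothetical.
import math


def example_suppress_nearby(candidates, distance_threshold=15):
    kept = []
    # Visit candidates from highest to lowest probability
    for cand in sorted(candidates, key=lambda c: c[3], reverse=True):
        # Keep a candidate only if it is far enough from everything already kept
        if all(math.dist(cand[:3], k[:3]) >= distance_threshold for k in kept):
            kept.append(cand)
    return kept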

def filter_false_positives(nodules_df, ct_data, max_nodules=10):
    """
    Filter false-positive nodules using anatomical and statistical features
    
    Args:
        nodules_df: DataFrame with the nodule predictions
        ct_data: CTData object
        max_nodules: maximum number of nodules allowed per patient

    Returns:
        DataFrame of nodules after filtering
    """
    if nodules_df.empty:
        return nodules_df
    # Lung mask
    lung_mask = ct_data.lung_seg_mask
    # 1. Limit the total number of nodules
    if len(nodules_df) > max_nodules:
        # Keep only the top-N nodules by probability
        nodules_df = nodules_df.sort_values('prob', ascending=False).head(max_nodules)
    # 2. Filter by position
    filtered_rows = []
    for i, row in nodules_df.iterrows():
        x, y, z = int(row['voxel_coord_x']), int(row['voxel_coord_y']), int(row['voxel_coord_z'])
        this_nodule_valid = nodule_valid(ct_data, x, y, z)
        if this_nodule_valid:
            # Passed every check, so keep this nodule
            filtered_rows.append(row)
    # Build the new DataFrame
    filtered_df = pd.DataFrame(filtered_rows)
    # 3. Optional second filter by probability (disabled)
    # Remove nodules whose probability is below the threshold
    # high_prob_threshold = 0.95  # high-probability threshold
    # filtered_df = filtered_df[filtered_df['prob'] >= high_prob_threshold]
    return filtered_df

def format_results(results_df, ct_data, patient_id):
    """
    Format the results into the final output DataFrame
    
    Args:
        results_df: DataFrame of nodules after merging and filtering
        ct_data: CTData object
        patient_id: patient ID
        
    Returns:
        Final DataFrame with the nodule information
    """
    # Without nodules, return an empty DataFrame
    if results_df.empty:
        return pd.DataFrame(columns=['patient_id', 'nodule_id', 'voxel_x', 'voxel_y', 'voxel_z', 
                                     'world_x', 'world_y', 'world_z', 'diameter_mm', 'prob'])
    # List of final results
    final_results = []
    # Number nodules consecutively from 1; the DataFrame index is not contiguous after filtering
    for nodule_id, (_, row) in enumerate(results_df.iterrows(), start=1):
        # Default diameter: CUBE_SIZE / 2
        diameter_mm = CUBE_SIZE / 2
        # Append the result
        final_results.append({
            'patient_id': patient_id,
            'nodule_id': nodule_id,
            'voxel_x': int(row['voxel_coord_x']),
            'voxel_y': int(row['voxel_coord_y']),
            'voxel_z': int(row['voxel_coord_z']),
            'world_x': row['world_coord_x'],
            'world_y': row['world_coord_y'],
            'world_z': row['world_coord_z'],
            'diameter_mm': diameter_mm,
            'prob': row['prob']
        })
    
    # Build the DataFrame
    final_df = pd.DataFrame(final_results)
    
    return final_df

def detect_nodules(file_path, model_path, detect_patient_id=None, device='cuda'):
    """
    Main entry point: run nodule detection on CT data
    
    Args:
        file_path: path to a CT file or folder
        model_path: path to the model weights file
        detect_patient_id: patient ID; if None, the file name is used
        device: compute device ('cuda' or 'cpu')
        
    Returns:
        DataFrame with the nodule information
    """
    # Set up logging
    logger = setup_logger()
    # Fall back to the file name when no patient ID is given
    if detect_patient_id is None:
        if os.path.isfile(file_path):
            detect_patient_id = os.path.splitext(os.path.basename(file_path))[0]
        else:
            detect_patient_id = os.path.basename(file_path)
    logger.info(f"开始处理患者 {detect_patient_id} 的CT数据")
    try:
        # Load the CT data
        logger.info(f"加载CT数据: {file_path}")
        ct_data = load_ct_data(file_path)
        # Load the model
        logger.info(f"加载模型: {model_path}")
        model, device = load_model(model_path, device)
        # Scan the CT data
        results_df = scan_ct_data(ct_data, model, device, logger)
        # Merge overlapping nodules
        logger.info("合并重叠结节...")
        reduced_df = reduce_overlapping_nodules(results_df)
        logger.info(f"合并后的结节数量: {len(reduced_df)}")
        # Filter false positives
        logger.info("过滤假阳性结节...")
        filtered_df = filter_false_positives(reduced_df, ct_data)
        logger.info(f"过滤后的结节数量: {len(filtered_df)}")
        # Format the results; use the function's own ID rather than the module-level patient_id
        final_df = format_results(filtered_df, ct_data, detect_patient_id)
        logger.info(f"检测完成,找到 {len(final_df)} 个结节")
        return final_df
    except Exception as e:
        logger.error(f"检测过程中出错: {str(e)}", exc_info=True)
        raise
    
if __name__ == "__main__":
    test_mhd = "H:/luna16/subset8/1.3.6.1.4.1.14519.5.2.1.6279.6001.149041668385192796520281592139.mhd"
    model_path = "../training/pytorch_checkpoints/best_model.pth"
    threshold = 0.7
    patient_id = "1.3.6.1.4.1.14519.5.2.1.6279.6001.149041668385192796520281592139"
    detect_result_csv = "./c3d_classify_result-%s.csv" %patient_id
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Run detection
    result_df = detect_nodules(test_mhd, model_path, None, device)
    # Save the results
    result_df.to_csv(detect_result_csv, index=False, encoding="utf-8")

================================================
FILE: deploy/backend/preprocessing/__init__.py
================================================


================================================
FILE: deploy/backend/preprocessing/luna16_invalid_nodule_filter.py
================================================
## Removes problematic annotations from the Luna2016 candidate nodule data, and rejects false nodules found during prediction
import numpy as np

def nodule_valid(ct_data, voxel_coord_x, voxel_coord_y,voxel_coord_z):
    """
        Decide whether the current nodule can be used as a training cube, or whether a cube found during scanning is valid
    :param ct_data:         CTData instance already converted to 0-255 with the lung region extracted
    :param voxel_coord_x:   center coordinates of the cube to check
    :param voxel_coord_y:
    :param voxel_coord_z:
    :return:                whether the nodule is usable: True (usable) / False (not usable)
    """
    lung_mask = ct_data.lung_seg_mask
    # Check that the coordinates lie inside the mask volume
    if (voxel_coord_z < 0 or voxel_coord_z >= lung_mask.shape[0] or
            voxel_coord_y < 0 or voxel_coord_y >= lung_mask.shape[1] or
            voxel_coord_x < 0 or voxel_coord_x >= lung_mask.shape[2]):
        return False
    # Neighborhood with a radius of 5 voxels around the point
    z_min = max(0, voxel_coord_z - 5)
    z_max = min(lung_mask.shape[0], voxel_coord_z + 6)
    y_min = max(0, voxel_coord_y - 5)
    y_max = min(lung_mask.shape[1], voxel_coord_y + 6)
    x_min = max(0, voxel_coord_x - 5)
    x_max = min(lung_mask.shape[2], voxel_coord_x + 6)
    # Lung mask of the neighborhood
    neighborhood_mask = lung_mask[z_min:z_max, y_min:y_max, x_min:x_max]
    # Fraction of lung tissue in the neighborhood
    lung_ratio = np.mean(neighborhood_mask)
    # Too little lung tissue around the point suggests a false positive
    if lung_ratio < 0.5:
        return False

    # Check whether the point sits on the lung border
    # (only possible when it is not on the outermost voxel layer of the volume)
    if (0 < voxel_coord_z < lung_mask.shape[0] - 1 and
            0 < voxel_coord_y < lung_mask.shape[1] - 1 and
            0 < voxel_coord_x < lung_mask.shape[2] - 1):
        # Lung voxels in the 6-neighborhood (the two adjacent voxels along each axis)
        neighbors = [
            lung_mask[voxel_coord_z - 1, voxel_coord_y, voxel_coord_x],
            lung_mask[voxel_coord_z + 1, voxel_coord_y, voxel_coord_x],
            lung_mask[voxel_coord_z, voxel_coord_y - 1, voxel_coord_x],
            lung_mask[voxel_coord_z, voxel_coord_y + 1, voxel_coord_x],
            lung_mask[voxel_coord_z, voxel_coord_y, voxel_coord_x - 1],
            lung_mask[voxel_coord_z, voxel_coord_y, voxel_coord_x + 1]
        ]
        # Too many non-lung neighbors means the point is probably on the lung border
        if sum(neighbors) < 4:
            return False
    return True

================================================
FILE: deploy/backend/util/__init__.py
================================================


================================================
FILE: deploy/backend/util/dicom_util.py
================================================
import os
import glob
import pydicom
import numpy as np
import cv2
from tqdm import tqdm

from util.seg_util import get_segmented_lungs,normalize_hu_values
from util.image_util import rescale_patient_images


def is_dicom_file(filename):
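    '''
       Check whether the given file is a DICOM file
    :param filename:      file to check
    :return:              True if the file carries the DICOM magic bytes
    '''
    # DICOM files have a 128-byte preamble followed by the 4-byte prefix b'DICM'
    with open(filename, 'rb') as file_stream:
        file_stream.seek(128)
        data = file_stream.read(4)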
    if data == b'DICM':
        return True
    return False
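

# Illustrative sketch, not part of the original module: the same magic-byte check as
# is_dicom_file, applied to an in-memory buffer so it can be tried without a file on disk.
# The helper name `example_has_dicom_magic` is hypothetical.
def example_has_dicom_magic(data):
    # Bytes 128..131 must spell b'DICM'; shorter buffers simply fail the comparison
    return data[128:132] == b'DICM'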

def get_dicom_thickness(dicom_slices):
    """
        Compute the slice thickness
    :param dicom_slices:    DICOM datasets as read by pydicom
    :return:                slice thickness
    """
    if len(dicom_slices) > 1:
        try:
            slice_thickness = abs(dicom_slices[0].ImagePositionPatient[2] - dicom_slices[1].ImagePositionPatient[2])
        except Exception:
            try:
                slice_thickness = abs(dicom_slices[0].SliceLocation - dicom_slices[1].SliceLocation)
            except Exception:
                # If it cannot be computed, fall back to the SliceThickness tag
                try:
                    slice_thickness = float(dicom_slices[0].SliceThickness)
                except Exception:
                    print("警告: 无法确定切片厚度,使用默认值1.0mm")
                    slice_thickness = 1.0
    else:
        try:
            slice_thickness = float(dicom_slices[0].SliceThickness)
        except Exception:
            print("警告: 只有一个切片,无法计算切片厚度,使用默认值1.0mm")
            slice_thickness = 1.0
    return slice_thickness

def load_dicom_slices(dicom_path):
    """
        Load all DICOM files under a path and return them as a list

    :param dicom_path:     directory containing the DICOM files
    :return:               list of pydicom datasets sorted by InstanceNumber
    """
    dicom_files = []
    for root, _, files in os.walk(dicom_path):
        for file in files:
            if file.lower().endswith(('.dcm', '.dicom')):
                # root already contains dicom_path; joining it again breaks relative paths
                real_file = os.path.join(root, file)
                current_if_dicom = is_dicom_file(real_file)
                if current_if_dicom:
                    dicom_files.append(real_file)
    if not dicom_files:
        raise ValueError(f"在路径 {dicom_path} 中未找到DICOM文件")
    # Load all slices
    slices = []
    for file in dicom_files:
        try:
            ds = pydicom.dcmread(file)
            slices.append(ds)
        except Exception as e:
            print(f"无法读取DICOM文件 {file}: {e}")
    # Sort the slices by InstanceNumber
    slices.sort(key=lambda x: int(x.InstanceNumber))
    slice_thickness = get_dicom_thickness(slices)
    for s in slices:
        s.SliceThickness = slice_thickness
    return slices

def get_pixels_hu(slices):
    '''
        Convert DICOM slices to a pixel array in Hounsfield units and zero out the border (raw value -2000)

    :param slices:  list of DICOM datasets
    :return:        int16 pixel array of one patient's scan
    '''
    image = np.stack([s.pixel_array for s in slices])
    image = image.astype(np.int16)
    image[image == -2000] = 0
    for slice_number in range(len(slices)):
        intercept = slices[slice_number].RescaleIntercept
        slope = slices[slice_number].RescaleSlope
        if slope != 1:
            image[slice_number] = slope * image[slice_number].astype(np.float64)
            image[slice_number] = image[slice_number].astype(np.int16)
        image[slice_number] += np.int16(intercept)
    return np.array(image, dtype=np.int16)
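

# Illustrative sketch, not part of the original module: the per-value arithmetic behind
# get_pixels_hu, i.e. stored value * RescaleSlope + RescaleIntercept, on a plain list.
# The helper name `example_to_hu` is hypothetical.
def example_to_hu(raw_values, slope, intercept):
    # Scale first (truncating to an integer, as the int16 cast does), then add the intercept
    return [int(slope * v) + int(intercept) for v in raw_values]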


def getinfo_dicom(dicom_path):
    print('dicom_path: ', dicom_path)
    slices = load_dicom_slices(dicom_path)
    print(type(slices[0]), slices[0].ImagePositionPatient)
    print(len(slices), "\t", slices[0].SliceThickness, "\t", slices[0].PixelSpacing)
    print("Orientation: ", slices[0].ImageOrientationPatient)
    #assert slices[0].ImageOrientationPatient == [1.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.000000]
    pixels = get_pixels_hu(slices)
    image = pixels
    print(image.shape)

    invert_order = slices[1].ImagePositionPatient[2] > slices[0].ImagePositionPatient[2]
    print("Invert order: ", invert_order, " - ", slices[1].ImagePositionPatient[2], ",",
          slices[0].ImagePositionPatient[2])

    # Copy the spacing so the DICOM dataset itself is not modified
    pixel_spacing = list(slices[0].PixelSpacing)
    pixel_spacing.append(slices[0].SliceThickness)
    # save dicom source image size
    dicom_size = [image.shape[0], image.shape[1], image.shape[2]]

    return pixel_spacing, dicom_size, invert_order

def extract_dicom_images_patient(dicom_path, target_dir):
    slices = load_dicom_slices(dicom_path)
    assert slices[0].ImageOrientationPatient == [1.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.000000]
    pixels = get_pixels_hu(slices)
    image = pixels
    invert_order = slices[1].ImagePositionPatient[2] > slices[0].ImagePositionPatient[2]
    # Copy the spacing so the DICOM dataset itself is not modified
    pixel_spacing = list(slices[0].PixelSpacing)
    pixel_spacing.append(slices[0].SliceThickness)
    # save dicom source image size
    dicom_size = [image.shape[0], image.shape[1], image.shape[2]]
    image = rescale_patient_images(image, pixel_spacing)
    png_size = [image.shape[0], image.shape[1], image.shape[2]]
    if not invert_order:
        image = np.flipud(image)
    if not os.path.exists(target_dir):
        os.mkdir(target_dir)
    else:
        print("png dir already exists, return directly")
        return pixel_spacing, dicom_size, png_size, invert_order
    png_files = glob.glob(os.path.join(target_dir, "*.png"))
    for file in png_files:
        os.remove(file)
    for i in tqdm(range(image.shape[0])):
        # Write into target_dir (the original referenced an undefined patient_dir)
        img_path = target_dir + "/img_" + str(i).rjust(4, '0') + "_i.png"
        org_img = image[i]
        img, mask = get_segmented_lungs(org_img.copy())
        org_img = normalize_hu_values(org_img)
        cv2.imwrite(img_path, org_img * 255)
        cv2.imwrite(img_path.replace("_i.png", "_m.png"), mask * 255)
    return pixel_spacing, dicom_size, png_size,invert_order



================================================
FILE: deploy/backend/util/image_util.py
================================================
from typing import Tuple

import cv2
import os
import numpy
import glob
import random
import numpy as np
from scipy import ndimage

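def get_normalized_img_unit8(img):
    # numpy.float was removed in NumPy 1.24; use an explicit float64
    img = img.astype(numpy.float64)
    min_val = img.min()
    max_val = img.max()
    img -= min_val
    # Guard against division by zero for constant images
    if max_val > min_val:
        img /= max_val - min_val
    img *= 255
    res = img.astype(numpy.uint8)
    return res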


def load_patient_images(png_path, wildcard="*.*", exclude_wildcards=[]):
    print("png path is\t",png_path)
    src_dir = png_path
    src_img_paths = glob.glob(src_dir +'/'+ wildcard)
    for exclude_wildcard in exclude_wildcards:
        # join with '/' so the pattern matches the paths collected above
        exclude_img_paths = glob.glob(src_dir + '/' + exclude_wildcard)
        src_img_paths = [im for im in src_img_paths if im not in exclude_img_paths]
    src_img_paths.sort()

    images = [cv2.imread(img_path, cv2.IMREAD_GRAYSCALE) for img_path in src_img_paths]
    images = [im.reshape((1, ) + im.shape) for im in images]
    res = numpy.vstack(images)
    return res


def draw_overlay(png_path: str, p_x: float, p_y: float, p_z: float, index: str,  BOX_size:int = 20) -> None:
    """
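    Draw a bounding-box overlay on one slice of the patient images.
    Args:
        png_path: path to the directory containing the PNG images
        p_x: X coordinate (as a fraction of the image width)
        p_y: Y coordinate (as a fraction of the image height)
        p_z: Z coordinate (as a fraction of the number of slices)
        index: identifier used as the output file name
        BOX_size: half the side length of the bounding box, in pixels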
    """
    patient_img = load_patient_images(png_path + "/png/", "*_i.png", [])
    z = int(p_z * patient_img.shape[0])
    y = int(p_y * patient_img.shape[1])
    x = int(p_x * patient_img.shape[2])
    # bounding box size
    x1 = x - BOX_size
    y1 = y - BOX_size
    x2 = x + BOX_size
    y2 = y + BOX_size
    target_img = patient_img[z, :, :]
    cv2.rectangle(target_img, (x1, y1), (x2, y2), (255, 0, 0), 1)
    cv2.imwrite(png_path + "/" + index + ".png", target_img)
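

# Illustrative sketch (not part of the original module): draw_overlay above
# treats p_x, p_y and p_z as fractions of the volume size and converts them to
# voxel indices with int(p * size). The helper name `percent_to_voxel` and the
# example shape below are assumptions made for this example only.
def percent_to_voxel(p_xyz, shape_zyx):
    """Map fractional (x, y, z) coordinates to integer (z, y, x) voxel indices."""
    p_x, p_y, p_z = p_xyz
    depth, height, width = shape_zyx
    # Same truncation as draw_overlay: int() drops the fractional part.
    return int(p_z * depth), int(p_y * height), int(p_x * width)


# Usage: the centre of a 200 x 512 x 512 (z, y, x) volume.
# percent_to_voxel((0.5, 0.5, 0.5), (200, 512, 512)) returns (100, 256, 256)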

def prepare_image_for_net3D(img,MEAN_PIXEL_VALUE = 41):
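    '''
        Normalize an image for the 3D network: subtract the mean pixel value,
        divide by 255, and add leading batch and trailing channel dimensions.

    :param img:               image to be normalized
    :param MEAN_PIXEL_VALUE:  mean pixel value subtracted from the image
    :return:                  float32 array of shape (1, d0, d1, d2, 1)
    '''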
    img = img.astype(numpy.float32)
    img -= MEAN_PIXEL_VALUE
    img /= 255.
    img = img.reshape(1, img.shape[0], img.shape[1], img.shape[2], 1)
    return img


def move_png2dir(target_dir):
    import shutil
    first_dir = []
    for path in os.listdir(target_dir):
        if os.path.isdir(os.path.join(target_dir,path)):
            first_dir.append(os.path.join(target_dir,path))
    for d in first_dir:
        tmp_path = []
        for file in os.listdir(d):
            tmp_file_path = os.path.join(d,file)
            png_path = os.path.join(d,'png')
            if not os.path.exists(png_path):
                os.mkdir(png_path)
            if tmp_file_path.endswith(".png"):
                shutil.move(tmp_file_path,os.path.join(png_path,file))
                print("move file from %s  to   %s " %(tmp_file_path,os.path.join(png_path,file)))

def rescale_patient_images(images_zyx, org_spacing_xyz, target_voxel_mm =1.0, is_mask_image=False, verbose=False):
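    '''
        Rescale a 3D image so that its voxels have the target spacing.

    :param images_zyx:              source image, indexed (z, y, x)
    :param org_spacing_xyz:         original voxel spacing, in (x, y, z) order
    :param target_voxel_mm:         target voxel size in mm
    :param is_mask_image:           if True, use nearest-neighbour interpolation
    :param verbose:                 if True, print the spacing and shapes
    :return:                        rescaled image, indexed (z, y, x)
    '''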
    if verbose:
        print("Spacing: ", org_spacing_xyz)
        print("Shape: ", images_zyx.shape)

    # print "Resizing dim z"
    resize_x = 1.0
    resize_y = float(org_spacing_xyz[2]) / float(target_voxel_mm)
    interpolation = cv2.INTER_NEAREST if is_mask_image else cv2.INTER_LINEAR
    res = cv2.resize(images_zyx, dsize=None, fx=resize_x, fy=resize_y, interpolation=interpolation)  # OpenCV assumes a (y, x, channels) numpy array, so y = z here
    res = res.swapaxes(0, 2)
    res = res.swapaxes(0, 1)
    # print "Shape: ", res.shape
    resize_x = float(org_spacing_xyz[0]) / float(target_voxel_mm)
    resize_y = float(org_spacing_xyz[1]) / float(target_voxel_mm)

    # cv2 can handle max 512 channels..
    if res.shape[2] > 512:
        res = res.swapaxes(0, 2)
        res1 = res[:256]
        res2 = res[256:]
        res1 = res1.swapaxes(0, 2)
        res2 = res2.swapaxes(0, 2)
        res1 = cv2.resize(res1, dsize=None, fx=resize_x, fy=resize_y, interpolation=interpolation)
        res2 = cv2.resize(res2, dsize=None, fx=resize_x, fy=resize_y, interpolation=interpolation)
        res1 = res1.swapaxes(0, 2)
        res2 = res2.swapaxes(0, 2)
        res = numpy.vstack([res1, res2])
        res = res.swapaxes(0, 2)
    else:
        res = cv2.resize(res, dsize=None, fx=resize_x, fy=resize_y, interpolation=interpolation)
    res = res.swapaxes(0, 2)
    res = res.swapaxes(2, 1)
    if verbose:
        print("Shape after: ", res.shape)
    return res
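

# Illustrative sketch (not part of the original module): the output shape of
# rescale_patient_images can be estimated without running OpenCV, because each
# axis is scaled by (original spacing / target_voxel_mm). The helper name
# `expected_rescaled_shape` is an assumption made for this example, and the
# rounding only approximates what cv2.resize does internally.
def expected_rescaled_shape(shape_zyx, org_spacing_xyz, target_voxel_mm=1.0):
    """Estimate the (z, y, x) shape after rescaling to target_voxel_mm."""
    size_z, size_y, size_x = shape_zyx
    spacing_x, spacing_y, spacing_z = (float(s) for s in org_spacing_xyz)
    scale = 1.0 / float(target_voxel_mm)
    return (
        round(size_z * spacing_z * scale),
        round(size_y * spacing_y * scale),
        round(size_x * spacing_x * scale),
    )


# Usage: 100 slices of 512 x 512 pixels, 0.7 mm in-plane spacing and 2.5 mm
# between slices, rescaled to 1 mm voxels.
# expected_rescaled_shape((100, 512, 512), (0.7, 0.7, 2.5)) returns (250, 358, 358)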


def rescale_patient_images2(images_zyx, target_shape, verbose=False):
    if verbose:
        print("Target: ", target_shape)
        print("Shape: ", images_zyx.shape)

    # print "Resizing dim z"
    resize_x = 1.0
    interpolation = cv2.INTER_LINEAR
    res = cv2.resize(images_zyx, dsize=(target_shape[1], target_shape[0]), interpolation=interpolation)  # OpenCV assumes a (y, x, channels) numpy array, so y = z here
    # print "Shape is now : ", res.shape

    res = res.swapaxes(0, 2)
    res = res.swapaxes(0, 1)

    # cv2 can handle max 512 channels..
    if res.shape[2] > 512:
        res = res.swapaxes(0, 2)
        res1 = res[:256]
        res2 = res[256:]
        res1 = res1.swapaxes(0, 2)
        res2 = res2.swapaxes(0, 2)
        res1 = cv2.resize(res1, dsize=(target_shape[2], target_shape[1]), interpolation=interpolation)
        res2 = cv2.resize(res2, dsize=(target_shape[2], target_shape[1]), interpolation=interpolation)
        res1 = res1.swapaxes(0, 2)
        res2 = res2.swapaxes(0, 2)
        res = numpy.vstack([res1, res2])
        res = res.swapaxes(0, 2)
    else:
        res = cv2.resize(res, dsize=(target_shape[2], target_shape[1]), interpolation=interpolation)

    res = res.swapaxes(0, 2)
    res = res.swapaxes(2, 1)
    if verbose:
        print("Shape after: ", res.shape)
    return res


def resize_image(image: np.ndarray, new_shape: Tuple[int, ...]) -> np.ndarray:
    """
    Resize an image.

    Args:
        image: input image
        new_shape: new shape

    Returns:
        np.ndarray: the resized image
    """
    # Handle single-channel or multi-channel images
    if len(image.shape) == 3 and len(new_shape) == 2:
        # 3D image with a 2D target shape: resize every slice
        resized_image = np.zeros((image.shape[0], new_shape[0], new_shape[1]))
        for i in range(image.shape[0]):
            resized_image[i] = cv2.resize(image[i], (new_shape[1], new_shape[0]))
        return resized_image
    elif len(image.shape) == 2 and len(new_shape) == 2:
        # 2D image
        return cv2.resize(image, (new_shape[1], new_shape[0]))
    else:
        # Image of arbitrary dimensionality
        resize_factor = tuple(n / o for n, o in zip(new_shape, image.shape))
        return ndimage.zoom(image, resize_factor, mode='nearest')

def cv_flip(img,cols,rows,degree):
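    '''
        rotate image by the given angle around its center
        (despite the name, this rotates rather than flips)

    :param img:         image array to be rotated
    :param cols:        width of image
    :param rows:        height of image
    :param degree:      rotation angle in degrees
    :return:            rotated image
    '''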
    M = cv2.getRotationMatrix2D((cols / 2, rows /2), degree, 1.0)
    dst = cv2.warpAffine(img, M, (cols, rows))
    return dst


def random_rotate_img(img, chance, min_angle, max_angle):
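    '''
        randomly rotate an image (or a list of images) by the same angle

    :param img:         image, or list of images, to be rotated
    :param chance:      probability of applying the rotation
    :param min_angle:   minimum rotation angle in degrees
    :param max_angle:   maximum rotation angle in degrees
    :return:            image(s) after the random rotation
    '''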
    import cv2
    if random.random() > chance:
        return img
    if not isinstance(img, list):
        img = [img]

    angle = random.randint(min_angle, max_angle)
    # cv2 expects the center as (x, y) and dsize as (width, height)
    center = (img[0].shape[1] / 2, img[0].shape[0] / 2)
    rot_matrix = cv2.getRotationMatrix2D(center, angle, scale=1.0)

    res = []
    for img_inst in img:
        img_inst = cv2.warpAffine(img_inst, rot_matrix, dsize=(img_inst.shape[1], img_inst.shape[0]), borderMode=cv2.BORDER_CONSTANT)
        res.append(img_inst)
    # unwrap the result when a single image was processed
    if len(res) == 1:
        res = res[0]
    return res


def random_flip_img(img, horizontal_chance=0, vertical_chance=0):
    '''
        randomly flip an image horizontally and/or vertically

    :param img:                 image, or list of images, to be flipped
    :param horizontal_chance:   probability of a horizontal flip
    :param vertical_chance:     probability of a vertical flip
    :return:                    image(s) after flipping
    '''
    import cv2
    flip_horizontal = False
    if random.random() < horizontal_chance:
        flip_horizontal = True

    flip_vertical = False
    if random.random() < vertical_chance:
        flip_vertical = True

    if not flip_horizontal and not flip_vertical:
        return img

    flip_val = 1
    if flip_vertical:
        flip_val = -1 if flip_horizontal else 0

    if not isinstance(img, list):
        res = cv2.flip(img, flip_val)  # 0 = X axis, 1 = Y axis,  -1 = both
    else:
        res = []
        for img_item in img:
            img_flip = cv2.flip(img_item, flip_val)
            res.append(img_flip)
    return res
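

# Illustrative sketch (not part of the original module): random_flip_img above
# turns the two random decisions into a single cv2.flip flipCode. The helper
# name `flip_code` is an assumption made for this example; it restates the
# branching with explicit booleans so the mapping is easy to check.
def flip_code(flip_horizontal, flip_vertical):
    """Return the cv2.flip flipCode for the requested flips, or None for no flip."""
    if not flip_horizontal and not flip_vertical:
        return None  # image is returned unchanged
    if flip_vertical:
        # -1 flips around both axes, 0 flips around the X axis only
        return -1 if flip_horizontal else 0
    return 1  # flip around the Y axis only (horizontal flip)


# Usage:
# flip_code(True, False) returns 1
# flip_code(False, True) returns 0
# flip_code(True, True) returns -1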


def random_scale_img(img, xy_range, lock_xy=False):
    if random.random() > xy_range.chance:
        return img

    if not isinstance(img, list):
        img = [img]

    import cv2
    scale_x = random.uniform(xy_range.x_min, xy_range.x_max)
    scale_y = random.uniform(xy_range.y_min, xy_range.y_max)
    if lock_xy:
        scale_y = scale_x

    org_height, org_width = img[0].shape[:2]
    xy_range.last_x = scale_x
    xy_range.last_y = scale_y

    res = []
    for img_inst in img:
        scaled_width = int(org_width * scale_x)
        scaled_height = int(org_height * scale_y)
        scaled_img = cv2.resize(img_inst, (scaled_width, scaled_height), interpolation=cv2.INTER_CUBIC)
        if scaled_width < org_width:
            # integer division: cv2.copyMakeBorder needs integer border sizes
            extend_left = (org_width - scaled_width) // 2
            extend_right = org_width - extend_left - scaled_width
            scaled_img = cv2.copyMakeBorder(scaled_img, 0, 0, extend_left, extend_right, borderType=cv2.BORDER_CONSTANT)
            scaled_width = org_width

        if scaled_height < org_height:
            extend_top = (org_height - scaled_height) // 2
            extend_bottom = org_height - extend_top - scaled_height
            scaled_img = cv2.copyMakeBorder(scaled_img, extend_top, extend_bottom, 0, 0, borderType=cv2.BORDER_CONSTANT)
            scaled_height = org_height

        # integer division: the offsets are used as slice indices
        start_x = (scaled_width - org_width) // 2
        start_y = (scaled_height - org_height) // 2
        tmp = scaled_img[start
SYMBOL INDEX (267 symbols across 31 files)

FILE: data/dataclass/CTData.py
  class CTFormat (line 12) | class CTFormat(Enum):
  class CTData (line 17) | class CTData:
    method __init__ (line 22) | def __init__(self):
    method from_dicom (line 38) | def from_dicom(cls, dicom_path):
    method from_mhd (line 80) | def from_mhd(cls, mhd_path):
    method convert_to_hu (line 112) | def convert_to_hu(self):
    method resample_pixel (line 130) | def resample_pixel(self, new_spacing=[1, 1, 1]):
    method filter_lung_img_mask (line 165) | def filter_lung_img_mask(self):
    method world_to_voxel (line 180) | def world_to_voxel(self, world_coord):
    method voxel_to_world (line 196) | def voxel_to_world(self, voxel_coord):
    method extract_cube (line 212) | def extract_cube(self, center_world_mm, size_mm,if_fixed_radius = False):
    method visualize_slice (line 256) | def visualize_slice(self, slice_idx=None, axis=0, show_lung_only=False):
    method visualize_nodule (line 298) | def visualize_nodule(self, coord_x,coord_y, coord_z, diameter):
    method save_as_nifti (line 333) | def save_as_nifti(self, output_path):

FILE: data/dataclass/NoduleCube.py
  function normal_cube_to_tensor (line 11) | def normal_cube_to_tensor(cube_data):
  class NoduleCube (line 36) | class NoduleCube:
    method __post_init__ (line 56) | def __post_init__(self):
    method load_from_npy (line 65) | def load_from_npy(self) -> None:
    method save_to_npy (line 85) | def save_to_npy(self, output_path: str) -> str:
    method save_to_png (line 103) | def save_to_png(self, output_path: str) -> str:
    method load_from_png (line 157) | def load_from_png(self) -> None:
    method set_cube_data (line 201) | def set_cube_data(self, pixel_data: np.ndarray) -> None:
    method resize (line 219) | def resize(self, new_size: int) -> None:
    method augment (line 238) | def augment(self, rotation: bool = True, flip_axis: int = -1, noise: b...
    method visualize_3d (line 290) | def visualize_3d(self, output_path: Optional[str] = None, show: bool =...
    method from_npy (line 355) | def from_npy(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
    method from_png (line 371) | def from_png(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
    method from_array (line 387) | def from_array(cls,

FILE: data/preprocessing/lidc_process/lidc_annotation_process.py
  function merge_nodule_annotation_csv_to_one (line 22) | def merge_nodule_annotation_csv_to_one(nodule_annotation_csv_list,save_f...
  function read_nodule_annotation_from_xml (line 44) | def read_nodule_annotation_from_xml(xml_path,patient_mhd_path_dict,agree...
  function process_lidc_annotations (line 201) | def process_lidc_annotations(xml_annotation_like,patient_mhd_path_dict,m...
  function extract_lidc_every_z_annotations (line 227) | def extract_lidc_every_z_annotations(xml_like,every_z_save_csv,patient_m...
  function extract_every_z_from_lidc_xml (line 238) | def extract_every_z_from_lidc_xml(xml_path,patient_mhd_path_dict):

FILE: data/preprocessing/lidc_process/lidc_coordinate_process.py
  function draw_percent_cube_by_csv (line 15) | def draw_percent_cube_by_csv(percent_csv,mhd_info_csv,cube_save_path):
  function percent_coordinatecsv_to_mmcsv (line 40) | def percent_coordinatecsv_to_mmcsv(percent_csv,mhd_info_csv,mmcsv_save):
  function avg_coordinates (line 75) | def avg_coordinates(csv,threshold,csv_save):
  function add_final_mals (line 169) | def add_final_mals(csv,with_real_malsclabel_csv):
  function percent_coordinate_to_mm (line 253) | def percent_coordinate_to_mm(patient_id,p_x,p_y,p_z,mhd_info_csv):
  function draw_all_confirmed_cubes (line 282) | def draw_all_confirmed_cubes(mm_coordinates_csv,mhd_info_csv,extract_png...

FILE: data/preprocessing/luna16_invalid_nodule_filter.py
  function nodule_valid (line 3) | def nodule_valid(ct_data, voxel_coord_x, voxel_coord_y,voxel_coord_z):

FILE: data/preprocessing/luna16_prepare_cube_data.py
  function get_mhd_file_path (line 12) | def get_mhd_file_path(patient_id, luna16_root="H:/luna16"):
  function get_real_candidate (line 37) | def get_real_candidate(mhd_root_dir, annotation_csv, candidate_csv,save_...
  function ctdata_annotation2nodule (line 97) | def ctdata_annotation2nodule(ct_data, nodule_info, mal_label, cube_size=...
  function process_nodule (line 134) | def process_nodule(nodule, cube_index,mhd_root_dir, label, if_aug, png_o...
  function prepare_cubes_mp (line 266) | def prepare_cubes_mp(mhd_root_dir, annotation_csv, label, png_output, np...
  function main (line 351) | def main():

FILE: deploy/backend/app.py
  function on_detection_completed (line 58) | def on_detection_completed(session_id, results):
  function update_session_state (line 104) | def update_session_state(session_id, state):
  function get_session_state (line 131) | def get_session_state(session_id):
  function allowed_file (line 142) | def allowed_file(filename):
  function index (line 148) | def index():
  function static_files (line 154) | def static_files(path):
  function upload_file (line 160) | def upload_file():
  function detect_file_type (line 262) | def detect_file_type(directory):
  function start_preprocessing (line 298) | def start_preprocessing(session_id):
  function start_detection (line 401) | def start_detection():
  function get_progress (line 460) | def get_progress(session_id):
  function get_results (line 502) | def get_results(session_id):
  function get_nodule_data (line 564) | def get_nodule_data(session_id, nodule_id):
  function generate_nodule_images (line 599) | def generate_nodule_images(session_id, nodules, lung_img):
  function update_session_state (line 792) | def update_session_state(session_id, state):
  function save_slice_box (line 820) | def save_slice_box(session_id,lung_seg,voxel_coords,radius = 32):
  function save_lung_segmentation_slices (line 851) | def save_lung_segmentation_slices(session_id, lung_seg):
  function get_lung_z_slice (line 915) | def get_lung_z_slice(session_id, slice_index):
  function get_lung_slices_info (line 940) | def get_lung_slices_info(session_id):

FILE: deploy/backend/dataclass/CTData.py
  class CTFormat (line 12) | class CTFormat(Enum):
  class CTData (line 17) | class CTData:
    method __init__ (line 22) | def __init__(self):
    method from_dicom (line 38) | def from_dicom(cls, dicom_path):
    method from_mhd (line 80) | def from_mhd(cls, mhd_path):
    method convert_to_hu (line 112) | def convert_to_hu(self):
    method resample_pixel (line 130) | def resample_pixel(self, new_spacing=[1, 1, 1]):
    method filter_lung_img_mask (line 165) | def filter_lung_img_mask(self):
    method world_to_voxel (line 180) | def world_to_voxel(self, world_coord):
    method voxel_to_world (line 196) | def voxel_to_world(self, voxel_coord):
    method extract_cube (line 212) | def extract_cube(self, center_world_mm, size_mm,if_fixed_radius = False):
    method visualize_slice (line 256) | def visualize_slice(self, slice_idx=None, axis=0, show_lung_only=False):
    method visualize_nodule (line 298) | def visualize_nodule(self, coord_x,coord_y, coord_z, diameter):
    method save_as_nifti (line 333) | def save_as_nifti(self, output_path):

FILE: deploy/backend/dataclass/NoduleCube.py
  function normal_cube_to_tensor (line 11) | def normal_cube_to_tensor(cube_data):
  class NoduleCube (line 36) | class NoduleCube:
    method __post_init__ (line 56) | def __post_init__(self):
    method load_from_npy (line 65) | def load_from_npy(self) -> None:
    method save_to_npy (line 85) | def save_to_npy(self, output_path: str) -> str:
    method save_to_png (line 103) | def save_to_png(self, output_path: str) -> str:
    method load_from_png (line 157) | def load_from_png(self) -> None:
    method set_cube_data (line 201) | def set_cube_data(self, pixel_data: np.ndarray) -> None:
    method resize (line 219) | def resize(self, new_size: int) -> None:
    method augment (line 238) | def augment(self, rotation: bool = True, flip_axis: int = -1, noise: b...
    method visualize_3d (line 290) | def visualize_3d(self, output_path: Optional[str] = None, show: bool =...
    method from_npy (line 355) | def from_npy(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
    method from_png (line 371) | def from_png(cls, file_path: str, cube_size: int = 64) -> 'NoduleCube':
    method from_array (line 387) | def from_array(cls,

FILE: deploy/backend/detector.py
  class NoduleDetector (line 41) | class NoduleDetector:
    method __init__ (line 44) | def __init__(self, model_path=None, device='cuda'):
    method load_model (line 63) | def load_model(self, model_path):
    method detect (line 84) | def detect(self, file_path, session_id, patient_id=None):
    method _detect_thread (line 127) | def _detect_thread(self, file_path, session_id, patient_id):
    method _create_progress_logger (line 242) | def _create_progress_logger(self, session_id):
    method _extract_nodule_cubes (line 288) | def _extract_nodule_cubes(self, ct_data, nodules_df, cube_size=32):
    method _update_session (line 348) | def _update_session(self, session_id, updates):
    method get_session_state (line 364) | def get_session_state(self, session_id):
    method set_completion_callback (line 396) | def set_completion_callback(self, callback):
  function get_detector_instance (line 404) | def get_detector_instance(model_path=None):

FILE: deploy/backend/models/pytorch_c3d_tiny.py
  class C3dTiny (line 11) | class C3dTiny(nn.Module):
    method __init__ (line 12) | def __init__(self):
    method forward (line 66) | def forward(self, x):

FILE: deploy/backend/models/pytorch_nodule_detector.py
  function setup_logger (line 21) | def setup_logger(log_dir="./inference_logs"):
  function load_ct_data (line 49) | def load_ct_data(file_path):
  function load_model (line 75) | def load_model(evaL_model_path, device='cuda'):
  function get_lung_bounds (line 92) | def get_lung_bounds(lung_mask):
  function scan_ct_data (line 151) | def scan_ct_data(ct_data, model, device, logger, step=SCAN_STEP):
  function process_batch (line 256) | def process_batch(batch_inputs, batch_positions, model, device, ct_data,...
  function reduce_overlapping_nodules (line 286) | def reduce_overlapping_nodules(results_df, distance_threshold=15):
  function filter_false_positives (line 328) | def filter_false_positives(nodules_df, ct_data, max_nodules=10):
  function format_results (line 364) | def format_results(results_df, ct_data, patient_id):
  function detect_nodules (line 405) | def detect_nodules(file_path, model_path, detect_patient_id=None, device...

FILE: deploy/backend/preprocessing/luna16_invalid_nodule_filter.py
  function nodule_valid (line 4) | def nodule_valid(ct_data, voxel_coord_x, voxel_coord_y,voxel_coord_z):

FILE: deploy/backend/util/dicom_util.py
  function is_dicom_file (line 12) | def is_dicom_file(filename):
  function get_dicom_thickness (line 26) | def get_dicom_thickness(dicom_slices):
  function load_dicom_slices (line 53) | def load_dicom_slices(dicom_path):
  function get_pixels_hu (line 85) | def get_pixels_hu(slices):
  function getinfo_dicom (line 105) | def getinfo_dicom(dicom_path):
  function extract_dicom_images_patient (line 127) | def extract_dicom_images_patient(dicom_path, target_dir):

FILE: deploy/backend/util/image_util.py
  function get_normalized_img_unit8 (line 11) | def get_normalized_img_unit8(img):
  function load_patient_images (line 22) | def load_patient_images(png_path, wildcard="*.*", exclude_wildcards=[]):
  function draw_overlay (line 37) | def draw_overlay(png_path: str, p_x: float, p_y: float, p_z: float, inde...
  function prepare_image_for_net3D (line 61) | def prepare_image_for_net3D(img,MEAN_PIXEL_VALUE = 41):
  function move_png2dir (line 76) | def move_png2dir(target_dir):
  function rescale_patient_images (line 93) | def rescale_patient_images(images_zyx, org_spacing_xyz, target_voxel_mm ...
  function rescale_patient_images2 (line 141) | def rescale_patient_images2(images_zyx, target_shape, verbose=False):
  function resize_image (line 178) | def resize_image(image: np.ndarray, new_shape: Tuple[int, ...]) -> np.nd...
  function cv_flip (line 204) | def cv_flip(img,cols,rows,degree):
  function random_rotate_img (line 219) | def random_rotate_img(img, chance, min_angle, max_angle):
  function random_flip_img (line 248) | def random_flip_img(img, horizontal_chance=0, vertical_chance=0):
  function random_scale_img (line 283) | def random_scale_img(img, xy_range, lock_xy=False):
  class XYRange (line 325) | class XYRange:
    method __init__ (line 326) | def __init__(self, x_min, x_max, y_min, y_max, chance=1.0):
    method get_last_xy_txt (line 335) | def get_last_xy_txt(self):
  function random_translate_img (line 341) | def random_translate_img(img, xy_range, border_mode="constant"):
  function data_augmentation (line 368) | def data_augmentation(image: np.ndarray, augment_type: str = 'random') -...

FILE: deploy/backend/util/mhd_util.py
  function get_all_mhd_file (line 17) | def get_all_mhd_file(BASE_DATA_DIR,base_head,max):
  function get_luna16_mhd_file (line 38) | def get_luna16_mhd_file(mhd_root):
  function read_csv_to_pandas (line 52) | def read_csv_to_pandas(mhd_info,col_sepator ='\t'):
  function extract_image_from_mhd (line 71) | def extract_image_from_mhd(mhd_file_path,png_save_path_root =None):

FILE: deploy/backend/util/seg_util.py
  function normalize_hu_values (line 10) | def normalize_hu_values(image: np.ndarray, min_bound: int = -1000, max_b...
  function get_segmented_lungs (line 27) | def get_segmented_lungs(im, plot=False):

FILE: deploy/backend/utils.py
  class CTData (line 25) | class CTData:
    method __init__ (line 28) | def __init__(self):
    method from_dicom (line 38) | def from_dicom(cls, dicom_path):
    method from_mhd (line 87) | def from_mhd(cls, mhd_path):
    method resample_pixels (line 112) | def resample_pixels(self, new_spacing=[1.0, 1.0, 1.0]):
    method convert_to_hu (line 146) | def convert_to_hu(self):
    method filter_lung_img_mask (line 165) | def filter_lung_img_mask(self, threshold=-320):
    method fill_body_mask (line 214) | def fill_body_mask(self, threshold_image):
  function extract_lung_from_image (line 234) | def extract_lung_from_image(sitk_image):
  function segment_lung (line 253) | def segment_lung(ct_array):
  function extract_lung_from_file (line 309) | def extract_lung_from_file(file_path):
  function prepare_data_for_3d_rendering (line 350) | def prepare_data_for_3d_rendering(lung_data):

FILE: deploy/frontend/js/main.js
  function initializeSteps (line 27) | function initializeSteps() {
  function setupFileUpload (line 33) | function setupFileUpload() {
  function initializeEventListeners (line 79) | function initializeEventListeners() {
  function getFileExtension (line 115) | function getFileExtension(filename) {
  function formatFileSize (line 120) | function formatFileSize(bytes) {
  function uploadFile (line 129) | function uploadFile(file) {
  function startDetection (line 214) | async function startDetection() {
  function startProgressPolling (line 258) | function startProgressPolling() {
  function updateProgress (line 317) | function updateProgress(progress, message) {
  function loadLungSegmentation (line 333) | function loadLungSegmentation() {
  function initCTViewer (line 384) | function initCTViewer() {
  function handleSliceScroll (line 426) | function handleSliceScroll(event) {
  function changeSlice (line 437) | function changeSlice(delta) {
  function loadSlice (line 457) | function loadSlice(sliceIndex) {
  function fetchResults (line 498) | function fetchResults() {
  function updateNoduleList (line 553) | function updateNoduleList(nodules) {
  function createNoduleMarkers (line 633) | function createNoduleMarkers() {
  function highlightNoduleInList (line 741) | function highlightNoduleInList(noduleId) {
  function loadNoduleDetails (line 758) | function loadNoduleDetails(noduleId) {
  function resetView (line 879) | function resetView() {
  function showMessage (line 899) | function showMessage(message, type = 'info') {
  function updateUIState (line 938) | function updateUIState(state) {

FILE: deploy/run.py
  function get_python_command (line 8) | def get_python_command():
  function check_dependencies (line 15) | def check_dependencies():
  function create_directories (line 29) | def create_directories():
  function run_backend_server (line 35) | def run_backend_server():
  function main (line 65) | def main():

FILE: inference/classifier.py
  function scan (line 16) | def scan(dicom_path, only_patient_id, workspace):
  function generate_ggn_class (line 75) | def generate_ggn_class(ii_index):
  function multipule_test (line 82) | def multipule_test(workspace, only_patient_id, CONTINUE_JOB):
  function locate_malignancy (line 96) | def locate_malignancy(png_path, model_weight,CONTINUE_JOB, only_patient_id,
  function filter_patient_nodules_predictions (line 198) | def filter_patient_nodules_predictions(df_nodule_predictions: pandas.Dat...
  function reduce_predicts_same_slice (line 244) | def reduce_predicts_same_slice(pred_nodules_df):

FILE: inference/detector.py
  function filter_patient_nodules_predictions (line 48) | def filter_patient_nodules_predictions(df_nodule_predictions: pandas.Dat...
  function predict_cubes (line 94) | def predict_cubes(png_path, model_path, only_patient_id=None,  magnifica...
  function reduce_predicts_same_slice (line 192) | def reduce_predicts_same_slice(pred_nodules_df):
  function multipule_test (line 218) | def multipule_test(workspace, only_patient_id, CONTINUE_JOB):
  function draw_overlay_dicom (line 235) | def draw_overlay_dicom(pixels,  coord_x, coord_y, coord_z, i, pixel_spac...
  function get_papaya_coords (line 264) | def get_papaya_coords(coord_x, coord_y, coord_z, nodule_chance, pixel_sp...
  function get_papaya_coords_only (line 295) | def get_papaya_coords_only(coord_x, coord_y, coord_z, nodule_chance, pix...
  function run (line 326) | def run(dicom_path, png_save_dir):
  function predict_nodule_type (line 347) | def predict_nodule_type(df, png_size, png_dir):
  function generate_ggn_class (line 375) | def generate_ggn_class(ii_index):
  function scan (line 388) | def scan(dicom_path, only_patient_id, workspace, file_type="dicom"):
  function scan_only (line 474) | def scan_only(dicom_path, only_patient_id, workspace, file_type="dicom"):

FILE: inference/negative_sample_selection.py
  function variance_of_laplacian (line 13) | def variance_of_laplacian(image):
  function texture_score (line 20) | def texture_score(cube):
  function is_valid_negative_sample (line 42) | def is_valid_negative_sample(cube, lung_mask=None, nodule_coords=None, m...
  function select_negative_samples_from_ct (line 76) | def select_negative_samples_from_ct(ct_data, nodule_coords=None, num_sam...
  function generate_negative_samples (line 200) | def generate_negative_samples(ct_paths, output_dir, nodules_csv=None, sa...
  function visualize_samples (line 268) | def visualize_samples(samples, output_path=None, cols=5):

FILE: inference/pytorch_nodule_detector.py
  function setup_logger (line 21) | def setup_logger(log_dir="./inference_logs"):
  function load_ct_data (line 49) | def load_ct_data(file_path):
  function load_model (line 75) | def load_model(evaL_model_path, device='cuda'):
  function get_lung_bounds (line 92) | def get_lung_bounds(lung_mask):
  function scan_ct_data (line 151) | def scan_ct_data(ct_data, model, device, logger, step=SCAN_STEP):
  function process_batch (line 257) | def process_batch(batch_inputs, batch_positions, model, device, ct_data,...
  function reduce_overlapping_nodules (line 287) | def reduce_overlapping_nodules(results_df, distance_threshold=15):
  function filter_false_positives (line 329) | def filter_false_positives(nodules_df, ct_data, max_nodules=10):
  function format_results (line 365) | def format_results(results_df, ct_data, patient_id):
  function detect_nodules (line 406) | def detect_nodules(file_path, model_path, detect_patient_id=None, device...

FILE: models/pytorch_c3d_tiny.py
  class C3dTiny (line 11) | class C3dTiny(nn.Module):
    method __init__ (line 12) | def __init__(self):
    method forward (line 66) | def forward(self, x):

FILE: training/train_c3d_pytorch.py
  function setup_logger (line 31) | def setup_logger(log_dir="./pytorch_logs"):
  class Luna16DataSet (line 58) | class Luna16DataSet(Dataset):
    method __init__ (line 59) | def __init__(self, files, labels, tranform =None):
    method __len__ (line 64) | def __len__(self):
    method __getitem__ (line 67) | def __getitem__(self, idx):
  function load_train_val_data (line 77) | def load_train_val_data(postive_dir, negative_dir):
  function train_model (line 98) | def train_model(model,train_loader, val_loader, optimizer, criterion, sc...
  function plot_metrics (line 216) | def plot_metrics(train_losses, val_losses, train_accs, val_accs, save_dir):
  function main (line 244) | def main():

FILE: util/dicom_util.py
  function is_dicom_file (line 12) | def is_dicom_file(filename):
  function get_dicom_thickness (line 26) | def get_dicom_thickness(dicom_slices):
  function load_dicom_slices (line 53) | def load_dicom_slices(dicom_path):
  function get_pixels_hu (line 85) | def get_pixels_hu(slices):
  function getinfo_dicom (line 105) | def getinfo_dicom(dicom_path):
  function extract_dicom_images_patient (line 127) | def extract_dicom_images_patient(dicom_path, target_dir):

FILE: util/image_util.py
  function get_normalized_img_unit8 (line 11) | def get_normalized_img_unit8(img):
  function load_patient_images (line 22) | def load_patient_images(png_path, wildcard="*.*", exclude_wildcards=[]):
  function draw_overlay (line 37) | def draw_overlay(png_path: str, p_x: float, p_y: float, p_z: float, inde...
  function prepare_image_for_net3D (line 61) | def prepare_image_for_net3D(img,MEAN_PIXEL_VALUE = 41):
  function move_png2dir (line 76) | def move_png2dir(target_dir):
  function rescale_patient_images (line 93) | def rescale_patient_images(images_zyx, org_spacing_xyz, target_voxel_mm ...
  function rescale_patient_images2 (line 141) | def rescale_patient_images2(images_zyx, target_shape, verbose=False):
  function resize_image (line 178) | def resize_image(image: np.ndarray, new_shape: Tuple[int, ...]) -> np.nd...
  function cv_flip (line 204) | def cv_flip(img,cols,rows,degree):
  function random_rotate_img (line 219) | def random_rotate_img(img, chance, min_angle, max_angle):
  function random_flip_img (line 248) | def random_flip_img(img, horizontal_chance=0, vertical_chance=0):
  function random_scale_img (line 283) | def random_scale_img(img, xy_range, lock_xy=False):
  class XYRange (line 325) | class XYRange:
    method __init__ (line 326) | def __init__(self, x_min, x_max, y_min, y_max, chance=1.0):
    method get_last_xy_txt (line 335) | def get_last_xy_txt(self):
  function random_translate_img (line 341) | def random_translate_img(img, xy_range, border_mode="constant"):
  function data_augmentation (line 368) | def data_augmentation(image: np.ndarray, augment_type: str = 'random') -...

FILE: util/mhd_util.py
  function get_all_mhd_file (line 17) | def get_all_mhd_file(BASE_DATA_DIR,base_head,max):
  function get_luna16_mhd_file (line 38) | def get_luna16_mhd_file(mhd_root):
  function read_csv_to_pandas (line 52) | def read_csv_to_pandas(mhd_info,col_sepator ='\t'):
  function extract_image_from_mhd (line 71) | def extract_image_from_mhd(mhd_file_path,png_save_path_root =None):

FILE: util/progress_watch.py
  class Stopwatch (line 2) | class Stopwatch(object):
    method start (line 3) | def start(self):
    method get_elapsed_time (line 5) | def get_elapsed_time(self):
    method get_elapsed_seconds (line 10) | def get_elapsed_seconds(self):
    method get_time (line 16) | def get_time():
    method start_new (line 21) | def start_new():

FILE: util/seg_util.py
  function normalize_hu_values (line 12) | def normalize_hu_values(image: np.ndarray, min_bound: int = -1000, max_b...
  function get_segmented_lungs (line 29) | def get_segmented_lungs(im, plot=False):
Condensed preview: 51 files, each showing path, character count, and a content snippet.
[
  {
    "path": "README.md",
    "chars": 7347,
    "preview": "# 3D Lung Nodule Detection System\n\n## 0. Project Structure\n\n```\n├── data/               # Data processing\n│   ├── dataclass/      # Data class definitions, including the NoduleCube nodule cube class\n│   └──"
  },
  {
    "path": "data/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/dataclass/CTData.py",
    "chars": 11752,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport numpy as np\nimport SimpleITK as sitk\nfrom scipy import ndimage\nfrom"
  },
  {
    "path": "data/dataclass/NoduleCube.py",
    "chars": 13585,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport os,torch\nimport numpy as np\nimport cv2\nfrom typing import  Optional"
  },
  {
    "path": "data/dataclass/__init__.py",
    "chars": 1,
    "preview": "\n"
  },
  {
    "path": "data/preprocessing/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/preprocessing/lidc_process/README.md",
    "chars": 5756,
    "preview": "# LIDC-IDRI Data Preprocessing\n\nThis directory contains processing for the [LIDC-IDRI (Lung Image Database Consortium and Image Database Resource Initiative)](https://w"
  },
  {
    "path": "data/preprocessing/lidc_process/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/preprocessing/lidc_process/lidc_annotation_process.py",
    "chars": 15231,
    "preview": "import pandas as pd\nimport math\nimport glob\nimport os\n\nfrom bs4 import BeautifulSoup\nfrom constant import luna\n\n# mhd  o"
  },
  {
    "path": "data/preprocessing/lidc_process/lidc_coordinate_process.py",
    "chars": 12223,
    "preview": "# -*- coding:utf-8 -*-\n'''\n a script to process result produced by lidc_annotation_process.py\n\n mainly focus on process "
  },
  {
    "path": "data/preprocessing/luna16_invalid_nodule_filter.py",
    "chars": 2047,
    "preview": "## Remove problematic annotations from the Luna2016 candidate nodule data, as well as erroneous nodules encountered during prediction\nimport numpy as np\ndef nodule_valid(ct_data, voxel_coord_x, voxel_coord"
  },
  {
    "path": "data/preprocessing/luna16_prepare_cube_data.py",
    "chars": 16510,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport os\nimport pandas as pd\nfrom tqdm import tqdm\nimport multiprocessing"
  },
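  {
    "path": "deploy/README.md",
    "chars": 1184,
    "preview": "# CT Image Analysis System\n\nThis is a deep-learning-based lung CT image analysis system that detects lung nodules and estimates their probability of malignancy.\n\n## Features\n\n- Supports multiple CT data formats (DICOM, MHD/RAW, etc.)\n- Full 3D lung visualization (volume view, axial, coronal, and sagittal planes)"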
  },
  {
    "path": "deploy/backend/app.py",
    "chars": 35428,
    "preview": "import os,cv2\nimport sys\nimport numpy as np\nfrom flask import Flask, request, jsonify, send_from_directory, send_file, R"
  },
  {
    "path": "deploy/backend/dataclass/CTData.py",
    "chars": 11752,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport numpy as np\nimport SimpleITK as sitk\nfrom scipy import ndimage\nfrom"
  },
  {
    "path": "deploy/backend/dataclass/NoduleCube.py",
    "chars": 13598,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nimport os,torch\nimport numpy as np\nimport cv2\nfrom typing import  Optional"
  },
  {
    "path": "deploy/backend/dataclass/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "deploy/backend/detector.py",
    "chars": 13919,
    "preview": "import os\nimport sys\nimport time\nimport logging\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom threading impor"
  },
  {
    "path": "deploy/backend/models/pytorch_c3d_tiny.py",
    "chars": 2512,
    "preview": "import torch.nn as nn\nimport torchvision.transforms as transforms\n\n# my_tranform =transforms.Compose([\n#     # transform"
  },
  {
    "path": "deploy/backend/models/pytorch_nodule_detector.py",
    "chars": 15079,
    "preview": "import os\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom torch.nn import functional as F\nfrom datetime import "
  },
  {
    "path": "deploy/backend/preprocessing/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "deploy/backend/preprocessing/luna16_invalid_nodule_filter.py",
    "chars": 2048,
    "preview": "## Remove problematic annotations from the Luna2016 candidate nodule data, as well as erroneous nodules encountered during prediction\nimport numpy as np\n\ndef nodule_valid(ct_data, voxel_coord_x, voxel_coor"
  },
  {
    "path": "deploy/backend/util/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "deploy/backend/util/dicom_util.py",
    "chars": 5682,
    "preview": "import os\nimport glob\nimport pydicom\nimport numpy as np\nimport cv2\nfrom tqdm import tqdm\n\nfrom util.seg_util import get_"
  },
  {
    "path": "deploy/backend/util/image_util.py",
    "chars": 13519,
    "preview": "from typing import Tuple\n\nimport cv2\nimport os\nimport numpy\nimport glob\nimport random\nimport numpy as np\nfrom scipy impo"
  },
  {
    "path": "deploy/backend/util/mhd_util.py",
    "chars": 5339,
    "preview": "import os\nimport ntpath\nimport SimpleITK\nimport numpy as np\nimport pandas as pd\nimport cv2\n\nfrom data.dataclass.CTData i"
  },
  {
    "path": "deploy/backend/util/seg_util.py",
    "chars": 2445,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom scipy import ndimage as ndi\nfrom skimage.filters import roberts\n"
  },
  {
    "path": "deploy/backend/utils.py",
    "chars": 12081,
    "preview": "import os\nimport sys\nimport logging\nimport numpy as np\nimport SimpleITK as sitk\nimport tempfile\nimport zipfile\nimport sh"
  },
  {
    "path": "deploy/frontend/css/style.css",
    "chars": 13287,
    "preview": "/* Main stylesheet */\n\n/* Reset and global styles */\n* {\n    margin: 0;\n    padding: 0;\n    box-sizing: border-box;\n    font-family: 'Segoe UI', T"
  },
  {
    "path": "deploy/frontend/index.html",
    "chars": 5012,
    "preview": "<!DOCTYPE html>\n<html lang=\"zh-CN\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta name=\"viewport\" content=\"width=device-wi"
  },
  {
    "path": "deploy/frontend/js/main.js",
    "chars": 30235,
    "preview": "/**\n * CT lung nodule detection system: main JS file\n */\n\n// Global variables\nlet currentSessionId = null;\nlet currentSliceIndex = 0;\nlet maxSliceIndex = 0;\nlet "
  },
  {
    "path": "deploy/run.py",
    "chars": 1749,
    "preview": "import os\nimport sys\nimport subprocess\nimport platform\nimport webbrowser\nimport time\n\ndef get_python_command():\n    \"\"\"获"
  },
  {
    "path": "inference/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "inference/c3d_classify_result-1.3.6.1.4.1.14519.5.2.1.6279.6001.149041668385192796520281592139.csv",
    "chars": 323,
    "preview": "patient_id,nodule_id,voxel_x,voxel_y,voxel_z,world_x,world_y,world_z,diameter_mm,prob\n1.3.6.1.4.1.14519.5.2.1.6279.6001."
  },
  {
    "path": "inference/classifier.py",
    "chars": 12152,
    "preview": "import numpy,pandas\nimport os\nfrom util import progress_watch\nfrom detector import extract_dicom_images_patient,get_papa"
  },
  {
    "path": "inference/detector.py",
    "chars": 22719,
    "preview": "from service import settings_jjyang\nimport cv2\nimport pandas\nimport os\nimport glob\nimport numpy\nfrom keras import backen"
  },
  {
    "path": "inference/negative_sample_selection.py",
    "chars": 10491,
    "preview": "import os\nimport sys\nimport numpy as np\nimport random\nimport glob\nfrom sklearn.cluster import KMeans\nimport matplotlib.p"
  },
  {
    "path": "inference/pytorch_nodule_detector.py",
    "chars": 15097,
    "preview": "import os\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom torch.nn import functional as F\nfrom datetime import "
  },
  {
    "path": "models/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "models/pytorch_c3d_tiny.py",
    "chars": 2512,
    "preview": "import torch.nn as nn\nimport torchvision.transforms as transforms\n\n# my_tranform =transforms.Compose([\n#     # transform"
  },
  {
    "path": "training/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "training/pytorch_logs/training_20250331_223230.log",
    "chars": 5089,
    "preview": "2025-03-31 22:32:30,594 - c3d_training - INFO - Training set: 23640 samples\n2025-03-31 22:32:30,594 - c3d_training - INFO - Validation set: 5910"
  },
  {
    "path": "training/train_c3d_pytorch.py",
    "chars": 10332,
    "preview": "import os\nimport glob\nimport random\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.optim as optim\nfr"
  },
  {
    "path": "util/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "util/dicom_util.py",
    "chars": 5682,
    "preview": "import os\nimport glob\nimport pydicom\nimport numpy as np\nimport cv2\nfrom tqdm import tqdm\n\nfrom util.seg_util import get_"
  },
  {
    "path": "util/image_util.py",
    "chars": 13519,
    "preview": "from typing import Tuple\nimport cv2\nimport os\nimport numpy\nimport glob\nimport random\nimport numpy as np\nfrom scipy impor"
  },
  {
    "path": "util/mhd_util.py",
    "chars": 5339,
    "preview": "import os\nimport ntpath\nimport SimpleITK\nimport numpy as np\nimport pandas as pd\nimport cv2\n\nfrom data.dataclass.CTData i"
  },
  {
    "path": "util/progress_watch.py",
    "chars": 599,
    "preview": "import datetime\nclass Stopwatch(object):\n    def start(self):\n        self.start_time = Stopwatch.get_time()\n    def get"
  },
  {
    "path": "util/seg_util.py",
    "chars": 2550,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nplt.rcParams['font.family'] = ['SimHei']  # Set the font to SimHei\nplt.rcParams['axe"
  }
]

// ... and 2 more files (download for full content)
