Research Objective
To develop a deep learning-based framework that detects, localizes, and tracks surgical tools in cataract surgery videos, supporting surgical workflow analysis, report generation, and real-time decision support.
Research Findings
The proposed deep learning framework effectively detects and localizes surgical tools in cataract surgery videos: the tool counter reaches 84% accuracy and the classification CNN 82% mean AUC. These results can enhance surgical workflow analysis and support real-time decision-making. Future work should focus on online evaluation in operating rooms and on extending the approach to surgical phase prediction for smart, context-aware environments.
Research Limitations
The framework is tested offline on prerecorded videos; online integration in operating rooms has not yet been implemented. Class imbalance in the dataset requires extensive balancing and augmentation, which may not generalize to all surgical scenarios. Some tools appear only in specific videos, limiting the diversity of the training set. The method relies on pretrained models, which may not fully capture domain-specific features.
1. Experimental Design and Method Selection
The methodology uses convolutional neural networks (CNNs) for multi-label, multi-class classification to detect and localize surgical tools. A two-stage framework is proposed: Stage I uses a tool counter network (ResNet-18) to predict the number of tools present and localize them via activation maps; Stage II uses a CNN to classify tool types from the selected regions (glimpses). Baseline models (AlexNet, VGGNet, ResNet variants) are pretrained on ImageNet and fine-tuned. The loss function is MultiLabelSoftMarginLoss, and training uses stochastic gradient descent (SGD) with momentum and weight decay, as sketched below.
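A minimal PyTorch sketch of this two-stage setup (PyTorch/torchvision is assumed, since MultiLabelSoftMarginLoss is a PyTorch loss); NUM_TOOLS, MAX_TOOLS, and the optimizer hyperparameters are illustrative placeholders, not values from the paper:

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

NUM_TOOLS = 21   # assumed number of tool classes
MAX_TOOLS = 3    # assumed maximum number of tools per frame

# Stage I: tool counter -- ImageNet-pretrained ResNet-18 whose final layer
# predicts how many tools are in the frame (0..MAX_TOOLS); its activation
# maps drive localization.
counter = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
counter.fc = nn.Linear(counter.fc.in_features, MAX_TOOLS + 1)

# Stage II: multi-label tool classifier applied to the selected glimpses;
# a ResNet-18 backbone is assumed here as well.
classifier = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
classifier.fc = nn.Linear(classifier.fc.in_features, NUM_TOOLS)

# Multi-label loss and SGD with momentum and weight decay, as described.
criterion = nn.MultiLabelSoftMarginLoss()
optimizer = optim.SGD(classifier.parameters(),
                      lr=0.01, momentum=0.9, weight_decay=1e-4)
```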
2. Sample Selection and Data Sources
The dataset, from the Cataracts Grand Challenge, consists of 50 cataract surgery videos annotated for tool presence at 1 fps. The videos are split into training (25 videos), validation, and test sets so that tool representation is balanced across splits. Frames are resized to 3x224x224 tensors.
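A minimal sketch of this frame preprocessing, assuming torchvision transforms; the ImageNet normalization statistics are an assumption motivated by the ImageNet-pretrained backbones and are not stated in the source:

```python
# Convert each video frame (a PIL image) into the 3x224x224 tensor the
# networks expect.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # resize frame to 224x224 pixels
    transforms.ToTensor(),           # HWC uint8 -> CHW float32 in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats
                         std=[0.229, 0.224, 0.225]),   # (assumed)
])
# preprocess(frame) -> torch.FloatTensor of shape (3, 224, 224)
```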
3. List of Experimental Equipment and Materials
No specific physical equipment is mentioned; the work is purely computational, relying on datasets (e.g., ImageNet, the surgical videos) and the models described above.
4. Experimental Procedures and Operational Workflow
Data augmentation (horizontal/vertical flips, temporal interpolation to 30 fps) and class balancing (weighting classes inversely proportional to their frequency) are applied, as sketched below. Training runs for 100 epochs, with learning-rate adjustments driven by the loss. Performance is evaluated using the area under the ROC curve (AUC).
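A hedged sketch of the augmentation and class-balancing steps: random flips, plus inverse-frequency class weights that can be passed to the multi-label loss. `label_counts` is a hypothetical array of per-tool positive-frame counts; fps interpolation is omitted as a video-level preprocessing step:

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms

# Augmentation: random horizontal and vertical flips.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
])

def inverse_frequency_weights(label_counts):
    """Per-class weights inversely proportional to class frequency,
    normalized so that the mean weight is 1."""
    counts = np.asarray(label_counts, dtype=np.float64)
    weights = counts.sum() / np.maximum(counts, 1.0)  # guard against zeros
    return torch.tensor(weights / weights.mean(), dtype=torch.float32)

# The weights plug directly into the multi-label loss, e.g.:
# criterion = nn.MultiLabelSoftMarginLoss(
#     weight=inverse_frequency_weights(label_counts))
```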
5. Data Analysis Methods
The primary performance metric is the mean AUC across tools. The analysis compares baseline and proposed-method results; tools absent from a test set are excluded from the average, as in the sketch below.
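A sketch of the metric, assuming scikit-learn; `y_true` and `y_score` are hypothetical (frames x tools) arrays of binary labels and predicted scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc(y_true, y_score):
    """Mean per-tool ROC AUC, skipping tools whose AUC is undefined on
    the test set (no positive or no negative frames), matching the rule
    that tools absent from a test set are ignored in the average."""
    aucs = []
    for t in range(y_true.shape[1]):
        labels = y_true[:, t]
        if labels.min() == labels.max():  # tool absent (or always present)
            continue
        aucs.append(roc_auc_score(labels, y_score[:, t]))
    return float(np.mean(aucs))
```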