AN INNOVATIVE HYBRID MODEL FOR ELBOW BONE FRACTURE DETECTION: INTEGRATING VIT AND CNN
Abstract
Elbow bone fractures are difficult to detect, and a misdiagnosed fracture may be treated improperly, resulting in long-term complications. Precise classification and fast detection of bone fractures are therefore crucial for efficient clinical diagnosis. Existing approaches, such as manual assessment and models based on Convolutional Neural Networks (CNNs), struggle to detect subtle fracture patterns; although CNNs have achieved great progress, they still fail to classify subtle fracture subtypes accurately, which creates a demand for more dependable diagnostic aids. To address these limitations, this paper introduces a hybrid Vision Transformer and Convolutional Neural Network (ViT-CNN) model that combines the feature extraction capabilities of CNNs with the attention mechanisms of ViTs. By drawing on the strengths of both architectures, the hybrid model improves the accuracy and reliability of diagnosis, and it outperforms traditional CNN-based approaches in accuracy, sensitivity, and specificity. Because it focuses on subtle fracture patterns, the model is a useful resource for increasing fracture detection accuracy and for assisting clinicians in reaching accurate diagnoses. The overall aims were to assess how well this hybrid approach identifies elbow bone fractures and whether it has scope to further improve clinicians' diagnostic accuracy. The results indicate that the hybrid ViT-CNN model has the potential to make a substantial positive impact on bone fracture detection and, subsequently, on patient outcomes.
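The three evaluation measures named above can be made concrete with a minimal sketch. It assumes a binary framing (1 = fracture, 0 = no fracture), which is an assumption rather than the paper's stated evaluation protocol; the function names and the toy labels are hypothetical and are not results from this study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = fracture, 0 = no fracture)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def diagnostic_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on fractures) and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Fraction of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of true fractures that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of non-fracture cases correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels for illustration only (hypothetical, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity and specificity are reported alongside accuracy because a missed fracture (false negative) and a false alarm (false positive) carry different clinical costs, and accuracy alone can hide either when the classes are imbalanced.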