Detection of Bone Fractures

Justin Hickey - jhickey4, Geordie Young - gyoung7, Amelia Zug - azug, Brendan Leahey - bleahey

Intro:

The purpose of this project is to apply deep learning techniques to identify bone fractures in X-ray images. We are reimplementing a paper by Kosrat Dlshad Ahmed and Roojwan Hawezi, which observes that typical X-ray scanners produce fuzzy images of bone structures, putting surgeons at risk of making inaccurate fracture diagnoses. The goal of the paper, and of our project, is to better identify bone fractures and give doctors a second opinion when diagnosing injuries. Key steps include pre-processing, edge detection, feature extraction, and classification. We plan to apply more recent machine learning techniques, such as deep convolutional neural networks, to attain results comparable to the original paper's.

Related Works:

Multiple papers have been written about identifying fractures, using a wide variety of techniques including CNNs, transformers, and others. Some of these papers are: link, link, link. The paper we are primarily following is by Kosrat Dlshad Ahmed and Roojwan Hawezi. X-rays are the primary way physicians identify fractures; however, X-ray images are not always clear, so fractures can be difficult to identify. The goal of the paper was to use various image preprocessing and feature extraction techniques to clarify the images and extract their features, and then use machine learning to determine whether a fracture exists.

Data:

The dataset we are using is a Kaggle dataset containing bone fracture images (link). The dataset comprises two directories, one for training data and one for validation data. Each directory contains two nested directories of fractured and non-fractured .jpg images. Many images in the dataset are rotated or translated copies of one another. The train directory contains around 4,000 fractured and 4,000 non-fractured images; the validation directory contains around 350 fractured and 250 non-fractured images (a minimal loading sketch follows).
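As a loading sketch, torchvision's ImageFolder maps the fractured/non-fractured subdirectories to class labels; the "data/train" and "data/val" paths here are assumptions about where the Kaggle download is unpacked:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# X-rays are effectively grayscale; replicate to 3 channels so the same
# loader also works with pretrained models such as VGG16 (see Methodology).
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumed paths; the Kaggle directory names may differ.
train_data = datasets.ImageFolder("data/train", transform=transform)
val_data = datasets.ImageFolder("data/val", transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
val_loader = DataLoader(val_data, batch_size=32, shuffle=False)
```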

Methodology:

Following the paper, our pipeline passes each X-ray image through 4 main modules: Pre-processing, Edge Detection, Feature Extraction, and Classification. First, we pre-process the image: we translate it from RGB to grayscale, remove noise with a Gaussian filter, and improve contrast with adaptive histogram equalization. Then, we use Canny edge detection to extract the edges from the image (see the pre-processing sketch below).

Next, we extract features from the image using a Gray Level Co-occurrence Matrix (GLCM). We use 5 properties (energy, correlation, dissimilarity, homogeneity, and contrast) at 4 distances (1, 3, 5, and 9) and 7 angles (0, 45, 90, 135, 180, 225, and 270 degrees), giving us a total of 5 × 4 × 7 = 140 features per image (see the GLCM sketch below).

We then perform classification on the image features using an SVM, since it had the best results of the models tested in the paper (see the SVM sketch below).

Building on the SVM and other machine learning techniques used in the paper, we will also test the effectiveness of a deep convolutional neural network. We will perform the same preprocessing, then pass the augmented data through a network that automatically extracts edges and other features and outputs classification probabilities. We will begin by fine-tuning smaller architectures such as VGG16, a simple network with 5 convolutional blocks separated by pooling layers (see the fine-tuning sketch below). As the results of the fine-tuning plateau, we plan to move on to more robust networks with features such as residual connections as we see fit.
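Below is a minimal sketch of the pre-processing and edge-detection steps using OpenCV. The Gaussian kernel size, CLAHE parameters, and Canny thresholds here are illustrative assumptions, not values taken from the paper.

```python
import cv2

def preprocess(path):
    """Grayscale conversion, Gaussian denoising, adaptive histogram
    equalization (CLAHE), and Canny edge detection."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # RGB -> grayscale
    img = cv2.GaussianBlur(img, (5, 5), 0)        # noise removal (assumed kernel)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed params
    img = clahe.apply(img)                        # contrast improvement
    edges = cv2.Canny(img, 100, 200)              # assumed thresholds
    return img, edges
```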
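The GLCM feature extraction could look like the following scikit-image sketch. graycoprops returns one value per (distance, angle) pair, so the 5 properties over 4 distances and 7 angles yield the 140 features described above (angles are given in radians).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

PROPS = ["energy", "correlation", "dissimilarity", "homogeneity", "contrast"]
DISTANCES = [1, 3, 5, 9]
ANGLES = list(np.deg2rad([0, 45, 90, 135, 180, 225, 270]))

def glcm_features(gray_img):
    """Return a 140-dim feature vector (5 props x 4 distances x 7 angles)
    from an 8-bit grayscale image."""
    glcm = graycomatrix(gray_img, distances=DISTANCES, angles=ANGLES,
                        levels=256, normed=True)
    # graycoprops returns a (num_distances, num_angles) array per property
    return np.concatenate([graycoprops(glcm, p).ravel() for p in PROPS])
```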
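For classification, a scikit-learn SVM over the extracted features might look like this sketch (the RBF kernel and feature standardization are our assumptions; the paper's exact hyperparameters may differ):

```python
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_svm(X_train, y_train, X_val, y_val):
    """X_*: (n_samples, 140) GLCM feature matrices; y_*: 0/1 labels."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X_train, y_train)
    print("validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))
    return clf
```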
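For the deep-learning branch, a fine-tuning sketch using torchvision's pretrained VGG16 could look like the following. Freezing the convolutional blocks and replacing only the final classifier layer are standard starting choices on our part, not specifics from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_vgg16(num_classes=2):
    model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
    for param in model.features.parameters():
        param.requires_grad = False  # freeze the 5 convolutional blocks
    model.classifier[6] = nn.Linear(4096, num_classes)  # new 2-way output head
    return model

model = build_vgg16()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()  # operates on the output class logits
```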

Metrics:

We aim to match or improve upon the original paper's results. As a baseline, we would like to attain about 80% raw accuracy, roughly consistent with the original paper. We note that the paper evaluates its models only with raw accuracy, precision, and recall. While this is a good place to start, we believe that additional metrics such as AUROC would better indicate model performance. For example, the dataset we plan to use is more balanced between the fractured and non-fractured classes (the original paper's dataset contained only 60 fractured images and 210 non-fractured images). AUROC is less sensitive to class imbalance, which will give a more accurate assessment of whether our model is effectively classifying both classes. Attaining an AUROC score of 80-90% or more would be a good stretch goal. Further, a confusion matrix or other visualization methods, such as feature maps, can provide interpretable ways of assessing the model's outputs (see the evaluation sketch below).
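As a sketch of the evaluation we have in mind, scikit-learn provides AUROC, the confusion matrix, and per-class precision/recall directly (the predicted probabilities for the fractured class are assumed to come from the SVM or the network):

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

def evaluate(y_true, y_pred, y_prob):
    """y_prob: predicted probability of the 'fractured' class."""
    print("AUROC:", roc_auc_score(y_true, y_prob))
    print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred,
                                target_names=["non-fractured", "fractured"]))
```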

Ethics:

The obvious stakeholders are patients affected by bone injuries caused by various conditions. For these patients, the accuracy of a bone injury diagnosis can be a huge factor in the course of their treatment, as a neglected injury could be a sign of underlying conditions such as osteoporosis or other deficiencies. One concern is the source of these bone images, which is not documented on Kaggle. Biological factors such as health, age, and sex may affect how bone injuries manifest, so the ability of our network to assess injuries effectively may depend on the demographics of the patients in the dataset, demographic information that is difficult to collect due to consent requirements in medical data collection. Areas without consistent access to X-rays may be less able to make use of the network's outputs (https://www.theatlantic.com/health/archive/2016/09/radiology-gap/501803/). Despite the limitations of the data, the availability of a pre-trained bone injury assessment model is a promising development for access to better medical care. In areas where finding a doctor to evaluate a bone injury may be difficult, less specialized practitioners could use an automated system such as this one to get quick feedback on a patient's scan. Admittedly, X-rays are not the most affordable technology. However, the success of technologies like this one represents a trend towards affordable automated assessments, trained on data labeled by institutions with the resources for stronger staffing.

Division of labor:

Geordie has implemented data and image preprocessing. Justin will implement the feature extraction using GLCM and help with the machine learning algorithm implementation. Brendan and Amelia will work on implementing our desired deep-learning techniques on our extracted features.
