Dealing with MRI and Deep Learning with Python | by Carla Pitarch Abaigar

[ad_1]

A Comprehensive Guide to MRI Analysis through Deep Learning models in PyTorch

First of all, I’d like to introduce myself. My name is Carla Pitarch, and I am a PhD candidate in AI. My research centers on developing an automated brain tumor grade classification system by extracting information from Magnetic Resonance Images (MRIs) using Deep Learning (DL) models, particularly Convolutional Neural Networks (CNNs).

At the start of my PhD journey, diving into MRI data and DL was a whole new world. The initial steps for running models in this realm were not as straightforward as expected. Despite spending some time researching in this domain, I found a lack of comprehensive repositories guiding the initiation into both MRI and DL. Therefore, I decided to share some of the knowledge I have gained over this period, hoping it makes your journey a tad smoother.

Embarking on Computer Vision (CV) tasks through DL often involves using standard public image datasets such as ImageNet , characterized by 3-channel RGB natural images. PyTorch models are primed for these specifications, expecting input images to be in this format. However, when our image data comes from a distinct domain, like the medical field, diverging in both format and features from these natural image datasets, it presents challenges. This post delves into this issue, emphasizing two crucial preparatory steps before model implementation: aligning the data with the model’s requirements and preparing the model to effectively process our data.

Let’s start with a brief overview of the fundamental aspects of CNNs and MRI.

Convolutional Neural Networks

In this section, we delve into the realm of CNNs, assuming readers have a foundational understanding of DL. CNNs stand as the gold standard architectures in CV, specializing in the processing of 2D and 3D input image data. Our focus within this post will be centered on the processing of 2D image data.

Image classification, associating output classes or labels with input images, is a core task in CNNs. The pioneering LeNet5 architecture introduced by LeCun et al.¹ in 1989 laid the groundwork for CNNs. This architecture can be summarized as follows:

CNN architecture with two convolution layers, two pooling layers, and one fully-connected layer previous to the output layer.

2D CNN architectures operate by receiving image pixels as input, expecting an image to be a tensor with shape Height x Width x Channels. Color images typically consist of 3 channels: red, green and blue (RGB), while grayscale images consist of a single channel.

A fundamental operation in CNNs is convolution, executed by applying a set of filters or kernels across all areas of the input data. The figure below shows an example of how convolution works in a 2D context.

Example of a convolution over a 5×5 image with a 3×3 filter that produces a 3×3 convolved feature.

The process involves sliding the filter across the image to the right and compute the weighted sum to obtain a convolved feature map. The output will represent whether a specific visual pattern, for instance an edge, is recognized at that location in the input image. Following each convolutional layer, an activation function introduces non-linearity. Popular choices include: ReLU (Rectified Linear Unit), Leaky ReLu, Sigmoid, Tanh, and Softmax. For further details on each activation function, this post provides clear explanations Activation Functions in Neural Networks | by SAGAR SHARMA | Towards Data Science.

Different types of layers contribute to the construction of CNNs, each playing a distinct role in defining the network’s functionality. Alongside convolutional layers, several other prominent layers used in CNNs include:

Pooling layers, like max-pooling or average-pooling, efffectively reduce feature map dimensions while preserving essential information.
Dropout layers are used to prevent overfitting by randomly deactivating a fraction of neurons during training, thereby enhancing the network’s generalization ability.
Batch normalization layers focus on standardizing inputs for each layer, which accelerates network training.
Fully connected (FC) layers establish connections between all neurons in one layer and all activations from the preceding layer, integrating learned features to facilitate final classifications.

CNNs learn to identify patterns hierarchically. Initial layers focus on low-level features, progressively moving to highly abstract features in deeper layers. Upon reaching the FC layer, the Softmax activation function estimates class probabilities for the image input.

Beyond LeNet’s inception, notable CNN architectures like AlexNet², GoogLeNet³, VGGNet⁴, ResNet⁵, and more recent Transformer⁶ have significantly contributed to the realm of DL.