Skorch: Hyper-parameter Tuning with PyTorch

7 min read · Apr 3, 2024

An easy step-by-step tutorial on fine-tuning PyTorch convolutional neural network parameters with Skorch.

This article is a detailed tutorial on tuning the hyper-parameters of a Convolutional Neural Network (CNN) with Skorch for an image classification task.

Finding the right values for neural network parameters such as the number of neurons, the kernel size, or the dropout rate is a tedious part of building a CNN model. Hyper-parameter tuning searches these values systematically to find the combination that best meets a given goal, for instance maximizing the accuracy score.

The aim of this article is to provide a step-by-step guide on how to tune the parameters of a CNN model implemented in PyTorch using Skorch. The technique is demonstrated on an image classification task using a locally stored dataset.

PyTorch

PyTorch is an open-source machine learning library developed by Facebook’s AI Research lab (FAIR). It is based on the Torch library and is mostly used to speed up the path from research prototyping to production deployment.

It offers a dynamic computational graph, which makes model experimentation and debugging easy, along with a flexible and intuitive interface. With its seamless integration with Python and strong support for GPU acceleration, PyTorch is widely used for deep learning tasks including computer vision, natural language processing, and reinforcement learning.

Skorch

Skorch combines scikit-learn (sklearn) and PyTorch. Its goal is to make it possible to use PyTorch with sklearn: Skorch lets you use PyTorch models as if they were scikit-learn estimators, making it easier to integrate PyTorch into scikit-learn pipelines and workflows.

skorch documentation (skorch 0.15.0): skorch.readthedocs.io

Skorch provides a high-level interface for training and deploying PyTorch models using familiar scikit-learn APIs. It simplifies the process of integrating PyTorch models into scikit-learn pipelines and workflows, making it easier for users to leverage PyTorch’s capabilities within their existing machine learning projects. However, this tutorial focuses on its application for hyper-parameter tuning.

Pip Installation

python -m pip install -U skorch

This tutorial focuses on image classification, which assigns each image to one of a set of categories. Our dataset contains images of two categories of animals, elephants and zebras, balanced at 500 images per class.

The framework of our implementation is defined as follows:

Data Preparation
Data preprocessing
Model architecture
Skorch model
Model parameters tuning
Best parameters generation

Now, let’s dive into the code.

Data Preparation

Transform your dataset to achieve the following format, where images are separated into train and validation sets within their respective folders, and the filepath column indicates the direct path to each image.
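As an illustration, here is a minimal sketch of how such a CSV file could be generated. It assumes a hypothetical folder layout of the form train/elephant/ and train/zebra/; the helper name build_index, the label column name, and the class-to-index mapping are assumptions made for this example and may differ from the actual dataset.

```python
import csv
from pathlib import Path

# Hypothetical class-to-index mapping; the folder names are assumptions.
CLASS_TO_LABEL = {"elephant": 0, "zebra": 1}


def build_index(split_dir, csv_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write one CSV row (filepath, label) per image found under split_dir/<class>/."""
    split_dir = Path(split_dir)
    rows = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = split_dir / class_name
        if not class_dir.is_dir():
            continue  # skip classes that are absent from this split
        for image_path in sorted(class_dir.iterdir()):
            # Keep only files that look like images
            if image_path.suffix.lower() in extensions:
                rows.append({"filepath": str(image_path), "label": label})

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["filepath", "label"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling build_index("data/train", "train.csv") and build_index("data/val", "val.csv") would then produce one index file per split, where the data/ paths are placeholders.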

Data preprocessing

This step describes the data transformation stage and is composed of three main parts:

CustomDataset(Dataset): loads and preprocesses data from CSV files containing image file paths and corresponding labels, as shown above.
Transform: defines a sequence of transformations applied to the input images: resizing them to a fixed size of (224, 224) pixels, converting them to PyTorch tensors, and normalizing the pixel values with the specified mean and standard deviation.
Data Loading: train_dataset and val_dataset are instances of the CustomDataset class, set up with the training and validation CSV files, root directories, and transformation rules. train_loader and val_loader are PyTorch DataLoader objects that make it easy to iterate through the datasets during model training and validation.

Model Architecture

The architecture of our Convolutional Neural Network (CNN) model is defined as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class SimpleModel(nn.Module):
    def __init__(self, num_classes, last_units=64, conv_kernel_size=5, dropout=0.3):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=conv_kernel_size, stride=1, padding=2)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=conv_kernel_size, stride=1, padding=2)
        # 16 * 56 * 56 assumes 224x224 inputs reduced to 56x56 by the two pooling steps
        self.fc1 = nn.Linear(16 * 56 * 56, last_units)
        self.dropout = nn.Dropout(p=dropout)
        self.fc2 = nn.Linear(last_units, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = x.view(-1, 16 * 56 * 56)  # flatten the feature maps
        x = self.dropout(x)
        x = F.sigmoid(self.fc1(x))
        x = self.fc2(x)  # raw logits; CrossEntropyLoss applies log-softmax itself
        return x

model = SimpleModel(num_classes=2, last_units=64, conv_kernel_size=5, dropout=0.1)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

This code defines a simple convolutional neural network (CNN) model named SimpleModel using PyTorch's nn.Module class. The model comprises two convolutional layers, each followed by max pooling for downsampling, and two fully connected layers that produce the class scores. Here it is instantiated with 2 output classes (representing "elephant" and "zebra"), 64 neurons in the first fully connected layer, a kernel size of 5 for the convolutional layers, and a dropout rate of 0.1 for regularization. Lastly, training uses the CrossEntropyLoss loss function and an Adam optimizer with a learning rate of 0.01.
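The hard-coded 16 * 56 * 56 input size of fc1 depends on the image size, the kernel size, and the padding. The sketch below reproduces that arithmetic with the standard output-size formulas for convolution and max pooling; the helper names are hypothetical and the code only does integer arithmetic, it does not run the model.

```python
def conv_out(size, kernel, padding, stride=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of max pooling (floor mode, no padding)."""
    return (size - kernel) // stride + 1


def flattened_features(input_size=224, kernel=5, padding=2, channels=16):
    """Number of features reaching fc1 for a square input image."""
    size = input_size
    for _ in range(2):  # two conv + pool stages, as in SimpleModel
        size = pool_out(conv_out(size, kernel, padding))
    return channels * size * size


for k in (2, 3, 4, 5):
    print(k, flattened_features(kernel=k))
```

With padding fixed at 2, this arithmetic gives 16 * 56 * 56 = 50176 features for kernel sizes 4 and 5, but 16 * 57 * 57 for kernel size 3 and 16 * 58 * 58 for kernel size 2. If that holds for your setup, those two kernel sizes would likely raise a shape error in the forward pass. One possible remedy, under the same formulas, is to set the padding to kernel // 2, which yields 56 x 56 feature maps for kernel sizes 2 through 5.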

Model Performance Without Hyper-parameter Tuning

# Model training
def train_model(model, train_loader, criterion, optimizer, num_epochs=5):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            images = images.to(device)  
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * images.size(0)
        
        epoch_loss = running_loss / len(train_loader.dataset)
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}")

train_model(model, train_loader, criterion, optimizer, num_epochs=10)

# Model evaluation
def evaluate_model(model, val_loader):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    val_accuracy = correct / total
    print(f"Validation Accuracy: {val_accuracy:.4f}")


evaluate_model(model, val_loader)

This model was trained on our dataset for 10 epochs. The training loss was stable at around 0.70, and during evaluation the model reached an accuracy score of 0.50, which is chance level for a balanced two-class dataset. By tuning this model's parameters, we could potentially achieve more satisfactory results.
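A quick way to put the loss value in context is to compare it with the cross-entropy of a classifier that assigns equal probability to every class. The helper below is a hypothetical illustration of that calculation, not part of the training code.

```python
import math


def chance_level_loss(num_classes):
    """Cross-entropy of a classifier that gives every class the same probability."""
    return -math.log(1.0 / num_classes)


print(f"{chance_level_loss(2):.4f}")  # 0.6931
```

A loss that stays near 0.69 together with an accuracy of 0.50 is what one would expect from a two-class model that has not yet learned to separate the classes.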

Hyper-parameter Tuning with Skorch

Skorch is a Python library that facilitates seamless integration between PyTorch and scikit-learn, enabling the use of PyTorch models within scikit-learn’s ecosystem. One of its key features is the ability to perform hyperparameter tuning using scikit-learn’s GridSearchCV.

Here’s a step-by-step guide on how to use Skorch for hyperparameter tuning with GridSearchCV on a PyTorch CNN model.

Step 1 : Wrap the PyTorch model with Skorch

This step involves wrapping the previously defined PyTorch model (SimpleModel) with Skorch using NeuralNetClassifier. NeuralNetClassifier wraps the PyTorch module while providing an interface that should be familiar to sklearn users.

from skorch import NeuralNetClassifier

num_classes = 2  # elephant and zebra

net = NeuralNetClassifier(
    module=SimpleModel,
    module__num_classes=num_classes,
    criterion=nn.CrossEntropyLoss,
    optimizer=optim.Adam,
    optimizer__lr=0.001,
    batch_size=32,
    max_epochs=10
)

This code is a concise way to define a neural network classifier with Skorch. Training starts once its fit method is called.

Step 2: Defining the Hyper-parameters

For this animal classification task, we are tuning the following parameters:

Dropout rate: specifies the dropout rate used by the neural network model to prevent overfitting.
Kernel size: defines the kernel size of the convolutional layers in our neural network model.
Last_units: specifies the number of units (neurons) in fc1, the fully connected layer that feeds the output layer of our neural network model.

params = {
    'module__dropout': [0.1, 0.4, 0.6],
    'module__conv_kernel_size': [2, 3, 4],
    'module__last_units': [64, 128]
}
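The module__ prefix tells Skorch to forward each value to the matching argument of SimpleModel's constructor, for example module__dropout becomes dropout. To get a feel for the cost of the search, the grid can be enumerated with scikit-learn's ParameterGrid. The sketch below repeats the grid so that it runs on its own; the fold count of 3 is the cv value used in the grid search of the next step.

```python
from sklearn.model_selection import ParameterGrid

params = {
    'module__dropout': [0.1, 0.4, 0.6],
    'module__conv_kernel_size': [2, 3, 4],
    'module__last_units': [64, 128],
}

grid = ParameterGrid(params)
n_candidates = len(grid)   # 3 * 3 * 2 combinations
n_folds = 3                # cv value passed to GridSearchCV

print(n_candidates, n_candidates * n_folds)  # 18 54

# Each candidate is a plain dict of keyword arguments
first_candidate = list(grid)[0]
print(first_candidate)
```

Each of the 54 fits trains a network for max_epochs epochs, so the grid size directly drives the total training time.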

Step 3: Performing grid search

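from sklearn.model_selection import GridSearchCV

# Perform grid search with cross-validation
gs = GridSearchCV(net, params, cv=3, scoring='accuracy', verbose=1)

# Train the model (X_train: float32 image array, y_train: int64 label array)
gs.fit(X_train, y_train)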

This code performs a grid search with 3-fold cross-validation to tune the hyper-parameters of the Skorch neural network model (net). It searches over the specified parameter grid (params) and evaluates each candidate using accuracy as the scoring metric. The best model and its corresponding hyperparameters are determined from the results of the grid search. The output of this training session is displayed below:

This image reports, for each epoch during the cross-validation, the training and validation loss, the epoch number, the accuracy score achieved on the validation set, and the duration of the epoch.
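The gs.fit call in Step 3 expects X_train and y_train as arrays rather than a DataLoader. One possible way to obtain them is to stack the samples of a PyTorch Dataset into NumPy arrays. The helper name dataset_to_arrays is hypothetical, and the demo uses random tensors in place of the image dataset.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def dataset_to_arrays(dataset, batch_size=64):
    """Stack every (image, label) pair of a Dataset into two NumPy arrays."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    images, labels = [], []
    for batch_images, batch_labels in loader:
        images.append(batch_images.numpy())
        labels.append(batch_labels.numpy())
    X = np.concatenate(images).astype(np.float32)  # features as float32
    y = np.concatenate(labels).astype(np.int64)    # class indices as int64
    return X, y


# Demo on random tensors standing in for the real image dataset
demo = TensorDataset(torch.randn(10, 3, 8, 8), torch.randint(0, 2, (10,)))
X_demo, y_demo = dataset_to_arrays(demo)
print(X_demo.shape, y_demo.shape)  # (10, 3, 8, 8) (10,)
```

This approach holds the whole dataset in memory. A 3 x 224 x 224 float32 image takes about 0.6 MB, which is manageable for a dataset of this size but may not scale to much larger ones.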

Step 4: Get the best model and its parameters

print("Best parameters found: ", gs.best_params_)
print("Best accuracy found: ", gs.best_score_)

This code gives access to the best parameter set found for our model on the image dataset, along with the best accuracy score.

On our dataset, we obtain:

Best parameters found: {'module__conv_kernel_size': 4, 'module__dropout': 0.4, 'module__last_units': 128}
Best accuracy found: 0.9699858259881915
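Beyond the single best configuration, it can be useful to look at how every candidate scored, including any that failed and were recorded as NaN. The sketch below reads the cv_results_ attribute that any fitted GridSearchCV exposes. To stay self-contained and fast, the demo fits a scikit-learn LogisticRegression on synthetic data as a stand-in for the Skorch model; summarize_search is a hypothetical helper name.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def summarize_search(search, top=5):
    """Return the best-ranked candidates of a fitted search as a DataFrame."""
    results = pd.DataFrame(search.cv_results_)
    columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
    return results[columns].sort_values("rank_test_score").head(top)


# Stand-in search on synthetic data (not the image dataset of this article)
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
demo_search = GridSearchCV(
    LogisticRegression(max_iter=500),
    {"C": [0.1, 1.0, 10.0]},
    cv=3,
    scoring="accuracy",
)
demo_search.fit(X, y)

summary = summarize_search(demo_search)
print(summary)
```

The same helper could be applied to the gs object from Step 3, since it only relies on cv_results_.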

These results highlight the effectiveness of hyper-parameter tuning with Skorch on the performance of a PyTorch CNN model.

In conclusion, this article explored the use of Skorch, a library bridging PyTorch and scikit-learn, to streamline hyper-parameter tuning of PyTorch models. By tuning model parameters systematically, one can improve a model's performance considerably over an untuned configuration. The implementation of Skorch on this animal image classification task is available on my github account within this repository.

Happy Learning Journey :) !