{ "cells": [ { "cell_type": "markdown", "id": "5b4defbc-614b-46ba-8f01-43da527c9a96", "metadata": {}, "source": [ "# Image Datasets\n", "\n", "In this notebook we are going to explore how we can load images using PyTorch and transform them into a tensor representation so that they can be used with neural networks." ] }, { "cell_type": "code", "execution_count": null, "id": "a200161e-16a7-48af-8950-9b6218ed840b", "metadata": {}, "outputs": [], "source": [ "import os" ] }, { "cell_type": "markdown", "id": "f7f25d85-ba6c-4d49-bcfe-0f859f13a900", "metadata": {}, "source": [ "The images we are using in this notebook are stored in the directory `data/image_datasets/leaves`. This folder contains images of to different kinds of leaves. For each category there is a subfolder in this directory containing all the images of that category.\n", "\n", "In the following cell we will collect some information about the images." ] }, { "cell_type": "code", "execution_count": null, "id": "ba6de857-5ba0-4055-9115-65071d7d871a", "metadata": {}, "outputs": [], "source": [ "# This is the directroy where our images are stored\n", "image_dir = \"data/image_datasets/leaves\"\n", "\n", "# For each image category there is a separate folder in image the directory\n", "# We will collect some information about the categories here\n", "categories = list(map(lambda x : {\"name\": x, \"dir\": x}, os.listdir(image_dir)))\n", "n_categories = len(categories)\n", "\n", "# In this variable we will store the subdirectories of the image directory that contain the images\n", "# of one category\n", "category_dirs = []\n", "\n", "for i, category in enumerate(categories):\n", " category_dirs.append(os.path.join(image_dir, category[\"dir\"]))\n", "\n", "# Print some information about the images\n", "print('The dataset contains:')\n", "for i, category in enumerate(categories):\n", " print('\\u2022 {}: {:,} images'.format(category[\"name\"], len(os.listdir(category_dirs[i]))))" ] }, { "cell_type": "markdown", "id": "8a52811a-755c-4142-8b04-b9fc32b85a70", "metadata": {}, "source": [ "## Building a dataset with PyTorch\n", "\n", "Now we need to build a dataset with PyTorch's Dataset class. There is a special dataset class named *ImageFolder* in PyTorch that already provides functionality for loading images from a folder. It expects each image category to have its own subfolder like it is the case for the leaf images.\n", "\n", "When loading images we also need to perform some operations on them. Images are usualy stored as JPG or PNG files on our disks. However for training neural networks they must be converted into a bitmap format and then into a PyTorch tensor. Also all images must have the same size, because neural networks usually have a fixed size of neurons and cannot handle images of different shapes. Finally all images need to have the same number format which is usually a 32 bit floating point format. If you are interested in what this format is you can read more about it here\n", "\n", "You can also have a look at the documentation of ImageFolder and transformations:\n", "- ImageFolder datasets:\n", "https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageFolder.html\n", "- Transforming Images:\n", "https://pytorch.org/vision/stable/transforms.html\n", "\n", "Let's start implementing a PyTorch image dataset.\n", "\n", "First we import all the necessary modules." 
] }, { "cell_type": "code", "execution_count": null, "id": "7d822369-4635-4b68-bf0e-3d89a51fd81a", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import matplotlib.pyplot as plt\n", "from torchvision.datasets import ImageFolder\n", "from torchvision.transforms import v2" ] }, { "cell_type": "markdown", "id": "09116cea-c6e5-4207-83f5-470cc63eda6b", "metadata": {}, "source": [ "Next we build a transformation pipeline which resizes the images to a shape of 224x224 pixels which most classification networks are trained with. Then it transforms the images into a 32 bit floating point tensor." ] }, { "cell_type": "code", "execution_count": null, "id": "8411dffa-6026-45aa-abb4-dfc916dff321", "metadata": {}, "outputs": [], "source": [ "# When loading images we can send them through a pipeline that performs some transformations on them\n", "transform = v2.Compose([\n", " v2.Resize(size=(224, 224)),\n", " v2.ToTensor(),\n", " v2.ToDtype(torch.float32),\n", "])" ] }, { "cell_type": "markdown", "id": "dd287e85-641c-4831-afef-1172b60dcae8", "metadata": {}, "source": [ "We can now use the ImageFolder class of PyTorch to build our image dataset. It gets the location of our image folder and the transformation pipeline as an argument. Note that you don't need to define the number and the names of the available categories. PyTorch uses the subdirectories in your image dataset directory to find out these values." ] }, { "cell_type": "code", "execution_count": null, "id": "a6c264a2-d5e6-43ef-b18a-283a9ce98fb0", "metadata": {}, "outputs": [], "source": [ "# Create an instance of the dataset\n", "dataset = ImageFolder(image_dir, transform=transform)" ] }, { "cell_type": "markdown", "id": "fcbbe5a0-9679-4afc-af8d-9954824a6246", "metadata": {}, "source": [ "Now that we have defined the dataset we can explore the first image of it. You can handle the dataset class like an array, but you will get a tuple back. The first value of the tuple is the image, the second value the corresponding label. The labels are the numerical indices of our image categories. As we have two categories in out dataset there are two possible labels: `0` and `1`.\n", "\n", "The image tensor we get from our dataset has three dimensions. The first dimension represents the color channels of our image. Usually images are encoded using the RGB (red, green, blue) schema, so they have three color channels. The second and third dimension represent the width and height of the image.\n", "\n", "When we look at the tensor itself we will see that every pixel of every color channel is represented by a floating point number. As each number has 32 bit (which is equal to 4 bytes) and our tensor has 3\\*224\\*224=150528 numbers each image will consume 3\\*224\\*224\\*4=602 kB of memory when represented as a tensor. This is much more as they need when stored as JPG or PNG files on disk, because these formats compress the images by a lot." ] }, { "cell_type": "code", "execution_count": null, "id": "ca93cd7f-8523-4b84-8715-2689c778484f", "metadata": {}, "outputs": [], "source": [ "# This is how the loaded batch looks now in raw format\n", "image, label = dataset[0]\n", "print(\"Label\", label)\n", "print(\"Image shape\", image.shape)\n", "print(\"Image\", image)" ] }, { "cell_type": "markdown", "id": "5903fdc6-4bd5-432b-a91e-fa7b733883e7", "metadata": {}, "source": [ "We can have a look at the image tensor using matplotlib's imshow function. But we need to reformat the tensor for that. 
First matplotlib expects the color channels to be represented by the third dimension of the tensor, but PyTorch uses the first dimension for the color channels. We can fix that using the `permute` method of PyTorch. Next, matplotlib works with NumPy arrays instead of PyTorch tensors, so we have to convert the tensor to the NumPy format, which can be done using the `numpy()` method of PyTorch tensors. Sometimes we also train models on a GPU, but matplotlib can only access CPU memory. This is why we need to call the `cpu()` method of the PyTorch tensor, which will create a copy of the original tensor in CPU memory.\n", "\n", "Documentation:\n", "- `cpu()` https://pytorch.org/docs/stable/generated/torch.Tensor.cpu.html\n", "- `numpy()` https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html\n", "- `permute()` https://pytorch.org/docs/stable/generated/torch.permute.html#torch.permute" ] }, { "cell_type": "code", "execution_count": null, "id": "75837a40-8d27-467a-979f-e4c24baf97e7", "metadata": {}, "outputs": [], "source": [ "# Let's show the image using matplotlib\n", "plt.imshow(image.permute(1, 2, 0).cpu().numpy())" ] }, { "cell_type": "markdown", "id": "956cd8a6-1dbd-4c6c-a9b4-86ff55e19458", "metadata": {}, "source": [ "## Image transformations\n", "\n", "Neural networks tend to overfit when a dataset contains many similar-looking images. Having too few images is also a big problem when training a network for image classification. This is why image augmentation is used when loading the images. Augmenting images means performing operations like scaling or rotation on the images so that they look slightly different every time they are loaded.\n", "\n", "##### Your task:\n", "\n", "In the following cells you should build a pipeline that performs the following operations on the images:\n", "- Flip the images horizontally with a probability of 50%\n", "- Flip the images vertically with a probability of 50%\n", "- Perform an affine transformation (= scaling + rotation)\n", "- Perform a zoom out operation on the images\n", "- Finally resize them to the same shape of 224x224 pixels\n", "\n", "You can have a look at the illustrations of image transformations here:\n", "\n", "https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_illustrations.html#sphx-glr-auto-examples-transforms-plot-transforms-illustrations-py\n", "\n", "If you get stuck, a possible solution sketch is shown further below." ] }, { "cell_type": "code", "execution_count": null, "id": "ee700b91-d5ec-46dc-863b-87e1f1a97d48", "metadata": {}, "outputs": [], "source": [ "# When loading images we can send them through a pipeline that performs some transformations on them\n", "# TODO implement the missing image augmentations\n", "transform = v2.Compose([\n", "    v2._, # random horizontal flip\n", "    v2._, # random vertical flip\n", "    v2._, # random affine\n", "    v2._, # random zoom out\n", "    v2._, # resize to 224x224 pixels\n", "    v2._, # transform to tensor\n", "    v2._, # transform tensor to dtype torch.float32\n", "])" ] }, { "cell_type": "markdown", "id": "c0f667b0-82d8-4b03-b207-c9b7a406b2ce", "metadata": {}, "source": [ "Now execute the following cell multiple times. You will see that the image looks different every time you execute this cell. The position of the leaf and the zooming factor change every time. This will help prevent the neural network from focusing on only one area of the image."
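, "\n", "If you got stuck with the augmentation pipeline above, here is the possible solution sketch promised in the task. The parameter values (degrees, scale range, probabilities) are illustrative choices, not the only correct ones:\n", "\n", "```python\n", "transform = v2.Compose([\n", "    v2.RandomHorizontalFlip(p=0.5),                 # flip horizontally with 50% probability\n", "    v2.RandomVerticalFlip(p=0.5),                   # flip vertically with 50% probability\n", "    v2.RandomAffine(degrees=30, scale=(0.8, 1.2)),  # random affine (rotation + scaling)\n", "    v2.RandomZoomOut(p=0.5),                        # random zoom out\n", "    v2.Resize(size=(224, 224)),                     # resize to 224x224 pixels\n", "    v2.ToTensor(),                                  # transform to tensor\n", "    v2.ToDtype(torch.float32),                      # transform tensor to dtype torch.float32\n", "])\n", "```"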
] }, { "cell_type": "code", "execution_count": null, "id": "50a8072b-b59d-4dd5-af64-d28a5a2449a8", "metadata": {}, "outputs": [], "source": [ "# TODO Create an instance of the dataset\n", "dataset = _\n", "\n", "# Load the first image and its label\n", "image, label = dataset[0]\n", "\n", "# Show the image using matplotlib\n", "plt.imshow(image.permute(1,2,0).cpu().numpy())" ] }, { "cell_type": "markdown", "id": "55c6e6b7-56d0-4c0b-8177-dd8641451899", "metadata": {}, "source": [ "## Build a data loader\n", "\n", "Now that we have built the dataset we can now build a data loader. A data loader loads multiple images at once and builds a single 4D tensor containing all the images. This tensor is also called a ***batch***. The ***batch size*** defines how many images are loaded at once. We need to split our dataset into batches because when training a neural network we often have not enough memory to load all images at once." ] }, { "cell_type": "code", "execution_count": null, "id": "13552d29-f56c-4230-8e7c-59602a821c0f", "metadata": {}, "outputs": [], "source": [ "from torch.utils.data import DataLoader" ] }, { "cell_type": "code", "execution_count": null, "id": "64727afa-25eb-49e6-b2cf-12280052b0d7", "metadata": {}, "outputs": [], "source": [ "# TODO Create a data loader from our dataset.\n", "dataloader = _" ] }, { "cell_type": "code", "execution_count": null, "id": "3392f177-bce7-4029-86da-2bc1c3f7794b", "metadata": {}, "outputs": [], "source": [ "# TODO Load a batch of images and labels\n", "images, labels = _" ] }, { "cell_type": "markdown", "id": "2c3ad73d-c115-46da-aaaf-333440235fca", "metadata": {}, "source": [ "Let's have a look at the shape of the image tensor. It actually is a 4D tensor where the first dimensions is the batch site, the second dimension are the color channels of the images and the third and fourth dimension are the width and height of the images." ] }, { "cell_type": "code", "execution_count": null, "id": "5bbcfb24-cb86-4c46-ade0-4b08b7546262", "metadata": {}, "outputs": [], "source": [ "# Let's look at the shape of the image tensor\n", "images.shape" ] }, { "cell_type": "markdown", "id": "8e0f0d33-7120-46ff-a7e5-cfb036b1d228", "metadata": {}, "source": [ "Now again let's show both images of the batch that was loaded. You will see that the dataloader sometimes loads images from the first image category and sometimes from the second one. Also both categories can appear in the same batch. Execute the cell below mutiple times to see that the dataloader generates random batches." ] }, { "cell_type": "code", "execution_count": null, "id": "b45b8264-1ebf-4c49-8b75-e09f5aac816b", "metadata": {}, "outputs": [], "source": [ "# TODO Load batch of images and labels\n", "images, labels = _\n", "\n", "fix, ax = plt.subplots(1, 2, figsize=(16,8))\n", "ax[0].imshow(images[0].permute(1,2,0).cpu().numpy())\n", "ax[1].imshow(images[1].permute(1,2,0).cpu().numpy())" ] }, { "cell_type": "code", "execution_count": null, "id": "2f0dfe68-6647-4ec6-a74b-7553e538351a", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }