A picture is worth a thousand words. Perhaps this is why the number of images stored on our devices has increased dramatically. Many of us also tend to reply with an image or emoticon instead of a long text message, a habit that has helped make services like Snapchat, Instagram and TikTok some of the most popular platforms on the internet.
Organizing these images on our devices can be tedious, which is why cloud storage platforms from Google and Apple offer automatic image sorting. These features are driven by advances in computer vision, particularly in machine learning (ML) methods. In this tutorial, we will create a simple image classifier and run it on a local machine with Python installed. To make it more interesting, we will try to tell jalebis from samosas!
The starting point for developing any ML model is well-processed data. The data for this tutorial can be downloaded from the link here. Once the data is on your computer, extract the archive so that its folder structure is preserved. You should see the following content in the folder.
Browse the data folder and you should see two subfolders named after the foods we are trying to classify. The folder also contains the Python code for this tutorial as an interactive Python notebook (.ipynb), which you can modify for later projects. The images of jalebis and samosas in these subfolders were crawled from a website that releases them under a Creative Commons license, which allows us to work with them.
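The extraction step can also be done from Python. The sketch below assumes the download is a ZIP file (the format is not stated) and uses the standard-library `zipfile` module, whose `extractall` recreates the sub-folders stored in the archive. The function name `extract_archive` and the archive name in the comment are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a ZIP archive into dest_dir, keeping its internal folder structure.

    Returns the sorted list of member names stored in the archive.
    """
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)  # creates every sub-folder recorded in the archive
        return sorted(zf.namelist())


# Hypothetical file name; replace it with the archive you actually downloaded.
# extract_archive("jalebi_samosa.zip", Path("."))
```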
Creating a dataset
Now that the data is downloaded, we can start writing the Python code for the image classifier. The recommended environment for this project is JupyterLab, but more advanced users can use any other integrated development environment (IDE). Once you are in your preferred Python environment, load the libraries needed to read and visualize the images, and navigate to the folder where you extracted the archive.
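A minimal setup sketch follows. It assumes the extracted archive sits in a folder called `data` with subfolders named `jalebi` and `samosa` containing `.jpg` files. These names are assumptions, so adjust them to what you actually see. Rather than changing the working directory, it builds paths relative to `DATA_DIR`. The helper `list_images` is hypothetical.

```python
from pathlib import Path

import matplotlib.pyplot as plt  # reading and displaying images (used in later steps)
import numpy as np               # numerical arrays (used in later steps)

# Assumed location of the extracted archive; change this to your own path.
DATA_DIR = Path("data")


def list_images(data_dir, class_name, pattern="*.jpg"):
    """Return the sorted image paths in data_dir/class_name, or [] if that folder is missing."""
    folder = Path(data_dir) / class_name
    if not folder.is_dir():
        return []
    return sorted(folder.glob(pattern))


# Assumed subfolder names, one per food class.
for name in ("jalebi", "samosa"):
    print(f"{name}: {len(list_images(DATA_DIR, name))} images")
```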
Note that JPG images are stored as arrays of numbers, where each number represents the intensity of a given pixel. Typically, an intensity of 0 corresponds to black and 255 to white, so the 256 values from 0 to 255 span the shades of gray between them. Color images have three channels, which hold the red, green and blue (RGB) content of the image.
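To make the array view concrete, here is a tiny sketch built with NumPy rather than from a real image file. It shows 8-bit intensities for a grayscale image and the extra channel axis of a color image. The pixel values are made up for illustration.

```python
import numpy as np

# A tiny 1 x 4 grayscale "image": 8-bit intensities running from black (0) to white (255).
gray = np.array([[0, 85, 170, 255]], dtype=np.uint8)
print(gray.shape, gray.dtype)                          # (1, 4) uint8
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255

# A 1 x 2 color image has a third axis holding one value per R, G, B channel.
rgb = np.array([[[255, 0, 0],         # pure red pixel
                 [255, 255, 255]]],   # white pixel: every channel at its maximum
               dtype=np.uint8)
print(rgb.shape)  # (1, 2, 3) -> height, width, channels
```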
Using the image reading and display functions in the Matplotlib library, we can visualize the individual color channels of an image.
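One way to do this is sketched below. It shows the full image next to each channel drawn in grayscale. So that it runs without the dataset, it uses a synthetic red/blue image. With real data you would load a file with `plt.imread` instead. The path in the comment is hypothetical, and the fixed 0 to 255 display range assumes 8-bit JPGs (PNG files are read as floats in 0 to 1).

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_channels(img):
    """Show an RGB image next to its red, green and blue channels."""
    fig, axes = plt.subplots(1, 4, figsize=(12, 3))
    axes[0].imshow(img)
    axes[0].set_title("RGB")
    for i, (ax, name) in enumerate(zip(axes[1:], ("Red", "Green", "Blue"))):
        # Draw a single channel as grayscale on a fixed 8-bit scale.
        ax.imshow(img[:, :, i], cmap="gray", vmin=0, vmax=255)
        ax.set_title(name)
    for ax in axes:
        ax.axis("off")
    return fig


# Stand-in image; with the real data use e.g. img = plt.imread("data/jalebi/<file>.jpg")
img = np.zeros((32, 32, 3), dtype=np.uint8)
img[:, :16, 0] = 255  # left half pure red
img[:, 16:, 2] = 255  # right half pure blue
fig = plot_channels(img)
```

In JupyterLab the figure is displayed automatically at the end of the cell. In a plain script, call `plt.show()`.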
To describe the images to the algorithm in a uniform way, we summarize each one by its RGB content. The dataset we create resizes every image to a fixed size and then computes the average red, green and blue intensity of each image, so each image is represented by only three features.
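A sketch of this feature extraction follows. It uses Pillow (installed alongside Matplotlib) to resize each image and NumPy to average the channels. The 64 x 64 target size is an assumption, since the tutorial does not state one. The helper names `mean_rgb` and `build_features` are hypothetical.

```python
from pathlib import Path

import numpy as np
from PIL import Image

IMG_SIZE = (64, 64)  # assumed fixed size; the tutorial does not specify one


def mean_rgb(path, size=IMG_SIZE):
    """Resize one image and return its average R, G and B intensities as a length-3 array."""
    with Image.open(path) as im:
        arr = np.asarray(im.convert("RGB").resize(size), dtype=np.float64)
    return arr.reshape(-1, 3).mean(axis=0)  # average over all pixels, per channel


def build_features(folder, pattern="*.jpg"):
    """Stack the mean-RGB vector of every matching image in a folder into an (n, 3) array."""
    paths = sorted(Path(folder).glob(pattern))
    return np.array([mean_rgb(p) for p in paths]).reshape(-1, 3)


# Assumed folder layout from the setup step:
# X_jalebi = build_features("data/jalebi")
# X_samosa = build_features("data/samosa")
```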
Because ML algorithms work with numbers rather than names, we represent the labels numerically. For instance, a jalebi can be 0 and a samosa can be 1.
The final step in creating the dataset is to stack the two classes into a single feature array, with one row per image, and a matching array of labels. This is the input format expected by Scikit-Learn, the ML framework we use in the next step.
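The labeling and stacking steps can be sketched as below. It follows the convention from the text (jalebi as 0, samosa as 1). The helper `stack_classes` is hypothetical, and the feature rows are made up purely to show the shapes.

```python
import numpy as np


def stack_classes(X_jalebi, X_samosa):
    """Combine two (n_i, 3) feature arrays into one array X and a matching label vector y.

    Label convention from the text: jalebi -> 0, samosa -> 1.
    """
    X = np.vstack([X_jalebi, X_samosa])
    y = np.concatenate([np.zeros(len(X_jalebi), dtype=int),
                        np.ones(len(X_samosa), dtype=int)])
    return X, y


# Made-up mean-RGB rows, only to illustrate the shapes.
X, y = stack_classes(np.array([[200.0, 120.0, 40.0], [190.0, 110.0, 50.0]]),
                     np.array([[170.0, 140.0, 90.0]]))
print(X.shape, y)  # (3, 3) [0 0 1]
```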
Now we can visualize all the jalebis and samosas in the average-RGB feature space. Each marker in the chart below represents one image, positioned by its average R, G and B content.
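A chart of this kind can be produced with a 3-D scatter plot in Matplotlib, as sketched below. The random stand-in features only exercise the plot. With the real data, pass the stacked `X` and `y` from the previous step. The function name `plot_rgb_space` is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_rgb_space(X, y, names=("jalebi", "samosa")):
    """Scatter each image's mean (R, G, B) in 3-D, with one color per class label."""
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(projection="3d")
    for label, name in enumerate(names):
        pts = X[y == label]  # rows belonging to this class
        ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], label=name)
    ax.set_xlabel("mean R")
    ax.set_ylabel("mean G")
    ax.set_zlabel("mean B")
    ax.legend()
    return fig, ax


# Random stand-in features, not the tutorial's data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(20, 3))
y = np.repeat([0, 1], 10)
fig, ax = plot_rgb_space(X, y)
```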
Image classifier training
Now that our dataset is ready, we can train a simple image classifier to automate the detection of jalebis and samosas. We will use the Scikit-Learn Python package, which has a comprehensive set of tools and algorithms that can be used for ML.
The first step in training an ML algorithm is to split the dataset into training and test sets. As the names suggest, the training set is used to train the algorithm and the test set is used to evaluate its performance. Because the test images are never seen during training, performance on the test set is a better indication of how the algorithm will perform on new, real-world data. We use 75% of the data for training and the remaining 25% for testing.
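In Scikit-Learn, this split is usually done with `train_test_split`, as sketched below. The stand-in data mimics the layout of the real dataset (an (n, 3) feature array plus 0/1 labels). The `random_state` value is arbitrary. Passing `stratify=y` is an optional choice that keeps the jalebi/samosa ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same layout as the real dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(100, 3))  # mean-RGB features, one row per image
y = np.repeat([0, 1], 50)               # 0 = jalebi, 1 = samosa

# Hold out 25% of the images for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)  # (75, 3) (25, 3)
```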
The next step is to import the Scikit-Learn modules needed to set up our classifier. We will use one of the simplest classifiers, logistic regression. Logistic regression is a linear classifier: it can only separate the two classes with a flat decision boundary (a plane in our three-dimensional RGB feature space), which can be restrictive. Linear models are nevertheless often quite useful, and in this tutorial we will limit ourselves to them. We initialize the logistic regression classifier, train it on the training set, and evaluate it on the test set.
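A minimal sketch of this step with Scikit-Learn's `LogisticRegression` is shown below. The synthetic features are a stand-in, and the "jalebi-like" and "samosa-like" color means are invented values, not measurements from the dataset. `max_iter=1000` raises the solver's iteration limit because the raw 0 to 255 features are unscaled.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in features (invented color means, not the tutorial's data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([200, 130, 60], 15, size=(50, 3)),    # class 0, "jalebi-like"
               rng.normal([170, 150, 110], 15, size=(50, 3))])  # class 1, "samosa-like"
y = np.repeat([0, 1], 50)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)  # extra iterations since features are unscaled
clf.fit(X_train, y_train)                # learn one weight per RGB feature plus a bias
print("training accuracy:", clf.score(X_train, y_train))
print("test accuracy:", clf.score(X_test, y_test))
```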
For this particular random split, we get a training accuracy of about 70% and a test accuracy of 77%. Note that these numbers can vary quite a bit, mainly because of the randomness in how the data is split into training and test sets.
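To see this variation directly, the sketch below repeats the split with several seeds and summarizes the test accuracies. The helper `split_accuracies` and the noisy stand-in data are hypothetical. Scikit-Learn's `cross_val_score` offers a more systematic version of the same idea.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def split_accuracies(X, y, seeds=range(10), test_size=0.25):
    """Test accuracy of a freshly trained logistic regression for each random split."""
    scores = []
    for seed in seeds:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return np.array(scores)


# Noisy, overlapping stand-in classes (invented values).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([200, 130, 60], 40, size=(40, 3)),
               rng.normal([170, 150, 110], 40, size=(40, 3))])
y = np.repeat([0, 1], 40)

scores = split_accuracies(X, y)
print(f"test accuracy: mean {scores.mean():.2f}, "
      f"min {scores.min():.2f}, max {scores.max():.2f}")
```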
Once the classifier is trained, we can use it to make predictions on new data. For each image, a logistic regression classifier outputs a score that reflects how likely the image is to belong to each class. This score can be interpreted as a probability.
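In Scikit-Learn, these scores come from `predict_proba`, while `predict` returns the most likely label. The sketch trains on invented stand-in features and then scores one made-up "new image". For a real image, you would first compute its mean RGB in the same way as the training features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in training data (invented values; the real features come from the images).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([200, 130, 60], 15, size=(50, 3)),
               rng.normal([170, 150, 110], 15, size=(50, 3))])
y = np.repeat([0, 1], 50)
clf = LogisticRegression(max_iter=1000).fit(X, y)

new_image = np.array([[195.0, 128.0, 65.0]])  # mean R, G, B of a made-up new image
proba = clf.predict_proba(new_image)[0]       # probabilities ordered as clf.classes_ ([0, 1])
label = int(clf.predict(new_image)[0])        # class with the highest probability
names = {0: "jalebi", 1: "samosa"}
print(f"P(jalebi)={proba[0]:.2f}, P(samosa)={proba[1]:.2f} -> {names[label]}")
```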
In this tutorial, we explored how to create a simple image classifier that can distinguish jalebis from samosas. This template can serve as the basis for more sophisticated classifiers. Simple extensions of this project include adding other food categories or trying more complex algorithms. For the latter, Scikit-Learn's well-documented algorithms and examples are a good starting point.
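As an example of the second extension, the sketch below compares logistic regression with two non-linear Scikit-Learn classifiers, an RBF-kernel support vector machine and a random forest, using 5-fold cross-validation. The choice of models, their settings and the stand-in data are illustrative assumptions, not recommendations from the original tutorial.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in mean-RGB features and labels (invented values).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([200, 130, 60], 25, size=(50, 3)),
               rng.normal([170, 150, 110], 25, size=(50, 3))])
y = np.repeat([0, 1], 50)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    # SVMs are sensitive to feature scale, so standardize the features first.
    "RBF-kernel SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.2f} ± {scores.std():.2f}")
```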