
Fine-Tuning an Image Classification Model in Seconds

Or how you can use a state-of-the-art model in less time than it takes to make coffee

5 min read · Apr 21, 2025


Every once in a while, I’m still amazed by how fast technology is moving.

Just a few years ago, training your own computer vision model felt like something reserved for tech giants or very persistent PhD students. You needed expensive GPUs, hours of tweaking, and often ended up with a model that could barely recognize a cat from a toaster. But not anymore.

Recently, I started diving into the FastAI course on deep learning (which I highly recommend, by the way), and the first lecture alone blew my mind. In a matter of seconds — literally seconds — I had a high-performing image classifier fine-tuned and ready to go.

So, I decided to test it out myself with a little hands-on project: identifying musical instruments from images. Guitar, bass, piano. Simple idea, but a nice playground for learning.

If you want to run the code, you can check the notebook adapted from FastAI lectures here.

The first step was gathering a dataset. Thanks to FastAI’s built-in tools, I didn’t even need to leave the notebook.

We start by loading the appropriate libraries and defining a search_images function that leverages DuckDuckGo search:

from duckduckgo_search import DDGS
from fastcore.all import *
from fastai.vision.all import *
from fastdownload import download_url
import time

def search_images(keywords, max_images=100):
    # Query DuckDuckGo and return the image URLs as a fastcore list
    return L(DDGS().images(keywords, max_results=max_images)).itemgot('image')
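As a quick sanity check (this snippet follows the pattern used in the FastAI lecture notebook rather than code from the article itself), we can fetch a single result and preview it; download_url comes from the fastdownload import above:

# Grab one guitar image URL and download it for a quick look
urls = search_images('guitar photo', max_images=1)
print(urls[0])  # the raw image URL returned by DuckDuckGo
download_url(urls[0], 'sample_guitar.jpg', show_progress=False)
Image.open('sample_guitar.jpg').to_thumb(256, 256)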

To download images for each class, we list the search terms in the searches variable below and loop over them:

searches = 'guitar','bass guitar','piano'
path = Path('instruments')

for o in searches:
    dest = (path/o)
    dest.mkdir(exist_ok=True, parents=True)
    download_images(dest, urls=search_images(f'{o} photo'))
    time.sleep(6)  # pause between searches to avoid rate limiting
    resize_images(path/o, max_size=400, dest=path/o)

Some image downloads may fail, so FastAI recommends that we verify the files and remove any broken ones:

failed = verify_images(get_image_files(path))
failed.map(Path.unlink)  # delete every image that failed verification

Next, we’ll prepare the data with FastAI’s DataBlock API, which describes how to collect the images, split them into training and validation sets, and label them.

Once again, we’ll use FastAI, this time to create DataLoaders that feed the model batches of 32 images for fine-tuning.

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, targets are categories
    get_items=get_image_files,                        # collect all image files under path
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # hold out 20% for validation
    get_y=parent_label,                               # label = name of the parent folder
    item_tfms=[Resize(192, method='squish')]          # resize every image to 192x192
).dataloaders(path, bs=32)
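Before visualizing anything, a small check of my own (not from the original notebook) is to confirm the labels FastAI inferred from the folder names, along with the size of the split:

print(dls.vocab)  # class labels inferred from folder names, e.g. ['bass guitar', 'guitar', 'piano']
print(len(dls.train_ds), len(dls.valid_ds))  # sizes of the 80/20 train/validation split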

To get a quick visual check, let’s display a sample of 6 images from our current batch below. This helps us confirm that the data is being loaded and processed correctly before we move on.

Note: If you run the notebook, the images you see may differ each time you run the code.

dls.show_batch(max_n=6)
Batch of 6 Images for our Fine Tuning

Now that our data is ready, it’s time to dive into fine-tuning our computer vision model!

We’ll be using ResNet-18 as the foundation for our model — a widely used, pre-trained convolutional neural network known for its strong performance in computer vision tasks.

The ResNet-18 deep learning architecture

One of the major advantages of deep learning is that we rarely need to train models entirely from scratch. Instead, we can leverage transfer learning: by starting with a model like ResNet-18 that has already learned useful visual features from a large dataset (such as ImageNet), we only need to fine-tune it with a relatively small number of images from our specific domain.

This saves both time and computational resources while still achieving excellent results.

learn = vision_learner(dls, resnet18, metrics=error_rate)  # pre-trained ResNet-18 backbone with a new head
learn.fine_tune(10)  # train the head briefly, then fine-tune the whole network for 10 epochs
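For context, fine_tune is a convenience wrapper around the classic transfer-learning recipe. Here is a simplified sketch of what it does internally (the real implementation also handles learning rates and discriminative layer groups):

# Roughly equivalent to learn.fine_tune(10), simplified:
learn.freeze()           # only the new classification head is trainable
learn.fit_one_cycle(1)   # one epoch to warm up the head
learn.unfreeze()         # make the pre-trained backbone trainable too
learn.fit_one_cycle(10)  # ten epochs over the whole network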

We’re starting to see signs of overfitting in our model — while the training loss is decreasing slightly, the validation loss is not showing significant improvement. This suggests the model is memorizing the training data rather than learning generalizable patterns.
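One common remedy, which this notebook doesn’t apply, is data augmentation: FastAI’s aug_transforms adds random flips, rotations, zooms, and lighting changes at batch time, which typically helps the model generalize. A hypothetical variant of the DataBlock above would look like this:

dls_aug = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')],
    batch_tfms=aug_transforms()  # random flips, rotations, zooms, lighting
).dataloaders(path, bs=32)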

Despite the overfitting, it’s still valuable to test how the model performs in a real-world scenario.

Let’s move on to a practical evaluation: I’ll feed it a variety of product images sourced from the Guitar Center website and observe how well it handles these unseen examples.

This kind of qualitative testing gives us early insights into whether the model is capturing the right features and making sensible predictions outside of the training environment.

from io import BytesIO
import requests

def check_prediction(image_url):
    # Download the image and run it through the fine-tuned model
    response = requests.get(image_url)
    image = PILImage.create(BytesIO(response.content))
    pred, _, probs = learn.predict(image)
    print(f"This is a: {pred}.")
    print("Probability distribution:")
    for c, p in zip(learn.dls.vocab, probs):
        print(f"  {c}: {p * 100:.2f}%")

Starting with the following image:

The outcome from our model is…

Many non-musicians struggle to tell the difference between a regular guitar and a bass guitar — they may look similar at a glance, but they serve very different roles in music.

This raises an interesting question: can our model pick up on the subtle visual cues that separate the two? Let’s put it to the test and see whether our neural network can make this distinction more accurately than the average person by asking it to identify a bass guitar.

Nice! Finally, let’s check a piano:

Wrapping Up

Awesome! Our model is performing quite well — even with a relatively small dataset and minimal fine-tuning.

Computer vision is truly fascinating. With just a few lines of code and the right tools, we can build models that recognize and interpret complex images with impressive accuracy.

Want to Learn More?

If you want to continue your journey into deep learning and transfer learning, the FastAI course mentioned above is a great place to start.

Thanks for following along — hope you enjoyed building and testing our own computer vision model!


Written by Ivo Bernardo

I write about data science and analytics | Partner @ DareData | Instructor @ Udemy | also on thedatajourney.substack.com/ and youtube.com/@TheDataJourney42
