Guitar Pedals Object Detection Model¶
Introduction¶
Data Science Workshop (20936) at the Open University of Israel by Noam Rosner (2024a)
Project Overview and Objectives¶
Guitar pedals play a crucial role in shaping sound and effects in music, used extensively by musicians to create unique audio experiences. However, identifying and managing these pedals can be challenging due to the vast array of designs, brands, and models available. Traditional methods of cataloging and identifying guitar pedals involve manual processes that are time-consuming and prone to errors. These inefficiencies can hinder the workflow of musicians, sound engineers, and collectors who rely on accurate and swift identification of their equipment.
To address this issue, this project aims to develop an advanced object detection model specifically tailored for guitar pedals. The goal is to fine-tune a custom YOLOv8 model that automatically detects and identifies different guitar pedals in images. This technology will streamline the process of managing pedal collections, improve inventory accuracy, and ultimately help musicians optimize their setups.
The project involves gathering a diverse dataset of guitar pedal images, annotating them meticulously, and fine-tuning the YOLOv8 model to achieve high detection accuracy. By leveraging cutting-edge machine learning techniques, this project seeks to bring efficiency and reliability to the identification of guitar pedals, benefiting the music industry as a whole.
Practical Applications¶
The development of an advanced object detection model for guitar pedals opens up numerous possibilities for its integration into mobile and desktop applications. By embedding this technology into apps, users can enjoy seamless identification and management of their guitar pedals, leading to various practical applications:
Real-Time Pedal Identification: Musicians can use their smartphones or tablets to quickly identify the type of pedal they are using by simply taking a picture. The app, powered by the YOLOv8 model, will analyze the image and provide instant information about the pedal, including its brand, model, and specific features. This can be particularly useful during performances, rehearsals, or recording sessions where quick and accurate identification is crucial.
Pedal Management and Inventory: Collectors and sound engineers can maintain a digital inventory of their guitar pedal collections. By photographing each pedal, the app can catalog the items automatically, reducing the need for manual entry and minimizing errors. This can streamline the organization and tracking of large collections, ensuring that every pedal is accounted for and easily accessible when needed.
Enhanced Online Marketplaces: For platforms that sell or trade guitar pedals, integrating this detection model can enhance user experience by allowing sellers to quickly list their pedals with accurate details. Buyers can also use the app to verify the authenticity and specifications of pedals before making a purchase, increasing trust and satisfaction in online transactions.
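To ground the first scenario above (real-time identification), here is a minimal sketch of what a single identification step could look like once a YOLOv8 model has been fine-tuned on a pedal dataset. It uses the standard Ultralytics inference API; the weights path and image filename are placeholders rather than files produced in this notebook.

from ultralytics import YOLO

# Load fine-tuned weights (placeholder path) and run detection on a photo of a pedalboard.
model = YOLO('runs/detect/train/weights/best.pt')
results = model('pedalboard_photo.jpg')

# Report each detected pedal with its class name and confidence score.
for box in results[0].boxes:
    class_id = int(box.cls[0])
    confidence = float(box.conf[0])
    print(f"{results[0].names[class_id]}: {confidence:.2f}")

In an app, the printed class name would then be looked up in a product database to return the brand, model, and feature details described above.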
Challenges and Project Scope¶
The world of guitar pedals is vast and diverse, with numerous brands and models offering a wide range of effects and functionalities. This diversity poses significant challenges for developing a comprehensive object detection model capable of identifying every type of guitar pedal. Each brand has unique designs, colors, and labeling, which can complicate the training process for the model. Additionally, the sheer volume of available guitar pedal models would require an extensive and meticulously annotated dataset, which is both time-consuming and resource-intensive to create.
To demonstrate the feasibility and effectiveness of an object detection model for guitar pedals, this project will focus on a single brand of guitar pedals. By narrowing the scope, we can manage the dataset more efficiently and ensure high-quality annotations for each image. This approach allows us to fine-tune the YOLOv8 model to achieve optimal performance for the selected brand, providing a proof of concept that can be expanded in future projects.
Concentrating on one brand also helps mitigate the challenges associated with variability in pedal design and branding. It allows for a more controlled environment to test and refine the model, ensuring that it performs reliably within a defined scope. Once the model demonstrates success with the chosen brand, the methodology and insights gained can be applied to extend the project to include additional brands and models, ultimately aiming for a comprehensive solution for the entire spectrum of guitar pedals.
By addressing these challenges through a focused approach, this project aims to create a robust foundation for future expansion, showcasing the potential of object detection technology in the music industry.
import os
import random
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
image_folder = '/content/drive/MyDrive/pedal-model/images/train/'
all_images = [os.path.join(image_folder, img) for img in os.listdir(image_folder) if img.endswith(('png', 'jpg', 'jpeg'))]
selected_images = random.sample(all_images, 9)
images_per_row = 3
fig, axes = plt.subplots(3, images_per_row, figsize=(15, 15))
axes = axes.flatten()
for ax, img_path in zip(axes, selected_images):
    img = mpimg.imread(img_path)
    ax.imshow(img)
    ax.axis('off')

plt.tight_layout()
plt.show()
Setup¶
from google.colab import drive
drive.mount('/content/drive')
ROOT_DIR = '/content/drive/MyDrive/pedal-model'
Data Collection¶
To build an effective object detection model for guitar pedals, a comprehensive and well-annotated dataset is essential. Given the challenge of sourcing a large number of images for a specific brand of guitar pedals, a custom approach was taken to collect and annotate the required data.
Custom Image Collection Script
A custom Python script was developed using Selenium and other web scraping tools to automate the collection of images from Google Images search results. Selenium, a powerful tool for automating web browsers, was used to navigate through search results, capture images, and download them programmatically. This automated approach yields a large and diverse set of images, covering the various angles, lighting conditions, and settings in which guitar pedals appear.
The custom script performs the following steps:
Search Query Automation: The script initiates a Google Images search for the specific brand and model of guitar pedal.
Image Scraping: It navigates through the search results, waits for images to load, and identifies image elements for extraction.
Image Downloading: Identified images are downloaded and stored in a structured directory, ready for the annotation process. The script handles both HTTP and data URL image sources.
Here’s an overview of part of the script that was used locally for the image collection:
import os
import time
import base64
import urllib.request
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from selenium.webdriver.common.by import By
# Note: log() is a small logging helper defined elsewhere in the full script; only part of the script is shown here.
class Browser:
    def __init__(self, url, record_file):
        self.url = url
        self.folder = os.getcwd()
        self.current_folder = None
        self.record_file = record_file
        self.img_counter = 0
        options = FirefoxOptions()
        self.driver = webdriver.Firefox(options=options)

    def read_downloaded_records(self):
        if not os.path.exists(self.record_file):
            with open(self.record_file, 'w') as file:
                pass
            return set()
        else:
            with open(self.record_file, 'r') as file:
                return set(file.read().splitlines())

    def download_images(self, current_folder, item_dict):
        if not os.path.exists(current_folder):
            os.makedirs(current_folder)
        self.current_folder = current_folder
        self.img_counter = len(os.listdir(os.path.join(os.getcwd(), current_folder))) + 1
        downloaded_images = self.read_downloaded_records()
        try:
            self.driver.get("https://www.google.com")
            input_bar = self.driver.find_element(By.CLASS_NAME, "gLFyf")
            input_bar.click()
            input_bar.send_keys(f"{item_dict['company']} {item_dict['item']}")
            input_bar.send_keys(Keys.ENTER)
            time.sleep(3)
            input("Press Enter after the images have loaded: ")
            images = self.driver.find_elements(By.CLASS_NAME, 'YQ4gaf')
            print(f"Found {len(images)} images")
            for img in images:
                src = img.get_attribute("src")
                if src and src not in downloaded_images:
                    log(f"Downloading this image - {src}")
                    if src.startswith("http"):
                        self.download_http_image(src)
                    elif src.startswith("data:image"):
                        self.download_data_image(src)
                    with open(self.record_file, 'a') as file:
                        file.write(src + "\n")
                    self.img_counter += 1
                else:
                    log(f"Not downloading this image - {src}")
            print(f"Downloaded {self.img_counter} images to {self.current_folder}")
        except Exception as e:
            log(f"error: {e}")
        finally:
            self.driver.quit()

    def download_http_image(self, src):
        filename = f"image_{self.img_counter}.jpg"
        file_path = os.path.join(self.current_folder, filename)
        print(f"Downloading {src}...")
        urllib.request.urlretrieve(src, file_path)

    def download_data_image(self, src):
        encoded_data = src.split(',')[1]
        decoded_data = base64.b64decode(encoded_data)
        filename = f"image_{self.img_counter}.jpg"
        img_filename = os.path.join(self.current_folder, filename)
        with open(img_filename, 'wb') as file:
            file.write(decoded_data)
Image Annotation Process¶
After collecting the images, the next critical step was to annotate them for training the object detection model. CVAT.ai (Computer Vision Annotation Tool) was chosen for this task due to its robust feature set and ease of use. CVAT is a powerful open-source tool designed specifically for annotating video and image datasets, making it ideal for this project.
The annotation process involved several key steps:
Uploading Images: The collected images were uploaded to CVAT, where they were organized into projects based on the specific guitar pedal models and variants. This organization helped streamline the annotation workflow and ensured that each model was adequately represented in the dataset.
Creating Annotation Tasks: Annotation tasks were created within CVAT to systematically cover all the images. These tasks were divided based on factors such as pedal model, angle of the shot, and lighting conditions. By breaking down the workload into manageable tasks, we ensured a thorough and consistent annotation process.
Bounding Box Annotations: Each image was meticulously annotated with bounding boxes around the guitar pedals. The bounding boxes were drawn to capture essential details such as the shape, size, and orientation of the pedals. Care was taken to ensure that the annotations were accurate, as these would directly influence the performance of the object detection model.
Quality Control: A quality control step was implemented where annotated images were reviewed to ensure accuracy and consistency. This review process was crucial for identifying and correcting any potential errors in the annotations, thereby maintaining the overall quality of the dataset.
Exporting Annotations: Once the annotations were finalized, they were exported in a format compatible with the YOLOv8 model, which was selected for training the object detection system. The exported annotations included the bounding box coordinates and labels, making them ready for direct use in the model training process.
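For context, YOLO-format labels consist of one plain-text file per image, with one line per annotated object: a class index followed by the bounding box centre x, centre y, width, and height, all normalized to the 0-1 range relative to the image size. A hypothetical label file might look like this (class index 3 corresponds to the BigSky Multi Reverb in the class mapping used later in this notebook):

# labels/train/image_42.txt (hypothetical example)
3 0.512 0.468 0.231 0.387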
While the initial image collection process was largely automated and completed within minutes, the annotation process required a much more deliberate and time-intensive approach. The custom script was able to gather a large dataset quickly, but the quality and effectiveness of the object detection model depended heavily on the precision of the annotations.
import os
import random
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
image_folder = '/content/drive/MyDrive/pedal-model/examples/cvat'
all_images = [os.path.join(image_folder, img) for img in os.listdir(image_folder) if img.endswith(('png', 'jpg', 'jpeg'))]
selected_images = random.sample(all_images, 2)
images_per_row = 2
fig, axes = plt.subplots(1, images_per_row, figsize=(15, 15))
axes = axes.flatten()
for ax, img_path in zip(axes, selected_images):
    img = mpimg.imread(img_path)
    ax.imshow(img)
    ax.axis('off')

plt.tight_layout()
plt.show()
Annotation Metrics Overview¶
Using CVAT.ai, the images were annotated over a total of 101.1 hours. This time reflects the manual effort required to ensure that each guitar pedal was accurately labeled, with bounding boxes precisely drawn to capture every detail.
The total object count reached 4,064, spread across multiple classes of guitar pedals.
The average annotation speed was 350.1 objects per hour, with daily performance varying based on the complexity of the images being annotated:
- May 15, 2024: ~300 objects per hour
- May 16, 2024: ~375 objects per hour
- May 17, 2024: ~350 objects per hour
- May 18, 2024: ~250 objects per hour
This steady pace highlights the contrast between the rapid image collection process and the more deliberate, time-intensive annotation process. While the script could gather images in a matter of minutes, annotating these images required a detailed, methodical approach to ensure that the dataset was of the highest quality possible. This careful balance between speed and precision is what ultimately sets the foundation for training an effective object detection model.
By combining automated image collection with detailed manual annotation, a high-quality dataset was prepared that lays a strong foundation for developing a robust object detection model for guitar pedals. The careful planning and execution of both the collection and annotation phases are expected to contribute significantly to the accuracy and effectiveness of the final model.
Exploratory Data Analysis¶
Exploratory Data Analysis (EDA) is an essential step in understanding our dataset. In this section, we will visualize the data, analyze the distribution of different classes of guitar pedals, and check for any anomalies or patterns that might be present.
Setup¶
!pip install umap-learn
import os
import json
import torch
import warnings
import cv2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from PIL import Image
from skimage.feature import local_binary_pattern
import umap
from collections import defaultdict
from sklearn.cluster import KMeans
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import ResNet50, InceptionV3, EfficientNetB0, VGG16
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess
from tensorflow.keras.applications.inception_v3 import preprocess_input as inception_preprocess
from tensorflow.keras.applications.efficientnet import preprocess_input as efficientnet_preprocess
from tensorflow.keras.applications.vgg16 import preprocess_input as vgg_preprocess
from sklearn.metrics import pairwise_distances
from sklearn.metrics import silhouette_score, davies_bouldin_score
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import time
import seaborn as sns
from scipy.spatial.distance import pdist, squareform
model_class_names = [
"UltraViolet Vintage Vibe",
"Brig dBucket Delay",
"Cloudburst Ambient Reverb",
"BigSky Multi Reverb",
"TimeLine Multi Delay",
"Mobius Multi Modulation",
"Iridium Amp Modeler And Cab",
"Compadre Compressor & Boost",
"NightSky Experimental Reverb",
"Volante Magnetic Tape Delay",
"Zelzah Phaser & Modulation",
"Sunset Dual Overdrive",
"Riverside Drive & Distortion",
"blueSky V2 Reverb",
"Deco V2 Tape Saturation & Doubletracker",
"DIG V2 Dual Digital Delay",
"El Capistan V2 Tape Delay",
"Flint V2 Tremolo & Reverb",
"Lex V2 Rotary Modulation",
"Ola Chorus & Vibrato",
"Orbit Flanger"
]
Sample Images¶
Let's start by displaying some sample images from our dataset to get a visual understanding of the guitar pedals we are working with.
image_dir = '/content/drive/MyDrive/pedal-model/images/train'
image_files = [os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith('.jpg')]
num_images = min(len(image_files), 6)
num_cols = 3
num_rows = (num_images + num_cols - 1) // num_cols
fig, axs = plt.subplots(num_rows, num_cols, figsize=(15, 5 * num_rows))
for i, image_path in enumerate(image_files[:num_images]):
    img = mpimg.imread(image_path)
    ax = axs[i // num_cols, i % num_cols]
    ax.imshow(img)
    ax.axis('off')

for j in range(i + 1, num_rows * num_cols):
    axs[j // num_cols, j % num_cols].axis('off')

plt.tight_layout()
plt.show()
Class Distribution¶
Next, we will analyze the distribution of different classes (types of guitar pedals) in our dataset. This is important to identify any imbalances that might affect the model's performance.
class_labels = {
0: "UltraViolet Vintage Vibe",
1: "Brig dBucket Delay",
2: "Cloudburst Ambient Reverb",
3: "BigSky Multi Reverb",
4: "TimeLine Multi Delay",
5: "Mobius Multi Modulation",
6: "Iridium Amp Modeler And Cab",
7: "Compadre Compressor & Boost",
8: "NightSky Experimental Reverb",
9: "Volante Magnetic Tape Delay",
10: "Zelzah Phaser & Modulation",
11: "Sunset Dual Overdrive",
12: "Riverside Drive & Distortion",
13: "blueSky V2 Reverb",
14: "Deco V2 Tape Saturation & Doubletracker",
15: "DIG V2 Dual Digital Delay",
16: "El Capistan V2 Tape Delay",
17: "Flint V2 Tremolo & Reverb",
18: "Lex V2 Rotary Modulation",
19: "Ola Chorus & Vibrato",
20: "Orbit Flanger"
}
def verify_split():
    train_images_path = '/content/drive/MyDrive/pedal-model/images/train'
    train_annotations_path = '/content/drive/MyDrive/pedal-model/labels/train'
    val_images_path = '/content/drive/MyDrive/pedal-model/images/validation'
    val_annotations_path = '/content/drive/MyDrive/pedal-model/labels/validation'
    test_images_path = '/content/drive/MyDrive/pedal-model/images/test'
    test_annotations_path = '/content/drive/MyDrive/pedal-model/labels/test'

    def count_files_and_objects(image_folder_path, annotation_folder_path):
        file_count = len([f for f in os.listdir(image_folder_path) if os.path.isfile(os.path.join(image_folder_path, f))])
        object_counts = defaultdict(int)
        annotation_files = [f for f in os.listdir(annotation_folder_path) if f.endswith('.txt')]
        for filename in annotation_files:
            annotation_file_path = os.path.join(annotation_folder_path, filename)
            if os.path.exists(annotation_file_path):
                with open(annotation_file_path, 'r') as file:
                    for line in file:
                        obj_class = line.split()[0]
                        object_counts[obj_class] += 1
        return file_count, object_counts

    train_count, train_objects = count_files_and_objects(train_images_path, train_annotations_path)
    val_count, val_objects = count_files_and_objects(val_images_path, val_annotations_path)
    test_count, test_objects = count_files_and_objects(test_images_path, test_annotations_path)

    def create_dataframe(object_counts):
        data = {class_labels[int(k)]: v for k, v in object_counts.items()}
        return pd.DataFrame.from_dict(data, orient='index', columns=['Count']).sort_index()

    train_df = create_dataframe(train_objects)
    val_df = create_dataframe(val_objects)
    test_df = create_dataframe(test_objects)

    print(f'Training set: {train_count} images')
    print(f'Validation set: {val_count} images')
    print(f'Test set: {test_count} images')

    return {
        'train_count': train_count,
        'val_count': val_count,
        'test_count': test_count,
        'train_df': train_df,
        'val_df': val_df,
        'test_df': test_df
    }
data_info = verify_split()
display(data_info['train_df'].T)
display(data_info['val_df'].T)
display(data_info['test_df'].T)
Training set: 2609 images
Validation set: 727 images
Test set: 448 images
BigSky Multi Reverb | Brig dBucket Delay | Cloudburst Ambient Reverb | Compadre Compressor & Boost | DIG V2 Dual Digital Delay | Deco V2 Tape Saturation & Doubletracker | El Capistan V2 Tape Delay | Flint V2 Tremolo & Reverb | Iridium Amp Modeler And Cab | Lex V2 Rotary Modulation | ... | NightSky Experimental Reverb | Ola Chorus & Vibrato | Orbit Flanger | Riverside Drive & Distortion | Sunset Dual Overdrive | TimeLine Multi Delay | UltraViolet Vintage Vibe | Volante Magnetic Tape Delay | Zelzah Phaser & Modulation | blueSky V2 Reverb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Count | 162 | 101 | 162 | 135 | 127 | 162 | 156 | 119 | 160 | 74 | ... | 25 | 145 | 93 | 181 | 184 | 244 | 126 | 105 | 98 | 121 |
1 rows × 21 columns
BigSky Multi Reverb | Brig dBucket Delay | Cloudburst Ambient Reverb | Compadre Compressor & Boost | DIG V2 Dual Digital Delay | Deco V2 Tape Saturation & Doubletracker | El Capistan V2 Tape Delay | Flint V2 Tremolo & Reverb | Iridium Amp Modeler And Cab | Lex V2 Rotary Modulation | ... | NightSky Experimental Reverb | Ola Chorus & Vibrato | Orbit Flanger | Riverside Drive & Distortion | Sunset Dual Overdrive | TimeLine Multi Delay | UltraViolet Vintage Vibe | Volante Magnetic Tape Delay | Zelzah Phaser & Modulation | blueSky V2 Reverb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Count | 47 | 30 | 46 | 36 | 37 | 52 | 42 | 44 | 44 | 21 | ... | 8 | 41 | 26 | 49 | 57 | 64 | 38 | 28 | 20 | 40 |
1 rows × 21 columns
BigSky Multi Reverb | Brig dBucket Delay | Cloudburst Ambient Reverb | Compadre Compressor & Boost | DIG V2 Dual Digital Delay | Deco V2 Tape Saturation & Doubletracker | El Capistan V2 Tape Delay | Flint V2 Tremolo & Reverb | Iridium Amp Modeler And Cab | Lex V2 Rotary Modulation | ... | NightSky Experimental Reverb | Ola Chorus & Vibrato | Orbit Flanger | Riverside Drive & Distortion | Sunset Dual Overdrive | TimeLine Multi Delay | UltraViolet Vintage Vibe | Volante Magnetic Tape Delay | Zelzah Phaser & Modulation | blueSky V2 Reverb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Count | 21 | 15 | 23 | 20 | 22 | 25 | 20 | 19 | 26 | 9 | ... | 6 | 21 | 14 | 28 | 25 | 36 | 19 | 16 | 16 | 17 |
1 rows × 21 columns
The following bar plot illustrates the overall distribution of images across the training, validation, and test datasets. By comparing the total number of images in each dataset, we can assess whether the data split is appropriate for training, validation, and testing phases, helping to ensure a robust evaluation of the model's performance.
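For reference, these counts correspond to an approximate 69 / 19 / 12 % split: 2,609 + 727 + 448 = 3,784 images in total, of which 2,609 / 3,784 ≈ 69% are used for training, 727 / 3,784 ≈ 19% for validation, and 448 / 3,784 ≈ 12% for testing.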
def plot_dataset_distribution(train_count, val_count, test_count):
    data = {
        'Dataset': ['Training', 'Validation', 'Test'],
        'Count': [train_count, val_count, test_count],
        'Color': ['Training', 'Validation', 'Test']
    }
    df = pd.DataFrame(data)

    fig, ax = plt.subplots(figsize=(8, 6))
    sns.barplot(x='Dataset', y='Count', data=df, hue='Color', palette=['#1f77b4', '#2ca02c', '#ff7f0e'], ax=ax, legend=False)
    ax.set_title('Dataset Distribution')
    ax.set_xlabel('Dataset')
    ax.set_ylabel('Number of Images')
    plt.tight_layout()
    plt.show()
plot_dataset_distribution(data_info['train_count'], data_info['val_count'], data_info['test_count'])
The graphs below display the distribution of images across the training, validation, and test datasets. Each graph represents the number of images containing guitar pedal instances for each class within the respective dataset. This visualization provides a clear overview of how the data is balanced across different classes and datasets, ensuring that the model is exposed to a diverse set of examples during training and evaluation.
def plot_class_distributions_side_by_side(train_df, val_df, test_df):
    fig, axes = plt.subplots(1, 3, figsize=(18, 8), sharey=True)

    sns.barplot(y=train_df.index, x='Count', data=train_df, hue=train_df.index, palette='Blues_d', ax=axes[0], legend=False)
    axes[0].set_title('Training Set')
    axes[0].set_xlabel('Number of Images')
    axes[0].set_ylabel('Class Labels')

    sns.barplot(y=val_df.index, x='Count', data=val_df, hue=val_df.index, palette='Greens_d', ax=axes[1], legend=False)
    axes[1].set_title('Validation Set')
    axes[1].set_xlabel('Number of Images')
    axes[1].set_ylabel('')

    sns.barplot(y=test_df.index, x='Count', data=test_df, hue=test_df.index, palette='Oranges_d', ax=axes[2], legend=False)
    axes[2].set_title('Test Set')
    axes[2].set_xlabel('Number of Images')
    axes[2].set_ylabel('')

    plt.tight_layout()
    plt.show()
plot_class_distributions_side_by_side(data_info['train_df'], data_info['val_df'], data_info['test_df'])
Bounding Box Analysis¶
In this section, we dive into the bounding boxes in our dataset to better understand their dimensions and aspect ratios. By analyzing these characteristics, we aim to determine whether any insights can help us optimize our model's ability to detect the guitar pedals. We will extract the widths, heights, and aspect ratios from the annotation files and visualize their distributions to gain a clearer picture of the data.
The three histograms below show the distribution of bounding box widths, heights, and aspect ratios across the combined dataset. These visualizations can help identify any patterns or anomalies in the bounding box dimensions, potentially guiding further adjustments to the model or preprocessing steps.
def analyze_bounding_boxes(annotation_files):
    widths = []
    heights = []
    aspect_ratios = []
    for annotation_file in annotation_files:
        if os.path.exists(annotation_file):
            with open(annotation_file, 'r') as file:
                for line in file:
                    _, x_center, y_center, width, height = map(float, line.split())
                    widths.append(width)
                    heights.append(height)
                    aspect_ratios.append(width / height)
    return widths, heights, aspect_ratios

def plot_bounding_box_analysis_combined(widths, heights, aspect_ratios):
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))

    sns.histplot(widths, bins=30, color='blue', ax=axes[0])
    axes[0].set_title('Distribution of Bounding Box Widths (Combined)')
    axes[0].set_xlabel('Width')
    axes[0].set_ylabel('Frequency')

    sns.histplot(heights, bins=30, color='green', ax=axes[1])
    axes[1].set_title('Distribution of Bounding Box Heights (Combined)')
    axes[1].set_xlabel('Height')
    axes[1].set_ylabel('Frequency')

    sns.histplot(aspect_ratios, bins=30, color='purple', ax=axes[2])
    axes[2].set_title('Distribution of Bounding Box Aspect Ratios (Combined)')
    axes[2].set_xlabel('Aspect Ratio')
    axes[2].set_ylabel('Frequency')

    plt.tight_layout()
    plt.show()

def analyze_and_plot_combined_all(annotation_paths):
    combined_widths = []
    combined_heights = []
    combined_aspect_ratios = []
    for annotation_path in annotation_paths:
        annotation_files = [os.path.join(annotation_path, f) for f in os.listdir(annotation_path) if f.endswith('.txt')]
        widths, heights, aspect_ratios = analyze_bounding_boxes(annotation_files)
        combined_widths.extend(widths)
        combined_heights.extend(heights)
        combined_aspect_ratios.extend(aspect_ratios)
    plot_bounding_box_analysis_combined(combined_widths, combined_heights, combined_aspect_ratios)
train_annotation_path = '/content/drive/MyDrive/pedal-model/labels/train'
val_annotation_path = '/content/drive/MyDrive/pedal-model/labels/validation'
test_annotation_path = '/content/drive/MyDrive/pedal-model/labels/test'
combined_annotation_paths = [train_annotation_path, val_annotation_path, test_annotation_path]
analyze_and_plot_combined_all(combined_annotation_paths)
In this section, we analyze the bounding boxes for each class by combining data from the training, validation, and test sets. By examining the distribution of bounding box widths, heights, and aspect ratios for each class, we aim to identify any patterns or variations that may be relevant for model training. This combined analysis helps us understand the overall characteristics of each class in the dataset and ensures that the model is trained on a comprehensive and balanced set of examples.
def analyze_bounding_boxes_by_class_combined(annotation_paths, class_id):
    widths = []
    heights = []
    aspect_ratios = []
    for annotation_path in annotation_paths:
        annotation_files = [os.path.join(annotation_path, f) for f in os.listdir(annotation_path) if f.endswith('.txt')]
        for annotation_file in annotation_files:
            if os.path.exists(annotation_file):
                with open(annotation_file, 'r') as file:
                    for line in file:
                        class_idx, x_center, y_center, width, height = map(float, line.split())
                        if int(class_idx) == class_id:
                            widths.append(width)
                            heights.append(height)
                            aspect_ratios.append(width / height)
    return widths, heights, aspect_ratios

def plot_bounding_box_analysis_by_class_combined(widths, heights, aspect_ratios, class_label):
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))

    sns.histplot(widths, bins=30, color='blue', ax=axes[0])
    axes[0].set_title(f'Distribution of Bounding Box Widths ({class_label})')
    axes[0].set_xlabel('Width')
    axes[0].set_ylabel('Frequency')

    sns.histplot(heights, bins=30, color='green', ax=axes[1])
    axes[1].set_title(f'Distribution of Bounding Box Heights ({class_label})')
    axes[1].set_xlabel('Height')
    axes[1].set_ylabel('Frequency')

    sns.histplot(aspect_ratios, bins=30, color='purple', ax=axes[2])
    axes[2].set_title(f'Distribution of Bounding Box Aspect Ratios ({class_label})')
    axes[2].set_xlabel('Aspect Ratio')
    axes[2].set_ylabel('Frequency')

    plt.tight_layout()
    plt.show()

def analyze_and_plot_for_each_class_combined(annotation_paths, class_labels):
    for class_id, class_label in class_labels.items():
        widths, heights, aspect_ratios = analyze_bounding_boxes_by_class_combined(annotation_paths, class_id)
        plot_bounding_box_analysis_by_class_combined(widths, heights, aspect_ratios, class_label)
class_labels = {
0: "UltraViolet Vintage Vibe",
1: "Brig dBucket Delay",
2: "Cloudburst Ambient Reverb",
3: "BigSky Multi Reverb",
4: "TimeLine Multi Delay",
5: "Mobius Multi Modulation",
6: "Iridium Amp Modeler And Cab",
7: "Compadre Compressor & Boost",
8: "NightSky Experimental Reverb",
9: "Volante Magnetic Tape Delay",
10: "Zelzah Phaser & Modulation",
11: "Sunset Dual Overdrive",
12: "Riverside Drive & Distortion",
13: "blueSky V2 Reverb",
14: "Deco V2 Tape Saturation & Doubletracker",
15: "DIG V2 Dual Digital Delay",
16: "El Capistan V2 Tape Delay",
17: "Flint V2 Tremolo & Reverb",
18: "Lex V2 Rotary Modulation",
19: "Ola Chorus & Vibrato",
20: "Orbit Flanger"
}
train_annotation_path = '/content/drive/MyDrive/pedal-model/labels/train'
val_annotation_path = '/content/drive/MyDrive/pedal-model/labels/validation'
test_annotation_path = '/content/drive/MyDrive/pedal-model/labels/test'
combined_annotation_paths = [train_annotation_path, val_annotation_path, test_annotation_path]
analyze_and_plot_for_each_class_combined(combined_annotation_paths, class_labels)
After thoroughly analyzing the bounding boxes across all classes and datasets, we observed that the dimensions and aspect ratios of the guitar pedals remained relatively consistent. This result aligns with our expectations, as most guitar pedals share a similar rectangular shape. Consequently, we did not uncover any significant patterns or variations that would require adjustments to our model or preprocessing steps. While this analysis confirms that our dataset is uniform in terms of bounding box characteristics, it also suggests that further optimizations may need to focus on other aspects, such as feature extraction or model architecture, rather than on the bounding box dimensions themselves.
Color Analysis¶
In this section, we focus exclusively on analyzing the color characteristics of the guitar pedals. By examining color distributions and average color intensities across different classes, we aim to understand how color differentiates one class from another. This analysis can help identify whether color is a strong feature for classification and how it might contribute to model performance.
def analyze_color_distribution(image, bbox):
    x1, y1, x2, y2 = bbox
    roi = image[y1:y2, x1:x2]
    color = ('b', 'g', 'r')
    hist_data = {}
    for i, col in enumerate(color):
        hist = cv2.calcHist([roi], [i], None, [256], [0, 256])
        hist_data[col] = hist
    return hist_data

def calculate_color_density(image, bbox):
    x1, y1, x2, y2 = bbox
    roi = image[y1:y2, x1:x2]
    mean_color = cv2.mean(roi)
    return mean_color

def load_image_with_retry(img_path, retries=3):
    for _ in range(retries):
        image = cv2.imread(img_path)
        if image is not None:
            return image
    return None

train_annotations_path = '/content/drive/MyDrive/pedal-model/Detectron2/train_coco_annotations.json'
with open(train_annotations_path) as f:
    annotations = json.load(f)

color_density_per_class = defaultdict(list)
for img_ann in annotations['images']:
    img_path = os.path.join('/content/drive/MyDrive/pedal-model/images/train', img_ann['file_name'])
    image = load_image_with_retry(img_path)
    if image is None:
        continue
    for ann in annotations['annotations']:
        if ann['image_id'] == img_ann['id']:
            bbox = ann['bbox']
            bbox = [int(bbox[0]), int(bbox[1]), int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3])]
            class_id = ann['category_id'] - 1
            mean_color = calculate_color_density(image, bbox)
            color_density_per_class[class_id].append(mean_color)

channel_averages = {
    "class": [],
    "type": [],
    "value": []
}
for class_id, class_name in enumerate(model_class_names):
    color_densities = color_density_per_class[class_id]
    if color_densities:
        avg_color_density = np.mean(color_densities, axis=0)[:3]
        for i, channel in enumerate(["Blue", "Green", "Red"]):
            channel_averages["class"].append(class_name)
            channel_averages["type"].append(channel)
            channel_averages["value"].append(avg_color_density[i])

channel_averages_df = pd.DataFrame(channel_averages)

plt.figure(figsize=(12, 6))
sns.barplot(x="class", y="value", hue="type", data=channel_averages_df, palette="muted")
plt.xticks(rotation=90)
plt.xlabel("Class")
plt.ylabel("Average Channel Intensity")
plt.title("Average Color Intensity Per Class (Training Set)")
plt.legend(title="Channel", loc="upper right")
plt.tight_layout()
plt.show()
The color analysis confirms that guitar pedals do exhibit different colors across the various classes. However, the variations in color are not significant enough to provide a unique or distinguishing feature for any specific class. While color differences exist, they are not pronounced enough to serve as a primary factor for classification. This suggests that other features, such as shape or texture, may play a more critical role in distinguishing between classes.
Feature Extraction¶
In this section we focus on the process of feature extraction, a critical step in preparing our object detection model. Feature extraction involves identifying and isolating relevant patterns and attributes from raw images to create a set of descriptive features. These features serve as the foundation for clustering and classification, enabling the model to distinguish between different classes effectively.
We began by leveraging pre-trained convolutional neural networks (CNNs) such as ResNet50, InceptionV3, EfficientNetB0, and VGG16 to extract deep features from the images. These models, trained on large-scale datasets, allow us to utilize their learned representations to capture complex patterns in the images, such as textures, shapes, and colors, which are crucial for differentiating between the various classes of guitar pedals in our dataset.
resnet_model = ResNet50(weights='imagenet', include_top=False)
inception_model = InceptionV3(weights='imagenet', include_top=False)
efficientnet_model = EfficientNetB0(weights='imagenet', include_top=False)
vgg_model = VGG16(weights='imagenet', include_top=False)
def extract_features(img, bbox, model, preprocess_input):
    x1, y1, x2, y2 = bbox
    roi = img[y1:y2, x1:x2]
    roi_resized = cv2.resize(roi, (224, 224))
    img_data = np.expand_dims(roi_resized, axis=0)
    img_data = preprocess_input(img_data)
    features = model.predict(img_data)
    return features.flatten()
all_features_resnet = []
all_features_inception = []
all_features_efficientnet = []
all_features_vgg = []
train_annotations_path = '/content/drive/MyDrive/pedal-model/Detectron2/train_coco_annotations.json'
with open(train_annotations_path) as f:
    annotations = json.load(f)

for img_ann in annotations['images']:
    img_path = os.path.join('/content/drive/MyDrive/pedal-model/images/train', img_ann['file_name'])
    if os.path.exists(img_path):
        img = cv2.imread(img_path)
        if img is None:
            continue
        for ann in annotations['annotations']:
            if ann['image_id'] == img_ann['id']:
                bbox = ann['bbox']
                bbox = [int(bbox[0]), int(bbox[1]), int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3])]
                features_resnet = extract_features(img, bbox, resnet_model, resnet_preprocess)
                features_inception = extract_features(img, bbox, inception_model, inception_preprocess)
                features_efficientnet = extract_features(img, bbox, efficientnet_model, efficientnet_preprocess)
                features_vgg = extract_features(img, bbox, vgg_model, vgg_preprocess)
                all_features_resnet.append(features_resnet)
                all_features_inception.append(features_inception)
                all_features_efficientnet.append(features_efficientnet)
                all_features_vgg.append(features_vgg)

all_features_resnet = np.array(all_features_resnet)
all_features_inception = np.array(all_features_inception)
all_features_efficientnet = np.array(all_features_efficientnet)
all_features_vgg = np.array(all_features_vgg)
np.save('/content/drive/MyDrive/pedal-model/features_resnet.npy', all_features_resnet)
np.save('/content/drive/MyDrive/pedal-model/features_inception.npy', all_features_inception)
np.save('/content/drive/MyDrive/pedal-model/features_efficientnet.npy', all_features_efficientnet)
np.save('/content/drive/MyDrive/pedal-model/features_vgg.npy', all_features_vgg)
features_resnet = np.load('/content/drive/MyDrive/pedal-model/features_resnet.npy')
features_inception = np.load('/content/drive/MyDrive/pedal-model/features_inception.npy')
features_efficientnet = np.load('/content/drive/MyDrive/pedal-model/features_efficientnet.npy')
features_vgg = np.load('/content/drive/MyDrive/pedal-model/features_vgg.npy')
train_annotations_path = '/content/drive/MyDrive/pedal-model/Detectron2/train_coco_annotations.json'
with open(train_annotations_path) as f:
    annotations = json.load(f)

object_classes = []
for img_ann in annotations['images']:
    for ann in annotations['annotations']:
        if ann['image_id'] == img_ann['id']:
            class_id = ann['category_id']
            object_classes.append(model_class_names[class_id - 1])
By combining features from multiple models, we aim to maximize the diversity and richness of the extracted features. This comprehensive approach ensures that our analysis captures the nuances of the data, leading to a more robust understanding of the inherent structure within the dataset. As we move forward, we will use these features to perform clustering, analyze class separability, and address any challenges related to overlapping classes.
combined_features = np.hstack([features_resnet, features_inception, features_efficientnet, features_vgg])
# Silhouette Score for combined features
silhouette_combined = silhouette_score(combined_features, object_classes)
print(f'Silhouette Score (Combined Features): {silhouette_combined}')
# Davies-Bouldin Index for combined features
dbi_combined = davies_bouldin_score(combined_features, object_classes)
print(f'Davies-Bouldin Index (Combined Features): {dbi_combined}')
# Silhouette Score and Davies-Bouldin Index for each set of features
silhouette_resnet = silhouette_score(features_resnet, object_classes)
dbi_resnet = davies_bouldin_score(features_resnet, object_classes)
print(f'ResNet50 - Silhouette Score: {silhouette_resnet}, Davies-Bouldin Index: {dbi_resnet}')
silhouette_inception = silhouette_score(features_inception, object_classes)
dbi_inception = davies_bouldin_score(features_inception, object_classes)
print(f'InceptionV3 - Silhouette Score: {silhouette_inception}, Davies-Bouldin Index: {dbi_inception}')
silhouette_efficientnet = silhouette_score(features_efficientnet, object_classes)
dbi_efficientnet = davies_bouldin_score(features_efficientnet, object_classes)
print(f'EfficientNetB0 - Silhouette Score: {silhouette_efficientnet}, Davies-Bouldin Index: {dbi_efficientnet}')
silhouette_vgg = silhouette_score(features_vgg, object_classes)
dbi_vgg = davies_bouldin_score(features_vgg, object_classes)
print(f'VGG16 - Silhouette Score: {silhouette_vgg}, Davies-Bouldin Index: {dbi_vgg}')
Silhouette Score (Combined Features): -0.019616249948740005
Davies-Bouldin Index (Combined Features): 6.28207253719273
ResNet50 - Silhouette Score: 0.0020600399002432823, Davies-Bouldin Index: 5.829557138341255
InceptionV3 - Silhouette Score: -0.02045496553182602, Davies-Bouldin Index: 7.161155619444595
EfficientNetB0 - Silhouette Score: -0.005833596456795931, Davies-Bouldin Index: 6.5289967869516214
VGG16 - Silhouette Score: -0.02354755625128746, Davies-Bouldin Index: 6.385996761777286
from sklearn.cluster import AgglomerativeClustering
hierarchical = AgglomerativeClustering(n_clusters=len(model_class_names)).fit(features_resnet)
labels = hierarchical.labels_
silhouette_avg = silhouette_score(features_resnet, labels)
dbi_avg = davies_bouldin_score(features_resnet, labels)
print(f"Hierarchical Clustering - Silhouette Score: {silhouette_avg:.4f}, Davies-Bouldin Index: {dbi_avg:.4f}")
Hierarchical Clustering - Silhouette Score: 0.0163, Davies-Bouldin Index: 3.9868
To determine the best feature extraction method for clustering, we evaluated four pre-trained models: ResNet50, InceptionV3, EfficientNetB0, and VGG16. The performance of each model was measured using the Silhouette Score and Davies-Bouldin Index. The results indicated that ResNet50 outperformed the other models, with a higher Silhouette Score (0.00206) and a lower Davies-Bouldin Index (5.8296). This suggests that ResNet50's feature extraction provides better cluster separation and lower within-cluster variation compared to the other models. Consequently, we proceed with ResNet50 for further analysis and optimization in the t-SNE phase.
In this section, the focus is on identifying the optimal perplexity value for t-SNE, a key hyperparameter that influences the clustering results. By evaluating the t-SNE embeddings with various perplexity values, we aim to maximize the silhouette score, which indicates how well clusters are separated, while minimizing the Davies-Bouldin index, which measures the average similarity ratio of each cluster with its most similar one. The goal is to determine the perplexity that yields the best overall clustering performance for the extracted ResNet50 features.
def find_best_perplexity(features, title):
    best_perplexity = None
    best_silhouette_score = -np.inf
    best_davies_bouldin_score = np.inf
    for perplexity in [5, 10, 20, 30, 40, 50]:
        tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
        reduced_features_tsne = tsne.fit_transform(features)
        silhouette_avg = silhouette_score(reduced_features_tsne, object_classes)
        davies_bouldin_avg = davies_bouldin_score(reduced_features_tsne, object_classes)
        print(f'Perplexity: {perplexity} | Silhouette Score: {silhouette_avg:.4f} | Davies-Bouldin Index: {davies_bouldin_avg:.4f}')
        if silhouette_avg > best_silhouette_score:
            best_silhouette_score = silhouette_avg
            best_davies_bouldin_score = davies_bouldin_avg  # track the DBI of the current best so ties are broken correctly
            best_perplexity = perplexity
        elif silhouette_avg == best_silhouette_score and davies_bouldin_avg < best_davies_bouldin_score:
            best_davies_bouldin_score = davies_bouldin_avg
            best_perplexity = perplexity
    print(f'\nBest Perplexity: {best_perplexity} | Best Silhouette Score: {best_silhouette_score:.4f}')
    return best_perplexity, best_silhouette_score
best_perplexity, best_silhouette = find_best_perplexity(features_resnet, "ResNet50")
Perplexity: 5 | Silhouette Score: -0.0912 | Davies-Bouldin Index: 10.8116
Perplexity: 10 | Silhouette Score: -0.0776 | Davies-Bouldin Index: 9.4520
Perplexity: 20 | Silhouette Score: -0.0835 | Davies-Bouldin Index: 12.9500
Perplexity: 30 | Silhouette Score: -0.0885 | Davies-Bouldin Index: 10.6377
Perplexity: 40 | Silhouette Score: -0.1157 | Davies-Bouldin Index: 12.8011
Perplexity: 50 | Silhouette Score: -0.1103 | Davies-Bouldin Index: 11.1982

Best Perplexity: 10 | Best Silhouette Score: -0.0776
The analysis revealed that a perplexity value of 10 provided the best results, with a Silhouette Score of -0.0776 and a Davies-Bouldin Index of 9.4520. This indicates that, among the tested values, perplexity 10 offers the best balance between cluster separation and compactness, making it the most suitable choice for visualizing the ResNet50 features.
Visualization of ResNet50 Features with t-SNE¶
To further analyze the distribution of features extracted from ResNet50, we employed t-SNE with the previously determined optimal perplexity value of 10. The resulting scatter plot effectively reduces the high-dimensional feature space to two dimensions, allowing us to visually assess the clustering of the different guitar pedal classes.
Each point in the t-SNE plot represents an individual feature vector, and the color coding corresponds to the various classes of guitar pedals. The plot reveals the spatial relationships between classes, highlighting areas where the features are well-separated and regions where classes overlap, which suggests potential challenges in distinguishing between certain pedal types.
def visualize_tsne(features, title):
    tsne = TSNE(n_components=2, perplexity=10, random_state=42)
    reduced_features_tsne = tsne.fit_transform(features)
    df_tsne = pd.DataFrame(reduced_features_tsne, columns=['Component 1', 'Component 2'])
    df_tsne['Class'] = object_classes

    plt.figure(figsize=(12, 10))
    sns.scatterplot(x='Component 1', y='Component 2', hue='Class', data=df_tsne, palette='tab20', s=60)
    plt.title(f'{title} - t-SNE Visualization')
    plt.xlabel('t-SNE Component 1')
    plt.ylabel('t-SNE Component 2')
    plt.legend(loc='upper right', title='Classes', bbox_to_anchor=(1.25, 1))
    plt.show()

def visualize_umap(features, title):
    reducer = umap.UMAP(n_components=2, random_state=42)
    reduced_features_umap = reducer.fit_transform(features)
    df_umap = pd.DataFrame(reduced_features_umap, columns=['Component 1', 'Component 2'])
    df_umap['Class'] = object_classes

    plt.figure(figsize=(12, 10))
    sns.scatterplot(x='Component 1', y='Component 2', hue='Class', data=df_umap, palette='tab20', s=60)
    plt.title(f'{title} - UMAP Visualization')
    plt.xlabel('UMAP Component 1')
    plt.ylabel('UMAP Component 2')
    plt.legend(loc='upper right', title='Classes', bbox_to_anchor=(1.25, 1))
    plt.show()
visualize_tsne(features_resnet, "ResNet50")
# visualize_tsne(features_inception, "InceptionV3")
# visualize_tsne(features_efficientnet, "EfficientNetB0")
# visualize_tsne(features_vgg, "VGG16")
# visualize_umap(features_resnet, "ResNet50")
# visualize_umap(features_inception, "InceptionV3")
# visualize_umap(features_efficientnet, "EfficientNetB0")
# visualize_umap(features_vgg, "VGG16")
The t-SNE plot reveals that while some classes are well-separated, indicating distinct feature representations, other clusters remain dense and are not properly separated. This suggests that certain classes share similar feature spaces, leading to overlapping clusters, which may present challenges in accurately distinguishing between these classes.
Focused t-SNE Visualization for Each Class¶
To further investigate the clustering behavior of each class, we applied a focused t-SNE visualization technique. In this approach, we generated a grid of plots where each plot emphasizes a specific class while maintaining the overall context of the data. This method allows us to observe how well each class is clustered in the feature space. The visualization revealed that while some classes exhibit clear and distinct clusters, others appear more diffuse or overlap with neighboring classes, highlighting areas where the model's feature representation may struggle to differentiate between certain categories. This focused analysis is crucial for identifying which specific classes require further attention in feature extraction or data augmentation efforts.
def visualize_tsne_with_focus_grid(features, title, class_names, grid_size=(7, 3)):
    tsne = TSNE(n_components=2, perplexity=10, random_state=42)
    reduced_features_tsne = tsne.fit_transform(features)
    df_tsne = pd.DataFrame(reduced_features_tsne, columns=['Component 1', 'Component 2'])
    df_tsne['Class'] = object_classes

    fig, axes = plt.subplots(grid_size[0], grid_size[1], figsize=(14, 28))
    fig.suptitle(f'{title} - t-SNE Visualization with Focus on Each Class', fontsize=16)

    for idx, focus_class in enumerate(class_names):
        ax = axes[idx // grid_size[1], idx % grid_size[1]]
        sns.scatterplot(
            x='Component 1', y='Component 2',
            hue='Class', data=df_tsne, palette='tab20',
            s=60, alpha=0.2, legend=False, ax=ax
        )
        sns.scatterplot(
            x='Component 1', y='Component 2',
            hue='Class', data=df_tsne[df_tsne['Class'] == focus_class],
            palette='tab20', s=60, ax=ax, legend=False
        )
        ax.set_title(f'{focus_class}')
        ax.set_xlabel('')
        ax.set_ylabel('')
        ax.set_aspect('equal')

    plt.tight_layout()
    plt.subplots_adjust(top=0.95)
    plt.show()
visualize_tsne_with_focus_grid(features_resnet, "ResNet Features", model_class_names, grid_size=(7, 3))
Upon examining the focused t-SNE visualizations, it becomes evident that while there are identifiable clusters corresponding to each class, the clustering quality is not optimal. In many cases, a single class is not confined to a single, cohesive cluster but instead is spread across multiple clusters, with some points appearing as outliers. This dispersion indicates that the feature space does not perfectly encapsulate the distinctions between classes, leading to overlaps and fragmented clusters. This finding suggests that further refinement in feature extraction or data preprocessing might be necessary to achieve better class separability.
K-Means Clustering Analysis¶
To further investigate the structure of the feature space and the effectiveness of feature extraction, we applied K-Means clustering using the number of clusters equal to the number of distinct classes in our dataset. By assigning each data point to one of these clusters, we aim to assess how well the ResNet50 features can differentiate between the various classes. The clustering performance is evaluated using two key metrics: the Silhouette Score and the Davies-Bouldin Index, which provide insight into the compactness and separation of the clusters. This analysis helps us understand the degree to which the extracted features allow for effective class separation.
n_clusters = len(model_class_names)
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
kmeans_labels = kmeans.fit_predict(features_resnet)
silhouette_kmeans = silhouette_score(features_resnet, kmeans_labels)
dbi_kmeans = davies_bouldin_score(features_resnet, kmeans_labels)
print(f'K-Means - Silhouette Score: {silhouette_kmeans}, Davies-Bouldin Index: {dbi_kmeans}')
/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:1416: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning super()._check_params_vs_input(X, default_n_init=10)
K-Means - Silhouette Score: 0.021858306601643562, Davies-Bouldin Index: 3.9885556190123777
To further explore the effectiveness of K-Means clustering, we visualized the resulting clusters using both t-SNE and UMAP, two powerful dimensionality reduction techniques. These visualizations allow us to project the high-dimensional ResNet50 features into a two-dimensional space, providing a clear and interpretable representation of the clusters formed by the K-Means algorithm. By examining these plots, we can assess how well the data points corresponding to different clusters are separated and whether the clusters are coherent and compact. The t-SNE and UMAP visualizations also offer a way to visually inspect the relationship between clusters, which is crucial for understanding the underlying structure of the data and identifying any potential overlap or misclassification among the clusters. While the following graph only displays the t-SNE visualization, the code provided includes both t-SNE and UMAP for a comprehensive analysis.
def visualize_clusters_tsne(features, kmeans_labels, title):
    tsne = TSNE(n_components=2, perplexity=10, random_state=42)
    reduced_features_tsne = tsne.fit_transform(features)
    df_tsne = pd.DataFrame(reduced_features_tsne, columns=['Component 1', 'Component 2'])
    df_tsne['Cluster'] = kmeans_labels

    plt.figure(figsize=(12, 10))
    sns.scatterplot(x='Component 1', y='Component 2', hue='Cluster', data=df_tsne, palette='tab20', s=60)
    plt.title(f'{title} - t-SNE Clustering Visualization')
    plt.xlabel('t-SNE Component 1')
    plt.ylabel('t-SNE Component 2')
    plt.legend(loc='upper right', title='Clusters', bbox_to_anchor=(1.25, 1))
    plt.show()

def visualize_clusters_umap(features, kmeans_labels, title):
    reducer = umap.UMAP(n_components=2, random_state=42)
    reduced_features_umap = reducer.fit_transform(features)
    df_umap = pd.DataFrame(reduced_features_umap, columns=['Component 1', 'Component 2'])
    df_umap['Cluster'] = kmeans_labels

    plt.figure(figsize=(12, 10))
    sns.scatterplot(x='Component 1', y='Component 2', hue='Cluster', data=df_umap, palette='tab20', s=60)
    plt.title(f'{title} - UMAP Clustering Visualization')
    plt.xlabel('UMAP Component 1')
    plt.ylabel('UMAP Component 2')
    plt.legend(loc='upper right', title='Clusters', bbox_to_anchor=(1.25, 1))
    plt.show()
visualize_clusters_tsne(features_resnet, kmeans_labels, "ResNet50")
# visualize_clusters_umap(features_resnet, kmeans_labels, "ResNet50")
Determining the Optimal Number of Clusters with K-Means¶
To identify the optimal number of clusters for our dataset, we performed a K-Means clustering analysis by iterating over a range of cluster counts from 2 to 35. For each value of k, the number of clusters, we computed two key metrics: the Silhouette Score and the Davies-Bouldin Index. The Silhouette Score measures how similar an object is to its own cluster compared to other clusters, with a higher score indicating better-defined clusters. The Davies-Bouldin Index, on the other hand, evaluates the average similarity ratio of each cluster with the cluster that is most similar to it, with lower values indicating better clustering quality. By comparing these metrics across different k values, we aimed to identify the number of clusters that balances these criteria, ultimately selecting the k that yields the highest Silhouette Score and the lowest Davies-Bouldin Index as the best choice for our data. The results of this analysis, including the optimal k value, are reported below.
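As a brief reference (these are the standard definitions, not specific to this notebook): for a point $i$, let $a(i)$ be its mean distance to the other points in its own cluster and $b(i)$ its mean distance to the points of the nearest other cluster. The Silhouette Score averages

$$s(i) = \frac{b(i) - a(i)}{\max\bigl(a(i),\, b(i)\bigr)}$$

over all points, so values close to 1 indicate tight, well-separated clusters. The Davies-Bouldin Index for $k$ clusters with centroids $c_i$ and mean intra-cluster distances $\sigma_i$ is

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(c_i, c_j)},$$

so lower values indicate more compact, better-separated clusters.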
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score
best_k = None
best_silhouette = -1
best_dbi = float('inf')

for k in range(2, 35):
    kmeans = KMeans(n_clusters=k, random_state=42).fit(features_resnet)
    labels = kmeans.labels_
    silhouette_avg = silhouette_score(features_resnet, labels)
    dbi_avg = davies_bouldin_score(features_resnet, labels)
    print(f"K: {k} | Silhouette Score: {silhouette_avg:.4f} | Davies-Bouldin Index: {dbi_avg:.4f}")
    if silhouette_avg > best_silhouette and dbi_avg < best_dbi:
        best_silhouette = silhouette_avg
        best_dbi = dbi_avg
        best_k = k

print(f"\nBest K: {best_k} | Best Silhouette Score: {best_silhouette:.4f} | Best Davies-Bouldin Index: {best_dbi:.4f}")
The following graph visualizes the relationship between the number of clusters (K) and two important clustering metrics: the Silhouette Score and the Davies-Bouldin Index. The Silhouette Score (in blue) is plotted on the left y-axis and provides insight into the cohesion of the clusters, where higher scores indicate more distinct and well-separated clusters. The Davies-Bouldin Index (in red) is plotted on the right y-axis and reflects the average similarity ratio between clusters, with lower values indicating better clustering quality. This dual-axis graph allows us to evaluate the trade-off between these two metrics as we vary the number of clusters from 2 to 34.
def plot_clustering_metrics(k_values, silhouette_scores, dbi_scores):
    fig, ax1 = plt.subplots(figsize=(10, 6))

    ax1.plot(k_values, silhouette_scores, 'b-o', label='Silhouette Score')
    ax1.set_xlabel('Number of Clusters (K)', fontsize=12)
    ax1.set_ylabel('Silhouette Score', color='b', fontsize=12)
    ax1.tick_params(axis='y', labelcolor='b')

    ax2 = ax1.twinx()
    ax2.plot(k_values, dbi_scores, 'r-o', label='Davies-Bouldin Index')
    ax2.set_ylabel('Davies-Bouldin Index', color='r', fontsize=12)
    ax2.tick_params(axis='y', labelcolor='r')

    plt.title('Clustering Metrics vs. Number of Clusters (K)', fontsize=14)
    ax1.grid(True)
    fig.tight_layout()
    plt.show()

k_values = list(range(2, 35))
silhouette_scores = [0.0626, 0.0412, 0.0391, 0.0317, 0.0148, 0.0185, 0.0181, 0.0158, 0.0217, 0.0192,
                     0.0209, 0.0240, 0.0253, 0.0245, 0.0214, 0.0197, 0.0322, 0.0268, 0.0272, 0.0235,
                     0.0247, 0.0242, 0.0248, 0.0224, 0.0236, 0.0261, 0.0186]
dbi_scores = [4.2887, 4.3776, 4.1218, 4.2828, 4.1023, 4.3009, 4.2201, 4.2114, 4.2212, 4.1433,
              4.1321, 4.1332, 3.9831, 3.8696, 4.0765, 3.8619, 3.8169, 3.8593, 3.6744, 3.8674,
              3.9347, 3.8280, 3.6525, 3.6435, 3.8327, 3.6215, 3.7469]

min_length = min(len(k_values), len(silhouette_scores), len(dbi_scores))
k_values = k_values[:min_length]
silhouette_scores = silhouette_scores[:min_length]
dbi_scores = dbi_scores[:min_length]

plot_clustering_metrics(k_values, silhouette_scores, dbi_scores)
From the graph, we observe that as the number of clusters increases, the Silhouette Score generally decreases, indicating that the clusters become less distinct. Simultaneously, the Davies-Bouldin Index fluctuates but shows a trend of decreasing with certain K values, suggesting periods of improved cluster separation. The analysis reveals that a cluster count of 26 provides the best balance, with a Silhouette Score of 0.0272 and a Davies-Bouldin Index of 3.6744, representing the optimal trade-off between cluster cohesion and separation within our dataset. This indicates that 26 clusters may be the most appropriate number for accurately capturing the underlying structure in the data.
We proceed by visualizing the clustering results for K = 26, which was identified as the optimal number of clusters based on our previous analysis. The K-Means algorithm is applied to the ResNet50 features, resulting in the assignment of each feature to one of the 26 clusters. By calculating the Silhouette Score and the Davies-Bouldin Index for this specific clustering, we can further assess the quality of the cluster formation and gain insights into how well the selected K value captures the structure within the data.
n_clusters = 26
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
kmeans_labels = kmeans.fit_predict(features_resnet)
silhouette_kmeans = silhouette_score(features_resnet, kmeans_labels)
dbi_kmeans = davies_bouldin_score(features_resnet, kmeans_labels)
visualize_clusters_tsne(features_resnet, kmeans_labels, "ResNet50")
The K-Means clustering analysis has provided valuable insights into the structure of the ResNet50 features. By experimenting with various values of K, we identified that K = 26 yielded the best clustering performance in terms of both the Silhouette Score and the Davies-Bouldin Index. These metrics suggest that while the clusters are somewhat distinct, there are still overlaps and inconsistencies, indicating that the feature space might not perfectly capture the separability of the classes.
It is important to note that the optimal K being larger than the number of classes suggests that the data might have finer substructures within each class or that some classes are more complex and require multiple clusters to capture their variation. This result underscores the inherent complexity of the dataset and suggests that while K-Means is a useful tool for exploring the data, the exact number of clusters does not necessarily have to match the number of classes. Instead, it can reflect the nuanced distribution of features within the dataset. This observation is crucial as it guides us in understanding the limitations and potential improvements needed for feature extraction and subsequent model development.
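To make the observation about clusters versus classes more concrete, the snippet below is a minimal sketch of how the agreement between the K = 26 cluster assignments and the ground-truth pedal classes could be quantified. It assumes an array of true class indices aligned with the rows of features_resnet (called true_class_labels here, a hypothetical name not defined in the cells above).
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def cluster_class_agreement(cluster_labels, class_labels):
    # Contingency table: rows are true classes, columns are clusters
    table = contingency_matrix(class_labels, cluster_labels)
    # Purity: fraction of samples that fall in the majority class of their cluster
    purity = np.sum(np.max(table, axis=0)) / np.sum(table)
    return {
        "ARI": adjusted_rand_score(class_labels, cluster_labels),
        "NMI": normalized_mutual_info_score(class_labels, cluster_labels),
        "purity": purity,
    }

# Example usage, assuming true_class_labels exists:
# cluster_class_agreement(kmeans_labels, true_class_labels)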
In conclusion, the exploratory data analysis (EDA) has shed light on the complexities and potential challenges within the dataset. Despite the use of advanced techniques such as t-SNE, UMAP, and K-Means clustering, the results indicate significant overlaps and inconsistencies across various classes. These findings suggest that the feature space, while informative, may not fully capture the nuances required for distinct class separations. The complexity of the data, as revealed through clustering performance, hints at potential issues such as overlapping class distributions, insufficient feature extraction, or inherent variability within the classes themselves. Moving forward, these insights will be crucial in guiding further analysis, model fine-tuning, and potentially revisiting data collection or preprocessing strategies to address the challenges identified during this EDA.
Classic Computer Vision Methods¶
Introduction¶
Classic computer vision methods form the foundation of many image processing and analysis techniques that were widely used before the advent of deep learning. These methods include edge detection, contour detection, feature detection and matching, and traditional object detection algorithms. In this section, we will explore each of these classic vision methods in detail and demonstrate their application through minimal, focused examples. By doing so, we will gain an understanding of how these techniques work and how they have contributed to the evolution of modern computer vision.
Edge Detection¶
Edge detection is a fundamental technique in image processing and computer vision. It aims to identify points in an image where the image brightness changes sharply, which typically indicate boundaries of objects. Detecting edges is crucial for tasks such as image segmentation, object detection, and recognition.
One of the most popular edge detection algorithms is the Canny edge detector. Developed by John F. Canny in 1986, the Canny edge detector is known for its optimal edge detection performance. It involves several steps including noise reduction, gradient calculation, non-maximum suppression, and edge tracking by hysteresis.
import cv2
import matplotlib.pyplot as plt
image_path = '/content/drive/MyDrive/pedal-model/images/test/BigSky Multi Reverb_31.jpg'
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
blurred_image = cv2.GaussianBlur(image, (5, 5), 0)
edges = cv2.Canny(blurred_image, 100, 200)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.title('Original Image')
plt.imshow(image, cmap='gray')
plt.subplot(1, 2, 2)
plt.title('Canny Edges')
plt.imshow(edges, cmap='gray')
plt.show()
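The thresholds of 100 and 200 above were chosen manually. As a small optional variation, the lower and upper hysteresis thresholds can be derived from the median intensity of the blurred image; the sigma value of 0.33 below is a common rule of thumb rather than a value used elsewhere in this project.
import cv2
import numpy as np

def auto_canny(gray_image, sigma=0.33):
    # Derive the Canny hysteresis thresholds from the median pixel intensity
    blurred = cv2.GaussianBlur(gray_image, (5, 5), 0)
    median = float(np.median(blurred))
    lower = int(max(0, (1.0 - sigma) * median))
    upper = int(min(255, (1.0 + sigma) * median))
    return cv2.Canny(blurred, lower, upper)

# edges_auto = auto_canny(image)  # `image` is the grayscale pedal image loaded above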
Harris Corner Detection¶
Harris Corner Detection is a popular technique used in computer vision to detect corners within an image. Corners are points where the image intensity changes significantly in multiple directions, making them useful features for various tasks such as image matching, object tracking, and 3D reconstruction.
The Harris Corner Detector, introduced by Chris Harris and Mike Stephens in 1988, is based on the local auto-correlation function of a signal. It identifies points where the surrounding pixels have significant intensity variations, indicating potential corners.
import cv2
import numpy as np
import matplotlib.pyplot as plt
image_path = '/content/drive/MyDrive/pedal-model/images/test/BigSky Multi Reverb_31.jpg'
image = cv2.imread(image_path)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray_image)
dst = cv2.cornerHarris(gray, 2, 3, 0.04)
dst = cv2.dilate(dst, None)
image[dst > 0.01 * dst.max()] = [0, 0, 255]
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.title('Original Image')
plt.imshow(cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2)
plt.title('Harris Corners')
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()
Contour Detection¶
Contour detection is a technique used in image processing to identify the boundaries of objects within an image. Contours are curves joining all the continuous points along a boundary that have the same color or intensity. They are very useful for shape analysis, object detection, and recognition.
In this section, we will use OpenCV's findContours function to detect contours in an image. This method helps in extracting the shapes of objects, which can be further used for various applications such as object recognition, image segmentation, and more.
import cv2
import matplotlib.pyplot as plt
image_path = '/content/drive/MyDrive/pedal-model/images/test/BigSky Multi Reverb_31.jpg'
image = cv2.imread(image_path)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray_image, 100, 200)
contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
contour_image = cv2.drawContours(image.copy(), contours, -1, (0, 255, 0), 2)
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.title('Original Image')
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2)
plt.title('Contours')
plt.imshow(cv2.cvtColor(contour_image, cv2.COLOR_BGR2RGB))
plt.show()
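As a short follow-up sketch, the detected contours can be filtered by area and summarized with bounding rectangles, which hints at how contour detection could be used to localize a pedal. The 500-pixel area threshold is an arbitrary illustrative value, not one tuned for this dataset.
min_area = 500
large_contours = [c for c in contours if cv2.contourArea(c) > min_area]
boxed_image = image.copy()
for c in large_contours:
    # Axis-aligned bounding rectangle of each remaining contour
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(boxed_image, (x, y), (x + w, y + h), (255, 0, 0), 2)
plt.figure(figsize=(6, 6))
plt.imshow(cv2.cvtColor(boxed_image, cv2.COLOR_BGR2RGB))
plt.title(f'{len(large_contours)} contours with area > {min_area}')
plt.axis('off')
plt.show()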
Feature Detection and Matching¶
Feature detection and matching are essential techniques in computer vision used to identify and match key points between images. These techniques are crucial for tasks such as image stitching, object recognition, and 3D reconstruction. In this section, we use the ORB (Oriented FAST and Rotated BRIEF) method for feature detection and matching.
ORB is a fast and efficient alternative to older methods like SIFT and SURF. It combines the FAST keypoint detector and the BRIEF descriptor with modifications to improve performance and rotation invariance. ORB is not only faster but also free to use without any licensing restrictions, making it an excellent choice for real-time applications and projects with resource constraints.
Here's an implementation of ORB for detecting and visualizing keypoints in an image:
import cv2
import matplotlib.pyplot as plt
image_path = '/content/drive/MyDrive/pedal-model/images/test/BigSky Multi Reverb_31.jpg'
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create()
keypoints, descriptors = orb.detectAndCompute(image, None)
keypoint_image = cv2.drawKeypoints(image, keypoints, None, color=(0, 255, 0))
# Display the result
plt.figure(figsize=(10, 5))
plt.title('ORB Keypoints')
plt.imshow(keypoint_image, cmap='gray')
plt.show()
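Since this section covers matching as well as detection, here is a minimal sketch of matching ORB descriptors between a test image and a training image of the same pedal. Because ORB descriptors are binary, the brute-force matcher uses the Hamming norm; the second image path is taken from the training set used elsewhere in this notebook.
img1 = cv2.imread('/content/drive/MyDrive/pedal-model/images/test/BigSky Multi Reverb_31.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('/content/drive/MyDrive/pedal-model/images/train/BigSky Multi Reverb_1.jpg', cv2.IMREAD_GRAYSCALE)
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
# Hamming distance for binary descriptors, with cross-checking for symmetric matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
matched = cv2.drawMatches(img1, kp1, img2, kp2, matches[:15], None, flags=2)
plt.figure(figsize=(15, 8))
plt.imshow(cv2.cvtColor(matched, cv2.COLOR_BGR2RGB))
plt.title('ORB Feature Matching (top 15 matches)')
plt.axis('off')
plt.show()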
SIFT for Feature Detection and Matching¶
Scale-Invariant Feature Transform (SIFT) is a powerful algorithm used for detecting and describing local features in images. It is particularly effective for matching key points between two images, even if the images are scaled, rotated, or have different illumination. SIFT identifies distinctive key points in the image and computes a descriptor for each key point, which can be used for matching points across different images.
In this section, we use SIFT to detect and match features between two images of a guitar pedal. This helps in understanding how different parts of the pedal can be identified and matched across various images.
import cv2
import matplotlib.pyplot as plt
sift = cv2.SIFT_create()
image_paths = [
'/content/drive/MyDrive/pedal-model/images/test/BigSky Multi Reverb_31.jpg',
'/content/drive/MyDrive/pedal-model/images/train/BigSky Multi Reverb_1.jpg',
'/content/drive/MyDrive/pedal-model/images/train/BigSky Multi Reverb_2.jpg'
]
for i in range(len(image_paths) - 1):
    img1 = cv2.imread(image_paths[i])
    gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    kp1, des1 = sift.detectAndCompute(gray1, None)
    img2 = cv2.imread(image_paths[i + 1])
    gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
    kp2, des2 = sift.detectAndCompute(gray2, None)
    # Brute-force matching with cross-checking; sort matches by descriptor distance
    bf = cv2.BFMatcher(crossCheck=True)
    matches = bf.match(des2, des1)
    matches = sorted(matches, key=lambda x: x.distance)
    img3 = cv2.drawMatches(img2, kp2, img1, kp1, matches[:10], None, flags=2)
    plt.figure(figsize=(20, 20))
    # drawMatches returns a BGR image, so convert to RGB before displaying
    plt.imshow(cv2.cvtColor(img3, cv2.COLOR_BGR2RGB))
    image1_name = image_paths[i + 1].split('/')[-1]
    image2_name = image_paths[i].split('/')[-1]
    plt.title(f'SIFT Feature Matching: {image1_name} vs {image2_name}')
    plt.axis('off')
    plt.show()
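An optional alternative to brute-force cross-checking is a FLANN-based matcher combined with Lowe's ratio test, which tends to discard ambiguous SIFT matches more aggressively. The 0.75 ratio is the value suggested in the original SIFT paper, not something tuned for this dataset; the sketch reuses the sift object and image_paths defined above.
img_a = cv2.imread(image_paths[0], cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread(image_paths[1], cv2.IMREAD_GRAYSCALE)
kp_a, des_a = sift.detectAndCompute(img_a, None)
kp_b, des_b = sift.detectAndCompute(img_b, None)
# KD-tree index (algorithm=1) is the usual FLANN choice for float descriptors like SIFT
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
knn_matches = flann.knnMatch(des_a, des_b, k=2)
good_matches = [m for m, n in knn_matches if m.distance < 0.75 * n.distance]
matched = cv2.drawMatches(img_a, kp_a, img_b, kp_b, good_matches[:20], None, flags=2)
plt.figure(figsize=(15, 8))
plt.imshow(cv2.cvtColor(matched, cv2.COLOR_BGR2RGB))
plt.title(f'SIFT + FLANN ratio test: {len(good_matches)} good matches')
plt.axis('off')
plt.show()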
Shi-Tomasi Corner Detection¶
Shi-Tomasi Corner Detection, also known as Good Features to Track, is an effective algorithm used for detecting corner points in images, which are good for tracking. This method is particularly useful for applications where feature tracking stability across multiple frames is crucial, such as in video analysis and 3D reconstruction.
import cv2
import numpy as np
import matplotlib.pyplot as plt
image_paths = [
'/content/drive/MyDrive/pedal-model/images/test/BigSky Multi Reverb_31.jpg',
'/content/drive/MyDrive/pedal-model/images/train/BigSky Multi Reverb_5.jpg',
]
n_features = 30
for image_path in image_paths:
img = cv2.imread(image_path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
corners = cv2.goodFeaturesToTrack(gray, n_features, 0.01, 10)
corners = np.intp(corners)
for i in corners:
x, y = i.ravel()
cv2.circle(img, (x, y), 3, 255, -1)
plt.figure(figsize=(10, 5))
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
image_name = image_path.split('/')[-1]
plt.title(f'{n_features} Features Detected in {image_name}')
plt.axis('off')
plt.show()
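Shi-Tomasi corners are designed to be tracked, so as a toy illustration the sketch below runs Lucas-Kanade optical flow from the first image to the second and reports how many corners survive. The two photos above are not consecutive video frames, so this only demonstrates the API rather than a realistic tracking result; the second image is resized to match the first because optical flow requires equally sized frames.
img_a = cv2.imread(image_paths[0])
img_b = cv2.imread(image_paths[1])
gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
gray_b = cv2.resize(gray_b, (gray_a.shape[1], gray_a.shape[0]))  # flow needs equal-sized frames
p0 = cv2.goodFeaturesToTrack(gray_a, n_features, 0.01, 10)
p1, status, err = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, p0, None)
tracked = int(status.sum()) if status is not None else 0
print(f"{tracked} of {len(p0)} Shi-Tomasi corners were tracked into the second image")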
After exploring these classic computer vision methods, we have gained insight into the fundamental techniques that have long been used to analyze and interpret images. While these methods are powerful in their simplicity and effectiveness for certain tasks, it becomes evident that they have limitations, especially when dealing with complex and high-dimensional data. As we move forward, the need for more sophisticated approaches, such as deep learning, becomes clear. These modern techniques offer greater accuracy, flexibility, and the ability to handle more intricate patterns in data, making them indispensable in the current landscape of computer vision.
Deep Learning Computer Vision Methods¶
Introduction¶
This section explores various deep learning techniques for computer vision tasks, specifically focusing on object detection and classification of guitar pedals. We will cover the use of the Detectron2 model and the YOLOv8 model, which is known for its superior performance in object detection.
Object Detection with Detectron2¶
Detectron2 is a cutting-edge, open-source library developed by Facebook AI Research (FAIR) for object detection and segmentation. It supports various architectures, including Faster R-CNN and Mask R-CNN, enabling high-accuracy detection. For my project, I used Detectron2 to create a model specifically for identifying guitar pedals. By fine-tuning a pre-trained model on a custom dataset, I achieved accurate detection and classification of different guitar pedals, aiding in better inventory management for musicians and collectors. Detectron2's flexibility and ease of use significantly streamlined the development and optimization process.
Setup¶
In this section, we load the Detectron2 model and annotations necessary for our object detection task. This setup ensures that the model and data are properly configured before moving forward with training and evaluation.
!python -m pip install pyyaml==5.1
import sys, os, distutils.core
!git clone 'https://github.com/facebookresearch/detectron2'
dist = distutils.core.run_setup("./detectron2/setup.py")
!python -m pip install {' '.join([f"'{x}'" for x in dist.install_requires])}
sys.path.insert(0, os.path.abspath('./detectron2'))
import torch, detectron2
!nvcc --version
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
print("detectron2:", detectron2.__version__)
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
import numpy as np
import os, json, cv2, random
from google.colab.patches import cv2_imshow
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer, ColorMode
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data.datasets import register_coco_instances
import json
def remap_category_ids(annotation_file):
    # Shift the COCO category ids in the annotation file from 0-based to 1-based.
    # Note: this rewrites the file in place, so it should only be run once;
    # running it again would shift the ids a second time.
    with open(annotation_file) as f:
        data = json.load(f)
    for cat in data["categories"]:
        cat["id"] += 1
    for ann in data["annotations"]:
        ann["category_id"] += 1
    with open(annotation_file, 'w') as f:
        json.dump(data, f, indent=4)
annotation_files = [
"/content/drive/MyDrive/pedal-model/Detectron2/train_coco_annotations.json",
"/content/drive/MyDrive/pedal-model/Detectron2/validation_coco_annotations.json",
"/content/drive/MyDrive/pedal-model/Detectron2/test_coco_annotations.json"
]
for file in annotation_files:
remap_category_ids(file)
def unregister_dataset(name):
if name in DatasetCatalog:
DatasetCatalog.pop(name)
if name in MetadataCatalog:
MetadataCatalog.pop(name)
unregister_dataset("train_dataset")
unregister_dataset("validation_dataset")
unregister_dataset("test_dataset")
register_coco_instances("train_dataset", {}, "/content/drive/MyDrive/pedal-model/Detectron2/train_coco_annotations.json", "/content/drive/MyDrive/pedal-model/images/train")
register_coco_instances("validation_dataset", {}, "/content/drive/MyDrive/pedal-model/Detectron2/validation_coco_annotations.json", "/content/drive/MyDrive/pedal-model/images/validation")
register_coco_instances("test_dataset", {}, "/content/drive/MyDrive/pedal-model/Detectron2/test_coco_annotations.json", "/content/drive/MyDrive/pedal-model/images/test")
train_metadata = MetadataCatalog.get("train_dataset")
train_dataset_dicts = DatasetCatalog.get("train_dataset")
val_metadata = MetadataCatalog.get("validation_dataset")
val_dataset_dicts = DatasetCatalog.get("validation_dataset")
test_metadata = MetadataCatalog.get("test_dataset")
test_dataset_dicts = DatasetCatalog.get("test_dataset")
[06/24 16:36:09 d2.data.datasets.coco]: Loaded 2567 images in COCO format from /content/drive/MyDrive/pedal-model/Detectron2/train_coco_annotations.json
[06/24 16:36:10 d2.data.datasets.coco]: Loaded 727 images in COCO format from /content/drive/MyDrive/pedal-model/Detectron2/validation_coco_annotations.json
[06/24 16:36:11 d2.data.datasets.coco]: Loaded 388 images in COCO format from /content/drive/MyDrive/pedal-model/Detectron2/test_coco_annotations.json
Training¶
In this section, we train the Detectron2 model with the loaded images and annotations.
from detectron2.engine import DefaultTrainer
cfg = get_cfg()
cfg.OUTPUT_DIR = "/content/drive/MyDrive/pedal-model/Detectron2"
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("train_dataset",)
cfg.DATASETS.TEST = ("test_dataset",)
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 15000
cfg.SOLVER.STEPS = []
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
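With the configuration above, DefaultTrainer does not run any evaluation while training. One optional variation, sketched below under the assumption that the registered datasets and cfg from the previous cell are still in scope, overrides build_evaluator and sets TEST.EVAL_PERIOD so that COCO metrics are computed on the validation set at regular intervals; the 1000-iteration period is an illustrative choice.
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class TrainerWithEval(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "eval")
        return COCOEvaluator(dataset_name, output_dir=output_folder)

cfg.DATASETS.TEST = ("validation_dataset",)  # evaluate on validation data during training
cfg.TEST.EVAL_PERIOD = 1000                  # run evaluation every 1000 iterations
trainer = TrainerWithEval(cfg)
trainer.resume_or_load(resume=False)
trainer.train()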
Model Evaluation¶
In this section, we will evaluate the performance of our Detectron2 object detection model. We will analyze key metrics to assess the model's accuracy in detecting and classifying objects. Additionally, confusion matrices will be used to visualize the performance across different classes, helping us identify any areas where the model may struggle or perform well. This comprehensive evaluation will provide insights into the strengths and weaknesses of the model, guiding any necessary adjustments or improvements.
General Evaluation¶
cfg.MODEL.WEIGHTS = os.path.join("/content/drive/MyDrive/pedal-model/Detectron2/model_final.pth")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7
predictor = DefaultPredictor(cfg)
[06/24 16:36:17 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /content/drive/MyDrive/pedal-model/Detectron2/model_final.pth ...
test_images_folder = "/content/drive/MyDrive/pedal-model/images/test/"
val_image_paths = [os.path.join(test_images_folder, img) for img in os.listdir(test_images_folder) if img.endswith(('jpg', 'jpeg', 'png'))]
metadata = MetadataCatalog.get(cfg.DATASETS.TRAIN[0])
for img_path in random.sample(val_image_paths, 3):
im = cv2.imread(img_path)
outputs = predictor(im)
instances = outputs["instances"].to("cpu")
pred_classes = instances.pred_classes.tolist()
class_names = [metadata.thing_classes[i] for i in pred_classes]
print(f"Predicted classes for {img_path}: {class_names}")
v = Visualizer(im[:, :, ::-1],
metadata=metadata,
scale=0.5,
instance_mode=ColorMode.IMAGE_BW
)
out = v.draw_instance_predictions(instances)
plt.imshow(out.get_image()[:, :, ::-1])
plt.axis('off')
plt.show()
Predicted classes for /content/drive/MyDrive/pedal-model/images/test/Lex V2 Rotary Modulation_153.jpg: ['18']
Predicted classes for /content/drive/MyDrive/pedal-model/images/test/UltraViolet Vintage Vibe_41.jpg: ['0']
Predicted classes for /content/drive/MyDrive/pedal-model/images/test/Deco V2 Tape Saturation & Doubletracker_237.jpg: ['14']
Validation Dataset¶
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
val_evaluator = COCOEvaluator("validation_dataset", output_dir="/content/drive/MyDrive/pedal-model/Detectron2/output")
val_loader = build_detection_test_loader(cfg, "validation_dataset")
coco_validation_inference = inference_on_dataset(predictor.model, val_loader, val_evaluator)
import seaborn as sns
import matplotlib.pyplot as plt
ap_metrics = ['AP', 'AP50', 'AP75', 'APs', 'APm', 'APl']
ap_values = [coco_validation_inference['bbox'][metric] for metric in ap_metrics]
ax = sns.barplot(x=ap_metrics, y=ap_values)
ax.set_title('Detectron2 Evaluation Metrics (Validation)')
ax.set_xlabel('Metric')
ax.set_ylabel('Value')
fig = plt.gcf()
fig.set_size_inches(10, 6)
for p in ax.patches:
ax.annotate('{:.3f}'.format(p.get_height()), (p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='bottom')
plt.show()
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
from sklearn.metrics import confusion_matrix, classification_report, precision_recall_curve
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import json
def extract_labels_from_coco_evaluator(evaluator):
    # Read the detections that the evaluator wrote to disk. Note that this file
    # contains predictions only, so both the "true" and "predicted" label lists
    # below are filled from the same predicted category ids; the resulting
    # confusion matrix therefore summarizes predicted class frequencies rather
    # than true-vs-predicted agreement.
    with open(os.path.join(evaluator._output_dir, "coco_instances_results.json")) as f:
        results = json.load(f)
    true_labels = {}
    pred_labels = {}
    for res in results:
        image_id = res['image_id']
        if image_id not in true_labels:
            true_labels[image_id] = []
            pred_labels[image_id] = []
        true_labels[image_id].append(res['category_id'])
        pred_labels[image_id].append(res['category_id'])
    return true_labels, pred_labels
true_labels, pred_labels = extract_labels_from_coco_evaluator(val_evaluator)
true_labels_flat = [label for labels in true_labels.values() for label in labels]
pred_labels_flat = [label for labels in pred_labels.values() for label in labels]
cm = confusion_matrix(true_labels_flat, pred_labels_flat)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix (Validation)')
plt.show()
Test Dataset¶
test_evaluator = COCOEvaluator("test_dataset", output_dir="/content/drive/MyDrive/pedal-model/Detectron2/output")
test_loader = build_detection_test_loader(cfg, "test_dataset")
coco_test_inference = inference_on_dataset(predictor.model, test_loader, test_evaluator)
ap_metrics = ['AP', 'AP50', 'AP75', 'APs', 'APm', 'APl']
ap_values = [coco_test_inference['bbox'][metric] for metric in ap_metrics]
ax = sns.barplot(x=ap_metrics, y=ap_values)
ax.set_title('Detectron2 Evaluation Metrics (Test)')
ax.set_xlabel('Metric')
ax.set_ylabel('Value')
fig = plt.gcf()
fig.set_size_inches(10, 6)
for p in ax.patches:
ax.annotate('{:.3f}'.format(p.get_height()), (p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='bottom')
plt.show()
def extract_labels_from_coco_evaluator(evaluator):
with open(os.path.join(evaluator._output_dir, "coco_instances_results.json")) as f:
results = json.load(f)
true_labels = {}
pred_labels = {}
for res in results:
image_id = res['image_id']
if image_id not in true_labels:
true_labels[image_id] = []
pred_labels[image_id] = []
true_labels[image_id].append(res['category_id'])
pred_labels[image_id].append(res['category_id'])
return true_labels, pred_labels
true_labels, pred_labels = extract_labels_from_coco_evaluator(test_evaluator)
true_labels_flat = [label for labels in true_labels.values() for label in labels]
pred_labels_flat = [label for labels in pred_labels.values() for label in labels]
cm = confusion_matrix(true_labels_flat, pred_labels_flat)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix (Test)')
plt.show()
Class Evaluation¶
Validation Dataset¶
coco_inf_dict = coco_validation_inference['bbox'].copy()
delete = []
for i in coco_inf_dict.keys():
if i.split("-")[0] != 'AP':
delete.append(i)
delete.append('AP')
for i in delete:
del coco_inf_dict[i]
class_names = [
"UltraViolet Vintage Vibe",
"Brig dBucket Delay",
"Cloudburst Ambient Reverb",
"BigSky Multi Reverb",
"TimeLine Multi Delay",
"Mobius Multi Modulation",
"Iridium Amp Modeler And Cab",
"Compadre Compressor & Boost",
"NightSky Experimental Reverb",
"Volante Magnetic Tape Delay",
"Zelzah Phaser & Modulation",
"Sunset Dual Overdrive",
"Riverside Drive & Distortion",
"blueSky V2 Reverb",
"Deco V2 Tape Saturation & Doubletracker",
"DIG V2 Dual Digital Delay",
"El Capistan V2 Tape Delay",
"Flint V2 Tremolo & Reverb",
"Lex V2 Rotary Modulation",
"Ola Chorus & Vibrato",
"Orbit Flanger"
]
ax = sns.barplot(x=class_names, y=[map for map in coco_inf_dict.values()])
ax.set_title('Detectron2 Evaluation Metrics (Validation)')
ax.set_xlabel('Classes')
ax.set_ylabel('Value')
fig = plt.gcf()
fig.set_size_inches(12, 12)
for p in ax.patches:
ax.annotate('{:.3f}'.format(p.get_height()), (p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='bottom')
ax.set_xticks(range(len(class_names)))
ax.set_xticklabels(class_names, rotation=90)
plt.show()
Test Dataset¶
coco_inf_dict = coco_test_inference['bbox'].copy()
delete = []
for i in coco_inf_dict.keys():
if i.split("-")[0] != 'AP':
delete.append(i)
delete.append('AP')
for i in delete:
del coco_inf_dict[i]
class_names = [
"UltraViolet Vintage Vibe",
"Brig dBucket Delay",
"Cloudburst Ambient Reverb",
"BigSky Multi Reverb",
"TimeLine Multi Delay",
"Mobius Multi Modulation",
"Iridium Amp Modeler And Cab",
"Compadre Compressor & Boost",
"NightSky Experimental Reverb",
"Volante Magnetic Tape Delay",
"Zelzah Phaser & Modulation",
"Sunset Dual Overdrive",
"Riverside Drive & Distortion",
"blueSky V2 Reverb",
"Deco V2 Tape Saturation & Doubletracker",
"DIG V2 Dual Digital Delay",
"El Capistan V2 Tape Delay",
"Flint V2 Tremolo & Reverb",
"Lex V2 Rotary Modulation",
"Ola Chorus & Vibrato",
"Orbit Flanger"
]
ax = sns.barplot(x=class_names, y=[map for map in coco_inf_dict.values()])
ax.set_title('Detectron2 Evaluation Metrics (Test)')
ax.set_xlabel('Classes')
ax.set_ylabel('Value')
fig = plt.gcf()
fig.set_size_inches(12, 12)
for p in ax.patches:
ax.annotate('{:.3f}'.format(p.get_height()), (p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='bottom')
ax.set_xticks(range(len(class_names)))
ax.set_xticklabels(class_names, rotation=90)
plt.show()
Evaluation on Unlabeled Online Images¶
test_images_folder = "/content/drive/MyDrive/pedal-model/examples/reddit"
image_paths = [os.path.join(test_images_folder, img) for img in os.listdir(test_images_folder) if img.endswith(('jpg', 'jpeg', 'png'))]
model_class_names = [
"UltraViolet Vintage Vibe",
"Brig dBucket Delay",
"Cloudburst Ambient Reverb",
"BigSky Multi Reverb",
"TimeLine Multi Delay",
"Mobius Multi Modulation",
"Iridium Amp Modeler And Cab",
"Compadre Compressor & Boost",
"NightSky Experimental Reverb",
"Volante Magnetic Tape Delay",
"Zelzah Phaser & Modulation",
"Sunset Dual Overdrive",
"Riverside Drive & Distortion",
"blueSky V2 Reverb",
"Deco V2 Tape Saturation & Doubletracker",
"DIG V2 Dual Digital Delay",
"El Capistan V2 Tape Delay",
"Flint V2 Tremolo & Reverb",
"Lex V2 Rotary Modulation",
"Ola Chorus & Vibrato",
"Orbit Flanger"
]
for img_path in image_paths:
im = cv2.imread(img_path)
if im is None:
print(f"Error loading image: {img_path}")
continue
outputs = predictor(im)
instances = outputs["instances"].to("cpu")
pred_classes = instances.pred_classes.tolist()
class_names = [model_class_names[i] for i in pred_classes]
print(f"Predicted classes for {img_path}: {class_names}")
formatted_results = "Detection Results\n"
formatted_results += f"Image: {img_path}\n"
formatted_results += "---\n"
if instances.has("pred_boxes"):
for i in range(len(instances)):
class_id = pred_classes[i]
class_name = model_class_names[class_id]
confidence = instances.scores[i].item()
box = instances.pred_boxes[i].tensor.numpy().tolist()[0]
formatted_results += f"Class Name: {class_name}\n"
formatted_results += f"Confidence: {confidence:.2f}\n"
formatted_results += f"Box Coordinates: {box}\n"
formatted_results += "---\n"
print(formatted_results)
v = Visualizer(im[:, :, ::-1],
metadata=metadata,
scale=0.5,
instance_mode=ColorMode.IMAGE_BW
)
out = v.draw_instance_predictions(instances)
plt.imshow(out.get_image()[:, :, ::-1])
plt.axis('off')
plt.show()
Predicted classes for /content/drive/MyDrive/pedal-model/examples/reddit/reddit1.jpeg: ['Mobius Multi Modulation', 'BigSky Multi Reverb', 'Iridium Amp Modeler And Cab', 'Riverside Drive & Distortion', 'Deco V2 Tape Saturation & Doubletracker'] Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit1.jpeg --- Class Name: Mobius Multi Modulation Confidence: 0.96 Box Coordinates: [601.0153198242188, 348.0445861816406, 874.806640625, 547.9111328125] --- Class Name: BigSky Multi Reverb Confidence: 0.94 Box Coordinates: [291.7544250488281, 77.62279510498047, 577.2451171875, 286.0135803222656] --- Class Name: Iridium Amp Modeler And Cab Confidence: 0.80 Box Coordinates: [104.19577026367188, 84.00048065185547, 290.5104675292969, 274.38385009765625] --- Class Name: Riverside Drive & Distortion Confidence: 0.78 Box Coordinates: [882.333984375, 343.5067138671875, 1043.6500244140625, 526.3818359375] --- Class Name: Deco V2 Tape Saturation & Doubletracker Confidence: 0.76 Box Coordinates: [355.3375549316406, 359.4771728515625, 595.5907592773438, 551.387939453125] ---
Predicted classes for /content/drive/MyDrive/pedal-model/examples/reddit/reddit2.jpeg: ['TimeLine Multi Delay'] Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit2.jpeg --- Class Name: TimeLine Multi Delay Confidence: 0.95 Box Coordinates: [183.55455017089844, 209.4958953857422, 589.1721801757812, 605.0693969726562] ---
Predicted classes for /content/drive/MyDrive/pedal-model/examples/reddit/reddit3.jpeg: ['BigSky Multi Reverb', 'TimeLine Multi Delay', 'Mobius Multi Modulation'] Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit3.jpeg --- Class Name: BigSky Multi Reverb Confidence: 0.99 Box Coordinates: [18.185808181762695, 269.9821472167969, 147.65084838867188, 361.276123046875] --- Class Name: TimeLine Multi Delay Confidence: 0.98 Box Coordinates: [138.09800720214844, 266.8824157714844, 259.9830017089844, 353.4806823730469] --- Class Name: Mobius Multi Modulation Confidence: 0.87 Box Coordinates: [260.4153747558594, 267.8182678222656, 379.6722717285156, 348.8308410644531] ---
Predicted classes for /content/drive/MyDrive/pedal-model/examples/reddit/reddit4.jpeg: ['Flint V2 Tremolo & Reverb'] Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit4.jpeg --- Class Name: Flint V2 Tremolo & Reverb Confidence: 0.99 Box Coordinates: [52.06399917602539, 48.28710174560547, 201.0768585205078, 210.287841796875] ---
Predicted classes for /content/drive/MyDrive/pedal-model/examples/reddit/reddit5.jpeg: ['Volante Magnetic Tape Delay'] Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit5.jpeg --- Class Name: Volante Magnetic Tape Delay Confidence: 0.89 Box Coordinates: [39.6470832824707, 286.4281311035156, 263.1510314941406, 438.95843505859375] ---
Object Detection with YOLOv8¶
YOLOv8, the latest iteration in the YOLO (You Only Look Once) series, is a state-of-the-art object detection model known for its speed and accuracy. Developed to push the boundaries of real-time object detection, YOLOv8 incorporates advancements in architecture and training techniques to deliver high performance across various detection tasks. For my project, I utilized YOLOv8 to develop a model specifically for identifying guitar pedals. By training the model on a custom dataset, I achieved precise detection and classification of various guitar pedals, which is invaluable for inventory management and collection organization. YOLOv8's balance of efficiency and accuracy made it an excellent choice for this application, simplifying the deployment and fine-tuning process.
Setup¶
In this section, we load the YOLOv8 model and annotations necessary for our object detection task. This setup ensures that the model and data are properly configured before moving forward with training and evaluation.
import locale
from IPython.display import Image
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_curve, average_precision_score
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
locale.getpreferredencoding = lambda: "UTF-8"
!pip install shap
!pip install ultralytics
import os
from ultralytics import YOLO
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from IPython.display import display
import cv2
class_names = [
"UltraViolet Vintage Vibe", "Brig dBucket Delay", "Cloudburst Ambient Reverb",
"BigSky Multi Reverb", "TimeLine Multi Delay", "Mobius Multi Modulation",
"Iridium Amp Modeler And Cab", "Compadre Compressor & Boost", "NightSky Experimental Reverb",
"Volante Magnetic Tape Delay", "Zelzah Phaser & Modulation", "Sunset Dual Overdrive",
"Riverside Drive & Distortion", "blueSky V2 Reverb", "Deco V2 Tape Saturation & Doubletracker",
"DIG V2 Dual Digital Delay", "El Capistan V2 Tape Delay", "Flint V2 Tremolo & Reverb",
"Lex V2 Rotary Modulation", "Ola Chorus & Vibrato", "Orbit Flanger"
]
Training¶
In this section, we train the YOLOv8 model with the loaded images and annotations.
model = YOLO("yolov8n.yaml")
file_path = '/content/drive/MyDrive/pedal-model/config.yaml'
results = model.train(data=file_path, epochs=100)
!scp -r runs/detect/train '/content/drive/MyDrive/pedal-model/yolov8'
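For reference, the call below is a more explicit version of the same training run. The extra arguments (image size, batch size, early-stopping patience, and an output location on Drive) are illustrative assumptions, not the settings used for the results reported later; the run above relied on the library defaults with epochs=100.
model = YOLO("yolov8n.yaml")
results = model.train(
    data=file_path,      # same config.yaml as above
    epochs=100,
    imgsz=640,           # input resolution
    batch=16,            # batch size
    patience=25,         # stop early if validation mAP stops improving
    project='/content/drive/MyDrive/pedal-model/yolov8',  # write runs directly to Drive
    name='train',
)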
Model Evaluation¶
In this section, we will evaluate the performance of our YOLOv8 object detection model. We will analyze key metrics to assess the model's accuracy in detecting and classifying objects. Additionally, confusion matrices will be used to visualize the performance across different classes, helping us identify any areas where the model may struggle or perform well. This comprehensive evaluation will provide insights into the strengths and weaknesses of the model, guiding any necessary adjustments or improvements.
confusion_matrix_path = '/content/drive/MyDrive/pedal-model/yolov8/train/confusion_matrix.png'
confusion_matrix_normalized_path = '/content/drive/MyDrive/pedal-model/yolov8/train/confusion_matrix_normalized.png'
display(Image(filename=confusion_matrix_path, width=1000), Image(filename=confusion_matrix_normalized_path, width=1000))
confusion_matrix_path = '/content/drive/MyDrive/pedal-model/yolov8/train/results.png'
Image(filename=confusion_matrix_path, width=1000)
General Evaluation¶
model = YOLO('/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt')
Validation Dataset¶
validation_results = model.val(data='/content/drive/MyDrive/pedal-model/config.yaml')
!scp -r runs/detect/val '/content/drive/MyDrive/pedal-model/yolov8/validation'
print(f"Precision: {validation_results.box.map50:.4f}")
print(f"Recall: {validation_results.box.map75:.4f}")
print(f"mAP@0.5: {validation_results.box.map50:.4f}")
print(f"mAP@0.5:0.95: {validation_results.box.map:.4f}")
for i, class_name in enumerate(class_names):
print(f"AP for {class_name}: {validation_results.box.ap[i]:.4f}")
mAP@0.5: 0.9592
mAP@0.75: 0.9501
mAP@0.5:0.95: 0.8945
AP for UltraViolet Vintage Vibe: 0.9021
AP for Brig dBucket Delay: 0.9161
AP for Cloudburst Ambient Reverb: 0.8880
AP for BigSky Multi Reverb: 0.8386
AP for TimeLine Multi Delay: 0.8604
AP for Mobius Multi Modulation: 0.9444
AP for Iridium Amp Modeler And Cab: 0.9069
AP for Compadre Compressor & Boost: 0.9155
AP for NightSky Experimental Reverb: 0.7696
AP for Volante Magnetic Tape Delay: 0.9576
AP for Zelzah Phaser & Modulation: 0.8717
AP for Sunset Dual Overdrive: 0.9107
AP for Riverside Drive & Distortion: 0.9197
AP for blueSky V2 Reverb: 0.8279
AP for Deco V2 Tape Saturation & Doubletracker: 0.8705
AP for DIG V2 Dual Digital Delay: 0.8948
AP for El Capistan V2 Tape Delay: 0.9211
AP for Flint V2 Tremolo & Reverb: 0.8853
AP for Lex V2 Rotary Modulation: 0.9287
AP for Ola Chorus & Vibrato: 0.9230
AP for Orbit Flanger: 0.9330
ap_metrics = ['AP', 'AP50', 'AP75']
ap_values = [
validation_results.box.map,
validation_results.box.map50,
validation_results.box.map75
]
ax = sns.barplot(x=ap_metrics, y=ap_values)
ax.set_title('YOLOv8 Evaluation Metrics (Validation)')
ax.set_xlabel('Metric')
ax.set_ylabel('Value')
fig = plt.gcf()
fig.set_size_inches(10, 6)
for p in ax.patches:
ax.annotate('{:.3f}'.format(p.get_height()), (p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='bottom')
plt.show()
val_image_path = '/content/drive/MyDrive/pedal-model/images/validation'
val_results = model.predict(source=val_image_path, save=True, conf=0.25)
!scp -r runs/detect/predict '/content/drive/MyDrive/pedal-model/yolov8/validation'
val_image_path = '/content/drive/MyDrive/pedal-model/images/validation'
val_annotations_path = '/content/drive/MyDrive/pedal-model/Detectron2/validation_coco_annotations.json'
with open(val_annotations_path) as f:
annotations = json.load(f)
image_id_to_file = {image['id']: image['file_name'] for image in annotations['images']}
all_true_labels = []
all_pred_labels = []
for result in val_results:
image_file = os.path.basename(result.path)
image_id = None
for img_id, file_name in image_id_to_file.items():
if file_name == image_file:
image_id = img_id
break
if image_id is None:
# print(f"No matching annotation found for the image: {image_file}")
continue
true_labels = [ann['category_id'] - 1 for ann in annotations['annotations'] if ann['image_id'] == image_id]
pred_labels = [int(box.cls.cpu().numpy()) for box in result.boxes]
min_len = min(len(true_labels), len(pred_labels))
true_labels = true_labels[:min_len]
pred_labels = pred_labels[:min_len]
all_true_labels.extend(true_labels)
all_pred_labels.extend(pred_labels)
print("Validation Set Metrics")
print(classification_report(all_true_labels, all_pred_labels, target_names=class_names))
average_precisions_val = {}
for i, class_name in enumerate(class_names):
y_true = [1 if label == i else 0 for label in all_true_labels]
y_scores = [1 if label == i else 0 for label in all_pred_labels]
average_precisions_val[class_name] = average_precision_score(y_true, y_scores)
mAP_val = sum(average_precisions_val.values()) / len(average_precisions_val)
print(f"Mean Average Precision (mAP) for Validation Set: {mAP_val}")
cm_val = confusion_matrix(all_true_labels, all_pred_labels)
plt.figure(figsize=(10, 8))
sns.heatmap(cm_val, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix for Validation Set')
plt.show()
for i, class_name in enumerate(class_names):
y_true = [1 if label == i else 0 for label in all_true_labels]
y_scores = [1 if label == i else 0 for label in all_pred_labels]
precision, recall, _ = precision_recall_curve(y_true, y_scores)
plt.plot(recall, precision, label=class_name)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curves for Validation Set')
plt.legend()
plt.show()
Validation Set Metrics

                                          precision  recall  f1-score  support
UltraViolet Vintage Vibe                       0.95    1.00      0.97       36
Brig dBucket Delay                             0.90    0.93      0.92       29
Cloudburst Ambient Reverb                      0.91    0.87      0.89       45
BigSky Multi Reverb                            0.71    0.76      0.74       42
TimeLine Multi Delay                           0.85    0.92      0.88       62
Mobius Multi Modulation                        0.86    0.90      0.88       40
Iridium Amp Modeler And Cab                    0.87    0.91      0.89       43
Compadre Compressor & Boost                    0.85    1.00      0.92       34
NightSky Experimental Reverb                   1.00    0.75      0.86        8
Volante Magnetic Tape Delay                    0.89    0.86      0.87       28
Zelzah Phaser & Modulation                     1.00    0.89      0.94       18
Sunset Dual Overdrive                          0.85    0.80      0.83       56
Riverside Drive & Distortion                   0.86    0.88      0.87       48
blueSky V2 Reverb                              0.71    0.62      0.66       39
Deco V2 Tape Saturation & Doubletracker        0.80    0.80      0.80       50
DIG V2 Dual Digital Delay                      0.84    0.86      0.85       37
El Capistan V2 Tape Delay                      0.85    0.79      0.81       42
Flint V2 Tremolo & Reverb                      0.87    0.80      0.84       41
Lex V2 Rotary Modulation                       0.79    0.71      0.75       21
Ola Chorus & Vibrato                           0.93    0.95      0.94       41
Orbit Flanger                                  0.96    0.92      0.94       26

accuracy                                                         0.86      786
macro avg                                      0.87    0.85      0.86      786
weighted avg                                   0.86    0.86      0.86      786

Mean Average Precision (mAP) for Validation Set: 0.7514018265124781
Test Dataset¶
test_image_path = '/content/drive/MyDrive/pedal-model/images/test'
test_results = model.predict(source=test_image_path, save=True, conf=0.25)
!scp -r runs/detect/predict '/content/drive/MyDrive/pedal-model/yolov8/test'
test_image_path = '/content/drive/MyDrive/pedal-model/images/test'
test_annotations_path = '/content/drive/MyDrive/pedal-model/Detectron2/test_coco_annotations.json'
with open(test_annotations_path) as f:
annotations = json.load(f)
image_id_to_file = {image['id']: image['file_name'] for image in annotations['images']}
all_true_labels = []
all_pred_labels = []
for result in test_results:
image_file = os.path.basename(result.path)
image_id = None
for img_id, file_name in image_id_to_file.items():
if file_name == image_file:
image_id = img_id
break
if image_id is None:
# print(f"No matching annotation found for the image: {image_file}")
continue
true_labels = [ann['category_id'] - 1 for ann in annotations['annotations'] if ann['image_id'] == image_id]
pred_labels = [int(box.cls.cpu().numpy()) for box in result.boxes]
min_len = min(len(true_labels), len(pred_labels))
true_labels = true_labels[:min_len]
pred_labels = pred_labels[:min_len]
all_true_labels.extend(true_labels)
all_pred_labels.extend(pred_labels)
print("Test Set Metrics")
print(classification_report(all_true_labels, all_pred_labels, target_names=class_names))
average_precisions_test = {}
for i, class_name in enumerate(class_names):
y_true = [1 if label == i else 0 for label in all_true_labels]
y_scores = [1 if label == i else 0 for label in all_pred_labels]
average_precisions_test[class_name] = average_precision_score(y_true, y_scores)
mAP_val = sum(average_precisions_test.values()) / len(average_precisions_test)
print(f"Mean Average Precision (mAP) for Test Set: {mAP_val}")
cm_val = confusion_matrix(all_true_labels, all_pred_labels)
plt.figure(figsize=(10, 8))
sns.heatmap(cm_val, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix for Test Set')
plt.show()
for i, class_name in enumerate(class_names):
y_true = [1 if label == i else 0 for label in all_true_labels]
y_scores = [1 if label == i else 0 for label in all_pred_labels]
precision, recall, _ = precision_recall_curve(y_true, y_scores)
plt.plot(recall, precision, label=class_name)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curves for Test Set')
plt.legend()
plt.show()
Test Set Metrics

                                          precision  recall  f1-score  support
UltraViolet Vintage Vibe                       0.90    1.00      0.95       18
Brig dBucket Delay                             0.93    0.93      0.93       15
Cloudburst Ambient Reverb                      0.96    1.00      0.98       23
BigSky Multi Reverb                            0.89    0.81      0.85       21
TimeLine Multi Delay                           0.91    0.91      0.91       34
Mobius Multi Modulation                        0.79    0.88      0.83       25
Iridium Amp Modeler And Cab                    0.85    0.74      0.79       23
Compadre Compressor & Boost                    0.82    0.90      0.86       20
NightSky Experimental Reverb                   0.71    0.83      0.77        6
Volante Magnetic Tape Delay                    0.75    0.75      0.75       16
Zelzah Phaser & Modulation                     1.00    0.93      0.97       15
Sunset Dual Overdrive                          0.88    0.84      0.86       25
Riverside Drive & Distortion                   0.96    0.93      0.94       27
blueSky V2 Reverb                              0.88    0.88      0.88       16
Deco V2 Tape Saturation & Doubletracker        0.86    0.78      0.82       23
DIG V2 Dual Digital Delay                      0.90    0.86      0.88       22
El Capistan V2 Tape Delay                      0.70    0.80      0.74       20
Flint V2 Tremolo & Reverb                      0.83    0.79      0.81       19
Lex V2 Rotary Modulation                       0.88    0.78      0.82        9
Ola Chorus & Vibrato                           0.86    0.90      0.88       21
Orbit Flanger                                  1.00    1.00      1.00       14

accuracy                                                         0.87      412
macro avg                                      0.87    0.87      0.87      412
weighted avg                                   0.87    0.87      0.87      412

Mean Average Precision (mAP) for Test Set: 0.7657297057588388
Evaluation on Unlabeled Online Images¶
weights_path_50_epochs = '/content/drive/MyDrive/pedal-model/train/weights/best.pt'
weights_path_100_epochs = '/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt'
model = YOLO(weights_path_100_epochs)
test_folder = '/content/drive/MyDrive/pedal-model/examples/reddit'
confidence_threshold = 0.4
for file_name in os.listdir(test_folder):
if file_name.endswith(('.jpg', '.jpeg', '.png')):
image_path = os.path.join(test_folder, file_name)
img = Image.open(image_path)
img = img.convert("RGB")
results = model.predict(img)
draw = ImageDraw.Draw(img)
formatted_results = "Detection Results\n"
formatted_results += f"Image: {file_name}\n"
formatted_results += "---\n"
if results[0].boxes is not None:
for box in results[0].boxes:
confidence = box.conf.item()
if confidence < confidence_threshold:
continue
xyxy = box.xyxy[0].tolist()
class_id = int(box.cls.item())
class_name = model.names[class_id]
formatted_results += f"Class ID {class_id}\n"
formatted_results += f"Confidence: {confidence:.2f}\n"
formatted_results += f"Class Name: {class_name}\n"
formatted_results += f"Box Coordinates: {xyxy}\n"
formatted_results += "---\n"
draw.rectangle(xyxy, outline="red", width=1)
draw.text((xyxy[0], xyxy[1]), f"{class_name} ({confidence:.2f})", fill="red")
print(formatted_results)
new_height = 400
original_width, original_height = img.size
aspect_ratio = original_width / original_height
new_width = int(new_height * aspect_ratio)
resized_img = img.resize((new_width, new_height))
display(resized_img)
# output_path = os.path.join(test_folder, f'{os.path.splitext(file_name)[0]}_bounding_box.jpg')
# img.save(output_path)
0: 384x640 1 BigSky Multi Reverb, 1 Mobius Multi Modulation, 1 Riverside Drive & Distortion, 1 El Capistan V2 Tape Delay, 8.1ms Speed: 2.8ms preprocess, 8.1ms inference, 2.0ms postprocess per image at shape (1, 3, 384, 640) Detection Results Image: reddit1.jpeg --- Class ID 3 Confidence: 0.97 Class Name: BigSky Multi Reverb Box Coordinates: [292.5159912109375, 81.20706176757812, 575.243408203125, 281.5001525878906] --- Class ID 5 Confidence: 0.91 Class Name: Mobius Multi Modulation Box Coordinates: [598.2708129882812, 350.1661376953125, 878.6636962890625, 542.2813720703125] --- Class ID 12 Confidence: 0.66 Class Name: Riverside Drive & Distortion Box Coordinates: [885.5018920898438, 339.68890380859375, 1060.32275390625, 522.9464721679688] --- Class ID 16 Confidence: 0.48 Class Name: El Capistan V2 Tape Delay Box Coordinates: [105.57154083251953, 77.17510223388672, 293.5381164550781, 279.9639587402344] ---
0: 640x544 1 Brig dBucket Delay, 1 BigSky Multi Reverb, 2 blueSky V2 Reverbs, 1 Deco V2 Tape Saturation & Doubletracker, 10.4ms Speed: 3.1ms preprocess, 10.4ms inference, 1.7ms postprocess per image at shape (1, 3, 640, 544) Detection Results Image: reddit2.jpeg --- Class ID 13 Confidence: 0.78 Class Name: blueSky V2 Reverb Box Coordinates: [632.1692504882812, 508.9538269042969, 816.3902587890625, 707.2691650390625] --- Class ID 3 Confidence: 0.75 Class Name: BigSky Multi Reverb Box Coordinates: [626.4446411132812, 517.4884033203125, 809.6668701171875, 691.3198852539062] ---
0: 480x640 1 BigSky Multi Reverb, 3 TimeLine Multi Delays, 1 Mobius Multi Modulation, 1 Sunset Dual Overdrive, 1 DIG V2 Dual Digital Delay, 1 El Capistan V2 Tape Delay, 8.3ms Speed: 1.4ms preprocess, 8.3ms inference, 1.3ms postprocess per image at shape (1, 3, 480, 640) Detection Results Image: reddit3.jpeg --- Class ID 3 Confidence: 0.99 Class Name: BigSky Multi Reverb Box Coordinates: [12.33538818359375, 269.9579772949219, 147.18484497070312, 358.0860900878906] --- Class ID 4 Confidence: 0.99 Class Name: TimeLine Multi Delay Box Coordinates: [140.02883911132812, 269.4693603515625, 263.2460632324219, 356.1165771484375] --- Class ID 5 Confidence: 0.92 Class Name: Mobius Multi Modulation Box Coordinates: [260.2956848144531, 263.983154296875, 380.1332092285156, 346.42822265625] --- Class ID 4 Confidence: 0.69 Class Name: TimeLine Multi Delay Box Coordinates: [243.48297119140625, 186.133544921875, 357.07037353515625, 266.628173828125] ---
0: 480x640 1 Flint V2 Tremolo & Reverb, 7.8ms Speed: 1.4ms preprocess, 7.8ms inference, 1.3ms postprocess per image at shape (1, 3, 480, 640) Detection Results Image: reddit4.jpeg --- Class ID 17 Confidence: 0.97 Class Name: Flint V2 Tremolo & Reverb Box Coordinates: [42.50926208496094, 64.2927474975586, 197.7730712890625, 211.38192749023438] ---
0: 640x640 1 Iridium Amp Modeler And Cab, 1 Volante Magnetic Tape Delay, 1 Deco V2 Tape Saturation & Doubletracker, 8.4ms Speed: 2.7ms preprocess, 8.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640) Detection Results Image: reddit5.jpeg --- Class ID 9 Confidence: 0.90 Class Name: Volante Magnetic Tape Delay Box Coordinates: [38.19185256958008, 302.3080749511719, 255.0037384033203, 434.0383605957031] ---
Model Analysis¶
model = YOLO('/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt')
Analysis of Object Features Using Dimensionality Reduction and Clustering¶
In this analysis, we aim to explore the features extracted by the YOLOv8 model for different object classes. After detecting and cropping objects from images using bounding boxes derived from the YOLO annotation files, we extract the high-dimensional features using the backbone of the YOLOv8 model.
Given the high dimensionality of these features, we first apply Principal Component Analysis (PCA) to reduce the number of dimensions, making the data more manageable while retaining most of the variance. Subsequently, we use t-Distributed Stochastic Neighbor Embedding (t-SNE) for further dimensionality reduction, which is particularly effective at preserving the local structure of the data, revealing potential clusters or groupings of similar objects.
By applying K-Means clustering to these reduced features, we aim to identify distinct clusters that may correspond to different object classes. The resulting visualizations using both t-SNE and PCA will allow us to evaluate how well the YOLOv8 model’s feature representations align with the actual object classes and whether similar objects are grouped together, providing insights into the effectiveness of the model’s feature extraction process.
model = YOLO('/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt').model
backbone = torch.nn.Sequential(*list(model.children())[:-1])
def preprocess_image(image_path):
image = Image.open(image_path).convert('RGB')
image_resized = image.resize((640, 640))
image_np = np.array(image_resized) / 255.0
image_tensor = torch.tensor(image_np).permute(2, 0, 1).unsqueeze(0).float()
return image, image_tensor
def get_bounding_boxes_yolo(annotation_path, image_width, image_height):
bounding_boxes = []
if not os.path.exists(annotation_path):
return bounding_boxes
with open(annotation_path, 'r') as file:
lines = file.readlines()
for line in lines:
class_id, center_x, center_y, width, height = map(float, line.strip().split())
x1 = int((center_x - width / 2) * image_width)
y1 = int((center_y - height / 2) * image_height)
x2 = int((center_x + width / 2) * image_width)
y2 = int((center_y + height / 2) * image_height)
if x1 < 0: x1 = 0
if y1 < 0: y1 = 0
if x2 > image_width: x2 = image_width
if y2 > image_height: y2 = image_height
if x2 > x1 and y2 > y1:
bounding_boxes.append([x1, y1, x2, y2])
return bounding_boxes
def extract_object_features(model, backbone, image_paths, annotation_dir):
model.eval()
backbone.eval()
all_features = []
with torch.no_grad():
for path in image_paths:
image_name = os.path.basename(path)
annotation_path = os.path.join(annotation_dir, image_name.replace('.jpg', '.txt'))
if not os.path.exists(annotation_path):
continue
original_image, _ = preprocess_image(path)
image_width, image_height = original_image.size
bounding_boxes = get_bounding_boxes_yolo(annotation_path, image_width, image_height)
if not bounding_boxes:
continue
for bbox in bounding_boxes:
x1, y1, x2, y2 = bbox
cropped_img = original_image.crop((x1, y1, x2, y2))
cropped_img = cropped_img.resize((640, 640))
cropped_img_np = np.array(cropped_img) / 255.0
cropped_img_tensor = torch.tensor(cropped_img_np).permute(2, 0, 1).unsqueeze(0).float()
features = backbone(cropped_img_tensor).squeeze().cpu().numpy()
all_features.append(features)
all_features = np.array(all_features)
return all_features.reshape(all_features.shape[0], -1)
def get_image_paths(base_dir):
image_paths = []
for root, dirs, files in os.walk(base_dir):
for file in files:
if file.endswith(".jpg"):
image_paths.append(os.path.join(root, file))
return image_paths
image_paths = get_image_paths('/content/drive/MyDrive/pedal-model/images/test/')
annotation_dir = '/content/drive/MyDrive/pedal-model/labels/test/'
features = extract_object_features(model, backbone, image_paths, annotation_dir)
if features.size > 0:
pca = PCA(n_components=50)
reduced_features_pca = pca.fit_transform(features)
tsne = TSNE(n_components=2, perplexity=30, n_iter=300, random_state=42)
reduced_features_tsne = tsne.fit_transform(reduced_features_pca)
kmeans = KMeans(n_clusters=21, n_init=10)
clusters_tsne = kmeans.fit_predict(reduced_features_tsne)
plt.figure(figsize=(10, 7))
plt.scatter(reduced_features_tsne[:, 0], reduced_features_tsne[:, 1], c=clusters_tsne, cmap='viridis')
plt.title('t-SNE Clustering of Annotated Object Features')
plt.xlabel('t-SNE Component 1')
plt.ylabel('t-SNE Component 2')
plt.colorbar()
plt.show()
pca = PCA(n_components=2)
reduced_features_pca_2d = pca.fit_transform(features)
kmeans_pca = KMeans(n_clusters=21, n_init=10)
clusters_pca = kmeans_pca.fit_predict(reduced_features_pca_2d)
plt.figure(figsize=(10, 7))
plt.scatter(reduced_features_pca_2d[:, 0], reduced_features_pca_2d[:, 1], c=clusters_pca, cmap='plasma')
plt.title('PCA Clustering of Annotated Object Features')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.colorbar()
plt.show()
In this analysis, we set the number of clusters k = 21 equal to the number of object classes in our dataset. The rationale behind this choice was to investigate whether the features extracted by the YOLOv8 model are distinct enough to separate the objects into their respective classes.
The resulting clusters from the t-SNE and PCA visualizations show how well the model's feature representations align with the true class labels. Ideally, if the model has learned highly discriminative features, each cluster would predominantly contain objects from a single class, reflecting the effectiveness of the model in distinguishing between different classes.
If the clusters align well with the object classes, this indicates that the YOLOv8 model's backbone is effectively capturing the unique characteristics of each class. On the other hand, if there is significant overlap between clusters or if objects from different classes are grouped together, this could suggest that the features are not fully separable and that further fine-tuning or additional data might be required to improve the model's discriminative power.
This analysis not only validates the performance of the model but also provides insights into areas where the model may be improved, particularly in enhancing the feature extraction process to better distinguish between similar objects.
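A quantitative complement to this visual inspection would be to compare the cluster assignments with the ground-truth class of each cropped object. The sketch below assumes such a list exists (object_class_ids, aligned one-to-one with the rows of features; the extraction loop above does not currently return it) and reports two standard agreement scores.
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def report_cluster_alignment(cluster_labels, object_class_ids):
    # Both metrics are invariant to the arbitrary numbering of the clusters
    ari = adjusted_rand_score(object_class_ids, cluster_labels)
    nmi = normalized_mutual_info_score(object_class_ids, cluster_labels)
    print(f"Adjusted Rand Index: {ari:.3f}")
    print(f"Normalized Mutual Information: {nmi:.3f}")

# Example usage once object_class_ids has been collected alongside the features:
# report_cluster_alignment(clusters_tsne, object_class_ids)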
Visualizing the model's decision-making process¶
In this implementation, we utilize Grad-CAM (Gradient-weighted Class Activation Mapping) to interpret and visualize the YOLOv8 model's decision-making process. Grad-CAM is a technique designed to provide visual explanations for convolutional neural networks (CNNs) by highlighting the regions of an input image that are most relevant to the model's prediction.
Grad-CAM works by calculating the gradients of the target class scores with respect to the feature maps of the last convolutional layer in the network. These gradients indicate the importance of each feature map for the target prediction. Grad-CAM then computes a weighted sum of these feature maps, where the weights are derived from the gradients. The resulting heatmap is then upsampled to match the input image dimensions, showing the regions that contribute most to the class prediction.
YOLOv8 is an object detection model that outputs bounding boxes and class probabilities for multiple objects in an image. In the context of YOLOv8, Grad-CAM can be used to identify which parts of the image the model focuses on when detecting a specific object. By applying Grad-CAM to the output of the YOLOv8 model, we can generate a heatmap that highlights the regions of the image that are most important for the detection of a particular class.
By incorporating Grad-CAM into our analysis, we gain valuable insights into the YOLOv8 model's inner workings, allowing us to interpret its predictions more effectively and ensure that the model is making decisions based on relevant features in the input data.
In this example, we analyze the YOLOv8 model's detection of the BigSky Multi Reverb pedal, which is located in the bottom left corner of the image. The Grad-CAM visualizations across different layers of the model reveal how the model processes the image and identifies the pedal at various stages of its architecture.
Early Layers (Layers 0 and 1): The first few layers, such as Layer 0 and Layer 1, are primarily responsible for detecting basic features like edges, textures, and simple patterns. These layers produce broader and more diffuse heatmaps, indicating that the model is capturing general information about the image. The heatmaps at these stages are less focused on specific objects and more on the overall structure, providing foundational details that are further refined in subsequent layers.
Middle and Late Layers: As we move deeper into the network, the Grad-CAM heatmaps become more focused and concentrated around the objects of interest. These layers (e.g., Layers 8, 12, and 16) capture more complex and abstract features, such as parts of objects or specific patterns that are crucial for the model's final decision. The heatmaps generated in these layers often highlight the most salient parts of the object being detected, showing the model's increased confidence and specificity.
Final Layers: In the last few layers, the model’s focus becomes very sharp, often zeroing in on the exact location of the object. These layers are critical for the final object detection and classification, as they aggregate the high-level features detected in previous layers and use them to make precise predictions.
The progression from broad, general feature detection in the early layers to highly specific, detailed focus in the later layers illustrates the model's ability to understand and interpret the image in a hierarchical manner. This layered approach allows the YOLOv8 model to effectively balance between understanding the overall context of the image and identifying specific objects within it.
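Before running Grad-CAM, it can help to list the modules of the detection model to decide which layer indices are worth visualizing. A minimal sketch, assuming the same best.pt checkpoint used throughout this notebook: the .model attribute of the loaded YOLO object is the underlying torch module, and its own .model attribute is the sequential layer stack indexed by the Grad-CAM code below.

from ultralytics import YOLO

# Load the trained detector and enumerate its layer stack;
# the printed indices correspond to layers_to_visualize below.
yolo = YOLO('/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt')
for idx, layer in enumerate(yolo.model.model):
    print(idx, layer.__class__.__name__)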
import torch
import cv2
import numpy as np
import matplotlib.pyplot as plt
from ultralytics import YOLO
from PIL import Image

class GradCam:
    def __init__(self, model, target_layer):
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activations = None
        self.hook_layers()

    def hook_layers(self):
        # The forward hook stores the layer's activations; the backward hook stores its gradients.
        def forward_hook(module, input, output):
            if isinstance(output, tuple):
                self.activations = output[0]
            else:
                self.activations = output

        def backward_hook(module, grad_in, grad_out):
            self.gradients = grad_out[0]

        self.target_layer.register_forward_hook(forward_hook)
        self.target_layer.register_backward_hook(backward_hook)

    def generate_cam(self, input_tensor, class_idx):
        input_tensor.requires_grad_()
        outputs = self.model(input_tensor)[0]
        class_outputs = outputs[:, 5:]
        target = class_outputs[:, class_idx].mean()
        self.model.zero_grad()
        target.backward(retain_graph=True)
        # Global-average-pool the gradients to get one weight per feature map channel.
        gradients = self.gradients.cpu().data.numpy()
        activations = self.activations.cpu().data.numpy()[0]
        weights = np.mean(gradients, axis=(2, 3))[0]
        cam = np.zeros(activations.shape[1:], dtype=np.float32)
        for i, w in enumerate(weights):
            cam += w * activations[i, :, :]
        cam = np.maximum(cam, 0)
        cam = cv2.resize(cam, (input_tensor.shape[2], input_tensor.shape[3]))
        cam = cam - np.min(cam)
        cam = cam / np.max(cam)
        return cam

def preprocess_image(image_path):
    image = Image.open(image_path).convert('RGB')
    image_resized = image.resize((640, 640))
    image_np = np.array(image_resized) / 255.0
    image_tensor = torch.tensor(image_np).permute(2, 0, 1).unsqueeze(0).float()
    return image_tensor, image_resized

def visualize_gradcams(model, image_class_pairs, class_names, layers_to_visualize):
    for image_path, class_indices in image_class_pairs:
        input_tensor, original_image = preprocess_image(image_path)
        num_layers = len(layers_to_visualize)
        num_images_per_row = 3
        num_rows = (num_layers // num_images_per_row) + (num_layers % num_images_per_row > 0)
        fig, axes = plt.subplots(num_rows, num_images_per_row, figsize=(15, 5 * num_rows))
        if num_rows == 1:
            axes = np.expand_dims(axes, 0)
        for row_idx, class_idx in enumerate(class_indices):
            class_name = class_names[class_idx]
            for i, layer_idx in enumerate(layers_to_visualize):
                target_layer = model.model[layer_idx]
                grad_cam = GradCam(model=model, target_layer=target_layer)
                input_tensor.requires_grad_()
                cam = grad_cam.generate_cam(input_tensor, class_idx)
                row, col = divmod(i, num_images_per_row)
                axes[row, col].imshow(original_image)
                axes[row, col].imshow(cam, cmap='jet', alpha=0.5)
                axes[row, col].set_title(f"Grad-CAM: Layer {layer_idx}")
                axes[row, col].axis('off')
        plt.tight_layout()
        plt.show()

model = YOLO('/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt').model
image_class_pairs = [
    ('/content/drive/MyDrive/pedal-model/examples/reddit/reddit3.jpeg', [3]),
]
layers_to_visualize = list(range(22))
visualize_gradcams(model, image_class_pairs, class_names, layers_to_visualize)
Model Comparison¶
Validation Dataset¶
In this section, we compare Detectron2 and YOLOv8 on the validation set to evaluate how well each identifies guitar pedals. The comparison focuses on detection accuracy and overall effectiveness, in particular each model's precision and its ability to handle the complexities of guitar pedal images. Although speed is a crucial factor in real-world applications, it was not benchmarked in this analysis; nonetheless, understanding the trade-off between accuracy and speed is essential when selecting the most appropriate model for a specific task.
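As a reminder of what the reported metrics mean: average precision (AP) is the area under a class's precision-recall curve at a fixed IoU threshold, mAP averages AP over all 21 classes, and the COCO-style mAP@[0.5:0.95] further averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05:

$$\text{mAP@}t = \frac{1}{N}\sum_{c=1}^{N} \text{AP}_c^{\,\text{IoU}=t}, \qquad \text{mAP@}[0.5{:}0.95] = \frac{1}{10}\sum_{t\in\{0.50,\,0.55,\,\dots,\,0.95\}} \text{mAP@}t$$

with $N = 21$ classes in this dataset. Ultralytics reports these values in the 0-1 range, while Detectron2's COCOEvaluator reports them on a 0-100 scale, which is why the YOLO values are scaled by 100 before plotting below.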
model = YOLO('/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt')
yolo_validation_results = model.val(data='/content/drive/MyDrive/pedal-model/config.yaml')
yolo_metrics = {
    'mAP@0.5': yolo_validation_results.box.map50,
    'mAP@0.75': yolo_validation_results.box.map75,
    'mAP@[0.5:0.95]': yolo_validation_results.box.map
}
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
detectron_evaluator = COCOEvaluator("validation_dataset", output_dir="/content/drive/MyDrive/pedal-model/Detectron2/output")
detectron_val_loader = build_detection_test_loader(cfg, "validation_dataset")
detectron_validation_results = inference_on_dataset(predictor.model, detectron_val_loader, detectron_evaluator)
# Extract Detectron2 metrics
detectron_metrics = {
    'mAP@0.5': detectron_validation_results['bbox']['AP50'],
    'mAP@0.75': detectron_validation_results['bbox']['AP75'],
    'mAP@[0.5:0.95]': detectron_validation_results['bbox']['AP']
}
import matplotlib.pyplot as plt
import numpy as np

# Collect the metric values in the same order as the `metrics` labels below
yolo_values = list(yolo_metrics.values())
detectron_values = list(detectron_metrics.values())

# Normalize YOLO values to be on the same scale as Detectron2 (0-100)
normalized_yolo_values = [value * 100 for value in yolo_values]
metrics = ['mAP@0.5', 'mAP@0.75', 'mAP@[0.5:0.95]']
x = np.arange(len(metrics))
width = 0.35
fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width/2, normalized_yolo_values, width, label='YOLOv8')
rects2 = ax.bar(x + width/2, detectron_values, width, label='Detectron2')
ax.set_xlabel('Metrics')
ax.set_title('Model Performance Comparison')
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.legend()
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
fig.tight_layout()
plt.show()
Test Dataset¶
test_image_path = '/content/drive/MyDrive/pedal-model/images/test'
test_results = model.predict(source=test_image_path, save=True, conf=0.25)
test_evaluator = COCOEvaluator("test_dataset", output_dir="/content/drive/MyDrive/pedal-model/Detectron2/output")
test_loader = build_detection_test_loader(cfg, "test_dataset")
coco_test_inference = inference_on_dataset(predictor.model, test_loader, test_evaluator)
import os
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import json
from collections import defaultdict
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_curve, average_precision_score
class_names = [
"UltraViolet Vintage Vibe", "Brig dBucket Delay", "Cloudburst Ambient Reverb",
"BigSky Multi Reverb", "TimeLine Multi Delay", "Mobius Multi Modulation",
"Iridium Amp Modeler And Cab", "Compadre Compressor & Boost", "NightSky Experimental Reverb",
"Volante Magnetic Tape Delay", "Zelzah Phaser & Modulation", "Sunset Dual Overdrive",
"Riverside Drive & Distortion", "blueSky V2 Reverb", "Deco V2 Tape Saturation & Doubletracker",
"DIG V2 Dual Digital Delay", "El Capistan V2 Tape Delay", "Flint V2 Tremolo & Reverb",
"Lex V2 Rotary Modulation", "Ola Chorus & Vibrato", "Orbit Flanger"
]
def evaluate_yolo(test_results, annotations, class_names):
    image_id_to_file = {image['id']: image['file_name'] for image in annotations['images']}
    all_true_labels = []
    all_pred_labels = []
    for result in test_results:
        image_file = os.path.basename(result.path)
        image_id = None
        for img_id, file_name in image_id_to_file.items():
            if file_name == image_file:
                image_id = img_id
                break
        if image_id is None:
            continue
        true_labels = [ann['category_id'] - 1 for ann in annotations['annotations'] if ann['image_id'] == image_id]
        # Extract each predicted class index as a scalar (avoids the NumPy array-to-scalar deprecation warning)
        pred_labels = [int(box.cls.item()) for box in result.boxes]
        min_len = min(len(true_labels), len(pred_labels))
        true_labels = true_labels[:min_len]
        pred_labels = pred_labels[:min_len]
        all_true_labels.extend(true_labels)
        all_pred_labels.extend(pred_labels)
    average_precisions_test = {}
    for i, class_name in enumerate(class_names):
        y_true = [1 if label == i else 0 for label in all_true_labels]
        y_scores = [1 if label == i else 0 for label in all_pred_labels]
        average_precisions_test[class_name] = average_precision_score(y_true, y_scores)
    mAP_test = sum(average_precisions_test.values()) / len(average_precisions_test)
    cm_test = confusion_matrix(all_true_labels, all_pred_labels)
    return cm_test, average_precisions_test, mAP_test
def evaluate_detectron(test_evaluator, class_names):
    ap_metrics = ['AP', 'AP50', 'AP75', 'APs', 'APm', 'APl']
    ap_values = [coco_test_inference['bbox'][metric] for metric in ap_metrics]

    def extract_labels_from_coco_evaluator(evaluator):
        with open(os.path.join(evaluator._output_dir, "coco_instances_results.json")) as f:
            results = json.load(f)
        true_labels = {}
        pred_labels = {}
        for res in results:
            image_id = res['image_id']
            if image_id not in true_labels:
                true_labels[image_id] = []
                pred_labels[image_id] = []
            true_labels[image_id].append(res['category_id'])
            pred_labels[image_id].append(res['category_id'])
        return true_labels, pred_labels

    true_labels, pred_labels = extract_labels_from_coco_evaluator(test_evaluator)
    true_labels_flat = [label for labels in true_labels.values() for label in labels]
    pred_labels_flat = [label for labels in pred_labels.values() for label in labels]
    cm = confusion_matrix(true_labels_flat, pred_labels_flat)
    return cm, ap_values
test_annotations_path = '/content/drive/MyDrive/pedal-model/Detectron2/test_coco_annotations.json'
with open(test_annotations_path) as f:
    annotations = json.load(f)
cm_yolo, ap_yolo, map_yolo = evaluate_yolo(test_results, annotations, class_names)
cm_detectron, ap_detectron = evaluate_detectron(test_evaluator, class_names)
fig, axes = plt.subplots(1, 2, figsize=(20, 10))
sns.heatmap(cm_yolo, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names, ax=axes[0])
axes[0].set_title('YOLOv8 - Confusion Matrix for Test Set')
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('True')
sns.heatmap(cm_detectron, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names, ax=axes[1])
axes[1].set_title('Detectron2 - Confusion Matrix for Test Set')
axes[1].set_xlabel('Predicted')
axes[1].set_ylabel('True')
plt.show()
metrics = ['mAP@0.5']
yolo_values = [map_yolo * 100]
detectron_values = [ap_detectron[1]]
x = np.arange(len(metrics))
width = 0.35
fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width/2, yolo_values, width, label='YOLOv8')
rects2 = ax.bar(x + width/2, detectron_values, width, label='Detectron2')
ax.set_xlabel('Metrics')
ax.set_title('Model Performance Comparison')
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.legend()
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
fig.tight_layout()
plt.show()
Evaluation on Unlabeled Online Images¶
test_images_folder = "/content/drive/MyDrive/pedal-model/examples/reddit"
image_paths = [os.path.join(test_images_folder, img) for img in os.listdir(test_images_folder) if img.endswith(('jpg', 'jpeg', 'png'))]
model_class_names = [
"UltraViolet Vintage Vibe",
"Brig dBucket Delay",
"Cloudburst Ambient Reverb",
"BigSky Multi Reverb",
"TimeLine Multi Delay",
"Mobius Multi Modulation",
"Iridium Amp Modeler And Cab",
"Compadre Compressor & Boost",
"NightSky Experimental Reverb",
"Volante Magnetic Tape Delay",
"Zelzah Phaser & Modulation",
"Sunset Dual Overdrive",
"Riverside Drive & Distortion",
"blueSky V2 Reverb",
"Deco V2 Tape Saturation & Doubletracker",
"DIG V2 Dual Digital Delay",
"El Capistan V2 Tape Delay",
"Flint V2 Tremolo & Reverb",
"Lex V2 Rotary Modulation",
"Ola Chorus & Vibrato",
"Orbit Flanger"
]
weights_path_50_epochs = '/content/drive/MyDrive/pedal-model/train/weights/best.pt'
weights_path_100_epochs = '/content/drive/MyDrive/pedal-model/yolov8/train/weights/best.pt'
yolo_model = YOLO(weights_path_100_epochs)
confidence_threshold = 0.4
for img_path in image_paths:
    im = cv2.imread(img_path)
    if im is None:
        print(f"Error loading image: {img_path}")
        continue

    # Detectron2 Predictions
    outputs = predictor(im)
    instances = outputs["instances"].to("cpu")
    pred_classes = instances.pred_classes.tolist()
    detectron_class_names = [model_class_names[i] for i in pred_classes]
    detectron_results = "Detectron2 Detection Results\n"
    detectron_results += f"Image: {img_path}\n"
    detectron_results += "---\n"
    if instances.has("pred_boxes"):
        for i in range(len(instances)):
            class_id = pred_classes[i]
            class_name = model_class_names[class_id]
            confidence = instances.scores[i].item()
            box = instances.pred_boxes[i].tensor.numpy().tolist()[0]
            detectron_results += f"Class Name: {class_name}\n"
            detectron_results += f"Confidence: {confidence:.2f}\n"
            detectron_results += f"Box Coordinates: {box}\n"
            detectron_results += "---\n"
    print(detectron_results)
    v = Visualizer(im[:, :, ::-1], metadata=metadata, scale=0.5, instance_mode=ColorMode.IMAGE_BW)
    out_detectron = v.draw_instance_predictions(instances)

    # YOLO Predictions
    img = Image.open(img_path)
    img = img.convert("RGB")
    results = yolo_model.predict(img)
    draw = ImageDraw.Draw(img)
    yolo_results = "YOLOv8 Detection Results\n"
    yolo_results += f"Image: {os.path.basename(img_path)}\n"
    yolo_results += "---\n"
    if results[0].boxes is not None:
        for box in results[0].boxes:
            confidence = box.conf.item()
            if confidence < confidence_threshold:
                continue
            xyxy = box.xyxy[0].tolist()
            class_id = int(box.cls.item())
            class_name = yolo_model.names[class_id]
            yolo_results += f"Class Name: {class_name}\n"
            yolo_results += f"Confidence: {confidence:.2f}\n"
            yolo_results += f"Box Coordinates: {xyxy}\n"
            yolo_results += "---\n"
            draw.rectangle(xyxy, outline="red", width=1)
            draw.text((xyxy[0], xyxy[1]), f"{class_name} ({confidence:.2f})", fill="red")
    print(yolo_results)

    new_height = 400
    original_width, original_height = img.size
    aspect_ratio = original_width / original_height
    new_width = int(new_height * aspect_ratio)
    resized_img = img.resize((new_width, new_height))
    fig, axes = plt.subplots(1, 2, figsize=(20, 10))
    axes[0].imshow(out_detectron.get_image()[:, :, ::-1])
    axes[0].set_title('Detectron2')
    axes[0].axis('off')
    axes[1].imshow(resized_img)
    axes[1].set_title('YOLOv8')
    axes[1].axis('off')
    plt.show()
Detectron2 Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit1.jpeg --- Class Name: Mobius Multi Modulation Confidence: 0.96 Box Coordinates: [601.0228881835938, 348.0437927246094, 874.80859375, 547.9163208007812] --- Class Name: BigSky Multi Reverb Confidence: 0.94 Box Coordinates: [291.76092529296875, 77.6230239868164, 577.240478515625, 286.0150451660156] --- Class Name: Iridium Amp Modeler And Cab Confidence: 0.80 Box Coordinates: [104.19615936279297, 84.00297546386719, 290.5064392089844, 274.3720703125] --- Class Name: Riverside Drive & Distortion Confidence: 0.78 Box Coordinates: [882.3353881835938, 343.4981689453125, 1043.64990234375, 526.3865966796875] --- Class Name: Deco V2 Tape Saturation & Doubletracker Confidence: 0.76 Box Coordinates: [355.34124755859375, 359.4776611328125, 595.5956420898438, 551.3861694335938] --- 0: 384x640 1 BigSky Multi Reverb, 1 Mobius Multi Modulation, 1 Riverside Drive & Distortion, 1 El Capistan V2 Tape Delay, 7.7ms Speed: 2.1ms preprocess, 7.7ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640) YOLOv8 Detection Results Image: reddit1.jpeg --- Class Name: BigSky Multi Reverb Confidence: 0.97 Box Coordinates: [292.527587890625, 81.20374298095703, 575.2493896484375, 281.49517822265625] --- Class Name: Mobius Multi Modulation Confidence: 0.91 Box Coordinates: [598.27197265625, 350.1677551269531, 878.662841796875, 542.2828979492188] --- Class Name: Riverside Drive & Distortion Confidence: 0.66 Box Coordinates: [885.5093383789062, 339.6796875, 1060.3341064453125, 522.9439697265625] --- Class Name: El Capistan V2 Tape Delay Confidence: 0.48 Box Coordinates: [105.58024597167969, 77.16706085205078, 293.5355529785156, 279.9627990722656] ---
Detectron2 Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit2.jpeg --- Class Name: TimeLine Multi Delay Confidence: 0.95 Box Coordinates: [183.55113220214844, 209.49179077148438, 589.1635131835938, 605.0616455078125] --- 0: 640x544 1 Brig dBucket Delay, 1 BigSky Multi Reverb, 2 blueSky V2 Reverbs, 1 Deco V2 Tape Saturation & Doubletracker, 7.6ms Speed: 3.0ms preprocess, 7.6ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 544) YOLOv8 Detection Results Image: reddit2.jpeg --- Class Name: blueSky V2 Reverb Confidence: 0.78 Box Coordinates: [632.1800537109375, 508.9481506347656, 816.4094848632812, 707.26513671875] --- Class Name: BigSky Multi Reverb Confidence: 0.75 Box Coordinates: [626.4519653320312, 517.4832763671875, 809.6593017578125, 691.3115234375] ---
Detectron2 Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit3.jpeg --- Class Name: BigSky Multi Reverb Confidence: 0.99 Box Coordinates: [18.17702293395996, 269.9841613769531, 147.6508331298828, 361.2760009765625] --- Class Name: TimeLine Multi Delay Confidence: 0.98 Box Coordinates: [138.09625244140625, 266.88397216796875, 259.9827880859375, 353.48394775390625] --- Class Name: Mobius Multi Modulation Confidence: 0.87 Box Coordinates: [260.41436767578125, 267.818603515625, 379.6730041503906, 348.82952880859375] --- 0: 480x640 1 BigSky Multi Reverb, 3 TimeLine Multi Delays, 1 Mobius Multi Modulation, 1 Sunset Dual Overdrive, 1 DIG V2 Dual Digital Delay, 1 El Capistan V2 Tape Delay, 8.9ms Speed: 1.3ms preprocess, 8.9ms inference, 1.5ms postprocess per image at shape (1, 3, 480, 640) YOLOv8 Detection Results Image: reddit3.jpeg --- Class Name: BigSky Multi Reverb Confidence: 0.99 Box Coordinates: [12.335342407226562, 269.9551696777344, 147.1851348876953, 358.0820617675781] --- Class Name: TimeLine Multi Delay Confidence: 0.99 Box Coordinates: [140.0283203125, 269.46240234375, 263.247314453125, 356.1163330078125] --- Class Name: Mobius Multi Modulation Confidence: 0.92 Box Coordinates: [260.2930908203125, 263.9822998046875, 380.1297607421875, 346.4306640625] --- Class Name: TimeLine Multi Delay Confidence: 0.69 Box Coordinates: [243.48562622070312, 186.1275634765625, 357.0733947753906, 266.6241455078125] ---
Detectron2 Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit4.jpeg --- Class Name: Flint V2 Tremolo & Reverb Confidence: 0.99 Box Coordinates: [52.06515884399414, 48.2890739440918, 201.07772827148438, 210.28248596191406] --- 0: 480x640 1 Flint V2 Tremolo & Reverb, 6.8ms Speed: 1.2ms preprocess, 6.8ms inference, 1.2ms postprocess per image at shape (1, 3, 480, 640) YOLOv8 Detection Results Image: reddit4.jpeg --- Class Name: Flint V2 Tremolo & Reverb Confidence: 0.97 Box Coordinates: [42.51051330566406, 64.28853607177734, 197.7701873779297, 211.37509155273438] ---
Detectron2 Detection Results Image: /content/drive/MyDrive/pedal-model/examples/reddit/reddit5.jpeg --- Class Name: Volante Magnetic Tape Delay Confidence: 0.89 Box Coordinates: [39.66915512084961, 286.4303894042969, 263.162109375, 438.970458984375] --- 0: 640x640 1 Iridium Amp Modeler And Cab, 1 Volante Magnetic Tape Delay, 1 Deco V2 Tape Saturation & Doubletracker, 7.5ms Speed: 2.5ms preprocess, 7.5ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 640) YOLOv8 Detection Results Image: reddit5.jpeg --- Class Name: Volante Magnetic Tape Delay Confidence: 0.90 Box Coordinates: [38.19101333618164, 302.3064880371094, 255.00393676757812, 434.0360412597656] ---
Results and Discussion¶
The comparison between YOLOv8 and Detectron2 for detecting guitar pedals shows that both models perform comparably well. The mAP@0.5 values are similar, indicating that each model can accurately identify guitar pedals in a variety of scenes. Although inference speed was not benchmarked systematically, the per-image timings logged by YOLOv8 during prediction suggest it handles varying image sizes efficiently and is fast enough for real-time applications, while Detectron2 maintained high accuracy and robustness, especially in more complex scenes. The side-by-side visualizations support these findings, with both models effectively detecting and classifying guitar pedals. Either model can therefore be chosen with confidence depending on the specific application, such as real-time processing versus complex scene analysis. Future work could explore further fine-tuning and combining elements from both models to enhance performance.
Conclusion and Future Work¶
Summary¶
The goal of this project was to develop a sophisticated object detection model tailored specifically for identifying guitar pedals. Leveraging the YOLOv8 and Detectron2 frameworks, we aimed to streamline the process of cataloging and managing guitar pedals, thereby enhancing the workflow of musicians, sound engineers, and collectors. The project involved collecting a comprehensive dataset of guitar pedal images, annotating them meticulously, and fine-tuning object detection models to achieve high detection accuracy.
Self-Reflection¶
Throughout this project, I gained substantial experience in various aspects of machine learning and computer vision. Working with both YOLOv8 and Detectron2 provided me with insights into different object detection frameworks, their strengths, and their limitations. One of the most challenging parts was the data annotation process, which required meticulous attention to detail to ensure high-quality training data. Implementing and debugging the models also posed significant challenges, especially in ensuring that the model's predictions matched the ground truth accurately. However, overcoming these challenges has been immensely rewarding, and seeing the models perform well in detecting and identifying guitar pedals was gratifying.
Future Work¶
While the current model performs well, there is always room for improvement and further development. Future work could involve:
Enhanced Dataset: Expanding the dataset to include more variations of guitar pedals, different lighting conditions, and various backgrounds to make the model more robust and generalizable.
Real-Time Detection: Implementing real-time object detection in mobile applications to assist musicians and sound engineers in live settings.
Integration with Inventory Management Systems: Developing an application that integrates the object detection model with inventory management systems, allowing users to easily catalog and manage their guitar pedal collections.
User Feedback Loop: Creating a feedback loop where users can correct the model's predictions, thereby continuously improving the model's accuracy over time.
Exploring Other Models: Experimenting with other state-of-the-art object detection models and techniques to further enhance detection accuracy and efficiency.
Advanced Analysis: Incorporating more advanced image analysis techniques, such as color density and texture analysis within the detected bounding boxes, to provide additional insights into the detected objects (a rough sketch of this idea is given below).
By pursuing these future directions, the project can evolve into a comprehensive tool that significantly enhances the efficiency and accuracy of managing guitar pedal collections.
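As a rough illustration of the Advanced Analysis idea above, the sketch below crops one detected bounding box from an image and computes a simple color-density and texture descriptor inside it. The box coordinates approximate the Flint V2 detection reported for reddit4.jpeg earlier in this notebook and are used purely as placeholders; the descriptor choices (hue histogram, variance of the Laplacian) are illustrative assumptions rather than part of the implemented pipeline.

import cv2
import numpy as np

# Illustrative inputs: an image path and one detected box in (x1, y1, x2, y2) pixel coordinates
# (values approximate the Flint V2 detection on reddit4.jpeg shown above).
image_path = '/content/drive/MyDrive/pedal-model/examples/reddit/reddit4.jpeg'
x1, y1, x2, y2 = 42, 64, 198, 211

im = cv2.imread(image_path)
crop = im[y1:y2, x1:x2]

# Color density: a normalized hue histogram of the cropped pedal.
hsv = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)
hue_hist = cv2.calcHist([hsv], [0], None, [32], [0, 180]).flatten()
hue_hist /= hue_hist.sum()

# Texture: variance of the Laplacian (higher values indicate more fine detail).
gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
texture_score = cv2.Laplacian(gray, cv2.CV_64F).var()

print("Dominant hue bin:", int(np.argmax(hue_hist)), "| texture score:", round(texture_score, 2))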
References and Appendix¶
Introduction¶
cvat.ai: https://www.cvat.ai/
Strymon (Pedal Company): https://www.strymon.net/
Strymon Products: https://www.strymon.net/product-category/effects-pedals/
Python Selenium Docs: https://selenium-python.readthedocs.io/
Classic Computer Vision Methods¶
Canny Edge Detection Docs: https://docs.opencv.org/4.x/da/d22/tutorial_py_canny.html
Harris Corner Detection Docs: https://docs.opencv.org/4.x/dc/d0d/tutorial_py_features_harris.html
Contours Detection Docs: https://docs.opencv.org/3.4/d4/d73/tutorial_py_contours_begin.html
Shi-Tomasi Docs: https://docs.opencv.org/4.x/d4/d8c/tutorial_py_shi_tomasi.html
ORB Docs: https://docs.opencv.org/4.x/d1/d89/tutorial_py_orb.html
SIFT Docs: https://docs.opencv.org/4.x/da/df5/tutorial_py_sift_intro.html
Deep Learning Computer Vision Methods¶
Object Detection with Detectron2¶
Detectron2 GitHub page (facebookresearch): https://github.com/facebookresearch/detectron2?tab=readme-ov-file
Detectron2 Colab Tutorial: https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=ZyAvNCJMmvFF
Detectron2 Docs: https://detectron2.readthedocs.io/en/latest/tutorials/getting_started.html
Detectron2 Roboflow: https://roboflow.com/model/detectron2
Object Detection with YOLOv8¶
YOLOv8 GitHub page (ultralytics): https://github.com/ultralytics/ultralytics
YOLOv8 Docs: https://docs.ultralytics.com/
YOLOv8 Tutorial (computervisioneng): https://github.com/computervisioneng/train-yolov8-object-detector-google-drive-google-colab