How to Train YOLOv5 with Custom Data

Training YOLOv5 on a specialised dataset adapts a general-purpose object detector to your particular scenario: security systems that identify specific vehicles, agricultural tools that spot crop diseases, or retail solutions that recognise distinct products. A custom-trained model typically outperforms generic COCO weights on its target domain by a wide margin. This guide details the entire workflow, from data preparation and training configuration to executing the training run and refining your YOLOv5 model for production deployment.

How YOLOv5 Custom Training Operates

YOLOv5 employs transfer learning, adapting weights pre-trained on the COCO dataset to your specific classes. The model consists of a backbone (CSPDarknet53), neck (PANet), and head (YOLO detection layers) that together predict bounding boxes and class probabilities. During custom training, the final detection layers are resized to match your number of classes, while the earlier layers retain their learned ability to detect low-level features such as edges and textures.
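
You can see this head-swap directly through torch.hub, whose official hubconf accepts a classes argument and transfers every pretrained weight whose shape still matches (a quick illustration; the training script below handles this automatically):

# Build a YOLOv5s model whose detection head is resized for 3 classes;
# all shape-compatible COCO-pretrained weights are transferred.
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', classes=3)
print(model.model[-1].nc)  # the Detect head now predicts 3 classes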

Training works by passing annotated images through the network, computing losses for box regression, objectness, and classification, and backpropagating the gradients to refine the weights. Techniques such as mosaic augmentation and CIoU loss are built in, and optional genetic-algorithm hyperparameter evolution can further improve training efficiency and final model performance.
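
In pseudocode, one training step looks roughly like the sketch below (a simplification: the real loop lives in train.py and uses ComputeLoss from utils/loss.py, plus warmup, EMA, and mixed precision):

# Simplified, illustrative sketch of one YOLOv5-style training step
import torch

def train_step(model, optimizer, compute_loss, imgs, targets, device):
    imgs = imgs.to(device).float() / 255        # uint8 image batch -> float, scaled to 0-1
    preds = model(imgs)                         # raw multi-scale predictions
    loss, loss_items = compute_loss(preds, targets.to(device))  # box + objectness + class losses
    optimizer.zero_grad()
    loss.backward()                             # backpropagate gradients
    optimizer.step()                            # update weights
    return loss_items                           # per-component losses for logging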

Step-by-Step Implementation Guide

Begin by establishing your development environment with the necessary dependencies:

git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Arrange your dataset structure according to YOLOv5 specifications:

custom_dataset/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
└── labels/
    ├── train/
    ├── val/
    └── test/

Format your annotations in YOLO style, with one line per object in each text file:

# Format: class_id center_x center_y width height (normalised 0-1)
0 0.5 0.3 0.2 0.4
1 0.7 0.6 0.15 0.25
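
If your existing annotations use pixel coordinates, a small helper along these lines (ours, not part of YOLOv5) converts a corner-format box into a normalised label line:

# Convert a pixel-space (x_min, y_min, x_max, y_max) box to a YOLO label line
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    cx = (x_min + x_max) / 2 / img_w   # normalised box centre x
    cy = (y_min + y_max) / 2 / img_h   # normalised box centre y
    w = (x_max - x_min) / img_w        # normalised width
    h = (y_max - y_min) / img_h        # normalised height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A 200x400 box centred at (500, 300) in a 1000x1000 image -> the first example line above
print(to_yolo_line(0, 400, 100, 600, 500, 1000, 1000))  # 0 0.500000 0.300000 0.200000 0.400000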

Create a configuration file for your dataset (dataset.yaml):

path: /path/to/custom_dataset
train: images/train
val: images/val
test: images/test

nc: 3  # number of classes
names: ['class1', 'class2', 'class3']
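
If your images and labels currently sit in flat folders, a short script like this one (paths are illustrative assumptions) can produce the split layout shown earlier:

# Randomly split a flat pool of images/labels into train/val/test (80/10/10)
import random, shutil
from pathlib import Path

def split_dataset(src_images, src_labels, dst='custom_dataset', ratios=(0.8, 0.1, 0.1)):
    images = sorted(Path(src_images).glob('*.jpg'))
    random.seed(0)            # reproducible split
    random.shuffle(images)
    n = len(images)
    cut1, cut2 = int(n * ratios[0]), int(n * (ratios[0] + ratios[1]))
    splits = {'train': images[:cut1], 'val': images[cut1:cut2], 'test': images[cut2:]}
    for split, files in splits.items():
        for sub in ('images', 'labels'):
            (Path(dst) / sub / split).mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, Path(dst) / 'images' / split / img.name)
            label = Path(src_labels) / (img.stem + '.txt')
            if label.exists():  # background images may legitimately have no label file
                shutil.copy(label, Path(dst) / 'labels' / split / label.name)

split_dataset('raw/images', 'raw/labels')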

Start the training process with suitable parameters tailored to your hardware and dataset size:

python train.py --img 640 --batch 16 --epochs 100 --data dataset.yaml --weights yolov5s.pt --cache

For larger datasets or production models, consider a larger base model and a longer training schedule:

python train.py --img 1280 --batch 8 --epochs 300 --data dataset.yaml --weights yolov5x.pt --device 0,1 --multi-scale
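
Note that --device 0,1 on its own uses DataParallel; for multi-GPU training the YOLOv5 documentation recommends launching with DistributedDataParallel instead (the --batch value is the total across all GPUs):

# DDP launch across 2 GPUs (generally faster than DataParallel)
python -m torch.distributed.run --nproc_per_node 2 train.py --img 1280 --batch 16 --epochs 300 --data dataset.yaml --weights yolov5x.pt --device 0,1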

Real-World Applications and Examples

An industrial firm used YOLOv5 for quality assurance, training on 15,000 images of circuit boards with labelled defects. Their custom model achieved 94.3% mAP@0.5, far above the 31% the pre-trained COCO model managed on the same task. The training setup used YOLOv5l as the base, running for 200 epochs with heavy augmentation configured through the fine-tuning hyperparameter file:

python train.py --img 832 --batch 12 --epochs 200 --data pcb_defects.yaml --weights yolov5l.pt --hyp hyp.finetune.yaml

A startup in the agricultural sector trained YOLOv5 on aerial imagery to identify pest-inflicted damage across 50,000 images of crop fields. They followed a multi-stage protocol, starting with general crop feature training for 100 epochs, followed by fine-tuning on pest-specific annotations:

# Stage 1: General crop detection
python train.py --img 640 --batch 24 --epochs 100 --data crops_general.yaml --weights yolov5m.pt

# Stage 2: Pest-specific fine-tuning  
python train.py --img 640 --batch 24 --epochs 150 --data pest_damage.yaml --weights runs/train/exp/weights/best.pt --freeze 10

In security, custom training has proven enormously beneficial. A retail company trained YOLOv5 on 25,000 CCTV frames to recognise theft-related activities, achieving real-time detection at 45 FPS on an NVIDIA RTX 3070 with tailored data augmentation.

Comparative Performance and Model Selection

Model    Params (M)  FLOPs (G)  GPU speed (ms)  mAP@0.5  Ideal use case
YOLOv5n  1.9         4.5        6.3             45.7     Mobile/edge devices
YOLOv5s  7.2         16.5       6.4             56.8     Balanced speed/accuracy
YOLOv5m  21.2        49.0       8.2             64.1     Production systems
YOLOv5l  46.5        109.1      10.1            67.3     High accuracy needs
YOLOv5x  86.7        205.7      12.1            68.9     Maximum accuracy

Your dataset size should also guide model selection:

Dataset size    Suggested model  Training epochs  Expected mAP gain  Training time (V100)
< 1,000 images  YOLOv5s          100-150          15-25%             2-4 hours
1,000-5,000     YOLOv5m          150-250          25-40%             8-12 hours
5,000-20,000    YOLOv5l          200-300          40-60%             1-2 days
> 20,000        YOLOv5x          300-500          60-80%             3-5 days

Advanced Training Techniques and Optimisations

Progressive resizing can improve both training efficiency and final accuracy: start at a lower image size and increase the resolution in stages:

# Phase 1: Lower resolution for initial rapid learning
python train.py --img 416 --batch 32 --epochs 50 --data dataset.yaml --weights yolov5m.pt --name phase1

# Phase 2: Medium resolution for detail refinement  
python train.py --img 640 --batch 16 --epochs 100 --data dataset.yaml --weights runs/train/phase1/weights/best.pt --name phase2

# Phase 3: Full resolution for final optimisation
python train.py --img 832 --batch 8 --epochs 50 --data dataset.yaml --weights runs/train/phase2/weights/best.pt --name final

Custom hyperparameter tuning can yield significant performance boosts. Develop a custom hyperparameter file (hyp.custom.yaml) tailored to the specifics of your dataset:

lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2  # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937  # SGD momentum / Adam beta1
weight_decay: 0.0005  # optimizer weight decay
warmup_epochs: 3.0  # warmup epochs (fractions allowed)
warmup_momentum: 0.8  # warmup initial momentum
warmup_bias_lr: 0.1  # warmup initial bias learning rate
box: 0.05  # box loss gain
cls: 0.5  # cls loss gain
cls_pw: 1.0  # cls BCELoss positive weight
obj: 1.0  # obj loss gain
obj_pw: 1.0  # obj BCELoss positive weight
iou_t: 0.20  # IoU training threshold
anchor_t: 4.0  # anchor-multiple threshold
fl_gamma: 0.0  # focal loss gamma (0.0 = disabled)
hsv_h: 0.015  # HSV-Hue augmentation (fraction)
hsv_s: 0.7  # HSV-Saturation augmentation (fraction)
hsv_v: 0.4  # HSV-Value augmentation (fraction)
degrees: 0.0  # rotation (+/- deg)
translate: 0.1  # translation (+/- fraction)
scale: 0.9  # scale (+/- gain)
shear: 0.0  # shear (+/- deg)
perspective: 0.0  # perspective (+/- fraction)
flipud: 0.0  # vertical flip probability
fliplr: 0.5  # horizontal flip probability
mosaic: 1.0  # mosaic augmentation probability
mixup: 0.15  # mixup augmentation probability

Use these custom hyperparameters in conjunction with genetic-algorithm evolution for automatic fine-tuning (note that evolution retrains the model once per generation, so 50 generations is a substantial compute investment):

python train.py --img 640 --batch 16 --epochs 300 --data dataset.yaml --weights yolov5m.pt --hyp hyp.custom.yaml --evolve 50
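
Each generation mutates the best hyperparameters found so far and retrains. When evolution finishes, the winning set is written back as a YAML file under the evolve run directory (the exact path varies by YOLOv5 version; the one below is illustrative) and can be reused directly:

# Train a full run with the evolved hyperparameters
python train.py --img 640 --batch 16 --epochs 300 --data dataset.yaml --weights yolov5m.pt --hyp runs/evolve/exp/hyp_evolve.yaml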

Common Challenges and Solutions

GPU memory shortages are frequently encountered during training. For CUDA out-of-memory errors, simply decrease the batch size: YOLOv5 automatically accumulates gradients to maintain a nominal effective batch size of 64, so a smaller --batch does not change the optimisation behaviour. Recent versions also accept --batch -1 to let AutoBatch pick the largest size that fits your GPU:

# Smaller batches are fine; gradients are accumulated to a nominal batch of 64 automatically
python train.py --img 640 --batch 4 --epochs 100 --data dataset.yaml --weights yolov5m.pt

# Or let AutoBatch choose the batch size
python train.py --img 640 --batch -1 --epochs 100 --data dataset.yaml --weights yolov5m.pt

Poor convergence is often caused by an excessive learning rate or inadequate data augmentation. Monitor the training metrics and adjust accordingly:

  • If the loss stabilises prematurely: Decrease the learning rate by a factor of 10 and extend the epochs.
  • If the loss fluctuates dramatically: Lower the learning rate and momentum.
  • If the validation mAP declines while training improves: Amplify the strength of data augmentation.
  • If the model struggles with small objects: Enhance input image resolution and implement multi-scale training.

Imbalanced classes can severely undermine model efficacy. Counteract this with focal loss and class-weighted image sampling. YOLOv5 does not ship a hyp.focal.yaml: copy an existing hyperparameter file, raise fl_gamma above zero, and optionally add --image-weights:

# Enable focal loss (hyp.focal.yaml is your custom copy with fl_gamma > 0) plus weighted sampling
python train.py --img 640 --batch 16 --epochs 200 --data dataset.yaml --weights yolov5m.pt --hyp hyp.focal.yaml --image-weights
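
The only change needed in the copied hyperparameter file is the focal loss gamma, for example:

fl_gamma: 1.5  # focal loss gamma (0.0 disables; 1.5 is the EfficientDet default)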

Annotation quality is crucial for stable training. YOLOv5 verifies label files when it caches a dataset, and running a baseline model over the validation split makes mislabelled images easier to spot:

# Evaluate a baseline model on your dataset (caching also flags malformed labels)
python val.py --data dataset.yaml --weights yolov5s.pt
# Save baseline predictions for side-by-side comparison with your annotations
python detect.py --weights yolov5s.pt --source dataset/images/val --save-txt --save-conf

Best Practices and Deployment in Production

Data preparation is vital for achieving a high-quality model. Adhere to these recommendations for optimal outcomes:

  • Ensure consistent annotation quality among all contributors by providing comprehensive guidelines.
  • Include a variety of lighting conditions, angles, and backgrounds within your training set.
  • Maintain a strict separation between validation and test datasets to prevent data leakage.
  • Compile at least 100-200 examples per class for basic functionality, with a target of over 1000 for production usage.
  • Incorporate challenging negative examples by including hard backgrounds that lack target objects.

Optimising the model for deployment involves several post-training steps. Export your trained model across multiple formats suitable for various deployment situations:

# Export to ONNX for cross-platform usage
python export.py --weights runs/train/exp/weights/best.pt --include onnx --img 640

# Export to TensorRT for enhanced NVIDIA GPU inference  
python export.py --weights runs/train/exp/weights/best.pt --include engine --img 640 --device 0

# Export to CoreML for deployment on iOS
python export.py --weights runs/train/exp/weights/best.pt --include coreml --img 640
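
As a quick smoke test of the exported ONNX model, you can run it with onnxruntime (a minimal sketch: input/output names are read from the session, and a real pipeline would letterbox-resize the image and apply NMS afterwards):

# Minimal onnxruntime smoke test for the exported model
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession('runs/train/exp/weights/best.onnx')
inp = session.get_inputs()[0]

img = np.random.rand(1, 3, 640, 640).astype(np.float32)  # dummy CHW input scaled to 0-1
preds = session.run(None, {inp.name: img})[0]
print(preds.shape)  # (1, num_candidates, 5 + num_classes) before NMS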

Establish model version control and performance monitoring within production environments. Keep an eye on key metrics such as inference time, accuracy variations, and edge case failures:

import os
import time
import torch
from utils.general import non_max_suppression

def benchmark_model(model_path, test_images, conf_threshold=0.25):
    # YOLOv5 checkpoints are dicts; the model itself sits under the 'model' key
    model = torch.load(model_path, map_location='cpu')['model'].float()
    model.eval()

    inference_times = []
    with torch.no_grad():
        for img in test_images:  # preprocessed tensors on the same device as the model
            start_time = time.time()
            pred = model(img)[0]  # inference output (auxiliary training output discarded)
            pred = non_max_suppression(pred, conf_threshold)
            inference_times.append(time.time() - start_time)

    return {
        'avg_inference_time': sum(inference_times) / len(inference_times),
        'fps': len(inference_times) / sum(inference_times),
        'model_size_mb': os.path.getsize(model_path) / (1024 * 1024)
    }

Consider incorporating A/B testing to compare model versions in live settings: release new models to a fraction of traffic first, monitor the performance metrics, and roll out successful improvements more broadly.

For comprehensive documentation and advanced strategies, consult the official YOLOv5 documentation, and see the community wiki for help with specific deployment scenarios.


