May 1, 2026

Object Detection for Cardboard Recycling

NDA CTO 2018-2020 4 min read

An edge-deployed computer vision system for real-time cardboard detection to support automated recycling workflows.


Updated: May 1, 2026

Overview

Developed a real-time computer vision system to detect cardboard materials in video streams, enabling automated sorting and classification in recycling workflows.

The system was designed for edge deployment, running locally on constrained hardware to eliminate network latency and support on-device decision-making. A lightweight object detection model was trained on a custom dataset and optimized for consistent performance under real-world variability.

Problem

Manual sorting of recyclable materials is labor-intensive, inconsistent, and difficult to scale.
Existing systems lacked reliable, real-time identification of specific materials such as cardboard at the point of capture.
Public datasets did not adequately represent the variability of cardboard (shape, wear, lighting, background clutter), requiring custom data collection.

Constraints

Small, custom-built dataset (roughly 1,000-2,000 labeled images) with limited environmental diversity.
Real-time inference requirement on edge hardware (CPU-constrained, no guaranteed GPU).
Input stream from webcam (640x480) with variable lighting and motion.
Need for robustness to high intra-class variation (flattened boxes, torn pieces, partial visibility).

Approach

Framed the problem as an object detection task, identifying bounding boxes around cardboard objects in each frame.

Used transfer learning with a pre-trained YOLOv5s model (PyTorch) to balance accuracy and latency under constrained compute.

Key focus areas:

High-quality dataset curation and annotation (label consistency, tight bounding boxes)
Targeted data augmentation (lighting shifts, rotations, occlusion simulation)
Model size and inference optimization for edge performance
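As a rough illustration of the targeted augmentation mentioned above, the sketch below applies a random lighting shift and a random occluding patch to a frame. It assumes NumPy and 8-bit images; the function names and the gain, bias, and patch-size ranges are hypothetical choices for illustration, not the values used in the project. Both transforms leave bounding-box coordinates unchanged, so labels can be reused as-is; rotations would also require transforming the boxes and are not covered here.

```python
import numpy as np


def shift_lighting(image, gain, bias):
    """Scale contrast by `gain` and add `bias` to brightness, clipped to [0, 255]."""
    out = image.astype(np.float32) * gain + bias
    return np.clip(out, 0, 255).astype(np.uint8)


def occlude(image, top, left, height, width, fill=0):
    """Paint a solid rectangle over part of the image to mimic occlusion."""
    out = image.copy()
    out[top:top + height, left:left + width] = fill
    return out


def augment(image, rng):
    """Apply a random lighting shift followed by a random occluding patch."""
    h, w = image.shape[:2]
    gain = rng.uniform(0.6, 1.4)   # assumed contrast range
    bias = rng.uniform(-40, 40)    # assumed brightness range
    patch_h = int(rng.integers(h // 8, h // 4))  # assumed patch size range
    patch_w = int(rng.integers(w // 8, w // 4))
    top = int(rng.integers(0, h - patch_h))
    left = int(rng.integers(0, w - patch_w))
    lit = shift_lighting(image, gain, bias)
    return occlude(lit, top, left, patch_h, patch_w)


rng = np.random.default_rng(0)
frame = np.full((480, 640, 3), 128, dtype=np.uint8)  # stand-in for a webcam frame
augmented = augment(frame, rng)
```

Seeding the generator keeps augmented samples reproducible, which helps when comparing training runs on a small dataset.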

System Design

The system was designed as a lightweight, on-device pipeline optimized for low-latency inference.

Training pipeline:

Images collected from target environments and annotated using labelImg
Dataset converted to YOLO format with standardized class definitions
Fine-tuning performed on pre-trained weights with augmentation strategies to simulate deployment conditions
Evaluation based on mAP@0.5 and precision/recall tradeoffs
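The conversion step in the training pipeline can be sketched as follows. This assumes annotations were saved in Pascal VOC XML (one of the formats labelImg can write); the write-up does not say which export format was used. The `CLASS_IDS` mapping and function names are hypothetical. YOLO labels store one object per line as a class id followed by the box center and size, all normalized by the image dimensions.

```python
import xml.etree.ElementTree as ET

CLASS_IDS = {"cardboard": 0}  # hypothetical single-class mapping


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    return cx, cy, (xmax - xmin) / img_w, (ymax - ymin) / img_h


def convert_annotation(xml_text):
    """Turn one Pascal VOC annotation into a list of YOLO label lines."""
    root = ET.fromstring(xml_text)
    img_w = int(root.findtext("size/width"))
    img_h = int(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        class_id = CLASS_IDS[obj.findtext("name")]
        corners = [float(obj.findtext(f"bndbox/{key}"))
                   for key in ("xmin", "ymin", "xmax", "ymax")]
        cx, cy, w, h = voc_box_to_yolo(*corners, img_w, img_h)
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


sample = """
<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cardboard</name>
    <bndbox><xmin>160</xmin><ymin>120</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>
"""
labels = convert_annotation(sample)
# labels == ["0 0.500000 0.500000 0.500000 0.500000"]
```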

Inference pipeline (edge):

Webcam stream captured at 640x480 resolution
Frames resized and normalized before inference
YOLOv5s model performs detection, with non-max suppression (NMS) applied post-inference
Bounding boxes and confidence scores rendered in real-time
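To make the NMS step concrete, here is a minimal pure-Python sketch of greedy non-max suppression. It is an illustration of the technique, not the project's code: the deployed model would rely on the detection framework's own implementation, and the threshold values below are assumptions. Boxes are `(x1, y1, x2, y2)` in pixels.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, conf_threshold=0.25, iou_threshold=0.45):
    """Return indices of boxes to keep, highest score first."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(100, 100, 300, 300), (110, 110, 310, 310), (400, 200, 500, 300), (0, 0, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = non_max_suppression(boxes, scores)
# kept == [0, 2]: box 1 overlaps box 0 (IoU ~0.82), box 3 is below the confidence threshold
```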

The system achieved ~20-25 FPS on CPU-based edge hardware with an optimized model configuration.
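A throughput of 20-25 FPS corresponds to a per-frame budget of 40-50 ms for capture, preprocessing, inference, and rendering combined. The sketch below shows that arithmetic and one way such a figure could be measured; `measure_fps` and the stand-in `process_frame` callable are hypothetical helpers, not the benchmark used in the project.

```python
import time


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps


def measure_fps(process_frame, frames, warmup=5):
    """Average frames per second of `process_frame` over `frames`."""
    for frame in frames[:warmup]:
        process_frame(frame)  # warm-up runs are excluded from timing
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


budget_low = frame_budget_ms(25)   # 40.0 ms per frame
budget_high = frame_budget_ms(20)  # 50.0 ms per frame

# Stand-in workload; a real run would call the detector on captured frames.
fps = measure_fps(lambda frame: sum(frame), [list(range(1000))] * 50)
```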

Key Decisions

Prioritize edge-first deployment

Designed the system to run entirely on-device to avoid network dependency and reduce latency. This required constraining model size and optimizing inference rather than maximizing raw accuracy.

Use YOLOv5s for latency-accuracy balance

Evaluated heavier architectures (e.g., Faster R-CNN) but selected YOLOv5s for its significantly faster real-time inference, accepting a modest accuracy tradeoff for this use case.

Invest in dataset quality over scale

With limited data, focused on annotation quality and targeted augmentation rather than indiscriminate dataset expansion. This improved generalization more effectively than increasing volume with noisy labels.

Optimize for consistency, not peak accuracy

In a real-world setting, stable detection under varying conditions was more valuable than maximizing benchmark metrics. Model tuning emphasized reducing false negatives in common scenarios.
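One common lever for reducing false negatives is the confidence threshold: lowering it recovers more true detections at the cost of more false positives. The sketch below illustrates that tradeoff with made-up detections. It assumes each detection has already been matched against ground truth (for example at IoU >= 0.5) and flagged as a true or false positive; the data and function name are hypothetical.

```python
def precision_recall(detections, num_ground_truth, conf_threshold):
    """Return (precision, recall, false_negatives) at a confidence threshold.

    `detections` is a list of (score, is_true_positive) pairs.
    """
    kept = [is_tp for score, is_tp in detections if score >= conf_threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = num_ground_truth - tp
    precision = tp / (tp + fp) if kept else 1.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, fn


# Illustrative data only: 7 detections against 5 ground-truth objects.
detections = [
    (0.95, True), (0.85, True), (0.70, True), (0.55, False),
    (0.40, True), (0.30, False), (0.25, False),
]

strict = precision_recall(detections, 5, conf_threshold=0.6)
# precision 1.0, recall 0.6, 2 false negatives
lenient = precision_recall(detections, 5, conf_threshold=0.2)
# precision ~0.57, recall 0.8, 1 false negative
```

In this toy example, the lenient threshold misses one object instead of two, which matches the stated preference for stable detection over peak precision.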

Results & Impact

Real-time Edge Inference: ~20-25 FPS on CPU at 640x480 resolution
Detection Performance: Achieved ~0.75 mAP@0.5 on the validation set, with consistent performance across varied lighting and backgrounds
Operational Feasibility: Demonstrated that on-device detection can support automated material classification without cloud dependency

This prototype validated the feasibility of deploying computer vision systems directly within recycling workflows, reducing reliance on manual inspection and enabling scalable automation.

Tradeoffs

Model size constraints limited the model's ability to capture finer-grained visual distinctions.
Performance degraded in extreme lighting or heavy occlusion scenarios.
Limited dataset diversity reduced generalization to unseen environments.
Edge deployment restricted use of more computationally intensive architectures.

Learnings

Transfer learning is critical for bootstrapping performance in niche domains with limited data.
Annotation quality has a disproportionate impact on detection performance in small datasets.
Edge systems require deliberate tradeoffs between latency, accuracy, and model complexity.
Real-world robustness is driven more by data coverage than model sophistication.

Future Work

Expand dataset with broader environmental coverage and edge-case scenarios
Incorporate semi-supervised labeling to scale data efficiently
Explore model quantization and hardware-specific optimizations for improved edge performance
Extend to multi-class detection for broader waste classification (plastic, metal, mixed materials)
