May 1, 2026
Object Detection for Cardboard Recycling
An edge-deployed computer vision system for real-time cardboard detection to support automated recycling workflows.

Overview
Developed a real-time computer vision system to detect cardboard materials in video streams, enabling automated sorting and classification in recycling workflows.
The system was designed for edge deployment, running locally on constrained hardware to eliminate network latency and support on-device decision-making. A lightweight object detection model was trained on a custom dataset and optimized for consistent performance under real-world variability.
Problem
- Manual sorting of recyclable materials is labor-intensive, inconsistent, and difficult to scale.
- Existing systems lacked reliable, real-time identification of specific materials such as cardboard at the point of capture.
- Public datasets did not adequately represent the variability of cardboard (shape, wear, lighting, background clutter), so custom data collection was required.
Constraints
- Small, custom-built dataset (~1-2K labeled images) with limited environmental diversity.
- Real-time inference requirement on edge hardware (CPU-constrained, no guaranteed GPU).
- Input stream from webcam (640x480) with variable lighting and motion.
- Need for robustness to high intra-class variation (flattened boxes, torn pieces, partial visibility).
Approach
Framed the problem as an object detection task, identifying bounding boxes around cardboard objects in each frame.
Used transfer learning with a pre-trained YOLOv5s model (PyTorch) to balance accuracy and latency under constrained compute.
Key focus areas:
- High-quality dataset curation and annotation (label consistency, tight bounding boxes)
- Targeted data augmentation (lighting shifts, rotations, occlusion simulation)
- Model size and inference optimization for edge performance
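Rotation augmentation changes the labels as well as the pixels. The following is a minimal sketch of one way to recompute a box after rotating an image about its center, assuming pixel-coordinate boxes in (x_min, y_min, x_max, y_max) form. The function name and the clipping behavior are illustrative assumptions, not the project's actual augmentation code.

```python
import math


def rotate_box(box, angle_deg, img_w, img_h):
    """Rotate an axis-aligned box about the image center.

    Returns the axis-aligned box that encloses the rotated corners,
    clipped to the image. The rotation direction must match the one
    applied to the image itself (assumption: same angle convention).
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = img_w / 2.0, img_h / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    xs, ys = [], []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)

    # Enclose the rotated corners and keep the result inside the frame.
    return (
        max(0.0, min(xs)),
        max(0.0, min(ys)),
        min(float(img_w), max(xs)),
        min(float(img_h), max(ys)),
    )
```

The enclosing box grows as the angle moves away from multiples of 90 degrees, so large rotations loosen otherwise tight labels. Keeping rotation ranges modest is one way to preserve annotation quality.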
System Design
The system was designed as a lightweight, on-device pipeline optimized for low-latency inference.
Training pipeline:
- Images collected from target environments and annotated using labelImg
- Dataset converted to YOLO format with standardized class definitions
- Fine-tuning performed from pre-trained weights, with augmentation used to simulate deployment conditions
- Evaluation based on mAP@0.5 and precision/recall tradeoffs
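The format conversion step can be sketched as follows, assuming the annotations were exported as Pascal VOC XML (one of the formats labelImg can write) and that the dataset uses a single class named "cardboard". The class mapping and function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed single-class mapping; the real class list is not given in the write-up.
CLASS_IDS = {"cardboard": 0}


def voc_xml_to_yolo_lines(xml_text, class_ids=CLASS_IDS):
    """Convert one Pascal VOC annotation to YOLO label lines.

    Each output line is "class x_center y_center width height",
    with all four box values normalized to the range [0, 1].
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_ids:
            continue  # skip labels outside the standardized class list
        bb = obj.find("bndbox")
        x_min = float(bb.find("xmin").text)
        y_min = float(bb.find("ymin").text)
        x_max = float(bb.find("xmax").text)
        y_max = float(bb.find("ymax").text)

        x_c = (x_min + x_max) / 2.0 / img_w
        y_c = (y_min + y_max) / 2.0 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        lines.append(f"{class_ids[name]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each image gets one text file of such lines. Skipping unknown class names, rather than raising an error, is a design choice here that a stricter pipeline might reverse to catch labeling mistakes.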
Inference pipeline (edge):
- Webcam stream captured at 640x480 resolution
- Frames resized to the model input size and pixel values normalized before inference
- YOLOv5 model performs detection with non-max suppression (NMS) applied post-inference
- Bounding boxes and confidence scores rendered in real time
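The NMS step in the pipeline above can be illustrated with a small, dependency-free sketch of greedy non-max suppression. It is not the YOLOv5 implementation, and the 0.45 IoU threshold is an assumed value for illustration. Detections are assumed to be (x1, y1, x2, y2, score) tuples in pixels.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps, repeat."""
    remaining = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Discard boxes that overlap the kept box beyond the threshold.
        remaining = [d for d in remaining if iou(best[:4], d[:4]) <= iou_threshold]
    return kept
```

A lower IoU threshold suppresses more aggressively, which can merge adjacent pieces of cardboard into one detection. A higher one tolerates duplicates on the same object.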
With an optimized model configuration, the system achieved ~20-25 FPS on CPU-based edge hardware.
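Throughput figures like this depend on how they are measured. The following is a minimal sketch of a rolling FPS meter, assuming it is ticked once per fully processed frame (capture, preprocessing, inference, and rendering). The class name and window size are illustrative, and the write-up does not state how the reported figure was obtained.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one processed frame and return the current FPS estimate.

        `now` can be passed explicitly for testing; otherwise a
        monotonic high-resolution clock is used.
        """
        self.timestamps.append(time.perf_counter() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Averaging over a window smooths out single slow frames, which makes the number more representative of sustained behavior than a per-frame reading.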
Key Decisions
Prioritize edge-first deployment
Designed the system to run entirely on-device to avoid network dependency and reduce latency. This required constraining model size and optimizing inference rather than maximizing raw accuracy.
Use YOLOv5s for latency-accuracy balance
Evaluated heavier architectures (e.g., Faster R-CNN) but selected YOLOv5s due to significantly better real-time performance with acceptable accuracy tradeoffs for the use case.
Invest in dataset quality over scale
With limited data, focused on annotation quality and targeted augmentation rather than indiscriminate dataset expansion. This improved generalization more effectively than increasing volume with noisy labels.
Optimize for consistency, not peak accuracy
In a real-world setting, stable detection under varying conditions was more valuable than maximizing benchmark metrics. Model tuning emphasized reducing false negatives in common scenarios.
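The tradeoff behind reducing false negatives can be sketched by sweeping the confidence threshold. The example assumes each detection has already been matched against ground truth at IoU 0.5 and is represented as a (score, is_true_positive) pair. The function name and the sample data are hypothetical, not project results.

```python
def precision_recall_at(scored_matches, num_ground_truth, conf_threshold):
    """Precision and recall when keeping detections with score >= threshold.

    scored_matches: list of (score, is_true_positive) pairs.
    Convention (assumption): precision is 1.0 when nothing is kept.
    """
    kept = [is_tp for score, is_tp in scored_matches if score >= conf_threshold]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 1.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Hypothetical matches for illustration only.
matches = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
for threshold in (0.5, 0.25):
    p, r = precision_recall_at(matches, num_ground_truth=4, conf_threshold=threshold)
    print(f"conf>={threshold}: precision={p:.2f} recall={r:.2f}")
```

In this toy data, lowering the threshold from 0.5 to 0.25 recovers one missed object at the cost of one extra false positive. That is the kind of exchange a recall-focused tuning pass accepts.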
Results & Impact
- Real-time Edge Inference: ~20-25 FPS on CPU at 640x480 resolution
- Detection Performance: Achieved ~0.75 mAP@0.5 on the validation set, with consistent performance across varied lighting and backgrounds
- Operational Feasibility: Demonstrated that on-device detection can support automated material classification without cloud dependency
This prototype validated the feasibility of deploying computer vision systems directly within recycling workflows, reducing reliance on manual inspection and enabling scalable automation.
Tradeoffs
- Model size constraints limited the ability to capture finer-grained visual distinctions.
- Performance degraded in extreme lighting or heavy occlusion scenarios.
- Limited dataset diversity reduced generalization to unseen environments.
- Edge deployment restricted use of more computationally intensive architectures.
Learnings
- Transfer learning is critical for bootstrapping performance in niche domains with limited data.
- Annotation quality has a disproportionate impact on detection performance in small datasets.
- Edge systems require deliberate tradeoffs between latency, accuracy, and model complexity.
- Real-world robustness is driven more by data coverage than model sophistication.
Future Work
- Expand dataset with broader environmental coverage and edge-case scenarios
- Incorporate semi-supervised labeling to scale data efficiently
- Explore model quantization and hardware-specific optimizations for improved edge performance
- Extend to multi-class detection for broader waste classification (plastic, metal, mixed materials)