Isaac Sim · ZED SDK · TensorRT · Kubernetes

PERCEPTION AI

Stereo depth sensing research for warehouse robotics. Benchmarking neural depth modes against photorealistic simulation ground truth.

ZED NEURAL_PLUS ZED Neural Depth Anything V2 Large
scroll
3,000frames analyzed
4.77mNEURAL_PLUS MAE
3depth models
17+benchmark runs
28.9%δ₁ accuracy
Models
Model Comparison
Full warehouse scene · GT 1920×1080 → ZED 960×600 · 50-frame stratified sample
Select Models & Runs
Head-to-Head — All Selected Runs
MAE / RMSE / δ₁ comparison (δ₁ scaled ×10)
MAE vs RMSE scatter · dot size = frame count
Per-Range Depth Accuracy
Range breakdown available where raw depth data was collected (NEURAL_PLUS canonical run). N/A = aggregate-only run.
Methodology: GT depth (1920×1080) resampled to ZED resolution (960×600) via nearest-neighbor. Range bins applied to GT values. δ₁: max(d/d̂, d̂/d) < 1.25. Only NEURAL_PLUS 3,000-frame run has per-range data — other modes show aggregate metrics only.
MAE by Range
Lower is better
δ₁ by Range
Higher is better
Per-Frame MAE
Frame-level MAE series for each selected model and run
Per-Frame MAE Over Time
Each point = sampled frame. Compare stability and variance across depth modes.
All Runs — Selected Models
Complete run catalog for selected depth modes
MAE per Run — Selected Models
Dot size proportional to frame count · ★ = canonical run
Run IDDateModeFrames MAERMSEδ₁
Perception AI Research
Depth sensing evaluation pipeline for warehouse robotics using photorealistic simulation.
📡

ZED Depth Modes Benchmarked

  • NEURAL_PLUS — Advanced TRT-compiled stereo neural depth. Best overall accuracy.
  • ZED_NEURAL — Standard neural stereo depth. Solid baseline performance.
  • NEURAL (legacy) — Earlier neural depth model. Limited runs available.
  • QUALITY — Classic stereo matching, quality preset. No neural inference.
🏭

Simulation Setup

NVIDIA Isaac Sim renders a photorealistic full warehouse scene with 1920×1080 ground-truth depth maps at float32 precision. The autonomous agent navigates via depth-reactive collision avoidance, generating 3,000 frames of varied depth distributions.

⚙️

Pipeline Infrastructure

  • KFP v2 / Argo Workflows on Kubernetes
  • CephFS PVC for shared pool storage
  • TRT engine cache with 1200s compile budget
  • Automated per-frame metrics: MAE, RMSE, δ₁
  • Per-range stats via 50-frame stratified sampling
📊

Evaluation Metrics

  • MAE — mean |GT − pred| over valid pixels
  • RMSE — √mean(GT − pred)² over valid pixels
  • δ₁ — fraction with max(d/d̂, d̂/d) < 1.25
  • Ranges: close 0.3–3m · mid 3–30m · long 30+m
  • GT resampled to ZED resolution before comparison