The Reality of the Sim-to-Real Gap
Simulation promises unlimited data, zero hardware wear, parallelized training, and perfect reproducibility. In practice, every simulation is an approximation, and the gap between that approximation and reality is where policies go to die. Contact dynamics differ between simulated and real materials. Actuator friction, backlash, and latency are difficult to model. Rendered images look convincing to humans but activate neural network feature detectors differently than real camera images. Sensor noise, cable stiffness, table vibration -- dozens of small effects that simulation ignores or simplifies compound into a gap that untreated policies cannot cross.
But the gap is not uniform. Some tasks transfer easily; others are nearly impossible with current methods. Understanding where your task falls on this spectrum -- and applying the right techniques for your specific gap -- is what separates teams that successfully deploy sim-trained policies from teams that spend months in simulation only to fail on real hardware. For background on the theoretical foundations, see our companion article on sim-to-real transfer theory.
What Transfers Well and What Does Not
Transfers well: Locomotion on flat and moderately rough terrain, basic object grasping with parallel jaw grippers, navigation and obstacle avoidance, and reaching motions in free space. These tasks depend on dynamics that simulation models accurately (rigid body mechanics, basic friction) and visual features that transfer reasonably (object shapes, workspace geometry).
Transfers with effort: Pick-and-place of rigid objects, pushing and sliding manipulation, door and drawer opening, and simple assembly with generous tolerances. These tasks involve contact dynamics that simulation approximates but does not perfectly match. Successful transfer requires the techniques described below.
Does not transfer well (yet): Contact-rich manipulation of deformable objects (cloth folding, cable routing), precise assembly with sub-millimeter tolerances (connector mating, snap-fit insertion), and tasks involving granular materials, fluids, or soft bodies. The physics of these interactions is either too complex to simulate accurately or too sensitive to parameter errors for domain randomization to bridge the gap.
Tip 1: Use Systematic Domain Randomization
Domain randomization is the most widely validated technique for sim-to-real transfer, but "randomize everything" is not a strategy. Effective domain randomization targets the specific parameters that differ between your simulation and your real setup. Start by identifying your failure modes on real hardware (even 5-10 real trials give useful signal), then randomize the simulation parameters most likely to cause those failures.
For visual policies, effective randomization ranges include: camera position offset of plus or minus 10 cm from nominal, viewing angle variation of plus or minus 10 degrees, brightness variation of plus or minus 30%, hue shift of plus or minus 20%, saturation variation of plus or minus 15%, Gaussian blur with sigma 0-2 pixels, and random rectangular occlusions covering 5-20% of the image. These ranges make simulated images look unrealistically varied -- but that is exactly the point. The policy learns features that are invariant to all these variations, which means it also handles the real-world visual differences between simulation and your actual camera.
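A minimal sketch of these image-space augmentations using NumPy and SciPy (the `randomize_image` helper is hypothetical; hue and saturation shifts need a color-space conversion, e.g. via OpenCV, and are omitted here):

```python
# Sketch of per-image visual randomization: brightness, blur, and occlusion,
# with ranges matching the text above. Hue/saturation shifts are omitted.
import numpy as np
from scipy.ndimage import gaussian_filter

def randomize_image(img, rng=np.random.default_rng()):
    """img: float32 array in [0, 1], shape (H, W, 3)."""
    out = img.copy()
    # Brightness variation: +/- 30%
    out *= rng.uniform(0.7, 1.3)
    # Gaussian blur: sigma 0-2 pixels (applied per channel, not across channels)
    sigma = rng.uniform(0.0, 2.0)
    out = gaussian_filter(out, sigma=(sigma, sigma, 0))
    # Random rectangular occlusion covering 5-20% of the image
    h, w = out.shape[:2]
    area = rng.uniform(0.05, 0.20) * h * w
    oh = min(h, max(1, int(np.sqrt(area * rng.uniform(0.5, 2.0)))))
    ow = min(w, max(1, int(area / oh)))
    y, x = rng.integers(0, h - oh + 1), rng.integers(0, w - ow + 1)
    out[y:y + oh, x:x + ow] = rng.uniform(0.0, 1.0, 3)  # random solid color
    return np.clip(out, 0.0, 1.0)
```

Call it once per rendered frame during training so every observation the policy sees is perturbed independently.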
For physical parameters, prioritize: contact friction coefficients (randomize by plus or minus 50%), object mass (plus or minus 30%), joint damping (plus or minus 20%), and actuator latency (add 0-20 ms random delay). Randomize contact stiffness for any task involving sustained contact.
Tip 2: Invest in Real-to-Sim Calibration
Default simulation parameters -- joint stiffness, damping, friction, inertia tensors -- often differ from your real robot by 10-50%. Before training, spend 2-4 hours doing system identification: move each joint through its range and measure the actual torque-position-velocity relationship. Use these measured values as the center of your domain randomization distribution. This step alone often reduces sim-to-real error by 30-50% on contact tasks.
Calibrate your camera model as well. Measure the actual intrinsic parameters (focal length, principal point, distortion coefficients) and extrinsic parameters (position and orientation relative to the robot base) of your real cameras and set your simulation cameras to match. Visual policies are surprisingly sensitive to camera parameter mismatches -- a 5% error in focal length can cause a 10-15% drop in grasping accuracy.
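The measured intrinsics map directly onto simulation camera settings. Simulators such as MuJoCo specify cameras by vertical field of view, which follows from the calibrated focal length by the standard pinhole relation (a small sketch; `fovy_from_intrinsics` is a hypothetical helper name):

```python
# Convert measured pinhole intrinsics to the vertical field of view (fovy,
# in degrees) that simulator cameras expect. fy is the vertical focal length
# in pixels from your camera calibration; image_height is in pixels.
import math

def fovy_from_intrinsics(fy_pixels, image_height):
    return math.degrees(2.0 * math.atan(image_height / (2.0 * fy_pixels)))

# Example: a 640x480 camera with fy = 600 px
print(round(fovy_from_intrinsics(600.0, 480), 2))  # ~43.6 degrees
```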
Tip 3: Prefer Depth Over RGB for Visual Input
Photorealistic RGB rendering in simulation still does not match real-world cameras closely enough for direct zero-shot transfer on many tasks. Lighting models, material shaders, and shadow algorithms produce images that look similar to humans but differ in ways that neural network feature detectors are sensitive to. Depth images have a much smaller sim-to-real gap because depth is a direct representation of geometry, and simulation renders geometry accurately.
Teams that use depth as the primary visual input (with RGB as a secondary channel for semantic information) consistently report 20-40% improvement in zero-shot sim-to-real transfer on grasping tasks. If your task does not require color or texture information for object discrimination, use depth-only. If it does require color information (sorting objects by color, for example), use an RGB channel but apply aggressive visual domain randomization on it.
Tip 4: Use Privileged Information During Training
The privileged information technique -- sometimes called teacher-student training -- has become the standard approach for sim-to-real locomotion and is increasingly used for manipulation. The idea: train a teacher policy in simulation that has access to ground-truth state information that would be unavailable on real hardware (exact object pose, true friction coefficients, ground-truth contact locations). Then train a student policy that uses only the sensor observations available on the real robot to match the teacher's behavior through distillation.
This works because the teacher policy can solve the task optimally using perfect information, and the student learns to approximate that optimal behavior using only the noisy, partial observations it will have access to on real hardware. The teacher provides a much stronger training signal than raw reward alone. This technique was central to the success of quadruped locomotion transfer at ETH Zurich, Carnegie Mellon, and Unitree, and has been adapted for manipulation tasks involving contact sensing and force control.
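The distillation objective itself is simple. The sketch below shows the core loss under the assumption of a continuous action space and DAgger-style labeling, where the teacher (with privileged state) labels each visited state with its action; the networks are stand-ins here, represented by arrays:

```python
# Minimal sketch of the teacher-student distillation loss: the student, which
# sees only deployable observations, regresses the teacher's actions.
import numpy as np

def distillation_loss(student_actions, teacher_actions):
    """MSE between student and teacher actions over a batch."""
    return float(np.mean((student_actions - teacher_actions) ** 2))

# Toy example: 32 states, 7-DoF action vectors
rng = np.random.default_rng(0)
teacher_a = rng.normal(size=(32, 7))  # teacher labels from privileged state
student_a = teacher_a + rng.normal(scale=0.1, size=(32, 7))  # imperfect student
loss = distillation_loss(student_a, teacher_a)
print(f"distillation loss: {loss:.4f}")
```

In practice the states are collected by rolling out the student (not the teacher), so the teacher labels exactly the states the student will visit.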
Tip 5: Randomize Contact Parameters Specifically
For any task involving sustained contact -- insertion, sliding, pushing, surface following -- contact parameter randomization is critical and often under-emphasized. Randomize not just friction coefficients but also contact stiffness, contact damping, restitution (bounciness), and the solver iteration count for the contact model. The contact solver in simulation introduces artifacts (penetration, jitter, premature slip) that the real world does not have, and randomizing the solver parameters forces the policy to handle these artifacts rather than exploit them.
A practical approach: run your policy on 100 simulation trials with fixed contact parameters and record the contact force profiles. Then repeat with heavily randomized contact parameters. If the policy's success rate drops significantly with randomized contacts, it was exploiting specific simulation artifacts. Retrain with randomized contacts until the success rate is stable, then transfer to real hardware.
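The comparison above reduces to a simple check (a sketch; the function name and the 10-point drop threshold are illustrative choices, not from any library):

```python
# Sketch of the exploitation check: a large success-rate drop under randomized
# contact parameters suggests the policy was exploiting solver artifacts.
def contact_exploitation_check(fixed_successes, randomized_successes,
                               n_trials=100, max_drop=0.10):
    """Returns True if the policy appears to exploit fixed contact params."""
    fixed_rate = fixed_successes / n_trials
    rand_rate = randomized_successes / n_trials
    return (fixed_rate - rand_rate) > max_drop

# Example: 92/100 with fixed contacts, 61/100 randomized -> exploiting
print(contact_exploitation_check(92, 61))  # True: retrain with randomized contacts
```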
Tip 6: Start with Coarse Tasks Before Refining
Attempting to transfer a policy for a precision task directly from simulation is a recipe for frustration. Instead, decompose your task into a coarse version and a fine version. The coarse version -- approach the object, roughly align the gripper, make initial contact -- transfers well from simulation because it depends on geometry and trajectory planning, not precise contact dynamics. The fine version -- final alignment, insertion, controlled force application -- should be trained or fine-tuned on real data.
This hierarchical approach combines the volume of simulation with the fidelity of real data. The simulation policy handles the 80% of the task that involves free-space motion and rough positioning. A small amount of real data (50-200 demonstrations) handles the 20% that requires precise contact dynamics. This hybrid consistently outperforms both sim-only and real-only training when total data budget is limited.
Tip 7: Test Frequently on Real Hardware
The single most common mistake in sim-to-real projects is spending months optimizing in simulation before testing on real hardware. By the time the team discovers how their policy fails in reality, they have invested enormous effort optimizing for the wrong thing. Test on real hardware early and often -- every 1-2 weeks at minimum during active development.
Structure your real-hardware testing as systematic perturbation testing, not random evaluation. Test at 5-10 specific challenging positions: extreme workspace corners, objects near the edge of reachable space, objects at atypical heights or orientations. This structured evaluation reveals whether failures are concentrated at specific conditions (diagnosable and fixable) or randomly distributed (harder to fix, suggesting a fundamental gap). Log both simulation and real failure modes using the same categories and compare the distributions -- where they diverge points directly to the simulation parameters that need improvement.
Tip 8: Get Your Actuator Model Right
Default simulation actuator models assume idealized behavior: instant torque response, zero backlash, perfect position tracking. Real servo motors have bandwidth limits (typically 5-50 Hz for position commands), backlash in gear trains (0.1-2 degrees depending on the reduction ratio), and torque-speed curves that differ significantly from constant-torque models. For OpenArm 101's series elastic actuators, the compliance in the transmission is a feature for safety but introduces dynamics that the default rigid-body actuator model in MuJoCo or Isaac Sim ignores entirely.
The fix: measure your actual actuator response. Send a step command to each joint and record the position response over time. Fit a second-order transfer function (natural frequency, damping ratio, and gain) to each joint. Use these fitted models as the actuator model in simulation. For MuJoCo, set the gainprm, biasprm, and dynprm attributes on each actuator to match. For Isaac Sim, configure the PD gains and damping in the ArticulationController. This step takes 2-4 hours and typically reduces joint tracking error in sim-to-real transfer by 30-50%.
Tip 9: Randomize Materials and Textures Systematically
For visual policies, the appearance of objects and surfaces in simulation must either match reality closely or be randomized broadly enough that reality falls within the randomized distribution. The second approach is more robust. Effective texture randomization includes: random solid colors across the full HSV range for all objects and surfaces, procedural texture generation (Perlin noise, checkerboard, gradient) applied to table surfaces and backgrounds, and random HDRI environment maps for lighting (download 20-50 free HDRI maps from Polyhaven and cycle through them during training).
A counterintuitive finding: photorealistic rendering is often counterproductive for sim-to-real transfer. A policy trained with photorealistic simulation rendering learns to exploit visual details (specific shadow patterns, surface reflections) that do not match reality. A policy trained with heavily randomized, unrealistic textures learns geometric and structural features that transfer better. If you have access to high-quality rendering (Isaac Sim Omniverse), use it for evaluation and debugging, not for training data generation.
Tip 10: Implement a Physics Parameter Identification Pipeline
Beyond actuator modeling, the physical properties of objects in your task significantly affect contact behavior. Measure and calibrate the critical parameters:
- Object mass: Weigh every object your robot will manipulate. Set simulation masses to match. A 10% mass error on a heavy object produces noticeable force and trajectory errors.
- Friction coefficients: Place objects on an inclined surface and record the angle at which they begin to slide. Compute static friction coefficient as tan(angle). Set simulation friction accordingly and randomize +/-30% around the measured value.
- Center of mass: For asymmetric objects, the center of mass affects grasp stability and transport dynamics. Measure by balancing the object on a point or using a suspension method. Offset simulation COM accordingly.
- Inertia tensor: For most tabletop manipulation, approximating objects as uniform-density solids with the correct mass and geometry is sufficient. For irregular or hollow objects, measure or estimate the principal moments of inertia.
This calibration pipeline should be automated and repeated whenever your object set changes. SVRC's RL environment service includes a standardized physics identification protocol for customer hardware.
Tip 11: Choose the Right Contact Model for Your Task
Different simulators offer different contact models, and the choice matters more than most teams realize. MuJoCo's convex contact model is fast and stable but produces artifacts with non-convex geometries (objects interpenetrate at concavities). Isaac Sim's GPU-accelerated PhysX uses a compliant contact model that handles non-convex shapes better but can produce oscillations at low contact stiffness settings. Genesis offers differentiable contact, which is powerful for optimization but less mature for general-purpose contact simulation.
For grasping tasks with simple grippers: MuJoCo's default contact model is usually sufficient. For assembly tasks with tight clearances: increase the solver iteration count (MuJoCo: the iterations and noslip_iterations options; Isaac Sim: solver position and velocity iteration counts) and use mesh-based collision rather than primitive shapes. For deformable or soft objects: use FEM-based soft body simulation (available in Isaac Sim and SOFA) rather than rigid body approximations.
Tip 12: Use Asymmetric Actor-Critic for RL-Based Transfer
If you are using reinforcement learning in simulation (rather than imitation learning), the asymmetric actor-critic architecture is now standard practice for sim-to-real transfer. The critic (value function) has access to privileged ground-truth state information during training in simulation. The actor (policy) uses only the observations available on real hardware. This allows the critic to provide accurate value estimates that guide the actor to learn effective behaviors using only real-world-compatible observations.
This is the architecture behind the successful locomotion transfer work at ETH Zurich (ANYmal), Carnegie Mellon, and Unitree. For manipulation, the pattern is the same: the critic sees exact object poses and contact states; the actor sees only camera images and proprioception. The result is a policy that uses only real-world sensors but was guided by perfect information during training.
Tip 13: Build a Sim-to-Real Regression Test Suite
Treat sim-to-real transfer like software engineering: build a test suite and run it on every policy update. Define 10-20 specific test configurations (object types, positions, orientations) and execute each in both simulation and real hardware. Track the correlation between sim and real success rates over time. A healthy sim-to-real pipeline shows >0.8 correlation between sim and real performance across test configurations. If correlation drops below 0.6, something in your simulation has diverged from reality (camera moved, object set changed, actuator degraded) and needs recalibration.
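The health metric is just the Pearson correlation over per-configuration success rates, sketched here with illustrative numbers:

```python
# Sketch of the regression-suite health check: correlation between
# per-test-configuration success rates in simulation and on real hardware.
import numpy as np

def sim_real_correlation(sim_rates, real_rates):
    return float(np.corrcoef(sim_rates, real_rates)[0, 1])

sim = [0.95, 0.90, 0.80, 0.85, 0.70, 0.60]   # per-config sim success rates
real = [0.85, 0.80, 0.65, 0.75, 0.55, 0.45]  # same configs on hardware
r = sim_real_correlation(sim, real)
print(f"sim-real correlation: {r:.2f}")  # > 0.8 indicates a healthy pipeline
```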
Tip 14: Handle Observation Space Differences Explicitly
Simulation provides perfect proprioception: exact joint positions, velocities, and accelerations at every timestep. Real robots have encoder noise, quantization, communication delays, and occasional dropped readings. Add realistic noise to simulated observations: Gaussian noise with standard deviation matched to your encoder specifications (typically 0.001-0.01 radians for joint positions), random dropped readings at 0.1-1% rate, and quantization to match your actual encoder resolution. Without these, the policy learns to rely on precision that is not available on real hardware.
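A minimal sketch of this observation corruption (the helper name and the hold-last-reading policy for drops are assumptions; the noise, quantization, and drop values follow the ranges above):

```python
# Sketch of realistic proprioception noise: Gaussian encoder noise,
# quantization to encoder resolution, and occasional dropped readings,
# where a drop holds the previous reading the policy saw.
import numpy as np

def corrupt_joint_obs(q, prev_q, rng, noise_std=0.005,
                      resolution=0.001, drop_rate=0.005):
    """q: true joint positions (rad); prev_q: last reading the policy saw."""
    noisy = q + rng.normal(0.0, noise_std, q.shape)
    quantized = np.round(noisy / resolution) * resolution
    dropped = rng.random(q.shape) < drop_rate
    return np.where(dropped, prev_q, quantized)

rng = np.random.default_rng(0)
q_true = np.array([0.10, -0.52, 1.31, 0.0, 0.78, -1.05, 0.33])
obs = corrupt_joint_obs(q_true, q_true, rng)
print(obs)
```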
Tip 15: Know When Your Gap Is Unbridgeable
Some sim-to-real gaps cannot be closed with current techniques, and recognizing this early saves months of effort. Signs that direct sim-to-real transfer will not work for your task: your task success depends on material properties that vary between object instances (fabric stiffness, surface texture friction), your task requires sub-millimeter precision in contact (and your robot has >1mm repeatability), or your task involves fluids, granular materials, or highly deformable objects. In these cases, use simulation for the coarse phases of the task and collect real data for the fine phases, or skip simulation entirely and use SVRC's real-data collection services from the start.
Simulator-Specific Tips
| Simulator | Best Practice | Common Pitfall |
|---|---|---|
| Isaac Sim | Use GPU-accelerated environments with at least 256 parallel instances for RL; disable ray-traced rendering during training | Leaving ray tracing on tanks throughput 10x; Omniverse USD scene setup takes days if unfamiliar |
| MuJoCo | Call mj_step several times per control action (frame skip) with a small timestep for contact stability; set solimp and solref per-geom for different materials | Default solver settings produce unstable contacts for tight assemblies; convex decomposition needed for non-convex meshes |
| Genesis | Leverage differentiable physics for system identification (optimize sim parameters to match real trajectories) | Ecosystem is less mature; fewer pre-built robot models; community support is smaller |
| PyBullet | Good for quick prototyping and education; use URDF models directly from manufacturer | Contact physics is less accurate than MuJoCo; no longer actively developed; avoid for production sim-to-real |
Physics Parameter Identification: Measurement Protocol
System identification is the highest-ROI step in any sim-to-real pipeline, yet most teams skip it or do it haphazardly. Here is a rigorous measurement protocol that takes 4-6 hours and dramatically improves transfer fidelity.
Joint dynamics identification (2 hours). For each joint of your robot arm:
- Command a sinusoidal position trajectory at 0.1 Hz, 0.5 Hz, 1 Hz, and 2 Hz. Record commanded position, actual position, and motor current at 1 kHz.
- Compute the frequency response (gain and phase) at each frequency. Fit a second-order transfer function: H(s) = K * omega_n^2 / (s^2 + 2*zeta*omega_n*s + omega_n^2). Record K (gain), omega_n (natural frequency), and zeta (damping ratio) for each joint.
- Measure backlash: command a slow ramp up then ramp down. The hysteresis width in the position trace is the backlash. Typical values: 0.1-0.3 deg for harmonic drives (Franka), 0.5-2.0 deg for planetary gearboxes (OpenArm, Kinova), 0.02-0.05 deg for direct drive.
- Measure friction: record the torque required to move each joint at very low speed (< 0.01 rad/s). This is the Coulomb friction. Then measure torque vs. speed at higher velocities to estimate viscous friction. Model: tau_friction = tau_coulomb * sign(v) + b_viscous * v.
Object property measurement (1 hour per 10 objects).
- Mass: Kitchen scale accurate to 1g. Record to the gram.
- Friction coefficient: Place object on a flat surface attached to a digital inclinometer. Increase angle slowly until the object slides. mu_s = tan(angle). Measure on at least 3 surfaces (metal, wood, rubber mat). Use the mean as simulation default, the range as randomization bounds.
- Center of mass (asymmetric objects): Suspend the object from two different points using a string. The intersection of the plumb lines through the two suspension points gives the COM in 2D. Repeat with a third suspension for 3D. Accuracy: approximately 5mm, which is sufficient for most manipulation.
- Coefficient of restitution: Drop the object from 30 cm onto a hard surface and record bounce height. COR = sqrt(h_bounce / h_drop). Most manipulation objects have COR of 0.1-0.5.
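The incline and drop tests above reduce to two one-line formulas:

```python
# Friction and restitution from the bench measurements described above.
import math

def static_friction(slide_angle_deg):
    """mu_s from the incline test: angle at which the object starts to slide."""
    return math.tan(math.radians(slide_angle_deg))

def restitution(h_drop, h_bounce):
    """Coefficient of restitution from the drop test."""
    return math.sqrt(h_bounce / h_drop)

print(round(static_friction(20.0), 3))    # object slides at 20 deg -> 0.364
print(round(restitution(0.30, 0.03), 3))  # 30 cm drop, 3 cm bounce -> 0.316
```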
```python
# sys_id.py -- Minimal joint dynamics identification
import numpy as np
from scipy.optimize import curve_fit

def second_order_response(t, K, wn, zeta):
    """Second-order underdamped step response to a unit step command."""
    wd = wn * np.sqrt(1 - zeta**2)
    return K * (1 - np.exp(-zeta * wn * t) *
                (np.cos(wd * t) + (zeta / np.sqrt(1 - zeta**2)) * np.sin(wd * t)))

# Record a step response: send a 0.1 rad step command and log at 1 kHz.
# Replace the synthetic data below with your logged (t, position) arrays:
# t_data, pos_data = record_step_response(joint=3, amplitude=0.1)
t_data = np.linspace(0.0, 0.5, 500)
pos_data = second_order_response(t_data, 0.98, 45.0, 0.75)
pos_data += np.random.default_rng(0).normal(0.0, 0.002, t_data.shape)  # encoder noise

# Fit parameters
popt, pcov = curve_fit(second_order_response, t_data, pos_data,
                       p0=[1.0, 50.0, 0.7],
                       bounds=([0.5, 5, 0.1], [1.5, 200, 1.0]))
K, wn, zeta = popt
print(f"Joint 3: K={K:.3f}, wn={wn:.1f} rad/s, zeta={zeta:.3f}")
# Typical values for OpenArm 101: K=0.95-1.0, wn=30-60, zeta=0.6-0.9
```
These measured parameters go directly into your simulation configuration. For MuJoCo, a position-servo actuator with proportional gain kp and damping kv takes gainprm = [kp, 0, 0] and biasprm = [0, -kp, -kv]; choose kp = I * wn^2 and kv = 2 * I * zeta * wn, where I is the reflected inertia of the joint, so the closed-loop response matches the identified natural frequency and damping. For Isaac Sim, configure the PD gains in the ArticulationController the same way.
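The gain mapping can be written out explicitly. A sketch, assuming a rigid joint with reflected inertia I driven by a PD servo, so I·q̈ = kp·(q_ref − q) − kv·q̇ has natural frequency wn = sqrt(kp/I) and damping ratio zeta = kv / (2·sqrt(kp·I)):

```python
# Map identified joint dynamics (wn, zeta) to PD servo gains, assuming a
# rigid joint with reflected inertia I: kp = I*wn^2, kv = 2*I*zeta*wn.
def pd_gains_from_sysid(wn, zeta, inertia):
    kp = inertia * wn ** 2
    kv = 2.0 * inertia * zeta * wn
    return kp, kv

# Example joint: wn = 45 rad/s, zeta = 0.75, reflected inertia 0.02 kg*m^2
kp, kv = pd_gains_from_sysid(wn=45.0, zeta=0.75, inertia=0.02)
print(f"kp={kp:.2f} N*m/rad, kv={kv:.3f} N*m*s/rad")  # kp=40.50, kv=1.350
```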
Sim-to-Real Transfer Success Rate by Task Type
Based on published results and SVRC evaluation data, the following table shows expected sim-to-real transfer success rates by task category, assuming the techniques described above are applied correctly.
| Task Category | Sim Success | Real Transfer (naive) | Real Transfer (w/ DR + SysID) | Gap Closure |
|---|---|---|---|---|
| Flat-ground locomotion | 98% | 70-80% | 90-95% | High |
| Rigid object pick-and-place | 95% | 40-55% | 75-85% | Medium-High |
| Door/drawer opening | 92% | 30-45% | 65-80% | Medium |
| Peg insertion (>1mm tolerance) | 90% | 15-25% | 55-70% | Medium |
| Precision assembly (<0.5mm) | 88% | 5-15% | 30-50% | Low-Medium |
| Cloth folding | 85% | 5-10% | 15-30% | Low |
| Fluid pouring | 80% | < 5% | 10-20% | Very Low |
The pattern is clear: transfer success correlates inversely with contact complexity. Tasks dominated by free-space motion and simple rigid contacts transfer well. Tasks dominated by complex contact dynamics (deformable materials, tight tolerances, fluids) transfer poorly regardless of technique. For tasks in the "low" gap closure category, simulation remains valuable for pre-training and coarse skill learning, but real-data fine-tuning is essential for deployment-quality performance.
Domain Randomization Configuration: A Complete Example
Here is a practical domain randomization configuration for a tabletop manipulation task in MuJoCo, covering the parameters that matter most for sim-to-real transfer.
```python
# domain_randomization.py -- MuJoCo domain randomization for manipulation
import numpy as np
import mujoco

class DomainRandomizer:
    """Randomize physics and visual parameters each episode."""

    def __init__(self, model):
        self.model = model
        # Snapshot nominal values so per-episode perturbations never accumulate
        self.defaults = {
            'friction': model.geom_friction.copy(),
            'mass': model.body_mass.copy(),
            'damping': model.dof_damping.copy(),
            'cam_pos': model.cam_pos.copy(),
            'cam_quat': model.cam_quat.copy(),
            'light_pos': model.light_pos.copy(),
        }

    def randomize(self):
        m, d = self.model, self.defaults
        # Contact friction: +/- 50% (critical for grasping)
        m.geom_friction[:] = d['friction'] * np.random.uniform(0.5, 1.5, m.geom_friction.shape)
        # Object mass: +/- 30%
        for i in range(m.nbody):
            if d['mass'][i] > 0.01:  # Skip fixed / near-massless bodies
                m.body_mass[i] = d['mass'][i] * np.random.uniform(0.7, 1.3)
        # Joint damping: +/- 25%
        m.dof_damping[:] = d['damping'] * np.random.uniform(0.75, 1.25, m.dof_damping.shape)
        # Actuator latency: 0-20ms random delay (model in policy loop)
        self.actuator_delay_ms = np.random.uniform(0, 20)
        # Camera perturbation: +/- 3cm position, small orientation jitter
        for cam_id in range(m.ncam):
            m.cam_pos[cam_id] = d['cam_pos'][cam_id] + np.random.uniform(-0.03, 0.03, 3)
            q = d['cam_quat'][cam_id] + np.random.uniform(-0.05, 0.05, 4)
            m.cam_quat[cam_id] = q / np.linalg.norm(q)
        # Lighting randomization
        for light_id in range(m.nlight):
            m.light_diffuse[light_id] = np.random.uniform(0.3, 1.0, 3)
            m.light_pos[light_id] = d['light_pos'][light_id] + np.random.uniform(-0.5, 0.5, 3)
```
Call randomizer.randomize() at the start of each episode during training. The policy will initially perform worse than with fixed parameters, but after convergence, it will transfer substantially better to real hardware because it has learned to be robust to parameter variation rather than exploiting specific simulation values.
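The sampled actuator latency has to be applied in the policy loop rather than inside the physics model. One common way is a FIFO buffer sized to the sampled delay (a sketch; the class name is illustrative):

```python
# Sketch of modeling randomized actuator latency in the policy loop: hold
# actions in a FIFO so the sim executes the action the real actuator would
# still be executing delay_ms later.
from collections import deque

class ActionDelayBuffer:
    def __init__(self, delay_ms, control_hz=50):
        steps = max(0, round(delay_ms / (1000.0 / control_hz)))
        self.buf = deque(maxlen=steps + 1)

    def step(self, action):
        """Push the newest action; return the delayed one to apply."""
        self.buf.append(action)
        return self.buf[0]

buf = ActionDelayBuffer(delay_ms=20, control_hz=50)  # 20 ms = 1 step at 50 Hz
print(buf.step("a0"))  # a0 (buffer still filling)
print(buf.step("a1"))  # a0 (one-step delay)
print(buf.step("a2"))  # a1
```

Resample the delay each episode (as in the randomizer above) and construct a fresh buffer with it.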
Diagnosing Sim-to-Real Failures: A Systematic Approach
When a sim-trained policy fails on real hardware, use this diagnostic framework to identify the specific gap causing the failure.
- Record failure videos on real hardware. Capture at least 10 failure episodes with all cameras active. Classify failures into categories: approach error, grasp error, transport error, placement error.
- Run the same initial conditions in simulation. Set up the simulation to match the real failure conditions as closely as possible. Does the policy succeed in sim with these conditions? If yes, the gap is in the physics or visual domain. If no, the policy has a fundamental capability gap that exists even in sim.
- Compare visual observations. Place the sim camera image and the real camera image side by side at the failure timestep. Are the images meaningfully different in ways the policy might be sensitive to? Check lighting, reflections, shadows, and object texture.
- Compare proprioceptive state. Plot joint position tracking error between commanded and actual positions on real hardware. If tracking error exceeds 2-3 degrees on any joint, the actuator model in sim is likely the problem.
- Compare contact behavior. If the failure occurs during contact (grasp, insertion), compare the F/T sensor readings on real hardware with the simulated contact forces. Large discrepancies (>50% difference in peak force) indicate contact model issues.
This diagnostic pipeline typically identifies the root cause within 2-3 iterations. The most common causes, in order of frequency: (1) camera position/calibration mismatch, (2) actuator model inaccuracy, (3) contact friction mismatch, (4) visual domain gap. Addressing #1 and #2 resolves 60-70% of sim-to-real failures.
Simulator Selection: Isaac Sim, MuJoCo, or Genesis
NVIDIA Isaac Sim (built on PhysX 5, integrated with Omniverse) is the leading choice for high-fidelity simulation as of 2026. Its GPU-accelerated physics enables thousands of parallel simulation instances, making reinforcement learning tractable for complex tasks. Isaac Sim also offers the best rendering quality for visual policy training. The main drawbacks are setup complexity, hardware requirements (high-end NVIDIA GPU), and the learning curve for the Omniverse ecosystem.
MuJoCo (now open-source from DeepMind) remains the standard for fast, accurate contact physics in research settings. It is faster per-environment than Isaac Sim, has a simpler API, and offers the most extensive ecosystem of pre-built environments and benchmarks. MuJoCo is the right choice when you need fast iteration on policy architecture and reward design and do not need photorealistic rendering. Its contact model is well-characterized and produces consistent results.
Genesis is a newer simulator that emphasizes speed and differentiability. It supports differentiable physics, enabling gradient-based optimization through the simulation, which can accelerate contact-rich task learning. Genesis is gaining adoption for tasks where differentiable simulation provides a clear advantage -- parameter optimization, trajectory optimization -- but its ecosystem is less mature than MuJoCo or Isaac Sim.
Visual Domain Randomization: Bridging the Rendering Gap
Physics parameter randomization handles the dynamics gap, but the visual gap between simulated and real camera images is often the dominant source of transfer failure for vision-based policies. Visual domain randomization addresses this by training the policy to be invariant to visual appearance.
| Visual Parameter | Randomization Range | Impact on Transfer | Notes |
|---|---|---|---|
| Object texture | Random RGB per face or procedural noise | High (+15-25%) | Prevents policy from relying on sim-specific textures |
| Lighting position + color | +/-1m position, 3000K-6500K color temp | High (+10-20%) | Shadows are the biggest visual gap; randomize shadow direction |
| Camera position + orientation | +/-3cm position, +/-5deg rotation | Medium (+8-15%) | Matches real-world camera mounting imprecision |
| Table/background color | Random RGB or texture from dataset | Medium (+8-12%) | Prevents background shortcuts; use real-photo textures for best results |
| Distractor objects | 0-5 random objects in workspace | Medium (+5-10%) | Crucial for cluttered deployment environments |
| Camera noise + blur | Gaussian noise (sigma 0-0.05), motion blur (0-3px) | Low (+3-5%) | Simulates real camera imperfections; more important for low-light deployment |
The order of importance for visual randomization is: object textures > lighting > camera position > background > distractors > noise. Teams with limited engineering time should focus on the top three. For photorealistic alternatives, training with Isaac Sim's path-traced rendering eliminates the need for aggressive texture randomization but requires significantly more GPU time per episode.
The Hybrid Approach: Sim Pre-Training + Real Fine-Tuning
The highest-performing sim-to-real pipeline in 2026 combines large-scale simulation training with a small amount of real-world fine-tuning. This hybrid approach leverages simulation's scalability for learning coarse skills and real data's fidelity for closing the final gap.
```python
# hybrid_sim_real_pipeline.py -- Sim pre-train + real fine-tune
# Note: train_policy and finetune_policy are placeholders for your
# training stack's entry points.

def hybrid_pipeline(task_name, sim_episodes=10000, real_episodes=100):
    """
    Stage 1: Pre-train in simulation with domain randomization
    Stage 2: Fine-tune on real-world demonstrations
    Stage 3: Evaluate on real hardware
    """
    # Stage 1: Sim pre-training (runs on GPU cluster, 4-24 hours)
    sim_config = {
        "environment": f"sim/{task_name}",
        "domain_randomization": True,
        "visual_randomization": True,
        "num_episodes": sim_episodes,
        "architecture": "act",
        "epochs": 500,
        "checkpoint_dir": f"checkpoints/{task_name}_sim",
    }
    sim_model = train_policy(**sim_config)

    # Stage 2: Real fine-tuning (uses sim checkpoint as init)
    real_config = {
        "dataset": f"data/real/{task_name}",
        "num_episodes": real_episodes,
        "pretrained_checkpoint": sim_model.checkpoint_path,
        "learning_rate": 1e-5,          # 10x lower than sim training
        "epochs": 200,
        "freeze_encoder_epochs": 100,   # Freeze visual encoder initially
        "checkpoint_dir": f"checkpoints/{task_name}_hybrid",
    }
    hybrid_model = finetune_policy(**real_config)

    # Expected results:
    #   Sim-only: 40-60% real success (pick-place)
    #   Real-only (100 demos): 70-80% success
    #   Hybrid (10K sim + 100 real): 80-90% success
    return hybrid_model
```
The hybrid approach typically matches the performance of 3-5x more real demonstrations: 100 real demos plus 10K sim episodes achieves comparable performance to 300-500 real-only demos. The economics favor the hybrid approach when sim environment setup is achievable within 1-2 weeks of engineering time. SVRC's RL environment service provides pre-built simulation environments for common manipulation tasks, reducing the setup time to 1-2 days.
Real-World Fine-Tuning Tips After Sim Pre-Training
The fine-tuning stage is where most teams make mistakes that negate the benefits of sim pre-training. Key guidelines:
- Use a 10x lower learning rate than sim training. The sim-pretrained model has already learned visual features and coarse motor skills. A high learning rate overwrites these and you lose the sim pre-training benefit entirely.
- Freeze the visual encoder for the first 50% of fine-tuning epochs. The sim-pretrained encoder has learned visual features that may be more general than what 100 real demos can teach. Let the action prediction head adapt first, then unfreeze the encoder with a very low LR (1/100th of the action head LR).
- Mix 10-20% sim data during fine-tuning. Adding a small fraction of sim episodes to the real fine-tuning batches prevents catastrophic forgetting of sim-learned skills and acts as a regularizer.
- Collect real demos that target sim failure modes. Run the sim-pretrained policy on real hardware before collecting fine-tuning data. Identify the specific failure modes (usually contact dynamics or visual appearance mismatch) and focus your real data collection on those failure conditions rather than collecting uniformly.
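The schedule implied by these guidelines can be captured in one small function (a sketch; the function name, the 15% sim-mix value within the 10-20% range, and the base LR are illustrative assumptions):

```python
# Sketch of the fine-tuning schedule: 10x lower base LR than sim training,
# encoder frozen for the first half of epochs, then unfrozen at 1/100th of
# the head LR, with a fixed fraction of sim data mixed into each batch.
def finetune_schedule(epoch, total_epochs=200, sim_lr=1e-4):
    base_lr = sim_lr / 10.0                 # 10x lower than sim training
    freeze_encoder = epoch < total_epochs // 2
    encoder_lr = 0.0 if freeze_encoder else base_lr / 100.0
    return {"freeze_encoder": freeze_encoder,
            "head_lr": base_lr,
            "encoder_lr": encoder_lr,
            "sim_mix_fraction": 0.15}       # 10-20% sim data as regularizer

print(finetune_schedule(epoch=10))   # encoder frozen, head trains at 1e-5
print(finetune_schedule(epoch=150))  # encoder unfrozen at a very low LR
```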
When to Skip Sim Entirely
Simulation is not always the right choice. Skip simulation and go directly to real data collection when: your task involves deformable objects or materials that are poorly simulated (cloth, cables, food); you have access to fast real-world data collection (SVRC's data services can collect 500+ episodes per day); your task requires fewer than 1,000 demonstrations; or when the effort to build an accurate simulation environment exceeds the effort to collect real data.
The decision framework is simple: estimate the cost of building and calibrating a simulation environment for your specific task (including engineering time, hardware for rendering, and the debugging time for sim-to-real transfer). Compare it to the cost of collecting the equivalent amount of real data. For many manipulation tasks in 2026, the real-data path is faster and more predictable. Simulation excels when you need millions of episodes (reinforcement learning), when the task transfers well (locomotion), or when real data collection is dangerous or expensive (surgical robotics, hazardous environments).
Start Your Transfer Pipeline
SVRC's RL environment service provides managed simulation environments with system identification and physics calibration for your specific hardware. For teams pursuing the hybrid approach, we also offer real-data collection through our data services to supplement your simulation training with the real-world demonstrations that close the final gap.
Related Reading
- Robot Policy Generalization: Why Your Robot Fails on New Objects
- Robot Learning vs. Classical Robotics: When to Use Which
- Scaling Laws for Robot Learning: What We Know in 2026
- Robot Deployment Checklist: 12 Steps Before Going Live
- ACT vs. Diffusion Policy: When to Use Which
- SVRC RL Environment Service
- SVRC Data Collection Services