How Robots Walk May 15, 2025
There are two popular approaches to making walking robots. The control theory approach is to carefully design a gait and a set of feedback loops to achieve a stable walk. The deep reinforcement learning approach is to tune an end-to-end deep neural network with RL in a simulator. The deep RL approach has shown spectacular successes in the last couple of years, but it has the usual neural network disadvantage of being opaque: you can't read the matrices to help understand how it works.
What follows is a tour of the code for a walking robot designed with a combination of the two approaches. I first wrote the best feedback loops I could, using my understanding of which instabilities needed to be actively corrected. The parameters were rough guesses, generally with the right order of magnitude but not capable of stabilizing the robot. Then I tuned the parameters against a simulator with reinforcement learning and a simple reward function. The result is a viable walking control algorithm with far fewer parameters than a DNN: the code shown here has only 125, while a DNN would have millions.
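The tune-against-a-simulator idea can be sketched in a few lines of Python. This is not the optimizer actually used (the article only says reinforcement learning with a simple reward function); here a plain hill-climber stands in for it, and `simulate` with its toy reward is an illustrative placeholder for running the physics simulation and scoring the walk.

```python
import random

random.seed(0)

# Toy stand-in for the simulator: reward is higher when the 125
# parameters are near some unknown "good" values. In the real system
# this would run the physics simulation and score the resulting walk.
GOOD = [random.uniform(-1, 1) for _ in range(125)]

def simulate(params):
    return -sum((p - g) ** 2 for p, g in zip(params, GOOD))

def tune(params, iters=2000, sigma=0.1):
    """Simple hill-climbing: perturb every parameter a little and keep
    the change whenever the simulated reward improves."""
    best = simulate(params)
    for _ in range(iters):
        trial = [p + random.gauss(0.0, sigma) for p in params]
        r = simulate(trial)
        if r > best:
            params, best = trial, r
    return params, best

start = [0.0] * 125   # the rough hand-written guesses would go here
tuned, reward = tune(start)
```

The point of the sketch: because the hand-written structure already encodes the feedback loops, the optimizer only has to move 125 numbers, not millions of weights.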
The walking code, the simulator, and the visual environment are all open-source. So you can download, run, and modify it. The walking code described here, which does all the balance and control feedback, is only about 300 lines. The walking isn't awesome – it's slow and not very steady – but it demonstrates many of the hard control problems involved in robot walking.
Here's what the simulated robot does.
The walking code has to do the following:
- Generate a repeating pattern, the left-right-left-right of walking
- Estimate the robot's orientation relative to gravity, its velocity relative to the ground, and the positions of the feet in the inertial coordinate system. This is easier (but still not trivial) in a simulator than on hardware.
- Generate baseline angles and torques for the joints
- Generate baseline inertial positions of the feet while walking, and compare to the actual positions
- When the robot is slightly out of position, apply feedback to correct it. There are more ways to be out of position than just falling front-back or left-right, and each needs to be actively corrected.
- When the simulation starts, achieve a stable standing position before starting to walk.
- Start and stop walking at the right point in the cycle.
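The steps above can be compressed into a schematic per-tick control loop. Everything here, the names, the gains, and the sine baseline, is an illustrative stand-in, not the actual throbol cells:

```python
import math

def control_tick(phase, sensors, dt=0.002):
    """One schematic control step: advance the gait phase, derive a
    baseline joint target from the cycle, and apply feedback toward it.
    All names and gains are illustrative, not the real code."""
    # Repeating left-right pattern: phase advances at roughly 1 Hz.
    phase = (phase + dt * 2 * math.pi) % (2 * math.pi)
    # Baseline joint angle driven by the cycle.
    target = 0.3 * math.sin(phase)
    # Feedback toward the target, plus a balance-correction term.
    torque = 7.5 * (target - sensors["joint"]) - 4.1 * sensors["tilt"]
    return phase, torque

phase, torque = control_tick(0.0, {"joint": 0.0, "tilt": 0.0})
```

The rest of the article fills in each of these pieces with the real formulas.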
The robot here is a simulation of a humanoid, part of Mujoco's standard library. Without the code presented here, it just faceplants.
The graphics here are captured from the throbol editor. Throbol is a programming language for robots, and the editor is a kind of spreadsheet for real-time control problems. It has cells and formulas like a conventional spreadsheet, but a cell's value is a time series instead of just a single value. Below each formula it shows a graph of the value over the 10-second duration of the simulation. You can click on any of the formulas or cell names to see a live version (if your browser supports wasm+webgpu).
Stabilizing the Torso
Let's start with a simple feedback loop. The formula below controls the abdomen X axis joint, where the robot bends left or right at the waist. It's a simple "P" control loop: the joint torque (bot.abdomen-x) is a constant (7.5) times the difference between the target position (targ.abdomen-x) and the measured position (bot.j.abdomen-x). The graph shows the torque applied to stabilize the joint.
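In Python terms the loop is just the following; the gain 7.5 comes from the formula above, the example numbers are made up:

```python
def p_control(target, measured, gain=7.5):
    """Proportional control: torque pushes the joint toward its target,
    scaled by the gain (7.5 in the formula above)."""
    return gain * (target - measured)

# If the abdomen is 0.1 rad to one side of its target, push it back:
torque = p_control(target=0.0, measured=0.1)   # -0.75
```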
The target position for the joint is a combination of a cyclic motion (walk-cycle.lr) and some feedback for whether it's falling left or right (fb.falling-right).
And we can see that the actual position approximately tracks the target position:
Balance Feedback
So far we've seen feedback to keep an individual joint near its target position. We also need higher level feedback about whether the robot is falling left or right, front or back. For example, the torso control loop above includes feedback from the overall balance error fb.falling-back. Its value is positive when the robot is falling back, negative when falling forward.
The other term in the feedback is fb.egovel.floor: the velocity of the floor relative to the egocenter of the robot. In robotics, it's convenient to work with egocentric coordinates, where some point in the robot's body is defined as [0,0,0] and the world moves around it; Z always points up relative to gravity, and X always points forward relative to the robot. Walking forward thus looks like the floor moving backwards, so the egovel.floor cell has a negative X coordinate when walking.
The robot needs feedback to control how fast it's walking. Too slow and it should lean forward a bit to accelerate, and vice-versa. We set the feedback term to the difference between a goal walking speed and the actual speed:
where the goal for this slow walk is for the floor to be moving backwards at 0.13 m/s:
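As a sketch of that error term: only the -0.13 m/s goal comes from the text; the function name and gain are illustrative.

```python
GOAL_FLOOR_VX = -0.13   # m/s: floor moves backwards when walking forward

def speed_feedback(floor_vx, gain=1.0):
    """Positive output means 'lean forward, speed up'. If the robot is
    too slow, floor_vx is closer to zero than the goal, so the error is
    positive; if it is too fast, the error goes negative."""
    return gain * (floor_vx - GOAL_FLOOR_VX)
```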
The other term in the feedback, err.frontback, is a little gnarlier:
The key idea here is that where the feet are matters more at some points in the walk cycle than others. We need to apply strong feedback if the feet on the ground are slightly away from where we expected them, but not when they're in the air. This is driven by cpg.lfb and cpg.rfb, which say how important left and right foot errors are at any point. As the name suggests, they come from the central pattern generator.
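A minimal sketch of that gating, assuming the weights behave as described (near 1 while a foot carries weight, near 0 while it swings); the numbers are made up:

```python
def frontback_error(l_err, r_err, lfb, rfb):
    """Foot placement errors, each weighted by how much that foot's
    position matters at this point in the cycle (cpg.lfb, cpg.rfb)."""
    return lfb * l_err + rfb * r_err

# Left foot planted and 5 cm off; right foot in the air and 30 cm off.
# Only the planted foot's error contributes:
e = frontback_error(l_err=0.05, r_err=0.30, lfb=1.0, rfb=0.0)   # 0.05
```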
The Central Pattern Generator
Walking requires a regular left-right-left-right pattern. Cyclic patterns can be represented as a Fourier series, so we take the dot product of an 8-term Fourier basis and some experimentally adjusted coefficients.
We then make a left and a right side of the cycle, offset by pi radians. These are complex numbers, so the graphs show the real part in red and the imaginary part in blue.
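Here is a sketch of that construction: an 8-term complex Fourier series, with the right side the same signal offset by pi radians. The coefficients below are placeholders, not the tuned values; they are chosen with energy only in the odd harmonics, which makes the right side the exact mirror of the left, the left-right antisymmetry of walking.

```python
import cmath
import math

# Placeholder coefficients, not the tuned values. Only odd harmonics
# are nonzero here, so the pi-offset right side is -1 times the left.
COEFFS = [0.0, 1.0, 0.0, 0.3, 0.0, 0.1, 0.0, 0.05]

def cpg(phase, offset=0.0):
    """Complex cycle value: dot product of an 8-term Fourier basis
    with the coefficients, evaluated at phase + offset (radians)."""
    return sum(c * cmath.exp(1j * k * (phase + offset))
               for k, c in enumerate(COEFFS))

left = cpg(1.0)
right = cpg(1.0, offset=math.pi)   # same cycle, offset by pi: the other leg
```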
Walking Gait
We can then drive the leg motion from these cyclic values. Here's the left knee joint, for example.
The actual joint position shows the regular spikes to about -1.1 radians, a reasonable amount to bend the knee while walking.
CPG Control
You may notice how irregular the timing is. That's because the CPG is itself affected by feedback, as it must be for balance. To visualize why, imagine the robot is on its right foot and also teetering over to the right. It needs to take a little longer before switching to the left foot. So the CPG's phase increases at a rate depending on cpg.pause.
For example, it pauses if it's falling right and it's on the right half of the cycle, and vice-versa.
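A sketch of the phase update under that rule. The stance detector and the all-or-nothing pause are simplified stand-ins for the real cpg.pause logic, and only the falling-right case is shown:

```python
import math

def phase_step(phase, falling_right, dt=0.002, base_rate=2 * math.pi):
    """Advance the CPG phase, but pause when the robot is falling
    toward the foot it is currently standing on, so it stays on that
    foot a little longer before switching."""
    on_right_half = (phase % (2 * math.pi)) >= math.pi   # crude stance test
    pause = 1.0 if (falling_right and on_right_half) else 0.0
    return phase + dt * base_rate * (1.0 - pause)
```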
We also use these feedback terms to gate applying torque at the ankles to stabilize the robot. This has to be done whether or not it's walking.
Parameters
You may have wondered why some numbers are written like the 4.1~10 above. The ~ syntax in throbol marks an adjustable parameter, here with a range of [-10 .. 10]. There's no theoretical reason for this feedback coefficient to be 4.1; it's just a number that seems to work. 4.0 would work about the same. 3.0 would probably work, but the robot would be a bit more wobbly. 1.0 may not be enough to keep it upright. 10.0 might cause violent feedback oscillation in the ankles. In robotics, most numbers are like this, so the language is designed to make it easy to adjust them. In the throbol editor, you can hover over a number marked with ~range and shift-drag the mouse to change it and see the simulation results change interactively. It also works on live hardware, within the limits of causality: in simulation mode it reruns the whole simulation from the beginning when you change a parameter, but in live hardware mode the change only applies for the rest of the run.
There are also some numerical optimization tools to automatically tune parameters based on a fitness function. See the article Optimization in Throbol for more about this.
Multiple Timelines
One of the key tools in working with feedback loops is comparing the results with slightly different initial conditions or perturbations. You want the robot to follow the trajectory despite some noise in the actuators, irregular ground, wind, etc. So it's useful to look at the variation across a range of disturbances. There are several tools for analyzing this, but the most intuitive is a rendering that superimposes 10 instances of the robot from different timelines:
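The idea can be demonstrated with a toy one-dimensional system: run the same feedback loop from several slightly different starting states and check that the trajectories collapse together. This illustrates the concept only; it is not throbol's actual multi-timeline machinery.

```python
import random

def run(x0, steps=200, gain=0.2, noise=0.01, seed=0):
    """A toy closed loop: P feedback pulls the state toward zero while
    actuator noise perturbs it. Same seed = same noise per timeline."""
    rng = random.Random(seed)
    x, traj = x0, []
    for _ in range(steps):
        x += -gain * x + rng.gauss(0.0, noise)
        traj.append(x)
    return traj

# Ten timelines with different initial conditions: a stable controller
# makes them converge, so the spread of final states is tiny.
finals = [run(i / 100 - 0.05)[-1] for i in range(10)]
spread = max(finals) - min(finals)
```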
Experimentation
The best way to build intuition for robot walking is to experiment: change parameters and see how things get worse or better. If you have a wasm+webgpu capable browser, you can try it directly in the browser, or download the desktop version and run it. It has humanoid_walking.tb as one of the example projects.
Here's my wish list for how the walking should be better:
- It swings its arms side-to-side too much, which looks dorky.
- There's some oscillation in the joint feedback loops. You can see it in the bot.ankle-y-right graph above.
- It takes small steps. It should walk faster.
- The walking speed feedback is weak.
- The direction control code is primitive. It should be able to follow a heading.