Optimization in Throbol

May 15 2025
Throbol is designed on the principle that tuning the parameters of robot control code usually takes more time than getting the structure of the code right. Parameters, marked with the ~range syntax, can be adjusted manually or with the built-in optimizers. This page describes the automatic parameter optimizers.
The first step in optimizing something is to design a fitness function that measures how good it is. Throbol follows the optimization literature's convention of minimization, rather than the RL convention of maximization, so the fitness function is a measure of badness. For example, the humanoid_walking example integrates the total amount of balance feedback over the run.
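The humanoid_walking cell code isn't reproduced here, but the idea translates directly: sum the magnitude of every balance correction over a run, so that a gait needing less corrective feedback scores as less bad. A hypothetical Python sketch (the `trace` input and function name are illustrative, not Throbol API):

```python
def balance_badness(trace):
    """Integrate total balance-feedback effort over a simulated run.

    `trace` is a hypothetical list of per-timestep feedback magnitudes.
    Lower total correction means the controller needed to fight less
    to stay upright, i.e. lower badness.
    """
    return sum(abs(f) for f in trace)
```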
Then we specify how to optimize the function. Here we use the open-source Nomad optimizer: we give it the balance function and a list of cells containing the parameters we want to optimize.
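The contract shared by all the optimizers is the same: a badness function plus an explicit set of tunable parameters, each with a range. As a language-neutral sketch (a toy random-search minimizer in Python, not Throbol's actual implementation):

```python
import random

def minimize_random(badness, params, iters=200, seed=0):
    """Toy gradient-free minimizer illustrating the optimizer contract:
    a badness function plus named parameters, each with a (lo, hi)
    range (the analogue of Throbol's ~range syntax). Start at the
    midpoints, perturb within range, and keep improvements."""
    rng = random.Random(seed)
    best = {k: (lo + hi) / 2 for k, (lo, hi) in params.items()}
    best_score = badness(best)
    for _ in range(iters):
        cand = {}
        for k, (lo, hi) in params.items():
            v = best[k] + rng.gauss(0, 0.1 * (hi - lo))  # small perturbation
            cand[k] = min(max(v, lo), hi)                # clamp to range
        score = badness(cand)
        if score < best_score:
            best, best_score = cand, score
    return best, best_score
```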
Typically you don't want to optimize every parameter in a sheet. For instance, the badness function here references err.side, which itself contains a parameter. If that parameter were in the optimizer's scope, the optimizer would turn the feedback gain down to zero, minimizing measured badness while ignoring the fact that the robot was falling over.
There are five optimizers: minimize.nomad, minimize.particle, minimize.derand1bin, minimize.debest2bin, and minimize.backprop.
Nomad is a popular gradient-free optimizer based on mesh-adaptive direct search: it starts by evaluating parameters on a coarse grid and progressively narrows its search region around the best candidates.
Particle search, Derand1bin, and Debest2bin are stochastic gradient-free search methods that maintain a pool of candidate solutions and add mutations according to some clever heuristics. Derand1bin is a good one to start with.
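The name derand1bin denotes the classic DE/rand/1/bin variant of differential evolution: each candidate is challenged by a trial vector built from three other random pool members, crossed in binomially. A sketch of one generation in Python (not Throbol's exact implementation):

```python
import random

def derand1bin_step(pool, badness, F=0.8, CR=0.9, rng=None):
    """One generation of DE/rand/1/bin differential evolution.

    For each candidate x, pick three distinct other pool members
    (a, b, c), form the mutant a + F*(b - c), binomially cross it
    with x, and keep whichever of trial/x scores lower. The pool's
    best badness can therefore never get worse."""
    rng = rng or random.Random(0)
    dim = len(pool[0])
    new_pool = []
    for i, x in enumerate(pool):
        a, b, c = rng.sample([p for j, p in enumerate(pool) if j != i], 3)
        jrand = rng.randrange(dim)  # guarantee at least one mutant gene
        trial = [a[k] + F * (b[k] - c[k])
                 if (rng.random() < CR or k == jrand) else x[k]
                 for k in range(dim)]
        new_pool.append(trial if badness(trial) < badness(x) else x)
    return new_pool
```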
Backprop search uses gradients. Most (but not quite all) functions in Throbol provide reverse-mode differentiation operators to support backpropagation. Calling minimize.backprop(foo, ...) sets a negative input gradient on foo, backpropagates it through the spreadsheet to the parameters, and applies the resulting gradients to the parameters using the Adam optimizer. Adam's hyperparameters can be set like minimize.backprop(foo, ...){learning_rate=3e-4, regularization=1e-6, beta1=0.9, beta2=0.99, minibatch_size=32}.
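For reference, the Adam update that those options configure keeps exponentially decayed estimates of the gradient's first and second moments and scales each step by their ratio. A single-parameter sketch in Python (the regularization and minibatch options above are omitted here):

```python
import math

def adam_step(param, grad, state, learning_rate=3e-4,
              beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam update for a scalar parameter.

    `state` is (m, v, t): the running first-moment estimate, the
    running second-moment estimate, and the step count. Defaults
    mirror the option names shown in the text."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # first moment
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return param, (m, v, t)
```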
The backprop optimizer is still a work in progress. It doesn't backpropagate through time: every timestep is treated independently. This works well for tuning some parameters, but not for parameters that mainly affect state variables whose effect isn't immediately visible in the badness function.