Parallel coordinate descent
Definition
Parallel coordinate descent is a variant of gradient descent in which we use a (possibly different) learning rate in each coordinate. Explicitly, whereas with ordinary gradient descent we define each iterate by subtracting a scalar multiple of the gradient vector from the previous iterate:
Ordinary gradient descent: $x^{(k+1)} = x^{(k)} - \alpha \nabla f\!\left(x^{(k)}\right)$, where $x^{(k)}$ denotes the $k^{\text{th}}$ iterate and $\alpha > 0$ is a scalar learning rate.
In parallel coordinate descent, we use a vector learning rate, i.e., a learning rate that could be different in each coordinate:
Parallel coordinate descent: for each coordinate $i$: $x_i^{(k+1)} = x_i^{(k)} - \alpha_i \frac{\partial f}{\partial x_i}\!\left(x^{(k)}\right)$
Alternatively, using coordinate-wise vector multiplication $\odot$ (the Hadamard product), we can describe the above as: $x^{(k+1)} = x^{(k)} - \alpha \odot \nabla f\!\left(x^{(k)}\right)$, where $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$ is the vector of learning rates.
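To make the update rule concrete, here is a minimal NumPy sketch. The function name, parameters, and quadratic test function are illustrative choices rather than from any particular source; NumPy's elementwise `*` plays the role of the coordinate-wise product $\odot$ above.

```python
import numpy as np

def parallel_coordinate_descent(grad_f, x0, alpha, num_iters=100):
    """Minimize f by gradient descent with a per-coordinate learning rate.

    grad_f:    function returning the gradient of f at a point.
    x0:        initial iterate (array-like).
    alpha:     vector of per-coordinate learning rates, same shape as x0;
               passing a scalar recovers ordinary gradient descent.
    num_iters: number of update steps (illustrative stopping rule).
    """
    x = np.asarray(x0, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    for _ in range(num_iters):
        # Coordinate-wise update: coordinate i moves by alpha_i * (df/dx_i)(x).
        x = x - alpha * grad_f(x)
    return x

# Example: f(x, y) = 4x^2 + y^2, minimized at the origin. The first
# coordinate has the larger curvature, so it gets the smaller rate.
grad = lambda x: np.array([8.0 * x[0], 2.0 * x[1]])
print(parallel_coordinate_descent(grad, x0=[1.0, 1.0], alpha=[0.1, 0.4]))
```

With these rates, each step scales both coordinates by a factor of $0.2$, so the iterates converge rapidly to the minimum; a single scalar rate would have to be small enough for the steepest coordinate, slowing progress in the other.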