Parallel coordinate descent
Definition
Parallel coordinate descent is a variant of gradient descent where we use a different learning rate in each coordinate. Explicitly, whereas with ordinary gradient descent, we define each iterate by subtracting a scalar multiple of the gradient vector from the previous iterate:
Ordinary gradient descent: $x^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)})$
In parallel coordinate descent, we use a vector learning rate, i.e., a learning rate that could be different in each coordinate:
Parallel coordinate descent: for each coordinate $i$: $x_i^{(k+1)} = x_i^{(k)} - \alpha_i \frac{\partial f}{\partial x_i}(x^{(k)})$
Alternatively, using coordinate-wise vector multiplication, we can describe the above as: $x^{(k+1)} = x^{(k)} - \alpha \odot \nabla f(x^{(k)})$, where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is the vector of learning rates and $\odot$ denotes the coordinate-wise (Hadamard) product.
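The following is a minimal sketch of this update in Python with NumPy; the function name parallel_coordinate_descent, the fixed iteration count, and the quadratic example are illustrative assumptions, not part of the definition above:

```python
import numpy as np

def parallel_coordinate_descent(grad_f, x0, alpha, num_iters=100):
    """Sketch of parallel coordinate descent: all coordinates are
    updated simultaneously, each with its own learning rate alpha[i].
    (Names and the fixed iteration count are illustrative choices.)"""
    x = np.array(x0, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    for _ in range(num_iters):
        # x <- x - alpha (.) grad f(x), where (.) is the coordinate-wise
        # (Hadamard) product; equivalently, x[i] -= alpha[i] * df/dx_i.
        x = x - alpha * grad_f(x)
    return x

# Example: minimize f(x) = 2 x_0^2 + (1/2) x_1^2, whose gradient is
# (4 x_0, x_1); the flatter coordinate gets the larger learning rate.
grad_f = lambda x: np.array([4.0 * x[0], x[1]])
print(parallel_coordinate_descent(grad_f, x0=[3.0, -2.0], alpha=[0.1, 0.5]))
```

Because every coordinate is updated simultaneously from the same previous iterate, the elementwise product `alpha * grad_f(x)` realizes exactly the vector form $\alpha \odot \nabla f(x^{(k)})$ given above.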