DDPG Actor Update (PyTorch Implementation Issue)

This is from the https://github.com/MoritzTaylor/ddpg-pytorch/blob/master/ddpg.py implementation, and I suspect most DDPG implementations are written this way:

# Update the critic network
self.critic_optimizer.zero_grad()
state_action_batch = self.critic(state_batch, action_batch)
value_loss = F.mse_loss(state_action_batch, expected_values.detach())
value_loss.backward()
self.critic_optimizer.step()

# Update the actor network
self.actor_optimizer.zero_grad()
policy_loss = -self.critic(state_batch, self.actor(state_batch))
policy_loss = policy_loss.mean()
policy_loss.backward()
self.actor_optimizer.step()

However, after policy_loss.backward(), I think gradients with respect to the critic's parameters are left behind in the critic network. Shouldn't this affect the next update of the critic?
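For illustration, here is a minimal, self-contained sketch (using hypothetical toy linear networks standing in for the actor and critic above) showing that the actor's backward pass does populate .grad on the critic's parameters:

import torch
import torch.nn as nn

# Hypothetical toy stand-ins for the actor and critic networks.
actor = nn.Linear(4, 2)    # maps state -> action
critic = nn.Linear(6, 1)   # maps concatenated (state, action) -> Q-value

state_batch = torch.randn(8, 4)
policy_loss = -critic(torch.cat([state_batch, actor(state_batch)], dim=1)).mean()
policy_loss.backward()

# The critic's parameters now carry gradients, even though only the
# actor's optimizer would step next.
print(critic.weight.grad is not None)  # prints: True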

If it does, what could be the solution?
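One thing worth noting from the snippet above is that self.critic_optimizer.zero_grad() is called before the next critic update, which would clear any stale gradients before they are used. To avoid writing gradients into the critic at all during the actor update, one option sometimes used (a sketch only, assuming the same self.critic / self.actor / self.actor_optimizer attributes as in the snippet above) is to temporarily disable requires_grad on the critic's parameters:

# Sketch: freeze the critic's parameters for the duration of the actor
# update, so backward() still flows through the critic to the actor but
# accumulates no gradients in the critic itself.
for p in self.critic.parameters():
    p.requires_grad_(False)

self.actor_optimizer.zero_grad()
policy_loss = -self.critic(state_batch, self.actor(state_batch)).mean()
policy_loss.backward()
self.actor_optimizer.step()

for p in self.critic.parameters():
    p.requires_grad_(True)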



Read more here: https://stackoverflow.com/questions/68492785/ddpg-actor-update-pytorch-implementation-issus

Content Attribution

This content was originally published by Dongri at Recent Questions - Stack Overflow, and is syndicated here via their RSS feed. You can read the original post over there.
