Convergence of batch gradient descent in logistic regression


Answers (2)

manpreet · Best Answer · 2 years ago

I am not really sure how the objective behaves when using batch gradient descent in logistic regression.

At each iteration $L(W)$ gets bigger and bigger; a step can then jump across the maximum, after which $L(W)$ goes down. How do I detect this without computing $L(W)$, knowing only the old weight vector $w$ and the updated one?

If I use regularized logistic regression, will the weights become smaller and smaller, or follow some other pattern?

manpreet · 2 years ago

Regularization is designed to combat overfitting, not to aid gradient descent convergence.
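On the weight-shrinking question specifically: assuming an L2 penalty $\frac{\lambda}{2m}\sum_j \theta_j^2$ is added to the cost (the post does not say which penalty is used), and using the notation defined just below, each update gains a multiplicative shrinkage factor:

$$\theta_j := \theta_j\left(1 - \frac{\alpha\lambda}{m}\right) - \alpha\,\frac{\partial}{\partial \theta_j} J(\theta)$$

So every step first shrinks $\theta_j$ toward zero and then applies the usual gradient step; the weights are pulled smaller but settle where the data gradient balances the penalty, rather than shrinking forever (the intercept $\theta_0$ is conventionally left unregularized).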

Suppose you are minimizing a function $J$ parameterized by a vector $\theta$, where each element of $\theta$ is denoted $\theta_j$ (i.e. you want to minimize $J(\theta)$).

The basic idea of batch gradient descent is to iterate until convergence, computing a new value of $\theta$ from the previous one by updating every $\theta_j$ simultaneously with the formula

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
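For concreteness, here is a minimal sketch of that update for logistic regression with the average cross-entropy cost, using NumPy; the data `X`, labels `y`, and the vectorized gradient form are my illustrative assumptions, not part of the original post:

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps raw scores into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_step(theta, X, y, alpha):
    # One simultaneous update of every theta_j:
    #   theta_j := theta_j - alpha * dJ/dtheta_j
    # where J is the average cross-entropy cost over all m examples.
    m = X.shape[0]
    predictions = sigmoid(X @ theta)          # h_theta(x) for every example
    gradient = X.T @ (predictions - y) / m    # all partial derivatives at once
    return theta - alpha * gradient

# Illustrative usage on synthetic data (assumed, not from the original post).
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])  # intercept column
y = (X[:, 1] + rng.normal(size=100) > 0).astype(float)         # noisy labels
theta = np.zeros(X.shape[1])
for _ in range(1000):
    theta = batch_gradient_step(theta, X, y, alpha=0.1)
```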

That $\alpha$ term is called the learning rate. It is a free parameter: if it is very small, the algorithm converges slowly and takes a long time, but if it is too large, exactly what you are experiencing can happen. In that case $\theta$ is updated in the right direction but goes too far, jumping past the minimum, and the cost can even climb and increase.

The remedy is simply to decrease $\alpha$ until this no longer happens. A sufficiently small learning rate guarantees that $J(\theta)$ decreases on every iteration. The trick is to find a value of $\alpha$ that allows fast convergence while avoiding non-convergence.
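One simple way to automate that remedy (a sketch of my own, not something from the original answer, reusing `sigmoid` and `batch_gradient_step` from the snippet above) is to undo any step that increases $J(\theta)$ and halve $\alpha$:

```python
def cross_entropy_cost(theta, X, y, eps=1e-12):
    # Average cross-entropy J(theta); eps guards against log(0).
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def descend_with_shrinking_alpha(theta, X, y, alpha=1.0, iters=500):
    # If a step increases J, undo it and halve alpha; otherwise accept it.
    # A crude safeguard, not a replacement for picking a good fixed alpha.
    cost = cross_entropy_cost(theta, X, y)
    for _ in range(iters):
        candidate = batch_gradient_step(theta, X, y, alpha)
        new_cost = cross_entropy_cost(candidate, X, y)
        if new_cost > cost:
            alpha *= 0.5          # overshot: the step was too large
        else:
            theta, cost = candidate, new_cost
    return theta, alpha
```

Once $\alpha$ has been halved enough times, every subsequent step decreases the cost, matching the guarantee described above.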

A useful approach is to plot $J(\theta)$ while the algorithm is running and observe how it decreases. Start with a small value (e.g. 0.01); increase it if convergence appears slow, and decrease it further if the cost fails to decrease.
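A minimal version of that plot, assuming matplotlib and the helpers defined in the sketches above:

```python
import matplotlib.pyplot as plt

costs = []
theta = np.zeros(X.shape[1])
for _ in range(200):
    theta = batch_gradient_step(theta, X, y, alpha=0.01)
    costs.append(cross_entropy_cost(theta, X, y))

plt.plot(costs)
plt.xlabel("iteration")
plt.ylabel("J(theta)")
plt.title("Cost per iteration (should decrease for small enough alpha)")
plt.show()
```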


