Which of these is SGD, which is batch gradient descent, and which is mini-batch gradient descent? What do they have in common? Are they all equivalent? What are the advantages and disadvantages of each? Given those trade-offs, which might we prefer?
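As a starting point for the comparison, here is a minimal sketch, assuming a one-dimensional linear model fitted by mean squared error. It suggests that the three variants can be written as the same update loop and differ only in how many examples feed each gradient estimate. The function name `gradient_descent`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import random


def gradient_descent(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Fit y = w * x + b by minimising mean squared error.

    batch_size == len(xs) -> batch gradient descent
    batch_size == 1       -> stochastic gradient descent (SGD)
    anything in between   -> mini-batch gradient descent
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average the squared-error gradient over this batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical noise-free data drawn from y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 for x in xs]

for name, size in [("batch", len(xs)), ("mini-batch", 4), ("SGD", 1)]:
    w, b = gradient_descent(xs, ys, batch_size=size)
    print(f"{name:>10}: w={w:.4f}, b={b:.4f}")
```

In this sketch a larger `batch_size` means fewer but more accurate updates per epoch, while a smaller one means more frequent but noisier updates. On noisy data or a non-convex loss the three settings would generally not follow the same path, which is part of what the questions above ask you to weigh.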