The effect of feature scaling; the Vanishing Gradient and Exploding Gradient problems; and the techniques used to address them: proper weight initialization (Xavier or He initialization), Gradient Clipping, well-chosen activation functions (e.g., ReLU), Batch Normalization, Layer Normalization, and alternative deep learning architectures (e.g., LSTM or GRU for RNNs). Also covered are the deep-learning optimizers: the plain Gradient Descent optimizer, SGD with Momentum, AdaGrad (Adaptive Gradient Algorithm), RMSProp (Root Mean Square Propagation), and Adam (Adaptive Moment Estimation). Minimal sketches of the stabilization techniques and of the optimizers follow below.
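
The first sketch combines several of the gradient-stabilization techniques named above, He initialization, ReLU, Batch Normalization, and Gradient Clipping, in a single training step. PyTorch, the layer sizes, the clipping norm of 1.0, and the toy data are all assumptions for illustration; the text itself does not prescribe a framework or values.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # Batch Normalization after the linear layer
    nn.ReLU(),            # ReLU avoids the saturation that drives vanishing gradients
    nn.Linear(64, 1),
)

# He (Kaiming) initialization suits ReLU layers; Xavier would suit tanh/sigmoid.
for layer in model.modules():
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 20), torch.randn(32, 1)  # toy batch, hypothetical data
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Gradient Clipping: rescale gradients whose global norm exceeds 1.0,
# a common guard against exploding gradients (the threshold is illustrative).
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```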
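
The second sketch runs each listed optimizer on the same toy quadratic, to show that only the update rule changes while the training loop stays identical. Again PyTorch is assumed, and the learning rates, momentum, alpha, and beta values are illustrative defaults rather than anything specified in the text.

```python
import torch

constructors = {
    "Gradient Descent (plain SGD)": lambda p: torch.optim.SGD([p], lr=0.1),
    "SGD with Momentum":            lambda p: torch.optim.SGD([p], lr=0.1, momentum=0.9),
    "AdaGrad":                      lambda p: torch.optim.Adagrad([p], lr=0.1),
    "RMSProp":                      lambda p: torch.optim.RMSprop([p], lr=0.1, alpha=0.9),
    "Adam":                         lambda p: torch.optim.Adam([p], lr=0.1, betas=(0.9, 0.999)),
}

for name, make_opt in constructors.items():
    w = torch.nn.Parameter(torch.tensor([5.0]))  # fresh parameter per optimizer
    opt = make_opt(w)
    for _ in range(20):                          # minimize f(w) = w^2
        opt.zero_grad()
        loss = (w ** 2).sum()
        loss.backward()
        opt.step()
    print(f"{name}: w = {w.item():.4f}")
```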