Why is my training loss and testing loss not changing? #455

BCBeeravelly · 2023-05-21T13:09:28Z

BCBeeravelly
May 21, 2023

Hi guys, I was working on LinearRegressionModelV2 in Section 3 - PyTorch Workflow, but for some reason my model is not training: the training loss and test loss stay the same, and the weight and bias do not change during training. Please tell me if I am missing something.

Let's write a training loop

torch.manual_seed(42)
epochs = 1000

X_train = X_train.to(device)
X_test = X_test.to(device)
y_train = y_train.to(device)
y_test = y_test.to(device)

for epoch in range(0, epochs):
    model_1.train()
    
    # 1. Forward pass
    y_pred = model_1(X_train)
    
    #2. Calculate the loss
    loss = loss_fn(y_pred, y_train)
    
    # 3. Optimizer zero grad
    optimizer.zero_grad()
    
    # 4. Loss backward
    loss.backward()
    
    
    # 5. Optimizer step
    optimizer.step()
    
    ### Testing
    model_1.eval()
    with torch.inference_mode():
        
        test_preds = model_1(X_test)
        test_loss = loss_fn(test_preds, y_test)
        
    if epoch % 100 == 0:
                
        print(f'Epoch: {epoch} | Train loss: {loss} | Test loss: {test_loss}')

apyhyk · 2023-05-21T19:45:51Z

apyhyk
May 21, 2023

The issue is where you have your if statement set up.

`model_1.eval()
with torch.inference_mode():

    test_preds = model_1(X_test)
    test_loss = loss_fn(test_preds, y_test)

if epoch % 100 == 0:`

SHOULD BE

`if epoch % 100 == 0:
    model_1.eval()
    with torch.inference_mode():

        test_preds = model_1(X_test)
        test_loss = loss_fn(test_preds, y_test)`

Ensure that they are indented properly, or else the code will not run.
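
One way to sanity-check this suggestion is to train the same model under both layouts and compare the final parameters. The sketch below is not from the thread: the synthetic data, the `train` helper, and its `eval_every_epoch` flag are assumptions made for illustration. Moving the evaluation block only changes how often the test loss is computed; it does not touch the optimizer updates, so both runs are expected to end with the same weights.

```python
import torch
from torch import nn


def train(eval_every_epoch: bool, epochs: int = 200, log_every: int = 100):
    """Train a tiny linear model; return its final state_dict (hypothetical helper)."""
    torch.manual_seed(42)

    # Assumed toy data: y = 0.7 * x + 0.3 with an 80/20 train/test split
    X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
    y = 0.7 * X + 0.3
    X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

    model = nn.Linear(in_features=1, out_features=1)
    loss_fn = nn.L1Loss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(epochs):
        model.train()
        loss = loss_fn(model(X_train), y_train)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Layout A evaluates every epoch; layout B only on logging epochs
        if eval_every_epoch or epoch % log_every == 0:
            model.eval()
            with torch.inference_mode():
                test_loss = loss_fn(model(X_test), y_test)

    return model.state_dict()


every_epoch = train(eval_every_epoch=True)
only_when_logging = train(eval_every_epoch=False)

# Compare the final weight and bias of the two runs
same = all(torch.equal(every_epoch[k], only_when_logging[k]) for k in every_epoch)
print(f"Identical final parameters: {same}")
```

If the two runs match, the placement of the `if` statement affects what gets printed, not whether the model learns.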

BCBeeravelly · 2023-05-22T14:55:22Z

BCBeeravelly
May 22, 2023
Author

Hey.
I wrote the training loop below and it actually trains. All I did, to be honest, was change the variable names. I am still not sure why it doesn't work in my earlier code. Please have a look at the code I posted earlier. Thanks!

# An epoch is one loop through the data
epochs = 300
torch.manual_seed(42)

# Track different values
epoch_count = []
loss_values = []
test_loss_values = []


### Training
# 0. Loop through the data
for epoch in range(epochs):
    # Set the model to training mode
    model_0.train() # train mode is the default state of a model; it affects layers such as dropout and batch norm
    
    # 1. Forward pass
    y_pred = model_0(X_train)
    
    # 2. Calculate the loss
    loss = loss_fn(y_pred, y_train)
#     print(f'Loss: {loss}')
    
    # 3. Optimizer zero grad
    optimizer.zero_grad()
    
    # 4. Perform backpropagation on the loss with respect to the parameters of the model
    loss.backward()
    
    # 5. Step the optimizer (perform gradient descent)
    optimizer.step() # by default, gradients accumulate across iterations, so we zero them in step 3 above
    
    model_0.eval() # turns off different settings not needed for the evaluation of the model
    
    
    # Print out model state_dict()
#     print(model_0.state_dict())
    
    ### Testing
    with torch.inference_mode(): # turns off gradient tracking
        # 1. Forward pass
        test_preds = model_0(X_test)
        
        # 2. Calculate the loss
        test_loss = loss_fn(test_preds, y_test)
    
    # print out what's happening
    if epoch % 10 == 0:
        epoch_count.append(epoch)
        loss_values.append(loss)
        test_loss_values.append(test_loss)
        print(f'Epoch: {epoch} | Test Loss: {test_loss}')

mrdbourke · 2023-05-24T05:11:01Z

mrdbourke
May 24, 2023
Maintainer

Hi @BCBeeravelly ,

In your first example, you are printing out the model's loss every 100 epochs.

It appears that your loss doesn't change because it gets very small very quickly.

It reaches its lowest value within the first 100 epochs and then stops improving from there (because it's already very good).

Your second piece of code prints a decreasing loss value because you print it out every 10 epochs (a small enough interval to see the loss decreasing).

Try this example (printing out every epoch) to see the loss going down:

import torch
from torch import nn 

# Create weight and bias
weight = 0.7
bias = 0.3

# Create range values
start = 0
end = 1
step = 0.02

# Create X and y (features and labels)
X = torch.arange(start, end, step).unsqueeze(dim=1) # without unsqueeze, errors will happen later on (shapes within linear layers)
y = weight * X + bias 

# Split data
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]
print(f"Length X_train: {len(X_train)}, Length X_test: {len(X_test)}, Length y_train: {len(y_train)}, Length y_test: {len(y_test)}")

# Setup device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Subclass nn.Module to make our model
class LinearRegressionModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        # Use nn.Linear() for creating the model parameters
        self.linear_layer = nn.Linear(in_features=1, 
                                      out_features=1)
    
    # Define the forward computation (input data x flows through nn.Linear())
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_layer(x)

# Set the manual seed when creating the model (this isn't always needed but is used for demonstrative purposes, try commenting it out and seeing what happens)
torch.manual_seed(42)
model_1 = LinearRegressionModelV2()
model_1, model_1.state_dict()

# Create loss function
loss_fn = nn.L1Loss()

# Create optimizer
optimizer = torch.optim.SGD(params=model_1.parameters(), # optimize newly created model's parameters
                            lr=0.01)

# Training
torch.manual_seed(42)
epochs = 100

X_train = X_train.to(device)
X_test = X_test.to(device)
y_train = y_train.to(device)
y_test = y_test.to(device)

for epoch in range(0, epochs):
    model_1.train()
    
    # 1. Forward pass
    y_pred = model_1(X_train)
    
    #2. Calculate the loss
    loss = loss_fn(y_pred, y_train)
    
    # 3. Optimizer zero grad
    optimizer.zero_grad()
    
    # 4. Loss backward
    loss.backward()
    
    
    # 5. Optimizer step
    optimizer.step()
    
    ### Testing
    model_1.eval()
    with torch.inference_mode():
        
        test_preds = model_1(X_test)
        test_loss = loss_fn(test_preds, y_test)
        
    if epoch % 1 == 0:
                
        print(f'Epoch: {epoch} | Train loss: {loss:.5f} | Test loss: {test_loss:.5f}')

Output:

Length X_train: 40, Length X_test: 10, Length y_train: 40, Length y_test: 10
Epoch: 0 | Train loss: 0.55518 | Test loss: 0.57398
Epoch: 1 | Train loss: 0.54366 | Test loss: 0.56051
Epoch: 2 | Train loss: 0.53214 | Test loss: 0.54703
Epoch: 3 | Train loss: 0.52061 | Test loss: 0.53356
Epoch: 4 | Train loss: 0.50909 | Test loss: 0.52009
Epoch: 5 | Train loss: 0.49757 | Test loss: 0.50662
Epoch: 6 | Train loss: 0.48605 | Test loss: 0.49315
Epoch: 7 | Train loss: 0.47453 | Test loss: 0.47968
Epoch: 8 | Train loss: 0.46301 | Test loss: 0.46621
Epoch: 9 | Train loss: 0.45149 | Test loss: 0.45274
Epoch: 10 | Train loss: 0.43997 | Test loss: 0.43927
Epoch: 11 | Train loss: 0.42845 | Test loss: 0.42580
Epoch: 12 | Train loss: 0.41693 | Test loss: 0.41232
Epoch: 13 | Train loss: 0.40541 | Test loss: 0.39885
Epoch: 14 | Train loss: 0.39388 | Test loss: 0.38538
Epoch: 15 | Train loss: 0.38236 | Test loss: 0.37191
Epoch: 16 | Train loss: 0.37084 | Test loss: 0.35844
Epoch: 17 | Train loss: 0.35932 | Test loss: 0.34497
Epoch: 18 | Train loss: 0.34780 | Test loss: 0.33150
Epoch: 19 | Train loss: 0.33628 | Test loss: 0.31803
Epoch: 20 | Train loss: 0.32476 | Test loss: 0.30456
Epoch: 21 | Train loss: 0.31324 | Test loss: 0.29109
Epoch: 22 | Train loss: 0.30172 | Test loss: 0.27761
Epoch: 23 | Train loss: 0.29020 | Test loss: 0.26414
Epoch: 24 | Train loss: 0.27867 | Test loss: 0.25067
Epoch: 25 | Train loss: 0.26715 | Test loss: 0.23720
Epoch: 26 | Train loss: 0.25563 | Test loss: 0.22373
Epoch: 27 | Train loss: 0.24411 | Test loss: 0.21026
Epoch: 28 | Train loss: 0.23259 | Test loss: 0.19679
Epoch: 29 | Train loss: 0.22107 | Test loss: 0.18332
Epoch: 30 | Train loss: 0.20955 | Test loss: 0.16985
Epoch: 31 | Train loss: 0.19803 | Test loss: 0.15638
Epoch: 32 | Train loss: 0.18651 | Test loss: 0.14290
Epoch: 33 | Train loss: 0.17499 | Test loss: 0.12943
Epoch: 34 | Train loss: 0.16346 | Test loss: 0.11596
Epoch: 35 | Train loss: 0.15194 | Test loss: 0.10249
Epoch: 36 | Train loss: 0.14042 | Test loss: 0.08902
Epoch: 37 | Train loss: 0.12890 | Test loss: 0.07555
Epoch: 38 | Train loss: 0.11738 | Test loss: 0.06208
Epoch: 39 | Train loss: 0.10586 | Test loss: 0.04861
Epoch: 40 | Train loss: 0.09434 | Test loss: 0.03514
Epoch: 41 | Train loss: 0.08282 | Test loss: 0.02167
Epoch: 42 | Train loss: 0.07130 | Test loss: 0.00841
Epoch: 43 | Train loss: 0.05978 | Test loss: 0.00661
Epoch: 44 | Train loss: 0.04825 | Test loss: 0.01875
Epoch: 45 | Train loss: 0.03738 | Test loss: 0.02970
Epoch: 46 | Train loss: 0.03085 | Test loss: 0.03665
Epoch: 47 | Train loss: 0.02764 | Test loss: 0.04129
Epoch: 48 | Train loss: 0.02584 | Test loss: 0.04444
Epoch: 49 | Train loss: 0.02470 | Test loss: 0.04686
Epoch: 50 | Train loss: 0.02389 | Test loss: 0.04785
Epoch: 51 | Train loss: 0.02336 | Test loss: 0.04883
Epoch: 52 | Train loss: 0.02287 | Test loss: 0.04912
Epoch: 53 | Train loss: 0.02246 | Test loss: 0.04940
Epoch: 54 | Train loss: 0.02206 | Test loss: 0.04898
Epoch: 55 | Train loss: 0.02171 | Test loss: 0.04857
Epoch: 56 | Train loss: 0.02136 | Test loss: 0.04815
Epoch: 57 | Train loss: 0.02100 | Test loss: 0.04774
Epoch: 58 | Train loss: 0.02065 | Test loss: 0.04732
Epoch: 59 | Train loss: 0.02030 | Test loss: 0.04691
Epoch: 60 | Train loss: 0.01996 | Test loss: 0.04580
Epoch: 61 | Train loss: 0.01961 | Test loss: 0.04539
Epoch: 62 | Train loss: 0.01927 | Test loss: 0.04429
Epoch: 63 | Train loss: 0.01892 | Test loss: 0.04318
Epoch: 64 | Train loss: 0.01858 | Test loss: 0.04277
Epoch: 65 | Train loss: 0.01824 | Test loss: 0.04167
Epoch: 66 | Train loss: 0.01790 | Test loss: 0.04125
Epoch: 67 | Train loss: 0.01755 | Test loss: 0.04015
Epoch: 68 | Train loss: 0.01721 | Test loss: 0.03973
Epoch: 69 | Train loss: 0.01687 | Test loss: 0.03863
Epoch: 70 | Train loss: 0.01652 | Test loss: 0.03753
Epoch: 71 | Train loss: 0.01618 | Test loss: 0.03712
Epoch: 72 | Train loss: 0.01583 | Test loss: 0.03601
Epoch: 73 | Train loss: 0.01549 | Test loss: 0.03560
Epoch: 74 | Train loss: 0.01515 | Test loss: 0.03450
Epoch: 75 | Train loss: 0.01480 | Test loss: 0.03408
Epoch: 76 | Train loss: 0.01446 | Test loss: 0.03298
Epoch: 77 | Train loss: 0.01411 | Test loss: 0.03256
Epoch: 78 | Train loss: 0.01378 | Test loss: 0.03146
Epoch: 79 | Train loss: 0.01343 | Test loss: 0.03036
Epoch: 80 | Train loss: 0.01309 | Test loss: 0.02994
Epoch: 81 | Train loss: 0.01274 | Test loss: 0.02884
Epoch: 82 | Train loss: 0.01240 | Test loss: 0.02843
Epoch: 83 | Train loss: 0.01206 | Test loss: 0.02733
Epoch: 84 | Train loss: 0.01171 | Test loss: 0.02691
Epoch: 85 | Train loss: 0.01137 | Test loss: 0.02581
Epoch: 86 | Train loss: 0.01102 | Test loss: 0.02471
Epoch: 87 | Train loss: 0.01069 | Test loss: 0.02429
Epoch: 88 | Train loss: 0.01034 | Test loss: 0.02319
Epoch: 89 | Train loss: 0.01000 | Test loss: 0.02277
Epoch: 90 | Train loss: 0.00965 | Test loss: 0.02167
Epoch: 91 | Train loss: 0.00931 | Test loss: 0.02126
Epoch: 92 | Train loss: 0.00897 | Test loss: 0.02016
Epoch: 93 | Train loss: 0.00862 | Test loss: 0.01905
Epoch: 94 | Train loss: 0.00828 | Test loss: 0.01864
Epoch: 95 | Train loss: 0.00793 | Test loss: 0.01754
Epoch: 96 | Train loss: 0.00759 | Test loss: 0.01712
Epoch: 97 | Train loss: 0.00725 | Test loss: 0.01602
Epoch: 98 | Train loss: 0.00690 | Test loss: 0.01560
Epoch: 99 | Train loss: 0.00656 | Test loss: 0.01450
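
If you want to change the printing interval without retraining, one option is to store the loss for every epoch and inspect it afterwards. This is a minimal sketch rather than part of the notebook above: the `train_losses` list and the `show` helper are hypothetical names, and the model is a bare `nn.Linear` standing in for `LinearRegressionModelV2`.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Same toy data as above: y = 0.7 * x + 0.3, first 80% used for training
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.7 * X + 0.3
X_train, y_train = X[:40], y[:40]

model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.L1Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

train_losses = []  # one plain Python float per epoch
for epoch in range(1000):
    model.train()
    loss = loss_fn(model(X_train), y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # .item() copies the scalar out of the tensor, so no graph is kept alive
    train_losses.append(loss.item())


def show(losses, every):
    """Print the stored losses at a chosen interval (hypothetical helper)."""
    for epoch in range(0, len(losses), every):
        print(f"Epoch: {epoch} | Train loss: {losses[epoch]:.5f}")


show(train_losses, every=100)       # coarse view over all 1000 epochs
show(train_losses[:100], every=10)  # finer view of the first 100 epochs
```

Comparing the two views makes it easier to see where most of the decrease happens and whether a coarse interval is hiding it.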

BCBeeravelly · 2023-05-24T18:38:00Z

BCBeeravelly
May 24, 2023
Author

Hey all,
Thank you so much for the help. I have debugged the problem: I was using the wrong optimizer variable to perform the step. Sorry for the confusion!
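
For anyone hitting the same symptom, here is a minimal sketch of how this kind of bug can look. It assumes the optimizer was still built from `model_0.parameters()` while the loop trained `model_1`; the thread does not show the exact code, so that setup, the toy data, and the `optimizer_tracks` helper are all hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(42)
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = 0.7 * X + 0.3

model_0 = nn.Linear(in_features=1, out_features=1)
model_1 = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.L1Loss()

# Assumed bug: the optimizer is bound to model_0, but the loop trains model_1
optimizer = torch.optim.SGD(model_0.parameters(), lr=0.01)


def optimizer_tracks(model, optimizer):
    """Return True if every parameter of `model` is in the optimizer's param groups."""
    tracked = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in tracked for p in model.parameters())


def run_epochs(model, optimizer, epochs=100):
    for _ in range(epochs):
        loss = loss_fn(model(X), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


tracked_before_fix = optimizer_tracks(model_1, optimizer)
before = [p.detach().clone() for p in model_1.parameters()]
run_epochs(model_1, optimizer)
after = [p.detach().clone() for p in model_1.parameters()]
unchanged = all(torch.equal(b, a) for b, a in zip(before, after))
print(f"Optimizer tracks model_1: {tracked_before_fix} | model_1 unchanged: {unchanged}")

# Fix: build the optimizer from the parameters of the model being trained
optimizer = torch.optim.SGD(model_1.parameters(), lr=0.01)
tracked_after_fix = optimizer_tracks(model_1, optimizer)
run_epochs(model_1, optimizer)
final = [p.detach().clone() for p in model_1.parameters()]
changed = any(not torch.equal(a, f) for a, f in zip(after, final))
print(f"Optimizer tracks model_1: {tracked_after_fix} | model_1 changed: {changed}")
```

A check like `optimizer_tracks` can be run once before the training loop, which is useful in notebooks where several models and optimizers share similar variable names.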
