RuntimeError when calling loss.backward() function #11

Open
anguyen9630 opened this issue Jan 29, 2023 · 1 comment

@anguyen9630

Hi, I know it has been a year since this was done, but I am hoping you can help me. When using implicit calls, I get the following error during training when calling the loss.backward() function.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1000, 1]], which is output 0 of NormBackward1, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
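For reference, the same class of error can be reproduced without bnn. The sketch below is only an illustration under assumptions: the function names are hypothetical, it runs on CPU, and it does not claim to show where the library modifies a tensor. It computes a per-row norm (autograd keeps that output for the backward pass), edits it in place, and then calls backward(). The out-of-place variant avoids the error. The exact backward node name in the message (NormBackward1 here) depends on the PyTorch version.

```python
import torch


def backward_with_inplace_norm():
    """Edit a norm output in place before backward(); this raises RuntimeError."""
    x = torch.randn(4, 3, requires_grad=True)
    scale = x.norm(dim=1, keepdim=True)  # autograd saves this output for backward
    scale.add_(1e-8)                     # in-place edit bumps the tensor's version
    (x / scale).sum().backward()         # saved output no longer matches: error


def backward_with_out_of_place_norm():
    """Same computation, but the norm output is never modified in place."""
    x = torch.randn(4, 3, requires_grad=True)
    scale = x.norm(dim=1, keepdim=True)
    scale = scale + 1e-8                 # creates a new tensor instead
    (x / scale).sum().backward()
    return x.grad


try:
    backward_with_inplace_norm()
except RuntimeError as err:
    print("in-place variant failed:", str(err)[:70])

print("out-of-place grad shape:", tuple(backward_with_out_of_place_norm().shape))
```

The [1000, 1] shape in the message matches a per-output norm over a layer with 1000 outputs, such as the final classifier layer of the torchvision VGG19 and ResNet-18 models, so an in-place update of a per-channel scale is one plausible place to look. That is a guess, not a confirmed diagnosis.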

I basically just grabbed the VGG19 model from PyTorch (torchvision) and converted it. ResNet-18 has the same issue.
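One thing that may be worth ruling out: the torchvision VGG19 and ResNet-18 definitions use nn.ReLU(inplace=True). Whether those modules survive the bnn conversion, or are related to this error at all, is an assumption here, since the message points at a norm output. A minimal sketch of a helper (disable_inplace is a hypothetical name, not part of bnn or PyTorch) that switches off every inplace flag in a model:

```python
import torch.nn as nn


def disable_inplace(module: nn.Module) -> int:
    """Set inplace=False on every submodule that has it enabled; return the count."""
    changed = 0
    for sub in module.modules():
        if getattr(sub, "inplace", False):
            sub.inplace = False
            changed += 1
    return changed


# Small stand-in network so the example does not depend on torchvision.
net = nn.Sequential(
    nn.Linear(8, 8),
    nn.ReLU(inplace=True),
    nn.Linear(8, 2),
    nn.ReLU(inplace=True),
)
print("modules switched to out-of-place:", disable_inplace(net))
```

In the script below this would be called as disable_inplace(model) before prepare_binary_model. If the error persists, the in-place operation is more likely inside one of the binarizer ops.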

from time import time

import torch
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

model = models.vgg19()

from bnn import BConfig, prepare_binary_model
# Import a few examples of quantizers
from bnn.ops import *

# Define the binarization configuration and assign it to the model
bconfig = BConfig(
    activation_pre_process = BasicInputBinarizer,
    activation_post_process = BasicScaleBinarizer,
    # optionally, one can pass certain custom variables
    weight_pre_process = XNORWeightBinarizer.with_args(center_weights=True)
)
# Convert the model appropriately, propagating the changes from the parent node to the leaves
# The custom_config_layers_name syntax will perform a match based on the layer name, setting a custom quantization function.
bmodel = prepare_binary_model(model, bconfig, custom_config_layers_name=[{'conv1' : BConfig()}])

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(bmodel.parameters(), lr=0.001, momentum=0.9)

print("Training begin!")
# Select GPU 4 as execution device
device = torch.device("cuda:4" if torch.cuda.is_available() else "cpu")

print("The model will be running on", device, "device")
# Convert model parameters and buffers to CPU or Cuda
bmodel.to(device)

save_path = './models/vgg19.pth'

bestaccuracy = 0.0
#break_epoch = 0

t_begin = time()
for epoch in range(50):  # loop over the dataset multiple times

    running_loss = 0.0
    break_epoch = epoch + 1
    
    correct = 0
    total = 0
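    # NOTE: trainloader (a DataLoader over the training set) is not defined in this snippet
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # move the batch to the same device as the model (.cuda() would pick the default GPU, not cuda:4)
        inputs, labels = inputs.to(device), labels.to(device)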
        # zero the parameter gradients
        optimizer.zero_grad()
        
        #print(inputs.size(1))
        
        # forward + backward + optimize
        outputs = bmodel(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        # check for correct answer
        _, predictions = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predictions == labels).sum().item()

        # print statistics
        running_loss += loss.item()
        

        if i % 50 == 49:    # print every 50 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 50:.3f}')
            running_loss = 0.0
    
    # calculate accuracy of epoch
    accuracy = 100 * correct / total
    print(f'Epoch {epoch + 1} accuracy: {accuracy:.3f}')
    
    #If accuracy is better than the last, save the model
    if accuracy > bestaccuracy:
        torch.save(bmodel.state_dict(), save_path)
        bestaccuracy = accuracy
        

time_taken = int(time() - t_begin)
# split the elapsed seconds into minutes and seconds; pad seconds to two digits
time_min, time_sec = divmod(time_taken, 60)
print(f'Finished Training! Best accuracy: {bestaccuracy:.3f} - Training time (mm:ss): {time_min}:{time_sec:02d}')
@Neltherion

I too have this problem.
