Use binary_crossentropy instead of sparse_categorical_crossentropy. #311
base: master
Conversation
Fixes the issue where the confidence score is not optimized when the number of classes is one.
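A minimal sketch of the idea, assuming a YOLO-style class-loss term; `obj_mask`, `true_class_idx`, and `pred_class` are illustrative names, not necessarily the exact identifiers used in this repo's loss function:

```python
import tensorflow as tf
from tensorflow.keras.losses import binary_crossentropy, sparse_categorical_crossentropy

def class_loss(obj_mask, true_class_idx, pred_class, use_binary=True):
    """Class term of a YOLO-style loss for one output scale (illustrative only).

    obj_mask:       (batch, grid, grid, anchors)          1 where a box is assigned
    true_class_idx: (batch, grid, grid, anchors)          ground-truth class index
    pred_class:     (batch, grid, grid, anchors, classes) predicted class probabilities
    """
    if use_binary:
        # Each class output is treated as an independent sigmoid probability,
        # so the term stays non-zero even when there is only one class.
        true_one_hot = tf.one_hot(tf.cast(true_class_idx, tf.int32),
                                  depth=tf.shape(pred_class)[-1])
        loss = binary_crossentropy(true_one_hot, pred_class)
    else:
        # With a single class, a softmax over one output is always 1.0,
        # so this term is always zero and gives no gradient.
        loss = sparse_categorical_crossentropy(true_class_idx, pred_class)
    return obj_mask * loss
```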
Hi, I thought about this before. Do you know if this conversion will work for multiple classes?
The file name 'aeroplane_xxx.txt' is easy to misunderstand. Use train.txt/eval.txt instead of the class-specific list file.
Yes, the computation of binary_crossentropy will lead to output values different from those of categorical_crossentropy. It seems to me both will be okay for the detect-and-classify case, but I cannot tell how much influence it will have on the weight updates. I ran the training script in "transfer darknet" mode with learning rate 1e-4 and early-stop patience 20 to verify that it still works with multiple classes, but did not spend much time tuning the model. The training data also comes from the voc2012 dataset script you provided. I can see the model converging, and it seems to be slightly overfitted. What do you think of binary_crossentropy? I attached the training output below:
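As a side note on how the two losses differ numerically for more than one class, here is a small made-up example (unrelated to the attached training output):

```python
import tensorflow as tf

# Hypothetical 3-class prediction for a single anchor; class 1 is the ground truth.
true_idx = tf.constant([1])
pred = tf.constant([[0.2, 0.7, 0.1]])

# Categorical view: only the probability of the true class is penalized.
cat = tf.keras.losses.sparse_categorical_crossentropy(true_idx, pred)
print(cat.numpy())       # [-log(0.7)] ≈ [0.357]

# Binary (multi-label) view: every class output is penalized independently,
# so the value differs even for the same prediction.
bin_loss = tf.keras.losses.binary_crossentropy(tf.one_hot(true_idx, depth=3), pred)
print(bin_loss.numpy())  # mean of [-log(0.8), -log(0.7), -log(0.9)] ≈ [0.228]
```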
By the way, I also have another PR, #315, about voc2012.py, from testing this issue.
Revise the voc2012 dataset preparation script
I ran into the same problem with score <= 0.5 for only one class. If we only have one class, the class probability is always 0.5, since the model does not learn it (the class loss is always zero). See #311
Hi Manuel, I do not think it is reasonable for class_prob to always be 0.5 when there is only one class (I think it should always be 1.0), so I modified the loss function to make it not always zero. In my understanding, the confidence score indicates how confident the model is about the location coordinates, and the class prob indicates how confident it is about the class predictions. Please correct me if I am wrong. However, I have to admit that your fix is simpler and safer. Reference from the YOLO v1 paper:
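For context, the relevant definitions from the YOLO v1 paper (paraphrased here, not an exact quote): the box confidence is defined as $\Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$, each grid cell also predicts conditional class probabilities $\Pr(\text{Class}_i \mid \text{Object})$, and at test time the two are multiplied to give class-specific confidence scores:

$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$$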
You are right, it is more of a hack. By "reasonable" I just meant that the model does not learn anything about the classes, since the class loss is always zero.
Hi,
I am training a face detector based on your yolov3 model, and I found a problem: after some epochs of training, the output bounding boxes of the model are quite accurate, but the confidence of the output is always around 0.5.
Taking a deep dive into the loss function, I found that the cls_loss is based on sparse_categorical_crossentropy. It always outputs 0 because there is only one class (face) in my training case, so the neural network never learns to reduce the value of cls_loss. I guess the output of 0.5 is exactly the value of 0 passed through a sigmoid.
Perhaps we can just use binary_crossentropy to replace sparse_categorical_crossentropy?
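To illustrate the point numerically (a minimal sketch with dummy values, not the repo's actual loss code):

```python
import tensorflow as tf

# Single-class case: the "class distribution" has exactly one entry.
true_idx = tf.constant([0])        # ground-truth class index (face)
raw_logit = tf.constant([[0.0]])   # raw class output of the network

# Softmax over a single logit is always 1.0, so the sparse categorical
# cross-entropy is always 0 and gives the network nothing to learn from.
cls_sparse = tf.keras.losses.sparse_categorical_crossentropy(
    true_idx, tf.nn.softmax(raw_logit))
print(cls_sparse.numpy())          # [0.]

# Binary cross-entropy treats the single output as a sigmoid probability,
# so it stays non-zero until the prediction approaches 1.0.
cls_binary = tf.keras.losses.binary_crossentropy(
    tf.ones_like(raw_logit), tf.sigmoid(raw_logit))
print(cls_binary.numpy())          # [-log(sigmoid(0))] = [-log(0.5)] ≈ [0.693]
```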