Use binary_crossentropy instead of sparse_categorical_crossentropy. #311

Open
wants to merge 3 commits into
base: master

yichenj (Contributor) commented Aug 3, 2020

Hi,

I am training a face detector based on your yolov3 model, and I found a problem: After some epochs of training, the output bounding box of the model is quite accurate but the confidence of the output is always around 0.5.

Taking a deep dive into the loss function, I found that cls_loss is based on sparse_categorical_crossentropy. It always outputs 0 because there is only one class (face) in my training case, so the neural network never learns to reduce cls_loss. I guess the output of 0.5 is exactly sigmoid(0), i.e. a raw value of 0 passed through the sigmoid activation.

Perhaps we can just use binary_crossentropy to replace sparse_categorical_crossentropy?
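
A minimal NumPy sketch of the effect described above (not the repository's TensorFlow code). It assumes the categorical loss renormalizes the predicted probabilities over the class axis before taking the log, which matches Keras behaviour for probability inputs as far as I know; the helper names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_categorical_ce(true_idx, probs):
    # Assumption: probabilities are renormalized over the class axis first.
    probs = probs / probs.sum(axis=-1, keepdims=True)
    rows = np.arange(len(true_idx))
    return -np.log(probs[rows, true_idx])

def binary_ce(true_onehot, probs, eps=1e-7):
    # Per-class binary cross-entropy, averaged over the class axis.
    probs = np.clip(probs, eps, 1.0 - eps)
    per_class = -(true_onehot * np.log(probs)
                  + (1.0 - true_onehot) * np.log(1.0 - probs))
    return per_class.mean(axis=-1)

# Three boxes, one class; the raw class outputs differ widely.
logits = np.array([[-3.0], [0.0], [3.0]])
probs = sigmoid(logits)
true_idx = np.zeros(3, dtype=int)   # the only class has index 0
true_onehot = np.ones((3, 1))

scce = sparse_categorical_ce(true_idx, probs)
bce = binary_ce(true_onehot, probs)
print("sparse categorical:", scce)  # zero for every box, so no gradient
print("binary:", bce)               # shrinks as the class probability rises
```

With a single class the normalized probability is always 1, so the categorical loss gives no training signal, while the binary loss still depends on the predicted value.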

Fix the confidence score not being optimized when the number of classes is one.
zzh8829 (Owner) commented Aug 6, 2020

Hi, I thought about this before. Do you know whether this change will work for multiple classes?

The file name 'aeroplane_xxx.txt' is easily misunderstood.
Use train.txt/eval.txt instead of the class-specific list file.
yichenj (Contributor, Author) commented Aug 7, 2020

Yes. binary_crossentropy produces a different loss value than categorical_crossentropy does. It seems to me both will be okay for the detect-and-classify case, but I cannot tell how much it will influence the weight updates.
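
To make the difference concrete, here is a small NumPy sketch for three classes. The probabilities are made up, the helper names are hypothetical, and the categorical loss is assumed to renormalize over the class axis; this is an illustration, not the code in this PR.

```python
import numpy as np

def one_hot(indices, num_classes):
    # Binary cross-entropy needs one target per class, not a class index.
    return np.eye(num_classes)[indices]

def categorical_ce(true_onehot, probs):
    # Assumption: probabilities are renormalized over the class axis.
    probs = probs / probs.sum(axis=-1, keepdims=True)
    return -(true_onehot * np.log(probs)).sum(axis=-1)

def binary_ce(true_onehot, probs, eps=1e-7):
    probs = np.clip(probs, eps, 1.0 - eps)
    per_class = -(true_onehot * np.log(probs)
                  + (1.0 - true_onehot) * np.log(1.0 - probs))
    return per_class.mean(axis=-1)

class_probs = np.array([[0.9, 0.6, 0.1]])  # made-up sigmoid outputs
labels = np.array([0])                      # true class index
targets = one_hot(labels, num_classes=3)

cce = categorical_ce(targets, class_probs)
bce = binary_ce(targets, class_probs)
print("categorical:", cce, "binary:", bce)
```

The binary loss penalizes the 0.6 assigned to a wrong class directly, while the categorical loss only sees it through the normalization, so the two give different values and gradients for the same prediction.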

I just ran the train script in "transfer darknet" mode, with learning rate 1e-4 and early-stopping patience 20, to verify that it still works with multiple classes, but I did not spend much time tuning the model. The training data also comes from the voc2012 dataset script you provided. I can see the model is converging and seems to be a little overfitted.

What do you think of the binary_crossentropy?

I attached the training output below:

715/Unknown - 129s 181ms/step - loss: 819.5117 - yolo_output_0_loss: 28.9505 - yolo_output_1_loss: 104.6755 - yolo_output_2_loss: 675.1611
Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
715/715 [==============================] - 298s 416ms/step - loss: 819.5117 - yolo_output_0_loss: 28.9505 - yolo_output_1_loss: 104.6755 - yolo_output_2_loss: 675.1611 - val_loss: 180.2229 - val_yolo_output_0_loss: 6.5867 - val_yolo_output_1_loss: 18.5627 - val_yolo_output_2_loss: 144.3504 - lr: 1.0000e-04
Epoch 2/50
715/715 [==============================] - ETA: 0s - loss: 110.8124 - yolo_output_0_loss: 5.3276 - yolo_output_1_loss: 12.9779 - yolo_output_2_loss: 81.7900
Epoch 00002: saving model to checkpoints/yolov3_train_2.tf
715/715 [==============================] - 292s 409ms/step - loss: 110.8124 - yolo_output_0_loss: 5.3276 - yolo_output_1_loss: 12.9779 - yolo_output_2_loss: 81.7900 - val_loss: 71.5413 - val_yolo_output_0_loss: 4.7071 - val_yolo_output_1_loss: 9.2439 - val_yolo_output_2_loss: 46.8816 - lr: 1.0000e-04
Epoch 3/50
715/715 [==============================] - ETA: 0s - loss: 54.7577 - yolo_output_0_loss: 3.8612 - yolo_output_1_loss: 7.8642 - yolo_output_2_loss: 32.3329
Epoch 00003: saving model to checkpoints/yolov3_train_3.tf
715/715 [==============================] - 291s 407ms/step - loss: 54.7577 - yolo_output_0_loss: 3.8612 - yolo_output_1_loss: 7.8642 - yolo_output_2_loss: 32.3329 - val_loss: 43.5827 - val_yolo_output_0_loss: 4.1549 - val_yolo_output_1_loss: 6.4818 - val_yolo_output_2_loss: 22.2565 - lr: 1.0000e-04
Epoch 4/50
715/715 [==============================] - ETA: 0s - loss: 36.5493 - yolo_output_0_loss: 3.0293 - yolo_output_1_loss: 5.5990 - yolo_output_2_loss: 17.2411
Epoch 00004: saving model to checkpoints/yolov3_train_4.tf
715/715 [==============================] - 288s 403ms/step - loss: 36.5493 - yolo_output_0_loss: 3.0293 - yolo_output_1_loss: 5.5990 - yolo_output_2_loss: 17.2411 - val_loss: 32.7991 - val_yolo_output_0_loss: 4.2579 - val_yolo_output_1_loss: 5.0273 - val_yolo_output_2_loss: 12.8445 - lr: 1.0000e-04
Epoch 5/50
715/715 [==============================] - ETA: 0s - loss: 27.8374 - yolo_output_0_loss: 2.2460 - yolo_output_1_loss: 4.2862 - yolo_output_2_loss: 10.6454
Epoch 00005: saving model to checkpoints/yolov3_train_5.tf
715/715 [==============================] - 292s 408ms/step - loss: 27.8374 - yolo_output_0_loss: 2.2460 - yolo_output_1_loss: 4.2862 - yolo_output_2_loss: 10.6454 - val_loss: 27.5969 - val_yolo_output_0_loss: 4.6249 - val_yolo_output_1_loss: 4.4424 - val_yolo_output_2_loss: 7.8793 - lr: 1.0000e-04
Epoch 6/50
715/715 [==============================] - ETA: 0s - loss: 22.6884 - yolo_output_0_loss: 1.5510 - yolo_output_1_loss: 3.3140 - yolo_output_2_loss: 7.1820
Epoch 00006: saving model to checkpoints/yolov3_train_6.tf
715/715 [==============================] - 291s 407ms/step - loss: 22.6884 - yolo_output_0_loss: 1.5510 - yolo_output_1_loss: 3.3140 - yolo_output_2_loss: 7.1820 - val_loss: 25.6328 - val_yolo_output_0_loss: 5.0260 - val_yolo_output_1_loss: 4.3830 - val_yolo_output_2_loss: 5.5915 - lr: 1.0000e-04
Epoch 7/50
715/715 [==============================] - ETA: 0s - loss: 19.3426 - yolo_output_0_loss: 1.1537 - yolo_output_1_loss: 2.4202 - yolo_output_2_loss: 5.1452
Epoch 00007: saving model to checkpoints/yolov3_train_7.tf
715/715 [==============================] - 290s 406ms/step - loss: 19.3426 - yolo_output_0_loss: 1.1537 - yolo_output_1_loss: 2.4202 - yolo_output_2_loss: 5.1452 - val_loss: 24.6496 - val_yolo_output_0_loss: 5.3167 - val_yolo_output_1_loss: 4.3269 - val_yolo_output_2_loss: 4.3915 - lr: 1.0000e-04
Epoch 8/50
715/715 [==============================] - ETA: 0s - loss: 17.1593 - yolo_output_0_loss: 0.9577 - yolo_output_1_loss: 1.7566 - yolo_output_2_loss: 3.8395
Epoch 00008: saving model to checkpoints/yolov3_train_8.tf
715/715 [==============================] - 290s 405ms/step - loss: 17.1593 - yolo_output_0_loss: 0.9577 - yolo_output_1_loss: 1.7566 - yolo_output_2_loss: 3.8395 - val_loss: 23.6398 - val_yolo_output_0_loss: 5.1509 - val_yolo_output_1_loss: 4.4299 - val_yolo_output_2_loss: 3.4624 - lr: 1.0000e-04
Epoch 9/50
715/715 [==============================] - ETA: 0s - loss: 15.6915 - yolo_output_0_loss: 0.8240 - yolo_output_1_loss: 1.3136 - yolo_output_2_loss: 2.9669
Epoch 00009: saving model to checkpoints/yolov3_train_9.tf
715/715 [==============================] - 289s 404ms/step - loss: 15.6915 - yolo_output_0_loss: 0.8240 - yolo_output_1_loss: 1.3136 - yolo_output_2_loss: 2.9669 - val_loss: 23.4546 - val_yolo_output_0_loss: 5.0880 - val_yolo_output_1_loss: 4.7778 - val_yolo_output_2_loss: 3.0118 - lr: 1.0000e-04
Epoch 10/50
715/715 [==============================] - ETA: 0s - loss: 14.5766 - yolo_output_0_loss: 0.7202 - yolo_output_1_loss: 1.0361 - yolo_output_2_loss: 2.2553
Epoch 00010: saving model to checkpoints/yolov3_train_10.tf
715/715 [==============================] - 290s 406ms/step - loss: 14.5766 - yolo_output_0_loss: 0.7202 - yolo_output_1_loss: 1.0361 - yolo_output_2_loss: 2.2553 - val_loss: 23.7491 - val_yolo_output_0_loss: 5.8852 - val_yolo_output_1_loss: 4.5641 - val_yolo_output_2_loss: 2.7482 - lr: 1.0000e-04
Epoch 11/50
715/715 [==============================] - ETA: 0s - loss: 13.7487 - yolo_output_0_loss: 0.6390 - yolo_output_1_loss: 0.8311 - yolo_output_2_loss: 1.7401
Epoch 00011: saving model to checkpoints/yolov3_train_11.tf
715/715 [==============================] - 292s 408ms/step - loss: 13.7487 - yolo_output_0_loss: 0.6390 - yolo_output_1_loss: 0.8311 - yolo_output_2_loss: 1.7401 - val_loss: 23.7634 - val_yolo_output_0_loss: 6.1159 - val_yolo_output_1_loss: 4.7568 - val_yolo_output_2_loss: 2.3672 - lr: 1.0000e-04
Epoch 12/50
715/715 [==============================] - ETA: 0s - loss: 13.1561 - yolo_output_0_loss: 0.5972 - yolo_output_1_loss: 0.6994 - yolo_output_2_loss: 1.3506
Epoch 00012: saving model to checkpoints/yolov3_train_12.tf
715/715 [==============================] - 290s 405ms/step - loss: 13.1561 - yolo_output_0_loss: 0.5972 - yolo_output_1_loss: 0.6994 - yolo_output_2_loss: 1.3506 - val_loss: 23.7688 - val_yolo_output_0_loss: 5.7243 - val_yolo_output_1_loss: 5.2128 - val_yolo_output_2_loss: 2.3391 - lr: 1.0000e-04
Epoch 13/50
715/715 [==============================] - ETA: 0s - loss: 12.6940 - yolo_output_0_loss: 0.5004 - yolo_output_1_loss: 0.6218 - yolo_output_2_loss: 1.0971
Epoch 00013: saving model to checkpoints/yolov3_train_13.tf
715/715 [==============================] - 291s 407ms/step - loss: 12.6940 - yolo_output_0_loss: 0.5004 - yolo_output_1_loss: 0.6218 - yolo_output_2_loss: 1.0971 - val_loss: 23.9774 - val_yolo_output_0_loss: 5.8261 - val_yolo_output_1_loss: 5.3457 - val_yolo_output_2_loss: 2.3514 - lr: 1.0000e-04
Epoch 14/50
715/715 [==============================] - ETA: 0s - loss: 12.3385 - yolo_output_0_loss: 0.4928 - yolo_output_1_loss: 0.5467 - yolo_output_2_loss: 0.8634
Epoch 00014: saving model to checkpoints/yolov3_train_14.tf
715/715 [==============================] - 291s 407ms/step - loss: 12.3385 - yolo_output_0_loss: 0.4928 - yolo_output_1_loss: 0.5467 - yolo_output_2_loss: 0.8634 - val_loss: 24.5731 - val_yolo_output_0_loss: 6.5884 - val_yolo_output_1_loss: 5.1778 - val_yolo_output_2_loss: 2.3918 - lr: 1.0000e-04
Epoch 15/50
715/715 [==============================] - ETA: 0s - loss: 12.0355 - yolo_output_0_loss: 0.4475 - yolo_output_1_loss: 0.4991 - yolo_output_2_loss: 0.6954
Epoch 00015: saving model to checkpoints/yolov3_train_15.tf
715/715 [==============================] - 289s 404ms/step - loss: 12.0355 - yolo_output_0_loss: 0.4475 - yolo_output_1_loss: 0.4991 - yolo_output_2_loss: 0.6954 - val_loss: 24.6375 - val_yolo_output_0_loss: 6.0910 - val_yolo_output_1_loss: 5.6231 - val_yolo_output_2_loss: 2.5518 - lr: 1.0000e-04
Epoch 16/50
715/715 [==============================] - ETA: 0s - loss: 11.8657 - yolo_output_0_loss: 0.4290 - yolo_output_1_loss: 0.4797 - yolo_output_2_loss: 0.6056
Epoch 00016: saving model to checkpoints/yolov3_train_16.tf
715/715 [==============================] - 292s 409ms/step - loss: 11.8657 - yolo_output_0_loss: 0.4290 - yolo_output_1_loss: 0.4797 - yolo_output_2_loss: 0.6056 - val_loss: 25.8418 - val_yolo_output_0_loss: 6.4979 - val_yolo_output_1_loss: 6.4148 - val_yolo_output_2_loss: 2.5979 - lr: 1.0000e-04
Epoch 17/50
715/715 [==============================] - ETA: 0s - loss: 11.6607 - yolo_output_0_loss: 0.4017 - yolo_output_1_loss: 0.4542 - yolo_output_2_loss: 0.4942
Epoch 00017: saving model to checkpoints/yolov3_train_17.tf
715/715 [==============================] - 292s 409ms/step - loss: 11.6607 - yolo_output_0_loss: 0.4017 - yolo_output_1_loss: 0.4542 - yolo_output_2_loss: 0.4942 - val_loss: 25.9489 - val_yolo_output_0_loss: 6.7254 - val_yolo_output_1_loss: 6.2932 - val_yolo_output_2_loss: 2.6406 - lr: 1.0000e-04
Epoch 18/50
715/715 [==============================] - ETA: 0s - loss: 11.4946 - yolo_output_0_loss: 0.3758 - yolo_output_1_loss: 0.4128 - yolo_output_2_loss: 0.4374
Epoch 00018: saving model to checkpoints/yolov3_train_18.tf
715/715 [==============================] - 286s 400ms/step - loss: 11.4946 - yolo_output_0_loss: 0.3758 - yolo_output_1_loss: 0.4128 - yolo_output_2_loss: 0.4374 - val_loss: 25.9633 - val_yolo_output_0_loss: 6.7053 - val_yolo_output_1_loss: 6.2931 - val_yolo_output_2_loss: 2.7149 - lr: 1.0000e-04
Epoch 19/50
715/715 [==============================] - ETA: 0s - loss: 11.3196 - yolo_output_0_loss: 0.3771 - yolo_output_1_loss: 0.3616 - yolo_output_2_loss: 0.3506
Epoch 00019: ReduceLROnPlateau reducing learning rate to 9.999999747378752e-06.

Epoch 00019: saving model to checkpoints/yolov3_train_19.tf
715/715 [==============================] - 290s 405ms/step - loss: 11.3196 - yolo_output_0_loss: 0.3771 - yolo_output_1_loss: 0.3616 - yolo_output_2_loss: 0.3506 - val_loss: 26.3280 - val_yolo_output_0_loss: 6.9310 - val_yolo_output_1_loss: 6.3072 - val_yolo_output_2_loss: 2.8805 - lr: 1.0000e-04
Epoch 20/50
715/715 [==============================] - ETA: 0s - loss: 10.8853 - yolo_output_0_loss: 0.2090 - yolo_output_1_loss: 0.2390 - yolo_output_2_loss: 0.2333
Epoch 00020: saving model to checkpoints/yolov3_train_20.tf
715/715 [==============================] - 291s 408ms/step - loss: 10.8853 - yolo_output_0_loss: 0.2090 - yolo_output_1_loss: 0.2390 - yolo_output_2_loss: 0.2333 - val_loss: 26.0611 - val_yolo_output_0_loss: 6.5314 - val_yolo_output_1_loss: 6.4562 - val_yolo_output_2_loss: 2.8753 - lr: 1.0000e-05
Epoch 21/50
715/715 [==============================] - ETA: 0s - loss: 10.6879 - yolo_output_0_loss: 0.1426 - yolo_output_1_loss: 0.1790 - yolo_output_2_loss: 0.1749
Epoch 00021: saving model to checkpoints/yolov3_train_21.tf
715/715 [==============================] - 288s 403ms/step - loss: 10.6879 - yolo_output_0_loss: 0.1426 - yolo_output_1_loss: 0.1790 - yolo_output_2_loss: 0.1749 - val_loss: 26.3808 - val_yolo_output_0_loss: 6.6464 - val_yolo_output_1_loss: 6.6074 - val_yolo_output_2_loss: 2.9432 - lr: 1.0000e-05
Epoch 22/50
715/715 [==============================] - ETA: 0s - loss: 10.6148 - yolo_output_0_loss: 0.1238 - yolo_output_1_loss: 0.1609 - yolo_output_2_loss: 0.1553
Epoch 00022: saving model to checkpoints/yolov3_train_22.tf
715/715 [==============================] - 291s 407ms/step - loss: 10.6148 - yolo_output_0_loss: 0.1238 - yolo_output_1_loss: 0.1609 - yolo_output_2_loss: 0.1553 - val_loss: 26.5839 - val_yolo_output_0_loss: 6.7420 - val_yolo_output_1_loss: 6.6507 - val_yolo_output_2_loss: 3.0263 - lr: 1.0000e-05
Epoch 23/50
715/715 [==============================] - ETA: 0s - loss: 10.5493 - yolo_output_0_loss: 0.1082 - yolo_output_1_loss: 0.1481 - yolo_output_2_loss: 0.1396
Epoch 00023: saving model to checkpoints/yolov3_train_23.tf
715/715 [==============================] - 294s 411ms/step - loss: 10.5493 - yolo_output_0_loss: 0.1082 - yolo_output_1_loss: 0.1481 - yolo_output_2_loss: 0.1396 - val_loss: 26.9190 - val_yolo_output_0_loss: 6.8917 - val_yolo_output_1_loss: 6.8101 - val_yolo_output_2_loss: 3.0763 - lr: 1.0000e-05
Epoch 24/50
715/715 [==============================] - ETA: 0s - loss: 10.4861 - yolo_output_0_loss: 0.0964 - yolo_output_1_loss: 0.1365 - yolo_output_2_loss: 0.1267
Epoch 00024: saving model to checkpoints/yolov3_train_24.tf
715/715 [==============================] - 292s 409ms/step - loss: 10.4861 - yolo_output_0_loss: 0.0964 - yolo_output_1_loss: 0.1365 - yolo_output_2_loss: 0.1267 - val_loss: 27.1788 - val_yolo_output_0_loss: 7.0300 - val_yolo_output_1_loss: 6.9188 - val_yolo_output_2_loss: 3.1187 - lr: 1.0000e-05
Epoch 25/50
715/715 [==============================] - ETA: 0s - loss: 10.4143 - yolo_output_0_loss: 0.0841 - yolo_output_1_loss: 0.1229 - yolo_output_2_loss: 0.1135
Epoch 00025: saving model to checkpoints/yolov3_train_25.tf
715/715 [==============================] - 291s 406ms/step - loss: 10.4143 - yolo_output_0_loss: 0.0841 - yolo_output_1_loss: 0.1229 - yolo_output_2_loss: 0.1135 - val_loss: 27.3874 - val_yolo_output_0_loss: 7.1310 - val_yolo_output_1_loss: 6.9929 - val_yolo_output_2_loss: 3.1886 - lr: 1.0000e-05
Epoch 26/50
715/715 [==============================] - ETA: 0s - loss: 10.3489 - yolo_output_0_loss: 0.0750 - yolo_output_1_loss: 0.1148 - yolo_output_2_loss: 0.1050
Epoch 00026: saving model to checkpoints/yolov3_train_26.tf
715/715 [==============================] - 288s 402ms/step - loss: 10.3489 - yolo_output_0_loss: 0.0750 - yolo_output_1_loss: 0.1148 - yolo_output_2_loss: 0.1050 - val_loss: 27.5807 - val_yolo_output_0_loss: 7.1429 - val_yolo_output_1_loss: 7.1559 - val_yolo_output_2_loss: 3.2496 - lr: 1.0000e-05
Epoch 27/50
715/715 [==============================] - ETA: 0s - loss: 10.2810 - yolo_output_0_loss: 0.0682 - yolo_output_1_loss: 0.1074 - yolo_output_2_loss: 0.0963
Epoch 00027: saving model to checkpoints/yolov3_train_27.tf
715/715 [==============================] - 290s 405ms/step - loss: 10.2810 - yolo_output_0_loss: 0.0682 - yolo_output_1_loss: 0.1074 - yolo_output_2_loss: 0.0963 - val_loss: 27.7404 - val_yolo_output_0_loss: 7.2319 - val_yolo_output_1_loss: 7.1785 - val_yolo_output_2_loss: 3.3449 - lr: 1.0000e-05
Epoch 28/50
715/715 [==============================] - ETA: 0s - loss: 10.2145 - yolo_output_0_loss: 0.0643 - yolo_output_1_loss: 0.1012 - yolo_output_2_loss: 0.0884
Epoch 00028: saving model to checkpoints/yolov3_train_28.tf
715/715 [==============================] - 293s 409ms/step - loss: 10.2145 - yolo_output_0_loss: 0.0643 - yolo_output_1_loss: 0.1012 - yolo_output_2_loss: 0.0884 - val_loss: 27.9862 - val_yolo_output_0_loss: 7.4295 - val_yolo_output_1_loss: 7.2403 - val_yolo_output_2_loss: 3.3808 - lr: 1.0000e-05
Epoch 29/50
715/715 [==============================] - ETA: 0s - loss: 10.1470 - yolo_output_0_loss: 0.0596 - yolo_output_1_loss: 0.0967 - yolo_output_2_loss: 0.0807
Epoch 00029: ReduceLROnPlateau reducing learning rate to 9.999999747378752e-07.

Epoch 00029: saving model to checkpoints/yolov3_train_29.tf
715/715 [==============================] - 293s 410ms/step - loss: 10.1470 - yolo_output_0_loss: 0.0596 - yolo_output_1_loss: 0.0967 - yolo_output_2_loss: 0.0807 - val_loss: 28.1393 - val_yolo_output_0_loss: 7.3661 - val_yolo_output_1_loss: 7.4419 - val_yolo_output_2_loss: 3.4473 - lr: 1.0000e-05
Epoch 00029: early stopping

yichenj (Contributor, Author) commented Aug 7, 2020

By the way, I also opened another PR (#315) for voc2012.py while testing this one.

Revise voc2012 dataset preparing script
makra89 (Contributor) commented Sep 2, 2021

I ran into the same problem with score <= 0.5 for only one class.
The problem is not the classification loss, but the way the score is calculated.

If we only have one class, the class probability is always 0.5, since the model does not learn it (the class loss is always zero).
This fixes the problem for one class in yolo_nms:
    if classes == 1:
        scores = confidence
    else:
        scores = confidence * class_probs
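
A small NumPy illustration of why the product caps the score, assuming the untrained class output stays at sigmoid(0) = 0.5. The helper `combine_scores` and the sample values are hypothetical and only mirror the snippet above; they are not the actual yolo_nms code.

```python
import numpy as np

def combine_scores(confidence, class_probs, classes):
    # Hypothetical helper mirroring the snippet above.
    if classes == 1:
        return confidence
    return confidence * class_probs

# Made-up box confidences for three detections.
confidence = np.array([[0.95], [0.60], [0.10]])
# Assumption: an untrained single-class output sits at sigmoid(0) = 0.5.
untrained_class_probs = np.full((3, 1), 0.5)

product = confidence * untrained_class_probs
fixed = combine_scores(confidence, untrained_class_probs, classes=1)
print("product:", product.ravel())  # never exceeds 0.5
print("fixed:", fixed.ravel())      # equals the box confidence
```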

See #311

yichenj (Contributor, Author) commented Sep 3, 2021

> I ran into the same problem with score <= 0.5 for only one class.
> The problem is not the classification loss, but the way the score is calculated.
>
> If we only have one class, the class probability is always 0.5, since the model does not learn it (the class loss is always zero).
> This fixes the problem for one class in yolo_nms:
>
>     if classes == 1:
>         scores = confidence
>     else:
>         scores = confidence * class_probs
>
> See #311

Hi Manuel, I do not think it is reasonable for class_prob to always be 0.5 when there is only one class (I think always 1.0 would be more reasonable), so I modified the loss function so that it is not always zero. In my understanding, the confidence score indicates how confident the model is in the box coordinates, and the class probability indicates how confident it is in the class prediction. Please correct me if I am wrong.

However, I have to admit that your fix is simpler and safer.

Reference from the YOLO v1 paper:

These confidence scores reflect how confident the model is that the box contains an object and also how accurate it thinks the box is that it predicts. Formally we define confidence as Po * IOU.
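
For context, the same paper also defines how this confidence is combined with the conditional class probability at test time. In the paper's notation (the "Po" quoted above corresponds to Pr(Object)):

```latex
\text{confidence} = \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}

\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}
  = \Pr(\text{Class}_i) \cdot \text{IOU}^{\text{truth}}_{\text{pred}}
```

Read this way, with a single class the conditional term would ideally be 1, so the class-specific score reduces to the box confidence. That seems consistent with both the loss change in this PR and the yolo_nms workaround.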

makra89 (Contributor) commented Sep 3, 2021

You are right, it is more of a hack. With "reasonable" I just meant that the model did not learn anything about the classes, since the class loss is always zero.
