- Institute of Data Science, 310554031, 葉詠富
- Institute of Data Science, 310554037, 黃乾哲
- We use Mask R-CNN to segment the object.
- The training data is “PennFudanPed”, which contains images used for pedestrian detection in the experiments reported in.
- Since our background is simple, the human contour is clear.
- We apply a threshold to sharpen the contour of the mask.
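A minimal sketch of this thresholding step, assuming the network returns a soft mask with per-pixel scores in [0, 1]. The 0.5 cutoff and the name `binarize_mask` are illustrative assumptions, not values taken from our experiments.

```python
def binarize_mask(soft_mask, cutoff=0.5):
    """Turn per-pixel scores into a hard 0/1 mask so the contour becomes crisp.

    `cutoff` is an assumed value; tune it for the data at hand.
    """
    return [[1 if score >= cutoff else 0 for score in row] for row in soft_mask]


# Tiny made-up soft mask: high scores in the middle, low scores at the border.
soft = [
    [0.1, 0.4, 0.2],
    [0.3, 0.9, 0.6],
    [0.0, 0.7, 0.45],
]
hard = binarize_mask(soft)
```

Raising the cutoff shrinks the mask toward the confident interior; lowering it grows the mask outward.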
- Use the pretrained model raft-things to estimate the optical flow of the people and their shadows.
- Convert the flow to RGB images.
- Convert the RGB images to grayscale images.
- Flatten each grayscale image into a 1D vector.
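The conversion chain can be illustrated per pixel with the sketch below. It is a simplified stand-in, not the colour wheel that RAFT's own visualization uses: it assumes hue encodes flow direction and saturation encodes flow magnitude, so that zero flow renders white. The function names and the `max_mag` normalization parameter are hypothetical.

```python
import colorsys
import math


def flow_to_rgb(u, v, max_mag):
    """Map one flow vector to an RGB triple in 0..255.

    Assumed encoding: hue = direction, saturation = speed, value = 1,
    so a pixel with no motion comes out white.
    """
    mag = math.hypot(u, v)
    hue = (math.atan2(v, u) / (2 * math.pi)) % 1.0
    sat = min(mag / max_mag, 1.0) if max_mag > 0 else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, sat, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


def rgb_to_gray(r, g, b):
    """Weighted sum with the ITU-R BT.601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# A 1x2 made-up flow field: one static pixel, one pixel moving right.
flow = [[(0.0, 0.0), (1.0, 0.0)]]
gray = [[rgb_to_gray(*flow_to_rgb(u, v, max_mag=1.0)) for (u, v) in row] for row in flow]
flat = [value for row in gray for value in row]  # 1D vector of gray levels
```

Under this encoding, moving pixels get lower gray values than static ones, which is consistent with treating low values as foreground in a later thresholding step.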
- Set a threshold to create the RAFT mask.
- The threshold is chosen as a percentile of the pixel values.
- For example, with a 10% threshold, the 10% of pixels whose values fall below the threshold are treated as foreground, and the pixels above it as background.
- Threshold comparison: 2% vs. 10%
- 2% covers fewer shadows but reduces the influence of optical flow from nearby scenery.
- 10% covers more shadows but increases the influence of optical flow from nearby scenery.
- We choose 10%, hoping to cover more shadows.
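The percentile rule above can be sketched without any dependencies as follows. The nearest-rank percentile definition, the function names, and the tiny sample image are assumptions for illustration. The sketch also assumes that moving pixels appear darker than the static background in the grayscale flow image, which is why values at or below the threshold are marked as foreground.

```python
import math


def percentile_threshold(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def raft_mask(gray, pct=10):
    """Return a 0/1 mask where the darkest pct% of pixels are foreground (1)."""
    flat = [value for row in gray for value in row]  # image as a 1D vector
    threshold = percentile_threshold(flat, pct)
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


# Made-up 2x5 grayscale flow image: bright static background, one dark moving pixel.
gray = [
    [250, 250, 250, 250, 250],
    [250, 250, 40, 250, 250],
]
mask = raft_mask(gray, pct=10)
```

A larger `pct` admits more pixels into the foreground, which mirrors the 2% vs. 10% trade-off: more shadow coverage at the cost of more stray flow from the surroundings.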
- Divide the RGB optical flow map into 10 groups using KMeans.
- Sort the groups by the number of pixels they contain.
- If one group has more than 5 times as many pixels as the next group in the sorted order, that gap is the dividing point between foreground and background.
- Example: 90505/4247 ≈ 21.3 > 5
- We find that KMeans links people and their shadows better when generating masks.
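The sorting and splitting rule can be sketched as below, given per-group pixel counts from a clustering run. The function name `split_groups`, the cluster ids, and every count other than 90505 and 4247 are made up for illustration. Which side of the split counts as foreground is not encoded here; the sketch simply returns both sides.

```python
def split_groups(pixel_counts, ratio=5):
    """Split cluster ids at the first large gap in pixel count.

    pixel_counts: mapping of cluster id -> number of pixels in that cluster.
    Returns (large_side, small_side), each a list of cluster ids sorted by
    descending pixel count. If no adjacent pair differs by more than `ratio`
    times, small_side is empty.
    """
    ranked = sorted(pixel_counts.items(), key=lambda item: item[1], reverse=True)
    ids = [cluster_id for cluster_id, _ in ranked]
    for i in range(len(ranked) - 1):
        if ranked[i][1] > ratio * ranked[i + 1][1]:
            return ids[: i + 1], ids[i + 1:]
    return ids, []


# Hypothetical counts; only 90505 and 4247 come from the example above.
counts = {"c0": 4247, "c1": 90505, "c2": 3900, "c3": 2100}
large_side, small_side = split_groups(counts)
```

Comparing with `count > ratio * next_count` avoids a division and therefore any division-by-zero concern.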
- Eliminate the effects of the background.
- The people in the back of the original frame are kept.
- Mask R-CNN is good at detecting the object.
- RAFT is good at detecting the shadow.
- Combine the advantages of both to produce the mask.
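One plausible way to combine the two masks is a pixel-wise union, sketched below. The union rule, the name `combine_masks`, and the sample masks are assumptions; the text above does not state the exact combination rule.

```python
def combine_masks(object_mask, flow_mask):
    """Pixel-wise OR of two 0/1 masks with the same shape (assumed combination rule)."""
    same_shape = len(object_mask) == len(flow_mask) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(object_mask, flow_mask)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")
    return [
        [a | b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(object_mask, flow_mask)
    ]


# Made-up masks: the object mask covers the person, the flow mask adds the shadow.
person = [[0, 1, 0], [0, 1, 0], [0, 0, 0]]
shadow = [[0, 0, 0], [0, 1, 1], [0, 0, 1]]
final = combine_masks(person, shadow)
```

A union keeps every pixel that either model flags, so the final mask covers both the object and the shadow region that only the flow mask picks up.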
- We use the pretrained Deepfillv2 model to inpaint the region removed by the mask.
- The mask covers almost the whole object and its effects.
- RAFT cannot detect the whole shadow, which may be caused by the motion of the object.
- The inpainting result does not look natural; a model would need to be retrained specifically for this video’s background.
- Mask R-CNN
- RAFT
- Deepfillv2
- KMeans