You don't need to set up anything to use the mediapipe hand pose detector. If you also want to classify gestures, you need
a classifier (e.g., an SVM classifier), and in that case you should follow the next step.
To use an SVM hand gesture classifier in your code, first move the weights file into a folder called weights in your package directory, i.e., YOUR_PKG_DIR_PATH/weights/YOUR_WEIGHTS.
The weights can be found on our drive in the computer_vision_models/classifiers/mediapipe_hand folder.
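As a minimal sketch of the step above, the weights file could be put in place with a small helper like the one below. The function name install_weights and any file names passed to it are hypothetical and not part of ai_utils; only the weights folder name comes from the text. The sketch copies the file rather than moving it, so the downloaded original is kept.

```python
import shutil
from pathlib import Path


def install_weights(weights_file, pkg_dir):
    """Copy a weights file into <pkg_dir>/weights and return the new path.

    Hypothetical helper for illustration; it is not part of ai_utils.
    """
    target_dir = Path(pkg_dir) / "weights"
    # Create the weights folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(weights_file).name
    # Copy (not move) so the downloaded file stays where it was.
    shutil.copy2(weights_file, target)
    return target
```

A call would look like install_weights("YOUR_WEIGHTS", "YOUR_PKG_DIR_PATH"), with both placeholders replaced by your own paths.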
To use Mediapipe in your Python code, import the HandPoseInference class:
from ai_utils.HandPoseInference import HandPoseInference
Then create the HandPoseInference object instance, passing some parameters, such as:
- display_img: [boolean] default = False
When True, shows the detected hand points on the image.
- static_image_mode: [boolean] default = False
Set True if you want to use static images; otherwise set False for dynamic ones (moving images).
- model_complexity: [int] default = 1
If 0, uses a simpler and faster model. If 1, uses a more complex model that is slower but more precise (the speed difference is almost irrelevant on computers).
- max_num_hands: [int] default = 2
Maximum number of hands detected in the image.
- min_detection_confidence: [double] default = 0.3
Score threshold used to filter out detections with low confidence.
- min_tracking_confidence: [double] default = 0.3
Same as above, but for hand tracking (it is related to hand movements in consecutive images).
- flip_image: [boolean] default = True
If True, flips the image with respect to the Y axis. It is used to obtain correct handedness values, because if the input image is flipped the algorithm confuses right and left hands.
- flatten: [boolean] default = True
If True, the output is flattened into a single list of 63 values; otherwise a list of 21 3D points is returned (size: 21*3).
hand_pose = HandPoseInference(display_img=True)
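To make the flatten option concrete, the sketch below converts between the two layouts described above: a flat list of 63 values and a list of 21 points with 3 coordinates each. It assumes the flat list is ordered point by point (x0, y0, z0, x1, y1, z1, ...), which the text does not state. The helper names unflatten_hand and flatten_hand are hypothetical and not part of ai_utils, and the data is dummy data.

```python
def unflatten_hand(flat_values, dims=3):
    """Group a flat coordinate list into per-point lists of `dims` values.

    Assumes point-by-point ordering (x0, y0, z0, x1, ...); this ordering
    is an assumption, not something confirmed by the library.
    """
    if len(flat_values) % dims != 0:
        raise ValueError("length of flat_values must be a multiple of dims")
    return [list(flat_values[i:i + dims])
            for i in range(0, len(flat_values), dims)]


def flatten_hand(points):
    """Inverse of unflatten_hand: concatenate the points into one list."""
    return [coord for point in points for coord in point]


# Dummy data standing in for one detected hand (63 values).
flat = [float(i) for i in range(63)]
points = unflatten_hand(flat)
print(len(points), points[0])  # 21 [0.0, 1.0, 2.0]
```

The flat layout is convenient as a feature vector for a classifier, while the point layout is easier to index when you need a specific hand point.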
To run the mediapipe hand detection, call the get_hand_pose function, which evaluates the image with the neural network.
hand_results = hand_pose.get_hand_pose(input_img)
inputs:
- img: [numpy array] mandatory
The image on which to run the hand detection.
outputs:
- hands_detected: [dict]
A dictionary containing the hands detected in the image. It can have only two keys, left and right, according to the handedness of the detected hands. Each key maps to a list of hands whose format depends on the instance initialization parameters.
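A minimal sketch of how the returned dictionary could be consumed is shown below. It assumes the keys are the lowercase strings "left" and "right" and that a key may be missing when no hand of that side was detected; neither detail is confirmed by the text. The function count_hands is hypothetical, and the example dictionary is made-up data standing in for the output of get_hand_pose with flatten=True.

```python
def count_hands(hands_detected):
    """Return how many hands were detected for each handedness key.

    Uses .get() with an empty default so that a missing key counts as zero.
    """
    return {side: len(hands_detected.get(side, []))
            for side in ("left", "right")}


# Made-up result: one right hand, flattened to 63 values, no left hand.
example = {"right": [[0.0] * 63]}
counts = count_hands(example)
print(counts)  # {'left': 0, 'right': 1}
```

Reading the dictionary with .get() avoids a KeyError on frames where only one hand, or no hand, is visible.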