Pose Estimation || Application of Computer Vision and Deep Learning

What is Pose Estimation?

Pose estimation is one of the more recent applications of deep learning and computer vision. In pose estimation we try to predict the pose of a person, that is, to represent the person's orientation in graphical form. This may sound complex at first, but the idea is simple: we treat each joint of the body as a coordinate in the graphical representation, and we draw each bone connecting two joints as a line.

To keep the posture simple and easy to visualize, we do not represent every bone of the human body. Instead, we represent the hands, legs, shoulders and face of a person.
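
The joints-and-lines idea can be sketched in a few lines of Python. The joint names, the pixel coordinates and the list of connections below are made up for illustration and do not follow any particular model's format.

```python
# Hypothetical joint names and (x, y) pixel coordinates for one person.
keypoints = {
    "nose": (50, 20),
    "left_shoulder": (40, 40),
    "right_shoulder": (60, 40),
    "left_elbow": (35, 60),
    "right_elbow": (65, 60),
    "left_hip": (43, 85),
    "right_hip": (57, 85),
    "left_knee": (42, 110),
    "right_knee": (58, 110),
}

# Each "bone" is a pair of joint names that should be joined by a line.
skeleton = [
    ("left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("right_shoulder", "right_elbow"),
    ("left_shoulder", "left_hip"),
    ("right_shoulder", "right_hip"),
    ("left_hip", "left_knee"),
    ("right_hip", "right_knee"),
]


def pose_segments(keypoints, skeleton):
    """Return the (start, end) line segments to draw for a pose."""
    segments = []
    for joint_a, joint_b in skeleton:
        # Skip a bone if either of its joints was not detected.
        if joint_a in keypoints and joint_b in keypoints:
            segments.append((keypoints[joint_a], keypoints[joint_b]))
    return segments


segments = pose_segments(keypoints, skeleton)
```

Each entry in `segments` is a pair of points, so any drawing library can render the pose by drawing one line per entry.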


Previously, some of the best-known applications of computer vision were face detection and face recognition. Computer vision was also used to detect whether a person is present in an image and, if so, to highlight the area where the person appears. Further research and advances then led to pose estimation, which helps us visualize a person's posture, whether the person is standing, sitting, walking, dancing or in any other pose.

Pose Estimation Methods

  • Single-person pose estimation
    This is one of the earliest approaches to pose estimation, in which the image contains a single person. The method identifies the individual body parts and then connects them to create a pose.

  • Multi-person pose estimation
    In practice, a frame will rarely contain only one person, so a single-person method on its own is slow and inefficient. Multi-person pose estimation is more difficult than the single-person case because the number of people in the image and their locations are unknown. Typically, this is tackled using one of two approaches:

    • The simpler approach is to run a person detector first, then estimate the parts and compute the pose for each detected person. This method is known as the top-down approach.
    • Another approach is to detect all parts in the image (i.e. the parts of every person) and then associate or group the parts belonging to each distinct person. This method is known as the bottom-up approach.
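
The top-down approach can be sketched as a small pipeline. The detector and the single-person pose model below are stand-in functions invented for this example (they return fixed values); a real system would plug trained models into the same two slots.

```python
def crop(image, box):
    """Cut the region (x0, y0, x1, y1) out of an image stored as a list of rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def top_down(image, detect_people, estimate_single_pose):
    """Detect people first, then estimate one pose per detected box."""
    poses = []
    for box in detect_people(image):
        x0, y0, _, _ = box
        local_pose = estimate_single_pose(crop(image, box))
        # Shift keypoints from crop coordinates back to image coordinates.
        poses.append({name: (x + x0, y + y0) for name, (x, y) in local_pose.items()})
    return poses


# Stand-in models (assumptions for illustration only).
def fake_detector(image):
    # Pretends there are two people, side by side.
    return [(0, 0, 4, 4), (4, 0, 8, 4)]


def fake_pose_model(patch):
    # Pretends the head is always at the centre of the crop.
    height, width = len(patch), len(patch[0])
    return {"head": (width // 2, height // 2)}


image = [[0] * 8 for _ in range(4)]  # blank 8x4 "image"
poses = top_down(image, fake_detector, fake_pose_model)
```

Because the pose model only ever sees one crop at a time, any mistake made by the detector carries straight through to the final poses. A bottom-up method skips the detector and has to solve the grouping problem instead.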

Deep Learning Methods

1. OpenPose

OpenPose is one of the most popular frameworks for pose estimation. It uses the bottom-up approach for multi-person pose estimation: it first detects the parts (key points) belonging to every person in the image and then assigns those parts to distinct individuals. It is open source and available on GitHub with good documentation.
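
To give a feel for the "detect all parts, then assign them" step, here is a deliberately simplified sketch. It is not OpenPose's algorithm: OpenPose scores candidate connections with learned part affinity fields, whereas this toy version just attaches each part to the nearest neck by Euclidean distance. The part names and coordinates are invented.

```python
import math


def group_parts(detections, anchor="neck"):
    """Greedy grouping: one person per anchor part, other parts join the nearest anchor."""
    people = [{anchor: point} for point in detections.get(anchor, [])]
    if not people:
        return people
    for part, points in detections.items():
        if part == anchor:
            continue
        for point in points:
            nearest = min(people, key=lambda person: math.dist(person[anchor], point))
            current = nearest.get(part)
            # If this person already has the part, keep whichever candidate is closer.
            if current is None or math.dist(nearest[anchor], point) < math.dist(nearest[anchor], current):
                nearest[part] = point
    return people


# All parts found in the image, not yet assigned to anyone.
detections = {
    "neck": [(20, 30), (80, 32)],
    "nose": [(81, 15), (19, 14)],
    "left_wrist": [(10, 70), (72, 68)],
}
people = group_parts(detections)
```

Distance-based grouping breaks down as soon as people overlap or stand close together, which is exactly the situation a learned association score is meant to handle.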

2. PoseNet

PoseNet is available with TensorFlow, a popular deep learning framework. It is a vision model that estimates the pose of a person in an image or video by estimating where the key body joints are. It is ready to use; to learn more, see its official documentation.

TensorFlow provides a PoseNet demo, published by Google, that you can try directly in your browser.
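
Estimating "where key body joints are" is often done with one score map (heatmap) per joint, where the highest-scoring cell marks the joint. The sketch below shows that idea only. It is not PoseNet's API, and the `stride` and `min_score` parameters, the joint names and the tiny 3x3 heatmaps are assumptions made for the example.

```python
def decode_keypoints(heatmaps, stride=16, min_score=0.5):
    """Pick the highest-scoring cell of each joint's heatmap and map it to pixels."""
    keypoints = {}
    for name, grid in heatmaps.items():
        best_score, (row, col) = max(
            (score, (r, c))
            for r, grid_row in enumerate(grid)
            for c, score in enumerate(grid_row)
        )
        # Drop joints the model is not confident about.
        if best_score >= min_score:
            keypoints[name] = {"x": col * stride, "y": row * stride, "score": best_score}
    return keypoints


# Hypothetical model output: one small score grid per joint.
heatmaps = {
    "nose": [[0.1, 0.2, 0.1],
             [0.1, 0.9, 0.3],
             [0.0, 0.1, 0.1]],
    "left_ankle": [[0.1, 0.1, 0.1],
                   [0.2, 0.3, 0.1],
                   [0.1, 0.1, 0.2]],
}
keypoints = decode_keypoints(heatmaps)
```

Here the nose is kept because its best score clears the threshold, while the left ankle is dropped as too uncertain (for example, because it is outside the frame).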

3. RMPE (AlphaPose)

RMPE is a popular top-down method for pose estimation. Its authors posit that top-down methods usually depend on the accuracy of the person detector, because pose estimation is performed on the region where the person is located. Hence, errors in localization and duplicate bounding box predictions can cause the pose extraction algorithm to perform sub-optimally.
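
To make the duplicate bounding box problem concrete, here is a sketch of plain non-maximum suppression (NMS) on person boxes. This is a generic technique used as a stand-in; it is not the remedy proposed in the RMPE paper, and the boxes, scores and threshold are invented for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0


def nms(detections, iou_threshold=0.5):
    """Keep the highest-scoring box of every group of heavily overlapping boxes."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detector output: (box, confidence score).
detections = [
    ((10, 10, 50, 100), 0.95),
    ((12, 12, 52, 102), 0.80),   # near-duplicate of the first box
    ((120, 20, 160, 110), 0.90),
]
kept = nms(detections)
```

Without this filtering step, the near-duplicate box would produce a second, redundant pose for the same person.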


Applications of Pose Estimation

  1. Gait Analysis - Tracking the variations in a person's pose over a period of time can be used for activity, gesture and gait recognition. Gait analysis can help athletes learn more about their pose during an activity, which can help them improve their performance.

  2. Self-Driving Cars - In a self-driving car, one of the primary goals is the safety of the driver and of pedestrians. Pose estimation can be used to detect pedestrians and to recognize their activity, such as whether they are walking or standing.

  3. Training Robots - Instead of programming robots by hand to follow certain trajectories and walk like a human, we can use pose estimation and let the robot analyze human poses and learn from them.

  4. Interactive Gaming and Animation - In gaming, we estimate the pose of a human and replicate it on a game character. Interactive animation is used in a technology called Real-Time Animation System (RTAS), which allows you to generate, animate and direct CG characters live on set in real time.
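
As a small illustration of the gait analysis idea above, tracking a pose over time can be as simple as computing a joint angle in every frame. The sketch below measures the knee angle from the hip, knee and ankle keypoints; the frame data is invented and the joint names are assumptions.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("joint_angle needs three distinct points")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against tiny floating-point overshoots.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


# Hypothetical keypoints for one leg in two video frames.
frames = [
    {"hip": (50, 80), "knee": (50, 110), "ankle": (50, 140)},  # leg straight
    {"hip": (50, 80), "knee": (50, 110), "ankle": (80, 110)},  # knee bent
]
knee_angles = [joint_angle(f["hip"], f["knee"], f["ankle"]) for f in frames]
```

The first frame gives an angle of about 180 degrees (a straight leg) and the second about 90 degrees. Plotting such angles across a whole clip shows how the knee flexes through each stride.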

