Pose estimation & detection has been minimally implemented using the OpenPose implementation https://github.com/ildoonet/tf-pose-estimation with Tensorflow. For binary classification of poses, (sitting or upright), MobileNet (a CNN originally trained on the ImageNet Large Visual Recognition Challenge dataset), was retrained (final layer) on a dataset of ~1500 images of poses. For classifying scenes around the person in consideration, we retrain Inception-v3 architecture on the Stanford 40 dataset (9523 images), which incorporates 40 different day-to-day activites.
Pose classification accuracy is 94.56% and scene recognition rate is 92.3%.
https://github.com/Hzzone/pytorch-openpose
https://github.com/dronefreak/human-action-classification