TY - JOUR
T1 - A methodology for semantic action recognition based on pose and human-object interaction in avocado harvesting processes
AU - Vasconez, J. P.
AU - Admoni, H.
AU - Auat Cheein, F.
N1 - Funding Information:
The authors acknowledge the support provided by Universidad Técnica Federico Santa María. This work was supported in part by the Advanced Center of Electrical and Electronic Engineering - AC3E (ANID/FB0008), DGIIP-PIIC-UTFSM Chile, CONICYT PFCHA/DOCTORADO BECAS CHILE/2018-21180513, and FONDECYT grant 1201319.
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/5
Y1 - 2021/5
N2 - The agricultural industry could greatly benefit from an intelligent system capable of supporting field workers to increase production. Such a system would need to monitor human workers, their current actions, their intentions, and their possible future actions, which are the focus of this work. Herein, we propose and validate a methodology to recognize human actions during the avocado harvesting process on a Chilean farm, based on combined object-pose semantic information from RGB still images. We use Faster R-CNN (Region-based Convolutional Neural Network) with an Inception V2 backbone for object detection, recognizing 17 categories that include, among others, field workers, tools, crops, and vehicles. Then, we use a convolutional 2D pose estimation method, OpenPose, to detect 18 human skeleton joints. Both the object and the pose features are processed, normalized, and combined into a single feature vector. We test four classifiers (support vector machine, decision trees, k-nearest neighbours, and bagged trees) on the combined object-pose feature vectors to evaluate action classification performance. We also evaluate the four classifiers after applying principal component analysis to reduce the dimensionality of the feature vectors. Accuracy and inference time are analyzed for all classifiers on 10 action categories related to the avocado harvesting process. The results show that it is possible to detect human actions during harvesting, with average accuracy (across all action categories) ranging from 57% to 99%, depending on the classifier used. Such recognition can support intelligent systems, such as robots, that interact with field workers to increase productivity.
AB - The agricultural industry could greatly benefit from an intelligent system capable of supporting field workers to increase production. Such a system would need to monitor human workers, their current actions, their intentions, and their possible future actions, which are the focus of this work. Herein, we propose and validate a methodology to recognize human actions during the avocado harvesting process on a Chilean farm, based on combined object-pose semantic information from RGB still images. We use Faster R-CNN (Region-based Convolutional Neural Network) with an Inception V2 backbone for object detection, recognizing 17 categories that include, among others, field workers, tools, crops, and vehicles. Then, we use a convolutional 2D pose estimation method, OpenPose, to detect 18 human skeleton joints. Both the object and the pose features are processed, normalized, and combined into a single feature vector. We test four classifiers (support vector machine, decision trees, k-nearest neighbours, and bagged trees) on the combined object-pose feature vectors to evaluate action classification performance. We also evaluate the four classifiers after applying principal component analysis to reduce the dimensionality of the feature vectors. Accuracy and inference time are analyzed for all classifiers on 10 action categories related to the avocado harvesting process. The results show that it is possible to detect human actions during harvesting, with average accuracy (across all action categories) ranging from 57% to 99%, depending on the classifier used. Such recognition can support intelligent systems, such as robots, that interact with field workers to increase productivity.
KW - Avocado harvesting process
KW - Human-object interaction
KW - Human–machine collaboration
KW - Semantic human action recognition
UR - http://www.scopus.com/inward/record.url?scp=85102434383&partnerID=8YFLogxK
U2 - 10.1016/j.compag.2021.106057
DO - 10.1016/j.compag.2021.106057
M3 - Article
AN - SCOPUS:85102434383
SN - 0168-1699
VL - 184
JO - Computers and Electronics in Agriculture
JF - Computers and Electronics in Agriculture
M1 - 106057
ER -