Prediction of Human Actions in Assembly Process by a Spatial-Temporal End-to-End Learning Model
2019-01-0509
Predicting the future actions of human workers in industrial assembly processes is important: foreseeing actions before they happen is essential for flexible human-robot collaboration and for safety. Vision-based human action prediction from video provides intuitive and rich knowledge for many complex applications. The problem can be framed as deducing a person's next action from a short video clip. Historical information must be taken into account to learn the relations between time steps and predict future ones; however, it is difficult to extract this history and use it to infer future situations with traditional methods. In this scenario, a model is needed that handles the spatial and temporal details stored in past human motions and constructs the future action from a limited number of accessible human demonstrations. In this paper, we apply an autoencoder-based deep learning framework for human action construction, merged into an RNN pipeline for predicting future human actions. This contrasts with traditional approaches, which rely on hand-crafted features and domain-specific outputs. The model can predict future human actions in real time. We test an implementation of our framework on a 1/10-scale vehicle-seat assembly process. Our experimental results indicate that the proposed model is effective in capturing the historical details necessary for predicting future human actions: it successfully synthesizes prior information from human demonstrations and generates the corresponding future actions from these spatial-temporal features.
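The abstract does not specify the architecture in detail. As a rough illustration only, a minimal sketch of the kind of autoencoder-plus-RNN pipeline described might look like the following, where the `ActionPredictor` name, all layer sizes, the LSTM choice of RNN, and the 64x64 grayscale frame input are illustrative assumptions rather than the paper's actual design:

```python
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """Sketch of an autoencoder + RNN pipeline: a convolutional encoder
    compresses each frame of the observed clip into a latent code, an
    LSTM models the temporal relations across the clip, and a decoder
    constructs a predicted future-frame representation."""

    def __init__(self, latent_dim=128, hidden_dim=256):
        super().__init__()
        # Frame encoder: 64x64 grayscale frame -> latent vector (assumed sizes).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        # RNN over the per-frame latent codes summarizes the motion history.
        self.rnn = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        # Decoder: final hidden state -> predicted future frame.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 32 * 16 * 16),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, clip):
        # clip: (batch, time, 1, 64, 64), a short video of past motion.
        b, t = clip.shape[:2]
        z = self.encoder(clip.reshape(b * t, *clip.shape[2:]))
        z = z.reshape(b, t, -1)
        _, (h, _) = self.rnn(z)       # encode the history
        return self.decoder(h[-1])    # predicted next frame

# Usage: predict the next frame from a 10-frame observed history.
model = ActionPredictor()
history = torch.rand(2, 10, 1, 64, 64)
next_frame = model(history)           # shape: (2, 1, 64, 64)
```

Trained end to end with a reconstruction loss against the true next frame, such a model learns spatial features through the autoencoder and temporal dependencies through the recurrent layer, matching the spatial-temporal framing above in spirit, though the published model's actual inputs, outputs, and losses may differ.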