Prediction of Human Actions in Assembly Process by a Spatial-Temporal End-to-End Learning Model
It’s important to predict human actions in the industry assembly process. Foreseeing future actions before they happened is an essential part for flexible human-robot collaboration and crucial to safety issues. Vision-based human action prediction from videos provides intuitive and adequate knowledge for many complex applications. This problem can be interpreted as deducing the next action of people from a short video clip. The history information needs to be considered to learn these relations among time steps for predicting the future steps. However, it is difficult to extract the history information and use it to infer the future situation with traditional methods. In this scenario, a model is needed to handle the spatial and temporal details stored in the past human motions and construct the future action based on limited accessible human demonstrations.