Abstract: Action detection has become a research hotspot in video analysis because of its broad application prospects in areas such as autonomous driving and video surveillance. In recent years, deep learning-based methods have made great progress in action detection and have attracted the attention of researchers worldwide. This paper presents a comprehensive survey of these methods. First, the definition and challenges of the action detection task are introduced. Then, we classify the relevant literature into two categories: temporal action detection and spatio-temporal action detection. The ideas, advantages, and disadvantages of the methods in each category are analyzed. Additionally, we introduce methods based on currently popular techniques such as weakly supervised learning, graph convolutional networks, and attention mechanisms. The most commonly used datasets and metrics are listed, and the performance of typical methods is compared on these datasets. Finally, we summarize the open problems and some directions worthy of attention for the action detection community.