Paper Seminar

Multi-region two-stream R-CNN for action detection

2016-12-30 09:16
Abstract. We propose a multi-region two-stream R-CNN model for action
detection in realistic videos. We start from frame-level action detection
based on faster R-CNN [1], and make three contributions: (1) we show that
a motion region proposal network generates high-quality proposals, which
are complementary to those of an appearance region proposal network;
(2) we show that stacking optical flow over several frames significantly
improves frame-level action detection; and (3) we embed a multi-region
scheme in the faster R-CNN model, which adds complementary information on
body parts. We then link frame-level detections with the Viterbi
algorithm, and temporally localize an action with the maximum subarray
method. Experimental results on the UCF-Sports, J-HMDB and UCF101 action
detection datasets show that our approach outperforms the state of the
art with a significant margin in both frame-mAP and video-mAP.
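
The linking and temporal localization steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the linking score, so the sketch assumes a common choice: the sum of per-frame detection scores plus a weighted IoU between consecutive boxes. The function names (`iou`, `link_detections`, `max_subarray`) and the weight `lam` are hypothetical. The sketch also assumes every frame has at least one detection, and that the per-frame scores passed to `max_subarray` have already been shifted so that frames outside the action tend to be negative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_detections(boxes: Sequence[Sequence[Box]],
                    scores: Sequence[Sequence[float]],
                    lam: float = 1.0) -> List[int]:
    """Viterbi linking: choose one detection per frame so that the total of
    detection scores plus lam * IoU between consecutive boxes is maximal.
    The scoring function is an assumption, not taken from the abstract."""
    best = [list(scores[0])]          # best[t][j]: best path score ending at box j
    back: List[List[int]] = []        # back[t-1][j]: best predecessor of box j
    for t in range(1, len(boxes)):
        cur, ptr = [], []
        for j, bj in enumerate(boxes[t]):
            cands = [best[t - 1][i] + lam * iou(bi, bj)
                     for i, bi in enumerate(boxes[t - 1])]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[i_star] + scores[t][j])
            ptr.append(i_star)
        best.append(cur)
        back.append(ptr)
    # Backtrack from the best final detection.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path


def max_subarray(values: Sequence[float]) -> Tuple[int, int, float]:
    """Kadane's algorithm on a non-empty sequence.
    Returns (start, end_exclusive, sum) of the best contiguous run."""
    best_sum, best_start, best_end = values[0], 0, 1
    cur_sum, cur_start = values[0], 0
    for k in range(1, len(values)):
        if cur_sum < 0:               # a negative prefix never helps; restart here
            cur_sum, cur_start = values[k], k
        else:
            cur_sum += values[k]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, k + 1
    return best_start, best_end, best_sum
```

In this sketch, `link_detections` returns the index of the chosen box in each frame, and `max_subarray` applied to the (shifted) scores along that path gives the temporal extent of the action as a half-open frame interval.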