Fig. 1
From: Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN

The architecture of the proposed method. First, the backbone network extracts features from an input image. These features are then forwarded to a rotated region proposal network (RRPN), which uses rotated anchors to generate rotated regions of interest (RRoIs) as potential vehicle locations. Feature maps of a predetermined size are extracted inside these RRoIs and passed through four network branches. The first branch classifies whether a region contains a vehicle, and the second regresses refined bounding box parameters. The third branch regresses shape parameters encoding the 3D vehicle shape, while the fourth branch predicts the vehicle type.
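
The rotated anchors that the RRPN starts from can be illustrated with a minimal sketch. The caption does not specify how anchors are parameterised, so everything below is an assumption made for illustration only: the (cx, cy, width, height, angle) encoding, the hypothetical helper names `rotated_anchors` and `corners`, and the stride, scale, ratio, and angle values. It is not the authors' implementation.

```python
import math
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple


class RotatedAnchor(NamedTuple):
    """Hypothetical anchor encoding (assumed, not taken from the paper)."""
    cx: float      # centre x in image pixels
    cy: float      # centre y in image pixels
    width: float   # side length along the anchor's own x-axis
    height: float  # side length along the anchor's own y-axis
    angle: float   # rotation in radians


def rotated_anchors(
    feat_h: int,
    feat_w: int,
    stride: int,
    scales: Sequence[float],
    ratios: Sequence[float],
    angles: Sequence[float],
) -> List[RotatedAnchor]:
    """Place one anchor per (scale, ratio, angle) at every feature-map cell."""
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Map the feature-map cell centre back to image coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio, angle in product(scales, ratios, angles):
            # Keep the area at scale**2 while width / height equals ratio.
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            anchors.append(RotatedAnchor(cx, cy, width, height, angle))
    return anchors


def corners(anchor: RotatedAnchor) -> List[Tuple[float, float]]:
    """Return the four corner points of a rotated anchor."""
    cos_a, sin_a = math.cos(anchor.angle), math.sin(anchor.angle)
    half_w, half_h = anchor.width / 2, anchor.height / 2
    offsets = ((-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h))
    # Rotate each corner offset by the anchor angle, then shift to the centre.
    return [
        (anchor.cx + dx * cos_a - dy * sin_a,
         anchor.cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    demo = rotated_anchors(2, 3, 16, [32.0], [0.5, 2.0], [0.0, math.pi / 2])
    print(len(demo), "anchors;", "first corners:", corners(demo[0]))
```

Under these assumptions, each feature-map cell carries `len(scales) * len(ratios) * len(angles)` anchors. An RRPN would then score each anchor and regress offsets to it to obtain the RRoIs.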