Research Objective
To resolve the non-detection problem in people counting due to occlusion by using a stereo camera and depth information, and to enable real-time processing on an embedded GPU board.
Research Findings
The proposed stereo camera-based people counting method effectively addresses occlusion issues by leveraging depth information and view projection, achieving high accuracy (98.95% overall) and real-time performance on embedded GPU boards. It outperforms mono camera and deep learning-based methods in crowded scenarios. Future work should focus on automating ground estimation and extending to multiple counting lines for broader applications.
Limitations
The method is sensitive to errors in depth estimation, especially near object boundaries and in non-feature regions. Performance degrades with low resolution (e.g., VGA) for long-distance stereo matching. Occlusion handling is improved but not perfect; large loads carried by people can be misdetected. Requires initial manual input for ground definition and counting line setup. Real-time operation is limited to resolutions up to HD on the embedded hardware; FHD resolution does not achieve real-time frame rates.
1:Experimental Design and Method Selection:
The study uses a stereo camera setup with two IMX 185 sensors to capture synchronized stereo images. Stereo matching (Hernandez's method, based on Semi-Global Matching) extracts disparity maps, which are then converted to depth maps. View projection uses the depth information to transform the side view into a top view. People are detected from height and occupancy maps using Gaussian distribution modeling and local maximum filtering. A Kalman filter-based tracker follows the detections and counts people crossing a defined line.
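The disparity-to-depth conversion and the top-view projection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a rectified pinhole camera whose optical axis is parallel to the ground (the real setup may be tilted and relies on a ground definition), and the function names, grid size, and cell size are hypothetical.

```python
import math

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair; None if disparity is invalid."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px

def project_to_top_view(pixels, focal_px, cx, cy, baseline_m, cam_height_m,
                        cell_m=0.1, grid_w=40, grid_d=60):
    """Accumulate height and occupancy maps on a ground-plane grid.

    pixels: iterable of (u, v, disparity_px) for foreground pixels.
    Assumption (hypothetical simplification): the camera looks parallel to
    the ground and sits cam_height_m above it.
    """
    height = [[0.0] * grid_w for _ in range(grid_d)]
    occupancy = [[0] * grid_w for _ in range(grid_d)]
    for u, v, d in pixels:
        z = disparity_to_depth(d, focal_px, baseline_m)
        if z is None:
            continue
        x = (u - cx) * z / focal_px   # lateral offset in metres
        y = (v - cy) * z / focal_px   # vertical offset in metres (image y points down)
        h = cam_height_m - y          # height of the point above the ground
        col = int(math.floor(x / cell_m)) + grid_w // 2
        row = int(math.floor(z / cell_m))
        if h > 0 and 0 <= row < grid_d and 0 <= col < grid_w:
            occupancy[row][col] += 1                      # number of points in the cell
            height[row][col] = max(height[row][col], h)   # tallest point in the cell
    return height, occupancy
```

In such a top view, a person's head tends to appear as a local peak in the height map supported by a cluster of occupied cells, which is why the two maps are combined for detection.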
2:Sample Selection and Data Sources:
Experimental sequences consist of 23 HD resolution videos (each 5 minutes long) captured in a high-traffic area. Ground truth data is manually annotated for accuracy comparison.
3:List of Experimental Equipment and Materials:
NVIDIA Jetson TX2 embedded board, two Sony IMX 185 CMOS image sensors, Leopard Imaging LI-JETSON-KIT-IMX185CS-D stereo camera kit, I-PEX cables (30 cm length), chessboard for calibration.
4:Experimental Procedures and Operational Workflow:
Calibrate the cameras with a chessboard to obtain intrinsic and extrinsic parameters, then perform stereo rectification to align the image pair. Extract disparity maps via stereo matching and convert them to depth maps. Apply background subtraction (Gaussian mixture model) to the depth maps to detect moving objects. Project the moving objects to the top view to generate height and occupancy maps. Detect people using likelihood maps and local maximum filtering. Track the detected heads with a Kalman filter and count crossings of a predefined line.
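The final counting step can be illustrated with a short sketch. It assumes each track is already a list of filtered top-view positions (for example, the output of a tracker), treats the counting line as infinite, and labels a crossing "up" or "down" by which side of the line the track ends up on. The function names and the up/down convention are hypothetical, not taken from the paper.

```python
def side_of_line(p, a, b):
    """Cross product sign: positive on one side of the line a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def count_crossings(tracks, a, b):
    """Count line crossings per direction.

    tracks: list of tracks, each a list of (x, y) top-view positions over time.
    Returns (up, down), where 'up' means moving onto the positive side.
    """
    up = down = 0
    for positions in tracks:
        prev = None
        for p in positions:
            s = side_of_line(p, a, b)
            if s == 0:
                continue  # exactly on the line: wait for the next position
            if prev is not None and (s > 0) != (prev > 0):
                if s > 0:
                    up += 1
                else:
                    down += 1
            prev = s
    return up, down
```

A track that wanders back and forth over the line is counted once per crossing here; a production system would likely add hysteresis or a minimum travel distance to suppress such jitter.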
5:Data Analysis Methods:
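Accuracy is calculated using Equation (9), which is based on the relative counting error (|Up_GT - Up_m| + |Down_GT - Down_m|) / (Up_GT + Down_GT), where the subscript GT denotes the ground-truth count and m the measured count. As written, this ratio is an error, so the accuracy is presumably one minus it. Frame rates and resource occupancy (CPU, GPU) are measured using tegrastats on the Jetson TX2.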
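The accuracy measure can be sketched in a few lines. The sketch rests on two assumptions about Equation (9): that both absolute differences are summed before dividing by the ground-truth total, and that accuracy is one minus the resulting relative error. The function name and the example counts are hypothetical.

```python
def counting_accuracy(up_gt, down_gt, up_m, down_m):
    """Accuracy = 1 - (|Up_GT - Up_m| + |Down_GT - Down_m|) / (Up_GT + Down_GT).

    Assumes the '1 -' form; the ratio alone is a relative counting error.
    """
    total = up_gt + down_gt
    if total == 0:
        raise ValueError("ground-truth total must be positive")
    error = (abs(up_gt - up_m) + abs(down_gt - down_m)) / total
    return 1.0 - error

# Hypothetical counts: one missed 'up' crossing and one spurious 'down' crossing.
print(counting_accuracy(up_gt=100, down_gt=100, up_m=99, down_m=101))
```

Because the errors in each direction are taken as absolute values, a missed crossing in one direction cannot cancel a spurious crossing in the other.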