-
[IEEE 2018 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) - Poznan, Poland (2018.9.19-2018.9.21)] Hardware implementation of the Gaussian Mixture Model foreground object segmentation algorithm working with ultra-high resolution video stream in real-time
Abstract: In this paper, a hardware implementation of the Gaussian Mixture Model algorithm for background modelling and foreground object segmentation is presented. The proposed vision system can handle a video stream with a resolution of up to 4K (3840x2160 pixels) at 60 frames per second. Moreover, the constraints imposed by the memory bandwidth limit are discussed, and several solutions to this issue are considered. The designed modules have been verified on the ZCU102 development board with a Xilinx Zynq UltraScale+ MPSoC device. Additionally, the computing performance and power consumption have been estimated.
Keywords: FPGA, 4K video, background modelling, real-time processing, GPU, Gaussian Mixture Model, foreground object segmentation
Updated: 2025-09-23 15:23:52
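The memory-bandwidth constraint mentioned in this abstract can be illustrated with a rough estimate. The sketch below computes the pixel rate for 3840x2160 at 60 frames per second and the external-memory traffic that would result if the per-pixel background model were read and written once per pixel. The model layout (3 Gaussians per pixel, 64 bits each) is a hypothetical assumption chosen for illustration; it is not a figure from the paper.

```python
# Resolution and frame rate are taken from the abstract.
WIDTH, HEIGHT, FPS = 3840, 2160, 60

# Hypothetical model layout (an assumption, NOT from the paper).
GAUSSIANS_PER_PIXEL = 3
BITS_PER_GAUSSIAN = 64


def pixel_rate(width, height, fps):
    """Pixels that must be processed per second."""
    return width * height * fps


def model_bandwidth_bytes_per_s(width, height, fps, gaussians, bits_per_gaussian):
    """Model traffic if every pixel's model is read once and written once."""
    bytes_per_pixel = gaussians * bits_per_gaussian // 8
    return 2 * pixel_rate(width, height, fps) * bytes_per_pixel


rate = pixel_rate(WIDTH, HEIGHT, FPS)
bandwidth = model_bandwidth_bytes_per_s(
    WIDTH, HEIGHT, FPS, GAUSSIANS_PER_PIXEL, BITS_PER_GAUSSIAN
)
print(f"pixel rate: {rate / 1e6:.1f} Mpixel/s")
print(f"model traffic: {bandwidth / 1e9:.1f} GB/s")
```

Under these assumed parameters the model traffic alone is on the order of tens of GB/s, which is why the size of the per-pixel model matters at this resolution.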
-
[Lecture Notes in Computer Science] Euro-Par 2018: Parallel Processing Workshops Volume 11339 (Euro-Par 2018 International Workshops, Turin, Italy, August 27-28, 2018, Revised Selected Papers) || Modeling and Optimizing Data Transfer in GPU-Accelerated Optical Coherence Tomography
Abstract: Signal processing of optical coherence tomography (OCT) has become a bottleneck for using OCT in medical and industrial applications. Recently, GPUs have gained importance as compute devices for achieving a video frame rate of 25 frames/s. Therefore, we develop a CUDA implementation of an OCT signal processing chain. We focus on reformulating the signal processing algorithms in terms of high-performance libraries such as CUBLAS and CUFFT. Additionally, we use NVIDIA’s stream concept to overlap computations and data transfers. Performance results are presented for two Pascal GPUs and validated with a derived performance model. The model gives an estimate of the overall execution time of the OCT signal processing chain, including compute and transfer times.
Keywords: GPU, OCT, CUDA, Performance model
Updated: 2025-09-23 15:23:52
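The benefit of overlapping transfers and computation can be sketched with a generic three-stage pipeline estimate. This is not the performance model derived in the paper; it is a textbook approximation that assumes the work splits evenly into chunks, that host-to-device copy, kernel execution, and device-to-host copy can each run concurrently with the other two, and that per-chunk overheads are negligible. The function names and the example timings are hypothetical.

```python
def serial_time(h2d, compute, d2h):
    """Total time when upload, compute, and download run back to back."""
    return h2d + compute + d2h


def pipelined_time(h2d, compute, d2h, n_chunks):
    """Idealized time when the work is split into n_chunks equal pieces
    and the three stages overlap across chunks (assumed model)."""
    stages = [h2d / n_chunks, compute / n_chunks, d2h / n_chunks]
    # The first chunk passes through all stages; every further chunk
    # adds one period of the slowest stage.
    return sum(stages) + (n_chunks - 1) * max(stages)


# Hypothetical timings in milliseconds, for illustration only.
H2D_MS, COMPUTE_MS, D2H_MS = 10.0, 20.0, 10.0
FRAME_BUDGET_MS = 1000.0 / 25  # 25 frames/s target from the abstract

for chunks in (1, 2, 4, 8):
    t = pipelined_time(H2D_MS, COMPUTE_MS, D2H_MS, chunks)
    print(f"{chunks} chunk(s): {t:.2f} ms (budget {FRAME_BUDGET_MS:.0f} ms)")
```

With a single chunk the estimate reduces to the serial time; as the chunk count grows it approaches the duration of the slowest stage, which is the most overlap can achieve under these assumptions.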
-
[IEEE 2018 International Conference on Intelligent and Innovative Computing Applications (ICONIC) - Mon Tresor, Plaine Magnien, Mauritius (2018.12.6-2018.12.7)] Parallel Image Stitching Based on Multithreaded Processing on GPU
Abstract: The paper discusses multithreaded processing of images on graphics processing units for the purposes of feature detection and matching. Feature detection and feature correspondence are applied to image stitching and panorama creation. A parallel GPU implementation based on NVIDIA CUDA is presented, experimentally evaluated, and compared with parallel multithreaded CPU processing under the shared-memory parallel computational model.
Keywords: general purpose computations on GPU, feature matching, image stitching, multithreading, feature detection
Updated: 2025-09-23 15:22:29
-
[IEEE IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium - Valencia (2018.7.22-2018.7.27)] A Car-Borne SAR System for Interferometric Measurements: Development Status and System Enhancements
Abstract: Terrestrial radar systems are used operationally for area-wide measurement and monitoring of surface displacements on steep slopes, as are prevalent in mountainous areas and in open pit mines. One limitation of these terrestrial systems is that cross-range resolution decreases with increasing observation distance, due to the limited antenna size of the real aperture radar or the limited synthetic aperture of quasi-stationary SAR systems. Recently, we conducted a first experiment using a car-borne SAR system at Ku-band, demonstrating the time-domain back-projection (TDBP) focusing capability for the FMCW case and the single-pass interferometric capability of our experimental Ku-band car-borne SAR system. The cross-range spatial resolution provided by such a car-based SAR system is potentially independent of the observation distance, provided that an adequate sensor trajectory can be built. In this paper, we (1) give an overview of the updated system hardware (radar setup and high-precision combined INS/GNSS positioning and attitude determination), and (2) present SAR imagery obtained with the updated prototype Ku-band car-borne SAR system.
Keywords: azimuth focusing, Ku-band, SAR imaging, ground-based SAR system, car-borne SAR, parallelization, SAR interferometry, GPU, CUDA, interferometry, CARSAR, Synthetic aperture radar (SAR)
Updated: 2025-09-23 15:22:29
-
[IEEE 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP) - Porto, Portugal (2018.10.10-2018.10.12)] Energy and Execution Time Comparison of Optical Flow Algorithms on SIMD and GPU Architectures
Abstract: This article presents and compares optimized implementations of two optical flow algorithms on several target boards comprising multi-core SIMD processors and GPUs. The two algorithms are Horn-Schunck (HS) and TV-L1; they were chosen because both are well known and because they differ in computational complexity and accuracy. For both algorithms, we have made parallel optimized SIMD implementations, while HS has also been implemented on GPUs. For each algorithm, the different versions and target boards are compared along two dimensions: computing speed, in order to achieve real-time computation, and energy consumption, since we target embedded systems. The results show that for HS, the GPUs are the most efficient in both dimensions, able to process in real time (25 frames per second) images of up to 8 Mpix for 0.35 J per image, against 1.8 Mpix images for 0.24 J per image on CPU. The results also highlight the impact of optimizations on TV-L1: far slower than HS without optimization, it can almost match HS performance after optimization on CPU, and can achieve real-time performance with 0.25 J for 1.4 Mpix images. We hope these results will help developers design embedded optical flow systems.
Keywords: embedded systems, TV-L1, optical flow, real-time processing, energy consumption, GPU, Horn-Schunck, SIMD
Updated: 2025-09-23 15:22:29
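The Horn-Schunck figures quoted in this abstract can be put on a common scale with a short calculation. The sketch below derives energy per megapixel and the implied average power at 25 frames per second from the reported per-image energies. These derived values are not reported in the abstract; they assume the per-image energy applies at the stated image size and at exactly 25 frames per second.

```python
FPS = 25  # real-time rate stated in the abstract

# (energy per image in joules, image size in megapixels), from the abstract.
HS_RESULTS = {
    "GPU": (0.35, 8.0),
    "CPU": (0.24, 1.8),
}


def energy_per_mpix(joules_per_image, mpix):
    """Energy spent per megapixel processed."""
    return joules_per_image / mpix


def average_power_watts(joules_per_image, fps):
    """Average power implied by a per-image energy at a given frame rate."""
    return joules_per_image * fps


for target, (joules, mpix) in HS_RESULTS.items():
    print(
        f"{target}: {energy_per_mpix(joules, mpix):.4f} J/Mpix, "
        f"{average_power_watts(joules, FPS):.2f} W at {FPS} fps"
    )
```

Normalized this way, the GPU spends roughly a third of the CPU's energy per megapixel, even though its energy per image is higher, because each image is more than four times larger.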
-
[IEEE 2018 48th European Microwave Conference (EuMC) - Madrid, Spain (2018.9.23-2018.9.27)] Characteristics of Circularly Polarized Multimode Helical Beams
Abstract: In this paper, a theoretical and numerical analysis to determine the properties of circularly polarized helical beams is presented. A beam-port has been programmed to reproduce the fields radiated by a circularly polarized helical beam antenna designed at a frequency of 10 GHz. The propagation of four different circularly polarized helical modes over a distance of 3 m has been studied using time-domain analysis with CFDTD and analytical models. Thanks to the great volume of the simulations, the properties of these beams have been delineated and compared with those of linearly polarized helical beams. The design of the antenna and the beam-port was developed using the CFDTD method and a Titan-XP GPU.
Keywords: OAM, GPU, Helical Beam Antenna, CFDTD
Updated: 2025-09-23 15:22:29
-
GPU-accelerated integral imaging and full-parallax 3D display using stereo–plenoptic camera system
Abstract: In this paper, we propose a novel approach to produce integral images ready to be displayed on an integral-imaging monitor. Our main contribution is the use of a commercial plenoptic camera arranged in a stereo configuration. Our proposed set-up is able to record the radiance information, both spatial and angular, simultaneously at each stereo position. We illustrate our contribution by composing the point cloud from a pair of captured plenoptic images and generating an integral image from the properly registered 3D information. We have exploited graphics processing unit (GPU) acceleration to enhance the speed and efficiency of the integral-image computation. We present our approach with imaging experiments that demonstrate the improved quality of the integral image. After the projection of such an integral image onto the proposed monitor, 3D scenes are displayed with full parallax.
Keywords: Stereo camera, 3D data registration, Point cloud, Plenoptic camera, GPU, 3D display, Integral imaging
Updated: 2025-09-23 15:22:29
-
High Resolution and Fast Processing of Spectral Reconstruction in Fourier Transform Imaging Spectroscopy
Abstract: High-resolution spectrum estimation has continually attracted great attention in spectrum reconstruction based on Fourier transform imaging spectroscopy (FTIS). In this paper, a parallel solution for interference data processing using high-resolution spectrum estimation is proposed to reconstruct the spectrum quickly and at high resolution. In batch processing, we use high-performance parallel computing on the graphics processing unit (GPU) for higher efficiency and lower operation time. In addition, a parallel processing mechanism is designed for our parallel algorithm to obtain higher performance. Other solving algorithms for the modern spectrum estimation model are also introduced for discussion and comparison. We compare traditional high-resolution solving algorithms running on the central processing unit (CPU) with the parallel algorithm on the GPU for processing the interferogram. The experimental results illustrate that runtime is reduced by about 70% using our parallel solution, and that the GPU has a great advantage in processing large data and accelerating applications.
Keywords: parallel computing, high performance, GPU, Fourier transform imaging spectrometer, spectrum reconstruction
Updated: 2025-09-23 15:22:29
-
Extended attribute profiles on GPU applied to hyperspectral image classification
Abstract: Extended profiles are an important technique for modelling the spatial information of hyperspectral images at different levels of detail. They are used extensively as a pre-processing stage, especially in classification schemes. In particular, attribute profiles, based on the application of morphological attribute filters to the connected components of the image, have been shown to provide very good results. In this paper we present a parallel implementation of the attribute profiles in CUDA for multispectral and hyperspectral imagery considering the attributes area and standard deviation. The profile computation is based on the max-tree approach but without building the tree itself. Instead, a matrix-based data structure is used along with a recursive flooding (component merging) and filter process. Additionally, a previous feature extraction stage based on wavelets is applied to the hyperspectral image in order to extract the most valuable spectral information, reducing the size of the resulting profile. This scheme efficiently exploits the thousands of available threads on the GPU, obtaining a considerable reduction in execution time as compared to the OpenMP CPU implementation.
Keywords: Remote sensing, Attribute profiles, GPU, Real-time, Hyperspectral, Supervised classification
Updated: 2025-09-23 15:22:29
-
GPU Acceleration of Clustered DPCM for Lossless Compression of Hyperspectral Images
Abstract: With the development of remote sensing technology, the spatial and spectral resolutions of hyperspectral images have become increasingly fine. To overcome difficulties in the storage, transmission and manipulation of hyperspectral images, an effective compression algorithm is required. Clustered Differential Pulse Code Modulation (C-DPCM), a prediction-based hyperspectral lossless compression algorithm, can achieve a relatively high compression ratio, but its efficiency still requires improvement. This paper presents a parallel implementation of the C-DPCM algorithm on Graphics Processing Units (GPUs) with the Compute Unified Device Architecture (CUDA), a parallel computing platform and programming model developed by NVIDIA. Three optimization strategies are used to implement the C-DPCM algorithm in parallel: a version that uses shared memory and registers, a version that employs multiple streams, and a version that uses multiple GPUs. In addition, we studied how to assign all classes to each GPU to minimize the processing time. Finally, we reduced the compression time from roughly half an hour to an hour down to several seconds, with almost no loss in accuracy.
Keywords: C-DPCM, GPU, CUDA, Hyperspectral image lossless compression
Updated: 2025-09-23 15:22:29