Ph.D. Dissertation Defense
Title: Acceleration of the 3D FDTD Algorithm in Fixed-point Arithmetic using Reconfigurable Hardware
Presenter: Wang Chen
Date: Monday, August 13, 2007
Abstract: Understanding and predicting electromagnetic behavior is increasingly important in modern technology. The Finite-Difference Time-Domain (FDTD) method is a powerful computational electromagnetic technique for modeling electromagnetic fields. However, the method is computationally complex and time-consuming. Implementing the algorithm in hardware greatly increases its computational speed and widens its usage. This dissertation presents a fixed-point 3D UPML (Uniaxial Perfectly Matched Layer) FDTD FPGA accelerator that supports a wide range of materials, including dispersive media. By analyzing the performance of fixed-point arithmetic in the 3D FDTD algorithm, a fixed-point representation is chosen that minimizes the relative error between the fixed-point and floating-point results. The FPGA accelerator supports UPML absorbing boundary conditions, which perform better in dispersive soil and human tissue media than PML boundary conditions. The 3D UPML FDTD hardware accelerator has been designed and implemented on a WildStar-II Pro FPGA computing board. The hardware implementation achieves a 25-fold speedup compared to the software implementation running on a 3.0 GHz PC. This result indicates that a custom hardware implementation can achieve a significant speedup for the 3D UPML FDTD algorithm. The speedup of the FDTD hardware implementation is due to three major factors: fixed-point representation, custom memory interface design, and pipelining and parallelism. The FDTD method is a data-intensive algorithm. The bottleneck of the hardware design is its memory interface.
With the limited bandwidth between the FPGA and on-board memories, a carefully designed custom memory interface allows full utilization of the memory bandwidth and greatly improves performance. The FDTD algorithm is also a computationally intensive algorithm. By considering the trade-offs between resources and performance, pipelining and parallelism are implemented to achieve the best design performance attainable with the available hardware resources.
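
The fixed-point analysis mentioned in the abstract can be illustrated with a small sketch. The code below is not the dissertation's design: it assumes a 1D free-space FDTD grid in normalized units (not the 3D UPML formulation), a Courant number of 0.5, a Gaussian soft source, and round-to-nearest quantization with no overflow handling. The names to_fixed, fdtd_1d, and relative_error are hypothetical helpers introduced only for illustration. It shows how the relative error between fixed-point and floating-point results might be measured for different numbers of fractional bits.

```python
import math


def to_fixed(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits.

    Assumption: round-to-nearest, unlimited integer range (no overflow or
    saturation), which a real hardware design would have to handle.
    """
    scale = 1 << frac_bits
    return round(x * scale) / scale


def fdtd_1d(steps, cells, frac_bits=None):
    """Run a 1D free-space FDTD simulation and return the final E field.

    If frac_bits is None, values stay in floating point; otherwise every
    field update is quantized to the given number of fractional bits.
    """
    if frac_bits is None:
        quantize = lambda v: v
    else:
        quantize = lambda v: to_fixed(v, frac_bits)

    ez = [0.0] * cells
    hy = [0.0] * cells
    for n in range(steps):
        # Update the electric field from the spatial difference of H.
        for k in range(1, cells):
            ez[k] = quantize(ez[k] + 0.5 * (hy[k - 1] - hy[k]))
        # Gaussian soft source added at the center of the grid.
        pulse = math.exp(-0.5 * ((n - 30) / 8.0) ** 2)
        ez[cells // 2] = quantize(ez[cells // 2] + pulse)
        # Update the magnetic field from the spatial difference of E.
        for k in range(cells - 1):
            hy[k] = quantize(hy[k] + 0.5 * (ez[k] - ez[k + 1]))
    return ez


def relative_error(frac_bits, steps=100, cells=200):
    """Largest absolute field difference, normalized by the peak float field."""
    ref = fdtd_1d(steps, cells)
    fix = fdtd_1d(steps, cells, frac_bits)
    peak = max(abs(v) for v in ref)
    return max(abs(a - b) for a, b in zip(ref, fix)) / peak


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(bits, "fractional bits -> relative error", relative_error(bits))
```

Sweeping the number of fractional bits in this way gives a rough picture of the trade-off between word length (and therefore hardware resources) and accuracy. The actual representation chosen in the dissertation depends on the 3D UPML update equations and the materials modeled.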