TY - GEN
T1 - An improved micro-architecture for function approximation using piecewise quadratic interpolation
AU - Erez, Shai
AU - Even, Guy
PY - 2008
Y1 - 2008
N2 - We present a new micro-architecture for evaluating functions based on piecewise quadratic interpolation. The micro-architecture consists mainly of a look-up table and two multiply-accumulate units. Previous micro-architectures based on piecewise quadratic interpolation have been shown to be efficient for low-precision (e.g., single-precision) computations. Moreover, they are as fast as piecewise linear interpolation while requiring smaller tables. Our main contribution is circumventing the need for the additional squaring unit that appears in previous micro-architectures. Based on the proposed micro-architecture, we present a detailed design of single-precision reciprocal approximation (1/x). Our design is based on two multiply-accumulate units that contain truncated radix-4 Booth multipliers. The number of partial products in this design is reduced by over 20% compared to previous designs using quadratic interpolation. The latency of this design is roughly the delay of 19 full-adder gates, and it can be easily pipelined into two stages, each with a delay of 10 full-adder gates.
UR - http://www.scopus.com/inward/record.url?scp=62349104785&partnerID=8YFLogxK
U2 - 10.1109/ICCD.2008.4751895
DO - 10.1109/ICCD.2008.4751895
M3 - Conference contribution
AN - SCOPUS:62349104785
SN - 9781424426584
T3 - 26th IEEE International Conference on Computer Design 2008, ICCD
SP - 422
EP - 426
BT - 26th IEEE International Conference on Computer Design 2008, ICCD
Y2 - 12 October 2008 through 15 October 2008
ER -