DSP
DSP
Prepared by,
Ms. C. Devisupraja
Assistant Professor
COURSE OUTCOMES
2
UNIT-1
Introduction: Digital signal-processing system, discrete Fourier Transform (DFT)
and fast Fourier transform (FFT), differences between DSP and other micro
processor architectures; Number formats: Fixed point, floating point and block
floating point formats, IEEE-754 floating point, dynamic range and precision,
relation between data word size and instruction word size; Sources of error in
DSP implementations: A/D conversion errors, DSP computational errors, D/A
conversion errors, Q-notation.
3
UNIT-1 CLO’S
CLO 1 AEC507.01
Understand howUnderstand how digital
digital to analog (D/A)toand analog to
Processing.
digital (A/D) converters operate on a signal and be able
to model these operations mathematically.
CLO 2 Understand the inter-relationship between DFT and
various transforms.
CLO 3 Understand the IEE-754 floating point and source of
errors in DSP implementations .
CLO 4 Understand the fast computation of DFT and appreciate
the FFT Processing
4
DIGITAL SIGNAL PROCESSING SYSTEM
•A DSP system uses a computer or a digital processor to process signals.
• Represent signals by a sequence of numbers
– Sampling or analog-to-digital conversions
5
DIGITAL SIGNAL PROCESSING SYSTEM
6
ANALOG, DIGITAL, MIXED SIGNAL PROCESSING
7
DIGITAL SIGNAL PROCESSING
8
9
SAMPLE AND HOLD CIRCUIT
• The main function of low pass ant aliasing filter is to band limit the input
signal to the folding frequency without distortion.
• It should be noted that even if the signal is band limited, there is always
wide-band additive noise which will be folded back to create aliasing.
• The quality of conversion process can be improved by using sample and hold
circuit
10
SAMPLE AND HOLD CIRCUIT
11
12
DIGITAL CONVERSION
13
RECONSTRUCTION
14
RECONSTRUCTION
15
RECONSTRUCTION
16
RECONSTRUCTION
17
DISCRETE FOURIER TRANSFORM
18
DISCRETE FOURIER TRANSFORM
• The Discrete Fourier Transform (DFT) is one of the most important tools in
digital signal processing that calculates the spectrum of a finite-duration.
• The representation of a digital signal in terms of its frequency component in a
frequency domain is important. The algorithm that transforms the time domain
signals to the frequency domain components is known as DFT.
•No of additions to compute DFT is N(N-1)
•No of Multiplications to compute DFT is N2.
Read more:
19
FAST FOURIER TRANSFORM
20
DFT vs FFT
21
NUMBER FORMATS
22
FIXED-POINT FORMAT
Simplest scheme
Number is represented as an integer or fraction using a fixed number of
bits.
An n-bit fixed-point signed integer −2n−1 ≤ x ≤ 2n−1 − 1is represented as:
x = −s · 2n−1 + bn−2 · 2n−2 + bn−3 · 2n−3 + · · · + b1 · 21 + b0 · 2
where s represents the sign of the number (s = 0 for positive and s = 1 for
negative)
23
FIXED-POINT INTEGER FORMAT
• the simplest scheme of number representation is the format in which
the number is represented as an integer or fraction using a fixed no of
bits.
Eq-2
s b-1 b-(n-3) B-(n-2) B-(n-1)
25
EXAMPLE 1
26
DOUBLE -PRECISION FIXED -POINT FORMAT
• It requires double the storage for the storage for the same data and may
need double the number of accesses for the same size of data bus of the
DSP device.
27
IEEE 754 FLOATING –POINT FORMAT
28
IEEE 754 EXAMPLE
29
DYNAMIC RANGE AND PRECISION
• Larger word size improves the precision but may pose a problem with the
speed of the processor, especially if its bus width is limited.
31
DYNAMIC RANGE AND PRECISION CONT…
32
EXAMPLE 1
Calculate the dynamic range and precision of each of the following number
representation formats
a) 24-bit , single –precision fixed point format
b) 48-bit , double –precision fixed point format
c) A floating point format with a 16-bit mantissa and an 8-bit exponent .
Sol: a) since each bit gives a dynamic range of 6 dB, the total dynamic range
is 24X6=144 dB, Percentage resolution is (1/224) X 100 = 6 X 10-6
b) since each bit gives a dynamic range of 6 dB, the total dynamic range
is 48X6=288 dB, Percentage resolution is (1/248) X 100 = 4 X 10—13
c) For floating-point representation , the dynamic range is determined
by the no of bits in the exponent. Since there are 8 exponent bits, the
dynamic range is (28-1)x6= 255X 6= 1530 dB.
33
EXAMPLE 1 CONT…
34
SOURCES OF ERRORS IN DSP IMPLEMENTATIONS
• The error in the A/D and D/A in the representation of analog signals by a limited
number of bits is called the quantization error.
• The quantization error decreases with the increase in the number of bits used
to represent signals in A/D and D/A converters.
• The errors in the DSP calculations are due to limited word length used. These
errors depend upon how the algorithm is implemented in a given DSP
architecture.
• This error can be reduced by using a larger word length for data and by using
rounding , instead of truncation , in calculations.
• Three types of errors
1. A/D conversion Errors
2. DSP computational Errors
3. D/A conversion Errors 35
UNIT-2
Multiplier and multiplier accumulator, modified bus structures and
memory access in PDSPs, multiple access memory, multiport
memory, SIMD, VLIW architectures, pipelining, special addressing
modes in PDSPs, on-chip peripherals.
36
UNIT-2 CLO’S
37
MAC UNIT
38
MAC UNIT Cont..
39
MAC in Von Neumann Architecture
40
MAC UNIT Cont...
41
MAC UNIT Cont..
42
MAC UNIT
43
MAC IN VON NEUMANN ARCHITECTURE
44
MAC UNIT CONT...
45
SHIFTERS
Shifters are used to either scale down or scale up operands or the results. The
following scenarios give the necessity of a shifter
a. While performing the addition of N numbers each of n bits long, the sum can
grow up to n+log2 N bits long. If the accumulator is of n bits long, then an
overflow error will occur.
b. This can be overcome by using a shifter to scale down the operand by an
amount of log2N.
c. Similarly while calculating the product of two n bit numbers, the product can
grow up to 2n bits long.
d. Generally the lower n bits get neglected and the sign bit is shifted to save the
sign of the product.
46
SHIFTERS CONT..
47
BARREL SHITERS
48
BARREL SHITERS Cont..
49
BARREL SHITERS CONT..
50
MAC UNIT
51
PROCESSOR ARCHITECTURES
52
Processor Architectures Cont..
53
Processor Architectures Cont..
5
VLIW ARCHITECTURE
Functional units
can be split into
submodules, e.g.
for images (8bits)
TI320C80,
1 RISC
4 x 32bit DSP which
can be split into 8bit
modules
19
5
Low Power MMAC Multiplier Multiple Accumulator
57
Basic structure of VLIW Architecture
58
VLIW characteristics
59
VLIW EXAMPLE: TMS320C62
60
VLIW EVALUATION
61
MULTIPLE ACCESS MEMORY
62
MULTIPORTED MEMORY
63
PIPELINING
64
PIPELINING
PIPELINING
65
SPECIAL ADDRESSING MODES IN P-DSPS
3) Memory-mapped Addressing
4) Indirect Addressing
6) Circular Addressing
66
SPECIAL ADDRESSING MODES IN P-DSPS
67
SPECIAL ADDRESSING MODES IN P-DSPS
68
3) Memory-mapped Addressing
• The CPU registers & I/O registers of P-DSPs are also accessible as memory
location.
• This is achieved by storing them in either the starting page or the final page of
the memory space.
• For Eg. In TMS320C5X, page 0 corresponds to CPU registers & I/O registers.
• When these registers are accessed using memory mapped addressing modes,
the higher address bits are not taken from the data page pointer & instead
made to be 0 in case of TI DSPs & 1 in Motorola DSPs..
69
4) Indirect Addressing
• In indirect addressing, any location in the 64K-word data space can be
accessed using the 16-bit address contained in an auxiliary register.
• The C54x DSP has eight 16-bit auxiliary registers (AR0–AR7).
71
BIT REVERSED ADDRESSING MODE
72
CIRCULAR ADDRESSING
• If it exceeds that, the address will be made equal to the beginning address
of the circular buffer
73
ON CHIP PERIPHERALS
•Clock Generator
•Hardware Timer
•Software-Programmable Wait-State Generators
• Parallel I/O Ports
•Host Port Interface (HPI)
•Serial Port
• TDM Serial Port
•Buffered Serial Port
•User Maskable Interrupts
74
UNIT-3
Architecture of TMS320C54XX DSPs, addressing modes,
memory space of TMS320C54XX processors. Program control,
instruction set and programming, on-chip peripherals,
interrupts of TMS320C54XX processors, pipeline operation.
75
UNIT-3 CLO’S
76
Architecture of TMS320C54XX
Multiply ,load/store , add/sub to/from ACC and new address generation can
be done simultaneously
77
INTRODUCTION
This unit provides the architectural overview of TMS320C54XX which comprises
of :-
• CPU
• On Chip Memory
• On Chip Peripherals
• Addressing Modes
• Interrupts
• Program Control
• Internal Memory Bus Organization
• Buses
• Pipelining
78
INTRODUCTION
• Separate Program & Data buses allow simultaneous access to program & data
providing high degree of parallelism.
• Data can be transferred between program & data memory.
79
EFFICIENT DATA/PROGRAM FLOW
80
TMS320C54X INTERNAL BLOCK DIAGRAM
81
CENTRAL PROCESSING UNIT (CPU)
82
FUNCTIONAL DIAGRAM OF CPU OF TMS320C54xx
83
ALU
• ACCUMULATORS A AND B:
• Accumulators A and B store the output from the ALU or the multiplier/adder
block and provide a second input to the ALU.
84
BARREL SHIFTER
• Barrel shifter provides the capability to scale the data during an operand read
or write.
• The’54xx barrel shifter can produce a left shift of 0 to 31 bits or a right shift of
0 to 16bits on the input data.
• The shift count field of status registers ST1, or in the temporary register T.
• The barrel shifter and the exponent encoder normalize the values in an
accumulator in a single cycle.
• The LSBs of the output are filled with0s, and the MSBs can be either zero
filled or sign extended, depending on the state of the sign-extension mode
bit in the status register ST1. 85
FUNCTIONAL DIAGRAM OF BARREL SHIFTER OF
TMS320C54xx
86
MULTIPLIER/ADDER UNIT
• In addition to the multiplier and adder, the unit consists of control logic for
integer and fractional computations and a 16-bit temporary storage register,T.
• The compare, select, and store unit (CSSU) is a hardware unit specifically
incorporated to accelerate the add/compare/select operation.
• The exponent encoder unit supports the EXP instructions, which stores in the
T register the number of leading redundant bits of the accumulator content.
• This information is useful while shifting the accumulator content for the
purpose of scaling.
87
FUNCTIONAL DIAGRAM OF MULTIPLIER/ADDER OF
TMS320C54xx
88
BUSES IN C54XX
89
BUSES USAGE
90
BUSES
91
BUSES
92
BUSES
• The C54x DSP can generate up to two data-memory addresses per cycle using
the two auxiliary register arithmetic units (ARAU0 and ARAU1).
• The PB can carry data operands stored in program space to the multiplier and
adder for multiply/accumulate operations or to a destination in data space
for data move instructions.
• The C54x DSP also has an on-chip bidirectional bus for accessing on-chip
peripherals. This bus is connected to DB and EB through the bus exchanger in
the CPU interface
93
INTERNAL MEMORY ORGANIZATION
• MMR : 26 CPU regs ,peripheral regs and scratch pad RAM block located on
data page 0(DP0)
• Among the devices, the following types of RAM are represented: dual-access
RAM (DARAM), single-access RAM (SARAM), and two-way shared RAM.
• or program/data memory.
• The C54x DSP also has 26 CPU registers plus peripheral registers that are
mapped in data-memory space.
95
INTERNAL MEMORY AND MEMORY-MAPPED
REGISTERS
• The amount and the types of memory of a processor have direct relevance to
the efficiency and performance obtainable in implementations with the
processors.
• All ‘54xx devices contain both RAM and ROM. RAM can be either dual-access
type (DARAM) or single-access type (SARAM).
97
PERIPHERAL REGISTERS FOR THE TMS320C54XX
PROCESSORS
98
DATA ADDRESSING MODES OF TMS320C54X
PROCESSORS:
99
BLOCK DIAGRAM OF THE DIRECT ADDRESSING
MODE FOR TMS320C54XX PROCESSORS
100
BLOCK DIAGRAM OF THE INDIRECT ADDRESSING
MODE FOR TMS320C54XX PROCESSORS.
101
INDIRECT ADDRESSING OPTIONS WITH A SINGLE
DATA –MEMORY OPERAND
102
CIRCULAR ADDRESSING
• A circular buffer is a sliding window contains most recent data. Circular buffer
of size R must start on a N-bit boundary, where 2N > R .
• The circular buffer size register (BK): specifies the size of circular buffer.
Effective base address (EFB): By zeroing the N LSBs of a user selected
AR(ARx).
• End of buffer address (EOB) : By repalcing the N LSBs of ARx with the N
LSBs of BK.
103
BLOCK DIAGRAM OF THE CIRCULAR ADDRESSING
MODE FOR TMS320C54XX PROCESSORS
104
MEMORY-MAPPED REGISTER ADDRESSING
105
16 BIT MEMORY MAPPED REGISTER ADDRESS
GENERATION
106
STACK ADDRESSING
107
TMS320C54XX INSTRUCTIONS AND
PROGRAMMING
108
MEMORY SPACE OF TMS320C54XX PROCESSORS
peripherals.
• Data memory: To store data required to run programs & for external memory
mapped registers.
109
PROGRAM MEMORY
110
FUNCTION OF DIFFERENT PIN PMST REGISTER
111
MEMORY MAP FOR THE TMS320C5416 PROCESSOR
112
PROGRAM CONTROL
• It contains program counter (PC), the program counter related H/W, hard
stack, repeat counters &status registers.
• branch instruction
• Subroutine call: The PC is loaded with the immediate value following the call
instruction
• Instructions such as BACC, CALA, etc ;The PC is loaded with the contents of
the accumulator low word
113
INTERRUPTS
• Many times when the CPU is in the midst of executing a program a peripheral
device may require a service from CPU.
114
INTERRUPTS
• Not all the interrupts are serviced by when they occur only those interrupts
that are called non maskable are serviced when they occur.
• Other Interrupts which are called maskable interrupts are serviced only if
they are enabled.
115
PIPELINE OPERATION of TMS320C54XX
• During any given cycle, from one to six different instructions can be active,
each at a different stage of completion.
116
PIPELINE OPERATION OF TMS320C54XX
117
PIPELINE OPERATION OF TMS320C54XX
• The contents of the instruction
Decode register (IR) are decoded to
determine the type of memory
access operation and the control
sequence at the data-address
generation unit (DAGEN) and the
CPU.
119
PIPELINING STAGES
120
.
121
ON CHIP PERIPHERALS
(HPI) Synchronous
122
GENERAL-PURPOSE I/O
• The C54xx DSP offers general-purpose I/O through two dedicated pins that
are software controlled. The two dedicated pins are the branch control input
pin (BIO) and the external flag output pin (XF).
• BIO can be used to monitor the status of peripheral devices.
• It is driven high by setting the XF bit (in ST1) and is driven low by clearing the
XF bit. The set status register bit (SSBX) and reset status register bit (RSBX)
instructions can be used to set and clear XF, respectively.
123
SOFTWARE PROGRAMMABLE WAIT STATE GENERATOR
• For off chip memory access from zero to seven wait states can be specified
within the software wait state register.
124
HOST PORT INTERFACE
• The host port interface is an 8 bit parallel port that provides an interface with
host processor.
• Information is exchanged between C54xx & host processor the C54xx on chip
memory that is accessible to both C54xx & host processor.
125
HARDWARE TIMER
• Timer Registers:-
126
TIMER REGISTERS
• Timer period register (PRD): The 16-bit memory-mapped timer period register
(PRD) is used to reload the timer register (TIM).
128
UNIT-4 CLO’S
129
INTRODUCTION
A typical DSP system has DSP with external memory, input devices and
output devices.
Since the manufacturers of memory and I/O devices are not same as that of
manufacturers of DSP and also since there are variety of memory and I/O
devices available, the signals generated by DSP may not suit memory and I/O
devices to be connected to DSP.
Thus, there is a need for interfacing devices the purpose of it being to use
DSP signals to generate the appropriate signals for setting up communication
with the memory.
130
INTRODUCTION
131
MEMORY SPACE ORGANIZATION
• The On- Chip Memory are faster than External Memory. There are no
interfacing requirements. Because they are on-chip, power consumption is
less and size is small. It exhibits better performance by DSP because of better
data flow within pipeline.
132
MEMORY SPACE ORGANIZATION
133
EFFICIENT DATA/PROGRAM FLOW
134
EFFICIENT DATA/PROGRAM FLOW
135
DATA MEMORY
• There are 3 bits available in memory mapped register, PMST for the purpose
of on-chip memory mapping.
• Second bit is OVLY. It implies RAM Overlay. It enables on-chip DARAM data
memory blocks to be mapped into program space. If this bit is 0, on-chip
RAM is addressable in data space but not in Program Space and if it is 1, on-
chip RAM is mapped into Program & Data Space.
136
DATA MEMORY
The third bit is DROM. It enables on-chip DARAM 4-7 to be mapped into
data space. If this bit is 0, on-chip DARAM 4-7 is not mapped into data
space and if this bit is 1, on-chip DARAM 4-7 is mapped into Data Space.
On-chip data memory is partitioned into several regions as shown in table
7.1. Data memory can be on chip / off-chip.
137
EXTERNAL BUS INTERFACING SIGNALS
• In DSP there are 16 external bus interfacing signals. The signal is characterized as
single bit i.e., single line or multiple bits i.e., Multiple lines / bus. It can be
synchronous / asynchronous with clock.
• The signal can be active low / active high. It can be output / input Signal. The
signal carrying line / lines Can be unidirectional / bidirectional Signal. The
characteristics of the signal depend on the purpose it serves
• In external bus interfacing signals, address bus and data bus are multi-lines bus.
Address bus is unidirectional and carries address of the location refereed. Data
bus is bidirectional and carries data to or from DSP. When data lines are not in
use, they are tri-stated.
138
EXTERNAL BUS INTERFACING SIGNALS
• Read/Write Signal is low when DSP is writing and high when DSP is reading.
Strobe Interfacing Signals, Memory Strobe and I/O Strobe both are active
low. They remain low during the entire read & write operations of memory
and I/O operations respectively.
• External Bus Interfacing Signals from 1-8 are all are unidirectional except Data
Bus which is bidirectional. Address Lines are outgoing signals and all other
control signals are also outgoing signals.
139
EXTERNAL BUS INTERFACING SIGNALS
• There are two Interrupt related signals: Interrupt Request and Interrupt
Acknowledge. Both are active low. Interrupt Request typically for data
exchange. For example, between ADC / another Processor.
140
EXTERNAL BUS INTERFACING SIGNALS
141
MEMORY INTERFACE
• The address bus is unidirectional, carries address into the memory IC bus is
bidirectional. Chip Select, Write Enable and Output Enable control signals are
active high or low and they carry signals into the memory ICs. The task of the
memory interface is to use DSP signals and generate the appropriate signals
for setting up communication with the memory.
142
MEMORY INTERFACE
143
TIMING SEQUENCE FOR EXTERNAL MEMORY
ACCESS
• The timing sequence of memory access is shown. There are two read
operations, both referring to program memory. Read Signal is high and
Program Memory Select is low. There is one Write operation referring to
external data memory. Data Memory Select is low and Write Signal low. Read
and write are to memory device and hence memory strobe is low. Internal
program memory reads take one clock cycle and External data memory
access require two clock cycles.
144
TIMING SEQUENCE FOR EXTERNAL MEMORY ACCESS
145
PARALLEL I/O INTERFACE
• I/O devices are interfaced to DSP using unconditional I/O mode, programmed
I/O mode or interrupt I/O mode. Unconditional I/O does not require any
handshaking signals. DSP assumes the readiness of the I/O and transfers the
data with its own speed. Programmed I/O requires handshaking signals.
• DSP waits for the readiness of the I/O readiness signal which is one of the
handshaking signals. After the completion of transaction DSP conveys the
same to the I/O through another handshaking signal. Interrupt I/O also
requires handshaking signals.
• DSP is interrupted by the I/O indicating the readiness of the I/O. DSP
acknowledges the interrupt, attends to the interrupt. Thus, DSP need not
wait for the I/O to respond. It can engage itself in execution as long as there
is no interrupt. 146
PROGRAMMED I /O INTERFACE
•The timing diagram in the case of programmed I/O is shown in figure. I/O
strobe and I/O space select are issued by the DSP. Two clock cycles each are
required for I/O read and I/O write operations.
147
PROGRAMMED I /O FLOWCHART
148
INTERRUPT I/O
This mode of interfacing I/O devices also requires handshaking signals. DSP is
interrupted by the I/O whenever it is ready. DSP Acknowledges the interrupt,
after testing certain conditions, attends to the interrupt. DSP need not wait
for the I/O to respond. It can engage itself in execution.
149
INTERRUPT I/O
150
INTERRUPT I/O
• Registers used in managing interrupts are Interrupt flag Register (IFR) and
Interrupt Mask Register (IMR). IFR maintains pending external & internal
interrupts. One in any bit position implies pending interrupt.
• One flag, Global enable bit (INTM), in ST1 register is used to enable or disable
all interrupts globally. If INTM is zero, all unmasked interrupts are enabled. If
it is one, all maskable interrupts are disabled.
151
INTERRUPT I/O FLOWCHART
152
DIRECT MEMORY ACCESS (DMA) OPERATION
• In any application, there is data transfer between DSP and memory and also
DSP and I/O device, as shown in fig. 7.10. However, there may be need for
transfer of large amount of data between two memory regions or between
memory and I/O. DSP can be involved in such transfer, as shown in fig. 7.11.
• Since amount of data is large, it will engage DSP in data transfer task for a
long time. DSP thus will not get utilized for the purpose it is meant for, i.e.,
data manipulation. The intervention of DSP has to be avoided for two
reasons: to utilize DSP for useful signal processing task and to increase the
speed of transfer by direct data transfer between memory or memory and
I/O. The direct data transfer is referred to as direct memory access (DMA).
The arrangement expected is shown in fig. 7.12. DMA controller helps in data
transfer instead of DSP.
153
DIRECT MEMORY ACCESS (DMA) OPERATION
154
REGISTER SUBADDRESS TECHNIQUE
156
UNIT-5 CLO’S
157
Q-NOTATION
158
Q-NOTATION
159
FIR FILTERS
• A finite impulse response (FIR) filter of order N can be described by the
difference equation.
• FIR filter is used to implement almost any type of digital frequency response.
Usually these filters are designed with a multiplier, adders and a series of delays
to create the output of the filter.
• The following figure shows the basic FIR filter diagram with N length. The
result of delays operates on input samples. The values of hk are the coefficients
which are used for multiplication. So that the o/p at a time and that is the
summation of all the delayed samples multiplied by the appropriate
coefficients.
161
FIR FILTERS
•The implementation requires signal delay for each sample to compute the next
output, y(n+1), is given as y(n+1)=h(N-1)x(n-(N-2))+h(N-2)x(n-(N-3))+
...h(1)x(n)+h(0)x(n+1) Figure shows the memory organization for the
implementation of the filter.
•The filter Coefficients and the signal samples are stored in two circular buffers
each of a size equal to the filter. AR2 is used to point to the samples and AR3 to
the coefficients.
• In order to start with the last product, the pointer register AR2 must be
initialized to access the signal sample x(2-(N-1)), and the pointer register AR3 to
access the filter coefficient h(N-1). As each product is computed and added to the
previous result, the pointers advance circularly. At the end of the computation,
the signal sample pointer is at the oldest sample, which is replaced with the
newest sample to proceed with the next output computation. 162
FIR FILTERS
163
IIR FILTERS
• An infinite impulse response (IIR) filter is represented by a transfer function,
which is a ratio of two polynomials in z.
• To implement such a filter, the difference equation representing the transfer
function can be derived and implemented using multiply and add operations.
• To show such an implementation, we consider a second order transfer
function given by
164
IIR FILTERS
165
INTERPOLATION FILTERS
166
INTERPOLATION FILTERS
167
INTERPOLATION FILTERS
• To implement an ideal interpolation. Figure shows how an interpolating filter
using a 15-tap FIR filter and an interpolation factor of 5 can be implemented. In
this example, each incoming samples is followed by four zeros to increase the
number of samples by a factor of 5.
•The interpolated samples are computed using a program similar to the one
used for a FIR filter implementation.
• One drawback of using the implementation strategy depicted in Figure is that
there are many multiplies in which one of the multiplying elements is zero. Such
multiplies need not be included in computation if the computation is
rearranged to take advantage of this fact.
168
INTERPOLATION FILTERS
• One such scheme, based on generating what are called poly-phase sub-filters,
is available for reducing the computation. For a case where the number of filter
coefficients N is a multiple of the interpolating factor L, the scheme
implements the interpolation filter using the equation.
• A scheme that uses poly-phase sub-filters to implement the
interpolating filter using the 15-tap FIR filter and an interpolation factor
of 5. In this implementation, the 15 filter taps are arranged as shown
and divided into five 3-tap sub filters.
• The input samples x(n), x(n-1) and x(n-2) are used five times to
generate the five output samples. This implementation requires 15
multiplies as opposed to 75 in the direct implementation of Figure
169
INTERPOLATION FILTERS
170
INTERPOLATION FILTERS
171
DECIMATION FILTERS
172
DECIMATION FILTERS
•The cutoff frequency of the filter is chosen so that it is less than half the final
sampling frequency.
• The filtered signal can be decimated by dropping samples. In fact, the samples
that are to be dropped need not be computed at all.
•Thus, the implementation of a decimator is just a FIR filter implementation in
which some of the outputs are not calculated.
• Figure shows a block diagram of a decimation filter. Digital decimation can be
implemented as depicted in Figure for an example of a decimation filter with
decimation factor of 3. It uses a low pass FIR filter with 5 taps. The computation
is similar to that of a FIR filter. However, after computing each output sample,
the signal array is delayed by three sample intervals by bringing the next three
samples into the circular buffer to replace the three oldest samples.
173
DECIMATION FILTERS
Decimation process
174
DECIMATION FILTER
175
INTRODUCTION TO FFT
• The transformed domain manipulations are sometimes simpler. They are also
more useful and powerful than time domain manipulation. For example,
convolution in time domain requires one of the signals to be folded, shifted
and multiplied by another signal, cumulatively. Instead, when the signals to
be convolved are transformed to DFT domain, the two DFT are multiplied and
inverse transform is taken. Thus, it simplifies the process of convolution.
176
INTRODUCTION
177
AN FFT ALGORITHM FOR DFT COMPUTATION
• As DFT / IDFT are part of signal processing system, there is a need for fast
computation of DFT / IDFT. There are algorithms available for fast
computation of DFT/ IDFT. There are referred to as Fast Fourier Transform
(FFT) algorithms.
• There are two FFT algorithms: Decimation-In-Time FFT (DITFFT) and
Decimation-In-Frequency FFT (DIFFFT). The computational complexity of both
the algorithms are of the order of log2(N). From the hardware / software
implementation viewpoint the algorithms have similar structure throughout
the computation. In-place computation is possible reducing the requirement
of large memory locations. The features of FFT are tabulated in the table 6.2.
178
AN FFT ALGORITHM FOR DFT COMPUTATION
179
EXAMPLE OF COMPUTATION OF 2 POINT DFT
180
BUTTERFLY STRUCTURE
181
BUTTERFLY STRUCTURE
• Separating the real and imaginary parts, the four equations to be realized in
implementation of DITFFT Butterfly structure are as in eq(6.6).
182
SIGNAL FLOW GRAPH OF 8 POINT DITFFT COMPUTATION
183
SPECTRUM OF x(n)
•Observe that with N=2^M, the number of stages in signal flow graph=M,
number of multiplications = (N/2)log2(N) and number of additions =
(N/2)log2(N). Number of Butterfly Structures per stage = N/2. They are
identical and hence in-place computation is possible. Also reusability of
hardware designed for implementing Butterfly structure is
•possible.
184
SPECTRUM
185
PROBLEM
• Problem : What minimum size FFT must be used to compute a DFT of 40
points? What must be done to samples before the chosen FFT is applied?
What is the frequency resolution achieved?
• Solution: Minimum size FFT for a 40 point sequence is 64 point FFT. Sequence
is extended to 64 by appending additional 24 zeros. The process improves
frequency resolution from
186
OVERFLOW AND SCALING
• In any processing system, number of bits per data in signal processing is fixed
and it is limited by the DSP processor used. Limited number of bits leads to
overflow and it results in erroneous answer. InQ15 notation, the range of
numbers that can be represented is -1 to 1. If the value of a number exceeds
these limits, there will be underflow / overflow. Data is scaled down to avoid
overflow.
•To find the maximum possible value for LHS term, Differentiate and equate
to zero
189
BIT-REVERSED INDEX GENERATION
• As noted in table 6.2, DITFFT algorithm requires input in bit reversed order.
The input sequence can be arranged in bit reverse order by reverse carry add
operation. Add half of DFT size (=N/2) to the present bit reversed ndex to get
next bit reverse index. And employ reverse carry propagation while adding
bits from left to right. The original index and bit reverse index for N=8 is listed
in table 6.3
190