ECE 320 Spring 2004 by Mark Butala, et al - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub for a complete version.

sections are larger than 1.0 in magnitude. For any stable second-order IIR section, the magnitude

of the "0" and "2" coefficients ( a 0 and a 2 , for example) will always be less than or equal to 1.0.

However, the magnitude of the "1" coefficient can be as large as 2.0. To overcome this problem,

you will have to divide the a 1 and b 1 coefficients by two prior to saving them for your DSP code.

Then, in your implementation, you will have to compensate somehow for using half the

coefficient value.

Repeating code

Rather than write separate code for each second-order section, you are encouraged first to write

one section, then write code that cycles through the second-order section code twice using the

repeat structure below. Because the IIR code will have to run inside the block I/O loop and this

loop uses the block repeat counter (BRC), you must use another looping structure to avoid

corrupting the BRC.

Note

You will have to make sure that your code uses different coefficients and states during the

second cycle of the repeat loop.

stm (num_stages-1),AR1

start_stage

; IIR code goes here

banz start_stage,*AR1-

Gain

It may be necessary to add gain to the output of the system. To do this, simply shift the output left

(which can be done using the ld opcode with its optional shift parameter) before saving the

output to memory.

Grading

Your grade on this lab will be split into three parts:

1 point: Prelab

4 points: Code. Your DSP code implementing the fourth-order IIR filter is worth 3 points and

the MATLAB exercise is worth 1 point.

5 points: Oral quiz. The quiz may cover differences between FIR and IIR filters, the prelab

material, and the MATLAB exercise.

1.5. Lab 4

Lab 4: Theory*

Introduction

In this lab you are going to apply the Fast Fourier Transform ( FFT) to analyze the spectral

content of an input signal in real time. After computing the FFT of a 1024-sample block of input

data, you will then compute the squared magnitude of the sampled spectrum and send it to the

output for display on the oscilloscope. In contrast to the systems you have implemented in the

previous labs, the FFT is an algorithm that operates on blocks of samples at a time. In order to

operate on blocks of samples, you will need to use interrupts to halt processing so that samples

can be transferred.

A second objective of this lab exercise is to introduce the TI-C549 C environment in a practical

DSP application. In future labs, the benefits of using the C environment will become clear as

larger systems are developed. The C environment provides a fast and convenient way to

implement a DSP system using C and assembly modules.

The FFT can be used to analyze the spectral content of a signal. Recall that the FFT is an efficient

algorithm for computing the Discrete Fourier Transform ( DFT), a frequency-sampled version

of the DTFT.

DFT:

index-35_1.jpg

index-35_2.jpg

index-35_3.jpg

()

where nk∈{0, 1, , N−1}

Your implementation will include windowing of the input data prior to the FFT computation. This

is simple a point-by-point multiplication of the input with an analysis window. As you will

explore in the prelab exercises, the choice of window affects the shape of the resulting window.

A block diagram representation of the spectrum analyzer you will implement in the lab, including

the required input and ouput locations, can be found depicted in Figure 1.8.

Figure 1.8.

FFT-based spectrum analyzer

Lab 4: Prelab*

MATLAB Exercise

Since the DFT is a sampled version of the spectrum of a digital signal, it has certain sampling

effects. To explore these sampling effects more thoroughly, we consider the effect of multiplying

the time signal by different window functions and the effect of using zero-padding to increase the

length (and thus the number of sample points) of the DFT. Using the following MATLAB script as

an example, plot the squared-magnitude response of the following test cases over the digital

frequencies

.

1. rectangular window with no zero-padding

2. hamming window with no zero-padding

3. rectangular window with zero-padding by factor of four ( i.e. , 1024-point FFT)

4. hamming window window with zero-padding by factor of four

Window sequences can be generated in MATLAB by using the boxcar and hamming functions.

1 N = 256; % length of test signals

2 num_freqs = 100; % number of frequencies to test

3

4 % Generate vector of frequencies to test

5

6 omega = pi/8 + [0:num_freqs-1]'/num_freqs*pi/4;

7

8 S = zeros(N,num_freqs); % matrix to hold FFT results

9

10

11 for i=1:length(omega) % loop through freq. vector

12 s = sin(omega(i)*[0:N-1]'); % generate test sine wave

13 win = boxcar(N); % use rectangular window

14 s = s.*win; % multiply input by window

15 S(:,i) = (abs(fft(s))).^2; % generate magnitude of FFT

16 % and store as a column of S

17 end

18

19 clf;

20 plot(S); % plot all spectra on same graph

21

Make sure you understand what every line in the script does. What signals are plotted?

You should be able to describe the tradeoff between mainlobe width and sidelobe behavior for the

various window functions. Does zero-padding increase frequency resolution? Are we getting

something for free? What is the relationship between the DFT, X[ k] , and the DTFT, X( ω) , of a sequence x[ n] ?

Lab 4: Lab*

Implementation

As this is your first experience with the C environment, you will have the option to add most of

the required code in C or assembly. A C skeleton will provide access to input samples, output

samples, and interrupt handling code. You will add code to transfer the inputs and outputs (in

blocks at a time), apply a hamming window, compute the magnitude-squared spectrum, and

include a trigger pulse. After the hamming window is created, either an assembly or C module that

bit reverses the input and performs an FFT calculation is called.

As your spectrum analyzer works on a block of samples at a time, you will need to use interrupts

to pause your processing while samples are transferred from/to the CODEC (A/D and D/A) buffer.

Fortunately, the interrupt handling routines have been written for you in a C shell program

available at v:\ece420\54x\dspclib\lab4main.c and the core code.

Interrupt Basics

Interrupts are an essential part of the operation of any microprocessor. They are particularly

important in embedded applications where DSPs are often used. Hardware interrupts provide a

way for interacting with external devices while the processor executes code. For example, in a key

entry system, a key press would generate a hardware interrupt. The system code would then jump

to a specified location in program memory where a routine could process the key input. Interrupts

provide an alternative to polling. Instead of checking for key presses at a predetermined rate

(requires a clock), the system could be busy executing other code. On the TI-C54x DSP, interrupts

provide a convenient way to transfer blocks of data to/from the CODEC in a timely fashion.

Interrupt Handling

The lab4main.c code and the core code are intended to make your interaction with the

hardware much simpler. As there was a core file for working in the assembly environment (Labs

0-3), there is a core file for the C environment (V:\ece420\54x\dspclib\core.asm) which handles

the interrupts from the CODEC (A/D and D/A) and the serial port. Here, we will describe the

important aspects of the core code necessary to complete the assignment.

At the heart of the hardware interaction is the auto-buffering serial port. In the auto-buffering

serial mode, the TI-C54x processor is able to do processing uninterrupted while samples are

transferred to/from a buffer of length BlockLen=64 samples. However, the spectrum analyzer to

be implemented in this lab works over a block of N=1024 samples. If it were possible to compute

a 1024-point FFT in the sample time of one BlockLen, then no additional interrupt handling

routines would be necessary. Samples could be collected in a 1024-length buffer and a 1024-point

FFT could be computed uninterrupted while the auto-buffering buffer fills. Unfortunately, the

DSP is not fast enough to accomplish this task.

We now provide an explanation of the shell C program lab4main.c listed in Appendix A. The lab4main.c file contains the function interrupt void irq and a main program. The

main program is an infinite loop over blocks of N=1024 samples. Note that while the DSP is

executing instructions in this loop, interrupts occur every BlockLen samples. Inside the infinite

loop, you will insert code to do the operations which follow. Although each of these operations

may be performed in C or assembly, we suggest you follow the guidelines suggested.

1. Transfer inputs and outputs (C)

2. Apply a Hamming Window (C/assembly)

3. Bit-reverse the input (C and assembly)

4. Apply an N -point FFT (C and assembly)

5. Compute the magnitude-squared spectrum (C/assembly)

6. Include a trigger pulse (C/assembly)

The function WaitAudio is an assembly function in the core code which handles the CODEC

interrupts. An interrupt from the CODEC occurs every BlockLen samples. The

SetAudioInterrupt(irq) call in the main program tells the core code to jump to the irq

function when an interrupt occurs. In the irq function, BlockLen samples of the A/D input in

Rcvptr (channel 1) are written to a length N inputs buffer, and BlockLen of the output

samples in the outputs buffer are written to the D/A output via Xmitptr on channel 2. In C,

pointers may be used as array names so that Xmitptr[0] is the first word pointed to by

Xmitptr. As in the assembly core, the input samples are not in consecutive order. The right and

left inputs are offset from Rcvptr respectively by 4 i and 4 i+2 , i=0 , …, BlockLen−1 . The six

output channels are accessed consecutively as offsets from Xmitptr. On channel 1 of the output,

the input is echoed out. You are to fill the buffer outputs with the windowed magnitude-

squared FFT values by performing the operations listed above.

In the main code, the while(!input_full); loop waits for N samples to collect in the

inputs buffer. Next, the N inputs and outputs must be transferred. You are to write this portion

of code. This portion of code is to be done first, within BlockLen sample times; otherwise the

first BlockLen of samples of output would not be available on time. Once this loop is finished,

the lengthy processing of the FFT can continue. During this processing, the DSP is interrupted

every BlockLen samples to transfer samples. Once this processing is over, the infinite loop

returns to while(!input_full); to wait for N samples to finish collecting.

The flow diagram in Figure 1.9 summarizes the operation of the interrupt handling routine

index-39_1.jpg

index-39_2.jpg

(b) interrupt handler

(a) main

Figure 1.9.

Overall program flow of the main function and the interrupt handling function.

Assembly FFT Routine

As the list of operations indicates, bit-reversal and FFT computation are to be done in both C and

assembly. For the assembly version, make sure that the line defining C_FFT is commented in

lab4main.c. We are providing you with a shell assembly file, available at

v:\ece420\54x\dspclib\c_fft_given.asm and shown in Appendix B, containing

many useful declarations and some code. The code for performing bit-reversal and other

declarations needed for the FFT routine are also provided in this section. However, we would like

you to enter this code manually, as you will be expected to understand its operation.

The assembly file c_fft_given.asm contains two main parts, the data section starting with

.sect ".data" and the program section starting with .sect ".text". Every function and

variable accessed in C must be preceded by a single underscore _ in assembly and a .global

_name must be placed in the assembly file for linking. In this example, bit_rev_fft is an

assembly function called from the C program with a label _bit_rev_fft in the text portion of

the assembly file and a .global _bit_rev_fft declaration. In each assembly function, the

macro ENTER_ASM is called upon entering and LEAVE_ASM is called upon exiting. These macros

are defined in v:\ece420\54x\dspclib\core.inc. The ENTER_ASM macro saves the

status registers and AR1, AR6, and AR7 when entering a function as required by the register use

conventions. The ENTER_ASM macro also sets the status registers to the assembly conventions we

have been using (i.e, FRCT=1 for fractional arithmetic and CPL=0 for DP referenced addressing).

The LEAVE_ASM macro just restores the saved registers.

Parameter Passing

The parameter passing convention between assembly and C is simple for single input, single

output assembly functions. From a C program, the input to an assembly program is in the low part

of accumulator A with the output returned in the same place. When more than one parameter is

passed to an assembly function, the parameters are passed on the stack (see the core file

description for more information). We suggest that you avoid passing or returning more than one

parameter. Instead, use global memory addresses to pass in or return more than one parameter.

Another alternative is to pass a pointer to the start of a buffer intended for passing and returning

parameters.

Registers Modified

When entering and leaving an assembly function, the ENTER_ASM and LEAVE_ASM macros

ensure that certain registers are saved and restored. Since the C program may use any and all

registers, the state of a register cannot be expected to remain the same between calls to assembly

function(s). Therefore, any information that needs to be preserved across calls to an assembly

function must be saved to memory!

Now, we explain how to use the FFT routine provided by TI for the C54x. The FFT routine

fft.asm located in v:\ece420\54x\dsplib\ computes an in-place, complex FFT. The

length of the FFT is defined as a label K_FFT_SIZE and the algorithm assumes that the input

starts at data memory location _fft_data. To have your code assemble for an N -point FFT,

you will have to include the following label definitions in your assembly code.

N .set 1024

K_FFT_SIZE .set N ; size of FFT

K_LOGN .set 10 ; number of stages (log_2(N))

In addition to defining these constants, you will have to include twiddle-factor tables for the FFT.

These tables (twiddle1 and twiddle2) are available in the shared directory v:\ece420\54x\dsplib\. Note that the tables are each N points long representing values

from 0 to just shy of 180 degrees and must be accessed using a circular pointer. To include these

tables at the proper location in memory with the appropriate labels referenced by the FFT, use the

following

.sect ".data"

.align 1024

sine .copy "v:\ece420\54x\dsplib\twiddle1"

.align 1024

cosine .copy "v:\ece420\54x\dsplib\twiddle2"

The FFT provided requires that the input be in bit-reversed order, with alternating real and

imaginary components. Bit-reversed addressing is a convenient way to order input x[ n] into a FFT

so that the output X( k) is in sequential order ( i.e. X(0) , X(1) , …, X( N−1) for an N -point FFT).

The following table illustrates the bit-reversed order for an eight-point sequence.

Table 1.2.

Input Order Binary Representation Bit-Reversed Representation Output Order

0

000

000

0

1

001

100

4

2

010

010

2

3

011

110

6

4

100

001

1

5

101

101

5

6

110

011

3

7

111

111

7

The following routine performs the bit-reversed reordering of the input data. The routine assumes

that the input is stored in data memory starting at the location labeled _bit_rev_data, which

must be aligned to the least power of two greater than the input buffer length, and consists of

alternating real and imaginary parts. Because our input data is going to be purely real in this lab,

you will have to make sure that you set the imaginary parts to zero by zeroing out every other

memory location.

1 bit_rev:

2 STM #_bit_rev_data,AR3 ; AR3 -> original input

3 STM #_fft_data,AR7 ; AR7 -> data processing buffer

4 MVMM AR7,AR2 ; AR2 -> bit-reversed data

5 STM #K_FFT_SIZE-1,BRC

6 RPTBD bit_rev_end-1

7 STM #K_FFT_SIZE,AR0 ; AR0 = 1/2 size of circ buffer

8 MVDD *AR3+,*AR2+

9 MVDD *AR3-,*AR2+

10 MAR *AR3+0B

11 bit_rev_end:

12 NOP

13 RET

As mentioned, in the above code _bit_rev_data is a label indicating the start of the input data

and _fft_data is a label indicating the start of a circular buffer where the bit-reversed data will

be written. Note that although AR7 is not used by the bit-reversed routine directly, it is used

extensively in the FFT routine to keep track of the start of the FFT data space.

In general, to have a pointer index memory in bit-reversed order, the AR0 register needs to be set

to one-half the length of the circular buffer; a statement such as ARx+0B is used to move the ARx

pointer to the next location. For more information regarding the bit-reversed addressing mode,

refer to page 5-18 in the TI-54x CPU and Peripherals manual [url]. Is it possible to bit-reverse a buffer in place? For a diagram of the ordering of the data expected by the FFT routine, see Figure

4-10 in the TI-54x Applications Guide [url]. Note that the FFT code uses all the pointers available and does not restore the pointers to their original values.

C FFT Routine

A bit-reversing and FFT routine have also been provided in lab4fft.c, listed in Appendix C.

Again, make sure you understand how the bit reversal is taking place. In lab4main.c, the

line defining C_FFT must not be commented for use of the C FFT routine. The sine tables

(twiddle factors) are located in sinetables.h. This fft requires its inputs in two buffers, the real buffer real and the imaginary buffer imag, and the output is placed in the same buffers. The

length of the FFT, N, and logN are defined in lab4.h, which is also listed in Appendix C.

When experimenting with the C FFT make sure you modify these length values instead of the

ones in the assembly code and lab4main.c!

Creating the Window

As mentioned, you will be using the FFT to compute the spectrum of a windowed input. For your

implementation you will need to create a 1024-point Hamming window. First, create a Hamming

window in matlab using the function hamming. For the assembly FFT, use save_coef to save

the window to a file that can then be included in your code with the .copy directive. For the C

FFT, use the matlab function write_intvector_headerfile with name set to 'window'

and elemperline set to 8 to create the header file that is included in lab4main.c.

Displaying the Spectrum

Once the DFT has been computed, you must calculate the squared magnitude of the spectrum for

display.

()

(| X( k)|)2=(ℜ( X( k)))2+(ℑ( X( k)))2

You may find the assembly instructions squr and squra useful in implementing Equation.

Because the squared magnitude is always nonnegative, you can replace one of the magnitude

values with a -1.0 as a trigger pulse for display on the oscilloscope. This is easily performed by

replacing the DC term ( k=0 ) with a -1.0 when copying the magnitude values to the output buffer.

The trigger pulse is necessary for the oscilloscope to lock to a specific point in the spectrum and

keep the spectrum fixed on the scope.

Intrinsics

If you are planning on writing some of the code in C, then you may be forced to use intrinsics.

Intrinsic instructions provide a way to use assembly instructions directly in C. An example of an

intrinsic instruction is bit_rev_data[0]=_smpyr(bit_rev_data[0],window[0])

which performs the assembly signed multiply round instruction. You may also find the _lsmpy

instruction useful. For more information on intrinsics, see page 6-22 of the TI-C54x Optimizing

C/C++ Compiler User's Guide [url].

The following lines of code were borrowed from the C FFT to serve as an example of arithmetic

operations in C. Save this code in a file called mathex.c and compile this file. Look at the

resulting assembly file and investigate the differences between each block. Be sure to reference

the compiler user's guide to find out what the state of the FRCT and OVM bits are. Does each

block work properly for all possible values?

void main(void)

{

int s1, s2;