

ELEC 301 Projects Fall 2005

Collection edited by: Richard Baraniuk and Rice University ELEC 301

Content authors: Danny Blanco, Elliot Ng, Charlie Ice, Bryan Grandy, Sara Joiner, Austin

Bratton, Ray Hwong, Jeanne Guillory, Richard Hall, Jared Flatow, Siddharth Gupta, Veena

Padmanabhan, Grant Lee, Heather Johnston, Deborah Miller, Warren Scott, _ _, Chris

Lamontagne, Bryce Luna, David Newell, William Howison, Patrick Kruse, Kyle Ringgenberg,

Michael Lawrence, Yi-Chieh Wu, Scott Novich, Andrea Trevino, and Phil Repicky

Online: <http://cnx.org/content/col10380/1.3>

This selection and arrangement of content as a collection is copyrighted by Richard Baraniuk and Rice University ELEC 301.

It is licensed under the Creative Commons Attribution License: http://creativecommons.org/licenses/by/2.0/

Collection structure revised: 2007/09/25

For copyright and attribution information for the modules contained in this collection, see the "Attributions" section at the end of the collection.


Table of Contents

Chapter 1. Steganography - What's In Your Picture

1.1. Abstract and History

Abstract and History


A Brief History of Steganography

1.2. Compression Framework


Compression Framework

1.3. Compression - Dropping the DCT Coefficients

Compression Algorithm

Dropping DCT Coefficients

1.4. Compression - Zeros Grouping

Compression Algorithm

Zeros Grouping

1.5. Zeros Hiding Method

Data Hiding Methods

Zero Hiding

Hiding Information

Data Retrieval

1.6. Bit-O-Steg Method - Background

Data Hiding Methods


Previous Work and Background

1.7. Bit-O-Steg Hiding

Data Hiding Methods


Hiding Information

Retrieving the Data

1.8. Importance of Steganalysis


Importance of Steganalysis

1.9. Steganalysis - Zeros Hiding Detection


Zeros Hiding Detection

1.10. Steganalysis - Bit-O-Steg Detection


Bit-o-steg detection

1.11. Future Considerations and Conclusions

Future Considerations and Conclusion

Future Work


1.12. Works Cited

Works Cited

1.13. Steganography Matlab Code

1.14. Group Members

Group Bio

Chapter 2. Investigation of Delay and Sum Beamforming Using a Two-Dimensional Array

2.1. Delay and Sum Beamforming with a 2D Array: Introduction

1. Introduction

2. Delay and Sum Beamforming

2.1 Nearfield Processing

2.2 Farfield Processing

3. Complications

3.1 Time Quantization

3.2 Aliasing, Resolution, and Sampling Frequency

3.3 Unknown Source Location

2.2. Hardware Setup for Perimeter Array of Microphones

Hardware Setup for Perimeter Array of Microphones

Choosing Microphones

Data Acquisition


Putting it all Together

2.3. Labview Implementation of 2D Array Delay and Sum Beamformer


Waveform Generation VI

Upsampling VI

Delay Generation VI

Main Analysis VI

Labview Code

2.4. Results of the Testing of 2D Array Beamformer

2.5. Delay and Sum Beamforming with a 2D Array: Conclusions

Summary of Results of Data

Limitations of Hardware and Computing Power

Possible Extensions

2.6. Expressing Appreciation for the Assistance of Others

Chapter 3. Seeing Using Sounds

3.1. Introduction and Background for Seeing with Sound


Background and Problems

3.2. Seeing using Sound - Design Overview

Input Filtering

The Mapping Process

3.3. Canny Edge Detection

Introduction to Edge Detection

Canny Edge Detection and Seeing Using Sound

3.4. Seeing using Sound's Mapping Algorithm

Vertical Mapping

Horizontal Mapping

Color Mapping

3.5. Demonstrations of Seeing using Sound


3.6. Final Remarks on Seeing using Sound

Future Considerations and Conclusions

Contact Information of Group Members

Chapter 4. Intelligent Motion Detection Using Compressed Sensing

4.1. Intelligent Motion Detection and Compressed Sensing

New Camera Technology with New Challenges

4.2. Compressed Sensing

4.3. Feature Extraction from CS Data

Can Random Noise Yield Specific Information?

Simplicity for Low Power

Investigation Goals

4.4. Methodology for Extracting Information from "Random" Measurements

Simulating Compressed Sensing

Random, On Average

Resolution Limit

4.5. Idealized Data for Motion Detection

Making Frames

Making Movies

4.6. Speed Calculation: the Details


Average Absolute Change to Measure Speed

Average Squared Change to Measure Speed

4.7. Ability to Detect Speed: Results

Calculations Performed on Each Movie Clip

Velocity Trends

4.8. Concluding Remarks for CS Motion Detection

4.9. Future Work in CS Motion Detection

4.10. Support Vector Machines

4.11. The Team and Acknowledgements

The Team


Chapter 5. Terahertz Ray Reflection Computerized Tomography

5.1. Introduction-Experimental Setup

T-rays: appropriateness for imaging applications

Experimental setup that provided the data used in this project

Two main steps for imaging the test object: I) Deconvolution, II) Reconstruction

5.2. Description/Manipulation of Data

5.3. Deconvolution with Inverse and Wiener Filters

Problem Statement

Inverse Filter

Wiener Filter

5.4. Results of Deconvolution

5.5. Reconstruction

Theory of Filtered Backprojection Algorithm (FBP)

5.6. Backprojection Implementation




Representative Results

5.7. Conclusions and References


Future Work


5.8. Team Incredible

Chapter 6. Filtering and Analysis of Heart Rhythms

6.1. Introduction to Electrocardiogram Signals


6.2. Medical Background

6.3. Block Diagram/Method

6.4. Sample Outputs

6.5. Overall Results and Conclusions

6.6. MATLAB Analysis Code

6.7. Group Members

Chapter 7. Naive Room Response Deconvolution

7.1. Introduction to Naive Acoustic Deconvolution

7.2. Naive Deconvolution Theory

7.3. Recording the Impulse Response of a Room

7.4. The Effectiveness of Naive Audio Deconvolution in a Room

7.5. Problems and Future Considerations in Naive Room Response Deconvolution

7.6. Authors' Contact Information

7.7. Room Response Deconvolution M-Files

Chapter 8. Musical Instrument Recognition

8.1. Introduction


8.2. Simple Music Theory as it relates to Signal Processing

Simple Music Theory


Duration and Volume

8.3. Common Music Terms

8.4. Matched Filter Based Detection

Shortcomings of the Matched Filter

8.5. System Overview

8.6. Pitch Detection

Pitch Detection

8.7. Sinusoidal Harmonic Modeling

Sinusoid Harmonic Modeling

8.8. Audio Features


How We Chose Features


8.9. Problems in Polyphonic Detection

8.10. Experimental Data and Results

Experimental Data





Monophonic Recordings

Polyphonic Recordings

8.11. Gaussian Mixture Model

Gaussian Mixture Model

Recognizing Spectral Patterns


8.12. Future Work in Musical Recognition

Improving the Gaussian Mixture Model

Improving training data

Increasing the scope

Improving Pitch Detection

8.13. Acknowledgements and Inquiries

8.14. Patrick Kruse

Patrick Alan Kruse

8.15. Kyle Ringgenberg

Kyle Martin Ringgenberg

8.16. Yi-Chieh Jessica Wu

Chapter 9. Accent Classification using Neural Networks

9.1. Introduction to Accent Classification with Neural Networks



Design Choices


9.2. Formants and Phonetics

Sample Spectrograms

9.3. Collection of Samples

Choosing the sample set

9.4. Extracting Formants from Vowel Samples

9.5. Neural Network Design

9.6. Neural Network-based Accent Classification Results


Test 1: Chinese Subject

Test 2: Iranian Subject

Test 3: Chinese Subject

Test 4: Chinese Subject

Test 5: American Subject (Hybrid of Regions)

Test 6: Russian Subject

Test 7: Russian Subject

Test 8: Cantonese Subject

Test 1: Korean Subject

9.7. Conclusions and References





Chapter 1. Steganography - What's In Your Picture

1.1. Abstract and History

Abstract and History


For years, people have devised different techniques for encrypting data while others have

attempted to break these encrypted codes. For our project we decided to put our wealth of DSP

knowledge to use in the art of steganography. Steganography is a technique that allows one to hide

binary data within an image while adding few noticeable changes. Technological advancements

over the past decade or so have brought terms like “mp3,” “jpeg,” and “mpeg” into our everyday

vocabulary. These lossy compression techniques lend themselves well to hiding data. We

have chosen this project because it gives us a chance to study several aspects of DSP. First,

we devised our own compression technique, which we loosely based on JPEG. There have been

many steganographic techniques created so far, which compelled us to create two of our own

strategies for hiding data in the images we compress. Our first method, zero hiding, adds the

binary data into the DCT coefficients dropped in compression. Our other method, which we called

bit-o-steg, uses a key to change the values of coefficients that remain after compression. Finally,

we had to find ways to analyze the success of our data hiding strategies, so through our research

we found both DSP and statistical methods to qualitatively measure our work.

A Brief History of Steganography

Steganography, or “hidden writing,” can be traced back to 440 BC in ancient Greece. The Greeks

would often write a message on a wooden panel, cover it in wax, and then write a second message on the wax.

These wax tablets were already in common use as writing surfaces, so hiding a message in such a

familiar object drew very little suspicion. In addition to its use by the Greeks,

steganography was practiced by spies in World War II. There were even rumors that terrorists made

use of steganography early in 2001 to plan the attacks of September 11.

1.2. Compression Framework



Compression Framework

Images can be saved in many file formats; however, much of the research in

steganography is done using the JPEG format. JPEG is very common and uses a relatively

straightforward compression algorithm. Although there are several JPEG compression scripts

written for MATLAB, customizing them for our purposes and getting the output to work with the

JPEG format would have shifted the focus of our project from steganography to implementing

JPEG compression. Thus we decided to implement our own custom image framework that would

be similar to JPEG but much more straightforward.

1.3. Compression - Dropping the DCT Coefficients

Compression Algorithm

Dropping DCT Coefficients

Our framework and JPEG are both based around the discrete cosine transform. Just like with

sound, certain frequencies in an image are more noticeable than others, so removing the less

noticeable ones doesn’t change the image much. We used the 2D discrete cosine transform (DCT) as seen

in equation 1 to convert an image into the frequencies that make it up; in

other words, it takes us into the frequency domain.
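For reference, one common orthonormal form of the 2D DCT of an N x N block f(x, y) is sketched below. The symbols and the normalization are assumptions chosen for illustration, and the scaling may differ from the form used in this project.

```latex
F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right],
\qquad
\alpha(k) =
\begin{cases}
  \sqrt{1/N}, & k = 0 \\
  \sqrt{2/N}, & k > 0
\end{cases}
```

Here F(0,0) is the average (DC) term of the block, and larger u and v correspond to higher spatial frequencies.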


There are several transforms that could have been utilized to get the image into the frequency

domain. The DCT, however, is a purely real transform. Thus, manipulating the frequencies is

much more straightforward compared to other transforms. From here we could take the DCT of

the entire image and then throw away frequencies that are less noticeable. Unfortunately this

would make the image blurry and cause the image to lose edges. To solve this problem the image

is divided into 8x8 blocks, to preserve the integrity of the image. To drop insignificant

frequencies, JPEG compression utilizes a quantization matrix. We simplified this process by using

a threshold value and dropping coefficients whose magnitude falls below the threshold. Thus our compression algorithm

models the basic functionality of the JPEG standard.

Figure 1.1.

The result of taking the DCT. The numbers in red are the coefficients that fall below the specified threshold of 10.
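The thresholding step can be illustrated with a minimal Python sketch. It is not the project's MATLAB code: the function names `dct2` and `drop_small`, the orthonormal DCT scaling, and the rounding of coefficients to integers are all assumptions made for the example.

```python
import math

N = 8  # block size used by the framework (8x8 blocks)

def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

def drop_small(coeffs, threshold):
    """Round each coefficient (an assumption) and zero those whose
    magnitude falls below the threshold."""
    return [[0 if abs(round(c)) < threshold else round(c) for c in row]
            for row in coeffs]

# A smooth horizontal ramp: its energy sits in a few low frequencies.
block = [[16 * y for y in range(N)] for _ in range(N)]
kept = drop_small(dct2(block), threshold=10)
nonzero = sum(1 for row in kept for c in row if c != 0)
print(nonzero, "of", N * N, "coefficients survive the threshold")
```

Because the example block is smooth, only a handful of low-frequency coefficients survive, which is what makes the later zeros grouping effective.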

1.4. Compression - Zeros Grouping

Compression Algorithm

Zeros Grouping

The second part to our image framework is zeros grouping. Just like the JPEG standard, the

algorithm utilizes a zig-zag pattern that goes through each DCT matrix and creates a 64-length

vector for each matrix. The advantage of the zig-zag pattern is that it groups the resulting vector

from low frequencies to high frequencies. Groups of zeros are then replaced with an ASCII

character representing how many zeros are represented within that group.

Figure 1.2.

Zig-zag method traverses the matrix and vectorizes the matrix. After grouping zeros the resulting bitstream is sent to a file.

With this simple framework in place, we are able to model a real world image compression

algorithm and focus on implementing steganography.
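The zig-zag traversal and zeros grouping described above can be sketched as follows. This is an illustrative Python version, not the project's `jvector.m`; the `("Z", run_length)` marker is a hypothetical stand-in for the ASCII character the framework writes for each run of zeros.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in JPEG-style zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals run from bottom-left to top-right, odd ones the reverse.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order

def group_zeros(vector):
    """Replace each run of zeros with a ('Z', run_length) marker."""
    out, run = [], 0
    for value in vector:
        if value == 0:
            run += 1
        else:
            if run:
                out.append(("Z", run))
                run = 0
            out.append(value)
    if run:
        out.append(("Z", run))
    return out

# An 8x8 coefficient block with only three surviving low-frequency terms.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 448, -292, 31
vector = [block[r][c] for r, c in zigzag_order()]
grouped = group_zeros(vector)
print(grouped)
```

Since the zig-zag order visits low frequencies first, the dropped high-frequency coefficients collapse into one long run at the end of the vector.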

1.5. Zeros Hiding Method

Data Hiding Methods

Zero Hiding

Hiding Information

We arrived at our first data hiding method, which we called “zero hiding,” quite intuitively. If you

recall, our compression algorithm removed the least important DCT coefficients. It follows, then,

that we could put the bit stream we wish to hide back into these dropped coefficients without

changing the image drastically. To do this, though, there must be a way to distinguish a zero that

resulted from a dropped coefficient from a coefficient that was already zero. We therefore ran the image

through a modified compressor that, instead of dropping coefficients below the specified

threshold, replaced them with either a plus or minus one, depending on the sign of the coefficient.

Figure 1.3.

The DCT is taken and then each coefficient under the specified threshold (10) will be dropped. These coefficients are shown in blue in the picture on the right.

Next the hiding algorithm is given a binary data stream and the threshold value. The data stream is

then divided up into words. However, the maximum decimal value of the word must be less than

the threshold, since values over the threshold signify an important coefficient in the picture. We

then increment each word’s decimal value by one to avoid putting in zero valued coefficients,

which would otherwise be indistinguishable from zero valued coefficients in the original image.

We then go back to the original coefficients matrix and replace the ones with the new value of the

data word, maintaining the sign throughout.

Figure 1.4.

The dropped coefficients are replaced with words created from the data stream. The IDCT is then taken, transforming the coefficient matrix back to a picture matrix.

Data Retrieval

To recover the hidden data, the recovery script is given the threshold. It subtracts one from all

DCT coefficients below that threshold and concatenates their binary values, forming the original

binary data.
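The hiding and retrieval steps can be sketched together in Python. This is a simplified illustration that works directly on a flat list of integer DCT coefficients and ignores the rounding that happens when the image is converted back to pixels. The function names, the 3-bit word size, and the `n_words` argument (how many words to read back) are assumptions, not part of the original MATLAB scripts.

```python
def signed_compress(coeffs, threshold):
    """Modified compressor: nonzero sub-threshold coefficients become +1 or -1."""
    out = []
    for c in coeffs:
        if c != 0 and abs(c) < threshold:
            out.append(1 if c > 0 else -1)
        else:
            out.append(c)
    return out

def hide(coeffs, bits, threshold, word_bits=3):
    """Write (word + 1) into the +/-1 slots, keeping each slot's sign."""
    if 2 ** word_bits >= threshold:
        raise ValueError("largest word + 1 must stay below the threshold")
    if len(bits) % word_bits:
        raise ValueError("bit stream must be a whole number of words")
    words = [int(bits[i:i + word_bits], 2) + 1
             for i in range(0, len(bits), word_bits)]
    out, k = [], 0
    for c in coeffs:
        if abs(c) == 1 and k < len(words):
            out.append(words[k] if c > 0 else -words[k])
            k += 1
        else:
            out.append(c)
    if k < len(words):
        raise ValueError("not enough dropped coefficients for this message")
    return out

def recover(coeffs, threshold, n_words, word_bits=3):
    """Read back n_words words from the nonzero sub-threshold coefficients."""
    words = []
    for c in coeffs:
        if 0 < abs(c) < threshold and len(words) < n_words:
            words.append(format(abs(c) - 1, "0{}b".format(word_bits)))
    return "".join(words)

coeffs = [448, -292, 3, 31, -7, 0, 2, -1, 5]
marked = signed_compress(coeffs, threshold=10)
stego = hide(marked, "101001110", threshold=10)
message = recover(stego, threshold=10, n_words=3)
print(stego, message)
```

With a threshold of 10, a 3-bit word plus one is at most 8, so every hidden value stays below the threshold and cannot be mistaken for an important coefficient.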

1.6. Bit-O-Steg Method - Background

Data Hiding Methods


Previous Work and Background

In our research we found a steganographic method known as JSteg, created by Derek Upham. The

basic premise behind JSteg is that its algorithm hides the data sequentially within the least

significant bits of the DCT coefficients (Provos and Honeyman). The problem with JSteg is that it is

not very secure; there is no secret key with which it is encrypted. Therefore, anybody that knows

an image contains data with JSteg hiding can easily retrieve the hidden message. Our second

hiding method, which we have called bit-o-steg, improves upon the JSteg algorithm since we

employ the use of a key when hiding the data.

1.7. Bit-O-Steg Hiding

Data Hiding Methods


Hiding Information

Recall that our zeros hiding method inserts data into the dropped coefficients of the

DCT. The bit-o-steg algorithm hides data within the coefficients that were not dropped. The

critical part of bit-o-steg is the key used to encrypt the data. This user defined key selects which

nonzero coefficients to change and which bits to change within each coefficient. The simplest key

would be a key of [1]. This would change each coefficient sequentially and change the last bit in

the coefficient.

Figure 1.5.

The key is what makes bit-o-steg distinct from other algorithms. Here a key of [1 2] is applied to hide the data.

As you can see in Figure 1.5, we chose a key of [1 2]. The key will select the first coefficient and its

least significant bit and input the first bit of the hidden data into that coefficient bit. Then the key

will count two coefficients and take the second least significant bit and repeat the hiding process.

Since this is the end of the key, it repeats, selecting the next coefficient. The length of this key has

no real bound, but it must ensure that all data is hidden before reaching the last DCT coefficient in

the image. The key values themselves must also fall within a certain range. Since

each value selects a bit position, values between one and eight must be used. Larger values within this range

alter the image more, since they change more significant bits.

Figure 1.6.

Minimal changes have been made to the picture matrix after the application of the bit-o-steg algorithm

Retrieving the Data

Retrieving the data requires the key that was used to hide it. Once you

have the key, you simply apply it in reverse, extracting rather than inserting the bits, and

reconstruct the hidden data stream from those bits.
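The keyed hiding and retrieval can be sketched in Python as below. This follows the description above under stated assumptions: `coeffs` is the list of nonzero coefficients that survived compression, each key entry k both advances k coefficients and selects the k-th least significant bit, and bits are written into the magnitude so the sign is preserved. The function names are hypothetical and this is not the project's `stegcompress.m` or `secread.m`.

```python
from itertools import cycle

def key_positions(n_coeffs, key):
    """Yield (index, bit) pairs. Each key entry k advances k coefficients
    and selects the k-th least significant bit (k = 1 is the LSB)."""
    index = -1
    for k in cycle(key):
        index += k
        if index >= n_coeffs:
            return
        yield index, k - 1

def hide_bits(coeffs, bits, key):
    """Write each message bit into the coefficient and bit chosen by the key."""
    out = list(coeffs)
    pairs = list(zip(bits, key_positions(len(out), key)))
    if len(pairs) < len(bits):
        raise ValueError("image too small for this message and key")
    for b, (index, bit) in pairs:
        sign = -1 if out[index] < 0 else 1
        magnitude = abs(out[index])
        magnitude = (magnitude & ~(1 << bit)) | (int(b) << bit)
        out[index] = sign * magnitude
    return out

def extract_bits(coeffs, n_bits, key):
    """Apply the key in reverse: read the selected bits instead of writing them."""
    slots = key_positions(len(coeffs), key)
    return "".join(str((abs(coeffs[i]) >> bit) & 1)
                   for _, (i, bit) in zip(range(n_bits), slots))

coeffs = [448, -292, 31, 12, -17, 26]   # nonzero coefficients, all >= threshold
stego = hide_bits(coeffs, "1011", key=[1, 2])
message = extract_bits(stego, 4, key=[1, 2])
print(stego, message)
```

Someone reading the coefficients with a different key visits different coefficients and bit positions, so the recovered stream does not match the hidden message.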

1.8. Importance of Steganalysis


Importance of Steganalysis

Image steganalysis is the science of analyzing images to detect

hidden messages and data within them. Statistical digital signal processing is

often used to detect data within images.

It is important to be able to detect hidden messages within images. On the steganography side,

detection reveals ways to improve the algorithm implementing

steganography. By exposing the algorithm's flaws, the user can refine the algorithm

to make it more difficult to detect whether or not data is hidden in the images.

Steganalysis is also especially important for security, namely for monitoring a user’s

communication with the outside world. In the age of the Internet, images are sent via email or

posted on websites. Detecting whether or not data is hidden in the images allows the monitor

to further analyze the suspicious images in order to find the hidden message.

Figure 1.7.

Can you tell if there is hidden data?


1.9. Steganalysis - Zeros Hiding Detection


Zeros Hiding Detection

In order to detect data hidden with our zeros hiding method, we first analyzed the histogram of the

DCT coefficients of an uncompressed image, a compressed image without data, and a compressed

image with hidden data. The histogram of the DCT coefficients shows the number of times each

DCT coefficient value appears within the DCT matrix. For an uncompressed

image (Figure 1.8), the histogram has a smooth curve. In the histogram of a compressed image

(Figure 1.9), values below the threshold are dropped, so the counts for those values fall to zero.

The histogram of a compressed image with data (Figure 1.10) has a shape similar to that of an

uncompressed image. However, the counts are much lower, which makes sense since we are

replacing the values that were originally going to be dropped with data. It is therefore statistically

unlikely that a dropped value is replaced with the same value.

Figure 1.8.

Figure 1.9.

Figure 1.10.

After analyzing the histograms of the different types of images, we analyzed the

L2 norm of the DCT matrix. If the analysis finds no power in the one-valued DCT coefficients,

the image is compressed. This is because ones are the smallest values that can be

dropped. If there is power in the ones, then the image is either uncompressed or contains hidden

data. The key difference between the two is the magnitude of the power in the ones. Statistically,

it is unlikely that every dropped coefficient gets replaced with a one. Therefore, the magnitude

of the power in the ones in an image with data is lower than in an uncompressed image. An image with

hidden data will on average fall below a certain threshold. This threshold is dependent on the

image size. Figure 1.11 shows the plots of the power without data, the power with data, and the

threshold. Clearly, the power without data is greater than the power with data. We found our

detection program to have a 90% success rate, with a false positive 12% of the time.
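The decision rule described above can be sketched in Python. This is only an illustration of the logic, not the project's `detect_data_power.m`: the function names are hypothetical, "power" is taken here to mean the sum of squares of the one-valued coefficients, and the `data_threshold` value in the example is made up, since the real threshold depends on image size.

```python
def power_in_ones(coeffs):
    """Sum of squares of the coefficients whose magnitude is exactly one."""
    return sum(c * c for c in coeffs if abs(c) == 1)

def classify(coeffs, data_threshold):
    """Three-way decision based on the power in the one-valued coefficients."""
    power = power_in_ones(coeffs)
    if power == 0:
        return "compressed, no hidden data"
    if power < data_threshold:
        return "hidden data suspected"
    return "uncompressed"

# Toy coefficient lists; data_threshold=5 is an arbitrary illustrative value.
compressed = [448, -292, 0, 31, 0]
with_data = [448, 6, -2, 7, -1, 1]
uncompressed = [448, 1, -1, 1, 1, -1, 1, 3]
results = [classify(c, data_threshold=5)
           for c in (compressed, with_data, uncompressed)]
print(results)
```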


L2 Norm Equation
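For reference, the standard definition of the L2 norm of a vector x of n coefficients is given below. The symbols are assumptions chosen for illustration; the "power" discussed in this section corresponds to the square of this quantity.

```latex
\|x\|_2 = \sqrt{\sum_{i=1}^{n} |x_i|^2}
```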

Figure 1.11.

1.10. Steganalysis - Bit-O-Steg Detection


Bit-o-steg detection

Due to the complexity of bit-o-steg, we turned to previous research to find a viable detection

method. Each entry in the 8x8 blocks has a specific probability distribution. The distribution is

found by looking at the values of that entry slot across the entire image. Figure 1.12 shows a

histogram of an entry without data. The histogram looks at the DCT coefficient value and counts

how often that value appears within that entry slot. Figure 1.13 shows a histogram of an entry with

data. Comparing the two figures, there is a sudden drop around the 0 value in the histogram of an

entry with data. The histogram of an entry with data also appears to smooth out.
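Building the per-entry histogram can be sketched in a few lines of Python. The function name `slot_histogram` and the tiny 2x2 "blocks" are assumptions for illustration; the project works on 8x8 blocks.

```python
from collections import Counter

def slot_histogram(blocks, row, col):
    """Count how often each coefficient value occurs at (row, col)
    across all blocks of the image."""
    return Counter(block[row][col] for block in blocks)

# Three toy 2x2 coefficient blocks standing in for the 8x8 blocks of an image.
blocks = [
    [[120, 14], [0, -3]],
    [[118, 14], [2, -3]],
    [[121, 13], [0, 0]],
]
hist = slot_histogram(blocks, 0, 1)
print(hist)
```

Comparing such a histogram for a suspect image against the one expected for clean images is the basis of the detection idea discussed here.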

These distributions are defined by their own characteristic functions. The bit-o-steg hiding distorts

that distribution by changing certain entries, thus altering the function. Using the inner

product, we could test for a match between the characteristic function and the suspect image’s

probability distribution. Unfortunately, the distribution functions vary based on the subject of the

picture. Furthermore, we lack the statistical background necessary to classify these distributions

and properly identify the characteristic functions. Thus, implementing bit-o-steg detection proved

to be beyond the scope of this project.

Figure 1.12.

Figure 1.13.

1.11. Future Considerations and Conclusions

Future Considerations and Conclusion

Future Work

Due to time and computing limitations, we could not explore all facets of steganography and

detection techniques. As you saw, we studied the power in our pictures to test for hidden data.

Another method which we were unable to explore was to analyze the noise of the pictures. Adding

hidden data adds random noise, so it follows that a properly tuned noise detection algorithm could

recognize whether or not a picture contains steganographic data.


We explored several steganography techniques and the various detection algorithms associated

with them. By using the properties of the DCT and our understanding of the frequency domain we

developed the zeros hiding method. Zeros hiding proved to be easier to analyze than bit-o-steg and

can hide significantly more data. Unfortunately its ease of detection makes it a less secure

method. After researching various techniques already implemented, we chose to improve upon

one, thus creating our bit-o-steg method. Bit-o-steg can only hide data in coefficients that were

not dropped, thus limiting the amount of data we can hide. However, it greatly enhances the

effectiveness of the steganography since it uses a key, making it much more challenging to detect.

In the end we found both effective, but the complexity of bit-o-steg makes it more promising.

Detection of our methods was critical to the breadth of our project. By investigating the power in

various components of our images we discovered how to detect data hidden via the zero hiding

method. Detecting bit-o-steg required us to draw on past steganography research and statistically

analyze the effects of this type of data hiding. The methods and accompanying detection schemes

we developed broadened our understanding of steganography, which, unlike encryption, allows

secret data to change hands without raising an eyebrow.

1.12. Works Cited

Works Cited

Cabeen, Ken, and Peter Gent. “Image Compression and the Discrete Cosine Transform.” College of the Redwoods. <http://online.redwoods.cc.ca.us/instruct/darnold/laproj/Fall98/PKen/dct.pdf>

Johnson, Neil F., and Sushil Jajodia. “Exploring Steganography: Seeing the Unseen.” George Mason University. <http://www.jjtc.com/pub/r2026.pdf>

Johnson, Neil F., and Sushil Jajodia. “Steganalysis: The Investigation of Hidden Information.” George Mason University. <http://ieeexplore.ieee.org/iel4/5774/15421/00713394.pdf?

Judge, James C. “Steganography: Past, Present, Future.” <http://www.sans.org/rr/whitepapers/stenganography/552.php>

Provos, Niels, and Peter Honeyman. “CITI Technical Report 01-11. Detecting Steganographic Content on the Internet.” University of Michigan. <http://www.citi.umich.edu/techreports/reports/citi-tr-01-11.pdf>

Provos, Niels, and Peter Honeyman. “Hide and Seek: An Introduction to Steganography.” University of Michigan. <http://niels.xtdnet.nl/papers/practical.pdf>

Sallee, Phil. “Model-based Steganography.” University of California, Davis. <http://redwood.ucdavis.edu/phil/papers/iwdw03.pdf>

Silman, Joshua. “Steganography and Steganalysis: An Overview.” <http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=241>

Wang, Huaiqing, and Shuozhong Wang. “Cyber warfare: Steganography vs. steganalysis.” Communications of the ACM, Volume 47, Number 10. <http://acmqueue.com/modules.php?


1.13. Steganography Matlab Code

detect_data_power.m Detects if the given image has data hidden in it with the zeros hiding method

hidden_zeros_read.m Reads data hidden by zeros hiding method from an image

image_hist.m Creates histogram of DCT coefficients

image_stat.m Determines the threshold value for zero hiding detection

invimageproc.m Takes tiled matrix and converts to image matrix

jvector.m Traverses 8x8 block using the zig-zag pattern

mat2DCEB.m Compresses image and returns tiled matrix

readdata.m Reads in binary data from a file

secread.m Reads in data hidden by bit-o-steg from an image

signed_mat2DCEB.m Modified compressor used in zeros hiding

std_stegcompress.m Groups zeros together and saves image

stegcompress.m Takes a compressed image and hides data in it using bit-o-steg

writedata.m Writes binary data to a file

1.14. Group Members

Group Bio

Figure 1.14. Elliot Ng, Jones 2007

Figure 1.15. Bryan Grandy, Brown 2007

Figure 1.16. Charlie Ice, Brown 2007

Figure 1.17. Danny Blanco, Jones 2006


Chapter 2. Investigation of Delay and Sum Beamforming Using a Two-Dimensional Array

2.1. Delay and Sum Beamforming with a 2D Array: Introduction


1. Introduction

Beamforming is the discipline that takes a set of microphones, usually in an array, and a set of

point source signals (in a space that we assume to be R^3 or very nearly so) and tries to focus on

one signal source to the exclusion of both noise and other signal sources. In this project, we assume

a single source and use the old and powerful technique of delay and sum beamforming

implemented over a 2-dimensional array arranged as a 3 by 3 square with the center missing.

Having a 2-dimensional array allows the location of a source to be determined up to an ambiguity

of reflection across the plane of the array.

2. Delay and Sum Beamforming

Delay and sum beamforming is quite true to its name as it merely takes the set of signals, delays

and maybe weights them by varying amounts, and then adds them all together. The size of the

delays is determined by the direction (for farfield) or point (for nearfield) at which the set of

microphones is aimed. Despite its simplicity, delay and sum manages to achieve optimal noise

suppression for the case of a point source in a background of white noise. Of course, normal signal

processing applies, and one can do better than just delay and sum if information about the signal

other than location is known a priori. For example, if it is known that the signal is bandlimited and

baseband, then a suitable lowpass filter can be applied to further suppress noise.
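A minimal Python sketch of delay and sum is shown below. It assumes integer, non-negative sample delays and equal-length signals. The function name and the toy signals are illustrative; the project itself was implemented in LabVIEW.

```python
def delay_and_sum(signals, delays, weights=None):
    """Delay each signal by a whole number of samples, weight it, and sum.
    Samples shifted in from before the start of a signal are treated as zero."""
    if weights is None:
        weights = [1.0] * len(signals)
    length = len(signals[0])
    out = [0.0] * length
    for x, d, w in zip(signals, delays, weights):
        for n in range(d, length):
            out[n] += w * x[n - d]
    return out

# The same pulse reaches three microphones at samples 2, 4, and 3.
signals = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
aligned = delay_and_sum(signals, delays=[2, 0, 1])     # delays matched to the source
misaligned = delay_and_sum(signals, delays=[0, 0, 0])  # array aimed elsewhere
print(max(aligned), max(misaligned))
```

When the delays match the source, the pulses add coherently. With the wrong delays they stay spread out, which is why the peak output can be used to search for the source direction.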

2.1 Nearfield Processing

Though we did not implement them, nearfield calculations are both more computationally intensive and

more accurate. If it is assumed that the microphones have some sort of center for distance, then the

center can be designated as the origin for the coordinate system. A point source at a point (xs, ys,

zs) would then emit a signal s(t). A microphone at a point (xm, ym, zm) would then receive a

signal m(t). Assuming that the signal propagates uniformly with speed v and that signal strength is

equal to the original signal strength divided by the square of the distance, we can conclude that the

received signal is:

Figure 2.1.
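Under the assumptions stated above (uniform propagation at speed v and strength falling off with the square of the distance), the received signal can be written out as follows. This is a reconstruction from the surrounding text, and the symbol r for the source-to-microphone distance is an assumption.

```latex
r = \sqrt{(x_s - x_m)^2 + (y_s - y_m)^2 + (z_s - z_m)^2},
\qquad
m(t) = \frac{1}{r^{2}}\, s\!\left(t - \frac{r}{v}\right)
```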

2.2 Farfield Processing

In this project, it was assumed that the array was always operating in farfield, an approximation in

which the source is assumed to be far enough away that the spherical waves it emits can be

approximated with plane waves. It is accurate in the limit where the distance between the

microphones and the source is large enough so that the angle between the source and each

microphone does not change significantly.
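Under the plane-wave approximation, the relative delay at each microphone depends only on the direction of the source. A small Python sketch, assuming a speed of sound of 343 m/s, microphone coordinates in meters, and a direction given by azimuth and elevation angles in radians (the names and the angle convention are assumptions), could look like this:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def farfield_delays(mics, azimuth, elevation, v=SPEED_OF_SOUND):
    """Arrival time at each microphone relative to the array origin for a
    plane wave arriving from the given direction. Negative values mean the
    microphone hears the wave before the origin does."""
    ux = math.cos(elevation) * math.cos(azimuth)
    uy = math.cos(elevation) * math.sin(azimuth)
    uz = math.sin(elevation)
    return [-(x * ux + y * uy + z * uz) / v for (x, y, z) in mics]

# Three microphones lying in the z = 0 plane.
mics = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
along_x = farfield_delays(mics, azimuth=0.0, elevation=0.0)
above = farfield_delays(mics, azimuth=0.3, elevation=0.5)
below = farfield_delays(mics, azimuth=0.3, elevation=-0.5)
print(along_x)
```

For a planar array, a source above the plane and its mirror image below the plane produce the same delays, which is the reflection ambiguity mentioned in the introduction.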

3. Complications

3.1 Time Quantization

Since all of the processing is done in a digital environment, we must work with samples of the

signals and not the signals themselves. Because of this, it is not possible to implement an arbitrary

time shift as any shift must be done in increments of the sample period. To remedy this, the

signals were interpolated digitally by upsampling them and then putting them through a lowpass

filter with cutoff corresponding to the amount of upsampling. An equiripple filter was chosen for

the lowpass filter as there appear to be no constraints as to the exact shape of the filter and

because an equiripple filter would avoid the Gibbs phenomenon found in a direct approximation of

an ideal lowpass filter. Using this interpolation, greater resolution can be achieved in the time

shifts, though the drawback is the large amount of additional data that must now also be

processed. In fact, even though the concept of delay and sum is incredibly simple, the amount of

computation that must be done because of the upsampling is often prohibitively high. As far as accuracy

is concerned, the amount of interpolation cannot be too high, but if it is too low, then the

estimated direction of the source may be inaccurate or entirely wrong, as the algorithm will

be unable to shift the signals closely enough for them to match.
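To give a feel for the numbers, the short Python calculation below works out the smallest available time shift, and the path-length difference it corresponds to, for the 8000 Hz sampling rate and upsampling factor of 10 reported later in this chapter. The speed of sound (343 m/s) and the function name are assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
SAMPLE_RATE = 8000.0    # Hz, per-microphone rate reported in this chapter
UPSAMPLE = 10           # approximate upsampling limit reported in this chapter

def shift_resolution(sample_rate, upsample=1, v=SPEED_OF_SOUND):
    """Return (smallest time shift in seconds, equivalent path difference in meters)."""
    step = 1.0 / (sample_rate * upsample)
    return step, v * step

coarse = shift_resolution(SAMPLE_RATE)
fine = shift_resolution(SAMPLE_RATE, UPSAMPLE)
print("no upsampling: %.1f us, %.2f cm" % (coarse[0] * 1e6, coarse[1] * 100))
print("10x upsampling: %.2f us, %.3f cm" % (fine[0] * 1e6, fine[1] * 100))
```

Without interpolation, one sample of shift corresponds to several centimeters of path difference, which is comparable to the spacing of a small array.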

3.2 Aliasing, Resolution, and Sampling Frequency

Given a fixed sampling frequency, there is always the “normal” aliasing associated with the Nyquist

Theorem, restricting the fully reconstructable signals to those that are bandlimited to half of the

sampling frequency. Something similar occurs with array spacing, and if proper care is not taken,

aliasing may occur in spatial dimensions. Using the spatial analogue of the Nyquist Theorem, the

spacing between adjacent microphones must be at most half the wavelength corresponding to the

highest frequency present. Thus, to achieve any resolution at all for higher frequency signals,

smaller arrays must be used; however, with a smaller array, the precision with which a direction

can be determined is diminished. It appears that there is an uncertainty principle at odds with

beamforming in its spatial dimensions.
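The half-wavelength rule can be turned into a quick calculation. The Python sketch below assumes a speed of sound of 343 m/s; the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def max_spacing(f_max, v=SPEED_OF_SOUND):
    """Largest microphone spacing (meters) that avoids spatial aliasing
    for signals containing frequencies up to f_max (Hz)."""
    wavelength = v / f_max
    return wavelength / 2.0

# 4000 Hz is the highest frequency representable at an 8000 Hz sampling rate.
spacing_4k = max_spacing(4000.0)
spacing_1k = max_spacing(1000.0)
print("%.1f cm at 4 kHz, %.1f cm at 1 kHz" % (spacing_4k * 100, spacing_1k * 100))
```

Lower-frequency sources tolerate a wider, and therefore more direction-sensitive, array, which is the trade-off described above.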

3.3 Unknown Source Location

This is the main focus of the project: to try to locate a source using an array of microphones and

then focus the array in the direction of the source, obtaining greater suppression of noise than

would be possible using only one microphone. Since the direction of the source is unknown, we

decided to scan for the source by sweeping all possibilities. This is where the far field

approximation significantly reduces computational complexity. Using nearfield, any algorithm

would be forced to evaluate all possible combinations of three coordinates. With farfield, there are

only two angles to deal with as opposed to three coordinates so there is far less to compute.

Due to lack of computing power, we were forced to make a few, less-than-desirable assumptions

in order to make the algorithm run at all without crashing. One of these simplifications was using

only three of the microphones to perform the sweep of possible angles. A further simplification

was to assume that the three microphones could be broken into two pairs in the calculations for

determining the pair of angles from which the maximum was coming. Further, hardware and

computer limitations limited sampling to a rate of 8000 Hz from each of eight microphones and

made the processing cost of upsampling prohibitive beyond a factor of around 10.
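To see why the upsampling factor matters, the following sketch estimates how many sample periods the largest delay between two adjacent microphones spans, before and after upsampling. The speed of sound is an assumed value, and the 5.5 cm spacing is taken from the hardware setup; the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed
SPACING = 0.055          # m, adjacent-microphone spacing of the array
FS = 8000.0              # Hz, sampling rate per microphone

def max_delay_in_samples(upsample_factor):
    """Largest adjacent-microphone delay, measured in (upsampled) sample periods."""
    max_delay_s = SPACING / SPEED_OF_SOUND   # wave travelling along the array
    return max_delay_s * FS * upsample_factor

for n in (1, 10):
    print(n, round(max_delay_in_samples(n), 2))
```

Under these assumptions the largest delay is only a little over one sample at the raw rate, so integer shifts can distinguish very few directions; upsampling by 10 stretches the same delay to roughly a dozen samples.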

2.2. Hardware Setup for Perimeter Array of Microphones


Hardware Setup for Perimeter Array of Microphones

Choosing Microphones

Before you can do cool things such as beamforming by processing signals from microphones, you

need a way to gather the signals. A few essential components in this are the microphones

themselves, the analog-to-digital converter, and in many cases a preamplifier for the microphones.

The microphones we used in our project were Electret Microphone Elements. We chose these because they:

are omni-directional

have a good, even, frequency response

run off a battery

are small (and inexpensive too)

Data Acquisition

The analog-to-digital converter we used was a DAQPad device generously lent to us by National



The DAQ device requires an input signal of 50 mV to 10V, and the Microphone Elements output

signals in the 20-200 μV range. We therefore built preamplifiers that took in the signal from the

microphones and amplified it 3158.44x. We used LM324 quad operational amplifiers, with 56.2

kOhm and 1 kOhm resistors. At the input of each amplifier is a 2.2 μF capacitor designed to

eliminate a slight DC offset produced by each microphone.
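The quoted gain is consistent with two cascaded stages, each with a gain magnitude of 56.2 kOhm / 1 kOhm = 56.2, since 56.2 squared is 3158.44. The two-stage inverting topology is an inference from those numbers rather than something stated in the text, so the sketch below should be read as a plausibility check only.

```python
R_FEEDBACK = 56.2e3  # ohms
R_INPUT = 1.0e3      # ohms

stage_gain = R_FEEDBACK / R_INPUT   # gain magnitude of one inverting stage (assumed topology)
total_gain = stage_gain ** 2        # two cascaded stages (assumed)

mic_min, mic_max = 20e-6, 200e-6    # volts, microphone output range
out_min = mic_min * total_gain
out_max = mic_max * total_gain
print(round(total_gain, 2), round(out_min, 4), round(out_max, 4))
```

With that gain, the 20-200 μV microphone range maps to roughly 63 mV to 0.63 V, which sits inside the 50 mV to 10 V input window of the DAQ device.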


Figure 2.2. PreAmplifier Schematic with LM324

The configuration for preamplifiers with an LM324 opamp.

Putting it all Together

The next step is putting everything together. We built the perimeter array of microphones in a

shallow box. To make a perimeter array we put 8 microphones around the edge of a

square, 5.5 centimeters apart. The preamps and batteries were housed under the box, with the

outputs coming from the side to connect to the DAQ card.


Figure 2.3. Microphone Perimeter Array

The box housing the perimeter microphone array and preamps.

2.3. Labview Implementation of 2D Array Delay and Sum Beamformer*


Our project involves using a two-dimensional array of microphones to determine the direction from which a mystery signal comes. This involves taking data from the microphones, analyzing the data, and then outputting the results. We chose to implement the second step, the analysis, in

Labview. In order to interface correctly with the DAQ card we had available, we used Labview


The Labview implementation of our project involved several stages. First, we wrote a VI designed

to get the input from the microphones by sampling the eight inputs to the DAQ card, and to separate

the resulting data into eight arrays, each holding a digitized signal. We then upsampled each

array (using a separate VI for that purpose) and passed the upsampled signals to the main analysis



The main analysis VI tests three of the signals by taking two of them and testing each individually

against the third. This test involves delaying one of the two signals over a range of candidate delays and, for each delay, taking the norm of the delayed

signal and the third signal. Each norm is collected in an array, from which the max norm,

corresponding to the correct delay between the two signals, can be found.
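A minimal sketch of such a delay search follows. It assumes that the norm is taken of the sum of the reference signal and the shifted signal (so that well-aligned signals reinforce each other) and that shifts pad with zeros; both are assumptions, and the function names are hypothetical.

```python
import math

def shift(signal, k):
    """Delay a list by k samples (k may be negative), padding with zeros."""
    n = len(signal)
    if k >= 0:
        return [0.0] * min(k, n) + signal[: max(n - k, 0)]
    return signal[-k:] + [0.0] * min(-k, n)

def best_shift(reference, other, max_shift):
    """Return the integer shift of `other` that maximizes ||reference + shifted||."""
    best_k, best_norm = 0, -1.0
    for k in range(-max_shift, max_shift + 1):
        shifted = shift(other, k)
        norm = math.sqrt(sum((a + b) ** 2 for a, b in zip(reference, shifted)))
        if norm > best_norm:
            best_k, best_norm = k, norm
    return best_k

reference = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
late = shift(reference, 2)          # same pulse arriving 2 samples later
found = best_shift(reference, late, 4)
print(found)
```

In this toy case the search returns the shift that moves the late copy back by two samples, which is the shift that lines the two pulses up.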

If we know the correct delays between three signals, we can do some mathematics (explained in

greater depth in the section on the delay generation vi) and derive the angles the signal is coming from.

As a two-dimensional array has the ability to discern a direction in three-dimensional space, there

are two angles found here: theta, the angle along the xy plane, and phi, the angle relative to the z axis.

From the angles found, we can then calculate the appropriate delays to be applied to the other five

signals. Finally, we take all eight signals, delay them appropriately, and add them together to get

our final result. This is known as delay and sum beamforming.
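The final delay-and-sum step can be sketched as follows, assuming integer sample shifts with zero padding at the edges. The sign convention (a negative shift advances a late channel) and the function names are assumptions for illustration.

```python
def shift(signal, k):
    """Delay a list by k samples (k may be negative), padding with zeros."""
    n = len(signal)
    if k >= 0:
        return [0.0] * min(k, n) + signal[: max(n - k, 0)]
    return signal[-k:] + [0.0] * min(-k, n)

def delay_and_sum(signals, shifts):
    """Shift each signal by its integer delay and add them sample by sample."""
    total = [0.0] * len(signals[0])
    for sig, k in zip(signals, shifts):
        for i, value in enumerate(shift(sig, k)):
            total[i] += value
    return total

base = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
channels = [shift(base, d) for d in (0, 1, 2)]   # arrivals 0, 1 and 2 samples late
aligned = delay_and_sum(channels, [0, -1, -2])   # undo each arrival delay
print(aligned)
```

Once the channels are aligned, the pulse adds coherently and the summed output is three times the single-channel pulse in this three-channel example.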

Waveform Generation VI

Most of the work here was done for us already, through Labview's Generate Waveforms VI, a

module that, given certain information about an attached DAQ card, sampling rate, time to be

sampled, etc., will seek out that DAQ card, sample the requested channels, and return the results

in a two-dimensional array of doubles, where one dimension indexes the sample time and the

other indexes the channel that was sampled.

Figure 2.4. Waveform Generation VI

VI we created to sample the microphones and upsample the resulting arrays

Our module took the data from said VI and separated it into eight one-dimensional arrays, one for

each microphone. (This was an essential step, as many of the array analysis functions that we

wished to use would only work with one-dimensional arrays.) Using our Upsampling VI

(discussed below), we then upsampled the signals, lowpass filtered them to interpolate the signal, and set the eight filtered and upsampled signals as the output of this VI. This module takes as an

input N, the amount by which the signals should be upsampled, and an input fs, the sampling


Upsampling VI


From the beginning, we knew that the sampling rate of the DAQ card would be restricted, and that

the buffer would be effectively decreased by a factor of eight for any one signal (since the data

from all eight signals comes into the same buffer). Whatever sampling rate was left would meet

the most basic Nyquist requirements and avoid aliasing in that fashion; however, the resultant

signal was unlikely to possess much resolution beyond that. Thus, upsampling would be a


We initially searched Labview itself for a premade upsampling VI, presuming that one would

exist, as it is a fairly common signal processing algorithm. However, we were unable to find one

and so set about creating a module that would do the job. Our module takes as inputs the signal

(array of points) to be upsampled and N, the amount the signal was to be upsampled, and passes as

an output the upsampled signal.

Figure 2.5. Upsampling VI

Sub-VI used to upsample a signal

Following upsampling theory discussed in class (ELEC 301: Signals and Systems), the first step of our upsampler was to zero-pad, that is, to insert zeros between successive points of the signal being

upsampled. Instead of attempting to implement a dynamic array, this was accomplished by

creating a new array of the appropriate length (N times the length of the original array, where N is

the amount the signal is being upsampled) and using a for loop to place the original signal

elements into the new array spaced N points apart.
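The zero-insertion step described above can be sketched in a few lines. The function name is hypothetical; the preallocated array and the loop placing samples N points apart follow the description.

```python
def zero_stuff(signal, n):
    """Return a list n times longer with the original samples n points apart."""
    out = [0.0] * (len(signal) * n)   # preallocate instead of growing dynamically
    for i, value in enumerate(signal):
        out[i * n] = value
    return out

stuffed = zero_stuff([1.0, 2.0, 3.0], 3)
print(stuffed)
```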

This enlarged array of data is then passed back to the Waveform Generation VI where it is lowpass filtered in order to fill in (interpolate) the new zeroed-out positions, and passed onward as an output of the Waveform Generation VI. The filter used in this operation is the Equi-Ripple

FIR low pass filter.
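To illustrate how a lowpass FIR filter fills in the zeroed positions, the sketch below swaps in the simplest possible interpolation filter, a triangular kernel that performs linear interpolation, in place of the equiripple design used in the project. It is a stand-in chosen for brevity, not a reproduction of the Labview filter, and the function name is hypothetical.

```python
def interpolate_linear(stuffed, n):
    """Lowpass a zero-stuffed signal with a triangular FIR kernel of length 2n - 1."""
    kernel = [1.0 - abs(j) / n for j in range(-(n - 1), n)]   # peak of 1.0 at the center
    half = n - 1
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for j, h in enumerate(kernel):
            idx = i + j - half          # centered convolution; the kernel is symmetric
            if 0 <= idx < len(stuffed):
                acc += h * stuffed[idx]
        out.append(acc)
    return out

zero_stuffed = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0]   # [0, 1, 2, 3] upsampled by 2
smooth = interpolate_linear(zero_stuffed, 2)
print(smooth)
```

The original samples pass through unchanged and each inserted zero is replaced by the average of its neighbors; the last value is pulled toward zero because the signal is treated as zero beyond its end.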

Delay Generation VI

This VI does the bulk of the mathematical analysis of the input signals. It takes as inputs the two

delays between microphones one and two, and one and four (derived from the calculations of max

norms in the Main Analysis VI) and outputs an array that contains theta, phi, and the corresponding delays for the seven microphones (the delay of the first microphone is

automatically set to zero). In all cases, the delays are scaled to correspond to the number of

indexes the corresponding signal should be shifted, instead of the actual real-time delay, as we

cannot shift a signal by fractional indexes.

d12 = k12 / (fs * N);

d14 = k14 / (fs * N);

phi = acos( sqrt (v^2 * (d12 ^ 2 + d14 ^ 2) ) / d ) * sign (d12 * d14);



theta = atan (d14 / d12);

d13 = d12 * 2;

d16 = d14 * 2;

d18 = d13 + d16;

d15 = d13 + d14;

d17 = d16 + d12;

The first part of the above code calculates the angles based on the spatial relations between the

three microphones (microphone 1, used, as we said before, as the origin, and microphones 2 and 4,

which can be found directly adjacent to microphone 1 in both directions). As you can see, it is

fairly simple geometry, complicated primarily by the scaling necessary to match the 'k' values

(integer values used to iterate the for loop) to their corresponding 'd' value (actual delay in time).

The second part of the code calculates the delay values for the remaining microphones;

again, due to the regular nature of our array, it is possible to calculate only two of the

delays outright and extrapolate the rest of the delays from those two, which is indeed what we

have done in an effort to reduce calculations and make the algorithm more efficient.

The final part of the code, not shown in the code above but which can be seen in the function node

in the figure below, involves the reciprocal of those first two lines; that is, rescaling all the 'd'

values found to 'k' values that can actually be used when shifting the signals prior to adding them


Figure 2.6. Delay Generation VI

Sub-VI that, given the values 'k12' and 'k14', will generate the shifts (in indices) of the seven microphones (with the shift of microphone 1 assumed to be zero) and the angles theta and phi that the signal came from.
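The sketch below walks through the same three stages in Python: scaling index shifts to times, deriving the angles, and extrapolating and rescaling the remaining delays. It assumes microphones 2 and 4 sit one spacing from microphone 1 along the x and y axes, a far-field plane wave, and a speed of sound of 343 m/s. It also uses atan2 and asin, so that phi is zero for a source directly overhead; the formula node shown earlier applies acos and a sign term to the same ratio, so this is an interpretation of the geometry rather than the original code. All names are hypothetical.

```python
import math

def delay_generation(k12, k14, fs, n, spacing, v=343.0):
    """Angles and per-microphone index shifts from the two measured shifts."""
    d12 = k12 / (fs * n)                 # index shift -> seconds
    d14 = k14 / (fs * n)
    theta = math.atan2(d14, d12)         # atan2 avoids dividing by zero when d12 == 0
    sin_phi = v * math.hypot(d12, d14) / spacing
    phi = math.asin(min(1.0, sin_phi))   # clamp: quantized shifts can overshoot 1
    # Remaining delays follow from the regular grid, as in the formula node
    d13, d16 = 2 * d12, 2 * d14
    d15, d17, d18 = d13 + d14, d16 + d12, d13 + d16
    delays = [0.0, d12, d13, d14, d15, d16, d17, d18]
    shifts = [round(d * fs * n) for d in delays]   # seconds -> index shift
    return theta, phi, shifts

theta, phi, shifts = delay_generation(k12=3, k14=4, fs=8000, n=10, spacing=0.055)
print(shifts)
```

Because every delay is an integer combination of the two measured ones, the rescaled shifts come out as whole numbers without any further rounding error.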

Main Analysis VI

This is our top-end module, where all the modules mentioned in the previous section are brought

together in the same VI and linked together in the proper ways so as to create a working project.

Figure 2.7. Main Analysis VI

The culmination of our struggles with Labview 5.1, our top-end module which does ... well ... everything.

First, not unexpectedly, there is a call to the Waveform Generation VI, which provides us with our collected and upsampled signals. From that sub-VI, the signals from microphones 1, 2, and 4

are taken, with microphones 1 and 2 passed to one for loop and 1 and 4 passed to the other. Within each

for loop, as mentioned before, one signal is shifted relative to the other, and the norm taken, for

all delay values possible. The results are concatenated into an array and the maximum norm is

found; the location of the maximum norm gives the value of the delay, or as close as we can

get with the sampling resolution we have.

These shift values (the integer index corresponding to as close as we can get to the ideal time

delay) are passed to the Delay Generation VI, which then returns an array of values. The theta and phi values function as outputs to the front panel, and then the delay (shift) values are used to

set the necessary shift for their corresponding microphone. Finally, the shifted output arrays are

all summed (using a for loop, as a point by point summing module also seemed to be among those

useful things not premade in Labview 5.1), and the output of the for loop, the array that is the sum

of all the previous ones, is then attached to a waveform graph, also on the front panel.

Figure 2.8. An Example Result

As the titles state, the upper waveform is that of the first signal (unmodified in any way), and the second that of the final, delayed and summed signal. Note how the latter signal is somewhat smoother and the noise level reduced in comparison to the signal itself (a series of claps). The two numbers at the bottom correspond to the computer's calculation of what direction the signal is coming from.

Phi is measured such that straight up is at zero, along the xy plane at 90 degrees. Theta is

measured with the "bottom" of the array (although it can of course be reoriented as the user

pleases), that is, the negative y direction, as zero degrees. The signs of the angles indicate the

direction of propagation of the wave, and are thus opposite to conventional intuition, and the

sign of phi is, of course, impossible to determine with any degree of accuracy due to the up-

down ambiguity inherent in a two-dimensional array.

Success! (For a deeper exploration of our results, please continue to the results module.)

Labview Code

Upsampling VI

Waveform Generation VI

Delay Generation VI

Main Analysis VI

2.4. Results of the Testing of 2D Array Beamformer*




In this section, we discuss the results that we received upon testing our two-dimensional array

delay and sum beamformer. The output we designed for our system was relatively simple: the waveform for the delayed and summed signal, the waveform for the first signal (displayed for

comparison purposes), and the calculated values for theta and phi, the two angles used to express where in space our signal could be found.

Figure 2.9.

Our first example of successful output.

Note how the delayed and summed signal, as it should, bears a striking similarity to the initial

signal. However, due to the nature of the delay and sum algorithm, the final result has a great deal

higher magnitude, and the signal-to-noise ratio is higher. This is due to the fact that the signal

contributes the most to the inner product calculations done to find the proper delays, and thus the

signal is matched up properly; the noise, on the other hand, being essentially random, is additive

in some places and destructive in others, leading to an overall relative decrease in noise.
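The size of this effect can be estimated with a small simulation. The sketch assumes perfectly aligned copies of the signal and independent, equal-power Gaussian noise on every channel, which is an idealization of the real array; the function name and parameters are hypothetical.

```python
import math
import random

def snr_gain(num_mics, num_samples=20000, seed=0):
    """Estimate the power SNR improvement from summing aligned noisy copies."""
    rng = random.Random(seed)
    signal = [math.sin(2 * math.pi * 0.01 * i) for i in range(num_samples)]
    noise_sum = [0.0] * num_samples
    single_noise_power = 0.0
    for _ in range(num_mics):
        noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
        single_noise_power += sum(x * x for x in noise) / num_mics  # average per channel
        noise_sum = [a + b for a, b in zip(noise_sum, noise)]
    signal_power = sum(s * s for s in signal)
    snr_single = signal_power / single_noise_power
    # Aligned signal adds in amplitude (power grows by num_mics squared)
    snr_summed = (num_mics ** 2) * signal_power / sum(x * x for x in noise_sum)
    return snr_summed / snr_single

gain = snr_gain(8)
print(round(gain, 2))
```

Under these assumptions the power signal-to-noise ratio improves by a factor close to the number of microphones, about 9 dB for eight channels; correlated noise or imperfect alignment would reduce that figure.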

Figure 2.10.

A similar example, with a somewhat different signal

This signal, as can be seen, was somewhat closer to zero degrees and a bit higher off the xy plane

(smaller phi) than the previous example. We noticed rather quickly that, although it is a relatively

simple matter to vary theta (as that just involves moving around), it is less simple to gain

significant variation in phi: at far field distances, one must get a great deal higher off the ground

before the angle of the signal reaching the array changes significantly.

As it is, this particular phi value (at a significantly higher angle than before) was accomplished

by coming a great deal nearer to the array, thus bringing in possible complications due to the

fact that the signal source could now be reasonably considered as near field. However, even for

middling-near field sources, the far field approximation still holds to a certain extent, as

shown by the accuracy this data maintained.

Figure 2.11.

Did we mention we happened to choose the computer nearest to a loud fan?

Speaking of noise ... due to the existence of a relatively noisy fan quite near to our work area, the