- Master Thesis in Computer Science -
Audio Glasses
Visualizing sound for the deaf

SUMMARY
This thesis describes the first steps in the development of a visual representation of auditive information for the deaf. With this, deaf people can 'see' sounds, and even understand speech. Although this thesis intends to describe only the first steps, small-scale experiments have already shown that the concept works, and people can recognise many features of sounds, including slowly spoken speech, after only a few minutes of training. It's far less radical than a cochlea implant - and it's very easy to replace as technology gets better.
If you are not interested in the mathematical background, just skip chapter 2.
If you lack the necessary mathematical background, you can just read the rest of the text and view the images. Especially the images will clear up a lot.
To get an overall idea of what is done here, view the AVI-files below. Put the two legenda-images on your screen, and with that try to interpret each image in the AVI-files.

DOWNLOAD THESIS AS A POSTSCRIPT FILE (zipped, 4.3 MB)
(PREFERRED) - Warning: Size after unzipping is about 160 MB
or
READ THESIS ONLINE (HTML version)

Symbols in the HTML version may be displayed incorrectly on some systems.
If you are capable of viewing PostScript files, download the PostScript version!

The HTML-file was automatically generated from a LaTeX-file by the program TTH, version 2.24.

DEMO .AVI FILES
RINGING DOORBELL
VOICE SAYING "SHT!"
VOICE SAYING "OI UA"
VOICE SAYING "ONE MAN LIVING ON AN ISLAND"
INSTRUMENTAL MUSIC Not available online due to copyright issues

In order to understand these AVI films, take a look at the legenda:
PART 1 and PART 2.


DOWNLOAD THE SOFTWARE
MS-DOS executables
C-sources (tested for DOS, Unix and Linux with GNU C-compiler).

Please send your comments to hans.van.zutphen@home.nl


drs Hans van Zutphen
May 23, 1999


Vision on Sound

Development of a visual representation of auditive information for the deaf

MASTER THESIS IN COMPUTER SCIENCE, NUMBER 446

SEPTEMBER 11, 1998

logo.gif

Supervisor: Dr.Ir. Huub van Thienen
University of Nijmegen, the Netherlands
Hans van Zutphen
9305491
hansvz@sci.kun.nl

Contents

1  Introduction
    1.1  Defenition of the goal of the project
    1.2  Connections to other research areas
    1.3  Thesis build up
2  Finding frequencies in a signal
    2.1  The discrete Fourier Transform
    2.2  A continuous variant of the discrete Fourier Transform
    2.3  Determining the frequencies in the signal
    2.4  Differences between Fourier Transform and total peaks
    2.5  Suggestions for further research
3  Signal display
    3.1  Spiral display
    3.2  Block display
        3.2.1  Finding the frequencies that make a tone
            A first step
            The basic idea
            The elaborate idea
        3.2.2  The block display
    3.3  Line display
    3.4  Examples
    3.5  Suggestions for further research
4  Conclusions
A  Algorithms in pseudo code
    A.1  Transformation: Sample to Circular Fourier Trans -form
    A.2  Transformation: Circular Fourier Transformed to partial peaks
    A.3  Transformation: Partial peaks to total peaks
B  A sample implementation
    B.1  File formats
        B.1.1  The audio file format (.au)
            File buildup
            Example file
        B.1.2  The Circular Fourier Transform file format (.cf)
            File buildup
            Example file
        B.1.3  The partial peaks file format (.pp)
            File buildup
            Example file
        B.1.4  The total peaks file format (.tp)
            File buildup
            Example file: .tpa file
            Example file: .tpc file
            Example file: .tpf file
        B.1.5  The tone file format (.tn)
            File buildup
            Example file
        B.1.6  The block positions file format (.bp)
            File buildup
            Example file
        B.1.7  The pixels file format (.px)
            File buildup
            Example file: .pxs-file
            Example file: .pxb-file
        B.1.8  The wave file format (.wav)
            File buildup
            Example
        B.1.9  The image file format (.tga)
            File buildup
    B.2  Forward transformations
        B.2.1  Wave to audio format (wav_au__)
            Usage
            Examples
        B.2.2  Wave to audio format (au__cf__)
            Usage
            Examples
        B.2.3  Fourier Transform to partial peaks format (cf__pp__)
            Usage
            Examples
        B.2.4  Circular Fourier Transform to Fourier total peaks format (cf__tpf_)
            Usage
            Examples
        B.2.5  Partial peaks to added total peaks format (pp__tpa_)
            Usage
            Examples
        B.2.6  Partial peaks to close total peaks format (pp__tpc_)
            Usage
            Examples
        B.2.7  Total peaks to spiral pixels (tp__pxs_)
            Usage
            Examples
        B.2.8  Total peaks to tone format (tp__tn__)
            Usage
            Examples
        B.2.9  Tone to block positions format (tn__bp__)
            Usage
            Examples
        B.2.10  Block positions to block pixels (bp__pxb_)
            Usage
            Examples
        B.2.11  Pixels to Targa files (px__tga_)
            Usage
            Examples
    B.3  Backward transformations
        B.3.1  Tone to total peaks format (tn__tp__)
            Usage
            Examples
        B.3.2  Total peaks to audio format (tp__au__)
            Usage
            Examples
        B.3.3  Audio to wave format (au__wav_)
            Usage
            Examples
    B.4  Directionless transformations
        B.4.1  Add total audio files (add__au)
            Usage
            Example
        B.4.2  Smooth borders to audio files (brdr_au)
            Usage
            Example
        B.4.3  Add constant value to audio files (incr_au)
            Usage
            Example
        B.4.4  Multiply samples by a constant value (mult_au)
            Usage
            Example
        B.4.5  Add total peaks files (add__tp)
            Usage
            Example
        B.4.6  Change pixel brightness (brig_px)
            Usage
            Examples
        B.4.7  Decrease pixel brightness (decr_px)
            Usage
            Examples
        B.4.8  Limit pixels (limi_px)
            Usage
            Examples
        B.4.9  Move pixels (move_px)
            Usage
            Examples
    B.5  Functions
        B.5.1  Number of pixels with a brightness above a certain value (getabopx)
            Usage
            Examples
        B.5.2  Maximum pixel brightness value (getmaxpx)
            Usage
            Examples
        B.5.3  Number of pixels (getnumpx)
            Usage
            Examples
    B.6  Scripts
        B.6.1  VOS Script (v_script)
            Usage
            Example
            Script file layout
        B.6.2  Standard scripts
        B.6.3  Useful other programs
            Calculation programs
            String and line-variable handling programs
            Possibilities of script files
Index

Chapter 1
Introduction

All over the world, there are people who are deaf, and who can therefore hear nothing or almost nothing. They face all kinds of problems in society. They are not only unable to understand what others say, which can be partially solved by using sign language or lip reading, but moreover they cannot hear anything that happens around them. Squeaking breaks, tooting cars, sirenes, bicycle bells, a shouted warning if something happens, a sounds that indicates that there is someone nearby; none (or almost none) of these sounds are noted by someone who is deaf. For door bells, cooking alarms, (text-)phones and other signals in the house, the same is the case.

Several solutions for some of these problems have been developed. For example, there are devices that show a signal as a light that can be placed somewhere in the house. But this is only possible if the source of the signal can be tapped from the source so a light can be connected to it. It is not a solution if someone is in another house or on the street.

The consequences of this are that, in spite of these solutions, people with hearing problems still face more danger on the street. Also, as mentioned above, it is still impossible for them to hear whether there is someone close to them, which can make it frightning to go outside, especially when it's dark. In some countries deaf people have even been shot by the police, because they did not respond after something was shouted at them.

To solve these and other problems, a different approach is necessary. A solution that is already available is surgery: A special device is 'plugged' into a part of the brain. The idea is that a part of the brain starts recognising the incoming signal. For this, a five-year training is necessary, during which the patient learns to observe things like the difference between music and speech - without being able to recognise much of it. Very few of such operations are performed. The mere fact that people are willing to have an operation like this and to train for five years, shows how important even a small improvement in registering sounds is to them - and how poorly other solutions perform.

In this thesis, a way of displaying auditive information (sounds), such that as much information as possible about the original signal can be seen in it, is developed. Eventually this could be placed in for example a pair of glasses, or in a watch or bracelet. Hopes are that it is possible to learn to recognise the images in the display in the same way as this is done after the surgery. The results might be better than with the surgery since the user can also try to understand what he or she sees, and several displays can be shown simultaneously. But even if the results are a lot worse, the threshold to use it will be much lower than that to have surgery, which gives this an important advantage over the other method.

1.1  Defenition of the goal of the project

The eventual goal of the project that this thesis is about, is to develop a software package which enables researchers to try different ways of processing and displaying audio signals, the result of which can be shown to test persons, who can then tell which combinations of processing and displaying methods they feel to be most pleasant and informative. In order to reach this goal, some processing steps and display variants need to be developed first. The development of these is also described in this thesis.

Since the software that is to be written is used for evaluation purposes only, some real-time and other requirements are left out in this thesis. If the software is ever to be used in a real environment, time and other constraints need to be added.

1.2  Connections to other research areas

Except for computer science and digital signal processing, this project has connections to the following research areas:

1.3  Thesis build up

This thesis is build up as follows:
In chapter 2, a new method of analysing an audio signal is developed, which has some advantages over some widely used methods.
In chapter 3, a number of different ways to display the results of the analysis above are reviewed. Also an algorithm is developed which finds which frequencies belong to the same tone (where a tone is defined to be a base frequency together with all its harmonics).
These two chapters can be read separately.
In appendix A, the algorithms that were developed in the previous chapters are given.
Finally, appendix B contains a description of an implementation which can, among others, perform the entire analysis and create different displays of the results.

Chapter 2
Finding frequencies in a signal

In order to be able to display frequencies, it is of course first necessary to find these frequencies. There are various methods to do this. These methods usually perform very good for analyzing and transforming back to the original form. This is however no guarantee that the result of this analysis is very good for displaying.

A complication is that sounds can change very rapidly. Therefore it is necessary to analyze very short pieces of sound, for example pieces of 0.02 seconds. Unfortunately, according to the uncertainty principle ([Pol97]), there is always a tradeoff between the precision with which a frequency can be determined and the precision with which the time at which that frequency occurs can be determined. Since the size of the signal intervals is very small (for example 0.02 seconds), the time at which a frequency occurs is determined with a rather high precision. Therefore, the precision with which the frequency can be determined is rather small.

In this chapter, a new signal-processing technique is developed from the well-known Fourier Transform ([Pee95], [Sch94]). The target of the Fourier Transform is to transform the signal to a different space (frequency space instead of time space ), with the possibility of a backward transform back to time space. Note that the latter is not important for displaying the signal in frequency space. Therefore, the analysis that is to developed here does not need to be reversible.

The required properties for the analysis that is to be developed here, are:

To develop a new analysis with these properties, it is necessary to use a different description of the Fourier Transform than what is usually used. This description will be given in the next section.

2.1  The discrete Fourier Transform

Usually, the discrete Fourier Transform of a discrete function f is given by a formula:
F(u) = ån = -¥¥ f(n)e-i   2p  u   n, with i = Ö[(-1)] ([Sch94]).
To be able to derive a new analysis that fulfills the required properties from this, it is easier to use a different description. The discrete Fourier Transform can also be described as follows.

In the following description, it is assumed that there is a finite signal interval , that is a finite sequence of samples, of an even size n. An often used solution to calculate the Fourier Transform of such a signal is to consider it as being periodical (each position p + n corresponds to position p). This way, a finite signal interval can be seen as an infinite periodical signal with period n.

For each value xm = n/m,   m Î 01, 1, 2, .. 1/2n, do the following: Step through the signal with step size 1. Also, rotate around the circle, with steps of [(2p)/( xm)] for each step through the signal. For each sample in the infinite signal and corresponding direction to a position on the circle, draw a vector from the centre of the circle to this direction with as strength the sample value at the current position in the signal. For x1 this means that there is exactly one rotation around the circle for any signal interval [p, p+n  (which consists of n elements). Therefore, the direction of the vector for position p+n of the signal is always identical to that for position p. For x2 the same goes for any partial signal interval [p, p+n/2  (so for values p + 1/2n and p); for xm this is (approximately, see below) true for any (partial) signal interval [p, p+n/m .

omcirk1.gif
Figure 2.1: A signal interval is wrapped around the circle in one rotation. Left a signal with a sinusoid with one period, right a signal with one with 21/4 period.


Note that partial intervals do not always overlap precisely. Small shifts for interval parts [p, p + n/m  (n and m as defined above) exist if n is not dividable by m. In such cases, for each a Î N, the direction for position p+[(an)/( m)] of the interval on the circle is always in [p + ë[(an)/( m)]û,   p + é[(an)/( m)] ù].2 If [(an)/( m)] is a whole number, the direction for position p corresponds precisely with that for p + [(an)/( m)]. For the total interval Èi = 0m - 1 [p+[(in)/( m)],   p+[((i+1)n)/( m)] > = [p, p+n , [(an)/( m)] is n (since a = m), which is a whole number, so for any valid value of m, the vector for the value at position p+n of the signal interval is always identical to that for position p.

For any value xm = n/m,   m Î 0, 1, 2, .. 1/2n, the vectors for the signal interval [p, p+n  are placed in exactly m rotations around the circle.

For m = 1/2n, each subsequent vector for the corresponding subsequent sample from the interval is directed exactly opposite to the preceding one. (x1/2n = 2, so the rotation per step is [(2p)/ 2] = p, exactly a half circle.) If m is larger than 1/2n, that would mean that the size of the rotation around the circle would in fact get smaller, and in the opposite direction. The values that would be obtained that way are in fact identical to those that are already known from smaller values of m, so it is not necessary to try larger values than 1/2n. This corresponds to the Nyquist Theorem 3 ([RMY]).

Instead of drawing vectors, which can get very complicated, the idea can be displayed graphically in another way (although the idea stays the same). Imagine the signal interval to be a piece of elastic . Assume for a moment that the values of all samples in the sequence are between -1 and +1. Draw each sample value on the elastic. Use white for +1, black for -1, and levels of gray for values in between. Now, the light values on the elastic form an image of the sample values. Next, the elastic can be wrapped around the circle, as is displayed (for one rotation) in fig. 2.1. Note that in this image, the idea of considering the interval as periodical is expressed very clearly. Multiple rotations can be obtained by stretching the elastic further. To keep the image clear, each subsequent rotation is drawn a bit more to the outside, so the rotations can be separated from each other. (It might seem that the interval is shown as being limited and non-periodical. In fact all rotations should be projected at the same place, in which case the starting point and the end point of the interval would connect.) An example of this is shown in fig. 2.6.

Note that in the images values on the piece of elastic are shown continuously, while there should be a finite number of vectors. This is a simplification for clarity of the images; in calculations and formulas a finite number of vectors is used.

Since the piece of elastic stands for a number of vectors that originate from the centre of the circle, it is possible to determine the centre of gravity of these vectors. This is defined here as the average of the vectors (the sum of all vectors divided by the number of vectors). In the images this can be determined by the amount of light and darkness that is present in each direction in the rotations of the piece of elastic. White stands for a vector in the current direction, black stands for a vector in the opposite direction, gray stands for a zero-vector. In the images the centre of gravity is drawn as a little ball with a circle around it. The position of the centre of gravity in the drawn coordinate system for m rotations is equal to the complex value of the mth value of the Fourier Transform of the signal. The distance to the centre of the circle corresponds to the amplitude of the corresponding frequency in the signal. Both the distance to the centre and the direction from the centre can usually be determined very easily for simple signals.

In figures 2.2 - 2.9 is shown how the first value of the Fourier Transform (with m = 1) can be calculated. For other values of m it is a little more complicated (fig. 2.6), but the principle stays the same. In all the images, for the stretched signal (left) and the signal as it is wrapped around the circle (right), the arrow starts and ends at the same position.

In the description given so far, a very important simplification is possible. So far, the signal was assumed to be infinite and periodical, and the number of rotations around the circle was also infinite (although only a part of it was displayed). It can be easily seen that the result is exactly the same if instead of an infinite signal, only a finite signal interval [p0, p0+n  is wrapped around the circle. This follows from the fact that for each xm, the vector for the value of each position p is identical to that for position p + n. By induction this is also identical for each position p + r*n, r Î N. The centre of gravity can therefore also be determined by wrapping only one period around the circle. This simplification also follows from the Fourier-theory, and is used for example in programs that calculate the Fourier Transform of a signal of a finite size.

In the images, a number of properties can be seen.

xfou-a.gif
Figure 2.2: A signal with one wave: Maximal deflection

Figure 2.2 shows that if a signal exists of exactly one wave 4 , this wave will produce a maximal deflection (the distance between the centre of gravity and the centre of the circle) for one rotation.

xfou-b.gif
Figure 2.3: A larger number of waves: No deflection

xfou-c.gif
Figure 2.4: A larger number of waves: No deflection


Figures 2.3 - 2.4 show that this is not the case if the signal is a sinusoid with a (whole) larger number of waves.

xfou-d.gif
Figure 2.5: A wave with a different phase: The same deflection, a different direction


In fig. 2.5 is shown, that when the wave starts at a different position (so if the start phase is different), the deflection has the same strength, but a different direction. The direction of a value in the Fourier Transform is the phase angle of the starting point of corresponding wave in the interval. Note that this means that for determining the phase angle, the position at which the interval starts is important.

xfou-g.gif
Figure 2.6: Two waves, two rotations: A deflection identical to that of fig. 2.2

xfo1.gif
Figure 2.7: Some combinations of numbers of waves and rotations


In fig. 2.6 is shown what happens when the number of rotations around the circle is identical to the number of waves within the interval. This is shown more elaborately in fig. 2.7. In 1B and 2B is shown how the signal is wrapped around the circle when the number of rotations is identical to the number of waves within the interval. Clearly, the light and dark areas are at the same direction for each rotation, so they all add up. Therefore, the deflection in this case is always maximal. In 1A and 2A is shown what happens when the number of rotations around the circle is one less than the number of waves in the interval. As will be described in more detail below, each subsequent wave will be projected [1/( number of waves)] rotation earlier, which accounts for the fact that in the images the white and black areas shift in each rotation, while the total shift is one rotation (the end matches the beginning), contradictory to the rotation direction. (If m waves are displayed in m-1 rotations, each wave's size is [(m-1)/( m)] rotations. Each wave therefore gets 1/m rotations less than an entire rotation. There are m waves, so the total amount of missed rotations is m * 1/m = 1.). In every direction, the amount of white and black is identical. The deflection is therefore 0. In 1C and 2C is shown what happens when the number of rotations around the circle is one less than the number of waves. In this case, every subsequent wave is projected [1/( number of waves)] rotation later, which accounts for the fact that in the images the white and black areas shift in each rotation, while the total shift is one rotation (the end matches the beginning), but this time in the direction of the rotation. Here also the amount of black and white in each direction is the same, so the deflection is 0 again. (The explanation is almost identical to that of 1A and 2A.) In 1D and 2D is shown what happens when the number of rotations around the circle is two less than the number of waves. In this case, each subsequent wave is projected [2/( number of waves)] rotations later, which accounts for the fact that in the images the white and black areas shift in each rotation, while the total shift is two rotations (the end matches the beginning again, but so does the centre if the number of waves is even (2D)), as before in the direction of the rotation. (The explanation again is almost identical to that of 1A and 2A.) As described before and visible from the images, for each number of rotations that is not equal to the number of waves (if both are whole numbers) is 0.

xfou-e.gif
Figure 2.8: The sum of two signals: The deflection is the sum of the deflections of the separate signals (see fig. 2.2 and 2.4)


Figure 2.8 shows what happens when several signals are added . In 2.8, the signals from fig. 2.2 and 2.4 are added. The centre of gravity is the sum of the centres of gravity of the two separate signals. This means that adding a signal with centre of gravity 0 has no effect.

xfou-f.gif
Figure 2.9: A number of waves that is not a whole: The signal is smeared.


Figure 2.9 shows one of the problems of the Fourier Transform . Only frequencies in the signal that are such that the number of waves in the interval with size n is a whole number, correspond with a value of the Fourier Transform. This means that only in these cases, a sinusoid in the signal is represented by only one non-zero value in the Fourier Transform. If the number of waves is not precisely a whole number, a sinusoid is represented by multiple non-zero values in the Fourier Transform. Also, these multiple non-zero values have smaller absolute values than the one non-zero value in the first case. A sinusoid with a number of waves that lies between m and m+1 will cause lower values for xm and xm+1, instead of one higher value at xm or xm+1 if the number of waves would have been exactly m or m+1. Also, for all values xi, the closer i is to the number of waves, the higher the corresponding value in the Fourier Transform is. Therefore a rather large area of the Fourier Transform around the number of waves will have non-zero values that are rather high (this will be referred to as smearing ). This means that there can be huge differences between Fourier Transforms of almost identical signals. For example, a signal with exactly 2000 waves looks totally different from a signal with 2000.5 waves. A signal with 2001 waves looks almost identical to that of 2000 waves. This does not correspond at all to the way the human ear experiences these sounds, so this is not desirable.

2.2  A continuous variant of the discrete Fourier Transform

After comparison with other methods, it turned out that the result of the Circular Fourier Transform that is developed in this section is equal to that of the Short Term Fourier Transform . More information about this transform can be found in [Pol96]. The deduction is shown here anyway, since many useful characteristics of this transform are described in it.

The latter problem can be solved very easily. Instead of using m = 0,   1,   2,   ...,   1/2n rotations around the circle, it is also possible to decrease the differences between subsequent values for m. Let a  Î   \ < 0,   1 , m = 0,   a,   2a,   ...,   1/2n, then it is possible to see between the values with a whole number of rotations. In the example from the preceding paragraph: For a signal with 2000.5 waves, the value for m = 2000.5 is identical to the value for m = 2000 for a signal with 2000 waves. If a¯ 0, the Fourier Transform becomes continuous. This will be referred to as the Circular Fourier Transform ((F)).

1bp-0.gif
Figure 2.10: Circular Fourier Transform


Note that constructing this Circular Fourier Transform is not the same as making the Fourier Transform continuous. The construction that is used here, is made possible because of the simplification that was given in the previous section: Instead of an infinite signal, only a signal of a finite size (the interval [p, p+n ) is wrapped around the circle, which makes the number of rotations finite. Without this simplification, for any number of rotations (for the interval) that is not a whole number, each subsequent repetition of the (periodical) signal would be projected shifted 2p*(m-ëm û) radials further around the circle, where m is the number of rotations for the interval with size n. Due to this shifting and the fact that the number of rotations is infinite, the centre of gravity would always be 0. Only when 2p*(m-ëm û) = 0, so when m is a whole number, each subsequent interval [p+n,   p+2n  is projected exactly on the previous interval [p,   p+n , so the centre of gravity is not drawn back to 0. If the Fourier Transform was made continuous this way, any value between the original Fourier Transform values would be 0. Note that the simplification does not correspond to the definition of the Fourier Transform. Therefore, the construction of the Circular Fourier Transform implies that the definition of the Fourier Transform is let go5, and instead a simplified model of the discrete Fourier Transform on a finite signal is extended to a continuous transform.

A possible algorithm for calculating the Circular Fourier Transform is given in appendix A.1.

If a Circular Fourier Transform of a signal consisting of one sinusoid is drawn in a graph, another property becomes visible. Let f0 be the frequency of the sinusoid (in number of waves per signal interval with size n). In the graph (fig. 2.10), there is not only a large peak at f0, but there are also smaller peaks at all positions with distances 11/2, 21/2, 31/2, ... from f0, so at all positions f0 + (1/2+ b) and f0 - (1/2+ b), b Î N+. At positions f0 + b, b Î Z\{0} (so at all distances 1, 2, 3, ... from f0) there are lowest points , with value 0.

The lowest points can be explained as follows. Let f1 be a frequency, such that the peak for this frequency falls exactly on a value in the Circular Fourier Transform. (In fact, this can be any frequency within the range of the Circular Fourier Transform, if a is small enough.) For the value xm which holds this peak, m is identical to the number of waves of this frequency within the interval with size n. Each wave is projected around the circle the same way m times, which causes a strong deflection. (Note that if m is not a whole number, the number of rotations for a part of the circle, and therefore the number of vectors in that direction, is one more than that for the rest of the circle.) So, each wave is projected in exactly one rotation around the circle. Now consider the situation for m + b, b Î N+, with m the number of waves, m + b the number of rotations for the interval. The projection of one of the m waves in m + b rotations for the entire interval now takes [(m+b)/( m)] = 1b/m rotations. Each of these waves now costs b/m rotations more than before. (This value will be referred to as wave-deviation ). This means that any subsequent wave will be projected around the circle shifted b/m (the wave-deviation) rotations further. Since there are m waves, the total wave-deviation is m * b/m = b. Therefore the m waves are spread equally around the circle. These waves extinguish each other completely, making the centre of gravity 0. For some values for m and b this is shown in fig. 2.7. Here, the number of waves corresponds to m, the step size or number of rotations corresponds to m + b. Note that for any value for b ¹ 0, the centre of gravity is 0, since for any value for b the difference between the number of waves and the number of rotations is a whole number.

This explanation is also valid for values of b below zero, in that case with a wave-deviation in the opposite direction. From the description, the value m corresponds to f0 in the graph. Therefore, position m + b is shown in the graph as position f0 + b.

Now it also becomes clear why the centre of gravity is non-zero for the other positions. For example for b Î N+, at positions f0 + b and f0 + b + 1 the projections of the waves extinguish each other completely. On the positions in between, this cannot be the case, since the total wave-deviation b is not a whole number. For example for b = 11/2, the signal will have a total wave-deviation of 11/2. If the interval is projected until the total wave-deviation is a whole, the centre of gravity is zero. But at this point, only two thirds of the interval is projected. The last part has a wave-deviation of only 1/2. The centre of gravity becomes non-zero, since it is not extinguished to zero by a following part which would match precisely at the end of the chosen interval, which would make the wave-deviation 1 again.

The graph 2.10 which shows the absolute value of the Circular Fourier Transform of a signal consisting of one sinusoid that is derived here can be found in many texts on the Fourier Transform ([Sch94], [Pee95]). It corresponds to the Fourier Transform of a signal that exists of two values: one non-zero value from position 0 until position p Î R+, 0 after this position, for example the step function
h(x) = { A     x £ p
0     x > p

The absolute value of the Fourier Transform is |H[n]| = Ap|[(sin npp)/( npp)]|, n Î N.

Function h(x) can also be considered as a finite function ranging from position 0 to position p, consisting of a sinusoid with frequency 0, phase (for example) 1/2p. As the function H[n] is shifted to the right, the frequency within the finite part of the corresponding function h(x) goes up (as always in a Fourier Transform). By shifting H[n] pq position to the right, the frequency goes up such that the number of waves within the finite signal increases with q waves. Therefore, each sinusoid within this finite signal can be described by shifting the function H[n] to the right p times 'number of waves' positions. The peak will then be at position p times the number of waves.

From combining this with the previously calculated graph (fig. 2.10), it follows that the absolute value of the wave-deviation-deflection , the deflection caused by the wave-deviation, can be calculated as follows ([Sch94], [Pee95]):

wave-deviation-deflection |g| = |[(sin bp)/( bp)]|

Note that the limit for b ® 0 can be calculated:


lim
b ® 0 
| sin bp
bp
| = 1. \pagebreak
Therefore the fact that the value at this point cannot be calculated will further be ignored, and the formula will be used without mentioning this.

If complex values are used instead of absolute values, the following formula can be found ([Sch94], [Pee95]):

g = [(b)/( |b|)] |[(sin bp)/( bp)]|cos((b-|b|)p- j) + i[(b)/( |b|)] |[(sin bp)/( bp)]|sin((b-|b|)p- j), (1)
with g = 1 iff b = 0.

The special case b = 0 will not be explicitly stated further on. The proof that this formula describes the wave-deviation-deflection accurately is omitted, since, as is shown later, this is not the case.

In fig. 2.11 the relation between wave-deviation b and wave-deviation-deflection g is shown.

disstel3.gif
Figure 2.11: The relation between wave-deviation b and wave-deviation-deflection g.

xfo2.gif
Figure 2.12: The origin of wave-deviation-deflection

xf-f.gif
Figure 2.13: Smearing of a peak in the Fourier Transform.


The smearing of a peak in a Fourier Transform can now easily be explained (fig. 2.13): If a sinusoid has exactly a whole number of waves m within the interval with size n, only the mth value of the Fourier Transform is non-zero, since all other whole step sizes correspond to a lowest point with value 0 (m + 1, m - 1, m + 2, m - 2, ...). (Note: The Fourier Transform consists of only those 1/2n values that correspond to the values of the Circular Fourier Transform that are calculated with a whole number of rotations.) If a peak is precisely in the centre between two subsequent Fourier-values, the effect of the smearing is maximal, since the surrounding Fourier-values match precisely with the surrounding peaks (m + 1/2, m - 1/2, m + 11/2, m - 11/2, m + 21/2, ...).

2.3  Determining the frequencies in the signal

Eventually, the information that is wanted is which frequencies are present in the signal. In the Circular Fourier Transform, each sinusoid in the signal is represented by a main peak , with the top at the value that describes that sinusoid, surrounded by smaller peaks. Eventually, for each sinusoid in the signal, only its frequency and amplitude (and if possible, its phase) should remain. There are several methods which come close to this. Some of these will be discussed here, and the best method will be chosen.

The previously defined wave-deviation-deflection g (1) is in fact a function of the wave-deviation b:

gj(b) = [(b)/( |b|)] |[(sin bp)/( bp)]|cos((b-|b|)p- j) + i[(b)/( |b|)] |[(sin bp)/( bp)]|sin((b-|b|)p- j) (2)

Let s be a sinusoid with amplitude As, number waves in the interval with size n ms (ms ³ 0). The main peak now lies at ms. The deflection at this position is As * gjs(0). The phase at this position, which is also the phase at the starting point of the interval, is js. At position ms+b in the Circular Fourier Transform, the deflection is As*gjs(b), since for any value for b, the Circular Fourier Transform at position ms + b is calculated (due to the definition of g). This can also be reversed. A sinusoid s with ms waves in the interval, amplitude As and phase js has, as Circular Fourier Transform, an image of the wave-deviation function, shifted to the right m positions, multiplied by As:

(F)(p) = As(gjs(p-ms)), 0 £ p £ 1/2n (3)

Now the following steps can be made:

One important problem has been ignored so far. It is not defined yet what exactly should be calculated if the number of rotations around the circle is not a whole number. An important requirement for this is that any sinusoid in the signal should have a Circular Fourier Transform that matches the formula that was given above.

pro1.gif
Figure 2.14: Both left and right, three images are drawn with the same value for b. The centre of gravity should be at exactly the same place for all images on the left and all images on the right, but this is the case in neither of the two sides. At the left, the drawn centre of gravity is calculated using the average of all the present vectors. At the right, fading is used.


A first idea could be to just take the average of all the vectors in the signal. Nothing special is done with the fact that a part of the circle has a higher number of rotations than the rest of the circle. See for example fig. 2.14, 1A, 2A, 3A. In all three images, the value for b (wave-derivation) is identical, so the centre of gravity should be at exactly the same position. It is not. This means that the results cannot correspond completely to the function for g(b), and therefore not with function (6).

pro1a.gif
Figure 2.15: Both left and right, the difference between the (absolute) values of the Circular Fourier Transforms and the expected (absolute) results according to the formula is displayed. The numbers correspond to the number of waves of the sinusoid in the signal. Left, the phase of the sinusoid is 0; right it is 1/2p.


In fig. 2.15, for a lot of different frequencies the difference between the (absolute) values of the Circular Fourier Transforms and the expected results according to the formula is displayed. Note that the difference is the highest for the lowest number of rotations. Apparently, the given formula is not very reliable, especially for lower numbers of rotations. This means that taking the average of the vectors is probably not the right way to find the centre of gravity.

A first idea could be to fade the part of the circle that has the higher number of rotations. If the number of rotations is m, then the part of the circle that has the lowest number of rotations has ëm û rotations, while the other part of the circle has ém ù rotations. By multiplying the vectors in the part of the circle with the higher number of rotations by [(ëm û)/( ém ù)], the importance of both parts of the circle can be made identical. This will be refered to as fading .

The effect of this is shown in fig. 2.14, 1B, 2B, 3B. The centres of gravity are still not at the same place, so fading is not the solution.

problem2.gif
Figure 2.16: Problems for fading.


Even if the centres of gravity would be at the same position, another problem would remain. If the input signal for the Circular Fourier Transform consists of one peak, then the position of this peak should only have an effect on the phases of the frequencies, not on their amplitudes. But fig. 2.16 shows clearly that when fading, the amplitudes vary also. If the peak is faded, so are all the frequencies that together form this peak. This is a second reason why fading is not the solution. Moreover, it shows that fading parts of the interval can never be a solution. To fulfill formula (6), something else needs to be done.

After trying a lot of other methods, which will not be mentioned here since the results were not very useful, it seems safe to assume that it is probably easier just to cope with this in the algorithm that finds the peaks, instead of transforming the signal to one that makes it easier to find the peaks.

For now, the (small) differences between the found function and the data that is found in the Circular Fourier Transform will be ingored. First, the focus will be on finding a method to improve the found data and on finding the peaks.

@@1bp-0.gif
Figure 2.17: Circular Fourier Transform after removal of surrounding peaks


From the formula for g follows, that in both directions from the main peak, each subsequent surrounding peak is a little less strong. More precisely, if the main peak has strength c, and a surrounding peak has strength ca, then the subsequent surrounding peak has strength c[1/( 1/a+p)]. Therefore, each surrounding peak with strength ca lies between two surrounding peaks with strengths c [1/( 1/a-p)] and c[1/( 1/a+p)] (if it is not the first surrounding peak, seen from the main peak). The surrounding peaks can be removed by subtracting from each position in the Circular Fourier Transform c [1/( ([1/( preceding peak)] + [1/( subsequent peak)])/2)], so

c [2/( 1/a-p + 1/a+p)]. (R1)

This will be refered to as the reverse average . Proof:
ca - c [2/( 1/a-p + 1/a+p)] = ca - c [2/( (2/a))] = ca - c a = 0.
The main peak is not removed this way, since in the centre of the main peak, the values at this position plus one and minus one are lowest points, with strength 0. The result of this removal step is shown in fig. 2.17.

@@3bp-1.gif
Figure 2.18: Circular Fourier Transform of a signal with three sinusoids after removal of surrounding peaks with method R1.


This method works fine if there is only one main peak. Let in a transform with two main peaks, a surrounding peak have strength ca + db. This peak then lies between two peaks with strengths c[1/( 1/a-p)] + d[1/( 1/b-p)] and c[1/( 1/a+p)] + d[1/( 1/b+p)]. (R1) then results to:
[2/( ([1/( (c[1/( 1/a-p)] + d[1/( 1/b-p)]))] + [1/( (c[1/( 1/a+p)] + d[1/( 1/b+p)]))]))] ¹ ca + db.
Therefore this method performs very poorly if there is more than one main peak (fig. 2.18). Main peaks that are close to each other can partially extinguish each other, and peaks that are less strong can be extinguished completely, even at larger distances. The cause of this is that the absolute value of g is used. If the Circular Fourier Transforms of a signal with more sinusoid is calculated, the main/surrounding peaks for all sinusoids are added in complex space . The results in absolute values can therefore be very confusing, so it is necessary to work in complex space.

3beep-a.gif
Figure 2.19: Spiral display of Fourier Transform (signal interval of 2048 values) of a signal with three sinusoids.

3beep-b.gif
Figure 2.20: Spiral display of a Circular Fourier Transform (signal interval of 2000 values) of a signal with three sinusoids.

3beep-c.gif
Figure 2.21: Spiral display of a Circular Fourier Transform (signal interval of 2000 values) after removal of surrounding peaks of a signal with three sinusoids. Values below zero are set to zero.


The advantages of finding peaks already become visible at this point. In the next chapter, a display called the spiral display will be described. At this point, the properties of this display are not important, except for the fact that different sinusoids are shown at different places on the display. In fig. 2.19 - 2.21, the results of different analysis techniques on the same signal are shown. The signal consisted of three different sinusoids. In fig. 2.19, the Fourier-analysis of the signal is shown. The three sinusoids can be seen, but they are not visible very clearly.
In fig. 2.20, the Circular Fourier-analysis of the signal is shown. The three sinusoids are a bit clearer here, but the result is still very poorly. Note that the surrounding peaks are clearly visible in the display.
In fig. 2.21, removal technique R1 is used on the data from the Circular Fourier-analysis. Values below 0 are set to 0. The three sinusoids are visible very clearly.
Note that in the latter two, the positions at which the peaks lie can be determined more precisely. In the Circular Fourier-analysis, (the top of) the peak can be at [1/( a)] times as many positions as in the Fourier-analysis. The width of the peaks are not very different between the first and the second image. Note that it is smaller in the third image, since also a part of the sides of the main peaks is removed.

x1bp-1.gif
Figure 2.22: Complex representation of a Circular Fourier Transform of a signal consisting of one sinusoid

x2bp-0.gif
Figure 2.23: Complex representation of a Circular Fourier Transform of a signal consisting of two sinusoids

x5bp-0.gif
Figure 2.24: Complex representation of a Circular Fourier Transform of a signal consisting of five sinusoids

x9bp-0.gif
Figure 2.25: Complex representation of a Circular Fourier Transform of a signal consisting of nine sinusoids

noise1.gif
Figure 2.26: Complex representation of a Circular Fourier Transform of a signal consisting of white noise


Another possibility, which solves all problems mentioned above, is based on the fact that the (absolute) highest value in the Fourier Transform must lie on a main peak. Since the shape of such a peak with its surrounding peaks is known, it can be removed, after which again the (absolute) highest value can be found, etc.. Note that, since the main peaks can be influenced by surrounding peaks of other main peaks, only an approximation of the amplitude, position and phase of a main peak can be known. If the main peak and its surrounding peaks are removed completely at once, and the results were not perfect, then some errors will be introduced in the remaining data. This can largely be avoided by removing only a small percentage of the main peak and its surrounding peaks. Small errors in amplitude, position and phase can still be corrected, since the values for this will change a little in a direction opposite to the error. The smaller this percentage is, the more precisely peak strengths can be ascertained. As mentioned before, it can happen that the position of the peak shifts a little in subsequent times that the same peak is found. According to tests, if less than 1/s of the main and surrounding peaks is removed at each step, the position of the main peak never shifts more than 1/s positions. (It might shift slightly more if the positions of some peaks are very close, so it might be safer to remove only [1/( 2s)].) By setting 1/s to a, the precision with which the Circular Fourier Transform has been calculated, the shifting will never be more than this precision. This way, it is easy to know which found peaks belong together, which makes it possible to add all these peaks, making it (for example) possible to locate the peaks with a greater precision than that of the Circular Fourier Transform. The peak position is a weighed average of the positions of the found partial peaks , with as weighing factor the strength of each of the found partial peaks.

The complex representations of some Circular Fourier Transforms of signals consisting of a limited number of sinusoids are shown in figs. 2.22 - 2.26.

Fig. 2.22 shows what the transform looks like for a signal consisting of one sinuisoid. The figure is drawn as follows: First, on one side of the centre, a number of circle-like shapes are drawn. Each subsequent shape gets a little bigger, all shapes start at the same point (the centre). The last one however, does not end at this point, but at the centre of the large loop, which corresponds to the main peak. After this, circle-like shapes are drawn at the other side of the centre, which get smaller (so in fact a mirrored image of the first half of the figure). The point that is furthest from the centre, the centre of the loop, is the centre of the main peak. Therefore it is possible to calculate the position, amplitude and phase of this point, making it possible to calculate and remove the corresponding main and surrounding peaks.

Figs. 2.23 - 2.25 show the same transform for a signal with more than one sinusoid. As shown in fig. 2.25, it is possible that when two main peaks lie very close to each other, between the loops of the two there are no points at the centre of the image. One loop just proceeds into the other.

Fig. 2.26 shows the transform of a signal consisting of white noise . Noise can be considered as built from a large number of sinusoids, spead over the entire (noise-)spectrum. Since only a short interval of the noise signal is analysed here, some frequencies are stronger than others. This corresponds to the fact, that if a small interval of noise is played continuously, some tones are audiable above the noise. The longer the noise interval is, the less this is the case.

The correspondence between the absolute values of some Circular Fourier Transforms and the peaks that are found using the method described above are shown in figs. 2.27 - 2.33.

This is also shown for some real-life, sampled signals in figs. 2.34 - 2.38. For some of the real-life sampled signals, the peaks that correspond to the highest frequencies are a bit disturbed. This is not caused by a deficiency of the analysis method. It is caused by the fact that if the pitch of tone changes, higher frequencies shift more than lower ones. (If a peak at position 10 shifts to 10.1, then a peak at position 100 shifts to 101. Therefore the image in the Circular Fourier Transform is affected much more for the latter one than for the first one.)

@1bp-0.gif
Figure 2.27: Close total peaks found in a Circular Fourier Transform of a signal consisting of one sinusoid

@3bp-0.gif
Figure 2.28: Close total peaks found in a Circular Fourier Transform of a signal consisting of three sinusoids

@3bp-1.gif
Figure 2.29: Close total peaks found in a Circular Fourier Transform of a signal consisting of three sinusoids, two of which lie close together

@3bp-2.gif
Figure 2.30: Close total peaks found in a Circular Fourier Transform of a signal consisting of three sinusoids, one of which has a strength of only one third of that of the others

@x5bp-0.gif
Figure 2.31: Close total peaks found in a Circular Fourier Transform of a signal consisting of five sinusoids

@x9bp-0.gif
Figure 2.32: Close total peaks found in a Circular Fourier Transform of a signal consisting of nine sinusoids. Note that the right two peaks cannot be separated in the top image, but are separated in the bottom image.

@noise1.gif
Figure 2.33: Close total peaks found in a Circular Fourier Transform of a signal consisting of white noise

@e.gif
Figure 2.34: Close total peaks found in a Circular Fourier Transform of a (sampled) sound /e/ (like in bench).

@a.gif
Figure 2.35: Close total peaks found in a Circular Fourier Transform of a (sampled) sound /a/.

@oo.gif
Figure 2.36: Close total peaks found in a Circular Fourier Transform of a (sampled) sound /o/ (like in home).

@oe.gif
Figure 2.37: Close total peaks found in a Circular Fourier Transform of a (sampled) sound /u/ (like in food).

@u.gif
Figure 2.38: Close total peaks found in a Circular Fourier Transform of a (sampled) sound /  180 e  / (like in turn).

So far, the problem of deriving the main and surrounding peaks from the position, amplitude and phase of the top position has been ignored. If the problem with the number of rotations around the circle that is not a whole were solved, using formula (3) would do. Since solving this problem seems problematic (see above), another method will be discussed here.

If the position, amplitude and phase of the main peak are known, it is possible to produce a signal which, if analysed again, would produce a main peak with exactly the same position, amplitude and phase. (This is a sinusoid with a number of waves equal to the position of the main peak, amplitude identical to that of the main peak (see below), starting with the phase of the main peak.) The result of this analysis can then be used instead of the result of formula (3). The advantage compared to using formula (3) is, that in this case also the errors caused by the numbers of rotations that are not a whole would occur in the results. By subtracting a percentage of the result of this from the total signal, the desired result without errors can be obtained. The result of this is shown in fig. 2.39.

3beep-d.gif
Figure 2.39: Spiral display of a Circular Fourier Transform (signal interval of 1000 (!) values) after finding peaks of a signal with three sinusoids.


It might seem that total peaks that are found this way still need to be corrected for errors in the strengths of the main peaks in the Circular Fourier Transform that are caused by numbers of rotations that are not a whole. This is however not the case. Let the amplitude of a sinusoid in the original signal be A, and the (absolute) measured top value of a main peak in the Circular Fourier Transform be dA. To remove a part of this peak, a signal consisting of a sinusoid with amplitude dA is constructed. But then the resulting (absolute) measured top value of the main peak in the (F)1 is d2A. The (absolute) value that is found for the partial peak also is dA, instead of A. The resulting peak is deminished by d2A divided by some constant value. So the value of the partial peak is multiplied by a factor d, and the amount with which the main peak is deminished is multiplied by a factor d2. For example for a value of d > 1, this means that the partial peak values are too high, but also that the main peak shrinks faster for each matching partial peak that is found than for d = 1. Therefore, the (absolute) value of the total peak that is calculated from these partial peaks is equal to the amplitude A of the original sinusoid, regardless of the frequency (and therefore the position of the main peak) of the sinusoid.

An important disadvantage of this method is, that for each sinusoid in the signal a large number of Circular Fourier Transforms needs to be calculated, whereas calculating a Circular Fourier Transform takes a considerable amount of time. (In a first test program, the analysis of the sound /e/ to a reasonable precision took about 30 hours.) Using tables is hardly a solution: There are as many as [(n)/( 2a)] possible peak positions, each of which having a Circular Fourier Transform consisting of [(n)/( 2a)] complex values, which are different for every different phase, so [(n2)/( 4a2)] need to be stored for every different phase, with a large precision for the phases. For n = 1000, a = 1/8 (which is not very high), and a precision for the phases of [1/ 1000] p (which is probably not even good enough), this would require 32 billion values, which would need to be stored in a very high precision. Larger values for n and smaller values for a can also be used, so this is not very realistic.

For this reason, a formula will be derived from the previously mentioned test program, which is then simplified to make it faster.6 ([Par90])

The program, in some pseudo-code , consists of some different parts. Let N be the size of the Circular Fourier Transform (i.e. the number of whole positions in it), M be the size of the signal interval (n), PARTS be [1/( a)]:

First, the Circular Fourier Transform of the signal interval s is calculated:

f := CFT(s, M);

After the Circular Fourier Transform f of the original signal interval is calculated, the highest peak is found:

FOR x FROM 0 UPTO N * PARTS
  DO 
    s := sqr(real(f[X]) ^ 2 + imag(f[X]) ^ 2)
    IF s > max_s
      THEN max_s := s; 
           max_s_pos := x;
           max_s_real := real(f[x]); 
           max_s_imag := imag(f[x]);
    ENDIF
  OD
max_s Now contains the amplitude of the peak, max_s_pos the position, max_s_real and max_s_imag the real and imaginary value of the peak, respectively. From the latter two the phase can be calculated:

IF max_s_real  ¹  0
  THEN j := arctan(max_s_real / max_s_imag)
  ELSE j := arctan(max_s_real * ¥)

Now, the corresponding sinusoid can be calculated:

number_of_waves := max_s_pos / PARTS;
FOR x FROM 0 UPTO M - 1
  DO
    s'[x] := sin(2p * number_of_waves * x / M + j) / M;
  OD

Next, the Circular Fourier Transform of this sinusoid can be calculated:

f' := CFT(s', M);

This can now be subtracted from the original Circular Fourier Transform:

FOR x FROM 0 UPTO N * PARTS
  DO
    f[x] -:= f'[x] / PARTS;
  OD
The Circular Fourier Transform is calculated in the following function:

FUNCTION CFT (ARRAY signal, size)

FOR x FROM 0 UPTO N STEP 1 / PARTS
  DO
    real := 0;
    imag := 0;
    stepsize := 2p * x / size;
    cos_count := 0;
    FOR u FROM 0 UPTO size - 1
      DO
        real +:= signal[u] * cos(cos_count);
        imag +:= signal[u] * sin(cos_count);
        cos_count +:= stepsize;
      OD
    f[PARTS * x] := 2 * complex(real, imag);
  OD

RETURN(f)

Using unfolding ([Par90]), from CFT a new algorithm can be derived for calculating the Circular Fourier Transform of a signal consisting of one sinusoid. For the calculation of the (F)1, CFT is called with s' for signal. Unfolding signal, using the formula that was used to calculate s', so signal[u] is unfolded to
s'[u] = sin(2p * number_of_waves * u / M + j) / M. This results to:

FOR x FROM 0 UPTO N STEP 1 / PARTS
  DO
    real := 0;
    imag := 0;
    stepsize := 2p * x / size;
    cos_count := 0;
    FOR u FROM 0 UPTO size - 1
      DO
        real +:= sin(2p * number_of_waves * u / M + j) / M * cos(cos_count);
        imag +:= sin(2p * number_of_waves * u / M + j) / M * sin(cos_count);
        cos_count +:= stepsize;
      OD
    f[PARTS * x] := 2 * complex(real, imag);
  OD

One can easily see (or prove), that at any time the value of cos_count is used, this is equal to u * stepsize. By replacing this, the inner loop can be changed to:

DO
  real +:= sin(2p * number_of_waves * u / M + j) / M * cos(u * stepsize);
  imag +:= sin(2p * number_of_waves * u / M + j) / M * sin(u * stepsize);
OD

By unfolding stepsize, this becomes:

DO
  real +:= sin(2p * number_of_waves * u / M + j) / M * cos(u * 2p * x / M);
  imag +:= sin(2p * number_of_waves * u / M + j) / M * sin(u * 2p * x / M);
OD

complex(real, imag) now returns the complex number that is equal to the xth value of the Circular Fourier Transform (F)1 of a signal consisting of one sinusoid, with number_of_waves (which will be called a from now on) waves in the interval, starting at phase j at the beginning of the interval, with interval size M. What this program does can also be described as follows, if (F)1 (a, j, M) is defined to be the Circular Fourier Transform of a single sinusoid with amplitude a, phase j and signal interval size or number of sample values M:

(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/M) cos(2p x u/M)
                                + i sin(j + 2p a u/M) sin(2p x u/M))

Two other ways to write this formula are:

(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/M) (cos(2p x u/M) + i sin(2p x u/M))),

or

(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/Mei 2p x u/M).

By simplifying this formula as far as possible, and more specifically by removing the å-sign, the calculation can be accelerated.

A first acceleration, without removing the å-sign, can be obtained by calculation the values of cos(2p x u/M) + i sin(2p x u/M) (or ei 2p x u/M) in advance and storing these in a table. This table is circular; if M is a power of 2, it can be kept small and the right value in the table can be found very fast without the need to even calculate a modulo.

Note, that in the original program the values of sin(j + 2p a u/M) were also calculated in advance, so for each value of u, only two multiplications of real numbers (and an addition) are left.

It turns out that it is possible to remove the å-sign from the formula. The resulting formula might give slightly different results, introducing errors, but these are very small. To see this, it is important to note what the formula represents exactly. This can most easily be seen in the previously mentioned variant

(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/M) (cos(2p x u/M) + i sin(2p x u/M))).

Here, sin(j + 2p a u/M) is the signal for which the Circular Fourier-analysis needs to be calculated, and (cos(2p x u/M) + i sin(2p x u/M)) is the rotation around the circle, as was already shown before in many figures. From these images (see for example fig. 2.2), it follows clearly that it is possible to consider both the input signal (the sample) and the circle as continuous , instead of as a number of discrete points at equal distances from each other. So, å0M-1 can be replaced by ò-1/2M-1/27. The new formula now becomes:

(F)1 (a, j, M) (x) = 2/Mò-1/2M-1/2 (sin(j + 2p a u/M) cos(2p x u/M)
                                 + i sin(j + 2p a u/M) sin(2p x u/M)) (du)

or

(F)1 (a, j, M) (x) = 2/Mò-1/2M-1/2 sin(j + 2p a u/M) cos(2p x u/M) (du)
                    +i 2/Mò-1/2M-1/2 sin(j + 2p a u/M) sin(2p x u/M) (du)

Let l0 = -1/2, l1 = M-1/2. Calculation of the integrals gives:

òsin(j+ 2pa u/M) cos(2px u/M)   (du) =    (See box 1)
- [1/( 2pa 1/M - [((2px 1/M)2)/( 2pa 1/M)])] (cos(j+ 2pa u/M)   cos(2px u/M) + [(2px 1/M)/( 2pa 1/M)] sin(j+ 2pa u/M)   sin(2px u/M)).

òl0l1 sin(j+ 2pa u/M) cos(2px u/M)   (du) =

- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) + x/a sin(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]))
+ [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]) + x/a sin(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)])) =

- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) + x/a sin(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)])
                 - (cos(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]) + x/a sin(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)]))) =

- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]))
                 + x/a (sin(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)]))).

For the imaginary part, this is:

The goal is to find the integral of cos(j+ pu)   cos (qu)   (du).  
Assume (wrongly):
òf(x)   g(x)   (dx) = (òf(x))   g(x) + f(x)   (òg(x)).
If that were true,
òcos(j+ pu)   cos (qu)   (du) = 1/p sin(j+ pu)   cos(qu) + cos(j+ pu)   1/q sin(qu).
To check this, take the derivate of both sides of this equation:
(òcos(j+ pu)   cos (qu)   du)¢ = (1/p sin(j+ pu)   cos(qu) + cos(j+ pu)   1/q sin(qu))¢
Þ cos(j+ pu)   cos (qu) = 1/p(sin(j+ pu)   cos(qu))¢+ 1/q(cos(j+ pu)   sin(qu))¢
1/p(sin(j+ pu)   cos(qu))¢ =
= 1/p(p cos(j+ pu)   cos(qu) - sin(j+ pu)   q sin(qu)) =
= cos(j+ pu)   cos(qu) - q/p sin(j+ pu)   sin(qu)
1/q(cos(j+ pu)   sin(qu))¢ =
= 1/q(-psin(j+ pu)   sin(qu) + cos(j+ pu)   q cos(qu)) =
= -p/q sin (j+ pu)   sin(qu) + cos(j+ pu)   cos(qu)
To get closer to the goal, multiply all arguments in the last line of these two by -[(q2)/( p2)]:
-[(q2)/( p2)] * (-p/q sin (j+ pu)   sin(qu) + cos(j+ pu)   cos(qu))) =
= q/p sin (j+ pu)   sin(qu) - [(q2)/( p2)] cos(j+ pu)   cos(qu))
Addition of the result of this and the first line gives:
cos(j+ pu)   cos(qu) - q/p sin(j+ pu)   sin(qu) + (q/p sin (j+ pu)   sin(qu) - [(q2)/( p2)] cos(j+ pu)   cos(qu)) =
= (1 - [(q2)/( p2)]) cos(j+ pu)   cos(qu).
De goal that was set was:
cos(j+ pu)   cos (qu),
so multiplying by [1/( 1 - [(q2)/( p2)])] is necessary.
Both multiplications are multiplications with constant values, so they can also be used in the integral.
The integral that is found this way is:
òcos(j+ pu)   cos (qu)   (du) = [1/( 1 - [(q2)/( p2)])] (1/p sin(j+ pu)   cos(qu) - [(q2)/( p2)] 1/q cos(j+ pu)   sin(qu)).
òcos(j+ pu)   cos (qu)   (du) = [1/( p - [(q2)/( p)])] (sin(j+ pu)   cos(qu) - q/p cos(j+ pu)   sin(qu)).
The derivate of this is cos(j+ pu)   cos (qu), as wanted.
1          DERIVATION OF A FORMULA FOR THE CALCULATION OF THE REAL PART
OF A CIRCULAR FOURIER TRANSFORM OF A SIGNAL CONSISTING OF ONE SINUSOID




The goal is to find the integral of cos(j+ pu)   i   sin (qu)   (du) = i   cos(j+ pu)   sin (qu)   (du).  
This is i times the integral of cos(j+ pu)   sin (qu)   (du).
When in the formula for the real part, sin(qu) is changed into cos(qu), and cos(qu) into -sin(qu), then this gives:
òcos(j+ pu)   sin (qu)   (du) = [1/( p - [(q2)/( p)])] (sin(j+ pu)   sin(qu) + q/p cos(j+ pu)   cos(qu)).
Calculating the derivate of this gives cos(j+ pu)   sin (qu).
2     DERIVATION OF A FORMULA FOR THE CALCULATION OF THE IMAGINARY PART
OF A CIRCULAR FOURIER TRANSFORM OF A SIGNAL CONSISTING OF ONE SINUSOID





òl0l1 sin(j+ 2pa u/M) sin(2px u/M)   (du) =

- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]) - x/a sin(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]))
+ [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)]) - x/a sin(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)])) =

- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)]))
                 - x/a (sin(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]))).

The formula for (F)1 can now be rewritten as:

(F)1 (a, j, M) (x) = 2/Mòl0l1 sin(j + 2p a u/M) cos(2p x u/M) (du)
                    +i 2/Mòl0l1 sin(j + 2p a u/M) sin(2p x u/M) (du) =

- 2/M [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]))
                 + x/a (sin(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)])))
- i 2/M [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)]))
                 - x/a (sin(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]))).


(F)1 (a, j, M) (x) =
-[1/( p(a - [(x2)/( a)]))] ( ((cos(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]))
           + x/a (sin(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)])))
          + i ((cos(j+ 2pa [(l1)/( M)])   sin(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)])   sin(2px [(l0)/( M)]))
           - x/a (sin(j+ 2pa [(l1)/( M)])   cos(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)])   cos(2px [(l0)/( M)]))) ).


Most values in this formula can be calculated and stored in tables in advance. Other values are used more than once in the formula. Using this, the gain in speed compared to the previous test version is about 1500 times. An example algorithm for this function, in which all sines and consines are calculated outside the iteration , is given in appendix A.2.

The program got a lot bigger than the original version. This is partially caused by the optimising, but there is also another important difference. The original program contained an error, that is corrected here.8 In some cases, the highest position on the top is not precisely the number of waves of the sinusoid that caused this top. The position of the highest point of the main peak can be shifted a little due to the errors caused by the numbers of rotations that are not a whole. If in these cases the (F)1 of the tone with as number of waves the position of the highest point is calculated, and the result is analysed again, it turns out that the position of the highest point has changed. This is especially a problem for very small values for m, since the errors are highest in this part of the Circular Fourier Transform. If in these cases the position of the highest point is used for the number of waves, the errors in the result can become large, since not only the main peak, but also the surrounding peaks are not correctly removed. The Circular Fourier Transforms of the original sinusoid and the (F)1 do not match precisely.

To solve this, the program needs to calculate the (F)1 of a sinusoid such, that the highest point is at the same position as in the transform of the original signal. Note that the phase is different for different positions at the peak, so if another than the highest point is used, the phase needs to be corrected also.

Seen from the highest point on the main peak of a (F)1, to both directions each subsequent point is lower, as long as it is on the main peak. So, if a point is not the highest point on this peak, and only then, the point at one of the two positions at a distance a from that point must be higher. Therefore, it is not necessary to calculate the entire (F)1, check whether the main peak is at the right place, and calculating it again with a small correction if not, until the peak is at the right place. Instead, it suffices to calculate only the values for the wanted position pw for the highest position and the two values at positions pw - a and pw + a. If pw has the highest value, then the calculation can be finished by calculating the rest of the (F)1. If pw - a is higher, the peak for the current number of waves lies at a position that is too low, so the number of waves needs to be increased a little. If pw + a is higher, the opposite needs to be done. In both cases, also the phase needs to be corrected. This step needs to be repeated until the highest point lies at position pw (or to the opposite direction than before). Since the value that is found this way is equal to the number of waves in the original sinusoid (the previously found value was affected by errors), this is the value that is to be used for further calculations.

The result of the algorithm is a list of peak positions with phase and amplitude information. Since only a small part of each found peak is removed, the same position can occur very often in this list. The easiest way to determine the total amplitude and phase for each frequency is to add the data for all peaks with the same position (which is equal to the frequency multiplied by a constant (n/r)). The total peaks that are found this way will be refered to as added total peaks . This works fine, but this way many of the sinusoids are displayed at more than one frequency, since the positions of the partial peaks for a frequency can differ a little. For many applications this is no objection, since the frequencies for the same sinusoid are still very close to each other (depending among other things on the chosen value for a).

As mentioned before, if the part of the main and surrounding peaks that is removed in each step is small enough, then two found partial peaks that belong to the same total peak never lie more than a positions from each other, iff there is no third partial peak that belongs to the same total peak which is found by the given algorithm between the first two partial peaks. Using this, it is possible to determine peak positions precisely. The position of a total peak is the weighed average of the frequencies of the found partial peaks, with as weighing factor the strength of the peaks. Total peaks that are found this way will be refered to as close total peaks .

An algorithm that uses this to find the close total peaks is given in appendix A.3.

This completes the signal analysis.

Note: The uncertainty principle is not violated by this analysis method. The Circular Fourier Transform fulfills this principle, and the total peaks are nothing more than a different way to describe the Circular Fourier Transform. Peaks can be determined very precisely, but only if there are no other peaks that lie very close to each other. The distance between the peaks is proportional (in a constant signal) to the size of the sample interval, which corresponds to the uncertainty principle.

2.4  Differences between Fourier Transform and total peaks

There are some important differences between Fourier Transforms and (close or added) total peaks. Some important differences are mentioned below.

If a long sample is split in subsequent intervals, these intervals are analysed, an operation is performed on the results of this, and the results are transformed back, it is usually necessary to go smoothly from one interval to the next (since due to the operation that is performed, the interval border values might not match and therefore produce a click in the resulting total sample). If a standard Fourier Transform is used, it is not possible to determine what the signal looks like outside the original signal interval. Only an infinite repetition of this signal is found if the peaks that were found are used to calculate the signal outside the original interval. (This corresponds to the fact that the signal is considered to be periodical.) Therefore, it is necessary to use an overlapping area between the intervals, which is used for the smooth transition. If total peaks are used, it is possible to determine what the signal will probably look like outside the signal interval. (This corresponds to the fact that the signal is considered to be finite.) If this is used instead of an overlapping area between the intervals, no disturbances of the sound are audiable (although the signal can not be reproduced pricisely this way, so using an overlapping area is - at least theoretically - still a better option). This can be used to speed up the total transform a little. Note that this works better for close total peaks than for added total peaks, since the latter still cause a repetition of the signal every [1/( a)] periods.

2.5  Suggestions for further research

In this section, some suggestions are given for further research. This list is not complete. Not every point in this list is directly usable in this thesis, but it might be used as a starting point for new research.

In the test version of this program, the total analysis of a complex signal, consisting of both noise and other sounds, with a = 1/8 and (F) runs from 0 to 500 (so it contains 4000 values), takes about 45 seconds. The program was run on an IBM 6x86/150+ (Pentium clone), calculation was terminated when a partial peak was found with a strength of less than 0.5 percent of the first found partial peak. Simple signals take far less time, with a minimum of about 2.5 seconds for a single sinusoid.

In a real-time version, calculation can be terminated if there is no time left. In that case, peaks are determined much more precisely for simple signals than for complex signals.

Chapter 3
Signal display

In the previous chapter, a new frequency analysis method was developed. In this chapter, various ways of displaying the results of the analysis will be examined.

A very important point in displaying the obtained frequency information, is that it is only valid for a very short time. The human eye however is limited in the number of images that it can separate in a given time. There are two ways of dealing with this:

In this chapter the focus will be on the latter of these two.

3.1  Spiral display

In a graph where frequency response is set out against frequency, it is very hard to determine which frequencies are multiples of each other, which would mean that they describe the same tone (what a tone is will be defined later, for now this is not really important), but in another octave. The human ear however can do this very easily, and it is needed for example for listening to music.

Especially for music , it is very important to see relations between frequency response peaks in different octaves . Also, it is important that the way a tone looks should not change too much if the tone occurs in another octave, since it would be very difficult to even see on the display which tones sound similar.

rectang.gif
Figure 3.1: Construction of a rectangle display


A first idea is the rectangle display . In this display, subsequent octaves are displayed next to each other. Each subsequent octave occupies twice the domain of the frequency space that the preceding octave occupied. This is displayed in fig. 3.1. The numbers in the image correspond to positions in the frequency space (total peaks), placed linearly in this space. Note that each odd number in this display starts a progression of numbers, where each following number is twice the latter one (for example, '1, 2, 4, 8, 16, 32, ...' or '3, 6, 12, 24, 48, 96, ...').

This display has an important disadvantage. If a tone starts at a frequency that is displayed for example at the position marked with '30', and it goes up, it might happen that it gets at or above position '32'. But then, the total frequency is suddenly displayed at the bottom of the image instead of at the top. Since most frequencies in sounds are not constant, this will happen very often, making it very difficult to see in what direction the signal is moving and what it looks like. Two tones that sound about the same, are displayed at a totally different position.

spiral.gif
Figure 3.2: Construction of a spiral display


This problem can be solved by using the spiral display , which is shown in fig. 3.2. In this display, the advantages of the rectangle display are combined with a solution of the latter problem. Also, it corresponds more naturally to the way that human hearing works, since the transition from one octave into the next one is fluent.

It has to be noted here that it is impossible to see very fast what is displayed, if all present frequencies are displayed as very thin lines. Also, this way the image might differ too much from the next one if the frequency changes a little. To solve this, a larger area of the cycle needs to be illuminated. Note that it does not suffice to illuminate the same percentage of the cycle for every cycle, since then the outer cycles are being filled too much and frequencies in the inner cycles are still displayed with too thin lines. The area that is to be filled should be almost the same for each cycle, regardless of the size of the cycle. Note that this way, lower frequencies that might not have been determined very precisely, have a relatively larger area of the corresponding cycle that is filled. This renders the precise determination of frequencies to be less important. Some examples are shown in 3.4.

3.2  Block display

An advantage of a graph where frequency response is set out against frequency is, that - in a still image - one can see very easily which frequencies belong together. With belonging together is meant here that together they form a tone , that is, they are a base frequency and its harmonics . The human ear interprets these frequencies together as one tone. However, if for each new sample interval, a new graph is drawn, the human eye cannot determine what is displayed anymore. For example, if a tone goes up five percent in a tenth of a second, the frequencies will shift to the right by five percent. This way, for example the nineteenth harmonic is displayed in the new image at the same position that the twentieth was in the image of a tenth of a second before. Since the human eye cannot follow rapid changes like this, another way of displaying the information is needed.

In order to solve this, it is necessary to be able to find out which frequencies belong together. This will therefore be addressed first.

3.2.1  Finding the frequencies that make a tone

A harmonic is always a multiple of the base frequency. This means that, given the base frequency, one can easily find all harmonics. So, in order to find out which frequencies belong together, it is sufficient to find the base frequency.

A first step

It would seem logical to just try every possible value for the base frequency, and then take the one with the most matching frequencies. A problem with this method is that, since all numbers that have the base frequency as a multiple would suffice, this would result in a lot of matching base frequencies. By taking the highest found frequency with the maximum number of matching harmonics, this problem would be solved.

Now there are three problems left. First of all, the frequency analysis is not perfect. Therefore, in the analysis result, harmonics are not always exactly a multiple of the base frequency. This can be solved by taking an area around every frequency that is being examined, and take the sum of all frequency responses in this area. This way this problem can be solved.

Secondly, if a tone is not completely stable within an interval, the frequency analysis of this interval will be affected. The solution of this is identical to that of the first problem, but the areas need to be larger.

Thirdly and finally, there can be other tones present, with other base frequencies and harmonics, or noise. These should be ignored, but it is not possible to know in advance which frequencies belong to these other tones or noise. Therefore, this problem is not easily solved. Below, a solution will be presented with which it is possible to find the base frequency of the loudest tone in the signal, if the loudness of the other tones is relatively low. This solution is not perfect, and will sometimes produce poor results even if there is only one tone available. Mostly however, the results will be good.

The basic idea

The basic idea is as follows. In the spiral display (section 3.1) the base frequency of a tone can often be recognised easily. Starting from the center, the direction in which there is the most brightness usually contains the base frequency. So, this direction in the spiral corresponds in some way to the base frequency, but without information about the octave in which it lies. It is now possible to use this direction to calculate a value such that the base frequency is this value, multiplied by a power of two.

Using this value, all frequencies matching this value multiplied by a power of two can be tried. Trying a frequency means here, that the responses of this frequency and all its harmonics are added (see below). If the result of this addition drops too much compared to the result of the preceding frequency (this is the current frequency divided by 2), it is probable that a part of the tone has been removed, which would imply that the preceding frequency was the base frequency.

Several problems are apparent. First of all, in the spiral display, the direction in which there is the most brightness usually contains the base frequency. In cases that the direction in which there is the most brightness does not contain the base frequency, the result will be very bad.

Secondly, also in the spiral display, very low frequencies are not always displayed in the right direction. This can be solved by fading all low frequencies.

Thirdly, the wrong octave can be found. If it is too high, frequencies are lost, if it is too low, a lot of frequencies with response 0 can be present. This problem is partially solved by the block display (section 3.2.2).

The elaborate idea

The procedure of finding the base frequency as described above was a bit too simple. It will hardly ever produce the correct results. The procedure that will be described now is more elaborate, and gives much better results, although the mentioned problems are not solved by it.

Finding the direction in the spiral corresponds roughly to the following procedure. Take a start frequency fs and do the following for every value in the interval [fs, fs * size]. (size stands for a value a bit above 2, for example 2.1. This interval corresponds to a bit more than one cycle around the spiral in the spiral display.) For every base frequence f0 in this interval, add the frequency responses of this and all its harmonics. The wanted base frequency is the frequency f0 which gave the highest frequency response, multiplied by a power of two. In fact, taking the sum of the responses of the harmonics does not work fine. After testing with a lot of values, the best results seem to be obtained using:
response(f) = response_area(f) 0.4 / (2 + (log(1 + f)) 3)
where response_area(f) is the sum of all frequency responses in the area
[max   (f / mul_diversion,   f - max_diversion),   min   (f * mul_diversion,   f + max_diversion)]
Now,
fbase = ftop * 2n
ftop = { f0   |   f0 Î [fs, fs * size]   Ù  Ø$f1 [ f1 Î [fs, fs * size]   Ù  tone_resp(f1) > tone_resp(f0) ] }
where n Î N,
     tone_resp(f) = åi = 0¥ response(i * f).
According to some test results, a good value for fs would be around 0.5. Good values for mul_diversion and max_diversion are 1.5 and 0.175, respectively. max_division must always be smaller than 1/2precision, to avoid overlapping of areas. So if a larger value for max_division is needed, it might be necessary to increase precision also.

If the value that is found lies around fs or fs * size, it is probably not very reliable, since also the position of a peak relative to that of f0 has an effect on the response function. Therefore, it is possible, that instead of finding a top value at for example fs * 1.95, it is found at position fs. If the result lies close to the border of the interval, a new (higher or lower) value for fs should be set, and the calculation should be performed again.

Next, the right value for n needs to be determined. Take a small area around frequency ftop, for example the interval [ftop / 1.2,   ftop * 1.2]. For each value ftry in this interval, let
flatftry(m) = response_area(m * ftry), m Î N.
This function describes all the multiples of ftry. If ftry is the right value, and the area is sized correctly, the base frequency and its harmonics are all displayed in the flat function at positions m = k * 2octave, k Î N+. (ftry lies on octave 0.)

From this, the octave can be found very easily in case of a perfect analysis and no other signals. This is however hardly ever the case. According to some tests, the following method will usually produce the correct value for octave.

The sum of the peaks that are included for some octave a is:
octave_contentsftry, c(a) = åb = 0¥ calcc(flatftry(b * 2a))
where c   Î   {0,   1,   2}
     a   Î   N,
     calc0(x) = log(x + 1),
     calc1(x) = x 0.875,
     calc2(x) = x 1.125.

If the octave contents drop more than a factor 1.25 between two subsequent octaves, it is probable that a part of the peaks that belong to the tone are removed. So the octave right before is probably the octave in which the base frequency lies:
possible_apparent_octaveftry,   c(a) º
º octave_contentsftry, c(a + 1) < octave_contentsftry, c(a) / 1.25,
where a Î N,

Of course this is only the case for the first time that this occurs:
apparent_octaveftry,   c = {a   |   possible_apparent_octaveftry,   c(a)
                                  Ù  Ø$b Î N   [ b > a Ùpossible_apparent_octaveftry,   c(b)]}.

The following is necessary because for some situations (more precisely, if the base frequency is very strong compared to the other frequencies), the octave that is found is one higher than it should be.
confirmftry,   c(a) = log(1 + flatftry(a))
denyftry,   c(a) = åb = 2¥ log(1 + flatftry(a * b))
repair_octaveftry,   c(a) = {-1    |   confirmftry,   c(a)   >   denyftry,   c(a)   /   60      |   otherwise
repaired_apparent_octaveftry,   c =
                         = apparent_octaveftry,   c + repair_octaveftry,   c(apparent_octaveftry,   c).

From this, the wanted octave can be found by trying all combinations of ftry and c, and taking the highest number for repaired_apparent_octaveftry,   c that is found.
octaveftop = {a   |   $ftry   Î   [ftop / 1.2,   ftop * 1.2],   c   Î   {0,   1,   2} [ repaired_apparent_octaveftry,   c = a ]
              ÙØ$ftry   Î   [ftop / 1.2,   ftop * 1.2],   c   Î   {0,   1,   2} [ repaired_apparent_octaveftry,   c > a ]}.
If octaveftop is below zero, detection failed.

Now, the best value for ftry needs to be found. The signal that is lost by skipping all positions of flatftry that are not a multiple of 2octaveftop, is given by lossftry,   c(octaveftop), where
lossftry,   c(a) = åb = 0¥ calcc(flatftry(b * 2a)) - åb = 0¥ calcc(flatftry(b)).
The best value for ftry is that value where lossftry,   c(octaveftop) (for some c) is the lowest.
fbase = octaveftop * fbest
where fbest   Î   {f   |   f   Î   [ftop / 1.2,   ftop * 1.2]
                  Ù$c   Î   {0,   1,   2} [repaired_apparent_octavef,   c = octaveftop
                     ÙØ$f1   Î   [ftop / 1.2,   ftop * 1.2],   c1   Î   {0,   1,   2}
                        [repaired_apparent_octavef1,   c1 = octaveftop
                           Ùlossf1,   c1(octaveftop) < lossf,   c(octaveftop)] ] }.
With this frequency (fbase), the response of the base frequency and its harmonics can now be determined very easily.

Note, that all the mentioned values can be changed. The values that are printed here are the ones that gave the best results in extensive tests.

3.2.2  The block display

After all the frequencies that make a tone are found, a visual representation of this tone is to be displayed. This should be done in such a way that the display has a typical look for each different sound (for example, for each phoneme ). Also, if the tone changes only slightly, the display should not change too much, since the human eye must be able to follow it. If the wrong octave is found (chapter 3.2.1), this should cause as few differences as possible. All these characteristics are available in the block display .

block.gif
Figure 3.3: Construction of a block display


In the block display, each frequency's response is displayed as the brightness of a square . These squares are placed in the following way (see image 3.3):

Note that the numbers of the squares on the left from the bottom to the top level are the powers of two, starting at one.

If an error has been made in determining the octave , and the found base frequency was half the real one, then the shape of the display is not really changed. The real base frequency is displayed on square 2, the first real harmonic at square 4, the next at square 6, etc.. The shape is identical, only it is shifted up one level and there are holes in it. Something similar is the case if the found base frequency was one quarter, one eightst etc. of the real one. If the found base value was twice the real one, then half of the real frequencies are not displayed, but still, the shape of the image is contained as far as possible, shifted one level down. Something similar is the case if the found base frequency was four, eight etc. times the real one.

The shifting up and down of the image due to the wrong determination of the octave can be avoided by shifting the entire image up or down according to the found base frequency. If this is twice as high, the image should shift up exactly one level (while if this is an error, the squares are filled in one level too low). A logarithmic function can be used to determine the exact placement of the image according to the base frequency. This way it is also possible to see the perceptive frequency of the tone (which is the base frequency) in this display. Originally the idea was to place the block display at a position corresponding to the base frequency around the spiral display, but it would move too much there, making it virtually impossible for the human eye to recognize any shapes in it. (In fact, shifting up and down is not enough since the blocks on another level do not lie at exactly the same horizontal position. To spare space, horizontal shifting is not used.)

Note, that in the obtained image, every level can be regarded as a new octave. Each octave corresponds to one cycle in the spiral display. Starting from the position of the base frequency, each new turn is displayed on the next level in the block display. The start of each turn is always on the left of the corresponding level. Therefore, all frequencies that are displayed in the same direction as the base frequency but in a subsequent cycle, are displayed at the left of the level corresponding to that cycle. Some examples are shown in 3.4.

3.3  Line display

The images in the block display are clear, but they have the disadvantage that they change too rapidly. They can be used to recognise (for example) phonemes, but then words would need to be pronounced very slowly to enable recognition.

To solve this, it is necessary to show not only the current, but also some preceding images in the display. This is not possible using the two displays that were described above.

A possible solution which will be described here very briefly is to take the tone information (3.2.1) and display a tone on a straight line, such that each octave takes the same amount of space on the line. This way it is easy to scroll some old lines upward each time a new line is added. A disadvantage is that it is harder to recognise phonemes in this display. On the other hand, this way words might get recognisable images. It is possible that, after training, people can recognise these words without recognising exactly which vowels the words are composed of.

Note that the display that is obtained this way is not identical to the display that would be obtained by displaying the frequency information on a line. For example, in the described display the base frequency fills an amount of space on the line that corresponds to an entire octave, whereas if the frequency information were displayed, only a dot would be present. Note that the best results can be obtained by combining the two methods (calculating the value for each point on the line using both methods, and setting the point to the maximum of the two values), so tones are displayed as described above, and noise is also displayed correctly. More work on this needs to be done. Some examples are shown in 3.4.

3.4  Examples

In this section, some example images are shown to illustrate the displays that were described above.

total1.gif
Figure 3.4: Spiral and block displays of some phonemes, pronounced by three different persons. From left to right: Male, female, female. From top to bottom: /Ù/, /a/, /æ/, /e/. (These phonemes correspond to part, -, bench, day.)

total2.gif
Figure 3.5: Spiral and block displays of some phonemes, pronounced by three different persons. From left to right: Male, female, female. From top to bottom: /i/, /  180 c  /, /o/, /u/. (These phonemes correspond to him, for, -, food.)


Some example phonemes that were pronounced by three different persons are displayed in figs. 3.4 and 3.5. Especially in the block display, it is clear that identical phonemes look a lot alike, regardless of who pronounces the phoneme.

For longer sounds, the subsequent images corresponding to the input sample can be calculated. In the following examples, there were 44 images per second.

An analysis of the sound of a door-bell is shown in fig. 3.6 (spiral display only).
An analysis of the sounds /  180 c i ua/ is shown in figs. 3.7 and 3.8:
/  180 c /: 0 - 14
/i/: 17 - 31
/u/: 35 - 48
/a/: 52 - 65
The line display of this is shown in fig. 3.14.

An analysis of the word ``sht!'' (/òt/) is shown in fig. 3.9:
/ò/: 0 - 27
/t/: 32 - 44
The line display of this is shown in fig. 3.15.

Finally, an analysis of the sentence ``One man living on an island'' (/wÙmæn 'livih  180 c n  180 e n'ail  180 e nd/) is shown in figs. 3.10 - 3.13:
/w/: 1 - 11
/Ù/: 12 - 18
/m/: 19 - 27
/æ/: 29 - 47
/n/: 49 - 54
/l/: 57 - 59
/i/: 60 - 65
/v/: 66 - 69
/i/: 72 - 76
/h/: 78 - 81
180 c /: 86 - 89
/n/: 90 - 91
180 e /: 93 - 95
/n/: 96 - 97
/a/: 99 - 104
/i/: 109 - 112
/l/: 114 - 115
180 e /: 117 - 122
/n/: 125 - 127
/d/: 130 - 139
The line display of this is shown in fig. 3.16.

For 18 phonemes, the corresponding spiral and block display is shown in figs. 3.17 and 3.18. These phonemes are all pronounced by the same person. The images can be used to recognise phonemes in other images. Note however that phonemes can look slightly different for different people, or when the pitch is different. Therefore, the printed images can only be used to determine sounds very roughly. Especially phonemes that sound alike might be confused. To make recognition more reliable, a list of several images of each phoneme pronounced by several different people would need to be composed. Using such a list, it is also possible to try to develop a feeling for which characteristics are important for some phoneme and which are not.

ring.gif
Figure 3.6: A ringing door-bell

oi1.gif
Figure 3.7: /  180 c i/.

oi2.gif
Figure 3.8: /ua/.

sht.gif
Figure 3.9: /òt/.

island1.gif
Figure 3.10: /wÙmæ...

island2.gif
Figure 3.11: ...æ n 'liv...

island3.gif
Figure 3.12: ... ih  180 c n  180 e n'a...

island4.gif
Figure 3.13: ... ail  180 e nd/

_oi.gif
Figure 3.14: /  180 c i ua/

_sht.gif
Figure 3.15: /òt/

_island.gif
Figure 3.16: /wÙmæn 'livih  180 c n  180 e n'ail  180 e nd/

legenda1.gif
Figure 3.17: /Ù/ as in part, /a/, /  180 c / as in for,
. /o/, /u/ as in food, /  180 e / as in turn,
. /ü/, /æ/ as in bench, /e/ as in day.

legenda2.gif
Figure 3.18: // as in him, /i/ as in he, /j/ as in you,
. /l/ as in long, /n/ as in man, /m/ as in man,
. /w/ as in when, /h/ as in long, /s/ as in still.

3.5  Suggestions for further research

In this section some suggestions are given for further research. The list of suggestions is probably not complete.

Chapter 4
Conclusions

The goals that were set in 1.1 are largely accomplished by writing an implementation of the algorithms that were developed in the previous chapters. This implementation is described in appendix B. Right now, there are not very many ways to analyse the signal and display the results of the analysis, but since every step is performed by a different program and the communication between the programs is very simple, it is easy to write new programs which replace the currently available programs or add some functionality to it.

Using the currently available analysis method and displaying methods, a number of problems that deaf people face which were mentioned in the introduction have already been solved:

The fact that it is possible to recognise some vowels and some sorts of music after just minutes of training (for someone who is not deaf), raises hopes that these displays perform better than the surgery that is mentioned in the introduction, for which five years of training is necessary just to be able to differentiate music from speech.

In order to determine how the displays perform in real life, extensive field testing is necessary. The analysis method and displaying methods that are developed in this thisis form a good starting point for investigating this. Positive and negative aspects of the analysis method and the displaying methods need to be determined, and from the results of this new methods might need to be developed. The implementation (see appendix B) enables this to a limited extend, since it cannot be ran in real-time (yet). But still, it is easy to implement new methods and add functionality to perform operations on data (which can be recorded in the field), so a reasonable idea of how it performs can be obtained.

Appendix A
Algorithms in pseudo code

This appendix contains some transformation algorithms that were derived and sometimes used in the previous chapters. The algorithms are written in some pseudo-code, variable declarations are left out.

A.1  Transformation: Sample to Circular Fourier Transform

COMMENT Constant values
CONST PARTS = [1/( a)]
CONST PI = p
CONST PI_2 = 2 * p
CONST SAMPLE_SIZE = something



COMMENT This procedure transforms the sample interval to a (F).
PROC trans_au_2_cf(au[], cf_real[], cf_imag[])
  tp := PI_2 / (SAMPLE_SIZE * PARTS);
  FOR x FROM 0 UPTO SAMPLE_SIZE * PARTS - 1
    DO
      tp_mul_x := tp * x;
      sine[x] := sin(tp_mul_x);
      cosine[x] := cos(tp_mul_x);
    OD;

  FOR x FROM 0 UPTO N * PARTS - 1
    DO
      f_real := 0;
      f_imag := 0;
      cs_count := 0;
      FOR u FROM 0 UPTO SAMPLE_SIZE - 1
        DO
          f_real +:= au[u] * cosine[cs_count];
          f_imag +:= au[u] * sine[cs_count];
          cs_count +:= x;
          IF cs_count >= SAMPLE_SIZE * PARTS
            THEN cs_count -:= sample_size * PARTS;
          FI;
        OD;
      cf_real[x] := 2 * f_real;
      cf_imag[x] := 2 * f_imag;
    OD;

A.2  Transformation: Circular Fourier Transformed to partial peaks

COMMENT Constant values
CONST PARTS = [1/( a)]
CONST PI = p
CONST PI_2 = 2p
CONST N_PARTS = N * PARTS
CONST ONE_DIV_PARTS = 1 / PARTS
CONST SAMPLE_SIZE = something
CONST INT_START = - 1/2
CONST INT_END = SAMPLE_SIZE - 1/2
CONST GO_TO_PRECISION = 0 £ some precision < 1

COMMENT Calculate some numbers in advance
v1e := PI_2 * INT_END / SAMPLE_SIZE / PARTS;
v1s := PI_2 * INT_START / SAMPLE_SIZE / PARTS;
FOR x FROM 0 UPTO N * PARTS - 1
  DO
    v1e_mul_x := v1e * x;
    v1s_mul_x := v1s * x;
    CONST sine_e[x] := sin(v1e_mul_x);
    CONST cosine_e[x] := cos(v1e_mul_x);
    CONST sine_s[x] := sin(v1s_mul_x);
    CONST cosine_s[x] := cos(v1s_mul_x);
  OD;

FOR x FROM 0 UPTO PARTS * SAMPLE_SIZE
  DO
    CONST sine[x] := sin(PI_2 * x / (PARTS * SAMPLE_SIZE));
    CONST cosine[x] := cos(PI_2 * x / (PARTS * SAMPLE_SIZE));
  OD;



COMMENT Calculate at what peak strength we should stop
halt_strength = trans_cf_2_pp(cf_real, cf_imag) * GO_TO_PRECISION;

COMMENT Now find peaks until it gets below the calculated halt_strength
WHILE (trans_cf_2_pp(cf_real, cf_imag)) > halt_strength
  DO
  OD;



COMMENT This function takes the (F) and finds and stores the highest peak in it
COMMENT The found peak is then removed from the stored (F)
COMMENT Also, it returns the strength of this peak
FUNCTION trans_cf_2_pp(cf_real[], cf_imag[])
  COMMENT Find maximum
  max_v := 0;
  max_v_pos := 0;
  FOR x FROM 0 UPTO N * PARTS - 1
    DO
      v := (cf_real[x] * cf_real[x] + cf_imag[x] * cf_imag[x]);
      IF v > max_v
        THEN max_v := v;
             max_v_pos := x;
      FI;
    OD;
  max_v_real := cf_real[max_v_pos];
  max_v_imag := cf_imag[max_v_pos];

  COMMENT Calculate phase
  IF max_v_imag = 0
    THEN phi := atan(max_v_real * ¥);
    ELSE phi := signed_atan(max_v_real, max_v_imag);
  FI;
  number_of_waves := max_v_pos / PARTS;

  max_v := sqrt(max_v);
  mmax_v := max_v;

  COMMENT Calculate the (F) that corresponds to the current maximum
  calc_cf_1(bp_real, bp_imag, -phi, number_of_waves, max_v);

  COMMENT Subtract this from the original signal
  FOR x FROM 0 UPTO N * PARTS - 1
    DO
      cf_real[x] -:= bp_real[x];
      cf_imag[x] -:= bp_imag[x];
    OD;

RETURN mmax_v;



COMMENT This procedure returns the (F)1 of a single sinusoid,
        with number_of_waves waves within the interval,
        phase given by phi
        and amplitude amp / PARTS.
COMMENT Also, it stores the corresponding peak information.
PROC calc_cf_1(bp_real, bp_imag, phi, number_of_waves, amp)
  amplify := amp / PARTS;
  original_number_of_waves := number_of_waves;
  multiply := amplify;

  found_number_of_waves := 0;
  finished := FALSE;
  done := FALSE;
  direction_up := FALSE;
  direction_down := FALSE;
  ac := FALSE;

  one_div_parts := 1 / PARTS;

  WHILE NOT finished
    DO
      IF done
        THEN start := 0;
             end := N * PARTS;
             finished := TRUE;
        ELSE start := PARTS * number_of_waves - 1;
             end := start + 3;
             IF start < 0
               THEN start := 0;
                    end := 3;
             FI;
             IF end > N * PARTS
               THEN start := N * PARTS - 3;
                    end := N * PARTS;
             FI;
      FI;

      phi_plus_pi_2_times_number_of_waves_endint :=
        phi + PI_2 * number_of_waves * INT_END / SAMPLE_SIZE;
      phi_plus_pi_2_times_number_of_waves_startint :=
        phi + PI_2 * number_of_waves * INT_START / SAMPLE_SIZE;
      s_1e = sin(phi_plus_pi_2_times_number_of_waves_endint);
      s_1s = sin(phi_plus_pi_2_times_number_of_waves_startint);
      c_1e = cos(phi_plus_pi_2_times_number_of_waves_endint);
      c_1s = cos(phi_plus_pi_2_times_number_of_waves_startint);

      x_div_parts = start / PARTS;
      number_of_waves_mul_parts := number_of_waves * PARTS;

      remember_number_of_waves := number_of_waves;

      COMMENT Avoid division by 0
      IF number_of_waves = 0
        THEN number_of_waves := a very small number;
      FI;

      FOR x FROM start UPTO end - 1
        DO
          IF NOT (x := number_of_waves_mul_parts)
             AND NOT (x := (number_of_waves_mul_parts + 1))
             OR number_of_waves_mul_parts > PARTS
            THEN x_div_number_of_waves := x_div_parts / number_of_waves;
              mul := multiply / (PI * 
                (number_of_waves - x_div_parts * x_div_number_of_waves));

              bp_real[x] := mul *
                (c_1e * cosine_e[x]  -  c_1s * cosine_s[x]
                      + x_div_number_of_waves * (s_1e * sine_e[x]
                                              -  s_1s * sine_s[x]));
              bp_imag[x] := mul *
                (c_1e * sine_e[x]  -  c_1s * sine_s[x]
                      - x_div_number_of_waves * (s_1e * cosine_e[x]
                                              -  s_1s * cosine_s[x]));
            ELSE
              IF NOT ac
                THEN
                  real := 0;
                  imag := 0;
                  parts_mul_sample_size := PARTS * SAMPLE_SIZE;
                  parts_mul_number_of_waves := PARTS * number_of_waves;
                  rotate_count := 0;
                  wave_add_value := parts_mul_number_of_waves;
                  wave_current_value := phi / (PI_2 * parts_mul_sample_size);
                  IF wave_current_value < 0
                    THEN wave_current_value +:= parts_mul_sample_size;
                  FI;
                  IF wave_current_value >= parts_mul_sample_size
                    THEN wave_current_value -:= parts_mul_sample_size;
                  FI;

                  FOR u FROM 0 UPTO SAMPLE_SIZE - 1
                    DO
                      v := sine[wave_current_value];
                      real +:= v * cosine[rotate_count];
                      imag +:= v * sine[rotate_count];
                      rotate_count +:= x;
                      IF rotate_count >= parts_mul_sample_size
                        THEN rotate_count -:= parts_mul_sample_size;
                      FI;
                      wave_current_value +:= wave_add_value;
                      IF wave_current_value >= parts_mul_sample_size
                        THEN wave_current_value -:= parts_mul_sample_size;
                      FI;
                    OD
                  bp_real[x] := multiply * real / (SAMPLE_SIZE / 2);
                  bp_imag[x] := multiply * imag / (SAMPLE_SIZE / 2);
              FI;
          FI;

          bp_real[x] := -bp_real[x];
          bp_imag[x] := -bp_imag[x];

          x_div_parts +:= one_div_parts;
        OD;

      number_of_waves := remember_number_of_waves;

      IF NOT finished
        THEN
          v0 := bp_real[start  ] * bp_real[start  ] +
                bp_imag[start  ] * bp_imag[start  ];
          v1 := bp_real[start+1] * bp_real[start+1] +
                bp_imag[start+1] * bp_imag[start+1];
          v2 := bp_real[start+2] * bp_real[start+2] +
                bp_imag[start+2] * bp_imag[start+2];

          IF v0 > v1 AND v0 > v2
            THEN max_number_of_waves := start       / PARTS;
          FI;
          IF v1 > v0 AND v1 > v2
            THEN max_number_of_waves := (start + 1) / PARTS;
          FI;
          IF v2 > v0 AND v2 > v1
            THEN max_number_of_waves := (start + 2) / PARTS;
          FI;

          done := TRUE;
          ac := TRUE;
          IF max_number_of_waves > original_number_of_waves
             AND number_of_waves <= original_number_of_waves
             AND number_of_waves > 0
             AND NOT direction_up
            THEN ac := FALSE;
                 number_of_waves -:= .5 / PARTS;
                 phi +:= PI * .5 / PARTS;
                 IF number_of_waves >= .5 / PARTS
                   THEN done := FALSE;
                        direction_down := TRUE;
                 FI;
          FI;
          IF max_number_of_waves < original_number_of_waves
             AND number_of_waves >= original_number_of_waves
             AND NOT direction_down
            THEN ac := FALSE;
                 number_of_waves +:= .5 / PARTS;
                 phi -:= PI * .5 / PARTS;
                 IF number_of_waves <= N * PARTS - .5 / PARTS
                   THEN done := FALSE;
                        direction_up := TRUE;
                 FI;
          FI;
      FI;
    OD;

  store number_of_waves * PARTS;
  store amplify * sin(phi);
  store amplify * cos(phi);

A.3  Transformation: Partial peaks to total peaks

COMMENT Constant values
CONST PARTS = [1/( a)]



COMMENT This procedure combines corresponding peaks to one larger peak
PROC trans_pp_2_tpc(list_of_peaks)
  WHILE NOT ((pp := next(list_of_peaks)) = LAST_PEAK)
    DO
      tp_write := pp.position + 1;
      tp_read := tp_write_at;

      IF NOT tp_used[tp_read]
         AND (tp_used[tp_read - 1] OR tp_used[tp_read + 1])
        THEN
          IF tp_total[tp_read - 1] > tp_total[tp_read + 1]
            THEN tp_read -:= 1;
            ELSE tp_read +:= 1;
          FI;
          tp_real[tp_write] := tp_real[tp_read];
          tp_imag[tp_write] := tp_imag[tp_read];
          tp_used[tp_write] := TRUE;
          tp_total[tp_write] := tp_total[tp_read];
          tp_position[tp_write] := tp_position[tp_read];

          tp_real[tp_read] := 0;
          tp_imag[tp_read] := 0;
          tp_used[tp_read] := FALSE;
          tp_total[tp_read] := 0;
          tp_position[tp_read] := 0;
      FI;

      tp_strength := sqrt(pp.real * pp.real + pp.imag * pp.imag);

      tp_real[tp_write_at] +:= pp.real;
      tp_imag[tp_write_at] +:= pp.imag;
      tp_used[tp_write_at] := TRUE;
      tp_total[tp_write_at] +:= tp_strength;
      tp_position[tp_write_at] +:= tp_strength * (pp).position;
    OD

  FOR count FROM 0 UPTO N * PARTS + 2 - 1
    DO
      IF tp_used[count]
        THEN tp_position[count] /:= tp_total[count];
             tp_position[count] /:= PARTS;
      FI;
    OD;

  store tp_position[];
  store tp_real[];
  store tp_imag[];

Appendix B
A sample implementation

In this chapter, the components of a sample implementation of the described analysis and transformation steps are described.

The implementation is not fast enough to run real-time (for normal computers). To still get an idea of what a real-time display would be like, it is possible to combine a number of produced images to some kind of 'film', if possible including the original sound. (It is possible to use moving GIFs, but then the sound is left out.) Note that for testing in a real environment, the implementation needs to run a lot faster. This can partially be achieved by using pipelining (different steps of the transformation can be done simultaneously; after completion the resulting data is passed through the pipeline to the next step), although this causes a small delay (if 10 transformations are used, then the image for an interval is not shown before 10 new intervals have passed). Using a special purpose chip can speed up the calculation also, but it would have to be designed and produced first.

Some values that are used in the descriptions in this chapter are:
sample_size: The number of values in the sample interval.
N: The number of (whole) values in the Circular Fourier Transformed.
a: The step size with which the Circular Fourier Transformed is calculated.

B.1  File formats

In this section, the file formats that are used in the implementation are described. Also, different stages in an example analysis are shown.

Currently, all values that are used in the VOS package are of the same type, which will be referred to as Rvos. This type corresponds to the float type that can be found in many programming languages. In the file formats that are described here, the values are represented as floats (like 1.0632, -5426.7642 or 156). No other representations are allowed; other programs that communicate with VOS through these files or that replace VOS programs need to use the same representation.9

To make things clearer, the same example will be used throughout this section. This example is a sample interval, which consists of the following sinusoids:
A sinusoid with 4.25 waves in the interval, amplitude 40.
A sinusoid with 8.5 waves in the interval, amplitude 30.
A sinusoid with 12.75 waves in the interval, amplitude 20.
A sinusoid with 17 waves in the interval, amplitude 10.
sample_size is set to 250, N is set to 125, a is set to [1/ 16].

B.1.1  The audio file format (.au)

This format is used to store sample intervals.

File buildup

.au file =
[
sample0
sample1
sample2
:
samplesample_size - 1
],
with samplek Î Rvos.

Example file

x_au.gif

  

[
 52.84521 
 49.5074 
 44.47116 
 38.84552 
 33.58953 
 29.39153 
 26.6042 
 25.24154 
 25.03303 
 25.52038 
 26.17635 
 26.52206 
 26.22152 
 25.13728 
 23.33978 
 21.07197 
 18.67947 
 16.52277 
 14.89087 
 13.93461 
 13.63336 
 13.80148 
 14.13257 
 14.27141 
 13.89789 
 12.80442 
 10.94923 
 8.47304 
 5.67362 
 2.941488 
 .6685607 
-.852634 
-1.518153 
-1.470504 
-1.110286 
-1.054626 
-2.045891 
-4.822564 
 .
 .
 .
]

B.1.2  The Circular Fourier Transform file format (.cf)

This format is used to store the results of a Circular Fourier Transform.

File buildup

.cf file =
[
real0
imag0
reala
imaga
real2a
imag2a
:
realN - a
imagN - a
],
with realk Î Rvos,
imagk Î Rvos,
(F)(na) = realna + i   imagna, n Î 0, 1, ..., [(N)/( a)]-1.

Example file

x_xf.gif

  

[
3.774420
0.000000
3.700074
0.320861
3.490409
0.579477
3.183239
0.724500
2.834301
0.724533
2.507613
0.573791
2.264318
0.293380
2.151911
-0.072042
2.195668
-0.461722
2.393783
-0.809177
2.717085
-1.053726
3.113471
-1.151291
3.516401
-1.082570
3.856104
-0.857179
4.071664
-0.512922
4.121960
-0.110208
3.993578
0.277600
3.704263
0.576404
3.301174
0.724165
 .
 .
]

B.1.3  The partial peaks file format (.pp)

This format is used to store the found partial peaks.

File buildup

.pp file =
[
position0
real0
imag0
position1
real1
imag1
position2
real2
imag2
:
],
with positionk Î Rvos,
realk Î Rvos,
imagk Î Rvos,
realk + i   imagk is a partial peak at position a  positionk.

Example file

x_pk.gif

  

[
68.000000
0.754889
2.401091
68.000000
0.702137
2.249803
68.000000
0.652960
2.108081
68.000000
0.607117
1.975318
68.000000
0.564575
1.850896
136.000000
1.738537
0.648327
68.000000
0.519747
1.734613
136.000000
1.631269
0.600671
68.000000
0.478315
1.625690
136.000000
1.530611
0.556528
68.000000
0.440028
1.523656
 .
 .
]

In the image, the partial peaks are displayed in the order that they were found, from bottom to top. The width with which a partial peaks is drawn corresponds to its strength (this was done to spare some space). This way the way that the algorithm that finds the partial peaks works can be seen: Each time, the highest peak is found and reduced a little. Therefore, the leftmost peak (which is the highest), is the first to be found, and it is found several times before another peak occurs in the list. The peak that has the least strength, the rightmost peak, is the last to occur in the list. Also, one can see that the strengths of each following partial peaks diminishes rapidly compared to the latter one. Note that especially for the leftmost total peak, the positions of the partial peaks that are found are not precisely the same.

B.1.4  The total peaks file format (.tp)

This format is used to store the found total peaks. There are three different subtypes:
In an added total peaks-file (.tpa), only partial peaks that have exactly the same position are added to a total peak (2.3).
In a close total peaks-file (.tpc), all partial peaks that are very close to each other are added (only small changes for the peak position are allowed for each subsequent partial peak that belongs to the same total peak), which gives more precise peak positions and strengths (2.3).
Fourier total peaks-files (.tpf) are different from the other two file types. In .tpf-files, standard Fourier-transforms are stored.

File buildup

.tp file =
[
position0
real0
imag0
position1
real1
imag1
position2
real2
imag2
:
],
with positionk Î Rvos,
realk Î Rvos,
imagk Î Rvos,
realk + i   imagk is a total peak at position positionk.

Example file: .tpa file

x_wp.gif

  

[
4.187500
0.116187
0.442046
4.250000
9.812099
37.533306
4.312500
-0.050630
0.580375
8.500000
28.276247
9.407896
12.750000
15.390075
-12.444594
17.000000
-1.041886
-9.748543
]

Example file: .tpc file

x_wp2.gif

  

[
4.187500
0.116187
0.442046
4.250925
9.761469
38.113678
8.500000
28.276247
9.407896
12.750001
15.390075
-12.444594
16.999998
-1.041886
-9.748543
]

The difference between the two methods can be seen by looking at the leftmost peak. The partial peaks that correspond to this total peak were not exactly the same. Due to this, this peak is split into three different peaks in the .tp-file. The result in the .tpc-file is not perfect either, but at least one of the three peaks is added to the largest one in this case. The differences between these two methods are often greater.

Example file: .tpf file

x_wpf.gif

  

[
0.000000
1.887210
0.000000
1.000000
3.993578
0.277600
2.000000
4.834719
0.762332
3.000000
7.452860
2.161197
4.000000
32.167206
16.499331
5.000000
-9.109533
-8.576137
6.000000
-3.034911
-6.121105
7.000000
-0.806739
-7.597823
8.000000
3.738009
-19.256525
9.000000
-8.171220
17.218431
10.000000
-4.348225
5.252337
11.000000
-4.311576
2.849575
12.000000
-7.261050
1.532696
 .
 .
 .
]

B.1.5  The tone file format (.tn)

This format is used to store a tone, a base frequency with all its harmonics.

File buildup

.tn file =
[
base_frequency
strength_base_frequency
strength_harmonic1
strength_harmonic2
strength_harmonic3
strength_harmonic4
:
],
with base_frequency and strength_harmonick (and strength_base_frequency) Î Rvos.

Example file

[
4.167447
39.800907
29.800245
19.791977
9.804061
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
 .
 .
 .
]

B.1.6  The block positions file format (.bp)

This format is used to store the positions of the blocks in the block display.

File buildup

.bp file =
[
base_frequency
block1_from_x
block1_to_x
block1_level
block1_strength
block2_from_x
block2_to_x
block2_level
block2_strength
block3_from_x
block3_to_x
block3_level
block3_strength
:
],
with base_frequency and blockk_from_x, blockk_to_x, blockk_level_x and blockk_strength Î Rvos.

Example file

[
4.167447
-16.500000
15.500000
0.000000
39.800907
-33.000000
-1.000000
1.000000
29.800245
-1.000000
31.000000
1.000000
19.791977
-49.500000
-25.500000
2.000000
9.804061
]

B.1.7  The pixels file format (.px)

This format is used to store pixel positions and values. .px-files are the common format. Two subtypes are .pxs-file for pixels in the spiral display, and .pxb-files for pixels in the block display.

File buildup

.px file =
[
pixel1_x
pixel1_y
pixel1_value
pixel2_x
pixel2_y
pixel2_value
pixel3_x
pixel3_y
pixel3_value
:
],
with pixelk_x, pixelk_y and pixelk_value Î Rvos. Both the x and y value can also be negative. Higher values for x means a position further to the right, higher values for y mean a position further downwards.

Example file: .pxs-file

x_pxs.gif

  

[
39.000000
135.000000
26.269814
38.000000
136.000000
78.809441
39.000000
136.000000
65.674530
40.000000
136.000000
39.404720
41.000000
136.000000
13.134907
37.000000
137.000000
144.483963
38.000000
137.000000
118.214157
39.000000
137.000000
105.079254
40.000000
137.000000
78.809441
41.000000
137.000000
52.539627
42.000000
137.000000
26.269814
43.000000
137.000000
13.134907
 .
 .
]

Example file: .pxb-file

x_pxb.gif

  

[
43.000000
14.000000
188.686920
44.000000
14.000000
188.686920
45.000000
14.000000
188.686920
46.000000
14.000000
188.686920
47.000000
14.000000
188.686920
48.000000
14.000000
188.686920
49.000000
14.000000
188.686920
50.000000
14.000000
188.686920
51.000000
14.000000
188.686920
52.000000
14.000000
188.686920
53.000000
14.000000
188.686920
54.000000
14.000000
188.686920
 .
 .
]

B.1.8  The wave file format (.wav)

This format is used to store sample data. It is used to communicate with other (non-VOS) programs.

File buildup

.wav file =
[
sample0
sample1
sample2
:
],
where samplek is a 16-bit word, stored in two bytes, in the order LSB:MSB. The file must be monaural, and sampled at a fixed sample rate (44.1 kHz is suggested but not obligatory).

Example

A well-known example for this format is the RIFF Wave-format (MS Windows .WAV-files), if no compression is used and the file stored is 16-bit and monaural. This file has a 22 words (44 bytes) header, followed by data in the format that was described above. (This means that files of this type can used, if the first 22 words can be discarded. See B.2.1.)

In order to keep things easy, the name wave file will be used throughout the rest of this appendix. Note however that these file do not have a header.

B.1.9  The image file format (.tga)

This format can be used to store images. It is used to export resulting images to other (non-VOS) programs. Any program that can read 24-bit color Targa-files can be used to view and use these image files.

File buildup

See Targa documentation ([Twe97]).

B.2  Forward transformations

In this section, the programs that perform a forward transformation are described. With forward transformation is meant here, that it is a transformation from the direction of the original sample data towards the direction of an image.

These programs can be called from the prompt of the operating system. This is tested only for DOS and UNIX, but it will probably also work for other operating systems (and if not, another solution can be provided, see B.6.1).

The programs are described in the order in which they will usually be called during a forward transformation.

B.2.1  Wave to audio format (wav_au__)

This transformation is used to transform a part of a wave file (B.1.8) to an audio format file (B.1.1).

Usage

wav_au__ <sample_size> <N> <PARTS> <input_file> <output_file> <start_pos>

  .<sample_size>     is the number of values in the resulting audio file.

.<N>               is the number of whole values in the Circular Fourier Transform (not used).

.<PARTS>           is the number of values between two standard Fourier Transform values, [1/( a)] (not used).

.<input_file>      is the name of the wave file that is read from.

.<output_file>     is the name of the audio file that is written to.

.<start_pos>       is the (word) position in the input file at which to start reading.

This transforms <sample_size> words of the wave file <input_file>, starting at (word) position <start_pos>, to the audio file <output_file>.

Note: If the input file has a header, this should be skipped. This can be done by using the right value for <start_pos>. For example for RIFF Wave-files, which have a 22 words header, 22 should be added to the wanted start position in the sample data to skip the header.

Examples

wav_au__ 1000 500 8 test1.wav test1_00.au 22
wav_au__ 1000 500 8 test1.wav test1_01.au 1022
wav_au__ 1000 500 8 test1.wav test1_02.au 2022
wav_au__ 1000 500 8 test1.wav test1_03.au 3022

wav_au__ 250 125 16 test2.wav test2_00.au 22
wav_au__ 250 125 16 test2.wav test2_01.au 272
wav_au__ 250 125 16 test2.wav test2_02.au 522

B.2.2  Wave to audio format (au__cf__)

This transformation is used to transform an audio file (B.1.1) to a Circular Fourier Transform file (B.1.2).

Usage

au__cf__ <sample_size> <N> <PARTS> <input_file> <output_file>

  .<sample_size>     is the number of values in the original audio file.

.<N>               is the number of whole values in the Circular Fourier Transform.

.<PARTS>           is the number of values between two standard Fourier Transform values, [1/( a)]. If this value is 1, the result is a standard Fourier Transform.

.<input_file>      is the name of the audio file that is read from.

.<output_file>     is the name of the Circular Fourier Transform file that is written to.

This transforms the audio file <input_file> of size sample_size to its Circular Fourier Transform, which is calculated from position 0 to N, with step size [1/( PARTS)], and stores the result in the Circular Fourier Transform file <output_file>.

Examples

wav_au__ 1000 500 8 test1_00.au test1_00.cf

wav_au__ 250 125 16 test2_03.au test2_03.cf

B.2.3  Fourier Transform to partial peaks format (cf__pp__)

This transformation is used to transform a Circular Fourier Transform file (B.1.2) to a file containing a list of partial peaks (B.1.3).

Usage

__pp__ <sample_size> <N> <PARTS> <input_file> <output_file>@
<leftover_file> <precision|-halt_value>

  .<sample_size>     is the number of values in the original audio file.

.<N>               is the number of whole values in the Circular Fourier Transform.

.<PARTS>           is the number of values between two standard Fourier Transform values, [1/( a)].

.<input_file>      is the name of the Circular Fourier Transform file that is read from.

.<output_file>     is the name of the partial peaks file that is written to.

.<leftover_file>   is the name of the Circular Fourier Transform file that the leftover Circular Fourier Transform data is written to.

.<precision|-halt_value>  determines at what moment the finding of partial peaks is ended.
precision Is a value between 0 and 1. The first found (and usually strongest) peak strength is multiplied by this value. As soon as a peak strength is found that is below this strength, the search is abandoned.
halt_value Is a value above 0 (preceded by a - to separate it from precision). As soon as a peak strength is found that is below this value, the search is abandoned.

This takes the Circular Fourier Transform of size sample_size, which is calculated from position 0 to N, with step size [1/( PARTS)] from file <input_file>, finds partial peaks until the required precision or halt_value is reached, and then stores the result in the partial peaks file <output_file>. The Circular Fourier Transform data that is left over (finding peaks is stopped before all values in the transform are reduced to 0) is written to the Circular Fourier Transform file <leftover_file>.

Examples

cf__pp__ 1000 500 8 test1_00.cf test1_00.pp test1_00.cf_ .01

cf__pp__ 1000 500 8 test1_00.cf test1_00.pp test1_00.cf_ -200

cf__pp__ 250 125 16 test2_03.cf test2_03.pp test2_03.cf_ .0025

Index (showing section)

absolute value, 21
added total peaks, 53
amplitude, 25
area, 60

backward transformation, 125

base frequency, 60
     finding, 62
block display, 60, 61, 64
     level, 64
     perceptive frequency, 65
     placement, 65
     shifting, 65
         logarithmic function, 65
     square, 64

centre of gravity, 12

centre of the circle, 10
Circular Fourier Transform, 19, 85, 87
     absolute value, 21
     disadvantage, 54
     formula, 42, 51
     speed, 54
close total peaks, 53
complex space, 30
continuous, 19, 49

deflection, 13

directionless transformation, 130
discrete, 10
display
     block, 60, 61, 64
         level, 64
         perceptive frequency, 65
         placement, 65
         shifting, 65
         square, 64
     rectangle, 58
         disadvantage, 58
     spiral, 57, 59
         music, 57
         octave, 57
displaying, 57
DOS, 147

ear, 57, 60

elastic, 12
eye, 57, 60, 64, 65

fading, 28

file format, 96
     .au, 97
     .bp, 105
     .cf, 98
     .pp, 99
     .pxb, 106
     .pxs, 106
     .px, 106
     .tga, 109
     .tn, 104
     .tpa, 101
     .tpc, 101
     .tpf, 101
     .tp, 101
     .wav, 108
     audio, 97
     block positions, 105
     Circular Fourier Transform, 98
     image, 109
     partial peaks, 99
     pixel, 106
     Targa, 109
     tone, 104
     total peaks, 101
     wave, 108
finite, 13
finite signal interval, 13
forward transformation, 110
Fourier Transform, 9, 10
     Circular, 19
         absolute value, 21
     discrete, 10
         simplified model, 20
     highest value, 31
     problems, 16
frequency
     area around, 60
     base, 60
         finding, 62
     determination, 24
     harmonic, 60
     multiple, 60
     perceptive, 65
     shifting, 60
frequency space, 9
function, 140

harmonic, 60

hearing, 59
human ear, 57, 60
human eye, 57, 60, 64, 65
human hearing, 59

identifier word

     arguments, 145
     help, 145
     nowarn_shell, 145
     remark, 145
     script, 145
     shell, 145
     vos, 146
image, 25
implementation, 95
induction, 13
infinite, 13
integral, 49
interval
     signal, 10
         finite, 13
iteration, 52

level, 64

line variable, 148
linear placement, 58
logarithmic function, 65
lowest point, 20

main peak, 24

multiple rotations, 12
music, 57

noise, 33

     recognition, 54
     separation from other sounds, 54
     splitting off, 54
Nyquist Theorem, 12

octave, 57

     fluent transition, 59
     following, 58
     preceding, 58
     wrong, 61, 64
operating system dependency, 147
overlapping, 10

partial peak, 32, 87, 93

peak, 20
     main, 24
         top, 24
     partial, 32, 87, 93
         adding, 53
     position, 32
         precise determination, 53
     spreading, 54
     surrounding, 28
     total, 93
         added, 53
         close, 53
perceptive frequency, 65
periodical, 10
periodical signal, 10
phase, 13
phase angle, 13
phoneme, 64, 66
program
     add, 153
     crlf2spc, 154
     divide, 153
     fillin, 153
     max, 153
     min, 153
     multiply, 153
     subtract, 153
projection, 12
pseudo-code, 42

rectangle display, 58

     disadvantage, 58
removal
     method R1, 29
reverse average, 28
rotation, 10
     multiple, 12

sample, 85

sample implementation, 95
sample rate, 13
sample time, 13
script, 144
     add_file.vos, 148
     arg_line.vos, 149
     cat.vos, 149
     copy.vos, 149
     current.os, 147
     delete.vos, 149
     erase.vos, 149
     move.vos, 150
     perform.vos, 149
     print.vos, 150
     run_file.vos, 149
     s_cat.vos, 150
     s_copy.vos, 150
     s_delete.vos, 150
     s_erase.vos, 150
     s_move.vos, 150
     set.vos, 149
     set_line.vos, 149
     set_perf.vos, 149
     tmp.os, 148
     txt_file.vos, 149
script file, 145
Short Term Fourier Transform, 19
signal, 9
     adding, 16
     periodical, 10
signal interval, 10
simplification, 13
smearing, 18
space
     complex, 30
     frequency, 9
     time, 9
spiral display, 30, 57, 59
     music, 57
     octave, 57
square, 64
step function, 21
surrounding peak, 28

table, 41, 52

temporary directory, 147
time space, 9
tone, 60
     finding frequencies, 60
     unstable, 61
     visual representation, 64
top, 24
total, 93
total peaks
     added, 53
     close, 53
tradeoff, 9
transformation, 110, 125, 130
     add__au, 131
     add__tp, 135
     au__cf__, 113
     au__wav_, 129
     bp__pxb_, 123
     brdr_au, 132
     brig_px, 136
     cf__pp__, 114
     cf__tpf_, 116
     decr_px, 137
     getabopx, 141
     getmaxpx, 142
     getnumpx, 143
     incr_au, 133
     limi_px, 138
     move_px, 139
     mult_au, 134
     pp__tpa_, 117
     pp__tpc_, 118
     px__tga_, 124
     tn__bp__, 122
     tn__tp__, 126
     tp__au__, 127
     tp__pxs_, 119
     tp__tn__, 120
     wav_au__, 111
     add audio files, 131
     add constant value to audio files, 133
     add total peaks files, 135
     audio to Circular Fourier Transform, 113
     audio to wave, 129
     backward, 125
     block positions to block pixels, 123
     change pixel brightness, 136
     Circular Fourier Transform to Fourier total peaks, 116
     Circular Fourier Transform to partial peaks, 87, 114
     decrease pixel brightness, 137
     directionless, 130
     forward, 110
     limit pixels, 138
     maximum pixel brightness value, 142
     move pixels, 139
     multiply samples by a constant value, 134
     number of pixels, 143
     number of pixels with a brightness above a certain value, 141
     partial peaks to added total peaks, 117
     partial peaks to close total peaks, 118
     partial peaks to total peaks, 93
     pixels to Targa, 124
     sample to Circular Fourier Transform, 85
     smooth borders to audio files, 132
     tone to block positions, 122
     tone to total peaks, 126
     total peaks to audio, 127
     total peaks to spiral pixels, 119
     total peaks to tone, 120
     wave to audio, 111

uncertainty principle, 9

unfolding, 46
Unix, 147
unstable tone, 61

variable, 148

vector, 10
VOS sample implementation, 95

wave, 13

wave-deviation, 20
wave-deviation-deflection, 21
white noise, 33
wrong octave, 61


Footnotes:

1 For m = 0, xm = ¥, so [(2p)/( xm)] = 0. Therefore this division by zero causes no problems.

2 ëq û is q rounded downwards, éq ù is q rounded upwards.

3 Nyquist Theorem:
Accurate reproduction of a signal demands that the sampling frequency be at least twice the rate of the highest frequency in the source signal.

4 A wave is defined here as a one period of a sinusoid. The number of waves within an interval is [(fn)/( r)], where f is the frequency of the sinusoid (in number of waves/second), r is the sample rate (the number of measured values/second) of the signal, and n is the number of measured values within the interval. The sample time is T = n/r.

5 This is the reason why I assumed that this transform was not equal to another special Fourier Transform

6 One could argue that this could be postponed until a real time version of the program needs to be written. This would however make it almost impossible to use the current program for even the smallest tests, so it really needs to be done now.
One could also argue that it is also possible to gain speed by using a faster computer or maybe a different programming language. However, to get a good idea of the properties and possibly problems of displaying sounds, it is also necessary to calculate series of images for longer signals, which would still take quite a large amount of time. (For example, a speedup of a factor ten is unsufficient is hundreds of images need to be calculated. By deriving and simplifying a formula, far greater gains can be obtained.

7 The values -1/2 and M - 1/2 have been chosen arbitrarely and give good results. Whether these results are exact or only a good estimate, is not really important here and is not discussed any further. Note that if the values 0 and M - 1 are chosen, a better optimisation is possible. The results however get a lot worse, so these values cannot be used.

8 This could have been corrected before, but since the program gets bigger because of this and it does not affect the optimising, it has been postponed.

9 If it is really necessary or if a speedup is required, it is possible to change this format by changing only one procedure and one function in the file vos_io.c.


File translated from TEX by TTH, version 2.24.
On 23 May 1999, 22:05.