Audio Glasses Visualizing sound for the deaf
DOWNLOAD THESIS AS A POSTSCRIPT FILE (zipped, 4.3 MB) (PREFERRED) - Warning: Size after unzipping is about 160 MB or READ THESIS ONLINE (HTML version) Symbols in the HTML version may be displayed incorrectly on some systems. If you are capable of viewing PostScript files, download the PostScript version! The HTML-file was automatically generated from a LaTeX-file by the program TTH, version 2.24. DEMO .AVI FILES RINGING DOORBELL VOICE SAYING "SHT!" VOICE SAYING "OI UA" VOICE SAYING "ONE MAN LIVING ON AN ISLAND" INSTRUMENTAL MUSIC Not available online due to copyright issues In order to understand these AVI films, take a look at the legenda: PART 1 and PART 2. DOWNLOAD THE SOFTWARE MS-DOS executables C-sources (tested for DOS, Unix and Linux with GNU C-compiler). Please send your comments to hans.van.zutphen@home.nl |
Vision on Sound
Development of a visual representation of auditive information for the deaf
MASTER THESIS IN COMPUTER SCIENCE, NUMBER 446
SEPTEMBER 11, 1998
Supervisor: Dr.Ir. Huub van Thienen University of Nijmegen, the Netherlands |
Hans van Zutphen 9305491 hansvz@sci.kun.nl |
Several solutions for some of these problems have been developed.
For example, there are devices that show a signal as a light that can
be placed somewhere in the house. But this is only possible if the
source of the signal can be tapped from the source so a light can be
connected to it. It is not a solution if someone is in another house or on the
street.
The consequences of this are that, in spite of these solutions, people with
hearing problems still face more danger on the street. Also,
as mentioned above, it is still impossible for them to hear whether there is
someone close to them, which can make it frightning to go outside,
especially when it's dark.
In some countries deaf people have even been shot by the police, because
they did not respond after something was shouted at them.
To solve these and other problems, a different approach is necessary.
A solution that is already available is surgery: A special device is 'plugged'
into a part of the brain. The idea is that a part of the brain starts
recognising the incoming signal.
For this, a five-year training is necessary,
during which the patient learns to observe things like the difference
between music and speech - without being able to recognise much of
it. Very few of such operations are performed. The mere fact that
people are willing to have an operation like this and to train for
five years, shows how important even a small improvement in registering
sounds is to them - and how poorly other solutions perform.
In this thesis, a way of displaying auditive information (sounds), such that as much information as possible about the original signal can be seen in it, is developed. Eventually this could be placed in for example a pair of glasses, or in a watch or bracelet. Hopes are that it is possible to learn to recognise the images in the display in the same way as this is done after the surgery. The results might be better than with the surgery since the user can also try to understand what he or she sees, and several displays can be shown simultaneously. But even if the results are a lot worse, the threshold to use it will be much lower than that to have surgery, which gives this an important advantage over the other method.
Since the software that is to be written is used for evaluation purposes only, some real-time and other requirements are left out in this thesis. If the software is ever to be used in a real environment, time and other constraints need to be added.
A complication is that sounds can change very rapidly. Therefore it is
necessary to analyze very short pieces of sound, for example pieces
of 0.02 seconds.
Unfortunately, according to the
uncertainty principle
([Pol97]),
there is always a tradeoff
between the precision with which a
frequency
can be determined and the precision with which the time at which that
frequency occurs can be determined. Since the size of the signal intervals
is very small (for example 0.02 seconds), the time at which a frequency
occurs is determined with a rather high precision.
Therefore, the precision with which the frequency can be determined
is rather small.
In this chapter, a new signal-processing technique is developed from
the well-known Fourier Transform
([Pee95], [Sch94]).
The target of the Fourier Transform is to transform the signal to
a different space (frequency space
instead of
time space
),
with the possibility of a backward transform back to time space.
Note that the latter is not important for displaying the signal in
frequency space.
Therefore, the analysis that is to developed here does not need to
be reversible.
The required properties for the analysis that is to be developed here, are:
To develop a new analysis with these properties, it is necessary to use a different description of the Fourier Transform than what is usually used. This description will be given in the next section.
In the following description, it is assumed that there is
a finite signal interval
, that
is a finite sequence of samples, of an even size n.
An often used solution to calculate the Fourier Transform of such a signal
is to consider it as being periodical
(each position p + n corresponds
to position p).
This way, a finite signal interval can be seen as an infinite periodical
signal
with period n.
For each value xm = n/m, m Î 01, 1, 2, .. 1/2n,
do the following:
Step through the signal with step size 1.
Also, rotate around the circle, with steps of [(2p)/( xm)] for each
step through the signal.
For each sample in the infinite signal and corresponding direction to a position on the circle,
draw a vector
from the centre of the circle
to this direction with as strength
the sample value at the current position in the signal.
For x1 this means that there is exactly one rotation
around the circle for
any signal interval [p, p+n (which consists of n elements).
Therefore, the direction of the vector for position p+n of the signal
is always identical to that for position p.
For x2 the same goes for any partial signal interval [p, p+n/2 (so
for values p + 1/2n and p); for xm this is
(approximately, see below) true for any (partial) signal interval [p, p+n/m .
Note that partial intervals do not always overlap
precisely.
Small shifts for interval parts [p, p + n/m (n
and m as defined above) exist if n is not dividable by m.
In such cases, for each a Î N, the direction for position p+[(an)/( m)] of the
interval on the circle is always in
[p + ë[(an)/( m)]û, p + é[(an)/( m)] ù].2
If [(an)/( m)] is a whole number, the direction for position
p corresponds precisely with that for p + [(an)/( m)]. For the
total interval
Èi = 0m - 1 [p+[(in)/( m)], p+[((i+1)n)/( m)] > = [p, p+n ,
[(an)/( m)] is n (since a = m), which is a whole number,
so for any valid value of m, the vector for the value at position p+n of
the signal interval is always identical to that for position p.
For any value xm = n/m, m Î 0, 1, 2, .. 1/2n, the
vectors for the signal interval [p, p+n are placed
in exactly m rotations around the circle.
For m = 1/2n, each subsequent vector for the corresponding
subsequent sample from
the interval is directed exactly opposite to the preceding one.
(x1/2n = 2, so the rotation per step is
[(2p)/ 2] = p, exactly a half circle.)
If m is larger than 1/2n, that would mean that the size of
the rotation around the circle would in fact get smaller, and in the
opposite direction. The values that would be obtained that way are in
fact identical to those that are already known from smaller values of
m, so it is not necessary to try larger values than 1/2n.
This corresponds to the Nyquist Theorem
3 ([RMY]).
Instead of drawing vectors, which can get very complicated, the idea can
be displayed graphically in another way (although the idea stays the same).
Imagine the signal interval to be a piece of elastic
.
Assume for a moment that the values of all samples in the sequence are
between -1 and +1.
Draw each sample value on the elastic. Use white for +1, black for -1,
and levels of gray for values in between.
Now, the light values on the elastic form an image of the sample values.
Next, the elastic can be wrapped around the circle, as is displayed (for
one rotation) in fig. 2.1.
Note that in this image, the idea of considering the interval as
periodical is expressed very clearly.
Multiple rotations
can be obtained by stretching the elastic further.
To keep the image clear, each subsequent rotation is drawn a bit more to
the outside, so the rotations can be separated from each other.
(It might seem that the interval is shown as being limited and non-periodical.
In fact all rotations should be projected
at the same place, in which case
the starting point and the end point of the interval would connect.)
An example of this is shown in fig. 2.6.
Note that in the images values on the piece of elastic are shown continuously,
while there should be a finite number of vectors.
This is a simplification for clarity of the images; in calculations and
formulas a finite number of vectors is used.
Since the piece of elastic stands for a number of vectors that
originate from the centre of the circle, it is possible
to determine the centre of gravity
of these vectors.
This is defined here as the average of the vectors (the sum of all
vectors divided by the number of vectors).
In the images this can be determined by the amount of light and
darkness that is present in each direction in the rotations of the piece of
elastic.
White stands for a vector in the current direction,
black stands for a vector in the opposite direction,
gray stands for a zero-vector.
In the images the centre of gravity is drawn as a little ball with
a circle around it.
The position of the centre of gravity in the drawn coordinate system
for m rotations is equal to the complex value of the mth value
of the Fourier Transform of the signal.
The distance to the centre of the circle corresponds to the amplitude
of the corresponding frequency in the signal.
Both the distance to the centre and the direction from the centre can
usually be determined very easily for simple signals.
In figures 2.2 - 2.9 is shown how the first value of
the Fourier Transform (with m = 1) can be calculated.
For other values of m it is a little more complicated (fig. 2.6),
but the principle stays the same.
In all the images, for the stretched signal (left) and the
signal as it is wrapped around the circle (right), the arrow starts and
ends at the same position.
In the description given so far, a very important simplification is possible.
So far, the signal was assumed to be infinite
and periodical,
and the number of rotations around the circle was also infinite
(although only a part of it was displayed).
It can be easily seen that the result is exactly the same if instead of
an infinite signal, only a finite
signal interval
[p0, p0+n is
wrapped around the circle.
This follows from the fact that for each xm, the vector for the value
of each position p is identical to that for position p + n.
By induction
this is also identical for each position p + r*n, r Î N.
The centre of gravity can therefore also be determined by wrapping only
one period around the circle.
This simplification
also follows from the Fourier-theory, and is used
for example in programs that calculate the Fourier Transform of a signal
of a finite size.
In the images, a number of properties can be seen.
Figure 2.2 shows that if a signal exists of exactly one
wave
4
,
this wave will produce a maximal
deflection
(the distance between the centre
of gravity and the centre of the circle)
for one rotation.
Figures 2.3 - 2.4 show that this is not the case
if the signal is a sinusoid with a (whole) larger number of waves.
In fig. 2.5 is shown, that when the wave starts at a different
position (so if the start phase
is different),
the deflection
has the same strength, but a different direction.
The direction of a value in the Fourier Transform is the phase angle
of the starting point of corresponding wave in
the interval. Note that this means that for determining the phase angle, the position at which
the interval starts is important.
In fig. 2.6 is shown what happens when the number of rotations
around the circle is identical to the number of waves within the
interval.
This is shown more elaborately in fig. 2.7.
In 1B and 2B is shown how the signal is wrapped around the circle
when the number of rotations is identical to the number of waves within
the interval.
Clearly, the light and dark areas are at the same direction for each rotation,
so they all add up.
Therefore, the deflection in this case is always maximal.
In 1A and 2A is shown what happens when the number of rotations
around the circle is one less than the number of waves in the interval.
As will be described in more detail below, each subsequent wave will be
projected [1/( number of waves)] rotation earlier,
which accounts for the fact that in the images the white and black
areas shift in each rotation, while the total shift is one rotation (the
end matches the beginning), contradictory to the rotation direction.
(If m waves are displayed in m-1 rotations, each wave's size is
[(m-1)/( m)] rotations. Each wave therefore gets 1/m
rotations less than an entire rotation.
There are m waves, so the total amount of missed rotations is
m * 1/m = 1.).
In every direction, the amount of white and black is identical. The deflection
is therefore 0.
In 1C and 2C is shown what happens when the number of rotations
around the circle is one less than the number of waves.
In this case, every subsequent wave is projected
[1/( number of waves)] rotation later, which accounts for
the fact that in the images the white and black areas shift in each rotation,
while the total shift is one rotation (the end matches the beginning),
but this time in the direction of the rotation.
Here also the amount of black and white in each direction is the same,
so the deflection is 0 again.
(The explanation is almost identical to that of 1A and 2A.)
In 1D and 2D is shown what happens when the number of rotations
around the circle is two less than the number of waves.
In this case, each subsequent wave is projected
[2/( number of waves)] rotations later, which accounts for
the fact that in the images the white and black areas shift in each rotation,
while the total shift is two rotations (the end matches the beginning
again, but so does the centre if the number of waves is even (2D)),
as before in the direction of the rotation.
(The explanation again is almost identical to that of 1A and 2A.)
As described before and visible from the images, for each number of
rotations that is not equal to the number of waves (if both are whole
numbers) is 0.
Figure 2.8 shows what happens when several signals are added
.
In 2.8, the signals from fig. 2.2 and 2.4
are added. The centre of gravity is the sum of the centres of gravity of
the two separate signals. This means that adding a signal with
centre of gravity 0 has no effect.
Figure 2.9 shows one of the problems of the Fourier Transform
.
Only frequencies in the signal that are such that the number of waves in
the interval with size n is a whole number, correspond with a value of
the Fourier Transform.
This means that only in these cases, a sinusoid in the signal is represented
by only one non-zero value in the Fourier Transform.
If the number of waves is not precisely a whole number, a sinusoid is
represented by multiple non-zero values in the Fourier Transform.
Also, these multiple non-zero values have smaller absolute values than the one
non-zero value in the first case.
A sinusoid with a number of waves that lies between m and m+1 will
cause lower values for xm and xm+1, instead of one higher value
at xm or xm+1 if the number of waves would have been exactly m or
m+1.
Also, for all values xi, the closer i is to the number of waves, the
higher the corresponding value in the Fourier Transform is. Therefore a rather
large area of the Fourier Transform around the number of waves will have
non-zero values that are rather high (this will be referred to as
smearing
).
This means that there can be huge differences between Fourier Transforms of
almost identical signals. For example, a signal with exactly 2000 waves looks
totally different from a signal with 2000.5 waves. A signal with 2001
waves looks almost identical to that of 2000 waves. This does not correspond
at all to the way the human ear experiences these sounds, so this is not
desirable.
The latter problem can be solved very easily.
Instead of using m = 0, 1, 2, ..., 1/2n rotations around the circle,
it is also possible to decrease the differences between subsequent values for
m.
Let a Î \ < 0, 1 , m = 0, a, 2a, ..., 1/2n,
then it is possible to see
between the values with a whole number of rotations.
In the example from the preceding paragraph:
For a signal with 2000.5 waves, the value for m = 2000.5 is identical
to the value for m = 2000 for a signal with 2000 waves.
If a¯ 0, the Fourier Transform becomes continuous.
This will be referred to as the Circular Fourier Transform
((F)).
Note that constructing this Circular Fourier Transform is not the same
as making the Fourier Transform continuous.
The construction that is used here, is made possible because of the
simplification that was given in the previous section: Instead of an
infinite signal, only a signal of a finite size (the interval [p, p+n )
is wrapped around the circle, which makes the number of rotations finite.
Without this simplification, for any number of rotations (for the interval)
that is not a whole number, each subsequent repetition of the (periodical)
signal would be projected shifted 2p*(m-ëm û) radials further
around the circle, where m is the number of rotations for the interval with
size n.
Due to this shifting and the fact that the number of rotations is infinite,
the centre of gravity would always be 0.
Only when 2p*(m-ëm û) = 0, so when m is a whole number,
each subsequent interval [p+n, p+2n is projected exactly on the previous
interval [p, p+n , so the centre of gravity is not drawn back to 0.
If the Fourier Transform was made continuous this way, any value between the
original Fourier Transform values would be 0.
Note that the simplification does not correspond to the definition of the
Fourier Transform. Therefore, the construction of the Circular
Fourier Transform implies that the definition of the Fourier Transform is
let go5, and instead a
simplified model of the discrete
Fourier Transform
on a finite signal is extended to a continuous transform.
A possible algorithm for calculating the Circular Fourier Transform is given in appendix
A.1.
If a Circular Fourier Transform of a signal consisting of one sinusoid is
drawn in a graph, another property becomes visible.
Let f0 be the frequency of the sinusoid (in number of waves per signal
interval with size n).
In the graph (fig. 2.10),
there is not only a large peak
at f0, but there are also smaller peaks
at all positions with distances
11/2, 21/2, 31/2, ... from f0, so at all positions
f0 + (1/2+ b) and f0 - (1/2+ b), b Î N+.
At positions f0 + b, b Î Z\{0} (so at all distances 1,
2, 3, ... from f0) there are lowest points
, with value 0.
The lowest points can be explained as follows.
Let f1 be a frequency, such that the peak for this frequency falls exactly
on a value in the Circular Fourier Transform. (In fact, this can be any
frequency within the range of the Circular Fourier Transform, if a is
small enough.)
For the value xm which holds this peak, m is identical to the number
of waves of this frequency within the interval with size n.
Each wave is projected around the circle the same way m times, which causes
a strong deflection. (Note that if m is not a whole number, the number of
rotations for a part of the circle, and therefore the number of vectors in
that direction, is one more than that for the rest of the circle.)
So, each wave is projected in exactly one rotation around the circle.
Now consider the situation for m + b, b Î N+, with m the number of
waves, m + b the number of rotations for the interval.
The projection of one of the m waves in m + b rotations for the entire
interval now takes [(m+b)/( m)] = 1b/m rotations.
Each of these waves now costs b/m rotations more than before.
(This value will be referred to as wave-deviation
).
This means that any subsequent wave will be projected around the circle
shifted b/m (the wave-deviation) rotations further.
Since there are m waves, the total wave-deviation is m * b/m = b.
Therefore the m waves are spread equally around the circle. These waves
extinguish each other completely, making the centre of gravity 0.
For some values for m and b this is shown in fig. 2.7.
Here, the number of waves corresponds to m, the step size or number
of rotations corresponds to m + b.
Note that for any value for b ¹ 0, the centre of gravity is 0, since
for any value for b the difference between the number of waves and the
number of rotations is a whole number.
This explanation is also valid for values of b below zero, in that case
with a wave-deviation in the opposite direction.
From the description, the value m corresponds to f0 in the graph.
Therefore, position m + b is shown in the graph as position f0 + b.
Now it also becomes clear why the centre of gravity is non-zero for the
other positions.
For example for b Î N+, at positions f0 + b and f0 + b + 1
the projections of the waves extinguish each other completely.
On the positions in between, this cannot be the case, since the
total wave-deviation b is not a whole number.
For example for b = 11/2, the signal will have a total wave-deviation
of 11/2. If the interval is projected until the total wave-deviation
is a whole, the centre of gravity is zero. But at this point, only two
thirds of the interval is projected. The last part has a wave-deviation
of only 1/2. The centre of gravity becomes non-zero, since it is not
extinguished to zero by a following part which would match precisely at the
end of the chosen interval, which would make the wave-deviation 1 again.
The graph 2.10 which shows the absolute value
of the Circular
Fourier Transform of a signal consisting of one sinusoid
that is derived here can be found in many texts
on the Fourier Transform ([Sch94], [Pee95]).
It corresponds to the Fourier Transform of a signal that exists of two
values: one non-zero value from position 0 until position p Î R+,
0 after this position,
for example the step function
h(x) = { |
A
x £ p 0 x > p |
Function h(x) can also be considered as a finite function ranging from position
0 to position p, consisting of a sinusoid with frequency 0, phase
(for example) 1/2p.
As the function H[n] is shifted to the right, the frequency within
the finite part of the corresponding function h(x) goes up (as always
in a Fourier Transform). By shifting H[n] pq position to the right,
the frequency goes up such that the number of waves within the finite
signal increases with q waves.
Therefore, each sinusoid within this finite signal can be described by
shifting the function H[n] to the right p times 'number of waves' positions.
The peak will then be at position p times the number of waves.
From combining this with the previously calculated graph (fig. 2.10),
it follows that the absolute value of the wave-deviation-deflection
,
the deflection caused by the wave-deviation, can be calculated as follows ([Sch94], [Pee95]):
wave-deviation-deflection |g| = |[(sin bp)/( bp)]|
Note that the limit for b ® 0 can be calculated:
|
g = [(b)/( |b|)] |[(sin bp)/( bp)]|cos((b-|b|)p- j) + i[(b)/( |b|)] |[(sin bp)/( bp)]|sin((b-|b|)p- j), (1)
with g = 1
iff b = 0.
The special case b = 0 will not be explicitly stated further on.
The proof that this formula describes the wave-deviation-deflection
accurately is omitted, since, as is shown later,
this is not the case.
In fig. 2.11 the relation between wave-deviation b and
wave-deviation-deflection g is shown.
The smearing of a peak in a Fourier Transform can now easily be explained
(fig. 2.13): If a sinusoid has exactly a whole number of waves m
within the interval with size n, only the mth value of the
Fourier Transform is non-zero, since all other whole step sizes correspond
to a lowest point with value 0 (m + 1, m - 1, m + 2, m - 2, ...).
(Note: The Fourier Transform consists of only those 1/2n values that
correspond to the values of the Circular Fourier Transform that are calculated
with a whole number of rotations.)
If a peak is precisely in the centre between two subsequent Fourier-values,
the effect of the smearing is maximal, since the surrounding Fourier-values
match precisely with the surrounding peaks
(m + 1/2, m - 1/2, m + 11/2, m - 11/2, m + 21/2, ...).
The previously defined wave-deviation-deflection g (1)
is in fact a function of the wave-deviation b:
gj(b) = [(b)/( |b|)] |[(sin bp)/( bp)]|cos((b-|b|)p- j) + i[(b)/( |b|)] |[(sin bp)/( bp)]|sin((b-|b|)p- j) (2)
Let s be a sinusoid with amplitude As, number waves in the interval
with size n ms (ms ³ 0).
The main peak now lies at ms.
The deflection at this position is As * gjs(0).
The phase at this position, which is also the phase at the starting point
of the interval, is js.
At position ms+b in the Circular Fourier Transform, the deflection is
As*gjs(b), since for any value for b, the Circular Fourier Transform
at position ms + b is calculated (due to the definition of g).
This can also be reversed.
A sinusoid s with ms waves in the interval, amplitude
As and phase
js has, as Circular Fourier Transform, an image
of the
wave-deviation function, shifted to the right m positions, multiplied
by As:
(F)(p) = As(gjs(p-ms)), 0 £ p £ 1/2n (3)
Now the following steps can be made:
(F)(p) = Asi(gjsi(p-msi)), 0 £ p £ 1/2n (4)
where (F)(p), 0 £ p £ 1/2n is the Circular Fourier Transform of the
signal interval with size n.
The Circular Fourier Transform of the total signal is the sum of the
Circular Fourier Transforms of all the present sinusoids (this addition
corresponds to that for normal Fourier Transforms):
(F)(p) = åi = 1q Asi(gjsi(p-msi)), 1 £ p £ 1/2n
(5)
By combining (2) with (5), the following formula can be found:
(F)(p) = åi = 1q Asi([(p-msi)/( |p-msi|)] |[(sin (p-msi)p)/( (p-msi)p)]|cos(((p-msi)-|p-msi|)p- jsi)
+i[(p-msi)/( |p-msi|)] |[(sin (p-msi)p)/( (p-msi)p)]|sin(((p-msi)-|p-msi|)p- jsi)),
1 £ p £ 1/2n
(6)
Now a method needs to be found to find a combination of si's, that describes the Circular Fourier Transform, such that there are as few si's as possible (since that is the easiest way to build the total signal). Note that if this requirement were left out, also the normal Fourier Transform or the Circular Fourier Transform would suffice.
One important problem has been ignored so far.
It is not defined yet what exactly should be calculated if the number of
rotations around the circle is not a whole number.
An important requirement for this is that any sinusoid in the signal should
have a Circular Fourier Transform that matches the formula that was given
above.
A first idea could be to just take the average of all the vectors in the
signal.
Nothing special is done with the fact that a part of the circle has a
higher number of rotations than the rest of the circle.
See for example fig. 2.14, 1A, 2A, 3A.
In all three images, the value for b (wave-derivation) is identical, so
the centre of gravity should be at exactly the same position.
It is not. This means that the results cannot correspond completely to
the function for g(b), and therefore not with function (6).
In fig. 2.15, for a lot of different frequencies the
difference between the (absolute) values of the Circular Fourier Transforms
and the expected results according to the formula is displayed.
Note that the difference is the highest for the lowest number of rotations.
Apparently, the given formula is not very reliable, especially for lower numbers
of rotations. This means that taking the average of the vectors
is probably not the right way to find the centre of gravity.
A first idea could be to fade the part of the circle that has the higher
number of rotations. If the number of rotations is m, then the part of the
circle that has the lowest number of rotations has ëm û rotations,
while the other part of the circle has ém ù rotations.
By multiplying the vectors in the part of the circle with the higher number
of rotations by [(ëm û)/( ém ù)], the importance
of both parts of the circle can be made identical. This will be refered to
as fading
.
The effect of this is shown in fig. 2.14, 1B, 2B, 3B.
The centres of gravity are still not at the same place, so fading is not
the solution.
Even if the centres of gravity would be at the same position, another problem
would remain.
If the input signal for the Circular Fourier Transform consists of one peak,
then the position of this peak should only have an effect on the phases of
the frequencies, not on their amplitudes.
But fig. 2.16 shows clearly that when fading, the amplitudes
vary also.
If the peak is faded, so are all the frequencies that together form this
peak.
This is a second reason why fading is not the solution. Moreover, it shows
that fading parts of the interval can never be a solution.
To fulfill formula (6), something else needs to be done.
After trying a lot of other methods, which will not be mentioned here since
the results were not very useful, it seems safe to assume that it is
probably easier just to cope with this in the algorithm that finds the
peaks, instead of transforming the signal to one that makes it easier to
find the peaks.
For now, the (small) differences between the found function and the
data that is found in the Circular Fourier Transform will be ingored.
First, the focus will be on finding a method to improve the found data
and on finding the peaks.
From the formula for g follows, that in both directions from the main peak,
each subsequent surrounding peak
is a little less strong.
More precisely, if the main peak has strength c, and a surrounding peak
has strength ca, then the subsequent surrounding peak has strength
c[1/( 1/a+p)].
Therefore, each surrounding peak with strength ca lies between two
surrounding peaks with strengths
c [1/( 1/a-p)] and c[1/( 1/a+p)] (if it is not
the first surrounding peak, seen from the main peak).
The surrounding peaks can be removed by subtracting from each position
in the Circular Fourier Transform
c [1/( ([1/( preceding peak)] + [1/( subsequent peak)])/2)],
so
c [2/( 1/a-p + 1/a+p)].
(R1)
This will be refered to as the reverse average
.
Proof:
ca - c [2/( 1/a-p + 1/a+p)] = ca - c [2/( (2/a))] = ca - c a = 0.
The main peak is not removed this way, since in the centre of the main
peak, the values at this position plus one and minus one are lowest points,
with strength 0.
The result of this removal step
is shown in fig. 2.17.
This method works fine if there is only one main peak.
Let in a transform with two main peaks, a surrounding peak have strength
ca + db.
This peak then lies between two peaks with strengths
c[1/( 1/a-p)] + d[1/( 1/b-p)] and
c[1/( 1/a+p)] + d[1/( 1/b+p)].
(R1) then results to:
[2/( ([1/( (c[1/( 1/a-p)] + d[1/( 1/b-p)]))] + [1/( (c[1/( 1/a+p)] + d[1/( 1/b+p)]))]))] ¹ ca + db.
Therefore this method performs very poorly if there is more than one main peak
(fig. 2.18).
Main peaks that are close to each other can partially extinguish each other,
and peaks that are less strong can be extinguished completely, even at
larger distances.
The cause of this is that the absolute value of g is used. If the
Circular Fourier Transforms of a signal with more sinusoid is calculated,
the main/surrounding peaks for all sinusoids are added in complex space
.
The results in absolute values can therefore be very confusing, so it is
necessary to work in complex space.
The advantages of finding peaks already become visible at this point.
In the next chapter, a display called the spiral display
will be described.
At this point, the properties of this display are not important, except for
the fact that different sinusoids are shown at different places on the
display.
In fig. 2.19 - 2.21, the results of different analysis
techniques on the same signal are shown. The signal consisted of three
different sinusoids.
In fig. 2.19, the Fourier-analysis of the signal is shown.
The three sinusoids can be seen, but they are not visible very clearly.
In fig. 2.20, the Circular Fourier-analysis of the signal is shown.
The three sinusoids are a bit clearer here, but the result is still very
poorly. Note that the surrounding peaks are clearly visible in the display.
In fig. 2.21, removal technique R1 is used on the data from the
Circular Fourier-analysis. Values below 0 are set to 0.
The three sinusoids are visible very clearly.
Note that in the latter two, the positions at which the peaks lie can be
determined more precisely. In the Circular Fourier-analysis, (the top of)
the peak can be at [1/( a)] times as many positions as in the
Fourier-analysis. The width of the peaks are not very different between the
first and the second image. Note that it is smaller in the third image,
since also a part of the sides of the main peaks is removed.
Another possibility, which solves all problems mentioned above, is based
on the fact that the (absolute) highest value
in the Fourier Transform
must lie on a main peak.
Since the shape of such a peak with its surrounding peaks is known,
it can be removed, after which again the (absolute) highest value can
be found, etc..
Note that, since the main peaks can be influenced by surrounding peaks of
other main peaks, only an approximation of the amplitude, position and phase
of a main peak can be known.
If the main peak and its surrounding peaks are removed completely at once,
and the results were not perfect, then some errors will be introduced in
the remaining data.
This can largely be avoided by removing only a small percentage of the
main peak and its surrounding peaks. Small errors in amplitude, position
and phase can still be corrected, since the values for this will change
a little in a direction opposite to the error.
The smaller this percentage is, the more precisely peak strengths can be
ascertained.
As mentioned before, it can happen that the position of the peak
shifts a little in subsequent times that the same peak is found.
According to tests, if less than 1/s of the main and surrounding
peaks is removed at each step, the position of the main peak never shifts
more than 1/s positions. (It might shift slightly more if the positions
of some peaks
are very close, so it might be safer to remove only [1/( 2s)].)
By setting 1/s to a, the precision with which the
Circular Fourier Transform has been calculated,
the shifting will never be more than this precision.
This way, it is easy to know which found peaks belong together, which makes
it possible to add all these peaks, making it (for example) possible to
locate the peaks with a greater precision than that of the
Circular Fourier Transform.
The peak position
is a weighed average of the positions of the found
partial peaks
, with as weighing factor the strength of each of the found
partial peaks.
The complex representations of some Circular Fourier Transforms of signals
consisting of a limited number of sinusoids are shown in
figs. 2.22 - 2.26.
Fig. 2.22 shows what the transform looks like for a signal consisting
of one sinuisoid.
The figure is drawn as follows: First, on one side of the centre, a number
of circle-like shapes are drawn. Each subsequent shape gets a little
bigger, all shapes start at the same point (the centre).
The last one however, does not end at this point, but at the centre of
the large loop, which corresponds to the main peak.
After this, circle-like shapes are drawn at the other side of the centre,
which get smaller (so in fact a mirrored image of the first half of the
figure). The point that is furthest from the centre, the centre of the
loop, is the centre of the main peak. Therefore it is possible to calculate
the position, amplitude and phase of this point, making it possible to
calculate and remove the corresponding main and surrounding peaks.
Figs. 2.23 - 2.25 show the same transform for a signal
with more than one sinusoid.
As shown in fig. 2.25, it is possible that when two main peaks
lie very close to each other, between the loops of the two there are no
points at the centre of the image. One loop just proceeds into the other.
Fig. 2.26 shows the transform of a signal consisting of white noise
.
Noise can be considered as built from a large number of sinusoids, spead
over the entire (noise-)spectrum.
Since only a short interval of the noise signal is analysed here, some
frequencies are stronger than others. This corresponds to the fact,
that if a small interval of noise is played continuously,
some tones are audiable above the noise.
The longer the noise interval is, the less this is the case.
The correspondence between the absolute values of some
Circular Fourier Transforms and the peaks that are found using the method
described above are shown in figs. 2.27 - 2.33.
This is also shown for some real-life, sampled signals in figs.
2.34 - 2.38.
For some of the real-life sampled signals, the peaks that correspond to
the highest frequencies are a bit disturbed.
This is not caused by a deficiency of the analysis method.
It is caused by the fact
that if the pitch of tone changes, higher frequencies shift more than lower
ones. (If a peak at position 10 shifts to 10.1, then a peak at position
100 shifts to 101.
Therefore the image in the Circular Fourier Transform is affected much more
for the latter one than for the first one.)
So far, the problem of deriving the main and surrounding peaks from the
position, amplitude and phase of the top position has been ignored.
If the problem with the number of rotations around the circle that is not a
whole were solved, using formula (3) would do.
Since solving this problem seems problematic (see above), another method
will be discussed here.
If the position, amplitude and phase of the main peak are known, it is
possible to produce a signal which, if analysed again, would produce a
main peak with exactly the same position, amplitude
and phase. (This is a sinusoid with a number of waves equal to the position
of the main peak, amplitude identical to that of the main peak (see below),
starting with the phase of the main peak.)
The result of this analysis can then be used instead of the result of
formula (3).
The advantage compared to using formula (3) is, that in this case also
the errors caused by the numbers of rotations that are not a whole
would occur in the results.
By subtracting a percentage of the result of this from the total signal,
the desired result without errors can be obtained.
The result of this is shown in fig. 2.39.
It might seem that total peaks that are found this way still need to be
corrected for errors in the strengths of the main peaks in the Circular
Fourier Transform that are caused by numbers of rotations that are
not a whole.
This is however not the case.
Let the amplitude of a sinusoid in the original signal be A, and the
(absolute) measured top value of a main peak in the
Circular Fourier Transform be dA.
To remove a part of this peak, a signal consisting of a sinusoid with amplitude
dA is constructed. But then the resulting (absolute) measured top value of the
main peak in the (F)1 is d2A.
The (absolute) value that is found for the partial peak also is dA,
instead of A.
The resulting peak is deminished by d2A divided by some constant value.
So the value of the partial peak is multiplied by a factor d,
and the amount with which the main peak is deminished is multiplied
by a factor d2.
For example for a value of d > 1, this means that the partial peak values
are too high, but also that the main peak shrinks faster for each matching
partial peak that is found than for d = 1.
Therefore, the (absolute) value of the total peak that is calculated
from these partial peaks is equal to the amplitude A
of the original sinusoid, regardless of the frequency (and therefore the
position of the main peak) of the sinusoid.
An important disadvantage of this method is, that for each sinusoid in the
signal a large number of Circular Fourier Transforms needs to be calculated,
whereas calculating a Circular Fourier Transform takes a considerable amount
of time. (In a first test program, the analysis of the sound
/e/ to a reasonable precision took about 30 hours.)
Using tables
is hardly a solution: There are as many as [(n)/( 2a)]
possible peak positions, each of which having a Circular Fourier Transform
consisting of [(n)/( 2a)] complex values, which are different
for every different phase, so [(n2)/( 4a2)] need to be stored
for every different phase, with a large precision for the phases.
For n = 1000, a = 1/8 (which is not very high),
and a precision for the phases of [1/ 1000] p (which is probably
not even good enough), this would require
32 billion values, which would need to be stored in a very high precision.
Larger values for n and smaller values for a can also be used,
so this is not very realistic.
For this reason, a formula
will be derived from the previously mentioned
test program, which is then simplified to make it
faster.6 ([Par90])
The program, in some pseudo-code
, consists of some different parts.
Let N be the size of the Circular Fourier Transform (i.e. the number
of whole positions in it),
M be the size of the signal interval (n),
PARTS be [1/( a)]:
First, the Circular Fourier Transform of the signal interval s is
calculated:
f := CFT(s, M);
After the Circular Fourier Transform f of the original signal interval
is calculated, the highest peak is found:
FOR x FROM 0 UPTO N * PARTS DO s := sqr(real(f[X]) ^ 2 + imag(f[X]) ^ 2) IF s > max_s THEN max_s := s; max_s_pos := x; max_s_real := real(f[x]); max_s_imag := imag(f[x]); ENDIF ODmax_s Now contains the amplitude of the peak, max_s_pos the position, max_s_real and max_s_imag the real and imaginary value of the peak, respectively. From the latter two the phase can be calculated:
IF max_s_real ¹ 0
THEN j := arctan(max_s_real / max_s_imag)
ELSE j := arctan(max_s_real * ¥)
Now, the corresponding sinusoid can be calculated:
number_of_waves := max_s_pos / PARTS;
FOR x FROM 0 UPTO M - 1
DO
s'[x] := sin(2p * number_of_waves * x / M + j) / M;
OD
Next, the Circular Fourier Transform of this sinusoid can be calculated:
f' := CFT(s', M);
This can now be subtracted from the original Circular Fourier Transform:
FOR x FROM 0 UPTO N * PARTS DO f[x] -:= f'[x] / PARTS; ODThe Circular Fourier Transform is calculated in the following function:
FUNCTION CFT (ARRAY signal, size)
FOR x FROM 0 UPTO N STEP 1 / PARTS
DO
real := 0;
imag := 0;
stepsize := 2p * x / size;
cos_count := 0;
FOR u FROM 0 UPTO size - 1
DO
real +:= signal[u] * cos(cos_count);
imag +:= signal[u] * sin(cos_count);
cos_count +:= stepsize;
OD
f[PARTS * x] := 2 * complex(real, imag);
OD
RETURN(f)
Using unfolding
([Par90]), from CFT a new algorithm can
be derived for calculating the Circular Fourier Transform of a signal
consisting of one sinusoid.
For the calculation of the (F)1, CFT is called with
s' for signal.
Unfolding signal, using the formula that was used to calculate
s', so signal[u] is unfolded to
s'[u] = sin(2p * number_of_waves * u / M + j) / M.
This results to:
FOR x FROM 0 UPTO N STEP 1 / PARTS
DO
real := 0;
imag := 0;
stepsize := 2p * x / size;
cos_count := 0;
FOR u FROM 0 UPTO size - 1
DO
real +:= sin(2p * number_of_waves * u / M + j) / M * cos(cos_count);
imag +:= sin(2p * number_of_waves * u / M + j) / M * sin(cos_count);
cos_count +:= stepsize;
OD
f[PARTS * x] := 2 * complex(real, imag);
OD
One can easily see (or prove), that at any time the value of cos_count
is used, this is equal to u * stepsize.
By replacing this, the inner loop can be changed to:
DO
real +:= sin(2p * number_of_waves * u / M + j) / M * cos(u * stepsize);
imag +:= sin(2p * number_of_waves * u / M + j) / M * sin(u * stepsize);
OD
By unfolding stepsize, this becomes:
DO
real +:= sin(2p * number_of_waves * u / M + j) / M * cos(u * 2p * x / M);
imag +:= sin(2p * number_of_waves * u / M + j) / M * sin(u * 2p * x / M);
OD
complex(real, imag) now returns the complex number that is equal
to the xth value of the Circular Fourier Transform (F)1 of
a signal consisting of one sinusoid, with number_of_waves (which
will be called a from now on) waves in the interval,
starting at phase j at the beginning of the interval,
with interval size M. What this program does can also be described as
follows, if (F)1 (a, j, M) is defined to be the
Circular Fourier Transform of a single sinusoid with amplitude a, phase
j and signal interval size or number of sample values M:
(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/M) cos(2p x u/M)
+ i sin(j + 2p a u/M) sin(2p x u/M))
Two other ways to write this formula are:
(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/M) (cos(2p x u/M) + i sin(2p x u/M))),
or
(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/M) ei 2p x u/M).
By simplifying this formula as far as possible, and more specifically by
removing the å-sign, the calculation can be accelerated.
A first acceleration, without removing the å-sign, can be obtained by
calculation the values of
cos(2p x u/M) + i sin(2p x u/M)
(or ei 2p x u/M) in advance and storing these in
a table.
This table is circular; if M is a power of 2, it can be kept small
and the right value in the table can be found very fast without the need
to even calculate a modulo.
Note, that in the original program the values of
sin(j + 2p a u/M) were also calculated in
advance, so for each value of u, only two multiplications of real numbers
(and an addition) are left.
It turns out that it is possible to remove the å-sign from the formula.
The resulting formula might give slightly different results, introducing
errors, but these are very small.
To see this, it is important to note what the formula represents exactly.
This can most easily be seen in the previously mentioned variant
(F)1 (a, j, M) (x) = 2/Måu = 0M-1 (sin(j + 2p a u/M) (cos(2p x u/M) + i sin(2p x u/M))).
Here, sin(j + 2p a u/M) is the signal for which the
Circular Fourier-analysis needs to be calculated, and
(cos(2p x u/M) + i sin(2p x u/M))
is the rotation around the circle, as was already shown before in many figures.
From these images (see for example fig. 2.2), it follows clearly
that it is possible to consider both the input signal (the sample) and
the circle as continuous
, instead of as a number of discrete points at equal
distances from each other.
So, å0M-1 can be replaced by
ò-1/2M-1/27.
The new formula now becomes:
(F)1 (a, j, M) (x) = 2/Mò-1/2M-1/2 (sin(j + 2p a u/M) cos(2p x u/M)
+ i sin(j + 2p a u/M) sin(2p x u/M)) (du)
or
(F)1 (a, j, M) (x) = 2/Mò-1/2M-1/2 sin(j + 2p a u/M) cos(2p x u/M) (du)
+i 2/Mò-1/2M-1/2 sin(j + 2p a u/M) sin(2p x u/M) (du)
Let l0 = -1/2, l1 = M-1/2.
Calculation of the integrals
gives:
òsin(j+ 2pa u/M) cos(2px u/M) (du) = (See box 1)
- [1/( 2pa 1/M - [((2px 1/M)2)/( 2pa 1/M)])] (cos(j+ 2pa u/M) cos(2px u/M) + [(2px 1/M)/( 2pa 1/M)] sin(j+ 2pa u/M) sin(2px u/M)).
òl0l1 sin(j+ 2pa u/M) cos(2px u/M) (du) =
- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l1)/( M)]) cos(2px [(l1)/( M)]) + x/a sin(j+ 2pa [(l1)/( M)]) sin(2px [(l1)/( M)]))
+ [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l0)/( M)]) cos(2px [(l0)/( M)]) + x/a sin(j+ 2pa [(l0)/( M)]) sin(2px [(l0)/( M)])) =
- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l1)/( M)]) cos(2px [(l1)/( M)]) + x/a sin(j+ 2pa [(l1)/( M)]) sin(2px [(l1)/( M)])
- (cos(j+ 2pa [(l0)/( M)]) cos(2px [(l0)/( M)]) + x/a sin(j+ 2pa [(l0)/( M)]) sin(2px [(l0)/( M)]))) =
- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)]) cos(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)]) cos(2px [(l0)/( M)]))
+ x/a (sin(j+ 2pa [(l1)/( M)]) sin(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)]) sin(2px [(l0)/( M)]))).
For the imaginary part, this is:
The goal is to find the integral of cos(j+ pu) cos (qu) (du).
Assume (wrongly):
òf(x) g(x) (dx) = (òf(x)) g(x) + f(x) (òg(x)).
If that were true,
òcos(j+ pu) cos (qu) (du) = 1/p sin(j+ pu) cos(qu) + cos(j+ pu) 1/q sin(qu).
To check this, take the derivate of both sides of this equation:
(òcos(j+ pu) cos (qu) du)¢ = (1/p sin(j+ pu) cos(qu) + cos(j+ pu) 1/q sin(qu))¢
Þ cos(j+ pu) cos (qu) = 1/p(sin(j+ pu) cos(qu))¢+ 1/q(cos(j+ pu) sin(qu))¢
1/p(sin(j+ pu) cos(qu))¢ =
= 1/p(p cos(j+ pu) cos(qu) - sin(j+ pu) q sin(qu)) =
= cos(j+ pu) cos(qu) - q/p sin(j+ pu) sin(qu)
1/q(cos(j+ pu) sin(qu))¢ =
= 1/q(-psin(j+ pu) sin(qu) + cos(j+ pu) q cos(qu)) =
= -p/q sin (j+ pu) sin(qu) + cos(j+ pu) cos(qu)
To get closer to the goal, multiply all arguments in the last line of these two by -[(q2)/( p2)]:
-[(q2)/( p2)] * (-p/q sin (j+ pu) sin(qu) + cos(j+ pu) cos(qu))) =
= q/p sin (j+ pu) sin(qu) - [(q2)/( p2)] cos(j+ pu) cos(qu))
Addition of the result of this and the first line gives:
cos(j+ pu) cos(qu) - q/p sin(j+ pu) sin(qu) + (q/p sin (j+ pu) sin(qu) - [(q2)/( p2)] cos(j+ pu) cos(qu)) =
= (1 - [(q2)/( p2)]) cos(j+ pu) cos(qu).
De goal that was set was:
cos(j+ pu) cos (qu),
so multiplying by [1/( 1 - [(q2)/( p2)])] is necessary.
Both multiplications are multiplications with constant values, so they can also be used in the integral.
The integral that is found this way is:
òcos(j+ pu) cos (qu) (du) = [1/( 1 - [(q2)/( p2)])] (1/p sin(j+ pu) cos(qu) - [(q2)/( p2)] 1/q cos(j+ pu) sin(qu)).
òcos(j+ pu) cos (qu) (du) = [1/( p - [(q2)/( p)])] (sin(j+ pu) cos(qu) - q/p cos(j+ pu) sin(qu)).
The derivate of this is cos(j+ pu) cos (qu), as wanted.
1 DERIVATION OF A FORMULA FOR THE CALCULATION OF THE REAL PART
OF A CIRCULAR FOURIER TRANSFORM OF A SIGNAL CONSISTING OF ONE SINUSOID
The goal is to find the integral of cos(j+ pu) i sin (qu) (du) = i cos(j+ pu) sin (qu) (du).
This is i times the integral of cos(j+ pu) sin (qu) (du).
When in the formula for the real part, sin(qu) is changed into cos(qu), and cos(qu) into -sin(qu), then this gives:
òcos(j+ pu) sin (qu) (du) = [1/( p - [(q2)/( p)])] (sin(j+ pu) sin(qu) + q/p cos(j+ pu) cos(qu)).
Calculating the derivate of this gives cos(j+ pu) sin (qu).
2 DERIVATION OF A FORMULA FOR THE CALCULATION OF THE IMAGINARY PART
OF A CIRCULAR FOURIER TRANSFORM OF A SIGNAL CONSISTING OF ONE SINUSOID
òl0l1 sin(j+ 2pa u/M) sin(2px u/M) (du) =
- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l1)/( M)]) sin(2px [(l1)/( M)]) - x/a sin(j+ 2pa [(l1)/( M)]) cos(2px [(l1)/( M)]))
+ [1/( 2pa 1/M - [(2px2 1/M)/( a)])] (cos(j+ 2pa [(l0)/( M)]) sin(2px [(l0)/( M)]) - x/a sin(j+ 2pa [(l0)/( M)]) cos(2px [(l0)/( M)])) =
- [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)]) sin(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)]) sin(2px [(l0)/( M)]))
- x/a (sin(j+ 2pa [(l1)/( M)]) cos(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)]) cos(2px [(l0)/( M)]))).
The formula for (F)1 can now be rewritten as:
(F)1 (a, j, M) (x) = 2/Mòl0l1 sin(j + 2p a u/M) cos(2p x u/M) (du)
+i 2/Mòl0l1 sin(j + 2p a u/M) sin(2p x u/M) (du) =
- 2/M [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)]) cos(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)]) cos(2px [(l0)/( M)]))
+ x/a (sin(j+ 2pa [(l1)/( M)]) sin(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)]) sin(2px [(l0)/( M)])))
- i 2/M [1/( 2pa 1/M - [(2px2 1/M)/( a)])] ((cos(j+ 2pa [(l1)/( M)]) sin(2px [(l1)/( M)]) - cos(j+ 2pa [(l0)/( M)]) sin(2px [(l0)/( M)]))
- x/a (sin(j+ 2pa [(l1)/( M)]) cos(2px [(l1)/( M)]) - sin(j+ 2pa [(l0)/( M)]) cos(2px [(l0)/( M)]))).
|
Most values in this formula can be calculated and stored in tables
in advance.
Other values are used more than once in the formula.
Using this, the gain in speed compared to the previous test version is
about 1500 times.
An example algorithm for this function, in which all sines and consines
are calculated outside the iteration
, is given in appendix
A.2.
The program got a lot bigger than the original version. This is partially
caused by the optimising, but there is also another important difference.
The original program contained an error, that is corrected here.8
In some cases, the highest position
on the top is not precisely the number of waves of the sinusoid that
caused this top. The position of the highest point of the main peak can be
shifted a little due to the errors caused by the numbers of rotations that
are not a whole. If in these cases the (F)1 of the tone with as number
of waves the position of the highest point is calculated,
and the result is analysed again,
it turns out that the position of the highest point has changed.
This is especially a problem for very small values for m, since the errors
are highest in this part of the Circular Fourier Transform.
If in these cases the position of the highest point is used for the number
of waves, the errors in the result can become large, since not only the
main peak, but also the surrounding peaks are not correctly removed.
The Circular Fourier Transforms of the original sinusoid and the (F)1 do
not match precisely.
To solve this, the program needs to calculate the (F)1 of a sinusoid
such, that the highest point is at the same position as
in the transform of the original signal. Note that the phase is different
for different positions at the peak, so if another than the highest point is
used, the phase needs to be corrected also.
Seen from the highest point on the main peak of a (F)1, to both directions
each subsequent point is lower, as long as it is on the main peak.
So, if a point is not the highest point on this peak, and only then, the
point at one of the two positions at a distance a from that point must
be higher.
Therefore, it is not necessary to calculate the entire (F)1, check whether
the main peak is at the right place, and calculating it again with a small
correction if not, until the peak is at the right place.
Instead, it suffices to calculate only the values for the wanted position pw
for the highest position and the two values at positions pw - a and
pw + a.
If pw has the highest value, then the calculation can be finished by
calculating the rest of the (F)1.
If pw - a is higher, the peak for the current number of waves lies
at a position that is too low, so the number of waves needs to be increased
a little. If pw + a is higher, the opposite needs to be done.
In both cases, also the phase needs to be corrected.
This step needs to be repeated until the highest point lies at position pw
(or to the opposite direction than before).
Since the value that is found this way is equal to the number of waves in the
original sinusoid (the previously found value was affected by errors), this
is the value that is to be used for further calculations.
The result of the algorithm is a list of peak positions with phase and
amplitude information. Since only a small part of each found peak is removed,
the same position can occur very often in this list.
The easiest way to determine the total amplitude and phase for each
frequency is to add the data for all peaks
with the same position (which
is equal to the frequency multiplied by a constant (n/r)).
The total peaks that are found this way will be refered to as
added total peaks
.
This works fine, but this way many of the sinusoids are displayed
at more than one frequency, since the positions of the partial peaks for
a frequency can differ a little.
For many applications this is no objection, since the frequencies for the
same sinusoid are still very close to each other (depending among other
things on the chosen value for a).
As mentioned before, if the part of the main and surrounding peaks that
is removed in each step is small enough, then two found partial peaks
that belong to the same total peak never lie more than a positions
from each other, iff there is no third partial peak that belongs to the same
total peak which is found by the given algorithm between the first two
partial peaks.
Using this, it is possible to determine peak positions
precisely.
The position of a total peak is the weighed average of the frequencies of the
found partial peaks, with as weighing factor the strength of the peaks.
Total peaks that are found this way will be refered to as
close total peaks
.
An algorithm that uses this to find the close total peaks is given in appendix
A.3.
This completes the signal analysis.
Note: The uncertainty principle is not violated by this analysis method. The Circular Fourier Transform fulfills this principle, and the total peaks are nothing more than a different way to describe the Circular Fourier Transform. Peaks can be determined very precisely, but only if there are no other peaks that lie very close to each other. The distance between the peaks is proportional (in a constant signal) to the size of the sample interval, which corresponds to the uncertainty principle.
If a long sample is split in subsequent intervals, these intervals are analysed, an operation is performed on the results of this, and the results are transformed back, it is usually necessary to go smoothly from one interval to the next (since due to the operation that is performed, the interval border values might not match and therefore produce a click in the resulting total sample). If a standard Fourier Transform is used, it is not possible to determine what the signal looks like outside the original signal interval. Only an infinite repetition of this signal is found if the peaks that were found are used to calculate the signal outside the original interval. (This corresponds to the fact that the signal is considered to be periodical.) Therefore, it is necessary to use an overlapping area between the intervals, which is used for the smooth transition. If total peaks are used, it is possible to determine what the signal will probably look like outside the signal interval. (This corresponds to the fact that the signal is considered to be finite.) If this is used instead of an overlapping area between the intervals, no disturbances of the sound are audiable (although the signal can not be reproduced pricisely this way, so using an overlapping area is - at least theoretically - still a better option). This can be used to speed up the total transform a little. Note that this works better for close total peaks than for added total peaks, since the latter still cause a repetition of the signal every [1/( a)] periods.
In a real-time version, calculation can be terminated if there is no time left. In that case, peaks are determined much more precisely for simple signals than for complex signals.
A very important point in displaying the obtained frequency information, is that it is only valid for a very short time. The human eye however is limited in the number of images that it can separate in a given time. There are two ways of dealing with this:
In this chapter the focus will be on the latter of these two.
Especially for music
, it is very important to see relations between
frequency response peaks in different octaves
. Also, it is important that
the way a tone looks should not change too much if the tone occurs in another
octave, since it would be very difficult to even see on the display
which tones sound similar.
A first idea is the rectangle display
.
In this display, subsequent octaves are displayed next to each other.
Each subsequent octave
occupies twice the domain of the frequency space
that the preceding octave
occupied.
This is displayed in fig. 3.1.
The numbers in the image correspond to positions in the frequency
space (total peaks), placed linearly
in this space.
Note that each odd number in this display starts a progression of numbers,
where each following number is twice the latter one
(for example, '1, 2, 4, 8, 16, 32, ...' or '3, 6, 12, 24, 48, 96, ...').
This display has an important disadvantage.
If a tone starts at a frequency that is displayed for example
at the position marked with '30', and it goes up, it might happen that it
gets at or above position '32'. But then, the total frequency is suddenly displayed
at the bottom of the image instead of at the top. Since most frequencies
in sounds are not constant, this will happen very often, making it very
difficult to see in what direction the signal is moving and what it
looks like. Two tones that sound about the same, are displayed at a totally
different position.
This problem can be solved by using the spiral display
, which is
shown in fig. 3.2.
In this display, the advantages of the rectangle display are combined with
a solution of the latter problem. Also, it corresponds more naturally to
the way that human hearing
works, since the transition
from one octave into
the next one is fluent.
It has to be noted here that it is impossible to see very fast what is displayed, if all present frequencies are displayed as very thin lines. Also, this way the image might differ too much from the next one if the frequency changes a little. To solve this, a larger area of the cycle needs to be illuminated. Note that it does not suffice to illuminate the same percentage of the cycle for every cycle, since then the outer cycles are being filled too much and frequencies in the inner cycles are still displayed with too thin lines. The area that is to be filled should be almost the same for each cycle, regardless of the size of the cycle. Note that this way, lower frequencies that might not have been determined very precisely, have a relatively larger area of the corresponding cycle that is filled. This renders the precise determination of frequencies to be less important. Some examples are shown in 3.4.
In order to solve this, it is necessary to be able to find out which frequencies belong together. This will therefore be addressed first.
Now there are three problems left.
First of all, the frequency analysis is not perfect.
Therefore, in the analysis result, harmonics are not always exactly
a multiple of the base frequency.
This can be solved by taking an area
around every frequency that is being
examined, and take the sum of all frequency responses in this area.
This way this problem can be solved.
Secondly, if a tone is not completely stable
within an interval, the frequency
analysis of this interval will be affected. The solution of this is identical
to that of the first problem, but the areas need to be larger.
Thirdly and finally, there can be other tones present, with other base frequencies and harmonics, or noise. These should be ignored, but it is not possible to know in advance which frequencies belong to these other tones or noise. Therefore, this problem is not easily solved. Below, a solution will be presented with which it is possible to find the base frequency of the loudest tone in the signal, if the loudness of the other tones is relatively low. This solution is not perfect, and will sometimes produce poor results even if there is only one tone available. Mostly however, the results will be good.
Using this value, all frequencies matching this value multiplied by a power
of two can be tried. Trying a frequency means here, that the responses of
this frequency and all its harmonics are added (see below).
If the result of this addition drops too much compared to the result
of the preceding frequency (this is the current frequency divided by 2),
it is probable that a part of the tone has been removed,
which would imply that the preceding frequency was the base frequency.
Several problems are apparent.
First of all, in the spiral display, the direction in which
there is the most brightness usually contains the base frequency.
In cases that the direction in which there is the most brightness does not
contain the base frequency, the result will be very bad.
Secondly, also in the spiral display, very low frequencies are not always
displayed in the right direction. This can be solved by fading all low
frequencies.
Thirdly, the wrong octave can be found. If it is too high, frequencies are lost, if it is too low, a lot of frequencies with response 0 can be present. This problem is partially solved by the block display (section 3.2.2).
Finding the direction in the spiral corresponds roughly to the following
procedure.
Take a start frequency fs and do the following for every value in the
interval [fs, fs * size]. (size stands for a value a bit above 2,
for example 2.1. This interval corresponds to a bit more than one cycle
around the spiral in the spiral display.)
For every base frequence f0 in this interval, add the frequency responses
of this and all its harmonics. The wanted base frequency is the
frequency f0 which gave the highest frequency response, multiplied by
a power of two.
In fact, taking the sum of the responses of the harmonics does
not work fine. After testing with a lot of values, the best results seem
to be obtained using:
response(f) = response_area(f) 0.4 / (2 + (log(1 + f)) 3)
where response_area(f) is the sum of all frequency responses in the area
[max (f / mul_diversion, f - max_diversion), min (f * mul_diversion, f + max_diversion)]
Now,
fbase = ftop * 2n
ftop = { f0 | f0 Î [fs, fs * size] Ù Ø$f1 [ f1 Î [fs, fs * size] Ù tone_resp(f1) > tone_resp(f0) ] }
where n Î N,
tone_resp(f) = åi = 0¥ response(i * f).
According to some test results, a good value for fs would be around 0.5.
Good values for mul_diversion and max_diversion are 1.5 and 0.175,
respectively. max_division must always be smaller than 1/2precision,
to avoid overlapping of areas. So if a larger value for max_division is
needed, it might be necessary to increase precision also.
If the value that is found lies around fs or fs * size, it is
probably not very reliable, since also the position of a peak
relative to that of f0 has
an effect on the response function. Therefore, it is possible, that
instead of finding a top value at for example fs * 1.95, it is
found at position fs. If the result lies close to the border of the
interval, a new (higher or lower) value for fs should be set, and
the calculation should be performed again.
Next, the right value for n needs to be determined.
Take a small area around frequency ftop, for example the interval
[ftop / 1.2, ftop * 1.2].
For each value ftry in this interval, let
flatftry(m) = response_area(m * ftry), m Î N.
This function describes all the multiples of ftry.
If ftry is the right value, and the area is sized correctly,
the base frequency and its harmonics are all displayed in the flat function
at positions m = k * 2octave, k Î N+. (ftry lies on octave 0.)
From this, the octave can be found very easily in case of a perfect analysis
and no other signals. This is however hardly ever the case.
According to some tests, the following method will usually produce the
correct value for octave.
The sum of the peaks that are included for some octave a is:
octave_contentsftry, c(a) = åb = 0¥ calcc(flatftry(b * 2a))
where c Î {0, 1, 2}
a Î N,
calc0(x) = log(x + 1),
calc1(x) = x 0.875,
calc2(x) = x 1.125.
If the octave contents drop more than a factor 1.25 between two subsequent
octaves, it is probable that a part of the peaks that belong to the tone
are removed. So the octave right before is probably the octave in which the
base frequency lies:
possible_apparent_octaveftry, c(a) º
º octave_contentsftry, c(a + 1) < octave_contentsftry, c(a) / 1.25,
where a Î N,
Of course this is only the case for the first time that this occurs:
apparent_octaveftry, c = {a | possible_apparent_octaveftry, c(a)
Ù Ø$b Î N [ b > a Ùpossible_apparent_octaveftry, c(b)]}.
The following is necessary because for some situations (more precisely,
if the base frequency is very strong compared to the other frequencies),
the octave that is found is one higher than it should be.
confirmftry, c(a) = log(1 + flatftry(a))
denyftry, c(a) = åb = 2¥ log(1 + flatftry(a * b))
repair_octaveftry, c(a) = {-1 | confirmftry, c(a) > denyftry, c(a) / 60 | otherwise
repaired_apparent_octaveftry, c =
= apparent_octaveftry, c + repair_octaveftry, c(apparent_octaveftry, c).
From this, the wanted octave can be found by trying all combinations of
ftry and c, and taking the highest number for
repaired_apparent_octaveftry, c that is found.
octaveftop = {a | $ftry Î [ftop / 1.2, ftop * 1.2], c Î {0, 1, 2} [ repaired_apparent_octaveftry, c = a ]
ÙØ$ftry Î [ftop / 1.2, ftop * 1.2], c Î {0, 1, 2} [ repaired_apparent_octaveftry, c > a ]}.
If octaveftop is below zero, detection failed.
Now, the best value for ftry needs to be found.
The signal that is lost by skipping all positions of flatftry
that are not a multiple of 2octaveftop, is given by
lossftry, c(octaveftop), where
lossftry, c(a) = åb = 0¥ calcc(flatftry(b * 2a)) - åb = 0¥ calcc(flatftry(b)).
The best value for ftry is that value where
lossftry, c(octaveftop) (for some c) is the lowest.
fbase = octaveftop * fbest
where fbest Î {f | f Î [ftop / 1.2, ftop * 1.2]
Ù$c Î {0, 1, 2} [repaired_apparent_octavef, c = octaveftop
ÙØ$f1 Î [ftop / 1.2, ftop * 1.2], c1 Î {0, 1, 2}
[repaired_apparent_octavef1, c1 = octaveftop
Ùlossf1, c1(octaveftop) < lossf, c(octaveftop)] ] }.
With this frequency (fbase), the response of the base frequency and its
harmonics can now be determined very easily.
Note, that all the mentioned values can be changed. The values that are printed here are the ones that gave the best results in extensive tests.
In the block display, each frequency's response is displayed as the
brightness of a square
. These squares are placed in the following way
(see image 3.3):
Note that the numbers of the squares on the left from the bottom to the top level
are the powers of two, starting at one.
If an error has been made in determining the octave
, and the found base
frequency was half the real one, then the shape of the display is not
really changed. The real base frequency is displayed on square 2, the first
real harmonic at square 4, the next at square 6, etc..
The shape is identical, only it is shifted up one level and there are
holes in it. Something similar is the case if the found base frequency was
one quarter, one eightst etc. of the real one.
If the found base value was twice the real one, then half of the real
frequencies are not displayed, but still, the shape of the image is contained
as far as possible, shifted one level down.
Something similar is the case if the found base frequency was
four, eight etc. times the real one.
The shifting up and down of the image due to the wrong determination of the
octave can be avoided by
shifting
the
entire image up or down according to
the found base frequency. If this is twice as high, the image should shift
up exactly one level (while if this is an error, the squares are filled
in one level too low). A logarithmic function
can be used to determine the
exact placement
of the image according to the base frequency. This way
it is also possible to see the perceptive frequency
of the tone (which
is the base frequency) in this display.
Originally the idea was to place the block display at a position
corresponding to the base frequency
around the spiral display, but it would move too much there, making it
virtually impossible for the human eye
to recognize any shapes in it.
(In fact, shifting up and down is not enough since the blocks on another level
do not lie at exactly the same horizontal position. To spare space, horizontal
shifting is not used.)
Note, that in the obtained image, every level can be regarded as a new octave. Each octave corresponds to one cycle in the spiral display. Starting from the position of the base frequency, each new turn is displayed on the next level in the block display. The start of each turn is always on the left of the corresponding level. Therefore, all frequencies that are displayed in the same direction as the base frequency but in a subsequent cycle, are displayed at the left of the level corresponding to that cycle. Some examples are shown in 3.4.
To solve this, it is necessary to show not only the current, but also
some preceding images in the display. This is not possible using the two
displays that were described above.
A possible solution which will be described here very briefly
is to take the tone information (3.2.1)
and display a tone on a straight line, such that each octave takes the
same amount of space on the line. This way it is easy to scroll some old
lines upward each time a new line is added. A disadvantage is that it is
harder to recognise phonemes in this display. On the other hand, this way
words might get recognisable images. It is possible that, after training, people
can recognise these words without recognising exactly which vowels the words
are composed of.
Note that the display that is obtained this way is not identical to the display that would be obtained by displaying the frequency information on a line. For example, in the described display the base frequency fills an amount of space on the line that corresponds to an entire octave, whereas if the frequency information were displayed, only a dot would be present. Note that the best results can be obtained by combining the two methods (calculating the value for each point on the line using both methods, and setting the point to the maximum of the two values), so tones are displayed as described above, and noise is also displayed correctly. More work on this needs to be done. Some examples are shown in 3.4.
Some example phonemes
that were pronounced by three different persons are
displayed in figs. 3.4 and 3.5. Especially in the block
display, it is clear that identical phonemes look a lot alike, regardless of
who pronounces the phoneme.
For longer sounds, the subsequent images corresponding to the input sample
can be calculated. In the following examples, there were 44 images
per second.
An analysis of the sound of a door-bell is shown in fig. 3.6 (spiral
display only).
An analysis of the sounds
/ 180 c i ua/
is shown in figs. 3.7 and 3.8:
/ 180 c /: 0 - 14
/i/: 17 - 31
/u/: 35 - 48
/a/: 52 - 65
The line display of this is shown in fig. 3.14.
An analysis of the word ``sht!'' (/òt/) is shown in fig. 3.9:
/ò/: 0 - 27
/t/: 32 - 44
The line display of this is shown in fig. 3.15.
Finally, an analysis of the sentence ``One man living on an island''
(/wÙmæn 'livih 180 c n 180 e n'ail 180 e nd/)
is shown in figs. 3.10 - 3.13:
/w/: 1 - 11
/Ù/: 12 - 18
/m/: 19 - 27
/æ/: 29 - 47
/n/: 49 - 54
/l/: 57 - 59
/i/: 60 - 65
/v/: 66 - 69
/i/: 72 - 76
/h/: 78 - 81
/ 180 c /: 86 - 89
/n/: 90 - 91
/ 180 e /: 93 - 95
/n/: 96 - 97
/a/: 99 - 104
/i/: 109 - 112
/l/: 114 - 115
/ 180 e /: 117 - 122
/n/: 125 - 127
/d/: 130 - 139
The line display of this is shown in fig. 3.16.
For 18 phonemes, the corresponding spiral and block display is shown in figs. 3.17 and 3.18. These phonemes are all pronounced by the same person. The images can be used to recognise phonemes in other images. Note however that phonemes can look slightly different for different people, or when the pitch is different. Therefore, the printed images can only be used to determine sounds very roughly. Especially phonemes that sound alike might be confused. To make recognition more reliable, a list of several images of each phoneme pronounced by several different people would need to be composed. Using such a list, it is also possible to try to develop a feeling for which characteristics are important for some phoneme and which are not.
Using the currently available analysis method and displaying methods, a number of problems that deaf people face which were mentioned in the introduction have already been solved:
Some sorts of music featuring a clear, high pitched tone are visible clear enough in the spiral display to enable recognising the song. In a real-time version, it would even be possible for someone who is deaf to sing along with it, without being off-pitch.
Noise is displayed very clearly in the spiral display.
Different vowels look different in both displays, but the differences are visible much clearer in the block display. (It took me only a few minutes to learn to recognise five different vowels in the block display, regardless of who pronounces the vowel. Recognising vowels in the spiral display is much harder, although not impossible.) One of the reasons for this is the fact that the block display is always at (about) the same position, while in the spiral display everything moves around very fast if the pitch of the sound changes. A disadvantage of both methods is that if the speech is not very slowly, movements are still too fast to recognise much. If people are able to learn to recognise the vowels in the spiral display, this is recommended, since this is more reliable than the block display, which can show only one tone and which depends on the correct determination of the base frequency of this tone, which in turn depends on the absency of frequencies that do not belong to the tone, and on the tone itself.
The fact that it is possible to recognise some vowels and some sorts of
music after just minutes of training (for someone who is not deaf),
raises hopes that these displays
perform better than the surgery that is mentioned in the introduction,
for which five years of training is necessary just to be able to
differentiate music from speech.
In order to determine how the displays perform in real life, extensive field testing is necessary. The analysis method and displaying methods that are developed in this thisis form a good starting point for investigating this. Positive and negative aspects of the analysis method and the displaying methods need to be determined, and from the results of this new methods might need to be developed. The implementation (see appendix B) enables this to a limited extend, since it cannot be ran in real-time (yet). But still, it is easy to implement new methods and add functionality to perform operations on data (which can be recorded in the field), so a reasonable idea of how it performs can be obtained.
This appendix contains some transformation algorithms that were derived and sometimes used in the previous chapters. The algorithms are written in some pseudo-code, variable declarations are left out.
COMMENT This procedure transforms the sample interval to a (F).
PROC trans_au_2_cf(au[], cf_real[], cf_imag[])
tp := PI_2 / (SAMPLE_SIZE * PARTS);
FOR x FROM 0 UPTO SAMPLE_SIZE * PARTS - 1
DO
tp_mul_x := tp * x;
sine[x] := sin(tp_mul_x);
cosine[x] := cos(tp_mul_x);
OD;
FOR x FROM 0 UPTO N * PARTS - 1
DO
f_real := 0;
f_imag := 0;
cs_count := 0;
FOR u FROM 0 UPTO SAMPLE_SIZE - 1
DO
f_real +:= au[u] * cosine[cs_count];
f_imag +:= au[u] * sine[cs_count];
cs_count +:= x;
IF cs_count >= SAMPLE_SIZE * PARTS
THEN cs_count -:= sample_size * PARTS;
FI;
OD;
cf_real[x] := 2 * f_real;
cf_imag[x] := 2 * f_imag;
OD;
COMMENT Calculate some numbers in advance
v1e := PI_2 * INT_END / SAMPLE_SIZE / PARTS;
v1s := PI_2 * INT_START / SAMPLE_SIZE / PARTS;
FOR x FROM 0 UPTO N * PARTS - 1
DO
v1e_mul_x := v1e * x;
v1s_mul_x := v1s * x;
CONST sine_e[x] := sin(v1e_mul_x);
CONST cosine_e[x] := cos(v1e_mul_x);
CONST sine_s[x] := sin(v1s_mul_x);
CONST cosine_s[x] := cos(v1s_mul_x);
OD;
FOR x FROM 0 UPTO PARTS * SAMPLE_SIZE
DO
CONST sine[x] := sin(PI_2 * x / (PARTS * SAMPLE_SIZE));
CONST cosine[x] := cos(PI_2 * x / (PARTS * SAMPLE_SIZE));
OD;
COMMENT Calculate at what peak strength we should stop
halt_strength = trans_cf_2_pp(cf_real, cf_imag) * GO_TO_PRECISION;
COMMENT Now find peaks until it gets below the calculated halt_strength
WHILE (trans_cf_2_pp(cf_real, cf_imag)) > halt_strength
DO
OD;
COMMENT This function takes the (F) and finds and stores the highest peak in it
COMMENT The found peak is then removed from the stored (F)
COMMENT Also, it returns the strength of this peak
FUNCTION trans_cf_2_pp(cf_real[], cf_imag[])
COMMENT Find maximum
max_v := 0;
max_v_pos := 0;
FOR x FROM 0 UPTO N * PARTS - 1
DO
v := (cf_real[x] * cf_real[x] + cf_imag[x] * cf_imag[x]);
IF v > max_v
THEN max_v := v;
max_v_pos := x;
FI;
OD;
max_v_real := cf_real[max_v_pos];
max_v_imag := cf_imag[max_v_pos];
COMMENT Calculate phase
IF max_v_imag = 0
THEN phi := atan(max_v_real * ¥);
ELSE phi := signed_atan(max_v_real, max_v_imag);
FI;
number_of_waves := max_v_pos / PARTS;
max_v := sqrt(max_v);
mmax_v := max_v;
COMMENT Calculate the (F) that corresponds to the current maximum
calc_cf_1(bp_real, bp_imag, -phi, number_of_waves, max_v);
COMMENT Subtract this from the original signal
FOR x FROM 0 UPTO N * PARTS - 1
DO
cf_real[x] -:= bp_real[x];
cf_imag[x] -:= bp_imag[x];
OD;
RETURN mmax_v;
COMMENT This procedure returns the (F)1 of a single sinusoid,
with number_of_waves waves within the interval,
phase given by phi
and amplitude amp / PARTS.
COMMENT Also, it stores the corresponding peak information.
PROC calc_cf_1(bp_real, bp_imag, phi, number_of_waves, amp)
amplify := amp / PARTS;
original_number_of_waves := number_of_waves;
multiply := amplify;
found_number_of_waves := 0;
finished := FALSE;
done := FALSE;
direction_up := FALSE;
direction_down := FALSE;
ac := FALSE;
one_div_parts := 1 / PARTS;
WHILE NOT finished
DO
IF done
THEN start := 0;
end := N * PARTS;
finished := TRUE;
ELSE start := PARTS * number_of_waves - 1;
end := start + 3;
IF start < 0
THEN start := 0;
end := 3;
FI;
IF end > N * PARTS
THEN start := N * PARTS - 3;
end := N * PARTS;
FI;
FI;
phi_plus_pi_2_times_number_of_waves_endint :=
phi + PI_2 * number_of_waves * INT_END / SAMPLE_SIZE;
phi_plus_pi_2_times_number_of_waves_startint :=
phi + PI_2 * number_of_waves * INT_START / SAMPLE_SIZE;
s_1e = sin(phi_plus_pi_2_times_number_of_waves_endint);
s_1s = sin(phi_plus_pi_2_times_number_of_waves_startint);
c_1e = cos(phi_plus_pi_2_times_number_of_waves_endint);
c_1s = cos(phi_plus_pi_2_times_number_of_waves_startint);
x_div_parts = start / PARTS;
number_of_waves_mul_parts := number_of_waves * PARTS;
remember_number_of_waves := number_of_waves;
COMMENT Avoid division by 0
IF number_of_waves = 0
THEN number_of_waves := a very small number;
FI;
FOR x FROM start UPTO end - 1
DO
IF NOT (x := number_of_waves_mul_parts)
AND NOT (x := (number_of_waves_mul_parts + 1))
OR number_of_waves_mul_parts > PARTS
THEN x_div_number_of_waves := x_div_parts / number_of_waves;
mul := multiply / (PI *
(number_of_waves - x_div_parts * x_div_number_of_waves));
bp_real[x] := mul *
(c_1e * cosine_e[x] - c_1s * cosine_s[x]
+ x_div_number_of_waves * (s_1e * sine_e[x]
- s_1s * sine_s[x]));
bp_imag[x] := mul *
(c_1e * sine_e[x] - c_1s * sine_s[x]
- x_div_number_of_waves * (s_1e * cosine_e[x]
- s_1s * cosine_s[x]));
ELSE
IF NOT ac
THEN
real := 0;
imag := 0;
parts_mul_sample_size := PARTS * SAMPLE_SIZE;
parts_mul_number_of_waves := PARTS * number_of_waves;
rotate_count := 0;
wave_add_value := parts_mul_number_of_waves;
wave_current_value := phi / (PI_2 * parts_mul_sample_size);
IF wave_current_value < 0
THEN wave_current_value +:= parts_mul_sample_size;
FI;
IF wave_current_value >= parts_mul_sample_size
THEN wave_current_value -:= parts_mul_sample_size;
FI;
FOR u FROM 0 UPTO SAMPLE_SIZE - 1
DO
v := sine[wave_current_value];
real +:= v * cosine[rotate_count];
imag +:= v * sine[rotate_count];
rotate_count +:= x;
IF rotate_count >= parts_mul_sample_size
THEN rotate_count -:= parts_mul_sample_size;
FI;
wave_current_value +:= wave_add_value;
IF wave_current_value >= parts_mul_sample_size
THEN wave_current_value -:= parts_mul_sample_size;
FI;
OD
bp_real[x] := multiply * real / (SAMPLE_SIZE / 2);
bp_imag[x] := multiply * imag / (SAMPLE_SIZE / 2);
FI;
FI;
bp_real[x] := -bp_real[x];
bp_imag[x] := -bp_imag[x];
x_div_parts +:= one_div_parts;
OD;
number_of_waves := remember_number_of_waves;
IF NOT finished
THEN
v0 := bp_real[start ] * bp_real[start ] +
bp_imag[start ] * bp_imag[start ];
v1 := bp_real[start+1] * bp_real[start+1] +
bp_imag[start+1] * bp_imag[start+1];
v2 := bp_real[start+2] * bp_real[start+2] +
bp_imag[start+2] * bp_imag[start+2];
IF v0 > v1 AND v0 > v2
THEN max_number_of_waves := start / PARTS;
FI;
IF v1 > v0 AND v1 > v2
THEN max_number_of_waves := (start + 1) / PARTS;
FI;
IF v2 > v0 AND v2 > v1
THEN max_number_of_waves := (start + 2) / PARTS;
FI;
done := TRUE;
ac := TRUE;
IF max_number_of_waves > original_number_of_waves
AND number_of_waves <= original_number_of_waves
AND number_of_waves > 0
AND NOT direction_up
THEN ac := FALSE;
number_of_waves -:= .5 / PARTS;
phi +:= PI * .5 / PARTS;
IF number_of_waves >= .5 / PARTS
THEN done := FALSE;
direction_down := TRUE;
FI;
FI;
IF max_number_of_waves < original_number_of_waves
AND number_of_waves >= original_number_of_waves
AND NOT direction_down
THEN ac := FALSE;
number_of_waves +:= .5 / PARTS;
phi -:= PI * .5 / PARTS;
IF number_of_waves <= N * PARTS - .5 / PARTS
THEN done := FALSE;
direction_up := TRUE;
FI;
FI;
FI;
OD;
store number_of_waves * PARTS;
store amplify * sin(phi);
store amplify * cos(phi);
COMMENT This procedure combines corresponding peaks to one larger peak
PROC trans_pp_2_tpc(list_of_peaks)
WHILE NOT ((pp := next(list_of_peaks)) = LAST_PEAK)
DO
tp_write := pp.position + 1;
tp_read := tp_write_at;
IF NOT tp_used[tp_read]
AND (tp_used[tp_read - 1] OR tp_used[tp_read + 1])
THEN
IF tp_total[tp_read - 1] > tp_total[tp_read + 1]
THEN tp_read -:= 1;
ELSE tp_read +:= 1;
FI;
tp_real[tp_write] := tp_real[tp_read];
tp_imag[tp_write] := tp_imag[tp_read];
tp_used[tp_write] := TRUE;
tp_total[tp_write] := tp_total[tp_read];
tp_position[tp_write] := tp_position[tp_read];
tp_real[tp_read] := 0;
tp_imag[tp_read] := 0;
tp_used[tp_read] := FALSE;
tp_total[tp_read] := 0;
tp_position[tp_read] := 0;
FI;
tp_strength := sqrt(pp.real * pp.real + pp.imag * pp.imag);
tp_real[tp_write_at] +:= pp.real;
tp_imag[tp_write_at] +:= pp.imag;
tp_used[tp_write_at] := TRUE;
tp_total[tp_write_at] +:= tp_strength;
tp_position[tp_write_at] +:= tp_strength * (pp).position;
OD
FOR count FROM 0 UPTO N * PARTS + 2 - 1
DO
IF tp_used[count]
THEN tp_position[count] /:= tp_total[count];
tp_position[count] /:= PARTS;
FI;
OD;
store tp_position[];
store tp_real[];
store tp_imag[];
The implementation is not fast enough to run real-time (for normal computers).
To still get an idea of what a real-time display would be like, it is possible
to combine a number of produced images to some kind of 'film', if possible
including the original sound.
(It is possible to use moving GIFs, but then the sound is left out.)
Note that for testing in a real environment, the implementation needs to
run a lot faster.
This can partially be achieved by using pipelining (different steps of the
transformation can be done simultaneously; after completion the resulting data
is passed through the pipeline to the next step), although this causes
a small delay (if 10 transformations are used, then the image for an
interval is not shown before 10 new intervals have passed).
Using a special purpose chip can speed up the calculation also, but it would
have to be designed and produced first.
Some values that are used in the descriptions in this chapter are:
sample_size: The number of values in the sample interval.
N: The number of (whole) values in the Circular Fourier Transformed.
a: The step size with which the Circular Fourier Transformed is calculated.
Currently, all values that are used in the VOS package
are of the same type, which will be referred to as Rvos.
This type corresponds to the float type that can be found in
many programming languages.
In the file formats that are described here, the values are represented as
floats (like 1.0632, -5426.7642 or 156).
No other representations are allowed; other programs that
communicate with VOS through these files or that replace VOS programs
need to use the same representation.9
To make things clearer, the same example will be used throughout this section.
This example is a sample interval, which consists of the following sinusoids:
A sinusoid with 4.25 waves in the interval, amplitude 40.
A sinusoid with 8.5 waves in the interval, amplitude 30.
A sinusoid with 12.75 waves in the interval, amplitude 20.
A sinusoid with 17 waves in the interval, amplitude 10.
sample_size is set to 250, N is set to 125, a is set to [1/ 16].
[
52.84521
49.5074
44.47116
38.84552
33.58953
29.39153
26.6042
25.24154
25.03303
25.52038
26.17635
26.52206
26.22152
25.13728
23.33978
21.07197
18.67947
16.52277
14.89087
13.93461
13.63336
13.80148
14.13257
14.27141
13.89789
12.80442
10.94923
8.47304
5.67362
2.941488
.6685607
-.852634
-1.518153
-1.470504
-1.110286
-1.054626
-2.045891
-4.822564
.
.
.
]
[
3.774420
0.000000
3.700074
0.320861
3.490409
0.579477
3.183239
0.724500
2.834301
0.724533
2.507613
0.573791
2.264318
0.293380
2.151911
-0.072042
2.195668
-0.461722
2.393783
-0.809177
2.717085
-1.053726
3.113471
-1.151291
3.516401
-1.082570
3.856104
-0.857179
4.071664
-0.512922
4.121960
-0.110208
3.993578
0.277600
3.704263
0.576404
3.301174
0.724165
.
.
]
[
68.000000
0.754889
2.401091
68.000000
0.702137
2.249803
68.000000
0.652960
2.108081
68.000000
0.607117
1.975318
68.000000
0.564575
1.850896
136.000000
1.738537
0.648327
68.000000
0.519747
1.734613
136.000000
1.631269
0.600671
68.000000
0.478315
1.625690
136.000000
1.530611
0.556528
68.000000
0.440028
1.523656
.
.
]
In the image, the partial peaks are displayed in the order that they were found, from bottom to top. The width with which a partial peaks is drawn corresponds to its strength (this was done to spare some space). This way the way that the algorithm that finds the partial peaks works can be seen: Each time, the highest peak is found and reduced a little. Therefore, the leftmost peak (which is the highest), is the first to be found, and it is found several times before another peak occurs in the list. The peak that has the least strength, the rightmost peak, is the last to occur in the list. Also, one can see that the strengths of each following partial peaks diminishes rapidly compared to the latter one. Note that especially for the leftmost total peak, the positions of the partial peaks that are found are not precisely the same.
[
4.187500
0.116187
0.442046
4.250000
9.812099
37.533306
4.312500
-0.050630
0.580375
8.500000
28.276247
9.407896
12.750000
15.390075
-12.444594
17.000000
-1.041886
-9.748543
]
[
4.187500
0.116187
0.442046
4.250925
9.761469
38.113678
8.500000
28.276247
9.407896
12.750001
15.390075
-12.444594
16.999998
-1.041886
-9.748543
]
The difference between the two methods can be seen by looking at the leftmost peak. The partial peaks that correspond to this total peak were not exactly the same. Due to this, this peak is split into three different peaks in the .tp-file. The result in the .tpc-file is not perfect either, but at least one of the three peaks is added to the largest one in this case. The differences between these two methods are often greater.
[
0.000000
1.887210
0.000000
1.000000
3.993578
0.277600
2.000000
4.834719
0.762332
3.000000
7.452860
2.161197
4.000000
32.167206
16.499331
5.000000
-9.109533
-8.576137
6.000000
-3.034911
-6.121105
7.000000
-0.806739
-7.597823
8.000000
3.738009
-19.256525
9.000000
-8.171220
17.218431
10.000000
-4.348225
5.252337
11.000000
-4.311576
2.849575
12.000000
-7.261050
1.532696
.
.
.
]
[ 4.167447 39.800907 29.800245 19.791977 9.804061 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 . . . ]
[ 4.167447 -16.500000 15.500000 0.000000 39.800907 -33.000000 -1.000000 1.000000 29.800245 -1.000000 31.000000 1.000000 19.791977 -49.500000 -25.500000 2.000000 9.804061 ]
[
39.000000
135.000000
26.269814
38.000000
136.000000
78.809441
39.000000
136.000000
65.674530
40.000000
136.000000
39.404720
41.000000
136.000000
13.134907
37.000000
137.000000
144.483963
38.000000
137.000000
118.214157
39.000000
137.000000
105.079254
40.000000
137.000000
78.809441
41.000000
137.000000
52.539627
42.000000
137.000000
26.269814
43.000000
137.000000
13.134907
.
.
]
[
43.000000
14.000000
188.686920
44.000000
14.000000
188.686920
45.000000
14.000000
188.686920
46.000000
14.000000
188.686920
47.000000
14.000000
188.686920
48.000000
14.000000
188.686920
49.000000
14.000000
188.686920
50.000000
14.000000
188.686920
51.000000
14.000000
188.686920
52.000000
14.000000
188.686920
53.000000
14.000000
188.686920
54.000000
14.000000
188.686920
.
.
]
In order to keep things easy, the name wave file will be used throughout the rest of this appendix. Note however that these file do not have a header.
These programs can be called from the prompt of the operating system.
This is tested only for DOS and UNIX, but it will probably also work
for other operating systems (and if not, another solution can be provided,
see B.6.1).
The programs are described in the order in which they will usually be called during a forward transformation.
.<sample_size> is the number of values in the resulting audio file.
.<N> is the number of whole values in the Circular Fourier Transform (not used).
.<PARTS> is the number of values between two standard Fourier Transform values, [1/( a)] (not used).
.<input_file> is the name of the wave file that is read from.
.<output_file> is the name of the audio file that is written to.
.<start_pos> is the (word) position in the input file at which to start reading.
This transforms <sample_size> words of the wave file
<input_file>, starting at (word) position <start_pos>,
to the audio file <output_file>.
Note: If the input file has a header, this should be skipped. This can be done by using the right value for <start_pos>. For example for RIFF Wave-files, which have a 22 words header, 22 should be added to the wanted start position in the sample data to skip the header.
wav_au__ 250 125 16 test2.wav test2_00.au 22
wav_au__ 250 125 16 test2.wav test2_01.au 272
wav_au__ 250 125 16 test2.wav test2_02.au 522
.<sample_size> is the number of values in the original audio file.
.<N> is the number of whole values in the Circular Fourier Transform.
.<PARTS> is the number of values between two standard Fourier Transform values, [1/( a)].
If this value is 1, the result is a standard Fourier Transform.
.<input_file> is the name of the audio file that is read from.
.<output_file> is the name of the Circular Fourier Transform file that is written to.
This transforms the audio file <input_file> of size sample_size to its Circular Fourier Transform, which is calculated from position 0 to N, with step size [1/( PARTS)], and stores the result in the Circular Fourier Transform file <output_file>.
wav_au__ 250 125 16 test2_03.au test2_03.cf
.<sample_size> is the number of values in the original audio file.
.<N> is the number of whole values in the Circular Fourier Transform.
.<PARTS> is the number of values between two standard Fourier Transform values, [1/( a)].
.<input_file> is the name of the Circular Fourier Transform file that is read from.
.<output_file> is the name of the partial peaks file that is written to.
.<leftover_file> is the name of the Circular Fourier Transform file that the leftover Circular Fourier Transform data is written to.
.<precision|-halt_value> determines at what moment the finding of partial peaks is ended.
precision Is a value between 0 and 1. The first found (and usually strongest) peak strength is multiplied by this value. As soon as a peak strength is found that is below this strength,
the search is abandoned.
halt_value Is a value above 0 (preceded by a - to separate it from precision).
As soon as a peak strength is found that is below this value, the search is abandoned.
This takes the Circular Fourier Transform of size sample_size, which is calculated from position 0 to N, with step size [1/( PARTS)] from file <input_file>, finds partial peaks until the required precision or halt_value is reached, and then stores the result in the partial peaks file <output_file>. The Circular Fourier Transform data that is left over (finding peaks is stopped before all values in the transform are reduced to 0) is written to the Circular Fourier Transform file <leftover_file>.
cf__pp__ 1000 500 8 test1_00.cf test1_00.pp test1_00.cf_ -200
cf__pp__ 250 125 16 test2_03.cf test2_03.pp test2_03.cf_ .0025
1 For m = 0, xm = ¥, so [(2p)/( xm)] = 0. Therefore this division by zero causes no problems.
2 ëq û is q rounded downwards, éq ù is q rounded upwards.
3 Nyquist Theorem:
Accurate reproduction of a signal demands that the sampling frequency be
at least twice the rate of the highest frequency in the source
signal.
4 A wave is defined here as a one period of a sinusoid. The number of waves within an interval is [(fn)/( r)], where f is the frequency of the sinusoid (in number of waves/second), r is the sample rate (the number of measured values/second) of the signal, and n is the number of measured values within the interval. The sample time is T = n/r.
5 This is the reason why I assumed that this transform was not equal to another special Fourier Transform
6 One could argue that this could
be postponed until a real time version of the program needs to be written.
This would however make it almost impossible to use the current program for
even the smallest tests, so it really needs to be done now.
One could also argue that it is also possible to gain speed by using a
faster computer or maybe a different programming language.
However, to get a good idea of the properties and possibly problems of
displaying sounds, it is also necessary to calculate series of images
for longer signals, which would still take quite a large amount of time.
(For example, a speedup of a factor ten is unsufficient is hundreds of
images need to be calculated. By deriving and simplifying a formula,
far greater gains can be obtained.
7 The values -1/2 and M - 1/2 have been chosen arbitrarely and give good results. Whether these results are exact or only a good estimate, is not really important here and is not discussed any further. Note that if the values 0 and M - 1 are chosen, a better optimisation is possible. The results however get a lot worse, so these values cannot be used.
8 This could have been corrected before, but since the program gets bigger because of this and it does not affect the optimising, it has been postponed.
9 If it is really necessary or if a speedup is required, it is possible to change this format by changing only one procedure and one function in the file vos_io.c.