Comparison of 2 'wav' files

Discussion:

HuaMin

2009-06-16 10:22:03 UTC

Hi,
What's the right mechanism for comparing two 'wav' files, to check whether they
contain the same sound or not? Is it better to choose C++ instead of C# for doing this?

Bob Masta

2009-06-16 11:02:29 UTC

On Tue, 16 Jun 2009 03:22:03 -0700, HuaMin wrote:

Post by HuaMin
Hi,
What's the right mechanism for comparing two 'wav' files, to check whether they
contain the same sound or not? Is it better to choose C++ instead of C# for doing this?

It's not clear what you are looking for.

If you just want to see if the files are
identical, any byte-by-byte comparison scheme will
work perfectly.
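
As a rough illustration of the byte-by-byte case, here is a minimal Python sketch using only the standard library. The function name `files_identical` and the `chunk_size` parameter are made up for this example and do not come from any tool mentioned in this thread.

```python
from pathlib import Path


def files_identical(path_a, path_b, chunk_size=65536):
    """Return True if the two files contain exactly the same bytes."""
    a, b = Path(path_a), Path(path_b)
    # Files of different sizes cannot be identical, so skip reading them.
    if a.stat().st_size != b.stat().st_size:
        return False
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True
```

Note that this reports a difference for any differing byte, including WAV header or metadata chunks, even when the audio samples themselves are the same.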

If you want to know if two files are different
recordings of the same sound (different mic types
and/or positions, different start-stop times,
etc.) then the job is *much* harder. You would
probably want to do FFTs (say, 1024 samples each)
followed by some sort of feature extraction.
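
One possible reading of "FFTs followed by feature extraction" is sketched below in plain Python. The choice of feature (energy summed in a few equal-width frequency bands) is an assumption made only for illustration, as are the names `fft`, `band_energies`, `frame_features` and the `n_bands` parameter. The 1024-sample frame size follows the suggestion above.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_energies(frame, n_bands=8):
    """Crude feature vector: spectral energy in n_bands equal-width bands."""
    spectrum = fft(frame)
    half = len(frame) // 2  # keep only the bins below the Nyquist frequency
    power = [abs(c) ** 2 for c in spectrum[:half]]
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def frame_features(samples, frame_size=1024, n_bands=8):
    """Split samples into non-overlapping frames and extract band energies."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        features.append(band_energies(frame, n_bands))
    return features
```

Two recordings could then be compared frame by frame on these feature vectors rather than on raw samples. Which features actually discriminate well is the hard part and is not addressed by this sketch.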

If you have reason to believe that the recordings
are identical except for a time shift, then
correlation techniques may be the best choice.
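
For the time-shift case, a brute-force cross-correlation can be sketched as follows. This is only an illustration: `best_lag` and `max_lag` are hypothetical names, the inputs are assumed to be plain lists of sample values, and a real implementation would normally use an FFT-based correlation for speed.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which b best lines up with a.

    A positive result means b is delayed relative to a.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum of products over the overlapping region for this lag.
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best
```

Once the lag is known, the two recordings can be aligned and compared sample by sample or frame by frame.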

You will not get a yes/no result from these FFT or
correlation methods. You will have to apply some
threshold based upon your requirements and
experience to estimate the likelihood of a match.
Definitely not a simple project!
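
To make the thresholding step concrete, here is a small sketch that turns a correlation score into a decision. The 0.9 threshold is an arbitrary placeholder, not a recommended value, and the names `normalized_correlation` and `likely_match` are invented for this example. The inputs are assumed to be already time-aligned lists of samples.

```python
import math


def normalized_correlation(a, b):
    """Correlation coefficient of two sample lists, in the range [-1, 1]."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # at least one signal is constant
    return sum(x * y for x, y in zip(da, db)) / denom


def likely_match(a, b, threshold=0.9):
    """Apply a threshold (placeholder value) to the correlation score."""
    return normalized_correlation(a, b) >= threshold
```

The threshold would have to be tuned on known matching and non-matching recordings for the application at hand.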

Best regards,

Bob Masta

DAQARTA v4.51
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
FREE Signal Generator
Science with your sound card!

HuaMin

2009-06-23 05:12:01 UTC

Many thanks Bob. Sorry for my late reply. Is there any existing way (like the
correlation techniques) to compare the sounds from different people, for
instance, different pronunciations from different people?

Bob Masta

2009-06-23 12:07:57 UTC

On Mon, 22 Jun 2009 22:12:01 -0700, HuaMin wrote:

Post by HuaMin
Many thanks Bob. Sorry for my late reply. Is there any existing way (like the
correlation techniques) to compare the sounds from different people, for
instance, different pronunciations from different people?

I haven't looked lately, but I'll bet you can find
a huge body of work by searching for "speech
recognition" plus "technique" or "algorithm", etc.

The last I checked, this was still regarded as a
hard problem to solve. I suspect the best speech
recognition software uses multiple techniques,
with plenty of "fudge factors" based upon test
results.

If you are doing basic research on pronunciation,
your job may be somewhat easier if you can have
each subject utter the same short phrase, word, or
even single syllable. Then you can align the
starts and use a series of short overlapping
spectra to watch the development of the sounds.
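
The "series of short overlapping spectra" idea can be sketched as a small spectrogram routine. This is an illustration only, not how Daqarta or any other tool does it: the names `hann`, `dft_magnitudes` and `spectrogram`, the Hann window, and the default frame and hop sizes are all assumptions. A naive DFT is used to keep the example short, so it is slow for long recordings.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n must be at least 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_size=256, hop=64):
    """One windowed spectrum every `hop` samples; hop < frame_size overlaps."""
    window = hann(frame_size)
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        rows.append(dft_magnitudes([s * w for s, w in zip(frame, window)]))
    return rows
```

Each row is the spectrum of one time slice, so reading down the rows shows how the sound develops over time. A smaller `hop` gives more overlap and finer time resolution at the cost of more computation.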

My Daqarta software can show color spectrograms of
real-time sounds. But to get highest time
resolution (high overlap) it's best to record the
sound first. See
<http://www.daqarta.com/dw_sgram.htm>
for speech examples.

It won't cost you a thing to try, and if you don't
need live input you can avoid the US$29 purchase
price altogether: After the 30-day/30-session
trial expires, the inputs stop working but you can
still analyze files.

Daqarta only shows one spectrogram at a time, not
side-by-side comparisons of different subjects.
(Though I suppose if you were motivated you could
splice two short utterances from different
subjects so they appeared sequentially in the same
file.) But even if this is not the solution to
your problem (which you still haven't explained),
it should give you plenty of insight on the issues
you will need to deal with. You can, for example,
change window functions, overlap, and dynamic
range to see how they affect the spectrogram.

Best regards,

Bob Masta

HuaMin

2009-06-24 06:17:01 UTC

Any advice?

Chris P.

2009-06-24 14:09:17 UTC

That would be called voice recognition. What are your exact requirements?
Do you have a commercial need or is this a research project?

--
http://www.chrisnet.net/code.htm
[MS MVP for DirectShow / MediaFoundation]

HuaMin

2009-07-03 03:25:01 UTC

Thanks Chris. It's a commercial need.

HuaMin

2009-07-06 04:25:01 UTC

Chris,
Do you have more advice for doing that?

Chris P.

2009-07-06 16:54:13 UTC

Post by HuaMin
Do you have more advice for doing that?

Nuance has an SDK that can be licensed from them. I believe it is called
"Nuance Verifier". You will have to contact Nuance sales; they don't sell
it on the web site.

HuaMin

2009-07-10 04:12:01 UTC

Many thanks and good day Chris.

I've checked Nuance; one of its products is 'Dragon NaturallySpeaking'. As
its technical support requires a valid registry key, I can't use that
service. Do you know which product is actually for comparing 2 sounds?

How about the idea of storing a sound as a sequence of some stuff?

HuaMin

2009-07-10 07:41:01 UTC

Chris,
I remember that there's a way to transfer a 'wav' file into something that
can be stored into the PC. How about that way?

Chris P.

2009-07-10 14:30:14 UTC

Post by HuaMin
I've checked Nuance; one of its products is 'Dragon NaturallySpeaking'. As
its technical support requires a valid registry key, I can't use that
service. Do you know which product is actually for comparing 2 sounds?
How about the idea of storing a sound as a sequence of some stuff?

Are you comparing sounds or voices? Are you trying to authenticate the
voice or match it to a phrase?

If you are comparing speech phrases, then comparing sounds is not enough.
You have to break the speech down into phonemes and then compare this to
what you've stored.

If you are voice printing for speaker identification then this is a whole
different challenge. This requires recognizing the high frequency
variations in signal.

Nuance has products that do both of these tasks, you will have to contact
their sales to get evaluation software.

HuaMin

2009-07-15 10:39:01 UTC

Thanks Chris. I expect to store a sound file for every word of a
phrase, and then to validate/detect the sound of the words in the speech
against the stored sound files. Is there any example of this?

HuaMin

2009-07-16 06:48:01 UTC

Any advice?

Chris P.

2009-07-16 22:08:47 UTC

There are no examples of this. This is Ph.D.-level research, not something
you are going to find source code for lying around. Use one of the
available commercial products for phonetic comparison; it's really your
only choice.

HuaMin

2009-09-22 01:52:01 UTC

Many thanks Chris. I do understand the difficulty of this, but I would still
like some hints on how to approach it without using existing products.

HuaMin

2009-09-29 03:36:01 UTC

Any advice?
