Juniper Publishers

Juniper publishers have been established with the aim of spreading quality scientific information to the research community throughout the universe. We, as Open Access publishers, strive to offer the best in class online science publications. Open Access process eliminates the barriers associated with the older publication models, thus matching up with the rapidity of the twenty-first century. Our main areas of interest lie in the fields of science, engineering and other related areas.

Wednesday, December 16, 2020

Introduction to Tape Authentication- A Study of Acoustic Characters - Juniper Publishers

Trends in Technical & Scientific Research - Juniper Publishers

Introduction

Two terms that are often confused are speaker verification and speaker recognition. The basic difference between the two lies in how the identity of the speaker is established: speaker verification is a 1:1 match of a voice sample against a single claimed speaker, whereas speaker recognition is a 1:N search for the speaker among many candidates. A sound spectrogram is also regarded as a “photograph of an individual’s voice”. By taking the background noise in a recording into consideration, the probable location of the individual at the time of speaking may also be inferred.
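The 1:1 versus 1:N distinction can be illustrated with a minimal sketch. It assumes each voice has already been reduced to a short feature vector; the template values, the 100 Hz threshold and the function names `verify` and `identify` are hypothetical and are not taken from the article.

```python
import math

def verify(sample, claimed_template, threshold=100.0):
    """1:1 check: is the sample close enough to the one claimed speaker?"""
    return math.dist(sample, claimed_template) <= threshold

def identify(sample, enrolled):
    """1:N search: return the enrolled speaker whose template is closest."""
    return min(enrolled, key=lambda name: math.dist(sample, enrolled[name]))

# Hypothetical templates: [pitch, first formant, second formant] in Hz.
enrolled = {
    "speaker_a": [120.0, 700.0, 1200.0],
    "speaker_b": [230.0, 850.0, 2100.0],
}
questioned = [225.0, 860.0, 2050.0]

matches_b = verify(questioned, enrolled["speaker_b"])   # 1:1 against one claim
matches_a = verify(questioned, enrolled["speaker_a"])
best_match = identify(questioned, enrolled)             # 1:N over all enrolled
```

Verification answers a yes/no question about one claimed identity, while identification always returns the closest candidate, so in practice it is usually paired with a threshold as well.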

Auditory analysis, in general, is the examination of the various constituents of a voice as received by the ear. The governing factors include sound quality, breathing pattern, amplitude, speaking style, irregularities in speech, psychological condition and the other acoustic characteristics of the individual.

Auditory Analysis

Each person’s voice is unique, and the features that make it unique are known as voice prints. Voice prints are of significant importance, particularly in forensic phonetics. The movement of air particles occurs at a frequency and amplitude specific to an individual [1].

Auditory analysis is the examination of the various parts and constituents of an individual’s voice as received by the ear. The characteristics of major concern include the individual’s pitch, breathing order, acoustics, tone, intonation and state of mind while speaking. Experts comparing and identifying voice samples consider the auditory analysis one of the most important components of the examination. The characteristics taken into consideration include:

a) Melody

b) Irregularities in speech

c) Timbre of voice

d) Jargon used

e) Intensity

f) Emotional status

g) Rate of breathing while speaking

h) Pronunciation errors

i) Speed while speaking

Various studies suggest that the success rate of speaker recognition using auditory analysis is 85-99%, because a speaker rarely deviates from his or her individual characteristics while speaking. The characteristics listed above are only some of those used in the analysis; the way you speak, the stress you place on a particular word and the accent you use are all part of the acoustic character that differentiates you from the rest of the population.

Speaking Style

Speaking style covers the features a person shows while speaking: the way in which a spoken sentence generally begins and the accent used during speech. It also includes whether the speaker is fast or slow, and whether or not the speaker is spontaneous.

Speaking Speed

The speaking speed of an individual is the way in which he or she utters a particular word: the time taken over each syllable and the pauses between syllables. The same person does not speak at the same speed every time; the speed is mostly driven by emotion. It is considered one of the important factors in voice verification by auditory analysis.
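As a rough illustration, speaking speed can be summarized from syllable timings. The sketch below assumes that syllable start and end times (in seconds) have already been marked by hand or by a labelling tool; the timings and the function name `speaking_rate` are hypothetical.

```python
def speaking_rate(syllable_onsets, syllable_offsets):
    """Return (syllables per second, list of pauses between syllables)."""
    total_time = syllable_offsets[-1] - syllable_onsets[0]
    rate = len(syllable_onsets) / total_time
    # A pause is the gap between the end of one syllable and the start of the next.
    pauses = [
        onset - offset
        for offset, onset in zip(syllable_offsets[:-1], syllable_onsets[1:])
    ]
    return rate, pauses

# Hypothetical timings for a four-syllable utterance.
onsets = [0.00, 0.25, 0.60, 1.10]
offsets = [0.20, 0.50, 0.90, 1.25]
rate, pauses = speaking_rate(onsets, offsets)
```

Comparing the rate and the pause pattern across recordings of the same sentence gives a simple numerical counterpart to the auditory impression of a fast or slow speaker.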

Sound Intensity

Intensity is basically the loudness of an individual’s voice. It depends on the situation of the individual or the environment in which a specific sentence was spoken. The intensity of speech in a recording can therefore serve as evidence of the situation in which the individual was present.

Voice Timbre

Voice timbre, in simple words, is the quality of the tone while speaking. Timbre differs slightly from pitch: timbre arises from a combination of overtone frequencies, whereas pitch usually corresponds to a single (fundamental) frequency. Voice timbre is considered the most important acoustic character and, when heard in a well-preserved recording, can make the work of the investigator easier.

Irregularities in Speech

Irregularities in speech include a stammer or, in other words, any obstruction in an individual’s speech. Apraxia, dysarthria, DASE, etc. may be considered during investigations and serve as characteristics when examining the speech of an individual accused of a particular crime.

Sound Spectrographic Analysis

It would not be wrong to say that the spectrographic analysis of sound is like taking a picture of the voice, since the voice is then examined visually. Various software packages are used to create and analyze spectrograms, including Praat, WaveSurfer and TF32. TF32 is the most widely used for analyzing spectrograms because of its simplicity and user-friendliness. The parameters taken into consideration include:

a) Formant shape and its position

b) Pitch period

c) Noise speech ratio

d) Length of word and sentence length

e) Amplitude of formant

The parameters listed above play a major role in analyzing a questioned voice sample to establish its authenticity. To improve test results, the use of FTIR is considered a better alternative to Praat and other analysis techniques [2-4].

Praat is considered the second most used software because it lets users annotate speech files and simplifies the analysis and labelling of the sound sample when preparing the audio report. Many factors are considered when describing a sound spectrogram and analyzing its result in order to recognize a specific voice sample.
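To make the idea of a spectrogram concrete, here is a minimal sketch of how one can be computed: the signal is cut into short overlapping frames, each frame is windowed, and the magnitude of its discrete Fourier transform is taken. The frame length, hop size, 8000 Hz sampling rate and test tone are assumptions chosen for illustration; tools such as Praat use their own, more refined settings.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len//2."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

fs = 8000  # sampling rate in Hz (assumed)
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
spec = spectrogram(tone)

# Each bin spans fs / frame_len Hz, so the strongest bin gives the tone frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * fs / 64
```

Plotting `spec` with time on one axis and frequency on the other, and magnitude as darkness, gives the familiar picture that the examiner reads visually.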

Formant frequency

The formant frequency is taken into consideration when reading the resulting spectrogram. In simpler words, a formant is a concentration of acoustic energy around a particular frequency of the sound; it appears in the spectrogram as a region of high amplitude. Formant values depend on the age and sex of the individual. Formant shapes are characteristic of the movement of the tongue and of the speech sound that is spoken. The formant frequency is of significant importance in forensic phonetics. Linear predictive coding (LPC) is another matter of concern when estimating the formant frequency of a given sound sample: at least two LPC coefficients must be calculated in order to find a formant frequency. Another advantage of the spectrogram is that it shows a change in formant frequency if any kind of disorder is present in the auditory system of the speaker.
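The link between two LPC coefficients and one formant can be sketched as follows. The example builds a synthetic single-formant signal (a two-pole resonator, which is an assumption standing in for real speech), fits a second-order linear predictor by the autocorrelation method, and reads the formant frequency and bandwidth from the resulting pole. The function names and the 700 Hz / 100 Hz values are illustrative only.

```python
import cmath
import math

def resonator_impulse_response(freq_hz, bandwidth_hz, fs, n_samples):
    """Impulse response of a two-pole resonator (one synthetic 'formant')."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2 * math.pi * freq_hz / fs
    a1, a2 = 2 * r * math.cos(theta), -r * r
    h = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        prev1 = h[n - 1] if n >= 1 else 0.0
        prev2 = h[n - 2] if n >= 2 else 0.0
        h.append(x + a1 * prev1 + a2 * prev2)
    return h

def autocorr(x, lag):
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def lpc2_formant(x, fs):
    """Fit x[n] ~ a1*x[n-1] + a2*x[n-2] and read one formant off the pole."""
    r0, r1, r2 = autocorr(x, 0), autocorr(x, 1), autocorr(x, 2)
    # Solve the 2x2 normal equations for the two LPC coefficients.
    det = r0 * r0 - r1 * r1
    a1 = (r1 * r0 - r1 * r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    pole = (a1 + cmath.sqrt(a1 * a1 + 4 * a2)) / 2  # root of z^2 - a1*z - a2
    freq = abs(cmath.phase(pole)) * fs / (2 * math.pi)   # pole angle -> Hz
    bandwidth = -math.log(abs(pole)) * fs / math.pi      # pole radius -> Hz
    return freq, bandwidth

fs = 8000  # sampling rate in Hz (assumed)
h = resonator_impulse_response(700.0, 100.0, fs, 2000)
formant_hz, bandwidth_hz = lpc2_formant(h, fs)
```

Each formant corresponds to one complex-conjugate pole pair, which is why two coefficients are the minimum. Real speech contains several formants, so practical analyses use a higher predictor order and a general polynomial root finder.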

Formant bandwidth and amplitude

The formant bandwidth, usually measured at the -3 dB points of the formant peak, is used in speech and speaker recognition. The bandwidth in the voice spectrogram is of significant importance because it enables us to determine whether the voice in the spectrogram is clear or not. Clarity is inversely related to bandwidth, i.e., the lower the bandwidth, the higher the clarity of the speech of the individual being tested. A person’s speech habits influence the bandwidth, which makes it an important aspect of speech and voice recognition. When working with the formant amplitude, the formant frequency must also be taken into account, since it has a direct impact on the formant bandwidth. The distance between two formants is another characteristic feature for identifying an individual’s voice: as the distance between the formants decreases, their effect is enhanced.
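A -3 dB bandwidth can be measured as the width of the frequency region in which the power stays above half of its peak value. The sketch below applies this to the power response of a synthetic two-pole resonance; the resonator model, the 700 Hz centre frequency and the two bandwidth settings are assumptions for illustration, not values from the article.

```python
import cmath
import math

def resonance_power(freq_hz, centre_hz, bandwidth_hz, fs):
    """Power response of a two-pole resonator at freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2 * math.pi * centre_hz / fs
    z = cmath.exp(2j * math.pi * freq_hz / fs)
    pole = r * cmath.exp(1j * theta)
    denom = (1 - pole / z) * (1 - pole.conjugate() / z)
    return 1.0 / abs(denom) ** 2

def half_power_bandwidth(power_at, f_lo, f_hi, step=1.0):
    """Width in Hz of the region where power stays within 3 dB of the peak."""
    n = int(round((f_hi - f_lo) / step)) + 1
    freqs = [f_lo + i * step for i in range(n)]
    powers = [power_at(f) for f in freqs]
    peak = max(range(len(freqs)), key=lambda i: powers[i])
    half = powers[peak] / 2.0  # -3 dB corresponds to half of the peak power
    lo = hi = peak
    while lo > 0 and powers[lo - 1] >= half:
        lo -= 1
    while hi < len(freqs) - 1 and powers[hi + 1] >= half:
        hi += 1
    return freqs[hi] - freqs[lo]

fs = 8000  # sampling rate in Hz (assumed)
narrow = half_power_bandwidth(
    lambda f: resonance_power(f, 700.0, 60.0, fs), 300.0, 1100.0)
wide = half_power_bandwidth(
    lambda f: resonance_power(f, 700.0, 200.0, fs), 300.0, 1100.0)
```

The narrow resonance gives a sharper, more clearly defined peak than the wide one, which is the sense in which a lower bandwidth goes with clearer speech.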

A spectrogram, when observed carefully, shows regularly spaced lines that mark the pitch period, and these indicate the pitch of the individual. The fundamental frequency is about 120 Hz for men and 230 Hz for women. The fundamental frequency of an individual is inversely proportional to the pitch period. Emotions such as excitement and anger are seen to change the pitch of an individual. Pitch has various other uses, including determining the sex of a speaker when it is unknown.
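The inverse relation between pitch period and fundamental frequency can be shown with a small sketch that estimates the period from the strongest autocorrelation lag. Autocorrelation is one common way to do this and is not necessarily the method used by the tools named in the article; the 12000 Hz sampling rate, the search range and the synthetic 120 Hz signal are assumptions.

```python
import math

def estimate_f0(signal, fs, f_min=60.0, f_max=400.0):
    """Estimate fundamental frequency from the strongest autocorrelation lag."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    pitch_period = best_lag / fs   # seconds per glottal cycle
    return 1.0 / pitch_period      # Hz: f0 is the reciprocal of the period

fs = 12000  # sampling rate in Hz (assumed)
# Synthetic voiced sound: 120 Hz fundamental plus a weaker second harmonic.
voiced = [math.sin(2 * math.pi * 120 * n / fs)
          + 0.5 * math.sin(2 * math.pi * 240 * n / fs)
          for n in range(1200)]
f0 = estimate_f0(voiced, fs)
```

Here the pitch period is 100 samples, or about 8.3 ms, which corresponds to 120 Hz; a 230 Hz voice would have a period of about 4.3 ms.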

Harmony ratio

The other important factors in the spectrographic analysis of sound are the noise-to-harmonics ratio (NHR) and the word and sentence length. The NHR is of significant importance because a certain noise level in the recording is required for sound evidence to be admissible in a court of law. The spectrogram can also be used to determine the speed of the speaker, which further enables the investigator to narrow down the list of suspects and find the actual culprit from the sentence gaps and other acoustic habits visible in the spectrogram.
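One simple way to approximate a noise-to-harmonics ratio is from the normalized autocorrelation at the pitch period: if r is that correlation, the periodic part accounts for roughly r of the energy and the noise for 1 - r. This is an autocorrelation-based approximation chosen for illustration and is not the exact definition used by any particular analysis package; the signal, noise level and function names are hypothetical.

```python
import math
import random

def normalized_autocorr(x, lag):
    """Correlation between the signal and itself shifted by `lag` samples."""
    a, b = x[:len(x) - lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den

def noise_to_harmonics_ratio(x, pitch_lag):
    """Approximate NHR as (1 - r) / r, with r taken at the pitch lag."""
    r = min(max(normalized_autocorr(x, pitch_lag), 1e-9), 1.0)
    return (1.0 - r) / r

random.seed(0)              # fixed seed so the sketch is repeatable
fs, f0 = 8000, 100          # assumed sampling rate and pitch in Hz
lag = fs // f0              # pitch period in samples
clean = [math.sin(2 * math.pi * f0 * n / fs) for n in range(1600)]
noisy = [s + random.gauss(0.0, 0.3) for s in clean]

nhr_clean = noise_to_harmonics_ratio(clean, lag)
nhr_noisy = noise_to_harmonics_ratio(noisy, lag)
```

A perfectly periodic signal gives a ratio close to zero, and the ratio grows as more noise is mixed into the recording.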

It should be noted that speaking speed is highly individualized and helps in shortlisting from a group of suspects. Jitter is another aspect that bears on the admissibility of evidence in a court of law.

Frequency distortion

Jitter is the cycle-to-cycle variation in the frequency of an individual’s voice, and it reflects the stress level and the degree of naturalness of the voice. In other words, it can be described as distortion in frequency. The average peak amplitude is another important factor in speaker identification: it tells us about the air taken in by the lungs during speech, which can indicate a specific impairment or disease and so help in shortlisting from the list of probable speakers [5-7].
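Jitter is commonly quantified as the average absolute difference between consecutive pitch periods, divided by the average period. The sketch below uses that definition on a short list of hypothetical period measurements; in practice the periods would come from a pitch-marking step that is not shown here.

```python
def local_jitter(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods[:-1], periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period

# Hypothetical pitch periods in seconds (a voice near 125 Hz).
periods = [0.0080, 0.0082, 0.0079, 0.0081]
jitter_percent = 100.0 * local_jitter(periods)

# A perfectly regular voice has no jitter at all.
steady_percent = 100.0 * local_jitter([0.0080, 0.0080, 0.0080, 0.0080])
```

For these sample periods the jitter comes out a little under 3%, while a perfectly regular sequence gives exactly zero.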

Development of Speaker Recognition by Visual Comparison of Spectrogram

a) By now, we are well aware of the various characteristics taken into consideration when examining a sound spectrogram. In simple words, a sound spectrogram shows how the spectrum of the sound in an individual’s speech varies over time.

b) The next method is the Kersta method of “voiceprint” identification, which is used to determine the SRS. It was introduced in 1962 and held that a speech spectrogram is as permanent and as unique as a fingerprint.

c) A later study contradicted the previous one, stating that the language spoken also has a severe impact on the SRS.

d) The Tosi study attempted to validate the Kersta approach. It recommended that experts carry out the identification in order to reduce errors during the examination.

e) After the National Academy of Sciences report, the SRS was made admissible in the court of law in 1976, as per the FBI request, but it was further added that the evidence is admissible only when examined by a qualified expert.

Other Recognition Techniques

Recognition by experts can also be carried out by either of two methods:

a) Aural-perceptual approach: this is basically a detailed auditory analysis involving parameters such as those explained above. Observations based on the linguistics of the individual are included in this approach. It also uses the International Phonetic Alphabet (IPA) to transcribe the analysis of the characters taken into consideration.

b) Phonetic-acoustic approach: this approach basically uses the relative amplitude, trajectory and bandwidth of the frequencies of the sound waves produced while an individual speaks. It relies on visualization techniques and signal-processing algorithms applied to the given specimen. The pitch, the spectral energy distribution and the jitter are taken into consideration during the examination.

Interpretation of Results

a) First of all, the individuality of the speaker or the voice sample is verified. Several approaches can be used to narrow the candidates down to a single individual: either a process of elimination or grouping the candidates into smaller, most probable groups. Ultimately, the aim of the expert is to find the individual, whether to establish the sequence of events or to identify the accused.

b) It should be kept in mind that, while examining the suspected population, voice evidence can only be identified as belonging to a single individual.

It should also be kept in mind that a particular recording can contain several kinds of noise. To isolate the sound of interest, “signal-dependent filtration” is needed. The filtering techniques used include adaptive filtering and spectral subtraction (Figure 1).
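As one possible illustration of adaptive filtering, the sketch below uses a least-mean-squares (LMS) noise canceller. It assumes that a separate reference recording of the noise is available, which is not always the case in forensic work, and it uses a sine wave as a stand-in for speech. The LMS algorithm, the step size, the number of taps and the two-tap noise path are assumptions chosen for the example, not details from the article.

```python
import math
import random

def lms_noise_canceller(primary, reference, n_taps=4, mu=0.01):
    """Subtract an adaptively filtered copy of `reference` from `primary`."""
    weights = [0.0] * n_taps
    history = [0.0] * n_taps   # most recent reference samples, newest first
    cleaned = []
    for d, v in zip(primary, reference):
        history = [v] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate  # the error signal is also the cleaned output
        # LMS update: nudge each weight along the error-times-input direction.
        weights = [w + mu * e * h for w, h in zip(weights, history)]
        cleaned.append(e)
    return cleaned

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(1)  # fixed seed so the sketch is repeatable
n = 4000
speech = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # stand-in for speech
noise = [random.uniform(-1.0, 1.0) for _ in range(n)]          # reference noise
# The noise reaches the main recording through an unknown 2-tap path (assumed).
primary = [speech[i] + 0.8 * noise[i] - 0.3 * (noise[i - 1] if i else 0.0)
           for i in range(n)]

cleaned = lms_noise_canceller(primary, noise)

# Compare against the clean signal once the filter has had time to adapt.
error_before = mse(primary[-1000:], speech[-1000:])
error_after = mse(cleaned[-1000:], speech[-1000:])
```

The filter adapts to the signal itself rather than using fixed coefficients, which is what makes the filtering signal dependent. Spectral subtraction takes a different route: it estimates the noise spectrum from speech-free segments and subtracts it frame by frame.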