//Spring 2010.
//Project 02.
//Copyright @ 2010 antonio081014 antonio081014 ;
//All code is in MATLAB.
Report Link:
Task 1: Estimate F0 using the log harmonic product spectrum with K = 4 and K = 10.
PS: The log of the harmonic product spectrum equals the sum of the log-magnitude spectra of the downsampled copies.
1. Downsample the log-magnitude spectrum by K = 4 and by K = 10.
2. Add the downsampled spectra to the original log spectrum.
3. Find the highest peak in the summed spectrum; its frequency is the estimated F0.
Code:
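frameIdx = 15; % Frame to analyze.
fdat = fft(dat(:,frameIdx).*hamming(320), 1024); % 1024-point FFT of the Hamming-windowed frame.
tmp = log(abs(fdat(1:512))); % Log-magnitude of the first half of the spectrum.
figure;
subplot(4,1,1);
plot(tmp);
title(['The frequency plot of the frame #:', num2str(frameIdx)]);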
xlabel('Freq index');
ylabel('Log-Magnitude');
hold on;
tmp1 = downsample(tmp, 4);
z = zeros(512 - length(tmp1),1);
tmp1 = [tmp1; z];
tmp2 = downsample(tmp, 10);
z = zeros(512 - length(tmp2),1);
tmp2 = [tmp2; z];
subplot(4,1,2);
plot(tmp1, 'r');
xlabel('Freq index, K=4');
ylabel('Log-Magnitude');
subplot(4,1,3);
plot(tmp2, 'g');
xlabel('Freq index, K=10');
ylabel('Log-Magnitude');
product = tmp + tmp1 + tmp2;
subplot(4,1,4);
plot(product, 'r');
xlabel('Freq index');
ylabel('Log-Magnitude');
[x, y] = ginput(3); % Click the three highest peaks in the summed spectrum.
% x = [11.79; 57.60; 93.18]; These are indices in the frequency domain; (x - 1) * fs / 1024 converts them to Hz.
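For comparison, the textbook form of the log harmonic product spectrum sums the log spectrum compressed by every factor k = 1..K, rather than adding only the factor-K copy to the original as the script above does. The following is a minimal sketch of that variant on a synthetic 200 Hz harmonic tone. The test signal, the 50 Hz lower search bound, the hand-written Hamming window, and all variable names are assumptions made for illustration and are not part of the original project.

```matlab
% Log harmonic product spectrum on a synthetic tone (illustrative sketch).
fs = 16000; N = 320; nfft = 1024;
t = (0:N-1)' / fs;
f0_true = 200;                       % assumed test pitch in Hz
x = zeros(N, 1);
for h = 1:10                         % ten equal-amplitude harmonics
    x = x + sin(2*pi*h*f0_true*t);
end
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));   % Hamming window written out by hand
X = fft(x .* w, nfft);
S = log(abs(X(1:nfft/2)) + eps);     % log-magnitude, first half of the spectrum

Ks = [4 10];
f0_est = zeros(size(Ks));
for j = 1:numel(Ks)
    K = Ks(j);
    L = floor(numel(S) / K);         % common length of all compressed copies
    hps = zeros(L, 1);
    for k = 1:K
        d = S(1:k:end);              % spectrum compressed by factor k
        hps = hps + d(1:L);          % sum of logs = log of the product
    end
    lo = ceil(50 * nfft / fs) + 1;   % skip bins below 50 Hz (assumed bound)
    [~, rel] = max(hps(lo:end));
    f0_est(j) = (lo + rel - 2) * fs / nfft;  % 1-based index to Hz
    fprintf('K = %d: estimated F0 = %.1f Hz\n', K, f0_est(j));
end
```

Each estimate is quantized to the bin spacing fs/nfft (15.625 Hz here), so it can only match the true pitch to within about one bin.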
Task 2: Using the overlap-add method, reconstruct the given utterance after filtering each frame with a bandpass filter from 50 Hz to 2000 Hz.
1. Filter each frame.
2. Reconstruct the utterance using the overlap-add method.
Code:
% Filter:
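fs = 16000; % Sampling rate in Hz (same value as used for the spectrograms below).
recdat = zeros(320, 100);
LF = ceil(1024 / fs * 50); % Number of low-frequency bins to zero (below 50 Hz).
HF = floor(1024 / fs * 2000); % Zero-based index of the highest bin kept (2000 Hz).
for i=1:100 % There are 100 frames here.
fmdate = dat(:,i);
ffmdat = fft(fmdate.*hamming(320),1024); % Take Fourier Transform using 1024 points.
ffmdat(1:LF,1) = 0; % DC up to just below 50 Hz.
ffmdat(HF+2:1024-HF,1) = 0; % Above 2000 Hz, positive and negative frequencies.
ffmdat(1024-LF+2:end,1) = 0; % Mirror image of the low-frequency bins.
tmpfm = real(ifft(ffmdat)); % Full 1024-point inverse; ifft(ffmdat,320) would truncate the spectrum first.
recdat(:,i) = tmpfm(1:320); % Keep the first 320 samples of the filtered frame.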
clear ffmdat;
end
% Reconstruct
update_sp = zeros(1,16000);
start = 1;
stop = start + 320 - 1;
for i=1:100
update_sp(start:stop) = update_sp(start:stop) + recdat(:,i)'; % overlap-add;
start = start + 160;
stop = start + 320 - 1;
if stop > 16000
break
end
end
figure;
plot(update_sp);
title('The filtered speech.');
% soundsc(update_sp);
fm = recdat(:,23);
ffm = fft(fm.*hamming(320), 1024);
figure;
plot(abs(ffm(1:512)));
title(['The filtered frame #:', num2str(23)]);
[xx, yy] = ginput(3);
% xx = [35.2534562211982; 72.5806451612903; 107.142857142857];
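The bin zeroing in the filter loop can also be written as a single logical mask built from each bin's frequency, which keeps the positive and negative halves of the spectrum symmetric by construction. A minimal sketch, assuming fs = 16000 Hz and a synthetic two-tone frame in place of a column of dat (the test frame and all variable names are illustrative, and no window is applied here):

```matlab
% Bandpass 50-2000 Hz by masking FFT bins (illustrative sketch).
fs = 16000; N = 320; nfft = 1024;
t = (0:N-1)' / fs;
frame = sin(2*pi*1000*t) + sin(2*pi*4000*t);  % assumed test frame: 1 kHz in band, 4 kHz out of band

f = (0:nfft-1)' * fs / nfft;     % frequency of each FFT bin in Hz
upper = f > fs/2;
f(upper) = fs - f(upper);        % fold the upper half onto |frequency|
mask = (f >= 50) & (f <= 2000);  % same passband for +f and -f

X = fft(frame, nfft);
y = ifft(X .* mask);             % symmetric mask, so y is real up to round-off
filtered = real(y(1:N));         % keep one frame length, as in the loop above

Y = abs(fft(y, nfft));           % spectrum after filtering
fprintf('1 kHz bin: %.1f, 4 kHz bin: %.2g\n', Y(65), Y(257));
```

Checking that the imaginary part of y is negligible is a quick way to confirm that the mask really is conjugate-symmetric.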
%%
figure;
x = nn_spch;
window = 128;
noverlap = 64;
nfft = 128;
fs = 16000;
spectrogram(x,window,noverlap,nfft,fs, 'yaxis');
colormap(gray);
title('Power Spectral Density, Original');
xlabel('Time');
ylabel('Frequency');
figure;
x = update_sp;
window = 128;
noverlap = 64;
nfft = 128;
fs = 16000;
spectrogram(x,window,noverlap,nfft,fs, 'yaxis');
colormap(gray);
title('Power Spectral Density, Filtered');
xlabel('Time');
ylabel('Frequency');
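Because every frame is multiplied by a Hamming window before filtering, overlap-adding with a 160-sample hop rescales the signal by the sum of the overlapping windows. The sketch below writes out the Hamming formula by hand and uses a hypothetical frame count of 10. It suggests that this sum stays close to 1.08 away from the edges, so the reconstruction should be roughly a scaled copy of the filtered signal rather than an amplitude-modulated one.

```matlab
% Overlap-add gain of 320-sample Hamming windows with a 160-sample hop (illustrative sketch).
N = 320; hop = 160; nFrames = 10;           % nFrames is an assumed value
n = (0:N-1)';
w = 0.54 - 0.46*cos(2*pi*n/(N-1));          % symmetric Hamming window, written out by hand

gain = zeros(hop*(nFrames-1) + N, 1);
for i = 1:nFrames
    idx = (i-1)*hop + (1:N)';               % samples covered by frame i
    gain(idx) = gain(idx) + w;              % overlap-add of the windows alone
end

interior = gain(hop+1:end-hop);             % samples covered by exactly two frames
fprintf('gain range: %.4f to %.4f\n', min(interior), max(interior));
```

The first and last 160 samples are covered by a single frame, so they follow the window taper instead of the constant gain.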