Repository: HidekiKawahara/legacy_STRAIGHT
Branch: master
Commit: 964684981fe1
Files: 70
Total size: 273.9 KB

Directory structure:
gitextract_lz9wvhjj/
├── .gitignore
├── LICENSE
├── README.md
├── doc/
│   └── README.md
├── morphing_src/
│   ├── angryHai.mat
│   ├── createMobject.m
│   ├── directSTRAIGHTmorphing.m
│   ├── displayMobject.m
│   ├── executeSTRAIGHTanalysisM.m
│   ├── executeSTRAIGHTanalysisMExt.m
│   ├── executeSTRAIGHTsynthesisM.m
│   ├── fixDummyObjectSize.m
│   ├── makeLogarithmicLevelDifferenceBasedOnPeaks.m
│   ├── neutralHai.mat
│   ├── setAnchorFromRawAnchor.m
│   ├── timeAlignedDirectSTRAIGHTmorphing.m
│   ├── timeFrequencySTRAIGHTmorphing.m
│   ├── timeFrequencySTRAIGHTmorphingExt.m
│   ├── updateFieldOfMobject.m
│   └── waveformMorphing.m
└── src/
    ├── CheckAnalysisData.m
    ├── HzToErbRate.m
    ├── MulticueF0v14.m
    ├── ReadBinaryData.m
    ├── SynthesizeLegacy_STRAIGHT_default.m
    ├── TestAnalysisRegression.m
    ├── TestAnalysisRegressionR.m
    ├── TestCopySynthRegression.m
    ├── TestCopySynthRegressionR.m
    ├── WriteBinaryData.m
    ├── aiffread.m
    ├── aiffwrite.m
    ├── aperiodiccomp.m
    ├── aperiodicpartERB2.m
    ├── boundmes2.m
    ├── correctdpv.m
    ├── defaultparamsorg.m
    ├── exSinStraightSynth.m
    ├── exSinStraightSynthBU.m
    ├── exSinStraightSynthBU2.m
    ├── exstraightAPind.m
    ├── exstraightsource.m
    ├── exstraightspec.m
    ├── exstraightsynth.m
    ├── f0track5.m
    ├── fixpF0VexMltpBG4.m
    ├── fractpitch2.m
    ├── gdmap.m
    ├── getvalufromedit.m
    ├── isOctave.m
    ├── mktstr.m
    ├── multanalytFineCSPB.m
    ├── optimumsmoothing.m
    ├── plotcpower.m
    ├── powerchk.m
    ├── refineF06.m
    ├── regressionTestBaseGenerator.m
    ├── regressionTestBaseGeneratorR.m
    ├── smax.m
    ├── specreshape.m
    ├── straight.m
    ├── straightBodyC03ma.m
    ├── straightCIv1.m
    ├── straightPanel98bak.m
    ├── straightSynthTB06.m
    ├── straightSynthTB07ca.m
    ├── straightpanel98.mat
    ├── straightsound.m
    ├── syncgui.m
    └── testBestMix.m

================================================
FILE CONTENTS
================================================

================================================
FILE: 
.gitignore ================================================ .DS_Store .css ================================================ FILE: LICENSE ================================================ Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright 2018 Hideki Kawahara All Rights Reserved

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

================================================
FILE: README.md
================================================
# Legacy STRAIGHT

The legacy-STRAIGHT is a collection of speech analysis, modification and resynthesis tools.

## Installation

Add the "src" directory to the MATLAB path.

## Quick start

Paste the following code into the MATLAB command window. It stores the copy-synthesized output in the variable "syntheszed_signal".

    [x, fs] = audioread('vaiueo2d.wav');
    f0raw = MulticueF0v14(x,fs);
    ap = exstraightAPind(x,fs,f0raw);
    n3sgram=exstraightspec(x,f0raw,fs);
    syntheszed_signal = exstraightsynth(f0raw,n3sgram,ap,fs);

To run this in GNU Octave, load the signal package first.

```
pkg load signal
```

## Release note

* [July 19, 2018: Prerelease] The "Quick start" example also runs properly on GNU Octave 4.4.0 on macOS High Sierra (10.13.6).
* [July 17, 2018: Prerelease] Added documents. The first release will be on July 24, 2018.
* [July 16, 2018: Prerelease] This release is a copy of the latest version that the first author (Hideki Kawahara) distributed to academic communities. The version is named STRAIGHTV40_007. The last update was July 17, 2016. Kansai TLO has also licensed the legacy-STRAIGHT for commercial use. The licensees of the legacy-STRAIGHT agreed to make the legacy-STRAIGHT open to the public after July 15, 2018.

## Acknowledgment

The legacy-STRAIGHT was supported by many coauthors, contributors, and funding agencies.

***
Hideki Kawahara, July 16, 2018 (start date)

================================================
FILE: doc/README.md
================================================
# Documents for legacy-STRAIGHT

This directory contains documents prepared for the legacy-STRAIGHT. Please note that the legacy-STRAIGHT project has ended. The latest extended morphing framework uses the TANDEM-STRAIGHT. Dr. Masanori Morise, who invented the core component of the TANDEM-STRAIGHT, also distributes an open-source VOCODER framework called WORLD.
[Link to mmorise/WORLD](https://github.com/mmorise/World)

## HTML and PDF documents (in this directory)

* Getting started with command mode STRAIGHT (May 5, 2007)
  * file: gettingStartedV40_006b.zip (archived HTML document)
  * file: gettingStartedV40_006b.pdf (PDF version of the HTML document)
* Auditory morphing using STRAIGHT (This framework is outdated.) (November 7, 2005)
  * file: morphingWithSTRAIGHT.tar.gz (archived HTML document)
  * file: morphingWithSTRAIGHTe.pdf (PDF version of the HTML document)
* STRAIGHT technical report (In Japanese)
  * file: straightTechRep.pdf (PDF document)

## Publications

* Hideki Kawahara, Ikuyo Masuda-Katsuse and Alain de Cheveigné: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication, 27, pp.187-207 (1999) [Link](https://doi.org/10.1016/S0167-6393(98)00085-5)
  * This is the first journal paper on STRAIGHT. Its spectral envelope estimation is still relevant. Its descriptions of the source information are outdated.
* Hideki Kawahara: STRAIGHT, Exploration of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds, Acoustical Science and Technology, Vol.27, No.6, (2006) [Link to pdf](http://www.jstage.jst.go.jp/article/ast/27/6/349/_pdf)
  * This is a featured paper introducing the underlying concept of STRAIGHT.
* Hideki Kawahara, Alain de Cheveigné, Hideki Banno, Toru Takahashi and Toshio Irino: Nearly Defect-free F0 Trajectory Extraction for Expressive Speech Modifications based on STRAIGHT, Proc. Interspeech2005, Lisboa, pp.537-540, Sept. 2005. [Link to pdf](https://www.isca-speech.org/archive/archive_papers/interspeech_2005/i05_0537.pdf)
  * This conference paper introduces the latest F0 extractor for the legacy-STRAIGHT, called NDF. The performance of NDF is still competitive in practical situations.
## Link

* [Hideki Kawahara](http://www.wakayama-u.ac.jp/~kawahara/index_e.html)

***
Last update: Thu Oct 18 14:12:53 JST 2018

================================================
FILE: morphing_src/createMobject.m
================================================
function mObject=createMobject
%   Create Mobject for morphing
%   mObject=createMobject

%   Designed and coded by Hideki Kawahara
%   25/February/2005
%   14/October/2005 Added creator information

mObject.date = datestr(now);
mObject.pwd = pwd;
mObject.waveform = [];
mObject.samplingFrequency = 44100; % default frequency
mObject.F0 = [];
mObject.vuv = [];
mObject.spectrogram = [];
mObject.aperiodicityIndex = [];
mObject.frameUpdateInterval = 1; % default frame is 1ms
mObject.anchorTimeLocation = [];
mObject.maximumFrequencyPoints = 9; % default max frequency anchor points
mObject.anchorFrequency = [];
mObject.F0extractionConditions = [];
mObject.SpectrumExtractionConditions = [];
mObject.creatorInformation = which('createMobject');

================================================
FILE: morphing_src/directSTRAIGHTmorphing.m
================================================
function mObject3 = directSTRAIGHTmorphing(mObject1,mObject2,mRate,mixMethod);
%   Morphing based on direct mixing of STRAIGHT parameters
%   (Without time alignment)
%   mObject3 = directSTRAIGHTmorphing(mObject1,mObject2,mRate,mixMethod);

%   Designed and coded by Hideki Kawahara
%   27/Feb./2005
%   Copyright(c) 2005, Hideki Kawahara

% Objects with different sampling frequencies or frame rates cannot be mixed.
if mObject1.samplingFrequency ~= mObject2.samplingFrequency
    mObject3 = [];
    return
end;
if mObject1.frameUpdateInterval ~= mObject2.frameUpdateInterval
    mObject3 = [];
    return
end;
nw1 = length(mObject1.F0);
nw2 = length(mObject2.F0);
[nr1,nc1] = size(mObject1.spectrogram);
[nr2,nc2] = size(mObject2.spectrogram);
nr3 = max(nr1,nr2);
nc3 = max(max(nc1,nc2),max(nw1,nw2));
nsg = zeros(nr3,nc3);
nSgram = zeros(nr3,nc3);
ap = zeros(nr3,nc3);
f0 = zeros(1,nc3);     % one value per frame; zeros(nc3) would allocate an nc3-by-nc3 matrix
nVoice = zeros(1,nc3); % accumulated mixing weight of voiced frames
switch mixMethod
    case 'linear'
        nsg(1:nr1,1:nc1) =
(1-mRate)*mObject1.spectrogram; nSgram(1:nr1,1:nc1) = nSgram(1:nr1,1:nc1)+(1-mRate); nsg(1:nr2,1:nc2) = mRate*mObject2.spectrogram+nsg(1:nr2,1:nc2); nSgram(1:nr2,1:nc2) = nSgram(1:nr2,1:nc2)+mRate; nsg = nsg./nSgram; case 'log' nsg(1:nr1,1:nc1) = (1-mRate)*log(mObject1.spectrogram); nSgram(1:nr1,1:nc1) = nSgram(1:nr1,1:nc1)+(1-mRate); nsg(1:nr2,1:nc2) = mRate*log(mObject2.spectrogram)+nsg(1:nr2,1:nc2); nSgram(1:nr2,1:nc2) = nSgram(1:nr2,1:nc2)+mRate; nsg = exp(nsg./nSgram); end; ap(1:nr1,1:nc1) = (1-mRate)*mObject1.aperiodicityIndex; ap(1:nr2,1:nc2) = mRate*mObject2.aperiodicityIndex+ap(1:nr2,1:nc2); f0(mObject1.F0>0) = (1-mRate)*log(mObject1.F0(mObject1.F0>0)); nVoice(mObject1.F0>0) = nVoice(mObject1.F0>0)+(1-mRate); f0(mObject2.F0>0) = mRate*log(mObject2.F0(mObject2.F0>0))+f0(mObject2.F0>0); nVoice(mObject2.F0>0) = nVoice(mObject2.F0>0)+mRate; f0(nVoice>0) = exp(f0(nVoice>0)./nVoice(nVoice>0)); mObject3 = createMobject; mObject3 = updateFieldOfMobject(mObject3,'spectrogram',nsg); mObject3 = updateFieldOfMobject(mObject3,'aperiodicityIndex',ap); mObject3 = updateFieldOfMobject(mObject3,'F0',f0); ================================================ FILE: morphing_src/displayMobject.m ================================================ function displayMobject(mObject,fieldname,note) % M-object information display % displayMobject(mObject,fieldname,note); % Designed and coded by Hideki Kawahara % 27/Feb./2005 % Copyright(c) 2005, Hideki Kawahara % 05/Oct./2005 minor bug fix fs = mObject.samplingFrequency; tFrame = mObject.frameUpdateInterval; switch fieldname case 'spectrogram' figure [nrow,ncolumn]=size(mObject.spectrogram); timeSpan = [0 (ncolumn-1)*tFrame]; dBsgram = 20*log10(mObject.spectrogram); maxSgramdB = max(max(dBsgram)); imagesc(timeSpan, [0 fs/2],max(dBsgram,maxSgramdB-70)); axis('xy'); set(gca,'fontsize',14); xlabel('time (ms)'); ylabel('frequency (Hz)'); title([note ' time span 0 ' num2str(timeSpan(2),10) ' (ms) ' datestr(now)]); case 'waveform' figure x = 
mObject.waveform; timeSpan = (0:length(x)-1)/fs*1000; plot(timeSpan,x);grid on; axis([timeSpan(1) timeSpan(end) 1.1*[min(x) max(x)]]); set(gca,'fontsize',14); xlabel('time (ms)'); title([note ' time span 0 ' num2str(round(timeSpan(end)),8) ' (ms) ' datestr(now)]); case {'anchorFrequency', 'anchorTimeLocation'} figure [nrow,ncolumn]=size(mObject.spectrogram); timeSpan = [0 (ncolumn-1)*tFrame]; dBsgram = 20*log10(mObject.spectrogram); maxSgramdB = max(max(dBsgram)); imagesc(timeSpan, [0 fs/2],max(dBsgram,maxSgramdB-70)); axis('xy'); set(gca,'fontsize',14); xlabel('time (ms)'); ylabel('frequency (Hz)'); title([note ' time span 0 ' num2str(timeSpan(2),10) ' (ms) ' datestr(now)]); if length(mObject.anchorTimeLocation)>0 hold on; for ii=1:length(mObject.anchorTimeLocation) hh = plot(mObject.anchorTimeLocation(ii)*[1 1],[0 fs/2],'w:'); set(hh,'linewidth',2); if sum(mObject.anchorFrequency(ii,:)>0)>0 nFrequency = sum(mObject.anchorFrequency(ii,:)>0); anchorFrequencyVector = mObject.anchorFrequency(ii,mObject.anchorFrequency(ii,:)>0); % 05/Oct./2005 HK for jj=1:nFrequency hh=plot(mObject.anchorTimeLocation(ii),anchorFrequencyVector(jj),'ok'); set(hh,'markersize',9); set(hh,'linewidth',2); hh=plot(mObject.anchorTimeLocation(ii),anchorFrequencyVector(jj),'.w'); set(hh,'markersize',7); set(hh,'linewidth',4); end; end; end; hold off; end; end; ================================================ FILE: morphing_src/executeSTRAIGHTanalysisM.m ================================================ function mObject = executeSTRAIGHTanalysisM(mObject,optionalParameters); % STRAIGHT analysis for mObject % mObject = executeSTRAIGHTanalysisM(mObject,optionalParameters); % % Designed and coded by Hideki Kawahara % 26/Feb./2005 % Copyright(c) 2005, Hideki Kawahara % 20/March/2006 bug fix by T. 
Takahashi and Kawahara x = mObject.waveform; fs = mObject.samplingFrequency; if nargin>1 [f0raw,ap,prmF0] = exstraightsource(x,fs,optionalParameters); [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs,optionalParameters); else [f0raw,ap,prmF0] = exstraightsource(x,fs); [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs); end; if exist('vuv') % reserved for extension mObject.vuv = vuv; else mObject.vuv = (f0raw ~= 0); end; temporalIndexLength=min([length(f0raw),size(n3sgram,2),size(ap,2),length(mObject.vuv)]); mObject.F0 = f0raw(1:temporalIndexLength); mObject.spectrogram = n3sgram(:,1:temporalIndexLength); mObject.aperiodicityIndex = ap(:,1:temporalIndexLength); mObject.vuv = mObject.vuv(1:temporalIndexLength); mObject.frameUpdateInterval = prmF0.F0frameUpdateInterval; mObject.F0extractionConditions = prmF0; mObject.SpectrumExtractionConditions = analysisParamsSp; ================================================ FILE: morphing_src/executeSTRAIGHTanalysisMExt.m ================================================ function mObject = executeSTRAIGHTanalysisMExt(mObject,optionalParameters); % STRAIGHT analysis for mObject % mObject = executeSTRAIGHTanalysisMExt(mObject,optionalParameters); % % Designed and coded by Hideki Kawahara % 26/Feb./2005 % Copyright(c) 2005, Hideki Kawahara % 20/March/2006 bug fix by T. 
Takahashi and Kawahara % 16/Aug./2008 extended for use MulticueF0 as default x = mObject.waveform; fs = mObject.samplingFrequency; if nargin>1 %[f0raw,ap,prmF0] = exstraightsource(x,fs,optionalParameters); [f0raw,vuv,auxouts,prmF0]=MulticueF0v14(x,fs,optionalParameters); [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs,optionalParameters); [ap,analysisParams]=exstraightAPind(x,fs,f0raw,optionalParameters); else %[f0raw,ap,prmF0] = exstraightsource(x,fs); [f0raw,vuv,auxouts,prmF0]=MulticueF0v14(x,fs); [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs); [ap,analysisParams]=exstraightAPind(x,fs,f0raw); end; if exist('vuv') % reserved for extension mObject.vuv = vuv; else mObject.vuv = (f0raw ~= 0); end; temporalIndexLength=min([length(f0raw),size(n3sgram,2),size(ap,2),length(mObject.vuv)]); mObject.F0 = f0raw(1:temporalIndexLength); mObject.spectrogram = n3sgram(:,1:temporalIndexLength); mObject.aperiodicityIndex = ap(:,1:temporalIndexLength); mObject.vuv = mObject.vuv(1:temporalIndexLength); mObject.frameUpdateInterval = prmF0.F0frameUpdateInterval; mObject.F0extractionConditions = prmF0; mObject.SpectrumExtractionConditions = analysisParamsSp; mObject.AperiodicityAnalysisParams = analysisParams; ================================================ FILE: morphing_src/executeSTRAIGHTsynthesisM.m ================================================ function [sy,prmS] = executeSTRAIGHTsynthesisM(mObject,optionalParameters) % STRAIGHT synthesis from mObject % sy = executeSTRAIGHTsynthesisM(mObject,optionalParameters); % % Designed and coded by Hideki Kawahara % 27/Feb./2005 % Copyright(c) 2005, Hideki Kawahara % 14/March/2005 bug fix on optional paramters % 10/June/2006 extension for the new F0 extractor fs = mObject.samplingFrequency; f0raw = mObject.F0; if isfield(mObject,'vuv') if length(mObject.vuv) == length(mObject.F0) f0raw = f0raw.*mObject.vuv; end; end; n3sgram = mObject.spectrogram; ap = mObject.aperiodicityIndex; if nargin>1 [sy,prmS] = 
        exstraightsynth(f0raw,n3sgram,ap,fs,optionalParameters);
else
    [sy,prmS] = exstraightsynth(f0raw,n3sgram,ap,fs);
end;

================================================
FILE: morphing_src/fixDummyObjectSize.m
================================================
function dummyObject = fixDummyObjectSize(dummyObject,originalObject);

frameUpdateInterval = dummyObject.frameUpdateInterval;
endMargin = size(originalObject.spectrogram,2)*frameUpdateInterval-max(originalObject.anchorTimeLocation);
if size(dummyObject.spectrogram,2)*frameUpdateInterval < max(dummyObject.anchorTimeLocation)+endMargin
    dummyFrameSize = max(dummyObject.anchorTimeLocation)+endMargin;
    dimmyFrequencySize = size(originalObject.spectrogram,1);
    dummyObject.spectrogram = ones(dimmyFrequencySize,dummyFrameSize);
    dummyObject.aperiodicityIndex = ones(dimmyFrequencySize,dummyFrameSize);
    dummyObject.F0 = ones(1,dummyFrameSize);
    dummyObject.vuv = ones(1,dummyFrameSize);
end;

================================================
FILE: morphing_src/makeLogarithmicLevelDifferenceBasedOnPeaks.m
================================================
function mObject = makeLogarithmicLevelDifferenceBasedOnPeaks(mObject,levelDifferenceTable)

================================================
FILE: morphing_src/setAnchorFromRawAnchor.m
================================================
function mObject = setAnchorFromRawAnchor(mObject,rawAnchor);
%   Set anchor points using raw anchor information
%   mObject = setAnchorFromRawAnchor(mObject,rawAnchor);

%   Designed and coded by Hideki Kawahara
%   27/Feb./2005
%   Copyright(c) 2005, Hideki Kawahara

TIMINGmARGIN = 10; % threshold for merging locations
[dm1,indsrt] = sort(rawAnchor(:,1));
sortedAnchor = rawAnchor(indsrt,1);
sortedFrequency = rawAnchor(indsrt,2);
indexNumber = 1:length(sortedAnchor);
%anchorCandidate = sortedAnchor(diff([-100;sortedAnchor])>TIMINGmARGIN);
anchorIndex = indexNumber(diff([-100;sortedAnchor])>TIMINGmARGIN);
anchorCandidate = sortedAnchor(anchorIndex);
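% Illustration with hypothetical values (assuming anchor times in ms):
% a raw anchor whose time lies within TIMINGmARGIN of the preceding sorted
% point is grouped with it instead of opening a new anchor.
%   sortedAnchor               = [100; 104; 250]
%   diff([-100; sortedAnchor]) = [200;   4; 146]
%   ... > TIMINGmARGIN (10)    = [  1;   0;   1]
%   anchorIndex                = [1 3]
%   anchorCandidate            = [100; 250]
% The points at 100 and 104 therefore belong to the first anchor; the loop
% that follows collects their frequencies and averages their times.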
mObject.anchorTimeLocation = anchorCandidate; nFrequency = mObject.maximumFrequencyPoints; nAnchor = length(anchorCandidate); sortedAnchor(end+1) = sortedAnchor(end)+1; anchorIndex(end+1) = anchorIndex(end)+1; % Terminator frequencyAnchor = zeros(nAnchor,nFrequency); for ii=1:nAnchor iFrequency = 0; anchorLocation = 0; for jj=1:min(nFrequency,anchorIndex(ii+1)-anchorIndex(ii)+1) if sortedAnchor((jj-1)+anchorIndex(ii)) < sortedAnchor(anchorIndex(ii+1)) frequencyAnchor(ii,jj) = sortedFrequency((jj-1)+anchorIndex(ii)); anchorLocation = anchorLocation+sortedAnchor((jj-1)+anchorIndex(ii)); iFrequency = iFrequency+1; end; end; if iFrequency>1 [dmy1,indsrt] = sort(frequencyAnchor(ii,1:iFrequency)); frequencyAnchor(ii,1:iFrequency) = frequencyAnchor(ii,indsrt); mObject.anchorTimeLocation(ii) = anchorLocation/iFrequency; end; end; mObject.anchorFrequency = frequencyAnchor; ================================================ FILE: morphing_src/timeAlignedDirectSTRAIGHTmorphing.m ================================================ function mObject3 = timeAlignedDirectSTRAIGHTmorphing(mObject1,mObject2,mRate,mixMethod); % Morphing based on time-aligned mixing of STRAIGHT parameters % mObject3 = timeAlignedDirectSTRAIGHTmorphing(mObject1,mObject2,mRate,mixMethod); % Designed and coded by Hideki Kawahara % 28/Feb./2005 % Copyright(c) 2005, Hideki Kawahara mObject3 = checkForSimilarity(mObject1,mObject2); if length(mObject3) ==0;return;end; dtFrame = mObject1.frameUpdateInterval; endLocation1 = (length(mObject1.F0)-1)*dtFrame; % in ms endLocation2 = (length(mObject2.F0)-1)*dtFrame; % in ms timeAnchor1 = [0;mObject1.anchorTimeLocation;endLocation1]; timeAnchor2 = [0;mObject2.anchorTimeLocation;endLocation2]; locationOn1 = (0:length(mObject1.F0)-1)*dtFrame; locationOn2 = (0:length(mObject2.F0)-1)*dtFrame; mapFrom1to2 = interp1(timeAnchor1,timeAnchor2,locationOn1); [nr1,nc1] = size(mObject1.spectrogram); [nr2,nc2] = size(mObject2.spectrogram); %---- mixing on mObject1's time axis nAxis1 = 
length(locationOn1); nAxis2 = length(locationOn2); morphedF0 = zeros(nAxis1,1); morphedAp = zeros(nr1,nAxis1); morphedSgram = zeros(nr1,nAxis1); weightSumF0 = zeros(nAxis1,1); for ii=1:nAxis1 mappedIndexOn2 = mapFrom1to2(ii)/dtFrame+1; iFloor = floor(mappedIndexOn2); iFraction = mappedIndexOn2-iFloor; dAp = iFraction*(mObject2.aperiodicityIndex(:,min(iFloor+1,nAxis2))-mObject2.aperiodicityIndex(:,iFloor)); morphedAp(:,ii) = (1-mRate)*mObject1.aperiodicityIndex(:,ii)+mRate*(mObject2.aperiodicityIndex(:,iFloor)+dAp); switch mixMethod case 'linear' dSgram = iFraction*(mObject2.spectrogram(:,min(iFloor+1,nAxis2))-mObject2.spectrogram(:,iFloor)); morphedSgram(:,ii) = (1-mRate)*mObject1.spectrogram(:,ii)+mRate*(mObject2.spectrogram(:,iFloor)+dSgram); case 'log' dSgram = iFraction*(log(mObject2.spectrogram(:,min(iFloor+1,nAxis2)))-log(mObject2.spectrogram(:,iFloor))); tmp = (1-mRate)*log(mObject1.spectrogram(:,ii))+mRate*(log(mObject2.spectrogram(:,iFloor))+dSgram); morphedSgram(:,ii) = exp(tmp); end; if mObject1.F0(ii)>0 morphedF0(ii) = (1-mRate)*log(mObject1.F0(ii)); weightSumF0(ii) = (1-mRate); end; if (mObject2.F0(iFloor)>0) & (mObject2.F0(min(iFloor+1,nAxis2))>0) dF0 = iFraction*(log(mObject2.F0(min(iFloor+1,nAxis2)))-log(mObject2.F0(iFloor))); morphedF0(ii) = mRate*(log(mObject2.F0(iFloor))+dF0)+morphedF0(ii); weightSumF0(ii) = weightSumF0(ii)+mRate; end; end; morphedF0(weightSumF0>0) = exp(morphedF0(weightSumF0>0)./weightSumF0(weightSumF0>0)); %----- mapping back onto morphed time axis timeAnchorMorph = (1-mRate)*timeAnchor1 + mRate*timeAnchor2; locationOnMorph = (0:(timeAnchorMorph(end)/dtFrame))*dtFrame; mapFormMorphTo1 = interp1(timeAnchorMorph,timeAnchor1,locationOnMorph); nAxisMorph = length(locationOnMorph); morphedApOnMorph = zeros(nr1,nAxisMorph); morphedSgramOnMorph = zeros(nr1,nAxisMorph); morphedF0onMorph = zeros(nAxisMorph,1); for ii=1:nAxisMorph mappedIndexOn1 = mapFormMorphTo1(ii)/dtFrame+1; iFloor = floor(mappedIndexOn1); iFraction = 
mappedIndexOn1-iFloor; morphedApOnMorph(:,ii) = morphedAp(:,iFloor) ... +iFraction*(morphedAp(:,min(iFloor+1,nAxis1))-morphedAp(:,iFloor)); morphedSgramOnMorph(:,ii) = morphedSgram(:,iFloor) ... +iFraction*(morphedSgram(:,min(iFloor+1,nAxis1))-morphedSgram(:,iFloor)); if (morphedF0(iFloor)>0) & (morphedF0(min(iFloor+1,nAxis1))>0) dF0 = iFraction*(morphedF0(min(iFloor+1,nAxis1))-morphedF0(iFloor)); morphedF0onMorph(ii) = morphedF0(iFloor)+dF0; end; end; mObject3.F0 = morphedF0onMorph; mObject3.aperiodicityIndex = morphedApOnMorph; mObject3.spectrogram = morphedSgramOnMorph; mObject3.anchorTimeLocation = timeAnchorMorph(2:end-1); mObject3.anchorFrequency = (1-mRate)*mObject1.anchorFrequency+mRate*mObject2.anchorFrequency; %mObject3 = morphedAp; % This line is a dummy. %%% ------ Internal function to check for object's similarity function mObject3 = checkForSimilarity(mObject1,mObject2) mObject3 = []; if mObject1.samplingFrequency ~= mObject2.samplingFrequency;mObject3 = [];return;end; if mObject1.frameUpdateInterval ~= mObject2.frameUpdateInterval;mObject3 = [];return;end; if length(mObject1.anchorTimeLocation) ~= length(mObject2.anchorTimeLocation);mObject3 = [];return;end; nAnchor = length(mObject1.anchorTimeLocation); for ii=1:nAnchor % check for similarity of anchor structure frequencyAnchor1 = mObject1.anchorFrequency; frequencyAnchor2 = mObject2.anchorFrequency; if (sum(frequencyAnchor1>0) ~= sum(frequencyAnchor2>0)) | ... 
        (sum(frequencyAnchor1<0) ~= sum(frequencyAnchor2<0))
        return;
    end;
end;
mObject3 = createMobject;

================================================
FILE: morphing_src/timeFrequencySTRAIGHTmorphing.m
================================================
function mObject3 = timeFrequencySTRAIGHTmorphing(mObject1,mObject2,mRate,mixMethod);
% Morphing based on STRAIGHT parameters
% mObject3 = timeFrequencySTRAIGHTmorphing(mObject1,mObject2,mRate,mixMethod);
% Designed and coded by Hideki Kawahara
% 28/Feb./2005
% Copyright(c) 2005, Hideki Kawahara
% 14/March/2005 bug fix on sampling frequency
% 01/Oct./2005 bug fix on similarity check
% 04/Oct./2005 partial morphing extension
% 18/Oct./2005 direct differential manipulation and API change
% 29/Jan./2006 bug fix on boundary conditions
switch nargin
    case 0
        mObject3.morphingObject = createMobject;
        mixRate.F0 = 0;
        mixRate.spectrum = 0;
        mixRate.aperiodicity = 0;
        mixRate.coordinate = 0;
        mObject3.mixRate = mixRate;
        mObject3.mixMethods = {'linear','log','differentialLogarithm'};
        return
end;
mObject3 = checkForSimilarity(mObject1,mObject2);
mixRate = checkForMorphingConditions(mRate);
mObject1 = checkForIntegrity(mObject1);
mObject2 = checkForIntegrity(mObject2);
fs = mObject1.samplingFrequency;
if length(mObject3) ==0;return;end;
dtFrame = mObject1.frameUpdateInterval;
endLocation1 = (length(mObject1.F0)-1)*dtFrame; % in ms
endLocation2 = (length(mObject2.F0)-1)*dtFrame; % in ms
timeAnchor1 = [0;mObject1.anchorTimeLocation;endLocation1];
timeAnchor2 = [0;mObject2.anchorTimeLocation;endLocation2];
locationOn1 = (0:length(mObject1.F0)-1)*dtFrame;
locationOn2 = (0:length(mObject2.F0)-1)*dtFrame;
mapFrom1to2 = interp1(timeAnchor1,timeAnchor2,locationOn1);
[nr1,nc1] = size(mObject1.spectrogram);
[nr2,nc2] = size(mObject2.spectrogram);

%---- initialize frequency mapping function
fmapFrom1to2OnTime1 = generateFrequencyMap(mObject1,mObject2);

%---- mixing on mObject1's time axis
nAxis1 = length(locationOn1);
nAxis2 = length(locationOn2);
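% Illustrative sketch, not part of the original algorithm. The mixing loop
% that follows converts each frame time on mObject1's axis into a fractional
% frame index on mObject2's axis and then interpolates linearly between the
% two neighbouring frames, as is done for the aperiodicity index. The toy
% values below are assumptions chosen so the arithmetic is easy to follow.
% Every name starting with "demo" is hypothetical, and these lines can be
% removed without changing the morphing result.
demoDt = 1;                    % frame update interval (ms), assumed
demoAnchor1 = [0;10;20];       % time anchors on object 1 (ms), end points included
demoAnchor2 = [0;15;20];       % matching time anchors on object 2 (ms)
demoMap = interp1(demoAnchor1,demoAnchor2,(0:20)*demoDt); % piecewise-linear time warp
demoIndex = demoMap(6)/demoDt+1;     % frame 6 (t = 5 ms) maps to 7.5 ms, index 8.5
demoFloor = floor(demoIndex);        % left neighbour frame on object 2
demoFraction = demoIndex-demoFloor;  % relative position between the neighbours
demoTrack = 100+(0:20)';             % toy parameter track on object 2
demoValue = demoTrack(demoFloor) ...
    +demoFraction*(demoTrack(min(demoFloor+1,21))-demoTrack(demoFloor));
% min(...,21) mirrors the clamping against nAxis2 used in the loop below.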
morphedF0 = zeros(nAxis1,1); morphedAp = zeros(nr1,nAxis1); morphedSgram = zeros(nr1,nAxis1); weightSumF0 = zeros(nAxis1,1); for ii=1:nAxis1 mappedIndexOn2 = mapFrom1to2(ii)/dtFrame+1; iFloor = floor(mappedIndexOn2); iFraction = mappedIndexOn2-iFloor; fIndex = floor(fmapFrom1to2OnTime1(:,ii)/fs*2*(nr1-1))+1; dAp = iFraction*(mObject2.aperiodicityIndex(:,min(iFloor+1,nAxis2))-mObject2.aperiodicityIndex(:,min(iFloor,nAxis2))); ap2on2faxis = mObject2.aperiodicityIndex(:,min(iFloor,nAxis2))+dAp; ap2on1faxis = ap2on2faxis(fIndex); morphedAp(:,ii) = (1-mixRate.aperiodicity)*mObject1.aperiodicityIndex(:,ii)+mixRate.aperiodicity*ap2on1faxis; %04/Oct/2005 HK switch mixMethod case 'linear' dSgram = iFraction*(mObject2.spectrogram(:,min(iFloor+1,nAxis2))-mObject2.spectrogram(:,min(iFloor,nAxis2))); sgram2on2faxis = mObject2.spectrogram(:,min(iFloor,nAxis2))+dSgram; sgram2on1faxis = sgram2on2faxis(fIndex); morphedSgram(:,ii) = (1-mixRate.spectrum)*mObject1.spectrogram(:,ii)+mixRate.spectrum*sgram2on1faxis; case 'log' dSgram = iFraction*(log(mObject2.spectrogram(:,min(iFloor+1,nAxis2)))-log(mObject2.spectrogram(:,min(iFloor,nAxis2)))); sgram2on2faxis = log(mObject2.spectrogram(:,min(iFloor,nAxis2)))+dSgram; sgram2on1faxis = sgram2on2faxis(fIndex); tmp = (1-mixRate.spectrum)*log(mObject1.spectrogram(:,ii))+mixRate.spectrum*sgram2on1faxis; morphedSgram(:,ii) = exp(tmp); case 'differentialLogarithm' dSgram = iFraction*(mObject2.spectrogram(:,min(iFloor+1,nAxis2))-mObject2.spectrogram(:,min(iFloor,nAxis2))); sgram2on2faxis = mObject2.spectrogram(:,min(iFloor,nAxis2))+dSgram; sgram2on1faxis = sgram2on2faxis(fIndex); tmp = (1-mixRate.spectrum)*log(mObject1.spectrogram(:,ii))+mixRate.spectrum*sgram2on1faxis; morphedSgram(:,ii) = exp(tmp); end; if mObject1.F0(ii)>0 morphedF0(ii) = (1-mixRate.F0)*log(mObject1.F0(ii)); weightSumF0(ii) = (1-mixRate.F0); end; if (mObject2.F0(iFloor)>0) & (mObject2.F0(min(iFloor+1,nAxis2))>0) dF0 = 
iFraction*(log(mObject2.F0(min(iFloor+1,nAxis2)))-log(mObject2.F0(min(iFloor,nAxis2)))); morphedF0(ii) = mixRate.F0*(log(mObject2.F0(min(iFloor,nAxis2)))+dF0)+morphedF0(ii); weightSumF0(ii) = weightSumF0(ii)+mixRate.F0; end; end; morphedF0(weightSumF0>0) = exp(morphedF0(weightSumF0>0)./weightSumF0(weightSumF0>0)); %----- mapping back onto morphed time axis timeAnchorMorph = (1-mixRate.coordinate)*timeAnchor1 + mixRate.coordinate*timeAnchor2; locationOnMorph = (0:(timeAnchorMorph(end)/dtFrame))*dtFrame; mapFormMorphTo1 = interp1(timeAnchorMorph,timeAnchor1,locationOnMorph); nAxisMorph = length(locationOnMorph); morphedApOnMorph = zeros(nr1,nAxisMorph); morphedSgramOnMorph = zeros(nr1,nAxisMorph); morphedF0onMorph = zeros(nAxisMorph,1); %----- set place holders mObject3.samplingFrequency = fs; mObject3.F0 = morphedF0onMorph; mObject3.aperiodicityIndex = morphedApOnMorph; mObject3.spectrogram = morphedSgramOnMorph; mObject3.anchorTimeLocation = timeAnchorMorph(2:end-1); mObject3.anchorFrequency = (1-mixRate.coordinate)*mObject1.anchorFrequency+mixRate.coordinate*mObject2.anchorFrequency; %------ nitialize frequency mapping function fmapFromMorphto1OnTimeMorph = generateFrequencyMap(mObject3,mObject1); for ii=1:nAxisMorph mappedIndexOn1 = mapFormMorphTo1(ii)/dtFrame+1; iFloor = floor(mappedIndexOn1); iFraction = mappedIndexOn1-iFloor; fIndex = floor(fmapFromMorphto1OnTimeMorph(:,ii)/fs*2*(nr1-1))+1; morphedApOnMorph(:,ii) = morphedAp(fIndex,iFloor) ... +iFraction*(morphedAp(fIndex,min(iFloor+1,nAxis1))-morphedAp(fIndex,iFloor)); morphedSgramOnMorph(:,ii) = morphedSgram(fIndex,iFloor) ... 
+iFraction*(morphedSgram(fIndex,min(iFloor+1,nAxis1))-morphedSgram(fIndex,iFloor)); if (morphedF0(iFloor)>0) & (morphedF0(min(iFloor+1,nAxis1))>0) dF0 = iFraction*(morphedF0(min(iFloor+1,nAxis1))-morphedF0(iFloor)); morphedF0onMorph(ii) = morphedF0(iFloor)+dF0; end; end; mObject3.F0 = morphedF0onMorph; mObject3.aperiodicityIndex = morphedApOnMorph; mObject3.spectrogram = morphedSgramOnMorph; mObject3.anchorTimeLocation = timeAnchorMorph(2:end-1); %mObject3.anchorFrequency = (1-mRate)*mObject1.anchorFrequency+mRate*mObject2.anchorFrequency; %mObject3 = fmapFromMorphto1OnTimeMorph; % This line is a dummy. return; %%% ------ Internal function to check for object's similarity function mObject3 = checkForSimilarity(mObject1,mObject2) mObject3 = []; if mObject1.samplingFrequency ~= mObject2.samplingFrequency;mObject3 = [];return;end; if mObject1.frameUpdateInterval ~= mObject2.frameUpdateInterval;mObject3 = [];return;end; if length(mObject1.anchorTimeLocation) ~= length(mObject2.anchorTimeLocation);mObject3 = [];return;end; nAnchor = length(mObject1.anchorTimeLocation); for ii=1:nAnchor % check for similarity of anchor structure frequencyAnchor1 = mObject1.anchorFrequency(ii,:)';% 01/Oct./2005 by HK frequencyAnchor2 = mObject2.anchorFrequency(ii,:)';% 01/Oct./2005 by HK if (sum(frequencyAnchor1>0) ~= sum(frequencyAnchor2>0)) | ... (sum(frequencyAnchor1<0) ~= sum(frequencyAnchor2<0)) display('Warning!! 
Object structures are inconsistent!'); % 01/Oct./2005 by HK
        return;
    end;
end;
mObject3 = createMobject;
mObject3.samplingFrequency = mObject1.samplingFrequency;
mObject3.frameUpdateInterval = mObject1.frameUpdateInterval;
return;

%%%--------
function mixRate = checkForMorphingConditions(mRate);
% 04/Oct./2005 added by HK
if ~isstruct(mRate)
    mixRate.F0 = mRate;
    mixRate.spectrum = mRate;
    mixRate.aperiodicity = mRate;
    mixRate.coordinate = mRate;
    return;
end;
mixRate.F0 = mRate.F0;
mixRate.spectrum = mRate.spectrum;
mixRate.aperiodicity = mRate.aperiodicity;
mixRate.coordinate = mRate.coordinate;
return;

%%%--------
function fmapFrom1to2OnTime1 = generateFrequencyMap(mObject1,mObject2);
dtFrame = mObject1.frameUpdateInterval;
endLocation1 = (length(mObject1.F0)-1)*dtFrame; % in ms
timeAnchor1 = [0;mObject1.anchorTimeLocation;endLocation1];
locationOn1 = (0:length(mObject1.F0)-1)*dtFrame;
fs = mObject1.samplingFrequency;
[nr1,nc1] = size(mObject1.spectrogram);
nAnchor = length(mObject1.anchorTimeLocation);
fmapFrom1to2 = zeros(nr1,nAnchor);
frequencyAxis = (0:nr1-1)'/(nr1-1)*fs/2;
numberOfFrequencyAnchors = zeros(nAnchor,1);
for ii=1:nAnchor
    frequencyAnchor1 = mObject1.anchorFrequency(ii,:)';
    frequencyAnchor1 = [0;frequencyAnchor1(frequencyAnchor1>0);fs/2];
    numberOfFrequencyAnchors(ii) = length(frequencyAnchor1(frequencyAnchor1>0));
    frequencyAnchor2 = mObject2.anchorFrequency(ii,:)';
    frequencyAnchor2 = [0;frequencyAnchor2(frequencyAnchor2>0);fs/2];
    fmapFrom1to2(:,ii) = interp1(frequencyAnchor1,frequencyAnchor2,frequencyAxis);
end;
for ii=1:nAnchor
    if numberOfFrequencyAnchors(ii) == 1
        if numberOfFrequencyAnchors(min(ii+1,nAnchor)) > 1
            fmapFrom1to2(:,ii) = fmapFrom1to2(:,min(ii+1,nAnchor));
        elseif numberOfFrequencyAnchors(max(ii-1,1)) > 1
            fmapFrom1to2(:,ii) = fmapFrom1to2(:,max(ii-1,1));
        end;
    end;
end;
fmapFrom1to2 = [fmapFrom1to2(:,1) fmapFrom1to2 fmapFrom1to2(:,nAnchor)];
fmapFrom1to2OnTime1 = interp1(timeAnchor1,fmapFrom1to2',locationOn1)';
return;

%%%-------
function cleanedUpObject = checkForIntegrity(inputObject);
maximumIndex = max([length(inputObject.F0), ...
    size(inputObject.spectrogram,2) ...
    size(inputObject.aperiodicityIndex,2)]);
if length(inputObject.F0) < maximumIndex
    inputObject.F0 = [inputObject.F0(:);inputObject.F0(end)*ones(maximumIndex - length(inputObject.F0),1)];
end;
if size(inputObject.spectrogram,2) < maximumIndex
    numberOfFillIn = maximumIndex-size(inputObject.spectrogram,2);
    inputObject.spectrogram = [inputObject.spectrogram inputObject.spectrogram(:,end)*ones(1,numberOfFillIn)];
end;
if size(inputObject.aperiodicityIndex,2) < maximumIndex
    numberOfFillIn = maximumIndex-size(inputObject.aperiodicityIndex,2);
    inputObject.aperiodicityIndex = [inputObject.aperiodicityIndex inputObject.aperiodicityIndex(:,end)*ones(1,numberOfFillIn)];
end;
cleanedUpObject = inputObject;

================================================
FILE: morphing_src/timeFrequencySTRAIGHTmorphingExt.m
================================================
function mObject3 = timeFrequencySTRAIGHTmorphingExt(mObject1,mObject2,mRate,mixMethod);
% Morphing based on STRAIGHT parameters
% mObject3 = timeFrequencySTRAIGHTmorphingExt(mObject1,mObject2,mRate,mixMethod);
% Designed and coded by Hideki Kawahara
% 28/Feb./2005
% Copyright(c) 2005, Hideki Kawahara
% 14/March/2005 bug fix on sampling frequency
% 01/Oct./2005 bug fix on similarity check
% 04/Oct./2005 partial morphing extension
% 18/Oct./2005 direct differential manipulation and API change
% 29/Jan./2006 bug fix on boundary conditions
% 24/Oct./2006 modification of definition
switch nargin
    case 0
        mObject3.morphingObject = createMobject;
        mixRate.F0 = 0;
        mixRate.spectrum = 0;
        mixRate.aperiodicity = 0;
        mixRate.timeCoordinate = 0;
        mixRate.freqCoordinate = 0;
        mObject3.mixRate = mixRate;
        mObject3.mixMethods = {'linear','log','differentialLogarithm'};
        return
end;
if ~isfield(mObject1,'vuv')
    mObject1.vuv = (mObject1.F0>0);
elseif length(mObject1.vuv) == 0
    mObject1.vuv = (mObject1.F0>0);
end;
if
~isfield(mObject2,'vuv') mObject2.vuv = (mObject2.F0>0); elseif length(mObject2.vuv) == 0 mObject2.vuv = (mObject2.F0>0); end; mObject3 = checkForSimilarity(mObject1,mObject2); mixRate = checkForMorphingConditions(mRate); mObject1 = checkForIntegrity(mObject1); mObject2 = checkForIntegrity(mObject2); fs = mObject1.samplingFrequency; if length(mObject3) ==0;return;end; dtFrame = mObject1.frameUpdateInterval; endLocation1 = (length(mObject1.F0)-1)*dtFrame; % in ms endLocation2 = (length(mObject2.F0)-1)*dtFrame; % in ms timeAnchor1 = [0;mObject1.anchorTimeLocation;endLocation1]; timeAnchor2 = [0;mObject2.anchorTimeLocation;endLocation2]; locationOn1 = (0:length(mObject1.F0)-1)*dtFrame; locationOn2 = (0:length(mObject2.F0)-1)*dtFrame; mapFrom1to2 = interp1(timeAnchor1,timeAnchor2,locationOn1); [nr1,nc1] = size(mObject1.spectrogram); [nr2,nc2] = size(mObject2.spectrogram); %---- initialize frequency mapping function fmapFrom1to2OnTime1 = generateFrequencyMap(mObject1,mObject2); %---- mixing on mObject1's time axis nAxis1 = length(locationOn1); nAxis2 = length(locationOn2); morphedF0 = zeros(nAxis1,1); morphedvuv = zeros(nAxis1,1); morphedAp = zeros(nr1,nAxis1); morphedSgram = zeros(nr1,nAxis1); weightSumF0 = zeros(nAxis1,1); for ii=1:nAxis1 mappedIndexOn2 = mapFrom1to2(ii)/dtFrame+1; iFloor = floor(mappedIndexOn2); iFraction = mappedIndexOn2-iFloor; fIndex = floor(fmapFrom1to2OnTime1(:,ii)/fs*2*(nr1-1))+1; dAp = iFraction*(mObject2.aperiodicityIndex(:,min(iFloor+1,nAxis2))-mObject2.aperiodicityIndex(:,min(iFloor,nAxis2))); ap2on2faxis = mObject2.aperiodicityIndex(:,min(iFloor,nAxis2))+dAp; ap2on1faxis = ap2on2faxis(fIndex); morphedAp(:,ii) = (1-mixRate.aperiodicity)*mObject1.aperiodicityIndex(:,ii)+mixRate.aperiodicity*ap2on1faxis; %04/Oct/2005 HK switch mixMethod case 'linear' dSgram = iFraction*(mObject2.spectrogram(:,min(iFloor+1,nAxis2))-mObject2.spectrogram(:,min(iFloor,nAxis2))); sgram2on2faxis = mObject2.spectrogram(:,min(iFloor,nAxis2))+dSgram; sgram2on1faxis = 
sgram2on2faxis(fIndex); morphedSgram(:,ii) = (1-mixRate.spectrum)*mObject1.spectrogram(:,ii)+mixRate.spectrum*sgram2on1faxis; case 'log' dSgram = iFraction*(log(mObject2.spectrogram(:,min(iFloor+1,nAxis2)))-log(mObject2.spectrogram(:,min(iFloor,nAxis2)))); sgram2on2faxis = log(mObject2.spectrogram(:,min(iFloor,nAxis2)))+dSgram; sgram2on1faxis = sgram2on2faxis(fIndex); tmp = (1-mixRate.spectrum)*log(mObject1.spectrogram(:,ii))+mixRate.spectrum*sgram2on1faxis; morphedSgram(:,ii) = exp(tmp); case 'differentialLogarithm' dSgram = iFraction*(mObject2.spectrogram(:,min(iFloor+1,nAxis2))-mObject2.spectrogram(:,min(iFloor,nAxis2))); sgram2on2faxis = mObject2.spectrogram(:,min(iFloor,nAxis2))+dSgram; sgram2on1faxis = sgram2on2faxis(fIndex); tmp = (1-mixRate.spectrum)*log(mObject1.spectrogram(:,ii))+mixRate.spectrum*sgram2on1faxis; morphedSgram(:,ii) = exp(tmp); end; if mObject1.F0(ii)>0 morphedF0(ii) = (1-mixRate.F0)*log(mObject1.F0(ii)); weightSumF0(ii) = (1-mixRate.F0); end; if (mObject2.F0(iFloor)>0) & (mObject2.F0(min(iFloor+1,nAxis2))>0) dF0 = iFraction*(log(mObject2.F0(min(iFloor+1,nAxis2)))-log(mObject2.F0(min(iFloor,nAxis2)))); morphedF0(ii) = mixRate.F0*(log(mObject2.F0(min(iFloor,nAxis2)))+dF0)+morphedF0(ii); weightSumF0(ii) = weightSumF0(ii)+mixRate.F0; end; morphedvuv(ii) = ((mObject1.vuv(ii)*abs(1-mixRate.F0)+abs(mixRate.F0)*mObject2.vuv(min(iFloor,nAxis2)))>0); end; morphedF0(weightSumF0>0) = exp(morphedF0(weightSumF0>0)./weightSumF0(weightSumF0>0)); %----- mapping back onto morphed time axis timeAnchorMorph = (1-mixRate.timeCoordinate)*timeAnchor1 + mixRate.timeCoordinate*timeAnchor2; locationOnMorph = (0:(timeAnchorMorph(end)/dtFrame))*dtFrame; mapFormMorphTo1 = interp1(timeAnchorMorph,timeAnchor1,locationOnMorph); nAxisMorph = length(locationOnMorph); morphedApOnMorph = zeros(nr1,nAxisMorph); morphedSgramOnMorph = zeros(nr1,nAxisMorph); morphedF0onMorph = zeros(nAxisMorph,1); morphedVUVonMorph = zeros(nAxisMorph,1); %----- set place holders 
mObject3.samplingFrequency = fs;
mObject3.F0 = morphedF0onMorph;
mObject3.vuv = morphedVUVonMorph;
mObject3.aperiodicityIndex = morphedApOnMorph;
mObject3.spectrogram = morphedSgramOnMorph;
mObject3.anchorTimeLocation = timeAnchorMorph(2:end-1);
mObject3.anchorFrequency = (1-mixRate.freqCoordinate)*mObject1.anchorFrequency+mixRate.freqCoordinate*mObject2.anchorFrequency;

%------ initialize frequency mapping function
fmapFromMorphto1OnTimeMorph = generateFrequencyMap(mObject3,mObject1);

for ii=1:nAxisMorph
    mappedIndexOn1 = mapFormMorphTo1(ii)/dtFrame+1;
    iFloor = floor(mappedIndexOn1);
    iFraction = mappedIndexOn1-iFloor;
    fIndex = floor(fmapFromMorphto1OnTimeMorph(:,ii)/fs*2*(nr1-1))+1;
    morphedApOnMorph(:,ii) = morphedAp(fIndex,iFloor) ...
        +iFraction*(morphedAp(fIndex,min(iFloor+1,nAxis1))-morphedAp(fIndex,iFloor));
    morphedSgramOnMorph(:,ii) = morphedSgram(fIndex,iFloor) ...
        +iFraction*(morphedSgram(fIndex,min(iFloor+1,nAxis1))-morphedSgram(fIndex,iFloor));
    if (morphedF0(iFloor)>0) & (morphedF0(min(iFloor+1,nAxis1))>0)
        dF0 = iFraction*(morphedF0(min(iFloor+1,nAxis1))-morphedF0(iFloor));
        morphedF0onMorph(ii) = morphedF0(iFloor)+dF0;
    end;
    morphedVUVonMorph(ii) = morphedvuv(iFloor);
end;
mObject3.F0 = morphedF0onMorph;
mObject3.vuv = morphedVUVonMorph;
mObject3.aperiodicityIndex = morphedApOnMorph;
mObject3.spectrogram = morphedSgramOnMorph;
mObject3.anchorTimeLocation = timeAnchorMorph(2:end-1);
%mObject3.anchorFrequency = (1-mRate)*mObject1.anchorFrequency+mRate*mObject2.anchorFrequency;
%mObject3 = fmapFromMorphto1OnTimeMorph; % This line is a dummy.
return;

%%% ------ Internal function to check for object's similarity
function mObject3 = checkForSimilarity(mObject1,mObject2)
mObject3 = [];
if mObject1.samplingFrequency ~= mObject2.samplingFrequency;mObject3 = [];return;end;
if mObject1.frameUpdateInterval ~= mObject2.frameUpdateInterval;mObject3 = [];return;end;
if length(mObject1.anchorTimeLocation) ~= length(mObject2.anchorTimeLocation);mObject3 = [];return;end;
nAnchor = length(mObject1.anchorTimeLocation);
for ii=1:nAnchor % check for similarity of anchor structure
    frequencyAnchor1 = mObject1.anchorFrequency(ii,:)';% 01/Oct./2005 by HK
    frequencyAnchor2 = mObject2.anchorFrequency(ii,:)';% 01/Oct./2005 by HK
    if (sum(frequencyAnchor1>0) ~= sum(frequencyAnchor2>0)) | ...
            (sum(frequencyAnchor1<0) ~= sum(frequencyAnchor2<0))
        display('Warning!! Object structures are inconsistent!'); % 01/Oct./2005 by HK
        return;
    end;
end;
mObject3 = createMobject;
mObject3.samplingFrequency = mObject1.samplingFrequency;
mObject3.frameUpdateInterval = mObject1.frameUpdateInterval;
return;

%%%--------
function mixRate = checkForMorphingConditions(mRate);
% 04/Oct./2005 added by HK
if ~isstruct(mRate)
    mixRate.F0 = mRate;
    mixRate.spectrum = mRate;
    mixRate.aperiodicity = mRate;
    mixRate.timeCoordinate = mRate;
    mixRate.freqCoordinate = mRate;
    return;
end;
mixRate.F0 = mRate.F0;
mixRate.spectrum = mRate.spectrum;
mixRate.aperiodicity = mRate.aperiodicity;
mixRate.timeCoordinate = mRate.timeCoordinate;
mixRate.freqCoordinate = mRate.freqCoordinate;
return;

%%%--------
function fmapFrom1to2OnTime1 = generateFrequencyMap(mObject1,mObject2);
dtFrame = mObject1.frameUpdateInterval;
endLocation1 = (length(mObject1.F0)-1)*dtFrame; % in ms
timeAnchor1 = [0;mObject1.anchorTimeLocation;endLocation1];
locationOn1 = (0:length(mObject1.F0)-1)*dtFrame;
fs = mObject1.samplingFrequency;
[nr1,nc1] = size(mObject1.spectrogram);
nAnchor = length(mObject1.anchorTimeLocation);
fmapFrom1to2 = zeros(nr1,nAnchor);
frequencyAxis = (0:nr1-1)'/(nr1-1)*fs/2;
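% Illustrative sketch, not part of the original algorithm. For one time
% anchor, the loop below builds a piecewise-linear frequency warp from the
% anchors of the first object to those of the second, with 0 Hz and fs/2
% added as fixed end points. The callers then convert warped frequencies
% into 1-based bin indices. The toy values are assumptions, and every name
% starting with "demo" is hypothetical; these lines can be removed without
% changing the result.
demoFs = 16000;                                 % sampling frequency (Hz), assumed
demoBins = 9;                                   % number of frequency bins, assumed
demoAxis = (0:demoBins-1)'/(demoBins-1)*demoFs/2;  % bin centres: 0, 1000, ..., 8000 Hz
demoAnchorFrom = [0;2000;demoFs/2];             % one anchor at 2000 Hz plus band edges
demoAnchorTo = [0;3000;demoFs/2];               % its counterpart at 3000 Hz
demoWarp = interp1(demoAnchorFrom,demoAnchorTo,demoAxis); % warped frequency per bin
demoBin = floor(demoWarp/demoFs*2*(demoBins-1))+1; % same index formula as fIndex
% Below the anchor the warp has slope 1.5, so the 1000 Hz bin maps to 1500 Hz.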
numberOfFrequencyAnchors = zeros(nAnchor,1); for ii=1:nAnchor frequencyAnchor1 = mObject1.anchorFrequency(ii,:)'; frequencyAnchor1 = [0;frequencyAnchor1(frequencyAnchor1>0);fs/2]; numberOfFrequencyAnchors(ii) = length(frequencyAnchor1(frequencyAnchor1>0)); frequencyAnchor2 = mObject2.anchorFrequency(ii,:)'; frequencyAnchor2 = [0;frequencyAnchor2(frequencyAnchor2>0);fs/2]; fmapFrom1to2(:,ii) = interp1(frequencyAnchor1,frequencyAnchor2,frequencyAxis); end; for ii=1:nAnchor if numberOfFrequencyAnchors(ii) == 1 if numberOfFrequencyAnchors(min(ii+1,nAnchor)) > 1 fmapFrom1to2(:,ii) = fmapFrom1to2(:,min(ii+1,nAnchor)); elseif numberOfFrequencyAnchors(max(ii-1,1)) > 1 fmapFrom1to2(:,ii) = fmapFrom1to2(:,max(ii-1,1)); end; end; end; fmapFrom1to2 = [fmapFrom1to2(:,1) fmapFrom1to2 fmapFrom1to2(:,nAnchor)]; fmapFrom1to2OnTime1 = interp1(timeAnchor1,fmapFrom1to2',locationOn1)'; return; %%%------- function cleanedUpObject = checkForIntegrity(inputObject); maximumIndex = max([length(inputObject.F0), ... size(inputObject.spectrogram,2) ... 
    size(inputObject.aperiodicityIndex,2)]);
if length(inputObject.F0) < maximumIndex
    inputObject.F0 = [inputObject.F0(:);inputObject.F0(end)*ones(maximumIndex - length(inputObject.F0),1)];
end;
if size(inputObject.spectrogram,2) < maximumIndex
    numberOfFillIn = maximumIndex-size(inputObject.spectrogram,2);
    inputObject.spectrogram = [inputObject.spectrogram inputObject.spectrogram(:,end)*ones(1,numberOfFillIn)];
end;
if size(inputObject.aperiodicityIndex,2) < maximumIndex
    numberOfFillIn = maximumIndex-size(inputObject.aperiodicityIndex,2);
    inputObject.aperiodicityIndex = [inputObject.aperiodicityIndex inputObject.aperiodicityIndex(:,end)*ones(1,numberOfFillIn)];
end;
cleanedUpObject = inputObject;

================================================
FILE: morphing_src/updateFieldOfMobject.m
================================================
function mObject = updateFieldOfMobject(mObject,fieldName,fieldValue)

if isfield(mObject,fieldName)
    mObject = setfield(mObject,fieldName,fieldValue);
else
    disp([fieldName ' is not in a Mobject.']);
end;

================================================
FILE: morphing_src/waveformMorphing.m
================================================
function mObject3 = waveformMorphing(mObject1,mObject2,mRate);
% Morphing with minimum information
% (Actually this is not real morphing.
% It is simply blending two waveforms.)
% mObject3 = waveformMorphing(mObject1,mObject2,mRate);
% Designed and coded by Hideki Kawahara
% 27/Feb./2005
% Copyright(c) 2005, Hideki Kawahara
nLength = max(length(mObject1.waveform),length(mObject2.waveform));
if mObject1.samplingFrequency ~= mObject2.samplingFrequency
    mObject3 = [];
    return
end;
x = zeros(nLength,1);
x(1:length(mObject1.waveform)) = (1-mRate)*mObject1.waveform;
x(1:length(mObject2.waveform)) = mRate*mObject2.waveform + x(1:length(mObject2.waveform));
mObject3=createMobject;
mObject3.waveform = x;
mObject3.samplingFrequency = mObject1.samplingFrequency;

================================================
FILE: src/CheckAnalysisData.m
================================================
function output = ...
  CheckAnalysisData(f0raw, ap, n3sgram, target_analysis_dir, tmp_name_root)
% Returns true only when F0, aperiodicity, and spectrogram all agree with
% the stored reference data within the relative tolerance.
output = true;
tolerance = 10 ^ (-6);
f0_file_path = ...
  [target_analysis_dir tmp_name_root 'f0.bin'];
ap_file_path = ...
  [target_analysis_dir tmp_name_root 'ap.bin'];
sp_file_path = ...
  [target_analysis_dir tmp_name_root 'sp.bin'];
f0_ref = ReadBinaryData(f0_file_path);
ap_ref = ReadBinaryData(ap_file_path);
sp_ref = ReadBinaryData(sp_file_path);
f0_median = median(f0_ref(f0_ref > 30 & f0_ref < 1000));
ap_std = std(ap_ref(:));
sp_std = std(sp_ref(:));
% Each failed comparison must report false; without these assignments
% the function returned true even on a mismatch.
if std(f0raw(:) - f0_ref(:)) / f0_median > tolerance
  output = false;
  return;
end;
if std(ap(:) - ap_ref(:)) / ap_std > tolerance
  output = false;
  return;
end;
if std(n3sgram(:) - sp_ref(:)) / sp_std > tolerance
  output = false;
  return;
end;
end

================================================
FILE: src/HzToErbRate.m
================================================
function y=HzToErbRate(x)
% by Martin Cooke, adopted from MAD library

y=(21.4*log10(4.37e-3*x+1));

================================================
FILE: src/MulticueF0v14.m
================================================
function [f0raw,vuv,auxouts,prm]=MulticueF0v14(x,fs,f0floor,f0ceil)
% Source information extraction using multiple cues
% Default values are used when other arguments are missing.
% You can modify specific parameters by assigning them. % The control parameters open to user in this version are % F0 search range. % Examples: % f0=MulticueF0v14(x,fs); % x: input signal (monaural signal) % fs: sampling frequency (Hz) % f0: fundamental frequency (Hz) % f0 is set to zero when unvoiced. % f0=MulticueF0v14(x,fs,f0floor,f0ceil) % f0floor: Lower limit of F0 search (Hz) % f0ceil: Upper limit of F0 search (Hz) % [f0raw,vuv,auxouts]=MulticueF0v14(x,fs,f0floor,f0ceil) % f0raw: fundamental frequency without V/UV information % vuv: V/UV indicator, 1:voiced, 0: unvoiced % auxouts: base information for f0 extraction (structure variable) % % [f0raw,vuv,auxouts,prmouts]=MulticueF0v14(x,fs,prmin) % prmin: structure variable for control parameters % f0raw: fundamental frequency without V/UV information % vuv: V/UV indicator, 1:voiced, 0: unvoiced % auxouts: base information for f0 extraction (structure variable) % prmouts: structure variable showing used control parameters % % Copyright(c) Wakayama University, 2004 % This version is very experimental. No warranty. 
% Please contact: kawahara@sys.wakayama-u.ac.jp % Designed and coded by Hideki Kawahara % 31/August/2004 first conceiled version % 30/June/2016 refactored for Octave compatibility switch nargin case {2,4} case 3 if ~isstruct(f0floor) displayusage; f0raw=[];vuv=[]; return; else prmin = f0floor; end; otherwise displayusage; f0raw=[];vuv=[]; return; end switch nargin case 3 case 4 prmin.F0searchLowerBound=f0floor; prmin.F0searchUpperBound=f0ceil; prmin.DisplayPlots=0; otherwise prmin.DisplayPlots=0; end; [f0raw,vuv,auxouts,prm]=SourceInfobyMultiCues050111(x,fs,prmin); nn=min(length(vuv),length(f0raw)); f0raw=f0raw(1:nn)'; vuv=vuv(1:nn)'; switch nargout case 1 f0raw=f0raw.*vuv; case {3,4} otherwise displayusage; return; end; end function displayusage fprintf(' Source information extraction using multiple cues\n'); fprintf(' Default values are used when other arguments are missing.\n'); fprintf(' You can modify specific parameters by assigning them.\n'); fprintf(' The control parameters open to user in this version are\n'); fprintf(' F0 search range.\n'); fprintf(' Example:1\n'); fprintf(' f0=MulticueF0v14(x,fs);\n'); fprintf(' x: input signal (monaural signal)\n'); fprintf(' fs: sampling frequency (Hz)\n'); fprintf(' f0: fundamental frequency (Hz)\n'); fprintf(' f0 is set to zero when unvoiced.\n'); fprintf(' f0=MulticueF0v14(x,fs,f0floor,f0ceil)\n'); fprintf(' f0floor: Lower limit of F0 search (Hz)\n'); fprintf(' f0ceil: Upper limit of F0 search (Hz)\n'); fprintf(' [f0raw,vuv,auxouts]=MulticueF0v14(x,fs,f0floor,f0ceil)\n'); fprintf(' f0raw: fundamental frequency without V/UV information\n'); fprintf(' vuv: V/UV indicator, 1:voiced, 0: unvoiced\n'); fprintf(' auxouts: base information for f0 extraction (structure variable)\n'); fprintf(' [f0raw,vuv,auxouts,prmouts]=MulticueF0v14(x,fs,prmin)\n'); fprintf(' prmin: structure variable for control parameters\n'); fprintf(' f0raw: fundamental frequency without V/UV information\n'); fprintf(' vuv: V/UV indicator, 1:voiced, 0: 
unvoiced\n'); fprintf(' auxouts: base information for f0 extraction (structure variable)\n'); fprintf(' prmouts: structure variable showing used control parameters\n'); fprintf('\n'); fprintf(' Copyright(c) Wakayama University, 2004,2005\n'); fprintf(' This version is very experimental. No warranty.\n'); fprintf(' Please contact: kawahara@sys.wakayama-u.ac.jp\n'); end function [f0raw,vuv,auxouts,prm]=SourceInfobyMultiCues050111(x,fs,prmin) % Source information extraction function % with combined source information % minimum requisite is to provide x and fs and receive f0. % Default values are used when other arguments are missing. % You can modify specific parameters by assigning using prmin. % Example:1 % SourceInfobyMultiCues050111(x,fs); % Simplest usage % Example:2 % f0raw=SourceInfobyMultiCues050111(x,fs); % F0 for voiced segment is what you get. % Example:3 % [f0raw,vuv,auxouts,prm]=SourceInfobyMultiCues050111(x,fs); % You can check what defaults were and raw information. % Example:4 % [f0raw,vuv,auxouts,prm]=SourceInfobyMultiCues050111(x,fs,prmin); % You have full control (and responsibility). 
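% Example:5 (illustrative sketch; the numeric values are assumptions,
% not recommended settings)
% prmin.F0searchLowerBound=60; % lower limit of F0 search (Hz)
% prmin.F0searchUpperBound=400; % upper limit of F0 search (Hz)
% prmin.DisplayPlots=0; % 1 would display diagnostic figures
% [f0raw,vuv,auxouts,prm]=SourceInfobyMultiCues050111(x,fs,prmin);
% Only the fields present in prmin are copied; the others keep their
% default values, and prm reports the settings actually used.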
% Designed and coded by Hideki Kawahara % 24/June/2004 % Assuming x consists of data % Assuming fs consists of sampling frequency (Hz) %------ check input arguments prm=zsetdefaultparams; switch nargin case {2,3} otherwise help SourceInfobyMultiCues040701 f0raw=[];vuv=[]; return; end [nn,mm]=size(x); if min(nn,mm)>1 display('Using only the first channel.'); if nnl1ms % bug fix 16/Aug./2008 zv=randn(size(x(x==0))); zv=cumsum(zv-mean(zv)); zv=zv/std(zv)*std(x)/10000; x(x==0)=zv; end; %------ set initial parameters if nargin==3 if isfield(prmin,'F0searchLowerBound')==1; prm.F0searchLowerBound=prmin.F0searchLowerBound;end; if isfield(prmin,'F0searchUpperBound')==1; prm.F0searchUpperBound=prmin.F0searchUpperBound;end; if isfield(prmin,'F0frameUpdateInterval')==1; prm.F0frameUpdateInterval=prmin.F0frameUpdateInterval;end; if isfield(prmin,'NofChannelsInOctave')==1; prm.NofChannelsInOctave=prmin.NofChannelsInOctave;end; if isfield(prmin,'IFWindowStretch')==1; prm.IFWindowStretch=prmin.IFWindowStretch;end; if isfield(prmin,'DisplayPlots')==1; prm.DisplayPlots=prmin.DisplayPlots;end; if isfield(prmin,'IFsmoothingLengthRelToFc')==1; prm.IFsmoothingLengthRelToFc=prmin.IFsmoothingLengthRelToFc;end; if isfield(prmin,'IFminimumSmoothingLength')==1; prm.IFminimumSmoothingLength=prmin.IFminimumSmoothingLength;end; if isfield(prmin,'IFexponentForNonlinearSum')==1; prm.IFexponentForNonlinearSum=prmin.IFexponentForNonlinearSum;end; if isfield(prmin,'IFnumberOfHarmonicForInitialEstimate')==1; prm.IFnumberOfHarmonicForInitialEstimate=prmin.IFnumberOfHarmonicForInitialEstimate;end; if isfield(prmin,'TimeConstantForPowerCalculation')==1; prm.TimeConstantForPowerCalculation=prmin.TimeConstantForPowerCalculation;end; if isfield(prmin,'ACtimeWindowLength')==1; prm.ACtimeWindowLength=prmin.ACtimeWindowLength;end; if isfield(prmin,'ACnumberOfFrequencySegments')==1; prm.ACnumberOfFrequencySegments=prmin.ACnumberOfFrequencySegments;end; if isfield(prmin,'ACfrequencyDomainWindowWidth')==1; 
prm.ACfrequencyDomainWindowWidth=prmin.ACfrequencyDomainWindowWidth;end; if isfield(prmin,'ACpowerExponentForNonlinearity')==1; prm.ACpowerExponentForNonlinearity=prmin.ACpowerExponentForNonlinearity;end; if isfield(prmin,'ACamplitudeCompensationInShortLag')==1; prm.ACamplitudeCompensationInShortLag=prmin.ACamplitudeCompensationInShortLag;end; if isfield(prmin,'ACexponentForACdistance')==1; prm.ACexponentForACdistance=prmin.ACexponentForACdistance;end; if isfield(prmin,'AClagSmoothingLength')==1; prm.AClagSmoothingLength=prmin.AClagSmoothingLength;end; if isfield(prmin,'ACtemporalSmoothingLength')==1; prm.ACtemporalSmoothingLength=prmin.ACtemporalSmoothingLength;end; if isfield(prmin,'ThresholdForSilence')==1; prm.ThresholdForSilence=prmin.ThresholdForSilence;end; if isfield(prmin,'ThresholdForVUV')==1; prm.ThresholdForVUV=prmin.ThresholdForVUV;end; if isfield(prmin,'WeightForAutocorrelationMap')==1; prm.WeightForAutocorrelationMap=prmin.WeightForAutocorrelationMap;end; if isfield(prmin,'WeightForInstantaneousFqMap')==1; prm.WeightForInstantaneousFqMap=prmin.WeightForInstantaneousFqMap;end; if isfield(prmin,'VUVthresholdOfAC1')==1; prm.VUVthresholdOfAC1=prmin.VUVthresholdOfAC1;end; if isfield(prmin,'SDforNormalizeMixingDistance')==1; prm.SDforNormalizeMixingDistance=prmin.SDforNormalizeMixingDistance;end; if isfield(prmin,'SDforTrackingNormalization')==1; prm.SDforTrackingNormalization=prmin.SDforTrackingNormalization;end; if isfield(prmin,'MaxumumPermissibleOctaveJump')==1; prm.MaxumumPermissibleOctaveJump=prmin.MaxumumPermissibleOctaveJump;end; if isfield(prmin,'ThresholdToStartSearch')==1; prm.ThresholdToStartSearch=prmin.ThresholdToStartSearch;end; if isfield(prmin,'ThresholdToQuitSearch')==1; prm.ThresholdToQuitSearch=prmin.ThresholdToQuitSearch;end; if isfield(prmin,'ThresholdForReliableRegion')==1; prm.ThresholdForReliableRegion=prmin.ThresholdForReliableRegion;end; end; %----- copy modified analysis conditions to internal variables 
f0floor=prm.F0searchLowerBound; % f0floor f0ceil=prm.F0searchUpperBound; % f0ceil shiftm=prm.F0frameUpdateInterval; % % F0 calculation interval (ms) nvo=prm.NofChannelsInOctave; % nvo=24; % Number of channels in one octave mu=prm.IFWindowStretch; % mu=1.2; % window stretch from isometric window imgi=prm.DisplayPlots; % imgi=1; % image display indicator (1: display image) smp=prm.IFsmoothingLengthRelToFc; % smp=1; % smoothing length relative to fc (ratio) minm=prm.IFminimumSmoothingLength; % minm=5; % minimum smoothing length (ms) pcIF=prm.IFexponentForNonlinearSum; % pc=0.5; % exponent to represent nonlinear summation ncIF=prm.IFnumberOfHarmonicForInitialEstimate; % nc=1; % number of harmonic component to use (1,2,3) tcpower=prm.TimeConstantForPowerCalculation; % tcpower=10; % time constant for power calculation (ms) wtlm=prm.ACtimeWindowLength; % Time window length for Autocorrelation based method (ms) ndiv=prm.ACnumberOfFrequencySegments; % for Autocorrelation method wflf=prm.ACfrequencyDomainWindowWidth; % for Autocorrelation method (Hz) pcAC=prm.ACpowerExponentForNonlinearity; % for Autocorrelation method ampAC=prm.ACamplitudeCompensationInShortLag; % for Autocorrelation method (ratio) betaAC=prm.ACexponentForACdistance; % Nonlinear distance measure for post processing lagslAC=prm.AClagSmoothingLength; % Lag smoothing length for post processing (s) !! timeslAC=prm.ACtemporalSmoothingLength; % Temporal smoothing length for post processing (ms) wAC=prm.WeightForAutocorrelationMap; % weight for combining maps (Autocorrelation) wIF=prm.WeightForInstantaneousFqMap; % weight for combining maps (Instantaneous Frequency) mixsd=prm.SDforNormalizeMixingDistance; % Normalization factor for mixing F0 distance (octave) nvc=ceil(log(f0ceil/f0floor)/log(2)*nvo); ... % Number of channels in whole search range %------- extract fixed points of frequency to instantaneous frequency map [f0v,vrv,~,~,~]= ... 
    zfixpF0VexMltpBG4(x,fs,f0floor,nvc,nvo,mu,imgi,shiftm,smp,minm,pcIF,ncIF);
[~,pos]=zmultiCandIF(f0v,vrv);
[y,ind,~]=zremoveACinduction(x,fs,pos);
%------- Pre processing of AC induction if necessary
if ind==1
    x=y;
    [f0v,vrv,~,~,~]= ...
        zfixpF0VexMltpBG4(x,fs,f0floor,nvc,nvo,mu,imgi,shiftm,smp,minm,pcIF,ncIF);
end;
%---- selecting multiple F0 candidates based on IF
[val,pos]=zmultiCandIF(f0v,vrv);
if imgi==1
    hh=figure;semilogy(pos,'+');grid on;hold on;
    set(gca,'fontsize',16);
    axis([0 length(x)/fs*1000 f0floor f0ceil]);
end;
%---- selecting multiple F0 candidates based on modified Autocorrelation
dn=max(1,floor(fs/max(8000,3*2*f0ceil)));
if imgi==1; h1=figure; else h1=-1; end;
[lagspec,lx]= ...
    zlagspectestnormal(decimate(x,dn),fs/dn,shiftm,length(x)/fs*1000,shiftm,wtlm,ndiv,wflf,pcAC,ampAC,h1);
[f02,pl2]=zmultiCandAC(lx,lagspec,betaAC,lagslAC,timeslAC);
if imgi==1
    figure(hh);semilogy(f02,'o');hold off
    xlabel('time (ms)');ylabel('frequency (Hz)');
    title('F0 candidates: o:autocorrelation +:instantaneous frequency')
end;
%----- Combine multiple source information with dynamic range normalization
auxouts.F0candidatesByIF=pos;
auxouts.CNofcandidatesByIF=val;
auxouts.F0candidatesByAC=f02;
auxouts.ACofcandidatesByAC=pl2;
[f0cand,relv]=zcombineRanking4(auxouts,mixsd,wAC,wIF,prm); % New mixing routine
if imgi==1
    figure
    semilogy(f0cand,'+');grid on;
    set(gca,'fontsize',16);
    axis([0 length(f0cand) f0floor f0ceil]);
    title('F0 candidates by mixed source information');
    xlabel('time (ms)')
    ylabel('frequency (Hz)')
end;
%----- Calculate power envelope
pws=zVpowercalc(x,fs,tcpower,shiftm,2000);
pwsdb=10*log10(abs(pws)+0.00000000001);
mxpwsdb=max(pwsdb);
[hstgrm,binlvl]=hist(pwsdb,mxpwsdb+(-60:2));
q10=interp1(cumsum(hstgrm+0.000000001)/sum(hstgrm)*100,binlvl,10); % 10% quantile level
[~,minid]=min(abs(q10-binlvl));
bb=max(1,min(length(binlvl),minid+(-5:5))); % search range 10 dB % safeguard
noiselevel=sum(hstgrm(bb).*binlvl(bb))/sum(hstgrm(bb));
if imgi==1
    figure
    plot(pwsdb);grid on;
    set(gca,'fontsize',16);
    axis([0 length(pwsdb) noiselevel-10 max(pwsdb)]);
    hold on;
    plot([0 length(pwsdb)],noiselevel*[1 1],'r');
    plot([0 length(pwsdb)],noiselevel*[1 1]+3,'r-.');
    title('Instantaneous power solid line:noise level, dash-dot line:threshold')
    xlabel('time (ms)');ylabel('power (dB)')
end;
%----- F0 tracking
ac1=zeros(1,length(f0cand));
for ii=1:length(f0cand)
    ac1(ii)=zfirstac(x,fs,round(ii/1000*fs),30);
end;
auxouts.F0candidatesByMix=f0cand;
auxouts.RELofcandidatesByMix=relv;
auxouts.FirstAutoCorrelation=ac1;
auxouts.InstantaneousPower=pwsdb;
[f0s,rels,csegs]=zcontiguousSegment10(auxouts,prm);
[f0raw0,~]=zfillf0gaps6(auxouts,f0s,rels,csegs,prm);
if imgi==1;
    figure
    semilogy(f0raw0,'c');grid on;
    set(gca,'fontsize',16);
    axis([0 length(f0raw0) f0floor f0ceil]);
    drawnow;
end;
%------ F0 refinement using first three harmonic components
f0raw0(isnan(f0raw0))=zeros(size(f0raw0(isnan(f0raw0))));
f0raw0(f0raw0>f0ceil)=f0raw0(f0raw0>f0ceil)*0+f0ceil;
f0raw0((f0raw0<f0floor)&(f0raw0>0))=f0raw0((f0raw0<f0floor)&(f0raw0>0))*0+f0floor;
[f0raw2,ecr,ac1]=zrefineF06m(decimate(x,dn),fs/dn,f0raw0,1024,1.1,3,1,1,length(f0raw0));
if imgi==1;
    hold on;
    semilogy(f0raw2,'g');grid on;
end;
%----- new V/UV decision routine 15/Aug./2004
auxouts.BackgroundNoiselevel=noiselevel;
vuv=zvuvdecision4(f0raw2,auxouts);
nnll=min(length(f0raw2),length(vuv));
f0raw3=f0raw2(1:nnll).*vuv(1:nnll);
if imgi==1
    semilogy(f0raw3,'k');hold off
    title('F0 estimates, cyan:initial, green:fine-tuned, black:voiced part')
    xlabel('time (ms)');ylabel('frequency (Hz)');
end;
f0raw=f0raw2(1:nnll);vuv=vuv(1:nnll);
auxouts.F0candidatesByIF=pos;
auxouts.CNofcandidatesByIF=val;
auxouts.F0candidatesByAC=f02;
auxouts.ACofcandidatesByAC=pl2;
auxouts.F0candidatesByMix=f0cand;
auxouts.RELofcandidatesByMix=relv;
auxouts.RefinedCN=ecr;
auxouts.FirstAutoCorrelation=ac1;
auxouts.F0initialEstimate=f0raw0;
auxouts.BackgroundNoiselevel=noiselevel;
auxouts.InstantaneousPower=pwsdb;
auxouts.RefinedF0estimates=f0raw;
auxouts.VUVindicator=vuv;
if imgi==1;
    displaysummary(auxouts,f0floor,f0ceil);
end;
switch nargout
    case 0
        f0raw=auxouts;eval(['help ' mfilename]);
    case 1
        f0raw=f0raw3;
    case 2
    case {3,4}
    otherwise
        eval(['help ' mfilename]);
        return;
end;
end
%------
function prm=zsetdefaultparams
prm.F0searchLowerBound=40; % f0floor
prm.F0searchUpperBound=800; % f0ceil
prm.F0frameUpdateInterval=1; % shiftm % F0 calculation interval (ms)
prm.NofChannelsInOctave=24; % nvo=24; % Number of channels in one octave
prm.IFWindowStretch=1.2; % mu=1.2; % window stretch from isometric window
prm.DisplayPlots=0; % imgi=1; % image display indicator (1: display image)
prm.IFsmoothingLengthRelToFc=1; % smp=1; % smoothing length relative to fc (ratio)
prm.IFminimumSmoothingLength=5; % minm=5; % minimum smoothing length (ms)
prm.IFexponentForNonlinearSum=0.5; % pc=0.5; % exponent to represent nonlinear summation
prm.IFnumberOfHarmonicForInitialEstimate=1; % nc=1; % number of harmonic component to use (1,2,3)
prm.TimeConstantForPowerCalculation=10; % tcpower=10; % time constant for power calculation (ms)
prm.ACtimeWindowLength=60; % Time window length for Autocorrelation based method (ms)
prm.ACnumberOfFrequencySegments=8; % for Autocorrelation method
prm.ACfrequencyDomainWindowWidth=2200; % for Autocorrelation method (Hz)
prm.ACpowerExponentForNonlinearity=0.5; % for Autocorrelation method
prm.ACamplitudeCompensationInShortLag=1.6; %2.2; % for Autocorrelation method (ratio) 23/July/2004
prm.ACexponentForACdistance=4; % Nonlinear distance measure for post processing
prm.AClagSmoothingLength=0.0001; % Lag smoothing length for post processing (s) !!
% 0.01 to 0.0001 23/July/2004
prm.ACtemporalSmoothingLength=20; % Temporal smoothing length for post processing (ms)
prm.ThresholdForSilence=3; % for silence decision above average noise level (dB)
prm.ThresholdForVUV=0.6; % for V/UV decision based on first autocorrelation and C/N
prm.WeightForAutocorrelationMap=1; % weight for combining maps (Autocorrelation)
prm.WeightForInstantaneousFqMap=1; % weight for combining maps (Instantaneous Frequency)
prm.VUVthresholdOfAC1=-0.1; % First autocorrelation threshold for VUV in segment search
prm.SDforNormalizeMixingDistance=0.3; % Normalization factor for mixing F0 distance (octave)
prm.SDforTrackingNormalization=0.2;
prm.MaxumumPermissibleOctaveJump=0.4;
prm.ThresholdToStartSearch=0.3;
prm.ThresholdToQuitSearch=0.35;
prm.ThresholdForReliableRegion=0.25;
prm.WhoAmI=mfilename;
end
%%%------------
function oki=displaysummary(f,f0floor,f0ceil)
oki=1;
nn=length(f.RefinedF0estimates);
figure
subplot(211);
semilogy(f.F0initialEstimate,'c');grid on;
hold on;
semilogy(f.RefinedF0estimates.*f.VUVindicator,'b');grid on;
set(gca,'fontsize',14);
xlabel('time (ms)');
ylabel('frequency (Hz)');
axis([1 nn f0floor f0ceil]);
subplot(212);
plot(f.RELofcandidatesByMix,'.');grid on;
set(gca,'fontsize',14);
xlabel('time (ms)');
ylabel('relative periodicity');
axis([1 nn 0 1]);
drawnow;
end
%%%------------
function [f0v,vrv,dfv,nf,aav]=zfixpF0VexMltpBG4(x,fs,f0floor,nvc,nvo,mu,imgi,shiftm,smp,minm,pc,nc)
% Fixed point analysis to extract F0
% [f0v,vrv,dfv,nf]=fixpF0VexMltpBG4(x,fs,f0floor,nvc,nvo,mu,imgi,shiftm,smp,minm,pc,nc)
% x : input signal
% fs : sampling frequency (Hz)
% f0floor : lowest frequency for F0 search
% nvc : total number of filter channels
% nvo : number of channels per octave
% mu : temporal stretching factor
% imgi : image display indicator (1: display image)
% shiftm : frame shift in ms
% smp : smoothing length relative to fc (ratio)
% minm : minimum smoothing length (ms)
% pc : exponent to represent nonlinear summation
% nc
%    : number of harmonic component to use (1,2,3)
% Designed and coded by Hideki Kawahara
% 28/March/1999
x=cleaninglownoise(x,fs,f0floor);
fxx=f0floor*2.0.^((0:nvc-1)/nvo)';
fxh=max(fxx);
dn=max(1,floor(fs/(fxh*6.3)));
if nc>2
    pm3=zmultanalytFineCSPB(decimate(x,dn),fs/dn,f0floor,nvc,nvo,mu,3); % error corrected 2002.9.19 (mu was fixed 1.1)
    pif3=zwvlt2ifq(pm3,fs/dn);
    [~,mm]=size(pif3);
    pif3=pif3(:,1:3:mm);
    pm3=pm3(:,1:3:mm);
end;
if nc>1
    pm2=zmultanalytFineCSPB(decimate(x,dn),fs/dn,f0floor,nvc,nvo,mu,2);% error corrected 2002.9.19 (mu was fixed 1.1)
    pif2=zwvlt2ifq(pm2,fs/dn);
    [~,mm]=size(pif2);
    pif2=pif2(:,1:3:mm);
    pm2=pm2(:,1:3:mm);
end;
pm1=zmultanalytFineCSPB(decimate(x,dn*3),fs/(dn*3),f0floor,nvc,nvo,mu,1);% error corrected 2002.9.19 (mu was fixed 1.1)
%%%% safe guard added on 15/Jan./2003
mxpm1=max(max(abs(pm1)));
eeps=mxpm1/10000000;
pm1(pm1==0)=pm1(pm1==0)+eeps;
%%%% safe guard end
pif1=zwvlt2ifq(pm1,fs/(dn*3));
[~,mm1]=size(pif1);
mm=mm1;
if nc>1
    [~,mm2]=size(pif2);
    mm=min(mm1,mm2);
end;
if nc>2
    [~,mm3]=size(pif3);
    mm=min([mm1 mm2 mm3]);
end;
if nc == 2
    for ii=1:mm
        pif2(:,ii)=(pif1(:,ii).*(abs(pm1(:,ii))).^pc ...
            +pif2(:,ii)/2.*(abs(pm2(:,ii))).^pc )...
            ./((abs(pm1(:,ii))).^pc+(abs(pm2(:,ii))).^pc);
    end;
end;
if nc == 3
    for ii=1:mm
        pif2(:,ii)=(pif1(:,ii).*(abs(pm1(:,ii))).^pc ...
            +pif2(:,ii)/2.*(abs(pm2(:,ii))).^pc ...
            +pif3(:,ii)/3.*(abs(pm3(:,ii))).^pc )...
            ./((abs(pm1(:,ii))).^pc+(abs(pm2(:,ii))).^pc+(abs(pm3(:,ii))).^pc);
    end;
end;
if nc == 1
    pif2=pif1;
end;
pif2=pif2*2*pi;
dn=dn*3;
[slp,~]=zifq2gpm2(pif2,f0floor,nvo);
[nn,mm]=size(pif2);
dpif=(pif2(:,2:mm)-pif2(:,1:mm-1))*fs/dn;
dpif(:,mm)=dpif(:,mm-1);
[dslp,~]=zifq2gpm2(dpif,f0floor,nvo);
damp=(abs(pm1(:,2:mm))-abs(pm1(:,1:mm-1)))*fs/dn;
damp(:,mm)=damp(:,mm-1);
damp=damp./abs(pm1);
fxx=f0floor*2.0.^((0:nn-1)/nvo)'*2*pi;
mmp=0*dslp;
[c1,c2b]=znrmlcf2(1);
for ii=1:nn
    c2=c2b*(fxx(ii)/2/pi)^2;
    cff=damp(ii,:)/fxx(ii)*2*pi*0;
    mmp(ii,:)=(dslp(ii,:)./(1+cff.^2)/sqrt(c2)).^2+(slp(ii,:)./sqrt(1+cff.^2)/sqrt(c1)).^2;
end;
if smp~=0
    smap=zsmoothmapB(mmp,fs/dn,f0floor,nvo,smp,minm,0.4);
else
    smap=mmp;
end;
fixpp=zeros(round(nn/3),mm);
fixvv=fixpp+100000000;
fixdf=fixpp+100000000;
fixav=fixpp+1000000000;
nf=zeros(1,mm);
for ii=1:mm
    [ff,vv,df,aa]=zfixpfreq3(fxx,pif2(:,ii),smap(:,ii),dpif(:,ii)/2/pi,pm1(:,ii));
    kk=length(ff);
    fixpp(1:kk,ii)=ff;
    fixvv(1:kk,ii)=vv;
    fixdf(1:kk,ii)=df;
    fixav(1:kk,ii)=aa;
    nf(ii)=kk;
end;
fixpp(fixpp==0)=fixpp(fixpp==0)+1000000;
np=max(nf);
f0v=fixpp(1:np,round(1:shiftm/dn*fs/1000:mm))/2/pi;
vrv=fixvv(1:np,round(1:shiftm/dn*fs/1000:mm));
dfv=fixdf(1:np,round(1:shiftm/dn*fs/1000:mm));
aav=fixav(1:np,round(1:shiftm/dn*fs/1000:mm));
nf=nf(round(1:shiftm/dn*fs/1000:mm));
if imgi == 1;end;
end
%----------------------------------------------------------------
function pif=zwvlt2ifq(pm,fs)
% Wavelet to instantaneous frequency map
% fqv=wvlt2ifq(pm,fs)
% Coded by Hideki Kawahara
% 02/March/1999
[~,mm]=size(pm);
pm=pm./(abs(pm));
pif=abs(pm(:,:)-[pm(:,1),pm(:,1:mm-1)]);
pif=fs/pi*asin(pif/2);
pif(:,1)=pif(:,2);
end
%----------------------------------------------------------------
function [slp,pbl]=zifq2gpm2(pif,f0floor,nvo)
% Instantaneous frequency 2 geometric parameters
% [slp,pbl]=ifq2gpm(pif,f0floor,nvo)
% slp : first order coefficient
% pbl : second order coefficient
% Coded by Hideki Kawahara
% 02/March/1999
[nn,~]=size(pif);
fx=f0floor*2.0.^((0:nn-1)/nvo)*2*pi;
c=2.0^(1/nvo);
g=[1/c/c 1/c 1;1 1 1;c*c c 1];
h=inv(g);
slp=((pif(2:nn-1,:)-pif(1:nn-2,:))/(1-1/c) ...
    +(pif(3:nn,:)-pif(2:nn-1,:))/(c-1))/2;
slp=[slp(1,:);slp;slp(nn-2,:)];
pbl=pif(1:nn-2,:)*h(2,1)+pif(2:nn-1,:)*h(2,2)+pif(3:nn,:)*h(2,3);
pbl=[pbl(1,:);pbl;pbl(nn-2,:)];
for ii=1:nn
    slp(ii,:)=slp(ii,:)/fx(ii);
    pbl(ii,:)=pbl(ii,:)/fx(ii);
end;
end
%------------------------------------------
function p=zGcBs(x,k)
tt=x+0.0000001;
p=tt.^k.*exp(-pi*tt.^2).*(sin(pi*tt+0.0001)./(pi*tt+0.0001)).^2;
end
%--------------------------------------------
function smap=zsmoothmapB(map,fs,f0floor,nvo,mu,mlim,pex)
[nvc,mm]=size(map);
t0=1/f0floor;
lmx=round(6*t0*fs*mu);
wl=2^ceil(log(lmx)/log(2));
gent=((1:wl)-wl/2)/fs;
smap=map;
mpv=1;
zt=0*gent;
iiv=1:mm;
for ii=1:nvc
    t=gent*mpv; %t0*mu/mpv*1000
    t=t(abs(t)<3.5*mu*t0);
    wbias=round((length(t)-1)/2);
    wd1=exp(-pi*(t/(t0*(1-pex))/mu).^2);
    wd2=exp(-pi*(t/(t0*(1+pex))/mu).^2);
    wd1=wd1/sum(wd1);
    wd2=wd2/sum(wd2);
    tm=fftfilt(wd1,[map(ii,:) zt]);
    tm=fftfilt(wd2,[1.0./tm(iiv+wbias) zt]);
    smap(ii,:)=1.0./tm(iiv+wbias);
    if t0*mu/mpv*1000 > mlim
        mpv=mpv*(2.0^(1/nvo));
    end;
end;
end
%--------------------------------------------
function [ff,vv,df,aa]=zfixpfreq3(fxx,pif2,mmp,dfv,pm)
aav=abs(pm);
nn=length(fxx);
iix=(1:nn)';
cd1=pif2-fxx;
cd2=[diff(cd1);cd1(nn)-cd1(nn-1)];
cdd1=[cd1(2:nn);cd1(nn)];
fp=(cd1.*cdd1<0).*(cd2<0);
ixx=iix(fp>0);
ff=pif2(ixx)+(pif2(ixx+1)-pif2(ixx)).*cd1(ixx)./(cd1(ixx)-cdd1(ixx));
vv=mmp(ixx)+(mmp(ixx+1)-mmp(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx));
df=dfv(ixx)+(dfv(ixx+1)-dfv(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx));
aa=aav(ixx)+(aav(ixx+1)-aav(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx));
end
%--------------------------------------------
function [c1,c2]=znrmlcf2(f)
n=100;
x=0:1/n:3;
g=zGcBs(x,0);
dg=[diff(g) 0]*n;
dgs=dg/2/pi/f;
xx=2*pi*f*x;
c1=sum((xx.*dgs).^2)/n*2;
c2=sum((xx.^2.*dgs).^2)/n*2;
end
%--------------------------------------------
function ...
x=cleaninglownoise(x,fs,f0floor)
flm=50;
flp=round(fs*flm/1000);
nn=length(x);
wlp=fir1(flp*2,f0floor/(fs/2));
wlp(flp+1)=wlp(flp+1)-1;
wlp=-wlp;
tx=[x(:)' zeros(1,2*length(wlp))];
ttx=fftfilt(wlp,tx);
x=ttx((1:nn)+flp);
end
%%%---------
function pm=zmultanalytFineCSPB(x,fs,f0floor,nvc,nvo,mu,mlt)
% Dual wavelet analysis using cardinal spline manipulation
% pm=multanalytFineCSPB(x,fs,f0floor,nvc,nvo,mu,mlt)
% Input parameters
%
% x : input signal (2kHz sampling rate is sufficient.)
% fs : sampling frequency (Hz)
% f0floor : lower bound for pitch search (60Hz suggested)
% nvc : number of total voices for wavelet analysis
% nvo : number of voices in an octave
% mu : temporal stretch factor
% mlt : harmonic ID#
% Output parameters
% pm : wavelet transform using iso-metric Gabor function
%
% If you have any questions, mailto:kawahara@hip.atr.co.jp
%
% Copyright (c) ATR Human Information Processing Research Labs. 1996
% Invented and coded by Hideki Kawahara
% 30/Oct./1996
% 07/Dec./2002 waitbar was added
t0=1/f0floor;
lmx=round(6*t0*fs*mu);
wl=2^ceil(log(lmx)/log(2));
x=x(:)';
nx=length(x);
tx=[x,zeros(1,wl)];
gent=((1:wl)-wl/2)/fs;
pm=zeros(nvc,nx);
mpv=1;
for ii=1:nvc
    tb=gent*mpv;
    t=tb(abs(tb)<3.5*mu*t0);
    wd1=exp(-pi*(t/t0/mu).^2);
    wd2=max(0,1-abs(t/t0/mu));
    wd2=wd2(wd2>0);
    wwd=conv(wd2,wd1);
    wwd=wwd(abs(wwd)>0.00001);
    wbias=round((length(wwd)-1)/2);
    wwd=wwd.*exp(1i*2*pi*mlt*t(round((1:length(wwd))-wbias+length(t)/2))/t0);
    pmtmp1=fftfilt(wwd,tx);
    pm(ii,:)=pmtmp1(wbias+1:wbias+nx)*sqrt(mpv);
    mpv=mpv*(2.0^(1/nvo));
end;
end
%%%-----
function [val,pos]=zmultiCandIF(f0v,vrv)
% [val,pos]=multiCandIF(f0v,vrv)
% F0 candidates based on instantaneous frequency
% fixed points
% f0v : fixed point frequencies (Hz)
% vrv : fixed point N/C (ratio)
% by Hideki Kawahara
% 23/June/2004
[nr,nc]=size(f0v);
[nr2,nc2]=size(vrv);
if (nr~=nr2) || (nc~=nc2);val=[];pos=[];return;end;
vrvdb=-zdBpower(vrv);
mxfq=100000;
val=zeros(nc,3);
pos=ones(nc,3);
for ii=1:nc
    f=f0v(:,ii)';
    v=vrvdb(:,ii)';
    v=v(f<mxfq);
    f=f(f<mxfq);
    [~,mxp]=max(v);
    pos(ii,1)=f(mxp);
    val(ii,1)=v(mxp);
    if length(f)>1
        v(mxp)=v(mxp)*0-50;
        [~,mxp]=max(v);
        pos(ii,2)=f(mxp);
        val(ii,2)=v(mxp);
        if length(f)>2
            v(mxp)=v(mxp)*0-50;
            [~,mxp]=max(v);
            pos(ii,3)=f(mxp);
            val(ii,3)=v(mxp);
        else
            pos(ii,3)=pos(ii,2);val(ii,3)=val(ii,2);
        end;
    else
        pos(ii,2)=pos(ii,1);val(ii,2)=val(ii,1);
    end;
end;
end
%%%------
function y=zdBpower(x)
y=10*log10(x);
end
%%%------
function [y,ind,fq]=zremoveACinduction(x,fs,pos)
% [y,ind,fq]=removeACinduction(x,fs,pos);
% Function to remove AC induction
% x : input speech signal
% fs : sampling frequency (Hz)
% pos : Locations of Top-three F0 candidates (Hz)
% Output parameter
% y : speech signal without AC induction
% ind : 1 indicates AC induction was detected
% fq : frequency of AC induction
% Designed and coded by Hideki Kawahara,
% 24/June/2004
x=x(:);
ind=0;
f=pos(:);
h50=sum(abs(f-50)<5)/sum(f>0);
h60=sum(abs(f-60)<5)/sum(f>0);
if (h50<0.2) && (h60<0.2);y=x;fq=0;return;end;
ind=1;
if h50>h60
    fq=50;
else
    fq=60;
end;
tx=(1:length(x))'/fs;
fqv=((-0.3:0.025:0.3)+fq);
txv=tx*fqv;
fk=x'*exp(-1i*2*pi*txv)/length(x);
[~,ix]=max(abs(fk));
fq=fqv(ix);
y=x-2*real(fk(ix)*exp(1i*2*pi*fq*tx));
end
%%%----
function [lagspec,lx]=zlagspectestnormal(x,fs,stp,edp,shiftm,wtlm,ndiv,wflf,pc,amp,h)
% Lag spectrogram for F0 extraction
% [lagspec,lx]=lagspectestnormal(x,fs,stp,edp,shiftm,wtlm,ndiv,wflf,pc,amp,h)
% x : waveform
% fs : sampling frequency (Hz)
% stp : starting position (ms)
% edp : end position (ms)
% shiftm : frame shift for analysis (ms)
% wtlm : time window length (ms)
% ndiv : number of segment in the frequency domain
% wflf : frequency domain window length (Hz)
% pc : power exponent for nonlinearity
% amp : amount of lag window compensation
% h : handle for graph
% 16/June/2004 Simplified version
% 17/June/2004 with normalization
nftm=floor((edp-stp)/shiftm);
pm=stp;
[~,~,~,lx]=ztestspecspecnormal(x,fs,pm,wtlm,ndiv,wflf,pc,amp);
nlx=length(lx);
lagspec=zeros(nlx,nftm);
for ii=1:nftm
    pmmul=stp+(ii-1)*shiftm;
    [acc,~,~,lx]=ztestspecspecnormal(x,fs,pmmul,wtlm,ndiv,wflf,pc,amp);%keyboard;
    lagspec(:,ii)=mean(acc,2)/mean(acc(1,:));
end;
if h>0
    figure(h);
    imagesc([stp edp],[0 max(lx)]*1000,max(0,lagspec));
    axis('xy')
    axis([stp edp 0 40]);
    set(gca,'fontsize',16);
    xlabel('time (ms)')
    ylabel('lag (ms)')
    title(['wtl=' num2str(wtlm) 'ms ndiv=' num2str(ndiv) ' wfl=' num2str(wflf) 'Hz PC=' num2str(pc) ...
        ' fs=' num2str(fs) 'Hz amp=' num2str(amp)]);
    drawnow;
end;
end
%%%------
function [acc,abase,fx,lx]=ztestspecspecnormal(x,fs,pm,wtlm,ndiv,wflf,pc,amp)
% Modified auto correlation
% [acc,abase,fx,lx]=testspecspecnormal(x,fs,pm,wtlm,ndiv,wflf,pc,amp);
% input parameters
% x : signal to be analyzed
% fs : sampling frequency (Hz)
% pm : position to be tested (ms)
% wtlm : time window length (ms)
% ndiv : number of division on frequency axis
% wflf : frequency window length (Hz)
% pc : power exponent
% amp : amount of lag window compensation
% output parameters
% acc : spectrogram on frequency axis
% : (periodicity gram on local frequency area)
% fx : frequency axis
% lx : lag axis
% Test program for spectrum check
% by Hideki Kawahara 27 March 2004
% 29/March/2004 streamlined
% 12/June/2004 Bias term removed
% 16/June/2004 Simplified version
% 17/June/2004 Spectral normalization version
x=x(:); % make x a column vector
wtlms=round(wtlm/1000*fs); % window length in samples
wtlmso=floor(wtlms/2)*2+1;
bb=(1:wtlmso)-(wtlmso-1)/2; % time base for window;
fftl=2^ceil(log2(wtlmso)); % set FFT length to a power of two
x=[zeros(fftl,1);x;zeros(fftl,1)]; % safeguard
p=round(pm/1000*fs); % analysis position in samples
fx=(0:fftl-1)/fftl*fs;
tx=(0:fftl-1);
tx(tx>fftl/2)=tx(tx>fftl/2)-fftl;
tx=tx/fs;
lagw=exp(-(tx/0.0035).^2); % EGGF0testn12
lagw2=exp(-(tx/0.0016).^2);% EGGF0testn12
xt=x(fftl+bb+p); % waveform segment to be analyzed
if sum(abs(xt))<1e-10 % bug fix 11/Jan./2005
    xt=xt+randn(size(xt));
end;
abase=abs(fft(xt.*blackman(wtlmso),fftl));
ac=ifft(abase.^2);
npw=real(fft(ac.*lagw'));
pw=abase.^2.0./real(npw);
fsp=fs/fftl;
wflfs=round(wflf/fsp); % frequency window length in bins
wflfso=floor(wflfs/2)*2+1;
bbf=(1:wflfso)-(wflfso-1)/2; % index for frequency window
fftlf=2^ceil(log2(wflfso)+2);
lx=(0:fftlf/2-1)/(fsp*fftlf);
nsht=fftl/2/ndiv;
acc=zeros(fftlf/2,ndiv+1);
w2=hanning(wflfso);
ampw=1-lagw*(1-1/amp);
ampw=(1-lagw2(1:fftlf/2)'*(1-1/amp))./ampw(1:fftlf/2)';
for ii=1:ndiv+1
    p=rem(round(fftl/2+bbf+(ii-1)*nsht),fftl)+1;
    ac=abs(fft((pw(p)).*w2,fftlf))*(npw(p((wflfso-1)/2))).^pc;
    acc(:,ii)=ac(1:fftlf/2).*ampw;
end;
end
%%%-------
function pws=zVpowercalc(x,fs,wtc,shiftm,fc)
% pws=Vpowercalc(x,fs,wtc,shiftm,fc)
% x : waveform
% fs : sampling frequency (Hz)
% wtc : window time constant (ms)
% shiftm : frame update interval (ms)
% fc : LPF cut-off frequency (Hz)
%---- window design for power smoothing
t=(0:1/fs:wtc*5/1000);
w=exp(-t/(wtc/1000));
w=w-w(end);
w=w/sum(w);
%----- window for preprocessing LPF
lw=round(fs/fc*2);
b=fir1(lw-1,2*fc/fs);
nn=length(x);
x=fftfilt(b,[x(:);zeros(lw,1)]);
x=x((1:nn)+round(lw/2)-1);
yf=fftfilt(w,x.^2);
yb=fftfilt(w,x(end:-1:1).^2);
yb=yb(end:-1:1);
y=min(yf,yb);
nn=length(x);
pws=interp1((0:nn-1)/fs*1000,y,0:shiftm:(nn-1)/fs*1000);
end
%%%-----
function [f0r,ecr,ac1]=zrefineF06m(x,fs,f0raw,fftl,eta,nhmx,shiftm,nl,nu)
% F0 estimation refinement
% [f0r,ecr,ac1]=refineF06m(x,fs,f0raw,fftl,eta,nhmx,shiftm,nl,nu)
% x : input waveform
% fs : sampling frequency (Hz)
% f0raw : F0 candidate (Hz)
% fftl : FFT length
% eta : temporal stretch factor
% nhmx : highest harmonic number
% shiftm : frame shift period (ms)
% nl : lower frame number
% nu : upper frame number
%
% Example of usage (with STRAIGHT)
%
f0raw=f0raw(:)';
f0i=f0raw;
f0i(f0i==0)=f0i(f0i==0)+160;
fax=(0:fftl-1)/fftl*fs;
nfr=length(f0i); % 07/August/1999
shiftl=shiftm/1000*fs;
x=[zeros(fftl,1); x(:) ; zeros(fftl,1)]';
ec1=cos(2*pi*(0:fftl-1)/fftl); % first auto correlation basis function
ac1=f0raw*0;
tt=((1:fftl)-fftl/2)/fs;
th=(0:fftl-1)/fftl*2*pi;
rr=exp(-1i*th);
f0t=100;
w1=max(0,1-abs(tt'*f0t/eta));
w1=w1(w1>0);
wg=exp(-pi*(tt*f0t/eta).^2);
wgg=(wg(abs(wg)>0.0002));
wo=fftfilt(wgg,[w1; zeros(length(wgg),1)])';
xo=(0:length(wo)-1)/(length(wo)-1);
nlo=length(wo)-1;
if nl*nu <0
    nl=1;
    nu=nfr;
end;
bx=1:fftl/2+1;
pif=zeros(fftl/2+1,nfr);
dpif=zeros(fftl/2+1,nfr);
pwm=zeros(fftl/2+1,nfr);
for kk=nl:nu
    if f0i(kk) < 40
        f0i(kk)=40;
    end;
    f0t=f0i(kk);
    xi=0:1/nlo*f0t/100:1;
    wa=interp1(xo,wo,xi,'*linear');
    wal=length(wa);
    bb=1:wal;
    bias=round(fftl-wal/2+(kk-1)*shiftl);
    dcl=mean(x(bb+bias));
    txm1=x(bb+bias-1);
    tx0=x(bb+bias);
    txp1=x(bb+bias+1);
    if (sum(abs(txm1))<1e-20)||(sum(abs(txm1))<1e-20)||(sum(abs(txm1))<1e-20)||(sum(abs(txm1))<1e-20)
        xtmp=x+randn(size(x)); % this if clause is a bug fix. 11/Jan./2005
        dcl=mean(xtmp(bb+bias));
        txm1=xtmp(bb+bias-1);
        tx0=xtmp(bb+bias);
        txp1=xtmp(bb+bias+1);
    end;
    ff0=fft((txm1-dcl).*wa,fftl);
    ff1=fft((tx0-dcl).*wa,fftl);
    ff2=fft((txp1-dcl).*wa,fftl);
    ff0(ff0==0)=ff0(ff0==0)+0.000000001;
    ff1(ff1==0)=ff1(ff1==0)+0.000000001;
    ff2(ff2==0)=ff2(ff2==0)+0.000000001;
    fd=ff2.*rr-ff1;
    fd0=ff1.*rr-ff0;
    crf=fax+(real(ff1).*imag(fd)-imag(ff1).*real(fd))./(abs(ff1).^2)*fs/pi/2;
    crf0=fax+(real(ff0).*imag(fd0)-imag(ff0).*real(fd0))./(abs(ff0).^2)*fs/pi/2;
    pif(:,kk)=crf(bx)'*2*pi;
    dpif(:,kk)=(crf(bx)-crf0(bx))'*2*pi;
    pwm(:,kk)=abs(ff1(bx)'); % 29/July/1999
    ac1(kk)=sum(abs(ff1).^2.0.*ec1)/sum(abs(ff1).^2);
end;
slp=([pif(2:fftl/2+1,:);pif(fftl/2+1,:)]-pif)/(fs/fftl*2*pi);
dslp=([dpif(2:fftl/2+1,:);dpif(fftl/2+1,:)]-dpif)/(fs/fftl*2*pi)*fs;
mmp=slp*0;
[c1,c2]=znrmlcf3(shiftm);
fxx=((0:fftl/2)+0.5)/fftl*fs*2*pi;
%--- calculation of relative noise level
for ii=1:fftl/2+1;
    c2=c2*(fxx(ii)/2/pi)^2;
    mmp(ii,:)=(dslp(ii,:)/sqrt(c2)).^2+(slp(ii,:)/sqrt(c1)).^2;
end;
%--- Temporal smoothing
sml=round(1.5*fs/1000/2/shiftm)*2+1; % 3 ms, and odd number
smb=round((sml-1)/2); % bias due to filtering
smmp=fftfilt((hanning(sml).^2)/sum((hanning(sml).^2)),[mmp zeros(fftl/2+1,sml*2)]'+0.00001)';
smmp(smmp==0)=smmp(smmp==0)+0.0000000001;
smmp=1.0./fftfilt(hanning(sml)/sum(hanning(sml)),1.0./smmp')';
smmp=smmp(:,max(1,(1:nfr)+sml-2)); % fixed by H.K. on 10/Dec./2002
%--- Power adaptive weighting (29/July/1999)
spwm=fftfilt(hanning(sml)/sum(hanning(sml)),[pwm zeros(fftl/2+1,sml*2)]')';
spwm(spwm==0)=spwm(spwm==0)+0.00000001;
spfm=fftfilt(hanning(sml)/sum(hanning(sml)),[pwm.*pif zeros(fftl/2+1,sml*2)]'+0.00001)';
spif=spfm./spwm;
spif=spif(:,(1:nfr)+smb);
idx=max(0,f0i/fs*fftl);
fqv=zeros(nhmx,nfr);
vvv=zeros(nhmx,nfr);
iidx=(0:nfr-1)*(fftl/2+1)+1;
for ii=1:nhmx
    iidx=idx+iidx;
    vvv(ii,:)=(smmp(floor(iidx))+(iidx-floor(iidx)).*(smmp(floor(iidx)+1)-smmp(floor(iidx))))/(ii*ii);
    fqv(ii,:)=(spif(floor(iidx))+(iidx-floor(iidx)).*(spif(floor(iidx)+1)-spif(floor(iidx))))/2/pi/ii; % 29/July/199
end;
vvvf=1.0./sum(1.0./vvv);
f0r=sum(fqv./sqrt(vvv))./sum(1.0./sqrt(vvv)).*(f0raw>0);
ecr=sqrt(1.0./vvvf).*(f0raw>0)+(f0raw<=0);
end
%--------------------
function [c1,c2]=znrmlcf3(f)
n=100;
x=0:1/n:3;
g=GcBs(x,0);
dg=[diff(g) 0]*n;
dgs=dg/2/pi/f;
xx=2*pi*f*x;
c1=sum((xx.*dgs).^2)/n;
c2=sum((xx.^2.*dgs).^2)/n;
end
%---------------------
function p=GcBs(x,k)
tt=x+0.0000001;
p=tt.^k.*exp(-pi*tt.^2).*(sin(pi*tt+0.0001)./(pi*tt+0.0001)).^2;
end
%%%----------------
function [f0,pl]=zcombineRanking4(p,beta,wAC,wIF,prm)
% F0candidatesByIF: [2978x3 double]
% CNofcandidatesByIF: [2978x3 double]
% F0candidatesByAC: [2977x3 double]
% ACofcandidatesByAC: [2977x3 double]
% RefinedCN: [1x2977 double]
% FirstAutoCorrelation: [1x2977 double]
% F0initialEstimate: [2977x1 double]
% BackgroundNoiselevel: -69.1041
% InstantaneousPower: [1x2979 double]
f0floor=prm.F0searchLowerBound; % f0floor
f0ceil=prm.F0searchUpperBound; % f0ceil
n=min([length(p.F0candidatesByIF) length(p.F0candidatesByAC)]);
nvo=24;
nvc=ceil(log2(f0ceil/f0floor))*nvo;
fx=f0floor*2.0.^((0:nvc-1)/nvo);
lfx=log2(fx);
logf0if=log2(p.F0candidatesByIF);
logf0ac=log2(p.F0candidatesByAC);
relif=max(0.000000001,(p.CNofcandidatesByIF-min(p.CNofcandidatesByIF(:,1)))./ ...
    (max(p.CNofcandidatesByIF(:,1))-min(p.CNofcandidatesByIF(:,1))));
relac=max(0.000000001,((p.ACofcandidatesByAC-min(p.ACofcandidatesByAC(:,1)))./ ...
    (max(p.ACofcandidatesByAC(:,1))-min(p.ACofcandidatesByAC(:,1)))));
f0=zeros(n,6);
pl=zeros(n,6);
initv=0*lfx;
for ii=1:n
    IFmap=initv;
    ACmap=initv;
    for jj=1:3
        IFmap=IFmap+relif(ii,jj)^2*exp(-((logf0if(ii,jj)-lfx)/beta).^2); % 27/July/2004 ^2
        ACmap=ACmap+relac(ii,jj)^2*exp(-((logf0ac(ii,jj)-lfx)/beta).^2); % 27/July/2004 ^2
    end;
    f0map=sqrt(wIF*IFmap+wAC*ACmap)/sqrt(2); % 27/July/2004 Addition
    f0mapbak=f0map;
    ix=find((diff([f0map(1) f0map]).*diff([f0map f0map(end)]))<0);
    if ~isempty(ix)
        [~,mxp]=max(f0map(ix));
        [pl(ii,1),~]=zzParabolicInterp2(f0mapbak((-1:1)+ix(mxp)),ix(mxp));
        [~,idsrt]=sort(-f0map(ix));
        nix=length(ix);
        for jj=1:6
            if jj>nix
                pl(ii,jj)=pl(ii,jj-1);
                f0(ii,jj)=f0(ii,jj-1);
            else
                [pl(ii,jj),f0(ii,jj)]=zzParabolicInterp2(f0mapbak((-1:1)+ix((idsrt(jj)))),ix((idsrt(jj))));
            end;
        end;
    else
        pl(ii,1)=0;
    end;
end;
f0=f0floor*2.0.^(f0/nvo);
end
%-------
function [val,pos]=zzParabolicInterp2(yv,xo)
lp=diff(yv);
a=lp(1)-lp(2);
b=(lp(1)+lp(2))/2;
xp=b/a+xo;
val=yv(2)+0.5*a*(b/a)^2+b*(b/a);
pos=xp-1;
end
%%%---
function ac1=zfirstac(x,fs,ix,wlms)
wl=round(fs*wlms/1000);
fftl=2.0^ceil(log2(wl));
xx=x(:);
idx=ix+(1:wl)-round(wl/2);
xt=xx(min(length(xx),max(1,idx)));
if sum(abs(xt))<1e-20 % bug fix 11/Jan./2005
    xt=xt+randn(size(xt));
end;
fw=abs(fft(xt.*hanning(wl),fftl)).^2;
fx=(0:fftl-1)/fftl*fs;
[~,ixx]=min(abs(fx-4000));
if ixx>fftl/2
    ixx=fftl/2;
end;
c=cos(2*pi*fx(1:ixx)/(fx(ixx)*2));
ac1=sum(c'.*fw(1:ixx))/sum(fw(1:ixx));
end
%%%-------
function [f0,pl]=zmultiCandAC(lx,lagspec,beta,lagsp,timesp)
% F0 candidates extraction from time-lag representation
% using parabolic interpolation
% and a new harmonic suppression technique
% [f0,pl]=multiCandAC(lx,lagspec,beta,lagsp,timesp)
% lx : lag axis
% lagspec : time-lag representation
% beta : nonlinear distance measure
% lagsp : lag smoothing parameter (ms)
% timesp : temporal
%          smoothing parameter (ms)
% output parameters
% f0 : fundamental frequency (Hz)
% pl : peak level
% Designed and coded by Hideki Kawahara
% 30/March/2004
% 16/June/2004 Peak picking first
% 17/June/2004 Peak selection taking interaction into account
[nr,nc]=size(lagspec);
imm=diff([lagspec(1,:);lagspec]).*(diff([lagspec;lagspec(end,:)]));
dlag=diff([lagspec(1,:);lagspec]);
lagspecz=lagspec;
lagspecz(lx<0.002,:)=lagspecz(lx<0.002,:)-(ones(nc,1)*exp(-(lx(lx<0.002)/0.00055).^2))';
%------ Harmonic suppression
mapm=zgendeconvmatrix(nr,0.6);
lagspecz=log(exp((lagspecz-mapm*lagspecz)*20)+1)/20;
lagspec=lagspecz;
tls=[lagspecz;lagspecz(end,:);lagspecz(end:-1:2,:)].^beta;
llx=[lx lx(end) lx(end:-1:2)]';
lagw=exp(-(llx/(lagsp/1000)).^2); % This should be proportional to lag.
lagw=lagw/sum(lagw);
flagw=real(fft(lagw));
for ii=1:nc
    tls(:,ii)=real(ifft(fft(tls(:,ii)).*flagw));
end;
tmsm=round((timesp-1)/2)*2+1; % temporal smoothing (ms) assuming shift=1 ms
wt=hanning(tmsm);
wt=wt/sum(wt); % temporal smoothing using
lagsms=fftfilt(wt,[zeros(nr,tmsm) tls(1:nr,:) zeros(nr,tmsm)]')';
lagsms=lagsms(:,(1:nc)+(tmsm-1)/2*3);
lagsms=abs(lagsms).^(1/beta);
f0=zeros(nc,3);
pl=zeros(nc,3);
for ii=1:nc
    ix=find((imm(:,ii)<0)&(dlag(:,ii)>0));
    [~,mxp]=max(lagsms(ix,ii));
    [pl(ii,1),pos]=ParabolicInterp(lagspec((-1:1)+ix(mxp),ii),ix(mxp));
    f0(ii,1)=pos/lx(2);
    if length(ix)>1
        lagsms(ix(mxp),ii)=lagsms(ix(mxp),ii)*0;
        [~,mxp]=max(lagsms(ix,ii));
        [pl(ii,2),pos]=ParabolicInterp(lagspec((-1:1)+ix(mxp),ii),ix(mxp));
        f0(ii,2)=pos/lx(2);
        if length(ix)>2
            lagsms(ix(mxp),ii)=lagsms(ix(mxp),ii)*0;
            [~,mxp]=max(lagsms(ix,ii));
            [pl(ii,3),pos]=ParabolicInterp(lagspec((-1:1)+ix(mxp),ii),ix(mxp));
            f0(ii,3)=pos/lx(2);
        else
            f0(ii,3)=f0(ii,2);pl(ii,3)=pl(ii,2);
        end;
    else
        f0(ii,2)=f0(ii,1);pl(ii,2)=pl(ii,1);
    end;
end;
end
function [val,pos]=ParabolicInterp(yv,xo)
lp=diff(yv);
a=lp(1)-lp(2);
b=(lp(1)+lp(2))/2;
xp=b/a+xo;
val=yv(2)+0.5*a*(b/a)^2+b*(b/a);
if xp>max(xo)+1
    xp=max(xo)+1;
    val=yv(end);
end;
if xpnr
        ub=length(f0);
    else
        ub=min(length(f0),cseg(ii,1));
    end;
    if ii==1
        lb=1;
    elseif ii>nr
        lb=max(1,cseg(ii-1,2));
    else
        lb=round((cseg(ii,1)+cseg(ii-1,2))/2);
    end;
    bp=lb;ep=ub;
    f0raw3=f0raw0;
    f0raw3(ep)=f0(ep);%f0cand(acp,1);
    lastf0=f0raw3(ep);
    [f0raw3,~]=ztraceInAsegment2(f0raw3,f0cand,relv,pwrdb,ep,lastf0,bp,ep,nf0,f0jumpt,nsdt,noiselevel);
    f0raw0=f0raw3;
    f0c(bp:ep)=f0raw0(bp:ep);
    lb=cseg(ii,2);
    if iireliablepowerth;sprob=sprob+log2(dmy2);end;
        else
            f0raw0(jj)=lastf0+(1/(dd+1))*(f0cand(jjmx,idx)-lastf0);
            if pwsdb(jj)>reliablepowerth;sprob=sprob+log2(dmy2);end;%-0.1;end;
        end;
    else
        f0raw0(jj)=lastf0;
        if pwsdb(jj)>reliablepowerth;sprob=sprob+log2(dmy2)-10;end;
    end
    lastf0=f0raw0(jj);
end;
lastf0=lastf0in;%f0cand(acp,1);
for jj=acp+1:ub
    bsb=min(nn,jj:jj+5); % searching bound
    [dmy,idx]=max(exp(-((log2(f0cand(bsb,:)')-log2(lastf0))/nsd).^2).*(relv(bsb,:)'));
    [dmy2,idxx]=max(dmy);
    idx=idx(idxx);
    jjmx=bsb(idxx);
    if abs(log2(f0cand(jjmx,idx))-log2(lastf0))reliablepowerth;sprob=sprob+log2(dmy2);end;
        else
            f0raw0(jj)=lastf0+(1/(dd+1))*(f0cand(jjmx,idx)-lastf0);
            if pwsdb(jj)>reliablepowerth;sprob=sprob+log2(dmy2);end;%-0.1;end;
        end;
    else
        f0raw0(jj)=lastf0;
        if pwsdb(jj)>reliablepowerth;sprob=sprob+log2(dmy2)-10;end;
    end
    lastf0=f0raw0(jj);
end;
sprob=2.0.^(sprob/(ub-lb+1)); % fix (+1) on 11/Jan./05
end
%%%-----
function [f0,rel,cseg]=zcontiguousSegment10(p,prm)
% This version was revised from 808.
% Further refinement for additional scan
f0floor=prm.F0searchLowerBound; % f0floor
f0ceil=prm.F0searchUpperBound; % f0ceil
pwsdb=p.InstantaneousPower;
f0cand=p.F0candidatesByMix;
relv=p.RELofcandidatesByMix;
pwrdb=p.InstantaneousPower;
relv(relv==0)=relv(relv==0)+0.00001;
f0jumpt=prm.MaxumumPermissibleOctaveJump;
nsdt=prm.SDforTrackingNormalization;
nn=min(length(pwsdb),length(f0cand));
pwsdb=pwsdb(1:nn);
f0cand=f0cand(1:nn,:);
relv=relv(1:nn,:);
DispOn=prm.DisplayPlots;
%---- noiselevel
mxpwsdb=max(pwsdb);
[hstgrm,binlvl]=hist(pwsdb,mxpwsdb+(-60:2));
q10=interp1(cumsum(hstgrm+0.000000001)/sum(hstgrm)*100,binlvl,10); % 10% quantile level
[~,minid]=min(abs(q10-binlvl));
bb=max(1,min(length(binlvl),minid+(-5:5))); % search range 10 dB % safeguard
noiselevel=sum(hstgrm(bb).*binlvl(bb))/sum(hstgrm(bb));
wellovernoize=(4*noiselevel+mxpwsdb)/5;
if wellovernoize>mxpwsdb-10;
    wellovernoize=mxpwsdb-10;
    noiselevel=(5*wellovernoize-mxpwsdb)/4;
end; % safeguard 25/Sept./2004
%---- search for contiguous segments that consist of best candidates
f0=f0cand(:,1)*0;
rel=relv(:,1)*0;
maskr=f0cand*0+1; % masker for preventing multiple assignment
[dmy,idx]=sort(-relv(:,1));
idx=idx(-dmy>0.16);
if DispOn
    figure
    semilogy(f0cand(:,1),'c');grid on;
    axis([0 nn f0floor f0ceil]);
    hold on
    drawnow
end;
nseg=0;
segv=zeros(length(idx),2);
sratev=zeros(length(idx),1);
segstr = struct;
for ii=1:length(idx);
    if (maskr(idx(ii),1)>0) && (pwsdb(idx(ii))>wellovernoize)
        [f0seg,relseg,lb,ub,srate,maskr]=zsearchforContiguousSegment(f0cand,relv,maskr,idx(ii),pwsdb,noiselevel);
        if (~isempty(f0seg)) && (srate>0.12) && ((ub-lb+1)>13)
            nseg=nseg+1;
            segv(nseg,:)=[lb ub];
            segstr(nseg).f0Segment=f0seg(lb:ub);
            segstr(nseg).reliabilitySegment=relseg(lb:ub);
            sratev(nseg)=srate*(1-1/max(1.4,sqrt((ub-lb+1)/40))); % reliability with DF normalization
            if DispOn
                disp(['Segment (' num2str(lb,7) ':' num2str(ub,7) ') with rel=' num2str(srate)]);
                semilogy(lb:ub,f0seg(lb:ub));drawnow;
            end;
        end;
    end;
end;
segv=segv(1:nseg,:);
sratev=sratev(1:nseg); [~,idrel]=sort(-sratev); if DispOn hold off figure semilogy(f0cand(:,1),'c');grid on; axis([0 nn f0floor f0ceil]); hold on drawnow end; for ii=1:nseg icp=idrel(ii); lb=segv(icp,1); ub=segv(icp,2); validind=(sum(f0(lb:ub)>0)==0); if validind;f0(lb:ub)=segstr(icp).f0Segment;rel(lb:ub)=segstr(icp).reliabilitySegment;end; if DispOn && validind semilogy(lb:ub,segstr(icp).f0Segment);drawnow; end; end; %---- scan and reorganize segments InInd=0; cseg=zeros(nseg,2); crseg=0; for ii=1:nn if (InInd==0) && (f0(ii)>0) crseg=crseg+1; cseg(crseg,1)=ii; InInd=1; elseif (InInd==1) && (f0(ii)==0) || ((sum((ii-1)==segv(:,2))>0) && (pwsdb(ii)<(noiselevel+4*mxpwsdb)/5)) % mod 09/Aug./04 cseg(crseg,2)=ii-1; InInd=0; end; end; if cseg(crseg,2)==0;cseg(crseg,2)=nn;end; cseg=cseg(1:crseg,:); %---- check for each segment if it is contiguous enough nf0=length(f0); for ii=1:crseg lb=cseg(ii,1);ub=cseg(ii,2); maxjmp=max(abs(diff(log2(f0(lb:ub))))); if maxjmp>0.4 disp(['Discontinuity in (' num2str(lb,7) ':' num2str(ub,7) '), Max jump=' num2str(maxjmp,7) ' oct.']) f0raw0=f0; dmy=max(relv(lb:ub,:), 2); [~,ixmx]=max(dmy(:)); cpos=lb+ixmx-1; bp=lb;ep=ub; %%f0bak=f0raw0; f0raw1=f0raw0; f0raw2=f0raw0; f0raw3=f0raw0; f0raw0(cpos)=f0cand(cpos,1); lastf0=f0cand(cpos,1); [f0rawm,sprob0]=ztraceInAsegment2(f0raw0,f0cand,relv,pwrdb,cpos,lastf0,bp,ep,nf0,f0jumpt,nsdt,noiselevel); f0raw1(cpos)=f0cand(cpos,2);%f0cand(acp,1); lastf0=f0raw1(cpos); [f0raw1,sprob1]=ztraceInAsegment2(f0raw1,f0cand,relv,pwrdb,cpos,lastf0,bp,ep,nf0,f0jumpt,nsdt,noiselevel); f0raw2(bp)=f0(bp);%f0cand(acp,1); lastf0=f0raw2(bp); [f0raw2,sprob2]=ztraceInAsegment2(f0raw2,f0cand,relv,pwrdb,bp,lastf0,bp,ep,nf0,f0jumpt,nsdt,noiselevel); f0raw3(ep)=f0cand(ep,1);%f0cand(acp,1); lastf0=f0raw3(ep); [f0raw3,sprob3]=ztraceInAsegment2(f0raw3,f0cand,relv,pwrdb,ep,lastf0,bp,ep,nf0,f0jumpt,nsdt,noiselevel); [~,imx]=max([sprob0 sprob1 sprob2 sprob3]); switch imx case 1 f0raw0=f0rawm; case 2 f0raw0=f0raw1; case 3 f0raw0=f0raw2; 
case 4 f0raw0=f0raw3; end; f0(lb:ub)=f0raw0(lb:ub); end; end; %---- get robust distribution and check for anomalies [hgf0,~]=sort(log2(f0(f0>0))); id10=round(0.1*length(hgf0)); id90=round(0.9*length(hgf0)); rsd=std(hgf0(id10:id90)); mf0=mean(hgf0(id10:id90)); csego=cseg; cseg=cseg*0; nseg=0; f0o=f0; f0=f0*0; for ii=1:crseg lb=csego(ii,1);ub=csego(ii,2); if abs(mean(log2(f0o(lb:ub)))-mf0)(2*noiselevel+mxpwsdb)/3; nseg=nseg+1; cseg(nseg,1)=lb; cseg(nseg,2)=ub; f0(lb:ub)=f0o(lb:ub); end; end; cseg=cseg(1:nseg,:); %---- check for isolated small segments segv=cseg; cseg=cseg*0; nseg=0; f0bk=f0; f0=f0*0; f0bk(1)=1; lastend=1; [nrseg,~]=size(cseg); % bug fix, 31/Aug./2004 for ii=1:nrseg lb=segv(ii,1);ub=segv(ii,2); if ii0.6 ... && abs(log2(f0bk(ub))-log2(f0bk(nexttop)))>0.6 && (ub-lb+1)<50 && mean(relseg(lb:ub))<0.5 %do nothing else nseg=nseg+1; f0(lb:ub)=f0bk(lb:ub); cseg(nseg,:)=[lb ub]; end; lastend=ub; end; %---- check for dominant peaks if it is selected as a voiced segment %----- mark syllable centers wsml=81; pwsdbl=[ones(wsml,1)*pwsdb(1);pwsdb(:);ones(2*wsml,1)*pwsdb(end)]; pwsdbs=fftfilt(hanning(wsml)/sum(hanning(wsml)),pwsdbl); pwsdbs=pwsdbs((1:length(pwsdb))+round(3*wsml/2)); dpwsdbs=diff([pwsdbs(1);pwsdbs]); dpwsdbsm=diff([pwsdbs;pwsdbs(end)]); pv=find((dpwsdbs.*dpwsdbsm<0)&(dpwsdbsm<=0)); dv=find((dpwsdbs.*dpwsdbsm<0)&(dpwsdbsm>0)); % ------- f0raw0=f0cand(:,1); avf0=mean(f0(f0>0)); logavf0=log2(avf0); relv2=relv.*exp(-((log2(f0cand)-logavf0)/1).^2); reliablelevel=(noiselevel+2*mxpwsdb)/3; for ii=1:length(pv) if pwsdb(pv(ii)) > reliablelevel if f0(pv(ii))==0 disp(['Missing dominant segment that is centered at:' num2str(pv(ii),7) ' (ms)']); lb=max(dv(dvpv(ii))); if isempty(ub);ub=nn;end; bp=lb;ep=ub; peaklvl=pwsdb(pv(ii)); for bp=pv(ii)-1:-1:lb if pwsdb(bp) < peaklvl-9; break;end; end; for ep=pv(ii)+1:ub if pwsdb(ep) < peaklvl-9; break;end; end; disp(['segment (' num2str(bp,7) ':' num2str(ep,7) ') is isolated.']); lb=bp;ub=ep; mx=max(relv2(lb:ub,:), 2); 
[~,imx2]=max(mx(:)); cpos=lb+imx2-1; f0raw1=f0raw0; f0raw0(cpos)=f0cand(cpos,1); lastf0=f0cand(cpos,1); [f0rawm,sprob0]=ztraceInAsegment2(f0raw0,f0cand,relv2,pwrdb+10,cpos,lastf0,bp,ep,nf0,f0jumpt,nsdt,noiselevel); f0raw1(cpos)=f0cand(cpos,2);%f0cand(acp,1); lastf0=f0raw1(cpos); [f0raw1,sprob1]=ztraceInAsegment2(f0raw1,f0cand,relv2,pwrdb+10,cpos,lastf0,bp,ep,nf0,f0jumpt,nsdt,noiselevel); [~,imx]=max([sprob0 sprob1]); switch imx case 1 f0raw0=f0rawm; case 2 f0raw0=f0raw1; end; f0(lb:ub)=f0raw0(lb:ub); end; end; end; %---- final scan and reorganize segments InInd=0; cseg=zeros(nn,2); crseg=0; for ii=1:nn if (InInd==0) && (f0(ii)>0) crseg=crseg+1; cseg(crseg,1)=ii; InInd=1; elseif (InInd==1) && (f0(ii)==0) cseg(crseg,2)=ii-1; InInd=0; end; end; if cseg(crseg,2)==0;cseg(crseg,2)=nn;end; cseg=cseg(1:crseg,:); if DispOn h=semilogy(f0); set(h,'linewidth',2); end; end %---- internal functions function [f0seg,relseg,lb,ub,srate,maskr]=zsearchforContiguousSegment(f0cand,relv,maskrin,acp,pwsdb,noiselevel) f0seg=f0cand(:,1)*0; relseg=f0seg; srate=0; maskr=maskrin; ok=1; nn=length(f0seg); lastf0=f0cand(acp,1); f0seg(acp)=lastf0; relseg(acp)=relv(acp,1); lb=acp;ub=acp; for ii=acp-1:-1:1 [bestdistance,idx]=min(abs(log2(lastf0)-log2(f0cand(ii,:)))); if (bestdistance>0.1) || (pwsdb(ii)0.1) || (pwsdb(ii)(2*maxpwsdb+noiselevel)/3); [pv,~]=zpeakdipdetect(p,81); np=length(pv); nn=min(length(vuv),length(f0)); vuv=vuv*0; lastp=2; for ii=1:np if (pwsdb(pv(ii))>(1.2*maxpwsdb+noiselevel)/2.2) && (pv(ii)>lastp) lb=lastp; %max(dv(dvpv(ii))); cp=pv(ii); bp=cp;ep=cp; for bp=cp-1:-1:lb if (pwsdb(bp)<(maxpwsdb+2.3*noiselevel)/3.3) || ... ((pwsdb(bp)<(1.5*maxpwsdb+noiselevel)/2.5) && (rel(bp)<0.3)) || ... ((pwsdb(bp)<(1.5*maxpwsdb+noiselevel)/2.5) && (abs(log2(f0(bp)/f0(bp-1)))>0.1)) break end; end; [dmy,ix]=min(abs(onv-bp)); if dmy<20; bp=max(1,onv(ix)-biast);end; for ep=cp+1:ub %min(length(f0)-1,ub) % safe giard 11/Jan./05 if (pwsdb(ep)<(maxpwsdb+5*noiselevel)/6) || ... 
%         ((pwsdb(ep)<(maxpwsdb+1.3*noiselevel)/2.3) && (rel(ep)<0.25)) || ...
          ((pwsdb(ep)<(maxpwsdb+0.7*noiselevel)/1.7) && (abs(log2(f0(ep)/f0(ep+1)))>0.1))
        break;
      end;
    end;
    vuv(bp:ep)=vuv(bp:ep)*0+1;
    lastp=ep;
  end;
end;
end

%---- check for dominant peaks if it is selected as a voiced segment
function [pv,dv]=zpeakdipdetect(p,wsml)
pwsdb=p.InstantaneousPower;
%----- mark syllable centers
pwsdbl=[ones(wsml,1)*pwsdb(1);pwsdb(:);ones(2*wsml,1)*pwsdb(end)];
pwsdbs=fftfilt(hanning(wsml)/sum(hanning(wsml)),pwsdbl);
pwsdbs=pwsdbs((1:length(pwsdb))+round(3*wsml/2));
dpwsdbs=diff([pwsdbs(1);pwsdbs]);
dpwsdbsm=diff([pwsdbs;pwsdbs(end)]);
pv=find((dpwsdbs.*dpwsdbsm<0)&(dpwsdbsm<=0));
dv=find((dpwsdbs.*dpwsdbsm<0)&(dpwsdbsm>0));
end

================================================
FILE: src/ReadBinaryData.m
================================================
function data = ReadBinaryData(path_name)
data = [];
fid = fopen(path_name);
magic = int8('magic');
read_magic = fread(fid, 5, 'int8');
for ii = 1:5
  if magic(ii) ~= read_magic(ii)
    fclose(fid); % close the file before returning on a header mismatch
    return;
  end;
end;
n_row = fread(fid, 1, 'int32');
n_column = fread(fid, 1, 'int32');
data = zeros(n_row, n_column);
for ii = 1:n_row
  data(ii, :) = double(fread(fid, n_column, 'float32'));
end;
fclose(fid);
end

================================================
FILE: src/SynthesizeLegacy_STRAIGHT_default.m
================================================
function syntheszed_signal = SynthesizeLegacy_STRAIGHT_default(x, fs)
% Conditions are based on the web document
%
f0raw = MulticueF0v14(x,fs);
ap = exstraightAPind(x,fs,f0raw);
n3sgram=exstraightspec(x,f0raw,fs);
syntheszed_signal = exstraightsynth(f0raw,n3sgram,ap,fs);
end

================================================
FILE: src/TestAnalysisRegression.m
================================================
function output = TestAnalysisRegression(n_test)
if ~isOctave
  rng('shuffle'); % seed from the current time so that file selection varies between runs
end;
output = false;
original_speech_dir =
'/Users/kawahara/Music/VCTK_CORPUS/VCTK-Corpus/wav48/';
target_analysis_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/analysisData/';
target_wave_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/waveData/';
target_files = dir([target_wave_dir '*.wav']);
n_files = length(target_files);
selected_id = randi(n_files, n_test);
% Octave seeding commands (same as in TestCopySynthRegressionR);
% they are evaluated via eval so that MATLAB does not parse the double quotes.
command1 = 'rand("seed", 12345);';
command2 = 'randn("seed", 12345);';
for ii = 1:n_test
  tmp_name = target_files(selected_id(ii)).name;
  [x, fs] = audioread([original_speech_dir tmp_name(1:4) '/' tmp_name]);
  disp([num2str(ii) ': ' tmp_name ' ' datestr(now)]);
  if isOctave
    eval(command1);
    eval(command2);
  else
    rng(12345); % initialize frozen random number
  end;
  f0raw = MulticueF0v14(x, fs);
  ap = exstraightAPind(x, fs, f0raw);
  n3sgram=exstraightspec(x, f0raw, fs);
  tmp_name_root = tmp_name(1:end - 4);
  if ~CheckAnalysisData(f0raw, ap, n3sgram, target_analysis_dir, tmp_name_root)
    disp(['Failed: ' tmp_name ' data is not similar.']);
    return;
  end;
  %y = exstraightsynth(f0raw,n3sgram,ap,fs);
end;
disp(['Success! ' num2str(n_test) ' files are passed analysis regression.']);
output = true;
end

================================================
FILE: src/TestAnalysisRegressionR.m
================================================
function output = TestAnalysisRegressionR(n_test)
output = false;
original_speech_dir = '/Users/kawahara/Music/VCTK_CORPUS/VCTK-Corpus/wav48/';
target_analysis_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/analysisDataR/';
target_wave_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/waveDataR/';
target_files = dir([target_wave_dir '*.wav']);
n_files = length(target_files);
selected_id = randi(n_files, n_test);
for ii = 1:n_test
  tmp_name = target_files(selected_id(ii)).name;
  [x, fs] = audioread([original_speech_dir tmp_name(1:4) '/' tmp_name]);
  disp([num2str(ii) ': ' tmp_name ' ' datestr(now)]);
  if ~isOctave; rng(12345); end; % initialize frozen random number
  f0raw = MulticueF0v14(x, fs);
  ap = exstraightAPind(x, fs, f0raw);
  n3sgram=exstraightspec(x, f0raw, fs);
  tmp_name_root = tmp_name(1:end - 4);
  if
~CheckAnalysisData(f0raw, ap, n3sgram, target_analysis_dir, tmp_name_root) disp(['Failed: ' tmp_name ' data is not similar.']); return; end; %y = exstraightsynth(f0raw,n3sgram,ap,fs); end; disp(['Success! ' num2str(n_test) ' files are passed analysis regression.']); output = true; end ================================================ FILE: src/TestCopySynthRegression.m ================================================ function output = TestCopySynthRegression(n_test) output = false; original_speech_dir = '~/Music/VCTK_CORPUS/VCTK-Corpus/wav48/'; target_analysis_dir = '~/m-file/STRAIGHTV40_007e/analysisData/'; target_wave_dir = '~/m-file/STRAIGHTV40_007e/waveData/'; target_files = dir([target_wave_dir '*.wav']); n_files = length(target_files); selected_id = randi(n_files, n_test); for ii = 1:n_test tmp_name = target_files(selected_id(ii)).name; [x, fs] = audioread([original_speech_dir tmp_name(1:4) '/' tmp_name]); disp([num2str(ii) ': ' tmp_name ' ' datestr(now)]); rng(12345); % initialize frozen random number f0raw = MulticueF0v14(x, fs); ap = exstraightAPind(x, fs, f0raw); n3sgram=exstraightspec(x, f0raw, fs); tmp_name_root = tmp_name(1:end - 4); if ~CheckAnalysisData(f0raw, ap, n3sgram, target_analysis_dir, tmp_name_root) disp(['Failed: ' tmp_name ' data is not similar.']); end; wave_pathname = [target_wave_dir tmp_name]; [sy, fs] = audioread(wave_pathname); y = exstraightsynth(f0raw,n3sgram,ap,fs); if std(sy - y / max(abs(y)) * 0.9) / std(sy) > 10 ^ (-3) disp(['Failed! ' tmp_name ' copy synthesis test.']); keyboard return; end; end; disp(['Success! 
' num2str(n_test) ' files are passed copy-synth regression.']); output = true; end ================================================ FILE: src/TestCopySynthRegressionR.m ================================================ function output = TestCopySynthRegressionR(n_test) output = false; original_speech_dir = '/Users/kawahara/Music/VCTK_CORPUS/VCTK-Corpus/wav48/'; if isOctave target_analysis_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/analysisDataO/'; target_wave_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/waveDataO/'; else target_analysis_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/analysisDataR/'; target_wave_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/waveDataR/'; end; target_files = dir([target_wave_dir '*.wav']); n_files = length(target_files); selected_id = randi(n_files, n_test); command1 = 'rand("seed", 12345);'; command2 = 'randn("seed", 12345);'; for ii = 1:n_test tmp_name = target_files(selected_id(ii)).name; [x, fs] = audioread([original_speech_dir tmp_name(1:4) '/' tmp_name]); disp([num2str(ii) ': ' tmp_name ' ' datestr(now)]); if isOctave eval(command1); eval(command2); else rng(12345); % initialize frozen random number end; f0raw = MulticueF0v14(x, fs); ap = exstraightAPind(x, fs, f0raw); n3sgram=exstraightspec(x, f0raw, fs); tmp_name_root = tmp_name(1:end - 4); if ~CheckAnalysisData(f0raw, ap, n3sgram, target_analysis_dir, tmp_name_root) disp(['Failed: ' tmp_name ' data is not similar.']); end; wave_pathname = [target_wave_dir tmp_name]; [sy, fs] = audioread(wave_pathname); if isOctave eval(command1); eval(command2); else rng(12345); % initialize frozen random number end; y = exstraightsynth(f0raw,n3sgram,ap,fs); disp(['Relative error SD: ' num2str(100 * std(sy - y / max(abs(y)) * 0.9) / std(sy)) ' %']); if std(sy - y / max(abs(y)) * 0.9) / std(sy) > 10 ^ (-3) disp(['Failed! ' tmp_name ' copy synthesis test.']); keyboard return; end; end; disp(['Success! 
' num2str(n_test) ' files are passed copy-synth regression.']);
output = true;
end

================================================
FILE: src/WriteBinaryData.m
================================================
function WriteBinaryData(path_name, data)
[n_row, n_column] = size(data);
fid = fopen(path_name, 'w');
magic = int8('magic');
fwrite(fid, magic, 'int8');
fwrite(fid, int32(n_row), 'int32');
fwrite(fid, int32(n_column), 'int32');
for ii = 1:n_row
  fwrite(fid, single(data(ii, :)), 'float32');
end;
fclose(fid);
end

================================================
FILE: src/aiffread.m
================================================
function [x,fs]=aiffread(fname)
% function [x,fs]=aiffread(fname)
% Read AIFF and AIFF-C file
% This is a reduced version and does not fulfill the
% AIFF-C standard.
% Coded by Hideki Kawahara based on "Audio Interchange file format AIFF-C draft"
% by Apple Computer inc. 8/26/91
% 14/Feb./1998
% 17/Feb./1998
% 14/Jan./1999 bug fix for Windows

fid=fopen(fname,'r','ieee-be.l64');
id.form=fread(fid,4,'char');
id.formcksz=fread(fid,1,'int32');
id.formtp=fread(fid,4,'char');
x=[];fs=44100;
if ~strcmp(char(id.form),['F';'O';'R';'M'])
  char(id.form)
  disp('This is not a proper AIFF file.');
  fclose(fid); % do not leave the file open on an early return
  return;
end;
if ~strcmp(char(id.formtp),['A';'I';'F';'F']) && ~strcmp(char(id.formtp),['A';'I';'F';'C'])
  char(id.formtp)
  disp('This is not a proper AIFF file.');
  fclose(fid); % do not leave the file open on an early return
  return;
end;
[id.comm,na]=fread(fid,4,'uchar');
while na>3
  switch(strcat(char(id.comm)'))
    case 'FVER'
      id.fsize=fread(fid,1,'int32');
      id.timesta=fread(fid,1,'uint32');
      if id.timesta ~= 2726318400
        disp(['I cannot recognize timestamp ' num2str(id.timesta)]);
      end;
      [id.comm,na]=fread(fid,4,'uchar');
      if na==0
        if isempty(x); disp('End of file reached!');fclose(fid);return;end;
      end;
    case 'COMM'
      id.commsz=fread(fid,1,'int32');
      id.commnch=fread(fid,1,'int16');
      id.commdsz=fread(fid,1,'uint32');
      id.samplesize=fread(fid,1,'int16');
      id.srex1=fread(fid,1,'uint16');
      id.srex2=fread(fid,1,'uint64');
      if
strcmp(char(id.formtp),['A';'I';'F';'C'])
        id.compress=fread(fid,4,'char');
        if ~strcmp(char(id.compress),['N';'O';'N';'E'])
          disp('Compression is not supported.');
          fclose(fid); % do not leave the file open on an early return
          return;
        end;
        fread(fid,id.commsz-22,'char');
      end;
      fs=2^(id.srex1-16383)*id.srex2/hex2dec('8000000000000000');
      [id.comm,na]=fread(fid,4,'uchar');
      if na==0
        if isempty(x); disp('End of file reached!');fclose(fid);return;end;
      end;
    case 'SSND'
      id.ckdatasize=fread(fid,1,'uint32');
      id.offset=fread(fid,1,'int32');
      id.blksz=fread(fid,1,'int32');
      switch(id.samplesize)
        case 8
          x=fread(fid,id.ckdatasize-8,'int8');
          % reshape by the number of sample frames (commdsz), not the COMM chunk size (commsz)
          x=reshape(x,id.commnch,id.commdsz)';
        case 16
          x=fread(fid,(id.ckdatasize-8)/2,'int16');
          x=reshape(x,id.commnch,id.commdsz)';
        case 24
          x=fread(fid,(id.ckdatasize-8)/3,'bit24');
          x=reshape(x,id.commnch,id.commdsz)';
      end;
      [id.comm,na]=fread(fid,4,'uchar');
      if na==0
        if isempty(x); disp('End of file reached!');fclose(fid);return;end;
      end;
    otherwise
      id.fsize=fread(fid,1,'int32');
      if feof(fid) || id.fsize > id.formcksz || id.fsize <=0
        fclose(fid);
        return;
      end;
      id.skip=fread(fid,id.fsize,'char');
      [id.comm,na]=fread(fid,4,'uchar');
      if na==0
        if isempty(x); disp('End of file reached!');fclose(fid);return;end;
      end;
  end;
end;
%id
fclose(fid);

================================================
FILE: src/aiffwrite.m
================================================
function ok=aiffwrite(x,fs,nbits,fname)
% function ok=aiffwrite(x,fs,nbits,fname)
% Write AIFF file
% This is a reduced version and does not fulfill the
% AIFF standard.
% Coded by Hideki Kawahara based on "Audio Interchange file format AIFF-C draft"
% by Apple Computer inc.
% 8/26/91
% 14/Feb./1998
% 14/Jan./1999 bug fix for Windows

ok=1;
[nr,nc]=size(x);
if nc>nr
  ok=[];
  disp('Data must be a set of column vectors.');
  return;
end;
nex=floor(log(fs)/log(2));
vv=fs/2^(nex+1)*2^(4*16);
nex2=nex+16383;
fid=fopen(fname,'w','ieee-be.l64');
fwrite(fid,'FORM','char');
cksize=46+nr*nc*(nbits/8);
fwrite(fid,cksize,'int32');
fwrite(fid,'AIFF','char');
fwrite(fid,'COMM','char');
fwrite(fid,18,'int32');
fwrite(fid,nc,'int16');
fwrite(fid,nr,'int32');
fwrite(fid,nbits,'int16');
fwrite(fid,nex2,'uint16');
fwrite(fid,vv,'uint64');
fwrite(fid,'SSND','char');
fwrite(fid,nr*nc*(nbits/8)+8,'int32');
fwrite(fid,0,'int32');
fwrite(fid,0,'int32');
y=x';
switch(nbits)
  case 8
    fwrite(fid,y(:),'int8');
  case 16
    fwrite(fid,y(:),'int16');
  case 24
    fwrite(fid,y(:),'bit24');
end;
fclose(fid);

================================================
FILE: src/aperiodiccomp.m
================================================
function ap=aperiodiccomp(apv,dpv,ashift,f0,nshift,imgi)
% ap=aperiodiccomp(apv,dpv,ashift,f0,nshift,imgi);
% Calculate aperiodicity index
% Input parameters
% apv, dpv : Upper and lower envelope
% ashift : shift step for aperiodicity index calculation (ms)
% f0 : fundamental frequency (Hz)
% nshift : shift step for f0 information (ms)
% imgi : display indicator, 1: display on (default) 0: off
% modified to add the waitbar on 08/Dec./2002
% modified by Takahashi 10/Aug./2005
% modified by Kawahara 10/Sept./2005

if nargin==5; imgi=1; end;
%[nn,mm]=size(nsgram);
mm=length(f0);
%%nn=fftl/2+1;
[~,m2]=size(apv);
x=(0:m2-1)'*ashift;
xi=(0:mm-1)'*nshift;
xi=min(max(x),xi);
if imgi==1; hpg=waitbar(0.1,'Interpolating periodicity information'); end;
if imgi==1; drawnow; end;
%ap=interp1q(x,(dpv-apv)',xi)';%,'*linear')';
ap = interp1(x, (dpv-apv)',xi, 'linear', 'extrap')';
if imgi==1; close(hpg); end;

================================================
FILE: src/aperiodicpartERB2.m
================================================
function
[apv,dpv,apve,dpve]=aperiodicpartERB2(x,fs,f0,shiftm,intshiftm,mm,imgi) % Relative aperiodic energy estimation with ERB smoothing % [apv,dpv,apve,dpve]=aperiodicpartERB2(x,fs,f0,shiftm,intshiftm,mm,imgi) % x : input speech % fs : sampling frequency (Hz) % f0 : fundamental frequency (Hz) % shiftm : frame shift (ms) for input F0 data % intshiftm : frame shift (ms) for internal processing % mm : length of frequency axis (usually 2^N+1) % imgi : display indicator, 1: display on (default) 0: off % 19/August/1999 % 21/August/1999 % 30/May/2001 % 10/April/2002 completely rewrote % 07/Dec./2002 waitbar was added % 13/Jan./2005 bug fix % 08/April/2005 safe guard % 10/Aug./2005 modified by Takahashi on wait bar % 10/Sept./2005 modified by Kawahara on wait bar % 16/Sept./2005 minor bug fix if nargin==6; imgi=1; end; % 10/Sept./2005 if imgi==1; hpg=waitbar(0,'ERB-based multiband periodicity calculation'); end; f0(isnan(f0)>0)=zeros(size(f0(isnan(f0)>0))); % safe guard lowerF0limit = 40; % safe guard 16/Sept./2005 fftl=2.0^ceil(log2(6.7*fs/lowerF0limit)+1); % FFT size selection to be scalable if ~isempty(f0(f0>0));avf0=mean(f0(f0>0));else avf0=180;end; % 08/April/2005 %%f0bk=f0; f0(f0==0)=f0(f0==0)+avf0; f0(f00); wcc=fftfilt(wb,[zeros(1,fftl),w,zeros(1,fftl)]); wcc=wcc/max(wcc); [~,mxp]=max(wcc); wcc=wcc-wcc(1); wcc=wcc/sum(wcc); ww=wcc(round((1:fftl)-fftl/2+mxp))'; bb=(1:fftl)-fftl/2; %----- spectrum smoother design fff=[2:fftl 1]; ffb=[fftl 1:fftl-1]; %----- lifter design qx=(0:fftl-1)/fs; lft=1.0./(1+exp((qx-1.4/40)*1000))'; lft(fftl:-1:fftl/2)=lft(2:fftl/2+2); %------ preparation for EREB smoothing evv=(0:1024)/1024*HzToErbRate(fs/2); % ERB axis for smoothing eew=1; % effective smoothing width in ERB lh=round(2*eew/evv(2)); % number of samples for 2*eew on evv axis we=hanning(lh)/sum(hanning(lh)); % Hanning window is used for smoothing bx=(1:length(evv)); % index for extraction hvv=228.8*(10.0.^(0.0467*evv)-1); % frequency axis represented in Hz hvv(1)=0; hvv(end)=fs/2; % 
safeguard evx=(0:0.5:max(evv)); bss=(1:fftl/2-1); bss2=1:fftl/2; apv=zeros(mm,length(tidx)); dpv=zeros(mm,length(tidx)); apve=zeros(length(evx),length(tidx)); dpve=apve; for ii=1:length(tidx); idp=round(tidx(ii))+bias; sw=abs(fft(xii(idp+bb).*ww)); sws=(sw*2+sw(ffb)+sw(fff))/4; sms=real(ifft(real(fft(log(sws))).*lft))/log(10)*20; %smoothed dB spectrum plits=[0; (((diff(sms(bss2)).*diff(sms(bss2+1)))<0).*sms(bss).*(diff(sms(bss2))>0))]; dlits=[0; (((diff(sms(bss2)).*diff(sms(bss2+1)))<0).*sms(bss).*(diff(sms(bss2))<0))]; gg=fxfi(abs(plits)>0); gfg=(sms(abs(plits)>0)); dd=fxfi(abs(dlits)>0); dfd=(sms(abs(dlits)>0)); gga=[0;gg;fs/2]*f0ii(round(tidx(ii)))/40; dda=[0;dd;fs/2]*f0ii(round(tidx(ii)))/40; dfda=[dfd(1) ;dfd ;dfd(end)]; % dip level (dB) gfga=[gfg(1); gfg ;gfg(end)]; % peak level (dB) dfdap=10.0.^(dfda/10); % dip level (power) gfgap=10.0.^(gfga/10); % peak level (power) ape=interp1(HzToErbRate(gga),gfgap,evv); % Upper power envelope on ERB dpe=interp1(HzToErbRate(dda),dfdap,evv); % Lower power envelope on ERB apef=[ape(lh:-1:2) ape ape(end-1:-1:end-lh)]; % ape with mirrored ends dpef=[dpe(lh:-1:2) dpe dpe(end-1:-1:end-lh)]; % dpe with mirrored ends apefs=fftfilt(we,apef); % smoothed ape dpefs=fftfilt(we,dpef); % smoothed dpe apefs=apefs(bx+lh-1+round(lh/2)); dpefs=dpefs(bx+lh-1+round(lh/2)); apr=interp1(hvv,apefs,fxa); % smoothed ape on linear axis dpr=interp1(hvv,dpefs,fxa); % smoothed dpe on linear axis dpv(:,ii)=dpr'; apv(:,ii)=apr'; dpve(:,ii)=interp1(evv,dpefs,evx)'; apve(:,ii)=interp1(evv,apefs,evx)'; if imgi==1 && rem(ii,2)==0 %10/Aug./2005 waitbar(0.1+0.9*ii/length(tidx)); %,hpg); end; end; if imgi==1; fprintf('\n'); end;%10/Aug./2005 if imgi==1; close(hpg); end;%10/Aug./2005 ================================================ FILE: src/boundmes2.m ================================================ function bv=boundmes2(apv,dpv,fs,shiftm,intshiftm,mm) % boundary calculation for MBE model % bv=boundmes2(apv,dpv,fs,shiftm,intshiftm,mm); % apv : peak envelope 
% dpv : dip envelope
% fs : sampling frequency (Hz)
% shiftm : frame shift of F0 data (ms)
% intshiftm : frame shift for envelope data (ms)
% mm : number of elements in frequency axis
% 01/Sept./1999
% by Hideki Kawahara

lx=log10((1:mm-1)/(mm-1)/2*fs);
fx=(1:mm-1)/(mm-1)/2*fs;
wwv=10.0.^(apv/20);
lyv=((dpv-apv)/20);
[~,kk]=size(apv);
bv=zeros(1,kk);
for ii=1:kk
  bv(ii)=sum((lyv(2:mm,ii)'-lx).*wwv(2:mm,ii)'./fx)/sum(wwv(2:mm,ii)./fx');
end;
% Assuming shiftm >= 1 ms
if ne(round(shiftm),shiftm)
  bv=[];
  return;
end;
if ne(round(intshiftm),intshiftm)
  bv=[];
  return;
end;
if shiftm==intshiftm
  return;
end;
if intshiftm>1
  bv=interp(bv,intshiftm);
  if shiftm>1
    bv=bv(1:shiftm:length(bv));
  end;
end;

================================================
FILE: src/correctdpv.m
================================================
function dpv=correctdpv(apv,dpv,shiftap,f0raw,ecrt,shiftm,fs)
% dpv=correctdpv(apv,dpv,shiftap,f0raw,ecrt,shiftm,fs)
% Aperiodicity correction based on C/N estimation
% dpv : lower spectral envelope
% apv : upper spectral envelope
% shiftap : frame shift for apv and dpv (ms)
% f0raw : fundamental frequency (Hz)
% ecrt : C/N (absolute value)
% shiftm : frame shift for F0 and spectrum (ms)
% fs : sampling frequency (Hz)
% Designed and coded by Hideki Kawahara
% 04/Feb./2003
% 30/April/2005 modification for Matlab v7.0 compatibility

[nn,mm]=size(apv);
nf0=length(f0raw);
fx=(0:nn-1)/(nn-1)/2*fs;
f0raw(f0raw==0)=f0raw(f0raw==0)+40; % safe guard
for ii=1:mm
  iif=min(nf0,round((ii-1)*shiftap/shiftm)+1);
  if ~isnan(ecrt(iif))
    bdr=1.0./(1+exp(-(fx-2.5*f0raw(iif))/f0raw(iif)*4));
    bdr=(bdr+1.0/ecrt(iif))/(1+1.0/ecrt(iif));
    dpv(:,ii)=min(dpv(:,ii),apv(:,ii)+20*log10(bdr(:)));
  end;
end;

================================================
FILE: src/defaultparamsorg.m
================================================
function ok=defaultparamsorg
% function to define default parameters.
% Please copy this file as defaultparams.m and edit
% necessary parameters.
% If defaultparams.m exists, definitions in defaultparams.m % override original default parameters. % 08/Dec./2002 by H.K. global f0floor f0ceil fs framem shiftm f0shiftm ... fftl eta pc framel fftl2 acth pwth pcnv fconv sconv delsp gdbw cornf fname delfracind ... tpath mag delfrac hr upsampleon defaultch % paraminitialized f0floor=40; % Lower limit of F0 search range f0ceil=800; % Upper limit of F0 search range fs=22050; % sampling frequency (Hz) framem=40; % default frame length limit for pitch extraction (ms) shiftm=1; % default frame shift (ms) for spectrogram f0shiftm=1; % default frame shift (ms) for F0 information fftl=1024; % default FFT length eta=1.4; % time window stretch factor pc=0.6; % exponent for nonlinearity mag=0.2; % This parameter should be revised. framel=framem*fs/1000; if fftl < framel fftl=2^ceil(log(framel)/log(2)); end; fftl2=fftl/2; defaultch=1; % 17/Feb./2001 %-------------- Decision parameter for source information acth=0.5; % Threshold for normalized correlation (dimension less) pwth=32; % Threshold for instantaneous power below maximum (dB) %----------------------------------------------------- % Synthesis parameters %----------------------------------------------------- pcnv=1.0; % pitch stretch fconv=1.0; % frequency stretch sconv=1.0; % time stretch % delsp=2; % standard deviation of random group delay in ms delsp=0.5; % standard deviation of random group delay in ms 26/June/2002 gdbw=70; % smoothing window length of random group delay (in Hz) % cornf=3000; % corner frequency for random phase (Hz) cornf=4000; % corner frequency for random phase (Hz) 26/June 2002 delfrac=0.2; % This parameter should be revised. 
delfracind=0;
%-----------------------------------------------------
% file parameters
%-----------------------------------------------------
fname='none'; % input data file name
hr='on';
tpath=pwd;
if strcmp(computer,'MAC2')==0
  tpath=[tpath '/'];
end;
upsampleon=0;
ok=1;
return;

================================================
FILE: src/exSinStraightSynth.m
================================================
function [sy,prmS] = exSinStraightSynth(f0raw,n3sgram,fs,optionalParamsS)
% STRAIGHT synthesis based on sinusoidal plus noise model
% [sy,prmS] = exSinStraightSynth(f0raw,n3sgram,fs,optionalParamsS)
% Input
% f0raw : fundamental frequency (Hz)
% n3sgram : STRAIGHT spectrogram
% fs : sampling frequency
% optionalParamsS : optional parameters
% spectralUpdateInterval : frame update interval (ms)
% initialPhase : initial phase of sinusoids (radian)
% initialAmplitude : initial amplitude for defining waveform
% lowestF0 : lowest F0 of the synthesized speech (Hz)
% minimumPhase : minimum phase indicator (default 0)
% Output
% sy : synthesized speech waveform
% prmS : parameters used in synthesis

% Originally coded when visiting CNBH in 2003
% Revised by Hideki Kawahara
% 11/December/2005 by Hideki Kawahara

sy = [];
switch nargin
  case 3
    prmS = zinitializeParameters(fs);
  case 4
    prmS = replaceSuppliedParameters(fs,optionalParamsS);
  otherwise
    help exSinStraightSynth
    fs = 44100;
    prmS = zinitializeParameters(fs);
    return;
end;
shiftm = prmS.spectralUpdateInterval;
initialPhase = prmS.initialPhase;
initialAmplitude = prmS.initialAmplitude;
minimumPhase = prmS.minimumPhase;

cdm = unwrap(zspectrum2minimumphase(n3sgram,fs));
[amx,fmx,cmx]= sinucompgd(f0raw,fs,n3sgram,cdm,shiftm);
amx(isnan(amx))=0;
cmx(isnan(cmx))=0;
deltaPhase = 2*pi*fmx/fs;
phaseDeviation = cmx*minimumPhase;
[~,nFrequency] = size(deltaPhase);
lPhaseVector = length(initialPhase);
deltaPhase(1,:) = initialPhase(min(lPhaseVector,1:nFrequency))+deltaPhase(1,:);
amx =
amx*diag(initialAmplitude(min(lPhaseVector,1:nFrequency))); sy=sum(real(amx.*exp(1i*(cumsum(deltaPhase)+phaseDeviation))), 2); return; %%%---- internal functions function prmS = zinitializeParameters(fs) prmS.spectralUpdateInterval = 1; %shiftm=1; % default frame shift (ms) for spectrogram prmS.lowestF0 = 50; % compatible default is 50 Hz prmS.initialPhase = zeros(1,ceil(fs/prmS.lowestF0/2)); prmS.initialAmplitude = ones(1,ceil(fs/prmS.lowestF0/2)); prmS.minimumPhase = 0; % default is zero phase prmS.samplingFrequency = fs; return; %%%---- function prmS = replaceSuppliedParameters(fs,prmin) prmS = zinitializeParameters(fs); if isfield(prmin,'spectralUpdateInterval')==1; prmS.spectralUpdateInterval=prmin.spectralUpdateInterval;end; if isfield(prmin,'lowestF0')==1; prmS.lowestF0=prmin.lowestF0;end; if isfield(prmin,'initialPhase')==1; prmS.initialPhase=prmin.initialPhase;end; if isfield(prmin,'initialAmplitude')==1; prmS.initialAmplitude=prmin.initialAmplitude;end; if isfield(prmin,'minimumPhase')==1; prmS.minimumPhase=prmin.minimumPhase;end; return; %%%---- function [amx,fmx,cmx]= sinucompgd(f0raw,fs,n3sgram,cdm,shiftm) % [amx,fmx]=sinucomp(f0raw,fs,n3sgram,shiftm) % program to generate matrix for sinusoidal synthesis % % Designed and Coded by Hideki Kawahara % 07/Sept./2003 t=0:1/fs:(length(f0raw)-1)/1000/shiftm; f0i=interp1((0:length(f0raw)-1)/1000/shiftm,f0raw,t)'; f0l=min(f0raw(f0raw>0)); ng=n3sgram'; ng(:,1) = ng(:,1)*0; [~,mm]=size(ng); % ---- instantaneous frequency matrix --- nh=ceil(fs/2/f0l); nt=length(f0i); fmx=zeros(nt,nh); tmx=fmx; for ii=0:nh-1 fmx(:,ii+1)=ii*f0i; tmx(:,ii+1)=t'; end; % ---- instantaneous amplitude matrix --- [ff,tt]=meshgrid((0:(mm-1))*fs/((mm-1)*2),(0:(length(f0raw)-1))/1000/shiftm); amx=interp2(ff,tt,ng,fmx,tmx,'*linear'); cmx=interp2(ff,ff,cdm',fmx,fmx,'*linear'); return; %%%--- function cph=zspectrum2minimumphase(n3sgram,~) % cph=spectrum2minimumphase(n3sgram,fs) % function to calculate minimum phase map from % smoothed time 
frequency representation % Designed and coded by Hideki Kawahara % 7/Sept./2003 % 11/Dec./2005 revised [nRow,nColumn]=size(n3sgram); fftl=(nRow-1)*2; reversedIndex=fftl/2:-1:2; cph=zeros(nRow,nColumn); for ii=1:nColumn dftSegment=[n3sgram(:,ii);n3sgram(reversedIndex,ii)]; complexCepstrum=real(fft(log(dftSegment))); causalCepstrum=[complexCepstrum(1);2*complexCepstrum(2:fftl/2);0*complexCepstrum(fftl/2+1:fftl)]; causalLogSpectrum=ifft(causalCepstrum); cph(:,ii)=-imag(causalLogSpectrum(1:fftl/2+1)); end; ================================================ FILE: src/exSinStraightSynthBU.m ================================================ function sy = exSinStraightSynth(f0raw,fs,n3sgram,shiftm) gdm=gdmap(n3sgram,fs); [amx,fmx,gmx]= sinucompgd(f0raw,fs,n3sgram,gdm,shiftm); amx(isnan(amx))=0; sy=sum(amx'.*cos(cumsum(2*pi*fmx/fs))'); function [amx,fmx,gmx]= sinucompgd(f0raw,fs,n3sgram,gdm,shiftm) % [amx,fmx]=sinucomp(f0raw,fs,n3sgram,shiftm) % program to generate matrix for sinusoidal synthesis % % Designed and Coded by Hideki Kawahara % 07/Sept./2003 t=0:1/fs:(length(f0raw)-1)/1000/shiftm; f0i=interp1((0:length(f0raw)-1)/1000/shiftm,f0raw,t)'; f0l=min(f0raw(f0raw>0)); ng=n3sgram'; ng(:,1) = ng(:,1)*0; gd=gdm'; [nn,mm]=size(ng); % ---- instantaneous frequency matrix --- nh=ceil(fs/2/f0l); nt=length(f0i); fmx=zeros(nt,nh); tmx=fmx; for ii=1:nh fmx(:,ii)=ii*f0i; tmx(:,ii)=t'; end; % ---- instantaneous amplitude matrix --- amx=zeros(nt,nh); [ff,tt]=meshgrid((0:(mm-1))*fs/((mm-1)*2),(0:(length(f0raw)-1))/1000/shiftm); %keyboard; amx=interp2(ff,tt,ng,fmx,tmx); gmx=interp2(ff,tt,gd,fmx,tmx); function gdm=gdmap(n3sgram,fs) % gdm=gdmap(n3sgram,fs) % function to calculate group delay map from % smoothed time frequency representation % Designed and coded by Hideki Kawahara % 7/Sept./2003 [nn,mm]=size(n3sgram); fftl=(nn-1)*2; rbb2=fftl/2:-1:2; gdm=zeros(nn,mm); for ii=1:mm ff=[n3sgram(:,ii);n3sgram(rbb2,ii)]; ccp=real(fft(log(ff))); ccp2=[ccp(1);2*ccp(2:fftl/2);0*ccp(fftl/2+1:fftl)]; 
ffx=(-ifft(ccp2)); gdt=-diff(imag(ffx)/(2*pi*fs/fftl)); gdm(:,ii)=[gdt(1);gdt(1:fftl/2)]; end; ================================================ FILE: src/exSinStraightSynthBU2.m ================================================ function [sy,prmS] = exSinStraightSynth(f0raw,n3sgram,fs,optionalParamsS) % STRAIGHT synthesis based on sinusoidal plus noise model % [sy,prmS] = exSinStraightSynth(f0raw,n3sgram,ap,fs,optionalParams) % Input % f0raw : fundamental frequency (Hz) % n3sgram : STRAIGHT spectrogram % fs : sampling frequency % optionalParamsS : optional parameters % spectralUpdateInterval : frame rate (ms) % initialPhase : initial phase of sinusoids (radian) % lowestF0 : lowest F0 of the synthesized speech (Hz) % Output % sy : synthesized speech waveform % prmS : parameters used in synthesis % Originally coded when visiting CNBH on 2003 % Revised by Hideki Kawahara % 10/December/2005 by Hideki Kawahara switch nargin case 3 prmS = zinitializeParameters(fs); case 4 prmS = replaceSuppliedParameters(fs,optionalParamsS); end; shiftm = prmS.spectralUpdateInterval; initialPhase = prmS.initialPhase; initialAmplitude = prmS.initialAmplitude; lowestF0 = prmS.lowestF0; % compatible default is 50 Hz minimumPhase = prmS.minimumPhase; %[groupDelayMap,cdm]=spectrum2GroupDelay(n3sgram,fs); cdm =spectrum2minimumphase(n3sgram,fs); [amx,fmx,cmx]= sinucompgd(f0raw,fs,n3sgram,cdm,shiftm); amx(isnan(amx))=0; %gmx(isnan(gmx))=0; cmx(isnan(cmx))=0; deltaPhase = 2*pi*fmx/fs; %phaseDeviation = -2*pi*gmx.*fmx*minimumPhase; phaseDeviation = cmx*minimumPhase; [nTime,nFrequency] = size(deltaPhase); lPhaseVector = length(initialPhase); deltaPhase(1,:) = initialPhase(min(lPhaseVector,1:nFrequency))+deltaPhase(1,:); amx = amx*diag(initialAmplitude(min(lPhaseVector,1:nFrequency))); sy=sum(real(amx.*exp(i*(cumsum(deltaPhase)+phaseDeviation)))'); %%%---- internal functions function prmS = zinitializeParameters(fs); prmS.spectralUpdateInterval = 1; %shiftm=1; % default frame shift (ms) for 
spectrogram prmS.lowestF0 = 50; % compatible default is 50 Hz prmS.initialPhase = zeros(1,ceil(fs/prmS.lowestF0/2)); prmS.initialAmplitude = ones(1,ceil(fs/prmS.lowestF0/2)); prmS.minimumPhase = 0; % default is zero phase return; %%%---- function prmS = replaceSuppliedParameters(fs,prmin); prmS = zinitializeParameters(fs); if isfield(prmin,'spectralUpdateInterval')==1; prmS.spectralUpdateInterval=prmin.spectralUpdateInterval;end; if isfield(prmin,'lowestF0')==1; prmS.lowestF0=prmin.lowestF0;end; if isfield(prmin,'initialPhase')==1; prmS.initialPhase=prmin.initialPhase;end; if isfield(prmin,'initialAmplitude')==1; prmS.initialAmplitude=prmin.initialAmplitude;end; if isfield(prmin,'minimumPhase')==1; prmS.minimumPhase=prmin.minimumPhase;end; return; %%%---- function [amx,fmx,cmx]= sinucompgd(f0raw,fs,n3sgram,cdm,shiftm) % [amx,fmx]=sinucomp(f0raw,fs,n3sgram,shiftm) % program to generate matrix for sinusoidal synthesis % % Designed and Coded by Hideki Kawahara % 07/Sept./2003 t=0:1/fs:(length(f0raw)-1)/1000/shiftm; f0i=interp1((0:length(f0raw)-1)/1000/shiftm,f0raw,t)'; f0l=min(f0raw(f0raw>0)); ng=n3sgram'; ng(:,1) = ng(:,1)*0; %gd=gdm'; [nn,mm]=size(ng); % ---- instantaneous frequency matrix --- nh=ceil(fs/2/f0l); nt=length(f0i); fmx=zeros(nt,nh); tmx=fmx; for ii=0:nh-1 fmx(:,ii+1)=ii*f0i; tmx(:,ii+1)=t'; end; % ---- instantaneous amplitude matrix --- amx=zeros(nt,nh); [ff,tt]=meshgrid((0:(mm-1))*fs/((mm-1)*2),(0:(length(f0raw)-1))/1000/shiftm); %keyboard; amx=interp2(ff,tt,ng,fmx,tmx,'*linear'); %gmx=interp2(ff,tt,gd,fmx,tmx,'*linear'); cmx=interp2(ff,ff,cdm',fmx,fmx,'*linear'); return; %%%--- function cph=spectrum2minimumphase(n3sgram,fs) % gdm=spectrum2GroupDelay(n3sgram,fs) % function to calculate group delay map from % smoothed time frequency representation % Designed and coded by Hideki Kawahara % 7/Sept./2003 [nRow,nColumn]=size(n3sgram); fftl=(nRow-1)*2; reversedIndex=fftl/2:-1:2; %gdm=zeros(nRow,nColumn); cph=zeros(nRow,nColumn); for ii=1:nColumn 
    dftSegment=[n3sgram(:,ii);n3sgram(reversedIndex,ii)];
    complexCepstrum=real(fft(log(dftSegment)));
    causalCepstrum=[complexCepstrum(1);2*complexCepstrum(2:fftl/2);0*complexCepstrum(fftl/2+1:fftl)];
    causalLogSpectrum=ifft(causalCepstrum);
    % rawGroupDelay=-diff(-imag(causalLogSpectrum)/(2*pi*fs/fftl));
    % gdm(:,ii)=[rawGroupDelay(1);rawGroupDelay(1:fftl/2)];
    cph(:,ii)=-imag(causalLogSpectrum(1:fftl/2+1));
end;

================================================
FILE: src/exstraightAPind.m
================================================
function [ap,analysisParams]=exstraightAPind(x,fs,f0,optionalParams)
%   Aperiodicity index extraction for STRAIGHT
%   [ap,analysisParams]=exstraightAPind(x,fs,f0,optionalParams)
%   Input parameters
%   x   : input signal. If it is multichannel, only the first channel is used
%   fs  : sampling frequency (Hz)
%   f0  : fundamental frequency (Hz)
%   optionalParams : Optional parameters for analysis
%   Output parameters
%   ap  : amount of aperiodic component in the time-frequency representation
%       : represented in dB
%   analysisParams : Analysis parameters actually used
%
%   Usage:
%   Case 1: The simplest method
%   ap=exstraightAPind(x,fs,f0);
%   Case 2: You can get to know what parameters were used.
%   [ap,analysisParams]=exstraightAPind(x,fs,f0);
%   Case 3: You can have full control of STRAIGHT analysis.
%       Please use case 2 to find desired parameters to modify.
%   [ap,analysisParams]=exstraightAPind(x,fs,f0,optionalParams);
%   Notes on programming style
%   This routine is based on the current (2005.1.31) implementation of
%   STRAIGHT, which consists of many legacy fragments. They were intentionally
%   kept to maintain the historical record. Revised functions written in a
%   reasonable stylistic practice will be made available soon.
%   Designed and coded by Hideki Kawahara
%   15/January/2005
%   01/February/2005 extended for user control
%   13/March/2005 Aperiodicity index extraction part is isolated
%   30/April/2005 modification for Matlab v7.0 compatibility
%   11/Sept./2005 waitbar control is fixed.
%   05/July/2006 default values are modified, framem

%---Check for number of input parameters
switch nargin
    case 3
        prm=zinitializeParameters;
    case 4
        prm=replaceSuppliedParameters(optionalParams);
    otherwise
        disp('Number of arguments is 3 or 4!');
        return;
end

%   Initialize default parameters
f0ceil = prm.F0searchUpperBound; % f0ceil
framem = prm.F0defaultWindowLength; % default frame length for pitch extraction (ms)
f0shiftm = prm.F0frameUpdateInterval; % shiftm % F0 calculation interval (ms)

fftl=1024; % default FFT length
framel=framem*fs/1000;

if fftl < framel
    fftl=2^ceil(log(framel)/log(2));
end;

[nr,nc]=size(x);
if nr>nc
    x=x(:,1);
else
    x=x(1,:)';
end;

imageOn = prm.DisplayPlots; % imgi=1; % image display indicator (1: display image)

% parameters for F0 refinement
fftlf0r = prm.refineFftLength; %fftlf0r=1024; % FFT length for F0 refinement
tstretch = prm.refineTimeStretchingFactor; %tstretch=1.1; % time window stretching factor
nhmx = prm.refineNumberofHarmonicComponent; %nhmx=3; % number of harmonic components for F0 refinement
iPeriodicityInterval = prm.periodicityFrameUpdateInterval; % frame update interval for periodicity index (ms)

%---- F0 refinement
nstp=1; % start position of F0 refinement (samples)
nedp=length(f0); % last position of F0 refinement (samples)
dn=floor(fs/(f0ceil*3*2)); % fix by H.K. at 28/Jan./2003
[f0raw,ecr]=refineF06(decimate(x,dn),fs/dn,f0,fftlf0r,tstretch,nhmx,f0shiftm,nstp,nedp,imageOn); % 31/Aug./2004

ecrt=ecr;
ecrt(f0raw==0)=ecrt(f0raw==0)*NaN;

%----- aperiodicity estimation
f0raw=f0;
[apvq,dpvq,~,~]=aperiodicpartERB2(x,fs,f0raw,f0shiftm,iPeriodicityInterval,fftl/2+1,imageOn); % 10/April/2002
apv=10*log10(apvq); % for compatibility
dpv=10*log10(dpvq); % for compatibility

%- ---------
%   Notes on aperiodicity estimation: The previous implementation of
%   aperiodicity estimation was sensitive to low-frequency noise. This is
%   bad news, because environmental noise usually has its power in the low
%   frequency region. The following correction uses the C/N information
%   which is the byproduct of fixed-point-based F0 estimation.
%   by H.K. 04/Feb./2003
%- ---------
dpv=correctdpv(apv,dpv,iPeriodicityInterval,f0raw,ecrt,f0shiftm,fs); % Aperiodicity correction 04/Feb./2003 by H.K.

if imageOn
    bv=boundmes2(apv,dpv,fs,f0shiftm,iPeriodicityInterval,fftl/2+1);
    figure;
    semilogy((0:length(bv)-1)*f0shiftm,0.5./10.0.^(bv));grid on;
    set(gcf,'PaperPosition', [0.634517 0.634517 19.715 28.4084]);
end;

ap=aperiodiccomp(apv,dpv,iPeriodicityInterval,f0raw,f0shiftm,imageOn); % 11/Sept./2005

switch nargout
    case 1
    case 2
        analysisParams=prm;
    otherwise
        disp('Number of output parameters has to be 1 or 2!')
end;
end

%%%---- internal functions
%%%------
function prm=zinitializeParameters
prm.F0searchLowerBound=40; % f0floor
prm.F0searchUpperBound=800; % f0ceil
prm.F0defaultWindowLength = 80; % default frame length for pitch extraction (ms)
prm.F0frameUpdateInterval=1; % shiftm % F0 calculation interval (ms)
prm.NofChannelsInOctave=24; % nvo=24; % Number of channels in one octave
prm.IFWindowStretch=1.2; % mu=1.2; % window stretch from isometric window
prm.DisplayPlots=0; % imgi=1; % image display indicator (1: display image)
prm.IFsmoothingLengthRelToFc=1; % smp=1; % smoothing length relative to fc (ratio)
prm.IFminimumSmoothingLength=5; % minm=5; % minimum smoothing length (ms)
prm.IFexponentForNonlinearSum=0.5; % pc=0.5; % exponent to represent nonlinear summation
prm.IFnumberOfHarmonicForInitialEstimate=1; % nc=1; % number of harmonic component to use (1,2,3)
prm.refineFftLength=1024; %fftlf0r=1024; % FFT length for F0 refinement
prm.refineTimeStretchingFactor=1.1; %tstretch=1.1; % time window stretching factor
prm.refineNumberofHarmonicComponent=3; %nhmx=3; % number of harmonic components for F0 refinement
prm.periodicityFrameUpdateInterval=5; % frame update interval for periodicity index (ms)
prm.note=' '; % Any text to be printed on the source information plot
end

%%%--------
function prm=replaceSuppliedParameters(prmin)
prm=zinitializeParameters;
if isfield(prmin,'F0searchLowerBound')==1; prm.F0searchLowerBound=prmin.F0searchLowerBound;end;
if isfield(prmin,'F0searchUpperBound')==1; prm.F0searchUpperBound=prmin.F0searchUpperBound;end;
if isfield(prmin,'F0defaultWindowLength')==1; prm.F0defaultWindowLength=prmin.F0defaultWindowLength;end;
if isfield(prmin,'F0frameUpdateInterval')==1; prm.F0frameUpdateInterval=prmin.F0frameUpdateInterval;end;
if isfield(prmin,'NofChannelsInOctave')==1; prm.NofChannelsInOctave=prmin.NofChannelsInOctave;end;
if isfield(prmin,'IFWindowStretch')==1; prm.IFWindowStretch=prmin.IFWindowStretch;end;
if isfield(prmin,'DisplayPlots')==1; prm.DisplayPlots=prmin.DisplayPlots;end;
if isfield(prmin,'IFsmoothingLengthRelToFc')==1; prm.IFsmoothingLengthRelToFc=prmin.IFsmoothingLengthRelToFc;end;
if isfield(prmin,'IFminimumSmoothingLength')==1; prm.IFminimumSmoothingLength=prmin.IFminimumSmoothingLength;end;
if isfield(prmin,'IFexponentForNonlinearSum')==1; prm.IFexponentForNonlinearSum=prmin.IFexponentForNonlinearSum;end;
if isfield(prmin,'IFnumberOfHarmonicForInitialEstimate')==1; prm.IFnumberOfHarmonicForInitialEstimate=prmin.IFnumberOfHarmonicForInitialEstimate;end;
if isfield(prmin,'refineFftLength')==1; prm.refineFftLength=prmin.refineFftLength;end;
if isfield(prmin,'refineTimeStretchingFactor')==1; prm.refineTimeStretchingFactor=prmin.refineTimeStretchingFactor;end;
if isfield(prmin,'refineNumberofHarmonicComponent')==1; prm.refineNumberofHarmonicComponent=prmin.refineNumberofHarmonicComponent;end;
if isfield(prmin,'periodicityFrameUpdateInterval')==1; prm.periodicityFrameUpdateInterval=prmin.periodicityFrameUpdateInterval;end;
if isfield(prmin,'note')==1; prm.note=prmin.note;end;
end

================================================
FILE: src/exstraightsource.m
================================================
function [f0raw,ap,analysisParams]=exstraightsource(x,fs,optionalParams)
%   Source information extraction for STRAIGHT
%   [f0raw,ap,analysisParams]=exstraightsource(x,fs,optionalParams)
%   Input parameters
%   x   : input signal. If it is multichannel, only the first channel is used
%   fs  : sampling frequency (Hz)
%   optionalParams : Optional parameters for analysis
%   Output parameters
%   f0raw   : fundamental frequency (Hz)
%   ap  : amount of aperiodic component in the time-frequency representation
%       : represented in dB
%   analysisParams : Analysis parameters actually used
%
%   Usage:
%   Case 1: The simplest method
%   [f0raw,ap]=exstraightsource(x,fs);
%   Case 2: You can get to know what parameters were used.
%   [f0raw,ap,analysisParams]=exstraightsource(x,fs);
%   Case 3: You can have full control of STRAIGHT analysis.
%       Please use case 2 to find desired parameters to modify.
%   [f0raw,ap,analysisParams]=exstraightsource(x,fs,optionalParams);
%   Notes on programming style
%   This routine is based on the current (2005.1.31) implementation of
%   STRAIGHT, which consists of many legacy fragments. They were intentionally
%   kept to maintain the historical record. Revised functions written in a
%   reasonable stylistic practice will be made available soon.
% Designed and coded by Hideki Kawahara % 15/January/2005 % 01/February/2005 extended for user control % 30/April/2005 modification for Matlab v7.0 compatibility %---Check for number of input parameters switch nargin case 2 prm=zinitializeParameters; case 3 prm=replaceSuppliedParameters(optionalParams); otherwise disp('Number of arguments is 2 or 3!'); return; end % Initialize default parameters f0floor = prm.F0searchLowerBound; % f0floor f0ceil = prm.F0searchUpperBound; % f0ceil framem = prm.F0defaultWindowLength; % default frame length for pitch extraction (ms) f0shiftm = prm.F0frameUpdateInterval; % shiftm % F0 calculation interval (ms) fftl=1024; % default FFT length framel=framem*fs/1000; if fftl < framel fftl=2^ceil(log(framel)/log(2)); end; [nr,nc]=size(x); if nr>nc x=x(:,1); else x=x(1,:)'; end; nvo = prm.NofChannelsInOctave; % nvo=24; % Number of channels in one octave mu = prm.IFWindowStretch; % mu=1.2; % window stretch from isometric window imageOn = prm.DisplayPlots; % imgi=1; % image display indicator (1: display image) smp = prm.IFsmoothingLengthRelToFc; % smp=1; % smoothing length relative to fc (ratio) minsm = prm.IFminimumSmoothingLength; % minm=5; % minimum smoothing length (ms) pcf0 = prm.IFexponentForNonlinearSum; % pc=0.5; % exponent to represent nonlinear summation nh = prm.IFnumberOfHarmonicForInitialEstimate; % nc=1; % number of harmonic component to use (1,2,3) fname= prm.note; %=' '; % Any text to be printed on the source information plot nvc=ceil(log(f0ceil/f0floor)/log(2)*nvo); % number of channels % paramaters for F0 refinement fftlf0r = prm.refineFftLength; %fftlf0r=1024; % FFT length for F0 refinement tstretch = prm.refineTimeStretchingFactor; %tstretch=1.1; % time window stretching factor nhmx = prm.refineNumberofHarmonicComponent; %nhmx=3; % number of harmonic components for F0 refinement iPeriodicityInterval = prm.periodicityFrameUpdateInterval; % frame update interval for periodicity index (ms) %---- F0 extraction based on a 
fixed-point method in the frequency domain [f0v,vrv,dfv,~,aav]=fixpF0VexMltpBG4(x,fs,f0floor,nvc,nvo,mu,imageOn,f0shiftm,smp,minsm,pcf0,nh); if imageOn title([fname ' ' datestr(now,0)]); drawnow; end; %---- post processing for V/UV decision and F0 tracking [pwt,pwh]=zplotcpower(x,fs,f0shiftm,imageOn); [f0raw,irms,~,~]=f0track5(f0v,vrv,dfv,pwt,pwh,aav,f0shiftm,imageOn); % 11/Sept./2005 f0t=f0raw;avf0=mean(f0raw(f0raw>0)); f0t(f0t==0)=f0t(f0t==0)*NaN;tt=1:length(f0t); if imageOn subplot(615);plot(tt*f0shiftm,f0t,'g');grid on; if ~isnan(avf0) axis([1 max(tt)*f0shiftm ... min(avf0/sqrt(2),0.95*min(f0raw(f0raw>0))) ... max(avf0*sqrt(2),1.05*max(f0raw(f0raw>0)))]); end; ylabel('F0 (Hz)'); hold on; end; %---- F0 refinement nstp=1; % start position of F0 refinement (samples) nedp=length(f0raw); % last position of F0 refinement (samples) dn=floor(fs/(f0ceil*3*2)); % fix by H.K. at 28/Jan./2003 [f0raw,ecr]=refineF06(decimate(x,dn),fs/dn,f0raw,fftlf0r,tstretch,nhmx,f0shiftm,nstp,nedp,imageOn); % 31/Aug./2004% 11/Sept.2005 if imageOn f0t=f0raw; f0t(f0t==0)=f0t(f0t==0)*NaN;tt=1:length(f0t); subplot(615);plot(tt*f0shiftm,f0t,'k');hold off; drawnow end; %----------- 31/July/1999 ecrt=ecr; ecrt(f0raw==0)=ecrt(f0raw==0)*NaN; if imageOn tirms=irms; tirms(f0raw==0)=tirms(f0raw==0)*NaN; tirms(f0raw>0)=-20*log10(tirms(f0raw>0)); subplot(616);hrms=plot(tt*f0shiftm,tirms,'g',tt*f0shiftm,20*log10(ecrt),'r'); %31/July/1999 set(hrms,'LineWidth',2);hold on plot(tt*f0shiftm,-10*log10(vrv),'k.'); grid on;hold off axis([1 max(tt)*f0shiftm -10 60]); xlabel('time (ms)');ylabel('C/N (dB)'); drawnow; end; %------------------------------------------------------------------------------------- f0raw(f0raw<=0)=f0raw(f0raw<=0)*0; % safeguard 31/August/2004 f0raw(f0raw>f0ceil)=f0raw(f0raw>f0ceil)*0+f0ceil; % safeguard 31/August/2004 if nargout == 1; return; end; %----- aperiodicity estimation [apvq,dpvq,~,~]=aperiodicpartERB2(x,fs,f0raw,f0shiftm,iPeriodicityInterval,fftl/2+1,imageOn); % 
10/April/2002$11/Sept./2005 apv=10*log10(apvq); % for compatibility dpv=10*log10(dpvq); % for compatibility %- --------- % Notes on aperiodicity estimation: The previous implementation of % aperiodicity estimation was sensitive to low frequency noise. It is a % bad news, because environmental noise usually has its power in the low % frequency region. The following corrction uses the C/N information % which is the byproduct of fixed point based F0 estimation. % by H.K. 04/Feb./2003 %- --------- dpv=correctdpv(apv,dpv,iPeriodicityInterval,f0raw,ecrt,f0shiftm,fs); % Aperiodicity correction 04/Feb./2003 by H.K. if imageOn bv=boundmes2(apv,dpv,fs,f0shiftm,iPeriodicityInterval,fftl/2+1); figure; semilogy((0:length(bv)-1)*f0shiftm,0.5./10.0.^(bv));grid on; set(gcf,'PaperPosition', [0.634517 0.634517 19.715 28.4084]); end; ap=aperiodiccomp(apv,dpv,iPeriodicityInterval,f0raw,f0shiftm,imageOn); % 11/Sept./2005 switch nargout case 2 case 3 analysisParams=prm; otherwise disp('Number of output parameters has to be 2 or 3!') end; end %%%---- internal functions function [pw,pwh]=zplotcpower(x,fs,shiftm,imageOn) flm=8; % 01/August/1999 fl=round(flm*fs/1000); w=hanning(2*fl+1); w=w/sum(w); nn=length(x); flpm=40; flp=round(flpm*fs/1000); wlp=fir1(flp*2,70/(fs/2)); wlp(flp+1)=wlp(flp+1)-1; wlp=-wlp; tx=[x(:)' zeros(1,2*length(wlp))]; ttx=fftfilt(wlp,tx); ttx=ttx((1:nn)+flp); tx=[ttx(:)' zeros(1,2*length(w))]; pw=fftfilt(w,tx.^2); pw=pw((1:nn)+fl); mpw=max(pw); pw=pw(round(1:shiftm*fs/1000:nn)); pw(pw3kHz) '); end; end %%%------ function prm=zinitializeParameters prm.F0searchLowerBound=40; % f0floor prm.F0searchUpperBound=800; % f0ceil prm.F0defaultWindowLength = 80; % default frame length for pitch extraction (ms) prm.F0frameUpdateInterval=1; % shiftm % F0 calculation interval (ms) prm.NofChannelsInOctave=24; % nvo=24; % Number of channels in one octave prm.IFWindowStretch=1.2; % mu=1.2; % window stretch from isometric window prm.DisplayPlots=0; % imgi=1; % image display indicator 
(1: display image) prm.IFsmoothingLengthRelToFc=1; % smp=1; % smoothing length relative to fc (ratio) prm.IFminimumSmoothingLength=5; % minm=5; % minimum smoothing length (ms) prm.IFexponentForNonlinearSum=0.5; % pc=0.5; % exponent to represent nonlinear summation prm.IFnumberOfHarmonicForInitialEstimate=1; % nc=1; % number of harmonic component to use (1,2,3) prm.refineFftLength=1024; %fftlf0r=1024; % FFT length for F0 refinement prm.refineTimeStretchingFactor=1.1; %tstretch=1.1; % time window stretching factor prm.refineNumberofHarmonicComponent=3; %nhmx=3; % number of harmonic components for F0 refinement prm.periodicityFrameUpdateInterval=5; % frame update interval for periodicity index (ms)return prm.note=' '; % Any text to be printed on the source information plot end %%%-------- function prm=replaceSuppliedParameters(prmin) prm=zinitializeParameters; if isfield(prmin,'F0searchLowerBound')==1; prm.F0searchLowerBound=prmin.F0searchLowerBound;end; if isfield(prmin,'F0searchUpperBound')==1; prm.F0searchUpperBound=prmin.F0searchUpperBound;end; if isfield(prmin,'F0defaultWindowLength')==1; prm.F0defaultWindowLength=prmin.F0defaultWindowLength;end; if isfield(prmin,'F0frameUpdateInterval')==1; prm.F0frameUpdateInterval=prmin.F0frameUpdateInterval;end; if isfield(prmin,'NofChannelsInOctave')==1; prm.NofChannelsInOctave=prmin.NofChannelsInOctave;end; if isfield(prmin,'IFWindowStretch')==1; prm.IFWindowStretch=prmin.IFWindowStretch;end; if isfield(prmin,'DisplayPlots')==1; prm.DisplayPlots=prmin.DisplayPlots;end; if isfield(prmin,'IFsmoothingLengthRelToFc')==1; prm.IFsmoothingLengthRelToFc=prmin.IFsmoothingLengthRelToFc;end; if isfield(prmin,'IFminimumSmoothingLength')==1; prm.IFminimumSmoothingLength=prmin.IFminimumSmoothingLength;end; if isfield(prmin,'IFexponentForNonlinearSum')==1; prm.IFexponentForNonlinearSum=prmin.IFexponentForNonlinearSum;end; if isfield(prmin,'IFnumberOfHarmonicForInitialEstimate')==1; 
    prm.IFnumberOfHarmonicForInitialEstimate=prmin.IFnumberOfHarmonicForInitialEstimate;end;
if isfield(prmin,'refineFftLength')==1; prm.refineFftLength=prmin.refineFftLength;end;
if isfield(prmin,'refineTimeStretchingFactor')==1; prm.refineTimeStretchingFactor=prmin.refineTimeStretchingFactor;end;
if isfield(prmin,'refineNumberofHarmonicComponent')==1; prm.refineNumberofHarmonicComponent=prmin.refineNumberofHarmonicComponent;end;
if isfield(prmin,'periodicityFrameUpdateInterval')==1; prm.periodicityFrameUpdateInterval=prmin.periodicityFrameUpdateInterval;end;
if isfield(prmin,'note')==1; prm.note=prmin.note;end;
end

================================================
FILE: src/exstraightspec.m
================================================
function [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs,optionalParamsSP)
%   Spectral information extraction for STRAIGHT
%   [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs,optionalParamsSP)
%   Input parameters
%   x   : input signal. Only the first channel is analyzed
%   f0raw   : fundamental frequency (Hz) in 1 ms temporal resolution
%           : set 0 for aperiodic part
%   fs  : sampling frequency (Hz)
%   optionalParamsSP : spectrum analysis parameters
%   Output parameters
%   n3sgram : Smoothed time-frequency representation (spectrogram)
%   analysisParamsSp : Actually used parameters
%
%   Usage:
%   Case 1: The simplest method
%   n3sgram = exstraightspec(x,f0raw,fs);
%   Case 2: You can get to know what parameters were used.
%   [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs);
%   Case 3: You can have full control of STRAIGHT analysis.
%       Please use case 2 to find desired parameters to modify.
%   [n3sgram,analysisParamsSp]=exstraightspec(x,f0raw,fs,optionalParamsSP);

%   Designed and coded by Hideki Kawahara
%   15/January/2005
%   01/February/2005
%   11/Sept./2005 waitbar control is fixed.
%   05/July/2006 default values are modified, eta, framem

%---Check for number of input parameters
switch nargin
    case 3
        prm=zinitializeParameters;
    case 4
        prm=replaceSuppliedParameters(optionalParamsSP);
    otherwise
        disp('Number of arguments is 3 or 4!');
        return;
end

%   Initialize parameters
imageOn = prm.DisplayPlots; %imageOn=0; % Display indicator. 0: No graphics, 1: Show graphics
framem = prm.defaultFrameLength; %framem=40; % default frame length for pitch extraction (ms)
shiftm = prm.spectralUpdateInterval; %shiftm=1; % default frame shift (ms) for spectrogram
eta = prm.spectralTimeWindowStretch; %eta=1.4; % time window stretch factor
pc = prm.spectralExponentForNonlinearity; %pc=0.6; % exponent for nonlinearity
mag = prm.spectralTimeDomainCompensation; %mag=0.2; % This parameter should be revised.

framel=framem*fs/1000;
fftl=1024; % default FFT length

if fftl < framel
    fftl=2^ceil(log(framel)/log(2));
end;

[nr,nc]=size(x);
if nr>nc
    xold=x(:,1);
else
    xold=x(1,:)';
end;

%---- Spectral estimation
xamp=std(xold);
scaleconst=2200; % magic number for compatibility 15/Jan./2005
xold=xold/xamp*scaleconst;

f0var=1; f0varL=1; % These are obsolete, meaningless dummy variables.
[n2sgrambk,~]=straightBodyC03ma(xold,fs,shiftm,fftl,f0raw,f0var,f0varL,eta,pc,imageOn); % 11/Sept./2005
if mag>0
    n3sgram=specreshape(fs,n2sgrambk,eta,pc,mag,f0raw,imageOn); % 11/Sept./2005
else
    n3sgram=n2sgrambk;
end;
n3sgram=n3sgram/scaleconst*xamp;
analysisParamsSp = prm;
return;

%%%--- Internal functions
function prm=zinitializeParameters
prm.DisplayPlots = 0; %imageOn=0; % Display indicator. 0: No graphics, 1: Show graphics
prm.defaultFrameLength = 80; %framem=40; % default frame length for pitch extraction (ms)
prm.spectralUpdateInterval = 1; %shiftm=1; % default frame shift (ms) for spectrogram
prm.spectralTimeWindowStretch = 1.0; %eta=1.4; % time window stretch factor
prm.spectralExponentForNonlinearity = 0.6; %pc=0.6; % exponent for nonlinearity
prm.spectralTimeDomainCompensation = 0.2; %mag=0.2; % This parameter should be revised.

%%%----
function prm=replaceSuppliedParameters(prmin)
prm=zinitializeParameters;
if isfield(prmin,'DisplayPlots')==1; prm.DisplayPlots=prmin.DisplayPlots;end;
if isfield(prmin,'defaultFrameLength')==1; prm.defaultFrameLength=prmin.defaultFrameLength;end;
if isfield(prmin,'spectralUpdateInterval')==1; prm.spectralUpdateInterval=prmin.spectralUpdateInterval;end;
if isfield(prmin,'spectralTimeWindowStretch')==1; prm.spectralTimeWindowStretch=prmin.spectralTimeWindowStretch;end;
if isfield(prmin,'spectralExponentForNonlinearity')==1; prm.spectralExponentForNonlinearity=prmin.spectralExponentForNonlinearity;end;
if isfield(prmin,'spectralTimeDomainCompensation')==1; prm.spectralTimeDomainCompensation=prmin.spectralTimeDomainCompensation;end;
return;

================================================
FILE: src/exstraightsynth.m
================================================
function [sy,prmS] = exstraightsynth(f0raw,n3sgram,ap,fs,optionalParamsS)
%   Synthesis using STRAIGHT parameters with linear modifications
%   [sy,prmS] = exstraightsynth(f0raw,n3sgram,ap,fs,optionalParamsS)
%   Input parameters
%   f0raw   : fundamental frequency (Hz)
%   n3sgram : STRAIGHT spectrogram (in absolute value)
%   ap  : aperiodic component (dB re. to total power)
%   fs  : sampling frequency (Hz)
%   optionalParamsS : optional synthesis parameters
%   Output parameters
%   sy  : synthesized speech
%   prmS : Actually used synthesis parameters
%
%   Usage:
%   Case 1: The simplest method
%   sy = exstraightsynth(f0raw,n3sgram,ap,fs);
%   Case 2: You can get to know what parameters were used.
%   [sy,prmS] = exstraightsynth(f0raw,n3sgram,ap,fs);
%   Case 3: You can have full control of STRAIGHT synthesis.
%       Please use case 2 to find desired parameters to modify.
%   [sy,prmS] = exstraightsynth(f0raw,n3sgram,ap,fs,optionalParamsS);

%   Designed and coded by Hideki Kawahara
%   15/January/2005
%   01/February/2005 revised for generalization
%   14/February/2005 fixed typo
%   30/April/2005 modification for Matlab v7.0 compatibility
%   11/Sept./2005 display indicator field is defined.
%   27/Nov./2005 enabled setting lower limit of F0
%   03/July/2015 refactored for MATLAB R2016a and Octave

%---- Check input parameters
switch nargin
    case 4
        prmS=zinitializeParameters;
    case 5
        prmS=replaceSuppliedParameters(optionalParamsS);
    otherwise
        disp('Number of arguments is not relevant. Type help exstraightsynth.');
        return
end;

%--- Initialize parameters
shiftm = prmS.spectralUpdateInterval; %shiftm=1; % default frame shift (ms) for spectrogram
delsp = prmS.groupDelayStandardDeviation; %delsp=0.5; % standard deviation of random group delay in ms
gdbw = prmS.groupDelaySpatialBandWidth; %gdbw=70; % smoothing window length of random group delay (in Hz)
cornf = prmS.groupDelayRandomizeCornerFrequency; %cornf=4000; % corner frequency for random phase (Hz)
delfrac = prmS.ratioToFundamentalPeriod; %delfrac=0.2; % Fractional group delay (ratio)
delfracind = prmS.ratioModeIndicator; %delfracind=0; % Use fractional group delay, if this is set 1.
normalizedOut = prmS.levelNormalizationIndicator; %normalizedOut = 1; % Normalize voiced part level, when this is set 1.
headRoom = prmS.headRoomToClip; %headRoom = 22; % Head room from voiced part rms to the clipping level. (dB)
lsegment = prmS.powerCheckSegmentLength; %lsegment = 15; % Segment length for voiced power check (ms)
imap = prmS.timeAxisMappingTable; % imap = 1 represents identity mapping.
pconv = prmS.fundamentalFrequencyMappingTable; %pconv = 1 represents identity mapping.
fconv = prmS.frequencyAxisMappingTable; %fconv = 1 represents identity mapping.
sconv = prmS.timeAxisStretchingFactor; %sconv = 1; % This is a simple coefficient.
imgi = prmS.DisplayPlots; % default 0, 1: display on
lowestF0 = prmS.lowestF0; % compatible default is 50 Hz

[sy,statusReport] =straightSynthTB07ca(n3sgram,f0raw,shiftm,fs, ...
    pconv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind,ap,imap,imgi,lowestF0); % revised 27/Nov./2005
if normalizedOut
    dBsy=zpowerchk(sy,fs,lsegment); % 23/Sept./1999
    cf=(20*log10(32768)-headRoom)-dBsy;
    sy=sy*(10.0.^(cf/20));
end;
prmS.statusReport = statusReport;
end

%%%----- Internal functions
function pow=zpowerchk(x,fs,segms)
%   Calculate average power of voiced portion
%   pow=zpowerchk(x,fs,segms)
%   x   : signal
%   fs  : sampling frequency (Hz)
%   segms : segment length (ms)
%   23/Sept./1999 updated

x1=x(:);
iv=(1:length(x1))';
x1(isnan(x1))=iv(isnan(x1))*0+0.0000000001;
x2=x1.*x1;
n=round(segms/1000*fs); % 23/Sept./1999
nw=ceil(length(x)/n);
if rem(length(x),n)>0
    x2=[x2;0.000001*randn(n*nw-length(x),1).^2]; % 23/Sept./1999
end;
x2(x2==0)=x2(x2==0)+0.000001;

pw=sum(reshape(x2,n,nw))/n;

pow=10*log10(mean(pw(pw>(mean(pw)/30))));
end

%%%---- Initialize parameters
function prm=zinitializeParameters
prm.spectralUpdateInterval = 1; %shiftm=1; % default frame shift (ms) for spectrogram
prm.groupDelayStandardDeviation = 0.5; %delsp=0.5; % standard deviation of random group delay in ms
prm.groupDelaySpatialBandWidth = 70; %gdbw=70; % smoothing window length of random group delay (in Hz)
prm.groupDelayRandomizeCornerFrequency = 4000; %cornf=4000; % corner frequency for random phase (Hz)
prm.ratioToFundamentalPeriod = 0.2; %delfrac=0.2; % Fractional group delay (ratio)
prm.ratioModeIndicator = 0; %delfracind=0; % Use fractional group dealy, if this is set 1. prm.levelNormalizationIndicator = 1; %normalizedOut = 1; % Normalize voiced part level, when this is set 1. prm.headRoomToClip = 22; %headRoom = 22; % Head room from voiced part rms to the clipping level. (dB) prm.powerCheckSegmentLength = 15; %lsegment = 15; % Segment length for voiced power check (ms) prm.timeAxisMappingTable = 1; % imap = 1 represents identity mapping. prm.fundamentalFrequencyMappingTable = 1; %pconv = 1 represents identity mapping. prm.frequencyAxisMappingTable = 1; %fconv = 1 represents identity mapping. prm.timeAxisStretchingFactor = 1; %sconv = 1; % This is a simple coefficient. prm.DisplayPlots = 0; % default 0, 1:disply on prm.lowestF0 = 50; % default that was not as same as the previous version but consistent end %%%---- function prm=replaceSuppliedParameters(prmin) prm=zinitializeParameters; if isfield(prmin,'spectralUpdateInterval')==1; prm.spectralUpdateInterval=prmin.spectralUpdateInterval;end; if isfield(prmin,'groupDelayStandardDeviation')==1; prm.groupDelayStandardDeviation=prmin.groupDelayStandardDeviation;end; if isfield(prmin,'groupDelaySpatialBandWidth')==1; prm.groupDelaySpatialBandWidth=prmin.groupDelaySpatialBandWidth;end; if isfield(prmin,'groupDelayRandomizeCornerFrequency')==1; prm.groupDelayRandomizeCornerFrequency=prmin.groupDelayRandomizeCornerFrequency;end; if isfield(prmin,'ratioToFundamentalPeriod')==1; prm.ratioToFundamentalPeriod=prmin.ratioToFundamentalPeriod;end; if isfield(prmin,'ratioModeIndicator')==1; prm.ratioModeIndicator=prmin.ratioModeIndicator;end; if isfield(prmin,'levelNormalizationIndicator')==1; prm.levelNormalizationIndicator=prmin.levelNormalizationIndicator;end; if isfield(prmin,'headRoomToClip')==1; prm.headRoomToClip=prmin.headRoomToClip;end; if isfield(prmin,'powerCheckSegmentLength')==1; prm.powerCheckSegmentLength=prmin.powerCheckSegmentLength;end; if isfield(prmin,'timeAxisMappingTable')==1; 
    prm.timeAxisMappingTable=prmin.timeAxisMappingTable;end;
if isfield(prmin,'fundamentalFrequencyMappingTable')==1;
    prm.fundamentalFrequencyMappingTable=prmin.fundamentalFrequencyMappingTable;end;
if isfield(prmin,'frequencyAxisMappingTable')==1;
    prm.frequencyAxisMappingTable=prmin.frequencyAxisMappingTable;end;
if isfield(prmin,'timeAxisStretchingFactor')==1;
    prm.timeAxisStretchingFactor=prmin.timeAxisStretchingFactor;end;
if isfield(prmin,'DisplayPlots')==1;
    prm.DisplayPlots=prmin.DisplayPlots;end;
if isfield(prmin,'lowestF0')==1;
    prm.lowestF0=prmin.lowestF0;end;
end

================================================
FILE: src/f0track5.m
================================================
function [f0,irms,df,amp]=f0track5(f0v,vrv,dfv,pwt,pwh,aav,shiftm,imgi)

%   F0 trajectory tracker
%   [f0,irms,df,amp]=f0track5(f0v,vrv,dfv,pwt,pwh,aav,shiftm,imgi)
%       f0      : extracted F0 (Hz)
%       irms    : relative interfering energy in rms
%       df      : fixed point slope at the selected F0
%       amp     : amplitude at the selected F0
%
%       f0v     : fixed point frequency vector
%       vrv     : relative interfering energy vector
%       dfv     : fixed point slope vector
%       pwt     : total power
%       pwh     : power in higher frequency range
%       aav     : amplitude list for fixed points
%       shiftm  : frame update period (ms)
%       imgi    : display indicator, 1: display on (default), 0: off
%
%   This is a very primitive and conventional algorithm.
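%
%   Example of usage (a hedged sketch, not taken from the original code).
%   It assumes that x is a speech waveform and fs its sampling frequency,
%   and that the numeric settings below are acceptable; they are
%   illustrative assumptions only. The call chain follows the function
%   signatures found in this directory.
%
%   shiftm=1; % frame update period (ms), assumed
%   [f0v,vrv,dfv,~,aav]=fixpF0VexMltpBG4(x,fs,40,52,12,1.1,0,shiftm,1,5,0.5,1);
%   [pwt,pwh]=plotcpower(x,fs,shiftm); % total and high-band power
%   [f0,irms,df,amp]=f0track5(f0v,vrv,dfv,pwt,pwh,aav,shiftm,0);
%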
% coded by Hideki Kawahara % copyright(c) Wakayama University/CREST/ATR % 10/April/1999 first version % 17/May/1999 relative fq jump thresholding % 01/August/1999 parameter tweeking % 07/Dec./2002 waitbar was added % 13/Jan./2005 bug fix on lines 58, 97 (Thanx Ishikasa-san) % 30/April/2005 modification for Matlab v7.0 compatibility % 10/Aug./2005 modified by Takahashi on waitbar % 10/Sept./2005 modified by Kawahara on waitbar if nargin==7; imgi=1; end; %10/Sept./2005 vrv=sqrt(vrv); [~,mm]=size(vrv); mm=min(mm,length(pwt)); f0=zeros(1,mm); irms=ones(1,mm); df=ones(1,mm); amp=zeros(1,mm); von=0; [mxvr,ixx]=min(vrv); hth=0.12; % highly confident voiced threshould (updated on 01/August/1999) lth=0.9; % threshold to loose confidence bklm=100; % back track length for voicing decision lalm=10; % look ahead length for silence decision bkls=bklm/shiftm; lals=lalm/shiftm; htr=10*log10(pwh./pwt); thf0j=0.04*sqrt(shiftm); % 4 % of F0 is the limit of jump ii=1; f0ref=0; htrth=-2.0; % was -3 mod 2002.6.3 if imgi==1; hpg=waitbar(0,'F0 tracking'); end; % 07/Dec./2002 by H.K.%10/Aug./2005 while ii < mm+1 if (von == 0) && (mxvr(ii)10000)+(f0v(jxx,jj)>10000); if (((gomi>thf0j) || (vrv(jxx,jj)>lth) || (htr(jj)>htrth))&&(f0v(jxx,jj)<1000)) && htr(jj)>-18 % disp(['break pt1 at ' num2str(jj)]) break end; if (gomi>thf0j) % disp(['break pt2 at ' num2str(jj)]) break end; f0(jj)=f0v(jxx,jj); irms(jj)=vrv(jxx,jj); df(jj)=dfv(jxx,jj); amp(jj)=aav(jxx,jj); f0ref=f0(jj); end; f0ref=f0v(ixx(ii),ii); end; if (f0ref>0) && (f0ref<10000) [gomi,jxx]=min(abs((f0v(:,ii)-f0ref)/f0ref)); else gomi=10; end; if (von ==1) && (mxvr(ii)>hth) for jj=ii:min(mm,ii+lals) ii=jj; [gomi,jxx]=min(abs((f0v(:,ii)-f0ref)/f0ref)); gomi=gomi+(f0ref>10000)+(f0v(jxx,ii)>10000); if (gomi< thf0j) && ((htr(ii)=1000)) f0(ii)=f0v(jxx,ii); irms(ii)=vrv(jxx,ii); df(ii)=dfv(jxx,ii); amp(ii)=aav(jxx,ii); f0ref=f0(ii); end; if (gomi>thf0j) || (vrv(jxx,ii)>lth) || ((htr(ii)>htrth)&&(f0v(jxx,ii)<1000)) von = 0;f0ref=0; break end; end; 
elseif (von==1) && (gomi < thf0j) && ((htr(ii)=1000)) f0(ii)=f0v(jxx,ii); irms(ii)=vrv(jxx,ii); df(ii)=dfv(jxx,ii); amp(ii)=aav(jxx,ii); f0ref=f0(ii); else von=0; end; if imgi==1; waitbar(ii/mm); end; %,hpg); % 07/Dec./2002 by H.K.%10/Aug./2005 ii=ii+1; end; if imgi==1; close(hpg); end;%10/Aug./2005 ================================================ FILE: src/fixpF0VexMltpBG4.m ================================================ function [f0v,vrv,dfv,nf,aav]=fixpF0VexMltpBG4(x,fs,f0floor,nvc,nvo,mu,imgi,shiftm,smp,minm,pc,nc) % Fixed point analysis to extract F0 % [f0v,vrv,dfv,nf]=fixpF0VexMltpBG4(x,fs,f0floor,nvc,nvo,mu,imgi,shiftm,smp,minm,pc,nc) % x : input signal % fs : sampling frequency (Hz) % f0floor : lowest frequency for F0 search % nvc : total number of filter channels % nvo : number of channels per octave % mu : temporal stretching factor % imgi : image display indicator (1: display image, default) % shiftm : frame shift in ms % smp : smoothing length relative to fc (ratio) % minm : minimum smoothing length (ms) % pc : exponent to represent nonlinear summation % nc : number of harmonic component to use (1,2,3) % Designed and coded by Hideki Kawahara % 28/March/1999 % 04/April/1999 revised to multi component version % 07/April/1999 bi-reciprocal smoothing for multi component compensation % 01/May/1999 first derivative of Amplitude is taken into account % 17/Dec./2000 display bug fix % 19/Sep./2002 bug fix (mu information was discarded.) 
%   07/Dec./2002 waitbar was added
%   30/April/2005 modification for Matlab v7.0 compatibility
%   10/Aug./2005 modified by Takahashi on waitbar
%   10/Sept./2005 modified by Kawahara on waitbar
%   11/Sept./2005 fixed waitbar problem

%f0floor=40;
%nvo=12;
%nvc=52;
%mu=1.1;

x=cleaninglownoise(x,fs,f0floor);

fxx=f0floor*2.0.^((0:nvc-1)/nvo)';
fxh=max(fxx);

dn=max(1,floor(fs/(fxh*6.3)));

if nc>2
    pm3=multanalytFineCSPB(decimate(x,dn),fs/dn,f0floor,nvc,nvo,mu,3,imgi); % error corrected 2002.9.19 (mu was fixed at 1.1)
    pif3=zwvlt2ifq(pm3,fs/dn);
    [~,mm]=size(pif3);
    pif3=pif3(:,1:3:mm);
    pm3=pm3(:,1:3:mm);
end;

if nc>1
    pm2=multanalytFineCSPB(decimate(x,dn),fs/dn,f0floor,nvc,nvo,mu,2,imgi); % error corrected 2002.9.19 (mu was fixed at 1.1)
    pif2=zwvlt2ifq(pm2,fs/dn);
    [~,mm]=size(pif2);
    pif2=pif2(:,1:3:mm);
    pm2=pm2(:,1:3:mm);
end;

pm1=multanalytFineCSPB(decimate(x,dn*3),fs/(dn*3),f0floor,nvc,nvo,mu,1,imgi); % error corrected 2002.9.19 (mu was fixed at 1.1)
%%%% safe guard added on 15/Jan./2003
mxpm1=max(max(abs(pm1)));
eeps=mxpm1/10000000;
pm1(pm1==0)=pm1(pm1==0)+eeps;
%%%% safe guard end
pif1=zwvlt2ifq(pm1,fs/(dn*3));
%keyboard;

[~,mm1]=size(pif1);
mm=mm1;
if nc>1
    [~,mm2]=size(pif2);
    mm=min(mm1,mm2);
end;
if nc>2
    [~,mm3]=size(pif3);
    mm=min([mm1 mm2 mm3]);
end;

if nc == 2
    for ii=1:mm
        pif2(:,ii)=(pif1(:,ii).*(abs(pm1(:,ii))).^pc ...
            +pif2(:,ii)/2.*(abs(pm2(:,ii))).^pc )...
            ./((abs(pm1(:,ii))).^pc+(abs(pm2(:,ii))).^pc);
    end;
end;
if nc == 3
    for ii=1:mm
        pif2(:,ii)=(pif1(:,ii).*(abs(pm1(:,ii))).^pc ...
            +pif2(:,ii)/2.*(abs(pm2(:,ii))).^pc ...
            +pif3(:,ii)/3.*(abs(pm3(:,ii))).^pc )...
            ./((abs(pm1(:,ii))).^pc+(abs(pm2(:,ii))).^pc+(abs(pm3(:,ii))).^pc);
    end;
end;
if nc == 1
    pif2=pif1;
end;

%pif2=zwvlt2ifq(pm,fs/dn)*2*pi;
pif2=pif2*2*pi;
dn=dn*3;

[slp,~]=zifq2gpm2(pif2,f0floor,nvo);
[nn,mm]=size(pif2);
dpif=(pif2(:,2:mm)-pif2(:,1:mm-1))*fs/dn;
dpif(:,mm)=dpif(:,mm-1);
[dslp,~]=zifq2gpm2(dpif,f0floor,nvo);

damp=(abs(pm1(:,2:mm))-abs(pm1(:,1:mm-1)))*fs/dn;
damp(:,mm)=damp(:,mm-1);
damp=damp./abs(pm1);

%[c1,c2]=znormwght(1000);
fxx=f0floor*2.0.^((0:nn-1)/nvo)'*2*pi;
mmp=0*dslp;
[c1,c2b]=znrmlcf2(1);
if imgi==1; hpg=waitbar(0,'P/N map calculation'); end; % 07/Dec./2002 by H.K.%10/Aug./2005
for ii=1:nn
    %   [c1,c2]=znrmlcf2(fxx(ii)/2/pi); % This is OK, but the next Eq is much faster.
    c2=c2b*(fxx(ii)/2/pi)^2;
    cff=damp(ii,:)/fxx(ii)*2*pi*0;
    mmp(ii,:)=(dslp(ii,:)./(1+cff.^2)/sqrt(c2)).^2+(slp(ii,:)./sqrt(1+cff.^2)/sqrt(c1)).^2;
    if imgi==1; waitbar(ii/nn); end; %,hpg); % 07/Dec./2002 by H.K.%10/Aug./2005
end;
if imgi==1; close(hpg); end;%10/Aug./2005

if smp~=0
    smap=zsmoothmapB(mmp,fs/dn,f0floor,nvo,smp,minm,0.4);
else
    smap=mmp;
end;

fixpp=zeros(round(nn/3),mm);
fixvv=fixpp+100000000;
fixdf=fixpp+100000000;
fixav=fixpp+1000000000;
nf=zeros(1,mm);
if imgi==1; hpg=waitbar(0,'Fixed points calculation'); end; % 07/Dec./2002 by H.K.%10/Aug./2005
for ii=1:mm
    [ff,vv,df,aa]=zfixpfreq3(fxx,pif2(:,ii),smap(:,ii),dpif(:,ii)/2/pi,pm1(:,ii));
    kk=length(ff);
    fixpp(1:kk,ii)=ff;
    fixvv(1:kk,ii)=vv;
    fixdf(1:kk,ii)=df;
    fixav(1:kk,ii)=aa;
    nf(ii)=kk;
    if imgi==1 && rem(ii,10)==0; waitbar(ii/mm); end;% 07/Dec./2002 by H.K.%10/Aug./2005
end;
if imgi==1; close(hpg); end; % 07/Dec./2002 by H.K.%10/Aug./2005
fixpp(fixpp==0)=fixpp(fixpp==0)+1000000;

%keyboard
%[vvm,ivv]=min(fixvv);
%
%for ii=1:mm
%   ff00(ii)=fixpp(ivv(ii),ii);
%   esgm(ii)=fixvv(ivv(ii),ii);
%end;
np=max(nf);
f0v=fixpp(1:np,round(1:shiftm/dn*fs/1000:mm))/2/pi;
vrv=fixvv(1:np,round(1:shiftm/dn*fs/1000:mm));
dfv=fixdf(1:np,round(1:shiftm/dn*fs/1000:mm));
aav=fixav(1:np,round(1:shiftm/dn*fs/1000:mm));
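%   Note on the column index round(1:shiftm/dn*fs/1000:mm) used around here
%   (a worked example under assumed values, not part of the original code).
%   The maps are sampled at fs/dn Hz, so one output frame of shiftm ms
%   spans shiftm/dn*fs/1000 map columns. Uncommenting the lines below
%   reproduces the arithmetic for one assumed setting:
%
%   fs=16000; f0floor=40; nvc=52; nvo=12; shiftm=1; % assumed values
%   fxh=f0floor*2^((nvc-1)/nvo);        % highest channel, about 761 Hz
%   dn=3*max(1,floor(fs/(fxh*6.3)));    % 3, then 9 after dn=dn*3
%   idx=round(1:shiftm/dn*fs/1000:20);  % step is about 1.78; starts 1 3 5 6 8
%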
nf=nf(round(1:shiftm/dn*fs/1000:mm));
if imgi==1
    cnmap(fixpp,smap,fs,dn,nvo,f0floor,shiftm);
end;
%ff00=ff00(round(1:shiftm/dn*fs/1000:mm));
%esgm=sqrt(esgm(round(1:shiftm/dn*fs/1000:mm)));
%keyboard;
return;
%------------------------------------------------------------------
function okid=cnmap(fixpp,smap,fs,dn,nvo,f0floor,shiftm)

%   This function had a bug in map axis.
%   17/Dec./2000 bug fix by Hideki Kawahara.

dt=dn/fs;
[nn,mm]=size(smap);
aa=figure;
set(aa,'PaperPosition',[0.3 0.25 8 10.9]);
set(aa,'Position',[30 130 520 680]);
subplot(211);
imagesc([0 (mm-1)*dt*1000],[1 nn],20*log10(smap(:,round(1:shiftm/dn*fs/1000:mm))));axis('xy')
hold on;
tx=((1:shiftm/dn*fs/1000:mm)-1)*dt*1000;
plot(tx,(nvo*log(fixpp(:,round(1:shiftm/dn*fs/1000:mm))/f0floor/2/pi)/log(2)+0.5)','ko');
plot(tx,(nvo*log(fixpp(:,round(1:shiftm/dn*fs/1000:mm))/f0floor/2/pi)/log(2)+0.5)','w.');
hold off
xlabel('time (ms)');
ylabel('channel #');
colormap(jet);

okid=1;
return;

%------------------------------------------------------------------

%%function pm=zmultanalytFineCSPm(x,fs,f0floor,nvc,nvo,mu,mlt);

%       Dual wavelet analysis using cardinal spline manipulation
%       pm=multanalytFineCSP(x,fs,f0floor,nvc,nvo);
%       Input parameters
%
%               x       : input signal (2kHz sampling rate is sufficient.)
%               fs      : sampling frequency (Hz)
%               f0floor : lower bound for pitch search (60Hz suggested)
%               nvc     : number of total voices for wavelet analysis
%               nvo     : number of voices in an octave
%               mu      : temporal stretch factor
%       Output parameters
%               pm      : wavelet transform using iso-metric Gabor function
%
%       If you have any questions,  mailto:kawahara@hip.atr.co.jp
%
%       Copyright (c) ATR Human Information Processing Research Labs.
1996 % Invented and coded by Hideki Kawahara % 30/Oct./1996 %t0=1/f0floor; %lmx=round(6*t0*fs*mu); %wl=2^ceil(log(lmx)/log(2)); %x=x(:)'; %nx=length(x); %tx=[x,zeros(1,wl)]; %gent=((1:wl)-wl/2)/fs; %nvc=18; %wd=zeros(nvc,wl); %wd2=zeros(nvc,wl); %ym=zeros(nvc,nx); %pm=zeros(nvc,nx); %mpv=1; %mu=1.0; %for ii=1:nvc % t=gent*mpv; % t=t(abs(t)<3.5*mu*t0); % wbias=round((length(t)-1)/2); % wd1=exp(-pi*(t/t0/mu).^2);%.*exp(i*2*pi*t/t0); % wd2=max(0,1-abs(t/t0/mu)); % wd2=wd2(wd2>0); % wwd=conv(wd2,wd1); % wwd=wwd(abs(wwd)>0.0001); % wbias=round((length(wwd)-1)/2); % wwd=wwd.*exp(i*2*pi*mlt*t(round((1:length(wwd))-wbias+length(t)/2))/t0); % pmtmp1=fftfilt(wwd,tx); % pm(ii,:)=pmtmp1(wbias+1:wbias+nx)*sqrt(mpv); % mpv=mpv*(2.0^(1/nvo)); % keyboard; %end; %[nn,mm]=size(pm); %pm=pm(:,1:mlt:mm); %---------------------------------------------------------------- function pif=zwvlt2ifq(pm,fs) % Wavelet to instantaneous frequency map % fqv=wvlt2ifq(pm,fs) % Coded by Hideki Kawahara % 02/March/1999 [~,mm]=size(pm); pm=pm./(abs(pm)); pif=abs(pm(:,:)-[pm(:,1),pm(:,1:mm-1)]); pif=fs/pi*asin(pif/2); pif(:,1)=pif(:,2); %---------------------------------------------------------------- function [slp,pbl]=zifq2gpm2(pif,f0floor,nvo) % Instantaneous frequency 2 geometric parameters % [slp,pbl]=ifq2gpm(pif,f0floor,nvo) % slp : first order coefficient % pbl : second order coefficient % Coded by Hideki Kawahara % 02/March/1999 [nn,~]=size(pif); fx=f0floor*2.0.^((0:nn-1)/nvo)*2*pi; c=2.0^(1/nvo); g=[1/c/c 1/c 1;1 1 1;c*c c 1]; h=inv(g); %slp=pif(1:nn-2,:)*h(1,1)+pif(2:nn-1,:)*h(1,2)+pif(3:nn,:)*h(1,3); slp=((pif(2:nn-1,:)-pif(1:nn-2,:))/(1-1/c) ... 
+(pif(3:nn,:)-pif(2:nn-1,:))/(c-1))/2; slp=[slp(1,:);slp;slp(nn-2,:)]; pbl=pif(1:nn-2,:)*h(2,1)+pif(2:nn-1,:)*h(2,2)+pif(3:nn,:)*h(2,3); pbl=[pbl(1,:);pbl;pbl(nn-2,:)]; for ii=1:nn slp(ii,:)=slp(ii,:)/fx(ii); pbl(ii,:)=pbl(ii,:)/fx(ii); end; %------------------------------------------ %function [c1,c2]=znormwght(n) %zz=0:1/n:3; %hh=[diff(zGcBs(zz,0)) 0]*n; %c1=sum((zz.*hh).^2)/n; %c2=sum((2*pi*zz.^2.*hh).^2)/n; %------------------------------------------- function p=zGcBs(x,k) tt=x+0.0000001; p=tt.^k.*exp(-pi*tt.^2).*(sin(pi*tt+0.0001)./(pi*tt+0.0001)).^2; %-------------------------------------------- function smap=zsmoothmapB(map,fs,f0floor,nvo,mu,mlim,pex) [nvc,mm]=size(map); %mu=0.4; t0=1/f0floor; lmx=round(6*t0*fs*mu); wl=2^ceil(log(lmx)/log(2)); gent=((1:wl)-wl/2)/fs; smap=map; mpv=1; zt=0*gent; iiv=1:mm; for ii=1:nvc t=gent*mpv; %t0*mu/mpv*1000 t=t(abs(t)<3.5*mu*t0); wbias=round((length(t)-1)/2); wd1=exp(-pi*(t/(t0*(1-pex))/mu).^2); wd2=exp(-pi*(t/(t0*(1+pex))/mu).^2); wd1=wd1/sum(wd1); wd2=wd2/sum(wd2); tm=fftfilt(wd1,[map(ii,:) zt]); tm=fftfilt(wd2,[1.0./tm(iiv+wbias) zt]); smap(ii,:)=1.0./tm(iiv+wbias); if t0*mu/mpv*1000 > mlim mpv=mpv*(2.0^(1/nvo)); end; end; %-------------------------------------------- %function [ff,vv,df]=zfixpfreq2(fxx,pif2,mmp,dfv) % %nn=length(fxx); %iix=(1:nn)'; %cd1=pif2-fxx; %cd2=[diff(cd1);cd1(nn)-cd1(nn-1)]; %cdd1=[cd1(2:nn);cd1(nn)]; %fp=(cd1.*cdd1<0).*(cd2<0); %ixx=iix(fp>0); %ff=pif2(ixx)+(pif2(ixx+1)-pif2(ixx)).*cd1(ixx)./(cd1(ixx)-cdd1(ixx)); %vv=mmp(ixx); %vv=mmp(ixx)+(mmp(ixx+1)-mmp(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx)); %df=dfv(ixx)+(dfv(ixx+1)-dfv(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx)); %-------------------------------------------- function [ff,vv,df,aa]=zfixpfreq3(fxx,pif2,mmp,dfv,pm) aav=abs(pm); nn=length(fxx); iix=(1:nn)'; cd1=pif2-fxx; cd2=[diff(cd1);cd1(nn)-cd1(nn-1)]; cdd1=[cd1(2:nn);cd1(nn)]; fp=(cd1.*cdd1<0).*(cd2<0); ixx=iix(fp>0); 
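%   Worked example of the fixed-point search in this function (illustrative
%   numbers only, not from the original code). Uncommenting the lines below
%   runs the same steps on a three-channel toy input:
%
%   fxx=[100;110;120]; pif2=[104;111;118];  % assumed channel and inst. freq.
%   cd1=pif2-fxx;                           % [4;1;-2]
%   cdd1=[cd1(2:3);cd1(3)];                 % [1;-2;-2]
%   cd2=[diff(cd1);cd1(3)-cd1(2)];          % [-3;-3;-3]
%   fp=(cd1.*cdd1<0).*(cd2<0);              % [0;1;0], so ixx=2
%   ff=pif2(2)+(pif2(3)-pif2(2))*cd1(2)/(cd1(2)-cdd1(2)); % 111+7/3, about 113.3
%
%   The sign change of cd1 with a negative slope marks the channel pair, and
%   the linear interpolation locates the frequency at which the interpolated
%   instantaneous frequency equals the channel frequency.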
ff=pif2(ixx)+(pif2(ixx+1)-pif2(ixx)).*cd1(ixx)./(cd1(ixx)-cdd1(ixx));
%vv=mmp(ixx);
vv=mmp(ixx)+(mmp(ixx+1)-mmp(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx));
df=dfv(ixx)+(dfv(ixx+1)-dfv(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx));
aa=aav(ixx)+(aav(ixx+1)-aav(ixx)).*(ff-fxx(ixx))./(fxx(ixx+1)-fxx(ixx));

%--------------------------------------------

function [c1,c2]=znrmlcf2(f)

n=100;
x=0:1/n:3;
g=zGcBs(x,0);
dg=[diff(g) 0]*n;
dgs=dg/2/pi/f;
xx=2*pi*f*x;
c1=sum((xx.*dgs).^2)/n*2;
c2=sum((xx.^2.*dgs).^2)/n*2;

%--------------------------------------------

function x=cleaninglownoise(x,fs,f0floor)

flm=50;
flp=round(fs*flm/1000);
nn=length(x);
wlp=fir1(flp*2,f0floor/(fs/2));
wlp(flp+1)=wlp(flp+1)-1;
wlp=-wlp;

tx=[x(:)' zeros(1,2*length(wlp))];
ttx=fftfilt(wlp,tx);
x=ttx((1:nn)+flp);

return;

================================================
FILE: src/fractpitch2.m
================================================
function phs=fractpitch2(fftl)
%   Phase rotator for fractional pitch
%   This program produces 'phs' as the phase rotator.

%   by Hideki Kawahara
%   22/August/1996

amp=15;
t=((1:fftl)-fftl/2-1)/fftl*2;
phs=t+(1-exp(amp*t))./(1+exp(amp*t)) ...
    -(1+(1-exp(amp))/(1+exp(amp)))*t;
phs(1)=0;
phs=phs*pi;

================================================
FILE: src/gdmap.m
================================================
function gdm=gdmap(n3sgram,fs)
%   gdm=gdmap(n3sgram,fs)
%   function to calculate group delay map from
%   smoothed time frequency representation

%   Designed and coded by Hideki Kawahara
%   7/Sept./2003

[nn,mm]=size(n3sgram);
fftl=(nn-1)*2;
rbb2=fftl/2:-1:2;
gdm=zeros(nn,mm);
for ii=1:mm
    ff=[n3sgram(:,ii);n3sgram(rbb2,ii)];
    ccp=real(fft(log(ff)));
    ccp2=[ccp(1);2*ccp(2:fftl/2);0*ccp(fftl/2+1:fftl)];
    ffx=(-ifft(ccp2));
    gdt=-diff(imag(ffx)/(2*pi*fs/fftl));
    gdm(:,ii)=[gdt(1);gdt(1:fftl/2)];
end;

================================================
FILE: src/getvalufromedit.m
================================================
function y=getvalufromedit(co,defv)

ss=get(gco,'String');
y=str2num(ss);
if (length(y) <1) || (length(y)>1)
    y=defv;
end;

================================================
FILE: src/isOctave.m
================================================
function output = isOctave
v = ver;
output = strcmp('Octave', v(1).Name);
end

================================================
FILE: src/mktstr.m
================================================
function tstr=mktstr
%   return time string in h:m:s format (fields are not zero-padded)
%   by Hideki Kawahara
%   05/Jan./1995

anatime=fix(clock);
tstr=[num2str(anatime(4)) ':' num2str(anatime(5)) ':' num2str(anatime(6))];

================================================
FILE: src/multanalytFineCSPB.m
================================================
function pm=multanalytFineCSPB(x,fs,f0floor,nvc,nvo,mu,mlt,imgi)

%       Dual wavelet analysis using cardinal spline manipulation
%       pm=multanalytFineCSPB(x,fs,f0floor,nvc,nvo,mu,mlt,imgi)
%       Input parameters
%
%               x       : input signal (2kHz sampling rate is sufficient.)
%               fs      : sampling frequency (Hz)
%               f0floor : lower bound for pitch search (60Hz suggested)
%               nvc     : number of total voices for wavelet analysis
%               nvo     : number of voices in an octave
%               mu      : temporal stretch factor
%               mlt     : harmonic ID#
%               imgi    : display indicator, 1: display on (default), 0: display off
%       Output parameters
%               pm      : wavelet transform using iso-metric Gabor function
%
%       If you have any questions,  mailto:kawahara@hip.atr.co.jp
%
%       Copyright (c) ATR Human Information Processing Research Labs. 1996
%       Invented and coded by Hideki Kawahara
%       30/Oct./1996
%       07/Dec./2002 waitbar was added
%       10/Aug./2005 modified by Takahashi on waitbar
%       10/Sept./2005 modified by Kawahara on waitbar

if nargin==7; imgi=1; end;%10/Sept./2005
t0=1/f0floor;
lmx=round(6*t0*fs*mu);
wl=2^ceil(log(lmx)/log(2));
x=x(:)';
nx=length(x);
tx=[x,zeros(1,wl)];
gent=((1:wl)-wl/2)/fs;

pm=zeros(nvc,nx);
mpv=1;
if imgi==1; hpg=waitbar(0,['wavelet analysis for initial F0 ' ...
        'and P/N estimation with HM#:' num2str(mlt)]); end; % 07/Dec./2002 by H.K.%10/Aug./2005
for ii=1:nvc
    tb=gent*mpv;
    t=tb(abs(tb)<3.5*mu*t0);
    wd1=exp(-pi*(t/t0/mu).^2);
    wd2=max(0,1-abs(t/t0/mu));
    wd2=wd2(wd2>0);
    wwd=conv(wd2,wd1);
    wwd=wwd(abs(wwd)>0.00001);
    wbias=round((length(wwd)-1)/2);
    wwd=wwd.*exp(1i*2*pi*mlt*t(round((1:length(wwd))-wbias+length(t)/2))/t0);
    pmtmp1=fftfilt(wwd,tx);
    pm(ii,:)=pmtmp1(wbias+1:wbias+nx)*sqrt(mpv);
    mpv=mpv*(2.0^(1/nvo));
    if imgi==1; waitbar(ii/nvc); end; %,hpg);% 07/Dec./2002 by H.K.%10/Aug./2005
end;
if imgi==1; close(hpg); end; % 07/Dec./2002 by H.K.%10/Aug./2005

================================================
FILE: src/optimumsmoothing.m
================================================
function ovc=optimumsmoothing(eta,pc)
%   ovc=optimumsmoothing(eta,pc)
%   Calculate the optimum smoothing function
%   ovc : coefficients for 2nd order cardinal B-spline
%   eta : temporal stretch factor
%   pc  : power exponent for nonlinearity

%   05/July/2006 dirty patch. This routine has to be re-programmed.
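%
%   Example of usage (a hedged sketch, not taken from the original code;
%   eta=1.4 and pc=0.6 are assumed values chosen only for illustration):
%
%   ovc=optimumsmoothing(1.4,0.6); % returns a 4-by-1 coefficient vector
%
%   specreshape.m in this directory calls this function in the same way and
%   solves a 4-by-4 linear system with the result (bb=hh \ ovc).
%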
fx=-8:0.05:8; cb=max(0,1-abs(fx)); gw=exp(-pi*(fx*eta*1.4).^2).^pc; cmw=conv(cb,gw); bb=(1:length(cb)); bbc=bb+(length(cb)-1)/2; cmw=cmw(bbc)/max(cmw); ss=(abs(fx-round(fx))<0.025).*(1:length(cb)); ss=ss(ss>0); cmws=cmw(ss); nn=length(cmws); idv=1:nn; hh=zeros(2*nn,nn); for ii=1:nn hh((ii-1)+idv,ii)=cmws'; end; bv=zeros(2*nn,1); bv(nn+1)=1; % This is the original unit impulse. h=hh'*hh; ov = h \ (hh'*bv); idc=(nn-1)/2+2; ovc=ov(idc+(0:3)); ================================================ FILE: src/plotcpower.m ================================================ function [pw,pwh]=plotcpower(x,fs,shiftm) % 30/April/2005 modification for Matlab v7.0 compatibility flm=8; % originally; 01/August/1999 . fl=round(flm*fs/1000); w=hanning(2*fl+1); w=w/sum(w); nn=length(x); flpm=40; flp=round(flpm*fs/1000); wlp=fir1(flp*2,70/(fs/2)); wlp(flp+1)=wlp(flp+1)-1; wlp=-wlp; tx=[x(:)' zeros(1,2*length(wlp))]; ttx=fftfilt(wlp,tx); ttx=ttx((1:nn)+flp); tx=[ttx(:)' zeros(1,2*length(w))]; pw=fftfilt(w,tx.^2); pw=pw((1:nn)+fl); mpw=max(pw); pw=pw(round(1:shiftm*fs/1000:nn)); pw(pw3kHz) '); end ================================================ FILE: src/powerchk.m ================================================ function pow=powerchk(x,fs,segms) % Calculate average power of voiced portion % pow=powerchk(x,fs,segms) % x : signal % fs : sampling frequency (Hz) % segms : segment length (ms) % 23/Sept./1999 updated % 30/April/2005 modification for Matlab v7.0 compatibility x1=x(:); iv=(1:length(x1))'; x1(isnan(x1))=iv(isnan(x1))*0+0.0000000001; x2=x1.*x1; n=round(segms/1000*fs); % 23/Sept./1999 nw=ceil(length(x)/n); if rem(length(x),n)>0 x2=[x2;0.000001*randn(n*nw-length(x),1).^2]; % 23/Sept./1999 end; x2(x2==0)=x2(x2==0)+0.000001; pw=sum(reshape(x2,n,nw))/n; pow=10*log10(mean(pw(pw>(mean(pw)/30)))); ================================================ FILE: src/refineF06.m ================================================ function [f0r,ecr]=refineF06(x,fs,f0raw,fftl,eta,nhmx,shiftm,nl,nu,imgi) % F0 
estimation refinement
%   [f0r,ecr]=refineF06(x,fs,f0raw,fftl,eta,nhmx,shiftm,nl,nu,imgi)
%       x       : input waveform
%       fs      : sampling frequency (Hz)
%       f0raw   : F0 candidate (Hz)
%       fftl    : FFT length
%       eta     : temporal stretch factor
%       nhmx    : highest harmonic number
%       shiftm  : frame shift period (ms)
%       nl      : lower frame number
%       nu      : upper frame number
%       imgi    : display indicator, 1: display on (default), 0: off
%
%   Example of usage (with STRAIGHT)
%
%   global xold fs f0shiftm f0raw
%
%   dn=floor(fs/(800*3*2));
%   [f0raw,ecr]=refineF06(decimate(xold,dn),fs/dn,f0raw,512,1.1,3,f0shiftm,1,length(f0raw));

%   Designed and coded by Hideki Kawahara
%   28/July/1999
%   29/July/1999 test version using power weighting
%   30/July/1999 GcBs is added (bug fix)
%   07/August/1999 small bug fix
%   07/Dec./2002 waitbar was added
%   13/May/2005 minor vulnerability fix
%   10/Aug./2005 modified by Takahashi on waitbar
%   10/Sept./2005 modified by Kawahara on waitbar
%   16/Sept./2005 minor bug fix
%   26/Sept./2005 bug fix

if nargin==9; imgi=1; end;
f0i=f0raw(:);
f0i(f0i==0)=f0i(f0i==0)+160;
fax=(0:fftl-1)/fftl*fs;
nfr=length(f0i); % 07/August/1999

shiftl=shiftm/1000*fs;
x=[zeros(fftl,1); x(:) ; zeros(fftl,1)]';

tt=((1:fftl)-fftl/2)/fs;
th=(0:fftl-1)/fftl*2*pi;
rr=exp(-1i*th);

f0t=100;
w1=max(0,1-abs(tt'*f0t/eta));
w1=w1(w1>0);
wg=exp(-pi*(tt*f0t/eta).^2);
wgg=(wg(abs(wg)>0.0002));
wo=fftfilt(wgg,[w1; zeros(length(wgg),1)])';

xo=(0:length(wo)-1)/(length(wo)-1);
nlo=length(wo)-1;

if nl*nu <0
    nl=1;
    nu=nfr;
end;

bx=1:fftl/2+1;
pif=zeros(fftl/2+1,nfr);
dpif=zeros(fftl/2+1,nfr);
pwm=zeros(fftl/2+1,nfr);
rmsValue = std(x); % 26/Sept./2005 by HK
if imgi==1; hpg=waitbar(0,'F0 refinement using F0 adaptive analysis'); end; % 07/Dec./2002 by H.K.%10/Aug./2005
for kk=nl:nu
    if f0i(kk) < 40
        f0i(kk)=40;
    end;
    f0t=f0i(kk);
    xi=0:1/nlo*f0t/100:1;
    wa=interp1(xo,wo,xi,'*linear');
    wal=length(wa);
    bb=1:wal;
    bias=round(fftl-wal/2+(kk-1)*shiftl);
    if std(x(bb+bias))*std(x(bb+bias-1))*std(x(bb+bias+1)) == 0 % 26/Sept./2005 by HK
        x(bb+bias) =
randn(length(bb),1)*rmsValue/100000; end; dcl=mean(x(bb+bias)); ff0=fft((x(bb+bias-1)-dcl).*wa,fftl); ff1=fft((x(bb+bias)-dcl).*wa,fftl); ff2=fft((x(bb+bias+1)-dcl).*wa,fftl); fd=ff2.*rr-ff1; fd0=ff1.*rr-ff0; crf=fax+(real(ff1).*imag(fd)-imag(ff1).*real(fd))./(abs(ff1).^2)*fs/pi/2; crf0=fax+(real(ff0).*imag(fd0)-imag(ff0).*real(fd0))./(abs(ff0).^2)*fs/pi/2; pif(:,kk)=crf(bx)'*2*pi; dpif(:,kk)=(crf(bx)-crf0(bx))'*2*pi; pwm(:,kk)=abs(ff1(bx)'); % 29/July/1999 if imgi==1; waitbar((kk-nl)/(nu-nl)); end; % ,hpg) % 07/Dec./2002 by H.K.%10/Aug./2005 end; if imgi==1; close(hpg); end; slp=([pif(2:fftl/2+1,:);pif(fftl/2+1,:)]-pif)/(fs/fftl*2*pi); dslp=([dpif(2:fftl/2+1,:);dpif(fftl/2+1,:)]-dpif)/(fs/fftl*2*pi)*fs; mmp=slp*0; [c1,c2]=znrmlcf2(shiftm); fxx=((0:fftl/2)+0.5)/fftl*fs*2*pi; %--- calculation of relative noise level if imgi==1; hpg=waitbar(0,'P/N calculation'); end; % 07/Dec./2002 by H.K.%10/Aug./2005 for ii=1:fftl/2+1; c2=c2*(fxx(ii)/2/pi)^2; mmp(ii,:)=(dslp(ii,:)/sqrt(c2)).^2+(slp(ii,:)/sqrt(c1)).^2; if imgi==1 && rem(ii,10)==0;waitbar(ii/(fftl/2+1));end; % 07/Dec./2002 by H.K.%10/Aug./2005 end; if imgi==1; close(hpg); end; % 07/Dec./2002 by H.K.%10/Aug./2005 %--- Temporal smoothing sml=round(1.5*fs/1000/2/shiftm)*2+1; % 3 ms, and odd number smb=round((sml-1)/2); % bias due to filtering if imgi==1; hpg=waitbar(0,'P/N smoothing'); end; % 07/Dec./2002 by H.K.%10/Aug./2005 %This smoothing is modified (30 Nov. 2000). smmp=fftfilt((hanning(sml).^2)/sum((hanning(sml).^2)), ... 
[mmp zeros(fftl/2+1,sml*2)]'+max(max(mmp((~isnan(mmp))&(mmp0); ecr=sqrt(1.0./vvvf).*(f0raw(:)'>0)+(f0raw(:)'<=0); if imgi==1; close(hpg); end;%10/Aug./2005 %keyboard; %-------------------- function [c1,c2]=znrmlcf2(f) n=100; x=0:1/n:3; g=GcBs(x,0); dg=[diff(g) 0]*n; dgs=dg/2/pi/f; xx=2*pi*f*x; c1=sum((xx.*dgs).^2)/n; c2=sum((xx.^2.*dgs).^2)/n; %--------------------- function p=GcBs(x,k) tt=x+0.0000001; p=tt.^k.*exp(-pi*tt.^2).*(sin(pi*tt+0.0001)./(pi*tt+0.0001)).^2; ================================================ FILE: src/regressionTestBaseGenerator.m ================================================ %% Regression test data generator % This program should be executed at the very beginning of refactoring a % major revision. This is for making legacy STRAIGHT v40_007d to be % compatible with MATLAB R2015b and Octave % Copyright(c) 2016, Hideki Kawahara, (kawahara@sys.wakayama-u.ac.jp) clear all close all original_speech_dir = '~/Music/VCTK_CORPUS/VCTK-Corpus/wav48/'; target_analysis_dir = '~/m-file/STRAIGHTV40_007e/analysisData/'; target_wave_dir = '~/m-file/STRAIGHTV40_007e/waveData/'; mkdir(target_analysis_dir); mkdir(target_wave_dir); dir_list = dir([original_speech_dir 'p*']); %% n_dirs = length(dir_list); n_files = 0; for ii = 1:n_dirs tmp_files = dir([original_speech_dir dir_list(ii).name '/*.wav']); n_files = n_files + length(tmp_files); end; %% n_test = 2; % number of files tested for each speaker l_segment = 0.1; % 100 ms segment for ii = 1:n_dirs rng(12345); % initialize frozen random number seq_id = 0; basic_stat_table = zeros(n_files, 4); tmp_files = dir([original_speech_dir dir_list(ii).name '/*.wav']); for jj = 1:length(tmp_files) [x, fs] = audioread([original_speech_dir dir_list(ii).name '/' ... tmp_files(jj).name ]); seq_id = seq_id + 1; l_in_sample_segment = min(length(x), round(fs * l_segment)); n_segment = floor(length(x) / l_in_sample_segment); rms_level = zeros(n_segment, 1); for kk = 1:n_segment rms_level(kk) = 20 * ... 
log10(std(x((kk - 1) * l_in_sample_segment + ... (1:l_in_sample_segment)))); end; sorted_level = sort(rms_level); basic_stat_table(seq_id, 1) = length(x) / fs; basic_stat_table(seq_id, 2) = max(rms_level) - min(rms_level); basic_stat_table(seq_id, 3) = ... sorted_level(round(length(sorted_level) * 0.85)); basic_stat_table(seq_id, 4) = max(abs(x)); end; basic_stat_table = basic_stat_table(1:seq_id, :); % select safe region sorted_length = sort(basic_stat_table(:, 1)); sorted_dynamic_range = sort(basic_stat_table(:, 2)); sorted_85percent = sort(basic_stat_table(:, 3)); l_10 = sorted_length(round(seq_id * 0.1)); l_90 = sorted_length(round(seq_id * 0.9)); d_10 = sorted_dynamic_range(round(seq_id * 0.1)); d_90 = sorted_dynamic_range(round(seq_id * 0.9)); v_10 = sorted_85percent(round(seq_id * 0.1)); v_90 = sorted_85percent(round(seq_id * 0.9)); index_list = 1:seq_id; safe_index = index_list( ... l_10 < basic_stat_table(:, 1) & l_90 > basic_stat_table(:, 1) & ... d_10 < basic_stat_table(:, 2) & d_90 > basic_stat_table(:, 2) & ... v_10 < basic_stat_table(:, 3) & v_90 > basic_stat_table(:, 3) & ... basic_stat_table(:, 4) < 0.95); selection_index = 1:length(safe_index); [~, tmp_index] = sort(rand(n_test, 1)); selection_index = selection_index(tmp_index(1:n_test)); for kk = 1:n_test id = safe_index(selection_index(kk)); [x, fs] = audioread([original_speech_dir dir_list(ii).name '/' ... tmp_files(id).name ]); rng(12345); % initialize frozen random number f0raw = MulticueF0v14(x,fs); ap = exstraightAPind(x,fs,f0raw); n3sgram=exstraightspec(x,f0raw,fs); rng(12345); % initialize frozen random number y = exstraightsynth(f0raw,n3sgram,ap,fs); disp([num2str(kk) ': ' tmp_files(id).name ' at:' datestr(now)]); audiowrite([target_wave_dir '/' tmp_files(id).name], ... 
y / max(abs(y)) * 0.9, fs); path_name_f0 = [target_analysis_dir '/' tmp_files(id).name(1:end-4) 'f0.bin']; path_name_ap = [target_analysis_dir '/' tmp_files(id).name(1:end-4) 'ap.bin']; path_name_sp = [target_analysis_dir '/' tmp_files(id).name(1:end-4) 'sp.bin']; WriteBinaryData(path_name_f0, f0raw) WriteBinaryData(path_name_ap, ap) WriteBinaryData(path_name_sp, n3sgram) end; end; ================================================ FILE: src/regressionTestBaseGeneratorR.m ================================================ %% Regression test data generator % This program should be executed at the very beginning of refactoring a % major revision. This is for making legacy STRAIGHT v40_007d to be % compatible with MATLAB R2015b and Octave % Revised for initialization % Copyright(c) 2016, Hideki Kawahara, (kawahara@sys.wakayama-u.ac.jp) clear all close all original_speech_dir = '/Users/kawahara/Music/VCTK_CORPUS/VCTK-Corpus/wav48/'; if isOctave target_analysis_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/analysisDataO/'; target_wave_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/waveDataO/'; else target_analysis_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/analysisDataR/'; target_wave_dir = '/Users/kawahara/m-file/STRAIGHTV40_007e/waveDataR/'; end; mkdir(target_analysis_dir); mkdir(target_wave_dir); dir_list = dir([original_speech_dir 'p*']); %% n_dirs = length(dir_list); n_files = 0; for ii = 1:n_dirs tmp_files = dir([original_speech_dir dir_list(ii).name '/*.wav']); n_files = n_files + length(tmp_files); end; %% n_test = 2; % number of files tested for each speaker l_segment = 0.1; % 100 ms segment command1 = 'rand("seed", 12345);'; % for Octave command2 = 'randn("seed", 12345);';% for Octave for ii = 1:n_dirs if isOctave eval(command1); eval(command2); else rng(12345); % initialize frozen random number end; seq_id = 0; basic_stat_table = zeros(n_files, 4); tmp_files = dir([original_speech_dir dir_list(ii).name '/*.wav']); for jj = 1:length(tmp_files) [x, fs] = 
audioread([original_speech_dir dir_list(ii).name '/' ... tmp_files(jj).name ]); seq_id = seq_id + 1; l_in_sample_segment = min(length(x), round(fs * l_segment)); n_segment = floor(length(x) / l_in_sample_segment); rms_level = zeros(n_segment, 1); for kk = 1:n_segment rms_level(kk) = 20 * ... log10(std(x((kk - 1) * l_in_sample_segment + ... (1:l_in_sample_segment)))); end; sorted_level = sort(rms_level); basic_stat_table(seq_id, 1) = length(x) / fs; basic_stat_table(seq_id, 2) = max(rms_level) - min(rms_level); basic_stat_table(seq_id, 3) = ... sorted_level(round(length(sorted_level) * 0.85)); basic_stat_table(seq_id, 4) = max(abs(x)); end; basic_stat_table = basic_stat_table(1:seq_id, :); % select safe region sorted_length = sort(basic_stat_table(:, 1)); sorted_dynamic_range = sort(basic_stat_table(:, 2)); sorted_85percent = sort(basic_stat_table(:, 3)); l_10 = sorted_length(round(seq_id * 0.1)); l_90 = sorted_length(round(seq_id * 0.9)); d_10 = sorted_dynamic_range(round(seq_id * 0.1)); d_90 = sorted_dynamic_range(round(seq_id * 0.9)); v_10 = sorted_85percent(round(seq_id * 0.1)); v_90 = sorted_85percent(round(seq_id * 0.9)); index_list = 1:seq_id; safe_index = index_list( ... l_10 < basic_stat_table(:, 1) & l_90 > basic_stat_table(:, 1) & ... d_10 < basic_stat_table(:, 2) & d_90 > basic_stat_table(:, 2) & ... v_10 < basic_stat_table(:, 3) & v_90 > basic_stat_table(:, 3) & ... basic_stat_table(:, 4) < 0.95); selection_index = 1:length(safe_index); [~, tmp_index] = sort(rand(n_test, 1)); selection_index = selection_index(tmp_index(1:n_test)); for kk = 1:n_test id = safe_index(selection_index(kk)); [x, fs] = audioread([original_speech_dir dir_list(ii).name '/' ... 
tmp_files(id).name ]); if isOctave eval(command1); eval(command2); else rng(12345); % initialize frozen random number end; f0raw = MulticueF0v14(x,fs); ap = exstraightAPind(x,fs,f0raw); n3sgram=exstraightspec(x,f0raw,fs); if isOctave eval(command1); eval(command2); else rng(12345); % initialize frozen random number end; y = exstraightsynth(f0raw,n3sgram,ap,fs); disp([num2str(kk) ': ' tmp_files(id).name ' at:' datestr(now)]); audiowrite([target_wave_dir '/' tmp_files(id).name], ... y / max(abs(y)) * 0.9, fs); path_name_f0 = [target_analysis_dir '/' tmp_files(id).name(1:end-4) 'f0.bin']; path_name_ap = [target_analysis_dir '/' tmp_files(id).name(1:end-4) 'ap.bin']; path_name_sp = [target_analysis_dir '/' tmp_files(id).name(1:end-4) 'sp.bin']; WriteBinaryData(path_name_f0, f0raw) WriteBinaryData(path_name_ap, ap) WriteBinaryData(path_name_sp, n3sgram) end; end; ================================================ FILE: src/smax.m ================================================ function y=smax(x,a,b) y0=1.0/(1+exp(-a*(0-b))); y1=1.0/(1+exp(-a*(1-b))); y=(1.0./(1+exp(-a*(x-b)))-y0)/(y1-y0); ================================================ FILE: src/specreshape.m ================================================ function n2sgram3=specreshape(fs,n2sgram,eta,pc,mag,f0,imgi) % Spectral compensation using Time Domain technique % n2sgram3=specreshape(fs,n2sgram,eta,pc,mag,f0); % fs : sampling frequency (Hz) % n2sgram : Straight smoothed spectrogram (optimum smoother is assumed) % eta : temporal stretch factor % pc : power exponent for nonlinearity % mag : magnification factor of Time Domain compensation % f0 : fundamental frequency (Hz) % imgi : display indicator, 1: display on (default), 0: off % coded by Hideki Kawahara % 13/Aug./1997 % 08/Dec./2002 % Note: This part may be redundant. It is better to % evaluate contribution of this part again. 
(08/Dec./2002) % 10/Aug./2005 modified by Takahashi on waitbar % 10/Sept./2005 modified by Kawahara on waitbar if nargin==6; imgi=1; end;%10/Sept./2005 [nn,mm]=size(n2sgram); fftl=(nn-1)*2; fbb=1:nn; rbb=(nn-1:-1:2); rbb2=(fftl:-1:nn+1); bb3=(2:nn-1); n2sgram3=n2sgram*0; ovc=optimumsmoothing(eta,pc); hh=[1 1 1 1; 0 1/2 2/3 3/4; 0 0 1/3 2/4; 0 0 0 1/4]; %%bb=inv(hh)*ovc; bb=hh \ ovc; tt=((0:fftl-1))'/fs; pb2=(pi/(eta^2)+(pi^2)/3*(bb(1)+4*bb(2)+9*bb(3)+16*bb(4)))*tt.^2; if imgi==1; hpg=waitbar(0,'time domain spectral compensation of windowing effects'); end; % 08/Dec./2002%10/Aug./2005 for ii=1:mm ffs=[n2sgram(:,ii);n2sgram(rbb,ii)]; ccs2=real(fft(ffs)).*min(20,(1+mag*pb2*f0(ii)^2)); ccs2(rbb2)=ccs2(bb3); ngg=real(ifft(ccs2)); n2sgram3(:,ii)=ngg(fbb); if imgi==1 && rem(ii,20)==0;%10/Aug./2005 waitbar(ii/mm);% 08/Dec./2002 end; end; if imgi==1; close(hpg); end; % 08/Dec./2002%10/Aug./2005 %n2sgram3=(abs(n2sgram3)+n2sgram3)/2; n2sgram3=(abs(n2sgram3)+n2sgram3)/2+0.1; ================================================ FILE: src/straight.m ================================================ % Starter command for GUI-STRAIGHT straightCIv1 GUIinitialize; ================================================ FILE: src/straightBodyC03ma.m ================================================ function [n2sgram,nsgram]=straightBodyC03ma(x,fs,shiftm,fftl,f0raw,f0var,f0varL,eta,pc,imgi) % [n2sgram,nsgram]=straightBodyC03ma(x,fs,shiftm,fftl,f0raw,f0var,f0varL,eta,pc,imgi) % n2sgram : smoothed spectrogram % nsgram : isometric spectrogram % x : input waveform % fs : sampling frequency (Hz) % shiftm : frame shift (ms) % fftl : length of FFT % f0raw : Pitch information to guide analysis (TEMPO) assumed % f0var : expected f0 variance including zerocross information % f0varL : expected f0 variance % eta : time window stretch factor % pc : exponent for nonlinearity % imgi : display indicator 1: display on (default), 0: off % f0shiftm : frame shift (ms) for F0 analysis % STRAIGHT body: Interpolation using adaptive Gaussian weighting % and 2-dimensional 
Bartlett window % by Hideki Kawahara % 02/July/1996 % 07/July/1996 % 07/Sep./1996 % 09/Sep./1996 guiding F0 information can be coarse % 14/Oct./1996 correction for over smoothing % 19/Oct./1996 Alternating Gaussian Correction % 01/Nov./1996 Temporal integration using Fluency theory (didn't work) % 03/Nov./1996 Temporal integration using Fluency theory % 25/Dec./1996 Quasi optimum smoothing % 01/Feb./1997 Minimum variance analysis % 03/Feb./1997 Clean up % 08/Feb./1997 Fine tuning for onset enhancement % 13/Feb./1997 another fine temporal structure % 16/Feb./1997 better alternating Gaussian % 21/Feb./1997 no need for temporal interpolation! % 19/June/1997 Control of Analysis Parameters % 21/July/1997 Discard of optimum comp. and introduction TD compensation % 11/Aug./1997 Re-installation of temporal smoothing % 08/Feb./1998 debug and speed up using closed form % 22/April/1999 Compatible with new F0 extraction routine % 31/March/2002 modified for ICSLP2002 % 03/Feb./2003 Bug fix in the modification on 31/March/2002 % 10/Aug./2005 modified by Takahashi on waitbar % 10/Sept./2005 modified by Kawahara on waitbar % 05/Oct./2005 bug fix on smoothing (both in time and frequency) % 04/July/2006 bug fix on compensatory time window if nargin==9; imgi=1; end; % 10/Sept./2005 f0l=f0raw(:) + 0 * f0var + 0 * f0varL; % + 0 * f0var + 0 * f0varL are dummy framem=80; framel=round(framem*fs/1000); if fftl0),[wGaussian zeros(1,length(tt))]); wPSGSeed = wPSGSeed/max(wPSGSeed); [~,maxLocation] = max(wPSGSeed); tNominal = ((1:length(wPSGSeed))-maxLocation)/fs; %---- end of bug fix ttm=[0.00001 1:fftl/2 -fftl/2+1:-1]/fs; lft=1.0./(1+exp(-(abs((1:fftl)-fftl/2-1)-fftl/30)/2)); % safeguard 05/Oct./2005 by HK if imgi==1; hpg=waitbar(0,'F0 adaptive time-frequency analysis.'); end;% 10/Aug./2005 for ii=1:nframe if imgi==1 && rem(ii,10)==0 % 10/Aug./2005 waitbar(ii/nframe); end; f0=f0l(max(1,ii)); if f0==0 f0=160; % 09/Sept./1999 end; f0x(ii)=f0; t0=1/f0; %wxe = 
interp1q(tNominal',wPSGSeed',tt'*f0/fNominal)'; %bug fix 04/July/2006 wxe = interp1(tNominal',wPSGSeed',tt'*f0/fNominal,'linear','extrap')'; wxe(isnan(wxe))=zeros(size(wxe(isnan(wxe)))); wxe=wxe/sqrt(sum(wxe.^2)); wxd=bcf*wxe.*sin(pi*tt/t0); iix=round(ist:ist+framel-1); pw=sqrt(abs(fft((tx(iix)-mean(tx(iix))).*wxe,fftl)).^2+ ... abs(fft((tx(iix)-mean(tx(iix))).*wxd,fftl)).^2).^pc; nsgram(:,ii)=pw(bbase)'; f0p2=floor((f0/fs*fftl)/2+1); % modified by H.K. on 3/Feb./2003 f0p=ceil((f0/fs*fftl)+1); % modified by H.K. on 3/Feb./2003 f0pr=f0/fs*fftl+1; % added by H.K. on 3/Feb./2003 tmppw=interp1(1:f0p,pw(1:f0p),f0pr-((1:f0p2)-1)); % added by H.K. on 3/Feb./2003 pw(1:f0p2)=tmppw; % modified by H.K. on 3/Feb./2003 pw(fftl:-1:fftl-f0p2+2)=pw(2:f0p2); % local level equalization ww2t=(sin(ttm/(t0/3)*pi)./(ttm/(t0/3)*pi)).^2; spw2=real(ifft(ww2t.*fft(pw).*lft)); spw2(spw2==0)=spw2(spw2==0)+eps; %%% safe guard added on 15/Jan./2003 % Optimum weighting wwt=(sin(ttm/t0*pi)./(ttm/t0*pi)).^2.*(ovc(1)+ovc(2)*2*cos(ttm/t0*2*pi) ... 
+ovc(3)*2*cos(ttm/(t0/2)*2*pi)); spw=real(ifft(wwt.*fft(pw./spw2)))/wwt(1); % smooth half wave rectification n2sgram(:,ii) = (spw2(bbase).*(0.25*(log(2*cosh(spw(bbase)*4/1.4))*1.4+spw(bbase)*4)/2))'; ist=ist+shiftl; end; if imgi==1; close(hpg); end; % added 06/Dec./2002% 10/Aug./2005 if imgi==1; fprintf('\n'); end;% 10/Aug./2005 nsgram=nsgram.^(1/pc); n2sgram=n2sgram.^(2/pc); %----------------------------------------------------- % Dirty hack for controlling time constant in % unvoiced part analysis %----------------------------------------------------- if imgi==1; hpg=waitbar(0,'spline-based F0 adaptive smoothing'); end;% 10/Aug./2005 ttlv=sum(sum(n2sgram)); ncw=round(2*fs/1000); lbb=round(300/fs*fftl); % 22/Sept./1999 h3=(conv(hanning(round(fs/1000)),exp(-1400/fs*(0:ncw*2)))); % 30/July/1999 pwc=fftfilt(h3,abs([xh2, zeros(1,ncw*10)]).^2); % 30/July/1999, % 08/Sept./1999 if imgi==1; waitbar(0.1); end; % 08/Dec./2002% 10/Aug./2005 pwc=pwc(round(1:fs/(1000/shiftm):length(pwc))); [nn,mm]=size(n2sgram); pwc=pwc(1:mm); pwc=pwc/sum(pwc)*sum(sum(n2sgram(lbb:nn,:))); if imgi==1; waitbar(0.2); end; % 08/Dec./2002% 10/Aug./2005 pwch=fftfilt(h3,abs([xhh, zeros(1,ncw*10)]).^2);% 30/July/1999 if imgi==1; waitbar(0.3); end; % 08/Dec./2002% 10/Aug./2005 pwch=pwch(round(1:fs/(1000/shiftm):length(pwch))); [~,mm]=size(n2sgram); pwch=pwch(1:mm); pwch=pwch/sum(pwch)*ttlv; ipwm=7; % impact detection window size ipl=round(ipwm/shiftm); ww=hanning(ipl*2+1); ww=ww/sum(ww); apwt=fftfilt(ww,[pwch(:)' zeros(1,length(ww)*2)]); apwt=apwt((1:length(pwch))+ipl); dpwt=fftfilt(ww,[diff(pwch(:)').^2 zeros(1,length(ww)*2)]); dpwt=dpwt((1:length(pwch))+ipl); mmaa=max(apwt); apwt(apwt<=0)=apwt(apwt<=0)*0+mmaa; % bug fix 03/Sept./1999 rr=(sqrt(dpwt)./apwt); lmbd=(1.0./(1+exp(-(sqrt(rr)-0.75)*20))); pwc=pwc.*lmbd+(1-lmbd).*sum(n2sgram); % time constant controller % Shaping amplitude envelope for ii=1:mm if f0raw(ii)==0 n2sgram(:,ii)=pwc(ii)*n2sgram(:,ii)/sum(n2sgram(:,ii)); end; if imgi==1 && 
rem(ii,10)==0% 10/Aug./2005 waitbar(0.4+0.5*ii/mm); % 08/Dec./2002 end; end; n2sgram=abs(n2sgram+0.0000000001); n2sgram=sqrt(n2sgram); if imgi==1; waitbar(1); end; % 08/Dec./2002% 10/Aug./2005 if imgi==1; fprintf('\n'); end; if imgi==1; close(hpg); end; ================================================ FILE: src/straightCIv1.m ================================================ function oki = straightCIv1(actionstr) % STRAIGHT command interpreter % STRAIGHT --- Speech Transformation and Representation using % Adaptive Interpolation of weiGHTed spectrum % Interactive interface for trial use % % (c) Copyright ATR Human Information Processing Research % Laboratories, 1996,1997 % Author and Inventor: Hideki Kawahara % 02/July/1996 % 04/July/1996 % 07/July/1996 % 07/Sep./1996 % 01/Nov./1996 % 25/Nov./1996 TEMPO enhancement % 30/Nov./1996 This version is with AG window trick % 25/Dec./1996 This version is with AG window trick % 27/Dec./1996 using fundamentalness as U/V measure % 02/Feb./1997 without V/UV decision % 05/Feb./1997 Bug Fix % 10/Feb./1997 Maximum likelihood F0 tracking % 15/Feb./1997 frequency range division % 15/Feb./1997 Big bug fix in spectral calculation % 22/Feb./1997 No need for temporal smoothing (thanks for the new window) % 11/Mar./1997 New wavelet for F0 extraction % 03/May /1997 Quick bug fix for Matlab v5 % 30/May /1997 Quick bug fix for Matlab v5 % 21/June/1997 Quick bug fix for new fundamentalness % 12/Aug./1997 Revised optimum smoother (minor revision) % 13/Aug./1997 Revised SPIKES scaling property and TD enhancement % 19/Dec./1997 Revised for mixed mode excitation % 09/Jan./1998 GUI command line interpreter % 29/Jan./1998 Integrated with mixed mode and GUI % 02/Feb./1998 Minor inconvenience fix % 03/Feb./1998 Minor bug fix % 13/April/1999 New pitch extractor % 19/April/1999 Minor modification to display information % 22/April/1999 Minor modification for unvoiced spectrum % 05/May /1999 New periodic/aperiodic mixing ratio % 21/July/1999 reduced 
version with crisp V/UV decision % 17/Feb./2001 Bug fix for multi channel sound files % 30/May/2001 New aperiodicity measure and control % 8/April/2002 modified to remove magic numbers % 26/June/2002 minor modification: two globals are added. % 08/Dec./2002 Default parameters were made user definable. % 04/Feb./2003 Aperiodicity correction and bug fix in STRAIGHTbody proc. % 31/Aug./2004 Bug fix for f0 abnormal values % 30/April/2005 modification for Matlab v7.0 compatibility global n2sgram nsgram n3sgram n2sgrambk nwsgram xold x f0floor f0ceil fs framem shiftm f0shiftm ... fftl eta pc framel fftl2 acth pwth pcnv fconv sconv delsp gdbw cornf fname ofname delfracind ... tpath cpath ... %%paraminitialized mag delfrac hr f0raw f0l f0var f0varL sy pcorr pecorr ... %%gobjlist ... upsampleon hhb pwt pwh amp defaultendian indefaultendian outdefaultendian f0varbak global maphandles global bv % 02/Sept./1999 global apv dpv % 21/Sept./1999 global defaultch % 16/Feb./2001 global apve dpve % 22/Oct./2001 global f0v vrv % 25/June/2002 for debug global ecrt % 03/Feb./2003 for C/N based correction of dpv oki = true; switch(actionstr) %----------------------------------------------- % Initialization part %----------------------------------------------- case 'GUIinitialize' clear global straightCIv1 initializeparams; straightPanel98bak; syncgui; straightCIv1 syncbuttons; case 'resetparamsbtn' straightCIv1 initializeparams; syncgui; case 'initialize' clear global n2sgram nsgram n3sgram n2sgrambk nwsgram xold x f0raw f0l sy hhb straightCIv1 initializeparams; syncgui; straightCIv1 syncbuttons; case 'initializeparams' if exist('defaultparams.m','file')==0 % 08/Dec./2002 f0floor=40; f0ceil=800; fs=22050; % sampling frequency (Hz) framem=40; % default frame length for pitch extraction (ms) shiftm=1; % default frame shift (ms) for spectrogram f0shiftm=1; % default frame shift (ms) for F0 information fftl=1024; % default FFT length eta=1.4; % time window stretch factor pc=0.6; % exponent 
for nonlinearity mag=0.2; % This parameter should be revised. framel=framem*fs/1000; if fftl < framel fftl=2^ceil(log(framel)/log(2)); end; fftl2=fftl/2; defaultch=1; % 17/Feb./2001 %-------------- Decision parameter for source information acth=0.5; % Threshold for normalized correlation (dimension less) pwth=32; % Threshold for instantaneous power below maximum (dB) %----------------------------------------------------- % Synthesis parameters %----------------------------------------------------- pcnv=1.0; % pitch stretch fconv=1.0; % frequency stretch sconv=1.0; % time stretch % delsp=2; % standard deviation of random group delay in ms delsp=0.5; % standard deviation of random group delay in ms 26/June/2002 gdbw=70; % smoothing window length of random group delay (in Hz) % cornf=3000; % corner frequency for random phase (Hz) cornf=4000; % corner frequency for random phase (Hz) 26/June 2002 delfrac=0.2; % This parameter should be revised. delfracind=0; %----------------------------------------------------- % file parameters %----------------------------------------------------- fname='none'; % input data file name hr='on'; tpath=pwd; if strcmp(computer,'MAC2')==0 tpath=[tpath '/']; end; upsampleon=0; else % 08/Dec./2002 defaultparams; end; % of if exist('defaultparams.m','file')==0 % 08/Dec./2002 defaultendian=chkdefaultendian; indefaultendian=defaultendian; outdefaultendian=defaultendian; case 'resetvalues' straightCIv1 initializeparams syncgui; %----------------------------------------------------- % file I/O part %----------------------------------------------------- case 'bininputformat' hh=findobj('Tag','bininputformat'); indefaultendian=get(hh,'Value'); case 'binoutputformat' hh=findobj('Tag','binoutputformat'); outdefaultendian=get(hh,'Value'); case 'readfile' [fname,cpath]=uigetfile(... {'*.wav';'*.aif';'*.WAV';'*.aiff';'*.dat';'*.dat'},... 
'sound file input'); if fname(1)~=0 tcpath=[char(39) cpath char(39)]; eval(['cd ' tcpath]); if ~isempty(strfind(lower(fname),'.wav')) % 16/Feb./2001 [x,fs]=audioread(fname); x=x*32768; elseif ~isempty(strfind(lower(fname),'.aif')) % 16/Feb./2001 [x,fs]=aiffread(fname); else if indefaultendian==1 fid=fopen(fname,'r','ieee-le'); else fid=fopen(fname,'r','ieee-be'); end; x=fread(fid,'short')'; fclose(fid); end; [tnn,tmm]=size(x); % 16/Feb./2001 if min(tnn,tmm)>1 % 16/Feb./2001 switch tnn>tmm case 1, x=x(:,defaultch); case 0, x=x(defaultch,:); end; end; x=x(:)'; % make sure that the vector is row vector xold=x; x=xold+std(x)/1000*randn(size(x)); % 03/Feb./2001 else disp('file input is cancelled. '); disp(' '); end; syncgui; straightCIv1 syncbuttons; case 'savefile' tcpath=[char(39) tpath char(39)]; eval(['cd ' tcpath]); tsy=sy; tfs=fs; if upsampleon switch fs case {8000, 10000, 11025, 12000} tfs=fs*4; tsy=interp(sy,4); case {16000, 20000, 22050, 24000} tfs=fs*2; tsy=interp(sy,2); end; end; [ofname,tpath]=uiputfile('*','sound file output'); if ofname(1)~=0 if ~isempty(strfind(ofname,'.wav')) audiowrite([tpath ofname], tsy/32768,tfs); elseif ~isempty(strfind(ofname,'.aif')) ok=aiffwrite(tsy,tfs,16,ofname); if isempty(ok) disp(['File output is failed. ' ofname ' was not written.']); end; else if outdefaultendian==1 fid2=fopen([tpath ofname],'w','ieee-le'); else fid2=fopen([tpath ofname],'w','ieee-be'); end; fwrite(fid2,tsy,'short'); fclose(fid2); end; disp(['% Saved successfully as: ' ofname]); disp(' '); else disp('file output is cancelled. '); disp(' '); end; %----------------------------------------------------- % Display spectrogram group %----------------------------------------------------- case 'dispnsgram' mxil=max(max(20*log10(nsgram+0.001))); [nsy,nsx]=size(nsgram); figure; imagesc((0:nsx-1)*shiftm,(0:nsy-1)/nsy*fs/2, ... 
max(20*log10(nsgram+0.001),mxil-50)); axis('xy'); colormap(1-gray); title(['Equal resolution spectrum of ' fname ' ' date ' ' mktstr]); xlabel('time (ms)'); ylabel('frequency (Hz)'); case 'dispnwsgram' mxil=max(max(20*log10(nwsgram+0.001))); [nsy,nsx]=size(nwsgram); figure; imagesc((0:nsx-1)*shiftm,(0:nsy-1)/nsy*fs/2, ... max(20*log10(nwsgram+0.001),mxil-50)); axis('xy'); colormap(1-gray); title(['Wide band spectrum of ' fname ' ' date ' ' mktstr]); xlabel('time (ms)'); ylabel('frequency (Hz)'); case 'dispn2sgram' mxil=max(max(20*log10(n2sgram+0.001))); [nsy,nsx]=size(n2sgram); figure; imagesc((0:nsx-1)*shiftm,(0:nsy-1)/nsy*fs/2,... max(real(20*log10(n2sgram+0.001)),mxil-50)); axis('xy'); colormap(1-gray); title(['enhanced spectrum of ' fname ' ' date ' ' mktstr]); xlabel('time (ms)'); ylabel('frequency (Hz)'); case 'dispn2sgrambk' mxil=max(max(20*log10(n2sgrambk+0.001))); [nsy,nsx]=size(n2sgrambk); figure; imagesc((0:nsx-1)*shiftm,(0:nsy-1)/nsy*fs/2,... max(real(20*log10(n2sgrambk+0.001)),mxil-50)); axis('xy'); colormap(1-gray); title(['Interpolated spectrum of ' fname ' ' date ' ' mktstr]); xlabel('time (ms)'); ylabel('frequency (Hz)'); case 'dispn3sgram' mxil=max(max(20*log10(n3sgram+0.001))); [nsy,nsx]=size(n3sgram); figure; imagesc((0:nsx-1)*shiftm,(0:nsy-1)/nsy*fs/2,... max(real(20*log10(n3sgram+0.001)),mxil-50)); axis('xy'); colormap(1-gray); title(['enhanced spectrum (without 2nd strctr) of ' fname ' ' date ' ' mktstr]); xlabel('time (ms)'); ylabel('frequency (Hz)'); case 'disphhbspectrograme' bb=1:length(n2sgram(1,:)); mmx=hhb(:,bb).*n3sgram+(1-hhb(:,bb)).*nwsgram; mxil=max(max(20*log10(mmx+0.001))); [nsy,nsx]=size(mmx); figure; imagesc((0:nsx-1)*shiftm,(0:nsy-1)/nsy*fs/2,... 
max(real(20*log10(mmx+0.001)),mxil-50)); axis('xy'); colormap(1-gray); title(['final composite spectrum of ' fname ' ' date ' ' mktstr]); xlabel('time (ms)'); ylabel('frequency (Hz)'); case 'showF0' figure(gcf+1); subplot(111);plot((1:length(f0l))*f0shiftm,f0l); grid on; title(['F0 of ' fname ' ' date ' ' mktstr]); ylabel('frequency (Hz)'); xlabel('time (ms)'); %----------------------------------------------- % audio display part %----------------------------------------------- case 'playoriginal' straightsound(xold,fs); case 'playsynth' straightsound(sy,fs); %----------------------------------------------- % parameter modification part %----------------------------------------------- case 'peekvars' % This is the most powerful interaction keyboard; syncgui; straightCIv1 syncbuttons; case 'getfsmenu' fs=getfsfrommenu(gco); syncgui; case 'editf0ceil' f0ceil=getvalufromedit(gco,800); syncgui; case 'editf0floor' f0floor=getvalufromedit(gco,40); syncgui; case 'editshiftm' f0shiftm=getvalufromedit(gco,1); shiftm=f0shiftm; syncgui; case 'fftledit' syncgui; case 'wndwstrtchedit' eta=getvalufromedit(gco,1.4); syncgui; case 'pwrcnstntedit' pc=getvalufromedit(gco,0.6); syncgui; case 'magfactoredit' mag=getvalufromedit(gco,0.2); syncgui; case 'delfracedit' delfrac=getvalufromedit(gco,0.2); syncgui; case 'delspedit' delsp=getvalufromedit(gco,2); syncgui; case 'cornfedit' cornf=getvalufromedit(gco,2400); syncgui; case 'gdbwedit' gdbw=getvalufromedit(gco,70); syncgui; case 'pcnvedit' pcnv=getvalufromedit(gco,1); syncgui; case 'fconvedit' fconv=getvalufromedit(gco,1); syncgui; case 'sconvedit' sconv=getvalufromedit(gco,1); syncgui; case 'tpathedit' tpath=get(gco,'Value'); syncgui; case 'pcnvslider' pcnv=10.0.^get(gco,'Value'); syncgui; case 'fconvslider' fconv=3.0.^get(gco,'Value'); syncgui; case 'sconvslider' sconv=10.0.^get(gco,'Value'); syncgui; case 'delfracradio' delfracind=~delfracind; syncgui; case 'delspradio' delfracind=~delfracind; syncgui; case 'upsamplebtn' 
upsampleon=~upsampleon; syncgui; %-------------------------------------------------- % non-linear manipulations %-------------------------------------------------- case 'FqNLbtn' hh=findobj('Tag','FqNLbtn'); if get(hh,'UserData') ==0 || isempty(get(hh,'UserData')) set(hh,'UserData',1); set(hh,'BackgroundColor',[0.9 0.33333 0.33333]); bendline initialize; else set(hh,'UserData',0); set(hh,'BackgroundColor',[0.733333 0.733333 0.733333]); bendline close; end; % This part is obsolete. This part will be revised completely. case 'interactSGRAM' figure; mxil=max(max(20*log10(n2sgram))); imagesc(max(20*log10(n2sgram),mxil-45)); axis('xy'); colormap(1-gray); title(['Interpolated spectrum of ' fname ' ' date ' ' mktstr]); disp('% Now you have to define trajectory using mouse'); disp('% Please type "return", if you are ready.'); disp('% It is recommended for you to select important portion '); disp('% using "zoom on" command.'); disp('% Please do not forget to issue "zoom off" before continue.'); disp('% In graphical input interaction, click defines point and return'); disp('% notifies it is the last point.'); keyboard; disp('% Interaction started. Put the cursor inside the graphics.'); zoom off; getTrace; disp('% You can modify spectrum using the following command.'); disp('% n2sgram=n2sgrambk.*(1+nsgm).^X;'); disp('% X*6 dB amplification is made.'); disp('% Default 6dB amplification was already done.'); disp('% If you are OK, type "return". 
'); disp('% Otherwise, please change.'); n2sgram=n2sgrambk.*(1+nsgm); %------------------------------------------- % mapping control part %------------------------------------------- case 'frequencymapmod' [nii,~]=size(n2sgram); vx=(0:nii-1)/(nii-1); idcv=vx*maphandles(20)+sin(vx*pi)*maphandles(21)+sin(2*pi*vx)*maphandles(22); fconv=max(1,min(nii,idcv*(nii-1)+1)); %------------------------------------------- % synthesis part %------------------------------------------- case 'synthesizechar' % This part is useless now disp('%-------- Current Synthesis parameters ------'); disp(['% delsp=' num2str(delsp) ... '; % standard deviation of random group delay in ms']); disp(['% gdbw=' num2str(gdbw) ... '; % smoothing window length of random group delay (in Hz)']); disp(['% cornf=' num2str(cornf) ... '; % corner frequency for random phase (in Hz)']); disp(['% pcnv=' num2str(pcnv) ... '; % pitch stretch']); disp(['% fconv=' num2str(fconv) ... '; % frequency stretch']); disp(['% sconv=' num2str(sconv) ... '; % time stretch']); disp('% '); disp('% If you are happy with these parameters please type "return".'); disp('% You can change these setting using Matlab command(s)'); disp('% If you want to restore default parameters please type'); disp('% "default22kparams" There are similar prog. for 12k,16k files.'); keyboard; disp('% Now, synthesis is in progress. Please wait a moment.'); syncgui; straightCIv1 synthesize case 'synthesizegradedqqq' % OBSOLETE!!! hh=findobj('Tag','FqNLbtn'); if ~isempty(hh) if ~isempty(get(hh,'UserData')) if get(hh,'UserData')==1 straightCIv1 frequencymapmod end; end; end; sy=straightSynthTC01(n3sgram,nwsgram,f0raw,hhb,shiftm,fs, ... 
pcnv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind); dBsy=powerchk(sy,fs,15); % 23/Sept./1999 cf=(20*log10(32768)-22)-dBsy; sy=sy*(10.0.^(cf/20)); disp('% Done!'); straightCIv1 syncbuttons; case 'synthesizegraded' hh=findobj('Tag','FqNLbtn'); if ~isempty(hh) if ~isempty(get(hh,'UserData') ) if get(hh,'UserData')==1 straightCIv1 frequencymapmod end; end; end; sy=straightSynthTB07ca(n3sgram,f0raw,shiftm,fs, ... pcnv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind, ... aperiodiccomp(apv,dpv,5,f0raw,f0shiftm),1); % 8/April/2002 dBsy=powerchk(sy,fs,15); % 23/Sept./1999 cf=(20*log10(32768)-22)-dBsy; sy=sy*(10.0.^(cf/20)); disp('% Done!'); straightCIv1 syncbuttons; case 'synthesize' hh=findobj('Tag','FqNLbtn'); if ~isempty(hh) if ~isempty(get(hh,'UserData') ) if get(hh,'UserData')==1 straightCIv1 frequencymapmod end; end; end; sy=straightSynthTB06(n3sgram,f0raw,f0var,f0varL,shiftm,fs, ... pcnv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind); dBsy=powerchk(sy,fs,15); % 23/Sept./1999 cf=(20*log10(32768)-22)-dBsy; sy=sy*(10.0.^(cf/20)); disp('% Done!'); straightCIv1 syncbuttons; %------------------------------------------------------ % analysis part % This part is modified to introduce a new F0 and % source information extraction method. (19/April/1999) %------------------------------------------------------ case 'source' nvo=24; nvc=ceil(log(f0ceil/f0floor)/log(2)*nvo); [f0v,vrv,dfv,~,aav]=fixpF0VexMltpBG4(xold,fs,f0floor,nvc,nvo,1.2,1,shiftm,1,5,0.5,1); title([fname ' ' datestr(now,0)]); %drawnow; [~,~]=size(f0v); subplot(614); [pwt,pwh]=plotcpower(xold,fs,shiftm);drawnow; [f0raw,irms,~,amp]=f0track5(f0v,vrv,dfv,pwt,pwh,aav,shiftm); f0t=f0raw;avf0=mean(f0raw(f0raw>0)); f0t(f0t==0)=f0t(f0t==0)*NaN;tt=1:length(f0t); % keyboard; subplot(615);plot(tt*shiftm,f0t,'g');grid on; if ~isnan(avf0) axis([1 max(tt)*shiftm ... min(avf0/sqrt(2),0.95*min(f0raw(f0raw>0))) ... 
max(avf0*sqrt(2),1.05*max(f0raw(f0raw>0)))]); end; ylabel('F0 (Hz)'); %----------- 31/July/1999 hold on; dn=floor(fs/(f0ceil*3*2)); % fix by H.K. at 28/Jan./2003 [f0raw,ecr]=refineF06(decimate(xold,dn),fs/dn,f0raw,1024,1.1,3,f0shiftm,1,length(f0raw)); % 31/Aug./2004 f0t=f0raw;%%avf0=mean(f0raw(f0raw>0)); f0t(f0t==0)=f0t(f0t==0)*NaN;tt=1:length(f0t); subplot(615);plot(tt*shiftm,f0t,'k');hold off; drawnow %----------- 31/July/1999 tirms=irms; tirms(f0raw==0)=tirms(f0raw==0)*NaN; tirms(f0raw>0)=-20*log10(tirms(f0raw>0)); ecrt=ecr; ecrt(f0raw==0)=ecrt(f0raw==0)*NaN; subplot(616);hrms=plot(tt*shiftm,tirms,'g',tt*shiftm,20*log10(ecrt),'r'); %31/July/1999 set(hrms,'LineWidth',2);hold on plot(tt*shiftm,-10*log10(vrv),'k.'); grid on;hold off axis([1 max(tt)*shiftm -10 60]); xlabel('time (ms)');ylabel('C/N (dB)'); drawnow; irmsz=irms*0; %---------- This part is for maintaining compatibility with old synthesis routine ---- f0var=max(0.00001,irms-irmsz).^2; f0var(f0var>0.99)=f0var(f0var>0.99)*0+100; f0var(f0raw==0)=f0var(f0raw==0)*0+100; f0varbak = f0var; % backup for f0var (18/July/1999) f0var=f0var/2; % 2 is a magic number. If everything is OK, it should be 1. f0var=(f0var>0.9); % This modification is to make V/UV decision crisp (18/July/1999) f0varL=f0var; %------------------------------------------------------------------------------------- f0raw(f0raw<=0)=f0raw(f0raw<=0)*0; % safeguard 31/August/2004 f0raw(f0raw>f0ceil)=f0raw(f0raw>f0ceil)*0+f0ceil; % safeguard 31/August/2004 straightCIv1 syncbuttons; %-------------------------------------------------------------- % classic STRAIGHT with a single V/UV measure %-------------------------------------------------------------- case 'straightcore' disp('% Now, adaptive window analysis has started. 
Please wait a moment.'); [n2sgrambk,nsgram]=straightBodyC03ma(xold,fs,shiftm,fftl,f0raw,f0var,f0varL,eta,pc); %% if mag>0 n2sgram=specreshape(fs,n2sgrambk,eta,pc,mag,f0raw); else n2sgram=n2sgrambk; end; straightCIv1 syncbuttons; %-------------------------------------------------------------- % revised STRAIGHT with a multi band graded V/UV decision (OBSOLETE!!) %-------------------------------------------------------------- case 'bandcorrbtnqqq' [n2sgrambk,nsgram,nwsgram]= ... straightBodyB04m(xold,fs,shiftm,fftl,f0raw,eta,pc); straightCIv1 syncbuttons; if mag>0 n2sgram=specreshape(fs,n2sgrambk,eta,pc,mag,f0raw); else n2sgram=n2sgrambk; end; [pcorr,pecorr]=BcorrMap(xold,fs,f0raw,shiftm); wvm3=wfromMap4(pcorr,pecorr,n2sgram,fs); emap=max(pcorr,pecorr); hh=wvm3'*emap; a=0.32;b=15;c=0.15; % blending parameter; this is very tentative hhb=max(0,(1.0./(1+exp(-(hh-a)*b))-1.0/(1+exp(-(c-a)*b))) ... /(1.0/(1+exp(-(1-a)*b))-1.0/(1+exp(-(c-a)*b)))); straightCIv1 syncbuttons; %----------------------------------------- % MBE type analysis 2/Sept./1999 %----------------------------------------- case 'bandcorrbtn' [n2sgrambk,nsgram]=straightBodyC03ma(xold,fs,shiftm,fftl,f0raw,f0var,f0varL,eta,pc); if mag>0 n2sgram=specreshape(fs,n2sgrambk,eta,pc,mag,f0raw); else n2sgram=n2sgrambk; end; [apvq,dpvq,apve,dpve]=aperiodicpartERB2(xold,fs,f0raw,f0shiftm,5,fftl/2+1); % 10/April/2002 apv=10*log10(apvq); % for compatibility dpv=10*log10(dpvq); % for compatibility %- --------- % Notes on aperiodicity estimation: The previous implementation of % aperiodicity estimation was sensitive to low frequency noise. This is % bad news, because environmental noise usually has its power in the low % frequency region. The following correction uses the C/N information % which is a byproduct of fixed point based F0 estimation. % by H.K. 04/Feb./2003 %- --------- dpv=correctdpv(apv,dpv,5,f0raw,ecrt,f0shiftm,fs); % Aperiodicity correction 04/Feb./2003 by H.K. 
bv=boundmes2(apv,dpv,fs,f0shiftm,5,fftl/2+1); figure; semilogy((0:length(bv)-1)*f0shiftm,0.5./10.0.^(bv));grid on; straightCIv1 syncbuttons; case 'remove2ndstructue' n3sgram=rmv2nd(n2sgram,f0raw,fs); straightCIv1 syncbuttons; case 'bypassbtn' n3sgram=n2sgram; straightCIv1 syncbuttons; %----------------------------------------------------------------- % suppress buttons which are not appropriate %----------------------------------------------------------------- case 'syncbuttons' hh=findobj('Tag','analyzesrcbtn'); if length(xold) >fftl set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','bandcorrbtn'); if length(f0raw) >5 set(hh,'Enable','on'); % Enabled again 02/Sept./1999 else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','analyzespcbtn'); if length(f0raw) >5 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','bypassbtn'); if length(n2sgram) >2 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','remove2ndbtn'); if length(n2sgram) >2 % set(hh,'Enable','on'); % commented out at 21/July/1999 else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','synthgradbtn'); if (length(bv) >2) && (length(n3sgram)>2) set(hh,'Enable','on'); % enabled again at 02/Sept./1999 else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','synthesizebtn'); if length(n3sgram) >2 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','savetobtn'); if length(sy) >fftl set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','adaptivespecbtn'); if length(nsgram) >1 
set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','widespecbtn'); if length(nwsgram) >1 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','smoothedspecbtn'); if length(n2sgrambk) >1 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','dispn2sgrambtn'); if length(n2sgram) >1 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','removedspecbtn'); if length(n3sgram) >1 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','cmpstspecgrambtn'); if length(hhb) >1 % set(hh,'Enable','on'); % commented out on 21/July/1999 else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','playorgbtn'); if length(xold) >1 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; %---------------------------------------------------- hh=findobj('Tag','playsynthbtn'); if length(sy) >1 set(hh,'Enable','on'); else set(hh,'Enable','off'); end; end; end function defaultendian=chkdefaultendian % defaultendian : 1-littel endian, 2-big endian gg=computer; switch gg(1:3) case 'PCW' defaultendian=1; case 'MAC' defaultendian=2; case 'SUN' defaultendian=2; case 'SOL' defaultendian=2; case 'HP7' defaultendian=1; case 'SGI' defaultendian=2; case 'ALP' defaultendian=1; case 'AXP' defaultendian=1; case 'LNX' defaultendian=1; otherwise defaultendian=2; end; end ================================================ FILE: src/straightPanel98bak.m ================================================ function fig = straightPanel98bak() % This is the machine-generated representation of a Handle Graphics object % and its children. 
Note that handle values may change when these objects % are re-created. This may cause problems with any callbacks written to % depend on the value of the handle at the time the object was saved. % % To reopen this object, just type the name of the M-file at the MATLAB % prompt. The M-file and its associated MAT-file must be on your path. load straightpanel98 h0 = figure('Color',[0.8 0.8 0.8], ... 'Colormap',mat0, ... 'Position',[336 165 646 559], ... 'Tag','STRAIGHT control panel v.1'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[309 226 200 292], ... 'Style','frame', ... 'Tag','Frame2'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.65 0.65 0.65], ... 'Position',[313 293 193 37], ... 'Style','frame', ... 'Tag','Frame5'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[309 32 330 189], ... 'Style','frame', ... 'Tag','Frame1'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.8 0.8 0.8], ... 'FontName','Helvetica', ... 'FontSize',18, ... 'Position',[201 528 284 23], ... 'String','STRAIGHT control panel', ... 'Style','text', ... 'Tag','StaticText1');% font size 24->18 03/Sept./1999 h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','clear all,close all', ... 'Position',[311 3 329 25], ... 'String','close', ... 'Tag','closebutton'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[516 226 123 291], ... 'Style','frame', ... 'Tag','Frame1'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 peekvars', ... 'Position',[529 439 95 21], ... 'String','peek variables', ... 'Tag','peakbutton'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 dispnsgram', ... 'Enable','off', ... 'Position',[515 173 112 20], ... 'String','adaptive spectrogram', ... 
'Tag','adaptivespecbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 dispnwsgram', ... 'Enable','off', ... 'Position',[517 141 111 20], ... 'String','a. wide spectrogram', ... 'Tag','widespecbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 dispn2sgrambk', ... 'Enable','off', ... 'Position',[320 171 159 20], ... 'String','smthd spectrogram', ... 'Tag','smoothedspecbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 dispn3sgram', ... 'Enable','off', ... 'Position',[321 108 158 20], ... 'String','rmvd spectrogram', ... 'Tag','removedspecbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 playsynth', ... 'Enable','off', ... 'Position',[320 45 145 20], ... 'String','Play synthesized', ... 'Tag','playsynthbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 playoriginal', ... 'Enable','off', ... 'Position',[483 44 148 20], ... 'String','Play original', ... 'Tag','playorgbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 savefile', ... 'Enable','off', ... 'Position',[329 236 113 22], ... 'String','save to file', ... 'Tag','savetobtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 synthesize', ... 'Enable','off', ... 'Position',[424 266 80 22], ... 'String','synthesize', ... 'Tag','synthesizebtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 straightcore', ... 'Enable','off', ... 'Position',[424 334 79 22], ... 'String','analyze 1CHX', ... 'Tag','analyzespcbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 
'Callback','straightCIv1 source', ... 'Enable','off', ... 'Position',[355 367 113 22], ... 'String','analyze source', ... 'Tag','analyzesrcbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 readfile', ... 'Position',[354 401 113 22], ... 'String','read from file', ... 'Tag','readfilebtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 initialize', ... 'Position',[353 438 113 22], ... 'String','initialize', ... 'Tag','initializebtn'); h1 = uicontrol('Parent',h0, ... 'Position',[5 252 297 268], ... 'Style','frame', ... 'Tag','Frame1'); h1 = uicontrol('Parent',h0, ... 'Position',[160 427 125 20], ... 'String','sampling frequency Hz', ... 'Style','text', ... 'Tag','StaticText2'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.85 0.85 0.85], ... 'Callback','straightCIv1 getfsmenu', ... 'Position',[170 411 108 20], ... 'String',[48000;44100;32000;24000;22050;20000;16000;12500;12000;11050;10000; 8000], ... 'Style','popupmenu', ... 'Tag','samplingfreqmenu', ... 'Value',5); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[353 483 111 20], ... 'String','Procedures', ... 'Style','text', ... 'Tag','StaticText3'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[431 194 74 20], ... 'String','Display', ... 'Style','text', ... 'Tag','StaticText4'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback',mat2, ... 'Position',[27 459 72 20], ... 'String','800', ... 'Style','edit', ... 'Tag','f0ceiledit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 editf0floor', ... 'Position',[28 411 72 20], ... 'String','40', ... 'Style','edit', ... 'Tag','f0flooredit'); h1 = uicontrol('Parent',h0, ... 'Position',[102 407 32 20], ... 'String','Hz', ... 'Style','text', ... 
'Tag','StaticText5'); h1 = uicontrol('Parent',h0, ... 'Position',[101 454 32 20], ... 'String','Hz', ... 'Style','text', ... 'Tag','StaticText5'); h1 = uicontrol('Parent',h0, ... 'Position',[20 431 94 20], ... 'String','F0 lower bound', ... 'Style','text', ... 'Tag','StaticText5'); h1 = uicontrol('Parent',h0, ... 'Position',[20 483 95 17], ... 'String','F0 higher bound', ... 'Style','text', ... 'Tag','StaticText5'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 remove2ndstructue', ... 'Enable','off', ... 'Position',[425 302 77 20], ... 'String','remove 2nd', ... 'Tag','remove2ndbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 bandcorrbtn', ... 'Enable','off', ... 'Position',[319 334 85 20], ... 'String','analyze MBX', ... 'Tag','bandcorrbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 dispn2sgram', ... 'Enable','off', ... 'Position',[320 139 158 20], ... 'String','enhanced spectrogram', ... 'Tag','dispn2sgrambtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 synthesizegraded', ... 'Enable','off', ... 'Position',[320 266 86 21], ... 'String','synthsize grad', ... 'Tag','synthgradbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 disphhbspectrograme', ... 'Enable','off', ... 'Position',[321 77 158 20], ... 'String','cmpst spectrogram', ... 'Tag','cmpstspecgrambtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[15 293 278 99], ... 'Style','frame', ... 'Tag','Frame3'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[6 4 297 240], ... 'Style','frame', ... 'Tag','Frame4'); h1 = uicontrol('Parent',h0, ... 
'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[24 366 89 15], ... 'String','FFT lngth', ... 'Style','text', ... 'Tag','StaticText6'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[121 362 75 20], ... 'String','w strtch in t', ... 'Style','text', ... 'Tag','StaticText7'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[210 362 71 20], ... 'String','pwr cnstnt', ... 'Style','text', ... 'Tag','StaticText8'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[36 322 72 20], ... 'String','mag. factor', ... 'Style','text', ... 'Tag','StaticText9'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 fftledit', ... 'Position',[41 347 56 20], ... 'String','1024', ... 'Style','edit', ... 'Tag','fftledit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 wndwstrtchedit', ... 'Position',[128 347 60 20], ... 'String','1.4', ... 'Style','edit', ... 'Tag','wndwstrtchedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 pwrcnstntedit', ... 'Position',[219 346 54 20], ... 'String','0.6', ... 'Style','edit', ... 'Tag','pwrcnstntedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 magfactoredit', ... 'Position',[42 307 56 20], ... 'String','0.2', ... 'Style','edit', ... 'Tag','magfactoredit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 pcnvslider', ... 'Min',-1, ... 'Position',[45 90 186 20], ... 'String','F0 conversion', ... 'Style','slider', ... 'Tag','pcnvslider'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 fconvslider', ... 'Min',-1, ... 'Position',[45 50 187 20], ... 'Style','slider', ... 
'Tag','fconvslider'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 sconvslider', ... 'Min',-1, ... 'Position',[44 10 188 20], ... 'Style','slider', ... 'Tag','sconvslider'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 delfracedit', ... 'Position',[153 214 60 20], ... 'String','0.2', ... 'Style','edit', ... 'Tag','delfracedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 pcnvedit', ... 'Position',[236 91 60 20], ... 'String','1', ... 'Style','edit', ... 'Tag','pcnvedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 fconvedit', ... 'Position',[237 50 60 20], ... 'String','1', ... 'Style','edit', ... 'Tag','fconvedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 sconvedit', ... 'Position',[237 11 60 20], ... 'String','1', ... 'Style','edit', ... 'Tag','sconvedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[26 215 122 20], ... 'String','relative tg dispersion', ... 'Style','text', ... 'Tag','StaticText10'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[27 192 121 20], ... 'String','absolute tg dispersion', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 delspedit', ... 'Position',[154 191 60 20], ... 'String','2', ... 'Style','edit', ... 'Tag','delspedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[28 170 121 20], ... 'String','corner frequency', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 cornfedit', ... 'Position',[154 169 60 20], ... 'String','3000', ... 'Style','edit', ... 
'Tag','cornfedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[27 148 121 20], ... 'String','tg smoothness', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 gdbwedit', ... 'Position',[154 148 60 20], ... 'String','70', ... 'Style','edit', ... 'Tag','gdbwedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[70 112 121 20], ... 'String','F0 conversion', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[47 70 161 20], ... 'String','frequency axis conversion', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[49 29 156 20], ... 'String','temporal axis conversion', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[221 167 31 20], ... 'String','Hz', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[220 190 35 20], ... 'String','ms', ... 'Style','text', ... 'Tag','StaticText11'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[221 147 31 20], ... 'String','Hz', ... 'Style','text', ... 'Tag','gdbwtxt'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 tpathedit', ... 'Position',[21 259 268 20], ... 'String','hmac117_HD:MATLAB 5:', ... 'Style','edit', ... 'Tag','tpathedit'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 delfracradio', ... 'Position',[251 213 40 20], ... 'Style','radiobutton', ... 'Tag','delfracradio'); h1 = uicontrol('Parent',h0, ... 
'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 delspradio', ... 'Position',[252 191 42 20], ... 'Style','radiobutton', ... 'Tag','delspradio', ... 'Value',1); h1 = uicontrol('Parent',h0, ... 'Position',[145 481 147 20], ... 'String','original sound file', ... 'Style','text', ... 'Tag','StaticText12'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[523 487 111 20], ... 'String','AUX', ... 'Style','text', ... 'Tag','StaticText3'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Position',[144 461 149 20], ... 'String','none', ... 'Style','edit', ... 'Tag','soundfilename'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 bypassbtn', ... 'Enable','off', ... 'Position',[320 302 82 20], ... 'String','bypass', ... 'Tag','bypassbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 upsamplebtn', ... 'Position',[446 237 60 20], ... 'String','up ENBL', ... 'Style','radiobutton', ... 'Tag','upsamplebtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 resetparamsbtn', ... 'Position',[531 243 92 20], ... 'String','reset parameters', ... 'Tag','resetparamsbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'ButtonDownFcn','straightCIv1 F0NLbtn', ... 'Position',[12 91 28 20], ... 'String','NL', ... 'Tag','F0NLbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Callback','straightCIv1 FqNLbtn', ... 'Position',[12 50 28 20], ... 'String','NL', ... 'Tag','FqNLbtn'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'ButtonDownFcn','straightCIv1 txNLbtn', ... 'Position',[11 11 28 20], ... 'String','NL', ... 'Tag','txNLbtn'); h1 = uicontrol('Parent',h0, ... 
'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[119 322 80 20], ... 'String','frame rate(ms)', ... 'Style','text', ... 'Tag','StaticText13'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[1 1 1], ... 'Callback','straightCIv1 editshiftm', ... 'Position',[129 306 60 20], ... 'Style','edit', ... 'Tag','shiftmedit'); h1 = uicontrol('Parent',h0, ... 'Callback','straightCIv1 bininputformat', ... 'Position',[532 394 89 20], ... 'String','PC/Alpha (little-endian)|Sun/Mac (big-endian)', ... 'Style','popupmenu', ... 'Tag','bininputformat', ... 'Value',1); h1 = uicontrol('Parent',h0, ... 'Callback','straightCIv1 binoutputformat', ... 'Position',[532 351 87 22], ... 'String','PC/Alpha (little-endian)|Sun/Mac (big-endian)', ... 'Style','popupmenu', ... 'Tag','binoutputformat', ... 'Value',1); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[522 417 108 16], ... 'String','Bin.IN format', ... 'Style','text', ... 'Tag','StaticText14'); h1 = uicontrol('Parent',h0, ... 'BackgroundColor',[0.733333 0.733333 0.733333], ... 'Position',[522 375 108 17], ... 'String','Bin.OUT format', ... 'Style','text', ... 'Tag','StaticText15'); if nargout > 0, fig = h0; end ================================================ FILE: src/straightSynthTB06.m ================================================ function sy=straightSynthTB06(n2sgram,f0raw,f0var,f0varL,shiftm,fs, ... pcnv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind); % Straight synthesis with all-pass filter design based on % TEMPO analysis result % sy=straightSynthTB06(n2sgram,f0raw,f0var,f0varL,shiftm,fs, ... 
% pcnv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind); % sy : synthesized speech % n2sgram : amplitude spectrogram % f0raw : pitch pattern (Hz) % f0var : expected F0 variation with fricative modification % f0varL : expected F0 variation % shiftm : frame shift (ms) for spectrogram % fs : sampling frequency (Hz) % pcnv : pitch stretch factor % fconv : frequency stretch factor % sconv : speaking duration stretch factor % gdbw : finest resolution in group delay (Hz) % delfrac : ratio of standard deviation of group delay in terms of F0 % delsp : standard deviation of group delay (ms) % cornf : lower corner frequency for phase randomization (Hz) % delfracind : selector of fixed and proportional group delay % Straight synthesis with all-pass filter design % by Hideki Kawahara % (c) ATR Human Info. Proc. Res. Labs. 1996 % 07/July/1996 % 12/Aug./1996 % 22/Aug./1996 % 06/Sep./1996 BUG FIX!!! wrong sign % 07/Sep./1996 converted to function script % 09/Sep./1996 coarse F0 information is possible % 16/Sep./1996 tolerant to F0 extraction errors % 02/Nov./1996 Now pitch extraction is perfect. No need for the hack. % 02/Feb./1997 Without V/UV discrimination % 08/June/1999 minor bug fix f0l=f0raw; [nii,njj]=size(n2sgram); fftl=nii+nii-2; fftl2=fftl/2; if length(fconv)==1 idcv=min([0:fftl/2]/fconv+1,fftl/2+1); % f. stretch conv.
table elseif length(fconv)==nii idcv=fconv(:)'; end; sy=zeros([round((njj*shiftm/1000*fs)*sconv+3*fftl+1),1]); syo=sy; mixVhigh=sqrt(0.25./(f0var+0.25)); mixNhigh=sqrt(1-0.25./(f0var+0.25)); mixVlow=sqrt(0.25./(f0varL+0.25)); mixNlow=sqrt(1-0.25./(f0varL+0.25)); phs=fractpitch2(fftl); % phs will have smooth phase function for unit delay a=([0:fftl2-1,0,-(fftl2-1:-1:1)])/fftl2; sz=a'*pi; ta=[0:fftl2-1]/fftl2/2*2*pi; t=[ta,0,-ta(fftl2:-1:2)]; fftl2=fftl/2; nsyn=length(sy); idx=1; bb=1:fftl; bb2=1:fftl2; rbb2=fftl/2:-1:2; %------- shaping for low-frequency noise suppression fxa=(0:fftl2)/fftl*fs; f0tmp=f0l.*(mixVlow>0.8); lowcutf=mean(f0tmp(f0tmp>0))*0.7*pcnv; %lowcutfav=mean(f0l(f0l>0))*0.8; lowcutfav=lowcutf; %wlcutav=1.0./(1+exp(-5*(fxa-lowcutfav)/(lowcutfav/3))); wlcutav=1.0./(1+exp(-14*(fxa-lowcutfav)/(lowcutfav/1))); %keyboard; %------- parameters for noise based apf design t=([1:fftl]-fftl/2-1)/fftl*2; adjd=1.0./(1+exp(-20*t)); % correction function for smooth transition at fs/2 gw=exp(-0.25*pi*(fs*(t/2)/gdbw).^2); % slope definition function gw=gw/sum(gw); % gdbw is the equivalent rectangular band width fgw=real(fft(fftshift(gw))); % gw is the spectral smoothing window df=fs/fftl*2*pi; % normalization constant for integration and differentiation fw=(1:fftl2+1)/fftl*fs; % frequency axis trbw=300; % width of transition area rho=1.0./(1+exp(-(fw-cornf)/trbw)); % random group delay weighting function [snn,smm]=size(n2sgram); fqx=(0:snn-1)/snn*fs/2; chigh=1.0./(1+exp(-(fqx-600)/100))'; clow=1.0-chigh; f0arc=0; lft=1-hanning(fftl); lft=1.0./(1+exp(-(lft-0.5)*60)); ww=1.0./(1+exp(-(hanning(fftl)-0.3)*23)); % lifter for iin=1; dmx=max(max(n2sgram)); while (idx < nsyn-fftl-10) & (ceil(iin) 1000000 ccp2=[ccp(1);2*ccp(2:fftl/2);0*ccp(fftl/2+1:fftl)]; ffx=(fft(ccp2.*lft)/fftl); nidx=round(idx); % wlcut=1.0./(1+exp(-20*(fxa-lowcutf)/lowcutf)); nf0=fs/f0; frt=idx-nidx; frtz=exp(i*phs*frt)'; % This was in a wrong sign!
nz=randn(1,fftl2+1).*((rho*0+1)*mixNlow(round(ii))+(1-mixNlow(round(ii)))*rho); nz=real(ifft(fft([nz,nz(rbb2)]).*fgw)); nz=nz*sqrt(fftl*gdbw/fs); % correction factor for noise if delfracind, delsp=delfrac*1000/f0; end; nz=nz*delsp*df/1000; mz=cumsum([nz(1:fftl2+1),nz(rbb2)])-nz(1); mmz=-(mz-adjd*(rem((mz(fftl)+mz(2)),2*pi)-2*pi)); pz=exp(-i*mmz)'; %.*[wlcut wlcut(rbb2)]'; tx=fftshift(real(ifft(exp(ffx).*pz.*frtz.*[mix;mix(rbb2)]))).*ww; % tx=fftshift(real(ifft(ff.*pz.*frtz.*[mix;mix(rbb2)]))).*ww; sy(bb+nidx)=sy(bb+nidx)+tx*sqrt(nf0); % if abs(round(ii)-90)<10 % keyboard; % end; idx=idx+nf0; iin=min(length(f0l),idx/fs*1000/shiftm/sconv+1); if (mixVlow(round(ii))<0.8) & (mixVlow(round(iin))>0.8) idxo=idx; ipos=min(find(mixVlow(round(ii:iin))>0.8))-1+ii; if length(ipos)==0 idx=idxo; else idx=max(idxo-nf0+1,(ipos-1)*fs/1000*shiftm*sconv); end; end; % disp([idx,iin]) end; %sy=sy*0; ii=1; idx=1; f0=500; f0=1000; %wlcutfric=1.0./(1+exp(-14*(fxa-lowcutfav*2)/(lowcutfav))); wlcutfric=1.0./(1+exp(-14*(fxa-lowcutfav)/(lowcutfav))); % 31/July/1999 while (idx < nsyn-fftl) & (ii0.03 mix=mixNlow(ii)*clow(round(idcv(:)))+mixNhigh(ii)*chigh(round(idcv(:))); ff=[n2sgram(round(idcv(:)),ii);n2sgram(round(idcv(rbb2)),ii)]; % ff=ff.*[wlcut wlcut(rbb2)]'; ff=ff.*[wlcutfric wlcutfric(rbb2)]'; % ccp=real(fft(log(ff+0.001))); % 23rd July, 1999 ccp=real(fft(log(ff+dmx/100000))); % 23rd July, 1999 % 24th Sept. 
1999 ccp2=[ccp(1);2*ccp(2:fftl/2);0*ccp(fftl/2+1:fftl)]; ffx=(fft(ccp2.*lft)/fftl); nf0=fs/f0; %============= deleted on 18/July/1999 ====== % if f0l(ii) > 0 % f0x=lowcutf; % f0l(ii)*pcnv; % f0x=f0l(ii)*pcnv; % wlcut=1.0./(1+exp(-20*(fxa-f0x*0.8)/lowcutf)); % wlcut=wlcutav; % tx=fftshift(real(ifft(exp(ffx).*[wlcut.*mix' wlcut(rbb2).*mix(rbb2)']'))); % else % tx=fftshift(real(ifft(exp(ffx).*[wlcutav.*mix' wlcutav(rbb2).*mix(rbb2)']'))); % end; tx=fftshift(real(ifft(exp(ffx)))); %============= end of modification on 18/July/1999 ==== rx=randn([round(nf0),1]); tnx=fftfilt(rx,tx); sy(bb+nidx)=sy(bb+nidx)+tnx(bb).*ww; end; idx=idx+nf0; ii=min(length(f0l),idx/fs*1000/shiftm/sconv+1); end; sy2=sy(fftl/2+(1:round((njj*shiftm/1000*fs)*sconv))); lowcutf=70; if lowcutf <70 lowcutf=70; end; %[b,a]=butter(5,lowcutf/fs*2,'high'); %sy=filter(b,a,sy2); sy=sy2; ================================================ FILE: src/straightSynthTB07ca.m ================================================ function [sy,synthSataus]=straightSynthTB07ca(n2sgram,f0raw,shiftm,fs, ... pcnv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind,ap,imap,imgi,lowestF0) % Straight synthesis with all-pass filter design based on % TEMPO analysis result % sy=straightSynthTB07ca(n2sgram,f0raw,f0var,f0varL,shiftm,fs, ... 
% pcnv,fconv,sconv,gdbw,delfrac,delsp,cornf,delfracind,ap,imap,imgi); % sy : synthesized speech % n2sgram : amplitude spectrogram % f0raw : pitch pattern (Hz) % f0var : expected F0 variation with fricative modification % f0varL : expected F0 variation % shiftm : frame shift (ms) for spectrogram % fs : sampling frequency (Hz) % pcnv : pitch stretch factor % fconv : frequency stretch factor % sconv : speaking duration stretch factor (overridden if || imap || >1 ) % gdbw : finest resolution in group delay (Hz) % delfrac : ratio of standard deviation of group delay in terms of F0 % delsp : standard deviation of group delay (ms) % cornf : lower corner frequency for phase randomization (Hz) % delfracind : selector of fixed and proportional group delay % ap : aperiodicity measure % imap : arbitrary mapping from new time (sample) to old time (frame) % imgi : display indicator, 1: display on (default), 0: off % lowestF0 : lower limit of the resynthesized fundamental frequency (Hz) % Straight synthesis with all-pass filter design % by Hideki Kawahara % (c) ATR Human Info. Proc. Res. Labs. 1996 % 07/July/1996 % 12/Aug./1996 % 22/Aug./1996 % 06/Sep./1996 BUG FIX!!! wrong sign % 07/Sep./1996 converted to function script % 09/Sep./1996 coarse F0 information is possible % 16/Sep./1996 tolerant to F0 extraction errors % 02/Nov./1996 Now pitch extraction is perfect. No need for the hack. % 02/Feb./1997 Without V/UV discrimination % 08/June/1999 minor bug fix % 03/Sep./1999 Graded excitation with one parameter % 29/Nov./1999 Arbitrary time axis mapping % 30/May/2001 revised aperiodicity control % 08/April/2002 revised to remove magical LPF % 11/August/2002 bug fix for V/UV transition % 24/August/2002 more precise F0 control % 23/Sept./2002 minor adjustment for the length of the resynthesized signal % 05/Dec./2002 minor bug fix based on M.
Tsuzaki's comment % 17/Dec./2002 bug fix in mid point selection % 10/Aug./2005 modified by Takahashi on waitbar % 10/Sept./2005 modified by Kawahara on waitbar % 27/Nov./2005 modified by Kawahara for % 21/April/2010 bug fix by Hideki Kawahara for aperiodicity % 03/July/2016 refactored for MATLAB R2016a and Octave 4.0.2 %if nargin<=14; imgi=1; end; % 10/Sept./2005 statusReport = 'ok';% 27/Nov./2005 switch nargin % 27/Nov./2005 case {1,2,3,4,5,6,7,8,9,10,11,12,13,14} imgi = 1; lowestF0 = 50; case {15} lowestF0 = 50; end; f0l=f0raw; [nii,njj]=size(n2sgram); njj=min([njj,length(f0raw)]); % 18/Sep./1999 f0l=f0l(1:njj); %03/Sep./1999 if min(f0l(f0l>0))*pcnv < lowestF0 statusReport = ['Minimum synthesized F0 exceeded the lower limit(' num2str(lowestF0) ' Hz).']; end; fftLengthForLowestF0 = 2^ceil(log2(2*round(fs/lowestF0)));% 27/Nov./2005 fftl=nii+nii-2; if fftl < fftLengthForLowestF0 % 27/Nov./2005 niiNew = fftLengthForLowestF0/2+1; statusReport = 'The FFT length was inconsistent and replaced'; n2sgram = interp1(0:nii-1,n2sgram,(0:niiNew-1)*(nii-1)/(niiNew-1)); ap = interp1(0:nii-1,ap,(0:niiNew-1)*(nii-1)/(niiNew-1)); fftl = fftLengthForLowestF0; nii = niiNew; end; % safeguard for ap mismatch 21/April/2010 if size(ap,1) ~= size(n2sgram,1) apDouble = zeros(size(n2sgram,1),size(ap,2)); for ik = 1:size(ap,2) apDouble(:,ik) = interp1((0:size(ap,1)-1),ap(:,ik),... (0:size(n2sgram,1)-1)/((size(n2sgram,1)-1)/(size(ap,1)-1)),'linear','extrap'); end; ap = apDouble; end; aprms=10.0.^(ap/20); % 23/Sept./1999 aprm=min(1,max(0.001,aprms*1.6-0.015)); % 30/May/2001 if length(fconv)==1 idcv=min((0:fftl/2)/fconv+1,fftl/2+1); % f. stretch conv. 
table elseif length(fconv)==nii idcv=fconv(:)'; elseif length(fconv) ~= nii idcv = 1:fftl/2+1; statusReport = [statusReport char(10) 'Frequency axis mapping function is not consistent with lowestF0.']; end; if length(imap)>1 sy=zeros(length(imap)+3*fftl,1);disp('here!!'); else sy=zeros([round((njj*shiftm/1000*fs)*sconv+3*fftl+1),1]); imap=1:length(sy); imap=min(length(f0l),((imap-1)/fs*1000/shiftm/sconv+1)); end; imap=[imap ones(1,round(fs*0.2))*length(f0l)]; % safe guard ix=find(imap>=length(f0l), 1, 'first'); rmap=interp1(imap(1:ix),1:ix,1:length(f0l)); phs=fractpitch2(fftl); % phs will have smooth phase function for unit delay fftl2=fftl/2; nsyn=length(sy); idx=1; bb=1:fftl; rbb2=fftl/2:-1:2; %------- parameters for noise based apf design t=((1:fftl)-fftl/2-1)/fftl*2; adjd=1.0./(1+exp(-20*t)); % correction function for smooth transition at fs/2 gw=exp(-0.25*pi*(fs*(t/2)/gdbw).^2); % slope definition function gw=gw/sum(gw); % gdbw is the equivalent rectangular band width fgw=real(fft(fftshift(gw))); % gw is the spectral smoothing window df=fs/fftl*2*pi; % normalization constant for integration and differentiation fw=(1:fftl2+1)/fftl*fs; % frequency axis trbw=300; % width of transition area rho=1.0./(1+exp(-(fw-cornf)/trbw)); % random group delay weighting function %--------- frozen group delay component calculation ------ nz=randn(1,fftl2+1).*rho; % This is not effective. Left for randn status. %--------- lft=1-hanning(fftl)+nz(1)*0; % +nz(1)*0 is dummy lft=1.0./(1+exp(-(lft-0.5)*60)); ww=1.0./(1+exp(-(hanning(fftl)-0.3)*23)); % lifter for iin=1; if imgi==1; hpg=waitbar(0,'voiced part synthesis'); end; % 10/Aug./2005 icntr=0; dmx=max(max(n2sgram)); while (idx < nsyn-fftl-10) && (ceil(iin)0) && (f0l(round(ii))>0) if f0l(round((ii+tii)/2))>0 % fix by H.K. on 17/Dec./2002 f0=max(lowestF0/pcnv,f0l(round((ii+tii)/2))); % mid point else f0=f0l(round(ii)); end; f0=f0*pcnv; end; %- -------- ff=[n2sgram(round(idcv(:)),round(ii)); ...
n2sgram(round(idcv(rbb2)),round(ii))]; ccp=real(fft(log(ff+dmx/1000000))); % 24 Sept. 1999 10000 -> 1000000 ccp2=[ccp(1);2*ccp(2:fftl/2);0*ccp(fftl/2+1:fftl)]; ffx=(fft(ccp2.*lft)/fftl); nidx=round(idx); nf0=fs/f0; frt=idx-nidx; frtz=exp(1i*phs*frt)'; % This was in a wrong sign! nz=randn(1,fftl2+1).*rho; %((rho*0+1)*mixNlow(round(ii))+(1-mixNlow(round(ii)))*rho); nz=real(ifft(fft([nz,nz(rbb2)]).*fgw)); nz=nz*sqrt(fftl*gdbw/fs); % correction factor for noise if delfracind, delsp=delfrac*1000/f0; end; nz=nz*delsp*df/1000; mz=cumsum([nz(1:fftl2+1),nz(rbb2)])-nz(1); mmz=-(mz-adjd*(rem((mz(fftl)+mz(2)),2*pi)-2*pi)); pzr=exp(-1i*mmz)'; %.*[wlcut wlcut(rbb2)]'; % set ineffective 01/June/2001 pz=pzr; % This makes random group delay to be effective wnz=aprm(round(idcv(:)),round(ii)); % 06/May/2001 This is correct! wpr=sqrt(max(0,1-wnz.*wnz)); % 23/Sept./1999 rx=randn(round(nf0),1); %----------- temporal envelope control of the aperiodic component --- zt0=nf0/fs+rx(1)*0; % +rx(1)*0 is a dummy ztc=0.01; % time constant 10ms (for example) ztp=((1:round(nf0))'-1)/fs; nev=sqrt(2*zt0/ztc/(1-exp(-2*zt0/ztc)))*exp(-ztp/ztc); rx=randn(round(nf0),1); wfv=fft((rx-mean(rx)).*nev,fftl); % DC component removal 8/April/2002 %-------------------------------------------------------------------- ep=0*real(ffx); nf0n=round(nf0); gh=hanning(nf0n*2); ep(1:nf0n)=gh(nf0n:-1:1); ep(end:-1:end-nf0n+2)=ep(2:nf0n); % bug fix on 29/Jan./2003 ep=-ep/sum(ep); ep(1)=ep(1)+1; epf=fft(ep); tx=fftshift(real(ifft(epf.*exp(ffx).*pz.*frtz.*[wpr;wpr(rbb2)]))).*ww; % 8/April/2002 tx2=fftshift(real(ifft(exp(ffx).*frtz.*[wnz;wnz(rbb2)].*wfv))).*ww; % 31/May/2001 sy(bb+nidx)=sy(bb+nidx)+(tx*sqrt(nf0)+tx2)*(f0raw(round(ii))>0); % 02/ Sept./1999 idx=idx+nf0; iin=min(max(1,round(imap(round(idx)))),min(njj,length(f0raw))); % modification on 5/Dec/2002 based on comments by M. 
Tsuzaki if (f0raw(round(ii))==0) && (f0raw(round(iin))>0) % (mixVlow(round(ii))<0.8) & (mixVlow(round(iin))>0.8) idxo=idx; ipos=find(f0raw(round(ii:iin))>0, 1, 'first')-1+ii; if isempty(ipos) idx=idxo; else idx=max(idxo-nf0+1,rmap(round(ipos))); % 11/August/2002 (Was -1 mistake??) end; end; end; if imgi==1; close(hpg); end; % 10/Aug./2005 ii=1; idx=1; f0=1000; if imgi==1; hpg=waitbar(0,'unvoiced part synthesis'); end; % 10/Aug./2005 icntr=0; while (idx < nsyn-fftl) && (ii0.03 ff=[n2sgram(round(idcv(:)),ii);n2sgram(round(idcv(rbb2)),ii)]; ccp=real(fft(log(ff+dmx/100000))); % 23rd July, 1999 % 24th Sept. ccp2=[ccp(1);2*ccp(2:fftl/2);0*ccp(fftl/2+1:fftl)]; ffx=(fft(ccp2.*lft)/fftl); nf0=fs/f0; tx=fftshift(real(ifft(exp(ffx)))); rx=randn([round(nf0),1]); tnx=fftfilt(rx-mean(rx),tx); % DC component removal 8/April/2002 sy(bb+nidx)=sy(bb+nidx)+tnx(bb).*ww; end; idx=idx+nf0; ii=round(imap(round(idx))); end; if imgi==1; close(hpg); end; % 10/Aug./2005 sy2=sy(fftl/2+(1:ix)); sy=sy2; switch nargout case {1} case {2} synthSataus = statusReport; end; end ================================================ FILE: src/straightsound.m ================================================ function ok=straightsound(x,fs) % Up sampling for reducing aliasing % Requested by Dr. Uematsu of NTT, 02/02/1998 switch fs case 8000 soundsc(interp(x/32768,4),fs*4); case 10000 soundsc(interp(x/32768,4),fs*4); case 11025 soundsc(interp(x/32768,4),fs*4); case 12000 soundsc(interp(x/32768,4),fs*4); case 16000 soundsc(interp(x/32768,2),fs*2); case 20000 soundsc(interp(x/32768,2),fs*2); case 22050 soundsc(interp(x/32768,2),fs*2); case 24000 soundsc(interp(x/32768,2),fs*2); otherwise, soundsc(x/32768,fs); end ok='ok'; ================================================ FILE: src/syncgui.m ================================================ function oki=syncgui() % synchronize GUI and internal values global n2sgram nsgram n3sgram n2sgrambk n3sgramE xold x f0floor f0ceil fs framem shiftm f0shiftm ... 
fftl eta pc framel fftl2 acth pwth pcnv fconv sconv delsp gdbw cornf fname ofname delfracind ... tpath cpath paraminitialized mag delfrac hr f0raw f0l f0var f0varL sy pcorr pecorr ... upsampleon gobjlist hhb defaultendian indefaultendian outdefaultendian framel=round(framem*fs/1000); if fftl