# libsonic Home Page

[Download the latest tar-ball from here](download).

The source code repository can be cloned using git:

    $ git clone git://github.com/waywardgeek/sonic.git

The source code for the Android version, sonic-ndk, can be cloned with:

    $ git clone git://github.com/waywardgeek/sonic-ndk.git

There is a simple test app for Android that demos capabilities. You can
[install the Android application from here](Sonic-NDK.apk).

There is a new native Java port, which is very fast! Check out Sonic.java and
Main.java in the latest tar-ball, or get the code from git.

## Overview

Sonic is free software for speeding up or slowing down speech. While similar to
other algorithms that came before, Sonic is optimized for speed ups of over 2X.
There is a simple sonic library in ANSI C, and one in pure Java. Both are
designed to easily be integrated into streaming voice applications, like TTS
back ends.
While a very new project, it is already integrated into:

- espeak
- Debian Sid as package libsonic
- Android Astro Player Nova
- Android Osplayer
- Multiple closed source TTS engines

The primary motivation behind sonic is to enable the blind and visually impaired
to improve their productivity with free software speech engines, like espeak.
Sonic can also be used by the sighted. For example, sonic can improve the
experience of listening to an audio book on an Android phone.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved. It is released
under the Apache 2.0 license. Feel free to contact me at
<[email protected]>. One user was concerned about patents. I believe the
sonic algorithms do not violate any patents, as most of it is very old, based
on [PICOLA](https://web.archive.org/web/20120731100136/http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
the new part, for greater than 2X speed up, is clearly a capability most
developers ignore, and would not bother to patent.

## Comparison to Other Solutions

In short, Sonic is better for speech, while WSOLA is better for music.

A popular alternative is SoundTouch. SoundTouch uses WSOLA, an algorithm
optimized for changing the tempo of music.
No WSOLA based program performs well
for speech (contrary to the inventor's estimate of WSOLA). Listen to [this
soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
it to [this sonic sample](sonic.wav). Both are sped up by 2X. WSOLA
introduces unacceptable levels of distortion, making speech impossible to
understand at high speed (over 2.5X) by blind speed listeners.

However, there are decent free software algorithms for speeding up speech. They
are all in the TD-PSOLA family. For speech rates below 2X, sonic uses PICOLA,
which I find to be the best algorithm available. A slightly buggy
implementation of PICOLA is available in the spandsp library. I find the one in
RockBox quite good, though it's limited to 2X speed up. So far as I know, only
sonic is optimized for speed factors needed by the blind, up to 6X.

Sonic does all of its CPU intensive work with integer math, and works well on
ARM CPUs without FPUs. It supports multiple channels (stereo), and is also able
to change the pitch of a voice. It works well in streaming audio applications,
and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
or 8-bit unsigned formats. The source code is in plain ANSI C. In short, it's
production ready.

## Using libsonic in your program

Sonic is still a new library, but is in Debian Sid.
It will take a while
for it to filter out into all the other distros. For now, feel free to simply
add sonic.c and sonic.h to your application (or Sonic.java), but consider
switching to -lsonic once the library is available on your distro.

The file [main.c](main.c) is the source code for the sonic command-line application. It
is meant to be useful as example code. Feel free to copy directly from main.c
into your application, as main.c is in the public domain. Dependencies listed
in debian/control like libsndfile are there to compile the sonic command-line
application. Libsonic has no external dependencies.

There are basically two ways to use sonic: batch or stream mode. The simplest
is batch mode where you pass an entire sound sample to sonic. All you do is
call one function, like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

This will change the speed and pitch of the sound samples pointed to by samples,
which should be 16-bit signed integers. Stereo mode is supported, as
is any arbitrary number of channels. Samples for each channel should be
adjacent in the input array. Because the samples are modified in-place, be sure
that there is room in the samples array for the speed-changed samples. In
general, if you are speeding up, rather than slowing down, it will be safe to
have no extra padding.
If your sound samples are mono, and you don't want to
scale volume or playback rate, and if you want normal pitch scaling, then call
it like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);

The other way to use libsonic is in stream mode. This is more complex, but
allows sonic to be inserted into a sound stream with fairly low latency. The
current maximum latency in sonic is 31 milliseconds, which is enough to process
two pitch periods of voice as low as 65 Hz. In general, the latency is equal to
two pitch periods, which is typically closer to 20 milliseconds.

To process a sound stream, you must create a sonicStream object, which contains
all of the state used by sonic. Sonic should be thread safe, and multiple
sonicStream objects can be used at the same time. You create a sonicStream
object like this:

    sonicStream stream = sonicCreateStream(sampleRate, numChannels);

When you're done with a sonic stream, you can free its memory with:

    sonicDestroyStream(stream);

By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
no change at all to the sound stream. Sonic detects this case, and simply
copies the input to the output to reduce CPU load.
To change the speed, pitch,
rate, or volume, set the parameters using:

    sonicSetSpeed(stream, speed);
    sonicSetPitch(stream, pitch);
    sonicSetRate(stream, rate);
    sonicSetVolume(stream, volume);

These four parameters are floating point numbers. A speed of 2.0 means to
double the speed of speech. A pitch of 0.95 means to lower the pitch by about 5%,
and a volume of 1.4 means to multiply the sound samples by 1.4, clipping if we
exceed the maximum range of a 16-bit integer. Rate scales how fast the
speech is played back, changing speed and pitch together. A rate of 2.0 will
make you sound like a chipmunk talking very fast. A rate of 0.7 will make you
sound like a giant talking slowly.

By default, pitch is modified by changing the rate, and then using speed
modification to bring the speed back to normal. This allows for a wide range of
pitch changes, but changing the pitch makes the speaker sound larger or smaller,
too. If you want to make the person sound like the same person, but talking at
a higher or lower pitch, then enable the vocal chord emulation mode for pitch
scaling, using:

    sonicSetChordPitch(stream, 1);

However, only small changes to pitch should be used in this mode, as it
introduces significant distortion otherwise.
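
The volume behavior described above (multiply each sample, then clip to the
16-bit range) can be sketched roughly as follows. This is only an
illustration: `scaleVolumeClipped` is a hypothetical helper, not a libsonic
function, and sonic's own integer-math implementation may differ in rounding
and other details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of libsonic: multiply each 16-bit sample by
 * `volume` and clip the result to the range of a signed 16-bit integer. */
void scaleVolumeClipped(short *samples, size_t numSamples, float volume)
{
    size_t i;

    assert(samples != NULL || numSamples == 0);
    for (i = 0; i < numSamples; i++) {
        float scaled = samples[i] * volume;

        if (scaled > 32767.0f) {
            scaled = 32767.0f;      /* clip positive overflow */
        } else if (scaled < -32768.0f) {
            scaled = -32768.0f;     /* clip negative overflow */
        }
        samples[i] = (short)scaled; /* conversion truncates toward zero */
    }
}
```

With a volume of 2.0, for example, a sample of 30000 would be clipped to 32767
rather than wrapping around, which is why large volume values distort loud
passages instead of corrupting them.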

After setting the sound parameters, you write to the stream like this:

    sonicWriteShortToStream(stream, samples, numSamples);

You read the sped up speech samples from sonic like this:

    samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
    if(samplesRead > 0) {
        /* Do something with the output samples in outBuffer, like send them to
         * the sound device. */
    }

You may change the speed, pitch, rate, and volume parameters at any time, without
having to flush or create a new sonic stream.

When your sound stream ends, there may be several milliseconds of sound data in
the sonic stream's buffers. To force sonic to process those samples use:

    sonicFlushStream(stream);

Then, read those samples as above. That's about all there is to using libsonic.
There are some more functions as a convenience for the user, like
sonicGetSpeed. Other sound data formats are supported: unsigned char and float.
If float, the sound data should be between -1.0 and 1.0. Internally, all sound
data is converted to 16-bit integers for processing.
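
To make the float format concrete, here is a minimal sketch of how samples in
the range -1.0 to 1.0 could be mapped to 16-bit integers. The helper name
`floatToShortSamples`, the scale factor of 32767, and the clamping of
out-of-range input are all assumptions made for illustration; this is not
taken from the libsonic source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT libsonic code: convert float samples expected in
 * [-1.0, 1.0] to signed 16-bit integers. Out-of-range input is clamped, and
 * the 32767 scale factor is an assumption for this sketch. */
void floatToShortSamples(const float *in, short *out, size_t numSamples)
{
    size_t i;

    assert((in != NULL && out != NULL) || numSamples == 0);
    for (i = 0; i < numSamples; i++) {
        float value = in[i];

        if (value > 1.0f) {
            value = 1.0f;
        } else if (value < -1.0f) {
            value = -1.0f;
        }
        out[i] = (short)(value * 32767.0f); /* truncates toward zero */
    }
}
```

Keeping float input inside the documented range avoids relying on any
clamping like this, since values outside -1.0 to 1.0 cannot be represented
after conversion to 16 bits.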