# libsonic Home Page

[Download the latest tar-ball from here](download).

The source code repository can be cloned using git:

    $ git clone git://github.com/waywardgeek/sonic.git

The source code for the Android version, sonic-ndk, can be cloned with:

    $ git clone git://github.com/waywardgeek/sonic-ndk.git

There is a simple test app for Android that demos its capabilities.  You can
[install the Android application from here](Sonic-NDK.apk).

There is a new native Java port, which is very fast!  Check out Sonic.java and
Main.java in the latest tar-ball, or get the code from git.

## Overview

Sonic is free software for speeding up or slowing down speech.  While similar to
other algorithms that came before, Sonic is optimized for speed-ups of over 2X.
There is a simple sonic library in ANSI C, and one in pure Java.  Both are
designed to be easily integrated into streaming voice applications, like TTS
back ends.  While a very new project, it is already integrated into:

- espeak
- Debian Sid as package libsonic
- Android Astro Player Nova
- Android Osplayer
- Multiple closed source TTS engines

The primary motivation behind sonic is to enable the blind and visually impaired
to improve their productivity with free software speech engines, like espeak.
Sonic can also be used by the sighted.  For example, sonic can improve the
experience of listening to an audio book on an Android phone.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved.  It is released
under the Apache 2.0 license.  Feel free to contact me at
<[email protected]>.  One user was concerned about patents.  I believe the
sonic algorithms do not violate any patents, as most of them are very old, based
on [PICOLA](https://web.archive.org/web/20120731100136/http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
the new part, for greater than 2X speed-up, is clearly a capability most
developers ignore, and would not bother to patent.

## Comparison to Other Solutions

In short, Sonic is better for speech, while WSOLA is better for music.

A popular alternative is SoundTouch.  SoundTouch uses WSOLA, an algorithm
optimized for changing the tempo of music.  No WSOLA-based program performs well
for speech (contrary to the inventor's estimate of WSOLA).  Listen to [this
soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
it to [this sonic sample](sonic.wav).  Both are sped up by 2X.  WSOLA
introduces unacceptable levels of distortion, making speech impossible for
blind speed listeners to understand at high speed (over 2.5X).

However, there are decent free software algorithms for speeding up speech.  They
are all in the TD-PSOLA family.  For speech rates below 2X, sonic uses PICOLA,
which I find to be the best algorithm available.  A slightly buggy
implementation of PICOLA is available in the spandsp library.  I find the one in
RockBox quite good, though it's limited to 2X speed-up.  So far as I know, only
sonic is optimized for the speed factors needed by the blind, up to 6X.

Sonic does all of its CPU-intensive work with integer math, and works well on
ARM CPUs without FPUs.  It supports multiple channels (stereo), and is also able
to change the pitch of a voice.  It works well in streaming audio applications,
and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
or 8-bit unsigned formats.  The source code is in plain ANSI C.  In short, it's
production ready.

## Using libsonic in your program

Sonic is still a new library, but is in Debian Sid.  It will take a while
for it to filter out into all the other distros.  For now, feel free to simply
add sonic.c and sonic.h to your application (or Sonic.java), but consider
switching to -lsonic once the library is available on your distro.

The file [main.c](main.c) is the source code for the sonic command-line application.  It
is meant to be useful as example code.  Feel free to copy directly from main.c
into your application, as main.c is in the public domain.  Dependencies listed
in debian/control, like libsndfile, are only needed to compile the sonic
command-line application.  Libsonic has no external dependencies.

There are basically two ways to use sonic: batch or stream mode.  The simpler
is batch mode, where you pass an entire sound sample to sonic.  All you do is
call one function, like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

This will change the speed and pitch of the sound samples pointed to by samples,
which should be 16-bit signed integers.  Stereo mode is supported, as
is any arbitrary number of channels.  Samples for each channel should be
adjacent in the input array.  Because the samples are modified in-place, be sure
that there is room in the samples array for the speed-changed samples.  In
general, if you are speeding up, rather than slowing down, it is safe to
have no extra padding.  If your sound samples are mono, you don't want to
scale volume or playback rate, and you want normal pitch scaling, then call
it like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);

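As a minimal sketch of the channel layout, assuming that "adjacent" above means
interleaved (left sample, right sample, left sample, and so on), the helper
below packs two separate channel buffers into one array for a two-channel call.
The name interleaveStereo is hypothetical and is not part of libsonic.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libsonic: merge separate left and right
 * buffers into one interleaved array laid out as L0, R0, L1, R1, ...
 * The out array must have room for 2 * numFrames samples. */
static void interleaveStereo(const short *left, const short *right,
                             short *out, size_t numFrames)
{
    size_t i;

    for (i = 0; i < numFrames; i++) {
        out[2 * i] = left[i];      /* even slots hold the left channel */
        out[2 * i + 1] = right[i]; /* odd slots hold the right channel */
    }
}
```

The interleaved array could then be passed as samples with numChannels set to 2.
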
The other way to use libsonic is in stream mode.  This is more complex, but
allows sonic to be inserted into a sound stream with fairly low latency.  The
current maximum latency in sonic is 31 milliseconds, which is enough to process
two pitch periods of voice as low as 65 Hz.  In general, the latency is equal to
two pitch periods, which is typically closer to 20 milliseconds.

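The latency figures above can be checked with a little arithmetic.  This is
only a sketch, assuming the latency is exactly two pitch periods.  The function
names latencyMs and latencySamples are hypothetical and are not part of
libsonic.

```c
/* Latency in milliseconds for a given pitch, assuming two pitch periods.
 * One period lasts 1000 / pitchHz milliseconds. */
static double latencyMs(double pitchHz)
{
    return 2.0 * 1000.0 / pitchHz;
}

/* The same latency as a whole number of samples at a given sample rate
 * (integer division, so the result is rounded down). */
static int latencySamples(int sampleRate, int pitchHz)
{
    return (2 * sampleRate) / pitchHz;
}
```

At 65 Hz this gives a little under 31 milliseconds, and a 100 Hz voice gives
20 milliseconds, which matches the typical figure quoted above.
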
To process a sound stream, you must create a sonicStream object, which contains
all of the state used by sonic.  Sonic should be thread safe, and multiple
sonicStream objects can be used at the same time.  You create a sonicStream
object like this:

    sonicStream stream = sonicCreateStream(sampleRate, numChannels);

When you're done with a sonic stream, you can free its memory with:

    sonicDestroyStream(stream);

By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
no change at all to the sound stream.  Sonic detects this case, and simply
copies the input to the output to reduce CPU load.  To change the speed, pitch,
rate, or volume, set the parameters using:

    sonicSetSpeed(stream, speed);
    sonicSetPitch(stream, pitch);
    sonicSetRate(stream, rate);
    sonicSetVolume(stream, volume);

These four parameters are floating point numbers.  A speed of 2.0 doubles the
speed of speech.  A pitch of 0.95 lowers the pitch by about 5%,
and a volume of 1.4 multiplies the sound samples by 1.4, clipping if they
exceed the maximum range of a 16-bit integer.  Rate scales how fast the
speech is played, changing speed and pitch together.  A 2.0 value will make you
sound like a chipmunk talking very fast.  A 0.7 value will make you sound like
a giant talking slowly.

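To illustrate the volume behaviour described above, here is a minimal sketch
of scaling one 16-bit sample with clipping.  It uses floating point for
clarity, whereas the text above says sonic does its heavy work in integer
math, so this is not the library's actual code.  The name scaleSample is
hypothetical.

```c
/* Hypothetical illustration, not libsonic code: multiply one 16-bit sample
 * by a volume factor and clip the result to the 16-bit signed range. */
static short scaleSample(short sample, float volume)
{
    float scaled = (float)sample * volume;

    if (scaled > 32767.0f) {
        return 32767;   /* clip positive overflow */
    }
    if (scaled < -32768.0f) {
        return -32768;  /* clip negative overflow */
    }
    return (short)scaled;
}
```

With a volume of 1.4, a sample of 30000 would scale to 42000, which is out of
range, so it is clipped to 32767.
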
By default, pitch is modified by changing the rate, and then using speed
modification to bring the speed back to normal.  This allows for a wide range of
pitch changes, but changing the pitch makes the speaker sound larger or smaller,
too.  If you want to make the person sound like the same person, but talking at
a higher or lower pitch, then enable the vocal cord emulation mode for pitch
scaling, using:

    sonicSetChordPitch(stream, 1);

However, only small changes to pitch should be used in this mode, as it
introduces significant distortion otherwise.

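The default pitch mode can be pictured as a change of variables.  The sketch
below is my reading of the description above, not code taken from sonic.c: the
pitch factor is folded into the rate, and the speed is divided by the same
factor so the overall duration is unchanged.  The EffectiveParams type and the
effectiveParams function are hypothetical names.

```c
/* Hypothetical illustration of the default (non-chord) pitch mode. */
typedef struct {
    float speed; /* time-stretch factor, pitch is preserved */
    float rate;  /* resampling factor, speed and pitch change together */
} EffectiveParams;

static EffectiveParams effectiveParams(float speed, float pitch, float rate)
{
    EffectiveParams p;

    p.rate = rate * pitch;   /* resampling shifts the pitch but also the speed */
    p.speed = speed / pitch; /* stretch by the inverse to cancel that speed change */
    return p;
}
```

For example, a pitch of 2.0 with speed and rate left at 1.0 gives an effective
rate of 2.0 and an effective speed of 0.5, so the product, and hence the
duration, stays the same.
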
After setting the sound parameters, you write to the stream like this:

    sonicWriteShortToStream(stream, samples, numSamples);

You read the sped up speech samples from sonic like this:

    samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
    if(samplesRead > 0) {
        /* Do something with the output samples in outBuffer, like send them to
         * the sound device. */
    }

You may change the speed, pitch, rate, and volume parameters at any time, without
having to flush or create a new sonic stream.

When your sound stream ends, there may be several milliseconds of sound data in
the sonic stream's buffers.  To force sonic to process those samples, use:

    sonicFlushStream(stream);

Then, read those samples as above.  That's about all there is to using libsonic.
There are some more functions as a convenience for the user, like
sonicGetSpeed.  Other sound data formats are supported: 8-bit unsigned char and
float.  If float, the sound data should be between -1.0 and 1.0.  Internally, all
sound data is converted to 16-bit integers for processing.
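
As a rough sketch of what converting to 16-bit integers involves, the helpers
below map a float in the range -1.0 to 1.0 and an 8-bit unsigned sample onto
16-bit signed values.  The scaling constants are a common convention and an
assumption on my part, so the exact conversion in sonic.c may differ.  The names
floatToShort and unsignedCharToShort are hypothetical.

```c
/* Hypothetical helper: convert a float sample to 16-bit signed, clamping
 * input that falls outside the expected -1.0 to 1.0 range. */
static short floatToShort(float sample)
{
    if (sample > 1.0f) {
        sample = 1.0f;
    }
    if (sample < -1.0f) {
        sample = -1.0f;
    }
    return (short)(sample * 32767.0f);
}

/* Hypothetical helper: convert an 8-bit unsigned sample, where 128 is
 * silence, to 16-bit signed by removing the offset and scaling by 256. */
static short unsignedCharToShort(unsigned char sample)
{
    return (short)(((int)sample - 128) * 256);
}
```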