# libsonic Home Page

[Download the latest tar-ball from here](download).

The source code repository can be cloned using git:

    $ git clone git://github.com/waywardgeek/sonic.git

The source code for the Android version, sonic-ndk, can be cloned with:

    $ git clone git://github.com/waywardgeek/sonic-ndk.git

There is a simple test app for Android that demos its capabilities.  You can
[install the Android application from here](Sonic-NDK.apk).

There is a new native Java port, which is very fast!  Check out Sonic.java and
Main.java in the latest tar-ball, or get the code from git.

## Overview

Sonic is free software for speeding up or slowing down speech.  While similar to
other algorithms that came before, Sonic is optimized for speedups of over 2X.
There is a simple sonic library in ANSI C, and one in pure Java.  Both are
designed to be easily integrated into streaming voice applications, like TTS
back ends.  While a very new project, it is already integrated into:

- espeak
- Debian Sid as package libsonic
- Android Astro Player Nova
- Android Osplayer
- Multiple closed source TTS engines

The primary motivation behind sonic is to enable the blind and visually impaired
to improve their productivity with free software speech engines, like espeak.
Sonic can also be used by the sighted.  For example, sonic can improve the
experience of listening to an audio book on an Android phone.

Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved.  It is released
under the Apache 2.0 license.  Feel free to contact me at
<[email protected]>.  One user was concerned about patents.  I believe the
sonic algorithms do not violate any patents, as most of it is very old, based
on [PICOLA](https://web.archive.org/web/20120731100136/http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
the new part, for greater than 2X speed up, is clearly a capability most
developers ignore, and would not bother to patent.

## Comparison to Other Solutions

In short, Sonic is better for speech, while WSOLA is better for music.

A popular alternative is SoundTouch.  SoundTouch uses WSOLA, an algorithm
optimized for changing the tempo of music.  No WSOLA-based program performs well
for speech (contrary to the inventor's estimate of WSOLA).  Listen to [this
soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
it to [this sonic sample](sonic.wav).  Both are sped up by 2X.  WSOLA
introduces unacceptable levels of distortion, making speech impossible to
understand at high speed (over 2.5X) by blind speed listeners.

However, there are decent free software algorithms for speeding up speech.  They
are all in the TD-PSOLA family.  For speech rates below 2X, sonic uses PICOLA,
which I find to be the best algorithm available.  A slightly buggy
implementation of PICOLA is available in the spandsp library.  I find the one in
RockBox quite good, though it's limited to 2X speed up.  So far as I know, only
sonic is optimized for speed factors needed by the blind, up to 6X.

Sonic does all of its CPU intensive work with integer math, and works well on
ARM CPUs without FPUs.  It supports multiple channels (stereo), and is also able
to change the pitch of a voice.  It works well in streaming audio applications,
and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
or 8-bit unsigned formats.  The source code is in plain ANSI C.  In short, it's
production ready.

## Using libsonic in your program

Sonic is still a new library, but is in Debian Sid.  It will take a while
for it to filter out into all the other distros.  For now, feel free to simply
add sonic.c and sonic.h to your application (or Sonic.java), but consider
switching to -lsonic once the library is available on your distro.

The file [main.c](main.c) is the source code for the sonic command-line application.  It
is meant to be useful as example code.  Feel free to copy directly from main.c
into your application, as main.c is in the public domain.  Dependencies listed
in debian/control, like libsndfile, are only needed to compile the sonic
command-line application.  Libsonic has no external dependencies.

There are basically two ways to use sonic: batch or stream mode.  The simplest
is batch mode, where you pass an entire sound sample to sonic.  All you do is
call one function, like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

This will change the speed and pitch of the sound samples pointed to by samples,
which should be 16-bit signed integers.  Stereo mode is supported, as
is any arbitrary number of channels.  Samples should be interleaved, so that the
samples for each channel are adjacent in the input array.  Because the samples
are modified in-place, be sure that there is room in the samples array for the
speed-changed samples.  In general, if you are speeding up, rather than slowing
down, it will be safe to have no extra padding.  If your sound samples are mono,
and you don't want to scale volume or playback rate, and if you want normal
pitch scaling, then call it like this:

    sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);

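Two details of batch mode are easy to get wrong: the channel layout and the size of the in-place buffer.  The sketch below is not part of libsonic; `interleaveStereo` and `estimateBufferFrames` are hypothetical helpers.  It assumes the output is roughly the input length divided by speed times rate, and the 10% margin is an arbitrary safety choice, not a bound guaranteed by the library.  For example, slowing 1000 frames down to half speed would call for room for about 2200 frames per channel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: merge separate left and right buffers into one array
 * so that the samples for each channel are adjacent: L0 R0 L1 R1 ... */
void interleaveStereo(const short *left, const short *right,
                      short *out, size_t numFrames)
{
    size_t i;

    for (i = 0; i < numFrames; i++) {
        out[2 * i] = left[i];
        out[2 * i + 1] = right[i];
    }
}

/* Hypothetical helper: estimate how many frames (samples per channel) the
 * in-place buffer should have room for.  Assumes the output holds about
 * numFrames / (speed * rate) frames, then adds an arbitrary 10% margin plus
 * one frame.  Never returns less than the input size.  Multiply the result
 * by the channel count to get the array length in shorts. */
size_t estimateBufferFrames(size_t numFrames, float speed, float rate)
{
    double expected;
    size_t padded;

    assert(speed > 0.0f && rate > 0.0f);
    expected = (double)numFrames / ((double)speed * (double)rate);
    padded = (size_t)(expected * 1.1) + 1;
    return padded > numFrames ? padded : numFrames;
}
```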
The other way to use libsonic is in stream mode.  This is more complex, but
allows sonic to be inserted into a sound stream with fairly low latency.  The
current maximum latency in sonic is 31 milliseconds, which is enough to process
two pitch periods of voice as low as 65 Hz.  In general, the latency is equal to
two pitch periods, which is typically closer to 20 milliseconds.

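As a quick check of these figures: two pitch periods at 65 Hz last 2/65 of a second, about 30.8 ms, which rounds up to the 31 ms maximum, and a 100 Hz voice gives exactly 20 ms.  The helpers below are hypothetical and only reproduce that arithmetic; they are not libsonic functions.

```c
#include <assert.h>

/* Lowest voice pitch considered, taken from the 65 Hz figure in the text. */
#define MIN_PITCH_HZ 65

/* Duration of two pitch periods at the given pitch, in milliseconds. */
double twoPeriodLatencyMs(double pitchHz)
{
    assert(pitchHz > 0.0);
    return 2.0 * 1000.0 / pitchHz;
}

/* The same latency as a whole number of samples per channel, rounded down. */
int twoPeriodLatencySamples(int sampleRate, int pitchHz)
{
    assert(sampleRate > 0 && pitchHz > 0);
    return 2 * sampleRate / pitchHz;
}
```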
To process a sound stream, you must create a sonicStream object, which contains
all of the state used by sonic.  Sonic should be thread safe, and multiple
sonicStream objects can be used at the same time.  You create a sonicStream
object like this:

    sonicStream stream = sonicCreateStream(sampleRate, numChannels);

When you're done with a sonic stream, you can free its memory with:

    sonicDestroyStream(stream);

By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
no change at all to the sound stream.  Sonic detects this case, and simply
copies the input to the output to reduce CPU load.  To change the speed, pitch,
rate, or volume, set the parameters using:

    sonicSetSpeed(stream, speed);
    sonicSetPitch(stream, pitch);
    sonicSetRate(stream, rate);
    sonicSetVolume(stream, volume);

These four parameters are floating point numbers.  A speed of 2.0 means to
double the speed of speech.  A pitch of 0.95 means to lower the pitch by about 5%,
and a volume of 1.4 means to multiply the sound samples by 1.4, clipping if we
exceed the maximum range of a 16-bit integer.  Rate scales how fast the
speech is played back, which changes speed and pitch together.  A rate of 2.0
will make you sound like a chipmunk talking very fast.  A rate of 0.7 will make
you sound like a giant talking slowly.

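The volume behaviour can be illustrated with a small sketch.  This is not libsonic's own code, which does its heavy lifting with integer math; `scaleSampleClipped` is a hypothetical helper that only shows what multiplying a sample and clipping it to the 16-bit range means.  With a volume of 1.4, a sample of 10000 becomes 14000, while a sample of 30000 would overflow and is clipped to 32767.

```c
#include <assert.h>

/* Hypothetical helper: scale one 16-bit sample by `volume`, clip the result
 * to the signed 16-bit range, and round to the nearest integer.  This sketch
 * assumes a non-negative gain. */
short scaleSampleClipped(short sample, float volume)
{
    double scaled;

    assert(volume >= 0.0f);
    scaled = (double)sample * (double)volume;

    /* Clip before converting, so the conversion can never overflow. */
    if (scaled >= 32767.0) {
        return 32767;
    }
    if (scaled <= -32768.0) {
        return -32768;
    }
    /* Round half away from zero. */
    return (short)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```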
By default, pitch is modified by changing the rate, and then using speed
modification to bring the speed back to normal.  This allows for a wide range of
pitch changes, but changing the pitch makes the speaker sound larger or smaller,
too.  If you want to make the person sound like the same person, but talking at
a higher or lower pitch, then enable the vocal chord emulation mode for pitch
scaling, using:

    sonicSetChordPitch(stream, 1);

However, only small changes to pitch should be used in this mode, as it
introduces significant distortion otherwise.

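The default pitch mode described above can be written out as arithmetic.  The sketch below is an assumption about how that description translates into numbers, not a copy of libsonic's internals; `effectiveParams` and `foldPitch` are hypothetical names.  Raising the pitch by a factor of 2 means playing back 2 times faster (rate 2.0) and then time-stretching at speed 0.5, so the overall duration, governed by the product of the two, stays the same.

```c
#include <assert.h>

/* Settings after folding the pitch factor into speed and rate. */
struct effectiveParams {
    float speed; /* tempo change applied by the time-stretching step */
    float rate;  /* playback-rate change applied by resampling */
};

/* Hypothetical helper: express a pitch change as a rate change followed by
 * a compensating speed change, as in the default (non chord pitch) mode. */
struct effectiveParams foldPitch(float speed, float pitch, float rate)
{
    struct effectiveParams p;

    assert(speed > 0.0f && pitch > 0.0f && rate > 0.0f);
    p.rate = rate * pitch;   /* resampling shifts pitch, but also tempo */
    p.speed = speed / pitch; /* time stretching undoes the tempo change */
    return p;
}
```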
After setting the sound parameters, you write to the stream like this:

    sonicWriteShortToStream(stream, samples, numSamples);

You read the sped-up speech samples from sonic like this:

    samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
    if(samplesRead > 0) {
        /* Do something with the output samples in outBuffer, like send them to
         * the sound device. */
    }

You may change the speed, pitch, rate, and volume parameters at any time, without
having to flush or create a new sonic stream.

When your sound stream ends, there may be several milliseconds of sound data in
the sonic stream's buffers.  To force sonic to process those samples use:

    sonicFlushStream(stream);

Then, read those samples as above.  That's about all there is to using libsonic.
A few more convenience functions are available, such as sonicGetSpeed.  Other
sound data formats are supported: unsigned char and float.  If float, the sound
data should be between -1.0 and 1.0.  Internally, all sound data is converted to
16-bit integers for processing.

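The conversion to 16-bit integers might look roughly like the sketch below.  The helper names and the exact scale factors (32767 for float, 256 for 8-bit unsigned with 128 as silence) are assumptions made for illustration, so check sonic.c before relying on the precise values.

```c
#include <assert.h>

/* Hypothetical helper: convert a float sample in [-1.0, 1.0] to 16 bits.
 * Out-of-range input is clamped first; the fraction is truncated. */
short floatToShort(float value)
{
    assert(value == value); /* reject NaN */
    if (value > 1.0f) {
        value = 1.0f;
    }
    if (value < -1.0f) {
        value = -1.0f;
    }
    return (short)(value * 32767.0f);
}

/* Hypothetical helper: convert an 8-bit unsigned sample to 16 bits by
 * centring it on zero (128 is silence) and scaling by 256. */
short unsignedCharToShort(unsigned char value)
{
    return (short)(((int)value - 128) * 256);
}
```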