# Speech Recognition Example

## Introduction
This sample code shows automatic speech recognition using the Arm NN public C++ API. The compiled application can take

 * an audio file

as input and produce

 * recognised text on the console

as output.

## Dependencies

This example uses the `libsndfile`, `libasound` and `libsamplerate` libraries to capture the raw audio data from a file and to re-sample it to the expected
sample rate. The top-level inference API is provided by the Arm NN library.

### Arm NN

The Speech Recognition example build system does not trigger Arm NN compilation. Thus, before building the application,
please ensure that the Arm NN libraries and header files are available on your build platform.
The application executable dynamically links with the following Arm NN libraries:
* libarmnn.so
* libarmnnTfLiteParser.so

The build script searches for available Arm NN libraries in the following order:
1. Inside the custom user directory specified by the `ARMNN_LIB_DIR` cmake option.
2. Inside the current Arm NN repository, assuming that Arm NN was built following [these instructions](../../BuildGuideCrossCompilation.md).
3. Inside the default locations for system libraries, assuming Arm NN was installed from deb packages.

Arm NN header files are searched for in the `include` directory under the parent directory of the libraries found, e.g.
for libraries found in `/usr/lib` or `/usr/lib64`, header files are expected in `/usr/include` (or `${ARMNN_LIB_DIR}/include`).

Please see [find_armnn.cmake](./cmake/find_armnn.cmake) for implementation details.

## Building
There is one flow for building this application:
* native build on a host platform

### Build Options
* ARMNN_LIB_DIR - points to the custom location of the Arm NN libs and headers.
* BUILD_UNIT_TESTS - set to `1` to build tests. In addition to the main application, the `speech-recognition-example-tests`
unit tests executable will be created.

### Native Build
To build this application on a host platform, first ensure that the required dependencies are installed.
For example, on a Raspberry Pi:
```commandline
sudo apt-get update
sudo apt-get -yq install libsndfile1-dev
sudo apt-get -yq install libasound2-dev
sudo apt-get -yq install libsamplerate-dev
```

To build the demo application, create a build directory:
```commandline
mkdir build
cd build
```
If you have already installed Arm NN and the required libraries, run the cmake and make commands
inside the build directory:
```commandline
cmake ..
make
```
This will build the following in the bin directory:
* `speech-recognition-example` - application executable

If you have a custom Arm NN location, use the `ARMNN_LIB_DIR` option:
```commandline
cmake -DARMNN_LIB_DIR=/path/to/armnn ..
make
```
## Executing

Once the application executable is built, it can be executed with the following options:
* --audio-file-path: Path to the audio file to run speech recognition on **[REQUIRED]**
* --model-file-path: Path to the Speech Recognition model to use **[REQUIRED]**

* --preferred-backends: Takes the preferred backends in preference order, separated by commas.
                        For example: `CpuAcc,GpuAcc,CpuRef`. Accepted options: [`CpuAcc`, `CpuRef`, `GpuAcc`].
                        Defaults to `CpuRef` **[OPTIONAL]**
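
As an illustration of how a comma-separated value such as `CpuAcc,GpuAcc,CpuRef` could be turned into an ordered backend list, here is a minimal sketch. `ParseBackends` is a hypothetical helper written for this README, not the function the sample uses, and the silent skipping of unknown names is an assumption.

```c++
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "CpuAcc,GpuAcc,CpuRef" into an ordered list,
// keeping only the backend names that the option accepts.
std::vector<std::string> ParseBackends(const std::string& arg)
{
    const std::vector<std::string> accepted{"CpuAcc", "CpuRef", "GpuAcc"};
    std::vector<std::string> backends;
    std::stringstream stream(arg);
    std::string name;
    while (std::getline(stream, name, ','))
    {
        for (const std::string& candidate : accepted)
        {
            if (name == candidate)
            {
                backends.push_back(name);
                break;
            }
        }
    }
    // Fall back to the documented default when nothing valid was supplied.
    if (backends.empty())
    {
        backends.push_back("CpuRef");
    }
    return backends;
}
```

The order of the returned vector follows the order on the command line, which is what makes it usable as a preference list.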

### Speech Recognition on a supplied audio file

To run speech recognition on a supplied audio file and output the result to console:
```commandline
./speech-recognition-example --audio-file-path /path/to/audio/file --model-file-path /path/to/model/file
```
---

# Application Overview
This section provides a walkthrough of the application, explaining in detail the steps:
1. Initialisation
    1. Reading from Audio Source
2. Creating a Network
    1. Creating Parser and Importing Graph
    2. Optimizing Graph for Compute Device
    3. Creating Input and Output Binding Information
3. Speech Recognition pipeline
    1. Pre-processing the Captured Audio
    2. Making Input and Output Tensors
    3. Executing Inference
    4. Postprocessing
    5. Decoding and Processing Inference Output

### Initialisation

##### Reading from Audio Source
After parsing user arguments, the chosen audio file is loaded into an `AudioCapture` object.
We use [`AudioCapture`](./include/AudioCapture.hpp) in our main function to capture appropriately sized audio blocks from the source using the
`Next()` function.

The `AudioCapture` object also re-samples the audio input to the desired sample rate and reduces it to a single channel (i.e. mono).
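
To make the windowing and mono conversion more concrete, the following is a rough, self-contained sketch. `SlidingWindow` and `ToMono` are hypothetical stand-ins for the role `AudioCapture` plays; the fixed stride and the averaging of interleaved channels are assumptions, not a description of the real class.

```c++
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical down-mix: averages interleaved channels into one mono channel.
std::vector<float> ToMono(const std::vector<float>& interleaved, std::size_t channels)
{
    std::vector<float> mono;
    if (channels == 0)
    {
        return mono;
    }
    mono.reserve(interleaved.size() / channels);
    for (std::size_t frame = 0; frame + channels <= interleaved.size(); frame += channels)
    {
        float sum = 0.0f;
        for (std::size_t c = 0; c < channels; ++c)
        {
            sum += interleaved[frame + c];
        }
        mono.push_back(sum / static_cast<float>(channels));
    }
    return mono;
}

// Hypothetical stand-in for the windowing role of AudioCapture.
// The stride must be greater than zero.
class SlidingWindow
{
public:
    SlidingWindow(std::vector<float> samples, std::size_t windowSize, std::size_t stride)
        : m_Samples(std::move(samples)), m_WindowSize(windowSize), m_Stride(stride) {}

    // True while a full window of samples is still available.
    bool HasNext() const { return m_Position + m_WindowSize <= m_Samples.size(); }

    // Returns the current window and advances by the stride.
    // Call only when HasNext() is true.
    std::vector<float> Next()
    {
        const auto first = m_Samples.begin() + static_cast<std::ptrdiff_t>(m_Position);
        std::vector<float> block(first, first + static_cast<std::ptrdiff_t>(m_WindowSize));
        m_Position += m_Stride;
        return block;
    }

private:
    std::vector<float> m_Samples;
    std::size_t m_WindowSize;
    std::size_t m_Stride;
    std::size_t m_Position = 0;
};
```

A stride smaller than the window size gives overlapping windows, so consecutive inferences share some audio context.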

### Creating a Network

All operations with Arm NN and networks are encapsulated in the [`ArmnnNetworkExecutor`](./include/ArmnnNetworkExecutor.hpp)
class.

##### Creating Parser and Importing Graph
The first step with the Arm NN SDK is to import a graph from a file using the appropriate parser.

The Arm NN SDK provides parsers for reading graphs from a variety of model formats. In our application we specifically
focus on `.tflite`, `.pb` and `.onnx` models.

Based on the extension of the provided model file, the corresponding parser is created and the network file is loaded with
the `CreateNetworkFromBinaryFile()` method. The parser handles the creation of the underlying Arm NN graph.

The current example accepts model files in TFLite format, so we use `ITfLiteParser`:
```c++
#include "armnnTfLiteParser/ITfLiteParser.hpp"

armnnTfLiteParser::ITfLiteParserPtr parser = armnnTfLiteParser::ITfLiteParser::Create();
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile(modelPath.c_str());
```
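
Choosing a parser from the file extension, as described above, could look roughly like the sketch below. The `ModelFormat` enum and the `FormatFromPath` function are hypothetical names introduced for illustration; the sketch deliberately avoids Arm NN types so that it stays self-contained.

```c++
#include <cstddef>
#include <string>

// Hypothetical enumeration of the model formats mentioned in this section.
enum class ModelFormat { TfLite, TensorFlowPb, Onnx, Unknown };

// Hypothetical helper: maps the extension of a model path to a format.
ModelFormat FormatFromPath(const std::string& path)
{
    const std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos)
    {
        return ModelFormat::Unknown;
    }
    const std::string extension = path.substr(dot);
    if (extension == ".tflite") { return ModelFormat::TfLite; }
    if (extension == ".pb")     { return ModelFormat::TensorFlowPb; }
    if (extension == ".onnx")   { return ModelFormat::Onnx; }
    return ModelFormat::Unknown;
}
```

A caller would switch on the returned value and create the matching parser, rejecting `ModelFormat::Unknown` with an error message.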

##### Optimizing Graph for Compute Device
Arm NN supports optimized execution on multiple CPU and GPU devices. Prior to executing a graph, we must select the
appropriate device context. We do this by creating a runtime context with default options using `IRuntime::Create()`.

For example:
```c++
#include "armnn/ArmNN.hpp"

auto runtime = armnn::IRuntime::Create(armnn::IRuntime::CreationOptions());
```

We can optimize the imported graph by specifying a list of backends in order of preference, which lets Arm NN apply
backend-specific optimizations. Each backend is identified by a string unique to that backend,
for example `CpuAcc`, `GpuAcc` or `CpuRef`.

For example:
```c++
std::vector<armnn::BackendId> backends{"CpuAcc", "GpuAcc", "CpuRef"};
```

Internally and transparently, Arm NN splits the graph into subgraphs based on backends. It calls an optimize-subgraph
function on each of them and, if possible, substitutes the corresponding subgraph in the original graph with
its optimized version.
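
The effect of a preference-ordered backend list can be pictured with a toy model in which each layer goes to the first listed backend that supports it. This is only an illustration of the idea, not Arm NN's implementation; `SupportTable`, `AssignBackends` and the layer names are all hypothetical.

```c++
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: each backend name maps to the set of layer types it supports.
using SupportTable = std::map<std::string, std::set<std::string>>;

// Hypothetical helper: for every layer, pick the first backend in the
// preference list that supports it, or "Unassigned" if none does.
std::vector<std::string> AssignBackends(const std::vector<std::string>& layers,
                                        const std::vector<std::string>& preferredBackends,
                                        const SupportTable& support)
{
    std::vector<std::string> assignment;
    for (const std::string& layer : layers)
    {
        std::string chosen = "Unassigned";
        for (const std::string& backend : preferredBackends)
        {
            const auto entry = support.find(backend);
            if (entry != support.end() && entry->second.count(layer) > 0)
            {
                chosen = backend;
                break;
            }
        }
        assignment.push_back(chosen);
    }
    return assignment;
}
```

In this picture, runs of consecutive layers that land on the same backend correspond to the subgraphs that are optimized together.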

Using the `Optimize()` function we optimize the graph for inference, and we then load the optimized network onto the compute
device with `LoadNetwork()`. `LoadNetwork()` creates the backend-specific workloads
for the layers, along with a backend-specific workload factory that is called to create those workloads.

For example:
```c++
armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network,
                                                     backends,
                                                     runtime->GetDeviceSpec(),
                                                     armnn::OptimizerOptions());
// LoadNetwork() writes the identifier of the loaded network into networkId.
armnn::NetworkId networkId;
std::string errorMessage;
if (runtime->LoadNetwork(networkId, std::move(optNet), errorMessage) != armnn::Status::Success)
{
    std::cerr << errorMessage << std::endl;
}
```

##### Creating Input and Output Binding Information
Parsers can also be used to extract the input information for the network. By calling `GetSubgraphInputTensorNames()`
we extract all the input names and, with `GetNetworkInputBindingInfo()`, bind the input points of the graph.
For example:
```c++
std::vector<std::string> inputNames = parser->GetSubgraphInputTensorNames(0);
auto inputBindingInfo = parser->GetNetworkInputBindingInfo(0, inputNames[0]);
```
The input binding information contains all the essential information about the input. It is a pair consisting of
the integer identifier of the bindable layer (input or output) and the tensor info (data type, quantization information,
number of dimensions, total number of elements).

Similarly, we can get the output binding information for an output layer by using the parser to retrieve the output
tensor names and calling `GetNetworkOutputBindingInfo()`.
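
The quantization information in the tensor info matters because the model in this example consumes `int8_t` data. The sketch below assumes the usual affine scheme, `q = round(x / scale) + offset`, with the scale and offset being the values that `armnn::TensorInfo` exposes through `GetQuantizationScale()` and `GetQuantizationOffset()`. The `Quantize` and `Dequantize` helpers are hypothetical and are not taken from the sample.

```c++
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper: affine quantization of one float to int8,
// clamped to the representable range.
std::int8_t Quantize(float value, float scale, int offset)
{
    const float scaled = std::round(value / scale) + static_cast<float>(offset);
    const float clamped = std::min(127.0f, std::max(-128.0f, scaled));
    return static_cast<std::int8_t>(clamped);
}

// Hypothetical helper: the inverse mapping back to a float.
float Dequantize(std::int8_t quantized, float scale, int offset)
{
    return scale * static_cast<float>(static_cast<int>(quantized) - offset);
}
```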

### Speech Recognition pipeline

The speech recognition pipeline has three steps: pre-processing the data, running inference, and decoding the inference results
in the post-processing step.

See [`SpeechRecognitionPipeline`](include/SpeechRecognitionPipeline.hpp) for more details.

#### Pre-processing the Audio Input
Each frame captured from the source is read and stored by the `AudioCapture` object.
Its `Next()` function provides us with the correctly positioned window of data, sized appropriately for the given model, to pre-process before inference.

```c++
std::vector<float> audioBlock = capture.Next();
...
std::vector<int8_t> preprocessedData = asrPipeline->PreProcessing<float, int8_t>(audioBlock, preprocessor);
```

The `MFCC` class is then used to extract the Mel-frequency Cepstral Coefficients (MFCCs, [see Wikipedia](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)) from each stored audio frame in the provided window of audio, to be used as features for the network. MFCCs are the result of computing the dot product of the Discrete Cosine Transform (DCT) matrix and the log of the Mel energy.
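
The final step of that computation, multiplying a DCT matrix by the log Mel energies, can be sketched as follows. This uses an unnormalised DCT-II and a hypothetical function name, `MfccFromLogMel`; the scaling and the number of coefficients used by the sample's `MFCC` class may differ.

```c++
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: applies an unnormalised DCT-II to the log Mel
// energies of one audio frame and keeps the first numCoeffs coefficients.
std::vector<float> MfccFromLogMel(const std::vector<float>& logMelEnergies, std::size_t numCoeffs)
{
    const std::size_t numBands = logMelEnergies.size();
    const double pi = std::acos(-1.0);
    std::vector<float> mfcc(numCoeffs, 0.0f);
    for (std::size_t k = 0; k < numCoeffs; ++k)
    {
        double sum = 0.0;
        for (std::size_t n = 0; n < numBands; ++n)
        {
            // Row k, column n of the DCT matrix, times the n-th log energy.
            const double basis = std::cos(pi * static_cast<double>(k) *
                                          (static_cast<double>(n) + 0.5) /
                                          static_cast<double>(numBands));
            sum += basis * static_cast<double>(logMelEnergies[n]);
        }
        mfcc[k] = static_cast<float>(sum);
    }
    return mfcc;
}
```

For a flat spectrum, only coefficient 0 is non-zero, which is a quick sanity check on the transform.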

After all the MFCCs needed for an inference have been extracted from the audio data, we convolve them with 1-dimensional Savitzky-Golay filters to compute the first and second MFCC derivatives with respect to time. The MFCCs and the derivatives are concatenated to make the input tensor for the model.


#### Executing Inference
```c++
common::InferenceResults results;
...
asrPipeline->Inference<int8_t>(preprocessedData, results);
```
The inference step calls the `ArmnnNetworkExecutor::Run` method, which prepares the input tensors and executes inference.
A compute device performs inference for the loaded network using the `EnqueueWorkload()` function of the runtime context.
For example:
```c++
// const void* inputData = ...;
// outputTensors were pre-allocated before

armnn::InputTensors inputTensors = {{ inputBindingInfo.first, armnn::ConstTensor(inputBindingInfo.second, inputData) }};
runtime->EnqueueWorkload(0, inputTensors, outputTensors);
```
We allocate memory for output data once and map it to output tensor objects. After successful inference, we read data
from the pre-allocated output data buffer. See [`ArmnnNetworkExecutor::ArmnnNetworkExecutor`](./src/ArmnnNetworkExecutor.cpp)
and [`ArmnnNetworkExecutor::Run`](./src/ArmnnNetworkExecutor.cpp) for more details.

#### Postprocessing

##### Decoding and Processing Inference Output
The output from the inference must be decoded to obtain the recognised characters from the speech.
A simple greedy decoder classifies the results by taking the index of the highest-scoring element of the output as a key into the labels dictionary.
The value returned is a character, which is appended to a list, and the list is filtered to remove unwanted characters.

```c++
asrPipeline->PostProcessing<int8_t>(results, isFirstWindow, !capture.HasNext(), currentRContext);
```
The produced string is displayed on the console.
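
A greedy decoder of the kind described above can be sketched in a few lines. The sketch assumes that the "unwanted characters" are consecutive repeats and a blank symbol, as in CTC-style decoding; that filter, the `GreedyDecode` name and the label layout are assumptions rather than the sample's actual code.

```c++
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: scores holds one row of label scores per time step,
// and labels[i] is the character for score column i.
std::string GreedyDecode(const std::vector<std::vector<float>>& scores,
                         const std::string& labels,
                         char blank)
{
    std::string text;
    char previous = blank;
    for (const std::vector<float>& row : scores)
    {
        if (row.empty())
        {
            continue;
        }
        // Index of the highest score in this time step.
        const std::size_t best = static_cast<std::size_t>(
            std::distance(row.begin(), std::max_element(row.begin(), row.end())));
        if (best >= labels.size())
        {
            continue;
        }
        const char current = labels[best];
        // Drop blanks and immediate repeats of the same character.
        if (current != blank && current != previous)
        {
            text.push_back(current);
        }
        previous = current;
    }
    return text;
}
```

Because a blank resets `previous`, a genuinely doubled letter survives as long as the model emits a blank between the two occurrences.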