# Speech Recognition Example

## Introduction
This sample code shows automatic speech recognition using the Arm NN public C++ API. The compiled application can take

 * an audio file

as input and produce
 * recognised text to the console

as output.

## Dependencies

This example uses the `libsndfile`, `libasound` and `libsamplerate` libraries to capture the raw audio data from file, and to re-sample it to the expected
sample rate. The top-level inference API is provided by the Arm NN library.

### Arm NN

The Speech Recognition example build system does not trigger Arm NN compilation. Thus, before building the application,
please ensure that the Arm NN libraries and header files are available on your build platform.
The application executable binary dynamically links with the following Arm NN libraries:
* libarmnn.so
* libarmnnTfLiteParser.so

The build script searches for available Arm NN libraries in the following order:
1. Inside a custom user directory specified by the `ARMNN_LIB_DIR` cmake option.
2. Inside the current Arm NN repository, assuming that Arm NN was built following [these instructions](../../BuildGuideCrossCompilation.md).
3. Inside the default locations for system libraries, assuming Arm NN was installed from deb packages.

Arm NN header files are searched for in the `include` directory under the parent directory of the found library files, e.g.
for libraries found in `/usr/lib` or `/usr/lib64`, header files are expected in `/usr/include` (or `${ARMNN_LIB_DIR}/include`).

Please see [find_armnn.cmake](./cmake/find_armnn.cmake) for implementation details.

## Building
There is one flow for building this application:
* native build on a host platform

### Build Options
* ARMNN_LIB_DIR - point to the custom location of the Arm NN libs and headers.
* BUILD_UNIT_TESTS - set to `1` to build tests.
  In addition to the main application, the `speech-recognition-example-tests`
  unit test executable will be created.

### Native Build
To build this application on a host platform, first ensure that the required dependencies are installed.
For example, on a Raspberry Pi:
```commandline
sudo apt-get update
sudo apt-get -yq install libsndfile1-dev
sudo apt-get -yq install libasound2-dev
sudo apt-get -yq install libsamplerate-dev
```

To build the demo application, create a build directory:
```commandline
mkdir build
cd build
```
If you have already installed Arm NN and the required libraries, run the cmake and make commands inside the build directory:
```commandline
cmake ..
make
```
This will build the following in the bin directory:
* `speech-recognition-example` - application executable

If you have a custom Arm NN location, use the `ARMNN_LIB_DIR` option:
```commandline
cmake -DARMNN_LIB_DIR=/path/to/armnn ..
make
```
## Executing

Once the application executable is built, it can be executed with the following options:
* `--audio-file-path`: Path to the audio file to run speech recognition on **[REQUIRED]**
* `--model-file-path`: Path to the Speech Recognition model to use **[REQUIRED]**
* `--preferred-backends`: Takes the preferred backends in preference order, separated by commas.
  For example: `CpuAcc,GpuAcc,CpuRef`. Accepted options: [`CpuAcc`, `CpuRef`, `GpuAcc`].
  Defaults to `CpuRef` **[OPTIONAL]**

### Speech Recognition on a supplied audio file

To run speech recognition on a supplied audio file and output the result to console:
```commandline
./speech-recognition-example --audio-file-path /path/to/audio/file --model-file-path /path/to/model/file
```
---

# Application Overview
This section provides a walkthrough of the application, explaining in detail the steps:
1. Initialisation
    1. Reading from Audio Source
2. Creating a Network
    1. Creating Parser and Importing Graph
    2. Optimizing Graph for Compute Device
    3. Creating Input and Output Binding Information
3. Speech Recognition pipeline
    1. Pre-processing the Captured Audio
    2. Making Input and Output Tensors
    3. Executing Inference
    4. Postprocessing
    5. Decoding and Processing Inference Output

### Initialisation

##### Reading from Audio Source
After parsing user arguments, the chosen audio file is loaded into an `AudioCapture` object.
We use [`AudioCapture`](./include/AudioCapture.hpp) in our main function to capture appropriately sized audio blocks from the source using the
`Next()` function.

The `AudioCapture` object also re-samples the audio input to the desired sample rate, and sets the number of channels used to one channel (i.e. mono).

### Creating a Network

All operations with Arm NN and networks are encapsulated in the [`ArmnnNetworkExecutor`](./include/ArmnnNetworkExecutor.hpp)
class.

##### Creating Parser and Importing Graph
The first step with the Arm NN SDK is to import a graph from file by using the appropriate parser.

The Arm NN SDK provides parsers for reading graphs from a variety of model formats. In our application we specifically
focus on `.tflite`, `.pb` and `.onnx` models.
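
As a rough illustration of telling the model formats named above apart, the sketch below classifies a model path by its file extension. The `ModelFormat` enum and the `ModelFormatFromPath` helper are hypothetical names introduced for this example only; they are not part of Arm NN or of this application.

```c++
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enum (not an Arm NN type) covering the formats named above.
enum class ModelFormat { TfLite, TensorFlowPb, Onnx, Unknown };

// Classifies a model file by its extension, ignoring case.
inline ModelFormat ModelFormatFromPath(const std::string& modelPath)
{
    // Only a '.' that comes after the last path separator starts an extension.
    const std::size_t slash = modelPath.find_last_of("/\\");
    const std::size_t dot = modelPath.find_last_of('.');
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
    {
        return ModelFormat::Unknown;
    }

    std::string ext = modelPath.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (ext == ".tflite") { return ModelFormat::TfLite; }
    if (ext == ".pb")     { return ModelFormat::TensorFlowPb; }
    if (ext == ".onnx")   { return ModelFormat::Onnx; }
    return ModelFormat::Unknown;
}
```

A caller could switch on the returned value to decide which parser to create, and report an error for `ModelFormat::Unknown`.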

Based on the extension of the provided model file, the corresponding parser is created and the network file is loaded with the
`CreateNetworkFromBinaryFile()` method. The parser handles the creation of the underlying Arm NN graph.

The current example accepts tflite format model files, so we use `ITfLiteParser`:
```c++
#include "armnnTfLiteParser/ITfLiteParser.hpp"

armnnTfLiteParser::ITfLiteParserPtr parser = armnnTfLiteParser::ITfLiteParser::Create();
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile(modelPath.c_str());
```

##### Optimizing Graph for Compute Device
Arm NN supports optimized execution on multiple CPU and GPU devices. Prior to executing a graph, we must select the
appropriate device context. We do this by creating a runtime context with default options using `IRuntime::Create()`.

For example:
```c++
#include "armnn/ArmNN.hpp"

auto runtime = armnn::IRuntime::Create(armnn::IRuntime::CreationOptions());
```

We can optimize the imported graph by specifying a list of backends in order of preference, which allows
backend-specific optimizations to be applied. The backends are identified by a string unique to the backend,
for example `CpuAcc`, `GpuAcc`, `CpuRef`.

For example:
```c++
std::vector<armnn::BackendId> backends{"CpuAcc", "GpuAcc", "CpuRef"};
```

Internally and transparently, Arm NN splits the graph into subgraphs based on backends, calls an optimize-subgraphs
function on each of them and, if possible, substitutes the corresponding subgraph in the original graph with
its optimized version.

Using the `Optimize()` function we optimize the graph for inference and load the optimized network onto the compute
device with `LoadNetwork()`.
This function creates the backend-specific workloads
for the layers and a backend-specific workload factory, which is called to create the workloads.

For example:
```c++
armnn::IOptimizedNetworkPtr optNet = armnn::Optimize(*network,
                                                     backends,
                                                     runtime->GetDeviceSpec(),
                                                     armnn::OptimizerOptions());
armnn::NetworkId networkId;
std::string errorMessage;
// LoadNetwork writes the identifier of the loaded network into networkId
if (runtime->LoadNetwork(networkId, std::move(optNet), errorMessage) != armnn::Status::Success)
{
    std::cerr << errorMessage << std::endl;
}
```

##### Creating Input and Output Binding Information
Parsers can also be used to extract the input information for the network. By calling `GetSubgraphInputTensorNames`
we extract all the input names and, with `GetNetworkInputBindingInfo`, bind the input points of the graph.
For example:
```c++
std::vector<std::string> inputNames = parser->GetSubgraphInputTensorNames(0);
auto inputBindingInfo = parser->GetNetworkInputBindingInfo(0, inputNames[0]);
```
The input binding information contains all the essential information about the input.
It is a tuple consisting of
integer identifiers for bindable layers (inputs, outputs) and the tensor info (data type, quantization information,
number of dimensions, total number of elements).

Similarly, we can get the output binding information for an output layer by using the parser to retrieve output
tensor names and calling `GetNetworkOutputBindingInfo()`.

### Speech Recognition pipeline

The speech recognition pipeline has 3 steps: data pre-processing, running inference, and decoding the inference results
in the post-processing step.

See [`SpeechRecognitionPipeline`](include/SpeechRecognitionPipeline.hpp) for more details.

#### Pre-processing the Audio Input
Each frame captured from the source is read and stored by the `AudioCapture` object.
Its `Next()` function provides us with the correctly positioned window of data, sized appropriately for the given model, to pre-process before inference.

```c++
std::vector<float> audioBlock = capture.Next();
...
std::vector<int8_t> preprocessedData = asrPipeline->PreProcessing<float, int8_t>(audioBlock, preprocessor);
```

The `MFCC` class is then used to extract the Mel-frequency Cepstral Coefficients (MFCCs, [see Wikipedia](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)) from each stored audio frame in the provided window of audio, to be used as features for the network. MFCCs are the result of computing the dot product of the Discrete Cosine Transform (DCT) matrix and the log of the Mel energy.

After all the MFCCs needed for an inference have been extracted from the audio data, we convolve them with 1-dimensional Savitzky-Golay filters to compute the first and second MFCC derivatives with respect to time. The MFCCs and the derivatives are concatenated to make the input tensor for the model.


#### Executing Inference
```c++
common::InferenceResults results;
...
asrPipeline->Inference<int8_t>(preprocessedData, results);
```
The inference step calls the `ArmnnNetworkExecutor::Run` method, which prepares the input tensors and executes inference.
A compute device performs inference for the loaded network using the `EnqueueWorkload()` function of the runtime context.
For example:
```c++
// const void* inputData = ...;
// outputTensors were pre-allocated before

armnn::InputTensors inputTensors = {{ inputBindingInfo.first, armnn::ConstTensor(inputBindingInfo.second, inputData) }};
runtime->EnqueueWorkload(0, inputTensors, outputTensors);
```
We allocate memory for output data once and map it to output tensor objects. After successful inference, we read data
from the pre-allocated output data buffer. See [`ArmnnNetworkExecutor::ArmnnNetworkExecutor`](./src/ArmnnNetworkExecutor.cpp)
and [`ArmnnNetworkExecutor::Run`](./src/ArmnnNetworkExecutor.cpp) for more details.

#### Postprocessing

##### Decoding and Processing Inference Output
The output from the inference must be decoded to obtain the recognised characters from the speech.
A simple greedy decoder classifies the results by taking the index of the highest-valued element of the output as the key into the labels dictionary.
The value returned is a character which is appended to a list, and the list is filtered to remove unwanted characters.

```c++
asrPipeline->PostProcessing<int8_t>(results, isFirstWindow, !capture.HasNext(), currentRContext);
```
The produced string is displayed on the console.
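
The greedy decoding step described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual decoder: `GreedyDecode`, its parameters and the row-major `[steps x classes]` output layout are assumptions made for the example. In particular, the filtering rule used here (drop an assumed `blank` label and collapse consecutive repeats, in the style of CTC greedy decoding) is a guess at what "unwanted characters" means.

```c++
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Greedy decode sketch (hypothetical helper, not part of the example sources).
// `output` is assumed to be a row-major [numSteps x numClasses] score matrix,
// `labels` maps a class index to a character, and `blank` is an assumed
// placeholder label that should not appear in the final text.
inline std::string GreedyDecode(const std::vector<std::int8_t>& output,
                                std::size_t numClasses,
                                const std::map<int, std::string>& labels,
                                const std::string& blank)
{
    std::string text;
    if (numClasses == 0) { return text; }

    std::string previous;
    const std::size_t numSteps = output.size() / numClasses;
    for (std::size_t step = 0; step < numSteps; ++step)
    {
        // Index of the highest score in this time step (argmax).
        const auto rowBegin = output.begin() + static_cast<std::ptrdiff_t>(step * numClasses);
        const auto rowEnd = rowBegin + static_cast<std::ptrdiff_t>(numClasses);
        const int key = static_cast<int>(std::max_element(rowBegin, rowEnd) - rowBegin);

        // Look the index up in the labels dictionary; treat unknown keys as blank.
        const auto found = labels.find(key);
        const std::string current = (found != labels.end()) ? found->second : blank;

        // Filter: skip the blank label and collapse consecutive repeats.
        if (current != blank && current != previous)
        {
            text += current;
        }
        previous = current;
    }
    return text;
}
```

With labels `{0: "a", 1: "b", 2: "$"}` and `"$"` as the blank, a sequence whose per-step winners are `a, a, $, a, b` decodes to `aab`: the first repeat is collapsed, while the blank separates the two genuine occurrences of `a`.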