# Opus Speech Coding Enhancement

This folder hosts models for enhancing Opus SILK.

## Environment setup
The code is tested with Python 3.11. Conda setup is done via

`conda create -n osce python=3.11`

`conda activate osce`

`python -m pip install -r requirements.txt`
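
If everything installed correctly, a quick import check should succeed (assuming `requirements.txt` pulls in PyTorch, on which the models in this folder are built):

`python -c "import torch; print(torch.__version__)"`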

## Generating training data
The first step is to convert all training items to 16 kHz, 16-bit PCM and concatenate them. A convenient way to do this is to create a file list and then run

`python scripts/concatenator.py filelist 16000 dataset/clean.s16 --db_min -40 --db_max 0`

which additionally applies random scaling to the items, in the range given by `--db_min` and `--db_max` (in dB).
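
A minimal sketch for building the file list (assuming `scripts/concatenator.py` expects one path per line, and that the training items are `.wav` files under an `items/` directory; adjust the pattern to your data):

```sh
# list all training items, one path per line
find items -name "*.wav" > filelist
# create the output folder for the concatenated dataset
mkdir -p dataset
```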

The second step is to run a patched version of opus_demo in the dataset folder, which produces the coded output and the corresponding feature files. To build the patched opus_demo binary, check out the `exp-neural-silk-enhancement` branch and build opus_demo the usual way. Then run

`cd dataset && <path_to_patched_opus_demo>/opus_demo voip 16000 1 9000 -silk_random_switching 249 clean.s16 coded.s16`

The argument to `-silk_random_switching` specifies the number of frames after which the encoder parameters are switched randomly (at the default 20 ms frame duration, 249 frames correspond to roughly 5 seconds).
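
Since the generated `.s16` files are raw (headerless) PCM, tools need the format spelled out to read them. As a sanity check, the duration of the generated data can be inspected with sox (a sketch, assuming mono 16 kHz, 16-bit data as above):

```sh
# raw 16-bit mono PCM at 16 kHz; the stat effect prints the length in seconds
sox -t raw -r 16000 -e signed-integer -b 16 -c 1 dataset/clean.s16 -n stat
```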

## Regression loss based training
Create a default setup for LACE or NoLACE via

`python make_default_setup.py model.yml --model lace/nolace --path2dataset <path2dataset>`

Then run

`python train_model.py model.yml <output folder> --no-redirect`

to run the training script in the foreground, or

`nohup python train_model.py model.yml <output folder> &`

to run it in the background. In the latter case the output is written to `<output folder>/out.txt`.
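
When running in the background, training progress can be followed with, e.g.,

`tail -f <output folder>/out.txt`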

## Adversarial training (NoLACE only)
Create a default setup for NoLACE via

`python make_default_setup.py nolace_adv.yml --model nolace --adversarial --path2dataset <path2dataset>`

Then run

`python adv_train_model.py nolace_adv.yml <output folder> --no-redirect`

to run the training script in the foreground, or

`nohup python adv_train_model.py nolace_adv.yml <output folder> &`

to run it in the background. In the latter case the output is again written to `<output folder>/out.txt`.

## Inference
Generating inference data is analogous to generating training data. Given an item `item1.wav`, run

`mkdir item1.se && sox item1.wav -r 16000 -e signed-integer -b 16 item1.raw && cd item1.se && <path_to_patched_opus_demo>/opus_demo voip 16000 1 <bitrate> ../item1.raw noisy.s16`

The folder `item1.se` then serves as input for the `test_model.py` script or for the `--testdata` argument of `train_model.py` and `adv_train_model.py`, respectively.
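
To prepare several test items at once, the same commands can be wrapped in a loop; a sketch (assuming `.wav` items in the current directory and a bitrate of 9000 bps as in the training setup; substitute the bitrate you want to test):

```sh
for f in *.wav; do
    name="${f%.wav}"
    # resample to 16 kHz, 16-bit raw PCM as expected by opus_demo
    sox "$f" -r 16000 -e signed-integer -b 16 "$name.raw"
    mkdir -p "$name.se"
    # run the patched opus_demo inside the item folder so that
    # noisy.s16 and the feature files land there
    (cd "$name.se" && <path_to_patched_opus_demo>/opus_demo voip 16000 1 9000 "../$name.raw" noisy.s16)
done
```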

Checkpoints of pre-trained models are located here: https://media.xiph.org/lpcnet/models/lace-20231019.tar.gz
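
For example, to download and unpack them:

`wget https://media.xiph.org/lpcnet/models/lace-20231019.tar.gz && tar xzf lace-20231019.tar.gz`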