Quickstart¶

docker run -it -p 8080:8080 iceychris/libreasr:latest

The output looks like this:

make sde &
make sen &
make b
make[1]: Entering directory '/workspace'
python3 -u api-server.py de
make[1]: Entering directory '/workspace'
python3 -u api-bridge.py
make[1]: Entering directory '/workspace'
python3 -u api-server.py en
[api-bridge] running on :8080
[quantization] LM done.
[quantization] LM done.
[LM] loaded.
[LM] loaded.
[quantization] Transducer done.
[Inference] Model and Pipeline set up.
[api-server] gRPC server running on [::]:50052 language de
[quantization] Transducer done.
[Inference] Model and Pipeline set up.
[api-server] gRPC server running on [::]:50051 language en

Head your browser to http://localhost:8080/

Architecture¶

Overview¶

LibreASR is composed of

core and language models
api-server, which is serving models over a gRPC API
api-bridge, which exposes a WebSocket API for clients to use
Client implementations

Features¶

These features are already implemented:

RNN-T network
Fused language models
Dynamic Bucketing DataLoader
Dynamic Quantization
english, german

These are on the Roadmap:

french, spanish, italian, multilingual
Tuned language model fusion

Training¶

This sections contains instructions on how you can train your own models using LibreASR.

Overview¶

RNN-T Model¶

get some audio data with transcriptions (e.g. librispeech, common voice, …)
edit create-asr-dataset.py if you use a custom dataset
process each of your datasets using create-asr-dataset.py, e.g.:

  python3 create-asr-dataset.py /data/common-voice-english common-voice --lang en --workers 4

This results in multiple asr-dataset.csv files, which will be used for training.

edit the configuration testing.yaml to point to your data, choose transforms and tweak other settings
adjust and run libreasr.ipynb to start training
watch the training progress in tensorboard
the model with the best validation loss will get saved to models/model.pth, the model with the best WER ends up in models/best_wer.pth

Language Model¶

See this colab notebook or use this notebook.

Inference¶

Deployment¶

Example Apps¶


React Web App	ESP32-LyraT

Performance¶

| Model | Dataset | Network | Params | CER (dev) | WER (dev) | |———–|———|————|——–|———–|———–| | english | 1400h | 6-2-1024 | 70M | 18.9 | 23.8 | | german | 800h | 6-2-1024 | 70M | 23.2 | 37.6 |

While this is clearly not SotA, training the models for longer and on multiple GPUs (instead of a single 2080 ti) would yield better results.

See releases for pretrained models.

Contributing¶

Feel free to open an issue, create a pull request and join the Discord.

You may also contribute by training a large model for longer.

Quickstart¶

Architecture¶

Overview¶

Features¶

Training¶

Overview¶

RNN-T Model¶

Language Model¶

Inference¶

Deployment¶

Example Apps¶

Performance¶

Contributing¶

References & Credits¶