Captar-Libras

A Unified Multimodal Dataset for Brazilian Sign Language Tasks

Cauã Magalhães¹, Bruno Lages¹, Ari Filho¹, Jéssica Ramos,
Paulo de Souza Coelho, Michel Silva², Thiago L. Gomes²,
Milena Soriano Marcolino², Elidéa Bernardino², Raquel O. Prates¹,
Mario F. M. Campos¹, Erickson R. Nascimento¹

¹ Universidade Federal de Minas Gerais (UFMG)
² Universidade Federal de Viçosa (UFV)

⚠️ Important: Access to this dataset requires credentials. The dataset repositories on Hugging Face are gated: visit each repository, agree to the terms and conditions, and request access. Downloads become available once the request has been authorized.
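
Once access has been granted, a gated repository can typically be fetched with the `huggingface_hub` client. The following is a minimal sketch under stated assumptions, not the official download snippet: the repository ID `your-org/captar-libras-annotations` is a placeholder (the real IDs are listed on the dataset repositories), the helper names are illustrative, and the `HF_TOKEN` environment variable is assumed to hold an access token for an account whose request was approved.

```python
import os


def build_download_args(repo_id, local_dir, patterns=None):
    """Assemble keyword arguments for huggingface_hub.snapshot_download."""
    args = {
        "repo_id": repo_id,                   # placeholder ID, see lead-in
        "repo_type": "dataset",               # dataset repository, not a model
        "local_dir": local_dir,               # where the files are written
        "token": os.environ.get("HF_TOKEN"),  # None falls back to the cached login
    }
    if patterns:
        # Restrict the download to matching files, e.g. ["*.json"].
        args["allow_patterns"] = list(patterns)
    return args


def download(repo_id, local_dir, patterns=None):
    """Download a (possibly gated) dataset repository and return its local path."""
    # Imported lazily so the helper above can be used without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_download_args(repo_id, local_dir, patterns))
```

A call such as `download("your-org/captar-libras-annotations", "./captar_libras/annotations")` would then fetch one repository. If the request has not been authorized yet, the client is expected to raise an access error rather than return partial data.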

Dataset Overview

Captar-Libras is a large-scale multimodal dataset for Brazilian Sign Language (Libras), comprising video sequences of native Deaf signers performing sentences from the medical domain. Data was collected in a controlled capture environment using four synchronized cameras: a frontal RGB-D camera, two wide-angle cameras covering lateral and zenithal views, and a dedicated high-definition facial camera. This multi-view setup supports rich spatial and temporal modeling of body motion, hand articulation, and facial expressions.

Each recording follows a guided protocol to ensure linguistic consistency, and specialist annotators mark the temporal boundaries of each gloss at the frame level. The dataset includes raw multi-view recordings, preprocessed video segments, structured gloss and text annotations, and precomputed full-body pose sequences in SMPL-X format. A signer-independent train/validation/test split, stratified by gender and skin tone, is provided as the recommended evaluation protocol.
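
Signer independence means that no signer contributes sequences to more than one split. The sketch below shows one way such a property could be checked on per-sequence metadata, together with a per-split count of signers for a stratification attribute. The field names (`signer_id`, `split`, `gender`) and the toy records are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import defaultdict


def leaked_signers(records):
    """Return the signer IDs that occur in more than one split."""
    splits_per_signer = defaultdict(set)
    for rec in records:
        splits_per_signer[rec["signer_id"]].add(rec["split"])
    return {s for s, splits in splits_per_signer.items() if len(splits) > 1}


def group_counts(records, key):
    """Count distinct signers per (split, attribute value), e.g. key='gender'."""
    signers = defaultdict(set)
    for rec in records:
        signers[(rec["split"], rec[key])].add(rec["signer_id"])
    return {group: len(ids) for group, ids in signers.items()}


# Toy metadata; field names and values are hypothetical, not the dataset schema.
records = [
    {"signer_id": "S01", "split": "train", "gender": "F"},
    {"signer_id": "S01", "split": "train", "gender": "F"},
    {"signer_id": "S02", "split": "train", "gender": "M"},
    {"signer_id": "S03", "split": "val", "gender": "F"},
    {"signer_id": "S04", "split": "test", "gender": "M"},
]

# A signer-independent split leaves this set empty.
assert not leaked_signers(records)
print(group_counts(records, "gender"))
```

Counting distinct signers rather than sequences keeps the check aligned with the protocol, since stratification is described at the signer level.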

Key Statistics

Video sequences: over 24,000

Signers: 78 (50F / 28M)

Sentences: 375

Camera views: 4

Data Access

Annotations

Structured gloss and text annotations for all sequences in the dataset.
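
Frame-level gloss alignments can be converted to timestamps when they need to be matched against audio-free video players or other time-based tools. The following sketch assumes a hypothetical annotation layout (a gloss label with inclusive start and end frame indices) and an assumed frame rate of 30 fps; the example glosses are made up for illustration, and the actual annotation files may be organized differently.

```python
from dataclasses import dataclass


@dataclass
class GlossSpan:
    gloss: str
    start_frame: int  # first frame of the sign, inclusive
    end_frame: int    # last frame of the sign, inclusive


def span_to_seconds(span, fps):
    """Convert an inclusive frame span to a (start, end) interval in seconds."""
    start = span.start_frame / fps
    # The end frame is inclusive, so the interval extends to the next frame's start.
    end = (span.end_frame + 1) / fps
    return start, end


# Illustrative spans only; not taken from the dataset.
spans = [GlossSpan("DOR", 0, 29), GlossSpan("CABEÇA", 30, 74)]
for span in spans:
    start, end = span_to_seconds(span, fps=30.0)
    print(f"{span.gloss}: {start:.2f}s to {end:.2f}s")
```

Treating the end frame as inclusive makes consecutive spans contiguous in time, which avoids one-frame gaps between adjacent glosses.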

Frontal RGB

High-resolution RGB video streams from the frontal RGB-D camera.

Frontal Depth

Depth sequences from the frontal RGB-D camera for 3D reconstruction.

Lateral View

Video from the wide-angle camera covering the lateral perspective (side view).

Zenithal View

Video from the wide-angle camera covering the zenithal perspective (top-down view).

Facial HD

Video from the dedicated high-definition facial camera, intended for fine-grained facial expression analysis.

SMPL-X Poses

Precomputed full-body 3D pose sequences in SMPL-X format for all sequences.

Preprocessed Videos

Preprocessed video segments used in the baseline experiments.

Citation

If you use Captar-Libras in your research, please cite it using the following BibTeX entry:

@inproceedings{magalhaes2026captarlibras,
  title={Captar-Libras: A Unified Multimodal Dataset for Brazilian Sign Language Tasks},
  author={Cauã Magalhães and Bruno Lages and Ari Filho and Jéssica Ramos and Paulo de Souza Coelho and Michel Silva and Thiago L. Gomes and Milena Soriano Marcolino and Elidéa Bernardino and Raquel O. Prates and Mario F. M. Campos and Erickson R. Nascimento},
  year={2026},
  booktitle={The Fortieth Annual Conference on Neural Information Processing Systems Evaluations and Datasets Track}
}