
CognitiveLab has launched NetraEmbed, a multimodal, multilingual document retrieval model that supports 22 languages and, according to the company, delivers about a 150% improvement over existing baselines.
The announcement was made on December 8, alongside the release of NayanaIR, an open-source multilingual benchmark, and a preprint of the supporting research paper, titled “M3DR: Towards Universal Multilingual Multimodal Document Retrieval”.
Adithya S Kolkavi, founder of CognitiveLab, said on X, “We are a Small Research Lab based out of India and we just dropped a one of a kind SoTA multimodal multilingual document retrieval model.”
In a blog post, the company said NetraEmbed scores 0.716 on cross-lingual retrieval tasks, up from the previous best of 0.284, and 0.738 on monolingual search. The model processes documents as images rather than relying on OCR, which helps it preserve charts, tables, diagrams, and layout.
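The roughly 150% figure can be checked against the two cross-lingual scores quoted above. The snippet below is a minimal sketch of that arithmetic; the function name `relative_improvement` is illustrative and does not come from CognitiveLab's materials.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the gain of `new` over `baseline` as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Scores reported in the article for cross-lingual retrieval.
cross_lingual_gain = relative_improvement(0.716, 0.284)
print(f"{cross_lingual_gain:.1f}%")  # prints "152.1%"
```

The result, about 152%, is in line with the "about 150%" improvement the company cites.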
It supports 22 languages covering English, Spanish, French, German, Italian, Hindi, Marathi, Sanskrit, Kannada, Telugu, Tamil, Malayalam, Chinese, Japanese, Korean, Arabic, Bengali, Gujarati, Odia, Punjabi, Russian, and Thai.
CognitiveLab said the model takes cross-lingual document search from barely functional to production-ready.
CognitiveLab also introduced ColNetraEmbed, a multi-vector variant that offers token-level explanations. NetraEmbed itself uses compact embeddings of about 10 KB per document, compared with about 2.5 MB in traditional systems, enabling large-scale indexing for enterprises.
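To put the per-document figures in perspective, the sketch below scales them to a whole index. It assumes decimal units (1 MB = 1,000 KB) and a hypothetical corpus of one million documents; neither assumption comes from the announcement.

```python
# Decimal storage units (an assumption; the article does not specify).
KB = 1_000
MB = 1_000 * KB
GB = 1_000 * MB

# Per-document embedding sizes as quoted in the article.
compact_bytes = 10 * KB         # NetraEmbed, about 10 KB
traditional_bytes = 2_500 * KB  # "traditional systems", about 2.5 MB


def index_size_gb(bytes_per_doc: int, num_docs: int) -> float:
    """Raw embedding storage for `num_docs` documents, in gigabytes."""
    return bytes_per_doc * num_docs / GB


num_docs = 1_000_000  # hypothetical corpus size, for illustration only
ratio = traditional_bytes / compact_bytes

print(f"size ratio: {ratio:.0f}x")
print(f"compact index: {index_size_gb(compact_bytes, num_docs):.0f} GB")
print(f"traditional index: {index_size_gb(traditional_bytes, num_docs):.0f} GB")
```

Under these assumptions the compact embeddings are about 250 times smaller, which works out to roughly 10 GB rather than 2,500 GB for a million documents, before any index overhead or compression.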
The model offers flexible embedding sizes at 768, 1536, and 2560 dimensions without retraining.
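The article does not say how the smaller sizes are obtained. One common way to offer several sizes without retraining is to keep a prefix of the full vector and renormalise it, and the sketch below assumes that approach. The helper names and the random stand-in vector are hypothetical and are not part of any released NetraEmbed API.

```python
import math
import random

# Embedding sizes listed in the article.
SUPPORTED_DIMS = (768, 1536, 2560)


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` values and rescale them to unit length.

    Assumes smaller embeddings are prefixes of the full vector,
    which the announcement does not confirm.
    """
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"dim must be one of {SUPPORTED_DIMS}")
    if dim > len(vec):
        raise ValueError("dim exceeds the length of the input vector")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in head]


def float32_bytes(dim: int) -> int:
    """Storage for one `dim`-dimensional vector at 4 bytes per value."""
    return dim * 4


# Random stand-in for a full 2560-dimensional document embedding.
random.seed(0)
full = [random.gauss(0.0, 1.0) for _ in range(2560)]

small = truncate_embedding(full, 768)
print(len(small), float32_bytes(768), float32_bytes(2560))
```

A 2560-dimensional vector stored as 32-bit floats takes 10,240 bytes, which is consistent with the roughly 10 KB per document quoted earlier, although the article does not state the storage precision.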
The NayanaIR benchmark covers 23 datasets with nearly 28,000 document images and more than 5,400 queries, and is designed for both monolingual and cross-lingual evaluation.
The launch is part of CognitiveLab’s Nayana initiative focused on multilingual and multimodal document intelligence. Future models under the initiative will move beyond retrieval into deeper understanding and question answering across languages.
Both models are available on Hugging Face.
The post CognitiveLab Unveils NetraEmbed With 22 Languages & 150% Jump in Document Accuracy appeared first on Analytics India Magazine.