Industry Work Experience
Zoho Corporation, Chennai – AI Engineer, LLM ZLabs R&D
Jan 2024 - Present
- Researched data tools for LLMs and built a Streamlit app on HuggingFace APIs to load, tokenize, and filter data, ingest metadata, generate recipes, and run regex-based keyword searches. Generated hundreds of billions of tokens of Indic data by translating open-source English datasets with an internal CPU-based translation model, using multi-threading to speed up processing. Researched data compositions for open-source LLMs and curated high-quality fine-tuning datasets.
- Keywords: HuggingFace APIs, Data Engineering, LLM
- Manager: Prathima MR, AI Engineer (Team Lead), LLM ZLabs R&D
Zoho Corporation, Chennai – AI Engineer, ASR ZLabs R&D
July 2021 - Dec 2023
- Integrated an efficient rule-based inverse text normalization (ITN) system into the ASR post-processing pipeline using NVIDIA's NeMo Text Processing. Customized grammars for specific use cases and added 4 new grammar rules, improving the readability of ASR output. Compiled the grammar rules into a FAR file, reducing processing time by 140x and memory usage to 20% of its previous level. Led a 4-member annotation team for manual ITN data collection.
- Evaluated F1 scores of internal punctuation and capitalization models and contributed to developing a transformer encoder-based neural ITN system to improve those scores. Conducted experiments to fine-tune an internal ASR model. Preprocessed datasets for the ASR model, increasing overall training data by 16%.
- Developed a Flask-based ASR inference demo application, later deployed as a Linux service for continuous operation, enabling seamless internal testing. Enhanced the ASR system by implementing an audio I/O module and deploying a FastAPI-based web server secured with ZWAF middleware.
- Developed tools for efficient internal API communication and robust dataset and model management, widely adopted across teams: built a wrapper for Stratus (internal storage system) and integrated it with dvc-stratus (DVC wrapper); implemented OAuth credentials and resource policies to track user operations, improving cross-team usage; maintained ZWAF for OneAuth token validation in cross-team Zoho API communication.
- Keywords: ITN, Pynini, PyTorch, RESTful APIs, DVC, Flask
- Manager: Ananda Seelan Lakshmi Narasimhan, Senior Deep Learning Scientist, NVIDIA | Formerly - ML Engineer (Team Lead), ASR ZLabs R&D
Zoho Corporation, Chennai – Project Trainee (Intern), ASR ZLabs R&D
Jan 2021 - June 2021
- Collected and preprocessed datasets for the ASR model and assisted peer ML Engineers with their data requirements, increasing the existing benchmark data suite by 83% (~400 hours of audio). Processed open-source datasets, using youtube-dl for YouTube data and Google ASR for synthetic transcriptions, and developed a Streamlit tool for audio recording. Organized team sessions to create a limited benchmark with real-time recordings. Wrote PyTorch IterableDataset classes for each dataset and unit tests for them using pytest.
- Keywords: Python, PyTorch Dataset, Pytest, Streamlit
- Manager: Ananda Seelan Lakshmi Narasimhan, Senior Deep Learning Scientist, NVIDIA | Formerly - ML Engineer (Team Lead), ASR ZLabs R&D
- Mentor: Raman Rajarathinam, ML Engineer, ASR ZLabs R&D
ONGC, Chennai – Intern
May 2019
- Developed an issue tracking system on the Oil and Natural Gas Corporation Limited (ONGC) intranet that lets Regional Computer Center (RCC) users create issues, retrieve them by ID, and view their resolution status. Users can accept or reject solutions provided by ONGC employees to indicate whether an issue can be closed as resolved or needs further assistance. Gained experience working with a team of software engineers to build a real-world product.
- Keywords: Python, PyTorch Dataset, Pytest, Streamlit
- Advisor: Shri B. Ravindranath, Chief Manager of Programming Department, ONGC Ltd., Chennai, India
- Mentor: Shri Pruthvee Mamidikuduru, Deputy Manager of Programming Department, ONGC Ltd., Chennai, India