Naman Goyal
Machine Learning - SWE Google DeepMind
SF Bay Area, CA
Exploring the world, one step at a time.
I am a Machine Learning Software Engineer at Google DeepMind, where I work on the Gemini team responsible for making Gemini output more useful and human-centric. My role involves advancing multimodal large language models through applied research and development, focusing on enhancing reasoning, planning, and instruction-following capabilities. I work on LLM-based synthetic data generation to address data scarcity challenges, instruction tuning, reinforcement learning from human feedback (RLHF), and building LLM orchestration over tools and knowledge bases.
Previously, I worked at NVIDIA on the autonomous vehicle team, developing the perception stack at scale. I designed horizontally scalable pipelines for data preparation and cloud inference, improving DNN training efficiency and resource utilization. Before that, I interned at Apple on multimodal learning for Visually Rich Document Understanding, and at Adobe Research developing adversarially robust training strategies for deep metric learning.
I hold an M.S. in Computer Science from Columbia University (2021-2022) and a B.Tech. in Computer Science from IIT Ropar (2015-2019). In 2025 I was a sponsored speaker at several AI conferences, including the AI Conference San Francisco, AI Risk Summit, and AI Dev Summit, and at Adobe Research World Headquarters, on topics ranging from enterprise AI agents to multimodal AI challenges.
talks
| Date | Venue and talk |
|---|---|
| Sep 2025 | Adobe Research World Headquarters, San Jose — Architectures for the Next Generation of Enterprise AI Agents |
| Sep 2025 | The AI Conference, San Francisco — The Ascendancy and Challenges of Agentic Large Language Models |
| Aug 2025 | AI Risk Summit, CISO Forum, Half Moon Bay — The Ascendancy and Challenges of Agentic Large Language Models |
| May 2025 | AI Dev Summit, San Francisco — The Dual Edge of Multimodal AI: Advancing Accessibility While Navigating Bias |
papers
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities. arXiv preprint arXiv:2507.06261, 2025
- A survey on Self Supervised learning approaches for improving Multimodal representation learning. arXiv preprint arXiv:2210.11024, 2022
- Graph neural networks for image classification and reinforcement learning using graph representations. arXiv preprint arXiv:2203.03457, 2022
- A comprehensive study of on-device NLP applications–VQA, automated Form filling, Smart Replies for Linguistic Codeswitching. arXiv preprint arXiv:2409.19010, 2024