Jovana Kondic

I'm a PhD candidate at MIT CSAIL, where I am advised by Dr. Aude Oliva. I've also been fortunate to spend time at the MIT-IBM Watson AI Lab, working with Dr. Rogerio Feris.

Previously, I completed my SM at MIT and my BSE at Princeton.

LinkedIn / Scholar / CV / Email

News

Our work on ChartNet is featured in MIT News!
Honored to be featured in MIT News alongside the amazing fellow IBM interns I had the chance to work with this summer!
Super excited to be hosting The First Workshop on Memory and Vision at ICCV 2025! See you in Honolulu!

Research

Currently, my research focuses on vision-language models, synthetic data generation, and chart understanding. I also have experience working on AI agents, human-AI interaction, and motion planning and inference.

ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
Jovana Kondic, P. Li, D. Joshi, I. Sanchez, B. Wiesel, S. Abedin, A. Alfassy, E. Schwartz, D. Caraballo, Y. G. Cinar, F. Scheidegger, S. I. Ross, D. K. I. Weidele, H. Hua, E. Arutyunova, R. Herzig, Z. He, Z. Wang, X. Yu, Y. Zhao, S. Jiang, M. Liu, Q. Lin, P. Staar, L. Lastras, A. Oliva, R. Feris
CVPR, 2026
paper / blog / video
We contribute the largest and most comprehensive chart understanding dataset to date. Using a code-guided synthesis pipeline, we generate 1.5M chart samples, each with five aligned modalities: plotting code, rendered image, data table, natural language summary, and QA pairs with reasoning. A rigorous quality-filtering pipeline ensures that diversity does not come at the expense of visual fidelity. Fine-tuning on ChartNet consistently improves results across benchmarks and enables small open-source models to outperform GPT-4o and similar models.

Granite Vision: A Lightweight, Open-Source Multimodal Model for Enterprise Intelligence
Granite Vision Team (including Jovana Kondic as Core Contributor)
arXiv, 2025
paper / blog
We release a lightweight, open-source multimodal model built around a 2B-parameter language model, designed for visual document understanding in enterprise settings. Granite Vision achieves strong performance on document extraction from tables, charts, diagrams, and infographics while remaining computationally efficient.

ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation
Jovana Kondic, P. Li, D. Joshi, Z. He, S. Abedin, J. Sun, B. Wiesel, E. Schwartz, A. Nassar, B. Wu, A. Arbelle, A. Oliva, D. Gutfreund, L. Karlinsky, R. Feris
ICCV Workshop on Curated Data for Efficient Learning, 2025
paper
We present a fully automated synthesis pipeline that uses a VLM to reconstruct seed chart images into executable plotting code, then iteratively augments scripts with a code-oriented LLM to generate diverse synthetic chart image-code pairs at scale. From just 13K seed images, ChartGen produces 222.5K unique chart image-code pairs across 24 chart types and 11 plotting libraries. We release the pipeline, prompts, and dataset to advance chart understanding research.

Lyfe Agents: Generative Agents for Low-Cost Real-Time Social Interactions
K. I. Zhao, M. Naim, Jovana Kondic, M. Ernesto Cortes, J. Ge, S. Luo, G. R. Yang, A. Ahn
arXiv, 2024
paper
We introduce Lyfe Agents, LLM-powered autonomous agents for real-time social interaction in virtual environments. Using an option-action framework, asynchronous self-monitoring, and a prioritized memory system, Lyfe Agents operate at 10–100x lower computational cost than existing alternatives while maintaining sophisticated social behaviors like collaborative problem-solving.

Empowering Biomedical Discovery with AI Agents
S. Gao, A. Fang, Y. Huang, V. Giunchiglia, A. Noori, J. R. Schwarz, Y. Ektefaie, Jovana Kondic, M. Zitnik
Cell, 2024
paper
We envision collaborative AI agents that integrate LLMs, generative models, and biomedical tools to empower scientific research, from virtual cell simulation and phenotype control to the design of new therapies. Rather than replacing humans, these agents combine human creativity with AI's ability to navigate vast hypothesis spaces, plan discovery workflows, and perform self-assessment to identify knowledge gaps.

Bayesian Inverse Motion Planning for Online Goal Inference in Continuous Domains
T. Zhi-Xuan, Jovana Kondic, S. Slocum, J. B. Tenenbaum, V. K. Mansinghka, D. Hadfield-Menell
ICRA Workshop on Cognitive Modeling in Robot Learning for Adaptive Human-Robot Interactions, 2023
paper
We perform online goal inference and trajectory prediction in continuous domains by modeling agents as approximately Boltzmann-rational motion planners that produce low-cost trajectories while avoiding obstacles. Using a sequential Monte Carlo algorithm, we approximate the full posterior distribution over goals and future trajectories from partial, noisy observations in real time.

On the Critical Role of Conventions in Adaptive Human-AI Collaboration
A. Shih, A. Sawhney, Jovana Kondic, S. Ermon, D. Sadigh
ICLR, 2021
paper / blog
We show that separating task-specific rules from partner-specific conventions is critical for adaptive human-AI collaboration. Our representation-learning framework disentangles these two types of knowledge, enabling zero-shot coordination on new tasks with familiar partners and rapid adaptation to new teammates, with up to 50% performance gains in coordination games including Hanabi.

Teaching Experience

MIT 9.58: The Science of Intelligence (with Prof. Tomaso Poggio & Dr. Brian Cheung), Fall 2024

Awards

MIT-IBM Watson AI Lab Graduate Research Assistantship, 2025
Hewlett Packard Fellowship, MIT, 2021
Sigma Xi Honor Society, Princeton, 2021
Stanford Summer Undergraduate Research Fellowship, 2020
Google Science Fair Regional Finalist, 2014

Design based on Jon Barron's website.