Research
Currently, my research focuses on vision-language models, synthetic data generation, and chart understanding. I also have experience working on AI agents, human-AI interaction, and motion planning and inference.
ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
Jovana Kondic, P. Li, D. Joshi, I. Sanchez, B. Wiesel, S. Abedin, A. Alfassy, E. Schwartz, D. Caraballo, Y. G. Cinar, F. Scheidegger, S. I. Ross, D. K. I. Weidele, H. Hua, E. Arutyunova, R. Herzig, Z. He, Z. Wang, X. Yu, Y. Zhao, S. Jiang, M. Liu, Q. Lin, P. Staar, L. Lastras, A. Oliva, R. Feris
CVPR, 2026
paper / blog
We contribute the largest and most comprehensive chart understanding dataset to date. Using a code-guided synthesis pipeline, we generate 1.5M chart samples, each with five aligned modalities: plotting code, rendered image, data table, natural language summary, and QA pairs with reasoning. A rigorous quality-filtering pipeline ensures that diversity does not come at the expense of visual fidelity. Fine-tuning on ChartNet consistently improves results across benchmarks and enables small open-source models to outperform GPT-4o and similar models.
Granite Vision: A Lightweight, Open-Source Multimodal Model for Enterprise Intelligence
Granite Vision Team (including Jovana Kondic as Core Contributor)
arXiv, 2025
paper / blog
We release a lightweight, open-source multimodal model built around a 2B-parameter language model, designed for visual document understanding in enterprise settings. Granite Vision achieves strong performance on document extraction from tables, charts, diagrams, and infographics while remaining computationally efficient.
ChartGen: Scaling Chart Understanding Via Code-Guided Synthetic Chart Generation
Jovana Kondic, P. Li, D. Joshi, Z. He, S. Abedin, J. Sun, B. Wiesel, E. Schwartz, A. Nassar, B. Wu, A. Arbelle, A. Oliva, D. Gutfreund, L. Karlinsky, R. Feris
ICCV Workshop on Curated Data for Efficient Learning, 2025
paper
We present a fully automated synthesis pipeline that uses a VLM to reconstruct seed chart images into executable plotting code, then iteratively augments scripts with a code-oriented LLM to generate diverse synthetic chart-image-code pairs at scale. From just 13K seed images, ChartGen produces 222.5K unique chart image-code pairs across 24 chart types and 11 plotting libraries. We release the pipeline, prompts, and dataset to advance chart understanding research.
Lyfe Agents: Generative Agents for Low-Cost Real-Time Social Interactions
K. I. Zhao, M. Naim, Jovana Kondic, M. Ernesto Cortes, J. Ge, S. Luo, G. R. Yang, A. Ahn
arXiv, 2024
paper
We introduce Lyfe Agents, LLM-powered autonomous agents for real-time social interaction in virtual environments. Using an option-action framework, asynchronous self-monitoring, and a prioritized memory system, Lyfe Agents operate at 10–100x lower computational cost than existing alternatives while maintaining sophisticated social behaviors like collaborative problem-solving.
Empowering Biomedical Discovery with AI Agents
S. Gao, A. Fang, Y. Huang, V. Giunchiglia, A. Noori, J. R. Schwarz, Y. Ektefaie, Jovana Kondic, M. Zitnik
Cell, 2024
paper
We envision collaborative AI agents that integrate LLMs, generative models, and biomedical tools to empower scientific research — from virtual cell simulation and phenotype control to the design of new therapies. Rather than replacing humans, these agents combine human creativity with AI's ability to navigate vast hypothesis spaces, plan discovery workflows, and perform self-assessment to identify knowledge gaps.
Bayesian Inverse Motion Planning for Online Goal Inference in Continuous Domains
T. Zhi-Xuan, Jovana Kondic, S. Slocum, J. B. Tenenbaum, V. K. Mansinghka, D. Hadfield-Menell
ICRA Workshop on Cognitive Modeling in Robot Learning for Adaptive Human-Robot Interactions, 2023
paper
We perform online goal inference and trajectory prediction in continuous domains by modeling agents as approximately Boltzmann-rational motion planners that produce low-cost trajectories while avoiding obstacles. Using a sequential Monte Carlo algorithm, we approximate the full posterior distribution over goals and future trajectories from partial, noisy observations in real time.
On the Critical Role of Conventions in Adaptive Human-AI Collaboration
A. Shih, A. Sawhney, Jovana Kondic, S. Ermon, D. Sadigh
ICLR, 2021
paper / blog
We show that separating task-specific rules from partner-specific conventions is critical for adaptive human-AI collaboration. Our representation-learning framework disentangles these two types of knowledge, enabling zero-shot coordination on new tasks with familiar partners and rapid adaptation to new teammates, with up to 50% performance gains in coordination games including Hanabi.
Awards
MIT-IBM Watson AI Lab Graduate Research Assistantship, 2025
Hewlett Packard Fellowship, MIT, 2021
Sigma Xi Honor Society, Princeton, 2021
Stanford Summer Undergraduate Research Fellowship, 2020
Google Science Fair Regional Finalist, 2014