Please note: This seminar will take place in DC 1304.
Jianbo Shi, Professor
Computer and Information Science, University of Pennsylvania
Today’s AI vision systems are like voracious readers, memorizing an astonishing range of information and patterns. Yet behind this impressive ability, the inner workings of AI technology often remain mysterious. Most existing approaches to improving AI models rely on clever computational techniques or trial-and-error observations, which offer limited scientific understanding.
We propose a more scientific approach to understanding knowledge representation and reasoning within AI vision systems by probing how these models encode visual information. Using fMRI brain-encoding data together with a prediction model, we analyze how deep neural networks process information, taking human brain regions as a guiding reference. Our results show that training methods tend to lead to one of two outcomes: models either learn hierarchically, building from simple concepts to more complex ones in a way similar to the human brain (which we call looking), or they rely on brute-force memorization (checking). This distinction helps explain why, depending on the model, fine-tuning techniques can be less effective than previously assumed, particularly when hierarchical reasoning is missing.
Reasoning and creativity are closely linked, since creativity provides a practical test of whether AI vision systems are checking or looking. To produce something creative, a model must look for meaningful and interesting patterns rather than simply check for common themes. Using the playful Totally-Look-Alike dataset, we measure creativity in latent space and further quantify it by comparing AI models such as ChatGPT and Gemini head-to-head with our own model.
Bio: Jianbo Shi is a Professor of Computer and Information Science at the University of Pennsylvania. He studied Computer Science and Mathematics as an undergraduate at Cornell University, where he received his B.A. He received his Ph.D. in Computer Science from the University of California at Berkeley. He was a research faculty member at The Robotics Institute at Carnegie Mellon University before joining the faculty of the University of Pennsylvania.
He has made fundamental contributions to computer vision and machine learning in image segmentation, motion tracking, and data clustering. He was awarded the IEEE Longuet-Higgins Prize for ‘Fundamental contributions in Computer Vision.’ His long-term interests center on the broader area of machine intelligence: he wishes to develop a “visual thinking” module that allows computers not only to understand their surrounding environment, but also to achieve cognitive abilities such as machine memory and learning.