James J. DiCarlo, M.D., Ph.D.
Peter de Florez Professor of Neuroscience, Head, Department of Brain and Cognitive Sciences, Investigator, McGovern Institute for Brain Research, MIT.
The Science of Natural Intelligence (NI): Reverse Engineering Primate Visual Perception
Abstract: The fields of neuroscience and cognitive science are hard at work on one of our last great scientific quests: to reverse engineer the human mind. Compared with other areas of science, these sciences are still in their infancy. Not surprisingly, forward engineering approaches that aim to emulate human intelligence in artificial systems (AI) are also still in their infancy. Yet the intelligence and cognitive flexibility apparent in human behavior are an existence proof that machines can be constructed to emulate and work alongside the human mind. In this talk, I will argue that these challenges of reverse engineering the mind will be solved by tightly combining the efforts of brain and cognitive scientists (hypothesis generation and data acquisition) with forward engineering efforts that aim to emulate the mind (hypothesis instantiation and data prediction). To support that thesis, I will focus on one aspect of perceptual intelligence, object categorization and detection, and I will tell the story of how work in brain science, cognitive science, and computer science converged to create deep neural networks that can support such tasks. These networks not only reach human performance for many images, but their internal workings are also modeled after, and largely emulate, the internal workings of the primate visual system. Yet the primate visual system (NI) still outperforms current-generation artificial deep neural networks (AI), and I will show some new clues that neuroscience can offer. More broadly, this is just the beginning of the last great human science quest, to understand natural intelligence, and I hope to motivate others to engage with that frontier alongside us.
BIO: James DiCarlo is a Professor of Neuroscience and Head of the Department of Brain and Cognitive Sciences at the Massachusetts Institute of Technology. He is an Alfred Sloan Fellow, a Pew Scholar in the Biomedical Sciences, and a McKnight Scholar in Neuroscience. His research goal is a computational understanding of the brain mechanisms that underlie primate visual intelligence. Over the last 15 years, his group has helped reveal how population image transformations carried out by a deep stack of neocortical processing stages, called the primate ventral visual stream, are effortlessly able to extract object identity and other latent variables such as object position, scale, and pose from visual images. His group is currently using a combination of large-scale neurophysiology, brain imaging, direct neural perturbation methods, and machine learning methods to build neurally mechanistic computational models of the ventral visual stream and its support of cognition and behavior. They aim to use this model-based understanding to inspire and develop new machine vision approaches, new neural prosthetics (brain-machine interfaces) to restore or augment lost senses, and a new foundation to attack human conditions such as agnosia, dyslexia, and autism.
Harry Shum, Ph.D.
Executive Vice President, Artificial Intelligence and Research Group, Microsoft.
Commercializing Computer Vision: Success Stories and Lessons Learned
Abstract: It is an exciting time for all of us computer vision researchers and practitioners. We have seen unprecedented growth in the conversion of years of progress into marketable technologies. Microsoft has long been committed to developing new computer vision technologies, making them available to developers, and incorporating them into many products. In this talk, I will first briefly review 25 years of computer vision research at Microsoft Research (MSR), highlighting MSR's contributions to the vision community and emphasizing the importance of long-term commitment to funding successful industrial research labs. I will also describe some of our latest research work in computational photography, image understanding, and vision and language before detailing our commercialization successes. In particular, I will share our experiences in developing three products, Microsoft Pix, HoloLens, and Cognitive Services, which leverage computer vision systems and technologies in different ways. Pix is an AI-powered camera app that makes taking great pictures easy and fun: "point, shoot, perfect!" It has incorporated technologies from more than a dozen CVPR, ICCV, and SIGGRAPH papers from MSR. HoloLens is the first commercially available mixed reality system on the market. Cognitive Services allow you to build useful AI-based apps using just a few lines of code, across different devices and platforms. I will show IRIS, an interactive visual learning service for developers to create image recognition applications. I will also show the latest demos using HoloLens, including the Holoportation project. Holoportation is a new type of 3D capture technology that allows high-quality 3D models of people to be reconstructed, compressed, and transmitted anywhere in real time. There are challenges in accelerating the cycle from research to product, and I will discuss the lessons learned in productizing Pix, HoloLens, and Cognitive Services.
BIO: Harry Shum is executive vice president of Microsoft’s Artificial Intelligence (AI) and Research group. He is responsible for driving the company’s overall AI strategy and forward-looking research and development efforts spanning infrastructure, services, apps, and agents. He oversees AI-focused product groups (the Information Platform Group and the Bing and Cortana product groups) as well as the Ambient Computing and Robotics teams. He also leads Microsoft Research, one of the world’s premier computer science research organizations, and its integration with the engineering teams across the company. Previously, Dr. Shum served as the corporate vice president responsible for Bing search product development from 2007 to 2013. Prior to his engineering leadership role at Bing and online services, he oversaw the research activities at Microsoft Research Asia and the lab’s collaborations with universities in the Asia Pacific region, and was responsible for the Internet Services Research Center, an applied research organization dedicated to advanced technology investment in search and advertising at Microsoft. Dr. Shum joined Microsoft Research in 1996 as a researcher based in Redmond, Washington. In 1998 he moved to Beijing as one of the founding members of Microsoft Research China (later renamed Microsoft Research Asia). There he began a nine-year tenure as a researcher, subsequently moving on to become research manager, assistant managing director and managing director of Microsoft Research Asia, and a Distinguished Engineer. Dr. Shum is an IEEE Fellow and an ACM Fellow for his contributions to computer vision and computer graphics. He received his Ph.D. in robotics from the School of Computer Science at Carnegie Mellon University. In 2017, he was elected to the National Academy of Engineering of the United States.
Dan Jurafsky, Ph.D.
Professor and Chair of Linguistics, Professor of Computer Science, Stanford University.
Extracting Social Meaning from Language
Abstract: I describe research in our lab on computationally extracting social meaning from language, meaning that takes into account social relationships between people. I describe our study of interactions between police and community members in traffic stops recorded in body-worn camera footage. We automatically measure the quality of the interaction from language, study the role of race in the interaction, and draw suggestions for going forward in this fraught area. In another study, we computationally model the language of scientific papers together with the network formed by scientists and their research areas to better understand scientific innovation, how it progresses, and the role of interdisciplinarity. I discuss implications for the history of science and specifically of artificial intelligence. Both studies highlight the importance of social context and social models for interpreting the latent meanings behind the words we use.
BIO: Dan Jurafsky is Professor and Chair of Linguistics and Professor of Computer Science at Stanford University. His research has focused on the extraction of meaning, intention, and affect from text and speech, on the processing of Chinese, and on applying natural language processing to the cognitive and social sciences. Dan is very interested in NLP education, and he co-wrote the widely used textbook “Speech and Language Processing” (whose 3rd edition is in (slow) progress) and co-taught the first massive open online class on natural language processing. The recipient of a 2002 MacArthur Fellowship, Dan is also a 2015 James Beard Award Nominee for his book “The Language of Food: A Linguist Reads the Menu”.