
The sculpture is waiting to speak. She sits there expectantly, shimmering and slightly pixelated at the edges, wearing headphones and a blue plaid shirt. Though she is actually in another room some 10 yards away, she appears quite … deep on the large plastic screen. When she turns her head, or when viewed from another angle, the space between her head and the bookshelves behind her has a decidedly three-dimensional quality. Until one’s polarized 3-D glasses are removed, anyway.
She is not really a sculpture, of course. She is Jane Mulligan, a postdoctoral researcher in Penn’s General Robotics, Automation, Sensing, and Perception (GRASP) Laboratory, and she is making herself available, in a virtual sort of way, to demonstrate the process known as tele-immersion. A semicircular battery of seven digital video cameras creates a “sculpture” of her head and upper body, but the sculpture moves in a herky-jerky way. When someone mentions that the whole setup feels like a conjugal visit to a prison inmate, the sound of her laughter—which comes through a speakerphone—can be heard a second or two before her face registers a reaction.
“This really feels like a window to the world,” says Dr. Konstantinos Daniilidis, assistant professor in the GRASP Laboratory, touching the large plastic screen on which the image of Mulligan appears. “The metaphor is that you open a hole in the wall and see the other room. Right now, you get the approximate effect. The real effect we want to achieve is to remove the plastic here and see what is behind.” It provides “the freedom to look at an object from any possible viewpoint,” he adds in his quiet, Greek-accented voice.
Daniilidis is Penn’s group leader on the National Tele-Immersion Initiative, a collaborative effort with the University of North Carolina at Chapel Hill (UNC) and Brown University, as well as Advanced Network and Services, Inc., a computer-network company based in Armonk, N.Y. Brown is responsible for the graphics and the interaction with them; the display itself is UNC’s; and Penn is capturing the environment in 3-D. “I believe that we have the most important role,” he says of the Penn team. “Probably my collaborators believe the same.” The GRASP lab is part of the Department of Computer and Information Science in the School of Engineering and Applied Science, and is headed by Dr. Ruzena Bajcsy, the professor of computer and information science who also serves as head of the National Science Foundation’s computer-science division [“The Vision Thing,” July/August 1999].
While tele-immersion is in some ways similar to videoconferencing, the extra dimension it provides allows participants—who could be thousands of miles away—to feel as if they’re in the same room at the same time. (Or at least they will when the pixelation, lag-time herky-jerkiness, and other bugs are worked out.) The potential benefits are intriguing. A neurosurgeon in Philadelphia could be “present” in an operating room in Australia. A salesman in Des Moines could show off a new gadget to a potential buyer in London. Actors in Los Angeles and New York could rehearse “together” before opening night.
According to the National Tele-Immersion Initiative’s Web site, tele-immersion represents a “new paradigm for human-computer interaction.” It is also the “ultimate synthesis of networking and media technologies”—combining 3-D environment scanning, projective and display technologies, tracking technologies, audio technologies, robotics, and haptics, which might be described as “virtual touching.”
On screen, the images are split and polarized to give each eye a slightly different view, which the brain combines for a 3-D effect.
“Computer vision is a field that tries to extract 3-D information about the world,” says Daniilidis. “It is a very exciting field, and actually very related to research in visual perception in psychology and biology. They try to understand how the vision system and the human mind work. And we try to make a machine see in 3-D and recognize things.
“We have much in common with the study of human vision,” he adds. “Many of the findings of human-vision scientists influence computer vision, and many math models from computer vision help human-vision scientists understand how the human-vision system works.” To capture a truly realistic virtual sculpture will probably require 50-60 cameras, and the challenge of synchronizing them will be considerable. But by the end of 2002 he expects to be able to provide the 3-D effect without using the polarized glasses.
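The 3-D capture at the heart of that work rests on simple triangulation: a point seen by two cameras a known distance apart appears shifted between the two images, and the size of the shift reveals the point’s depth. What follows is a minimal sketch of that arithmetic in Python, assuming an idealized parallel two-camera rig; the focal length, baseline, and disparity figures are illustrative, not the GRASP lab’s actual calibration.

    # Minimal sketch of stereo triangulation -- the arithmetic by which
    # two (or more) synchronized cameras recover depth. Assumes an
    # idealized parallel-camera rig; all numbers are illustrative.

    def depth_from_disparity(focal_px, baseline_m, disparity_px):
        """Depth in meters of a point seen by two parallel cameras.

        focal_px     -- focal length, in pixels
        baseline_m   -- distance between the camera centers, in meters
        disparity_px -- horizontal shift of the point between the images
        """
        if disparity_px <= 0:
            raise ValueError("point must appear shifted between the views")
        return focal_px * baseline_m / disparity_px

    # A point shifted 20 pixels between cameras 15 cm apart, seen with
    # an 800-pixel focal length, sits about 6 meters away.
    print(depth_from_disparity(800.0, 0.15, 20.0))  # -> 6.0

Mulligan’s seven-camera semicircle, and the 50-60 cameras Daniilidis anticipates, in effect repeat this calculation for every pixel across many pairs of views, which is why synchronizing the cameras is such a challenge.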
The first demonstration of tele-immersion took place in May 2000.
The tele-immersion project, notes the Web site, represents “the greatest technical challenge for Internet2.” Tele-immersion is a “voracious consumer of computer resources,” notes Dr. Jaron Lanier, the initiative’s lead scientist, in an article for Scientific American, which is one reason the team has opted to use commercially available digital-video cameras and other computer components. (Total cost for all the equipment today is about $45,000.) The GRASP laboratory has to let the University know whenever a long-distance tele-immersion demonstration is scheduled, since a single session boosts Web traffic at the participating institutions to four times the normal volume—hence the need for Internet2, with its massive bandwidth. Even so, it takes about 30-50 milliseconds for fiber-borne bits of information to cross the continental U.S., which helps explain why all the test sites are on the East Coast. High-quality tele-immersion will require about 1.2 gigabits per second—considerably more than the 60 megabits per second used in the initial demonstrations linking Penn to Chapel Hill and Armonk.
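Those figures invite a back-of-envelope check. The rough Python sketch below uses a hypothetical frame size and frame rate, not published specifications of the prototype, to suggest why 60 megabits per second felt cramped and why the 1.2-gigabit target leaves room to grow.

    # Back-of-envelope arithmetic for the bandwidth figures above.
    # Frame size and rate are hypothetical assumptions, not published
    # specifications of the tele-immersion prototype.

    DEMO_RATE_BPS = 60e6      # 60 megabits/s, used in the first demos
    TARGET_RATE_BPS = 1.2e9   # ~1.2 gigabits/s, the high-quality goal

    # Hypothetical payload: color plus depth for a head-and-torso
    # "sculpture" at 640x480 pixels, 4 bytes per pixel, 10 frames/s.
    bits_per_frame = 640 * 480 * 4 * 8    # ~9.8 megabits per frame
    frames_per_second = 10

    needed_bps = bits_per_frame * frames_per_second
    print(f"stream needs ~{needed_bps / 1e6:.0f} megabits/s")             # ~98
    print(f"60 Mb/s link carries {DEMO_RATE_BPS / needed_bps:.1f}x that")   # ~0.6x
    print(f"1.2 Gb/s link carries {TARGET_RATE_BPS / needed_bps:.0f}x that")  # ~12x

No amount of bandwidth, though, lowers the 30-50-millisecond floor set by the speed of light in fiber, which is the other reason the test sites stay clustered on one coast.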
Daniilidis laughs gently when asked if he ever watched The Jetsons on TV. “I am,” he says, “pretty much inspired from science fiction.”