I gave a keynote talk to kickstart the Packet Video 2007 Workshop in Lausanne, Switzerland. The audience was great, and the talk seemed to generate lots of discussion during the Q&A and for the remainder of the workshop. Here's a recap.
Abstract
Mobile & media experiences connect people with each other, with information, and with their environment. Media is increasingly being delivered in packets over networks. This raises a number of questions for today's networks:
- How can we transport media packets?
- How can we adapt media packets for diverse clients?
- How can we protect media packets?
A number of emerging applications will impact future directions for packet networks. We also discuss the following questions:
- What impact do globally distributed, immersive media environments have on media packet delivery systems?
- What role does context play in next-generation mobile media experiences?
We consider these questions from the perspective of a user and the perspective of a packet.
Coupling experience and technology
I began by stressing the importance of coupling experience and technology. Rather than developing technology in a box, it is important to first consider the desired user experience and then develop the technologies that impact it. The most important factor for deciding whether a technology gets transferred to product is not how good the technology is, but rather how it impacts the user experience. I have been passionate about this theme for quite some time, and as time passes my passion for this only grows stronger.
The rest of my talk cycled between the following experiences and technologies.
Mobile & Media Experiences
- Experience #1: Mobile, Diverse, Interactive: Diverse mobile video clients, desktop video, living room video
- Experience #2: Immersive, Conversational, Worldwide: Halo collaboration experience, Panoply immersive gaming experience
- Experience #3: Pervasive, Personalized, Context-aware: Mediascapes context-aware multimedia experience
Packet Technologies
- Packet labeling & metadata
- Transcoding & Processing in the network
- Scalable Streaming
- Secure Scalable Streaming
- Multiple Distortion Measures
- Public & private domains
- Sensing context in the network
The first five technologies were discussed in the context of Experience #1. The last two were discussed with Experience #2 and #3.
Experience #1: Mobile, Diverse, Interactive
Packet labeling & metadata: The main point is that we live in a distributed networked world where media packets will traverse distributed network elements with multiple owners and administrative domains and be processed by devices and equipment made by different manufacturers. In this highly distributed world, one important thing that we can do is smartly label our packets in hopes that over time the smart network elements along the way will use these labels to improve the overall quality of the user experience. The key design principle is to design packet labels that are 1) specific enough to be useful and 2) general enough to be understood.
Example packet labels and metadata include:
- Importance: Distortion values
- Time requirements: Time stamps
- Content type: Video, audio, text, data
- Scalability: Is it truncatable?
- Media attributes: spatial region, resolution, color; audio channel
- Dropability: Can it be dropped? e.g., Drop video for audio-only session.
- Processibility: Is it transcodable? Can it be processed?
- Security: What are the rights and privacy implications of the media?
The research challenges are designing and standardizing the labels with the design principle above, and then developing algorithms that use these labels for delivering improved mobile media experiences. These algorithms should be evaluated for their performance gains with respect to the label overhead.
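As a rough illustration of the labels listed above, here is a minimal Python sketch. The `PacketLabel` fields, the `droppable_for_audio_only` helper, and the `label_overhead` measure are hypothetical names chosen for this example; they are not a proposed standard or anything presented in the talk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ContentType(Enum):
    VIDEO = "video"
    AUDIO = "audio"
    TEXT = "text"
    DATA = "data"


@dataclass
class PacketLabel:
    """Hypothetical packet label; field names are illustrative only."""
    distortion: float           # importance: distortion incurred if the packet is lost
    timestamp_ms: int           # time requirement
    content_type: ContentType   # content type
    truncatable: bool = False   # scalability
    droppable: bool = False     # dropability
    transcodable: bool = False  # processibility
    protected: bool = False     # security: rights or privacy constraints apply


def droppable_for_audio_only(labels: List[PacketLabel]) -> List[int]:
    """Indices of video packets a network element may drop in an audio-only session."""
    return [i for i, label in enumerate(labels)
            if label.droppable and label.content_type is ContentType.VIDEO]


def label_overhead(label_bytes: int, payload_bytes: int) -> float:
    """Fraction of the packet spent on the label rather than on media."""
    return label_bytes / (label_bytes + payload_bytes)
```

A network element that understands only `content_type` and `droppable` can still act on the label, which is one way to read the "general enough to be understood" principle; `label_overhead` gives a simple number to weigh against any quality gain.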
Transcoding & Processing in the network
I discussed the experience of delivering media to and from users over any network and on any device. This motivates the technology of performing transcoding operations in the network. In 3G networks, the streaming, recording, and transcoding capabilities can be performed by the IMS Multimedia Resource Function (MRF), which serves and receives the media packets to and from the handsets. Dynamic transcoding can be used to adapt the video for the target client device (e.g., to lower the resolution) and for the network (e.g., to seamlessly hand off media between 3G and 2.5G networks during a mobile media session).
The research challenge that lies ahead is designing and developing transcoding algorithms in a manner that is computationally efficient so that a single transcoding node (e.g., IMS MRF) can process many streams at once to serve multiple clients at one time.
Scalable Streaming
This brings us to a technology called scalable streaming that makes transcoding much more efficient by leveraging scalable coding methods. In essence, if scalable coding methods are used, then we can form scalable packets that pack scalable data, for example low, medium, and high resolution data, into the packet in a manner that allows it to be transcoded by simply truncating the packet. Furthermore, the scalable media packets can have packet labels that contain image metadata and truncation points that can be used by a scalable packet transcoder. The scalable packet transcoder is quite simple: it performs transcoding by simply reading the packet label and then truncating the packet as needed.
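The truncation idea can be sketched in a few lines of Python. The `ScalablePacket` layout below (payload ordered low, medium, high resolution, with byte offsets stored in the label) is an assumption made for illustration, not the actual packet format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScalablePacket:
    # Hypothetical layout: the label carries ascending byte offsets marking
    # where each layer (low, medium, high resolution) ends in the payload.
    truncation_points: Tuple[int, ...]
    payload: bytes


def transcode(packet: ScalablePacket, layers_to_keep: int) -> ScalablePacket:
    """Transcode by reading the label and truncating the payload."""
    if not 1 <= layers_to_keep <= len(packet.truncation_points):
        raise ValueError("layers_to_keep out of range")
    cut = packet.truncation_points[layers_to_keep - 1]
    return ScalablePacket(packet.truncation_points[:layers_to_keep],
                          packet.payload[:cut])
```

Note that `transcode` never parses or decodes the media itself; it only reads offsets and slices bytes, which is why a single node can handle many streams cheaply.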
Research opportunities arise if the packet labels contain the distortion value of the particular media packet. If distortion values are included in the label, then they can be used as hints for rate-distortion optimized streaming algorithms and rate-distortion optimized transcoding algorithms to improve the quality of the user experience.
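To make the "distortion values as hints" idea concrete, here is a minimal sketch of one simple heuristic: greedily pick packets by distortion reduction per byte until a rate budget is used up. The function name and the `(size_bytes, distortion_reduction)` tuple format are assumptions for this example, and the sketch ignores dependencies between packets, which real rate-distortion optimized streaming algorithms must handle.

```python
from typing import List, Tuple


def select_packets(packets: List[Tuple[int, float]], budget_bytes: int) -> List[int]:
    """Choose packet indices under a byte budget, using label hints only.

    packets: (size_bytes, distortion_reduction) per packet, as read from
    hypothetical packet labels; sizes must be positive.
    """
    # Rank by distortion reduction per byte, best first.
    order = sorted(range(len(packets)),
                   key=lambda i: packets[i][1] / packets[i][0],
                   reverse=True)
    chosen, used = [], 0
    for i in order:
        size = packets[i][0]
        if used + size <= budget_bytes:
            chosen.append(i)
            used += size
    return sorted(chosen)
```

For example, `select_packets([(100, 50.0), (100, 10.0), (200, 60.0)], 300)` keeps packets 0 and 2 and skips packet 1, which contributes the least per byte.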
Secure Scalable Streaming
Another desired experience includes serving diverse clients while having end-to-end security. End-to-end security means that the media is protected in a manner that only allows the sender and allowed receivers to access the media, while delivering, storing, and transcoding the media packet over the network in a way that does not require decryption. It turns out that this can be achieved by using the same method as scalable streaming, where scalable packets are formed by leveraging scalable coding, and then coupling the packet formation with the encryption process. Specifically, encryption is applied to the packet in a manner that allows the packet transcoding operation to still occur by simple packet truncation. This can also leverage secure scalable image coding standards such as the newly created JPSEC standard for security of JPEG-2000 imagery.
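The property that matters here, truncating an encrypted packet without decrypting it, can be shown with a toy example. The sketch below uses a SHA-256 based keystream purely for illustration; it is an assumption of this example, not the scheme from the paper or from JPSEC, and it is not secure enough for real use.

```python
import hashlib


def keystream(key: bytes, nbytes: int) -> bytes:
    """Toy keystream: SHA-256 of key plus a counter. Illustrative only."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XOR with the keystream (the same call does both)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def truncate_in_network(ciphertext: bytes, cut: int) -> bytes:
    """A mid-network transcoder truncates ciphertext; it never sees the key."""
    return ciphertext[:cut]
```

Because byte n of the ciphertext depends only on byte n of the plaintext and the keystream, a receiver holding the key can decrypt the truncated packet and recover exactly the leading layers. This assumes the truncation points travel in an unencrypted label so the transcoder can read them.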
Secure Scalable Streaming was published in ICASSP 2001 by Susie Wee and John Apostolopoulos.
Multiple Distortion Measures
I then described a new technology area that we are studying called Multiple Distortion Measures (MDM). This begins with the following observation: Consider a set of scalable media packets. Generally speaking, the best ordering of the packets is determined by the profit-to-size ratio (in technical terms, the distortion-to-size ratio, ΔD/ΔR). Surprisingly, we observed that the best ordering for low resolution display is NOT equal to the best ordering for high resolution display. The question that arises is: how different are they?
I showed a graph from our ICASSP 2007 paper that shows the PSNR vs. Rate plot for the low resolution reconstructed image with packets ordered in the low-res optimal order and with packets in the high-res optimal order. It turns out that there are differences in performance of up to 4 dB. The graph also showed the PSNR vs. Rate plot for the high resolution reconstructed image with packets ordered in the high-res optimal order and the low-res optimal order. It turns out that these can have differences of over 1 dB.
This raised a lot of interest from the crowd. I think we'll have lots of people researching MDMs in the years ahead.
This raises the idea of labeling scalable media packets with multiple distortion measures, specifically, with the distortion value of the packet with respect to the low resolution image, the medium resolution image, and the high resolution image. If the packet contains this information, then streaming algorithms can be developed to optimize the media delivery experience to users with diverse client devices.
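A minimal sketch of what such a label might enable: each packet carries one distortion value per target resolution, and a sender orders packets by distortion gain per byte for the client it is serving. The `MdmPacket` type and the numbers in the usage note are invented for illustration; they are not data from the ICASSP 2007 paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MdmPacket:
    name: str
    size_bytes: int
    # Hypothetical label: distortion reduction for each target display resolution.
    distortion_gain: Dict[str, float]


def best_order(packets: List[MdmPacket], target: str) -> List[str]:
    """Order packets by distortion gain per byte for the given target resolution."""
    ranked = sorted(packets,
                    key=lambda p: p.distortion_gain[target] / p.size_bytes,
                    reverse=True)
    return [p.name for p in ranked]
```

With made-up packets A (100 bytes, low 40, high 20), B (100 bytes, low 5, high 30), and C (200 bytes, low 30, high 50), `best_order` gives A, C, B for the low resolution client and B, C, A for the high resolution client, so a single ordering cannot be optimal for both.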
Multiple Distortion Measures was published in ICASSP 2007 by Carri Chan, Susie Wee, and John Apostolopoulos.
The Future
The last part of the keynote focussed on experiences #2 and #3 to look at the impact of emerging applications on future packet networks.
Experience #2: Immersive, Conversational, and Worldwide
Delivering immersive, high-quality, worldwide experiences has a number of challenges for today's networks. The main problem is that network intelligence exists, but only in spots. For example:
- QoS exists in spots, but is not guaranteed from beginning to end.
- IPv6 exists in spots, but it is often tunneled over IPv4 and so is not available from beginning to end.
- Significant congestion can occur in peering points between administrative domains, and it is very common for packets to traverse administrative domains many times in a single session.
- Due to the sheer number of IP addresses, packets in countries such as India may go through many network address translations (NATs) before being delivered to the recipient.
Public & private domains
As a result, proprietary networks are being built to deliver guaranteed experiences. HP's Halo immersive collaboration experience is built on a proprietary network for that very reason.
In the long run, the right answer is to build out networks that contain IPv6 and QoS. However, until that occurs, there is likely to be a co-existence of public and proprietary networks.
This raises research opportunities of developing protocols and algorithms that improve media delivery over co-existing public and proprietary networks. This also motivates the need to develop packet labels that contain information that can be used by smarter network elements that understand them. And, this once again raises the design principle of designing the labels so that they are specific enough to be useful but general enough to be widely understood.
Experience #3: Pervasive, Personalized, Context-aware
Finally, I described Mediascapes as an example of pervasive, context-aware multimedia experiences. The main essence of Mediascapes is that it uses sensors to trigger multimedia experiences tied to your physical and personal context.
Sensing context in the network
This raises the question of using sensors to sense your context and getting the sensed context into packets that can be used by different applications and services. In the web world, the sensors may exist as GPS sensors, environmental sensors, or personal sensors. In the operator world the sensors may come through carrier-grade network elements as in IP Multimedia Subsystem (IMS) architectures. For example, IMS context can include location, presence, group lists, and subscriber info.
The key is to have the sensors provide context that is wrapped into packets in a manner that can be easily used by applications and services. This raises the challenge of creating a semantic representation for sensed context. Again, like the packet labels, this must be designed in a manner that is both specific enough to be useful and general enough to be widely understood.
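One possible shape for such a representation, sketched in Python with JSON as the wire format. The `ContextRecord` fields and the example source names are assumptions for this illustration, not an IMS or Mediascapes schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ContextRecord:
    """Hypothetical sensed-context record; field names are illustrative."""
    source: str        # where it was sensed, e.g. "gps" or "ims-presence"
    kind: str          # what it describes, e.g. "location" or "presence"
    value: dict        # kind-specific details
    timestamp_ms: int  # when it was sensed


def to_payload(record: ContextRecord) -> bytes:
    """Serialize a record so it can be carried in a packet."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


def from_payload(payload: bytes) -> ContextRecord:
    """Parse a packet payload back into a record."""
    return ContextRecord(**json.loads(payload.decode("utf-8")))
```

Keeping `source` and `kind` as small, generic fields while leaving `value` open is one way to balance "specific enough to be useful" against "general enough to be widely understood": an application that does not recognize a `kind` can skip the record without failing.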
Acknowledgments
I'd like to take a moment to give special thanks to John Apostolopoulos, Carri Chan, Steve Froelich, Dave Penkler, Qibin Sun, and Zhishou Zhang for their contributions to various parts of this work!
Final note and questions
The audience was great and the talk seemed to generate lots of discussion throughout the workshop.
This was a fun topic to put together for the keynote and I'd like to develop it further. I'd love to hear your thoughts and ideas on any aspects of this.
What are your thoughts and comments on the life of a packet? Did you attend the workshop and keynote? If so, what did you think? Do you have any suggestions for improvements?
Please feel free to leave a URL with your comments.