Prof Sam Kwong and his team members in City University of Hong Kong
Multiview video, a set of video sequences of the same scene recorded by multiple cameras, has
attracted much attention recently because it can represent a 3D scene
with high quality and provides visual experiences beyond
2D, such as depth perception and interactive selection of an arbitrary
viewpoint or viewing direction within a certain range. Together with
advances in display technology, these features enable many new visual media
applications, such as photorealistic rendering of 3D scenes, free-viewpoint
television (FTV), 3D television (3DTV) broadcasting, and 3D games. However,
because the sequences are captured simultaneously by multiple cameras from different
angles and locations, multiview video produces a tremendous amount of data with
extremely high temporal and inter-view redundancy. For example, an 8-view
multiview video plus depth (1920x1080@60Hz) for autostereoscopic 3D displays
has a raw data rate of 5.56 GBytes per second, which is equivalent to 16 times that of a
single-view video. Moreover, the data volume increases with the number of views, and more
views are required for a more realistic 3D representation.
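The quoted data rate can be reproduced with a short calculation. The sketch below is only an illustration under stated assumptions: it assumes 3 bytes per pixel for every stream (texture and depth alike) and reads "GBytes" as 2^30 bytes, because those choices match the 5.56 figure. The function name and parameters are hypothetical.

```python
def raw_data_rate_bytes(width, height, fps, views, with_depth=True, bytes_per_pixel=3):
    """Raw (uncompressed) data rate in bytes per second.

    Assumption: each view contributes one texture stream and, if with_depth
    is True, one depth stream stored at the same bytes_per_pixel.
    """
    streams = views * (2 if with_depth else 1)
    return width * height * fps * bytes_per_pixel * streams


multiview = raw_data_rate_bytes(1920, 1080, 60, views=8)
single = raw_data_rate_bytes(1920, 1080, 60, views=1, with_depth=False)

print(f"{multiview / 2**30:.2f} GBytes/s")   # prints 5.56 GBytes/s
print(multiview // single)                   # prints 16
```

With 8 texture streams and 8 depth streams, the total is 16 streams, which is where the factor of 16 over a single-view video comes from.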
Thus, efficient compression is vital for the success of
multiview video. Multiview Video Coding (MVC) has been developed
as an amendment to the H.264/MPEG-4 AVC video compression standard in order
to achieve better compression efficiency than coding each view
independently. However, the coding gain comes at the cost of
dramatically increased encoding complexity, because
the block size of the coding units is variable for higher
prediction accuracy (i.e. variable-block-size mode) and because predictions
come not only from temporally related pictures of the same
camera but also from pictures of neighboring cameras. It is therefore
imperative to design optimization approaches that remove the
computational obstacles for MVC.
In this project, a series of highly efficient,
low-complexity optimization techniques has been developed to overcome this computational
problem. The video content and coding information of multiview video
are found to be highly correlated across the spatial, temporal and view domains. Consequently,
coding parameters of the current coding unit can be estimated and/or predicted from
previously coded information in the spatial-temporal-view
domain. These coding parameters therefore do not need to be
transmitted, and the complexity of selecting the best parameters is reduced.
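As a rough illustration of how previously coded neighbors might narrow the parameter search, the sketch below ranks the modes used by spatial, temporal and inter-view neighbors and keeps only the most frequent ones as candidates. It is not the project's algorithm: the mode names, the neighbor set and the function predict_candidate_modes are all hypothetical.

```python
from collections import Counter

# Hypothetical list of block modes, ordered from cheapest to most expensive to test.
ALL_MODES = ["SKIP", "16x16", "16x8", "8x16", "8x8", "INTRA"]


def predict_candidate_modes(neighbor_modes, max_candidates=2):
    """Return a reduced, ordered list of modes to test for the current block.

    neighbor_modes holds the modes chosen by already-coded neighbors, e.g.
    left and upper blocks (spatial), the co-located block in the previous
    picture (temporal) and the corresponding block in a neighboring view.
    Unavailable neighbors are passed as None.
    """
    counts = Counter(m for m in neighbor_modes if m is not None)
    if not counts:
        # No usable context: fall back to testing every mode.
        return list(ALL_MODES)
    # Most frequent first; ties are broken by the cheaper mode.
    ranked = sorted(counts, key=lambda m: (-counts[m], ALL_MODES.index(m)))
    return ranked[:max_candidates]


# left, up, temporal co-located, inter-view neighbor
print(predict_candidate_modes(["SKIP", "SKIP", "16x16", None]))  # ['SKIP', '16x16']
```

Here the encoder would test two modes instead of six. A real encoder would also need a safeguard for blocks whose best mode differs from all neighbors.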
In addition, we have developed several all-zero block detection algorithms
(an all-zero block is one whose residual coefficients are all zero after being encoded)
that predict in advance whether the coefficients of the current block will
all be zero. These algorithms can be applied to the complexity-intensive
coding modules, such as motion/disparity estimation, transform and
variable-block-size mode decision, to skip the coding of all-zero blocks and other
unnecessary memory accesses.
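One generic way to detect an all-zero block before transforming it is to compare the block's sum of absolute differences (SAD) with a threshold derived from the quantization step. The sketch below is a textbook-style sufficient condition, not the detection algorithms developed in this project. It assumes an orthonormal floating-point DCT and a dead-zone quantizer with a rounding offset of 1/3, and all function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def dct2(block):
    """2D DCT of a square block: C = D * B * D^T."""
    n = len(block)
    d = dct_matrix(n)
    tmp = [[sum(d[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][y] * d[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def quantize(coeffs, qstep, rounding=1 / 3):
    """Dead-zone quantizer: level = sign(c) * floor(|c| / qstep + rounding)."""
    return [[int(math.copysign(math.floor(abs(c) / qstep + rounding), c))
             for c in row] for row in coeffs]


def is_all_zero_predicted(block, qstep, rounding=1 / 3):
    """Sufficient test for an all-zero block using only the SAD.

    Every orthonormal DCT basis value has magnitude <= sqrt(2/n), so each 2D
    coefficient satisfies |C| <= (2/n) * SAD. A coefficient quantizes to zero
    when |C| < (1 - rounding) * qstep, hence the threshold below.
    """
    n = len(block)
    sad = sum(abs(v) for row in block for v in row)
    return sad < (1.0 - rounding) * qstep * n / 2.0


residual = [[1, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, -1], [1, 0, 0, 0]]
if is_all_zero_predicted(residual, qstep=10):
    levels = [[0] * 4 for _ in range(4)]      # transform and quantization skipped
else:
    levels = quantize(dct2(residual), qstep=10)
print(levels)
```

The test is conservative: it never flags a block that would produce a nonzero coefficient under these assumptions, but it misses some blocks that are in fact all-zero.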
Multiview Video Coding for 3D Video System
Additionally, since prediction coding signals have certain statistical
characteristics, e.g. prediction residuals are Gaussian distributed, a series
of statistical early termination approaches has been developed and successfully
applied to multi-reference motion/disparity estimation and mode
decision to significantly reduce the complexity of the encoder. These novel methods
open a new horizon in early termination for MVC as well as for mono-view
video coding. Based on these optimization techniques, fast MVC
encoders can be designed, which makes MVC readily applicable to interactive 3D
cinema/TV, FTV, 3D gaming, immersive virtual reality and other multiview
video-based services. In addition, they reduce the cost of industrial realization and
production of 3D encoders.
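The statistical early termination idea mentioned above can be illustrated with a small mode decision loop. This is only a sketch under assumptions, not the project's method: it assumes that the rate-distortion costs of blocks that previously selected a given mode are roughly Gaussian, sets a per-mode threshold at mean + k * standard deviation, and stops testing further modes once a cost falls under its threshold. All names and cost values are hypothetical.

```python
from statistics import mean, stdev


def early_termination_threshold(past_costs, k=1.0):
    """Threshold from costs of earlier blocks that chose this mode.

    Assumption: those costs are roughly Gaussian, so mean + k * std covers
    most blocks for which the mode was the final choice.
    """
    return mean(past_costs) + k * stdev(past_costs)


def choose_mode(candidates, cost_fn, thresholds):
    """Test modes in order; stop as soon as one cost is under its threshold.

    Returns (best_mode, best_cost, number_of_modes_evaluated).
    """
    best_mode, best_cost, evaluated = None, float("inf"), 0
    for mode in candidates:
        cost = cost_fn(mode)
        evaluated += 1
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        # Modes without statistics never trigger early termination.
        if cost <= thresholds.get(mode, float("-inf")):
            break
    return best_mode, best_cost, evaluated


# Hypothetical costs of earlier blocks whose best mode was SKIP.
thresholds = {"SKIP": early_termination_threshold([80, 100, 95, 105])}
costs = {"SKIP": 90, "16x16": 120, "8x8": 150}

print(choose_mode(["SKIP", "16x16", "8x8"], costs.get, thresholds))
# prints ('SKIP', 90, 1)
```

The factor k trades speed for coding efficiency: a larger k terminates more often but risks missing a better mode.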
Prof Sam Kwong Tak Wu
Department of Computer Science
City University of Hong Kong
cssamk@cityu.edu.hk