The National Science Foundation's (NSF) Tokyo Office periodically receives and disseminates reports on research developments in Japan that are related to the Foundation's mission. NSF-sponsored researchers currently working in Japan prepare many of these reports. These reports present information for use by NSF program managers and policy makers; they are not statements of NSF policy.
Mr. Daniel Morris, a Ph.D. candidate in the Department of Robotics at Carnegie Mellon University, Pittsburgh, Pennsylvania, co-authored the following report. Mr. Morris was a participant in the 1998 Monbusho Summer Program sponsored by NSF and the Ministry of Education, Science, Sports and Culture (Monbusho). Professor Kenichi Kanatani of the Department of Computer Science at Gunma University, Gunma, Japan hosted Mr. Morris and co-authored this report. Mr. Morris can be reached via email at: ddmorris@ri.cmu.edu; Professor Kanatani is at: kanatani@cs.gunma-u.ac.jp
1 Introduction
The Computer Vision field has seen significant interest in the task of using an image sequence taken by a moving camera to recover both the shape of an object being viewed and the motion of the camera. This is called Structure from Motion (SFM). A fundamental and empirically successful algorithm for achieving this was proposed by Tomasi and Kanade [4] and is called the factorization method. However the actual algorithm, as originally proposed, used a simple least squares method for overcoming noisy measurements and so is not optimal even when the noise is uniform and isotropic. In our work this summer Professor Kanatani and I used some of the basic theory developed by Kanatani [2,3] to find a first order theoretical bound on achievable accuracy by the factorization method. We also derived an optimal solution technique for the factorization method, again using the methods of Kanatani. We summarize our results in this report.
2 Factorization Method
Here we provide a brief overview of the factorization method that is described in more detail elsewhere [1,4]. The factorization method assumes that a set of features on a rigid object are viewed in a sequence of images. It then uses just these feature positions in the images to deduce the 3D positions of these features as well as the motion of the camera relative to the rigid object. This is the essence of the factorization method. A key property of the factorization method is its efficient two-step method for solving the problem. First, to eliminate most of the noise, features positions are projected into a small subspace which lies close to the true solution. Second, camera model constraints are applied to this intermediate s olution to achieve a final estimate for shape and motion. This method has empirically been shown to work well and gives accurate results when noise is small.
3 Noise Model and Optimality Criterion
An important weakness of the original algorithm is its dependence on a posterior treatment of noise through heuristic least squares. No noise model is used and hence its results are biased and sub-optimal. In our work we assume that we have a mean and covariance estimate for the noise in the data.
Using this noise model we are able to define an optimality bound for the solution. To do this we calculate the Cramer-Rao lower bound for the variance of the recovered parameters. However there are two key complicating factors here that make the problem more difficult than standard statistical analysis. The first factor is the incorporation of the model constraints requiring that perturbations lie in a subspace of the parameter space. This is achieved to first order by projecting the perturbations onto the tangent hyperplane of the parameter space at the solution. The second factor is the need to define a unique solution about which our pertubation analysis can be applied since the factorization method only recovers the parameters up to an arbitrary translation and rotation. Once we incorporate these factors we obtain an optimality bound on the structure from motion task.
The next phase of our work is to devise a method for achieving the optimal solution. We show that Maximum Likelihood estimation obtains the optimal solution to first order. So by minimizing an appropriate cost function using the Gauss-Newton method and constraining parameters to lie within the solution space, we can recover the optimal solution.
5 Conclusion
Our accuracy bound result is important since it tells us what is the best expected accuracy we can achieve for any given data. It thus provides an uncertainty estimate for the optimal solution technique we derived.
We plan to continue our collaboration on this project. The next step is to implement our method and evaluate the improvements we obtain.
References:
[1] T. Kanade and D. D. Morris. "Factorization methods for structure from motion." Phil. Trans. R. Soc. Lond., A 356(1740), pp. 1153-1173, 1998.[2] K. Kanatani. Statistical optimization for Geometric Computation: Theory and Practice. Elsevier, 1996.
[3] K. Kanatani. "Statistical optimization and geometric inference in computer vision." Phil. Trans. R. Soc. Lond., A 356(1740), pp. 1153-1173, 1998.
[4] C. Tomasi and T. Kanade. "Shape and motion for image streams under orthography: a factorization method." Int. J. Comp. Vision, 9(2), pp. 137-54, Nov. 1992