Incremental Engineering of Computer Vision Systems

TL;DR: Things change. People are often wrong. I show how to get computer programs to adapt without suffering from existential angst. [pdf]

My Ph.D. research was in “incremental engineering of computer vision systems.” Like many PhD projects, it tackles a general problem in computer science in the context of a specific application domain. The general problem is the “engineering of computer vision systems” in such a way that they evolve easily over time. The specific application domain is “medical image analysis”.

Let us start by setting the scene

Why medical image analysis? Over 95 million high-tech scans (i.e. CT, MRI, PET, etc.) are conducted each year. With so much data out there, radiologists and doctors can certainly use a helping hand in automatically interpreting these scans. This is where medical image analysis systems can help, by detecting anatomical structures and highlighting abnormalities that represent symptoms of disease. Medical image analysis systems are also a type of computer vision system, and therefore face the problems typical of developing computer vision systems. So by applying my research to medical imaging, I would have a complex real-world problem to test my theories against.

Now, onto the actual problem. Engineering computer vision systems is very difficult. It is very, very hard. Those of you with a computer vision background know why this is the case, but non-vision people might doubt this assertion. After all, aren’t software engineers expert programmers with uber-skills, with image processing algorithms developed over the last four decades at their disposal? In fact, many software engineers have first-hand experience with their own vision system, namely their own eyes, and have navigated around the world really quite well. Surely they have the knowledge needed to build vision systems.

Well yes, but the problem is that “… the mental operations that we perform to interpret images lie almost totally beyond the threshold of consciousness” (Crevier and Lepage 1997).

So it is not surprising that our “experts” find it difficult to articulate the knowledge required by vision systems. The vision experts (i.e. software engineers who build vision systems) therefore tend to be quasi-experts. They have a general understanding of how to interpret an image, but lack specific knowledge of which algorithm to select, what parameters to use for those algorithms, and even how to combine them with other algorithms. This means that when our vision experts try to code up their application, they often encode incomplete or inaccurate knowledge in the algorithms of their vision systems.

Consequently, computer vision systems are in practice developed incrementally, in a process that often descends into ad-hoc engineering. Engineers build the system, test it out, learn from the failures and then tweak the system. Each subsequent tweak is made ever more reluctantly, because changing one part of the system can unexpectedly regress another.

At this point, some of the pattern recognition and machine-learning folks jump up and say something like “We don’t need experts. Why can’t we use machine learning to automatically tune the system from lots of data?” Let me address these concerns. (Non-machine learning folks, bear with me and if you get lost just meet me down where it says “This brings us back to the initial problem… “.)

Why not machine learning?

Machine learning is great but cannot always be used, especially under the real-world constraints of building medical image analysis systems. Machine learning algorithms that automatically mine data for patterns rely on a sufficiently large amount of “training” data, and may require that data to be labeled with ground truth. Here is an example of how I used machine learning to speed up image segmentation, but it needed lots of labeled training data.

★ A. Misra, M. Rudrapatna and A. Sowmya, “Automatic Lung Segmentation: A Comparison of Anatomical and Machine Learning Approaches“, International Conference on Intelligent Sensors, Sensor Networks and Information Processing, Melbourne, Australia pp. 1-6, Dec. 2004. [pdf]

When building medical image analysis systems, we often don’t have lots of data a priori, nor labels for the data that we do have. Patient data tends to trickle in over time. One by one. Bit by bit. What is even worse is that we can’t get accurately labeled data for inductive or statistical learning algorithms. Doctors are way too busy (at least in Australia) working ridiculously long hours. They don’t have time to sit down and annotate images with disease patterns.

We did try to make it easy for doctors by building a web-based image annotation system. We hoped that at the end of the day, doctors would sit back with a glass of wine, put their favorite music on, log onto the system and mark down, at pixel level, the regions of disease visible in a scan of the lung.

★ M. Rudrapatna, A. Sowmya, T. Zrimec, P. Wilson, G. Kossoff, P. Lucas, J. Wong, A. Misra, S. Busayarat, “LMIK – Learning Medical Image Knowledge: An Internet-based medical image knowledge acquisition framework,” In Internet Imaging Conference of SPIE, San Jose, Jan 2004. [pdf]

Unfortunately, it didn’t quite work out that way, because of the practical realities of doctors’ time.

But what about snakes, level-sets, and graph-cuts?

OK, fine, machine learning won’t work. But what about iterative optimization or energy-minimization methods like active contours (i.e. “snakes”), level-sets and graph-cuts? They don’t need labeled data.

Yes, I did explore those, but techniques like snakes and graph-cuts have two main limitations. The first is that they require an initial estimate of which regions of the image belong to the object you are interested in and which belong to the background. So either you need doctors to manually annotate images as input to graph-cut algorithms (which doesn’t work) or you need image processing algorithms to automatically estimate the likely regions (which does work), as we did in the following.

★ L. Massoptier, A. Misra and A. Sowmya, “Automatic Lung Segmentation in HRCT Images with Diffuse Parenchymal Lung Disease Using Graph-Cut,” International Conference Image and Vision Computing New Zealand, Wellington, NZ, pp. 266-270, Nov. 2009. [pdf]

★ L. Massoptier, A. Misra, A. Sowmya and S Casciaro, “Combining Graph-cut Technique and Anatomical Knowledge for Automatic Segmentation of Lungs Affected by Diffuse Parenchymal Disease in HRCT Images”, to appear in International Journal of Image and Graphics, 2011.

The other problem is that these methods require an energy function, an equation that is iteratively optimized. Coming up with such equations for each anatomical structure is not easy. It requires lots of expertise, and if the equation has many terms then we have another problem: how do you select the weights for each term in the final energy evaluation? Should the strength of an edge be more important, or the labels of neighboring pixels? Remember, our vision experts are really quasi-experts, who may have general ideas on what these terms should evaluate, but may not have the specifics down pat. We can’t expect them to develop equations and select parameters that will work in all circumstances.
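To make the weight-selection problem concrete, here is a toy sketch of a two-term segmentation energy on a 1-D signal. It is my own illustrative example, not an energy function from any of the papers above; `w_data` and `w_smooth` are exactly the kind of hand-tuned trade-off the quasi-expert has to pick:

```python
import numpy as np

def segmentation_energy(labels, image, w_data=1.0, w_smooth=0.5):
    """Toy two-term energy for a binary labelling of a 1-D signal:
    E = w_data * (data term) + w_smooth * (smoothness term)."""
    labels = np.asarray(labels, dtype=float)
    image = np.asarray(image, dtype=float)
    # Data term: foreground pixels should be bright, background dark.
    data = np.sum(labels * (1.0 - image) + (1.0 - labels) * image)
    # Smoothness term: penalise label changes between neighbours.
    smooth = np.sum(labels[:-1] != labels[1:])
    return w_data * data + w_smooth * smooth

# The same labelling scores differently as the weights shift,
# which is why choosing them is itself expert knowledge.
e1 = segmentation_energy([0, 0, 1, 1], [0.1, 0.2, 0.9, 0.8], w_smooth=0.5)
e2 = segmentation_energy([0, 0, 1, 1], [0.1, 0.2, 0.9, 0.8], w_smooth=5.0)
```

The minimizer of this energy, and hence the segmentation produced, depends entirely on how the two weights are balanced.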

This brings us back to the initial problem that we face in engineering vision systems. Data trickles in slowly over time. Vision algorithms and techniques evolve. Even our own understanding of the problem, the algorithms and the parameters for those algorithms evolves. So the engineering of vision systems has to be incremental.

But incremental ad-hoc tweaking of algorithms and parameters makes it very (very) difficult for experts to be confident that the change they are making will not regress the system, and significantly limits their ability to build large and complex systems. One tweak to fix the system for one specific case might break it for other cases that were working fine until the change. It is quite a dilemma.

My solution

I approached the problem by looking at how knowledge acquisition techniques can be applied to incremental knowledge acquisition in computer vision. In particular, I looked at a technique called Ripple Down Rules (RDR). RDR captures knowledge in a nested hierarchy of rules and their exceptions, along with the evidence supporting the rules in the form of cornerstone cases. Its key innovation is the idea of validated, localized revisions to the knowledge base, checked for consistency against the cornerstone cases. RDR has been pretty successful in a variety of applications including pathology report analysis, email classification, natural language processing, classification problems in image processing and chip design, to name a few.
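To make the RDR idea concrete, here is a minimal sketch of an RDR node. The structure follows the description above, but the code and the CT-intensity example are my own illustration, not the implementation from the thesis: a rule fires, an exception can refine it, and a new exception is only accepted if it does not fire on the stored cornerstone case:

```python
class RDRNode:
    """A minimal Ripple Down Rules node: a condition, a conclusion,
    an exception branch that refines the rule, and the cornerstone
    case (the evidence) that justified the rule in the first place."""

    def __init__(self, condition, conclusion, cornerstone):
        self.condition = condition      # case -> bool
        self.conclusion = conclusion
        self.cornerstone = cornerstone
        self.except_branch = None

    def classify(self, case):
        if self.condition(case):
            if self.except_branch and self.except_branch.condition(case):
                return self.except_branch.classify(case)
            return self.conclusion
        return None  # fall through to a default elsewhere

    def add_exception(self, condition, conclusion, case):
        """Localized, validated revision: the new condition must NOT
        fire on the existing cornerstone, so past behaviour is kept."""
        if condition(self.cornerstone):
            raise ValueError("revision would regress the cornerstone case")
        self.except_branch = RDRNode(condition, conclusion, case)

# Hypothetical example: classifying CT intensities (Hounsfield units).
root = RDRNode(lambda c: c["hu"] < -500, "lung", {"hu": -800})
root.add_exception(lambda c: c["hu"] < -950, "airway", {"hu": -990})
root.classify({"hu": -990})  # handled by the exception: "airway"
root.classify({"hu": -800})  # the cornerstone case still works: "lung"
```

Each new case either agrees with the tree or justifies one new, locally validated exception, which is what keeps incremental revisions from regressing earlier behaviour.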

With this in mind, I adapted RDR for computer vision domains by building two frameworks for incremental engineering of computer vision systems, as well as running a simulation study into the impact of quasi-expertise.

The first framework I developed is called ProcessRDR, which looks at how we can acquire control knowledge from vision experts incrementally, as and when new data trickles in over time. ProcessRDR was so effective that, in a specific lung segmentation scenario, it reduced the time taken to develop a system from 3 months down to 4.5 hours. All this while maintaining confidence that each tweak to the system is not going to degrade its performance on cases known to work previously.

★ A. Misra, A. Sowmya and P. Compton “Incremental Learning for Segmentation in Medical Images,” In IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Arlington, Virginia, USA, Apr 2006. [pdf]

★ A. Misra, A. Sowmya and P. Compton, “Incremental Learning of Control Knowledge for Lung Boundary Extraction,” Pacific Knowledge Acquisition Workshop 2004, as part of Pacific Rim International Conference on Artificial Intelligence (PRICAI) 2004, Auckland, NZ, pp. 1-15, Aug 2004. [pdf]

The second framework I developed is called ProcessNet, which generalizes the concepts of RDR from rule-based systems to arbitrary forms of knowledge. This means knowledge no longer has to be encoded in rules, but can live within the algorithms and source code of image processing modules. ProcessNet also provides the capacity to handle complexity, so we can incrementally evolve large computer vision systems built from many smaller components that interact with each other.
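The safeguard that carries over from RDR is that a change to a module is only accepted if it preserves behaviour on stored cornerstone cases. Here is a minimal sketch of that check; the function names and the threshold example are mine, not ProcessNet's actual API:

```python
def validated_revision(old_module, new_module, cornerstones):
    """Accept a tweaked module only if it reproduces the stored
    output on every cornerstone case it previously handled."""
    for case, expected in cornerstones:
        if new_module(case) != expected:
            return old_module  # reject: the tweak would regress
    return new_module  # accept: past behaviour is preserved

# Hypothetical threshold module being tuned for a new case.
cornerstones = [(-800, "lung"), (-300, "tissue")]
old = lambda hu: "lung" if hu < -500 else "tissue"
new_bad = lambda hu: "lung" if hu < -900 else "tissue"   # breaks -800
new_good = lambda hu: "lung" if hu < -400 else "tissue"  # keeps both
```

Because the check is local to one module, an expert can tweak a single component of a large pipeline without having to reason about the whole system at once.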

With ProcessNet, I built a system that simultaneously segments lungs, shoulders, sternum, spine, trachea/bronchus and oesophagus. As far as I know, there is no other system out there that can do this.

★ A. Misra, A. Sowmya and P. Compton “Incremental Engineering of Lung Segmentation Systems” In Ayman El-Baz (Editor) & Jasjit S. Suri (Editor) Lung Imaging and Computer Aided Diagnosis, CRC Press, ISBN: 1439845573, Aug 2011. [pdf]

★ A. Misra, A. Sowmya and P. Compton, “Incremental System Engineering Using Process Networks“, Pacific Knowledge Acquisition Workshop 2010, as part of Pacific Rim International Conference on Artificial Intelligence (PRICAI) 2010, Daegu, Korea, pp. 150-164, Aug 2010. [pdf]

OK, that is all well and good, but didn’t we start by saying that vision experts are really quasi-experts? So how can we rely on them to do a good job of creating a complex system? Well, ProcessRDR and ProcessNet allow the quasi-expert to change his/her mind and incrementally evolve the system. We know that over time these frameworks will help quasi-experts, but it is still worth investigating how much of an impact quasi-expertise has on the knowledge acquired for computer vision applications.

So the third thing I did was study the impact of quasi-expertise on knowledge acquisition in the computer vision domain. I simulated expertise by using machine learning techniques to build an artificial expert (an oracle), and ran simulation studies with varying degrees of expertise – all the way from a perfect expert with 0% error to a complete idiot guessing 100% of the time.

The most interesting finding here is that if you restrict the number of choices for the expert (i.e. convert numeric features to nominal values), then no matter how bad the expert is, they will build a decent system. Too much choice is a bad thing, especially when the expert is partially guessing. In fact, for numeric vision features a poor expert can build a system with 6x more errors than a perfect system, while for the same features framed as nominal features, at worst the expert would produce a system with 2x as many errors as a perfect one. Since the ProcessRDR and ProcessNet frameworks simplify the complexity and allow the expert to focus on small parts of large systems, no matter how poor a grasp they have on the control of the vision algorithms and their parameters, they will eventually build a stable vision system.
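A sketch of that restriction in code (the bin edges below are illustrative, not the actual discretization from the study): instead of asking the expert for a raw number, ask for one of a few nominal bands, so that even a partial guess lands within a band of the right answer:

```python
def to_nominal(value, bins=(-950, -500, 0)):
    """Map a numeric feature (e.g. a CT intensity threshold the
    expert must pick) into one of a few nominal bands."""
    labels = ["very-low", "low", "mid", "high"]
    for edge, label in zip(bins, labels):
        if value < edge:
            return label
    return labels[len(bins)]

# A guessing expert choosing from 4 bands can be at most a few bands
# off; guessing a raw number, the same expert can be arbitrarily wrong.
band = to_nominal(-700)
```

Shrinking the choice space is what bounds the damage a quasi-expert can do.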

★ A. Misra, A. Sowmya and P. Compton, “Impact of Quasi-expertise on Knowledge Acquisition in Computer Vision“, International Conference Image and Vision Computing New Zealand, Wellington, NZ, pp. 334-339, Nov 2009. [pdf]

Summing up

My research showed that using ProcessRDR and ProcessNet, we can develop large and complex vision systems incrementally, with the confidence that each tweak is not going to degrade the system on previously known cases. The study into quasi-expertise also tells us that by constraining the options for each component, computer vision experts can develop good computer vision systems despite their quasi-expertise. [pdf]