The PIPE, old-school real-time image processing
Aspex Incorporated developed the PIPE system (Pipelined Image Processing Engine) in collaboration with the National Institute of Standards and Technology (NIST) in the mid-1980’s. This system was well ahead of its time but also a part of the times. The hardware was developed by Randy Luck, BJ Henrici, and Jim Herriman in conjunction with software by Jim Knapp, Shoshi Biro, and several others, all of Aspex in consultation with Ernie Kent, Mike Shneier, Tom Wheatley and others at NIST. An early version was described in Kent’s US patent  and in Kent, Shneier, and Lumia . The actual PIPE implementation is better described in Luck  and .
PIPE was designed to process video images at real-time video frame rates. It consisted of modular processing stages (MPS) that could be flexibly connected under program control in many series or parallel combinations. Each MPS was implemented on two large 15” x 13” circuit boards. Each board was populated with about 200 ICs – 74ALS, 74F, and 74AHCT logic, GALs (small FPGA precursors), dynamic RAM, static RAM, and PROMs.
PIPE MPS circuit boards
The architecture was optimized for real-time point, spatial, and temporal image processing. The design made heavy use of SRAM look up tables and ALUs to perform arbitrary pixel point processing such as scaling, summing, thresholding or non-linear operations. Each MPS had two frame buffers and could perform two real-time 3 x 3 arithmetic or Boolean convolutions. The MPS had what was called a TVF, a two valued function look up table. With two 8 bit data inputs, a 64K x 8 SRAM could perform any operation on two values. Thus with one convolution performing the X gradient, and the other the Y gradient, the TVF could then be pre-programmed with a table of the square root of the sum of the squares of the X and Y gradients and therefore perform things like the Sobel operator in real-time. The frame buffers gave the PIPE temporal processing capabilities. Images were written into the frame buffers while the previous image could be simultaneously read out. The timing system allowed all frame buffers to run synchronously with each other. An entire PIPE consisted of up to 8 such MPS board sets. Each MPS had local image flow connections forward from the previous MPS, recursively from itself, and backwards from the next MPS in the pipeline. This made the system perfect for experimenting with various frame to frame and optic flow algorithms. In addition to the MPS, the PIPE had a video A/D front end and D/A back end for B&W or RGB cameras and other video sources and video outputs to monitors, a set of input frame buffers, a set of output frame buffers, a control stage that orchestrated all the control and handled host computer interface, and finally a processor called ISMAP which performed several kinds of histogram and cumulative histogram functions at frame rate. The system also had 6 video buses that allowed images to be broadcast from anywhere to anywhere in the system beyond the local connections. 
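The TVF idea described above is easy to illustrate in software. The following is a minimal sketch, not the original PIPE table format: it precomputes a 64K x 8 table for an arbitrary function of two 8 bit inputs and then evaluates the gradient magnitude with a single lookup. The address packing (first input in the high byte), the clamping to 255, and the function names are all assumptions made for the illustration.

```python
import math


def build_tvf_table(func):
    """Precompute a function of two 8-bit inputs as a 64K x 8 lookup table."""
    table = bytearray(256 * 256)
    for a in range(256):
        for b in range(256):
            # Assumed address packing: a in the high byte, b in the low byte.
            value = int(round(func(a, b)))
            table[(a << 8) | b] = max(0, min(255, value))  # clamp to 8 bits
    return table


def gradient_magnitude(gx, gy):
    """Square root of the sum of the squares of the two gradient magnitudes."""
    return math.sqrt(gx * gx + gy * gy)


def tvf_lookup(table, a, b):
    """Evaluate the precomputed function with one memory read."""
    return table[(a << 8) | b]


MAG_TABLE = build_tvf_table(gradient_magnitude)
```

Once the table is built, `tvf_lookup(MAG_TABLE, 3, 4)` costs one read regardless of how expensive the original function was, which is the property that let a look up table keep up with the pixel rate.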
Other features included region of interest processing and host controlled literal bytes. An AT/386 PC running MS-DOS functioned as a host control and programming computer. PIPE could also connect to a VME or Multibus interface for higher level image understanding applications on high speed computers.
8 stage PIPE system block diagram
MPS block diagram
In 1989 I created a PIPE demo video tape for use as a marketing tool. It included implementations of algorithms that I developed along with example video sequences that some of our customers had developed. Unfortunately the original 3/4” U-Matic master tape has disappeared in the mists of time and only a VHS copy of the demo remains, so the video quality you see on the YouTube video is not great. Here is the link to the 1989 PIPE demo tape on YouTube.
Description of the demo sequences on the video
1) A sorting application using pattern matching
This is a translation, rotation and scale invariant matching algorithm. It first computes the Sobel edge direction, thresholded by the magnitude. Then the histogram of that is found. With a black background, the object becomes dominant in the histogram. This histogram is simply a list containing a count of the number of pixels in each of 180 edge directions (2° increments were used). Notice that the concept of image structure is eliminated and the histogram becomes essentially a unique signature for the object. Since the structure is gone, it becomes translation invariant. As the object is rotated, the histogram pattern shape remains the same but it just rotates around the 180 possible directions. Object matching then just becomes a rote comparison of the known signature histogram with the current histogram at all 180 rotations. A few other optimizations needed to be applied. In an image with square pixels, the pixels are 1.414 times longer in the diagonal direction, which causes an error, so a correction for this must be applied to the histogram pattern. If the measured pattern is normalized, for example to the bin with the maximum count, then over a reasonable range of sizes, the matching becomes scale invariant as well. In the video, the PIPE is performing the image processing, the ISMAP is performing the real-time histogram, and a program in the attached PC is doing the histogram normalizing and pattern matching. Two signature patterns were tested, one for the front and one for the back of the VHS box. As I recall the image processing worked at about 10 to 15 Hz, but clearly from the video, it looks like the computer matching is slower, about 1 Hz. I gave a paper on this  at an Electronic Imaging East conference. Here’s a link to a PDF of this paper. We also demonstrated this algorithm using an attached neural network co-processor to perform the pattern matching at an SPIE conference in the late 1980’s. 
This algorithm is somewhat like a global version of the HOG algorithm and was independently invented at around the same time.
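The matching step that ran on the attached PC can be sketched in a few lines. This is an illustration under stated assumptions rather than the original program: the error measure (sum of absolute differences) and the function names are my choices for the sketch, and the diagonal pixel correction mentioned above is left out.

```python
def normalize(hist):
    """Scale a histogram so its largest bin is 1.0 (gives scale invariance)."""
    peak = max(hist)
    return [h / peak for h in hist] if peak else list(hist)


def best_rotation(signature, current):
    """Compare a stored signature with the current histogram at every
    circular shift and return (shift_in_bins, error) for the best match.

    Sum of absolute differences is an assumed error measure.
    """
    n = len(signature)
    sig = normalize(signature)
    cur = normalize(current)
    best_shift, best_err = 0, float("inf")
    for shift in range(n):
        err = sum(abs(sig[i] - cur[(i + shift) % n]) for i in range(n))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift, best_err
```

With 180 bins this is 180 x 180 comparisons per match. To sort between several objects, the same call is repeated for each stored signature and the lowest error wins.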
2) Object tracking applications
The first example is performing a frame difference followed by thresholding and Boolean dilation. Histograms from ISMAP are finding the center of mass of the lit up pixels and then the host is drawing a cross hair cursor on that spot.
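A software sketch of this tracking chain is shown below. It is not the PIPE program itself. The images are plain lists of rows, and using row and column projection histograms to get the center of mass is an assumption about how the ISMAP output was used.

```python
def frame_difference_mask(prev, curr, threshold):
    """Binary mask of pixels whose absolute frame difference exceeds threshold."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def dilate3x3(mask):
    """Boolean 3 x 3 dilation: a pixel is set if any neighbor is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if any(mask[yy][xx]
                   for yy in range(max(0, y - 1), min(h, y + 2))
                   for xx in range(max(0, x - 1), min(w, x + 2))):
                out[y][x] = 1
    return out


def centroid_from_projections(mask):
    """Center of mass from column and row histograms (assumed approach)."""
    col_hist = [sum(col) for col in zip(*mask)]
    row_hist = [sum(row) for row in mask]
    total = sum(row_hist)
    if total == 0:
        return None  # nothing moved, so there is nowhere to draw the cursor
    cx = sum(x * n for x, n in enumerate(col_hist)) / total
    cy = sum(y * n for y, n in enumerate(row_hist)) / total
    return cx, cy
```

The host would then draw the cross hair at the returned (x, y) position each frame.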
The Boston University sequences were courtesy Allen Waxman and are described in  and .
3) Template matching for quality control
This is showing a Sobel magnitude image, followed by histogramming. The differences between the current live histogram and a stored known good histogram are shown as a number.
4) Various image re-mapping functions
The first part is performing a log-polar transformation similar to Weiman and Chaiken  using non-linear functions for the read addresses of the image in the TVF.
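The remapping idea, a precomputed table of non-linear read addresses, can be sketched as follows. This is only an illustration: the ring spacing, the nearest-pixel rounding, and the parameter names are assumptions, not the Weiman and Chaiken grid or the tables actually used in the demo.

```python
import math


def log_polar_addresses(width, height, n_rings, n_wedges, r_min=1.0):
    """Build a table of (x, y) source addresses for a log-polar output image.

    Output row = ring (log of radius), output column = wedge (angle).
    Requires n_rings >= 2. Radii grow geometrically from r_min to the
    largest circle that fits in the image (an assumed layout).
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    r_max = min(cx, cy)
    table = []
    for ring in range(n_rings):
        rho = r_min * (r_max / r_min) ** (ring / (n_rings - 1))
        row = []
        for wedge in range(n_wedges):
            theta = 2.0 * math.pi * wedge / n_wedges
            x = int(round(cx + rho * math.cos(theta)))
            y = int(round(cy + rho * math.sin(theta)))
            row.append((x, y))
        table.append(row)
    return table


def remap(image, table):
    """Read the source image through the precomputed address table."""
    return [[image[y][x] for (x, y) in row] for row in table]
```

The table only needs to be computed once. After that, every frame is remapped with one read per output pixel.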
The second part shows terrain mapping sequences, courtesy Eamon Barrett at Lockheed. Essentially this is Google Earth, 1988 style.
The driving demo shows variable resolution, depending on where the cursor is located. The idea is that if the user’s eye gaze point could be sent at a low data rate back to the remote vehicle, then a variable resolution low bandwidth image could be returned to the user. The image is high resolution where the user is looking and progressively lower resolution in the periphery. This variable resolution fovea idea was conceived around the same time that the data compression techniques in JPEG and MPEG were invented. In the video, the periphery pixels look like they could use some filtering.
5) Thinning using Boolean morphology
PIPE’s Boolean neighborhood is performing connectivity preserving thinning on binary images.
6) Blob analysis using connected components
The PIPE is just capturing and thresholding the image, the attached PC is running a connected components program finding areas and bounding boxes. We had plans to develop a real-time connected components processor (CONCOMP) board for the PIPE, but never finished it.
7) Dynamic image centering
An early example of image shake reduction. The PIPE and the attached PC are finding the centroid of the Space Shuttle in the image, then adjusting X and Y address offsets to keep the centroid in the center of the frame. Every video camcorder made now has a similar feature, some use tracking, some use accelerometers.
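The centering loop can be sketched in software as below. This is an illustration only: the thresholded centroid, the whole-pixel offsets, and the fill value for pixels shifted in from outside the frame are assumptions, not details of the original demo.

```python
def centroid(image, threshold):
    """Centroid (x, y) of all pixels brighter than threshold, or None."""
    count = sum_x = sum_y = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None
    return sum_x / count, sum_y / count


def center_on_centroid(image, threshold, fill=0):
    """Re-read the image with X and Y address offsets so that the
    object's centroid lands in the middle of the frame."""
    h, w = len(image), len(image[0])
    c = centroid(image, threshold)
    if c is None:
        return [row[:] for row in image]  # nothing to track, pass through
    dx = int(round(c[0] - (w - 1) / 2.0))
    dy = int(round(c[1] - (h - 1) / 2.0))
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx, sy = x + dx, y + dy
            inside = 0 <= sx < w and 0 <= sy < h
            row.append(image[sy][sx] if inside else fill)
        out.append(row)
    return out
```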
8) Model based vision
The first part shows the PIPE running a corner detector using Gaussian curvature. Shrinking reduces these curvature maxima to dots. The ISMAP histograms make finding the X, Y locations of the dots easy. The attached PC then performs the model match as shown on the PC’s screen.
The second part was courtesy of Chuck Dyer at the University of Wisconsin and is described in .
9) Motion flow computations for velocity and direction
The first part of this demo is performing the simple Horn and Schunck optical flow computation :
Essentially this is just the frame difference (delta time) image divided by the Sobel magnitude (spatial gradient magnitude). As the frames flow through the PIPE, temporally, the delta time image is computed two frames apart using the first and third, and the gradient magnitude is computed from the middle, second frame.
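The computation described above can be sketched as follows. This is the gradient-normalized frame difference only, not the full iterative Horn and Schunck solution. The sign convention (positive speed along the gradient direction), the Sobel scaling by 1/8, the division of the two-frame difference by 2, and the `min_gradient` cutoff are assumptions made for the sketch.

```python
import math


def sobel_gradients(img, x, y):
    """Sobel X and Y gradients at (x, y), scaled to grey levels per pixel."""
    gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
          - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
    gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
          - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
    return gx / 8.0, gy / 8.0


def normal_flow(first, middle, third, min_gradient=1.0):
    """Speed along the gradient direction, in pixels per frame.

    The temporal difference uses the first and third frames, and the
    spatial gradient comes from the middle frame, as described above.
    Border pixels and weak-gradient pixels are left at 0.0.
    """
    h, w = len(middle), len(middle[0])
    flow = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx, gy = sobel_gradients(middle, x, y)
            magnitude = math.hypot(gx, gy)
            if magnitude >= min_gradient:
                dt = (third[y][x] - first[y][x]) / 2.0  # per-frame change
                flow[y][x] = -dt / magnitude
    return flow
```

For a brightness ramp that slides one pixel to the right per frame, every interior pixel comes out at a speed of 1.0.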
The second part of this demo is based on van Santen and Sperling  and Adelson and Bergen . It is computing “Reichardt” type quadrature detectors in (x, t) and (y, t), thresholding on strength, taking the atan2 of the (x, t) and (y, t), and finally color coding for direction. The PIPE implementations of these image flow algorithms are described in .
10) Edge detectors
The first part is showing the Sobel direction, thresholded by the magnitude, and then color coded for edge direction. One PIPE stage could perform all this at 60 Hz.
The second part is showing thresholded zero crossings from the difference of Gaussians.
11) Hough transform for lines
This Hough transform is based on histograms of Sobel direction images. You can see the input binary image in the lower left and the Hough space along the top.
12) PIPE’s menu driven, graphical software interface is both easy to use and versatile
PIPE had its own micro-coded programming system. Run-time programs could be downloaded, and it would run by itself. We devised a software tool called ASPIPE  that was written in C, ran on MS-DOS, and used the PC’s EGA text mode graphics characters. It was essentially a text mode pop up windowing system, developed before Windows existed. I based some of its visual design and organization on an article on the Smalltalk environment that I had read about in Byte magazine. The entire ASPIPE program was less than 640K bytes, though as I recall it did use overlays. The software used graphic representations of the signal flow in the PIPE hardware. It presented the entire system with the physical orientation of the processors along the horizontal axis and frame time on the vertical axis. To program it, you set up the image data flow through space and time. You could click on objects and menus would pop up to let you make selections on how that hardware object would behave at that moment in time. Click on one of the processors on this chart and a diagram of the MPS would pop up. Then click on, for example, a neighborhood operator and set up the mode and mask in a pop up. A second tool called LUTGEN let you enter equations to make the various look up tables. The LUT functions could be graphed out, created, saved as small files, and then could be re-used in any program. PIPE’s main control was via an attached PC. Due to limits in the state of the art at the time, the PC/AT bus interface was not fast. PIPE had two computer ports and some customers opted to connect the second port to a VME or Multibus interface to link higher speed computers like the Sun, Sequent, Apollo, Masscomp, and others for faster high level vision processing.
2019 retrospective, 30+ years on
The PIPE’s pixel, image and frame synchronous nature was a significant benefit but also, in hindsight, a limitation for some applications. Image size was fixed at 256 x 240, 8 bit precision, and 60 frames per second. This meant that the user did not need to deal with setting the frames up, which was good but limiting. After customer requests, extensions were added that allowed 512 x 480 image resolution at 15 Hz and 16 bit precision, but this capability was not easy to use. For some kinds of algorithms like morphology, one might have wanted to cascade several erosions or dilations in sequence without incurring frame delays, but the architecture didn’t allow that. Everything that flowed into an MPS needed to pass through a frame delay. Pipelining up to 8 stages allowed the 60 Hz frame rate to remain at full speed, but there could be latency delays of up to 8 frames. If an algorithm needed more than 8 stages of processing, you could then add more time by slowing the processing down to 30, 20, 15, 10, 7.5, etc. Hz. Other contemporary vision processor architectures could avoid this issue. For example ERIM’s Cytocomputer  was intended to perform cascades of morphology operations within a frame time and it was really good for those kinds of inspection type algorithms, but it didn’t have the temporal optic flow capabilities that the PIPE had. Cal Tech’s PIFEX architecture  was in some ways more flexible because it used a more general cross bar architecture to connect its various processing hardware elements. I don’t think it was ever commercialized though. By the late 1980’s, we started to consider a next generation PIPE 2 architecture that combined the best features of the PIPE with cross bar connections similar to PIFEX. The PIPE was also contemporaneous with, and for many image processing applications faster than, the more general purpose Warp systolic array processor from Carnegie Mellon University. An 8 stage PIPE system could run at up to 1.2 GOPS.
Aspex Incorporated ultimately sold 42 PIPE systems, a few with 1 MPS, but most with either 3 or 8. These went to university, government, and corporate research laboratories in the US and Asia. Some 10 systems were sold to Neuromedical Systems Inc., which used PIPE as the image processing front end for the first generation of their neural network based automated pap smear screening system, PAPNET, described in their patents and this article .
One of the reasons PIPE 2 never got developed was that something else ultimately did in the PIPE and all the other contemporaneous dedicated processors. By the early 1990’s, Intel 486 and Pentium processors and the new PCI bus got fast enough to be realistic for machine vision use. Essentially Moore’s Law caught up. This enabled most commercial customers to do their machine vision applications using a simple PCI frame grabber and software running on the PC in a much more cost effective system. While the spatio-temporal image flow processing of the PIPE was very interesting to some in the research community, most industrial applications didn’t need it. Today I can write many of the same applications seen on this video using OpenCV and C++ or Python. On my Intel i7-8700 motherboard, these apps can run at speeds close to what the PIPE was doing back then but with much higher resolution frames. My desktop PC was put together for a very reasonable sub $1K cost. Lesson learned: ignore Moore’s Law and its various corollaries at your peril! Another lesson learned: a major cost reduction will almost always win over the incumbent technology even when there is also a small performance reduction. I had a professor tell me that the PIPE was really cool, but that he could get 10 Sun-3 systems and have 10 students working at the same time for the cost of 1 PIPE. Even though the Sun-3’s were not real time, the increased utilization was worth it. While it might be interesting to contemplate putting all the PIPE’s functionality into a few FPGA’s, very little you can create in this way can easily compete with close to free.
I think a second reason PIPE 2 never got developed had to do with the fall of the Berlin Wall. Many of the PIPE customers got their funding from various agencies of the US Defense Department. During the Reagan era, there seemed to be lots of money available. After the Wall fell and the Soviet Union broke up, for all intents and purposes, the Cold War ended and these funding sources dried up.
It was a lot of blood, sweat, and heartache, but also fun while it lasted!
The PIPE had many attributes that were really useful for temporal computer vision. Today, most computer vision applications, both for industrial machine vision and for Convolutional NN’s and Deep NN’s, do not use or take much advantage of the temporal domain. Hot contemporary applications like ADAS and vision for self driving cars could benefit a lot from using the rich information available in the temporal domain. Also I think that OpenCV could really benefit from making the temporal domain easier to set up and use.
Back in the 1980’s there was a lot of interest in special purpose attached processors. The PIPE was one, and there were many others with similar acronyms: PIFEX, Cytocomputer, and Warp (all mentioned above), PUMPS, PICAP, ZMOB, Vicom, Butterfly, FLIP, MPP, Pixar (yes that Pixar), HNC, and many others. These were all developed because general purpose computers were not fast enough to handle the massive amounts of high speed data involved in computer vision. Today, the latest desktop PCs and even the processors in smart phones are fast enough for many imaging tasks. However, now there is a lot of renewed interest in special purpose processors that can handle certain tasks at higher speeds than a general purpose CPU. At the Embedded Vision Summit 2019 conference, I heard that over the last 2 years, VC’s have invested more than $1.5B in special purpose vision, NN, and AI chip companies. The original backpropagation based neural networks typically had only 3 layers: an input layer, a hidden layer and an output layer. In the PAPNET cancer screening system  mentioned above, the NN operated on 32 x 32 chunks of monochrome pixels that were selected by the PIPE image processing as most likely to contain the nucleus of a cell. The NN then determined if that chunk was most likely a cancer cell. The NN therefore had 32 x 32 = 1024 input neurons. The hidden layer was about 25% of the input, or 256 neurons, and it had a single output neuron since it was just classifying how likely it was that the 32 x 32 area contained a cancer cell. Originally they used an attached “neurocomputer”, but Moore’s Law caught up and later production versions used a standard computer.
In contrast, today’s Convolutional NN’s and Deep NN’s such as ResNet-50, AlexNet, and others are a different beast altogether. These NN’s can be applied to operate on high resolution and/or color images. AlexNet takes 224 x 224 RGB crops from 256 x 256 images, has 8 layers, 60M parameters, and 650K neurons. It was programmed to run on two Nvidia GPUs. ResNet-50 has 50 layers. These NN’s are only realistic on attached GPU’s, FPGA’s, or custom ASIC processors. Today’s AI chip companies are quoting NN inference in the tera- and peta-ops range. So it seems that the battle between custom attached processors and the standard CPU has come full circle.
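To put the two generations side by side, here is a small sketch that counts the weights in a 3 layer network with the PAPNET-like sizes given above (1024 inputs, 256 hidden neurons, 1 output). It assumes the layers were fully connected and that each neuron had one bias term. Neither detail is stated above, so treat the totals as an estimate.

```python
def dense_parameter_count(layer_sizes, include_biases=True):
    """Number of trainable values in a fully connected feed-forward network.

    layer_sizes lists the neuron count of each layer, input first.
    Full connectivity and one bias per neuron are assumptions.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out          # one weight per connection
        if include_biases:
            total += n_out             # one bias per receiving neuron
    return total


PAPNET_LIKE = dense_parameter_count([32 * 32, 256, 1])
ALEXNET_QUOTED = 60_000_000  # the 60M figure quoted above
RATIO = ALEXNET_QUOTED / PAPNET_LIKE
```

Under these assumptions the older network has roughly a quarter of a million parameters, so the 60M quoted for AlexNet is more than two hundred times larger.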
 Kent, US Patent 4,601,055, 1986.
 Kent, Shneier, and Lumia, “PIPE (Pipelined Image-Processing Engine)”, Journal of Parallel and Distributed Computing, V2, Issue 1, pp 50-78 (Feb. 1985).
 Luck, “PIPE: A Parallel Processor for Dynamic Image Processing”, Proc. SPIE V.758, (1987).
 Luck, “An Overview of the PIPE System”, Third Int’l Conference on Supercomputing: Supercomputing ‘88, Vol III, Boston, MA, (1988).
 Luck, “Translation, Scale, and Rotation Invariant Pattern Recognition Using PIPE”, Proc. Electronic Imaging East ‘88, (1988).
 Waxman, Wong, Goldenberg, and Bayle, “Robotic eye-head-neck motions and visual-navigation reflex learning using adaptive linear neurons”, Neural Networks, V1, Supplement 1, page 365, (1988).
 Baloch and Waxman, “A neural system for behavioral conditioning of mobile robots”, International Joint Conference on Neural Networks, (1990).
 Weiman and Chaiken, “Logarithmic spiral grids for image processing and display”, Computer Graphics and Image Processing, 11, (1979).
 Verghese, Gale, and Dyer, “Real-time motion tracking of three-dimensional objects”, Proceedings, IEEE International Conference on Robotics and Automation, 1990, pages 1998-2003.
 Horn and Schunck, “Determining Optical Flow”, Artificial Intelligence 17, (1981).
 van Santen and Sperling, “Elaborated Reichardt detectors”, JOSA A, Vol. 2, No. 2, (1985).
 Adelson and Bergen, “Spatiotemporal energy models for the perception of motion”, JOSA A, Vol. 2, No. 2, (1985).
 Luck, “PIPE, a parallel processor for dynamic image processing”, Proc. SPIE V.758, (1987).
 Luck, “ASPIPE: A Graphical User Interface for the PIPE System”, Proc. SPIE V.1076, (1989).
 Sternberg, “Parallel architectures for image processing”, Proc. 3rd International IEEE COMPSAC, pp. 712-717, (1978), (and numerous other subsequent articles and patents by Sternberg, Lougheed, and/or McCubbrey).
 Gennery and Wilcox, US Patent 4,790,026, (1988).
 Luck, Tjon, Mango, Recht, Lin, Knapp, “PAPNET: An Automated Cytology Screener using Image Processing and Neural Networks”, Proc. SPIE 20th AIPR Workshop, V.1623, 161-171 (1991).