Live mobile panoramic high accuracy augmented reality for engineering and construction

Those of you who have read my previous blog posts on Augmented Reality (AR) (Ref1, Ref2, Ref3) know that one of the main challenges of AR is tracking the user’s position and orientation in real time. Overlaying a 3D model at approximately the right location on top of the physical world is relatively easy: a GPS and a compass provide sufficient accuracy for rough alignment, which is enough for several types of applications, such as finding the nearest restaurant, locating your friends, or displaying nutritional information for a product you are holding at the grocery store. That part is easy.

The real difficulty arises when one wants accurate augmentation, by which I mean the type of augmentation that engineers would require. Let’s say, for instance, that an electrical engineer uses a smartphone AR app to aim at an electric cable and “click” on it with a crosshair to display the live voltage being measured on that cable. In such a situation, it is extremely important that the AR system display the voltage associated with that specific cable, and not the one located 5 cm to the right, because the engineer’s life may depend on it… Talking with several of our users, we came to realize that accuracy is paramount if we want to develop AR apps for engineers. An inaccurate AR app might be “cool” for a while, but serious users would abandon it as soon as they realized they could not rely on it.

In our previous work, we sidestepped the user tracking problem by doing the augmentation on panoramic images. Since an image is static, no tracking is required, and the augmentation is very precise (no jittering is observed). That enabled us to develop prototypes for testing hypotheses that could not easily be tested with standard AR technology, which requires real-time tracking. Having said that, augmenting images is far from ideal: an image is, by definition, out of date from the moment it is captured, so it may no longer match the surrounding world. Most importantly, it is static, so it cannot display any live event taking place in the scene (such as a user trying to interact with the augmentation). That is rather limiting. Live augmentation is something most researchers in the field are trying to achieve. After all, reality is about the present, so augmenting it should take place now.

We looked again at the reasons why we chose panoramas in the first place. First, a panorama represents an environment: if we are to augment reality using a static image, it had better be one with a very wide field of view, to show enough of the environment and so partly compensate for the fact that the camera cannot be moved. Second, an image is static, so no tracking is required, as discussed above, which makes the augmentation very precise. But there was a third reason: a panorama provides image data all around the camera. That is important, because building corners, windows, and other features in that image data are used to calculate the camera position. Think of a typical standard camera, with its relatively narrow field of view: if it gets too close to a wall, for instance, all it can see is a featureless wall surface, with no distinctive features to calculate the position from. In such a situation, a panoramic camera has a much better chance of seeing other features, which reduces the risk that the camera position cannot be calculated. That means more accurate augmentations. In summary, panoramas are good, but they would be even better if they were not static…
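To make the point about features and field of view a little more concrete, here is a minimal sketch of how a camera position could be recovered from features seen around it. It is not the algorithm used in our system. It assumes that the 3D positions of a few features (landmarks) are known in the model, that the camera orientation is already known, and that each feature therefore gives a bearing direction expressed in model coordinates. The function names (`estimate_camera_position`, `solve3`, `det3`) are made up for this illustration. The camera position is taken as the point closest, in the least-squares sense, to all the rays passing through the landmarks.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve the 3x3 system A x = b with Cramer's rule."""
    d = det3(A)
    if abs(d) < 1e-12:
        # Happens when all bearings are (nearly) parallel: the position
        # along the viewing direction cannot be determined.
        raise ValueError("bearings do not constrain the camera position")
    x = []
    for k in range(3):
        # Replace column k of A with b.
        Ak = [[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        x.append(det3(Ak) / d)
    return x


def estimate_camera_position(landmarks, bearings):
    """Least-squares camera position from known landmarks and bearings.

    landmarks: list of (x, y, z) feature positions in model coordinates.
    bearings:  list of (dx, dy, dz) directions from the camera towards each
               landmark, in model coordinates (orientation assumed known).
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, d in zip(landmarks, bearings):
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # Accumulate A += (I - d d^T) and b += (I - d d^T) p.
        for i in range(3):
            for j in range(3):
                m = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += m
                b[i] += m * p[j]
    return solve3(A, b)
```

With features spread all around the camera, as a panorama provides, the bearings point in very different directions and the system is well conditioned. With a narrow field of view, the bearings are almost parallel, the matrix becomes close to singular, and the position estimate degrades or fails outright.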

So we proposed a combination of the two: augmenting live panoramic video. We built our system around a nice panoramic video camera from Point Grey Research. It is used as follows: the camera is installed on a tripod, at a stationary position. In an “initialization” phase, the live panoramic stream is first aligned with the 3D model, so that the augmentation can be displayed at the right location on the panoramic stream. This initialization process actually calculates the camera position in the model. From that point, augmentation can take place, and a user can augment any area surrounding the camera, provided it is visible from the camera position. Since the camera is stationary, the augmentation is jitter free, and potentially much more accurate than with systems that require live camera tracking. Now suppose the user wants to augment a different location in the building: they simply move the tripod. While it is being moved, the system “tracks” surrounding features, calculating the camera position every frame. When the user puts the tripod back on the ground, the system knows where the camera is located (since it tracked the camera while it was being moved), which means the user does not have to re-initialize the system and can resume augmentation right away.
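Once the camera position in the model is known, displaying the augmentation at the right place comes down to projecting model points into the panorama. The sketch below shows one way this could look, and it rests on assumptions that are not from our system: the panorama is stored as an equirectangular image, the tripod is level (so only the camera heading, `cam_yaw`, matters), and the model uses a z-up coordinate frame. The function name and parameters are hypothetical.

```python
import math


def project_to_panorama(point, cam_pos, cam_yaw, width, height):
    """Project a 3D model point to pixel coordinates in an equirectangular panorama.

    point, cam_pos: (x, y, z) in model coordinates, z pointing up (assumed).
    cam_yaw:        camera heading in radians, counterclockwise from the x axis.
    width, height:  panorama size in pixels (360 x 180 degrees assumed).
    Returns (u, v), with the image centre looking along the camera heading.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]

    # Horizontal angle relative to the heading, wrapped to [-pi, pi).
    azimuth = math.atan2(dy, dx) - cam_yaw
    azimuth = (azimuth + math.pi) % (2.0 * math.pi) - math.pi

    # Vertical angle above the horizon, in [-pi/2, pi/2].
    elevation = math.atan2(dz, math.hypot(dx, dy))

    # Points to the left of the heading (positive azimuth) land left of centre,
    # points above the horizon land above centre.
    u = ((0.5 - azimuth / (2.0 * math.pi)) * width) % width
    v = (0.5 - elevation / math.pi) * height
    return u, v
```

For example, a model point straight ahead of the camera and at the same height lands at the centre of the panorama, and a point directly to the left lands a quarter of the image width to the left of centre. Because the camera sits still on its tripod, the same camera position is reused for every frame, which is why the overlay does not jitter.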


Our system is composed of a panoramic camera, a tripod, and two laptops. The camera produces 75 Mb of data per second, so we needed considerable processing power to augment that sort of video stream…

The system in action is shown in the following video: