Augmented reality (AR) is certainly a hot subject. Several AR applications are already available on smartphones: real estate, tourist information, restaurant and shop locations and reviews, subway station locations, games, etc. This is just the beginning, as AR technology could be applied to many more areas; imagination is the only limit. Of course, one application area of high interest for us is infrastructure. Potential applications of augmented reality for infrastructure are also numerous. Here are a few (see images below): building site monitoring, underground infrastructure visualization, identification and query, etc.
I am convinced that if you had a basic AR system for infrastructure that you could use on your smartphone, you could think of many more applications.
The main difficulty with augmented reality is registration: the capacity to display the 3D model information at exactly the right location on the display, with respect to the real objects. This matters: for instance, if you want to know the exact location of a damaged pipe underneath the ground surface (because you need to excavate and fix it), you would like the virtual pipe to be displayed just on top of it, on the ground surface, and not floating in the air somewhere around you. For accurate display of the pipe to be possible, one thing we need to know is the exact user position and orientation (well, actually, what we need to know is the exact position and orientation of the mobile device used for augmentation). If we have that information, AR is easy. Unfortunately, that information is very hard to obtain with accuracy. And if it is only approximate, augmentation will also be approximate. A rough estimate of position and orientation is easy to get: GPS and compass provide a rather good approximation. That is what commercial AR applications rely on. As a consequence, they are not very accurate. See an example: the Wikitude AR browser.
As you can see, the model is displayed in the air, kind of "floating", and does not accurately track the device orientation. Note that the user moves the device very slowly. My guess is that the result would only be worse if he were moving it more quickly. The reason is the sensors the app relies on to measure its position and orientation: GPS, accelerometer, and compass. Those devices are not very accurate, and so neither is the augmentation.
For some types of applications (e.g. tourist information), that sort of accuracy is sufficient. But for infrastructure engineering, augmentation has to be displayed with much more accuracy. If you want to get information about a fire hydrant, you probably want to make sure the information you get relates to that hydrant, and not the drain next to it. The accuracy with which we can measure the position and orientation of an augmentation device is a major problem in augmented reality research, and prevents the development of serious AR applications. Researchers in the AR world spend a lot of energy trying to solve that problem, but until they do, we need to find (temporary) solutions.
Our team has been working on the problem for a while. What we thought is: instead of trying to track both position and orientation, why not make the problem simpler? Position is by far the hardest part to measure with accuracy, so let’s skip that: let’s assume the user does not move, but simply rotates around. (I know, that is quite a constraint, but let’s assume that for now.) If we know the user location and we know that he stays at that location, all we have to measure is his orientation. And that is much easier to measure with accuracy. Now if you stand at a specific position and all you can do is turn around, what is your perception of the world? It turns out the world becomes a 2D image, in the form of a 360-degree panorama. So why not augment panoramas instead? After all, there are plenty of panoramas around (think of Google Streetview).
So we developed a technique for registering and anchoring a 3D model to a panorama with high accuracy, and used it to develop two prototypes: one that runs on the desktop, and another on a tablet. The desktop version can be used to view infrastructure in its real-world context from a remote location (e.g. your office). Check the video below. In the first part, we display underground infrastructure through the ground, and the user can click on individual elements of the model to get their attributes (which may be stored in the model or in a database). It could be used, for instance, for planning site visits. In the second part, we show the use of the method for viewing infrastructure before it is built, from various positions (by moving to the next panorama).
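To give a feel for why a fixed viewpoint makes things simpler, here is a minimal sketch of how a 3D model point could be mapped to a pixel of a panorama once the viewpoint position is known. It is not our registration technique, just an illustration under assumptions of my choosing: the panorama is stored as an equirectangular image, coordinates are local east/north/up in metres, and the function name `world_to_panorama_pixel` and the parameter `heading_at_left_edge_deg` are hypothetical.

```python
import math

def world_to_panorama_pixel(point, viewpoint, pano_width, pano_height,
                            heading_at_left_edge_deg=0.0):
    """Map a 3D point to (u, v) pixel coordinates in a panorama.

    Assumptions (illustrative, not from the prototypes):
    - point and viewpoint are (east, north, up) tuples in metres
    - the panorama is equirectangular: columns span 360 degrees of heading,
      rows span +90 (top) to -90 (bottom) degrees of elevation
    - heading_at_left_edge_deg is the compass heading of pixel column 0
    """
    east = point[0] - viewpoint[0]
    north = point[1] - viewpoint[1]
    up = point[2] - viewpoint[2]

    ground_distance = math.hypot(east, north)
    if ground_distance == 0.0 and up == 0.0:
        raise ValueError("point coincides with the panorama viewpoint")

    # Heading measured clockwise from north, elevation measured from the horizon.
    azimuth_deg = math.degrees(math.atan2(east, north))
    elevation_deg = math.degrees(math.atan2(up, ground_distance))

    # Only the direction matters: distance to the point drops out entirely.
    yaw_deg = (azimuth_deg - heading_at_left_edge_deg) % 360.0
    u = yaw_deg / 360.0 * pano_width
    v = (90.0 - elevation_deg) / 180.0 * pano_height
    return u, v

# Example: a pipe point 2 m north of the viewpoint and 2 m below it.
pipe_point = (0.0, 2.0, -2.0)
print(world_to_panorama_pixel(pipe_point, (0.0, 0.0, 0.0), 4096, 2048))
```

With this kind of mapping, the only remaining unknown at a fixed viewpoint is the heading offset. In a 4096-pixel-wide panorama, a heading error of one degree shifts the whole model sideways by about 11 pixels, which is why the registration step has to be done carefully.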
Update: Dec 14, 2011:
Want to see more? Don't miss my other post on augmented reality for underground infrastructure!
Interesting. Yes, it could. Well, eventually. Actually, we looked at that possibility. It relates to a field of research called computer vision. In theory, if we can identify visual features in the images captured by the camera and match them with a 3D model (e.g. CAD or produced using SLAM), we can calculate the position of the camera. And, as you know, calculating the camera position is the main ingredient that enables accurate augmentation. So using such a technique, we could theoretically augment the world without being constrained to fixed positions (as our solution is). But in practice, it is hard to achieve. Here are a few reasons:
- The field of view of the camera is often not large enough to enable good tracking. The system is limited by the features it can detect, and if they are all in the same spot (e.g. a building on the side of the image), the accuracy of the calculated position is low. In other words, you cannot get good triangulation on your position because your triangle base is not wide enough.
- Visual features are defined using pixels in an image. Accurately determining their position is essential for good camera position estimation. You also need to be able to recognize them from one image to the next in the video stream. Therefore, accuracy depends on:
  - camera resolution
  - lighting conditions and shadows
  - how dynamic the scene is (moving objects in the scene)
  - quality of the features visible in the scene (e.g. contrast)
All those factors introduce some error into the calculations. You end up with errors from multiple sources that add up, and the camera position is wrongly estimated as a result. As a consequence, the resulting augmentation is of poor quality.
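The point about the triangle base can be illustrated with a toy calculation. This is a deliberately simplified 2D sketch, not SLAM and not what a real vision system does: it assumes the camera orientation is known, that two landmarks have known positions, and that the camera measures a compass-style bearing to each. The function names and the numbers (20 m range, 0.5 degree bearing error) are hypothetical and chosen only for illustration.

```python
import math

def direction(bearing_deg):
    """Unit vector (east, north) for a bearing measured clockwise from north."""
    b = math.radians(bearing_deg)
    return (math.sin(b), math.cos(b))

def locate_camera(landmark_a, landmark_b, bearing_a_deg, bearing_b_deg):
    """Estimate the 2D camera position from bearings to two known landmarks.

    The camera p satisfies p + t_a * d_a = A and p + t_b * d_b = B, so
    t_a * d_a - t_b * d_b = A - B, a 2x2 linear system solved by Cramer's rule.
    """
    ax, ay = direction(bearing_a_deg)
    bx, by = direction(bearing_b_deg)
    rx = landmark_a[0] - landmark_b[0]
    ry = landmark_a[1] - landmark_b[1]
    det = bx * ay - ax * by
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel: position cannot be triangulated")
    t_a = (bx * ry - rx * by) / det
    return (landmark_a[0] - t_a * ax, landmark_a[1] - t_a * ay)

def position_error_m(separation_deg, bearing_error_deg, range_m=20.0):
    """Position error when one bearing is off by bearing_error_deg.

    The true camera is at the origin; both landmarks are range_m away,
    separated by separation_deg as seen from the camera.
    """
    a = tuple(range_m * c for c in direction(0.0))
    b = tuple(range_m * c for c in direction(separation_deg))
    x, y = locate_camera(a, b, 0.0, separation_deg + bearing_error_deg)
    return math.hypot(x, y)

# Same half-degree measurement error, two different feature layouts.
print("features 5 degrees apart :", position_error_m(5.0, 0.5))
print("features 90 degrees apart:", position_error_m(90.0, 0.5))
```

In this toy setup, the same small bearing error produces a much larger position error when the two features sit close together in the image than when they are spread far apart, which is the narrow-triangle-base effect described above.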
Vision-based techniques such as SLAM are probably the future of augmented reality. But vision problems are still hard to solve, which is why the technology is still in university labs. It works well in controlled conditions (e.g. in labs). Outdoor tracking is a much more difficult problem.
So in theory, yes, you are absolutely right. In practice, we might have to wait a few more years before high quality vision-based outdoor positioning is available.