Augmenting Drone Videos Could Facilitate Construction Monitoring

Augmented reality is a fascinating technology that could change the way we live and interact with the world that surrounds us. Unfortunately, we don’t see many good applications of AR in engineering yet, partly because achieving good visual integration of digital objects with reality is very challenging.

To display the augmented elements at the right location on a tablet display at every instant, the AR app must know the position and orientation of the tablet's camera in real time - that is fundamental to AR. That problem is hard to solve on a handheld device, because it may require multiple sensors and heavy calculations, which are not always easy on small devices with limited battery capacity.
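To see why the pose matters, here is a minimal sketch (Python with NumPy; the intrinsics and pose are made up) of how an app places a virtual 3D point on the screen once the camera pose is known - the standard pinhole projection. Any error in the pose shifts every augmented pixel.

```python
import numpy as np

# Hypothetical camera intrinsics (focal length and principal point, in pixels).
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

# Hypothetical camera pose: rotation R and position C in world coordinates.
R = np.eye(3)                    # camera axes aligned with the world axes
C = np.array([0.0, 0.0, -5.0])   # camera 5 m behind the world origin

def project(point_world):
    """Project a 3D world point to pixel coordinates with the pinhole model."""
    p_cam = R @ (point_world - C)    # world -> camera coordinates
    u, v, w = K @ p_cam              # camera -> homogeneous image coordinates
    return u / w, v / w              # pixel coordinates on the display

# A virtual vertex placed 2 m in front of the world origin (made-up value).
print(project(np.array([1.0, 0.5, 2.0])))
```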

Initial tracking solutions made this easier through the use of markers (QR-code-like printed patterns). The tablet's camera captures live video of the scene that includes a marker, and based on the shape and orientation of the marker as it appears in the video, the AR app can calculate the tablet's position with reasonable accuracy. Markers work quite well, but to be of any use, they must be visible in your camera view – so depending on the situation you may need to install many of them.
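As an illustration of the principle (not the exact method used by any particular AR toolkit), here is a minimal sketch of marker-based pose recovery: assuming the four corners of a square marker of known size have already been detected in the frame (for example with OpenCV's ArUco module), cv2.solvePnP recovers the camera's rotation and translation relative to the marker. The corner pixels and intrinsics below are made-up placeholder values.

```python
import numpy as np
import cv2

MARKER_SIZE = 0.20  # marker edge length in metres (assumed known)

# 3D corners of the marker in its own coordinate frame (the Z = 0 plane).
object_points = np.array([[-MARKER_SIZE / 2,  MARKER_SIZE / 2, 0],
                          [ MARKER_SIZE / 2,  MARKER_SIZE / 2, 0],
                          [ MARKER_SIZE / 2, -MARKER_SIZE / 2, 0],
                          [-MARKER_SIZE / 2, -MARKER_SIZE / 2, 0]], dtype=np.float32)

# Pixel positions of the same corners as detected in the video frame
# (hypothetical values; in practice they come from a marker detector).
image_points = np.array([[610, 320], [700, 318], [705, 405], [612, 408]],
                        dtype=np.float32)

# Hypothetical camera intrinsics; a real app would use calibrated values.
camera_matrix = np.array([[1000.0, 0.0, 640.0],
                          [0.0, 1000.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # assume no lens distortion for this sketch

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
print("camera rotation (Rodrigues vector):", rvec.ravel())
print("camera translation:", tvec.ravel())
```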

Later, more advanced computer vision techniques like SLAM (simultaneous localization and mapping) made tracking without markers possible. To calculate the camera position, SLAM relies on visible features in the physical environment, such as edges, corners, and other distinctive features that appear in the video. Using SLAM saves you from installing markers, but since all those techniques are based on video, they rely heavily on the environment – factors like poor lighting or low contrast sometimes result in augmentations that appear "shaky" or displayed at the wrong location. Science has not yet solved the camera tracking problem robustly enough for "anywhere" augmentation.
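Those "features" are the kind of keypoints that classic feature detectors find. The small sketch below (OpenCV's ORB detector, run on a hypothetical frame file) shows the raw material such trackers work with - and why a blank, poorly lit wall leaves them with very little to hold on to.

```python
import cv2

# Load one video frame (hypothetical file name).
frame = cv2.imread("frame_0001.jpg", cv2.IMREAD_GRAYSCALE)
assert frame is not None, "frame not found"

# Detect ORB keypoints: corners and other high-contrast, distinctive spots.
orb = cv2.ORB_create(nfeatures=2000)
keypoints, descriptors = orb.detectAndCompute(frame, None)

# Few keypoints (blank walls, poor lighting, low contrast) means the tracker
# has little to anchor the camera pose to, and the augmentation gets shaky.
print(f"{len(keypoints)} features detected")
```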

Knowledge of the camera position is not only useful for AR. Take for instance the technology that converts photos into meshes, such as Bentley Systems' ContextCapture. It is very simple to use: you take a set of photos of a scene following some basic rules, and the program generates a 3D mesh based on what appears in the photos. The results generally amaze me...

To produce such meshes from photos, the ContextCapture process must first go through a step called "aerotriangulation" (AT), which matches features across photos and accurately establishes the relative position and orientation of each one. The process runs offline and can take several minutes to complete - but it results in very accurate measurements of... the camera position! So we thought: if ContextCapture provides the camera position of each photo for free, then perhaps we could use those calculated positions to augment the corresponding photos?
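In other words, once AT has run, each photo effectively comes with a small pose record like the one sketched below; the field names are mine, for illustration, and not ContextCapture's actual export format.

```python
from dataclasses import dataclass

@dataclass
class PhotoPose:
    """What aerotriangulation recovers for each photo (illustrative fields only)."""
    photo_id: str
    position: tuple        # camera centre in the scene/world frame, e.g. (x, y, z) in metres
    rotation: tuple        # camera orientation, e.g. (omega, phi, kappa) in degrees
    focal_length_px: float # intrinsics are refined by the AT as well

# Hypothetical example record for one video frame:
pose = PhotoPose("frame_0421", (120.3, 45.7, 18.2), (-89.5, 0.8, 132.0), 2304.0)
```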

To test our hypothesis, here is roughly what we did:

  • We flew a drone around our local Bentley office, capturing video of the building during the construction of an extension to the second floor.
  • We extracted all the frames from that video (see the first sketch after this list), and used them to create a mesh of the building scene using our ContextCapture technology.
  • We then aligned the resulting mesh with a BIM model of our building.
  • Finally, we used the calculated position and orientation of each frame to augment it with the BIM model (see the second sketch after this list).
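Splitting the video into frames (step 2 above) is straightforward; here is a minimal sketch using OpenCV, with a hypothetical video file name:

```python
import cv2

# Split the drone video into still frames (hypothetical file names).
video = cv2.VideoCapture("drone_flight.mp4")
index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    cv2.imwrite(f"frame_{index:04d}.jpg", frame)
    index += 1
video.release()
print(f"{index} frames written")
```

The augmentation step (step 4 above) then boils down to re-projecting BIM geometry into each frame using its recovered pose - essentially the pinhole projection shown earlier. Below is a minimal sketch of that step with OpenCV's projectPoints; the pose, intrinsics and vertex coordinates are placeholders, not the actual values from our experiment:

```python
import numpy as np
import cv2

# Camera pose of one frame as recovered by aerotriangulation
# (placeholder values; the real ones come out of the AT step).
rvec = np.array([0.01, -1.57, 0.02])           # rotation (Rodrigues vector)
tvec = np.array([12.0, -3.5, 40.0])            # translation, metres
camera_matrix = np.array([[2304.0, 0.0, 1920.0],
                          [0.0, 2304.0, 1080.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)                      # assume distortion already corrected

# A few vertices of the BIM model of the new floor, expressed in the same
# world frame as the mesh (placeholder coordinates).
bim_vertices = np.array([[10.0, 2.0, 6.5],
                         [14.0, 2.0, 6.5],
                         [14.0, 6.0, 6.5]])

# Project the model vertices into the frame and draw them on top of it.
pixels, _ = cv2.projectPoints(bim_vertices, rvec, tvec, camera_matrix, dist_coeffs)

frame = cv2.imread("frame_0421.jpg")           # hypothetical extracted frame
assert frame is not None, "frame not found"
for (u, v) in pixels.reshape(-1, 2):
    cv2.circle(frame, (int(round(u)), int(round(v))), 5, (0, 0, 255), -1)
cv2.imwrite("frame_0421_augmented.jpg", frame)
```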

Results show a very steady augmentation, well aligned from frame to frame:

Of course, this can’t exactly be called augmented reality – because it is not live. The augmentation took several minutes to compute, because of the AT process.  On the other hand, the calculated camera positions are very accurate, which resulted in very steady augmentation.

In spite of that, such offline augmentation can be very useful.  Think for instance of monitoring your building site on a daily basis, trying to identify delays or mistakes in the construction process. Using standard handheld augmented reality, you could walk around the site with your tablet and, assuming the AR app is able to calculate your position accurately in such a dynamic environment, view a live augmentation of the site that would make those mistakes and delays easier to spot.

Alternatively, you could have a drone flying around your site frequently, taking photos and uploading them to a server in the cloud, which would generate a mesh, align it with the BIM model, and augment the photos with the model.  So a few minutes after photo capture, you could look at those augmented photos from your office, identify delays or mistakes in the construction process, and raise a flag when you notice something that deserves immediate attention.  Not only would this technique save you several visits to the site, but it would enable more frequent monitoring, the augmentation would likely be much steadier and more accurate than handheld augmentation on site, and it would be available from a multitude of vantage points that you could not reach while walking around the site.
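Such an automated loop could be organized roughly as sketched below. Every function here is a stub with a hypothetical name, meant only to show how the steps (reconstruction, alignment, projection, review) would chain together - it is not an actual Bentley API.

```python
# Purely illustrative daily-monitoring loop; each helper is a stub standing in
# for a photogrammetry, alignment or rendering service.

def reconstruct_mesh(photos):             # photogrammetry + aerotriangulation
    return "mesh", [None] * len(photos)   # stub: mesh + one camera pose per photo

def align_to_bim(mesh, bim_model_path):   # register the mesh to the BIM model
    return "mesh-to-BIM transform"        # stub

def overlay_bim(photo, pose, transform):  # project BIM geometry onto a photo
    return f"augmented_{photo}"           # stub

def publish_for_review(augmented_photos): # make the results available to the office
    print("\n".join(augmented_photos))

def monitor_site(photos):
    mesh, poses = reconstruct_mesh(photos)
    transform = align_to_bim(mesh, "site_model.ifc")   # hypothetical BIM file
    publish_for_review([overlay_bim(p, q, transform)
                        for p, q in zip(photos, poses)])

monitor_site(["day12_001.jpg", "day12_002.jpg"])
```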

Such a solution would do a great job for large infrastructure projects, at least as far as the outer shell of the asset is concerned.  Some drones are now equipped with range-sensing technology that lets them fly inside buildings, "seeing" and avoiding obstacles and finding their way by creating a map in the process.  I am sure you too can imagine what lies ahead...

Comments
  • Hi Stéphane,

    Very interesting topic!
    For years we have been looking for a solution that would allow us to simulate our stadium/bridge projects from a bird's-eye view. Unfortunately, without success!

    There is a solution from Fologram (based on Microsoft HoloLens) which works fine for us, but the head-mounted display is limited on site, meaning we can't go up to +30 m or 'fly' across a river to observe the future project we are working on.

    When you said you 'aligned the resulting mesh with a BIM model of our building' to get the calculated positions & orientations of each frame:

    I am curious which software you used for the position calculation and the alignment?

    Frank

  • Hello Frank,

    In our project, since this was done offline (on pre-recorded video), we "aligned" the video frames in MicroStation. When both the photos and the 3D model are georeferenced, this becomes easy (both MicroStation and ContextCapture allow that georeferencing from survey points). Are you talking about offline augmentation, or live?

    Stéphane  
