Critical bug in Geo Web Publisher tDPR publishing

We have found a critical bug in tDPR publishing which, in the best case, causes a drop in performance and, in the worst case, causes the GWP service to crash. It is logged as "Defect 656323:tDPR WMS service crash" but has not been fixed to date.

This issue happens only when two requests arrive in a specific order: first a request for data at a large scale, which shouldn't return any data, followed by a request for data at a small scale. See the attached screenshot. Marked in red is the large-scale (30x30 km) WMS request (x=427363....), where no layers are visible at that scale and no result should be produced. It is processed normally.

That request is then followed by the next WMS request (x=452086...), which asks for a smaller scale (200x200 m) where layers are visible (although I know there is no DPR tile data there). As you can see in the log file, the bug is that Mapviewer requests DPR extents for the previous 427363 extent and not for the 452086 extent as expected. This causes a download of more than 2 GB of data (if repeated in a browser) and, of course, a crash. After Mapviewer restarts, the request is issued again and, because some buffer has now been cleared, it is processed normally and downloads only 100 KB of data, as there are no actual elements in that place. This pattern repeats every time before a crash, and changing the layer scale limits doesn't help at all. To reproduce the crash, the tDPR dataset should be large enough to expand to more than 2 GB in RAM.
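For anyone who wants to script the two-request sequence described above instead of clicking through a viewer, here is a minimal sketch in Python. It only builds the two GetMap URLs (a 30x30 km request followed by a 200x200 m request) and does not send them. The base URL, layer name, SRS, image size and centre coordinates are all placeholders you must supply for your own server, and the parameter set follows generic WMS 1.1.1 conventions rather than anything specific to GWP.

```python
from urllib.parse import urlencode


def bbox_around(cx, cy, size_m):
    """Return (minx, miny, maxx, maxy) for a square of size_m metres centred on (cx, cy)."""
    half = size_m / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


def getmap_url(base_url, layers, srs, bbox, width=512, height=512):
    """Build a generic WMS 1.1.1 GetMap URL (image size and format are assumptions)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layers,
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(f"{v:.1f}" for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


def crash_sequence(base_url, layers, srs, empty_centre, data_centre):
    """Return the two URLs in the order that triggers the problem in this report.

    empty_centre: (x, y) of a place where the 30x30 km request shows no layers.
    data_centre:  (x, y) of a place where the 200x200 m request has visible layers.
    """
    wide = bbox_around(empty_centre[0], empty_centre[1], 30_000)
    narrow = bbox_around(data_centre[0], data_centre[1], 200)
    return [
        getmap_url(base_url, layers, srs, wide),
        getmap_url(base_url, layers, srs, narrow),
    ]
```

Requesting the two URLs one after the other, in that order, should correspond to the pattern seen in the log before each crash.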

 

Another example: it turns out that you do not need to actually crash the GWP service to reproduce this issue.

For example, these two requests, executed one after the other, also give wrong results and the infamous "Resymbolization failed [MSPDSPMAP_ERROR_RESYMBOL]" error, which seems to be closely related to the original issue, as it happens before the crash.

 

For easier reproduction, I have modified the WMSserverstresspage tools so that they log any issues with a WMS request. The original WMS stress test page included in GWP is not usable for testing: it does nothing on error and just loads the next request, so the service seems to work even when WMS errors are reported.

 

The included scripts should be copied to C:\Bentley\GeoWebPublisher\Examples\ServersStressPages

 

WmsServerStressPage2.aspx – Added exception event logging when the server returns an error XML or when the service doesn’t respond.

  • Added XML error parsing (parses ServiceException and the ServiceException status from the error XML)
  • Added a sanity check that validates whether the response has the correct content-type
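The two checks listed above can be illustrated with a short Python sketch. This is not the code from the attached .aspx page, only an approximation of the same idea: accept the response if its content-type is an image, otherwise look for ServiceException elements in the body and report them. The function name and message format are made up for illustration, and reading the `code` attribute is an assumption based on the generic WMS ServiceExceptionReport layout.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_wms_response(content_type, body, expected_prefix="image/"):
    """Return a list of problem descriptions; an empty list means the response looks fine."""
    ctype = (content_type or "").split(";")[0].strip().lower()
    if ctype.startswith(expected_prefix):
        return []  # sanity check passed: the server returned an image

    problems = []
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        root = None  # body is not XML at all

    if root is not None:
        for el in root.iter():
            if local_name(el.tag) == "ServiceException":
                code = el.get("code", "")
                text = (el.text or "").strip()
                problems.append(f"ServiceException [{code}]: {text}")

    if not problems:
        # No parseable service exception: report the unexpected content-type itself.
        problems.append(f"unexpected content-type: {ctype or '(missing)'}")
    return problems
```

A stress loop would call this for every response and write any returned problems to its log instead of silently moving on to the next request.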

 

WmsServerStressPage3.aspx – Same as WmsServerStressPage2.aspx, but with a fake/dummy request at a fixed scale (GWP bug workaround) added after each request.
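The workaround amounts to interleaving one fixed request between the real ones. A minimal Python sketch of that ordering is below; the function name is hypothetical, and `dummy_url` stands for whatever fixed-scale GetMap URL you choose on your own server.

```python
def with_dummy_requests(request_urls, dummy_url):
    """Yield each real request URL followed by the same fixed-scale dummy URL."""
    for url in request_urls:
        yield url
        yield dummy_url
```

For example, `list(with_dummy_requests(["a", "b"], "d"))` gives `["a", "d", "b", "d"]`, so every real request is followed by the dummy one.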

 

 

Examples:

Example screenshot of a logged error. I removed the SRS in GWP Administrator without refreshing this page, so the next request logs an error. This was only to test whether errors are logged.

 

 

 

 

WmsServerStressPage2 – typical log when the crash is triggered. Wrongly parsed requests in a specific order cause the service to fail and restart. Afterwards it recovers and accepts new requests. This repeats regularly every couple of minutes, and after multiple crashes the server becomes unusable.

 

WmsServerStressPage3 – with the modified version that includes the workaround, no issues are found even after 15 hours of running. This suggests that the intermediate request at a larger scale resets the GWP Mapviewer internal buffer, so the error no longer occurs. Of course, this tDPR workaround is not usable in production.

 

 

So if you are using tDPR publishing, please run WmsServerStressPage2 on your dataset and report whether any errors are found on your server. This would help to escalate this issue to Bentley; otherwise it looks as if the issue affects only us, which is not true.

 

 6153.WmsServerStressPage-improved.zip