ProjectWise PowerShell Extensions Forum
    Problem with recursion when using Get-PWFolders in large directories

    Tomas Žukauskas, 1 month ago

    I have a script that traverses all subfolders within a given master directory and extracts certain properties from the documents stored there.

    It works perfectly if the master directory contains a small number (20-40) of subfolders at various depths.

    The problem starts if the master directory contains, say, 7000 subfolders. Then, for some reason, the script traverses all the subfolders, returns to the first one, and tries to repeat the process. In addition, the script seems to scan the subfolders in a random order, i.e. not in the order you would expect from viewing the directory tree in PWE.

    In my code I attempted to mark the visited folders and documents so that the recursive function does not process them again, but it doesn't help.

    To repeat: the script works perfectly well if the directory contains a small number of subfolders.

    Does anyone have any tips on how to properly traverse large directory trees in a PW datasource?

    I am using pwps_dab v23.2.6 and PowerShell v5.1.

    Here is my code:

    <...datasource, login and other initial setup omitted...>
    
    #store already visited folders and docs in hashtable
    $processedFolders = @{}
    $uniqueDocs = @{}
    
    function Get-DocumentsRecursive {
        param (
            [string]$folderPath
        )
    
        if ($processedFolders[$folderPath]) {
            return
        }
    
        Write-Host "Now processing documents in: $folderPath"
    
        # Mark the current folder as processed
        $processedFolders[$folderPath] = $true
    
        $docs = Get-PWDocumentsBySearch -Folderpath $folderPath -GetAttributes
    
        foreach ($doc in $docs) {
            $guid = $doc.DocumentGUID
            if (-not $uniqueDocs[$guid]) {
                $uniqueDocs[$guid] = $doc
            }
        }
    
        # Get only immediate subfolders, excluding the current folder so that we are not stuck forever in the very first folder
        $subfolders = Get-PWFolders -FolderPath $folderPath -Slow -Verbose | Where-Object { $_.FullPath.TrimEnd('\') -ne $folderPath.TrimEnd('\') }
    
        foreach ($subfolder in $subfolders) {
            Get-DocumentsRecursive -folderPath $subfolder.FullPath
        }
    }
    
    $masterFolderPath = '\Companyname\Region\PS6_Garla-P'+[char]0x00E4+'rnu\02_HCL\03_Deliverables\05_Master_Designs\ABC\'
    Get-DocumentsRecursive -folderPath $masterFolderPath
    
    $attributeMap = @{
        "PROP:CreateDate" = "Creation Date";
        "PROP:DocumentCreatorName" = "Created By";
        "PROP:Name" = "File Name";
        "PROP:FolderPath" = "Folder Name";
        "PROP:ProjectID" = "Folder ID";
        "PROP:DocumentGUID" = "Document GUID";
        "PROP:WorkflowState" = "State";
        "PROP:Version" = "Version";
        "CA:dc_document_no" = "File Number";
        "CA:tb_title_line_1" = "Deliverable (Title)";
        "CA:pw_doc_description" = "Document Description";
        "CA:qa_doc_no_status" = "Doc_No_Status";
        "CA:pw_deliverable" = "Is Deliverable";
        "CA:a_attrno" = "Attribute Record ID"
    }
    
    $results = New-Object 'System.Collections.Generic.List[PSCustomObject]'
    
    foreach ($guid in $uniqueDocs.Keys) {
        $document = $uniqueDocs[$guid]
        $row = [ordered]@{} # to maintain the order of properties
        foreach ($key in $attributeMap.Keys) {
            $userFriendlyName = $attributeMap[$key]
    
            if ($key -like "CA:*") {
                $actualKey = $key -replace "CA:"
                $value = $document.CustomAttributes[$actualKey]
            } elseif ($key -like "PROP:*") {
                $actualKey = $key -replace "PROP:"
                $value = $document.$actualKey
            } else {
                continue
            }
            $row[$userFriendlyName] = $value
        }
        $results.Add([PSCustomObject]$row)
    }
    
    
    $results | Export-Excel -Path ".\Output.xlsx" -WorksheetName "Documents" -AutoSize -FreezeTopRow
    
    $sw.Stop()
    
    Write-Host 'Total script run time: ' $sw.Elapsed
    
    
    Undo-PWLogin

    • MWBSI
      Tue, Sep 26 2023 9:03 AM

      If you can, please downgrade the version of PWPS_DAB to version 23.2.0.0, at least to determine whether Get-PWFolders works better in that version.  Significant changes were made to Get-PWFolders last July.

      Sorry for the inconvenience,

      Mark Weisman | Bentley Systems 

    • Glenn Ryan
      Thu, Sep 28 2023 2:14 AM

      I may be missing something obvious here, but why couldn't you just do the following:

      $rootFolderPath = 'Your\Special\ProjectWise\Path\Goes\Here'
      
      $folders = Get-PWFolders -FolderPath $rootFolderPath -PopulatePaths # this took just over 7 seconds to traverse the approx 12000 folders
      
      foreach ($folder in $folders) {
      	# get this folders documents, if any
      	$folderDocuments = $folder.GetFolderDocuments()
      
      	# loop folder documents
      	foreach ($folderDocument in $folderDocuments) {
      		# do something with document
      	}
      }

      BTW, in your code on line 31, Get-PWFolders returns ALL the folders under the specified path, and then you filter that result with Where-Object, which is extra processing IMHO.

      If you really wanted to do recursion, I suppose you could do something like this:

      1. Call Get-PWFolders -FolderPath SomePath -JustOne to get the starting folder.

      2. Call Get-PWFoldersImmediateChildren, passing the folder from step 1.

      3. Loop over the children in a recursive function, getting documents and their immediate child folders all the way down.
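
      The three steps above can be sketched without a ProjectWise connection. This is only an illustration under stated assumptions: Invoke-FolderWalk and the $tree lookup table are hypothetical names made up for the example, and the script block stands in for whatever call returns a folder's immediate children. The point it shows is that each call asks for one level only, and a HashSet of visited IDs guarantees no folder is processed twice.

      ```powershell
      # Walk a folder tree depth-first, visiting each folder ID exactly once.
      # Invoke-FolderWalk is a hypothetical helper, not part of pwps_dab.
      # $GetChildren must return only the IMMEDIATE child IDs of one folder.
      function Invoke-FolderWalk {
          param(
              [Parameter(Mandatory)] [int] $FolderId,
              [Parameter(Mandatory)] [scriptblock] $GetChildren,
              [System.Collections.Generic.HashSet[int]] $Visited = [System.Collections.Generic.HashSet[int]]::new()
          )

          # HashSet.Add returns $false when the ID is already in the set
          if (-not $Visited.Add($FolderId)) { return }

          # Emit the folder ID; a real script would fetch this folder's documents here
          $FolderId

          foreach ($childId in (& $GetChildren $FolderId)) {
              Invoke-FolderWalk -FolderId $childId -GetChildren $GetChildren -Visited $Visited
          }
      }

      # Stand-in tree (parent ID -> child IDs); folder 4 is deliberately reachable twice
      $tree = @{ 1 = @(2, 3); 2 = @(4); 3 = @(4); 4 = @() }
      $order = @(Invoke-FolderWalk -FolderId 1 -GetChildren { param($id) $tree[$id] })
      Write-Host "Visited: $($order -join ', ')"
      ```

      Against a real datasource, the script block would presumably wrap Get-PWFoldersImmediateChildren -FolderID and return each child's ProjectID, as used elsewhere in this thread; I have not tested that wiring.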

      Recursion is a handy trick and nice to have in the toolbox, but it can get tricky.

      I hope this helps.

      Cheers

    • Kevin van Haaren
      Thu, Sep 28 2023 2:50 PM

      I needed something similar, so I rewrote yours a bit to just loop through all the folders/files directly. You don't need to worry about unique folders/documents because it touches each file exactly once.

      # New-PWLogin
      $start = Get-Date
      $masterFolderPath = '\Companyname\Region\PS6_Garla-Pärnu\02_HCL\03_Deliverables\05_Master_Designs\ABC\'
      
      # get just the sub-folders immediately under the master path
      Write-Host "Retrieving sub-folders from $($masterFolderPath)"
      $pwfList = Get-PWFoldersImmediateChildren -FolderPath $masterFolderPath | Sort Name
      
      # if you want $attributeMap to always be in order, then make this an ordered hashtable as well
      $attributeMap = [Ordered]@{
      	"PROP:CreateDate" = "Creation Date"
      	"PROP:DocumentCreatorName" = "Created By"
      	"PROP:Name" = "File Name"
      	"PROP:FolderPath" = "Folder Name"
      	"PROP:ProjectID" = "Folder ID"
      	"PROP:DocumentGUID" = "Document GUID"
      	"PROP:WorkflowState" = "State"
      	"PROP:Version" = "Version"
      	"CA:dc_document_no" = "File Number"
      	"CA:tb_title_line_1" = "Deliverable (Title)"
      	"CA:pw_doc_description" = "Document Description"
      	"CA:qa_doc_no_status" = "Doc_No_Status"
      	"CA:pw_deliverable" = "Is Deliverable"
      	"CA:a_attrno" = "Attribute Record ID"
      }
      
      $results = [System.Collections.ArrayList]@()
      # Loop through each folder
      ForEach ($pwf in $pwfList) {
      	Write-Host "Processing folder $(Join-Path $masterFolderPath $pwf.Name). " -NoNewLine
      	# Get all documents under the sub-folder (including documents in the sub-folders under current folder)
      	$pwdList = Get-PWDocumentsBySearch -FolderID $pwf.ProjectID -Slow | Sort FullPath
      	Write-Host "Found $($pwdList.count) documents"
      	# Loop variable is $pwDoc rather than $pwd, because $PWD is PowerShell's automatic current-location variable
      	ForEach ($pwDoc in $pwdList) {
      		Write-Host "Processing file $($pwDoc.FullPath)"
      		$row = [ordered]@{}                                 # maintain key order
      		ForEach ($k in $attributeMap.keys) {
      			$colName = $attributeMap.$k
      			($type,$attName) = $k.split(':',2)
      			if ($type -eq 'PROP') {
      				$row.$colName = $pwDoc.$attName
      			} elseif ($type -eq 'CA') {
      				$row.$colName = $pwDoc.CustomAttributes[$attName]
      			} else {
      				Write-Warning "Improper key in attribute map: $($k)"
      				continue
      			}
      		}
      		[void]$results.Add([PSCustomObject]$row)
      	}
      }
      
      $results | Export-Excel -Path '.\output.xlsx' -WorkSheetName 'Documents' -AutoSize -FreezeTopRow
      
      $runTime = ((Get-Date) - $start).TotalSeconds
      Write-Host "Total Script Run Time: $($runTime) seconds"
      Undo-PWLogin
      

      It uses Get-PWFoldersImmediateChildren to get just the sub-folders under the specified folder. It then processes all the documents under that folder in one go.

      Get-PWFolders without the -JustOne option and Get-PWDocumentsBySearch without -JustThisFolder both recurse through all sub-folders automatically, so you don't need to write code to recurse again. I used Get-PWFoldersImmediateChildren to get around the issues with really big folder/document structures.

      A few notes:

      • If you save your script file in the UTF-8 with BOM (the with BOM is important) encoding you can embed unicode characters directly in your script without needing to use things like [char]0x00E4
        • In Notepad you can do this by doing a Save As... then changing the encoding next to the Save button
        • In Notepad++ this is in the Encoding menu at the top
        • Not sure where it is in the IDE
      • If you want the columns to always be in order, make the attributeMap [Ordered] as well, otherwise when you add a key it may change the order of the keys when you do $attributeMap.keys.
      • I changed the results to an ArrayList. Like the generic List you had, it grows in place when you call .Add(); it is a plain array with += that creates a whole new array, adds the new entry, then deletes the old one.
      • ($type,$attName) = $k.split(':',2) is a cute way to split a string into 2 parts and assign each part to a different variable. I split on the colon
      • I added sorting by the folder paths; I'm not sure whether that slows it down much. It probably depends on the number of folders and files you have.
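
      Two of the notes above, the [Ordered] attribute map and the split on the colon, can be tried in isolation. This is a minimal sketch with a shortened map; the $parsed variable and its Type/Attribute/Column property names are made up for the illustration.

      ```powershell
      # Ordered dictionary: keys enumerate in the order they were written
      $attributeMap = [ordered]@{
          'PROP:Name'         = 'File Name'
          'CA:dc_document_no' = 'File Number'
          'PROP:Version'      = 'Version'
      }

      $parsed = foreach ($k in $attributeMap.Keys) {
          # Split on the first colon only; the limit of 2 keeps any later colons in $attName
          $type, $attName = $k.Split(':', 2)
          [PSCustomObject]@{ Type = $type; Attribute = $attName; Column = $attributeMap[$k] }
      }

      foreach ($p in $parsed) {
          Write-Host "$($p.Type) / $($p.Attribute) -> $($p.Column)"
      }
      ```

      With a plain @{} hashtable the loop would still work, but the column order in the export would not be guaranteed.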

      This solution still requires a huge amount of memory to hold all the documents in a single array. If this is a problem, you could export each folder's results as you go, using Export-Excel's -Append switch to add rows to the end of the existing worksheet, by changing the loop:

      # Start from a clean workbook, otherwise -Append adds to the rows from a previous run
      Remove-Item '.\output.xlsx' -ErrorAction SilentlyContinue
      
      # Loop through each folder
      ForEach ($pwf in $pwfList) {
      	# reset results array for each folder
      	$results = [System.Collections.ArrayList]@()
      
      	Write-Host "Processing folder $(Join-Path $masterFolderPath $pwf.Name). " -NoNewLine
      	# Get all documents under the sub-folder (including documents in the sub-folders under current folder)
      	$pwdList = Get-PWDocumentsBySearch -FolderID $pwf.ProjectID -Slow | Sort FullPath
      	Write-Host "Found $($pwdList.count) documents"
      	# Loop variable is $pwDoc rather than $pwd, because $PWD is PowerShell's automatic current-location variable
      	ForEach ($pwDoc in $pwdList) {
      		Write-Host "Processing file $($pwDoc.FullPath)"
      		$row = [ordered]@{}                                 # maintain key order
      		ForEach ($k in $attributeMap.keys) {
      			$colName = $attributeMap.$k
      			($type,$attName) = $k.split(':',2)
      			if ($type -eq 'PROP') {
      				$row.$colName = $pwDoc.$attName
      			} elseif ($type -eq 'CA') {
      				$row.$colName = $pwDoc.CustomAttributes[$attName]
      			} else {
      				Write-Warning "Improper key in attribute map: $($k)"
      				continue
      			}
      		}
      		[void]$results.Add([PSCustomObject]$row)
      	}
      	# Append results for current folder to end of workbook (skip folders with no documents)
      	if ($results.Count -gt 0) {
      		$results | Export-Excel -Path '.\output.xlsx' -WorkSheetName 'Documents' -AutoSize -FreezeTopRow -Append
      	}
      	# clear out array from memory
      	$results = $null
      }
      
      

       

      Answer Verified By: Tomas Žukauskas 

    • Kevin van Haaren
      Thu, Sep 28 2023 2:51 PM, in reply to Kevin van Haaren

      Oh yeah, I assumed your $sw variable was a stopwatch. I prefer to do timings by just saving Get-Date at the start and subtracting at the end. No real reason, I just find it easier to set up.
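
      For comparison, here are the two timing approaches side by side. This is a small sketch; the Start-Sleep call is only a stand-in for the real work, and it assumes, as above, that $sw in the original script was a System.Diagnostics.Stopwatch.

      ```powershell
      # Option 1: a Stopwatch object
      $sw = [System.Diagnostics.Stopwatch]::StartNew()
      # Option 2: remember the start time and subtract at the end
      $start = Get-Date

      Start-Sleep -Milliseconds 200   # stand-in for the real work

      $sw.Stop()
      $runTime = ((Get-Date) - $start).TotalSeconds

      Write-Host "Stopwatch: $($sw.Elapsed.TotalSeconds) seconds"
      Write-Host "Get-Date:  $runTime seconds"
      ```

      Both give the elapsed wall-clock time; the Stopwatch has finer resolution, which rarely matters for a script that runs for minutes.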

    • Kevin van Haaren
      Thu, Sep 28 2023 2:56 PM, in reply to Kevin van Haaren

      If this still has too many folders to process at once, you could go another level down with Get-PWFoldersImmediateChildren:

      $pwfList = Get-PWFoldersImmediateChildren -FolderPath $masterFolderPath
      ForEach ($pwf in $pwfList) {
          # second level: immediate children of each first-level folder
          $pwfSubList = Get-PWFoldersImmediateChildren -FolderID $pwf.ProjectID
          ForEach ($subPwf in $pwfSubList) {
              $pwdList = Get-PWDocumentsBySearch -FolderID $subPwf.ProjectID
              # process $pwdList here, as in the loop above
          }
      }
