How to move large amounts of data in batches

We are starting to develop a project archive process, where we move finalized projects to a new archive datasource. A few of the projects are quite large and keep filling up my ICS server (more disk space is not available at the moment). I would be interested in a script that could move a large project in batches, say one folder at a time.

Kirk Peterson

  • If you are looking at relocating large volumes of projects from datasource to datasource, I would suggest you look at these options. Every environment is different, but PowerShell can handle it; it's just a question of tweaking everything correctly. In our case the folder variable, $SourceFolder (the top folder containing the projects), was pulled from a CSV.

    We created a CSV file with hundreds of entries that the script would loop through one at a time. Each folder would hold 15-20 projects, and the script would process it as a block before going on to the next folder. The temp folder holding the previously exported data is deleted and the process starts over on each pass through the loop.

    Use Excel to create a simple INPUT.CSV with a single column, then read the file and loop over it. If you have a very complex structure like we do, add in the CONCATENATE function and use Excel to get everything sorted out. Give the column a header (NAME): Import-Csv treats the first row as the property names, so that is the name you use to pull the data in PowerShell, for example $Foldername = $File.NAME.

    $Files = Import-Csv "d:\input.csv"
    ForEach ($File in $Files)

    {  Start your loop of what you want done with the folder structure

    • Open-PWConnection (source)
    • Export-PWAccessControlToExcel  -InputFolder $SourceFolder etc etc
    • Export-PWDocumentsToArchive -ProjectWiseFolder $SourceFolder etc etc
    • $folders = Get-PWFolders -FolderPath $SourceFolder -Slow
    • Close-PWConnection

    This exports everything to a folder on disk, and the $folders variable now holds every folder in the structure.

    Then in the new datasource area of your script look into handling the import. This is still part of the loop.

    • Open-PWConnection (destination)
    • foreach($folder in $folders) {New-PWFolder -FolderPath ("\" + $folder.FullPath) -StorageArea 'yourstorage'} (this creates the new folder structure)
    • Import-PWAccessControlFromExcel (this gets your access control sorted out)

    If your projects use specific environments, workflows etc., set those next. In our case a set folder structure allowed me to identify each one and use the following in combination.

    • Set-PWEnvironmentByFolderPath (lots of options here to set environments where needed which is important before importing the data)
    • Set-PWWorkflowByFolderPath (again lots of ways to set workflows to new folders before you bring the data in)

    Now that the environments, workflows and security are set on your folders, import the data. Everything slides right in: document attributes, workflow states, audit trail, versions etc. If you bring in data while the environments are missing, the document attribution is lost. You can then delete the data and re-import after you fix the environment.

    To delete the imported documents you can always use Get-PWDocumentsBySearch -Environment YourEnvirName -FileName %.* -FolderPath $SourceFolder | Remove-PWDocuments.

    • Import-PWDocumentsFromArchive
    • Close-PWConnection

    END THE LOOP

    }
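The shape of the loop above, one scratch folder per batch that is removed before the next batch starts, can be sketched roughly as follows. This is only a sketch under assumptions: the function name Invoke-BatchedMove, its parameters, and the batch_0001 style folder names are hypothetical, and the two script blocks are placeholders for the source-side and destination-side ProjectWise steps listed above, which are deliberately not filled in here.

```powershell
# Sketch only. Invoke-BatchedMove and its parameters are hypothetical names.
# The script blocks stand in for the ProjectWise steps described above.
function Invoke-BatchedMove {
    param(
        [Parameter(Mandatory)] [string] $CsvPath,          # CSV with a NAME column
        [Parameter(Mandatory)] [string] $TempRoot,         # scratch area for exports
        [Parameter(Mandatory)] [scriptblock] $ExportStep,  # placeholder: source-side work
        [Parameter(Mandatory)] [scriptblock] $ImportStep   # placeholder: destination-side work
    )

    $rows = @(Import-Csv -Path $CsvPath)
    $index = 0
    foreach ($row in $rows) {
        $index++
        $sourceFolder = $row.NAME

        # One scratch folder per batch, so only one batch is on disk at a time.
        $tempFolder = Join-Path $TempRoot ("batch_{0:D4}" -f $index)
        New-Item -ItemType Directory -Path $tempFolder -Force | Out-Null

        try {
            # Each step receives the folder name and the scratch path.
            & $ExportStep $sourceFolder $tempFolder
            & $ImportStep $sourceFolder $tempFolder
        }
        finally {
            # Free the disk space before the next folder, even if a step threw.
            Remove-Item -Path $tempFolder -Recurse -Force -ErrorAction SilentlyContinue
        }
    }
}
```

Each script block is called with two arguments, the folder name from the CSV and the scratch path for that batch. If a step throws, the scratch folder is still removed and the error then stops the run, which seems the safer default for an unattended overnight job.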

    In a nutshell that is what I did. Bottom line, it worked. We moved several thousand projects, just under 1 TB of data, from an existing datasource to a new one, done completely with PowerShell. A couple of things I found: you need to make sure any userlists, users, workflows etc. that exist in the source datasource also exist in the new one, even if they are not needed. The biggest bottleneck I ran into was that Import-PWAccessControlFromExcel would crash if a single user or userlist was missing. Example: if a user is assigned specifically to a folder or a document and this user does not exist in the new datasource, it crashes. So spend the time prepping the new one; it will make your life easier. Open the spreadsheet after you create it and you will soon see what I am talking about.
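One way to catch missing users or userlists before the access control import is to compare the names the export refers to against the names present in the destination. The sketch below shows only the comparison. The function name Get-MissingPrincipal is hypothetical, how you collect the two lists (from the spreadsheet and from the new datasource) is left to you, and matching names without regard to case is an assumption.

```powershell
# Sketch only. Get-MissingPrincipal is a hypothetical helper name.
# It is a plain set difference and makes no ProjectWise calls.
function Get-MissingPrincipal {
    param(
        [string[]] $Required,   # names referenced by the exported access control
        [string[]] $Available   # names that exist in the destination datasource
    )

    # Assumption: names are compared case-insensitively.
    $have = [System.Collections.Generic.HashSet[string]]::new(
        [System.StringComparer]::OrdinalIgnoreCase)
    foreach ($name in $Available) {
        if ($name) { [void]$have.Add($name.Trim()) }
    }

    # Return each required name that the destination lacks, once.
    $Required |
        Where-Object { $_ -and -not $have.Contains($_.Trim()) } |
        ForEach-Object { $_.Trim() } |
        Sort-Object -Unique
}

# Made-up names for illustration only.
$missing = Get-MissingPrincipal -Required 'user1', 'CAD Admins', 'user3' `
                                -Available 'USER1', 'CAD Admins'
# $missing contains 'user3'
```

Anything this returns would need to be created in the new datasource before running the import.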

    Also use system accounts whose logins do not expire. If you are moving massive projects, these sessions will time out on you, and you will be greeted with a login prompt after the script has been running all night.

    TEST TEST TEST. 

    Hope this helps and makes some sense. It worked for me.

    d