Excluding files from Full Text Indexing (FTR) and re-introducing them later

If you exclude files from being indexed and then a few months later decide to include them, will they get processed?

To answer this question, we first need to understand how Full Text Indexing works.

Full Text Indexing processes each file in two passes:

  • First, it extracts some basic information about each file. It must do this for every file before starting the second pass.
  • On the second pass, it extracts the full text only from files that are not excluded in the PW admin document processor settings.

For example:

  • Say you have 10 files in the datasource, 2 of which have the *.abc extension, which has been excluded from processing.
  • On the first pass, the FTR process extracts basic information from all 10 files.
  • Once the first pass has been completed for all 10 files, it goes back and processes (extracts the text from) the 8 files that do not have the *.abc extension.

So the extraction process ignores the exclusion settings on the first pass and processes every file, but it honors them on the second pass and does not extract text from excluded file types. It can do this because, when it processes a file for the first time and finds that it is an “excluded” file, it flips a bit in the database for that file so that the second pass skips it.
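To make the two passes concrete, here is a minimal Python sketch that replays the 10-file example. It is a simplified in-memory model, not the actual FTR implementation: `FileRecord`, `run_ftr` and the `skip_text` field (standing in for the database bit) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileRecord:
    name: str
    basic_info: bool = False      # filled in on pass 1
    skip_text: bool = False       # hypothetical stand-in for the "excluded" bit
    text_extracted: bool = False  # filled in on pass 2


def run_ftr(files: list[FileRecord], excluded: set[str]) -> list[str]:
    # Pass 1: every file gets basic info; the exclusion list only decides the bit.
    for f in files:
        f.basic_info = True
        f.skip_text = PurePosixPath(f.name).suffix.lower() in excluded
    # Pass 2 starts only after pass 1 has covered all files,
    # and extracts text only where the bit was not flipped.
    extracted = []
    for f in files:
        if not f.skip_text:
            f.text_extracted = True
            extracted.append(f.name)
    return extracted


# 8 ordinary files plus 2 *.abc files, with *.abc excluded
files = [FileRecord(f"doc{i}.docx") for i in range(8)]
files += [FileRecord("a.abc"), FileRecord("b.abc")]
done = run_ftr(files, {".abc"})
print(len(done))                          # 8
print(all(f.basic_info for f in files))   # True
```

All 10 files get their basic information, but only the 8 whose bit was not flipped reach text extraction.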

If files of the excluded type have already been through the first pass, removing the exclusion will not automatically reset their bit so that they are reprocessed. After removing the file type from the exclusion list, you need to make a change to each file so that its bit is set and its full text is extracted. Here are a few ways to do this:

  1. Simply wait for your users to work with the files: checking them out, editing them, and checking them back in.
  2. Search for all files in the datasource with that extension, then check the files out and back in via PWE. If there are many files, you may want to do this in batches.
  3. Remark the folders containing the files for reprocessing via PW admin. Unfortunately, you cannot remark individual files; you can only select folders.
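The following minimal Python sketch shows why removing the exclusion is not enough on its own, and how a check-in or a folder remark changes that. It is a toy in-memory model, not ProjectWise code: `FileRecord`, `run_ftr`, `check_in` and `remark_folder` are hypothetical names, the `skip_text` field stands in for the database bit, and we assume that a check-in re-evaluates that bit against the current exclusion list.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileRecord:
    name: str
    folder: str
    pass1_done: bool = False
    skip_text: bool = False       # hypothetical stand-in for the database bit
    text_extracted: bool = False


def is_excluded(name: str, excluded: set[str]) -> bool:
    return PurePosixPath(name).suffix.lower() in excluded


def run_ftr(files: list[FileRecord], excluded: set[str]) -> list[str]:
    # Pass 1 runs only for files that have never had it, so the bit
    # is NOT re-evaluated when the exclusion list changes later.
    for f in files:
        if not f.pass1_done:
            f.pass1_done = True
            f.skip_text = is_excluded(f.name, excluded)
    # Pass 2 trusts the stored bit rather than the current settings.
    extracted = []
    for f in files:
        if not f.skip_text and not f.text_extracted:
            f.text_extracted = True
            extracted.append(f.name)
    return extracted


def check_in(f: FileRecord, excluded: set[str]) -> None:
    # Options 1 and 2 (assumed behaviour): changing the file re-evaluates
    # its bit against the current exclusion list and queues text extraction.
    f.skip_text = is_excluded(f.name, excluded)
    f.text_extracted = False


def remark_folder(files: list[FileRecord], folder: str) -> None:
    # Option 3: every file in the folder starts over at pass 1.
    for f in files:
        if f.folder == folder:
            f.pass1_done = f.skip_text = f.text_extracted = False


files = [FileRecord("plan.dgn", "/A"), FileRecord("old.abc", "/A"),
         FileRecord("spec.abc", "/B")]
r1 = run_ftr(files, {".abc"})  # ['plan.dgn']
r2 = run_ftr(files, set())     # [] - exclusion removed, bits still set
check_in(files[2], set())
r3 = run_ftr(files, set())     # ['spec.abc']
remark_folder(files, "/A")
r4 = run_ftr(files, set())     # ['plan.dgn', 'old.abc'] - both passes again
```

Remarking folder `/A` also re-extracts `plan.dgn`, which had already been processed. On a large datasource, that repeated work is what makes remarking expensive.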

*PLEASE NOTE*: when you remark a folder for reprocessing, the FTR process starts over for that folder and runs through both passes again. We do not recommend remarking the entire datasource for reprocessing; with a large dataset, it will set full text retrieval far behind.