This journal mailbox lives on its own VM (ie 1 physical server = 1 VM = 1 mailbox database = 1 mailbox, an unusual combination I am sure!) on a powerful but now out of warranty server and uses Hyper-V replica to keep a hot backup of the machine, which is not critical. It would be beneficial to split it into say one mailbox per year but I have not got round to investigating that yet.
When the OWA search failed to retrieve enough records I that I knew existed I decided to use PowerShell instead using the ContentFilter search property ALL, which Microsoft describes as
This property returns all messages that have a particular string in any of the indexed properties. For example, use this property if you want to export all messages that have "Ayla" as the recipient, the sender, or have the name mentioned in the message body. |
Searching all indexable properties:
new-mailboxexportrequest -mailbox journalmailbox -filepath "\\unc\share\filename.pst" -ContentFilter {(all -like '*keyword1*') -and (all -like '*keyword2*') -and (Sent -gt '01/01/2010') }
This job finished in a few seconds. I looked in the pst file and it contained 250 records. I changed the search criteria with different dates and keywords. There were always exactly 250 results. Looks like 250 is the hardcoded results limit when using the ALL search.
I worked around the problem by splitting my query and adding a date filter to pull back only 1 month at a time, which gave me 12 pst files for a year. This was much more work but it was not worth scripting as a one off job. Then I tried using the same keywords to search only the subject the subject
Searching the subject only:
new-mailboxexportrequest -mailbox journalmailbox -filepath "\\unc\share\filename.pst" -ContentFilter {(subject -like '*keyword1*') -and (subject -like '*keyword2*') -and (Sent -gt '01/01/2010') }
This search brought back all the records meeting the criteria but took much longer to run. It did not bring back all the records I was looking for as it does not look 'everywhere'' in the message.
I can understand why the default behaviour of ALL is to limit the results as using the index means a lookup in the main email database for each item found in the search index - in SQL Server this kind of behaviour is called a bookmark lookup (it is not exactly the same as SQL Server but comparable), where a search in an index points to where the actual data resides, and is a very expensive operation, frequently involving expensive random disk lookups. Imagine if someone searched for something like the word THE then that would be in almost all emails, it would create a real performance hit.
The documentation needs to have added that (presumably for performance reasons) only 250 records will be returned, or a switch added that allows the user to specify the maximum number of records number of records to be returned that acknowledges that performance will be hit. Or am I missing something?
No comments:
Post a Comment