Folder scan is 4x slower than EFU creation (with possible root cause)

Found a bug in "Everything"? report it here
Post Reply
zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Mon Mar 04, 2019 11:25 am

Hi,
I have a few shares indexed on our SAN which total about 12M files and 500K folders. I need to re-scan them daily to update the index.
Having the shares on Indexes->Folders, the re-indexing takes about 16h (and frequently crashes mid-way). Change monitoring is OFF.

Indexing the same shares to EFU (everything -create-file-list) takes about 4h.
That's 4x faster.

I analyzed network accesses during both scans, and the root cause seems clear:
- EFU scanning does a depth-first scan of the tree
- Folder scan does a breadth-first scan of the tree

The breadth-first is extremely inefficient on modern storage because it does not take advantage of folder Caching. Imagine a very reduced folder set and a disk with a very small Cache, enough to hold only 5 folders:
Depth first:

Code: Select all

read \\folder1
read \\folder1\sub1		(cache hit on \\folder1)
read \\folder1\sub1\sub2	(cache hit on \\folder1, cache hit on sub1)
read \\folder1\sub1\sub2\sub3	(cache hit on \\folder1, cache hit on sub1, cache hit on sub2)
read \\folder1\sub4		(cache hit on \\folder1)
read \\folder2
read \\folder2\sub5		(cache hit on \\folder2)
... and so on. Many hits.

Breadth first:

Code: Select all

read \\folder1
read \\folder2
read \\folder3
read \\folder4
read \\folder5
read \\folder6		(replaces folder1 in cache)
read \\folder7		(replaces folder2 in cache)
(...)
read \\folder1\sub1	(no cache hit, folder1 is no longer in cache)
read \\folder2\sub2	(no cache hit, folder2 is no longer in cache)
read \\folder3\sub3	(no cache hit, folder3 is no longer in cache)
... and so on. Almost no cache hits (depending on cache size).

So I think there's an easy fix for Folder scanning speed - just switch to Depth first :)

Thanks!

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Tue Mar 05, 2019 10:12 am

Thanks for bring this to my attention.

Everything 1.4.1.936 now does a depth-first scan.
and frequently crashes mid-way
Could you provide any more detail? an exception code?
Would it be possible to send a crash dump?

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Tue Mar 05, 2019 12:06 pm

Thanks, awesome RTT from you :)
I'll test this new build all let you know the scan time.

For the crashes: Everything would just freeze with no error, forcing me to kill the process. I've had Debug logging enabled for some time, but not anymore - I'll reenable and try to reproduce. I remember on one ocasion the debug log ended with something similar to this:

Code: Select all

2019-02-27 04:48:55.803: update folder start \\fradfs\ft\LOGS
(...)
2019-02-27 06:58:07.355: scan completed in 3456.471238 seconds
2019-02-27 06:58:07.433: scan completed in 3456.547078 seconds
2019-02-27 06:58:07.433: MSG: 00000000000201a0 1401 0000000000000000 0000000000000000
2019-02-27 06:58:07.449: EVENT: 000000014004c710 000000000034f810
2019-02-27 06:58:07.449: rescan update db
2019-02-27 06:58:07.465: updated fd for <filename> in 0.000841 seconds
 => followed by hundreds of lines similar to this last one, nothing afterwards
Btw, the printed scan duration times are wrong, as you can see from the timestamps.
Another annoying issue this exposed: after a successful Folder rescan the DB is not saved to disk. So if Everything crashes... that scan is lost, it just reloads the old database.

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Tue Mar 05, 2019 1:14 pm

I'll test this new build all let you know the scan time.
Thank you.
For the crashes: Everything would just freeze with no error, forcing me to kill the process.
Thanks for the debug output.

Everything could be processing the folder rescan, which would take a very very long time.
Does Everything ever recover?
Btw, the printed scan duration times are wrong, as you can see from the timestamps.
Everything uses 32bits for storing time in microseconds, which has a resolution of about 71 minutes.
I'll update this to 64bits for the x64 version.
Another annoying issue this exposed: after a successful Folder rescan the DB is not saved to disk. So if Everything crashes... that scan is lost, it just reloads the old database.
The next major version of Everything will flush the database to disk every 24 hours.
To force Everything to flush the database to disk, run the following command:
Everything.exe -update

therube
Posts: 2293
Joined: Thu Sep 03, 2009 6:48 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by therube » Tue Mar 05, 2019 7:06 pm

Quick look tells me that this makes a huge difference.

Thank you both :-).
Now, I'm not so sure... I'll have to check further, later...
(The other day, a rescan took forever. Today it's speedy, no matter what version of Everything, so I've got to figure that out?)
I was scanning a different system :oops:.



Anyhow, scanning my slow link (Folder Index) & still not "scientific"...

v936 ... 9 min
v924 ... 11.5 min

So better, yes.

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Wed Mar 06, 2019 10:59 am

Well, it seems index time is similar for me with this version, though I see on the log it now does Depth First. There goes my "cache" theory...

Is it possible that the cost of adding new entries to the index keeps growing as the index grows? With 12M files, 500K folders and 25 tree depth, this seems a possibility. I know there's also interference from other processes running overnight on the filers (backup, sync), but even so, creating an EFU file is much faster during the same time period:

Code: Select all

Everything - Folder indexing v935:      15:48:04
Everything - Folder indexing v936:      15:49:12
Everything - EFU indexing v935/v936:    03:40:06
I've written a tool to create EFUs, and it supports a much faster "update" mode. I'll publish it soon to Github. Here's the results for my tool on the same file shares:

Code: Select all

EFUtool create (full scan):             03:41:21  (same as Everything EFU creation)
EFUtool update (scan for changes):      00:29:45
That's a massive 7x improvement over Everything EFU and 32x improvement over using Everything Folder indexing 8-) This is the solution I'm using for the last few days. I've compared EFUs produced by my tool in update mode with Everything's full scan, and they match.
Last edited by zybexXL on Wed Mar 06, 2019 5:22 pm, edited 1 time in total.

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Wed Mar 06, 2019 11:04 am

The fast scanning method I'm using is one I've talked about here some 2 years ago:
viewtopic.php?f=5&t=4474&p=17571#p17571

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Wed Mar 06, 2019 5:24 pm

As promised, here's the EFUtool I wrote:
https://github.com/zybexXL/EFUtool

Is it OK to announce it in "General" or "3rd Party Software" forum?

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Thu Mar 07, 2019 5:56 am

Is it possible that the cost of adding new entries to the index keeps growing as the index grows?
Yes, this is most likely the issue.
Please check your logs for:
rescan update db
leave folder update

Please check the log time between rescan update db and leave folder update.
During this time Everything is processing the folder rescan.
I think you will see Everything is spending most of its time here.

Are there many changes to your files over 24 hours, seems to me you must have millions of changes for Everything to take 10+ hours to update the db.

I think you will also find it will be faster for Everything to perform a re-index than update the db.

Please try disabling folder rescanning and schedule a reindex:
  • In Everything, from the Tools menu, click Options.
  • Click the Folders tab.
  • For each folder listed:
    • Check Never rescan.
  • Click OK.
  • Create a Windows scheduled task to run the following command:
    Everything.exe -reindex
The fast scanning method
I've added to my TODO list: use FindFirstFileEx to find folders only to build a list of modified folders to rescan during a folder update.
Is it OK to announce it in "General" or "3rd Party Software" forum?
Thanks for sharing your EFU tool.

Yes, please post this on the 3rd party forum.

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Thu Mar 07, 2019 7:21 am

I've added more debug information to Everything 1.4.1.938 if you would like to do further testing.

This version will list all database changes after a rescan has been completed.

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Thu Mar 07, 2019 10:07 am

Please check the log time between rescan update db and leave folder update.
During this time Everything is processing the folder rescan.
I think you will see Everything is spending most of its time here.
Nope - here are those timestamps, for each of the 5 shares that are being scanned:

Code: Select all

2019-03-05 23:00:02.264: update folder start <SHARE1>
2019-03-06 02:46:56.855: rescan update db
2019-03-06 02:47:26.994: leave folder update

2019-03-06 02:47:26.994: update folder start <SHARE2>
2019-03-06 05:03:15.213: rescan update db
2019-03-06 05:06:05.177: leave folder update

2019-03-06 05:06:05.192: update folder start <SHARE3>
2019-03-06 05:10:56.901: rescan update db
2019-03-06 05:10:58.367: leave folder update

2019-03-06 05:10:58.383: update folder start <SHARE4>
2019-03-06 10:48:09.858: rescan update db
2019-03-06 10:48:38.453: leave folder update

2019-03-06 10:48:38.469: update folder start <SHARE5>
2019-03-06 14:49:14.867: rescan update db
2019-03-06 14:49:34.070: leave folder update
Total scan time 15:49h, of which only about 4 minutes doing "rescan update DB".
I'll try to anonymize the full debug log and send it to you. My company is quite paranoid about these things, though the filenames/paths in our case do reveal quite a lot of info :roll:
Are there many changes to your files over 24 hours, seems to me you must have millions of changes for Everything to take 10+ hours to update the db.
No, it can reach some 100.000 daily changes max, usually much lower. Some days where lots of data is compressed/archived it may go up to 500k (mostly deleted files), but that's rare. File churn is really high here... most changes are either new or deleted files, not modified files.
I think you will also find it will be faster for Everything to perform a re-index than update the db.
(...)
[*]Create a Windows scheduled task to run the following command:
Everything.exe -reindex
I'll try this. But does "reindex" also perform a folder scan? Isn't this the same thing?
"I've added to my TODO list: use FindFirstFileEx to find folders only to build a list of modified folders to rescan during a folder update."
Great :) You can check my code for the algorithm, though I didn't comment it much. It's a fairly simple concept.

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Thu Mar 07, 2019 10:42 am

Thanks for the logs.
OK, scrap the idea of updating the db taking too long.
I'll try this. But does "reindex" also perform a folder scan? Isn't this the same thing?
No need to try this.
No need for anymore logs at this stage, thanks.
Indexing the same shares to EFU (everything -create-file-list) takes about 4h.
That's 4x faster.
When you tested this, did you scan all 5 shares at the same time, or one after the other?

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Thu Mar 07, 2019 11:08 am

When you tested this, did you scan all 5 shares at the same time, or one after the other?
Always sequentially, in all tests. With my tool as well.

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Thu Mar 07, 2019 11:17 am

Do you have any exclude filters set under Tools -> Options -> Exclude?

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Thu Mar 07, 2019 11:42 am

No. Here's my everything.ini attached.
Currently running a test with only 1 file share to see if there's difference between first time scan and subsequent re-scans.
You do not have the required permissions to view the files attached to this post.

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Thu Mar 07, 2019 11:58 am

As you can see, the only non-default options are on the "indexes" tab: index by file size, folder size, date created, date modified, and several "fast sort" options enabled.

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Thu Mar 07, 2019 12:26 pm

Thanks for Everything.ini.

Everything uses THREAD_MODE_BACKGROUND_BEGIN when updating folders.

Could you please try disabling THREAD_MODE_BACKGROUND_BEGIN:
  • In Everything, type in the following search and press ENTER:
    /folder_update_thread_mode_background=0
Please let me know if this improves folder updating for you.
I'm looking into a fast update option as you suggest.

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Thu Mar 07, 2019 3:22 pm

That's it :)
I tested with just one of my shares containing about 4M files, and checked the thread details while the scans were running.
Results for Folder Indexing:

First Scan when folder is added: 00:35:27
Thread priority = 8, I/O Priority = Normal, Memory Priority = 5, CPU usage: 5%

Rescan: 02:02:30
Thread priority = 4, I/O Priority = Very Low, Memory Priority = 1, CPU usage: 1%

Rescan with folder_update_thread_mode_background=0: 00:32:23
Thread priority = 8, I/O Priority = Normal, Memory Priority = 5, CPU usage: 5%

Looks like the low priority assigned to the thread also causes "Very Low" I/O Priority and even lower Memory Priority. CPU usage is much lower too as a consequence.

Looks like you found the root cause :)
I think this can be the default, there's really no impact on running the scan on Normal priority as the process is mostly waiting for I/O.

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Thu Mar 07, 2019 3:56 pm

EFUtool update of the same share: 00:05:32 8-)

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Fri Mar 08, 2019 10:46 am

Thanks for the info.

folder_update_thread_mode_background is a low I/O proirity setting.
It was mainly added for USB disks, to prevent system hangs when accessing the USB disk while a background update was in progress.

It's odd that it has such a massive impact for you, is your local system or server currently under any I/O load?

I'm looking into disabling this option by default, or using this setting only for fixed or removable disks when a scheduled update was missed.

zybexXL
Posts: 22
Joined: Mon Mar 06, 2017 3:59 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by zybexXL » Fri Mar 08, 2019 11:03 am

There's always some load on these filers, but I don't have admin access to them to quantify it. They are however massive beasts, with huge IOPS capability. During a 16h scan time there are bound to be other processes running, like volume snapshots, backups and rsyncs, but that doesn't impact performance as much as it sounds.

Everything itself is running on a VM (doing pretty much nothing else), which I've seen can be fairly slower for I/O networking than for instance my desktop PC. I've measured some 20-30% speed difference, but not more than that. Perhaps VMWare impacts low-priority threads more severely, but I don't know.

temp0
Posts: 3
Joined: Fri Nov 16, 2018 1:33 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by temp0 » Mon Apr 15, 2019 3:25 pm

@void
Any ETA for implementing faster rescan mentined by @zybexXL?
Thanks for your great tool and also thanks to zybexXL for SMB rescan optimizations as it has big impact not only on huge number of folders/files but also on shares (networks) with high RTT!
Thanks!

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Tue Apr 16, 2019 1:15 am

It's done, I'm still doing some internal testing.

There's a little bit more I need to do before I have an alpha release.

Thanks for your patience.

temp0
Posts: 3
Joined: Fri Nov 16, 2018 1:33 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by temp0 » Tue Apr 16, 2019 7:39 am

Great thanks! Will check this thread and the beta one when it will be released.
Btw. from technical point of view what was the difference between your old SMB (re)scanning code and the "optimized" one as it wasn't problem only with cache hits or depth-first/breath-first reading? Like not to traverse folders for files with not modified date, etc.?
Thank you very much 8-)

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Tue Apr 16, 2019 11:55 pm

Btw. from technical point of view what was the difference between your old SMB (re)scanning code and the "optimized" one as it wasn't problem only with cache hits or depth-first/breath-first reading? Like not to traverse folders for files with not modified date, etc.?
There is little performance difference between the two versions, theoretically it should be faster, in real world scenarios there's no noticeable performance gain.

ref to the change: /forum/viewtopic.php?f=2&t=5718&start=150#p25338

Everything-1.4.1.935 and earlier would scan in the following order:
c:\folder1
c:\folder2
c:\folder3
c:\folder1\subfolder1
c:\folder1\subfolder2
c:\folder2\subfolder1
c:\folder2\subfolder2
c:\folder3\subfolder1
c:\folder3\subfolder2

Everything-1.4.1.936 and later scans in the following order:
c:\folder1
c:\folder1\subfolder1
c:\folder1\subfolder2
c:\folder2
c:\folder2\subfolder1
c:\folder2\subfolder2
c:\folder3
c:\folder3\subfolder1
c:\folder3\subfolder2

Currently, Everything 1.4 will rescan all files and folders.
The next major version of Everything will only rescan files in folders that have been modified since the last scan.

NotNull
Posts: 1364
Joined: Wed May 24, 2017 9:22 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by NotNull » Wed Apr 17, 2019 12:36 am

Shouldn't that be the other way around?

Everything-1.4.1.935 and earlier Everything-1.4.1.936 and later would scan in the following order:
c:\folder1
c:\folder1\subfolder1
c:\folder1\subfolder2
c:\folder2
c:\folder2\subfolder1
c:\folder2\subfolder2
c:\folder3
c:\folder3\subfolder1
c:\folder3\subfolder2


Everything-1.4.1.936 and later Everything-1.4.1.935 and earlier scans in the following order:
c:\folder1
c:\folder2
c:\folder3
c:\folder1\subfolder1
c:\folder1\subfolder2
c:\folder2\subfolder1
c:\folder2\subfolder2
c:\folder3\subfolder1
c:\folder3\subfolder2

void
Site Admin
Posts: 4831
Joined: Fri Oct 16, 2009 11:31 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by void » Wed Apr 17, 2019 1:33 am

Shouldn't that be the other way around?
Yes, it should be, I have adjusted my post, thanks.

temp0
Posts: 3
Joined: Fri Nov 16, 2018 1:33 pm

Re: Folder scan is 4x slower than EFU creation (with possible root cause)

Post by temp0 » Wed Apr 17, 2019 8:44 am

Thanks guys
That is exactly what I asked for, mentioned in VOID last sentence 8-)
I know about the change between 935 and 936 but from your discussion with zybexXL I know the rescan speed difference was minimal as problem was in thread priority and not generally in depth-first or breath-first reading.
My question was about difference of your old code and zybexXL latest one as my suspicion was that the main difference will be "not to traverse folders for files with not modified date" but wasn't sure if only this difference can make so huge speed difference or there is also another optimizations done not mentioned here.
Your last sentence confirmed my assumption (skip non-modified folders is the real gain here)
Thanks!

Post Reply