problem with content indexing

Discussion related to "Everything" 1.5 Alpha.
LeoLUG
Posts: 69
Joined: Tue May 26, 2020 2:28 am

problem with content indexing

Post by LeoLUG »

I added a 20 GB folder for content indexing. The indexing process has been stuck for the last 3 hours at: indexing properties 77%
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: problem with content indexing

Post by NotNull »

All found text will be stored in the database. If Everything is running, that means that it will all be loaded in memory.

You might be running out of RAM on your machine?
LeoLUG
Posts: 69
Joined: Tue May 26, 2020 2:28 am

Re: problem with content indexing

Post by LeoLUG »

NotNull wrote: Sun Mar 14, 2021 9:47 pm You might be running out of RAM on your machine?
Task Manager didn't show that.
I have now restarted and deleted the old db. As of now it is at 75% and moving on. Let's see; I will update.

BTW, Options/Content shows exactly which file it is working on right now, so you can check whether it is stuck or still working.
LeoLUG
Posts: 69
Joined: Tue May 26, 2020 2:28 am

Re: problem with content indexing

Post by LeoLUG »

It finished indexing, and it worked well.
But now Everything is using 1350 MB of RAM while it is running, since the whole db is kept in memory.
void
Developer
Posts: 16735
Joined: Fri Oct 16, 2009 11:31 pm

Re: problem with content indexing

Post by void »

Thank you for your feedback.

You are reaching the limits of Everything content indexing.

Content indexing in Everything is intended for a couple 100MB of raw text.
If you want to index 1GB+ of raw text and have 1GB of free memory, I won't stop you ;)

The initial content index will be slow. Progress is shown in the status bar, and detailed progress is now shown in Tools -> Options -> Content.

Please consider adding more filters for which files are content indexed.

For example, limit the content to a specific folder, set file types and size limit:
  • In Everything, from the Tools menu, click Options.
  • Click the Content tab on the left.
  • Set include only folders to a semicolon delimited list of folders, for example:
    c:\users\<my user name>\Documents;D:\documents
  • Set the Include only filters, for example: *.docx;*.pdf
  • Set a Maximum size, for example: 10 MB
  • Click OK.
If you have a fast NVMe drive, instead of using content indexing, consider using faster content searching.
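To get a rough feel for how much data a given set of filters would leave for content indexing, a minimal Python sketch like the one below can help. It is not how Everything evaluates its filters; it simply walks the folders and applies wildcard and size checks with the standard library. The function name `estimate_indexed_bytes` is hypothetical, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption. Note that it counts file sizes on disk, which for formats such as PDF or DOCX is not the same as the amount of raw text extracted.

```python
import fnmatch
import os
from typing import Iterable, Tuple


def estimate_indexed_bytes(
    folders: Iterable[str], patterns: Iterable[str], max_size: int
) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for files that pass the filters.

    Hypothetical helper: approximates "include only folders", "include only
    files" and "maximum size" with plain wildcard matching on file names.
    """
    lowered = [p.lower() for p in patterns]
    count = 0
    total = 0
    for folder in folders:
        # os.walk silently yields nothing for folders that do not exist.
        for root, _dirs, files in os.walk(folder):
            for name in files:
                if not any(fnmatch.fnmatch(name.lower(), p) for p in lowered):
                    continue
                try:
                    size = os.path.getsize(os.path.join(root, name))
                except OSError:
                    continue  # unreadable or vanished file: skip it
                if size <= max_size:
                    count += 1
                    total += size
    return count, total


if __name__ == "__main__":
    # Folder list and filters taken from the example settings above.
    folders = r"c:\users\<my user name>\Documents;D:\documents".split(";")
    files, size = estimate_indexed_bytes(
        folders, "*.docx;*.pdf".split(";"), 10 * 1024 * 1024
    )
    print(f"{files} files, {size / 1024 / 1024:.1f} MB on disk")
```

If the total comes out far above a few hundred MB, tightening the filters before enabling content indexing is probably worthwhile.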
LeoLUG
Posts: 69
Joined: Tue May 26, 2020 2:28 am

Re: problem with content indexing

Post by LeoLUG »

I used it for a folder with a lot of PDF files that I want to be searchable. A filter will not help me in this situation, since I already chose the specific folder. From your information I now know it's not the right tool for this.
I will back out of that and use content indexing only for DOC files; I'm sure even that will be very useful for me.
void
Developer
Posts: 16735
Joined: Fri Oct 16, 2009 11:31 pm

Re: problem with content indexing

Post by void »

Please be aware that if you enable content indexing, content: will only search your indexed content.

Use the notindexed: modifier to search content in your PDF folder, for example:
"d:\pdf folder\" ext:pdf notindexed:content:"text to search"
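If such searches are generated from a script, the search string can be assembled as in this small Python sketch. The helper name `build_notindexed_content_query` is hypothetical; it only formats text in the shape of the example above and does not talk to Everything. It assumes neither the folder nor the search text contains a double quote.

```python
def build_notindexed_content_query(folder: str, ext: str, text: str) -> str:
    """Format a search like: "<folder>\\" ext:<ext> notindexed:content:"<text>"."""
    if '"' in folder or '"' in text:
        # Quoting rules for embedded quotes are not covered by this sketch.
        raise ValueError("embedded double quotes are not handled")
    if not folder.endswith("\\"):
        folder += "\\"  # trailing backslash, as in the example above
    return f'"{folder}" ext:{ext} notindexed:content:"{text}"'


print(build_notindexed_content_query("d:\\pdf folder", "pdf", "text to search"))
```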
LeoLUG
Posts: 69
Joined: Tue May 26, 2020 2:28 am

Re: problem with content indexing

Post by LeoLUG »

Yes, I see that in your original post.

Is it possible to use a different extension filter depending on the folder?
For example: from folder c:/user/files, index pdf and doc;
and from folder c:/user/otherfiles, index only doc.
LeoLUG
Posts: 69
Joined: Tue May 26, 2020 2:28 am

Re: problem with content indexing

Post by LeoLUG »

void wrote: Mon Mar 15, 2021 12:25 am Please consider adding more filters for which files are content indexed.

For example, limit the content to a specific folder, set file types and size limit:
  • In Everything, from the Tools menu, click Options.
  • Click the Content tab on the left.
  • Set include only folders to a semicolon delimited list of folders, for example:
    c:\users\<my user name>\Documents;D:\documents
  • Set the Include only filters, for example: *.docx;*.pdf
  • Set a Maximum size, for example: 10 MB
  • Click OK.
If you have a fast NVMe drive, instead of using content indexing, consider using faster content searching.
Can you please give some more detail on that? Can this affect the speed of the PC in general?
void
Developer
Posts: 16735
Joined: Fri Oct 16, 2009 11:31 pm

Re: problem with content indexing

Post by void »

LeoLUG wrote: Is it possible to use a different extension filter depending on the folder?
Yes, please try the following:
  • Leave include only folders blank.
  • Set include only files to:
    c:\user\files\**.pdf;c:\user\files\**.doc;c:\user\otherfiles\**.doc
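With more than a couple of folders, typing this list by hand gets error-prone. The following Python sketch builds the same kind of semicolon-delimited string from a folder-to-extensions mapping. The function name `build_include_only_files` is hypothetical, and the `folder\**.ext` pattern shape is copied from the example above rather than from any documented grammar.

```python
from typing import Dict, List


def build_include_only_files(ext_by_folder: Dict[str, List[str]]) -> str:
    """Join one 'folder\\**.ext' pattern per (folder, extension) pair with ';'."""
    parts = []
    for folder, exts in ext_by_folder.items():
        folder = folder.rstrip("\\")  # avoid a doubled backslash before **
        for ext in exts:
            parts.append(f"{folder}\\**.{ext.lstrip('.')}")
    return ";".join(parts)


# Folders and extensions from the question above.
rules = {
    r"c:\user\files": ["pdf", "doc"],
    r"c:\user\otherfiles": ["doc"],
}
print(build_include_only_files(rules))
```

The printed string can then be pasted into the include only files box.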
If you have a fast NVMe drive, instead of using content indexing, consider using faster content searching.
Can you please give some more detail on that? Can this affect the speed of the PC in general?
Enabling /no_incur_seek_penalty_multithreaded=1 will only affect search performance in Everything.
Most NVMe SSDs can read at 3000+ MB/s. With /no_incur_seek_penalty_multithreaded=1, Everything can read content at this speed, which makes content indexing moot.

Enabling /no_incur_seek_penalty_multithreaded=1 for normal SSDs will also increase search performance (just not to the same extent as with NVMe).
Enabling /no_incur_seek_penalty_multithreaded=1 will make no difference for HDDs.

/no_incur_seek_penalty_multithreaded=1 is not enabled by default because it can be very demanding on the system.
I will consider enabling it by default with more testing.
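As a back-of-the-envelope illustration of why a fast drive can make content indexing moot, the Python sketch below computes the best-case time to read a 20 GB folder once at a given read speed. Only the 3000 MB/s NVMe figure comes from this thread; the SATA SSD and HDD speeds are assumed, illustrative values, and the function name `full_scan_seconds` is hypothetical. The calculation ignores text extraction, small-file overhead and seek times, so real searches will be slower.

```python
def full_scan_seconds(size_gb: float, read_mb_per_s: float) -> float:
    """Best-case seconds to read size_gb once at read_mb_per_s (1 GB = 1000 MB)."""
    if read_mb_per_s <= 0:
        raise ValueError("read speed must be positive")
    return size_gb * 1000 / read_mb_per_s


# 3000 MB/s is the NVMe figure quoted above; the other speeds are assumptions.
drives = {"NVMe SSD": 3000, "SATA SSD (assumed)": 500, "HDD (assumed)": 150}
for name, speed in drives.items():
    print(f"{name}: {full_scan_seconds(20, speed):.1f} s to read 20 GB once")
```

At the quoted NVMe speed, a full pass over 20 GB takes on the order of seconds, which is the trade-off against keeping all indexed text in RAM.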
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

void wrote: Mon Mar 15, 2021 12:25 am Content indexing in Everything is intended for a couple 100MB of raw text
That is a pity. So I guess there is no point in content indexing a network drive with roughly 50GB of documents, mostly DOC, PDF and MSG?
NotNull
Posts: 5461
Joined: Wed May 24, 2017 9:22 pm

Re: problem with content indexing

Post by NotNull »

Everything will load all indexed content in memory, so you will need a system with *a lot* of RAM for 50GB of documents.
But technically it is possible ..
raccoon
Posts: 1017
Joined: Thu Oct 18, 2018 1:24 am

Re: problem with content indexing

Post by raccoon »

David.P wrote: Wed Mar 16, 2022 11:00 pm That is a pity. So I guess there is no point in content indexing a network drive with roughly 50GB of documents, mostly DOC, PDF and MSG?
I'm double checking here to make sure we're on the same page. Do you want to index just the metadata of these documents? Or do you want to search the entire contents of 50 gigabytes of documents, over a network, without keeping a local copy of 50 gigabytes of documents?

You can certainly index the metadata (date modified, author, etc) of the documents, but if you want to search the documents themselves you will need to keep a local copy of those documents in your own possession.
void
Developer
Posts: 16735
Joined: Fri Oct 16, 2009 11:31 pm

Re: problem with content indexing

Post by void »

I highly recommend storing your text content on a local NVMe SSD drive.

Everything will max out your NVMe SSD read speeds (3000+ MB/s)



Consider using Everything content indexing if you want instant content searching and have the free ram available.
Last edited by void on Thu Mar 17, 2022 9:10 am, edited 1 time in total.
Reason: *free ram available
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

Currently I use Archivarius 3000 for content indexing. The index there is about 15 GB in size, but is not loaded into RAM. Still, the search results are instantaneous in most cases, although not in Everything's find-as-you-type (FAYT) style.

Archivarius 3000 also doesn't need access to the network drive for searching, since the entire raw text is stored in the index on a local SSD.

However, I don't fully understand your comments regarding my network drive -- especially not Void's, who seems to point to a way to have instant content search and still have low RAM usage?
void
Developer
Posts: 16735
Joined: Fri Oct 16, 2009 11:31 pm

Re: problem with content indexing

Post by void »

To clarify,

Consider using Everything content indexing if you want instant content searching at the cost of high RAM usage.
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

OK thanks. Since I would probably need dozens of gigabytes of RAM for my amount of data, content indexing is probably out of the question for the time being.
horst.epp
Posts: 1445
Joined: Fri Apr 04, 2014 3:24 pm

Re: problem with content indexing

Post by horst.epp »

You can also query the Windows index with Everything using
si:your_search
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

OK thanks.

I just tried and started indexing only the MSG files for content. Instantly, CPU load jumped to 100%, and after a few minutes, Everything already used 16GB RAM.

I managed to stop indexing and disable content indexing in the settings, but my PC still crashed afterwards. I.e. it still runs and stuff can be accessed via network, but all screens have gone black.

Anyway, I guess I'll keep Archivarius 3000 around for content indexing for the time being (unfortunately, it seems to be abandoned), and use Everything for what it does best: blazing fast filename search.
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

One more question regarding Content Indexing: Is the content index also saved to disk when shutting down Everything, and later loaded from disk to RAM again?

I would say that this makes even more sense than with the file name index; however, I could not observe such behavior so far.

And when content is indexed, you'd still have to use
content:[search term]
in order to search the content, correct?
horst.epp
Posts: 1445
Joined: Fri Apr 04, 2014 3:24 pm

Re: problem with content indexing

Post by horst.epp »

David.P wrote: Sun Mar 20, 2022 4:21 pm One more question regarding Content Indexing: Is the content index also saved to disk when shutting down Everything, and later loaded from disk to RAM again?

I would say that this makes even more sense than with the file name index; however, I could not observe such behavior so far.

And when content is indexed, you'd still have to use
content:[search term]
in order to search the content, correct?
1. Of course the content index is saved with the database.
2. You can define any shortname or macro to search for content.
I use a filter which defines a macro to search for name and content together.
Example:
fc:mysearch
[Attachment: Screenshot - 20.03.2022 , 18_28_22.png]
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

OK thanks. It seems however that ET 1.5a always scans everything again first when started, which seems to take hours in my case.

However, I am running in non-admin, non-service mode because I have no admin rights on the machine. Additionally, I'm scanning a network drive with ~50GB of data over a VPN connection.
horst.epp
Posts: 1445
Joined: Fri Apr 04, 2014 3:24 pm

Re: problem with content indexing

Post by horst.epp »

David.P wrote: Sun Mar 20, 2022 5:49 pm OK thanks. It seems however that ET 1.5a always scans everything again first when started, which seems to take hours in my case.

However, I am running in non-admin, non-service mode because I have no admin rights on the machine. Additionally, I'm scanning a network drive with ~50GB of data over a VPN connection.
Non-service, non-admin mode prevents Everything from scanning NTFS drives before you allow it to start.
Remove the local drives from the Indexes NTFS tab and use Folder indexing in this case.
Set the schedule to manual.
For the VPN connected systems it would be much better to run Everything (Server or ETP) on this machine and connect to it with your client.
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

Thanks again.

I only use "Folders" and "Network Drives" under the "Indexes" settings.
Later, I might be able to run ET on the server; at the moment this is only a trial.
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

I have now tried and created a content index of the following file types: *.msg; *.doc; *.docx; *.txt -- limiting the maximum file size to 2 MB.

Everything then found about 200,000 files and indexed their content. While the index was being created, memory usage went up to about 14 GB. However, after closing and restarting Everything, its RAM usage remains at about 1 GB, which also corresponds approximately to the size of the Everything-1.5a.db file.

Is this expected behavior?

Is there a way to create a content index with reduced RAM usage (during the creation)?

If so, I could possibly have (almost) all files of the type mentioned initially content-indexed, without that size limitation.
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

horst.epp wrote: Thu Mar 17, 2022 11:48 am You can also query the Windows index with Everything using
si:your_search
Oh wow, I just realized that Everything can also do a combined search like so:
si:[content search term] [file name portion]

That is AWESOME! So I don't need Everything to index any content at all, and still have full access to the content in Everything searches, by "side-loading" the Windows Index.

Incredible.
horst.epp
Posts: 1445
Joined: Fri Apr 04, 2014 3:24 pm

Re: problem with content indexing

Post by horst.epp »

I type only
fc:searchtext
and it finds content or file names with the searchtext.
To do this, I have a filter which defines the fc macro.
[Attachment: Screenshot - 22.03.2022 , 22_23_34.png]
You can make a similar one including the system index.
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

Thanks!

I can't believe how awesome this tool is.
nesedenyana
Posts: 58
Joined: Mon Sep 19, 2022 10:38 am

Re: problem with content indexing

Post by nesedenyana »

Hi David,

Did you solve your problems with content search? I have the same problem: I have >100 GB of PDFs that I want to search, but of course I cannot do that with a regular amount of RAM; I need a database solution. Do you still use Archivarius?
David.P
Posts: 200
Joined: Fri May 29, 2020 3:22 pm

Re: problem with content indexing

Post by David.P »

Yes I still use Archivarius.

FileLocator Pro aka Agent Ransack could be a possible successor, but unfortunately still doesn't support multicolor highlighting of search term occurrences.