Properties Feedback

Discussion related to "Everything" 1.5 Alpha.
rgbigel
Posts: 27
Joined: Sun Apr 17, 2011 4:00 pm

Properties Feedback

Post by rgbigel » Fri Jan 28, 2022 8:16 pm

I have found Property Indexing (currently only using MD5:) extremely useful. In fact, I am using it extensively on some 200000 picture files.
Here, the indexing takes a while, but that's OK.

When I added *.mp4 to the "only Files", the indexing took longer than 3 days :geek: because these files are quite big.
After one run, I found that I forgot to add a folder containing many *.mp4 files. :o Horror, the indexing starts over from scratch!

I would have assumed that there would be some switch for indexing like (in my case) !MD5: or Resume-Unindexed: but that does not seem to exist.
Also, if I remove any of the "Exclude Folders", I would have expected that the MD5 values that were already computed would not change. That is, existing MD5 values would simply stop being updated, and an MD5 would only be cleared when its file is modified?
At any rate, changing "Exclude Folders" starts a complete indexing operation... another 3 days. :cry:

So please please implement something like Resume-Unindexed:MD5

To keep things in context, here is more, of somewhat lesser importance. These are just suggestions, but some things limit my use of this very important feature (or am I using it wrong?)

Building the Index is controlled by Include/Exclude Files or Folders, including patterns. OK so far, but the pattern syntax seems limited:
  • pattern syntax is not the same as the one for Searching
  • unable to use Filter: or Bookmarks: functions
  • The parameter list should have the same Syntax everywhere, hence as in most cases,
    • function:[!]term1;[!]term2 ... , or in some cases, when the implied logic is unclear
      • function:[!]term1|[!]term2 ...
      or
      • function:[!]term1&[!]term2 ...
      if the character | causes a conflict, it could be escaped in the terms, e.g. \| --- I have never seen a file or folder with a name containing "|" anyways.
  • I do not really understand what is happening/what's the difference with Find-Dupes: and [Sort:] Dupes: or Unique:, respectively
    why can't I write
    • Path:abc;**def; instead of Path:abc Path:**def
    or, this -- see above, with comparison operators:
    • Path:abc&!def|**efg
Thank you again for all the work and the excellent quality of this Board
Rolf

void
Developer
Posts: 9453
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties Feedback

Post by void » Sat Jan 29, 2022 10:54 am

Thank you for your feedback Rolf,
I have found Property Indexing (currently only using MD5:) extremely useful. In fact, I am using it extensively on some 200000 picture files.
Here, the indexing takes a while, but that's OK.
Please consider using a tool like sha256sum (or md5sum).
sha256sum will produce a sidecar file (.sha256) so the hash only needs to be calculated once.

Everything has support for .sha256 sidecar files.

After one run, I found that I forgot to add a folder containing many *.mp4 files. :o Horror, the indexing starts over from scratch!
The rebuild is required as Everything will only store the MD5 property for files matching your specified property filters.

Everything will reuse the previously indexed properties where possible.
So only new mp4 files should need to be indexed.

Everything only remembers the previously indexed properties when you add exclude filters to your properties.
When adding include filters (eg: *.mp4), Everything must perform a full reindex of your properties.

I will look into keeping your existing indexed properties when adding include filters (where possible).
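
The idea of reusing previously indexed properties can be illustrated with a small cache. This is only a minimal sketch and not how Everything stores its index: the names `md5_of` and `reindex` and the cache layout (path mapped to size, modification time and hash) are assumptions made for illustration. A file is hashed again only when it is new or when its size or modification time has changed.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reindex(paths, cache):
    """Return (new_cache, hashed); hashed lists the files that were read.

    cache maps path -> (size, mtime_ns, md5). This layout is hypothetical.
    An entry is reused only when size and modification time are unchanged.
    """
    new_cache = {}
    hashed = []
    for path in paths:
        info = os.stat(path)
        key = (info.st_size, info.st_mtime_ns)
        old = cache.get(path)
        if old is not None and old[:2] == key:
            new_cache[path] = old  # unchanged file: keep the stored hash
        else:
            new_cache[path] = key + (md5_of(path),)
            hashed.append(path)
    return new_cache, hashed
```

With such a cache, adding a folder of *.mp4 files to the list of paths would only hash the new files on the next run.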


Building the Index is controlled by Include/Exclude Files or Folders, including patterns. OK so far, but any change to the pattern syntax is limited?
The filter syntax is slightly different from the search syntax.
The filter search is simplified to reduce complexity and improve performance.
(For example, the *.jpg;*.mp4 filter gets compiled into a list of extensions)
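
To show why a simple filter can be fast, here is a rough sketch of compiling a filter such as *.jpg;*.mp4 into a set of extensions. It is an assumption about the general technique, not Everything's actual code; `compile_extension_filter` and `matches` are hypothetical names.

```python
import os


def compile_extension_filter(filter_text):
    """Turn '*.jpg;*.mp4' into a frozenset of lower-case extensions.

    Returns None if any pattern is not a plain '*.ext' pattern, which
    signals that a slower general matcher would be needed instead.
    """
    extensions = set()
    for pattern in filter_text.split(";"):
        pattern = pattern.strip()
        if not pattern:
            continue
        tail = pattern[2:]
        if not pattern.startswith("*.") or not tail or any(c in tail for c in "*?."):
            return None
        extensions.add(tail.lower())
    return frozenset(extensions)


def matches(filename, extensions):
    """Check a filename against the compiled set with one hash lookup."""
    _, ext = os.path.splitext(filename)
    return ext[1:].lower() in extensions


compiled = compile_extension_filter("*.jpg;*.mp4")
print(matches("Holiday.JPG", compiled))
print(matches("notes.txt", compiled))
```

Once compiled, each file costs a single set lookup, independent of how many extensions are listed.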

The filter search supports wildcards (matches the whole filename) and the regex: modifier
Otherwise, the filter search matches the whole filename.

Accessing bookmarks/filters from filters would be a nightmare as bookmarks/filters can change at any time.
The parameter list should have the same Syntax everywhere, hence as in most cases,
I am not quite sure what you mean here.
The filter search does not support functions.

If you are referring to the normal search, then yes, the standard parameter list syntax in Everything is:
abc;123;"text ; with ; semicolons"

For example:
ext:jpg;mp4
filelist:notepad.exe;explorer.exe
tags:tree;sunny
zipfilenames:everything.exe;everything.lng
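
As an illustration of the list syntax above, a semicolon-delimited list with quoted items could be split as follows. This is a minimal sketch; `split_parameter_list` is a hypothetical helper, and how Everything treats escaped quotes or stray whitespace is not covered here.

```python
def split_parameter_list(text):
    """Split on semicolons, but keep semicolons inside double quotes."""
    items = []
    current = []
    in_quotes = False
    for char in text:
        if char == '"':
            in_quotes = not in_quotes  # quotes delimit an item, they are not kept
        elif char == ";" and not in_quotes:
            items.append("".join(current))
            current = []
        else:
            current.append(char)
    items.append("".join(current))
    return items


print(split_parameter_list('abc;123;"text ; with ; semicolons"'))
```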

I do not really understand what is happening/what's the difference with Find-Dupes: and [Sort:] Dupes: or Unique:, respectively
I can understand the confusion.
These functions have changed every week for the last month.

The syntax I will be recommending now is dupe:<property1>;<property2>;<property3>

For example:
dupe:size
dupe:size;dm
dupe:size;date-modified;name

The functions above will find files where all the specified properties match (within the current results).

dupe:



Forget about find-dupes:
I will keep the find-dupes: function, but I will no longer recommend its use.
(find-dupes: is just an alias for dupe:)

unique:<property> does the opposite of dupe:, that is, it finds files in the current results that have a unique property (removes duplicated properties from view).
unique: only supports one property as the parameter at this stage.
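
The behaviour described for dupe: and unique: can be sketched on a list of records. This is just one reading of the description above, not Everything's implementation: `dupe` keeps records whose combination of properties occurs more than once, and `unique` keeps records whose single property value occurs exactly once. The record layout and the function names are assumptions.

```python
from collections import Counter


def _key(record, properties):
    """Build a tuple of the requested property values for one record."""
    return tuple(record[name] for name in properties)


def dupe(records, properties):
    """Keep records where all listed properties match another record."""
    counts = Counter(_key(r, properties) for r in records)
    return [r for r in records if counts[_key(r, properties)] > 1]


def unique(records, prop):
    """Keep records whose value for one property occurs exactly once."""
    counts = Counter(r[prop] for r in records)
    return [r for r in records if counts[r[prop]] == 1]


files = [
    {"name": "a.jpg", "size": 100, "dm": 10},
    {"name": "b.jpg", "size": 100, "dm": 10},
    {"name": "c.jpg", "size": 100, "dm": 11},
    {"name": "d.jpg", "size": 200, "dm": 10},
]
print([f["name"] for f in dupe(files, ["size"])])
print([f["name"] for f in dupe(files, ["size", "dm"])])
print([f["name"] for f in unique(files, "size")])
```

Adding more properties to `dupe` narrows the result, because every listed property has to match.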

why can't I write
Path:abc;**def; instead of Path:abc Path:**def
Some functions do not support list parameters.
path: is also a search modifier.

The correct syntax is:
path:<abc|**def>
-or-
path:<abc **def>

| = OR
space = AND

** implies path:
so you can also do something like:
path:abc|**def

This syntax will also work with most search functions.
For example:
content:<abc|123>

There is also the new \\ syntax:
def\\abc
-or-
abc\\def
\\ is replaced with **\**
(order is important)


or, this -- see above, with comparison operators:
Path:abc&!def|**efg
Please try the following search:
path:<abc !def|**efg>

path: = enable match path
< > = group search terms
space = AND
! = NOT
| = OR

I will consider a search function to take a semicolon delimited list of paths.

rgbigel
Posts: 27
Joined: Sun Apr 17, 2011 4:00 pm

Re: Properties Feedback

Post by rgbigel » Sat Jan 29, 2022 1:20 pm

void wrote:
Sat Jan 29, 2022 10:54 am
Please consider using a tool like sha256sum (or md5sum).
sha256sum will produce a sidecar file (.sha256) so the hash only needs to be calculated once.

Everything has support for .sha256 sidecar files.
I had no Idea about .sha256 files. Thank you! But how does this work? :? Are the SHA256 values used as indexed property?
The rebuild is required as Everything will only store the MD5 property for files matching your specified property filters.

Everything will reuse the previously indexed properties where possible.
So only new mp4 files should need to be indexed.
...
Everything only remembers the previously indexed properties when you add exclude filters to your properties.
When adding include filters (eg: *.mp4), Everything must perform a full reindex of your properties.
When I look at the result list immediately after changing anything on the property page, it appears as if the previous index is completely erased before any (re-)indexing :?: . Hence, properties like MD5 values cannot be reused at all :| at this time.
I will look into keeping your existing indexed properties when adding include filters (where possible).
THIS is great! :D
Accessing bookmarks/filters from filters would be a nightmare as bookmarks/filters can change at any time.
Well, I can follow this... :(
I was looking at alternatives, but as the syntax for Include/Exclude can not simply be adapted by pasting a string copy of the BookMark or Filter definitions, I would need to copy the search terms with "!" into the Excludes, and the rest into Includes?

Could this be solved by having a compound Include/Exclude -- kind of like a static Filter for re-indexing :?:
I can understand the confusion.
These functions have changed every week for the last month.

The syntax I will be recommending now is dupe:<property1>;<property2>;<property3>
...
The correct syntax is:
path:<abc|**def>
-or-
path:<abc **def>

| = OR
space = AND

** implies path:
so you can also do something like:
path:abc|**def

This syntax will also work with most search functions.
For example:
content:<abc|123>
...

abc\\def
\\ is replaced with **\**

There is also the new \\ syntax:
This is a wonderful solution, I was not aware of it. Some of these very helpful explanations are not in "Help" / Examples yet.
I will consider a search function to take a semicolon delimited list of paths.
Applause! :D

And thank you again, you are amazing!
Rolf

void
Developer
Posts: 9453
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties Feedback

Post by void » Mon Jan 31, 2022 6:09 am

I had no Idea about .sha256 files. Thank you! But how does this work? :? Are the SHA256 values used as indexed property?
A .sha256 file is a small text file that contains a list of filenames and sha256 sums.

Everything.sha256 example:

dcae88c71d8858d92932fbcee752bf6ca6980d348012daec7375553495c73b58 *Everything-1.4.1.1016.x64-Setup.exe
bbd01c54a6fc88301ec4f219b7cf1092d329897f837a74ad0dd0f8a357c7e1d7 *Everything.exe

The sha256 files only need to be created once and stored alongside your files.
sha256 files can help to detect bit rot and to check the integrity of files.
You can have multiple sha256 files for each data file or a single sha256 file that contains hashes for all files.

sha256 files follow the common md5sum file format, they are also similar to the common sfv file format.
(Everything also has support for md5sum files and sfv files if you prefer those)
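
For readers without a checksum tool at hand, the sidecar format shown above (hash, a space, an asterisk, then the filename) can be written and read with a few lines of Python. This is a minimal sketch under that assumption about the format; `sha256_of`, `write_sidecar` and `read_sidecar` are hypothetical helper names, and only filenames without directories are handled.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(sidecar_path, file_paths):
    """Write one '<hash> *<filename>' line per file."""
    with open(sidecar_path, "w", encoding="utf-8", newline="\n") as out:
        for path in file_paths:
            out.write(f"{sha256_of(path)} *{os.path.basename(path)}\n")


def read_sidecar(sidecar_path):
    """Return a dict mapping filename -> lower-case hex digest."""
    entries = {}
    with open(sidecar_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue
            digest, _, name = line.partition(" ")
            if name[:1] in ("*", " "):  # binary-mode or text-mode marker
                name = name[1:]
            entries[name] = digest.lower()
    return entries
```

Checking for bit rot then amounts to comparing `sha256_of(path)` against the stored entry for that filename.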

I recommend sha256 as there are currently no known collisions.

You can add the sha256sum sha-256 property to your index.
Everything will not need to touch the file's content.


sha256sum
md5sum
sfv
voidhash


When I look at the result list immediately after changing anything on the property page, it appears as if the previous index is completely erased before any (re-)indexing :?: . Hence, properties like MD5 values cannot be reused at all :| at this time.
Everything thinks the current indexed properties are invalid due to the addition to your include filters.
I will look into improving this..


Could this be solved by having a compound Include/Exclude -- kind of like a static Filter for re-indexing :?:
I will consider this.
For now the filters must be simple for efficiency.

Everything will need to know if a file should be included in the filter instantly. (Ideally, less than 1 microsecond, ~1 second for 1 million files)
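
The relation between a per-file budget and the total filtering time is a plain multiplication. A quick sketch of the arithmetic for one million files, with both per-file budgets chosen purely for illustration (`total_seconds` is a hypothetical helper):

```python
def total_seconds(per_file_ns, file_count):
    """Total filter time in seconds for a per-file cost in nanoseconds."""
    return per_file_ns * file_count / 1e9


FILES = 1_000_000
print(total_seconds(1, FILES))      # 1 nanosecond per file
print(total_seconds(1_000, FILES))  # 1 microsecond per file
```

At 1 nanosecond per file the whole pass takes about a millisecond; at 1 microsecond per file it takes about a second.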

rgbigel
Posts: 27
Joined: Sun Apr 17, 2011 4:00 pm

Re: Properties Feedback

Post by rgbigel » Mon Jan 31, 2022 1:14 pm

Thank you again for your extensive help.

Trying to follow this up, I may have found a serious bug "somewhere", which may not be related to Everything, but it "smells" like it. So I have no idea how to proceed.

Looking at the search results with the filter Everything and the search !md5, I found a number of files, although the index was fully completed.
All of the files are in the same directory: "D:\GDrive\Dupes to One Pictures\", but not all the files in that directory lack an MD5 value.

I picked out a single file: "Bercht Rolf.jpg", and this is what Everything shows:
Screenshot (1).png
In particular, it shows pretty much a normal result, with the MD5 empty, as searched for. The Owner also is not returned, indicating that Everything itself was unhappy with the file.

But when I try to access this file (by Explorer, by CMD as Administrator via Dir or Attrib or ICACLS /Reset, Del, and some...)
the system can't find the file.

I tried "Properties" from Everything, giving the normal property display of that file ==> with Size=0 and no Details. So ?

I ran a "ChkDsk D: /F", no errors found. NTFS is also reported as healthy.

I tried to Delete the file inside of Everything, but that did not work (File still showing, but no Errors, as far as I can tell, not even in the debug log):
EverythingDebugLog-try-del.zip
I started Google Drive (not normally active on my PC) via EDGE, and the file was not there, and not in the trash either. (Unfortunately, I noticed that I had run out of space there, but that was before I uploaded something else a fairly long time ago.)

:( So I assume that something is wrong with the Everything Index! :o

I then closed Everything and restarted it as Administrator / Not as Service, the described problems did not change.

I tried to restore the folder using Acronis Backup (the backup date is 20211230) and, guess what, Acronis found the folder, but then failed before even giving me the file list (without error message, of course). Now I was really sad.

I copied the files which were intact (most of them) to another drive, so my loss is not complete. But what is going on here?

Any Help VERY much appreciated.

Rolf

void
Developer
Posts: 9453
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties Feedback

Post by void » Tue Feb 01, 2022 7:09 am

Thank you for your feedback Rolf,

So I assume that something is wrong with the Everything Index!
Looks like a file Everything has not detected as being removed.
The most common cause is a hard link deletion.
Everything does not track hard link deletions.
Hard to know for sure without seeing debug logs.

Forcing a rebuild from Tools -> Options -> Indexes should fix the issue.

This is another reason to use sidecar files for md5, sha hashes.


Do other files under D:\GDrive\Dupes to One Pictures have a hard link count over 1? (Please check the Hard Link Count property in Everything)
Are you indexing your D: as a folder index under Tools -> Options -> Folders?
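
The hard link count can also be checked outside Everything. A minimal sketch using Python's standard library, assuming the folder is readable; `files_with_extra_hard_links` is a hypothetical helper name. Entries that can no longer be opened (such as a file that has already been removed) are skipped.

```python
import os


def files_with_extra_hard_links(folder):
    """Return (path, link_count) for files with more than one hard link."""
    results = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            try:
                links = os.stat(path).st_nlink
            except OSError:
                continue  # entry vanished or is not accessible
            if links > 1:
                results.append((path, links))
    return results
```

Calling it on the folder in question lists every file that shares its data with at least one other name.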

Rene
Posts: 38
Joined: Fri Nov 04, 2016 6:16 am

Re: Properties Feedback

Post by Rene » Sat Feb 05, 2022 10:10 pm

In the indexed properties, I think there might be a mix-up with the Video format property,
where it shows the "codec id".
I was expecting to see the video format instead.

for example according to mediainfo:
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main 10@L3@Main
Codec ID : V_MPEGH/ISO/HEVC
Would it be doable to change the Video format property's name to Codec ID, and have an actual Video format property show the Format as mediainfo shows it?

raccoon
Posts: 685
Joined: Thu Oct 18, 2018 1:24 am

Re: Properties Feedback

Post by raccoon » Sat Feb 05, 2022 11:02 pm

There's the formal codec id, and then the canonical name. I don't know if there's an easy lookup table to do this conversion. Maybe ffprobe or mediainfo source code has such a table to borrow from.

ref: Search for MP4 files with AAC codec
ref: What Is A Codec Tag? (external)

NotNull
Posts: 3684
Joined: Wed May 24, 2017 9:22 pm

Re: Properties Feedback

Post by NotNull » Sat Feb 05, 2022 11:29 pm

FWIW, exiftool calls "Video format" "Compressor ID". Windows does not have such a property.

raccoon
Posts: 685
Joined: Thu Oct 18, 2018 1:24 am

Re: Properties Feedback

Post by raccoon » Sun Feb 06, 2022 12:02 am

I'm having difficulty finding any Codec Tag or FourCC ID conversion list to Codec String or Canonical Name.

It's probably not as simple as a this-to-that conversion table without deeper inspection of the data stream itself.

refs: ffmpeg -codecs; ffmpeg -formats; ffmpeg.org/general.html#Video-Codecs; fourcc.org; wikipedia.org/wiki/FourCC

void
Developer
Posts: 9453
Joined: Fri Oct 16, 2009 11:31 pm

Re: Properties Feedback

Post by void » Mon Feb 07, 2022 3:04 am

It's not really possible to support all format names.

I will consider supporting the common formats.
I am considering support for a MediaInfo plugin.

Everything will currently use the audio/video format as specified in the container file.


Technically, Everything already assigns nice audio format names to some audio codec IDs.
I will look at doing the same for common video formats.
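
Assigning friendly names to common codec IDs can be pictured as a small lookup table with a fallback. This is only an illustrative sketch, not Everything's table: the first entry comes from the mediainfo output quoted earlier in this thread, the others are Matroska codec IDs added as assumed examples, and `video_format_name` is a hypothetical helper.

```python
# Illustrative table only; a real one would need many more entries.
CODEC_ID_TO_FORMAT = {
    "V_MPEGH/ISO/HEVC": "HEVC",
    "V_MPEG4/ISO/AVC": "AVC",
    "V_VP9": "VP9",
    "V_AV1": "AV1",
}


def video_format_name(codec_id):
    """Return a friendly format name, or the raw codec ID if unknown."""
    return CODEC_ID_TO_FORMAT.get(codec_id, codec_id)


print(video_format_name("V_MPEGH/ISO/HEVC"))
print(video_format_name("V_SOMETHING/UNKNOWN"))
```

Falling back to the raw codec ID keeps uncommon formats searchable even when no friendly name is known.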

Thanks for the suggestions.
