Search/sort by frequency of file extension

General discussion related to "Everything".
David.P
Posts: 75
Joined: Fri May 29, 2020 3:22 pm

Search/sort by frequency of file extension

Post by David.P » Sun May 08, 2022 11:27 am

Hello Forum,

I just noticed that there are tens of thousands of files with the *.pyc file extension on my system. Since I have no need to search for these files, I have excluded *.pyc from the index, which should reduce memory usage of Everything accordingly.

Now I'm wondering if there might be other file types that exist in massive numbers, but that I never search for.

I don't want to generally exclude the various installation folders for programs, as this would also exclude all programs and tools (i.e. *.exe files) themselves, including hundreds of portable programs.

Is there any way to sort all files on the system by file extension frequency with Everything, or with some other tool, for that matter?
Last edited by David.P on Sun May 08, 2022 11:30 am, edited 1 time in total.
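
For the "some other tool" part of the question, here is a minimal Python sketch that walks a folder tree and counts files per extension. It is independent of Everything and only an illustration; the function name `extension_frequencies` is made up for this example, and extensions are lowercased on the assumption that `.PYC` and `.pyc` should be counted together.

```python
import os
from collections import Counter


def extension_frequencies(root):
    """Count files per lowercase extension (without the dot) under root."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext returns ("name", ".ext"); files without an
            # extension end up under the empty string.
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            counts[ext] += 1
    return counts
```

Calling `extension_frequencies(root).most_common(20)` with a drive or folder as `root` gives the 20 most frequent extensions with their counts. On a whole system drive this will be much slower than an indexed search.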

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Sun May 08, 2022 11:30 am

Everything 1.5 will have an extension frequency column.

To show extensions by frequency:
  • Right click the result list column header and click Add columns....
  • Select for: ext
  • Select Extension Frequency.
  • Click OK.
  • Click the Extension Frequency column header to gather and sort by extension frequency.


To reduce the number of duplicated extension results:
  • Right click the Extension Frequency column header.
  • Right click Find Extension Frequency Duplicates
  • Click Find unique (including first duplicated).

David.P
Posts: 75
Joined: Fri May 29, 2020 3:22 pm

Re: Search/sort by frequency of file extension

Post by David.P » Sun May 08, 2022 11:32 am

Image

You and Everything™ are the world's fastest!

David.P
Posts: 75
Joined: Fri May 29, 2020 3:22 pm

Re: Search/sort by frequency of file extension

Post by David.P » Sun May 08, 2022 12:19 pm

Please, another comprehension question on this:

Image

What do the four different options mean for the (number of) search results?
And if I search for extension-frequency e.g. like this:
*.* extension-frequency:>=333
then I don't get any search result :?:

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Sun May 08, 2022 12:26 pm

What do the four different options mean for the (number of) search results?
Find duplicates (including first one) - removes all items that are not duplicated: A, A, A, B, C => A, A, A
Find duplicates (except first one) - removes all items that are not duplicated and the first duplicated item: A, A, A, B, C => A, A
Find unique (including first duplicated) - removes all items that are duplicated (except the first one): A, A, A, B, C => A, B, C
Find unique (not duplicated) - removes all items that are duplicated: A, A, A, B, C => B, C
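
The four options can be modelled on a plain list as in the sketch below. This is only an illustration of the descriptions above, not how Everything implements them; the function names are invented for this example.

```python
from collections import Counter


def find_duplicates_including_first(items):
    """Keep every item whose value occurs more than once."""
    counts = Counter(items)
    return [x for x in items if counts[x] > 1]


def find_duplicates_except_first(items):
    """Keep duplicated items, but drop the first occurrence of each."""
    counts = Counter(items)
    seen = set()
    kept = []
    for x in items:
        if counts[x] > 1 and x in seen:
            kept.append(x)
        seen.add(x)
    return kept


def find_unique_including_first_duplicated(items):
    """Keep the first occurrence of every value."""
    seen = set()
    kept = []
    for x in items:
        if x not in seen:
            seen.add(x)
            kept.append(x)
    return kept


def find_unique_not_duplicated(items):
    """Keep only items whose value occurs exactly once."""
    counts = Counter(items)
    return [x for x in items if counts[x] == 1]


items = ["A", "A", "A", "B", "C"]
print(find_duplicates_including_first(items))         # ['A', 'A', 'A']
print(find_duplicates_except_first(items))            # ['A', 'A']
print(find_unique_including_first_duplicated(items))  # ['A', 'B', 'C']
print(find_unique_not_duplicated(items))              # ['B', 'C']
```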


if I search for extension-frequency e.g. like this:
*.* extension-frequency:>=333
then I don't get any search result
The extension-frequency: search function is currently not supported.
I'll add this in the next alpha update.

David.P
Posts: 75
Joined: Fri May 29, 2020 3:22 pm

Re: Search/sort by frequency of file extension

Post by David.P » Sun May 08, 2022 12:27 pm

Thanks very much, looking forward to it!

I find that this option in particular provides an extremely powerful, informative result about the most frequent file extensions on one's system:
void wrote:
Sun May 08, 2022 12:26 pm
Find unique (including first duplicated) -removes all items that are duplicated (except the first one): A, A, A, B, C => A, B, C

harryray2
Posts: 815
Joined: Sat Oct 15, 2016 9:56 am

Re: Search/sort by frequency of file extension

Post by harryray2 » Sun May 08, 2022 12:45 pm

Where does the dialogue with the four find-duplicates options come from?

David.P
Posts: 75
Joined: Fri May 29, 2020 3:22 pm

Re: Search/sort by frequency of file extension

Post by David.P » Sun May 08, 2022 12:55 pm

Right click on a column header (I believe any one will do), and then right click on "Find Date Created Duplicates"

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Sun May 08, 2022 1:05 pm

Holding down Shift and right-clicking the result list column header will also give more options.

harryray2
Posts: 815
Joined: Sat Oct 15, 2016 9:56 am

Re: Search/sort by frequency of file extension

Post by harryray2 » Sun May 08, 2022 1:31 pm

Thanks...the shift did it.

raccoon
Posts: 712
Joined: Thu Oct 18, 2018 1:24 am

Re: Shift-Rightclick Menus

Post by raccoon » Sun May 08, 2022 1:58 pm

@void: Would you be inclined to make Shift-revealed menu items displayed in Italics? IDM_ITALIC

David.P
Posts: 75
Joined: Fri May 29, 2020 3:22 pm

Re: Search/sort by frequency of file extension

Post by David.P » Sun May 08, 2022 3:28 pm

harryray2 wrote:
Sun May 08, 2022 1:31 pm
Thanks...the shift did it.
I see (except for line 2) identical context menus, both with and without Shift :o
Image
Last edited by David.P on Sun May 08, 2022 5:56 pm, edited 1 time in total.

David.P
Posts: 75
Joined: Fri May 29, 2020 3:22 pm

Re: Search/sort by frequency of file extension

Post by David.P » Sun May 08, 2022 3:57 pm

David.P wrote:
Sun May 08, 2022 12:27 pm
Thanks very much, looking forward to it!

I find that this option in particular provides an extremely powerful, informative result about the most frequent file extensions on one's system:
void wrote:
Sun May 08, 2022 12:26 pm
Find unique (including first duplicated) -removes all items that are duplicated (except the first one): A, A, A, B, C => A, B, C
This killer feature is.... oddly satisfying.
Image

I just wonder what the 2972 files of the half-baked type are good for :o

horst.epp
Posts: 750
Joined: Fri Apr 04, 2014 3:24 pm

Re: Search/sort by frequency of file extension

Post by horst.epp » Sun May 08, 2022 4:57 pm

And what would you do with the knowledge about all those file types?
This list is not relevant at all for using or administering a system.
The important extensions are a rather small list which mainly comes from basic Windows functions
and the installed software.
I was administering servers with a lot of users and terabytes of data
but I never needed a list like this.

raccoon
Posts: 712
Joined: Thu Oct 18, 2018 1:24 am

Re: Search/sort by frequency of file extension

Post by raccoon » Sun May 08, 2022 5:19 pm

-> Request: about:ext-survey -- File extension forum survey.
-> List of all file extensions used
horst.epp wrote:
Fri Jul 23, 2021 3:39 pm
Just out of interest, for what purpose do you need such a list?
Normally I'm only interested in a few extensions which I use daily.
I even have a folder with test files for these extensions to check preview and thumbnail functions.
But for what reason do I need to know all the other extensions?
I never needed such a thing and I don't see why anybody else should.

NotNull
Posts: 3727
Joined: Wed May 24, 2017 9:22 pm

Re: Search/sort by frequency of file extension

Post by NotNull » Sun May 08, 2022 9:51 pm

So ... what *is* the practical usefulness of this feature? I didn't get it from the linked thread either.
Or just curiosity? That I can relate to :).

(Not that everything needs to be practical; I remember enjoying watching my disks getting defragmented. Every single time .. )


But it is cool that even this is possible with Everything!

raccoon
Posts: 712
Joined: Thu Oct 18, 2018 1:24 am

Re: Search/sort by frequency of file extension

Post by raccoon » Sun May 08, 2022 11:28 pm

void wrote:
Sun May 08, 2022 11:30 am
Everything 1.5 will have an extension frequency column.

To show extensions by frequency:
  • Right click the result list column header and click Add columns....
  • Select for: ext
  • Select Extension Frequency.
  • Click OK.
  • Click the Extension Frequency column header to gather and sort by extension frequency.
To reduce the number of duplicated extension results:
  • Right click the Extension Frequency column header.
  • Right click Find Extension Frequency Duplicates
  • Click Find unique (including first duplicated).
@void: Using Find Unique on the Extension Frequency column discards all but one of any extensions that share the same frequency count. By contrast, using Find Unique on the Extension column itself resets all the Extension Frequency counts to 1. Is there any way to lock the Frequency count column so that it is not updated?

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Mon May 09, 2022 5:33 am

@void: By using Find Unique on the Extension Frequency column, this will discard two different extensions that have the same frequency count.
Yes, it's not 100% accurate when removing duplicates with this method.

For the next alpha update, Everything will only populate Extension Frequency once after sorting by Extension Frequency.
You'll be able to use F5 to clear this cache.
This way, you can sort by Extension Frequency to build the initial cache, find distinct extensions and then re-sort by Extension Frequency.

therube
Posts: 3449
Joined: Thu Sep 03, 2009 6:48 pm

Re: Search/sort by frequency of file extension

Post by therube » Wed May 11, 2022 7:14 pm

what *is* the practical usefulness of this feature?
Typo'd file extension, perhaps? (Perhaps.)

There have been times when I've mistyped a file extension.
There are some that I have a habit of doing, fairly regularly.
So I might search for said extension (ext:wrongo) so I can fix it.

Other times, I'll search for ext: to possibly find files that should have an extension, but were omitted (so I can add them).
Other times, I'll exclude known extensions to possibly find file names with mistyped extensions (so I can fix them).

Would I use Extension Frequency for that? Eh, probably not.

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Thu May 12, 2022 7:03 am

Everything 1.5.0.1313a will now only populate Extension Frequency once after sorting by Extension Frequency.

Everything 1.5.0.1313a adds an extension-frequency: search function.

To list extensions by frequency:
  • Right click the result list column header and click Add columns....
  • Select for: ext
  • Select Extension Frequency.
  • Click OK.
  • Right click the result list column header and click Add columns....
  • Select for: ext
  • Select Extension.
  • Click OK.
  • Click the Extension Frequency column header to gather and sort by extension frequency.
  • Right click the Extension column header.
  • Right click Find Extension Duplicates.
  • Click Find unique (including first duplicated).
  • Click the Extension Frequency column header to re-sort by extension frequency.
Press F5 to clear extension frequency cache.

NotNull
Posts: 3727
Joined: Wed May 24, 2017 9:22 pm

Re: Search/sort by frequency of file extension

Post by NotNull » Fri May 13, 2022 1:08 pm

The "lazy way": create bookmark for the following search

Code:

add-columns:Extension;"Extension Frequency"   distinct:Extension   sort:"Extension Frequency"

defza
Posts: 22
Joined: Thu Apr 18, 2019 12:49 pm

Re: Search/sort by frequency of file extension

Post by defza » Fri May 13, 2022 7:10 pm

NotNull wrote:
Fri May 13, 2022 1:08 pm
The "lazy way": create bookmark for the following search
You rather mean:

Code:

add-columns:Extension;"Extension Frequency"   distinct:"Extension Frequency" sort:"Extension Frequency"

NotNull
Posts: 3727
Joined: Wed May 24, 2017 9:22 pm

Re: Search/sort by frequency of file extension

Post by NotNull » Fri May 13, 2022 7:16 pm

distinct:"Extension" gives you one entry for each extension.

distinct:"Extension Frequency"
gives you one entry for each frequency.
So if there are 5 .txt files and 5 .jpg files, either .jpg or .txt will not be shown.
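
To make the difference concrete, here is a small Python model of the two `distinct:` variants. It is a sketch only: the sample data is hypothetical (5 .txt, 5 .jpg and 2 .png files), and `distinct_by` is an invented helper, not anything from Everything.

```python
from collections import Counter


def distinct_by(rows, key):
    """Keep the first row for each distinct key(row) value, in order."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# Hypothetical index: 5 .txt files, 5 .jpg files and 2 .png files.
extensions = ["txt"] * 5 + ["jpg"] * 5 + ["png"] * 2
freq = Counter(extensions)
rows = [(ext, freq[ext]) for ext in extensions]

by_extension = distinct_by(rows, key=lambda r: r[0])
by_frequency = distinct_by(rows, key=lambda r: r[1])

print(by_extension)  # [('txt', 5), ('jpg', 5), ('png', 2)]
print(by_frequency)  # [('txt', 5), ('png', 2)]  (jpg is lost)
```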

defza
Posts: 22
Joined: Thu Apr 18, 2019 12:49 pm

Re: Search/sort by frequency of file extension

Post by defza » Fri May 13, 2022 9:26 pm

NotNull wrote:
Fri May 13, 2022 7:16 pm
distinct:"Extension" gives you one entry for each extension.

distinct:"Extension Frequency"
gives you one entry for each frequency.
So if there are 5 .txt files and 5 .jpg files, either .jpg or .txt will not be shown.
I see, but your query just gives me a list of all extensions, with a frequency of 1 next to all of them.

NotNull
Posts: 3727
Joined: Wed May 24, 2017 9:22 pm

Re: Search/sort by frequency of file extension

Post by NotNull » Fri May 13, 2022 10:17 pm

With Everything version 1.5.0.1313a ?

defza
Posts: 22
Joined: Thu Apr 18, 2019 12:49 pm

Re: Search/sort by frequency of file extension

Post by defza » Fri May 13, 2022 10:20 pm

NotNull wrote:
Fri May 13, 2022 10:17 pm
With Everything version 1.5.0.1313a ?
Yes

NotNull
Posts: 3727
Joined: Wed May 24, 2017 9:22 pm

Re: Search/sort by frequency of file extension

Post by NotNull » Fri May 13, 2022 10:29 pm

Tested in a new, fresh instance and there I get the same (frequency = 1) results.
Time for some research .. :)

NotNull
Posts: 3727
Joined: Wed May 24, 2017 9:22 pm

Re: Search/sort by frequency of file extension

Post by NotNull » Fri May 13, 2022 11:29 pm

It looks like a timing/processing issue. This works as expected:

Code:

add-columns:;Extension;"Extension Frequency"  sort:"Extension Frequency"
Once the frequency has been calculated, adding the following to the search gives the correct results:

Code:

    distinct:Extension



The following bookmark sort of does what it should:

Code:

Search = distinct:"Extension Frequency" 
Columns=Name;Extension;Extension Frequency
Sort = Extension Frequency
.. but returns only ~200 file extensions instead of ~1700.



Conclusion: Don't use these bookmarks! :D

raccoon
Posts: 712
Joined: Thu Oct 18, 2018 1:24 am

Re: Search/sort by frequency of file extension

Post by raccoon » Sat May 14, 2022 12:45 am

I was able to use this feature just now to discover and delete 100,000 files at 200 MB (400 MB actual cluster space) using the faux extensions created from wget grabs of redundant index.html files. <html@C=S;O=A|html@C=N;O=D|html@C=N;O=A|html@C=M;O=D|html@C=M;O=A|html@C=S;O=D|html@C=D;O=A|html@C=D;O=D>

So there's something useful done. Nice and tidy.
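
For anyone wanting to locate such leftovers outside Everything, a rough Python sketch follows. It assumes the files end in exactly the faux extensions listed above (`html@C=<N|M|S|D>;O=<A|D>`); the names `SORT_VIEW_SUFFIX`, `is_sort_view_copy` and `find_sort_view_copies` are made up for this example. It only lists matches and deletes nothing.

```python
import os
import re

# Matches the faux extensions quoted above, e.g. "html@C=S;O=A".
# Assumption: C is one of N, M, S, D and O is A or D, as in that list.
SORT_VIEW_SUFFIX = re.compile(r"\.html@C=[NMSD];O=[AD]$")


def is_sort_view_copy(filename):
    """True if the file name ends in one of the sorted-listing suffixes."""
    return SORT_VIEW_SUFFIX.search(filename) is not None


def find_sort_view_copies(root):
    """Yield full paths of matching files under root (nothing is deleted)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_sort_view_copy(name):
                yield os.path.join(dirpath, name)
```

Reviewing the output of `find_sort_view_copies(root)` before deleting anything is the safer route.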

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Sat May 14, 2022 3:21 am

Everything currently uses the extension frequency for the current results. (not the entire index)

This is causing too much confusion.



The next alpha update will make the following changes:

Use the extension frequency from the entire index, not the current results.
Gather extension frequency immediately when showing the extension frequency column, searching for extension-frequency: or sorting by extension frequency.

The following will work as expected in the next alpha update:

Code:

add-columns:Extension;"Extension Frequency" distinct:"Extension" sort:"Extension Frequency"
Extension frequency will still only be gathered once.
Use F5 to refresh this cache.
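
The confusion described above (every extension showing a frequency of 1) can be reproduced with a toy model. The Python sketch below is not Everything's code; it uses a hypothetical six-file index and simply contrasts counting over the current results with counting over the entire index.

```python
from collections import Counter

# Hypothetical index: one entry per file, reduced to its extension.
index = ["txt", "txt", "txt", "jpg", "jpg", "png"]

# Model of distinct:extension: keep the first file of each extension.
results = list(dict.fromkeys(index))

# Counting within the current results: each extension appears once.
freq_from_results = Counter(results)
# Counting within the entire index: the real per-extension totals.
freq_from_index = Counter(index)

rows_from_results = sorted(
    ((ext, freq_from_results[ext]) for ext in results),
    key=lambda r: r[1],
    reverse=True,
)
rows_from_index = sorted(
    ((ext, freq_from_index[ext]) for ext in results),
    key=lambda r: r[1],
    reverse=True,
)

print(rows_from_results)  # [('txt', 1), ('jpg', 1), ('png', 1)]
print(rows_from_index)    # [('txt', 3), ('jpg', 2), ('png', 1)]
```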

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Tue May 17, 2022 6:50 am

Everything 1.5.0.1314a improves frequency properties:

Frequency property values are gathered immediately when showing a frequency column, searching for a frequency range or sorting by a frequency property.



Frequency property values are now calculated from the entire index.
Not the current results.



The following searches will now work as expected:
add-columns:extension;extension-frequency distinct:extension sort:extension-frequency
add-columns:size;size-frequency distinct:size sort:size-frequency
add-columns:name;name-frequency distinct:name sort:name-frequency



Frequency property values are gathered only once.
They are not updated in real-time.
Press F5 to update frequency property values.



I have put on my TODO list to add a regex-match1-frequency property.



extension-frequency:
name-frequency:
size-frequency:

void
Developer
Posts: 9608
Joined: Fri Oct 16, 2009 11:31 pm

Re: Search/sort by frequency of file extension

Post by void » Thu May 26, 2022 5:59 am

Everything 1.5.0.1315a improves add-columns: and columns:

These search functions will now clear any existing temporary columns when changing the search.

For example:
Changing the search from:
add-columns:extension-frequency
to:
add-columns:size-frequency
will no longer keep the extension-frequency column.


You can now also specify the insert position with :<insert-position>

For example, to add the size frequency column at position 1:
add-columns:size-frequency:1
