Finding repetitions

Debugger
Posts: 530
Joined: Thu Jan 26, 2017 11:56 am

Finding repetitions

Post by Debugger » Sat Apr 13, 2019 9:54 pm

How do I find links (URLs) that repeat in a text file (on one line)?

NotNull
Posts: 1394
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull » Sat Apr 13, 2019 11:14 pm

(http[^\s]*)\s.*\1
(marks from first to last)

- or -

(http[^\s]*)\s(?=.*\1)
(marks first)
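
For anyone who wants to try the two patterns outside an editor, here is a minimal Python sketch. The sample line and its example URLs are made up for illustration, and Python's `re` module may differ in details from the regex engine in your editor.

```python
import re

# Hypothetical sample line: the first URL appears twice on the same line.
line = "see http://a.example/x and http://b.example/y then http://a.example/x again"

# Variant 1: the match runs from the first occurrence through the last repeat.
span = re.search(r"(http[^\s]*)\s.*\1", line)

# Variant 2: the lookahead only checks that the URL occurs again later,
# so the match covers just the first occurrence and the whitespace after it.
first = re.search(r"(http[^\s]*)\s(?=.*\1)", line)

print(repr(span.group(0)))
print(repr(first.group(0)))
```

Note that `\1` also matches when the captured URL is only the beginning of a longer one later on the line, so both variants can report false positives on such input.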

Debugger
Posts: 530
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger » Sun Apr 14, 2019 7:15 am

EmEditor - Neither regular expression works.


tuska
Posts: 183
Joined: Thu Jul 13, 2017 9:14 am

Re: Finding repetitions

Post by tuska » Sun Apr 14, 2019 9:51 am

Debugger wrote:
Sun Apr 14, 2019 7:15 am
EmEditor - Neither regular expression works.
That's not true!

Debugger
Posts: 530
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger » Sun Apr 14, 2019 10:17 am

That's not true!
That is a half-truth: it works only for your example, not for my text. :)
A line may contain just one URL, or several URLs with no duplicates, along with other text.
I mean links that repeat across the whole text file (MULTILINE).
So the regular expression cannot be valid as written.

Code:

Line 1: https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
I have no particular literary ambitions. During my long journey I have met many amazing, creative people from the world of cinema, theatre, literature and other professions, each of whom left an indelible mark on my memory and pleasant memories that I sometimes want to share. Everything I write about is a reflection of my experiences, emotions and mental state at a certain stage of life. If my stories touch someone even a little, that is a great happiness for me; otherwise I apologize, and besides, thank you for the time spent on my writings!
===
Line 2: https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
otherwise I apologize, and besides, thank you for the time spent on my writings!
===
Line 3: Similar
https://i.postimg.cc/2ycdjBSG/Screen-Sh ... -26-PM.jpg

tuska
Posts: 183
Joined: Thu Jul 13, 2017 9:14 am

Re: Finding repetitions

Post by tuska » Sun Apr 14, 2019 10:39 am

This is what I get in EmEditor:
Finding repetitions of Weblinks in several lines.png
Unfortunately I can't help you with this topic anyway, due to my lack of RegEx knowledge…
Ahh, just seeing that your requirements have changed.

Debugger
Posts: 530
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger » Sun Apr 14, 2019 10:55 am

tuska wrote:
Sun Apr 14, 2019 10:39 am
Ahh, just seeing that your requirements have changed.
No, they were only formulated differently.

NotNull
Posts: 1394
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull » Sun Apr 14, 2019 11:07 am

Thanks for testing, @tuska! (I have to admit that I didn't ...)

@debugger: If your URLs don't start with http and/or end with a space (\s), you have to adapt the regular expression, of course.
I assumed you would understand that (after a gazillion regex questions on this forum).

NotNull
Posts: 1394
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull » Sun Apr 14, 2019 11:10 am

Original question:
Debugger wrote:
Sat Apr 13, 2019 9:54 pm
How do I find links (URLs) that repeat in a text file (on one line)?
Changed to:
Debugger wrote:
Sun Apr 14, 2019 10:17 am
I mean repeating links in the whole text file (MULTILINE)
How's that NOT a different requirement?

Debugger
Posts: 530
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger » Sun Apr 14, 2019 11:28 am

The addresses do not end with a space; each address is on a separate line.
The regular expression needs a small modification so that these are found as well:
http://www.
https://www.
www.

(http|https):\/\/[\w\-_]+(\.[\w]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

But that does not help much, because it only matches links and does not find repeated ones. The regex needs an appropriate pattern for the repetition.
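
One possible way to handle repeats across the whole file is sketched below in Python. It shows two options: counting every URL, and a single regex that marks the first occurrence of a repeated URL across line breaks. The sample text, the `URL` pattern and all variable names are assumptions made for illustration; whether a given editor's regex engine accepts the same syntax is not verified here.

```python
import re
from collections import Counter

# Hypothetical sample, shaped like the text posted earlier in the thread.
text = """Line 1: https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
some text
===
Line 2: https://www.voidtools.com/forum/viewtopic.php?f=7&t=7656&p=25849#p25849
more text
===
Line 3: www.example.com/other
"""

# Assumption: a URL starts after whitespace (or at the start of the text)
# with http://, https:// or www. and runs up to the next whitespace.
URL = r"(?<!\S)(?:https?://|www\.)\S+"

# Option 1: count every URL and keep those seen more than once.
counts = Counter(re.findall(URL, text))
duplicates = [url for url, n in counts.items() if n > 1]

# Option 2: one regex. re.DOTALL lets '.' cross line breaks, and the
# (?<!\S) / (?!\S) guards require the repeat to be a whole URL rather
# than a fragment of a longer one.
repeated = re.compile(
    r"(" + URL + r")(?!\S)(?=.*?(?<!\S)\1(?!\S))", re.DOTALL
)
match = repeated.search(text)

print(duplicates)
print(match.group(1) if match else None)
```

The counting option is usually the easier one to reason about; the lookahead in the second option rescans the rest of the text for every candidate URL, so it may be slow on large files.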

Debugger
Posts: 530
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger » Sun Apr 14, 2019 6:04 pm

NotNull wrote:
Sat Apr 13, 2019 11:14 pm
(http[^\s]*)\s.*\1
(marks from first to last)

- or -

(http[^\s]*)\s(?=.*\1)
(marks first)

:shock: The author and developer of EmEditor told me that:
The regex does not make any sense

NotNull
Posts: 1394
Joined: Wed May 24, 2017 9:22 pm

Re: Finding repetitions

Post by NotNull » Sun Apr 14, 2019 6:21 pm

Then I suggest you ask your EmEditor questions over there from now on, instead of here.

BTW: Did you ask the exact same question? - "How do I find links (URLs) that repeat in a text file (on one line)?".
(No need to answer that, it is a rhetorical question).

Debugger
Posts: 530
Joined: Thu Jan 26, 2017 11:56 am

Re: Finding repetitions

Post by Debugger » Sun Apr 14, 2019 7:16 pm

NotNull - I can ask the author, but he will not answer for a very long time, so I cannot expect help from there. I can also ask other forum users who are familiar with this kind of thing. Every tip is valuable.


The only thing I could find with a Google search was "find duplicate url".
But that applies only to websites and has nothing to do with text files, which makes the task difficult.

I have to accept the fact that some duplicates cannot be found unless you are a wizard who can do everything. Most of my problems are not simple, although I always start by looking for similar questions with a Google search.
