Delete duplicate lines, with partial match
Sample text:

    This is first line.
    This is over_second line.
    This is over_fourth line.
    This is third line.
    This is over_fourth delete it.
    This is over_fourth and one more.
    This is over_second with another text.

I need to delete lines where a partial match occurs, i.e. if over_second already occurs in an earlier line, then the whole later line should get deleted. So the output will be as follows:

    This is first line.
    This is over_second line.
    This is over_fourth line.
    This is third line.

I could only come up with over_\w+ for selecting the relevant section of text, but I don't know how to recognize duplicates and delete the whole line.
Tags: deletion, lines
asked Nov 29 at 22:03 by msinfo
edited Nov 29 at 22:43 by Drew
Is the requirement "for each specific match for the regexp, retain only the first line in the buffer containing that text"?
– phils, Nov 29 at 22:11
Yes. The line with the first instance of a match should be preserved, while lines with subsequent matches should get deleted.
– msinfo, Nov 29 at 22:14
Once this process is complete for over_second, the same should be repeated for over_fourth.
– msinfo, Nov 29 at 22:16
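The requirement confirmed in the comments can be written down as a small Emacs Lisp sketch that works on a list of strings rather than on a buffer. The name my-first-match-lines is hypothetical, and the sketch assumes that only the first match on each line matters.

```elisp
;; Sketch only: `my-first-match-lines' is a hypothetical helper name.
(defun my-first-match-lines (regexp lines)
  "Return LINES without those whose REGEXP match occurred on an earlier line.
Only the first match on each line is considered (an assumption)."
  (let ((seen '())
        (kept '()))
    (dolist (line lines)
      (let ((key (and (string-match regexp line)
                      (match-string 0 line))))
        (cond
         ;; No match at all: always keep the line.
         ((null key) (push line kept))
         ;; First time this matched text appears: remember it, keep the line.
         ((not (member key seen))
          (push key seen)
          (push line kept)))))
    ;; Lines whose match was already seen were skipped above.
    (nreverse kept)))

(my-first-match-lines
 "over_\\w+"
 '("This is first line."
   "This is over_second line."
   "This is over_fourth line."
   "This is third line."
   "This is over_fourth delete it."
   "This is over_fourth and one more."
   "This is over_second with another text."))
```

For the sample text, the call should return the four lines that contain no repeated match, in their original order.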
3 Answers
If you don't need to do this all the time, the quickest solution might be interactive replacement with query-replace-regexp, bound to C-M-%. Start with point at the top of the buffer.

Note that if you want to delete entire lines, you'll need to include a newline in your regexp. You enter this at the prompt with C-q C-j.

So, for over_second, call C-M-%, then enter the regular expression:

    C-q C-j .*over_second.*

That is, type C-q C-j to insert the newline, immediately followed by .*over_second.* with no space in between. This will match an entire line that contains the string over_second, including the preceding newline.

Then enter the empty string (just type <enter>) for the replacement value.

The first line that matches the regexp should now be highlighted. This is the one you want to keep, so type n to tell Emacs to skip it. The next matching line will then be highlighted. You can delete it by typing y (or <space>).

You can keep typing y until all the matches are deleted, or you can type ! to delete all remaining matches at once.
answered Nov 29 at 22:42 by Tyler
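For a single known string, the same keep-the-first, delete-the-rest effect can probably be had without answering prompts. The following is a minimal sketch, not part of the answer above: the command name my-keep-first-matching-line is hypothetical, and it relies on the standard flush-lines function being given an explicit start and end position.

```elisp
;; Sketch only: `my-keep-first-matching-line' is a hypothetical name.
(defun my-keep-first-matching-line (regexp)
  "Keep the first line matching REGEXP and delete all later matching lines."
  (interactive (list (read-regexp "Keep first line matching regexp: ")))
  (save-excursion
    (goto-char (point-min))
    ;; Find the first match; its line is the one to keep.
    (when (re-search-forward regexp nil t)
      ;; Step past the kept line, then flush matches in the rest of the buffer.
      (forward-line 1)
      (flush-lines regexp (point) (point-max)))))
```

As with the interactive approach, it has to be run once per string, for example with over_second and then again with over_fourth. No newline is needed in the regexp here, because flush-lines already removes whole lines.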
Try delete-duplicate-lines, which is part of the standard Emacs distribution.

The Emacs Wiki page Duplicate Lines might help:
- It points to a blog post about it.
- It explains why interactive search-and-replace might not help.
- It explains how to do it with Lisp, in various ways.
- It explains how to do it with the UNIX / GNU/Linux commands sort or uniq.
answered Nov 29 at 22:42 by Drew
These are good suggestions, but the OP was asking about deleting partial duplicates, i.e., lines that contain the same string but aren't full duplicates of each other. I don't think delete-duplicate-lines does that?
– Tyler, Nov 29 at 22:48
@Tyler: Oh, right. Maybe the other stuff on that page will help in some way...
– Drew, Nov 30 at 0:40
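The point raised in the comments can be checked with a short sketch. The function name my-exact-duplicates-demo is hypothetical; it only calls the standard delete-duplicate-lines on a temporary buffer holding one exact duplicate and one partial duplicate.

```elisp
;; Sketch only: `my-exact-duplicates-demo' is a hypothetical name.
(defun my-exact-duplicates-demo ()
  "Return sample text after `delete-duplicate-lines' has processed it."
  (with-temp-buffer
    (insert "This is over_second line.\n"
            "This is over_second with another text.\n"
            "This is over_second line.\n")
    ;; Only the third line is an exact copy of an earlier line.
    (delete-duplicate-lines (point-min) (point-max))
    (buffer-string)))

(my-exact-duplicates-demo)
```

The exact copy on the third line should be removed, while the second line, which merely shares the string over_second, should remain. That is why this command alone does not cover the partial-match case.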
I could swear this is a duplicate, but I couldn't find it.

Try this:

    (defun my-delete-duplicate-matches (regexp)
      "Delete matching lines, except the first instance of each specific match."
      (interactive (list (read-regexp "Regexp: ")))
      (save-restriction
        ;; Act on the active region if there is one.
        (when (use-region-p)
          (narrow-to-region (region-beginning) (region-end)))
        (save-excursion
          (goto-char (point-min))
          ;; Matched texts seen so far, compared with `equal'.
          (let ((matches (make-hash-table :test #'equal)))
            (save-match-data
              (while (re-search-forward regexp nil :noerror)
                (if (not (gethash (match-string 0) matches))
                    ;; First occurrence of this text: remember it.
                    (puthash (match-string 0) t matches)
                  ;; Repeat occurrence: delete the whole current line.
                  (forward-line 0)
                  (delete-region (point) (progn (forward-line 1)
                                                (point))))))))))

Caveats:
- If the same matching text appears twice on the first line in which it is found, the line will be deleted.
- More generally, if a line contains both the first instance of a given match and a repeat instance of an earlier match, the line will be deleted.
- Multi-line patterns are not supported.
answered Nov 29 at 23:44 by phils
edited Nov 30 at 0:12
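The first caveat can be avoided by deciding line by line instead of match by match. The following is a minimal sketch of that idea, not the answer's own code: the name my-delete-lines-repeating-match is hypothetical, region narrowing is left out for brevity, and matches on a deleted line are deliberately not recorded (an assumption about the desired behaviour).

```elisp
;; Sketch only: `my-delete-lines-repeating-match' is a hypothetical name.
(defun my-delete-lines-repeating-match (regexp)
  "Delete lines containing a REGEXP match already seen on an earlier line.
A match repeated within the line where it first occurs does not cause
that line to be deleted."
  (interactive (list (read-regexp "Regexp: ")))
  (save-excursion
    (goto-char (point-min))
    (let ((seen (make-hash-table :test #'equal)))
      (while (not (eobp))
        (let ((eol (line-end-position))
              (found '())
              (repeat nil))
          ;; Collect every match on the current line before deciding.
          (while (and (< (point) eol)
                      (re-search-forward regexp eol t))
            (let ((text (match-string-no-properties 0)))
              (push text found)
              (when (gethash text seen)
                (setq repeat t)))
            ;; Step over an empty match so the loop always advances.
            (when (and (= (match-beginning 0) (match-end 0))
                       (< (point) eol))
              (forward-char 1)))
          (if repeat
              ;; Some match was seen on an earlier line: delete this line.
              (delete-region (line-beginning-position)
                             (progn (forward-line 1) (point)))
            ;; Otherwise remember this line's matches and move on.
            (dolist (text found)
              (puthash text t seen))
            (forward-line 1)))))))
```

A line is still deleted when it mixes a new match with a repeat of an earlier one, so the second caveat applies to this sketch as well, and multi-line patterns remain unsupported because each search is bounded by the end of the current line.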