Delete duplicate lines, with partial match























Sample text:



This is first line.
This is over_second line.
This is over_fourth line.
This is third line.
This is over_fourth delete it.
This is over_fourth and one more.
This is over_second with another text.


I need to delete lines that partially match, i.e., if over_second occurs in another line, that whole line should be deleted. So the output should be as follows:



This is first line.
This is over_second line.
This is over_fourth line.
This is third line.


I could only come up with over_\w+ for selecting the relevant section of text, but I don't know how to recognize the duplicates and delete the whole line.
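One way to recognize the duplicates is to collect every distinct match of over_\w+ together with how often it occurs; any match with a count above 1 marks lines to delete. The following is a minimal Emacs Lisp sketch, not taken from any answer; the function name my-match-counts is hypothetical. Note that in a Lisp string the regexp over_\w+ is written "over_\\w+".

```elisp
(defun my-match-counts (regexp text)
  "Return an alist of (MATCH . COUNT) for each distinct REGEXP match in TEXT.
Hypothetical helper, for illustration only."
  (let ((counts '())
        (start 0))
    (save-match-data
      (while (and (<= start (length text))
                  (string-match regexp text start))
        (let* ((m (match-string 0 text))
               (cell (assoc m counts)))
          (if cell
              (setcdr cell (1+ (cdr cell)))   ; seen before: a duplicate
            (push (cons m 1) counts)))        ; first occurrence
        ;; Continue after this match; step forward by one on an empty match.
        (setq start (max (match-end 0) (1+ (match-beginning 0))))))
    (nreverse counts)))
```

Applied to the sample text, this should report over_second twice and over_fourth three times, in the order they first appear.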










asked Nov 29 at 22:03 by msinfo (edited Nov 29 at 22:43 by Drew)












  • Is the requirement "for each specific match for the regexp, retain only the first line in the buffer containing that text"?
    – phils
    Nov 29 at 22:11










  • Yes. The line with the first instance of a match should be preserved, while lines with subsequent matches should be deleted.
    – msinfo
    Nov 29 at 22:14










  • Once this process is complete for over_second, the same should be repeated for over_fourth.
    – msinfo
    Nov 29 at 22:16


















3 Answers

















          If you don't need to do this often, the quickest solution might be interactive replacement with query-replace-regexp, bound to C-M-%. Start with point at the top of the buffer.



          Note that to delete entire lines, you need to include a newline in your regexp. You enter it at the prompt with C-q C-j.



          So, for over_second, call C-M-%, then enter the regular expression:



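          C-q C-j.*over_second.*

          Here C-q C-j inserts a literal newline, so this regexp matches an entire line that contains the string over_second, together with the preceding newline. Because each match starts at the preceding newline, a matching line at the very top of the buffer is not matched at all; in that case the first highlighted match is already a duplicate.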



          Then enter the empty string (just type <enter>) for the replacement value.



          The first matching line should now be highlighted. This is the one you want to keep, so type n to tell Emacs to skip it. The next matching line is then highlighted; delete it by typing y (or <space>).



          You can keep typing y until all the matches are deleted, or you can type ! to delete all remaining matches at once.
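If you need to do this more than once, the same "skip the first match, delete the rest" steps can be scripted. This is a minimal sketch under a few assumptions: the helper name my-delete-later-lines-containing is hypothetical, STRING is matched literally (via regexp-quote) rather than as a regexp, and each match is anchored at the start of the line (^) instead of at the preceding newline, so a matching first line of the buffer is handled as well.

```elisp
(defun my-delete-later-lines-containing (string)
  "Delete every line that contains STRING, except the first such line.
Hypothetical helper mirroring the interactive steps: skip the first
match (like typing n), then delete the rest (like typing !)."
  (let ((re (concat "^.*" (regexp-quote string) ".*\n?")))
    (save-excursion
      (save-match-data
        (goto-char (point-min))
        ;; Find the first matching line and leave it in place.
        (when (re-search-forward re nil t)
          ;; Delete each later matching line, including its newline.
          (while (re-search-forward re nil t)
            (replace-match "" t t)))))))

;; Usage, repeating the process for each string as the question asks:
;; (dolist (s '("over_second" "over_fourth"))
;;   (my-delete-later-lines-containing s))
```

As with the interactive command, matching follows case-fold-search.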







          answered Nov 29 at 22:42 by Tyler






















              1. Try delete-duplicate-lines, which is part of the standard Emacs distribution.



              2. The Emacs Wiki page Duplicate Lines might help:




                • It points to a blog post about it.


                • It explains why interactive search-and-replace might not help.


                • It explains how to do it with Lisp, in various ways.


                • It explains how to do it with the UNIX / GNU/Linux commands sort and uniq.
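To check what delete-duplicate-lines does with input like the question's, here is a small sketch that runs it over a string in a temporary buffer (the wrapper name my-demo-delete-duplicate-lines is hypothetical). It removes lines identical to an earlier line, but leaves lines that merely share a substring.

```elisp
(defun my-demo-delete-duplicate-lines (text)
  "Return TEXT after running `delete-duplicate-lines' over all of it.
Hypothetical wrapper, for illustration only."
  (with-temp-buffer
    (insert text)
    ;; Keeps the first copy of each line, deletes later identical copies.
    (delete-duplicate-lines (point-min) (point-max))
    (buffer-string)))

;; (my-demo-delete-duplicate-lines "a\nb\na\n")
;;   removes the second "a" line.
;; (my-demo-delete-duplicate-lines "x over_second\ny over_second\n")
;;   changes nothing: the lines share over_second but are not identical.
```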










              answered Nov 29 at 22:42 by Drew












              • These are good suggestions, but OP was asking about deleting partial duplicates, i.e., lines that contain the same string, but aren't full duplicates of each other. I don't think delete-duplicate-lines does that?
                – Tyler
                Nov 29 at 22:48










              • @Tyler: Oh, right. Maybe the other stuff on that page will help in some way...
                – Drew
                Nov 30 at 0:40


















                  I could swear this is a duplicate, but I couldn't find it.



                  Try this:



                  (defun my-delete-duplicate-matches (regexp)
                    "Delete matching lines, except the first instance of each specific match."
                    (interactive (list (read-regexp "Regexp: ")))
                    (save-restriction
                      ;; Restrict the operation to the active region, if any.
                      (when (use-region-p)
                        (narrow-to-region (region-beginning) (region-end)))
                      (save-excursion
                        (goto-char (point-min))
                        ;; Match texts seen so far.
                        (let ((matches (make-hash-table :test #'equal)))
                          (save-match-data
                            (while (re-search-forward regexp nil :noerror)
                              (if (not (gethash (match-string 0) matches))
                                  ;; First occurrence of this text: remember it, keep the line.
                                  (puthash (match-string 0) t matches)
                                ;; Repeat occurrence: delete the whole line, newline included.
                                (forward-line 0)
                                (delete-region (point) (progn (forward-line 1)
                                                              (point))))))))))


                  Caveats:




                  • If the same matching text appears twice on the first line in which it is found, the line will be deleted.


                  • More generally, if a line contains both the first instance of a given match, and a repeat instance of an earlier match, the line will be deleted.


                  • Multi-line patterns are not supported.
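If the first caveat matters, one possible variant (a sketch, not the code above; the name my-delete-duplicate-matches-by-line is hypothetical) collects all matches on a line before deciding, and deletes the line only if one of them was already seen on an earlier line. It still deletes a line that mixes a new match with a repeated one (the second caveat), drops the region support, and, like the original, does not handle multi-line patterns.

```elisp
(defun my-delete-duplicate-matches-by-line (regexp)
  "Delete each line containing a REGEXP match already seen on an earlier line.
Hypothetical variant: all matches on a line are collected first, so text
repeated on the line where it first appears does not delete that line."
  (interactive (list (read-regexp "Regexp: ")))
  (save-excursion
    (save-match-data
      (goto-char (point-min))
      (let ((seen (make-hash-table :test #'equal)))
        (while (not (eobp))
          (let ((eol (line-end-position))
                (found '())
                (dup nil))
            ;; Collect every match on the current line.
            (while (and (< (point) eol)
                        (re-search-forward regexp eol t))
              (push (match-string 0) found)
              ;; Avoid looping forever on an empty match.
              (when (and (= (match-beginning 0) (match-end 0))
                         (< (point) eol))
                (forward-char 1)))
            ;; Was any of them already seen on an earlier line?
            (dolist (m found)
              (when (gethash m seen)
                (setq dup t)))
            (forward-line 0)
            (if dup
                ;; Delete the whole line, newline included.
                (delete-region (point) (progn (forward-line 1) (point)))
              ;; Keep the line and remember its matches.
              (dolist (m found)
                (puthash m t seen))
              (forward-line 1))))))))
```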








                  answered Nov 29 at 23:44 by phils (edited Nov 30 at 0:12)





























