Krdytkyu

Question

So basically what I want to do is compare two file by line by column 2. How could I accomplish this?

File_1.txt:

User1 US

User2 US

User3 US

File_2.txt:

User1 US

User2 US

User3 NG

Output_File:

User3 has changed

Use diff "File_1.txt" "File_2.txt"
– Pandya
Aug 25 '14 at 15:00 — Aug 25 '14 at 15:00
Also visit : askubuntu.com/q/12473
– Pandya
Aug 26 '14 at 1:56 — Aug 26 '14 at 1:56

score 77 · Accepted Answer · 2017-10-24 17:21:53Z

up vote
77
down vote

accepted

Look into the diff command. It's a good tool, and you can read all about it by typing man diff into your terminal.

The command you'll want to do is diff File_1.txt File_2.txt which will output the difference between the two and should look something like this:

enter image description here

A quick note on reading the output from the third command: The 'arrows' (< and >) refer to what the value of the line is in the left file (<) vs the right file (>), with the left file being the one you entered first on the command line, in this case File_1.txt

Additionally you might notice the 4th command is diff ... | tee Output_File this pipes the results from diff into a tee, which then puts that output into a file, so that you can save it for later if you don't want to view it all on the console right that second.

edited Oct 24 '17 at 17:21

answered Aug 25 '14 at 15:03

Mitch

3,0961531

Can this do other files (such as images)? Or is it limited to just documents?
– Gregory Opera
Aug 27 '14 at 13:10

2

As far as I know, it's limited to text files. Code will work, as it's essentially text, but any binary files (which pictures are) will just get junk out. You CAN compare to see if they're identical by doing: diff file1 file2 -s. Here's an example: imgur.com/ShrQx9x
– Mitch
Aug 27 '14 at 13:26

Is there a way to colorize the output? I'd like keeping it CLI-only, but with some more... human touch.
– Lazar Ljubenović
Nov 27 at 17:02

add a comment |

score 33 · Answer 2 · 2014-08-26 12:26:46Z

Or you can use Meld Diff

Meld helps you compare files, directories, and version controlled
projects. It provides two- and three-way comparison of both files and
directories, and has support for many popular version control systems.

Install by running:

sudo apt-get install meld

Your example:

enter image description here

Compare directory:

enter image description here

Example with full of text:

enter image description here

fedorqui 6,04611032 · Answer 3 · 2015-02-17 14:17:50Z

up vote
15
down vote

You can use vimdiff.

Example:

vimdiff  file1  file2

edited Feb 17 '15 at 14:17

fedorqui

6,04611032

answered Aug 28 '14 at 5:14

Mr. S

1512

this one has colors
– Jake Toronto
Jun 19 at 14:58

add a comment |

Meysam 2841518 · Answer 4 · 2014-08-26 06:54:29Z

up vote
8
down vote

Meld is a really great tool. But you can also use diffuse to visually compare two files:

diffuse file1.txt file2.txt

enter image description here

answered Aug 26 '14 at 6:54

Meysam

2841518

add a comment |

Mike Reardon 8111 · Answer 5 · 2015-05-22 19:37:38Z

FWIW, I rather like what I get with side-by-side output from diff

diff -y -W 120 File_1.txt File_2.txt

would give something like:

User1 US                            User1 US

User2 US                            User2 US

User3 US                          | User3 NG

Maythux 50.1k32164214 · Answer 6 · 2015-06-17 10:58:33Z

up vote
8
down vote

You can use the command cmp:

cmp -b "File_1.txt" "File_2.txt"

output would be

a b differ: byte 25, line 3 is 125 U 116 N

answered Jun 17 '15 at 10:58

Maythux

50.1k32164214

add a comment |

score 7 · Answer 7 · 2014-08-26 07:09:45Z

Litteraly sticking to the question (file1, file2, outputfile with "has changed" message) the script below works.

Copy the script into an empty file, save it as compare.py, make it executable, run it by the command:

/path/to/compare.py <file1> <file2> <outputfile>

The script:

#!/usr/bin/env python



import sys

file1 = sys.argv[1]; file2 = sys.argv[2]; outfile = sys.argv[3]



def readfile(file):

    with open(file) as compare:

        return [item.replace("n", "").split(" ") for item in compare.readlines()]



data1 = readfile(file1); data2 = readfile(file2)

mismatch = [item[0] for item in data1 if not item in data2]



with open(outfile, "wt") as out:

    for line in mismatch:

        out.write(line+" has changed"+"n")

With a few extra lines, you can make it either print to an outputfile, or to the terminal, depending on if the outputfile is defined:

To print to a file:

/path/to/compare.py <file1> <file2> <outputfile>

To print to the terminal window:

/path/to/compare.py <file1> <file2>

The script:

#!/usr/bin/env python



import sys



file1 = sys.argv[1]; file2 = sys.argv[2]

try:

    outfile = sys.argv[3]

except IndexError:

    outfile = None



def readfile(file):

    with open(file) as compare:

        return [item.replace("n", "").split(" ") for item in compare.readlines()]



data1 = readfile(file1); data2 = readfile(file2)

mismatch = [item[0] for item in data1 if not item in data2]



if outfile != None:

        with open(outfile, "wt") as out:

            for line in mismatch:

                out.write(line+" has changed"+"n")

else:

    for line in mismatch:

        print line+" has changed"

score 4 · Answer 8 · 2016-12-24 23:32:54Z

An easy way is to use colordiff, which behaves like diff but colorizes its output. This is very helpful for reading diffs. Using your example,

$ colordiff -u File_1.txt File_2.txt

--- File_1.txt  2016-12-24 17:59:17.409490554 -0500

+++ File_2.txt  2016-12-24 18:00:06.666719659 -0500

@@ -1,3 +1,3 @@

 User1 US

 User2 US

-User3 US

+User3 NG

where the u option gives a unified diff. This is how the colorized diff looks like:

enter image description here

Install colordiff by running sudo apt-get install colordiff.

If you want colors I find the diff built into vim to actually be easy to use, as in the answer by Mr.S — Nov 30 at 5:37

score 2 · Answer 9 · 2016-12-24 22:22:59Z

Additional answer

If there's no need to know what parts of the files differ, you can use checksum of the file. There's many ways to do that, using md5sum or sha256sum. Basically , each of them outputs a string to which a file contents hash. If the two files are the same, their hash will be the same as well. This is often used when you download software, such as Ubuntu installation iso images. They're often used for verifying integrity of a downloaded content.

Consider script below, where you can give two files as arguments, and the file will tell you if they are the same or not.

#!/bin/bash



# Check if both files exist  

if ! [ -e "$1"  ];

then

    printf "%s doesn't existn" "$1"

    exit 2

elif ! [ -e "$2" ]

then

    printf "%s doesn't existn" "$2"

    exit 2

fi



# Get checksums of eithe file

file1_sha=$( sha256sum "$1" | awk '{print $1}')

file2_sha=$( sha256sum "$2" | awk '{print $1}')



# Compare the checksums

if [ "x$file1_sha" = "x$file2_sha" ]

then

    printf "Files %s and %s are the samen" "$1" "$2"

    exit 0

else

    printf "Files %s and %s are differentn" "$1" "$2"

    exit 1

fi

Sample run:

$ ./compare_files.sh /etc/passwd ./passwd_copy.txt                                                                

Files /etc/passwd and ./passwd_copy.txt are the same

$ echo $?

0

$ ./compare_files.sh /etc/passwd /etc/default/grub                                                                

Files /etc/passwd and /etc/default/grub are different

$ echo $?

1

Older answer

In addition there is comm command, which compares two sorted files, and gives output in 3 colums : column 1 for items unique to file #1, column 2 for items unique to file #2, and column 3 for items present in both files.

To suppress either column you can use switches -1, -2 , and -3. Using -3 will show the lines that differ.

Bellow you can see the screenshot of the command in action.

enter image description here

There is just one requirement - the files must be sorted for them to be compared properly. sort command can be used for that purpose. Bellow is another screenshot , where files are sorted and then compared. Lines starting on the left bellong to File_1 only , lines starting on column 2 belong to File_2 only

enter image description here

@DavidFoerster it's kinda hard to be editing on mobile :) Done now, though — May 23 '15 at 9:17

Eric Korolev 658 · Answer 10 · 2018-11-09 14:24:40Z

up vote
1
down vote

Install git and use

$ git diff filename1 filename2

And you will get output in nice colored format

Git installation

$ apt-get update

$ apt-get install git-core

answered Nov 9 at 14:24

Eric Korolev

658

add a comment |

Jonathan 1136 · Answer 11 · 2018-11-30 01:33:39Z

colcmp.sh

Compares name/value pairs in 2 files in the format name valuen. Writes the name to Output_file if changed. Requires bash v4+ for associative arrays.

Usage

$ ./colcmp.sh File_1.txt File_2.txt

User3 changed from 'US' to 'NG'

no change: User1,User2

Output_File

$ cat Output_File

User3 has changed

Source (colcmp.sh)

cmp -s "$1" "$2"

case "$?" in

    0)

        echo "" > Output_File

        echo "files are identical"

        ;;

    1)

        echo "" > Output_File

        cp "$1" ~/.colcmp.array1.tmp.sh

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array1.tmp.sh

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.array1.tmp.sh

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A1\[\1\]="\2"/" ~/.colcmp.array1.tmp.sh

        chmod 755 ~/.colcmp.array1.tmp.sh

        declare -A A1

        source ~/.colcmp.array1.tmp.sh



        cp "$2" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A2\[\1\]="\2"/" ~/.colcmp.array2.tmp.sh

        chmod 755 ~/.colcmp.array2.tmp.sh

        declare -A A2

        source ~/.colcmp.array2.tmp.sh



        USERSWHODIDNOTCHANGE=

        for i in "${!A1[@]}"; do

            if [ "${A2[$i]+x}" = "" ]; then

                echo "$i was removed"

                echo "$i has changed" > Output_File

            fi

        done

        for i in "${!A2[@]}"; do

            if [ "${A1[$i]+x}" = "" ]; then

                echo "$i was added as '${A2[$i]}'"

                echo "$i has changed" > Output_File

            elif [ "${A1[$i]}" != "${A2[$i]}" ]; then

                echo "$i changed from '${A1[$i]}' to '${A2[$i]}'"

                echo "$i has changed" > Output_File

            else

                if [ x$USERSWHODIDNOTCHANGE != x ]; then

                    USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"

                fi

                USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"

            fi

        done

        if [ x$USERSWHODIDNOTCHANGE != x ]; then

            echo "no change: $USERSWHODIDNOTCHANGE"

        fi

        ;;

    *)

        echo "error: file not found, access denied, etc..."

        echo "usage: ./colcmp.sh File_1.txt File_2.txt"

        ;;

esac

Explanation

Breakdown of the code and what it means, to the best of my understanding. I welcome edits and suggestions.

Basic File Compare

cmp -s "$1" "$2"

case "$?" in

    0)

        # match

        ;;

    1)

        # compare

        ;;

    *)

        # error

        ;;

esac

cmp will set the value of $? as follows:

0 = files match

1 = files differ

2 = error

I chose to use a case..esac statement to evalute $? because the value of $? changes after every command, including test ([).

Alternatively I could have used a variable to hold the value of $?:

cmp -s "$1" "$2"

CMPRESULT=$?

if [ $CMPRESULT -eq 0 ]; then

    # match

elif [ $CMPRESULT -eq 1 ]; then

    # compare

else

    # error

fi

Above does the same thing as the case statement. IDK which I like better.

Clear the Output

        echo "" > Output_File

Above clears the output file so if no users changed, the output file will be empty.

I do this inside the case statements so that the Output_file remains unchanged on error.

Copy User File to Shell Script

        cp "$1" ~/.colcmp.arrays.tmp.sh

Above copies File_1.txt to the current user's home dir.

For example, if the current user is john, the above would be the same as cp "File_1.txt" /home/john/.colcmp.arrays.tmp.sh

Escape Special Characters

Basically, I'm paranoid. I know that these characters could have special meaning or execute an external program when run in a script as part of variable assignment:

` - back-tick - executes a program and the output as if the output were part of your script

$ - dollar sign - usually prefixes a variable

${} - allows for more complex variable substitution

$() - idk what this does but i think it can execute code

What I don't know is how much I don't know about bash. I don't know what other characters might have special meaning, but I want to escape them all with a backslash:

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array1.tmp.sh

sed can do a lot more than regular expression pattern matching. The script pattern "s/(find)/(replace)/" specifically performs the pattern match.

"s/(find)/(replace)/(modifiers)"

(find) = ([^A-Za-z0-9 ])
- () = capture group 1
- = match a character from a specific list of characters
- [^] = match any character NOT in a specific list of characters
- [^A-Za-z0-9 ] = match any character that is NOT a letter, digit or space

in english: capture any punctuation or special character as caputure group 1 (\1)

(replace) = \\\1
- \\ = literal character (\) i.e. a backslash
- \1 = capture group 1

in english: prefix all special characters with a backslash

(modifiers) = g
- g = globally replace

in english: if more than one match is found on the same line, replace them all

Comment Out the Entire Script

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.arrays.tmp.sh

Above uses a regular expression to prefix every line of ~/.colcmp.arrays.tmp.sh with a bash comment character (#). I do this because later I intend to execute ~/.colcmp.arrays.tmp.sh using the source command and because I don't know for sure the whole format of File_1.txt.

I don't want to accidentally execute arbitrary code. I don't think anyone does.

"s/(find)/(replace)/"

(find) = ^(.*)$
- ^ = beginning of a line
- () = capture group 1
- .* = anything
- $ = end of line

in english: capture each line as caputure group 1 (\1)

(replace) = #\1
- # = literal character (#) i.e. a pound symbol or hash
- \1 = capture group 1

in english: replace each line with a pound symbol followed by the line that was replaced

Convert User Value to A1[User]="value"

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A1\[\1\]="\2"/" ~/.colcmp.arrays.tmp.sh

Above is the core of this script.

convert this: #User1 US
- to this: A1[User1]="US"
- or this: A2[User1]="US" (for the 2nd file)

"s/(find)/(replace)/"

(find) = ^#\s*(\S+)\s+(\S.?)\s$
- ^ = beginning of a line
- # = literal character (#) i.e. a pound symbol or hash
- \s* - zero or more whitespace characters
- () = capture group 1
- \S+ - one or more NON-whitespace characters
- \s+ - one or more whitespace characters
- () = capture group 2
- \S - exactly one NON-whitespace character
- .*? = anything, non-greedy
- \s* - zero or more whitespace characters
- $ = end of line

in english:

require but ignore leading comment characters (#)

ignore leading whitespace

capture the first word as caputure group 1 (\1)

require a space (or tab, or whitespace)
- that will be replaced with an equals sign because
- it's not part of any capture group, and because
- the (replace) pattern puts an equals sign between capture group 1 and capture group 2

capture the rest of the line as capture group 2

(replace) = A1\[\1\]="\2"
- A1\[ - literal characters A1[ to start array assignment in an array called A1
- \1 = capture group 1 - which does not include the leading hash (#) and does not include leading whitespace - in this case capture group 1 is being used to set the name of the name/value pair in the bash associative array.
- \]=" = literal characters ]="
  - ] = close array assignment e.g. A1[User1]="US"
  - = = assignment operator e.g. variable=value
  - " = quote value to capture spaces ... although now that i think about it, it would have been easier to let the code above that backslashes everything to also backslash space characters.
- \1 = capture group 2 - in this case, the value of the name/value pair
- " = closing quote value to capture spaces

in english: replace each line in the format #name value with an array assignment operator in the format A1[name]="value"

Make Executable

        chmod 755 ~/.colcmp.arrays.tmp.sh

Above uses chmod to make the array script file executable.

I'm not sure if this is necessary.

Declare Associative Array (bash v4+)

        declare -A A1

The capital -A indicates that the variables declared will be associative arrays.

This is why the script requires bash v4 or greater.

Execute our Array Variable Assignment Script

        source ~/.colcmp.arrays.tmp.sh

We have already:

converted our file from lines of User value to lines of A1[User]="value",

made it executable (maybe), and

declared A1 as an associative array...

Above we source the script to run it in the current shell. We do this so we can keep the variable values that get set by the script. If you execute the script directly, it spawns a new shell, and the variable values are lost when the new shell exits, or at least that's my understanding.

This Should Be a Function

        cp "$2" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A2\[\1\]="\2"/" ~/.colcmp.array2.tmp.sh

        chmod 755 ~/.colcmp.array2.tmp.sh

        declare -A A2

        source ~/.colcmp.array2.tmp.sh

We do the same thing for $1 and A1 that we do for $2 and A2. It really should be a function. I think at this point this script is confusing enough and it works, so I'm not gonna fix it.

Detect Users Removed

        for i in "${!A1[@]}"; do

            # check for users removed

        done

Above loops through associative array keys

            if [ "${A2[$i]+x}" = "" ]; then

Above uses variable substitution to detect the difference between a value that is unset vs a variable that has been explicitly set to a zero length string.

Apparently, there are a lot of ways to see if a variable has been set. I chose the one with the most votes.

                echo "$i has changed" > Output_File

Above adds the user $i to the Output_File

Detect Users Added or Changed

        USERSWHODIDNOTCHANGE=

Above clears a variable so we can keep track of users that did not change.

        for i in "${!A2[@]}"; do

            # detect users added, changed and not changed

        done

Above loops through associative array keys

            if ! [ "${A1[$i]+x}" != "" ]; then

Above uses variable substitution to see if a variable has been set.

                echo "$i was added as '${A2[$i]}'"

Because $i is the array key (user name) $A2[$i] should return the value associated with the current user from File_2.txt.

For example, if $i is User1, the above reads as ${A2[User1]}

                echo "$i has changed" > Output_File

Above adds the user $i to the Output_File

            elif [ "${A1[$i]}" != "${A2[$i]}" ]; then

Because $i is the array key (user name) $A1[$i] should return the value associated with the current user from File_1.txt, and $A2[$i] should return the value from File_2.txt.

Above compares the associated values for user $i from both files..

                echo "$i has changed" > Output_File

Above adds the user $i to the Output_File

                if [ x$USERSWHODIDNOTCHANGE != x ]; then

                    USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"

                fi

                USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"

Above creates a comma separated list of users who did not change. Note there are no spaces in the list, or else the next check would need to be quoted.

        if [ x$USERSWHODIDNOTCHANGE != x ]; then

            echo "no change: $USERSWHODIDNOTCHANGE"

        fi

Above reports the value of $USERSWHODIDNOTCHANGE but only if there is a value in $USERSWHODIDNOTCHANGE. The way this is written, $USERSWHODIDNOTCHANGE cannot contain any spaces. If it does need spaces, above could be rewritten as follows:

        if [ "$USERSWHODIDNOTCHANGE" != "" ]; then

            echo "no change: $USERSWHODIDNOTCHANGE"

        fi

score 77 · Accepted Answer · 2017-10-24 17:21:53Z

up vote
77
down vote

accepted

Look into the diff command. It's a good tool, and you can read all about it by typing man diff into your terminal.

The command you'll want to do is diff File_1.txt File_2.txt which will output the difference between the two and should look something like this:

enter image description here

A quick note on reading the output from the third command: The 'arrows' (< and >) refer to what the value of the line is in the left file (<) vs the right file (>), with the left file being the one you entered first on the command line, in this case File_1.txt

Additionally you might notice the 4th command is diff ... | tee Output_File this pipes the results from diff into a tee, which then puts that output into a file, so that you can save it for later if you don't want to view it all on the console right that second.

edited Oct 24 '17 at 17:21

answered Aug 25 '14 at 15:03

Mitch

3,0961531

Can this do other files (such as images)? Or is it limited to just documents?
– Gregory Opera
Aug 27 '14 at 13:10

2

As far as I know, it's limited to text files. Code will work, as it's essentially text, but any binary files (which pictures are) will just get junk out. You CAN compare to see if they're identical by doing: diff file1 file2 -s. Here's an example: imgur.com/ShrQx9x
– Mitch
Aug 27 '14 at 13:26

Is there a way to colorize the output? I'd like keeping it CLI-only, but with some more... human touch.
– Lazar Ljubenović
Nov 27 at 17:02

add a comment |

score 33 · Answer 13 · 2014-08-26 12:26:46Z

Or you can use Meld Diff

Meld helps you compare files, directories, and version controlled
projects. It provides two- and three-way comparison of both files and
directories, and has support for many popular version control systems.

Install by running:

sudo apt-get install meld

Your example:

enter image description here

Compare directory:

enter image description here

Example with full of text:

enter image description here

fedorqui 6,04611032 · Answer 14 · 2015-02-17 14:17:50Z

up vote
15
down vote

You can use vimdiff.

Example:

vimdiff  file1  file2

edited Feb 17 '15 at 14:17

fedorqui

6,04611032

answered Aug 28 '14 at 5:14

Mr. S

1512

this one has colors
– Jake Toronto
Jun 19 at 14:58

add a comment |

Meysam 2841518 · Answer 15 · 2014-08-26 06:54:29Z

up vote
8
down vote

Meld is a really great tool. But you can also use diffuse to visually compare two files:

diffuse file1.txt file2.txt

enter image description here

answered Aug 26 '14 at 6:54

Meysam

2841518

add a comment |

Mike Reardon 8111 · Answer 16 · 2015-05-22 19:37:38Z

FWIW, I rather like what I get with side-by-side output from diff

diff -y -W 120 File_1.txt File_2.txt

would give something like:

User1 US                            User1 US

User2 US                            User2 US

User3 US                          | User3 NG

Maythux 50.1k32164214 · Answer 17 · 2015-06-17 10:58:33Z

up vote
8
down vote

You can use the command cmp:

cmp -b "File_1.txt" "File_2.txt"

output would be

a b differ: byte 25, line 3 is 125 U 116 N

answered Jun 17 '15 at 10:58

Maythux

50.1k32164214

add a comment |

score 7 · Answer 18 · 2014-08-26 07:09:45Z

Litteraly sticking to the question (file1, file2, outputfile with "has changed" message) the script below works.

Copy the script into an empty file, save it as compare.py, make it executable, run it by the command:

/path/to/compare.py <file1> <file2> <outputfile>

The script:

#!/usr/bin/env python



import sys

file1 = sys.argv[1]; file2 = sys.argv[2]; outfile = sys.argv[3]



def readfile(file):

    with open(file) as compare:

        return [item.replace("n", "").split(" ") for item in compare.readlines()]



data1 = readfile(file1); data2 = readfile(file2)

mismatch = [item[0] for item in data1 if not item in data2]



with open(outfile, "wt") as out:

    for line in mismatch:

        out.write(line+" has changed"+"n")

With a few extra lines, you can make it either print to an outputfile, or to the terminal, depending on if the outputfile is defined:

To print to a file:

/path/to/compare.py <file1> <file2> <outputfile>

To print to the terminal window:

/path/to/compare.py <file1> <file2>

The script:

#!/usr/bin/env python



import sys



file1 = sys.argv[1]; file2 = sys.argv[2]

try:

    outfile = sys.argv[3]

except IndexError:

    outfile = None



def readfile(file):

    with open(file) as compare:

        return [item.replace("n", "").split(" ") for item in compare.readlines()]



data1 = readfile(file1); data2 = readfile(file2)

mismatch = [item[0] for item in data1 if not item in data2]



if outfile != None:

        with open(outfile, "wt") as out:

            for line in mismatch:

                out.write(line+" has changed"+"n")

else:

    for line in mismatch:

        print line+" has changed"

score 4 · Answer 19 · 2016-12-24 23:32:54Z

An easy way is to use colordiff, which behaves like diff but colorizes its output. This is very helpful for reading diffs. Using your example,

$ colordiff -u File_1.txt File_2.txt

--- File_1.txt  2016-12-24 17:59:17.409490554 -0500

+++ File_2.txt  2016-12-24 18:00:06.666719659 -0500

@@ -1,3 +1,3 @@

 User1 US

 User2 US

-User3 US

+User3 NG

where the u option gives a unified diff. This is how the colorized diff looks like:

enter image description here

Install colordiff by running sudo apt-get install colordiff.

If you want colors I find the diff built into vim to actually be easy to use, as in the answer by Mr.S — Nov 30 at 5:37

score 2 · Answer 20 · 2016-12-24 22:22:59Z

Additional answer

If there's no need to know what parts of the files differ, you can use checksum of the file. There's many ways to do that, using md5sum or sha256sum. Basically , each of them outputs a string to which a file contents hash. If the two files are the same, their hash will be the same as well. This is often used when you download software, such as Ubuntu installation iso images. They're often used for verifying integrity of a downloaded content.

Consider script below, where you can give two files as arguments, and the file will tell you if they are the same or not.

#!/bin/bash



# Check if both files exist  

if ! [ -e "$1"  ];

then

    printf "%s doesn't existn" "$1"

    exit 2

elif ! [ -e "$2" ]

then

    printf "%s doesn't existn" "$2"

    exit 2

fi



# Get checksums of eithe file

file1_sha=$( sha256sum "$1" | awk '{print $1}')

file2_sha=$( sha256sum "$2" | awk '{print $1}')



# Compare the checksums

if [ "x$file1_sha" = "x$file2_sha" ]

then

    printf "Files %s and %s are the samen" "$1" "$2"

    exit 0

else

    printf "Files %s and %s are differentn" "$1" "$2"

    exit 1

fi

Sample run:

$ ./compare_files.sh /etc/passwd ./passwd_copy.txt                                                                

Files /etc/passwd and ./passwd_copy.txt are the same

$ echo $?

0

$ ./compare_files.sh /etc/passwd /etc/default/grub                                                                

Files /etc/passwd and /etc/default/grub are different

$ echo $?

1

Older answer

In addition there is comm command, which compares two sorted files, and gives output in 3 colums : column 1 for items unique to file #1, column 2 for items unique to file #2, and column 3 for items present in both files.

To suppress either column you can use switches -1, -2 , and -3. Using -3 will show the lines that differ.

Bellow you can see the screenshot of the command in action.

enter image description here

There is just one requirement - the files must be sorted for them to be compared properly. sort command can be used for that purpose. Bellow is another screenshot , where files are sorted and then compared. Lines starting on the left bellong to File_1 only , lines starting on column 2 belong to File_2 only

enter image description here

@DavidFoerster it's kinda hard to be editing on mobile :) Done now, though — May 23 '15 at 9:17

Eric Korolev 658 · Answer 21 · 2018-11-09 14:24:40Z

up vote
1
down vote

Install git and use

$ git diff filename1 filename2

And you will get output in nice colored format

Git installation

$ apt-get update

$ apt-get install git-core

answered Nov 9 at 14:24

Eric Korolev

658

add a comment |

Jonathan 1136 · Answer 22 · 2018-11-30 01:33:39Z

colcmp.sh

Compares name/value pairs in 2 files in the format name valuen. Writes the name to Output_file if changed. Requires bash v4+ for associative arrays.

Usage

$ ./colcmp.sh File_1.txt File_2.txt

User3 changed from 'US' to 'NG'

no change: User1,User2

Output_File

$ cat Output_File

User3 has changed

Source (colcmp.sh)

cmp -s "$1" "$2"

case "$?" in

    0)

        echo "" > Output_File

        echo "files are identical"

        ;;

    1)

        echo "" > Output_File

        cp "$1" ~/.colcmp.array1.tmp.sh

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array1.tmp.sh

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.array1.tmp.sh

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A1\[\1\]="\2"/" ~/.colcmp.array1.tmp.sh

        chmod 755 ~/.colcmp.array1.tmp.sh

        declare -A A1

        source ~/.colcmp.array1.tmp.sh



        cp "$2" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A2\[\1\]="\2"/" ~/.colcmp.array2.tmp.sh

        chmod 755 ~/.colcmp.array2.tmp.sh

        declare -A A2

        source ~/.colcmp.array2.tmp.sh



        USERSWHODIDNOTCHANGE=

        for i in "${!A1[@]}"; do

            if [ "${A2[$i]+x}" = "" ]; then

                echo "$i was removed"

                echo "$i has changed" > Output_File

            fi

        done

        for i in "${!A2[@]}"; do

            if [ "${A1[$i]+x}" = "" ]; then

                echo "$i was added as '${A2[$i]}'"

                echo "$i has changed" > Output_File

            elif [ "${A1[$i]}" != "${A2[$i]}" ]; then

                echo "$i changed from '${A1[$i]}' to '${A2[$i]}'"

                echo "$i has changed" > Output_File

            else

                if [ x$USERSWHODIDNOTCHANGE != x ]; then

                    USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"

                fi

                USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"

            fi

        done

        if [ x$USERSWHODIDNOTCHANGE != x ]; then

            echo "no change: $USERSWHODIDNOTCHANGE"

        fi

        ;;

    *)

        echo "error: file not found, access denied, etc..."

        echo "usage: ./colcmp.sh File_1.txt File_2.txt"

        ;;

esac

Explanation

Breakdown of the code and what it means, to the best of my understanding. I welcome edits and suggestions.

Basic File Compare

cmp -s "$1" "$2"

case "$?" in

    0)

        # match

        ;;

    1)

        # compare

        ;;

    *)

        # error

        ;;

esac

cmp will set the value of $? as follows:

0 = files match

1 = files differ

2 = error

I chose to use a case..esac statement to evalute $? because the value of $? changes after every command, including test ([).

Alternatively I could have used a variable to hold the value of $?:

cmp -s "$1" "$2"

CMPRESULT=$?

if [ $CMPRESULT -eq 0 ]; then

    # match

elif [ $CMPRESULT -eq 1 ]; then

    # compare

else

    # error

fi

Above does the same thing as the case statement. IDK which I like better.

Clear the Output

        echo "" > Output_File

Above clears the output file so if no users changed, the output file will be empty.

I do this inside the case statements so that the Output_file remains unchanged on error.

Copy User File to Shell Script

        cp "$1" ~/.colcmp.arrays.tmp.sh

Above copies File_1.txt to the current user's home dir.

For example, if the current user is john, the above would be the same as cp "File_1.txt" /home/john/.colcmp.arrays.tmp.sh

Escape Special Characters

Basically, I'm paranoid. I know that these characters could have special meaning or execute an external program when run in a script as part of variable assignment:

` - back-tick - executes a program and the output as if the output were part of your script

$ - dollar sign - usually prefixes a variable

${} - allows for more complex variable substitution

$() - idk what this does but i think it can execute code

What I don't know is how much I don't know about bash. I don't know what other characters might have special meaning, but I want to escape them all with a backslash:

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array1.tmp.sh

sed can do a lot more than regular expression pattern matching. The script pattern "s/(find)/(replace)/" specifically performs the pattern match.

"s/(find)/(replace)/(modifiers)"

(find) = ([^A-Za-z0-9 ])
- () = capture group 1
- = match a character from a specific list of characters
- [^] = match any character NOT in a specific list of characters
- [^A-Za-z0-9 ] = match any character that is NOT a letter, digit or space

in english: capture any punctuation or special character as caputure group 1 (\1)

(replace) = \\\1
- \\ = literal character (\) i.e. a backslash
- \1 = capture group 1

in english: prefix all special characters with a backslash

(modifiers) = g
- g = globally replace

in english: if more than one match is found on the same line, replace them all

Comment Out the Entire Script

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.arrays.tmp.sh

Above uses a regular expression to prefix every line of ~/.colcmp.arrays.tmp.sh with a bash comment character (#). I do this because later I intend to execute ~/.colcmp.arrays.tmp.sh using the source command and because I don't know for sure the whole format of File_1.txt.

I don't want to accidentally execute arbitrary code. I don't think anyone does.

"s/(find)/(replace)/"

(find) = ^(.*)$
- ^ = beginning of a line
- () = capture group 1
- .* = anything
- $ = end of line

in english: capture each line as caputure group 1 (\1)

(replace) = #\1
- # = literal character (#) i.e. a pound symbol or hash
- \1 = capture group 1

in english: replace each line with a pound symbol followed by the line that was replaced

Convert User Value to A1[User]="value"

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A1\[\1\]="\2"/" ~/.colcmp.arrays.tmp.sh

Above is the core of this script.

convert this: #User1 US
- to this: A1[User1]="US"
- or this: A2[User1]="US" (for the 2nd file)

"s/(find)/(replace)/"

(find) = ^#\s*(\S+)\s+(\S.?)\s$
- ^ = beginning of a line
- # = literal character (#) i.e. a pound symbol or hash
- \s* - zero or more whitespace characters
- () = capture group 1
- \S+ - one or more NON-whitespace characters
- \s+ - one or more whitespace characters
- () = capture group 2
- \S - exactly one NON-whitespace character
- .*? = anything, non-greedy
- \s* - zero or more whitespace characters
- $ = end of line

in english:

require but ignore leading comment characters (#)

ignore leading whitespace

capture the first word as caputure group 1 (\1)

require a space (or tab, or whitespace)
- that will be replaced with an equals sign because
- it's not part of any capture group, and because
- the (replace) pattern puts an equals sign between capture group 1 and capture group 2

capture the rest of the line as capture group 2

(replace) = A1\[\1\]="\2"
- A1\[ - literal characters A1[ to start array assignment in an array called A1
- \1 = capture group 1 - which does not include the leading hash (#) and does not include leading whitespace - in this case capture group 1 is being used to set the name of the name/value pair in the bash associative array.
- \]=" = literal characters ]="
  - ] = close array assignment e.g. A1[User1]="US"
  - = = assignment operator e.g. variable=value
  - " = quote value to capture spaces ... although now that i think about it, it would have been easier to let the code above that backslashes everything to also backslash space characters.
- \1 = capture group 2 - in this case, the value of the name/value pair
- " = closing quote value to capture spaces

in english: replace each line in the format #name value with an array assignment operator in the format A1[name]="value"

Make Executable

        chmod 755 ~/.colcmp.arrays.tmp.sh

Above uses chmod to make the array script file executable.

I'm not sure if this is necessary.

Declare Associative Array (bash v4+)

        declare -A A1

The capital -A indicates that the variables declared will be associative arrays.

This is why the script requires bash v4 or greater.

Execute our Array Variable Assignment Script

        source ~/.colcmp.arrays.tmp.sh

We have already:

converted our file from lines of User value to lines of A1[User]="value",

made it executable (maybe), and

declared A1 as an associative array...

Above we source the script to run it in the current shell. We do this so we can keep the variable values that get set by the script. If you execute the script directly, it spawns a new shell, and the variable values are lost when the new shell exits, or at least that's my understanding.

This Should Be a Function

        cp "$2" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/([^A-Za-z0-9 ])/\\\1/g" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^(.*)$/#\1/" ~/.colcmp.array2.tmp.sh

        sed -i -E "s/^#\s*(\S+)\s+(\S.*?)\s*$/A2\[\1\]="\2"/" ~/.colcmp.array2.tmp.sh

        chmod 755 ~/.colcmp.array2.tmp.sh

        declare -A A2

        source ~/.colcmp.array2.tmp.sh

We do the same thing for $1 and A1 that we do for $2 and A2. It really should be a function. I think at this point this script is confusing enough and it works, so I'm not gonna fix it.

Detect Users Removed

        for i in "${!A1[@]}"; do

            # check for users removed

        done

Above loops through associative array keys

            if [ "${A2[$i]+x}" = "" ]; then

Above uses variable substitution to detect the difference between a value that is unset vs a variable that has been explicitly set to a zero length string.

Apparently, there are a lot of ways to see if a variable has been set. I chose the one with the most votes.

                echo "$i has changed" > Output_File

Above adds the user $i to the Output_File

Detect Users Added or Changed

        USERSWHODIDNOTCHANGE=

Above clears a variable so we can keep track of users that did not change.

        for i in "${!A2[@]}"; do

            # detect users added, changed and not changed

        done

Above loops through associative array keys

            if ! [ "${A1[$i]+x}" != "" ]; then

Above uses variable substitution to see if a variable has been set.

                echo "$i was added as '${A2[$i]}'"

Because $i is the array key (user name) $A2[$i] should return the value associated with the current user from File_2.txt.

For example, if $i is User1, the above reads as ${A2[User1]}

                echo "$i has changed" > Output_File

Above adds the user $i to the Output_File

            elif [ "${A1[$i]}" != "${A2[$i]}" ]; then

Because $i is the array key (user name) $A1[$i] should return the value associated with the current user from File_1.txt, and $A2[$i] should return the value from File_2.txt.

Above compares the associated values for user $i from both files..

                echo "$i has changed" > Output_File

Above adds the user $i to the Output_File

                if [ x$USERSWHODIDNOTCHANGE != x ]; then

                    USERSWHODIDNOTCHANGE=",$USERSWHODIDNOTCHANGE"

                fi

                USERSWHODIDNOTCHANGE="$i$USERSWHODIDNOTCHANGE"

Above creates a comma separated list of users who did not change. Note there are no spaces in the list, or else the next check would need to be quoted.

        if [ x$USERSWHODIDNOTCHANGE != x ]; then

            echo "no change: $USERSWHODIDNOTCHANGE"

        fi

Above reports the value of $USERSWHODIDNOTCHANGE but only if there is a value in $USERSWHODIDNOTCHANGE. The way this is written, $USERSWHODIDNOTCHANGE cannot contain any spaces. If it does need spaces, above could be rewritten as follows:

        if [ "$USERSWHODIDNOTCHANGE" != "" ]; then

            echo "no change: $USERSWHODIDNOTCHANGE"

        fi

How to compare two files

11 Answers 11

Additional answer

Older answer

colcmp.sh

Usage

Output_File

Source (colcmp.sh)

Explanation

Basic File Compare

Clear the Output

Copy User File to Shell Script

Escape Special Characters

Comment Out the Entire Script

Convert User Value to A1[User]="value"

Make Executable

Declare Associative Array (bash v4+)

Execute our Array Variable Assignment Script

This Should Be a Function

Detect Users Removed

Detect Users Added or Changed

Your Answer

Sign up or log in

Post as a guest

Post as a guest

11 Answers 11

11 Answers 11

Additional answer

Older answer

Additional answer

Older answer

Additional answer

Older answer

Additional answer

Older answer

colcmp.sh

Usage

Output_File

Source (colcmp.sh)

Explanation

Basic File Compare

Clear the Output

Copy User File to Shell Script

Escape Special Characters

Comment Out the Entire Script

Convert User Value to A1[User]="value"

Make Executable

Declare Associative Array (bash v4+)

Execute our Array Variable Assignment Script

This Should Be a Function

Detect Users Removed

Detect Users Added or Changed

colcmp.sh

Usage

Output_File

Source (colcmp.sh)

Explanation

Basic File Compare

Clear the Output

Copy User File to Shell Script

Escape Special Characters

Comment Out the Entire Script

Convert User Value to A1[User]="value"

Make Executable

Declare Associative Array (bash v4+)

Execute our Array Variable Assignment Script

This Should Be a Function

Detect Users Removed

Detect Users Added or Changed

colcmp.sh

Usage

Output_File

Source (colcmp.sh)

Explanation

Basic File Compare

Clear the Output

Copy User File to Shell Script

Escape Special Characters

Comment Out the Entire Script

Convert User Value to A1[User]="value"

11 Answers
11

11 Answers
11

11 Answers
11