Count string occurrences in pandas raw data row

Score: 7

I have a CSV file like this:

    name,age
    something
    tom,20

When I load it into a DataFrame, it looks like this:

    import pandas as pd

    df = pd.read_csv('file', header=None)

               0    1
    0       name  age
    1  something  NaN
    2        tom   20

How can I count the commas in each raw row of the file? For example, the result should look like this:

    # in pseudocode
    df['_count_separators'] = df.raw_value.count(',')

               0    1  _count_separators
    0       name  age                  1
    1  something  NaN                  0
    2        tom   20                  1

Asked by Henry H; edited by coldspeed

Do you also want to count commas that appear inside a column value?
– Omkar Sabade

@OmkarSabade Preferably just the number of separators that pandas inferred, but either way is acceptable.
– David L

Tags: python, python-3.x, pandas, csv, dataframe


4 Answers

Score: 3

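Read the file a second time with a separator that never occurs in the data, so that each raw line lands in a single cell, then count the commas in that cell:

    import pandas as pd

    df = pd.read_csv('file', header=None)

    # re-read with a separator that never appears in the data ('|' here),
    # so each raw line becomes one string in column 0
    df2 = pd.read_csv('file', header=None, sep='|')

    # with header=None the column label is the integer 0, not the string '0'
    df2[0].str.findall(',').str.len()

    0    1
    1    0
    2    1
    3    5
    Name: 0, dtype: int64

    df['_count_separators'] = df2[0].str.findall(',').str.len()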

Data:

    name,age
    something
    tom,20
    something,,,,,somethingelse
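The approach above can be checked without a file. This is a minimal sketch that assumes the four-line sample under "Data", uses io.StringIO in place of 'file', and assumes '|' never appears in the data. It also shows that Series.str.count(',') returns the same numbers as str.findall(',').str.len().

```python
import io

import pandas as pd

# The four-line sample from this answer, standing in for 'file'
text = "name,age\nsomething\ntom,20\nsomething,,,,,somethingelse\n"

# Re-read with a separator assumed never to occur in the data ('|'),
# so each raw line ends up as a single string in column 0
raw = pd.read_csv(io.StringIO(text), header=None, sep='|')

# Two equivalent ways to count the commas in each raw line
via_findall = raw[0].str.findall(',').str.len()
via_count = raw[0].str.count(',')

print(via_count.tolist())  # [1, 0, 1, 5]
```

str.count avoids building an intermediate list of matches for every row, which is why it is the simpler choice here.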

Answered by W-B


Score: 3

Read the data as a single-column Series, split each line on commas, and concatenate the result with the per-line separator count.

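    import io
    import pandas as pd

    # s = pd.read_csv(io.StringIO(text), sep='|', header=None).squeeze('columns')
    # (the squeeze=True argument to read_csv was removed in pandas 2.0)
    s = pd.read_csv('/path/to/file.csv', sep='|', header=None).squeeze('columns')

    df = pd.concat([
        s.str.split(',', expand=True),          # one column per comma-separated field
        s.str.count(',').rename('_count_sep'),  # number of commas in each raw line
    ], axis=1)

    df

               0     1  _count_sep
    0       name   age           1
    1  something  None           0
    2        tom    20           1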
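s.str.count(',') counts every comma on the line, including commas inside quoted fields such as "smith, tom". If you only want the separators a CSV parser would actually split on (the preference stated in the question's comments), one option is to parse each line with Python's standard-library csv module and subtract one from the field count. This is a swapped-in technique rather than part of the answer above, and the sample text is made up for illustration.

```python
import csv

# Hypothetical sample: the last line has a comma inside a quoted field
text = 'name,age\nsomething\ntom,20\n"smith, tom",30\n'

raw_counts = []
parsed_counts = []
for line in text.splitlines():
    # Raw count: every comma on the line, quoted or not
    raw_counts.append(line.count(','))
    # Parsed count: fields seen by the csv module, minus one
    fields = next(csv.reader([line]))
    parsed_counts.append(len(fields) - 1)

print(raw_counts)     # [1, 0, 1, 2]
print(parsed_counts)  # [1, 0, 1, 1]
```

Because the sketch splits the text into lines before parsing, it does not handle quoted fields that contain line breaks.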

Answered by coldspeed

We're on the same road :-) Cheers
– W-B

@W-B Yup, didn't see yours until I posted... great minds, huh? ;)
– coldspeed

I read your mind, hahahaha :-)
– W-B

But I learned about str.count :-) Thanks, man
– W-B

Your answers stopped me from thinking otherwise
– Dark


Score: 0

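Try the code below:

    import pandas as pd

    df = pd.read_csv('file', header=None)

    # count() gives the number of non-NaN fields in each row; a row with n fields
    # has n - 1 separators (empty fields are read as NaN and are not counted)
    df['_count_separators'] = df.count(axis='columns') - 1

    print(df)

Output:

               0    1  _count_separators
    0       name  age                  1
    1  something  NaN                  0
    2        tom   20                  1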
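Because count() skips NaN, this approach counts non-empty fields, so it undercounts rows that contain an empty field. A minimal sketch of that edge case, assuming a made-up sample whose last line is tom,,20 and using io.StringIO in place of 'file':

```python
import io

import pandas as pd

# Hypothetical sample: the last line has an empty field between two commas
text = "name,age\nsomething\ntom,,20\n"

# names= fixes the column count at 3 so the wider last row can be parsed
df = pd.read_csv(io.StringIO(text), header=None, names=[0, 1, 2])

# Separators inferred from non-NaN fields (the approach above)
inferred = df.count(axis='columns') - 1

# Commas actually present in each raw line, for comparison
raw = pd.Series(text.splitlines()).str.count(',')

print(inferred.tolist())  # [1, 0, 1]
print(raw.tolist())       # [1, 0, 2]
```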

Answered by Anjaneyulu Batta


Score: 0

One line of code: len(df) - df[1].isna().sum()
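len(df) - df[1].isna().sum() returns a single number for the whole file (the number of rows that have a value in column 1), not a per-row count. A per-row variant under the same two-column assumption is sketched below, using io.StringIO with the question's sample in place of the file.

```python
import io

import pandas as pd

# The question's sample, standing in for 'file.csv'
text = "name,age\nsomething\ntom,20\n"
df = pd.read_csv(io.StringIO(text), header=None)

# Whole-file total, as in the one-liner above
total = len(df) - df[1].isna().sum()

# Per-row version: 1 if column 1 holds a value, else 0 (assumes exactly two columns)
df['_count_separators'] = df[1].notna().astype(int)

print(total)                             # 2
print(df['_count_separators'].tolist())  # [1, 0, 1]
```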

Answered by Quang Hoang

OK, but what if the NaN itself is part of the dataset, as in something,,,something?
– Dark

I'm not sure in which case df = pd.read_csv('file.csv', header=None) would produce a NaN in his sample.
– Quang Hoang

This assumes there are only two columns...?
– coldspeed
