Groupby Count on user defined time periods pandas

I have a data frame like:

import datetime as dt

import pandas as pd

s = pd.Series(
    range(8),
    pd.to_datetime(
        [
            '20130101 10:34',
            '20130101 10:34:08',
            '20130101 10:34:08',
            '20130101 10:34:15',
            '20130101 10:34:28',
            '20130101 10:34:54',
            '20130101 10:34:55',
            '20130101 10:35:12'
        ]
    )
)
df = s.to_frame()
df = df.reset_index()
df = df.rename(columns={0: 'value', 'index': 'start'})
df['ID'] = [1, 2, 1, 2, 1, 2, 1, 2]

sec = dt.timedelta(seconds=30)

df['end'] = df['start'].map(lambda t: t + sec)
df

                start  value  ID                 end
0 2013-01-01 10:34:00      0   1 2013-01-01 10:34:30
1 2013-01-01 10:34:08      1   2 2013-01-01 10:34:38
2 2013-01-01 10:34:08      2   1 2013-01-01 10:34:38
3 2013-01-01 10:34:15      3   2 2013-01-01 10:34:45
4 2013-01-01 10:34:28      4   1 2013-01-01 10:34:58
5 2013-01-01 10:34:54      5   2 2013-01-01 10:35:24
6 2013-01-01 10:34:55      6   1 2013-01-01 10:35:25
7 2013-01-01 10:35:12      7   2 2013-01-01 10:35:42

For each row, I have to sum the values of all rows that share its ID and whose start falls between that row's start and end timestamps (both bounds inclusive).
To be precise, my result should have this meaning:

p_ = []
# The explicit loop is a problem
for row in range(len(df)):
    p_.append(
        # Scanning the whole frame with .loc for every row is a problem
        df.loc[
            (df['start'] >= df['start'][row]) &
            (df['start'] <= df['end'][row]) &
            (df['ID'] == df['ID'][row]),
            'value'
        ].sum()
    )
df['sum_of_values_for_ID_in_time_period'] = p_
df

                start  value  ID                 end  sum_of_values_for_ID_in_time_period
0 2013-01-01 10:34:00      0   1 2013-01-01 10:34:30                                    6
1 2013-01-01 10:34:08      1   2 2013-01-01 10:34:38                                    4
2 2013-01-01 10:34:08      2   1 2013-01-01 10:34:38                                    6
3 2013-01-01 10:34:15      3   2 2013-01-01 10:34:45                                    3
4 2013-01-01 10:34:28      4   1 2013-01-01 10:34:58                                   10
5 2013-01-01 10:34:54      5   2 2013-01-01 10:35:24                                   12
6 2013-01-01 10:34:55      6   1 2013-01-01 10:35:25                                    6
7 2013-01-01 10:35:12      7   2 2013-01-01 10:35:42                                    7

Instead of the for loop and the repeated .loc lookups, I would like help turning this into some kind of groupby/map solution: my real data set barely fits into memory, and I need something faster.
I have tried:

df.groupby(
    [
        df.start.map(lambda t: t.minute),
        'ID'
    ]
)[['value']].sum()

but this groups by calendar minute and ID, so the result does not depend on the end column at all:

          value
start ID
34    1      12
      2       9
35    2       7

asked yesterday

Datas A

283

python pandas

