Calculating difference statistics over a moving window

I'd like to calculate some statistics over my data using a moving window: for every distance between elements inside the window, the frequency of each relative difference, expressed in percent multiplied by 10 (that is, in tenths of a percent). Is it possible to speed up the code below? I noticed that some calculations are repeated from one window to the next, but I was not able to eliminate them without making the code slower.



    import numpy as np


    def get_dist_stat(pdata, pwin_length):
        ''' pdata - given data array
            pwin_length - the length of the window
            the function returns a stat table where
            row represents the distance between elements
            col represents the difference for that distance in percent multiplied by 10 (assume that maximum difference can be 20 percent)

        '''

        l_data = len(pdata)
        l_win = pwin_length
        print("l_data=", l_data)
        print("l_win=", l_win)

        # stat table
        stat_table = np.zeros((l_win-1, 20*10), dtype = int)

        # loop over all data
        for k in range(l_data - l_win + 1):

            win = pdata[k : k + l_win]
            print('-' * 10)
            print("k=", k, " kend=", k + l_win )

            print("win=", win)

            # loop over window
            for i in range(1 , l_win):
                b=win[i:]
                a=win[:-i]
                diff=(abs((b-a)/a*100 ) * 10).astype(int)
                print("i=",i)
                print("b=", b)
                print("a=", a)
                print("diff=",diff)

                # storing found differences into stat table
                apercents, acount = np.unique(diff, return_counts = True)
                l_apercents = len(apercents)
                for j in range(l_apercents):
                    stat_table[i-1, apercents[j]] += acount[j]
        return stat_table

    adata=np.array([1.1,1.2,1.3,1.4,1.5])
    print("adata=", adata)

    astat_table=get_dist_stat(adata,3)
    print(astat_table)


And this is its output:



adata= [1.1 1.2 1.3 1.4 1.5]
l_data= 5
l_win= 3
----------
k= 0 kend= 3
win= [1.1 1.2 1.3]
i= 1
b= [1.2 1.3]
a= [1.1 1.2]
diff= [90 83]
i= 2
b= [1.3]
a= [1.1]
diff= [181]
----------
k= 1 kend= 4
win= [1.2 1.3 1.4]
i= 1
b= [1.3 1.4]
a= [1.2 1.3]
diff= [83 76]
i= 2
b= [1.4]
a= [1.2]
diff= [166]
----------
k= 2 kend= 5
win= [1.3 1.4 1.5]
i= 1
b= [1.4 1.5]
a= [1.3 1.4]
diff= [76 71]
i= 2
b= [1.5]
a= [1.3]
diff= [153]
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 2 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]









python performance numpy statistics

asked Mar 7 at 12:43 by Prokhozhii (edited Mar 7 at 21:12)

1 Answer

You've already made the key observation here: most of the work is redone. Each time you move the window, most of the calculations are the same as for the previous window.
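
To get a rough feel for how much work is repeated, the sketch below counts pair evaluations in both approaches. The function name and the sizes l_data = 1000, l_win = 50 are illustrative assumptions, not figures from the question.

```python
def pair_evaluations(l_data, l_win):
    """Return (evaluations done window by window, number of distinct pairs)."""
    n_windows = l_data - l_win + 1
    per_window = l_win * (l_win - 1) // 2  # pairs inside a single window
    windowed = n_windows * per_window
    # Distinct pairs (j, j + interval) with 1 <= interval < l_win.
    distinct = sum(l_data - interval for interval in range(1, l_win))
    return windowed, distinct


windowed, distinct = pair_evaluations(1000, 50)
print(windowed, distinct)  # 1164975 47775
```

With these sizes the window-by-window loop evaluates roughly 24 times more differences than there are distinct pairs. For the question's own example (5 elements, window of 3) the counts are 9 and 7.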



In fact it should be much faster to do all the calculations ahead of time, storing them in one big ndarray, and then for each window pick out the results that are relevant. That way we don't need the temporary a and b arrays.



How many dimensions do we need? Just two: the starting point and the interval length. The result is a triangular array, so we'll waste some space.



    precomputed_results = np.zeros((l_win+1, l_data), dtype = int)
    # First pass
    for interval in range(1, l_win):
        for first_point_index in range(l_data-interval):
            # compute diff relative to elements [first_point_index] and [first_point_index+interval]
            # line will be similar to precomputed_results[...] = ...
            pass

    # Second pass
    for interval in range(1, l_win):
        for first_point_index in range(l_data-interval):
            # use slicing on precomputed_results
            pass
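
One way the outline above could be completed, offered as a sketch rather than as the answer's own code. It computes each difference once per interval with array slices. Instead of slicing per window in a second pass, it weights each pair by the number of windows that contain it. The names dist_stat_precomputed and n_bins are hypothetical, and the sketch drops the debug printing from the question.

```python
import numpy as np


def dist_stat_precomputed(pdata, pwin_length, n_bins=20 * 10):
    """Hypothetical helper: same table as get_dist_stat, each difference computed once."""
    pdata = np.asarray(pdata, dtype=float)
    l_data, l_win = len(pdata), pwin_length
    stat_table = np.zeros((l_win - 1, n_bins), dtype=int)
    last_start = l_data - l_win  # start index of the last window
    if last_start < 0:
        return stat_table  # no complete window fits, as in the original loop

    for interval in range(1, l_win):
        a = pdata[:-interval]
        b = pdata[interval:]
        # Same expression as in the question, evaluated once per pair (j, j + interval).
        diff = (np.abs((b - a) / a * 100) * 10).astype(int)
        if diff.max() >= n_bins:
            raise ValueError("difference exceeds the range of the stat table")

        # Number of windows containing the pair that starts at j: window starts k
        # satisfy max(0, j + interval - l_win + 1) <= k <= min(j, last_start).
        j = np.arange(l_data - interval)
        counts = np.minimum(j, last_start) - np.maximum(0, j + interval - l_win + 1) + 1

        # Weighted histogram: each difference counts once per containing window.
        stat_table[interval - 1] = np.bincount(
            diff, weights=counts, minlength=n_bins
        ).astype(int)
    return stat_table


adata = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
table = dist_stat_precomputed(adata, 3)
print(np.nonzero(table[0])[0])  # [71 76 83 90], the non-zero columns of row 0 in the question
```

The weighting step matters because the question's table counts a pair once for every window it falls into, so pairs near the ends of the data contribute less than pairs in the middle.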






answered Mar 9 at 3:21 by Snowbody

• It looks like the window is not moving in your code. For example, if we have l_data = 10 and l_win = 7, we won't get the difference between elements 8 and 9.
  – Prokhozhii
  Jul 26 at 10:13

• @Prokhozhii Um, yes we will: when interval is 1 (which is certainly in range(1, 7)) and first_point_index is 8 (which is certainly in range(10-1)), then first_point_index + interval is 9. I didn't write all the code, but the comments indicate which elements are getting compared.
  – Snowbody
  Jul 27 at 12:57
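
The point under discussion can be checked by enumeration. The sketch below (the set names are mine) lists the pairs visited by the answer's loops and the pairs that occur inside at least one window of the question's code, for l_data = 10 and l_win = 7.

```python
l_data, l_win = 10, 7

# Pairs visited by the answer's loops.
loop_pairs = {
    (first, first + interval)
    for interval in range(1, l_win)
    for first in range(l_data - interval)
}

# Pairs that occur inside at least one window of the question's code.
window_pairs = {
    (start + lo, start + hi)
    for start in range(l_data - l_win + 1)
    for lo in range(l_win)
    for hi in range(lo + 1, l_win)
}

print((8, 9) in loop_pairs)        # True
print(loop_pairs == window_pairs)  # True
```

The two sets coincide, so the pair (8, 9) is covered. What the loops alone do not capture is how many windows each pair belongs to, which is what the second pass has to account for.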



















