Identity column re-seed: when it is necessary?












11














During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.



One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).



This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.



Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?



Thanks in advance!










share|improve this question




















  • 2




    Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
    – danblack
    Nov 25 at 22:56






  • 1




    Database clusters, Galera and multiple active-write db servers, also use auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
    – danblack
    Nov 25 at 23:06












  • "One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
    – jpmc26
    Nov 27 at 1:06


















11














During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.



One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).



This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.



Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?



Thanks in advance!










share|improve this question




















  • 2




    Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
    – danblack
    Nov 25 at 22:56






  • 1




    Database clusters, Galera and multiple active-write db servers, also use auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
    – danblack
    Nov 25 at 23:06












  • "One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
    – jpmc26
    Nov 27 at 1:06
















11












11








11


1





During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.



One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).



This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.



Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?



Thanks in advance!










share|improve this question















During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.



One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).



This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.



Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?



Thanks in advance!







mysql identity auto-increment






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 25 at 23:31









Rick James

40.6k22257




40.6k22257










asked Nov 25 at 22:32









Crypt32

1587




1587








  • 2




    Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
    – danblack
    Nov 25 at 22:56






  • 1




    Database clusters, Galera and multiple active-write db servers, also use auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
    – danblack
    Nov 25 at 23:06












  • "One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
    – jpmc26
    Nov 27 at 1:06
















  • 2




    Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
    – danblack
    Nov 25 at 22:56






  • 1




    Database clusters, Galera and multiple active-write db servers, also use auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
    – danblack
    Nov 25 at 23:06












  • "One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
    – jpmc26
    Nov 27 at 1:06










2




2




Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56




Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
Nov 25 at 22:56




1




1




Database clusters, Galera and multiple active-write db servers, also use auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
Nov 25 at 23:06






Database clusters, Galera and multiple active-write db servers, also use auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
Nov 25 at 23:06














"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06






"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
Nov 27 at 1:06












4 Answers
4






active

oldest

votes


















15















That is, when table row is deleted, it's PK must be reused in subsequent inserts.




What universe is your lecturer from??



That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.



If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.



In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.



Gaps are a convenience to the users and DBAs, not an inconvenience.



I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.



Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.



Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.



Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!



Feel free to have us sanity-check anything else that lecturer says that is suspicious.






share|improve this answer























  • Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
    – Crypt32
    Nov 25 at 23:12






  • 2




    @Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
    – Rick James
    Nov 25 at 23:28






  • 1




    Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
    – João Mendes
    Nov 26 at 11:42






  • 1




    @Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
    – Rick James
    Nov 26 at 22:17






  • 1




    In most enterprise systems it would be a sacrilege (and probably criminal) to delete anything from the database, ever, apart from audit prunes and DBA tasks. Add a column named "deleted" type bit to every table, and use that instead. When you modify data, create an audit row in an audit table driven by a trigger. Since you never delete anything, there are no gaps to fill, and every row can be traced back through it's entire life from creation to deletion. Integrity is critical in enterprise systems, and storage is cheap.
    – Drunken Code Monkey
    Nov 27 at 0:52





















8














Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.



Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.



Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.






share|improve this answer





















  • @Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
    – Rick James
    Nov 26 at 4:33










  • Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
    – Crypt32
    Nov 26 at 5:35



















5














The reuse of PK id values has problems and generally should be avoided.



First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.



Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.



Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.



The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.






share|improve this answer





















  • Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
    – Crypt32
    Nov 25 at 23:01






  • 1




    Rollback an insert is either failure or user instigated like: BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
    – danblack
    Nov 25 at 23:03






  • 1




    Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
    – danblack
    Nov 25 at 23:09



















0














I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.



Corruption of the PK index itself.



Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.






share|improve this answer





















    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "182"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223392%2fidentity-column-re-seed-when-it-is-necessary%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    15















    That is, when table row is deleted, it's PK must be reused in subsequent inserts.




    What universe is your lecturer from??



    That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.



    If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.



    In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.



    Gaps are a convenience to the users and DBAs, not an inconvenience.



    I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.



    Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.



    Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.



    Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!



    Feel free to have us sanity-check anything else that lecturer says that is suspicious.






    share|improve this answer























    • Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
      – Crypt32
      Nov 25 at 23:12






    • 2




      @Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
      – Rick James
      Nov 25 at 23:28






    • 1




      Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
      – João Mendes
      Nov 26 at 11:42






    • 1




      @Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
      – Rick James
      Nov 26 at 22:17






    • 1




      In most enterprise systems it would be a sacrilege (and probably criminal) to delete anything from the database, ever, apart from audit prunes and DBA tasks. Add a column named "deleted" type bit to every table, and use that instead. When you modify data, create an audit row in an audit table driven by a trigger. Since you never delete anything, there are no gaps to fill, and every row can be traced back through it's entire life from creation to deletion. Integrity is critical in enterprise systems, and storage is cheap.
      – Drunken Code Monkey
      Nov 27 at 0:52


















    15















    That is, when table row is deleted, it's PK must be reused in subsequent inserts.




    What universe is your lecturer from??



    That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.



    If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.



    In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.



    Gaps are a convenience to the users and DBAs, not an inconvenience.



    I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.



    Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.



    Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.



    Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!



    Feel free to have us sanity-check anything else that lecturer says that is suspicious.






    share|improve this answer























    • Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
      – Crypt32
      Nov 25 at 23:12






    • 2




      @Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
      – Rick James
      Nov 25 at 23:28






    • 1




      Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
      – João Mendes
      Nov 26 at 11:42






    • 1




      @Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
      – Rick James
      Nov 26 at 22:17






    • 1




      In most enterprise systems it would be a sacrilege (and probably criminal) to delete anything from the database, ever, apart from audit prunes and DBA tasks. Add a column named "deleted" type bit to every table, and use that instead. When you modify data, create an audit row in an audit table driven by a trigger. Since you never delete anything, there are no gaps to fill, and every row can be traced back through it's entire life from creation to deletion. Integrity is critical in enterprise systems, and storage is cheap.
      – Drunken Code Monkey
      Nov 27 at 0:52
















    15












    15








    15







    That is, when table row is deleted, it's PK must be reused in subsequent inserts.




    What universe is your lecturer from??



    That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.



    If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.



    In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.



    Gaps are a convenience to the users and DBAs, not an inconvenience.



    I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.



    Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.



    Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.



    Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!



    Feel free to have us sanity-check anything else that lecturer says that is suspicious.






    share|improve this answer















    That is, when table row is deleted, it's PK must be reused in subsequent inserts.




    What universe is your lecturer from??



    That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.



    If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.



    In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.



    Gaps are a convenience to the users and DBAs, not an inconvenience.



    I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.



    Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.



    Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.



    Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!



    Feel free to have us sanity-check anything else that lecturer says that is suspicious.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 26 at 4:12

























    answered Nov 25 at 23:09









    Rick James

    40.6k22257




    40.6k22257












    • Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
      – Crypt32
      Nov 25 at 23:12






    • 2




      @Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
      – Rick James
      Nov 25 at 23:28






    • 1




      Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
      – João Mendes
      Nov 26 at 11:42






    • 1




      @Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
      – Rick James
      Nov 26 at 22:17






    • 1




      In most enterprise systems it would be a sacrilege (and probably criminal) to delete anything from the database, ever, apart from audit prunes and DBA tasks. Add a column named "deleted" type bit to every table, and use that instead. When you modify data, create an audit row in an audit table driven by a trigger. Since you never delete anything, there are no gaps to fill, and every row can be traced back through it's entire life from creation to deletion. Integrity is critical in enterprise systems, and storage is cheap.
      – Drunken Code Monkey
      Nov 27 at 0:52




















    • Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
      – Crypt32
      Nov 25 at 23:12






    • 2




      @Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
      – Rick James
      Nov 25 at 23:28






    • 1




      Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
      – João Mendes
      Nov 26 at 11:42






    • 1




      @Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
      – Rick James
      Nov 26 at 22:17






    • 1




      In most enterprise systems it would be a sacrilege (and probably criminal) to delete anything from the database, ever, apart from audit prunes and DBA tasks. Add a column named "deleted" type bit to every table, and use that instead. When you modify data, create an audit row in an audit table driven by a trigger. Since you never delete anything, there are no gaps to fill, and every row can be traced back through it's entire life from creation to deletion. Integrity is critical in enterprise systems, and storage is cheap.
      – Drunken Code Monkey
      Nov 27 at 0:52


















    Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
    – Crypt32
    Nov 25 at 23:12




    Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
    – Crypt32
    Nov 25 at 23:12




    2




    2




    @Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
    – Rick James
    Nov 25 at 23:28




    @Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
    – Rick James
    Nov 25 at 23:28




    1




    1




    Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
    – João Mendes
    Nov 26 at 11:42




    Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
    – João Mendes
    Nov 26 at 11:42




    1




    1




    @Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
    – Rick James
    Nov 26 at 22:17




    @Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
    – Rick James
    Nov 26 at 22:17




    1




    1




    In most enterprise systems it would be a sacrilege (and probably criminal) to delete anything from the database, ever, apart from audit prunes and DBA tasks. Add a column named "deleted" type bit to every table, and use that instead. When you modify data, create an audit row in an audit table driven by a trigger. Since you never delete anything, there are no gaps to fill, and every row can be traced back through it's entire life from creation to deletion. Integrity is critical in enterprise systems, and storage is cheap.
    – Drunken Code Monkey
    Nov 27 at 0:52






    In most enterprise systems it would be a sacrilege (and probably criminal) to delete anything from the database, ever, apart from audit prunes and DBA tasks. Add a column named "deleted" type bit to every table, and use that instead. When you modify data, create an audit row in an audit table driven by a trigger. Since you never delete anything, there are no gaps to fill, and every row can be traced back through it's entire life from creation to deletion. Integrity is critical in enterprise systems, and storage is cheap.
    – Drunken Code Monkey
    Nov 27 at 0:52















    8














    Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.



    Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.



    Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.






    share|improve this answer





















    • @Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
      – Rick James
      Nov 26 at 4:33










    • Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
      – Crypt32
      Nov 26 at 5:35
















    8














    Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.



    Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.



    Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.






    share|improve this answer





















    • @Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
      – Rick James
      Nov 26 at 4:33










    • Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
      – Crypt32
      Nov 26 at 5:35














    8












    8








    8






    Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.



    Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.



    Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.






    share|improve this answer












    Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.



    Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.



    Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Nov 26 at 1:18









    jmoreno

    559416




    559416












    • @Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
      – Rick James
      Nov 26 at 4:33










    • Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
      – Crypt32
      Nov 26 at 5:35


















    • @Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
      – Rick James
      Nov 26 at 4:33










    • Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
      – Crypt32
      Nov 26 at 5:35
















    @Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
    – Rick James
    Nov 26 at 4:33




    @Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
    – Rick James
    Nov 26 at 4:33












    Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
    – Crypt32
    Nov 26 at 5:35




    Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
    – Crypt32
    Nov 26 at 5:35











    5














    The reuse of PK id values has problems and generally should be avoided.



    First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.



    Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.



    Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.



    The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.






    share|improve this answer





















    • Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
      – Crypt32
      Nov 25 at 23:01






    • 1




      Rollback an insert is either failure or user instigated like: BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
      – danblack
      Nov 25 at 23:03






    • 1




      Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
      – danblack
      Nov 25 at 23:09
















    5














    The reuse of PK id values has problems and generally should be avoided.



    First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.



    Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.



    Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.



    The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.






    share|improve this answer





















    • Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
      – Crypt32
      Nov 25 at 23:01






    • 1




      Rollback an insert is either failure or user instigated like: BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
      – danblack
      Nov 25 at 23:03






    • 1




      Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
      – danblack
      Nov 25 at 23:09














    5












    5








    5






    The reuse of PK id values has problems and generally should be avoided.



    First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.



    Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.



    Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.



    The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.






    share|improve this answer












    The reuse of PK id values has problems and generally should be avoided.



    First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.



    Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.



    Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.



    The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Nov 25 at 22:52









    danblack

    1,4921212




    1,4921212












    • Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
      – Crypt32
      Nov 25 at 23:01






    • 1




      Rollback an insert is either failure or user instigated like: BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
      – danblack
      Nov 25 at 23:03






    • 1




      Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
      – danblack
      Nov 25 at 23:09


















    • Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
      – Crypt32
      Nov 25 at 23:01






    • 1




      Rollback an insert is either failure or user instigated like: BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
      – danblack
      Nov 25 at 23:03






    • 1




      Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
      – danblack
      Nov 25 at 23:09
















    Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
    – Crypt32
    Nov 25 at 23:01




    Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
    – Crypt32
    Nov 25 at 23:01




    1




    1




    Rollback an insert is either failure or user instigated like: BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
    – danblack
    Nov 25 at 23:03




    Rollback an insert is either failure or user instigated like: BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
    – danblack
    Nov 25 at 23:03




    1




    1




    Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
    – danblack
    Nov 25 at 23:09




    Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
    – danblack
    Nov 25 at 23:09











    0














    I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.



    Corruption of the PK index itself.



    Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.






    share|improve this answer


























      0














      I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.



      Corruption of the PK index itself.



      Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.






      share|improve this answer
























        0












        0








        0






        I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.



        Corruption of the PK index itself.



        Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.






        share|improve this answer












        I won't echo everyone else's comments that that reusing a PK is a bad idea but I have come across times where an identity column needed to be re-seeded.



        Corruption of the PK index itself.



        Granted this was using MS-SQL and many, many years ago but it still is relevant. Many years ago for the company that I work for, someone thought it would be a good idea to re-use PC's as servers in our 150+ remote locations after they were too old to be used by the clients and then stick them in a closet with no ventilation. When no Because we all know that a pile of junk 10 year old computer in a tiny room with temps of 120+ running mission critical databases could only result in good things. Like 40% failure rates and me rethinking my career choice. We would replicate the data back to the corp headquarters but more often than not, these failures would result in bad things happening to the databases. One of those things were the database having corrupted indexes which would seize up the database and the replication process. Twice in this great environment, the only solution to fix replication was to reseed the indexes and then reestablish replication. We did replace the servers later before ditching them completely.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 29 at 20:33









        user1207758

        3161210




        3161210






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Database Administrators Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223392%2fidentity-column-re-seed-when-it-is-necessary%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Ellipse (mathématiques)

            Quarter-circle Tiles

            Mont Emei