Identity column re-seed: when it is necessary?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty{ margin-bottom:0;
}
up vote
9
down vote
favorite
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
add a comment |
up vote
9
down vote
favorite
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
2 days ago
1
Database clusters, Galera and multiple active-write db servers, also useauto_increment_(increment|offset)(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
2 days ago
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
yesterday
add a comment |
up vote
9
down vote
favorite
up vote
9
down vote
favorite
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
During one of the last lessons at university (I'm a student), lecturer asked us to develop a database (MySQL Server if it matters) and tiny client app that would consume the database as data source.
One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words). That is, when table row is deleted, it's PK must be reused in subsequent inserts. I have average knowledge in RDBMS, PKs and identity columns. From what I understand, that identity column is just a way to let DB to auto-generate PKs when inserting rows and nothing more. And identity column value shall not be related to row attributes in any way (as long as it is not natural key).
This requirement (strictly sequential identity column) was suspicious to me. I tried to ask the lecturer what is wrong if identity is not sequential (with gaps caused by deletions), but got very abstract answer like "it is convenient for users and useful for DB administrators who maintain the database". No specific examples. The argument "convenient for users" sounds silly, because it doesn't have any meaning in business domain.
Therefore I'm curious if these reasons are real? I can think only of one case when identity column reseed is required -- when identity space is exhausted. But this is more design issue when identity column type was chosen incorrectly, say simple int instead of bigint or uniqueidentifier when table contains billion rows. Suppose, an identity column is a clustered index: can gaps in identity column affect index performance? Maybe there are other real-world reasons for automatic identity column re-seed after each delete I'm not aware of?
Thanks in advance!
mysql identity auto-increment
mysql identity auto-increment
edited 2 days ago
Rick James
39.7k22256
39.7k22256
asked 2 days ago
Crypt32
1485
1485
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
2 days ago
1
Database clusters, Galera and multiple active-write db servers, also useauto_increment_(increment|offset)(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
2 days ago
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
yesterday
add a comment |
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
2 days ago
1
Database clusters, Galera and multiple active-write db servers, also useauto_increment_(increment|offset)(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.
– danblack
2 days ago
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
yesterday
2
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
2 days ago
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
2 days ago
1
1
Database clusters, Galera and multiple active-write db servers, also use
auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.– danblack
2 days ago
Database clusters, Galera and multiple active-write db servers, also use
auto_increment_(increment|offset) (ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.– danblack
2 days ago
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
yesterday
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
yesterday
add a comment |
3 Answers
3
active
oldest
votes
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
2 days ago
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
2 days ago
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
2 days ago
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
yesterday
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
yesterday
|
show 10 more comments
up vote
6
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
yesterday
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
yesterday
add a comment |
up vote
4
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
2 days ago
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
2 days ago
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
2 days ago
add a comment |
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
2 days ago
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
2 days ago
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
2 days ago
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
yesterday
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
yesterday
|
show 10 more comments
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
2 days ago
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
2 days ago
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
2 days ago
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
yesterday
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
yesterday
|
show 10 more comments
up vote
13
down vote
accepted
up vote
13
down vote
accepted
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
That is, when table row is deleted, it's PK must be reused in subsequent inserts.
What universe is your lecturer from??
That is grossly inefficient. If you try to do that, you will cut your performance prospects down by a factor of 10.
If you need gapless numbers for auditing reasons, build them explicitly, not directly from database tools. And never delete rows, but flag them as "deleted". This will add to the messiness of queries, since they will have to ignore such rows.
In MySQL, InnoDB requires the existence of a unique PRIMARY KEY for each table. But that is the extent of the requirement. The key can even be a string.
Gaps are a convenience to the users and DBAs, not an inconvenience.
I can think of one case where gapless would be convenient -- chunking into groups of 100 rows at a time. But there is a simple workaround using LIMIT 100,1.
Gaps have zero impact on performance. That includes non-numeric indexes. And non-unique indexes. And composite indexes.
Sure, you can run out of ids. I think I have seen it happen twice in nearly 2 decades of using MySQL. I may as well worry about being struck by an asteroid. It's low on my things-that-keep-me-awake-at-night list.
Gaps occur from (at least): INSERT IGNORE, IODKU, REPLACE, DELETE, ROLLBACK (explicit, or due to crash), Multi-master replication (including Galera and Group Replication). Do you really want to come up with workarounds for those?!
Feel free to have us sanity-check anything else that lecturer says that is suspicious.
edited yesterday
answered 2 days ago
Rick James
39.7k22256
39.7k22256
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
2 days ago
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
2 days ago
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
2 days ago
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
yesterday
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
yesterday
|
show 10 more comments
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
2 days ago
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
2 days ago
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
2 days ago
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
yesterday
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
yesterday
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
2 days ago
Don’t ask. I feel that her statement is not very correct, but don’t have arguments to argue this why I’m asking.
– Crypt32
2 days ago
1
1
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
2 days ago
@Crypt32 - I added a bunch more. I think this is a question that needs to be tackled the other way. The lecturer should provide coherent claims; then we can shoot them down or suggest that those claims do not justify the complexity of reusing ids. I suggest that they need to provide sample code on how to identify and reuse the ids; again so we can point out how messy that will be.
– Rick James
2 days ago
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
2 days ago
I tried to ask again during intermission, but all I got was what I said in my question.
– Crypt32
2 days ago
1
1
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
yesterday
Consider that the lecturer's requirement may not have a grounding on real database reasons, but rather, stems from the lecturer wanting to put the students through hoops so that they go through the experience of being forced to tackle difficult algorithms.
– João Mendes
yesterday
1
1
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
yesterday
@Vilx- - That is fine for only one of my half-dozen cases where gaps happen.
– Rick James
yesterday
|
show 10 more comments
up vote
6
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
yesterday
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
yesterday
add a comment |
up vote
6
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
yesterday
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
yesterday
add a comment |
up vote
6
down vote
up vote
6
down vote
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
Reusing an identity value, should in general be discouraged. Either the value is used entirely internally, in which case it’s actual value is immaterial, or it is also used externally in which case reusing the value is very likely going to lead to a misidentification.
Take the obvious case of an invoice or purchase order number, these might easily come from an identity column and be exposed externally, but you would never want to reuse them for precisely that reason. Both refer to specific transactions which you would not want to get confused.
Resolving such issues can be a big hassle when companies merge or are acquired. Creating such problems on purpose? Not wise.
answered 2 days ago
jmoreno
539416
539416
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
yesterday
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
yesterday
add a comment |
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
yesterday
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
yesterday
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
yesterday
@Crypt32 - The "invoice number" argument may be the most compelling one to throw at your lecturer.
– Rick James
yesterday
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
yesterday
Good example, I didn't think in this way. Bookmarking a link to a resource with its exposed ID would point to a totally different resource after reuse.
– Crypt32
yesterday
add a comment |
up vote
4
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
2 days ago
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
2 days ago
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
2 days ago
add a comment |
up vote
4
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
2 days ago
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
2 days ago
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
2 days ago
add a comment |
up vote
4
down vote
up vote
4
down vote
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
The reuse of PK id values has problems and generally should be avoided.
First, the implementation of auto_increment columns doesn't provide the guarantee of being gapless. Indeed gaps will occur if you rollback an insert on an auto increment column.
Secondly the gap ID may refer to existing data that hasn't been deleted (due to missing FK constrains). If they translate to member numbers communicated outside the system then that poses potential business identity risks.
Thirdly, bigint unsigned won't run out of IDs for a significant time even given an extremely large insert rate.
The biggest pain with gaps is coming across auditors who insist its an audit flaw. For DBAs they know gaps exist and why.
answered 2 days ago
danblack
1,3961212
1,3961212
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
2 days ago
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
2 days ago
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
2 days ago
add a comment |
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
2 days ago
1
Rollback an insert is either failure or user instigated like:BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK
– danblack
2 days ago
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
2 days ago
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
2 days ago
Interesting sentence about auditors. How they explain gaps as flaw? “Rollback an insert”, you mean when insert fails due to some reasons?
– Crypt32
2 days ago
1
1
Rollback an insert is either failure or user instigated like:
BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK– danblack
2 days ago
Rollback an insert is either failure or user instigated like:
BEGIN; INSERT INTO ... ; {some other insert/update}; ROLLBACK– danblack
2 days ago
1
1
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
2 days ago
Auditors just use similar terms around IDs to DBAs, just fairly different business meanings. Each is logical to their own view on the world. Usually resolvable with good communication.
– danblack
2 days ago
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f223392%2fidentity-column-re-seed-when-it-is-necessary%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Gaps don't affect clustered indexes, these are b-trees of a form anyway. Old (still relevant) rant: Shooting yourself in the foot: renumbering auto-inc IDs
– danblack
2 days ago
1
Database clusters, Galera and multiple active-write db servers, also use
auto_increment_(increment|offset)(ref: dev.mysql.com/doc/refman/5.7/en/…) as a mechanism to prevent conflicts which explicitly configures servers to have gaps in namespace.– danblack
2 days ago
"One of requirements was that the identity column (which is the PK in every table) must be sequential, because it is a good practice (as per lecturer words)." media.tenor.com/images/5442002c8081300e02cdb610e70a5460/… This is a well known as an anti-pattern.
– jpmc26
yesterday