Split a set keeping the proportion of its elements on each subset
up vote
0
down vote
favorite
I have a set of elements. To simplify here, these elements are $+$ and $-$.
I need to split this set into n sets keeping the proportion of $+$ and $-$. These new sets need to have similar size: their size must differ in one or zero.
For example, if I have a set with 7 $+$ and 4 $-$, and I need to split it into 5 sets. Each of these sets will have: 3 elements, 2 elements, 2 elements, 2 elements and 2 elements.
To solve this, I have done the following:
- Calculate the percentage of $+$ elements: $p = 4/11 = 0.37 => 37%$
- Calculate the percentage of $-$ elements: $n = 7/11 = 0.63 => 63%$
Now I have the percentages, I use the following formulas to calculate how many $+$ and $-$ elements I will have on each subset.
If $m$ is the number of elements on a subset, I calculate:
$mp = m * p$
$mn = m * n$
Where $mp$ is the number of $+$ elements, and $mn$ is the number of $-$ elements on each subset.
Those numbers always have decimals, so I need to round then.
If $mp > mn$ I do:
$mp = lceil mprceil$
$mn = lfloor mn rfloor $
And, if $mp < mn$ I do:
$mp = lfloor mp rfloor $
$mn = lceil mnrceil$
But it doesn't work in this example, because when I do:
$mp = 2 * 0.37 = 0.74$
$mn = 2 * 0.63 = 1.26$
I get than $mn > mp$, so:
$mp = lfloor 0.74 rfloor = 0 $
$mn = lceil 1.26 rceil = 2$
I don't use any of the $+$ elements in four subsets. This is an error.
My problem is I don't know what to do with the decimals.
How can I do to keep the proportion and avoid these kind of problems?
elementary-set-theory algorithms
add a comment |
up vote
0
down vote
favorite
I have a set of elements. To simplify here, these elements are $+$ and $-$.
I need to split this set into n sets keeping the proportion of $+$ and $-$. These new sets need to have similar size: their size must differ in one or zero.
For example, if I have a set with 7 $+$ and 4 $-$, and I need to split it into 5 sets. Each of these sets will have: 3 elements, 2 elements, 2 elements, 2 elements and 2 elements.
To solve this, I have done the following:
- Calculate the percentage of $+$ elements: $p = 4/11 = 0.37 => 37%$
- Calculate the percentage of $-$ elements: $n = 7/11 = 0.63 => 63%$
Now I have the percentages, I use the following formulas to calculate how many $+$ and $-$ elements I will have on each subset.
If $m$ is the number of elements on a subset, I calculate:
$mp = m * p$
$mn = m * n$
Where $mp$ is the number of $+$ elements, and $mn$ is the number of $-$ elements on each subset.
Those numbers always have decimals, so I need to round then.
If $mp > mn$ I do:
$mp = lceil mprceil$
$mn = lfloor mn rfloor $
And, if $mp < mn$ I do:
$mp = lfloor mp rfloor $
$mn = lceil mnrceil$
But it doesn't work in this example, because when I do:
$mp = 2 * 0.37 = 0.74$
$mn = 2 * 0.63 = 1.26$
I get than $mn > mp$, so:
$mp = lfloor 0.74 rfloor = 0 $
$mn = lceil 1.26 rceil = 2$
I don't use any of the $+$ elements in four subsets. This is an error.
My problem is I don't know what to do with the decimals.
How can I do to keep the proportion and avoid these kind of problems?
elementary-set-theory algorithms
1
Note that if you have four "-" symbols to distribute among five (multi)sets, there will be (at least) one multiset that has none. It seems to me the best you can do is to distribute the symbols with lower frequency as equally as possible among your "buckets", then fill the remaining slots with the more numerous symbols.
– hardmath
Nov 21 at 2:33
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have a set of elements. To simplify here, these elements are $+$ and $-$.
I need to split this set into n sets keeping the proportion of $+$ and $-$. These new sets need to have similar size: their size must differ in one or zero.
For example, if I have a set with 7 $+$ and 4 $-$, and I need to split it into 5 sets. Each of these sets will have: 3 elements, 2 elements, 2 elements, 2 elements and 2 elements.
To solve this, I have done the following:
- Calculate the percentage of $+$ elements: $p = 4/11 = 0.37 => 37%$
- Calculate the percentage of $-$ elements: $n = 7/11 = 0.63 => 63%$
Now I have the percentages, I use the following formulas to calculate how many $+$ and $-$ elements I will have on each subset.
If $m$ is the number of elements on a subset, I calculate:
$mp = m * p$
$mn = m * n$
Where $mp$ is the number of $+$ elements, and $mn$ is the number of $-$ elements on each subset.
Those numbers always have decimals, so I need to round then.
If $mp > mn$ I do:
$mp = lceil mprceil$
$mn = lfloor mn rfloor $
And, if $mp < mn$ I do:
$mp = lfloor mp rfloor $
$mn = lceil mnrceil$
But it doesn't work in this example, because when I do:
$mp = 2 * 0.37 = 0.74$
$mn = 2 * 0.63 = 1.26$
I get than $mn > mp$, so:
$mp = lfloor 0.74 rfloor = 0 $
$mn = lceil 1.26 rceil = 2$
I don't use any of the $+$ elements in four subsets. This is an error.
My problem is I don't know what to do with the decimals.
How can I do to keep the proportion and avoid these kind of problems?
elementary-set-theory algorithms
I have a set of elements. To simplify here, these elements are $+$ and $-$.
I need to split this set into n sets keeping the proportion of $+$ and $-$. These new sets need to have similar size: their size must differ in one or zero.
For example, if I have a set with 7 $+$ and 4 $-$, and I need to split it into 5 sets. Each of these sets will have: 3 elements, 2 elements, 2 elements, 2 elements and 2 elements.
To solve this, I have done the following:
- Calculate the percentage of $+$ elements: $p = 4/11 = 0.37 => 37%$
- Calculate the percentage of $-$ elements: $n = 7/11 = 0.63 => 63%$
Now I have the percentages, I use the following formulas to calculate how many $+$ and $-$ elements I will have on each subset.
If $m$ is the number of elements on a subset, I calculate:
$mp = m * p$
$mn = m * n$
Where $mp$ is the number of $+$ elements, and $mn$ is the number of $-$ elements on each subset.
Those numbers always have decimals, so I need to round then.
If $mp > mn$ I do:
$mp = lceil mprceil$
$mn = lfloor mn rfloor $
And, if $mp < mn$ I do:
$mp = lfloor mp rfloor $
$mn = lceil mnrceil$
But it doesn't work in this example, because when I do:
$mp = 2 * 0.37 = 0.74$
$mn = 2 * 0.63 = 1.26$
I get than $mn > mp$, so:
$mp = lfloor 0.74 rfloor = 0 $
$mn = lceil 1.26 rceil = 2$
I don't use any of the $+$ elements in four subsets. This is an error.
My problem is I don't know what to do with the decimals.
How can I do to keep the proportion and avoid these kind of problems?
elementary-set-theory algorithms
elementary-set-theory algorithms
asked Nov 20 at 15:23
VansFannel
122113
122113
1
Note that if you have four "-" symbols to distribute among five (multi)sets, there will be (at least) one multiset that has none. It seems to me the best you can do is to distribute the symbols with lower frequency as equally as possible among your "buckets", then fill the remaining slots with the more numerous symbols.
– hardmath
Nov 21 at 2:33
add a comment |
1
Note that if you have four "-" symbols to distribute among five (multi)sets, there will be (at least) one multiset that has none. It seems to me the best you can do is to distribute the symbols with lower frequency as equally as possible among your "buckets", then fill the remaining slots with the more numerous symbols.
– hardmath
Nov 21 at 2:33
1
1
Note that if you have four "-" symbols to distribute among five (multi)sets, there will be (at least) one multiset that has none. It seems to me the best you can do is to distribute the symbols with lower frequency as equally as possible among your "buckets", then fill the remaining slots with the more numerous symbols.
– hardmath
Nov 21 at 2:33
Note that if you have four "-" symbols to distribute among five (multi)sets, there will be (at least) one multiset that has none. It seems to me the best you can do is to distribute the symbols with lower frequency as equally as possible among your "buckets", then fill the remaining slots with the more numerous symbols.
– hardmath
Nov 21 at 2:33
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
As was mentioned in the comments, there's only so much you can achieve with discrete quantities, so you have to compromise somewhere. In general there are different ways to achieve a compromise, so you'd want to check what are the desirable properties that must be enforced. I'm not too sure whether this applies in this case though.
First an example.
Say you want to play a card game, and have to deal 52 cards evenly between 5 people. Around here it is common to give one card to each player in a fixed order, until you run out of cards. That way you end up with a distribution of $11$, $11$, $10$, $10$, $10$ cards.
Now if we get back to your problem, say you have $P$ $+$ elements, $M$ $-$ elements, for a total of $T=P+M$. You want to split these into $N$ sets.
To do so, sort your elements in an array, say $A$ with:
- for $0le i<P$, $Aleft[iright] = +$
- for $Ple i<T$, $Aleft[iright] = -$
Then for $0le k < N$, let the $k$-th bucket/subset be:
$B_k = left{ Aleft[qN+kright], qinmathbb Z right}$.
Basically each bucket gets one element every $N$ elements.
With this method, you're guaranteed that:
- All elements are used
- Bucket size is either $leftlceilfrac TNrightrceil$ or $leftlfloorfrac TNrightrfloor$
- A bucket contains either $leftlceilfrac PNrightrceil$ or $leftlfloorfrac PNrightrfloor$ $+$ elements
- A bucket contains either $leftlceilfrac MNrightrceil$ or $leftlfloorfrac MNrightrfloor$ $-$ elements
- Assuming that the floor and ceil values are distinct, there are always bigger buckets with more $+$ elements, and smaller buckets with fewer $+$ elements.
- In the binary case (only two types of elements), the remaining two types of buckets (bigger bucket with fewer $+$, smaller buckets with more $+$) cannot coexist.
I was too lazy to look how far/close the proportions you obtain are from/to the target proportion... but assuming all elements must be distributed, the four possible ratio look like best approximations.
On a side note, this method simply generalizes to more than two types of elements.
Also if you let $ell$ be the number of types of elements, I think this method can be implemented in $O(ell+N)$ if you just need the quantities of each element in each set. Kinda too lazy to check that last statement.
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
As was mentioned in the comments, there's only so much you can achieve with discrete quantities, so you have to compromise somewhere. In general there are different ways to achieve a compromise, so you'd want to check what are the desirable properties that must be enforced. I'm not too sure whether this applies in this case though.
First an example.
Say you want to play a card game, and have to deal 52 cards evenly between 5 people. Around here it is common to give one card to each player in a fixed order, until you run out of cards. That way you end up with a distribution of $11$, $11$, $10$, $10$, $10$ cards.
Now if we get back to your problem, say you have $P$ $+$ elements, $M$ $-$ elements, for a total of $T=P+M$. You want to split these into $N$ sets.
To do so, sort your elements in an array, say $A$ with:
- for $0le i<P$, $Aleft[iright] = +$
- for $Ple i<T$, $Aleft[iright] = -$
Then for $0le k < N$, let the $k$-th bucket/subset be:
$B_k = left{ Aleft[qN+kright], qinmathbb Z right}$.
Basically each bucket gets one element every $N$ elements.
With this method, you're guaranteed that:
- All elements are used
- Bucket size is either $leftlceilfrac TNrightrceil$ or $leftlfloorfrac TNrightrfloor$
- A bucket contains either $leftlceilfrac PNrightrceil$ or $leftlfloorfrac PNrightrfloor$ $+$ elements
- A bucket contains either $leftlceilfrac MNrightrceil$ or $leftlfloorfrac MNrightrfloor$ $-$ elements
- Assuming that the floor and ceil values are distinct, there are always bigger buckets with more $+$ elements, and smaller buckets with fewer $+$ elements.
- In the binary case (only two types of elements), the remaining two types of buckets (bigger bucket with fewer $+$, smaller buckets with more $+$) cannot coexist.
I was too lazy to look how far/close the proportions you obtain are from/to the target proportion... but assuming all elements must be distributed, the four possible ratio look like best approximations.
On a side note, this method simply generalizes to more than two types of elements.
Also if you let $ell$ be the number of types of elements, I think this method can be implemented in $O(ell+N)$ if you just need the quantities of each element in each set. Kinda too lazy to check that last statement.
add a comment |
up vote
1
down vote
accepted
As was mentioned in the comments, there's only so much you can achieve with discrete quantities, so you have to compromise somewhere. In general there are different ways to achieve a compromise, so you'd want to check what are the desirable properties that must be enforced. I'm not too sure whether this applies in this case though.
First an example.
Say you want to play a card game, and have to deal 52 cards evenly between 5 people. Around here it is common to give one card to each player in a fixed order, until you run out of cards. That way you end up with a distribution of $11$, $11$, $10$, $10$, $10$ cards.
Now if we get back to your problem, say you have $P$ $+$ elements, $M$ $-$ elements, for a total of $T=P+M$. You want to split these into $N$ sets.
To do so, sort your elements in an array, say $A$ with:
- for $0le i<P$, $Aleft[iright] = +$
- for $Ple i<T$, $Aleft[iright] = -$
Then for $0le k < N$, let the $k$-th bucket/subset be:
$B_k = left{ Aleft[qN+kright], qinmathbb Z right}$.
Basically each bucket gets one element every $N$ elements.
With this method, you're guaranteed that:
- All elements are used
- Bucket size is either $leftlceilfrac TNrightrceil$ or $leftlfloorfrac TNrightrfloor$
- A bucket contains either $leftlceilfrac PNrightrceil$ or $leftlfloorfrac PNrightrfloor$ $+$ elements
- A bucket contains either $leftlceilfrac MNrightrceil$ or $leftlfloorfrac MNrightrfloor$ $-$ elements
- Assuming that the floor and ceil values are distinct, there are always bigger buckets with more $+$ elements, and smaller buckets with fewer $+$ elements.
- In the binary case (only two types of elements), the remaining two types of buckets (bigger bucket with fewer $+$, smaller buckets with more $+$) cannot coexist.
I was too lazy to look how far/close the proportions you obtain are from/to the target proportion... but assuming all elements must be distributed, the four possible ratio look like best approximations.
On a side note, this method simply generalizes to more than two types of elements.
Also if you let $ell$ be the number of types of elements, I think this method can be implemented in $O(ell+N)$ if you just need the quantities of each element in each set. Kinda too lazy to check that last statement.
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
As was mentioned in the comments, there's only so much you can achieve with discrete quantities, so you have to compromise somewhere. In general there are different ways to achieve a compromise, so you'd want to check what are the desirable properties that must be enforced. I'm not too sure whether this applies in this case though.
First an example.
Say you want to play a card game, and have to deal 52 cards evenly between 5 people. Around here it is common to give one card to each player in a fixed order, until you run out of cards. That way you end up with a distribution of $11$, $11$, $10$, $10$, $10$ cards.
Now if we get back to your problem, say you have $P$ $+$ elements, $M$ $-$ elements, for a total of $T=P+M$. You want to split these into $N$ sets.
To do so, sort your elements in an array, say $A$ with:
- for $0le i<P$, $Aleft[iright] = +$
- for $Ple i<T$, $Aleft[iright] = -$
Then for $0le k < N$, let the $k$-th bucket/subset be:
$B_k = left{ Aleft[qN+kright], qinmathbb Z right}$.
Basically each bucket gets one element every $N$ elements.
With this method, you're guaranteed that:
- All elements are used
- Bucket size is either $leftlceilfrac TNrightrceil$ or $leftlfloorfrac TNrightrfloor$
- A bucket contains either $leftlceilfrac PNrightrceil$ or $leftlfloorfrac PNrightrfloor$ $+$ elements
- A bucket contains either $leftlceilfrac MNrightrceil$ or $leftlfloorfrac MNrightrfloor$ $-$ elements
- Assuming that the floor and ceil values are distinct, there are always bigger buckets with more $+$ elements, and smaller buckets with fewer $+$ elements.
- In the binary case (only two types of elements), the remaining two types of buckets (bigger bucket with fewer $+$, smaller buckets with more $+$) cannot coexist.
I was too lazy to look how far/close the proportions you obtain are from/to the target proportion... but assuming all elements must be distributed, the four possible ratio look like best approximations.
On a side note, this method simply generalizes to more than two types of elements.
Also if you let $ell$ be the number of types of elements, I think this method can be implemented in $O(ell+N)$ if you just need the quantities of each element in each set. Kinda too lazy to check that last statement.
As was mentioned in the comments, there's only so much you can achieve with discrete quantities, so you have to compromise somewhere. In general there are different ways to achieve a compromise, so you'd want to check what are the desirable properties that must be enforced. I'm not too sure whether this applies in this case though.
First an example.
Say you want to play a card game, and have to deal 52 cards evenly between 5 people. Around here it is common to give one card to each player in a fixed order, until you run out of cards. That way you end up with a distribution of $11$, $11$, $10$, $10$, $10$ cards.
Now if we get back to your problem, say you have $P$ $+$ elements, $M$ $-$ elements, for a total of $T=P+M$. You want to split these into $N$ sets.
To do so, sort your elements in an array, say $A$ with:
- for $0le i<P$, $Aleft[iright] = +$
- for $Ple i<T$, $Aleft[iright] = -$
Then for $0le k < N$, let the $k$-th bucket/subset be:
$B_k = left{ Aleft[qN+kright], qinmathbb Z right}$.
Basically each bucket gets one element every $N$ elements.
With this method, you're guaranteed that:
- All elements are used
- Bucket size is either $leftlceilfrac TNrightrceil$ or $leftlfloorfrac TNrightrfloor$
- A bucket contains either $leftlceilfrac PNrightrceil$ or $leftlfloorfrac PNrightrfloor$ $+$ elements
- A bucket contains either $leftlceilfrac MNrightrceil$ or $leftlfloorfrac MNrightrfloor$ $-$ elements
- Assuming that the floor and ceil values are distinct, there are always bigger buckets with more $+$ elements, and smaller buckets with fewer $+$ elements.
- In the binary case (only two types of elements), the remaining two types of buckets (bigger bucket with fewer $+$, smaller buckets with more $+$) cannot coexist.
I was too lazy to look how far/close the proportions you obtain are from/to the target proportion... but assuming all elements must be distributed, the four possible ratio look like best approximations.
On a side note, this method simply generalizes to more than two types of elements.
Also if you let $ell$ be the number of types of elements, I think this method can be implemented in $O(ell+N)$ if you just need the quantities of each element in each set. Kinda too lazy to check that last statement.
answered Nov 21 at 23:11
N.Bach
1,526310
1,526310
add a comment |
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f3006450%2fsplit-a-set-keeping-the-proportion-of-its-elements-on-each-subset%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Note that if you have four "-" symbols to distribute among five (multi)sets, there will be (at least) one multiset that has none. It seems to me the best you can do is to distribute the symbols with lower frequency as equally as possible among your "buckets", then fill the remaining slots with the more numerous symbols.
– hardmath
Nov 21 at 2:33