In the accounting department I am working for we are from time to time confronted to the following problem:
A customer sends us a check for a given amount, but without specifying what invoices it cancels. It is up to us to find out which ones the payment corresponds to.
For example, say that the customer has the following outstanding invoices: $300, $200, $50; and say that the check is for $250. This time it is clear, the customer is paying bills $200 and $50.
However, let's now say that the outstanding invoices are $300, $200, $100 and that the check is for $300. In this case there are already two possibilities. The customer is paying the $300 invoice or the $200 and $100. In other words, there is more than one solution to the problem.
My first question is: 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a check Ch=600 how can I print all the different combinations of invoices that the check is possibly cancelling
1.a. first approach using "brute force", that is, exploring all different combinations: every single invoice, all of 2-element tuples, 3-element tuples, etc...
1.b can all the solutions be found without exploring all possible combinations? some problems can be solved by discarding some invoices, for example those whose amounts are greater than the amount of the check. Any ideas?
My second question is: 2. this time there are also credit notes outstanding, that is, invoices with negative amounts. For example, I=[500, 400, -100, 450, 200, 600, -200, 700] and a check Ch=600
2.a is the "brute force" method used in 1.a still applicable now that "I" contains negative values?
2.b same as 1.b. However, this time I can imagen that the number of invoices that can be discarded is a lot more reduced.
I am a fan of Python, which I find very elegant, powerful and easy to develop with. I would like to find answers to the problems described above, partially because I am still learning python, and I would like to make use of it.
On Sat, Nov 7, 2009 at 4:39 PM, vsoler <vicente.so...@gmail.com> wrote: > In the accounting department I am working for we are from time to time > confronted to the following problem: [snip] > For example, say that the customer has the following outstanding > invoices: $300, $200, $50; and say that the check is for $250. This > time it is clear, the customer is paying bills $200 and $50.
Unless your customers are giant defense contractors you should be able to brute force a solution. If they are so big that it doesn't take micro seconds to brute force a solution then you probably have more problems than just matching checks to invoices...
On Sat, 7 Nov 2009, vsoler wrote: > In the accounting department I am working for we are from time to > time confronted to the following problem:
> A customer sends us a check for a given amount, but without > specifying what invoices it cancels. It is up to us to find out > which ones the payment corresponds to.
> For example, say that the customer has the following outstanding > invoices: $300, $200, $50; and say that the check is for $250. This > time it is clear, the customer is paying bills $200 and $50.
> However, let's now say that the outstanding invoices are $300, $200, > $100 and that the check is for $300. In this case there are already > two possibilities. The customer is paying the $300 invoice or the > $200 and $100. In other words, there is more than one solution to > the problem.
> My first question is: 1. given a list of invoives I=[500, 400, 450, > 200, 600, 700] and a check Ch=600 how can I print all the different > combinations of invoices that the check is possibly cancelling
On Sat, 7 Nov 2009, vsoler wrote: > In the accounting department I am working for we are from time to > time confronted to the following problem:
> A customer sends us a check for a given amount, but without > specifying what invoices it cancels. It is up to us to find out > which ones the payment corresponds to.
> For example, say that the customer has the following outstanding > invoices: $300, $200, $50; and say that the check is for $250. This > time it is clear, the customer is paying bills $200 and $50.
> However, let's now say that the outstanding invoices are $300, $200, > $100 and that the check is for $300. In this case there are already > two possibilities. The customer is paying the $300 invoice or the > $200 and $100. In other words, there is more than one solution to > the problem.
> My first question is: > 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a > check Ch=600 > how can I print all the different combinations of invoices that the > check is possibly cancelling
by the way, there's a bit more to it than just seeing if you can match the cheque amount exactly. some python solutions are here:
and a general solution allows you to place different "values" on which items you pack into your knapsack.
say a customer has outstanding invoices for 200, 400 and 600, and you get a cheque for 600. what do you apply that against? the single invoice for 600, or the two for 200 and 400? that depends.
if all invoices have the same "value", it won't matter. but if the invoice for 600 just went out, while the two others are just about to become, say, overdue so that a penalty is about to be applied, your customer would probably *really* appreciate it if you applied that cheque to the older invoices.
in general, then, you can not only see what matches exactly but, for the sake of your customer, you can give higher value to paying off older invoices. that's how the general knapsack problem works.
rday --
======================================================================== Robert P. J. Day Waterloo, Ontario, CANADA
As mentioned in the other answers, your problem is best solved by a long overdue organisational decision instead of a technical one.
Most popular solutions are: - All invoices are essential a single credit amount, money coming in is taken of from the big pile. - Invoices are split by date, creating multiple credit amounts, any money coming in is used the pay of the oldest one.
The latter one allows more easily for different interest and penalty rates.
-- MPH http://blog.dcuktec.com 'If consumed, best digested with added seasoning to own preference.'
> My first question is: > 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a > check Ch=600 > how can I print all the different combinations of invoices that the > check is possibly cancelling
Incidentally, I'm currently learning python myself, and was working on more or less the same problem as an exercise;
For listing all different subsets of a list (This is what I came up with. Can it be implemented shorter, btw?):
def subsets(L): S = [] if (len(L) == 1): return [L, []] else: for s in subsets(L[1:]): S.append(s) S.append(s + [ L[0]]) return S
Now, to use the above piece of code (after 'import subset'):
> > My first question is: > > 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a > > check Ch=600 > > how can I print all the different combinations of invoices that the > > check is possibly cancelling
> Incidentally, I'm currently learning python myself, and was working > on more or less the same problem as an exercise;
> For listing all different subsets of a list (This is what I came up > with. Can it be implemented shorter, btw?):
> def subsets(L): > S = [] > if (len(L) == 1): > return [L, []] > else: > for s in subsets(L[1:]): > S.append(s) > S.append(s + [ L[0]]) > return S
> Now, to use the above piece of code (after 'import subset'):
> It's not a real solution yet, and as others have pointed out the > problem is NP complete but it might help to get you going.
does your solution allow for the possibility of different invoices of equal amounts? i would be reluctant to use the word "subset" in a context where you can have more than one element with the same value.
rday --
======================================================================== Robert P. J. Day Waterloo, Ontario, CANADA
> does your solution allow for the possibility of different invoices > of equal amounts? i would be reluctant to use the word "subset" in a > context where you can have more than one element with the same value.
I think it handles different invoices of equal amounts correctly. I agree with you though that the term subset may not be the best name in this context because of those duplicates.
> > My first question is: > > 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a > > check Ch=600 > > how can I print all the different combinations of invoices that the > > check is possibly cancelling
> Incidentally, I'm currently learning python myself, and was working on > more or less the same problem as an exercise;
> For listing all different subsets of a list (This is what I came up > with. Can it be implemented shorter, btw?):
> def subsets(L): > S = [] > if (len(L) == 1): > return [L, []] > else: > for s in subsets(L[1:]): > S.append(s) > S.append(s + [ L[0]]) > return S
You can avoid the S list my making it a generator:
def subsets(L): if L: for s in subsets(L[1:]): yield s yield s + [L[0]] else: yield []
On Nov 8, 1:27 pm, Ozz <notva...@wathever.com> wrote:
> Robert P. J. Day schreef:
> > does your solution allow for the possibility of different invoices > > of equal amounts? i would be reluctant to use the word "subset" in a > > context where you can have more than one element with the same value.
> I think it handles different invoices of equal amounts correctly. > I agree with you though that the term subset may not be the best name in > this context because of those duplicates.
> cheers, > Ozz
Ozz,
Instead of subsets, do you mean permutations/combinations? Since 2 invoices can have the same amount perhaps the terms permutation is better.
>> My first question is: >> 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a >> check Ch=600 >> how can I print all the different combinations of invoices that the >> check is possibly cancelling
> Incidentally, I'm currently learning python myself, and was working on > more or less the same problem as an exercise;
> For listing all different subsets of a list (This is what I came up > with. Can it be implemented shorter, btw?):
> def subsets(L): > S = [] > if (len(L) == 1): > return [L, []] > else: > for s in subsets(L[1:]): > S.append(s) > S.append(s + [ L[0]]) > return S
> Now, to use the above piece of code (after 'import subset'):
> It's not a real solution yet, and as others have pointed out the problem > is NP complete but it might help to get you going.
Here's my own take on it:
def list_possible_invoices(invoices, check): if check: # The check hasn't yet been fully consumed. for index, inv in enumerate(invoices): # If this invoice doesn't exceed the check then it consume some of the check. if inv <= check: # Try to consume the remainder of the check. for inv_list in list_possible_invoices(invoices[index + 1 : ], check - inv): yield [inv] + inv_list else: # The check has been fully consumed. yield []
# List all the possible subsets of invoices. # Sorting the invoices first in descending order lets us reduce the number of possibilities to try. for inv_list in list_possible_invoices(sorted(invoices, reverse=True), check): print inv_list
On Sun, Nov 8, 2009 at 11:52 AM, Dan Bishop <danb...@yahoo.com> wrote: > On Nov 8, 4:43 am, Ozz <notva...@wathever.com> wrote: >> Hi,
>> > My first question is: >> > 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a >> > check Ch=600 >> > how can I print all the different combinations of invoices that the >> > check is possibly cancelling
>> Incidentally, I'm currently learning python myself, and was working on >> more or less the same problem as an exercise;
>> For listing all different subsets of a list (This is what I came up >> with. Can it be implemented shorter, btw?):
>> def subsets(L): >> S = [] >> if (len(L) == 1): >> return [L, []] >> else: >> for s in subsets(L[1:]): >> S.append(s) >> S.append(s + [ L[0]]) >> return S
> You can avoid the S list my making it a generator:
> def subsets(L): > if L: > for s in subsets(L[1:]): > yield s > yield s + [L[0]] > else: > yield []
What you're describing is the powerset operation. Here's the example from the python docs:
def powerset(iterable): "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)" s = list(iterable) return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
What I find interesting is that running it through timeit, it is much slower than the code suggested by Dan Bishop.
setup = """ from itertools import chain, combinations
x = list(range(100))
def subsets1(L): S = [] if (len(L) == 1): return [L, []] else: for s in subsets1(L[1:]): S.append(s) S.append(s + [ L[0]]) return S
def subsets2(L): if L: for s in subsets(L[1:]): yield s yield s + [L[0]] else: yield []
def subsets3(iterable): s = list(iterable) return chain.from_iterable(combinations(s, r) for r in range(len(s)+1)) """
#timeit.timeit("subsets1(x)", setup) doesn't appear to terminate timeit.timeit("subsets2(x)", setup) timeit.timeit("subsets3(x)", setup)
On Sun, Nov 8, 2009 at 12:31 PM, geremy condra <debat...@gmail.com> wrote: > On Sun, Nov 8, 2009 at 11:52 AM, Dan Bishop <danb...@yahoo.com> wrote: >> On Nov 8, 4:43 am, Ozz <notva...@wathever.com> wrote: >>> Hi,
>>> > My first question is: >>> > 1. given a list of invoives I=[500, 400, 450, 200, 600, 700] and a >>> > check Ch=600 >>> > how can I print all the different combinations of invoices that the >>> > check is possibly cancelling
>>> Incidentally, I'm currently learning python myself, and was working on >>> more or less the same problem as an exercise;
>>> For listing all different subsets of a list (This is what I came up >>> with. Can it be implemented shorter, btw?):
>>> def subsets(L): >>> S = [] >>> if (len(L) == 1): >>> return [L, []] >>> else: >>> for s in subsets(L[1:]): >>> S.append(s) >>> S.append(s + [ L[0]]) >>> return S
>> You can avoid the S list my making it a generator:
>> def subsets(L): >> if L: >> for s in subsets(L[1:]): >> yield s >> yield s + [L[0]] >> else: >> yield []
> What you're describing is the powerset operation. Here's the example > from the python docs:
> def powerset(iterable): > "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)" > s = list(iterable) > return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
> What I find interesting is that running it through timeit, it is much > slower than the code suggested by Dan Bishop.
> setup = """ > from itertools import chain, combinations
> x = list(range(100))
> def subsets1(L): > S = [] > if (len(L) == 1): > return [L, []] > else: > for s in subsets1(L[1:]): > S.append(s) > S.append(s + [ L[0]]) > return S
> def subsets2(L): > if L: > for s in subsets(L[1:]): > yield s > yield s + [L[0]] > else: > yield []
> def subsets3(iterable): > s = list(iterable) > return chain.from_iterable(combinations(s, r) for r in range(len(s)+1)) > """
> Instead of subsets, do you mean permutations/combinations? Since 2 > invoices can have the same amount perhaps the terms permutation is > better.
As some other poster already suggested 'powerset' ( http://en.wikipedia.org/wiki/Power_set ) may be a better name, except for those duplicates, of course. On the other hand, I think viewing it as a powerset is the most 'natural' in this case. (imo permutations are about the order of objects, not about whether the objects are included in a set or not)
> > Instead of subsets, do you mean permutations/combinations? Since 2 > > invoices can have the same amount perhaps the terms permutation is > > better.
> As some other poster already suggested 'powerset' (http://en.wikipedia.org/wiki/Power_set) may be a better name, except > for those duplicates, of course. On the other hand, I think viewing it > as a powerset is the most 'natural' in this case. (imo permutations are > about the order of objects, not about whether the objects are included > in a set or not)
On Sun, 08 Nov 2009 12:31:26 -0500, geremy condra wrote: > What you're describing is the powerset operation. Here's the example > from the python docs: [...] > What I find interesting is that running it through timeit, it is much > slower than the code suggested by Dan Bishop.
Your test doesn't show what you think it shows. You shouldn't just blindly apply timeit without testing to see that the functions return what you think they return. Your test is, I'm afraid, totally bogus and the conclusion you draw is completely wrong.
The reason subsets1() "doesn't appear to terminate" is that you are trying to list all the subsets of x = range(100). That's a LOT of subsets: 2**100 to be precise, or approximately 1.2e30.
subsets2() and subsets3() return a generator function and an iterator object respectively. There may be some overhead difference between those, but that's trivial compared to the cost of generating the subsets themselves.
A better way of doing the timing is as follows:
from itertools import chain, combinations from timeit import Timer
# use a number small enough to calculate in a reasonable time x = list(range(10))
def subsets1(L): S = [] if (len(L) == 1): return [L, []] else: for s in subsets1(L[1:]): S.append(s) S.append(s + [ L[0]]) return S
def subsets2(L): if L: for s in subsets2(L[1:]): yield s yield s + [L[0]] else: yield []
def subsets3(iterable): s = list(iterable) return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
# Check the three functions return the same results: x1 = sorted(subsets1(x)) x2 = sorted(subsets2(x)) x3 = sorted(list(t) for t in subsets1(x)) assert x1 == x2 == x3
# Set up some timers. t1 = Timer("subsets1(x)", setup) t2 = Timer("list(subsets2(x))", setup) t3 = Timer("list(subsets3(x))", setup)
# And run them! for t in (t1, t2, t3): print min(t.repeat(number=1000, repeat=5))
The results I get are:
1.19647693634 0.901714801788 0.175387859344
Which as you can see, shows that the recipe in the docs is nearly ten times faster than Dan's version. That's not surprising -- the docs version uses highly optimized C code from itertools, while Dan's version uses slow-ish Python code and recursion.
To show this is no fluke, I increased the size of x and ran the tests again:
>>> x = list(range(15)) # 32000+ subsets >>> x1 = sorted(subsets1(x)) >>> x2 = sorted(subsets2(x)) >>> x3 = sorted(list(t) for t in subsets1(x)) >>> assert x1 == x2 == x3
>>> t1 = Timer("subsets1(x)", setup) >>> t2 = Timer("list(subsets2(x))", setup) >>> t3 = Timer("list(subsets3(x))", setup) >>> for t in (t1, t2, t3):
<st...@remove-this-cybersource.com.au> wrote: > On Sun, 08 Nov 2009 12:31:26 -0500, geremy condra wrote:
>> What you're describing is the powerset operation. Here's the example >> from the python docs: > [...] >> What I find interesting is that running it through timeit, it is much >> slower than the code suggested by Dan Bishop.
> Your test doesn't show what you think it shows. You shouldn't just > blindly apply timeit without testing to see that the functions return > what you think they return. Your test is, I'm afraid, totally bogus and > the conclusion you draw is completely wrong.
<snip>
Doh! I even noticed that the asymptotic times didn't match up and still blew right past the obvious answer. Thanks (again) for the correction, Steven.
> Gerry >>> Instead of subsets, do you mean permutations/combinations? Since 2 >>> invoices can have the same amount perhaps the terms permutation is >>> better. >> As some other poster already suggested 'powerset' (http://en.wikipedia.org/wiki/Power_set) may be a better name, except >> for those duplicates, of course. On the other hand, I think viewing it >> as a powerset is the most 'natural' in this case. (imo permutations are >> about the order of objects, not about whether the objects are included >> in a set or not)
> In the accounting department I am working for we are from time to time > confronted to the following problem:
> A customer sends us a check for a given amount, but without specifying > what invoices it cancels. It is up to us to find out which ones the > payment corresponds to.
> For example, say that the customer has the following outstanding > invoices: $300, $200, $50; and say that the check is for $250. This > time it is clear, the customer is paying bills $200 and $50.
> However, let's now say that the outstanding invoices are $300, $200, > $100 and that the check is for $300. In this case there are already > two possibilities. The customer is paying the $300 invoice or the $200 > and $100. In other words, there is more than one solution to the > problem.
The problems that you mention are only a SUBSET of the total problem. Example: oustanding invoices are for 300, 200, and 100 and the cheque is for 450 -- in general the total of the cheque amounts does not equal the total of any possible selection of outstanding invoice amounts.
I would be very surprised if a real accounting department did not already have a set of business rules for dealing with a problem that has existed since invoices and cheques were invented.
I would be extremely surprised if a real accounting department could be persuaded to imagine a subset of their unpaid/underpaid/overpaid invoice problem as being an instance of the (extended) knapsack problem :-)
On Nov 8, 8:39 am, vsoler <vicente.so...@gmail.com> wrote:
> In the accounting department I am working for we are from time to time > confronted to the following problem:
[snip]
> My second question is: > 2. this time there are also credit notes outstanding, that is, > invoices with negative amounts. For example, I=[500, 400, -100, 450, > 200, 600, -200, 700] and a check Ch=600
How can a credit note be "outstanding"? The accounting department issues a credit note without recording what invoice it relates to?
> In the accounting department I am working for we are from time to time > confronted to the following problem:
> A customer sends us a check for a given amount, but without specifying > what invoices it cancels. It is up to us to find out which ones the > payment corresponds to.
> For example, say that the customer has the following outstanding > invoices: $300, $200, $50; and say that the check is for $250. This > time it is clear, the customer is paying bills $200 and $50.
I worked on a similar problem involving frequent bank deposits (the bank recorded only the amount of the deposit with no other tracking information to let us know which store made the deposit) and reconciling those deposits to general ledger entries.
As pointed-out by others, the purest statement of the problem is computationally unfeasible; however, the real-world application can provide useful heuristics to that limit the search space.
1) Dates can be used to provide some alignment. Customers tend to pay the oldest invoices first and they don't pay before the first invoice is sent. Also, there can be a typical time lag that helps identify where to start a search (i.e. the customer typically takes 45 days to pay an invoice).
2) Search smallest groupings first (eliminate exact matches) then groupings of two items and groupings of three items.
3) Sometime the values on the list possess properties that make them stand-out and easier to match-up. Given invoices of [10, 11, 12, 1000, 13, 14], the 1000 should be easy to match-up in any grouping of payments. Likewise, when looking for groupings, start with the most unique values. Given [2, 2, 2, 3, 3, 5, 5, 5, 6.1, 7], start by trying to match the 6.1 since there is only one occurrence.
On Tue, 10 Nov 2009 14:46:49 -0800, John Machin wrote: > The problems that you mention are only a SUBSET of the total problem. > Example: oustanding invoices are for 300, 200, and 100 and the cheque is > for 450 -- in general the total of the cheque amounts does not equal the > total of any possible selection of outstanding invoice amounts.
> I would be very surprised if a real accounting department did not > already have a set of business rules for dealing with a problem that has > existed since invoices and cheques were invented.
As a sometimes accounts department, let me put my hand up for that.
Yes. Generally the rule is, "call the damn customer and ask them what they're smoking", only more politely.
Usually they'll have some convoluted breakdown of what amount they are paying off each invoice. Sometimes they will have taken off a settlement discount for prompt payment (often whether or not they actually paid promptly). Sometimes they overpay, or underpay, or apply credits to the wrong invoice, or pay invoices twice, or pay the wrong amount, or just make up a number from thin air. Sometimes they themselves will have no idea what the amount represents. And, I can guarantee, they will *ALWAYS* use a different rounding scheme to whatever accounting software you use, so there's always odd one or two cents that need to be manually adjusted somewhere.
> I would be extremely surprised if a real accounting department could be > persuaded to imagine a subset of their unpaid/underpaid/overpaid invoice > problem as being an instance of the (extended) knapsack problem :-)
That's because the average accounting department is mathematically illiterate :)
Nevertheless, many accounting software packages, like Quickbooks, will take a wild stab at allocating payments for you, usually using some variation of "if you can't find an exact match for a single invoice, just blindly allocate it to the oldest invoices you can". Frankly, I'd much prefer a knapsack solution.