In Perl one can assign a value to any element of an array, even to ones corresponding to indices greater or equal than the length of the array:
my @arr; $arr[999] = 42;
perl grows the array as needed to accommodate this assignment. In fact one common optimization in Perl is to "pre-grow" the array to its final size, rather than having perl grow it piecemeal as required by assignments like the one above:
my @arr; $#arr = 999_999;
After assigning to $#arr (the last index of @arr) as shown above, @arr has length 1,000,000, and all its elements are initialized to undef.
In Python the most literal translation of the first code snippet above triggers an IndexError exception:
>>> arr = list() >>> arr[999] = 42
Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: list assignment index out of range
In fact, one would need to pre-grow the list sufficiently to be able to make an assignment like this one. I.e. one needs the equivalent of the second Perl snippet above.
The best I can come up with is this:
arr = [None] * 1000000
Is this the most efficient way to achieve this result?
kj <no.em...@please.post> writes: > >>> arr[999] = 42 > ... > The best I can come up with is this: > arr = [None] * 1000000 > Is this the most efficient way to achieve this result?
If you're talking about an array of ints, use the array module. You might also look at numpy.
On Nov 6, 12:12 pm, kj <no.em...@please.post> wrote:
[snip]
> In fact, one would need to pre-grow the list sufficiently to be > able to make an assignment like this one. I.e. one needs the > equivalent of the second Perl snippet above.
> The best I can come up with is this:
> arr = [None] * 1000000
> Is this the most efficient way to achieve this result?
That's a good as it gets I think.
If sparsely populated I might be tempted to use a dict (or maybe defaultdict):
d = {999: 42, 10673: 123} for idx in xrange(1000000): # Treat it as though it's a list of 1,000,000 items... print 'index %d has a value of %d' % (idx, d.get(idx, None))
On Fri, Nov 6, 2009 at 1:12 PM, kj <no.em...@please.post> wrote:
> In Perl one can assign a value to any element of an array, even to > ones corresponding to indices greater or equal than the length of > the array:
> my @arr; > $arr[999] = 42;
> perl grows the array as needed to accommodate this assignment. In > fact one common optimization in Perl is to "pre-grow" the array to > its final size, rather than having perl grow it piecemeal as required > by assignments like the one above:
> my @arr; > $#arr = 999_999;
> After assigning to $#arr (the last index of @arr) as shown above, > @arr has length 1,000,000, and all its elements are initialized to > undef.
> In Python the most literal translation of the first code snippet > above triggers an IndexError exception:
>>>> arr = list() >>>> arr[999] = 42 > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > IndexError: list assignment index out of range
> In fact, one would need to pre-grow the list sufficiently to be > able to make an assignment like this one. I.e. one needs the > equivalent of the second Perl snippet above.
> The best I can come up with is this:
> arr = [None] * 1000000
> Is this the most efficient way to achieve this result?
It depends - what do you want to do with it? My first hunch would be to use a dictionary instead of a list, then the whole problem disappears. If there is a reason you don't want to do that, what is it?
> In fact, one would need to pre-grow the list sufficiently to be > able to make an assignment like this one. I.e. one needs the > equivalent of the second Perl snippet above.
> The best I can come up with is this:
> arr = [None] * 1000000
> Is this the most efficient way to achieve this result?
>On Fri, Nov 6, 2009 at 1:12 PM, kj <no.em...@please.post> wrote: [snip] >> arr = [None] * 1000000
>> Is this the most efficient way to achieve this result?
> It depends - what do you want to do with it? My first hunch would be > to use a dictionary instead of a list, then the whole problem > disappears. If there is a reason you don't want to do that, what is > it?
I second this. It might seem a sensible thing to do in perl, but I can't imagine what you would actually want to do it for! Seems like an odd thing to want to do!
>>> Is this the most efficient way to achieve this result?
>> It depends - what do you want to do with it? My first hunch would be >> to use a dictionary instead of a list, then the whole problem >> disappears. If there is a reason you don't want to do that, what is >> it? >I second this. It might seem a sensible thing to do in perl, but I can't >imagine what you would actually want to do it for! Seems like an odd thing >to want to do!
As I said, this is considered an optimization, at least in Perl, because it lets the interpreter allocate all the required memory in one fell swoop, instead of having to reallocate it repeatedly as the array grows. (Of course, like with all optimizations, whether it's worth the bother is another question.)
Another situation where one may want to do this is if one needs to initialize a non-sparse array in a non-sequential order, e.g. if that's the way the data is initially received by the code. Of course, there are many ways to skin such a cat; pre-allocating the space and using direct list indexing is just one of them. I happen to think it is a particularly straighforward one, but I realize that others (you, Andre, etc.) may not agree.
> In Perl one can assign a value to any element of an array, even to > ones corresponding to indices greater or equal than the length of > the array:
> my @arr; > $arr[999] = 42;
> perl grows the array as needed to accommodate this assignment. In > fact one common optimization in Perl is to "pre-grow" the array to > its final size, rather than having perl grow it piecemeal as required > by assignments like the one above:
> my @arr; > $#arr = 999_999;
> After assigning to $#arr (the last index of @arr) as shown above, > @arr has length 1,000,000, and all its elements are initialized to > undef.
> In Python the most literal translation of the first code snippet > above triggers an IndexError exception:
> >>> arr = list() > >>> arr[999] = 42
> Traceback (most recent call last): > File "<stdin>", line 1, in <module> > IndexError: list assignment index out of range
> In fact, one would need to pre-grow the list sufficiently to be > able to make an assignment like this one. I.e. one needs the > equivalent of the second Perl snippet above.
> The best I can come up with is this:
> arr = [None] * 1000000
> Is this the most efficient way to achieve this result?
> TIA!
> kynn
I don't have the code with me, but for huge arrays, I have used something like:
>>> arr[0] = initializer >>> for i in range N: >>> arr.extend(arr)
This doubles the array every time through the loop, and you can add the powers of 2 to get the desired result. Gil
> In Perl one can assign a value to any element of an array, even to > ones corresponding to indices greater or equal than the length of > the array:
> my @arr; > $arr[999] = 42;
> perl grows the array as needed to accommodate this assignment. In > fact one common optimization in Perl is to "pre-grow" the array to > its final size, rather than having perl grow it piecemeal as required > by assignments like the one above:
> my @arr; > $#arr = 999_999;
> After assigning to $#arr (the last index of @arr) as shown above, > @arr has length 1,000,000, and all its elements are initialized to > undef.
> In Python the most literal translation of the first code snippet > above triggers an IndexError exception:
> >>> arr = list() > >>> arr[999] = 42
> Traceback (most recent call last): > File "<stdin>", line 1, in <module> > IndexError: list assignment index out of range
> In fact, one would need to pre-grow the list sufficiently to be > able to make an assignment like this one. I.e. one needs the > equivalent of the second Perl snippet above.
> The best I can come up with is this:
> arr = [None] * 1000000
> Is this the most efficient way to achieve this result?
> TIA!
> kynn
You mean sum'in like dis?
class PerlishList(list): '''Hand holding list object for even the most demanding Perl hacker''' def __init__(self, dim=0): list.__init__(self) if dim: self.__setitem__(dim, None)
def __setitem__(self, idx, v): lenkeys = len(self) sup = super(PerlishList, self) if idx > lenkeys: for idx in range(lenkeys, idx): sup.append(None) sup.__setitem__(idx, v)
def __getitem__(self, idx): return self[idx]
l = PerlishList(3) l.append('a') l.append('b') print l l[10] = 10 print l
> As I said, this is considered an optimization, at least in Perl, > because it lets the interpreter allocate all the required memory > in one fell swoop, instead of having to reallocate it repeatedly > as the array grows.
IIRC, CPython has it's own way to optimize list growth.
> (Of course, like with all optimizations, > whether it's worth the bother is another question.)
My very humble opinion is that unless you spot a bottleneck (that is, you have real performance issues AND the profiler identified list growth as the culprit), the answer is a clear and obvious NO.
> Another situation where one may want to do this is if one needs to > initialize a non-sparse array in a non-sequential order,
> On Sat, Nov 7, 2009 at 8:25 PM, Luis Alberto Zarrabeitia Gomez > <ky...@uh.cu> wrote:
> > Ok, he has a dict.
> > Now what? He needs a non-sparse array.
> Let d be your dict.
> Call the zeroeth place in your array d[0], the first d[1], the 10000th > d[100000].
Following that reasoning, we could get rid of lists and arrays altogether.
Here's why that wouldn't work:
for x,y in zip(d,other): ... do something ...
Yes, we could also ignore zip and just use range/xrange to iterate for the indices...
Lists and dictionaries have different semantics. One thing is to argue that you shouldn't be thinking on pre-growing a list for performance reasons before being sure that it is a bottleneck, and a very different one is to argue that because one operation (__setitem__) is the same with both structures, we should not use lists for what may need, depending on the problem, list semantics.
¿Have you ever tried to read list/matrix that you know it is not sparse, but you don't know the size, and it may not be in order? A "grow-able" array would just be the right thing to use - currently I have to settle with either hacking together my own grow-able array, or preloading the data into a dict, growing a list with the [0]*size trick, and updating that list. Not hard, not worthy of a PEP, but certainly not so easy to dismiss.
Steven D'Aprano wrote: > On Fri, 06 Nov 2009 18:46:33 -0800, gil_johnson wrote:
>> I don't have the code with me, but for huge arrays, I have used >> something like:
>>>>> arr[0] = initializer >>>>> for i in range N: >>>>> arr.extend(arr) >> This doubles the array every time through the loop, and you can add the >> powers of 2 to get the desired result. Gil
> Why is it better to grow the list piecemeal instead of just allocating a > list the size you want in one go?
>Quoting Andre Engels <andreeng...@gmail.com>: >> On Sat, Nov 7, 2009 at 8:25 PM, Luis Alberto Zarrabeitia Gomez >> <ky...@uh.cu> wrote:
>> > Ok, he has a dict.
>> > Now what? He needs a non-sparse array.
>> Let d be your dict.
>> Call the zeroeth place in your array d[0], the first d[1], the 10000th >> d[100000]. >Following that reasoning, we could get rid of lists and arrays altogether. >Here's why that wouldn't work: >for x,y in zip(d,other): > ... do something ... >Yes, we could also ignore zip and just use range/xrange to iterate for the >indices... >Lists and dictionaries have different semantics. One thing is to argue that you >shouldn't be thinking on pre-growing a list for performance reasons before being >sure that it is a bottleneck, and a very different one is to argue that because >one operation (__setitem__) is the same with both structures, we should not use >lists for what may need, depending on the problem, list semantics. >¿Have you ever tried to read list/matrix that you know it is not sparse, but you >don't know the size, and it may not be in order? A "grow-able" array would just >be the right thing to use - currently I have to settle with either hacking >together my own grow-able array, or preloading the data into a dict, growing a >list with the [0]*size trick, and updating that list. Not hard, not worthy of a >PEP, but certainly not so easy to dismiss.
> Is this the most efficient way to achieve this result?
Yes, but why would you want to? Appending to a Python list has amortized O(1) complexity. I am not sure about Perl, but in MATLAB arrays are preallocated because resize has complexity O(n), instead of amortized O(1). You don't need to worry about that in Python. Python lists are resized with empty slots at the end, in proportion to the size of the list. On average, this has the same complexity as pre- allocation.
> As I said, this is considered an optimization, at least in Perl, > because it lets the interpreter allocate all the required memory > in one fell swoop, instead of having to reallocate it repeatedly > as the array grows.
Python does not need to reallocate repeatedly as a list grows. That is why it's called a 'list' and not an array.
There will be empty slots at the end of a list you can append to, without reallocating. When the list is resized, it is reallocated with even more of these. Thus the need to reallocate becomes more and more rare as the list grows, and on average the complexity of appending is just O(1).
On Nov 7, 5:05 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
> On 7 Nov, 03:46, gil_johnson <gil_john...@earthlink.net> wrote:>
> > I don't have the code with me, but for huge arrays, I have used > > something like:
> > >>> arr[0] = initializer > > >>> for i in range N: > > >>> arr.extend(arr)
> > This doubles the array every time through the loop, and you can add > > the powers of 2 to get the desired result. > > Gil
> You should really use append instead of extend. The above code is O > (N**2), with append it becomes O(N) on average.
I think the top one is O(N log N), and I'm suspicious that it's even possible to grow a list in less than O(N log N) time without knowing the final size in advance. Citation? Futhermore why would it matter to use extend instead of append; I'd assume extend uses the same growth algorithm. (Although in this case since the extend doubles the size of the list it most likely reallocates after each call.)
[None]*N is linear time and is better than growing the list using append or extend.
>> > I don't have the code with me, but for huge arrays, I have used >> > something like:
>> > >>> arr[0] = initializer >> > >>> for i in range N: >> > >>> arr.extend(arr)
>> > This doubles the array every time through the loop, and you can add >> > the powers of 2 to get the desired result. >> > Gil
>>You should really use append instead of extend. The above code is O >>(N**2), with append it becomes O(N) on average.
>I think the top one is O(N log N), and I'm suspicious that it's even >possible to grow a list in less than O(N log N) time without knowing >the final size in advance. Citation? Futhermore why would it matter >to use extend instead of append; I'd assume extend uses the same >growth algorithm. (Although in this case since the extend doubles the >size of the list it most likely reallocates after each call.)
>[None]*N is linear time and is better than growing the list using >append or extend.
On Nov 6, 8:46 pm, gil_johnson <gil_john...@earthlink.net> wrote:
> I don't have the code with me, but for huge arrays, I have used > something like:
> >>> arr[0] = initializer > >>> for i in range N: > >>> arr.extend(arr)
> This doubles the array every time through the loop, and you can add > the powers of 2 to get the desired result. > Gil
To all who asked about my previous post,
I'm sorry I posted such a cryptic note before, I should have waited until I had the code available.
I am still new to Python, and I couldn't find a way to create an array of size N, with each member initialized to a given value. If there is one, please let me know.
I think Paul Rubin and Paul Rudin are right, if they really are 2 different people, an array is a better solution if you're dealing with integers. They both mention numpy, which I know nothing about, or the array module.
kj, what *are* you going to do with this list/array? As others have pointed out, there are differences between lists, arrays, and dictionaries.
The problem I was solving was this: I wanted an array of 32-bit integers to be used as a bit array, and I wanted it initialized with all bits set, that is, each member of the array had to be set to 4294967295. Of course, you could set your initializer to 0, or any other 32-bit number.
Originally I found that the doubling method I wrote about before was a LOT faster than appending the elements one at a time, and tonight I tried the "list = initializer * N" method. Running the code below, the doubling method is still fastest, at least on my system.
Of course, as long as you avoid the 'one at a time' method, we're talking about fractions of a second, even for arrays that I think are huge, like the 536,870,912 byte beastie below.
for i in range(27): # 2**27 integers, 2**29 bytes newArray.extend(newArray)
print(time.time() - t0)
print(newArray[134217727]) # confirm array is fully initialized
#* * * * * * * * * * * * * * * * * * * * * * # One at a time, run time = 28.5479729176
t0 = time.time()
newArray2 = array.array('I')
for i in range(134217728): # the same size as above newArray2.append(4294967295)
print(time.time() - t0)
print(newArray2[134217727]) # confirm array
#* * * * * * * * * * * * * * * * * * * * * * # List with "*", run time = 1.06160402298
t0 = time.time()
newList = [4294967295] * 134217728
print(time.time() - t0)
print(newList[134217727]) # confirm list
[/code]
If, instead of 134,217,728 integers, I want something different, like 100,000,000, the method I use is:
[code]
#* * * * * * * * * * * * * * * * * * * * * * # Not a power of 2, run time = 0.752086162567
t0 = time.time()
newArray = array.array('I')
tempArray = array.array('I')
tempArray.append(4294967295)
size = 100000000
while size: # chew through 'size' until it's gone if (size & 1): # if this bit of 'size' is 1 newArray.extend(tempArray) # add a copy of the temp array size >>= 1 # chew off one bit tempArray.extend(tempArray) # double the size of the temp array
print(time.time() - t0)
print(newArray[99999999])
#* * * * * * * * * * * * * * * * * * * * * * # # Not a power of 2, list with "*", run time = 1.19271993637
t0 = time.time()
newList = [4294967295] * 100000000
print(time.time() - t0)
print(newList[99999999])
[/code]
I think it is interesting that the shorter list takes longer than the one that is a power of 2 in length. I think this may say that the "list = initializer * N" method uses something similar to the doubling method.
Also, tempArray (above) gets reallocated frequently, and demonstrates that reallocation is not a big problem.
Finally, I just looked into calling C functions, and found PyMem_Malloc, PyMem_Realloc, PyMem_Free, etc. in the Memory Management section of the Python/C API Reference Manual. This gives you uninitialized memory, and should be really fast, but it's 6:45 AM here, and I don't have the energy to try it.
In <mailman.44.1257626420.2873.python-l...@python.org> Luis Alberto Zarrabeitia Gomez <ky...@uh.cu> writes:
>¿Have you ever tried to read list/matrix that you know it is not sparse, but you
jdon't know the size, and it may not be in order? A "grow-able" array would just
>be the right thing to use - currently I have to settle with either hacking >together my own grow-able array, or... >...Not hard, not worthy of a >PEP, but certainly not so easy to dismiss.
I'm pretty new to Python, and I thought I'd give this a try:
class array(list): """A list that grows as needed upon assignment.
Any required padding is done with the value of the fill option.
Attempts to retrieve an index greater than the maximum assigned index still produces an IndexError. """
def __init__(self, seq='', fill=None): super(array, self).__init__(seq) self.fill = fill
@property def max(self): return len(self) - 1
def __setitem__(self, index, item): m = self.max while m < index: self.append(self.fill) m += 1 super(array, self).__setitem__(index, item)
if __name__ == '__main__': x = array(('zero',)) x[3] = 'it works!' print x
Looks like it works, but it seems almost too easy. Did I miss anything? Comments welcome. (E.g. I'm not sure that '' is the best choice of default value for the seq parameter to __init__.)