Chapter 8: Object References, Mutability, and Recycling
A name is not the object; a name is a separate thing
- Variables Are Not Boxes
- Identity, Equality and Aliases
- Choosing Between == and is
- The Relative Immutability of Tuples
- Copies are Shallow by Default
- Deep and Shallow Copies of Arbitrary Objects
- Function Parameters as References
- Defensive Programming with Mutable Parameters
- del and Garbage Collection
- Tricks Python Plays with Immutables
Variables Are Not Boxes
Variables are labels attached to objects. Think of variables as sticky notes you paste on the objects you have.
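A minimal sketch of the sticky-note idea. The names `first_label` and `second_label` are made up for illustration; the point is only that assignment attaches another label and copies nothing.

```python
# Two labels attached to the same list object (names are illustrative only).
first_label = [1, 2, 3]
second_label = first_label      # attaches a second label; nothing is copied

first_label.append(4)           # mutate the object through one label

print(second_label)             # the other label shows the same, changed object
print(second_label is first_label)
```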
Identity, Equality and Aliases
Because variables are mere labels, nothing prevents an object from having several labels assigned to it. This is aliasing.
Let's see a real-world analogy: pen names.
charlse = {'name': 'Charles L. Dodgson', 'born': 1832} # our author
lewis = charlse # lewis is a pen name, i.e. an alias for the same object
lewis is charlse
id(charlse), id(lewis)
lewis['balance'] = 950
charlse
Now we have an impostor, 'Dr. Alexander Pedachenko'. He is not Charles, but he claims to be.
alex = {'name': 'Charles L. Dodgson', 'born': 1832, 'balance': 950}
alex == charlse, alex is not charlse
In The Python Language Reference, “3.1. Objects, values and types” states:
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The is operator compares the identity of two objects; the id() function returns an integer representing its identity.
Choosing Between == and is
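Every object has an identity, a type and a value, and the identity never changes during the object's lifetime. The is operator compares the identities of two objects and tells you whether they are the same object. Most of the time we care about values rather than identities, so == (which calls __eq__) is the usual way to compare two objects. The is operator is most useful for checking whether a variable is bound to None. It is also faster than ==, because it cannot be overloaded, so Python does not have to find and call a special method to evaluate it.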
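To see why `is None` is the safer test, here is a small sketch with a hypothetical class, `AlwaysEqual`, whose `__eq__` claims equality with everything. It is not from the original notes; it only illustrates that `==` can be overloaded while `is` cannot.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with any object."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

equal_to_none = (obj == None)   # == dispatches to the overloaded __eq__
is_none = (obj is None)         # `is` compares identities and cannot be fooled

print(equal_to_none, is_none)
```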
The Relative Immutability of Tuples
t1 = (1, 2, [30, 40])
t2 = (1, 2, [30, 40])
t1 == t2
id(t1[-1])
t1[-1].append(99)
t1
id(t1[-1])
t1 == t2
Copies are Shallow by Default
l1 = [3, [55, 44], (7, 8, 9)]
l2 = list(l1)
l2
l2 == l1
l2 is l1
l3 = l1[:]
l3 == l1
l3 is l1
Using the constructor or [:] produces a shallow copy. The copy is filled with references to the same items held by the original container. This saves memory and is fine for immutable items, but with mutable items it can cause bugs.
l1 = [3, [66, 55, 44], (7, 8, 9)]
l2 = list(l1) # shallow copy of l1
l1.append(100) # only l1 grows; l2 is a separate outer list
l1[1].remove(55) # the inner list is shared, so l2[1] changes too
print('l1:', l1)
print('l2:', l2)
l2[1] += [33, 22] # list: extended in place, so the change also shows in l1[1]
l2[2] += (10, 11) # tuple: a new tuple is created and bound to l2[2] only
print('l1:', l1)
print('l2:', l2)
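The sharing can also be checked directly with `is`. This is a condensed sketch of the same experiment, with variable names (`original`, `shallow`) chosen here for readability.

```python
original = [3, [66, 55, 44], (7, 8, 9)]
shallow = list(original)                 # new outer list, same inner objects

shares_inner_list = shallow[1] is original[1]

shallow[1] += [33, 22]    # list: extended in place, so `original` sees it too
shallow[2] += (10, 11)    # tuple: a new tuple is built and stored in `shallow` only

shares_tuple = shallow[2] is original[2]

print(original)
print(shallow)
print(shares_inner_list, shares_tuple)
```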
Deep and Shallow Copies of Arbitrary Objects
You saw earlier that shallow copies are easy to make, but they may or may not be what you want. Sometimes you need deep copies (duplicates that do not share references to embedded objects). The copy module provides the copy and deepcopy functions, which return shallow and deep copies of arbitrary objects.
class Bus:
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = list(passengers)
    def pick(self, name):
        self.passengers.append(name)
    def drop(self, name):
        self.passengers.remove(name)
import copy
bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy.copy(bus1)
bus3 = copy.deepcopy(bus1)
id(bus1), id(bus2), id(bus3)
bus1.drop('Bill')
print(bus2.passengers)
print(bus3.passengers)
id(bus1.passengers), id(bus2.passengers), id(bus3.passengers)
Note that making deep copies is not straightforward because of cases like cyclic references.
a = [10, 20]
b = [a, 30]
a.append(b)
a
from copy import deepcopy
c = deepcopy(a)
c
d = a[:]
d
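A quick way to confirm that deepcopy handled the cycle is to check identities in the copy. This sketch repeats the setup above and adds three checks; the result names are illustrative.

```python
from copy import deepcopy

a = [10, 20]
b = [a, 30]
a.append(b)                          # a refers to b, and b refers back to a

c = deepcopy(a)

copy_is_new = c is not a             # the outer list was duplicated
inner_not_shared = c[2] is not b     # so was the embedded list
cycle_preserved = c[2][0] is c       # the copy's inner list points back to the copy

print(copy_is_new, inner_not_shared, cycle_preserved)
```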
You can control the behavior of copy and deepcopy by implementing the __copy__() and __deepcopy__() special methods.
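As a rough illustration of those two hooks, the sketch below uses a hypothetical `Playlist` class (not part of the original notes). `__copy__` shares the embedded list, while `__deepcopy__` duplicates it and registers the new object in `memo` so that cyclic references would still be handled.

```python
import copy

class Playlist:
    """Hypothetical class used only to illustrate the two hooks."""
    def __init__(self, name, tracks):
        self.name = name
        self.tracks = tracks

    def __copy__(self):
        # Shallow copy: a new Playlist that shares the same tracks list.
        return Playlist(self.name, self.tracks)

    def __deepcopy__(self, memo):
        # Deep copy: register the new object first, then copy the contents.
        new = Playlist.__new__(Playlist)
        memo[id(self)] = new
        new.name = self.name
        new.tracks = copy.deepcopy(self.tracks, memo)
        return new

original = Playlist('road trip', [['intro'], ['outro']])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(shallow.tracks is original.tracks)   # shared list
print(deep.tracks is original.tracks)      # independent list
```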
Function Parameters as References
The only mode of parameter passing in Python is call by sharing, i.e. the parameters inside the function become aliases of the actual arguments.
The implication is that a function may change any mutable object passed to it, but it cannot change the identity of those objects (it cannot altogether replace an object with another).
def f(a, b):
    a += b
    return a
x = 1
y = 2
f(x, y)
x, y # no change for immutable objects
c = [1, 2]
d = [3, 4]
f(c, d)
c, d # since c was mutable, it changed
t = (10, 20)
u = (30, 40)
f(t, u)
t, u # again, immutable...
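The difference between rebinding a parameter and mutating the object it refers to can be sketched with two hypothetical functions, `rebind` and `mutate`. Only the second one affects the caller's list.

```python
def rebind(items):
    items = items + [99]    # builds a new list and rebinds the local name only
    return items

def mutate(items):
    items.append(99)        # changes the object the caller also refers to
    return items

numbers = [1, 2]

rebound = rebind(numbers)
after_rebind = list(numbers)    # snapshot of the caller's list after rebind()

mutated = mutate(numbers)

print(after_rebind)             # the caller's list was left alone by rebind()
print(numbers)                  # but mutate() changed it in place
```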
This brings us to a major caveat about using mutable types as parameter defaults. It is one of the most common gotchas for beginners in Python!
Let's see the problem in action with an example.
class HauntedBus:
    def __init__(self, passengers=[]):
        self.passengers = passengers
    def pick(self, name):
        self.passengers.append(name)
    def drop(self, name):
        self.passengers.remove(name)
bus1 = HauntedBus(['Alice', 'Bob'])
bus1.passengers
bus1.pick('Charlie')
bus1.drop('Alice')
bus1.passengers
bus2 = HauntedBus()
bus2.pick('Carrie')
bus2.passengers
bus3 = HauntedBus()
bus3.passengers
bus3.pick('Dave')
bus2.passengers
bus2.passengers is bus3.passengers
bus1.passengers
As you have noticed, bus2 and bus3 give some funny outputs. The reason is that the passengers attributes of bus2 and bus3 refer to the same list. The bug only appears when HauntedBus is instantiated without a passengers argument. This is because each default value is evaluated when the function is defined, i.e. usually when the module is loaded, and the default values become attributes of the function object. So if a default value is mutable and you change it, the change will affect every future call of the function.
dir(HauntedBus.__init__)
HauntedBus.__init__.__defaults__
HauntedBus.__init__.__defaults__[0] is bus2.passengers, HauntedBus.__init__.__defaults__[0] is bus3.passengers
This is why None is often used as the default value for parameters that may receive mutable values. In __init__ we can check for None and assign a new empty list.
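The same None-as-default pattern applies to plain functions. Here is a minimal sketch with a hypothetical helper, `add_passenger`, which creates a fresh list on every call that omits the argument.

```python
def add_passenger(name, passengers=None):
    """Hypothetical helper: append name to passengers and return the list."""
    if passengers is None:
        passengers = []         # a fresh list is created on every such call
    passengers.append(name)
    return passengers

first = add_passenger('Carrie')
second = add_passenger('Dave')

print(first, second)
print(add_passenger.__defaults__)   # the stored default is None, never a shared list
```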
Defensive Programming with Mutable Parameters
When coding a function that receives a mutable parameter, you should consider whether the caller expects the argument to be modified. This usually depends on context, and on aligning the expectations of the function's author and its caller. Let's see an example where this breaks:
class TwilightBus:
    """A bus model that makes people vanish"""
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = passengers
    def pick(self, name):
        self.passengers.append(name)
    def drop(self, name):
        self.passengers.remove(name)
basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']
bus = TwilightBus(basketball_team)
bus.drop('Tina')
bus.drop('Pat')
basketball_team
This violates the "Principle of Least Astonishment", a best practice of interface design. A simple fix is to give self.passengers a copy of the passenger list. This also makes the class more flexible, since the argument can now be any iterable.
class TwilightBus:
    """A bus model that keeps its own copy of the passenger list"""
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = list(passengers)  # copy, so the caller's list is untouched
    def pick(self, name):
        self.passengers.append(name)
    def drop(self, name):
        self.passengers.remove(name)
basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']
bus = TwilightBus(basketball_team)
bus.drop('Tina')
bus.drop('Pat')
basketball_team
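Because list() accepts any iterable, the copying constructor also works with tuples and generators. The sketch below uses a hypothetical, trimmed-down class, `CopyingBus`, to keep the example self-contained.

```python
class CopyingBus:
    """Hypothetical variant that always copies the passengers it is given."""
    def __init__(self, passengers=None):
        # list() accepts any iterable and always builds a new list.
        self.passengers = [] if passengers is None else list(passengers)

    def drop(self, name):
        self.passengers.remove(name)

team = ('Sue', 'Tina', 'Maya')                                   # a tuple
bus_from_tuple = CopyingBus(team)
bus_from_generator = CopyingBus(name.upper() for name in team)   # a generator

bus_from_tuple.drop('Tina')

print(team)                           # the caller's data is untouched
print(bus_from_tuple.passengers)
print(bus_from_generator.passengers)
```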
del and Garbage Collection
"Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected.". The del
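statement deletes a name, not the object it refers to. The object is reclaimed only when no references to it remain, for example after del removes the last name bound to it.
Note that Python classes can define a special __del__ method. It is called by the Python interpreter when the instance is about to be destroyed; your own code should rarely need to call it directly.
In CPython the primary algorithm for garbage collection is reference counting. Each object keeps a count of how many references point to it. As soon as that count reaches zero, the object is destroyed: CPython calls its __del__ method (if defined) and frees the memory allocated to it.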
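Reference counts can be observed with sys.getrefcount. The sketch below is CPython-specific and the absolute numbers are not meaningful (the call itself adds a temporary reference); only the differences matter. The names `data` and `holders` are illustrative.

```python
import sys

data = {1, 2, 3}
holders = []

before = sys.getrefcount(data)    # includes a temporary reference for the call
holders.append(data)              # one more reference, held by the list
after_add = sys.getrefcount(data)
holders.clear()                   # drop that reference again
after_clear = sys.getrefcount(data)

print(after_add - before)         # change caused by the extra reference
print(after_clear - before)       # back to where we started
```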
import weakref
s1 = {1, 2, 3}
s2 = s1
def bye():
    print('Gone with the wind...')
ender = weakref.finalize(s1, bye)
ender.alive
del s1
ender.alive
s2 = 'spam'
ender.alive
import weakref
a_set = {0, 1}
wref = weakref.ref(a_set)
wref
wref()
a_set = {2, 3, 4}
wref()
wref() is None, wref
hex(id(a_set))
wref() is None
wref()
The weakref documentation makes the point that the weakref.ref class is actually a low-level interface. Users are better off using the weakref collections or finalize. So consider using WeakKeyDictionary, WeakValueDictionary, WeakSet and finalize.
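As a small sketch of one of those higher-level tools, here is a WeakSet tracking instances of a hypothetical `Session` class. The entry disappears once the last strong reference is gone; the gc.collect() call is not needed in CPython for this case but is harmless.

```python
import gc
import weakref

class Session:
    """Hypothetical class; instances of user-defined classes can be weakly referenced."""
    def __init__(self, user):
        self.user = user

active = weakref.WeakSet()
session = Session('alice')
active.add(session)
count_while_alive = len(active)

del session           # drop the only strong reference
gc.collect()          # make sure collection has happened on any implementation
count_after_del = len(active)

print(count_while_alive, count_after_del)
```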
WeakValueDictionary
The class WeakValueDictionary implements a mutable mapping where the values are weak references to objects. When a referred object is garbage collected elsewhere in the program, the corresponding key is automatically removed from the WeakValueDictionary. This is commonly used for caching.
class Cheese:
    def __init__(self, kind):
        self.kind = kind
    def __repr__(self):
        return 'Cheese(%r)' % self.kind
import weakref
stock = weakref.WeakValueDictionary()
catalog = [Cheese('Red Leicester'), Cheese('Tilsit'),
           Cheese('Brie'), Cheese('Parmesan')]
for cheese in catalog:
    stock[cheese.kind] = cheese
sorted(stock.keys())
del catalog
sorted(stock.keys())
del cheese
sorted(stock.keys())
A counterpart to the WeakValueDictionary is the WeakKeyDictionary, in which the keys are the weak references.
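One possible use of WeakKeyDictionary is attaching extra data to objects without keeping them alive. The sketch below assumes a hypothetical `Widget` class and a `notes` mapping; neither comes from the original notes.

```python
import gc
import weakref

class Widget:
    """Hypothetical class used as a weakly referenced key."""
    def __init__(self, name):
        self.name = name

notes = weakref.WeakKeyDictionary()
button = Widget('ok-button')
notes[button] = 'needs a tooltip'

note_while_alive = notes[button]
size_while_alive = len(notes)

del button            # the key object has no strong references left
gc.collect()          # not required in CPython here, but harmless
size_after_del = len(notes)

print(note_while_alive, size_while_alive, size_after_del)
```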
Tricks Python Plays with Immutables
t1 = (1, 2, 3)
t2 = tuple(t1)
t2 is t1
t3 = t1[:]
t3 is t1
- The same behaviour is observed with instances of str, bytes and frozenset. Note that frozenset is not a sequence, so fs[:] does not work, but fs.copy() has the same effect: it cheats and returns a reference to the same object.
s1 = 'ABC'
s2 = 'ABC'
s2 is s1
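These sharing tricks are implementation details, so code should compare immutable values with == rather than is. The following sketch, with illustrative variable names, shows the frozenset case and a string assembled at run time. Whether `built is literal` holds is deliberately left unstated because it depends on the interpreter.

```python
fs = frozenset({1, 2, 3})
fs_copy = fs.copy()
same_frozenset = fs_copy is fs           # CPython returns the same object here

literal = 'ABC'
built = ''.join(['A', 'B', 'C'])         # equal text, assembled at run time

same_text = built == literal             # value comparison: reliable
same_string_object = built is literal    # identity: implementation-dependent

print(same_frozenset, same_text, same_string_object)
```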