Variables Are Not Boxes

Variables are labels attached to objects. Think of variables as sticky notes you paste on the objects you have.

Identity, Equality and Aliases

Because variables are mere labels, nothing prevents an object from having several labels assigned to it. This is aliasing.

Let's look at a real-world analogy: pen names.

charlse = {'name': 'Charles L. Dodgson', 'born': 1832}  # our Author
lewis = charlse  # lewis is a pen name, i.e. an alias for the same object

lewis is charlse
True
id(charlse), id(lewis)
(139701799991488, 139701799991488)
lewis['balance'] = 950
charlse
{'name': 'Charles L. Dodgson', 'born': 1832, 'balance': 950}

Now we have an impostor, 'Dr. Alexander Pedachenko'. He is not Charles, but he claims to be. His record is equal to the original, yet it is a distinct object.

alex = {'name': 'Charles L. Dodgson', 'born': 1832, 'balance': 950}
alex == charlse, alex is not charlse
(True, True)

In The Python Language Reference, “3.1. Objects, values and types” states:

Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The is operator compares the identity of two objects; the id() function returns an integer representing its identity.

Choosing Between == and is

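Every object has an identity, a type and a value. The identity does not change during the life of the object. The is operator compares the identities of two objects and returns True only if both names are bound to the same object.

Most of the time we care about values rather than identities, so we compare objects with ==, which calls __eq__. The is operator is the recommended way to check whether a variable is bound to None. It is also faster than ==, because is cannot be overloaded: Python does not have to look up and call a special method, it just compares two identities.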
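
To make the difference concrete, here is a minimal sketch. The AlwaysEqual class is a hypothetical example (it is not part of these notes' source material); it overrides __eq__ so that == gives a misleading answer when testing for None, while is cannot be fooled.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# == dispatches to AlwaysEqual.__eq__, so the answer is misleading.
equal_to_none = (obj == None)  # noqa: E711 (deliberately not idiomatic)

# is compares identities only and cannot be overloaded.
is_none = obj is None

print(equal_to_none, is_none)  # True False
```

This is one reason to prefer x is None over x == None, besides the speed difference.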

The Relative Immutability of Tuples

Another thing to keep in mind is the relative immutability of tuples. Tuples, like most Python collections, hold references to objects. The references stored in a tuple can never change, but if a referenced item is mutable, that item may change even though the tuple itself does not.

t1 = (1, 2, [30, 40])
t2 = (1, 2, [30, 40])
t1 == t2
True
id(t1[-1])
139701799957952
t1[-1].append(99)
t1
(1, 2, [30, 40, 99])
id(t1[-1])
139701799957952
t1 == t2
False

Copies Are Shallow by Default

A copy is an equal object with a different ID. But if the object contains other objects, should the copy also duplicate the inner objects, or is it OK to share them? Both are valid ways to copy, so let's see how to do each.

l1 = [3, [55, 44], (7, 8, 9)]
l2 = list(l1)
l2
[3, [55, 44], (7, 8, 9)]
l2 == l1
True
l2 is l1
False
l3 = l1[:]
l3 == l1
True
l3 is l1
False

Using the constructor or [:] produces a shallow copy. The copy is filled with references to the same items held by the original container. This saves memory and is fine when all the items are immutable, but with mutable items it can cause surprising bugs.

l1 = [3, [66, 55, 44], (7, 8, 9)]
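l2 = list(l1)  # shallow copy of l1
l1.append(100)  # only l1 grows; l2 is a separate outer list
l1[1].remove(55)   # the inner list is shared, so l2[1] changes too
print('l1:', l1)
print('l2:', l2)
l2[1] += [33, 22]  # += mutates the shared inner list in place
l2[2] += (10, 11)  # += on a tuple builds a new tuple and rebinds l2[2] only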
print('l1:', l1)
print('l2:', l2)
l1: [3, [66, 44], (7, 8, 9), 100]
l2: [3, [66, 44], (7, 8, 9)]
l1: [3, [66, 44, 33, 22], (7, 8, 9), 100]
l2: [3, [66, 44, 33, 22], (7, 8, 9, 10, 11)]

Deep and Shallow Copies of Arbitrary Objects

You saw earlier that shallow copies are easy to make, but they may or may not be what you want. Sometimes you need deep copies (duplicates that do not share references to embedded objects). The copy module provides copy for shallow copies and deepcopy for deep copies of arbitrary objects.

class Bus:
    
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = list(passengers)
            
    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
import copy
bus1 = Bus(['Alice', 'Bill', 'Claire', 'David'])
bus2 = copy.copy(bus1)
bus3 = copy.deepcopy(bus1)
id(bus1), id(bus2), id(bus3)
(139917492682656, 139917492680736, 139917492681936)
bus1.drop('Bill')
print(bus2.passengers)
print(bus3.passengers)
['Alice', 'Claire', 'David']
['Alice', 'Bill', 'Claire', 'David']
id(bus1.passengers), id(bus2.passengers), id(bus3.passengers)
(139917492204288, 139917492204288, 139917492703744)

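Note that making deep copies is not always straightforward. Objects may contain cyclic references, which would send a naive algorithm into an infinite loop. deepcopy remembers the objects it has already copied, so it handles cycles gracefully.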

a = [10, 20]
b = [a, 30]
a.append(b)
a
[10, 20, [[...], 30]]
from copy import deepcopy
c = deepcopy(a)
c
[10, 20, [[...], 30]]
d = a[:]
d
[10, 20, [[10, 20, [...]], 30]]

You can control the behavior of copy and deepcopy by implementing the __copy__() and __deepcopy__() methods.
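
As a rough sketch of what that looks like, the hypothetical Playlist class below (the class and its attribute names are made up for illustration) customizes both hooks. Its __copy__ gives the copy its own outer list while still sharing the items, and its __deepcopy__ passes the memo dictionary along, which is how deepcopy keeps track of objects it has already copied.

```python
import copy


class Playlist:
    """Hypothetical container used only to illustrate the copy hooks."""

    def __init__(self, name, tracks):
        self.name = name
        self.tracks = list(tracks)

    def __copy__(self):
        # Called by copy.copy(): the new Playlist gets its own outer list
        # (built in __init__) but shares the items stored inside it.
        return type(self)(self.name, self.tracks)

    def __deepcopy__(self, memo):
        # Called by copy.deepcopy(): memo maps id(original) to its copy and
        # must be passed along so cyclic references are handled.
        clone = type(self)(self.name, [])
        memo[id(self)] = clone
        clone.tracks = copy.deepcopy(self.tracks, memo)
        return clone


original = Playlist('road trip', [['intro'], ['outro']])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

shallow.tracks.append(['bonus'])    # does not touch original.tracks
original.tracks[0].append('remix')  # visible through shallow, not deep

print(original.tracks)  # [['intro', 'remix'], ['outro']]
print(shallow.tracks)   # [['intro', 'remix'], ['outro'], ['bonus']]
print(deep.tracks)      # [['intro'], ['outro']]
```

Whether __copy__ should share or duplicate the outer list is a design choice; the default copy.copy behavior for a plain class would share it.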

Function Parameters as References

The only mode of parameter passing in Python is call by sharing, i.e. the parameters inside the function become aliases of the actual arguments.

The implication is that a function may change any mutable object passed to it, but it cannot change the identity of those objects: it cannot replace an object with another one as far as the caller is concerned.

def f(a, b):
    a += b
    return a
x = 1
y = 2 
f(x, y)
3
x, y  # unchanged: ints are immutable, so a += b only rebinds the local a
(1, 2)
c = [1, 2]
d = [3, 4]
f(c, d)
[1, 2, 3, 4]
c, d  # c changed: += mutates a list in place
([1, 2, 3, 4], [3, 4])
t = (10, 20)
u = (30, 40)
f(t, u)
(10, 20, 30, 40)
t, u  # unchanged again: tuples are immutable
((10, 20), (30, 40))
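
The other half of call by sharing, that a function cannot rebind the caller's variable, can be sketched like this. The function names rebind and extend_in_place are made up for this illustration.

```python
def rebind(items):
    # Assigning to the parameter makes the local name point to a new list;
    # the caller's variable still refers to the original object.
    items = items + ['new']
    return items


def extend_in_place(items):
    # Mutating through the alias changes the object the caller also sees.
    items.append('new')
    return items


original = ['a', 'b']

rebound = rebind(original)
print(original, rebound is original)    # ['a', 'b'] False

extended = extend_in_place(original)
print(original, extended is original)   # ['a', 'b', 'new'] True
```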

This brings us to a major caveat about using mutable types as default parameter values. This is one of the most common gotchas for Python beginners!

Let's see the problem in action with an example.

class HauntedBus:
    def __init__(self, passengers=[]):
        self.passengers = passengers
        
    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
bus1 = HauntedBus(['Alice', 'Bob'])
bus1.passengers
['Alice', 'Bob']
bus1.pick('Charlie')
bus1.drop('Alice')
bus1.passengers
['Bob', 'Charlie']
bus2 = HauntedBus()
bus2.pick('Carrie')
bus2.passengers
['Carrie']
bus3 = HauntedBus()
bus3.passengers
['Carrie']
bus3.pick('Dave')
bus2.passengers
['Carrie', 'Dave']
bus2.passengers is bus3.passengers
True
bus1.passengers
['Bob', 'Charlie']

As you may have noticed, bus2 and bus3 give some funny outputs. The reason is that the passenger lists of bus2 and bus3 are the same list. The bug only appears when HauntedBus is instantiated without passengers, so that the default is used. This happens because each default value is evaluated once, when the function is defined (usually when the module is loaded), and the default values become attributes of the function object. So if a default value is mutable and you change it, the change affects every future call of the function.

dir(HauntedBus.__init__)
['__annotations__',
 '__call__',
 '__class__',
 '__closure__',
 '__code__',
 '__defaults__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__get__',
 '__getattribute__',
 '__globals__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__kwdefaults__',
 '__le__',
 '__lt__',
 '__module__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']
HauntedBus.__init__.__defaults__
(['Carrie', 'Dave'],)
HauntedBus.__init__.__defaults__[0] is bus2.passengers, HauntedBus.__init__.__defaults__[0] is bus3.passengers
(True, True)

This is why None is often used as the default value for parameters that may receive mutable values. Inside __init__ we can check for None and create a new empty list for each instance.
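
The same idiom works for plain functions. Below is a small sketch with a hypothetical helper named append_to; because the list is created inside the body, each call that omits target gets a fresh one.

```python
def append_to(item, target=None):
    """Hypothetical helper showing the None-as-default idiom."""
    if target is None:
        target = []  # a new list is created on every call
    target.append(item)
    return target


first = append_to(1)
second = append_to(2)

print(first, second)           # [1] [2]
print(first is second)         # False
print(append_to.__defaults__)  # (None,)
```

The only default stored on the function object is None, which is immutable, so nothing can leak between calls.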

Defensive Programming with Mutable Parameters

When coding a function that receives a mutable parameter, you should consider whether the caller expects the argument to be modified. This usually depends on the context, and it comes down to aligning the expectations of the function's author and its callers. Let's see an example where this breaks.

class TwilightBus:
    """A bus model that makes people vanish"""
    
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = passengers
            
    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']
bus = TwilightBus(basketball_team)
bus.drop('Tina')
bus.drop('Pat')
basketball_team
['Sue', 'Maya', 'Diana']

This violates the "Principle of Least Astonishment", a best practice of interface design: passengers dropped from the bus also vanish from the caller's basketball_team list. A simple fix is to give self.passengers a copy of the passenger list. This also makes the class more flexible, since the argument can now be any iterable.

class TwilightBus:
    """A bus model that makes people vanish"""
    
    def __init__(self, passengers=None):
        if passengers is None:
            self.passengers = []
        else:
            self.passengers = list(passengers)
            
    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)
basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']
bus = TwilightBus(basketball_team)
bus.drop('Tina')
bus.drop('Pat')
basketball_team
['Sue', 'Tina', 'Maya', 'Diana', 'Pat']

Note: Unless a method is explicitly intended to mutate an object received as an argument, you should think twice before aliasing the argument by simply assigning it to an instance variable in your class. If in doubt, make a copy. Your clients will often be happier.

del and Garbage Collection

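"Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected." The del statement deletes the name pointing to the object, not the object itself. An object is garbage collected as a result of del only if the deleted name held the last reference to it.

Note that Python classes can define a special __del__ method. You should not call it yourself: the interpreter invokes it when the instance is about to be destroyed, giving it a chance to release external resources.

In CPython the primary algorithm for garbage collection is reference counting. Each object keeps a count of how many references point to it. As soon as that count reaches zero, the object is destroyed and its memory is freed.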

import weakref
s1 = {1, 2, 3}
s2 = s1
def bye():
    print('Gone with the wind...')
    
ender = weakref.finalize(s1, bye)
ender.alive
True
del s1
ender.alive
True
s2 = 'spam'
Gone with the wind...
ender.alive
False
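
If you want to watch the reference count itself, sys.getrefcount can help. This is a CPython-specific sketch: the absolute numbers vary between versions (the call itself may add a temporary reference), so only the differences are compared.

```python
import sys

data = [1, 2, 3]

# Absolute counts are an implementation detail; compare differences only.
before = sys.getrefcount(data)

alias = data                       # a second label on the same list
with_alias = sys.getrefcount(data)

del alias                          # removes the label, not the object
after_del = sys.getrefcount(data)

print(with_alias - before)   # 1
print(after_del - before)    # 0
```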

Weak References

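The presence of references is what keeps an object alive in memory. But sometimes it is useful to have a reference to an object that does not keep it around longer than necessary. A common use case is a cache.

Weak references do not increase the reference count, so they do not prevent their target from being garbage collected. Once the target is gone, calling the weak reference returns None.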

import weakref
a_set = {0, 1}
wref = weakref.ref(a_set)
wref
<weakref at 0x7fb3702f3d60; to 'set' at 0x7fb3713b8820>
wref()
{0, 1}
a_set = {2, 3, 4}
wref()
{0, 1}
wref() is None, wref
(False, <weakref at 0x7fb3702f3d60; to 'set' at 0x7fb3713b8820>)
hex(id(a_set))
'0x7fb3713b8ac0'
wref() is None
False
wref()
{0, 1}
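
For comparison, here is a sketch of the same experiment written as a plain script, where no interactive machinery holds on to displayed results. The gc.collect() call is an extra precaution I added for implementations without reference counting; in CPython the set should be gone right after the del.

```python
import gc
import weakref

a_set = {0, 1}
wref = weakref.ref(a_set)
alive_before = wref() is not None  # the target still has a strong reference

del a_set       # drop the only strong reference
gc.collect()    # not needed in CPython; included for other implementations

dead_after = wref() is None
print(alive_before, dead_after)  # True True
```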

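Note that in the session above wref() keeps returning {0, 1} even after a_set is rebound. The most likely explanation is that interactive environments keep their own references to displayed results (the _ variable in the console, the output cache in IPython/Jupyter), so the original set never becomes unreachable.

The weakref documentation makes the point that the weakref.ref class is actually a low-level interface. Users are better off using the weakref collections or finalize. So consider using WeakKeyDictionary, WeakValueDictionary, WeakSet and finalize.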

WeakValueDictionary

The class WeakValueDictionary implements a mutable mapping where values are weak references to objects. When a referred object is garbage collected elsewhere in the program the corresponding key is automatically removed from WeakValueDictionary. This is commonly used for caching.

class Cheese:
    
    def __init__(self, kind):
        self.kind = kind
        
    def __repr__(self):
        return 'Cheese(%r)' % self.kind
import weakref

stock = weakref.WeakValueDictionary()
catalog = [Cheese('Red Leicester'), Cheese('Tilsit'),
           Cheese('Brie'), Cheese('Parmesan')]

for cheese in catalog:
    stock[cheese.kind] = cheese
    
sorted(stock.keys())
['Brie', 'Parmesan', 'Red Leicester', 'Tilsit']
del catalog
sorted(stock.keys())
['Parmesan']
del cheese
sorted(stock.keys())
[]

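Parmesan survives del catalog because the loop variable cheese is a global name that still refers to the last item of the list. Once cheese is deleted too, the stock is empty.

A counterpart to the WeakValueDictionary is the WeakKeyDictionary, in which the keys are the weak references.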
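
A minimal sketch of WeakKeyDictionary, assuming a hypothetical use case of attaching prices to Cheese instances without keeping them alive. The prices mapping and its values are made up for illustration.

```python
import gc
import weakref


class Cheese:
    """Same minimal class as above, repeated so this block runs on its own."""

    def __init__(self, kind):
        self.kind = kind


prices = weakref.WeakKeyDictionary()  # hypothetical per-object metadata

brie = Cheese('Brie')
tilsit = Cheese('Tilsit')
prices[brie] = 4.5
prices[tilsit] = 3.0
count_before = len(prices)

del brie        # last strong reference to the Brie instance
gc.collect()    # only to be safe on non-refcounting implementations

remaining = [cheese.kind for cheese in prices]
print(count_before, remaining)  # 2 ['Tilsit']
```

The entry disappears together with its key, so the mapping never holds stale data for objects that no longer exist.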

Tricks Python Plays with Immutables

Some miscellaneous notes on how Python handles immutable objects.

  1. For a tuple t, t[:] does not make a copy but returns a reference to the same object. tuple(t) also returns a reference to the same tuple.
t1 = (1, 2, 3)
t2 = tuple(t1)
t2 is t1
True
t3 = t1[:]
t3 is t1
True
  2. The same behaviour is observed with instances of str, bytes and frozenset. Note that frozenset is not a sequence, so fs[:] does not work, but fs.copy() has the same effect: it cheats and returns a reference to the same object. Equal string literals may also end up as one shared object through interning, a CPython optimization you should not rely on.
s1 = 'ABC'
s2 = 'ABC'

s2 is s1
True
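
The sketch below collects these checks in one place, with a list for contrast. All of this is CPython implementation behaviour rather than a language guarantee, so treat the results as something to observe, not to depend on.

```python
t = (1, 2, 3)
s = 'ABC'
b = b'ABC'
fs = frozenset({1, 2, 3})
lst = [1, 2, 3]

# Each entry records whether the "copy" is really the very same object.
same_object = {
    'tuple(t)': tuple(t) is t,
    't[:]': t[:] is t,
    'str(s)': str(s) is s,
    's[:]': s[:] is s,
    'bytes(b)': bytes(b) is b,
    'fs.copy()': fs.copy() is fs,
    'frozenset(fs)': frozenset(fs) is fs,
}

# A mutable list must produce a real copy.
list_copy_is_same = list(lst) is lst

for expression, is_same in same_object.items():
    print(f'{expression:15} {is_same}')
print(f"{'list(lst)':15} {list_copy_is_same}")
```

On CPython I would expect True for every immutable case and False for the list. Sharing is harmless here precisely because none of these objects can be changed.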