Building Blocks

Premature optimization is the root of all evil

By D. Knuth. AKA “Get it right first, then make it fast”.

This lecture draws heavily on content already covered in the following tutorials.

1. Motivation

Let’s start by looking at some of the reasons why continuously testing our code is a good practice that produces better and more reproducible code.

1.1. Numerical precision

As we saw in the notebook simple-numerical-chaos.ipynb, even simple arithmetic in computers can produce surprising numerical behavior. This means that, especially when we handle lots of data, we should always strive to validate that our code produces the answers we expect it to produce.

In brief, the basic issue is that even two algebraically equivalent forms of the same (simple!) expression, in a computer, may give different results:

def f1(x): return r*x*(1-x)
def f2(x): return r*x - r*x**2

r = 3.9
x = 0.8
print('f1:', f1(x))
print('f2:', f2(x))

print('difference:', (f1(x)-f2(x)))
f1: 0.6239999999999999
f2: 0.6239999999999997
difference: 2.220446049250313e-16

Now, the decimal digits of the difference are just garbage: neither f1(x) nor f2(x) carries any information beyond its last digit. The apparent precision in the difference f1(x) - f2(x) is completely spurious.
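One way to make this concrete is to measure the difference in units of the machine epsilon and to compare the two results with a tolerance instead of exact equality. The following is a minimal sketch using only the standard library; passing r as a keyword argument and the choice of rel_tol=1e-12 are assumptions made for illustration.

```python
import math
import sys

def f1(x, r=3.9):
    return r * x * (1 - x)

def f2(x, r=3.9):
    return r * x - r * x**2

x = 0.8
diff = abs(f1(x) - f2(x))
eps = sys.float_info.epsilon  # gap between 1.0 and the next representable float

# The difference is on the order of round-off, not a meaningful quantity.
print("difference in units of eps:", diff / eps)

# Comparing within a tolerance treats both forms as the same answer.
print("close:", math.isclose(f1(x), f2(x), rel_tol=1e-12))
```

Seen this way, "getting the right answer" means agreeing with the expected value up to a tolerance that we choose and can justify.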

This raises the question of what it means to get the right answer from our code, and what it means to be reproducible in scientific computing.

This short example helps us understand what is important in the context of computational

1.2. Implementing or changing features

Testing also helps us when we want to make significant changes to our code and need to ensure that its functionality is not affected by those changes. These cases include

  • Adding a new function/feature that communicates with other existing pieces of code.

  • Making changes to the implementation of an existing function, for example by changing the data types or the algorithm we use for certain operations.

  • Changing the data we use to feed our code.

2. Types of tests

There are different classes of tests that evaluate the correctness of our code at different levels and scales. In this course, we are going to cover the following tests:

  • Assertion statements

  • Exception statements

  • Unit tests

  • Regression tests

  • Integration tests

2.1. Assertions

The assert statement in Python evaluates whether a given condition is true or false. If it is false, it raises an AssertionError, which interrupts the execution of the code.

assert 1+1 == 2, "One plus one is not two."

As you can see from the previous example, you can also add a short text description of the error. In this way, assertion statements are very simple to write and evaluate.

As you can imagine from the discussion in the previous section, we need to be careful when comparing objects in Python. For example, for float types we have

assert 0.1 + 0.2 == 0.3
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 assert 0.1 + 0.2 == 0.3

AssertionError: 

The problem here is caused by floating-point arithmetic: 0.1 + 0.2 is not exactly equal to 0.3 in binary floating point. To avoid raising a spurious AssertionError, we can compare within a tolerance using numpy.testing.assert_allclose():

from numpy.testing import assert_allclose
assert_allclose(0.1 + 0.2, 0.3)

Since an assertion fails whenever its condition is not satisfied, we can also use any other function that returns True/False for this purpose. Other examples are

import math
assert math.isclose(0.1 + 0.2, 0.3), "Numbers are not close."
import pytest
assert 0.1 + 0.2 == pytest.approx(0.3), "Numbers are not close."
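All of these helpers compare within a tolerance, and it is worth knowing which kind of tolerance is being applied. The sketch below uses math.isclose to illustrate the difference between a relative and an absolute tolerance; the specific values 1e-10 and 1e-9 are arbitrary choices for illustration.

```python
import math

# The default relative tolerance works well when values are away from zero.
away_from_zero = math.isclose(0.1 + 0.2, 0.3)
print(away_from_zero)   # True

# Near zero, a relative tolerance alone is too strict: abs_tol defaults to 0.0.
near_zero_default = math.isclose(1e-10, 0.0)
print(near_zero_default)   # False

# An explicit absolute tolerance states how close to zero is "close enough".
near_zero_abs = math.isclose(1e-10, 0.0, abs_tol=1e-9)
print(near_zero_abs)   # True
```

When the expected value is zero (or very close to it), an absolute tolerance is usually the one that matters.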

Usually, assertion statements go inside functions, where they help us maintain the correctness of the code. In pair programming, it is the role of the observer to think of cases where the code may not work and to come up with simple assertion statements that help prevent those errors.
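As an illustration of assertions placed inside a function, here is a minimal sketch. The function normalize and its checks are hypothetical and not part of the lecture code; they only show how preconditions and a postcondition can be expressed with assert.

```python
def normalize(weights):
    """Scale a list of non-negative weights so that they sum to one."""
    # Preconditions: catch misuse early, with a message that explains it.
    assert len(weights) > 0, "weights must not be empty"
    assert all(w >= 0 for w in weights), "weights must be non-negative"

    total = sum(weights)
    assert total > 0, "at least one weight must be positive"

    result = [w / total for w in weights]

    # Postcondition: the output really is normalized (up to round-off).
    assert abs(sum(result) - 1.0) < 1e-12, "normalized weights must sum to one"
    return result

print(normalize([1, 1, 2]))  # [0.25, 0.25, 0.5]
```

Keep in mind that Python removes assert statements when it runs with the -O flag, so assertions are best suited for catching programming mistakes rather than for validating user input.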

2.2. Exceptions

The different kinds of errors that occur as we write code include syntax, runtime, and semantic errors. For runtime errors in particular, Python gives us a clue about what kind of error happened during the execution of our code. For example,

1 / 0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[7], line 1
----> 1 1 / 0

ZeroDivisionError: division by zero
my_dict = {'a':1, 'b':2}
my_dict['c']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[8], line 2
      1 my_dict = {'a':1, 'b':2}
----> 2 my_dict['c']

KeyError: 'c'
my_dict + {'c':3}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 my_dict + {'c':3}

TypeError: unsupported operand type(s) for +: 'dict' and 'dict'

There are many more kinds of built-in exceptions in Python. You can find some more examples in this link. A general RuntimeError is raised when the detected error doesn’t fall into any of the other categories.

There are different ways of dealing with runtime errors in Python. These include the
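Built-in exceptions are organized in a class hierarchy, which matters when deciding what to catch. The sketch below checks a few of these relationships and uses them in a small helper; the function lookup is hypothetical and only serves as an illustration.

```python
# A handler for a parent class also catches its child classes.
print(issubclass(ZeroDivisionError, ArithmeticError))  # True
print(issubclass(KeyError, LookupError))               # True
print(issubclass(IndexError, LookupError))             # True

def lookup(container, key, default=None):
    """Return container[key], or default when the key or index is missing."""
    try:
        return container[key]
    except LookupError:  # covers both KeyError and IndexError
        return default

print(lookup({'a': 1, 'b': 2}, 'c'))  # None
print(lookup([10, 20, 30], 5, -1))    # -1
```

Catching the most specific exception that describes the problem keeps unrelated errors visible instead of silently hiding them.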

  • try...except clause

  • raise statement

def division(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return 0
division(1,1)
1.0
division(1,0)
0

Now, when raising an error we would like to show a meaningful message. We can do this by catching the original exception and raising a new one with a more descriptive message:

def division(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        raise ZeroDivisionError(f"You cannot divide by {denominator=}")
division(1,0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[13], line 3, in division(numerator, denominator)
      2 try:
----> 3     return numerator / denominator
      4 except ZeroDivisionError:

ZeroDivisionError: division by zero

During handling of the above exception, another exception occurred:

ZeroDivisionError                         Traceback (most recent call last)
Cell In[14], line 1
----> 1 division(1,0)

Cell In[13], line 5, in division(numerator, denominator)
      3     return numerator / denominator
      4 except ZeroDivisionError:
----> 5     raise ZeroDivisionError(f"You cannot divide by {denominator=}")

ZeroDivisionError: You cannot divide by denominator=0
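The traceback above shows both exceptions because the new one was raised while the original was being handled. If we want to state that relationship explicitly, Python offers the raise ... from ... syntax. The following is a sketch of that variant of the division function, not a replacement for the version above.

```python
def division(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError as exc:
        # "from exc" records the original error as the direct cause.
        raise ZeroDivisionError(f"You cannot divide by {denominator=}") from exc

try:
    division(1, 0)
except ZeroDivisionError as err:
    caught = err                # keep a reference for inspection below
    print(err)                  # the new, more descriptive message
    print(repr(err.__cause__))  # the original exception
```

Writing from None instead of from exc hides the original exception from the traceback, which can be useful when it adds no information.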

If you already know what may cause an error in your code, you can avoid the try / except statement and directly raise an exception when a certain critical condition occurs:

def division(numerator, denominator):
    if denominator == pytest.approx(0.0):
        raise ZeroDivisionError(f"You cannot divide by {denominator=}")
    return numerator / denominator
division(1, 0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[16], line 1
----> 1 division(1, 0)

Cell In[15], line 3, in division(numerator, denominator)
      1 def division(numerator, denominator):
      2     if denominator == pytest.approx(0.0):
----> 3         raise ZeroDivisionError(f"You cannot divide by {denominator=}")
      4     return numerator / denominator

ZeroDivisionError: You cannot divide by denominator=0

Something cool about exceptions is that they are classes, and Python allows us to define new exception types of our own.

class LightSpeedBound(Exception):
    """
    Defines a new exception error of my preference.
    """
    pass

def lorentz_factor(v, c=299_792_458):
    if v > c:
        raise LightSpeedBound(f"The current velocity {v} cannot exceed the speed of light")
    return 1 / (1 - v**2/c**2) ** 0.5
lorentz_factor(300_000_000)
---------------------------------------------------------------------------
LightSpeedBound                           Traceback (most recent call last)
Cell In[18], line 1
----> 1 lorentz_factor(300_000_000)

Cell In[17], line 9, in lorentz_factor(v, c)
      7 def lorentz_factor(v, c=299_792_458):
      8     if v > c:
----> 9         raise LightSpeedBound(f"The current velocity {v} cannot exceed the speed of light")
     10     return 1 / (1 - v**2/c**2) ** 0.5

LightSpeedBound: The current velocity 300000000 cannot exceed the speed of light
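Because custom exceptions are ordinary classes, they can also carry data, and callers can catch them selectively. The sketch below extends the example with two attributes, velocity and limit; these attribute names are assumptions introduced for illustration.

```python
class LightSpeedBound(Exception):
    """Raised when a velocity exceeds the speed of light."""

    def __init__(self, velocity, limit):
        # Keep the offending values so that a handler can inspect them.
        self.velocity = velocity
        self.limit = limit
        super().__init__(f"velocity {velocity} exceeds the speed of light {limit}")

def lorentz_factor(v, c=299_792_458):
    if v > c:
        raise LightSpeedBound(v, c)
    return 1 / (1 - v**2 / c**2) ** 0.5

try:
    lorentz_factor(300_000_000)
except LightSpeedBound as err:
    # Only this specific error is handled; anything else still propagates.
    excess = err.velocity - err.limit
    print(f"{err} (over the limit by {excess})")
```

A dedicated exception type lets calling code react to this one situation without accidentally swallowing unrelated errors.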

2.3. Unit Tests

In a previous section we discussed the importance of writing clean and modular code. Having small functions that perform very specific tasks helps us design pipelines for testing those small units of code. That is the purpose of unit tests: to individually test the functions in our code.

Writing a unit test consists of defining a function that contains an assert statement checking whether the output matches the expected answer.

import numpy as np
import pytest

def division(numerator, denominator):
    if denominator == pytest.approx(0.0):
        raise ZeroDivisionError(f"You cannot divide by {denominator=}")
    return numerator / denominator

def test_float_division():
    assert np.isclose(division(2.0,0.5), 4.0)
test_float_division()

The next step is to scale this up! Write more than one test per function so that different cases are covered (e.g., different types), and then extend this to all the functions in your code. For example, for the division function we probably want to add a test that pins down the expected behaviour when dividing by zero. Perhaps surprisingly, we can assert that calling a function raises a particular error:

import pytest

def test_division_by_zero():
    with pytest.raises(ZeroDivisionError):
        division(numerator=10.0, denominator=0.0)
test_division_by_zero()
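One simple way to cover several cases with a single test is to list the inputs and expected outputs in a table and loop over them. The sketch below uses only the standard library, so math.isclose stands in for pytest.approx inside division; the particular cases chosen are assumptions for illustration. pytest offers pytest.mark.parametrize for the same purpose.

```python
import math
from fractions import Fraction

def division(numerator, denominator):
    # math.isclose with abs_tol stands in for pytest.approx in this sketch.
    if math.isclose(denominator, 0.0, abs_tol=1e-12):
        raise ZeroDivisionError(f"You cannot divide by {denominator=}")
    return numerator / denominator

# Each row is (numerator, denominator, expected result).
CASES = [
    (2.0, 0.5, 4.0),            # floats
    (6, 3, 2.0),                # integers: true division returns a float
    (-1.0, 4.0, -0.25),         # negative numerator
    (Fraction(1, 2), 2, 0.25),  # exact rational input
]

def test_division_cases():
    for numerator, denominator, expected in CASES:
        result = division(numerator, denominator)
        # The tuple is shown as the failure message to identify the case.
        assert math.isclose(result, expected), (numerator, denominator, result)

test_division_cases()
```

Adding a new case is then a one-line change to the table, which lowers the cost of testing edge cases as they come up.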

2.4. Integration tests

As their name indicates, integration tests are responsible for evaluating how multiple units of code work together, instead of individually. For example, it is easy to see how a simple piece of code that uses the division function can fail, even when each unit has been tested independently.

In general, any test that involves more than one function is called an integration test. Let’s look at the following example, which uses class inheritance in Python.

class Person:
    
    def __init__(self, name, age):
        self.name = name
        self.age = age
        
    def birthday(self):
        self.age += 1
        
    def append_lastname(self, lastname):
        self.name += " " + lastname
        
class Student(Person):
    
    def __init__(self, name, age, major):
        super().__init__(name, age)
        self.major = major
        self.grades = {}
        
    def add_grade(self, course, grade):
        self.grades[course] = grade
def test_student():
    
    subject = Student("Facu", 28, "Statistics")
    subject.birthday()
    subject.add_grade("Stat 159", "A+")
    assert subject.age == 29 and subject.grades["Stat 159"] == "A+"
    
test_student()
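Returning to the remark about the division function at the start of this section, the sketch below shows how two units that each behave correctly can still produce an error once combined. The functions total and mean are hypothetical, and math.isclose replaces pytest.approx so that the example depends only on the standard library.

```python
import math

def division(numerator, denominator):
    if math.isclose(denominator, 0.0, abs_tol=1e-12):
        raise ZeroDivisionError(f"You cannot divide by {denominator=}")
    return numerator / denominator

def total(values):
    return sum(values)

def mean(values):
    # Combines two units: total() and division().
    return division(total(values), len(values))

def test_mean_of_values():
    assert math.isclose(mean([1.0, 2.0, 3.0]), 2.0)

def test_mean_of_empty_list():
    # Each unit works on its own, but together an empty list leads to a
    # division by zero. The test documents that this is the intended outcome.
    try:
        mean([])
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError("mean([]) should raise ZeroDivisionError")

test_mean_of_values()
test_mean_of_empty_list()
```

Whether an empty list should raise an error or return some default value is a design decision; the integration test makes that decision explicit.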

2.5. Regression tests

Regression tests try to pin down, at a given point in time, the expected behaviour of a certain piece of code. This is particularly useful when we don’t know what the true output of a piece of code is, but we want to ensure the stability of the code. In a sense, we want to be sure that as we make changes we don’t break or alter code that, in principle, was working before.
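A common way to do this is to store the current output as a baseline and compare later runs against it. The following is a minimal sketch under several assumptions: the function logistic_orbit, the helper check_against_baseline, the JSON file format, and the tolerance rel_tol=1e-9 are all illustrative choices rather than a prescribed method.

```python
import json
import math
import tempfile
from pathlib import Path

def logistic_orbit(x0, r=3.9, steps=10):
    """Iterate x -> r*x*(1-x) and return the whole trajectory."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1 - orbit[-1]))
    return orbit

def check_against_baseline(result, baseline_file):
    """Store result on the first run; afterwards compare with the stored copy."""
    if not baseline_file.exists():
        baseline_file.write_text(json.dumps(result))
        return
    baseline = json.loads(baseline_file.read_text())
    assert len(result) == len(baseline), "output length changed"
    for new, old in zip(result, baseline):
        assert math.isclose(new, old, rel_tol=1e-9), f"{new} differs from baseline {old}"

# A temporary directory keeps the sketch self-contained; a real project would
# typically keep the baseline file under version control next to the tests.
with tempfile.TemporaryDirectory() as tmp:
    baseline_file = Path(tmp) / "logistic_orbit.json"
    check_against_baseline(logistic_orbit(0.8), baseline_file)  # records the baseline
    check_against_baseline(logistic_orbit(0.8), baseline_file)  # compares against it
```

Note that the baseline is not claimed to be correct; it only records what the code produced when it was believed to work, so that any later change in behaviour is noticed.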

Another example of a regression test arises after we find and fix a bug in our code. After detecting an error, we may want to include a test for it so we are sure that the bug doesn’t reappear in the future.
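As a hypothetical illustration, suppose we noticed that calling lorentz_factor with v exactly equal to c fails with a ZeroDivisionError instead of LightSpeedBound, and we decided to treat that as a bug. The sketch below shows one possible fix together with a test that pins the corrected behaviour; both the fix and the test name are assumptions for illustration.

```python
class LightSpeedBound(Exception):
    """Raised when a velocity reaches or exceeds the speed of light."""

def lorentz_factor(v, c=299_792_458):
    # Fix: ">=" instead of ">", so that v == c no longer slips through
    # and fails later with an unrelated ZeroDivisionError.
    if v >= c:
        raise LightSpeedBound(f"The velocity {v} must be below the speed of light")
    return 1 / (1 - v**2 / c**2) ** 0.5

def test_velocity_equal_to_c_is_rejected():
    # Regression test: pins the fixed behaviour for the input that exposed the bug.
    try:
        lorentz_factor(299_792_458)
    except LightSpeedBound:
        return
    raise AssertionError("v == c should raise LightSpeedBound")

test_velocity_equal_to_c_is_rejected()
```

If a later change reverts the comparison to ">", this test fails immediately and points directly at the input that caused the original problem.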