Python

Data Model

Objects are Python’s abstraction for data. Everything is an object: integers, strings, functions, classes, modules, methods, etc. And objects are “first class citizens”. This implies that all objects that can be named in the language can be assigned to variables, passed as arguments to functions, stored in containers (lists, tuples, dictionaries), etc.

Raymond Hettinger: Classes

Raymond Hettinger: The Mental Game

Strings: String values represent a series of Unicode code points (ASCII is a subset of Unicode).

Idioms

Development Environment

Top Level Functions and Protocols vs Object Methods

Why does Python have len() rather than object.length? The same question applies to the del statement, zip() instead of zip_with(), and str() instead of object.str(). Likewise, list comprehensions, new_list = [expression for member in iterable], are different from calling methods on iterables, and so on.

Just because everything is an object doesn’t mean being an object is the most important thing about each thing! Python is not a pure OOP language but a multi-paradigm language, and this is one of the useful things borrowed from functional programming.

This allows the use of these “methods” with higher-order functions (functions that take a function as an argument) without the need to define a function or lambda just to call a method.

Using a protocol, one can provide alternative ways of implementing things. len(x) returns the length of x. How does it do that? It calls the special internal method __len__ on x, so you can still customize it. Think of len() as an operator more than a function. By keeping the special internal methods named __something__ instead of just something, we can adapt to pretty much any API in Python.
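A minimal sketch of this protocol, assuming a hypothetical Playlist class that is not part of these notes: defining __len__ is enough for the built-in len() to work, and len itself can be handed directly to a higher-order function such as map().

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # len(playlist) ends up calling this special method
        return len(self._songs)


playlist = Playlist(["intro", "verse", "outro"])
print(len(playlist))

# len is an ordinary object, so it can be passed to map() without a lambda
lengths = list(map(len, ["ab", [1, 2, 3], playlist]))
print(lengths)
```

The same call works on strings, lists and the custom class, which is the point of having one top-level function instead of a differently named method per type.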

The question becomes “what needs to be done?”. You do not think “I will use a while loop here, here and here, and a condition there, and then I have the first part”, but rather “this is what I want to do, this is what I need, and I am done”. You do not focus on the implementation details.

If you look at Java in comparison, what do we find? A built-in array has a field called .length. A built-in string, however, is an actual object and has a method called .length(). A map or list in Java responds to .size(). The XML APIs in Java use .getLength() instead.

Having one function (instead of a method) do that call gives us extra power for commonly required behavior. If an object has an __iter__ method, the object is iterable, so you can use a for loop to iterate over it. You should not call __iter__ yourself; instead use iter(x) if you really need the iterator. Why is that? Why can’t I just use x.__iter__() and be happy? Because iter() can fall back to other protocols. For instance, if you have something with a method named __getitem__ (the subscript operator: x[y] == x.__getitem__(y)), the subscript is an integer, and the special method does not raise a lookup error when 0 is passed, then obviously there is a first item in the object. Python will then assume it is iterable and continue to subscript it with increasing integers (the first iteration step is x[0], the second is x[1], etc.) until an IndexError is raised. You can easily test this yourself.
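One way to try the __getitem__ fallback, sketched with a hypothetical Squares class that defines neither __iter__ nor __len__:

```python
class Squares:
    """Hypothetical class: defines __getitem__ but neither __iter__ nor __len__."""

    def __getitem__(self, index):
        if index >= 5:
            # IndexError tells the fallback iteration protocol to stop
            raise IndexError(index)
        return index * index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError
squares = list(Squares())
print(squares)  # [0, 1, 4, 9, 16]

for value in Squares():
    print(value)
```

Calling Squares().__iter__() directly would fail with an AttributeError, while iter(Squares()) and the for loop both work.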

Modules

A module in Python is just a Python file. Packages make it possible to keep a directory structure of modules for more readable imports.

Aside from some naming restrictions, nothing special is required for a Python file to be a module, but you need to understand the import mechanism in order to use this concept properly and avoid some issues.

Concretely, the import module_name statement looks for the proper file, module_name.py, starting with the directory of the script that was run (the first entry on sys.path).
If it is not found there, the Python interpreter searches the remaining directories listed in sys.path, in order, and raises an ImportError exception if the module is not found in any of them.
There are various ways of making sure a directory is always on the sys.path list when you run Python, including:
- put the directory into the contents of the PYTHONPATH environment variable
- make the module part of an installable package, and install it (see: making a Python package).
Once module_name.py is found, the Python interpreter executes the module in an isolated scope:
- In many languages, an include file directive is used by the preprocessor to take all code found in the file and ‘copy’ it into the caller’s code. It is different in Python: the included code is isolated in a module namespace, which means that you generally don’t have to worry that the included code could have unwanted effects, e.g. override an existing function with the same name.
- It is executed once per process. All top-level statements are executed. A def statement only defines the function; its body runs when the function is called. For classes, the story is different: at import time, the interpreter executes the body of every class, even the body of classes nested in other classes. Execution of a class body means that the attributes and methods of the class are defined, and then the class object itself is built.
- Any top-level statement in module_name.py will be executed, including other imports if any.
Function and class definitions are stored in the module’s dictionary. Then, the module’s variables, functions, and classes are available to the caller through the module’s namespace, a central concept in programming that is particularly helpful and powerful in Python. See also: https://stackoverflow.com/questions/419163/what-does-if-name-main-do
import module_name creates a module object bound to the module’s name (and possibly creates the .pyc byte code file). Later imports of the same module name reuse the existing object.
The dir() function displays the names of a module’s attributes. Example: the __name__ attribute, which is equal to "__main__" when a module is running as a standalone program. The __dict__ attribute is a dictionary of the module’s objects. E.g. math.__dict__ contains pow, fsum and cosh as keys for the corresponding function objects.
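The module namespace described above can be inspected directly. The following is a small sketch using the standard math module; the variable name same_module is only illustrative.

```python
import importlib
import math
import sys

# The module's namespace is an ordinary dictionary
print("pow" in math.__dict__)
print(math.__dict__["cosh"] is math.cosh)

# dir() lists the attribute names of the module
print("cosh" in dir(math))

# An imported module's __name__ is its own name, not "__main__"
print(math.__name__)

# Imported modules are cached in sys.modules; importing again reuses the object
same_module = importlib.import_module("math")
print(same_module is math, sys.modules["math"] is math)
```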

Don’t: from module import *. It silently overwrites same-named objects already in the namespace. The special syntax from modu import * makes it possible to simulate the include-style behavior of other languages, but this is generally considered bad practice: using import * makes code harder to read and makes dependencies less compartmentalized.

Relative Imports

Note that relative imports are based on the name of the current module. Since the name of the main module is always __main__, modules intended for use as the main module of a Python application must always use absolute imports.

Tests

Ways to call tests:

python -m unittest test_module (e.g. test.test_*; the module search path will include the root)
python -m unittest test/test_file.py (Module search path will have test directory, not root)
python -m unittest discover test 

To give the individual tests import context, create a tests/context.py file:

import os 
import sys 
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
import sample

Then, within the individual test modules, import the module like so: from .context import sample

This will always work as expected, regardless of installation method.

Some people will assert that you should distribute your tests within your module itself, but this often increases complexity for your users; many test suites require additional dependencies and runtime contexts.

Useful Standard Library Modules

sys: responsible for the interaction between the program and the Python interpreter, providing a series of functions and variables for manipulating the Python runtime environment

command line arguments (sys.argv)
module search path (sys.path)
exit code (sys.exit)
sys.executable
sys.version
sys.platform

argparse

os: interacting with the operating system, providing access to the underlying interface of the operating system, e.g. to manage the file system as a hierarchy of directories and files, and to work with command line utilities

os.environ
os.sep
os.getcwd
os.chdir
os.listdir()
os.path (to query the filesystem)
os.remove

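subprocess: The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. Spawning a subprocess always incurs a (usually minor) performance hit compared to the alternatives.

subprocess.run() and subprocess.Popen() handle process creation.

import subprocess

# Pass the command as a list and keep the default shell=False, so no shell re-parses the arguments.
# Note: on Linux and macOS, ping runs until interrupted unless a count such as '-c', '4' is added.
new_process = subprocess.Popen(['ping', 'localhost'], stdout=subprocess.PIPE)
for ping_line in new_process.stdout:  # stdout is a file-like object, not a callable
    print(ping_line.decode(), end='')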

Python’s subprocess library, and the Popen class that underlies it, offer a way to get a pipe to the stdin of the process. This way, you can send in the commands you want directly from Python and don’t have to attempt to get another subprocess to talk to it.
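A minimal sketch of writing to a child's stdin, assuming the child is simply a second Python interpreter running a hypothetical one-line program that upper-cases whatever it reads:

```python
import subprocess
import sys

# Hypothetical child process: a second Python interpreter that upper-cases its stdin
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

process = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,  # exchange str instead of bytes
)

# communicate() writes to the child's stdin, closes it, and collects stdout
output, _ = process.communicate("hello from the parent\n")
print(output, end="")
print(process.returncode)
```

Using communicate() rather than writing to process.stdin by hand avoids deadlocks when the child produces a lot of output.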

glob: provides pattern matching on file names. The glob() function returns a list of matching file names.

re: regular expression pattern matching.

database access

archiving and compression

date and time information and measurement

unittest

logging

pathlib

tempfile

typing

functools

metaclasses

Abstract Class: All classes have a metaclass; that is what Python uses to tell a class how to create an instance. E.g. abstractmethod: a decorator function that requires that abstract methods are overridden.

isinstance vs type

weakref

A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping. For example, if you have a number of large binary image objects, you may wish to associate a name with each. If you used a Python dictionary to map names to images, or images to names, the image objects would remain alive just because they appeared as values or keys in the dictionaries. The WeakKeyDictionary and WeakValueDictionary classes supplied by the weakref module are an alternative, using weak references to construct mappings that don’t keep objects alive solely because they appear in the mapping objects. If, for example, an image object is a value in a WeakValueDictionary, then when the last remaining references to that image object are the weak references held by weak mappings, garbage collection can reclaim the object, and its corresponding entries in weak mappings are simply deleted.
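The image cache described above could look roughly like this. Image is a hypothetical stand-in class, and the immediate removal of the entry relies on CPython's reference counting (gc.collect() is called for other implementations).

```python
import gc
import weakref


class Image:
    """Hypothetical stand-in for a large binary image object."""

    def __init__(self, name):
        self.name = name


cache = weakref.WeakValueDictionary()

logo = Image("logo")
cache["logo"] = logo  # the mapping holds only a weak reference
present_before = "logo" in cache

del logo  # drop the last strong reference
gc.collect()  # CPython frees the object right away; collect() covers other implementations
present_after = "logo" in cache

print(present_before, present_after)
```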

Decorators

This is a cool trick from aspect-oriented programming.¹

A decorator is a function or a class that wraps (or decorates) a function or a method. The ‘decorated’ function or method will replace the original ‘undecorated’ function or method. Because functions are first-class objects in Python, this can be done ‘manually’, but using the @decorator syntax is clearer and thus preferred.

def foo():
    pass  # do something

def decorator(func):
    # manipulate func
    return func

foo = decorator(foo)  # Manually decorate

@decorator
def bar():
    pass  # do something
# bar() is decorated

This mechanism is useful for separating concerns and avoiding external unrelated logic ‘polluting’ the core logic of the function or method. It is mainly used by library writers to make your code work with the library: rather than you having to change your code to make it work with their library, they decorate it.

A good example of a piece of functionality that is better handled with decoration is memoization or caching: you want to store the results of an expensive function in a table and use them directly instead of recomputing them when they have already been computed. This is clearly not part of the function logic.
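A minimal sketch of such a caching decorator. The names memoize, slow_square and calls are hypothetical, and the cache is keyed by positional arguments only, so they must be hashable.

```python
import functools


def memoize(func):
    """Hypothetical caching decorator: stores results keyed by positional arguments."""
    cache = {}

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # compute only on a cache miss
        return cache[args]

    return wrapper


calls = []  # records every time the real function body runs


@memoize
def slow_square(n):
    calls.append(n)
    return n * n


first = slow_square(4)
second = slow_square(4)  # served from the cache; the body does not run again
print(first, second, len(calls))
```

The standard library already ships this idea as functools.lru_cache, which is usually the better choice in real code.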

@cached_property (e.g. _is_valid for a Config object: if it is valid on construction, it always remains valid)

@property (can be accessed with the syntax object.property instead of object.property())

@dataclass (automatically generates methods such as __init__, __repr__ and __eq__ from the annotated fields of a class)

@patch('package.module.ClassName', mock_object_name) in tests for mocking

@abstractmethod

@classmethod: with class methods, the class of the object instance is implicitly passed as the first argument instead of self. One use people have found for class methods is to create inheritable alternative constructors.

@staticmethod (example: _build in the _YAMLConfig class): like class methods, and unlike normal methods, static methods can be called on the class itself. With static methods, neither self (the object instance) nor cls (the class) is implicitly passed as the first argument, so, unlike class methods, their behavior does not depend on which subclass they are called on. They behave like plain functions except that you can call them from an instance or the class. Static methods are used to group functions which have some logical connection with a class to that class. We could just use a simple outside-of-class function, but grouping helps with organization and style when there are multiple classes.
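The difference between the two decorators can be sketched as follows. Config, YAMLConfig, from_pairs and is_valid_key are hypothetical names chosen for illustration, not the classes referred to in these notes.

```python
class Config:
    """Hypothetical class used only to contrast class methods and static methods."""

    def __init__(self, values):
        self.values = values

    @classmethod
    def from_pairs(cls, pairs):
        # cls is the class the method was called on, so subclasses get their own type back
        return cls(dict(pairs))

    @staticmethod
    def is_valid_key(key):
        # no self or cls: behaves like a plain function namespaced under the class
        return isinstance(key, str) and key.isidentifier()


class YAMLConfig(Config):
    pass


config = YAMLConfig.from_pairs([("debug", True)])
print(type(config).__name__)
print(Config.is_valid_key("debug"), config.is_valid_key("not valid"))
```

Because from_pairs receives cls, the alternative constructor is inherited: calling it on YAMLConfig builds a YAMLConfig without any extra code in the subclass.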

Tips

Using properly mutable types for things that are mutable in nature and immutable types for things that are fixed in nature helps to clarify the intent of the code. For example, the immutable equivalent of a list is the tuple, created with (1, 2). This tuple is a pair that cannot be changed in-place, and can be used as a key for a dictionary.

range(start, end, step) is the constructor of a class which returns an iterable sequence of integers. Use zip(collection1, collection2, coll3…) to stitch corresponding elements together, and dict(zip(coll1, coll2)) to make the result a dict.

Lists use many methods similar to strings. Lists can also be sliced like strings. Sets use many methods similar to dictionaries.

Use List and Generator Comprehension

List comprehension = [expression for member in iterable if condition], e.g.

prices = […, …,..]  
fees = […,…] 
# min_price is a hypothetical threshold; zip() pairs each price with its fee
totals = [price - fee for price, fee in zip(prices, fees) if price > min_price]

Lambda expressions: may be used where statements are not syntactically allowed.

apply_disc = {
    'cruise': lambda price: price - 5,
    'rocker': lambda price: price * 2,
}

Functions usually return immediately, but generator functions can maintain state.
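A short sketch of that difference, using a hypothetical running_total generator function plus a generator expression (the lazy counterpart of a list comprehension):

```python
def running_total(numbers):
    """Generator function: keeps `total` alive between successive values."""
    total = 0
    for number in numbers:
        total += number
        yield total  # execution pauses here and resumes on the next request


totals = running_total([10, 20, 30])
first = next(totals)  # runs the body up to the first yield
rest = list(totals)   # resumes where it stopped, with total preserved
print(first, rest)

# A generator expression produces values on demand instead of building a list
squares = (n * n for n in range(5))
squares_sum = sum(squares)
print(squares_sum)
```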

Building a list of the parts (for example with a list comprehension) and joining it is usually the fastest and most idiomatic way to construct a string from its parts. Appending each part to the string is inefficient because strings are immutable, so the entire string can be copied on each append. Instead, it is much more efficient to accumulate the parts in a list, which is mutable, and then glue (join) the parts together when the full string is needed. However, using join() is not always best. When you are creating a new string from a pre-determined number of strings, using the addition operator is actually faster.

foo = 'foo'
bar = 'bar'
foobar = foo + bar  # This is good
foo += 'ooo'  # This is bad, instead you should do:
foo = ''.join([foo, 'ooo'])

# Alternative:
foobar = '%s%s' % (foo, bar) # It is OK
foobar = '{0}{1}'.format(foo, bar) # It is better
foobar = '{foo}{bar}'.format(foo=foo, bar=bar) # It is best 

Do not use the readlines() method of files. It is not efficient: it loads the entire file into memory (a huge memory requirement for large files), and other work can only start after the whole file has been read. Instead, use readline() or iterate over the file object lazily (https://stackoverflow.com/questions/17246260/python-readlines-usage-and-efficient-practice-for-reading).
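A minimal sketch of lazy iteration over a file. The sample file is created with tempfile purely so the example is self-contained; the names sample_path and line_lengths are illustrative.

```python
import os
import tempfile

# Write a small sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("first\nsecond\nthird\n")
    sample_path = handle.name

line_lengths = []
with open(sample_path) as lines:
    # The file object yields one line at a time instead of loading everything
    for line in lines:
        line_lengths.append(len(line.rstrip("\n")))

os.remove(sample_path)
print(line_lengths)
```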

Bare * in a function signature: used to force the caller to use named arguments for every parameter that follows it, so you cannot write a bare * when no keyword parameters follow it. * takes the place of *args, and vice versa; they can’t coexist in a signature. That is why * was chosen: previously, *args was the only way to make the parameters after it keyword-only, because it marked the end of the arguments which could be passed positionally (since it collected all remaining positional arguments, they could never reach the following named parameters). A bare * means the same “positional arguments can’t go beyond here”, but the lack of a name means “but I won’t accept them at all, because I chose not to provide a place to put them”.
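A small sketch of keyword-only parameters. The function connect and its parameters are hypothetical.

```python
def connect(host, *, timeout=10, retries=3):
    """Hypothetical function: everything after the bare * is keyword-only."""
    return host, timeout, retries


result = connect("example.org", timeout=5)
print(result)

try:
    connect("example.org", 5)  # a second positional argument is rejected
except TypeError as error:
    rejected = True
    print("TypeError:", error)
else:
    rejected = False
```

Forcing names at the call site means timeout and retries can never be swapped by accident.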

Use the for _ in range(x) idiom when the index is not important. Use in to check membership (rather than .get).

Use Protocol paradigm to make life easier: can override __add__ for defining the meaning of “+”.
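For instance, a hypothetical Money class (not from these notes) can give “+” a meaning by defining __add__:

```python
class Money:
    """Hypothetical value type used to illustrate operator overloading."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # lets Python try the other operand or raise TypeError
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents


total = Money(150) + Money(275)  # calls Money.__add__
print(total.cents)
```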

Use aliasing to make life easier.

Strings support slicing, concatenation and repetition, the str() constructor (which also does type conversion), .upper(), .lower(), etc. The built-in string class provides the ability to do complex variable substitutions and value formatting via the format() method.

Common Bugs

airports = [..,…,..]
dest = airports
dest is airports  # -> True: both names refer to the same list
dest = list(airports)  # list() is the data type conversion function; it makes a copy
dest is airports  # -> False

readline() includes the trailing \n of the line. Use rstrip() to remove it. (Note that the file is an iterator, so we don't really need readline().)

Any function which returns a new value usually doesn’t change the original object (with exceptions such as list.pop()). If a function doesn’t return anything, it has usually just modified the original object.

As a caveat to global access: reading a global variable can happen without explicit declaration, but assigning to it inside a function without declaring global var_name will instead create a new local variable.
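A sketch of the caveat, meant to be run as a script (module level). The names counter, after_local and after_global are illustrative.

```python
counter = 0


def increment_without_global():
    counter = 1  # creates a new local variable; the module-level counter is untouched
    return counter


def increment_with_global():
    global counter  # `global` is a statement, not a function call
    counter += 1
    return counter


increment_without_global()
after_local = counter  # still the original value

increment_with_global()
after_global = counter  # the module-level variable was updated

print(after_local, after_global)
```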

'-123'.isdigit() is False because '-' is not a digit.

Default parameter values are evaluated once, when the function is defined, not each time it is called. If the default is a mutable type, repeated calls to the function can mutate it and see each other’s changes.
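The bug and a common workaround, sketched with two hypothetical functions:

```python
def append_bad(item, bucket=[]):
    # the same list object is reused on every call that omits `bucket`
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    if bucket is None:
        bucket = []  # a fresh list per call
    bucket.append(item)
    return bucket


bad_first = append_bad(1)
bad_second = append_bad(2)  # same list as bad_first, now holding both items
good_first = append_good(1)
good_second = append_good(2)

print(bad_first is bad_second, bad_second)
print(good_first, good_second)
```

Using None as the sentinel default and creating the list inside the function is the usual idiom for avoiding the shared state.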

¹ Aspect-oriented programming entails breaking down program logic into distinct parts (so-called concerns, cohesive areas of functionality). Nearly all programming paradigms support some level of grouping and encapsulation of concerns into separate, independent entities by providing abstractions (e.g., functions, procedures, modules, classes, methods) that can be used for implementing, abstracting and composing these concerns. Some concerns “cut across” multiple abstractions in a program, and defy these forms of implementation. These concerns are called cross-cutting concerns or horizontal concerns. Logging exemplifies a crosscutting concern because a logging strategy necessarily affects every logged part of the system. Logging thereby crosscuts all logged classes and methods.

25 May 2021

Notes by Piyush
