Popular functions
* type() - returns the type of an object
* str() - converts to a string
* int() - converts to an integer
* .replace() - replaces every occurrence of a substring with another string
* .lower() - converts all uppercase letters to lowercase
Lists
* months = [] - initialise an empty list
* months.append("value") - adds an item to the end of the list
* months = [1, "January", 2, "February"] - creates a list with values
* months[0] - accesses the first value in the list (indexing starts at 0)
* len(months) - returns the length of the list
* month_slice = months[2:4] - returns the items at index 2 and 3; the end index 4 is excluded
* split_list = g.split(",") - splits the string g at each comma into a list
Files
* f = open("crime_rates.csv", "r") - opens a file for reading
* g = f.read() - reads the whole file into a single string

Shorthand: g = open("crime_rates.csv").read()
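A minimal sketch of the same read using a with block, which closes the file automatically when the block ends. The file name notes.csv and its two rows are hypothetical. The sketch writes the file to a temporary folder first so it can run on its own.

```python
import os
import tempfile

# Write a small hypothetical file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "notes.csv")
with open(path, "w") as out:
    out.write("Albuquerque,749\nAnaheim,371\n")

# Read the whole file into one string; the file is closed when the block ends.
with open(path, "r") as f:
    g = f.read()

rows = g.split("\n")
print(rows)  # ['Albuquerque,749', 'Anaheim,371', '']
```

A file that ends with a newline leaves an empty string at the end of the split list. This is why the read_csv recipe further down drops the last element.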
for loops
    for row in rows:
        # do something with each row
list of lists
    three_rows = ["Albuquerque,749", "Anaheim,371", "Anchorage,828"]
    final_list = []
    for row in three_rows:
        split_list = row.split(",")
        final_list.append(split_list)
If you have a list of lists called data:
- first_element = data[0] gives you the first inner list, which may contain several items
- first_element[0] gives you the first item of that inner list
- shorthand: data[0][0]
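The snippet below checks these indexing rules. It rebuilds the list of lists from three_rows with a list comprehension, which is equivalent to the for loop in the list-of-lists example.

```python
three_rows = ["Albuquerque,749", "Anaheim,371", "Anchorage,828"]
data = [row.split(",") for row in three_rows]

first_element = data[0]   # ['Albuquerque', '749']
print(first_element[0])   # Albuquerque
print(data[0][0])         # Albuquerque (shorthand for the two steps above)
print(data[2][1])         # 828 (still a string, not a number)
```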
To get a list of lists from a CSV file, the long way:
    f = open("dq_unisex_names.csv", "r")
    names = f.read()
    names_list = names.split("\n")

    nested_list = []
    for element in names_list:
        comma_list = element.split(",")
        nested_list.append(comma_list)

    print(nested_list[0:5])
The short way, with the csv module:
    import csv

    f = open("world_alcohol.csv")
    reader = csv.reader(f)
    world_alcohol = list(reader)
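The csv module handles more than the long way does. The sketch below parses a small hypothetical CSV text that contains a quoted field with a comma inside it. It uses io.StringIO so that a string behaves like an open file.

```python
import csv
import io

# Hypothetical CSV text; io.StringIO makes a string behave like an open file.
text = 'Country,Drink\n"Korea, Republic of",Beer\nViet Nam,Wine\n'

reader = csv.reader(io.StringIO(text))
rows = list(reader)
print(rows[1])  # ['Korea, Republic of', 'Beer']

# Splitting by hand breaks the quoted field apart.
by_hand = text.split("\n")[1].split(",")
print(by_hand)  # ['"Korea', ' Republic of"', 'Beer']
```

Unlike split("\n"), csv.reader also does not produce an empty row for the final newline.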
Booleans
- Booleans help you filter data according to specified criteria:
- == returns True if both values are equal, and False if they are different
- != returns True if both values are different, and False if they are equal

Use parentheses for cleaner code:
    t = (8 == 8)  # True
Remember that when using len() to retrieve the last element of a list, you must subtract 1:
    crime_last = crime[len(crime) - 1]
The length of the list is not the index of the last element, because list indices start at 0.
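Here is a small check with a hypothetical crime list. Python also accepts negative indices, so crime[-1] is a shorter way to get the same last element.

```python
crime = [749, 371, 828]  # hypothetical values

crime_last = crime[len(crime) - 1]
print(crime_last)  # 828
print(crime[-1])   # 828, negative indices count back from the end

# crime[len(crime)] would raise IndexError: list index out of range
```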
If Statements
    if value > 500:
        # do something
If Else Statements
    if temperature > 50:
        print("It's hot!")
    else:
        print("It's cold!")
In Operator
The in operator checks whether a specific element is in a list:
    animals = ["cat", "dog", "rabbit"]
    if "cat" in animals:
        print("Cat found")

Or assign the result to a variable:
    animals = ["cat", "dog", "rabbit"]
    cat_found = "cat" in animals

The in operator can also check whether a specific key is in a dict:
    students = {
        "Tom": 60,
        "Jim": 70
    }

"Tom" in students returns True.
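With a dict, in only looks at the keys. To search the values, check students.values() explicitly, as in this short sketch:

```python
students = {"Tom": 60, "Jim": 70}

print("Tom" in students)        # True: "Tom" is a key
print(60 in students)           # False: in only looks at keys
print(60 in students.values())  # True: search the values explicitly
```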
Dictionaries
scores = {} - initialises an empty dictionary.

Long way: scores["Tom"] = 70 - assigns "Tom" a score of 70. "Tom" is the key and 70 is its value.

Short way:
    students = {
        "Tom": 60,
        "Jim": 70
    }

This gives us key/value pairs.
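A quick sketch showing that both ways build the same dictionary. It also shows that assigning to an existing key replaces its value rather than adding a second entry.

```python
# Long way: start empty and add one key at a time.
scores = {}
scores["Tom"] = 60
scores["Jim"] = 70

# Short way: write the key/value pairs in a dict literal.
students = {"Tom": 60, "Jim": 70}

print(scores == students)  # True

# Assigning to an existing key replaces its value.
students["Tom"] = 65
print(students["Tom"])     # 65
print(len(students))       # 2, still two keys
```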
Functions
    def clean_text(string_value):
        # remove every comma from the string
        cleaned_value = string_value.replace(",", "")
        return cleaned_value

    sentence = "Howdy,james,bond!"
    sentence = clean_text(sentence)
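Running clean_text on the sentence gives "Howdyjamesbond!". If more than one character needs removing, one option is to loop over the characters. clean_text_many and its chars parameter are hypothetical names for illustration, not part of the original notes.

```python
def clean_text(string_value):
    # remove every comma from the string
    return string_value.replace(",", "")

def clean_text_many(string_value, chars=(",", "!")):
    # hypothetical helper: remove each character listed in chars
    for char in chars:
        string_value = string_value.replace(char, "")
    return string_value

print(clean_text("Howdy,james,bond!"))       # Howdyjamesbond!
print(clean_text_many("Howdy,james,bond!"))  # Howdyjamesbond
```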
NumPy
NumPy lets you work with multidimensional arrays. For example, in a table stored as matrix, matrix[2, 2] gives you the value at row index 2, column index 2. That is the third row and third column, because indexing starts at 0.
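Read a CSV file into an array:
    import numpy
    nfl = numpy.genfromtxt("nfl.csv", delimiter=",")

Generate an array:
    matrix = numpy.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

Find the shape of an array: matrix.shape returns (rows, columns), here (3, 3). For a one-dimensional vector, vector.shape returns a one-element tuple such as (4,).

Data type of an array's elements: numbers.dtype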
- all elements of a NumPy array must have the same type
- NumPy converts all of the elements in the array to a type it guesses
- with a float type, elements that can't be converted become nan (Not a Number)
- missing elements are filled with a default value; genfromtxt uses nan for float columns
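A minimal sketch of the conversion rules above. It uses a small hypothetical CSV text with one non-numeric value and one missing value. genfromtxt accepts a file-like object, and its default dtype is float, so both bad cells come back as nan.

```python
import io
import numpy

# Hypothetical CSV text: "abc" cannot be converted, and one cell is empty.
text = "1,2,3\n4,abc,6\n7,,9"

arr = numpy.genfromtxt(io.StringIO(text), delimiter=",")

print(arr.dtype)  # float64
print(arr.shape)  # (3, 3)
print(numpy.isnan(arr[1, 1]), numpy.isnan(arr[2, 1]))  # True True
```

Use numpy.isnan() to test for nan, because nan == nan is False.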
To make genfromtxt() read the data in as strings, pass dtype="U75" (Unicode strings of up to 75 characters). Pass skip_header=1 to skip the header row:
    import numpy
    world_alcohol = numpy.genfromtxt("world_alcohol.csv", delimiter=",", dtype="U75", skip_header=1)
    print(world_alcohol)
Vectors are sliced and indexed (getting an element) the same way as lists.

To get an entire column from a matrix (slicing), use : to mean "all rows":
    countries = world_alcohol[:, 2]
To get a smaller matrix from a matrix:
    matrix = numpy.array([
        [5, 10, 15],
        [20, 25, 30],
        [35, 40, 45]
    ])
    print(matrix[:, 0:2])
    # [[ 5 10]
    #  [20 25]
    #  [35 40]]

This selects all the rows and columns 0 and 1. The end index 2 is excluded, just as in list slicing.
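A few more slices on the same matrix, showing single elements, whole columns, and row ranges:

```python
import numpy

matrix = numpy.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

print(matrix[1, 2])          # 30: row index 1, column index 2
print(matrix[:, 1])          # [10 25 40]: all rows, column 1
print(matrix[1:3, :])        # rows 1 and 2, all columns
print(matrix[:, 0:2].shape)  # (3, 2)
```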
Array comparisons:
    vector = numpy.array([5, 10, 15, 20])
    vector == 10

NumPy compares 10 with each value in vector and builds a new vector of True/False values, e.g. [False, True, False, False].
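Select rows or columns from a matrix according to specified criteria:
    matrix = numpy.array([
        [5, 10, 15],
        [20, 25, 30],
        [35, 40, 45]
    ])
    second_column_25 = (matrix[:, 1] == 25)
    print(matrix[second_column_25, :])

matrix[:, 1] == 25 gives one True/False value per row. Using it as the row index keeps only the rows where it is True, here [[20 25 30]].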
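The same boolean-mask idea works on a vector, and on columns as well as rows. The column filter at the end (first row greater than 5) is an illustrative criterion, not from the original notes.

```python
import numpy

vector = numpy.array([5, 10, 15, 20])
equal_to_ten = (vector == 10)   # one True/False per element
print(vector[equal_to_ten])     # [10]

matrix = numpy.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
second_column_25 = (matrix[:, 1] == 25)
print(matrix[second_column_25, :])  # [[20 25 30]]

# Mask on columns: keep the columns whose value in row 0 is greater than 5.
first_row_gt_5 = (matrix[0, :] > 5)
print(matrix[:, first_row_gt_5])
```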
Recipes
Open a file and read each row into a list of lists:
    f = open("la_weather.csv", "r")
    data = f.read()
    rows = data.split("\n")
    weather_data = []
    for row in rows:
        split_row = row.split(",")
        weather_data.append(split_row)
Counting frequency in a dict:
    pantry = ["apple", "orange", "grape", "apple", "orange", "apple", "tomato", "potato", "grape"]
    pantry_counts = {}
    for element in pantry:
        if element in pantry_counts:
            pantry_counts[element] = pantry_counts[element] + 1
        else:
            pantry_counts[element] = 1
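Python's standard library also has collections.Counter, which does the same counting in one call. It is shown here as an alternative to the loop, not as part of the original recipe. The number of unique values is the number of keys.

```python
from collections import Counter

pantry = ["apple", "orange", "grape", "apple", "orange", "apple", "tomato", "potato", "grape"]

# Loop version from the recipe.
pantry_counts = {}
for element in pantry:
    if element in pantry_counts:
        pantry_counts[element] = pantry_counts[element] + 1
    else:
        pantry_counts[element] = 1

# Counter builds the same element -> count mapping.
counter = Counter(pantry)

print(pantry_counts["apple"])          # 3
print(dict(counter) == pantry_counts)  # True
print(len(pantry_counts))              # 5 unique values
```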
A function that reads a CSV file and splits the text into lines ("\n"). It skips the header row and the trailing empty line, converts every field to an integer, and returns a list of lists:
    def read_csv(filename):
        data = open(filename).read()
        data_split = data.split("\n")
        # drop the header (first line) and the empty string after the final newline;
        # this assumes the file ends with a newline
        string_list = data_split[1:len(data_split) - 1]

        final_list = []
        for element in string_list:
            int_fields = []
            string_fields = element.split(",")
            for elmnt in string_fields:
                int_fields.append(int(elmnt))
            final_list.append(int_fields)
        return final_list

    cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")
    print(cdc_list[0:10])
A function that totals the births for each distinct value in a chosen column of a list of lists. It assumes the birth count is at index 4 of each row:
    def calc_counts(data, column):
        column_total = {}
        for element in data:
            chosen_column = element[column]
            birth_column = element[4]
            if chosen_column in column_total:
                column_total[chosen_column] = column_total[chosen_column] + birth_column
            else:
                column_total[chosen_column] = birth_column
        return column_total
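Here is a usage sketch on a few made-up rows. It assumes each row has the layout [year, month, date_of_month, day_of_week, births], with births at index 4 as the function expects. The numbers are hypothetical.

```python
def calc_counts(data, column):
    column_total = {}
    for element in data:
        chosen_column = element[column]
        birth_column = element[4]
        if chosen_column in column_total:
            column_total[chosen_column] = column_total[chosen_column] + birth_column
        else:
            column_total[chosen_column] = birth_column
    return column_total

# Hypothetical rows: [year, month, date_of_month, day_of_week, births]
sample = [
    [1994, 1, 1, 6, 100],
    [1994, 1, 2, 7, 200],
    [1994, 2, 1, 2, 300],
    [1995, 1, 1, 7, 400],
]

print(calc_counts(sample, 0))  # births per year: {1994: 600, 1995: 400}
print(calc_counts(sample, 1))  # births per month: {1: 700, 2: 300}
```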
Convert the second element of each two-element inner list to a number (float):
    numerical_list = []
    for element in nested_list:
        temp_list = []
        first_element = element[0]
        second_element = float(element[1])
        temp_list.append(first_element)
        temp_list.append(second_element)
        numerical_list.append(temp_list)
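The same conversion can be written as a list comprehension. This is shown as an alternative to the loop, not as the original recipe. The two rows of nested_list are hypothetical stand-ins for split "name,number" lines.

```python
# Hypothetical two-element rows, like those produced by splitting "name,number" lines.
nested_list = [["Alex", "12.5"], ["Sam", "7"]]

# Loop version, as in the recipe.
numerical_list = []
for element in nested_list:
    temp_list = []
    temp_list.append(element[0])
    temp_list.append(float(element[1]))
    numerical_list.append(temp_list)

# List comprehension version: same result in one expression.
numerical_comp = [[element[0], float(element[1])] for element in nested_list]

print(numerical_comp)  # [['Alex', 12.5], ['Sam', 7.0]]
```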
Get the second column from a list of lists:
    weather = []
    for element in weather_data:
        weather.append(element[1])
Counting the number of unique values in a list: run the "Counting frequency in a dict" recipe above. After that, len(pantry_counts) is the number of unique values (5 for the pantry list).
Remember
    months = ["Jan", "Feb"]
    print(months[0:1])   # ['Jan']
NOT
    print(months)[0:1]   # prints the whole list, then raises TypeError because print() returns None