Week 4: Introduction to Python
Homework
Part 1: (Due by Friday (1/31) at noon)
Use Python within a Jupyter Notebook (with filename: week-four-prelim-analysis) to gather some interesting statistics and information about the data set for your final project. This part of the assignment is open-ended: I am not requiring a specific set of analyses. Rather, I want you to explore your data set using the techniques and skills we learned today. You will keep learning more Python in the coming weeks, which will let you do even more with a data set. In the meantime, think about interesting ways you could analyze the data set (as you would typically do in Excel).
Use comments (#) within your code in your Jupyter Notebook file to explain what analyses you are doing, what you are exploring, and potential questions you could ask and answer using these data. Push to GitHub by this Friday (1/31) at noon (12pm). (Worth 4% of your final project grade.)
Part 2: (Due by Saturday (2/1) at 11:59pm)
Slicing Strings
A section of an array is called a slice. We can take slices of character strings as well:
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
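Slice notation also accepts omitted bounds and negative indices. As a minimal sketch, the snippet below tries these forms on a different string (word is a made-up example, not part of the lesson), so you can compare the behaviour before working through the questions that follow:

```python
word = 'helium'  # hypothetical example string, not from the lesson

# Leaving out the start means "from the beginning"
print('omitted start:', word[:2])
# Leaving out the stop means "through the end"
print('omitted stop:', word[2:])
# Negative indices count backwards from the end of the string
print('negative index:', word[-1])
# Equal start and stop select nothing; repr() makes the empty string visible
print('empty slice:', repr(word[2:2]))
```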
- What is the value of element[:4]? What about element[4:]? Or element[:]?
- What is element[-1]? What is element[-2]?
- Given those answers, explain what element[1:-1] does.
- The expression element[3:3] produces an empty string, i.e., a string that contains no characters. If data holds our array of patient data, what does data[3:3, 4:4] produce? What about data[3:3, :]?
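One way to explore the last question is to inspect the shape attribute of a slice rather than printing it. The sketch below does this on a small made-up array (grid is hypothetical and stands in for the patient data); the same check can be applied to any slice you want to investigate:

```python
import numpy

# Hypothetical 3x4 array used only for illustration
grid = numpy.arange(12).reshape(3, 4)

# Rows 0-1 and columns 1-2 of grid
piece = grid[0:2, 1:3]
print(piece)
# shape reports (number of rows, number of columns) of the slice
print('shape of slice:', piece.shape)
```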
Stacking Arrays
Arrays can be concatenated and stacked on top of one another, using NumPy’s vstack and hstack functions for vertical and horizontal stacking, respectively.
import numpy
A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)
B = numpy.hstack([A, A])
print('B = ')
print(B)
C = numpy.vstack([A, A])
print('C = ')
print(C)
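Checking the shape attribute is a quick way to confirm which direction a stack went. This is a small sketch that rebuilds the same A, B, and C as above and prints their shapes:

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = numpy.hstack([A, A])  # side by side: more columns
C = numpy.vstack([A, A])  # one on top of the other: more rows

print('A shape:', A.shape)
print('B shape:', B.shape)
print('C shape:', C.shape)
```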
- Write some additional code that slices the first and last columns of A, and stacks them into a 3x2 array. Make sure to print the results to verify your solution.
Change In Inflammation
This patient data is longitudinal in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept.
The numpy.diff() function takes a NumPy array and returns the differences between successive values along a specified axis. For example, take a NumPy array that looks like this:
npdiff = numpy.array([ 0, 2, 5, 9, 14])
Calling numpy.diff(npdiff) would do the following calculations and put the answers in another array.
[ 2 - 0, 5 - 2, 9 - 5, 14 - 9 ]
numpy.diff(npdiff)
array([2, 3, 4, 5])
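For a two-dimensional array, the axis keyword of numpy.diff() selects the direction in which differences are taken (when it is omitted, NumPy uses the last axis). A minimal sketch on a made-up 2x4 array (table is hypothetical, not the inflammation data):

```python
import numpy

# Hypothetical 2x4 array, invented for illustration
table = numpy.array([[0, 2, 5, 9],
                     [1, 1, 4, 2]])

# axis=1: differences between neighbouring values within each row
across = numpy.diff(table, axis=1)
# axis=0: differences between neighbouring values within each column
down = numpy.diff(table, axis=0)

print('axis=1:')
print(across)
print('axis=0:')
print(down)
```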
- Which axis would it make sense to use this function along?
- If the shape of an individual data file is (60, 40) (60 rows and 40 columns), what would the shape of the array be after you run the diff() function and why?
- How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?
Save these answers in week-four in your lab-assignments directory as a Jupyter Notebook (with filename: week-four-python-problems) and push to GitHub by this Saturday (2/1) at 11:59pm.