Homework

Part 1: (Due by Friday (1/31) at noon)

Use Python within a Jupyter Notebook (with filename: week-four-prelim-analysis) to gather some interesting statistics and information about your data set for your final project. This part of the assignment is open-ended, meaning I am not requiring you to do a specific set of analyses. Rather, I want you to explore your data set using the techniques and skills we learned today. You will keep learning how to do more in Python in the coming weeks, which will only allow you to do even more with a data set. In the meantime, think about interesting ways that you could analyze the data set (as you would typically do in Excel)?

Use comments (#) within your code in you Jupyter Notebook file to explain what the analyses you are doing are, what you are exploring, and potential questions you could ask and answer using these data. Push to GitHub by this Friday (1/31) at noon (12pm). (worth 4% of your final project grade)

Part 2: (Due by Saturday (2/1) at 11:59pm)

Slicing Strings

A section of an array is called a slice. We can take slices of character strings as well:

element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
  1. What is the value of element[:4]? What about element[4:]? Or element[:]?
  2. What is element[-1]? What is element[-2]?
  3. Given those answers, explain what element[1:-1] does.
  4. The expression element[3:3] produces an empty string, i.e., a string that contains no characters. If data holds our array of patient data, what does data[3:3, 4:4] produce? What about data[3:3, :]?

Stacking Arrays

Arrays can be concatenated and stacked on top of one another, using NumPy’s vstack and hstack functions for vertical and horizontal stacking, respectively.

import numpy

A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)

B = numpy.hstack([A, A])
print('B = ')
print(B)

C = numpy.vstack([A, A])
print('C = ')
print(C)
  1. Write some additional code that slices the first and last columns of A, and stacks them into a 3x2 array. Make sure to print the results to verify your solution.

Change In Inflammation

This patient data is longitudinal in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept.

The numpy.diff() function takes a NumPy array and returns the differences between two successive values along a specified axis. For example, a NumPy array that looks like this:

npdiff = numpy.array([ 0,  2,  5,  9, 14])

Calling numpy.diff(npdiff) would do the following calculations and put the answers in another array.

[ 2 - 0, 5 - 2, 9 - 5, 14 - 9 ]
numpy.diff(npdiff)
array([2, 3, 4, 5])
  1. Which axis would it make sense to use this function along?
  2. If the shape of an individual data file is (60, 40) (60 rows and 40 columns), what would the shape of the array be after you run the diff() function and why?
  3. How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

Save these answers in week-four in your lab-assignments directory as a Jupyter Notebook (with filename: week-four-python-problems and push to GitHub by this Saturday (2/1) at 11:59pm.