Homework

Part 1: (Due Tomorrow on GitHub by 9am)

  • Complete all 15 lessons on the RegexOne online tutorial https://regexone.com/lesson/introduction_abcs (we have already completed lessons 1-8 in class), but you should do them again to reinforce what we’ve discussed in lab today.
  • Don’t rush through these intensive tutorials. If you really want to master regular expressions, read through all commentary and explanations.
  • Create a week-two directory in your lab-exercises directory and place your solutions in a single .txt file named regexone-solutions.txt, numbered line-by-line with your matching patterns.

Part 2: (Due Sunday 1/19 on GitHub by 11:59pm)

To solve these problems, I would recommend copy and pasting each of the text blocks into Sublime Text separately.

2.1 DNA Sequence

Create a .txt file named dna_regex.txt in your lab-exercises/week-two/ directory containing the answers to the following questions.

DNA Sequence with 1000 bases

ctgtagtctactattttgtgcactcggcgcggccctatacagctttgaattacgtctaaa
cttgggtttattgtttggatttcgcgtcagccagacgcttagccaagatgccaagggccc
tagactcctaaagttgtgattagtacgctgaacgattgttgtatgaaacaagccttaggc
tataaccatcgctatcgtaagcactctaatttagttaagacctctgtttcgttggttttc
gcgcatgtgacgaagcgcccggacagcccagccctagagtccacggattatacccctctg
tcatggagtatgaccccatccagatacttgcagcagcccgtagttaactgactggcctgc
ccatctacatgacacccgtttgtcccctatcacaaaacacatcctcttatgcaggtctga
agcaacgcttgggttcgaaactacctatggatacctcccttgaggtctccaggtataggc
acaaagtacggctgaacactattcgagtcaccatgcttctaacgtgattcgtgcgaccga
cgaatcaggtgagcccaaggctcctcgggtttccgcgtcttagaaaaagtggatatagtc
gaaatcgtgaggccggtttaggacttcgatgtttatcgaatcacatcgcaaggggcgtag
gccaacgccaagaatctgtctttggatgcattcatccatgctgacagcttcactgagtga
cgagggagggggggcatagtgagttattattccttataagggaaacttccccagtctcat
aatcattaccccagggaaatgcgtacagcgacgcgcccctagcaggtcggaagccttggc
agactaagaaataaatactagttattttccgtctgcgcggcccaccacgaaaggaaactg
ccacgtagagtggccatgtcacggtaaaactaccccgagacagatgcctgatgctatgca
cacgacattgcgcctagattgcctatcagttttccgcgct
  1. Write a regular expression that will match any bases with at least 2 c’s but no more than 5 c’s.

  2. Write a regular expression that will match any ct followed by an a,c,t (not g). Your resulting matches should be 3 characters long each.

2.2 Searching Through Text

Create a .txt file named darwin_regex.txt in your lab-exercises/week-two directory containing the answers to the following questions.

Excerpt from Ch. 4 of Darwin’s On The Origin of Species

Natural Selection: its power compared with man’s selection, its power on characters of trifling importance, its power at all ages and on both sexes. Sexual Selection. On the generality of intercrosses between individuals of the same species. Circumstances favourable and unfavourable to Natural Selection, namely, intercrossing, isolation, number of individuals. Slow action. Extinction caused by Natural Selection. Divergence of Character, related to the diversity of inhabitants of any small area, and to naturalisation. Action of Natural Selection, through Divergence of Character and Extinction, on the descendants from a common parent. Explains the Grouping of all organic beings. How will the struggle for existence, discussed too briefly in the last chapter, act in regard to variation? Can the principle of selection, which we have seen is so potent in the hands of man, apply in nature? I think we shall see that it can act most effectually. Let it be borne in mind in what an endless number of strange peculiarities our domestic productions, and, in a lesser degree, those under nature, vary; and how strong the hereditary tendency is. Under domestication, it may be truly said that the whole organisation becomes in some degree plastic. Let it be borne in mind how infinitely complex and close-fitting are the mutual relations of all organic beings to each other and to their physical conditions of life. Can it, then, be thought improbable, seeing that variations useful to man have undoubtedly occurred, that other variations useful in some way to each being in the great and complex battle of life, should sometimes occur in the course of thousands of generations? If such do occur, can we doubt (remembering that many more individuals are born than can possibly survive) that individuals having any advantage, however slight, over others, would have the best chance of surviving and of procreating their kind? On the other hand, we may feel sure that any variation in the least degree injurious would be rigidly destroyed. This preservation of favourable variations and the rejection of injurious variations, I call Natural Selection. Variations neither useful nor injurious would not be affected by natural selection, and would be left a fluctuating element, as perhaps we see in the species called polymorphic.

  1. Write a regular expression that will find all of the words beginning with the letter s. Your regular expression should capture the whole word, not just the beginning s.

  2. Write a regular expression that will find the last word of every sentence. Your regular expression should capture the whole last word, not just the first or last letter!

  3. Write a regular expression that will match every whole sentence in the text (excluding the period).

  4. Write a regular expression that will match the sentence containing the phrase “struggle for existence”.

Bonus Question (Extra Credit) Write a regular expression that will find the first word of every sentence. Your regular expression should match the whole first word, not just the first letter! In order to receive extra credit, you must explain your logic and reasoning of your regular expression and why you chose what you did.