Basic Python

This notebook draws on Harvard University's course Using Python for Research on the edX platform.

link: https://courses.edx.org/courses/course-v1:HarvardX+PH526x+3T2016/course/

In [1]:
import numpy as np # for matrix and vector operations
x = np.array([1,3,5])

x.mean() # mean is a method. NOTE: it is called with ()
Out[1]:
3.0
In [2]:
x.shape # shape is an attribute. NOTE: no ()
Out[2]:
(3,)
In [3]:
import math
math.pi # the number pi
Out[3]:
3.141592653589793
In [4]:
math.cos(math.pi)
Out[4]:
-1.0

Difference between math.sqrt and np.sqrt

math.sqrt does not work with vectors; np.sqrt does.

In [5]:
math.sqrt
Out[5]:
<function math.sqrt>
In [6]:
np.sqrt
Out[6]:
<ufunc 'sqrt'>
In [7]:
sq = np.sqrt
try:
    print(sq([16, 9, 4]))
    print(sq, "works with vectors")
except TypeError:
    print(sq, "does not work with vectors")
[ 4.  3.  2.]
<ufunc 'sqrt'> works with vectors
In [8]:
sq = math.sqrt
try:
    print(sq([16, 9, 4]))
    print(sq, "works with vectors")
except TypeError:
    print(sq, "does not work with vectors")
<built-in function sqrt> does not work with vectors
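
When NumPy is not at hand, one way to use math.sqrt on a sequence is to call it once per element. The following is a minimal sketch; the helper name sqrt_each is hypothetical and is not part of math or numpy.

```python
import math

def sqrt_each(values):
    """Apply math.sqrt to every element of a sequence (hypothetical helper)."""
    return [math.sqrt(v) for v in values]

print(sqrt_each([16, 9, 4]))  # [4.0, 3.0, 2.0]
```

For large arrays np.sqrt is usually the better choice, since it works on the whole array at once.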

Data types

In [9]:
type("Uzay")
Out[9]:
str
In [10]:
type(5)
Out[10]:
int
In [11]:
type(True)
Out[11]:
bool
In [12]:
True or False
Out[12]:
True

Randomness

In [13]:
import random
random.choice([3,"A", 5, "B"])
Out[13]:
'B'
In [14]:
random.choice([3,"A", 5, "B"])
Out[14]:
'A'
In [15]:
A = np.random.randint(low = 20, high=30, size = (9,2))
A
Out[15]:
array([[24, 27],
       [27, 29],
       [20, 20],
       [29, 22],
       [20, 28],
       [21, 29],
       [23, 28],
       [20, 26],
       [25, 25]])
In [16]:
B = A - 20
B
Out[16]:
array([[4, 7],
       [7, 9],
       [0, 0],
       [9, 2],
       [0, 8],
       [1, 9],
       [3, 8],
       [0, 6],
       [5, 5]])
In [17]:
B.sum(axis=0) # sum down the rows: one total per column
Out[17]:
array([29, 54])
In [18]:
B.sum(axis=1) # sum across the columns: one total per row
Out[18]:
array([11, 16,  0, 11,  8, 10, 11,  6, 10])
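
The axis argument names the dimension that is collapsed by the sum. A small sketch on a 2 x 3 array (the array M is made up for illustration) shows how the result shape follows from that:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3): 2 rows, 3 columns

col_totals = M.sum(axis=0)  # collapses the rows -> one total per column
row_totals = M.sum(axis=1)  # collapses the columns -> one total per row

print(col_totals.shape, col_totals)  # (3,) [5 7 9]
print(row_totals.shape, row_totals)  # (2,) [ 6 15]
```
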

Using tuples

Tuples are used to pack data. They are especially useful when a function needs to return more than one value: the values are packed into a tuple and returned together.

In [19]:
x, y = 5, 7
koordinat = (x, y)
koordinat
Out[19]:
(5, 7)
In [20]:
a, b = koordinat
print(a, "->", b)
5 -> 7
In [21]:
def topla_cikar(a,b):
    ''' return the sum and the difference as a tuple'''
    toplam = a + b
    fark = a - b
    return (toplam, fark)

toplam, fark = topla_cikar(4,6)
print("toplam : {} fark : {}".format(toplam, fark))
toplam : 10 fark : -2
In [22]:
def password(uzunluk):
    ''' generate a random password of the given length.'''
    pw = ""
    ch = "abcdefghijklmnopqrstuvyxwz" + "0123456789"
    for i in range(uzunluk):
        pw = pw + random.choice(ch) 
    return pw
    
password(6)
Out[22]:
'tqs6cx'
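
The random module is fine for simulations, but the Python documentation advises against using it for security purposes. For real passwords, a variant built on the standard-library secrets module is one option. This is only a sketch; the function name secure_password is hypothetical.

```python
import secrets
import string

def secure_password(length):
    """Build a password from lowercase letters and digits (hypothetical helper)."""
    alphabet = string.ascii_lowercase + string.digits
    # secrets.choice draws from the operating system's secure random source
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(secure_password(6))
```
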

Plotting

In [23]:
np.linspace(10,20,5)
Out[23]:
array([ 10. ,  12.5,  15. ,  17.5,  20. ])
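
np.linspace(start, stop, num) returns num evenly spaced points that include both end points, so the spacing is (stop - start) / (num - 1). A pure-Python sketch of the same idea, assuming num is at least 2 (the name linspace_list is hypothetical):

```python
def linspace_list(start, stop, num):
    """Return num evenly spaced values from start to stop, both included."""
    step = (stop - start) / (num - 1)  # num points leave num - 1 gaps
    return [start + i * step for i in range(num)]

print(linspace_list(10, 20, 5))  # [10.0, 12.5, 15.0, 17.5, 20.0]
```
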
In [24]:
x = np.linspace(-100,100,50)
y = x**2
y
Out[24]:
array([  1.00000000e+04,   9.20033319e+03,   8.43398584e+03,
         7.70095793e+03,   7.00124948e+03,   6.33486047e+03,
         5.70179092e+03,   5.10204082e+03,   4.53561016e+03,
         4.00249896e+03,   3.50270721e+03,   3.03623490e+03,
         2.60308205e+03,   2.20324865e+03,   1.83673469e+03,
         1.50354019e+03,   1.20366514e+03,   9.37109538e+02,
         7.03873386e+02,   5.03956685e+02,   3.37359434e+02,
         2.04081633e+02,   1.04123282e+02,   3.74843815e+01,
         4.16493128e+00,   4.16493128e+00,   3.74843815e+01,
         1.04123282e+02,   2.04081633e+02,   3.37359434e+02,
         5.03956685e+02,   7.03873386e+02,   9.37109538e+02,
         1.20366514e+03,   1.50354019e+03,   1.83673469e+03,
         2.20324865e+03,   2.60308205e+03,   3.03623490e+03,
         3.50270721e+03,   4.00249896e+03,   4.53561016e+03,
         5.10204082e+03,   5.70179092e+03,   6.33486047e+03,
         7.00124948e+03,   7.70095793e+03,   8.43398584e+03,
         9.20033319e+03,   1.00000000e+04])
In [25]:
import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()
In [26]:
x = np.linspace(-10,10,100)
y = x**2
plt.plot(y,x, label = r"$\pm x^{1/2}$") # swapping the axes draws both branches of the square root
plt.plot(x,y, label = "$x^2$")
plt.grid()
plt.legend()
plt.savefig("saved.pdf")
plt.show()
In [27]:
random.choice(["Yazi","Tura"])
Out[27]:
'Tura'
In [28]:
atis = [random.choice([0,1]) for i in range(1000)]
plt.hist(atis)
plt.show()
In [29]:
zar = list(range(1,7))
zar
Out[29]:
[1, 2, 3, 4, 5, 6]
In [30]:
atis = [random.choice(zar) for i in range(1000)]
plt.hist(atis)
plt.show()
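
The histogram can be cross-checked numerically by counting how often each face appears. A minimal sketch using collections.Counter; the fixed seed is an assumption added only to make the run repeatable.

```python
import random
from collections import Counter

random.seed(0)  # assumption: fixed seed so that repeated runs give the same rolls
rolls = [random.choice(range(1, 7)) for _ in range(1000)]

counts = Counter(rolls)
for face in sorted(counts):
    # for a fair die each relative frequency should be close to 1/6
    print(face, counts[face], counts[face] / len(rolls))
```
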

Random Walk

In [31]:
A = np.random.randint(low = -1, high=3, size = (2,10))
A
Out[31]:
array([[ 1,  2, -1,  0,  2,  1,  0, -1,  2,  2],
       [ 2,  1, -1,  2,  1,  0,  1,  0, -1,  1]])
In [32]:
A = A.cumsum(axis=1) # cumulative sum along each row (across the columns)
A
Out[32]:
array([[1, 3, 2, 2, 4, 5, 5, 4, 6, 8],
       [2, 3, 2, 4, 5, 5, 6, 6, 5, 6]])
In [33]:
orijin = np.array([[0],[0]])
A = np.hstack((orijin, A))
A
Out[33]:
array([[0, 1, 3, 2, 2, 4, 5, 5, 4, 6, 8],
       [0, 2, 3, 2, 4, 5, 5, 6, 6, 5, 6]])
In [34]:
plt.plot(A[0],A[1])
plt.scatter(orijin[0],orijin[1])
plt.show()
In [35]:
def rassalYuruyus(geri = -1, ileri = 3, sayi = 10000, renk = "b"):
    """
    What happens if we take random backward and forward steps for sayi = 10000 steps?
    Each step is an integer from geri up to ileri - 1 (high is exclusive in randint).
    """
    A = np.random.randint(low = geri, high=ileri, size = (2,sayi))
    A = A.cumsum(axis=1) # cumulative sum along each row (across the columns)
    orijin = np.array([[0],[0]])
    A = np.hstack((orijin, A))
    plt.plot(A[0],A[1], renk)
    plt.scatter(orijin[0],orijin[1])
    plt.show()
In [36]:
# steps are drawn from {-1, 0, 1, 2} (high is exclusive), so the walk tends to drift forward
rassalYuruyus()
In [37]:
# reduce the gap between backward and forward steps: {-2, ..., 2} is symmetric, so there is no drift
rassalYuruyus(geri = -2, ileri = 3, renk = "r")
In [38]:
# now the steps come from {-2, ..., 1}, so the walk tends to drift backward
rassalYuruyus(geri = -2, ileri = 2, renk = "r")
In [39]:
# symmetric steps again, this time from {-x, ..., x}
x = 6
rassalYuruyus(geri = -x, ileri = x+ 1, renk = "r")
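
The direction of the drift can be read off the average step. np.random.randint draws integers uniformly from low (inclusive) to high (exclusive), so the mean step is (geri + ileri - 1) / 2. A small sketch for the four parameter pairs used above; the helper name mean_step is hypothetical.

```python
from statistics import mean

def mean_step(geri, ileri):
    """Average of the integers geri, geri + 1, ..., ileri - 1."""
    return mean(range(geri, ileri))

for geri, ileri in [(-1, 3), (-2, 3), (-2, 2), (-6, 7)]:
    # positive: forward drift, zero: no drift, negative: backward drift
    print(geri, ileri, mean_step(geri, ileri))
```

After sayi steps the walk is expected to sit roughly sayi times the mean step away from the origin along each axis, with random fluctuations on top.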

DNA Analysis

Data link: https://www.ncbi.nlm.nih.gov/nuccore/

Type NM_207618.2 in the search box.

In [40]:
# From DNA (origin) to Amino Acids (translation)

origin = """GGTCAGAAAAAGCCCTCTCCATGTCTACTCACGATACATCCCTGAAAACCACTGAGGAAGTGGCTTTTCA
GATCATCTTGCTTTGCCAGTTTGGGGTTGGGACTTTTGCCAATGTATTTCTCTTTGTCTATAATTTCTCT
CCAATCTCGACTGGTTCTAAACAGAGGCCCAGACAAGTGATTTTAAGACACATGGCTGTGGCCAATGCCT
TAACTCTCTTCCTCACTATATTTCCAAACAACATGATGACTTTTGCTCCAATTATTCCTCAAACTGACCT
CAAATGTAAATTAGAATTCTTCACTCGCCTCGTGGCAAGAAGCACAAACTTGTGTTCAACTTGTGTTCTG
AGTATCCATCAGTTTGTCACACTTGTTCCTGTTAATTCAGGTAAAGGAATACTCAGAGCAAGTGTCACAA
ACATGGCAAGTTATTCTTGTTACAGTTGTTGGTTCTTCAGTGTCTTAAATAACATCTACATTCCAATTAA
GGTCACTGGTCCACAGTTAACAGACAATAACAATAACTCTAAAAGCAAGTTGTTCTGTTCCACTTCTGAT
TTCAGTGTAGGCATTGTCTTCTTGAGGTTTGCCCATGATGCCACATTCATGAGCATCATGGTCTGGACCA
GTGTCTCCATGGTACTTCTCCTCCATAGACATTGTCAGAGAATGCAGTACATATTCACTCTCAATCAGGA
CCCCAGGGGCCAAGCAGAGACCACAGCAACCCATACTATCCTGATGCTGGTAGTCACATTTGTTGGCTTT
TATCTTCTAAGTCTTATTTGTATCATCTTTTACACCTATTTTATATATTCTCATCATTCCCTGAGGCATT
GCAATGACATTTTGGTTTCGGGTTTCCCTACAATTTCTCCTTTACTGTTGACCTTCAGAGACCCTAAGGG
TCCTTGTTCTGTGTTCTTCAACTGTTGAAAGCCAGAGTCACTAAAAATGCCAAACACAGAAGACAGCTTT
GCTAATACCATTAAATACTTTATTCCATAAATATGTTTTTAAAAGCTTGTATGAACAAGGTATGGTGCTC
ACTGCTATACTTATAAAAGAGTAAGGTTATAATCACTTGTTGATATGAAAAGATTTCTGGTTGGAATCTG
ATTGAAACAGTGAGTTATTCACCACCCTCCATTCTCT"""


translation="""MSTHDTSLKTTEEVAFQIILLCQFGVGTFANVFLFVYNFSPIST
GSKQRPRQVILRHMAVANALTLFLTIFPNNMMTFAPIIPQTDLKCKLEFFTRLVARST
NLCSTCVLSIHQFVTLVPVNSGKGILRASVTNMASYSCYSCWFFSVLNNIYIPIKVTG
PQLTDNNNNSKSKLFCSTSDFSVGIVFLRFAHDATFMSIMVWTSVSMVLLLHRHCQRM
QYIFTLNQDPRGQAETTATHTILMLVVTFVGFYLLSLICIIFYTYFIYSHHSLRHCND
ILVSGFPTISPLLLTFRDPKGPCSVFFNC"""

translation[:4]
Out[40]:
'MSTH'
In [41]:
# there are some invisible "\n" line breaks, get rid of them
origin = origin.replace("\n","")
origin = origin.replace("\r","")
origin = origin[20:938] # keep the coding region: 918 bases = 306 codons, the last one a stop codon

translation = translation.replace("\n","")
translation = translation.replace("\r","")
In [42]:
# Dictionary (lookup table) from DNA to AminoAcids
table = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
}
In [43]:
table["ATA"]
Out[43]:
'I'
In [45]:
k = 0
triple = origin[k:(k+3)]
print(triple, "->", table[triple])
ATG -> M
In [46]:
mytranslation = ""
L = len(origin) // 3 # number of complete codons
for i in range(L):
    k = 3 * i
    triple = origin[k:(k+3)]
    mytranslation += table[triple]
mytranslation
Out[46]:
'MSTHDTSLKTTEEVAFQIILLCQFGVGTFANVFLFVYNFSPISTGSKQRPRQVILRHMAVANALTLFLTIFPNNMMTFAPIIPQTDLKCKLEFFTRLVARSTNLCSTCVLSIHQFVTLVPVNSGKGILRASVTNMASYSCYSCWFFSVLNNIYIPIKVTGPQLTDNNNNSKSKLFCSTSDFSVGIVFLRFAHDATFMSIMVWTSVSMVLLLHRHCQRMQYIFTLNQDPRGQAETTATHTILMLVVTFVGFYLLSLICIIFYTYFIYSHHSLRHCNDILVSGFPTISPLLLTFRDPKGPCSVFFNC_'
In [48]:
# compare with the reference translation, leaving out the last character of ours (the stop codon "_")
translation == mytranslation[:-1]
Out[48]:
True
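
The lookup loop can be wrapped in a reusable function. The sketch below is one possible version, not the course's code: it rebuilds the same 64-codon table from the compact TCAG ordering of the standard genetic code, keeps "_" for stop codons as in the table above, and rejects sequences whose length is not a multiple of 3. The names BASES, AMINO, CODON_TABLE and translate are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; "_" marks the stop codons
AMINO = ("FFLLSSSSYY__CC_W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))

print(translate("ATGTCTACTCACGAT"))  # MSTHD
```

The test string is the first five codons of the coding region used above.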

Milli Piyango (National Lottery)

Can you build a system that looks up draw results in a way similar to the lookup above?

Lookup system: http://www.millipiyango.gov.tr (click the full list)


Text Processing

We will examine some English and French books from Project Gutenberg. You can download the books from the link below.

https://prod-edxapp.edx-cdn.org/assets/courseware/v1/1d1e264f416e27b22a0b8c970d52f3e3/asset-v1:HarvardX+PH526x+3T2016+type@asset+block/Books_EngFr.zip

But first, let's practice a little.

In [49]:
pierces_lyrics = """We are stars,
Fashioned in the flesh and bone,
We are islands,
Excuses to remain alone,
We are moons,
Throw ourselves around each other,
We are oceans,
Being controlled by the pull of another.
"""

text = pierces_lyrics
pierces_lyrics
Out[49]:
'We are stars,\nFashioned in the flesh and bone,\nWe are islands,\nExcuses to remain alone,\nWe are moons,\nThrow ourselves around each other,\nWe are oceans,\nBeing controlled by the pull of another.\n'
In [50]:
pierces_lyrics = pierces_lyrics.replace(","," ")
pierces_lyrics =  pierces_lyrics.replace("\n"," ")
pierces_lyrics
Out[50]:
'We are stars  Fashioned in the flesh and bone  We are islands  Excuses to remain alone  We are moons  Throw ourselves around each other  We are oceans  Being controlled by the pull of another. '
In [51]:
pierces_lyrics = pierces_lyrics.lower()
pierces_lyrics
Out[51]:
'we are stars  fashioned in the flesh and bone  we are islands  excuses to remain alone  we are moons  throw ourselves around each other  we are oceans  being controlled by the pull of another. '
In [52]:
pierces_lyrics.split(" ")
Out[52]:
['we',
 'are',
 'stars',
 '',
 'fashioned',
 'in',
 'the',
 'flesh',
 'and',
 'bone',
 '',
 'we',
 'are',
 'islands',
 '',
 'excuses',
 'to',
 'remain',
 'alone',
 '',
 'we',
 'are',
 'moons',
 '',
 'throw',
 'ourselves',
 'around',
 'each',
 'other',
 '',
 'we',
 'are',
 'oceans',
 '',
 'being',
 'controlled',
 'by',
 'the',
 'pull',
 'of',
 'another.',
 '']
In [57]:
lyrics = {}
for word in pierces_lyrics.split(" "):
    if word == "": continue
    lyrics.setdefault(word, 0) #unseen word default value
    lyrics[word] += 1
    
lyrics
Out[57]:
{'alone': 1,
 'and': 1,
 'another.': 1,
 'are': 4,
 'around': 1,
 'being': 1,
 'bone': 1,
 'by': 1,
 'controlled': 1,
 'each': 1,
 'excuses': 1,
 'fashioned': 1,
 'flesh': 1,
 'in': 1,
 'islands': 1,
 'moons': 1,
 'oceans': 1,
 'of': 1,
 'other': 1,
 'ourselves': 1,
 'pull': 1,
 'remain': 1,
 'stars': 1,
 'the': 2,
 'throw': 1,
 'to': 1,
 'we': 4}
In [58]:
def getDictionary(text):
    text = text.lower()
    skips = [".", ",",";",":", "?", "'",'"', "\n"]
    for s in skips: 
        text = text.replace(s," ")

    mydict = {}
    for word in text.split(" "):
        if word == "": continue
        mydict.setdefault(word, 0) #unseen word default value
        mydict[word] += 1
    return mydict

getDictionary(text)
Out[58]:
{'alone': 1,
 'and': 1,
 'another': 1,
 'are': 4,
 'around': 1,
 'being': 1,
 'bone': 1,
 'by': 1,
 'controlled': 1,
 'each': 1,
 'excuses': 1,
 'fashioned': 1,
 'flesh': 1,
 'in': 1,
 'islands': 1,
 'moons': 1,
 'oceans': 1,
 'of': 1,
 'other': 1,
 'ourselves': 1,
 'pull': 1,
 'remain': 1,
 'stars': 1,
 'the': 2,
 'throw': 1,
 'to': 1,
 'we': 4}
In [59]:
getDictionary(text).values()
Out[59]:
dict_values([4, 4, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
In [61]:
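# not equal: lyrics still has the key 'another.' (with the period), getDictionary strips it to 'another'
lyrics == getDictionary(text)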
Out[61]:
False
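
The same kind of word count can be written more compactly with collections.Counter and a regular expression. This is a sketch of an alternative, not a drop-in replacement: it treats every run of non-letter characters (including digits and all punctuation) as a separator, so its counts may differ slightly from getDictionary on real books. The name count_words is hypothetical.

```python
import re
from collections import Counter

def count_words(text):
    """Count lowercase words; any run of non-letters acts as a separator."""
    # [^\W\d_]+ matches runs of letters, including accented ones
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)

sample = "We are stars,\nWe are islands."
print(count_words(sample).most_common(2))  # [('we', 2), ('are', 2)]
```
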
In [62]:
def read_book(path):
    """
    Read a book and return it as a string
    """
    with open(path, "r", encoding="utf8") as f:
        text = f.read()
        text = text.replace("\n","").replace("\r","")
    return text

title_path = "Books_EngFr/English/shakespeare/Romeo and Juliet.txt"
book = read_book(title_path)
len(book) 
Out[62]:
169275
In [63]:
ind = book.find("What's in a name?")
ind
Out[63]:
42757
In [64]:
quote = book[ind:(ind+100)]
quote
Out[64]:
"What's in a name? That which we call a rose    By any other name would smell as sweet.    So Romeo w"
In [65]:
def word_stats(word_counts):
    """ return number of unique words and word frequencies"""
    num_unique = len(word_counts)
    counts = word_counts.values()
    return (num_unique, counts)


title_path = "Books_EngFr/English/shakespeare/Romeo and Juliet.txt"
book = read_book(title_path)

word_counts = getDictionary(book)
(num_unique, counts) = word_stats(word_counts)
In [66]:
num_unique
Out[66]:
4715
In [67]:
sum(counts)
Out[67]:
29631
In [69]:
import os
book_dir = "Books_EngFr"
for language in os.listdir(book_dir):
    if language.startswith("."): continue # hidden file
    for author in os.listdir(book_dir + "/" + language):
        if author.startswith("."): continue # hidden file
        for title in os.listdir(book_dir + "/" + language + "/" + author):
            if title.startswith("."): continue # hidden file
            
            inputfile = book_dir + "/" + language + "/" + author + "/" + title
            print(inputfile)
            
            book = read_book(inputfile)
            (num_unique, counts) = word_stats(getDictionary(book))
            print(num_unique)
Books_EngFr/English/shakespeare/Othello.txt
5897
Books_EngFr/English/shakespeare/Richard III.txt
5028
Books_EngFr/English/shakespeare/The Merchant of Venice.txt
4977
Books_EngFr/English/shakespeare/A Midsummer Night's Dream.txt
4344
Books_EngFr/English/shakespeare/Macbeth.txt
4779
Books_EngFr/English/shakespeare/Hamlet.txt
6775
Books_EngFr/English/shakespeare/Romeo and Juliet.txt
4715
Books_EngFr/French/de Maupassant/Œuvres complètes de Guy de Maupassant.txt
13730
Books_EngFr/French/de Maupassant/La Main Gauche.txt
8771
Books_EngFr/French/de Maupassant/Contes de la Becasse.txt
9151
Books_EngFr/French/de Maupassant/Claire de Lune.txt
7310
Books_EngFr/French/de Maupassant/La petite roque.txt
9611
Books_EngFr/French/de Maupassant/Le Horla.txt
9758
Books_EngFr/French/de Maupassant/L'inutile beautÇ.txt
9487
Books_EngFr/French/de Maupassant/Boule de Suif.txt
9703
Books_EngFr/French/de Maupassant/La Maison Tellier.txt
10957
Books_EngFr/French/diderot/L'oiseau blanc.txt
5767
Books_EngFr/French/diderot/Les deux amis de Bourbonne.txt
2840
Books_EngFr/French/diderot/Regrets sur ma vieille robe de chambre.txt
1847
Books_EngFr/French/diderot/Ceci n'est pas un conte.txt
3522
Books_EngFr/French/diderot/Entretien d'un päre avec ses enfants.txt
3402
Books_EngFr/French/chevalier/La fille des indiens rouges.txt
15541
Books_EngFr/French/chevalier/Les derniers Iroquois.txt
13745
Books_EngFr/French/chevalier/La capitaine.txt
11276
Books_EngFr/French/chevalier/Le chasseur noir.txt
10974
Books_EngFr/French/chevalier/L'enfer et le paradis de l'autre monde.txt
9449
Books_EngFr/French/chevalier/L'åle de sable.txt
16676
Books_EngFr/French/chevalier/La fille du pirate.txt
13250
Books_EngFr/French/sand/Le poâme de Myrza.txt
4156
Books_EngFr/French/sand/La Coupe; Lupo Liverani; Le Toast; Garnier; Le Contrebandier; La Ràverie Ö Paris.txt
11207
Books_EngFr/French/sand/Le Piccinino.txt
23120
Books_EngFr/French/sand/Jacques le fataliste et son maåtre.txt
15313
Books_EngFr/French/sand/Pauline.txt
7329
Books_EngFr/French/sand/cora.txt
4228
Books_EngFr/French/sand/La Marquise.txt
4531
Books_EngFr/French/sand/L' Orco.txt
3057
Books_EngFr/French/sand/Mattea.txt
6065
Books_EngFr/French/sand/Metella.txt
5304
Books_EngFr/French/sand/Oeuvres illustrÇes de George Sand.txt
7074
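
The three nested os.listdir loops can also be expressed with pathlib. The sketch below assumes the same language/author/title folder layout and that the books end in .txt; the function name list_books is hypothetical.

```python
from pathlib import Path

def list_books(book_dir):
    """Yield (language, author, title, path) for each non-hidden .txt file."""
    for path in sorted(Path(book_dir).glob("*/*/*.txt")):
        language, author = path.parts[-3], path.parts[-2]
        if any(name.startswith(".") for name in (language, author, path.name)):
            continue  # skip hidden files and folders
        yield language, author, path.name, path

# Prints nothing if the Books_EngFr folder is not present
for language, author, title, path in list_books("Books_EngFr"):
    print(path)
```

Because the paths are sorted, the order is stable across runs, whereas os.listdir returns entries in arbitrary order.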
In [70]:
import pandas as pd
table = pd.DataFrame(columns = ("name", "age"))
table.loc["1st"] = "James", 22
table.loc["2nd"] = "Jess", 23
table
Out[70]:
      name age
1st  James  22
2nd   Jess  23
In [71]:
import os
book_dir = "Books_EngFr"

stats = pd.DataFrame(columns = ("language","author", "title", "length", "unique"))
title_num = 1


for language in os.listdir(book_dir):
    if language.startswith("."): continue # hidden file
    for author in os.listdir(book_dir + "/" + language):
        if author.startswith("."): continue # hidden file
        for title in os.listdir(book_dir + "/" + language + "/" + author):
            if title.startswith("."): continue # hidden file
            
            inputfile = book_dir + "/" + language + "/" + author + "/" + title            
            book = read_book(inputfile)
            (num_unique, counts) = word_stats(getDictionary(book))
            stats.loc[title_num] = language, author, title, sum(counts), num_unique
            title_num += 1
              
            
stats 
Out[71]:
    language author         title                                              length  unique
1   English  shakespeare    Othello.txt                                        24740   5897
2   English  shakespeare    Richard III.txt                                    35345   5028
3   English  shakespeare    The Merchant of Venice.txt                         19832   4977
4   English  shakespeare    A Midsummer Night's Dream.txt                      15237   4344
5   English  shakespeare    Macbeth.txt                                        15779   4779
6   English  shakespeare    Hamlet.txt                                         26701   6775
7   English  shakespeare    Romeo and Juliet.txt                               29631   4715
8   French   de Maupassant  Œuvres complètes de Guy de Maupassant.txt          58141   13730
9   French   de Maupassant  La Main Gauche.txt                                 37339   8771
10  French   de Maupassant  Contes de la Becasse.txt                           37454   9151
11  French   de Maupassant  Claire de Lune.txt                                 28138   7310
12  French   de Maupassant  La petite roque.txt                                44447   9611
13  French   de Maupassant  Le Horla.txt                                       44993   9758
14  French   de Maupassant  L'inutile beautÇ.txt                               41296   9487
15  French   de Maupassant  Boule de Suif.txt                                  39181   9703
16  French   de Maupassant  La Maison Tellier.txt                              48148   10957
17  French   diderot        L'oiseau blanc.txt                                 23691   5767
18  French   diderot        Les deux amis de Bourbonne.txt                     8764    2840
19  French   diderot        Regrets sur ma vieille robe de chambre.txt         5171    1847
20  French   diderot        Ceci n'est pas un conte.txt                        11593   3522
21  French   diderot        Entretien d'un päre avec ses enfants.txt           11711   3402
22  French   chevalier      La fille des indiens rouges.txt                    72203   15541
23  French   chevalier      Les derniers Iroquois.txt                          59248   13745
24  French   chevalier      La capitaine.txt                                   49416   11276
25  French   chevalier      Le chasseur noir.txt                               55478   10974
26  French   chevalier      L'enfer et le paradis de l'autre monde.txt         43931   9449
27  French   chevalier      L'åle de sable.txt                                 77143   16676
28  French   chevalier      La fille du pirate.txt                             53278   13250
29  French   sand           Le poâme de Myrza.txt                              13735   4156
30  French   sand           La Coupe; Lupo Liverani; Le Toast; Garnier; Le...  51025   11207
31  French   sand           Le Piccinino.txt                                   166793  23120
32  French   sand           Jacques le fataliste et son maåtre.txt             96728   15313
33  French   sand           Pauline.txt                                        29857   7329
34  French   sand           cora.txt                                           13173   4228
35  French   sand           La Marquise.txt                                    15995   4531
36  French   sand           L' Orco.txt                                        9321    3057
37  French   sand           Mattea.txt                                         22827   6065
38  French   sand           Metella.txt                                        21325   5304
39  French   sand           Oeuvres illustrÇes de George Sand.txt              24527   7074
In [73]:
# add a new column: the share of unique words in each book
stats["ratio"] = stats["unique"] / stats["length"] 
stats.sort_values("ratio") # sort by ratio, lowest first
Out[73]:
    language author         title                                              length  unique  ratio
31  French   sand           Le Piccinino.txt                                   166793  23120   0.138615
2   English  shakespeare    Richard III.txt                                    35345   5028    0.142255
32  French   sand           Jacques le fataliste et son maåtre.txt             96728   15313   0.15831
7   English  shakespeare    Romeo and Juliet.txt                               29631   4715    0.159124
25  French   chevalier      Le chasseur noir.txt                               55478   10974   0.197808
26  French   chevalier      L'enfer et le paradis de l'autre monde.txt         43931   9449    0.215087
22  French   chevalier      La fille des indiens rouges.txt                    72203   15541   0.21524
27  French   chevalier      L'åle de sable.txt                                 77143   16676   0.21617
12  French   de Maupassant  La petite roque.txt                                44447   9611    0.216235
13  French   de Maupassant  Le Horla.txt                                       44993   9758    0.216878
30  French   sand           La Coupe; Lupo Liverani; Le Toast; Garnier; Le...  51025   11207   0.219637
16  French   de Maupassant  La Maison Tellier.txt                              48148   10957   0.227569
24  French   chevalier      La capitaine.txt                                   49416   11276   0.228185
14  French   de Maupassant  L'inutile beautÇ.txt                               41296   9487    0.229732
23  French   chevalier      Les derniers Iroquois.txt                          59248   13745   0.231991
9   French   de Maupassant  La Main Gauche.txt                                 37339   8771    0.234902
8   French   de Maupassant  Œuvres complètes de Guy de Maupassant.txt          58141   13730   0.23615
1   English  shakespeare    Othello.txt                                        24740   5897    0.238359
17  French   diderot        L'oiseau blanc.txt                                 23691   5767    0.243426
10  French   de Maupassant  Contes de la Becasse.txt                           37454   9151    0.244326
33  French   sand           Pauline.txt                                        29857   7329    0.24547
15  French   de Maupassant  Boule de Suif.txt                                  39181   9703    0.247646
28  French   chevalier      La fille du pirate.txt                             53278   13250   0.248696
38  French   sand           Metella.txt                                        21325   5304    0.248722
3   English  shakespeare    The Merchant of Venice.txt                         19832   4977    0.250958
6   English  shakespeare    Hamlet.txt                                         26701   6775    0.253736
11  French   de Maupassant  Claire de Lune.txt                                 28138   7310    0.259791
37  French   sand           Mattea.txt                                         22827   6065    0.265694
35  French   sand           La Marquise.txt                                    15995   4531    0.283276
4   English  shakespeare    A Midsummer Night's Dream.txt                      15237   4344    0.285095
39  French   sand           Oeuvres illustrÇes de George Sand.txt              24527   7074    0.288417
21  French   diderot        Entretien d'un päre avec ses enfants.txt           11711   3402    0.290496
29  French   sand           Le poâme de Myrza.txt                              13735   4156    0.302585
5   English  shakespeare    Macbeth.txt                                        15779   4779    0.302871
20  French   diderot        Ceci n'est pas un conte.txt                        11593   3522    0.303804
34  French   sand           cora.txt                                           13173   4228    0.32096
18  French   diderot        Les deux amis de Bourbonne.txt                     8764    2840    0.324053
36  French   sand           L' Orco.txt                                        9321    3057    0.327969
19  French   diderot        Regrets sur ma vieille robe de chambre.txt         5171    1847    0.357184
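
To compare languages, the ratio can be averaged per language. The sketch below does this by hand for four rows copied from the table above; the choice of rows is arbitrary and only illustrates the grouping step. Since the ratio in the table tends to fall as books get longer, such averages are only a rough comparison between books of different lengths.

```python
from statistics import mean

# (language, length, unique) for a few rows copied from the table above
rows = [
    ("English", 24740, 5897),  # Othello
    ("English", 26701, 6775),  # Hamlet
    ("French", 44993, 9758),   # Le Horla
    ("French", 29857, 7329),   # Pauline
]

by_language = {}
for language, length, unique in rows:
    # collect the unique/length ratio of each book under its language
    by_language.setdefault(language, []).append(unique / length)

for language, ratios in by_language.items():
    print(language, round(mean(ratios), 3))
```
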
In [74]:
import matplotlib.pyplot as plt
plt.plot(stats.length, stats.unique, "bo")
plt.show()
In [75]:
plt.figure(figsize=(10,10))
subset = stats[stats.language == "English"]
plt.loglog(subset.length, subset.unique, "bo", label = "English")

subset = stats[stats.language == "French"]
plt.loglog(subset.length, subset.unique, "ro", label = "French")

plt.xlabel("Length of Book")
plt.ylabel("Number of Unique Words")
plt.legend()
plt.show()