Python Review
Objective: This session reviews the basic Python concepts needed to build machine learning models.
Basic Python
Objects
The filter, map, and reduce operations
Pandas
Matplotlib, Seaborn
Applications
Python Philosophy
https://www.python.org/dev/peps/pep-0020/#abstract
import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Functions, Loops, Conditionals, and List Comprehensions
my_list = []
for number in range(0, 10):
    if number % 2 == 0:
        my_list.append(number)
my_list
[0, 2, 4, 6, 8]
my_list = [number for number in range(0, 10) if number % 2 == 0]
my_list
[0, 2, 4, 6, 8]
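def times_tables():
    """
    Build the multiplication table for the factors 0 through 9.

    Params:
        None
    Return:
        lst: list with the products i*j for i and j in range(10)
    """
    lst = []
    for i in range(10):
        for j in range(10):
            lst.append(i*j)
    return lst
times_tables() == [j*i for i in range(10) for j in range(10)]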
True
Built-in Functions
https://docs.python.org/3/library/functions.html
The map function
import numpy as np
b = map(lambda x: x**2, range(10))
type(b)
map
list(b)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
# b is a one-shot iterator that list(b) already consumed, so the sum is 0
sum(b)
0
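The 0 returned by sum(b) follows from map producing a one-shot iterator rather than a list. A minimal sketch of that behavior, using illustrative variable names (squares, first_pass, second_pass, squares_list) that are not part of the original notebook:

```python
# map() returns a lazy iterator: values are computed on demand and only once.
squares = map(lambda x: x**2, range(10))

first_pass = list(squares)   # consumes every element
second_pass = list(squares)  # nothing is left to consume
empty_sum = sum(squares)     # the sum of an exhausted iterator is 0

# To reuse the values, materialize them once and work with the list.
squares_list = list(map(lambda x: x**2, range(10)))
total = sum(squares_list)

print(first_pass)
print(second_pass, empty_sum)
print(total)
```

Storing the list is the simplest option when the data is small; for large inputs it may be preferable to recreate the map object each time it is needed.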
def squares_0():
    squares = []
    for x in range(10000):
        squares.append(x**2)
    return squares
# Operation commonly used with large data files
def squares_1():
    return list(map(lambda x: x**2, range(10000)))
def squares_2():
    return [x**2 for x in range(10000)]
squares_0()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, ...]
import numpy as np
import time
times = []
tmax = 10000
for i in range(0, tmax):
    # Plain for loop
    tini0 = time.time()
    squares_0()
    tend0 = time.time()
    # map operation
    tini1 = time.time()
    squares_1()
    tend1 = time.time()
    # List comprehension
    tini2 = time.time()
    squares_2()
    tend2 = time.time()
    times.append([tend0 - tini0, tend1 - tini1, tend2 - tini2])
t = np.array(times)
mean = np.mean(t, axis = 0)
print(mean)
[0.00034886 0.00033289 0.00029814]
# Magic command
# https://ipython.readthedocs.io/en/stable/interactive/magics.html
%%timeit -n 1000
squares_0()
1000 loops, best of 5: 3.19 ms per loop
%%timeit -n 1000
squares_1()
1000 loops, best of 5: 3.03 ms per loop
%%timeit -n 1000
squares_2()
1000 loops, best of 5: 2.79 ms per loop
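Outside a notebook, where the %%timeit magic is not available, the standard-library timeit module gives comparable measurements. The sketch below redefines the three functions so that it is self-contained; the repeat and number values are arbitrary choices, and the timings depend on the machine, so no expected output is shown.

```python
import timeit

def squares_0():
    squares = []
    for x in range(10000):
        squares.append(x**2)
    return squares

def squares_1():
    return list(map(lambda x: x**2, range(10000)))

def squares_2():
    return [x**2 for x in range(10000)]

results = {}
for func in (squares_0, squares_1, squares_2):
    # 5 runs of 20 calls each; keep the best (smallest) run
    runs = timeit.repeat(func, repeat=5, number=20)
    results[func.__name__] = min(runs) / 20  # seconds per call

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum of several runs, as %%timeit does with its "best of 5", reduces the influence of other processes running on the machine.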
The filter function
Advanced filtering in Python through functional programming
Ref: https://docs.hektorprofe.net/python/funcionalidades-avanzadas/funcion-filter/,
https://docs.python.org/3/library/functions.html#filter
def multiple(numero):  # First, declare a predicate function
    # Return True only if the number is a multiple of five
    return numero % 5 == 0
numeros = [2, 5, 10, 23, 50, 33, 5000]
a = filter(multiple, numeros)
a
<filter at 0x7fe819614110>
list(a)
[5, 10, 50, 5000]
list(a)
[]
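The same selection can be written with an inline lambda or as a list comprehension. This sketch reuses the numeros list from above; the names with_filter and with_comprehension are illustrative.

```python
numeros = [2, 5, 10, 23, 50, 33, 5000]

# filter() with an inline predicate; list() materializes the lazy result
with_filter = list(filter(lambda n: n % 5 == 0, numeros))

# Equivalent list comprehension, which builds the list directly
with_comprehension = [n for n in numeros if n % 5 == 0]

print(with_filter)
print(with_comprehension)
```

Because both results are lists, they can be traversed as many times as needed, unlike the filter object, which is empty after the first list(a) call.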
Map function
a = [10.00, 11.00, 12.34, 2.0 ]
b = [9.8, 11.10, 12.34, 2.01 ]
var = map(min, a, b)
var
<map at 0x7fe819614d50>
list(var)
[9.8, 11.0, 12.34, 2.0]
var
<map at 0x7fe819614d50>
list(var)
[]
# This loop prints nothing. Why?
# The iterator var was already exhausted by the earlier list(var) call
for item in var:
    print(item)
Another example with map: keep only the title and last name of each of the following people
people = ['Dr. Simon Einstein', 'Dr. Pedro Euler ', 'Dr. Juan Tesla', 'Dr. Daniel Maxwell']
people[0].split()
['Dr.', 'Simon', 'Einstein']
def split_title_and_name(person):
    title = person.split()[0]
    lastname = person.split()[-1]
    return f'{title} {lastname}'
last_names = map(split_title_and_name, people)
list(last_names)
['Dr. Einstein', 'Dr. Euler', 'Dr. Tesla', 'Dr. Maxwell']
a = []
for p in people:
    a.append(split_title_and_name(p))
a
['Dr. Einstein', 'Dr. Euler', 'Dr. Tesla', 'Dr. Maxwell']
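The session outline also lists reduce, which is not shown above. In Python 3 it lives in the standard-library functools module and folds an iterable into a single value by applying a two-argument function cumulatively. A minimal sketch with illustrative data:

```python
from functools import reduce

values = [10.00, 11.00, 12.34, 2.0]

# ((10.00 + 11.00) + 12.34) + 2.0, accumulated from left to right
total = reduce(lambda acc, x: acc + x, values)

# The optional third argument is the initial value of the accumulator
product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4], 1)

# reduce can also track a running extreme, here the smallest element
smallest = reduce(min, values)

print(total, product, smallest)
```

For common cases such as sums or minima, the built-ins sum and min are usually clearer; reduce is most useful when the combining step has no dedicated built-in.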
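Exercise 0.1
Find the first 100 odd numbers using the map function
def impar(x):
    # Return x when it is odd; otherwise the function returns None
    if x % 2 != 0:
        return x
# The n-th odd number is 2*n + 1, so mapping over range(100) yields 1, 3, ..., 199
q = map(lambda x: 2*x + 1, range(100))
#list(q)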
Objects
class auto:
    """
    This class assigns a color and a type
    to a car object
    """
    var = "car workshop"
    def set_name_tipo(self, new_tipo):
        self.tipo = new_tipo
    def set_name_color(self, new_color):
        self.color = new_color
carro = auto()
# Call the setters; assigning to them would overwrite the methods with strings
carro.set_name_color("red")
carro.set_name_tipo("bus")
print(f"The car is {carro.color} and it is a {carro.tipo}")
The car is red and it is a bus
class circulo(object):
    def __init__(self, R, posx, posy):
        self.R1 = R
        self.posx = posx
        self.posy = posy
    def Area(self):
        A = np.pi*(self.R1)**2
        return A
    def perimetro(self):
        return 2*np.pi*self.R1
class circulo_(object):
    def __init__(self):
        self.R1 = None
        self.posx = None
        self.posy = None
    def Area(self):
        A = np.pi*(self.R1)**2
        return A
    def perimetro(self):
        return 2*np.pi*self.R1
c = circulo(1, 0, 0)
c.Area()
c.perimetro()
6.283185307179586
cc=circulo_()
cc.posx=1
cc.posy=1
cc.R1=1
cc.R1
1
cc.perimetro()
6.283185307179586
Exercise 0.2
Given a sentence, your task is to build an iterator over its words. Ref: https://www.youtube.com/watch?v=C3Z9lJXI6Qw&ab_channel=CoreySchafer
class Sentence:
    def __init__(self, sentence):
        self.sentence = sentence
        self.index = 0
        self.words = self.sentence.split()
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= len(self.words):
            raise StopIteration
        index = self.index
        self.index += 1
        return self.words[index]
my_sentence = Sentence('This is a test')
print(next(my_sentence))
print(next(my_sentence))
print(next(my_sentence))
print(next(my_sentence))
#print(next(my_sentence))
This
is
a
test
a = Sentence("hola mundo esta es una prueba")
a.__next__()
'hola'
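A generator function is a more compact way to obtain the same word-by-word iteration, since Python supplies __iter__ and __next__ automatically for generators. This is an alternative sketch, not the solution used above; sentence_words is a hypothetical name.

```python
def sentence_words(sentence):
    """Yield the words of a sentence one at a time."""
    for word in sentence.split():
        yield word

words = sentence_words("This is a test")
first = next(words)  # advances the generator by one word
rest = list(words)   # consumes the remaining words

print(first)
print(rest)
```

Like an instance of the Sentence class, the generator is exhausted after one pass; call sentence_words again to iterate a second time.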
Dictionaries
Basic dictionary operations
students_class = {"Bob": "Physics", "Alice": "Physics", "Ana": "Biology"}
for i, s in enumerate(students_class):
    print(i, s, students_class[s])
0 Bob Physics
1 Alice Physics
2 Ana Biology
students_class.items()
dict_items([('Bob', 'Physics'), ('Alice', 'Physics'), ('Ana', 'Biology')])
Another way to iterate over a dictionary is with the items() method
for key, val in students_class.items():
    print(key, val)
Bob Physics
Alice Physics
Ana Biology
Accessing the dictionary's values and keys:
keys() method
values() method
print(students_class.values())
print(students_class.keys())
dict_values(['Physics', 'Physics', 'Biology'])
dict_keys(['Bob', 'Alice', 'Ana'])
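Iterating over items() is also a convenient way to reorganize a dictionary. The sketch below groups the students by subject; by_subject is an illustrative name that does not appear in the original notebook.

```python
students_class = {"Bob": "Physics", "Alice": "Physics", "Ana": "Biology"}

# Invert the mapping: subject -> list of students enrolled in it
by_subject = {}
for student, subject in students_class.items():
    # setdefault returns the existing list, or inserts and returns a new one
    by_subject.setdefault(subject, []).append(student)

print(by_subject)
```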
Pandas
Series
Data Frame
import pandas as pd
students_class = {"Bob": "Physics",
                  "Alice": "Chemistry",
                  "Ana": "Biology"}
# One-dimensional ndarray with axis labels
s = pd.Series(students_class)
s
Bob Physics
Alice Chemistry
Ana Biology
dtype: object
# https://pandas.pydata.org/docs/reference/series.html
print(type(s.index))
s.index
<class 'pandas.core.indexes.base.Index'>
Index(['Bob', 'Alice', 'Ana'], dtype='object')
# Access elements by integer position
s.iloc[2]
'Biology'
# Access elements by index label
s.loc["Alice"]
'Chemistry'
s.Bob
'Physics'
Key-value definition with integer keys
class_code = {99: "Physics",
              100: "Chemistry",
              101: "English"}
s = pd.Series(class_code)
s
99 Physics
100 Chemistry
101 English
dtype: object
s.iloc[2]
'English'
s.loc[99]
'Physics'
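With integer labels, the difference between loc and iloc becomes important: loc looks up the label, while iloc looks up the position. A small sketch using the same class_code data; the names by_label, by_position, and missing_label are illustrative.

```python
import pandas as pd

class_code = {99: "Physics", 100: "Chemistry", 101: "English"}
s = pd.Series(class_code)

by_label = s.loc[100]    # the row whose label is 100
by_position = s.iloc[0]  # the first row, whatever its label is

# Label 0 does not exist, so loc raises KeyError even though position 0 does
try:
    s.loc[0]
    missing_label = False
except KeyError:
    missing_label = True

print(by_label, by_position, missing_label)
```

Using loc and iloc explicitly, rather than plain square brackets, avoids ambiguity about whether an integer means a label or a position.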
We can also create a Series object from a list
grades = pd.Series([8,7,10,1])
grades
0 8
1 7
2 10
3 1
dtype: int64
for i, g in enumerate(grades):
    print(i, g)
0 8
1 7
2 10
3 1
grades.mean()
6.5
grades.describe()
count 4.000000
mean 6.500000
std 3.872983
min 1.000000
25% 5.500000
50% 7.500000
75% 8.500000
max 10.000000
dtype: float64
Creating a Series from a NumPy array
x = np.random.randint(0,20, 100)
random_s = pd.Series(x)
random_s
0 10
1 0
2 3
3 19
4 18
..
95 19
96 12
97 7
98 2
99 11
Length: 100, dtype: int64
random_s.head()
0 10
1 0
2 3
3 19
4 18
dtype: int64
# Iterate over index labels and values; head() limits the output to the first few rows
for index, values in random_s.head().items():
    print(index, values)
0 10
1 0
2 3
3 19
4 18
%%timeit -n 100
x = np.random.randint(0,20, 100)
random_s = pd.Series(x)
random_s += 2  # Vectorized operation applied to the whole Series; more efficient.
# Compare with a loop that performs the same sum: which is more efficient?
100 loops, best of 5: 163 µs per loop
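To explore the question posed in the comment, the vectorized addition can be timed against an explicit Python loop. This sketch uses the standard-library timeit module instead of the notebook magic; the function names and the number of repetitions are arbitrary, and copy=True is passed so that the in-place update never modifies the source array. Timings vary by machine, so none are shown, although the loop is typically much slower.

```python
import timeit

import numpy as np
import pandas as pd

x = np.random.randint(0, 20, 100)

def add_vectorized():
    s = pd.Series(x, copy=True)
    s += 2  # one vectorized operation over the whole Series
    return s

def add_loop():
    s = pd.Series(x, copy=True)
    for i in range(len(s)):
        s.iloc[i] = s.iloc[i] + 2  # element-by-element update
    return s

# Both approaches must produce the same values
same_result = add_vectorized().equals(add_loop())

t_vec = timeit.timeit(add_vectorized, number=100)
t_loop = timeit.timeit(add_loop, number=100)
print(same_result)
print(f"vectorized: {t_vec / 100 * 1e6:.1f} µs, loop: {t_loop / 100 * 1e6:.1f} µs per call")
```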
Adding new values with different index labels
s = pd.Series([1,2,3,4,9])
s.loc["nuevo"]=2
s
0 1
1 2
2 3
3 4
4 9
nuevo 2
dtype: int64
s.loc["nuevo"]
2
s["nuevo"]
2
s.iloc[-1]
2
Another way to define a Series is to pass an explicit index:
juan_class = pd.Series(["a", "b","c"], index=["0","1","2"])
juan_class
0 a
1 b
2 c
dtype: object
Data Frame
A DataFrame can be thought of as a collection of Series; here it is built from a list of Series, each of which becomes a row.
d1 = { "Name":"Juan", "Topic":"Quantum Mechanics", "Score" : 10}
d2 = { "Name":"Pedro", "Topic":"statistical", "Score" : 10}
d3 = { "Name":"Ana", "Topic":"Clasical Mechanics", "Score" : 10}
record1 = pd.Series(d1)
record2 = pd.Series(d2)
record3 = pd.Series(d3)
# Integer index
df1 = pd.DataFrame([record1, record2, record3])
df1
| | Name | Topic | Score |
|---|---|---|---|
| 0 | Juan | Quantum Mechanics | 10 |
| 1 | Pedro | statistical | 10 |
| 2 | Ana | Clasical Mechanics | 10 |
# Assigning names to the index
df2 = pd.DataFrame([record1, record2, record3], index=["UdeA", "Unal", "ITM"])
df2
| | Name | Topic | Score |
|---|---|---|---|
| UdeA | Juan | Quantum Mechanics | 10 |
| Unal | Pedro | statistical | 10 |
| ITM | Ana | Clasical Mechanics | 10 |
# Access a row by its index label
df2.loc["UdeA"]
Name Juan
Topic Quantum Mechanics
Score 10
Name: UdeA, dtype: object
# Access a row by its integer position
df2.iloc[0]
Name Juan
Topic Quantum Mechanics
Score 10
Name: UdeA, dtype: object
# Access a single element
df2.loc["UdeA", "Name"]
'Juan'
# Access selected columns of the DataFrame
df2.loc[:, ["Name", "Topic"]]
| | Name | Topic |
|---|---|---|
| UdeA | Juan | Quantum Mechanics |
| Unal | Pedro | statistical |
| ITM | Ana | Clasical Mechanics |
When working with pandas, create copies of a DataFrame with the copy() method rather than with the = operator, because assignment only binds a second name to the same object in memory.
df2
| | Name | Topic | Score |
|---|---|---|---|
| UdeA | Juan | Quantum Mechanics | 10 |
| Unal | Pedro | statistical | 10 |
| ITM | Ana | Clasical Mechanics | 10 |
a = df2
a
| | Name | Topic | Score |
|---|---|---|---|
| UdeA | Juan | Quantum Mechanics | 10 |
| Unal | Pedro | statistical | 10 |
| ITM | Ana | Clasical Mechanics | 10 |
a.loc["UdeA", "Name"] = "JuanB"
a
| | Name | Topic | Score |
|---|---|---|---|
| UdeA | JuanB | Quantum Mechanics | 10 |
| Unal | Pedro | statistical | 10 |
| ITM | Ana | Clasical Mechanics | 10 |
df2
| | Name | Topic | Score |
|---|---|---|---|
| UdeA | JuanB | Quantum Mechanics | 10 |
| Unal | Pedro | statistical | 10 |
| ITM | Ana | Clasical Mechanics | 10 |
b = df2.copy()
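The difference between an alias and a copy can be checked directly with the is operator. A minimal sketch with a small illustrative DataFrame; df, alias, and copied are hypothetical names, not the notebook's variables.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Juan", "Pedro"], "Score": [10, 10]},
                  index=["UdeA", "Unal"])

alias = df          # a second name for the same object
copied = df.copy()  # an independent object with its own data

alias.loc["UdeA", "Name"] = "JuanB"    # visible through df as well
copied.loc["Unal", "Name"] = "PedroB"  # does not touch df

print(alias is df, copied is df)
print(df.loc["UdeA", "Name"], df.loc["Unal", "Name"])
```

Only the change made through the alias shows up in df; the change made on the copy stays in the copy.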
Deleting columns
del b["Topic"]
b
| | Name | Score |
|---|---|---|
| UdeA | JuanB | 10 |
| Unal | Pedro | 10 |
| ITM | Ana | 10 |
Adding new columns to the DataFrame
b["Nueva"] = [10, 8, 3]
b
| | Name | Score | Nueva |
|---|---|---|---|
| UdeA | JuanB | 10 | 10 |
| Unal | Pedro | 10 | 8 |
| ITM | Ana | 10 | 3 |
Exercise 0.3
Using the following times:
t = np.linspace(0, 2, 1000)
Create a pandas DataFrame for the position $y = h_0 - 0.5 g t^2$, with $g = 9.8\ \mathrm{m/s^2}$ and $h_0 = 100\ \mathrm{m}$.
Add new columns for the velocity and the acceleration.
Build a new DataFrame covering t = 0.5 s to t = 1.5 s that contains only the position as a function of time.
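One possible way to approach the exercise is sketched below. It assumes free fall from rest, so that $v = -g t$ and $a = -g$, and it uses the values of g and the initial height given in the statement; the column names t, y, v, and a and the variable names are arbitrary choices.

```python
import numpy as np
import pandas as pd

g = 9.8     # m/s^2
h0 = 100.0  # m
t = np.linspace(0, 2, 1000)

# Position as a function of time
df = pd.DataFrame({"t": t, "y": h0 - 0.5 * g * t**2})

# New columns: velocity (time derivative of y) and constant acceleration
df["v"] = -g * df["t"]
df["a"] = -g

# Rows with 0.5 s <= t <= 1.5 s, keeping only time and position
mask = (df["t"] >= 0.5) & (df["t"] <= 1.5)
df_window = df.loc[mask, ["t", "y"]].copy()

print(df.head())
print(df_window.shape)
```

The boolean mask selects rows by value rather than by position, so the same lines work if the time grid is changed.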
# Exercise
#https://github.com/ajcr/100-pandas-puzzles/blob/master/100-pandas-puzzles.ipynb