This article is a complete Data Science cheat sheet for Python. It starts from basic Python functions and moves on to advanced ones, covering the following libraries and tools:
- Pandas
- Numpy
- Matplotlib
- Plotly
- Cufflinks
- Seaborn
- MongoDB
- SQL
- Cassandra
It also covers deployment on cloud platforms such as:
- Heroku
- GitHub
- AWS
- Azure
- GCP
I will provide all the files through a GitHub link at the end of the post.
Python
- type( ) Returns the data type of the given object.
- .lower( ) Returns a copy of the string converted to lowercase.
- .upper( ) Returns a copy of the string converted to uppercase.
- .find( ) Returns the lowest index of the given substring, or -1 if it is not found.
- .replace(oldstring,newstring) Replaces every occurrence of oldstring with newstring.
- .count( ) Counts the non-overlapping occurrences of the given substring in the string.
- .islower( ) Returns a boolean value indicating whether all cased characters in the string are lowercase.
- .isupper( ) Returns a boolean value indicating whether all cased characters in the string are uppercase.
- .title( ) Capitalizes the first letter of each word in the string.
- .split( ) Splits the string on the given separator (on whitespace by default) and returns a list.
- .partition( ) Splits the string at the first occurrence of the given separator and returns a 3-tuple (before, separator, after).
- .isnumeric( ) Returns True if the string is non-empty and all of its characters are numeric, otherwise False.
- .isalpha( ) Returns True if the string is non-empty and all of its characters are alphabetic, otherwise False.
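A quick sketch of the string methods above applied to one sample string. The string `s` is only an illustrative value.

```python
# Illustrative sample string (hypothetical value)
s = "data science in python"

print(type(s))                  # <class 'str'>
print(s.upper())                # DATA SCIENCE IN PYTHON
print(s.title())                # Data Science In Python
print(s.find("science"))        # 5 (index where the substring starts)
print(s.find("java"))           # -1 (substring not found)
print(s.count("a"))             # 2
print(s.replace("python", "R")) # data science in R
print(s.split())                # ['data', 'science', 'in', 'python']
print(s.partition(" in "))      # ('data science', ' in ', 'python')
print("2024".isnumeric(), "20a4".isnumeric())  # True False
print("abc".isalpha(), "abc1".isalpha())       # True False
```

Strings are immutable, so methods such as .upper( ) and .replace( ) return new strings and leave `s` unchanged.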
Lists
- List is indicated with [ ].
- .append( ) appends any item at the end of the list.
- .insert(3,item) Inserts item before index 3.
- .pop( ) Removes and returns the last item in the list. If an index is given, the item at that index is removed and returned.
- .remove( ) Removes the first occurrence of the item in the list.
- del list[index] Deletes the item at a particular index (del is a statement, not a list method).
- .reverse( ) Reverses the list in place.
- .sort( ) Sorts the list in place (ascending by default; pass reverse=True for descending order).
- .index( ) Gives the index of the first occurrence of an item in the list.
- .extend() Concatenates the first list with another list (or another iterable).
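A small walk-through of the list operations above, tracking the list's contents after each step. The values in `nums` are arbitrary.

```python
nums = [3, 1, 4, 1, 5]   # arbitrary sample values

nums.append(9)        # [3, 1, 4, 1, 5, 9]
nums.insert(2, 7)     # insert 7 before index 2 -> [3, 1, 7, 4, 1, 5, 9]
last = nums.pop()     # removes and returns 9
second = nums.pop(1)  # removes and returns the item at index 1 (the value 1)
nums.remove(1)        # removes the first remaining 1
del nums[0]           # del is a statement: removes the item at index 0 (3)
nums.extend([2, 6])   # appends each item of the iterable
nums.sort()           # sorts in place, ascending
print(nums, last, second, nums.index(5))  # [2, 4, 5, 6, 7] 9 1 2
```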
Tuples and Dictionaries
- Tuple is indicated with ( ).
- Dictionary is indicated with { }.
- Tuples are immutable, so they support only the read-only methods .count( ) and .index( ). Dictionaries have their own methods, such as:
- .keys( ) Returns all the keys in the dictionary.
- .values( ) Returns all the values in the dictionary.
- .popitem( ) Removes and returns the last inserted (key, value) pair.
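A minimal sketch contrasting tuple and dictionary methods. The names `point` and `scores` are hypothetical examples. The last-in, first-out behaviour of .popitem( ) is guaranteed from Python 3.7, where dictionaries keep insertion order.

```python
point = (2, 3, 2)
print(point.count(2), point.index(3))  # 2 1 (tuples only offer count() and index())

scores = {"alice": 90, "bob": 85}
scores["carol"] = 78                   # add a new key, value pair
print(list(scores.keys()))             # ['alice', 'bob', 'carol']
print(list(scores.values()))           # [90, 85, 78]
print(scores.popitem())                # ('carol', 78), the last inserted pair
print(scores)                          # {'alice': 90, 'bob': 85}
```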
Exception Handling
Basic Structure
try:
    # You do your operations here...
    ...
except ExceptionI:
    # If there is ExceptionI, then execute this block.
    ...
except ExceptionII:
    # If there is ExceptionII, then execute this block.
    ...
# ... further except clauses can follow ...
else:
    # If there is no exception, then execute this block.
    ...
finally
The finally block always runs, whether or not an exception was raised in the try block. The syntax is:
try:
    # Code block here
    ...
    # Due to any exception, this code may be skipped!
finally:
    # This code block would always be executed.
    ...
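The structure above can be seen end to end in one function. `safe_divide` is a hypothetical helper that uses real built-in exception classes (ZeroDivisionError, TypeError) in place of ExceptionI and ExceptionII, so the order in which the except, else and finally blocks run is visible.

```python
def safe_divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print("Cannot divide by zero")
        result = None
    except TypeError:
        print("Both arguments must be numbers")
        result = None
    else:
        # runs only when the try block raised no exception
        print("Division succeeded")
    finally:
        # always runs, with or without an exception
        print("Done")
    return result

print(safe_divide(10, 2))  # Division succeeded, Done, 5.0
print(safe_divide(1, 0))   # Cannot divide by zero, Done, None
```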
Raising an exception
We can raise an exception with the raise statement. Let us check with an example:
def raise_exc(a):
    if a < 5:
        raise Exception(a)  # once the exception is raised, the lines below it do not run
    return a

try:
    res1 = raise_exc(7)
    print(res1)  # prints 7
    res = raise_exc(2)  # raises Exception(2), so the next line is skipped
    print(res)
except Exception as e:
    print("Error is", e)  # prints: Error is 2
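Raising the generic Exception works, but a more specific built-in class lets callers catch exactly the errors they expect. The validator `check_age` below is a hypothetical example of that pattern, not something from the original files.

```python
def check_age(age):
    # Hypothetical validator: raise specific built-in exception types
    if not isinstance(age, int):
        raise TypeError(f"age must be an int, got {type(age).__name__}")
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age

for value in (30, -1, "ten"):
    try:
        print("Valid age:", check_age(value))
    except (TypeError, ValueError) as e:
        # one except clause can catch several exception types given as a tuple
        print(type(e).__name__, "->", e)
```

This prints `Valid age: 30`, then `ValueError -> age must be non-negative, got -1`, then `TypeError -> age must be an int, got str`.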
File Handling
f = open(file_name, access_mode, buffering) #opens a file with the open() built-in function.
f.close() #closes the file.
f.write( ) #writes text to the file.
f.read( ) #reads text from the file.
f.seek( ) #moves the file pointer to the given position.
f.tell( ) #tells us the current position of the pointer.
f.flush( ) #flushes the internal buffer.
f.readline( ) #reads a single line from the file.
f.readlines( ) #reads all lines from the file into a list.
f.writelines( ) #writes a sequence of strings to the file.
f.fileno( ) #returns the integer file descriptor.
f.closed #True if the file is closed, otherwise False.
f.name #returns the file name.
f.mode #returns the access mode in which the file was opened.
URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/File%20Handling.ipynb
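A minimal sketch that exercises most of the file methods above. It uses a `with` block, which closes the file automatically, and writes into a temporary directory so no real data is touched; the file name `notes.txt` is hypothetical.

```python
import os
import tempfile

# Write to a temporary file so the example does not touch real data
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:          # 'w' creates or truncates the file
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

with open(path, "r") as f:          # the with-block closes the file on exit
    print(f.name, f.mode)           # full path of notes.txt, then r
    first = f.readline()            # 'first line\n'
    print(f.tell())                 # current pointer position (opaque in text mode)
    f.seek(0)                       # move the pointer back to the start
    lines = f.readlines()           # list of all lines
print(f.closed)                     # True
print(first, lines)
```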
Operating System
import os #imports the os module.
import shutil #imports the shutil module.
os.getcwd() #returns the current working directory.
os.listdir() #returns a list of files and folders in the directory.
os.chdir() #changes the current directory to the path given as argument.
os.path.join() #joins path components into a single path.
os.mkdir(), os.makedirs() #make a new directory at the given path; makedirs also creates any missing parent directories.
os.rename() #renames a file or directory.
os.rmdir() #removes the given directory (it must be empty).
os.removedirs() #removes empty directories recursively, starting from the leaf and moving up through its empty parents.
shutil.move( ) #moves files from source to destination.
shutil.copy() #copies files from source to destination.
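A sketch of these calls working together inside a scratch directory. The folder and file names are hypothetical. Because os.rmdir( ) and os.removedirs( ) only delete empty directories, the cleanup at the end uses shutil.rmtree( ), which removes a whole tree including its files.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                       # scratch directory for the demo
nested = os.path.join(base, "a", "b")           # hypothetical folder names
os.makedirs(nested)                             # creates 'a' and 'a/b' in one call

src = os.path.join(base, "a", "data.txt")
with open(src, "w") as f:
    f.write("hello")

shutil.copy(src, os.path.join(nested, "copy.txt"))    # copy the file
shutil.move(src, os.path.join(base, "moved.txt"))     # move (relocate) the file
os.rename(os.path.join(base, "moved.txt"), os.path.join(base, "renamed.txt"))

top = sorted(os.listdir(base))                  # ['a', 'renamed.txt']
inner = os.listdir(nested)                      # ['copy.txt']
print(top, inner)

shutil.rmtree(base)                             # remove the whole tree, even non-empty
```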
Logging
import logging #Importing logging python module.
## create a log file named "Logger1.log", set the severity level to INFO, and add a timestamp to each record
logging.basicConfig(filename='Logger1.log',level=logging.INFO,format='%(asctime)s %(message)s')
# Since the level is INFO, records of level INFO and above (WARNING, ERROR, CRITICAL) are written; DEBUG records are not.
# basicConfig only takes effect the first time it configures the root logger, so a second call with another filename is ignored (unless force=True is passed, Python 3.8+).
logging.info("Info message being logged!!") #info log
logging.warning("Warning message being logged!!") # warning log
logging.error("Error!!") #error log
console_log = logging.StreamHandler() #stream handler displays logs in console screen.
console_log.setLevel(logging.INFO) #sets level for the stream handler.
# set a format which is simpler for console use
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
console_log.setFormatter(formatter) # tell the handler to use this format
logging.getLogger('').addHandler(console_log) # add the handler to the root logger
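The handler setup above can also be applied to a named logger instead of the root logger. In this sketch the logger name `cheatsheet_demo` is hypothetical, and an in-memory `io.StringIO` buffer stands in for the console so the output can be inspected.

```python
import io
import logging

buffer = io.StringIO()                                  # stands in for the console or a file
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.INFO)
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

logger = logging.getLogger("cheatsheet_demo")           # hypothetical logger name
logger.setLevel(logging.INFO)                           # DEBUG records are filtered out
logger.addHandler(handler)
logger.propagate = False                                # don't also pass records to the root logger

logger.debug("not shown")
logger.info("Info message being logged!!")
logger.error("Error!!")

print(buffer.getvalue())
# cheatsheet_demo: INFO Info message being logged!!
# cheatsheet_demo: ERROR Error!!
```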
Numpy
import numpy as np #importing numpy
np.zeros((3,4)) #Create an array of zeros
np.ones((2,3,4),dtype=np.int16) #Create an array of ones
d = np.arange(10,25,5) #Create an array of evenly spaced values (step value)
np.linspace(0,2,9) #Create an array of evenly spaced values (number of samples)
e = np.full((2,2),7) #Create a constant array
f = np.eye(2) #Create a 2X2 identity matrix
np.random.random((2,2)) #Create an array with random values
np.empty((3,2)) #Create an uninitialized array (its values are arbitrary)
a.shape #Array dimensions
len(a) #Length of array
b.ndim #Number of array dimensions
e.size #Number of array elements
b.dtype #Data type of array elements
b.dtype.name #Name of data type
b.astype(int) #Convert an array to a different type
a.sum() #Array-wise sum
a.min() #Array-wise minimum value
b.max(axis=0) #Maximum value along axis 0 (the maximum of each column)
b.cumsum(axis=1) #Cumulative sum of the elements along each row
a.mean() #Mean
np.median(b) #Median (a NumPy function; arrays have no .median() method)
np.corrcoef(a) #Correlation coefficient
np.std(b) #Standard deviation
h = a.view() #Create a view of the array with the same data.
np.copy(a) #Create a copy of the array
h = a.copy() #Create a copy of the array with its own data
a.sort() #Sort an array in place
c.sort(axis=0) #Sort the elements of an array's axis
i = np.transpose(b) #Permute array dimensions
i.T #Permute array dimensions
b.ravel() #Flatten the array
g.reshape(3,-1) #Reshape without changing data (-1 lets NumPy infer that dimension)
np.resize(h,(2,6)) #Return a new array with shape (2,6)
np.append(h,g) #Append items to an array
np.insert(a, 1, 5) #Insert items in an array
np.delete(a,[1]) #Delete items from an array
np.concatenate((a,d),axis=0) #Concatenate arrays
np.vstack((a,b)) #Stack arrays vertically (row-wise)
np.r_[e,f] #Stack arrays vertically (row-wise)
np.hstack((e,f)) #Stack arrays horizontally (column-wise)
np.column_stack((a,d)) #Create stacked column-wise arrays
np.c_[a,d] #Create stacked column-wise arrays
np.hsplit(a,3) #Split the array horizontally into 3 equal parts
np.vsplit(c,2) #Split the array vertically into 2 equal parts
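The cheat sheet uses arrays such as `a`, `b` and `g` without defining them. The sketch below defines a small hypothetical `b` to make the axis semantics concrete and to show the function forms of median and corrcoef, the -1 placeholder in reshape, and splitting into equal parts.

```python
import numpy as np

b = np.array([[1, 5, 3],
              [4, 2, 6]])        # hypothetical 2x3 array

print(b.max(axis=0))             # [4 5 6]  maximum of each column (down axis 0)
print(b.max(axis=1))             # [5 6]    maximum of each row (across axis 1)
print(b.cumsum(axis=1))          # rows become [1 6 9] and [4 6 12]
print(np.median(b))              # 3.5 (median is a function, not an array method)

x = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 8])
print(np.corrcoef(x, y))         # 2x2 correlation matrix; every entry is 1.0 up to rounding

g = np.arange(6)
print(g.reshape(3, -1))          # -1 lets NumPy infer the second dimension (2)
print(np.hsplit(np.arange(6), 3))  # [array([0, 1]), array([2, 3]), array([4, 5])]
```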
Pandas
import pandas as pd #importing pandas
df=pd.read_csv('example_files/ex1.csv') #reads the file given as argument and stores it in the DataFrame df.
df.head(n) #returns the first n rows of the DataFrame.
df.tail(n) #returns the last n rows of the DataFrame.
df.shape #tuple of (number of rows, number of columns) in the DataFrame.
# Changing column names by providing names parameter
df =pd.read_csv('example_files/ex2.csv', names=['asdfdsfs','fsdf', 'b', 'c', 'sudh', 'message'])
df #displays the DataFrame
# Use of index_col to use as the row labels/indexes of the DataFrame
df = pd.read_csv('example_files/csv_mindex.csv',index_col=['key1', 'key2'])
pd.isnull(df) # Returns a boolean DataFrame that is True where values are null (NaN/None)
df = pd.read_excel('example_files/Store_Sales_Data.xlsx',sheet_name = 'Returns') # Name of sheet to read from for excel
pd.options.display.max_rows = 10 # Display at most 10 rows when printing a DataFrame
df.to_csv('example_files/out.csv') # Saving data to csv using to_csv
df.dtypes # Printing data types of all columns of a dataset
df.describe() #summary statistics (count, mean, std, min, quartiles, max) of the numeric columns.
df.dtypes[df.dtypes == "object"].index # Index of those columns whose data type is object
df[["column name"]][4:9] # Rows from 4 to 8 both included
df.columns # .columns will give the columns present in the dataset
df["sudh"]="sdffs" # Adding a new column with the same value in every row
df["column name"][0:15] #check the first 15 entries in the column "column name"
pd.Categorical(df['column name']) # pd.Categorical converts the given column into categorical values
df['column name'].unique() #check unique entries in the given column
df.sample(n=10) #Randomly select n rows.
df.nlargest(n, 'value') #Select and order the top n entries by column 'value'
Use df.loc[] and df.iloc[] to select only rows, only columns or both.
Use df.at[] and df.iat[] to access a single value by row and column.
df.sort_values(by="col") #sorts rows by the values of column "col" (low to high by default)
df.groupby(by="col") #Return a GroupBy object, grouped by values in column named "col"
df.groupby(level="ind") #Return a GroupBy object, grouped by values in index level named"ind".
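A small sketch of selection, sorting and grouping on an in-memory DataFrame. The table, its index labels and its column names are hypothetical and are built directly rather than read from the example files.

```python
import pandas as pd

# Small hypothetical sales table built in memory instead of read from a file
df = pd.DataFrame(
    {"region": ["East", "West", "East", "West", "North"],
     "sales": [250, 120, 310, 90, 180]},
    index=["a", "b", "c", "d", "e"],
)

print(df.loc["b":"d", "sales"])       # label-based: rows 'b' to 'd', both included
print(df.iloc[0:2, 0])                # position-based: rows 0 and 1 of the first column
print(df.at["c", "sales"])            # single value by labels -> 310
print(df.iat[4, 1])                   # single value by positions -> 180

print(df.sort_values(by="sales", ascending=False).head(2))
print(df.nlargest(2, "sales"))        # same two rows: 'c' and 'a'
print(df.groupby(by="region")["sales"].sum())   # total sales per region
```

Note that label slicing with .loc includes the end label, while positional slicing with .iloc excludes the end position, like ordinary Python slices.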
Regular Expressions
Special Characters
^ | Matches the expression to its right at the start of a string. In multi-line mode (re.M), it also matches right after each \n in the string.
$ | Matches the expression to its left at the end of a string. In multi-line mode (re.M), it also matches right before each \n in the string.
. | Matches any character except the newline \n (unless the s/DOTALL flag is set).
\ | Escapes special characters or denotes character classes.
A|B | Matches expression A or B. If A is matched first, B is left untried.
+ | Greedily matches the expression to its left 1 or more times.
* | Greedily matches the expression to its left 0 or more times.
? | Greedily matches the expression to its left 0 or 1 times. But if ? is added to quantifiers (+, *, and ? itself), they match in a non-greedy manner, taking as few characters as possible.
{m} | Matches the expression to its left exactly m times.
{m,n} | Matches the expression to its left m to n times, as many times as possible (greedy).
{m,n}? | Matches the expression to its left m to n times, as few times as possible (non-greedy). See ? above.
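A short demonstration of greedy versus non-greedy quantifiers and of the multi-line anchors, using Python's built-in `re` module on made-up sample strings.

```python
import re

text = "<b>bold</b><i>italic</i>"
print(re.findall(r"<.+>", text))      # greedy: ['<b>bold</b><i>italic</i>']
print(re.findall(r"<.+?>", text))     # non-greedy: ['<b>', '</b>', '<i>', '</i>']

print(re.findall(r"\d{3}", "12345678"))     # exactly 3: ['123', '456']
print(re.findall(r"\d{2,4}", "12345678"))   # greedy 2-4: ['1234', '5678']
print(re.findall(r"\d{2,4}?", "12345678"))  # lazy 2-4: ['12', '34', '56', '78']

lines = "cat\ndog\ncow"
print(re.findall(r"^c\w+", lines))          # ['cat'] (start of the string only)
print(re.findall(r"^c\w+", lines, re.M))    # ['cat', 'cow'] (start of each line)
```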
Character Classes (a.k.a. Special Sequences)
\w | Matches word characters: a-z, A-Z, 0-9 and the underscore _ (for str patterns, other Unicode letters and digits are matched too).
\d | Matches digits, which means 0-9.
\D | Matches any non-digits.
\s | Matches whitespace characters, which include \t, \n, \r, \f, \v, and the space character.
\S | Matches non-whitespace characters.
\b | Matches the boundary (or empty string) at the start and end of a word, that is, between \w and \W.
\B | Matches where \b does not, that is, at a position that is not a word boundary (for example, between two \w characters).
\A | Matches the expression to its right at the absolute start of a string whether in single or multi-line mode.
\Z | Matches the expression to its left at the absolute end of a string whether in single or multi-line mode.
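A sketch of the character classes in action on made-up strings, including how \b and \B differ on the same input.

```python
import re

s = "Order 42 shipped on 2024-05-01 to user_7"
print(re.findall(r"\d+", s))                    # ['42', '2024', '05', '01', '7']
print(re.findall(r"\w+", s))                    # ['Order', '42', 'shipped', 'on', '2024', '05', '01', 'to', 'user_7']
print(re.findall(r"\D+", "ab12cd"))             # ['ab', 'cd']
print(re.findall(r"\S+", "a  b\tc"))            # ['a', 'b', 'c']

print(len(re.findall(r"on", "on only onion")))  # 4: 'on' also occurs inside 'only' and 'onion'
print(re.findall(r"\bon\b", "on only onion"))   # ['on']: the whole word only
print(re.findall(r"\Bon", "on only onion"))     # ['on']: only the 'on' at the end of 'onion'

print(re.search(r"\A\d+", "42 apples").group()) # 42
print(re.search(r"\d+\Z", "apples 42").group()) # 42
```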
Sets
[ ] | Contains a set of characters to match.
[amk] | Matches either a, m, or k. It does not match amk.
[a-z] | Matches any lowercase letter from a to z.
[a\-z] | Matches a, -, or z. It matches - because \ escapes it.
[a-] | Matches a or -, because - is not being used to indicate a series of characters.
[-a] | As above, matches a or -.
[a-z0-9] | Matches characters from a to z and also from 0 to 9.
[(+*)] | Special characters become literal inside a set, so this matches (, +, *, and ).
[^ab5] | Adding ^ excludes any character in the set. Here, it matches characters that are not a, b, or 5.
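The set rules above, checked against a few sample strings with `re.findall`.

```python
import re

print(re.findall(r"[amk]", "make"))                # ['m', 'a', 'k']
print(re.findall(r"[a\-z]", "a-b-z"))              # ['a', '-', '-', 'z']
print(re.findall(r"[(+*)]", "(2+3)*4"))            # ['(', '+', ')', '*']
print(re.findall(r"[^ab5]", "ab5c6"))              # ['c', '6']
print(re.findall(r"[a-z0-9]+", "Hello World 42"))  # ['ello', 'orld', '42'] (uppercase H and W excluded)
```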
Groups
( ) | Matches the expression inside the parentheses and groups it.
(? ) | Inside parentheses like this, ? acts as an extension notation. Its meaning depends on the character immediately to its right.
(?P<name>AB) | Matches the expression AB, and the match can be accessed by the group name.
(?aiLmsux) | Here, a, i, L, m, s, u, and x are inline flags:
a: ASCII-only matching
i: Ignore case
L: Locale dependent (bytes patterns only)
m: Multi-line
s: Dot matches all characters, including \n
u: Unicode matching (the default for str patterns)
x: Verbose
(?:A) | Matches the expression represented by A, but unlike (?P<name>AB), it is a non-capturing group, so the match cannot be retrieved afterwards.
(?#...) | A comment. Contents are for us to read, not for matching.
A(?=B) | Lookahead assertion. This matches the expression A only if it is followed by B.
A(?!B) | Negative lookahead assertion. This matches the expression A only if it is not followed by B.
(?<=B)A | Positive lookbehind assertion. This matches the expression A only if B is immediately to its left. B must match a fixed-length string.
(?<!B)A | Negative lookbehind assertion. This matches the expression A only if B is not immediately to its left. B must match a fixed-length string.
(?P=name) | Matches the same text that was matched by an earlier group named “name”.
(...)\1 | \1 matches the exact text captured by the first group. To match the same text again, use the group's number instead of writing out the expression again; groups are numbered from 1 up to 99.
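A sketch of the group syntax on made-up strings: named groups, a non-capturing group, lookahead and lookbehind, backreferences, and an inline flag. Note that `re.findall` returns only the captured group when the pattern contains exactly one capturing group.

```python
import re

m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2024-05")
print(m.group("year"), m.group("month"))          # 2024 05

print(re.findall(r"(?:ab)+", "ababab abab"))      # ['ababab', 'abab'] (non-capturing group)
print(re.findall(r"\w+(?=\.com)", "visit example.com or test.org"))  # ['example']
print(re.findall(r"(?<=\$)\d+", "cost: $30, qty: 4"))                # ['30']

# \1 repeats the exact text captured by group 1, not the pattern
print(re.findall(r"\b(\w+) \1\b", "this is is a test test"))         # ['is', 'test']
print(re.search(r"(?P<word>\w+) (?P=word)", "go go now").group())    # go go

print(re.findall(r"(?i)python", "Python PYTHON python"))  # inline ignore-case flag
```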
Popular Python re Module Functions
re.findall(A, B) | Matches all instances of an expression A in a string B and returns them in a list.
re.search(A, B) | Matches the first instance of an expression A in a string B, and returns it as a Match object (or None if there is no match).
re.split(A, B) | Splits a string B into a list using the pattern A as the delimiter.
re.sub(A, B, C) | Replaces every match of the pattern A with B in the string C.
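The four functions side by side on one sample sentence. The phone-number pattern is a hypothetical example chosen for illustration.

```python
import re

text = "Call 555-1234 or 555-9876 today"
phone = r"\d{3}-\d{4}"                          # hypothetical phone-number pattern

print(re.findall(phone, text))                  # ['555-1234', '555-9876']
m = re.search(phone, text)
print(m.group(), m.start())                     # 555-1234 5
print(re.search(r"\d{5}", text))                # None: there is no run of five digits
print(re.split(r"[,;]\s*", "a, b;c,d"))         # ['a', 'b', 'c', 'd']
print(re.sub(phone, "XXX-XXXX", text))          # Call XXX-XXXX or XXX-XXXX today
```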
For other cheat sheets: https://www.mltut.com/data-science-cheat-sheets/