Master CHEATSHEET for DataScience.

This article is a cheat sheet for Data Science in Python. It runs from basic Python functions to more advanced ones and covers the following libraries:

Pandas
Numpy
Matplotlib
Plotly
Cufflinks
Seaborn

Besides the commands of the above libraries, I am also going to cover some commands for databases such as:

MongoDB
SQL
Cassandra

Deployment on platforms such as:

Heroku
GitHub
AWS
Azure
GCP

is also covered in this article. I am going to provide all the files through a GitHub link at the end of the post.

Python

type( ) Returns the data type of the given variable.
.lower( ) Converts the string to lowercase.
.upper( ) Converts the string to uppercase.
.find( ) Returns the index of the first occurrence of the given substring, or -1 if it is not found.
.replace(oldstring,newstring) Returns a copy of the string with every occurrence of the old string replaced by the new string.
.count( ) Counts the occurrences of the given substring in the string.
.islower( ) Returns True if all cased characters in the string are lowercase, otherwise False.
.isupper( ) Returns True if all cased characters in the string are uppercase, otherwise False.
.title( ) Capitalizes the first letter of each word in the string.
.split( ) Splits the string into a list on the given separator (whitespace by default).
.partition( ) Splits the string at the first occurrence of the given separator and returns a 3-tuple: the part before, the separator, and the part after.
.isnumeric( ) Returns True if every character in the string is numeric, otherwise False.
.isalpha( ) Returns True if every character in the string is alphabetic, otherwise False.
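
A quick sketch of the string methods above in action. The sample string and the variable names are made up for illustration:

```python
# Apply the string methods above to one sample string.
text = "Data Science with Python"

kind = type(text)                        # the str class
lowered = text.lower()                   # 'data science with python'
uppered = text.upper()                   # 'DATA SCIENCE WITH PYTHON'
position = text.find("Science")          # 5; find() returns -1 when nothing matches
missing = text.find("Java")              # -1
replaced = text.replace("Python", "R")   # 'Data Science with R'
count_a = text.count("a")                # 2
words = text.split(" ")                  # ['Data', 'Science', 'with', 'Python']
parts = text.partition(" with ")         # ('Data Science', ' with ', 'Python')
titled = "hello world".title()           # 'Hello World'

# The is...() checks look at every character, not just one of them.
all_numeric = "2024".isnumeric()         # True
mixed_alpha = "abc1".isalpha()           # False, because of the digit
```

Strings are immutable, so every method here returns a new value and leaves `text` unchanged.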

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/Basic%20Python%20Functions.ipynb

Lists

A list is written with [ ].
.append( ) Appends an item to the end of the list.
.insert(3,item) Inserts the item at the given index (here, index 3).
.pop( ) Removes and returns the last item in the list. If an index is given, the item at that index is removed and returned instead.
.remove( ) Removes the first occurrence of the item in the list.
del list[index] Deletes the item at the given index (del is a statement, not a list method).
.reverse( ) Reverses the list in place.
.sort( ) Sorts the list in place, in ascending order by default.
.index( ) Returns the index of the first occurrence of an item in the list.
.extend() Appends every item of another list (or any other iterable) to the end of the list.
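
The list commands can be traced step by step on a small list. This is a minimal sketch with illustrative values, and the comments show the list after each call:

```python
numbers = [3, 1, 2]

numbers.append(5)          # [3, 1, 2, 5]
numbers.insert(1, 4)       # [3, 4, 1, 2, 5]
last = numbers.pop()       # returns 5, list is now [3, 4, 1, 2]
second = numbers.pop(1)    # returns 4, list is now [3, 1, 2]
numbers.extend([1, 7])     # [3, 1, 2, 1, 7]
numbers.remove(1)          # only the first 1 goes: [3, 2, 1, 7]
del numbers[0]             # [2, 1, 7]
numbers.reverse()          # [7, 1, 2]
numbers.sort()             # [1, 2, 7]
where = numbers.index(7)   # 2
```

All of these except `.index( )` change the list in place, and `.append( )`, `.sort( )` and `.reverse( )` return None rather than a new list.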

Tuples and Dictionaries

A tuple is written with ( ).
A dictionary is written with { }.
Tuples are immutable, so they support only the read-only commands, such as .index( ) and .count( ). Dictionaries have their own methods, although .pop( ) also works there with a key in place of an index.
.keys( ) Returns all the keys in the dictionary.
.values( ) Returns all the values in the dictionary.
.popitem( ) Removes and returns the last inserted (key, value) pair.
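
A small sketch of tuples and dictionaries side by side. The names and values are illustrative, and the dictionary order shown assumes Python 3.7 or later, where insertion order is preserved:

```python
# Tuples are read-only: counting and searching work, modifying does not.
point = (4, 2, 4)
fours = point.count(4)            # 2
index_of_two = point.index(2)     # 1

# Dictionaries map keys to values.
ages = {"ann": 31, "bob": 27}
ages["cy"] = 45                   # add a new key/value pair
names = list(ages.keys())         # ['ann', 'bob', 'cy']
values = list(ages.values())      # [31, 27, 45]
removed = ages.popitem()          # ('cy', 45), the last inserted pair
bob_age = ages.pop("bob")         # 27, pop() takes a key for dictionaries
```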

Exception Handling

Basic Structure

try:
    # You do your operations here...
    ...
except ExceptionI:
    # If there is ExceptionI, then execute this block.
    ...
except ExceptionII:
    # If there is ExceptionII, then execute this block.
    ...
else:
    # If there is no exception, then execute this block.
    ...

finally

The finally: block always runs, whether or not an exception was raised in the try block. The syntax is:

try:
    # Code block here
    ...
    # Due to any exception, this code may be skipped!
finally:
    # This code block would always be executed.
    ...
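
To see which blocks actually run, here is a minimal sketch that puts try, except, else and finally together. The function `safe_divide` and the event lists are hypothetical helpers written for this illustration:

```python
def safe_divide(a, b, events):
    """Divide a by b and record which blocks ran in the events list."""
    try:
        result = a / b
    except ZeroDivisionError:
        events.append("except")   # runs only when b is zero
        result = None
    else:
        events.append("else")     # runs only when no exception was raised
    finally:
        events.append("finally")  # runs in both cases
    return result


events_ok = []
ok = safe_divide(6, 3, events_ok)      # 2.0, events_ok is ['else', 'finally']

events_bad = []
bad = safe_divide(1, 0, events_bad)    # None, events_bad is ['except', 'finally']
```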

Raising an exception

We can raise an exception with the raise statement. Let us check with an example:

def raise_exc(a):
    if a < 5:
        raise Exception(a)  # Once the exception is raised, the code below this line does not execute
    return a

try:
    res1 = raise_exc(7)
    print(res1)
    res = raise_exc(2)
    print(res)
except Exception as e:
    print("Error is ", e)

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/Exception%20Handling.ipynb

File Handling

f= open(file_name, access_mode, buffering) #opening a file with open() built-in function.

f.close() #closes the file

f.write( ) # writes text to the file

f.read( ) #reads text from the file

f.seek( ) #seek method is used to move the file pointer to the given offset.

f.tell( ) # tells us the position of the pointer.

f.flush( ) # flushes the internal buffer.

f.readline( ) #reads a single line from the file.

f.readlines( ) #reads all lines from the file.

f.writelines( ) #writes a sequence of strings to the file (no newlines are added for you).

f.fileno( ) #returns the integer file descriptor.

f.closed #True if the file is closed, otherwise False.

f.name #Returns the file name.

f.mode #Returns the access mode in which the file was opened.
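
The file commands fit together as a write-then-read round trip. This is a sketch that assumes a throwaway file called demo.txt in a temporary directory, both invented for the example:

```python
import os
import tempfile

# Work in a temporary directory so nothing is left behind.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")

# Write three lines, then close the file to flush them to disk.
f = open(path, "w")
f.write("first line\n")
f.writelines(["second line\n", "third line\n"])
f.close()

# Read the file back in different ways.
f = open(path, "r")
first = f.readline()       # 'first line\n'
rest = f.readlines()       # the remaining two lines as a list
f.seek(0)                  # move the pointer back to the start
everything = f.read()      # the whole file as one string
mode = f.mode              # 'r'
f.close()
is_closed = f.closed       # True

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```

In everyday code, `with open(path) as f:` is usually preferred because it closes the file even when an exception occurs.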

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/File%20Handling.ipynb

Operating System

import os #importing os.

import shutil #imports shutil module.

os.getcwd() #returns current working directory.

os.listdir() #returns list of files and folders in the directory.

os.chdir() #change directory to specified path given as argument.

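os.path.join() #joins path components into a single path using the separator of the operating system.

os.mkdir(), os.makedirs() #os.mkdir() makes a new directory at the path given to it as an argument; os.makedirs() also creates any missing parent directories.

os.rename() #renames a file or directory.

os.rmdir() #removes the given directory, which must be empty.

os.removedirs() #removes the given empty directory, then every parent directory that is left empty.

shutil.move( ) #moves files from source to destination.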

shutil.copy() #copies files from source to destination.
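
Here is a small sketch of these calls working together inside a temporary directory. The folder and file names (raw, archive, data.txt and so on) are invented for the example:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                      # scratch directory for the demo

# Build base/raw/2024 in one call; makedirs creates the missing parents.
nested = os.path.join(base, "raw", "2024")
os.makedirs(nested)

# Create a small file to copy and move around.
src = os.path.join(nested, "data.txt")
with open(src, "w") as fh:
    fh.write("x")

shutil.copy(src, os.path.join(base, "copy.txt"))    # original stays in place
shutil.move(src, os.path.join(base, "moved.txt"))   # original is gone afterwards
os.rename(os.path.join(base, "raw"), os.path.join(base, "archive"))

listing = sorted(os.listdir(base))             # ['archive', 'copy.txt', 'moved.txt']

# Removes archive/2024, then archive itself; stops at base, which is not empty.
os.removedirs(os.path.join(base, "archive", "2024"))
listing_after = sorted(os.listdir(base))       # ['copy.txt', 'moved.txt']

shutil.rmtree(base)                            # delete the scratch directory
```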

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/Operating%20System.ipynb

Logging

import logging #Importing logging python module.

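## let's create a log file named "Logger1.log" and set the severity level to "info"

logging.basicConfig(filename='Logger1.log',level=logging.INFO)

# alternative configuration that adds a timestamp to each record. basicConfig() only takes effect the first time it is called, so use one of these two calls, not both.

logging.basicConfig(filename='logger2.log',level=logging.INFO,format='%(asctime)s %(message)s')

# here the level is set to INFO, so messages at INFO and every more severe level (WARNING, ERROR, CRITICAL) are logged. %(asctime)s in the format puts a timestamp in each log record.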

logging.info("Info message being logged!!") #info log

logging.warning("Warning message being logged!!") # warning log

logging.error("Error!!") #error log

console_log = logging.StreamHandler() #stream handler displays logs in console screen.

console_log.setLevel(logging.INFO) #sets level for the stream handler.

# set a format which is simpler for console use

formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')

console_log.setFormatter(formatter) # tell the handler to use this format

logging.getLogger('').addHandler(console_log) # add the handler to the root logger
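
The effect of the level and the formatter is easier to see on a self-contained logger. The sketch below is an illustration rather than the setup above: it uses a hypothetical logger name, cheatsheet_demo, and writes to an in-memory buffer instead of the console or a file so the output can be inspected:

```python
import io
import logging

# Send records to an in-memory buffer instead of a file or the console.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setLevel(logging.INFO)
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

logger = logging.getLogger("cheatsheet_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False          # keep the demo records away from the root logger

logger.debug("Debug message")     # below INFO, so it is dropped
logger.info("Info message being logged!!")
logger.warning("Warning message being logged!!")

lines = buffer.getvalue().splitlines()
# lines[0] is 'cheatsheet_demo: INFO Info message being logged!!'
# lines[1] is 'cheatsheet_demo: WARNING Warning message being logged!!'
```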

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/Logging.ipynb

Numpy

import numpy as np #importing numpy

np.zeros((3,4)) #Create an array of zeros

np.ones((2,3,4),dtype=np.int16) #Create an array of ones

d = np.arange(10,25,5) #Create an array of evenly spaced values (step value)

np.linspace(0,2,9) #Create an array of evenly spaced values (number of samples)

e = np.full((2,2),7) #Create a constant array

f = np.eye(2) #Create a 2X2 identity matrix

np.random.random((2,2)) #Create an array with random values

np.empty((3,2)) #Create an uninitialized array (its values are arbitrary)

a.shape #Array dimensions

len(a) #Length of array

b.ndim #Number of array dimensions

e.size #Number of array elements

b.dtype #Data type of array elements

b.dtype.name #Name of data type

b.astype(int) #Convert an array to a different type

a.sum() #Array-wise sum

a.min() #Array-wise minimum value

b.max(axis=0) #Maximum value along axis 0 (one maximum per column)

b.cumsum(axis=1) #Cumulative sum of the elements along axis 1 (within each row)

a.mean() #Mean

np.median(b) #Median (arrays have no .median() method)

np.corrcoef(a) #Correlation coefficient (arrays have no .corrcoef() method)

np.std(b) #Standard deviation

h = a.view() #Create a view of the array with the same data.

np.copy(a) #Create a copy of the array

h = a.copy() #Create a copy of the array that owns its own data

a.sort() #Sort an array

c.sort(axis=0) #Sort the elements of an array's axis

i = np.transpose(b) #Permute array dimensions

i.T #Permute array dimensions

b.ravel() #Flatten the array

g.reshape(3,-1) #Reshape, but don’t change data (-1 lets NumPy infer that dimension)

h.resize((2,6)) #Change the shape of the array in place to (2,6); returns None

np.append(h,g) #Append items to an array

np.insert(a, 1, 5) #Insert items in an array

np.delete(a,[1]) #Delete items from an array

np.concatenate((a,d),axis=0) #Concatenate arrays

np.vstack((a,b)) #Stack arrays vertically (row-wise)

np.r_[e,f] #Stack arrays vertically (row-wise)

np.hstack((e,f)) #Stack arrays horizontally (column-wise)

np.column_stack((a,d)) #Create stacked column-wise arrays

np.c_[a,d] #Create stacked column-wise arrays

np.hsplit(a,3) #Split the array horizontally into 3 equal parts

np.vsplit(c,2) #Split the array vertically into 2 equal parts
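
The commands above use arrays named a, b, c and so on without defining them. The following sketch defines its own small arrays (the names and values are illustrative) and shows a few results that are easy to get wrong, such as the axis argument and the difference between a view and a copy:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)      # [[0 1 2], [3 4 5]]

col_max = grid.max(axis=0)             # [3 4 5], one maximum per column
row_cumsum = grid.cumsum(axis=1)       # [[0 1 3], [3 7 12]]
median = np.median(grid)               # 2.5
flat = grid.ravel()                    # [0 1 2 3 4 5]
wide = grid.reshape(3, -1)             # shape (3, 2), -1 is inferred

# A view shares data with the original, a copy does not.
base = np.array([1, 2, 3])
view = base.view()
copy = base.copy()
view[0] = 10                           # base is now [10, 2, 3]
copy[1] = 20                           # base is unaffected

stacked = np.vstack((base, copy))      # shape (2, 3)
side_by_side = np.hstack((base, copy)) # shape (6,)
thirds = np.hsplit(np.arange(6), 3)    # three arrays of two elements each
```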

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/Numpy.ipynb

Pandas

import pandas as pd #importing pandas

df=pd.read_csv('example_files/ex1.csv') #reads the file given as argument and saves it in the variable df.

df.head(n) #returns the first n rows of the dataframe.

df.tail(n) #returns the last n rows of the dataframe.

df.shape #tuple of (number of rows, number of columns) in the dataframe.

# Changing column names by providing names parameter

df =pd.read_csv('example_files/ex2.csv', names=['asdfdsfs','fsdf', 'b', 'c', 'sudh', 'message'])

df #reading data frame

# Use of index_col to use as the row labels/indexes of the DataFrame

df = pd.read_csv('example_files/csv_mindex.csv',index_col=['key1', 'key2'])

pd.isnull(df) # Returns a dataframe of booleans marking which values are null

df = pd.read_excel('example_files/Store_Sales_Data.xlsx',sheet_name = 'Returns') # Name of sheet to read from for excel

pd.options.display.max_rows = 10 # Setting max row as 10 to be display

df.to_csv('example_files/out.csv') # Saving data to csv using to_csv

df.dtypes # Printing data types of all columns of a dataset

df.describe() #describes about the dataset.

df.dtypes[df.dtypes == "object"].index # Index of those columns whose data type is object

df[["column name"]][4:9] # Rows from 4 to 8 both included

df.columns # .columns will give the columns present in the dataset

df["sudh"]="sdffs" # Adding new column in the dataset

df["column name"][0:15] #check first 15 entries in column name

pd.Categorical(df['column name']) # pd.Categorical turns the given column into a categorical with a fixed set of categories

df['column name'].unique() #check unique entries in the given column name

df.sample(n=10) #Randomly select n rows.

df.nlargest(n, 'value') #Select and order top n entries by the 'value' column

Use df.loc[] and df.iloc[] to select only rows, only columns or both.

Use df.at[] and df.iat[] to access a single value by row and column.

df.sort_values(by="col") #sorts the rows by the values of column "col" (low to high by default)

df.groupby(by="col") #Return a GroupBy object, grouped by values in column named "col"

df.groupby(level="ind") #Return a GroupBy object, grouped by values in index level named "ind".
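
Since the example CSV files are not included here, the sketch below builds a tiny dataframe in memory to try out the selection, sorting and grouping commands. The column names city and sales and all values are invented for the illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Goa"],
    "sales": [250, 400, 150, 300],
})

shape = df.shape                                   # (4, 2)
top2 = df.nlargest(2, "sales")                     # rows with sales 400 and 300
first_city = df.loc[0, "city"]                     # label-based access: 'Pune'
cell = df.iat[1, 1]                                # position-based access: 400
by_sales = df.sort_values(by="sales")              # ascending by default
totals = df.groupby(by="city")["sales"].sum()      # total sales per city
cities = df["city"].unique()                       # unique values in order of appearance
```

`groupby` on its own only returns a GroupBy object; an aggregation such as `.sum()` is what produces the per-group result.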

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/Pandas.ipynb

Regular Expressions

Special Characters

^ | Matches the expression to its right at the start of a string. In multi-line mode, it also matches at the start of each line, that is, right after each \n in the string.

$ | Matches the expression to its left at the end of a string. In multi-line mode, it also matches at the end of each line, that is, right before each \n in the string.

. | Matches any character except line terminators like \n.

\ | Escapes special characters or denotes character classes.

A|B | Matches expression A or B. If A is matched first, B is left untried.

+ | Greedily matches the expression to its left 1 or more times.

* | Greedily matches the expression to its left 0 or more times.

? | Greedily matches the expression to its left 0 or 1 times. But if ? is added to a quantifier (+, *, and ? itself), it makes that quantifier match in a non-greedy manner.

{m} | Matches the expression to its left exactly m times.

{m,n} | Matches the expression to its left m to n times, greedily taking as many repetitions as possible.

{m,n}? | Matches the expression to its left m to n times, but non-greedily, taking as few repetitions as possible. See ? above.
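
Greedy versus non-greedy matching is easiest to understand by comparing outputs. A minimal sketch with made-up sample strings:

```python
import re

html = "<b>bold</b><i>italic</i>"

# Greedy: .+ grabs as much as it can, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)       # ['<b>bold</b><i>italic</i>']

# Non-greedy: .+? stops at the first closing bracket.
lazy = re.findall(r"<.+?>", html)        # ['<b>', '</b>', '<i>', '</i>']

digits = "12 345 6789"
exact = re.findall(r"\d{3}", digits)           # ['345', '678']
ranged = re.findall(r"\d{2,3}", digits)        # ['12', '345', '678']
lazy_ranged = re.findall(r"\d{2,3}?", digits)  # ['12', '34', '67', '89']

# ^ matches only at the very start unless multi-line mode is on.
lines = "one two\nthree four"
start_only = re.findall(r"^\w+", lines)                   # ['one']
each_line = re.findall(r"^\w+", lines, flags=re.MULTILINE)  # ['one', 'three']
```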

Character Classes (a.k.a. Special Sequences)

\w | Matches alphanumeric characters, which means a-z, A-Z, and 0-9. It also matches the underscore, _.

\d | Matches digits, which means 0-9.

\D | Matches any non-digits.

\s | Matches whitespace characters, which include the \t, \n, \r, and space characters.

\S | Matches non-whitespace characters.

\b | Matches the boundary (or empty string) at the start and end of a word, that is, between \w and \W.

\B | Matches where \b does not, that is, at any position that is not a word boundary (for example, between two \w characters).

\A | Matches the expression to its right at the absolute start of a string whether in single or multi-line mode.

\Z | Matches the expression to its left at the absolute end of a string whether in single or multi-line mode.
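
The difference between \b and \B shows up clearly when the same letters appear both as a whole word and inside a longer word. A short sketch with an invented sample string:

```python
import re

sample = "cat concatenate"

# \b requires a word boundary on both sides, so only the standalone word matches.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", sample)]   # [0]

# \B requires the opposite, so only the 'cat' buried in 'concatenate' matches.
inside_word = [m.start() for m in re.finditer(r"\Bcat\B", sample)]  # [7]

numbers = re.findall(r"\d+", "Room 42, floor 3")       # ['42', '3']
non_digits = re.findall(r"\D+", "a1b22c")              # ['a', 'b', 'c']
tokens = re.split(r"\s+", "tab\tnewline\nspace end")   # ['tab', 'newline', 'space', 'end']
```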

Sets

[ ] | Contains a set of characters to match.

[amk] | Matches either a, m, or k. It does not match amk.

[a-z] | Matches any lowercase letter from a to z.

[a\-z] | Matches a, -, or z. It matches - because \ escapes it.

[a-] | Matches a or -, because - is not being used to indicate a series of characters.

[-a] | As above, matches a or -.

[a-z0-9] | Matches characters from a to z and also from 0 to 9.

[(+*)] | Special characters become literal inside a set, so this matches (, +, *, and ).

[^ab5] | Adding ^ excludes any character in the set. Here, it matches characters that are not a, b, or 5.
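
A few of the sets above, tried on short made-up strings:

```python
import re

any_of = re.findall(r"[amk]", "amok")          # ['a', 'm', 'k'], one character at a time
escaped_dash = re.findall(r"[a\-z]", "a-z b")  # ['a', '-', 'z']
ranges = re.findall(r"[a-z0-9]", "aZ9!")       # ['a', '9']
literals = re.findall(r"[(+*)]", "(1+2)*3")    # ['(', '+', ')', '*']
negated = re.findall(r"[^ab5]", "ab5c6")       # ['c', '6']
```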

Groups

( ) | Matches the expression inside the parentheses and groups it.

(? ) | Inside parentheses like this, ? acts as an extension notation. Its meaning depends on the character immediately to its right.

(?P<name>AB) | Matches the expression AB and groups it under the name "name", so it can be accessed with the group name.

(?aiLmsux) | Here, a, i, L, m, s, u, and x are flags:

a: Matches ASCII only

i: Ignore case

L: Locale dependent

m: Multi-line

s: Dot matches all characters, including \n

u: Matches Unicode

x: Verbose

(?:A) | Matches the expression as represented by A, but unlike a named group such as (?P<name>AB), it cannot be retrieved afterwards.

(?#...) | A comment. Contents are for us to read, not for matching.

A(?=B) | Lookahead assertion. This matches the expression A only if it is followed by B.

A(?!B) | Negative lookahead assertion. This matches the expression A only if it is not followed by B.

(?<=B)A | Positive lookbehind assertion. This matches the expression A only if B is immediately to its left. B must be a fixed-length expression.

(?<!B)A | Negative lookbehind assertion. This matches the expression A only if B is not immediately to its left. B must be a fixed-length expression.

(?P=name) | Matches the expression matched by an earlier group named “name”.

(...)\1 | The number 1 refers to the first group in the pattern. To match the same text that a group matched again, simply use its number instead of writing out the whole expression again. We can use from 1 up to 99 such groups and their corresponding numbers.
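
Named groups, lookarounds and backreferences are easier to follow with concrete inputs. The sketch below uses invented sample strings and group names (year, month):

```python
import re

# Named groups: retrieve parts of the match by name.
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2021-07-15")
year = match.group("year")      # '2021'
month = match.group("month")    # '07'

prices = "10 USD, 20 EUR, 30 USD"

# Lookahead: the currency is checked but is not part of the match.
usd = re.findall(r"\b\d+\b(?= USD)", prices)       # ['10', '30']
not_usd = re.findall(r"\b\d+\b(?! USD)", prices)   # ['20']

# Lookbehind: only numbers directly preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", "cost $40 or 50")   # ['40']

# Backreference: \1 must repeat whatever group 1 matched.
doubled = re.findall(r"\b(\w+) \1\b", "it is is a test test")   # ['is', 'test']

# Named backreference: the closing quote must equal the opening one.
quoted = re.search(r"(?P<q>['\"]).*?(?P=q)", "say 'hi' now").group(0)   # "'hi'"
```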

Popular Python re Module Functions

re.findall(A, B) | Matches all instances of an expression A in a string B and returns them in a list.

re.search(A, B) | Matches the first instance of an expression A in a string B, and returns it as a re match object.

re.split(A, B) | Split a string B into a list using the delimiter A.

re.sub(A, B, C) | Replace A with B in the string C.
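
The four functions side by side on one made-up string:

```python
import re

text = "apple, banana;cherry"

found = re.findall(r"[a-z]+", text)         # ['apple', 'banana', 'cherry']

first = re.search(r"b\w+", text)            # a match object, or None if nothing matches
first_text = first.group(0)                 # 'banana'
first_start = first.start()                 # 7

pieces = re.split(r"[,;]\s*", text)         # ['apple', 'banana', 'cherry']
joined = re.sub(r"[,;]\s*", " | ", text)    # 'apple | banana | cherry'
```

Because `re.search` returns None when there is no match, it is worth checking the result before calling `.group()` on it.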

URL:https://github.com/Pa1Dasari/Master-CHEATSHEETS-for-DataScience./blob/main/Pandas.ipynb

For other cheat sheets: https://www.mltut.com/data-science-cheat-sheets/
