Python dictionaries and Pandas DataFrames are among the most frequently used data structures for working with data. The Pandas DataFrame is a popular standard data structure for working with tabular data in advanced data analysis. In this article, we will get hands-on practice with how to

- create,
- manipulate,
  - select,
  - add,
  - update,
  - delete

data in dictionaries and dataframes.

List

First, let's talk about the basic Python data type: list. Imagine that we work for the World Bank and want to keep track of the population of each country.

Let's say we have 2021 population data of each country:

India(1,393,409,030),
Burma(54,806,010),
Thailand(69,950,840),
Singapore(5,453,570), and so on.

These data are based on Population Data | The World Bank.

To keep track of which population belongs to which country, we create two lists as follows, with the names of the countries in the same order as the populations.

# lists
countries = ['India', 'Burma', 'Thailand', 'Singapore']
populations = [1393409030, 54806010, 69950840, 5453570]

Now suppose that we want to get the population of Burma. First, we have to figure out where in the list Burma is, so that we can use this position to get the correct population. We will use the method index() to get the index.

burma_index = countries.index('Burma')
print(burma_index)

Output:
1

We get 1 as the index of 'Burma' because Python list indices start at 0. Now, we can use this index to subset the populations list, to get the population corresponding to Burma.

print(populations[burma_index])

output:
54806010

As expected, we get 54806010, the population of Burma.

Motivation for Dictionaries

So we have two lists, and used the index to connect corresponding elements in both lists. It worked, but it's a pretty terrible approach: it's not convenient and not intuitive. Wouldn't it be easier if we had a way to connect each country directly to its population, without using an index?

Dictionary

This is where the "dictionary" comes into play. Let's convert this population data to a dictionary. To create the dictionary, we need curly brackets. Next, inside the curly brackets, we have a bunch of what are called key:value pairs.

my_dict = {
   "key1":"value1",
   "key2":"value2",
}

In our case,

the keys are the country names, and
the values are the corresponding populations.

The first key is India, and its corresponding value is 1,393,409,030. Notice the colon that separates the key and value here. Let's do the same thing for the three other key-value pairs, and store the dictionary under the name country_population.

country_population = {'India':1393409030, 'Burma':54806010, 'Thailand':69950840, 'Singapore':5453570}

If we want to find the population for Burma, we simply type country_population, and then the string "Burma" inside square brackets.

print(country_population["Burma"])

output:
54806010

In other words, we pass the key in square brackets, and we get the corresponding value. This approach is not only intuitive, it's also very efficient, because Python can make the lookup of these keys very fast, even for huge dictionaries.
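If the data already lives in two parallel lists, as in the List section, one possible way to build the dictionary is to pair the lists up with the built-in zip() function. This is only a small sketch; the name population_lookup is made up for this illustration.

```python
# Parallel lists from the List section
countries = ['India', 'Burma', 'Thailand', 'Singapore']
populations = [1393409030, 54806010, 69950840, 5453570]

# zip() pairs the i-th country with the i-th population;
# dict() turns those pairs into key:value entries
population_lookup = dict(zip(countries, populations))

# Look up by key instead of by position
print(population_lookup['Burma'])
```

This only works as intended when both lists are in the same order, which is exactly the assumption the two-list approach relied on.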

Create a Dictionary

We will create a dictionary of countries and capitals data where the country names are the keys and the capitals are the corresponding values.

With the strings in countries and capitals, create a dictionary called asia with 4 key:value pairs. Beware of capitalization! Strings in the code are case-sensitive.
Print out asia to see if the result is what we expected.

# From string in countries and capitals, create dictionary called asia
asia = {'India':'New Delhi', 'Burma':'Yangon', 'Thailand':'Bangkok', 'Singapore':'Singapore'}

# Print 
print(asia)
# Print type of asia
print(type(asia))

output:
{'India': 'New Delhi', 'Burma': 'Yangon', 'Thailand': 'Bangkok', 'Singapore': 'Singapore'}
<class 'dict'>

Great! <class 'dict'> means that asia is a dictionary. Classes are outside this article's scope and will be explained in another article that focuses on them. Now that we've built our first dictionary, let's see how to work with it.

Manipulating a Dictionary

If the keys of a dictionary are chosen wisely, accessing the values in a dictionary is easy and intuitive. For example, to get the capital for India from asia we can use India as the key.

print(asia['India'])

output:
New Delhi

We can check out which keys are in asia by calling the keys() method on asia.

# Print out the keys in asia
print(asia.keys())

# Print out value that belongs to key 'Burma'
print(asia['Burma'])

output:
dict_keys(['India', 'Burma', 'Thailand', 'Singapore'])
Yangon

Earlier, we created the dictionary country_population, which is basically a set of key:value pairs. We could easily access the population of Burma by passing the key in square brackets, like this.

country_population = {'India':1393409030, 'Burma':54806010, 'Thailand':69950840, 'Singapore':5453570}
print(country_population['Burma'])

output:
54806010

Note: For this lookup to work properly, the keys in a dictionary should be unique.

If we try to add another key:value pair to country_population with the same key, Burma, for example,

country_population = {'India':1393409030, 'Burma':54806010, 'Thailand':69950840, 'Singapore':5453570, 'Burma':54800000,}

we'll see that the resulting country_population dictionary still contains four pairs. The key 'Burma' stays in its original position, but the last value that we specified for it in the curly brackets (54800000) is the one kept in the resulting dictionary.

country_population = {'India':1393409030, 'Burma':54806010, 'Thailand':69950840, 'Singapore':5453570, 'Burma':54800000,}
print(country_population)

output:
{'India': 1393409030, 'Burma': 54800000, 'Thailand': 69950840, 'Singapore': 5453570}

Next, let's see how we can add more data to a dictionary that already exists.

Add data to a Dictionary

Our country_population dictionary currently does not have China's data. We want to add "China":1412360000 to country_population.

# Before adding China data
country_population = {'India':1393409030, 'Burma':54806010, 'Thailand':69950840, 'Singapore':5453570}
print(country_population)

output:
{'India': 1393409030, 'Burma': 54806010, 'Thailand': 69950840, 'Singapore': 5453570}

To add this information, simply write the key "China" in square brackets and assign population 1412360000 to it with the equals sign.

# After adding China data
country_population["China"] = 1412360000
print(country_population)

output:
{'India': 1393409030, 'Burma': 54806010, 'Thailand': 69950840, 'Singapore': 5453570, 'China': 1412360000}

Now if we check out country_population again, China is indeed in there. To check this with code, we can also write 'China' in country_population, which gives us True if the key China is in there. Note that 'China' is a string, so the check is case-sensitive.

print('China' in country_population)

output:
True
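To make the case-sensitivity point concrete, here is a small sketch. It also uses the dictionary method get(), which is not covered above: it returns a fallback value (None by default) instead of raising a KeyError when the key is missing. The variable names are chosen just for this example.

```python
country_population = {'India': 1393409030, 'Burma': 54806010, 'China': 1412360000}

# Keys are case-sensitive: 'china' and 'China' are different keys
has_lowercase = 'china' in country_population
has_capitalized = 'China' in country_population
print(has_lowercase, has_capitalized)

# get() returns None for a missing key instead of raising KeyError
missing = country_population.get('china')
found = country_population.get('China')
print(missing, found)
```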

Update data in a Dictionary

With the syntax dict_name[key]=value, we can also change values, for example, to update the population of China to 1412000000. Because each key in a dictionary is unique, Python knows that we're not trying to create a new pair, but want to update the pair that's already in there.

country_population["China"] = 1412000000
print(country_population)

output:
{'India': 1393409030, 'Burma': 54806010, 'Thailand': 69950840, 'Singapore': 5453570, 'China': 1412000000}

Delete data from a Dictionary

Suppose now that we want to remove China from the dictionary. We can do this with del, again pointing to China inside square brackets. If we print country_population again, China is no longer in our dictionary.

del country_population['China']
print(country_population)

output:
{'India': 1393409030, 'Burma': 54806010, 'Thailand': 69950840, 'Singapore': 5453570}
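As a side note, del raises a KeyError if the key is not in the dictionary. A possible alternative, sketched below, is the dictionary method pop(), which removes the key and returns its value, and accepts a default for the case where the key is already gone. The variable names removed and removed_again are only for illustration.

```python
country_population = {'India': 1393409030, 'Burma': 54806010, 'China': 1412360000}

# pop() removes the key and hands back its value
removed = country_population.pop('China')
print(removed)

# 'China' is gone now; with a default, pop() returns it instead of raising KeyError
removed_again = country_population.pop('China', None)
print(removed_again)
```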

List vs Dictionary

Using lists and dictionaries is pretty similar. We can select, update and remove values with square brackets. There are some big differences, though. The list is a sequence of values that are indexed by a range of numbers. The dictionary, on the other hand, is indexed by unique keys.

	List	Dictionary
Select, update, remove	use `[]`	use `[]`
Indexed by	range of numbers	unique keys
Use when	we have a collection of values, order matters, and we select entire subsets	we need a lookup table with unique keys

When to use which one? Well, if we have a collection of values where the order matters, and we want to easily select entire subsets of data, we'll want to go with a list.

If, on the other hand, we need some sort of look up table, where looking for data should be fast and where we can specify unique keys, a dictionary is the preferred option.
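The contrast can be sketched in a few lines, reusing the data from earlier sections (the names first_two and burma are only for this example):

```python
# List: order matters, and a slice returns a whole subset by position
populations = [1393409030, 54806010, 69950840, 5453570]
first_two = populations[:2]
print(first_two)

# Dictionary: a lookup table, accessed by unique key rather than by position
country_population = {'India': 1393409030, 'Burma': 54806010}
burma = country_population['Burma']
print(burma)
```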

Nested Dictionaries

Remember lists? They could contain anything, even other lists. Well, for dictionaries the same holds. Dictionaries can contain key:value pairs where the values are again dictionaries.

As an example, have a look at the following code, which defines another version of asia, the dictionary we've been working with all along. The keys are still the country names, but the values are dictionaries that contain more information than just the capital.

# Dictionary of dictionaries
asia = {'India': {'capital':'New Delhi', 'population':1393409030},
        'Burma': {'capital':'Yangon', 'population':54806010},
        'Thailand': {'capital':'Bangkok', 'population':69950840},
        'Singapore': {'capital':'Singapore', 'population':5453570},
        }

It's perfectly possible to chain square brackets to select elements. To fetch the population for Burma from asia,

print(asia['Burma']['population'])

output:
54806010

Use chained square brackets to select and print out the capital of Burma.

# Print out the capital of Burma
print(asia['Burma']['capital'])

output:
Yangon
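The add and select operations from the earlier sections carry over to nested dictionaries. The sketch below uses a two-country version of asia to keep it short, adds a new inner dictionary, and then loops over the outer dictionary with items(); the name capitals is made up for this example.

```python
asia = {'India': {'capital': 'New Delhi', 'population': 1393409030},
        'Burma': {'capital': 'Yangon', 'population': 54806010}}

# Adding a country means assigning a whole inner dictionary to a new key
asia['China'] = {'capital': 'Beijing', 'population': 1412360000}

# items() yields (key, value) pairs; here each value is an inner dictionary
capitals = {}
for country, info in asia.items():
    capitals[country] = info['capital']

print(capitals)
```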

Great! It's time to learn about a new data structure!

Tabular dataset examples

As data scientists, we'll often be working with tons of data. The form of this data can vary greatly, but we can usually boil it down to a tabular structure, that is, a table like in a spreadsheet. Let's have a look at some examples.

Suppose we're working in a chemical factory and have a ton of temperature measurements to analyze. This data can come in the following form:

temperature	measured at	location
76	2021-03-01 12:00:01	chamber 1
86	2021-03-01 12:00:01	chamber 2
72	2021-03-01 12:00:01	chamber 1
88	2021-03-01 12:00:01	chamber 2

every row is a measurement, or an observation, and
columns are different variables.

For each measurement, there is the temperature, but also the date and time of the measurement, and the location.

Another example: we have information on India, Burma, Thailand and so on. We can again build a table with this data.

Country	Capital	Population
India	New Delhi	1393409030
Burma	Yangon	54806010
Thailand	Bangkok	69950840
Singapore	Singapore	5453570
China	Beijing	1412360000

Each row is an observation and represents a country. Each observation has the same variables: the country name, the capital and the population.

Datasets in Python

To start working on this data in Python, we'll need some kind of rectangular data structure. How about the 2D NumPy array? Well, it's an option, but not necessarily the best one. There are different data types and NumPy arrays are not great at handling these.

Datasets containing different data types

In the above data, the country and capital are strings while the population is an integer. Our datasets will typically comprise different data types, so we need a tool that's better suited. To easily and efficiently handle this data, there's the Pandas package.
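To see why a plain NumPy array is an awkward fit, here is a minimal sketch that puts one country's row into an array. Because a NumPy array holds a single data type, the number is converted to text. The variable name row is only for illustration.

```python
import numpy as np

# One row of the table: two strings and one integer
row = np.array(['India', 'New Delhi', 1393409030])

# The array has a single string dtype, so the population is now text
print(row.dtype)
print(repr(row[2]))
```

Arithmetic on the population would no longer work without converting it back to a number first.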

Pandas

Pandas is

an open source library,
built on the NumPy package,
equipped with easy-to-use data structures, and
a high-level data manipulation tool,

making it very interesting for data scientists all over the world. In pandas, we store the tabular data in an object called a DataFrame. Have a look at the Pandas DataFrame version of the data:

DataFrame

	Country	Capital	Population
IND	India	New Delhi	1393409030
MMR	Myanmar	Yangon	54806010
THA	Thailand	Bangkok	69950840
SGP	Singapore	Singapore	5453570
CHN	China	Beijing	1412360000

The rows represent the observations, and the columns represent the variables. Also notice that each row has a unique row label: IND for India, MMR for Myanmar, and so on. The columns, or variables, also have labels: country, capital, and so on. Notice that the values in the different columns have different types. But how can we create this DataFrame in the first place? Well, there are different ways.

Create a DataFrame from Dictionary

First of all, we can build it manually, starting from a dictionary. Using the distinctive curly brackets, we create key value pairs. The keys are the column labels, and the values are the corresponding columns, in list form.

asia_dict = {
    'country':['India', 'Myanmar', 'Thailand', 'Singapore', 'China'],
    'capital':['New Delhi', 'Yangon', 'Bangkok', 'Singapore', 'Beijing'],
    'population':[1393409030,54806010,69950840, 5453570, 1412360000]
}

After importing the pandas package as pd, we can create a DataFrame from the dictionary using pd.DataFrame.

import pandas as pd
asia_df = pd.DataFrame(asia_dict)
print(type(asia_df))
print(asia_df)

output:
<class 'pandas.core.frame.DataFrame'>
     country    capital  population
0      India  New Delhi  1393409030
1    Myanmar     Yangon    54806010
2   Thailand    Bangkok    69950840
3  Singapore  Singapore     5453570
4      China    Beijing  1412360000

If we check out asia_df now, we see that Pandas assigned some automatic row labels, 0 up to 4. To specify them manually, we can set the index attribute of asia_df to a list with the correct labels.

asia_df.index = ['IND', 'MMR', 'THA', 'SGP', 'CHN']
print(asia_df)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000

The resulting asia_df DataFrame is the same one as we saw before. Using a dictionary approach is fine, but what if we're working with tons of data, which is typically the case as a data scientist? Well, we won't build the DataFrame manually. Instead, we import data from an external file that contains all this data.
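As an aside on the manual approach: instead of setting the index attribute afterwards, the row labels can also be passed when the DataFrame is created, through the index parameter of pd.DataFrame. A short sketch with the same data:

```python
import pandas as pd

asia_dict = {
    'country': ['India', 'Myanmar', 'Thailand', 'Singapore', 'China'],
    'capital': ['New Delhi', 'Yangon', 'Bangkok', 'Singapore', 'Beijing'],
    'population': [1393409030, 54806010, 69950840, 5453570, 1412360000],
}

# index= sets the row labels at construction time;
# it needs one label per row
asia_df = pd.DataFrame(asia_dict, index=['IND', 'MMR', 'THA', 'SGP', 'CHN'])
print(asia_df)
```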

Create a DataFrame from CSV file

Suppose the countries' data that we used before comes in the form of a CSV file called countries.csv. CSV is short for comma-separated values. The countries.csv file used in this article can be downloaded at this link.

Let's try to import this data using Pandas read_csv function. We pass the path to the csv file as an argument.

# Forward slashes avoid '\t' being read as a tab character in the path
countries = pd.read_csv('path/to/countries.csv')
print(countries)

output:
  Unnamed: 0    country    capital  population
0        IND      India  New Delhi  1393409030
1        MMR    Myanmar     Yangon    54806010
2        THA   Thailand    Bangkok    69950840
3        SGP  Singapore  Singapore     5453570
4        CHN      China    Beijing  1412360000

If we print countries, there's still something wrong. The row labels are seen as a column. To solve this, we'll have to tell the read_csv function that the first column contains the row indexes. We do this by setting the index_col argument, like this.

countries = pd.read_csv('path/to/countries.csv', index_col=0)
print(countries)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000

This time countries nicely contains the row and column labels. The read_csv function features many more arguments that allow us to customize our data importing. Check out its documentation for more details.
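To try read_csv without downloading anything, the sketch below feeds it CSV text held in memory through io.StringIO. The text is an assumption about what countries.csv looks like (an empty first header cell followed by the three column names), shortened to two rows; the real file may differ.

```python
import io
import pandas as pd

# Assumed stand-in for countries.csv; the first header cell is empty
csv_text = """,country,capital,population
IND,India,New Delhi,1393409030
MMR,Myanmar,Yangon,54806010
"""

# read_csv accepts a file-like object as well as a path
countries = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(countries)

# Each column keeps its own data type
print(countries.dtypes)
```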

Indexing and selecting data in DataFrames

Indexing makes it easy to access columns, rows and single elements in our DataFrame. There are numerous ways in which we can index and select data from DataFrames. We're going to see how to use

- square brackets [],
- advanced data access methods,
  - loc and
  - iloc,

that make Pandas extra powerful.

Access data using square brackets [ ]

Suppose that we only want to select the country column from countries. How to do this with square brackets? Well, we type countries, and then the column label inside square brackets. Python prints out the entire column, together with the row labels.

print(countries['country'])

output:
IND        India
MMR      Myanmar
THA     Thailand
SGP    Singapore
CHN        China
Name: country, dtype: object

But there's something strange here. The last line says Name: country, dtype: object. We're clearly not dealing with a regular DataFrame here. Let's find out about the type of the object that gets returned, with the type function as follows.

print(type(countries['country']))

output:
<class 'pandas.core.series.Series'>

So we're dealing with a Pandas Series here. In a simplified sense, we can think of the Series as a 1-dimensional array that can be labeled, just like the DataFrame. If we put together a bunch of Series, we can create a DataFrame.

If we want to select the country column but keep the data in a DataFrame, we'll need double square brackets, like this.

print(countries[['country']])

output:
       country
IND      India
MMR    Myanmar
THA   Thailand
SGP  Singapore
CHN      China

If we check out the type of this result, we will see it is DataFrame type.

print(type(countries[['country']]))

output:
<class 'pandas.core.frame.DataFrame'>

Note that the single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.

We can easily extend this call to select two columns, country and capital, for example. If we look at it from a different angle, we're actually putting a list with column labels inside another set of square brackets, and end up with a sub DataFrame, containing only the country and capital columns.

print(countries[['country', 'capital']])

output:
       country    capital
IND      India  New Delhi
MMR    Myanmar     Yangon
THA   Thailand    Bangkok
SGP  Singapore  Singapore
CHN      China    Beijing

We can also use the same square brackets to select rows from a DataFrame. The way to do it is by specifying a slice. To get the second and third rows of countries, we use the slice 1:3. Remember that the end of the slice is exclusive and that the index starts at zero.

print(countries[1:3])

output:
      country  capital  population
MMR   Myanmar   Yangon    54806010
THA  Thailand  Bangkok    69950840

These square brackets work, but they offer only limited functionality. Ideally, we'd want something similar to 2D NumPy arrays, where we can select by row and column at the same time.

To do a similar thing with Pandas, we have 2 ways.

loc is label-based, which means that we have to specify rows and columns based on their row and column labels.
iloc is integer index based, which means that we have to specify rows and columns by their integer index.

Let's start with loc first.

Access data using `loc`

Let's have another look at the countries DataFrame, and try to get the row for Myanmar. We put the label of the row of interest in square brackets after loc.

print(countries.loc['MMR'])

output:
country        Myanmar
capital         Yangon
population    54806010
Name: MMR, dtype: object

We get a Pandas Series, containing all the row's information, rather inconveniently shown on different lines.

To get a DataFrame, we have to put the 'MMR' string inside another pair of brackets.

print(countries.loc[['MMR']])

output:
     country capital  population
MMR  Myanmar  Yangon    54806010

Selecting Rows using `loc`

We can also select multiple rows at the same time. Suppose we want to also include India and Thailand. Simply add some more row labels to the list.

print(countries.loc[['MMR', 'IND', 'THA']])

output:
      country    capital  population
MMR   Myanmar     Yangon    54806010
IND     India  New Delhi  1393409030
THA  Thailand    Bangkok    69950840

So far we have only selected entire rows, which is something we could also do with the basic square brackets. The difference here is that we can extend our selection with a comma and a specification of the columns of interest.

Selecting Rows & Columns using `loc`

Let's extend the previous call to only include the country and capital columns. We add a comma, and a list of column labels we want to keep.

print(countries.loc[['MMR', 'IND', 'THA'], ['country', 'capital']])

output:
      country    capital
MMR   Myanmar     Yangon
IND     India  New Delhi
THA  Thailand    Bangkok

The intersection gets returned.

Selecting Columns using `loc`

We can also use loc to select all rows but only specific columns. Simply replace the first list that specifies the row labels with a colon, a slice going from beginning to end.

print(countries.loc[:, ['country', 'capital']])

output:
       country    capital
IND      India  New Delhi
MMR    Myanmar     Yangon
THA   Thailand    Bangkok
SGP  Singapore  Singapore
CHN      China    Beijing

This time, the result contains all rows, but only two columns.

So, let's take a step back. Simple square brackets countries[['country', 'capital']] work fine if we want to get columns. To get rows, we can use slicing countries[1:4].

row access: countries[1:4]
column access: countries[['country', 'capital']]

The loc indexer is more versatile: we can select rows, columns, or rows and columns at the same time. When we use loc, subsetting becomes remarkably simple.

row access: countries.loc[['MMR', 'IND', 'THA']]
column access: countries.loc[:, ['country', 'capital']]
row and column access: countries.loc[['MMR', 'IND', 'THA'], ['country', 'capital']]

The only difference is that we use labels with loc, not the positions of the elements. If we want to subset Pandas DataFrames based on position, or integer index, we'll need iloc.
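One detail worth knowing when moving between labels and positions is how slices behave: a label slice with loc includes its end label, while a position slice with iloc excludes its end position, just like a Python list. The sketch below rebuilds a small countries DataFrame so that it runs on its own; by_label and by_position are names chosen for this example.

```python
import pandas as pd

countries = pd.DataFrame(
    {'country': ['India', 'Myanmar', 'Thailand', 'Singapore'],
     'capital': ['New Delhi', 'Yangon', 'Bangkok', 'Singapore']},
    index=['IND', 'MMR', 'THA', 'SGP'],
)

# Label slice: both 'IND' and 'THA' are included
by_label = countries.loc['IND':'THA']

# Position slice: position 2 is excluded, as with Python lists
by_position = countries.iloc[0:2]

print(len(by_label), len(by_position))
```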

Access data using `iloc`

In loc, you use the 'MMR' string in double square brackets, to get a DataFrame, like this.

print(countries.loc[['MMR']])

output:
     country capital  population
MMR  Myanmar  Yangon    54806010

In iloc, we use the index 1 instead of MMR. The results are exactly the same.

# return Series type
print(countries.iloc[1])

output:
country        Myanmar
capital         Yangon
population    54806010
Name: MMR, dtype: object

# return DataFrame type
print(countries.iloc[[1]])

output:
     country capital  population
MMR  Myanmar  Yangon    54806010

Selecting Rows using `iloc`

To get the rows for Myanmar, India and Thailand, the code is like this when using loc,

print(countries.loc[['MMR', 'IND', 'THA']])

output:
      country    capital  population
MMR   Myanmar     Yangon    54806010
IND     India  New Delhi  1393409030
THA  Thailand    Bangkok    69950840

We can now use a list of integer positions (in the order we want) to get the same result with iloc.

print(countries.iloc[[1,0,2]])

output:
      country    capital  population
MMR   Myanmar     Yangon    54806010
IND     India  New Delhi  1393409030
THA  Thailand    Bangkok    69950840

Selecting Rows & Columns using `iloc`

To only keep the country and capital column, which we did as follows with loc,

print(countries.loc[['IND', 'MMR', 'THA'],['country', 'capital']])

output:
      country    capital
IND     India  New Delhi
MMR   Myanmar     Yangon
THA  Thailand    Bangkok

we put the indexes 0 and 1 in a list after the comma, referring to the country and capital column when using iloc.

print(countries.iloc[[0, 1, 2], [0, 1]])

output:
      country    capital
IND     India  New Delhi
MMR   Myanmar     Yangon
THA  Thailand    Bangkok

Selecting Columns using `iloc`

Finally, you can keep all rows and keep only the country and capital column in a similar fashion. With loc, this is how it's done.

print(countries.loc[:,['country', 'capital']])

output:
       country    capital
IND      India  New Delhi
MMR    Myanmar     Yangon
THA   Thailand    Bangkok
SGP  Singapore  Singapore
CHN      China    Beijing

For iloc, it's like this.

print(countries.iloc[:,[0,1]])

output:
       country    capital
IND      India  New Delhi
MMR    Myanmar     Yangon
THA   Thailand    Bangkok
SGP  Singapore  Singapore
CHN      China    Beijing

loc and iloc are pretty similar; the only difference is how we refer to columns and rows. We aced indexing and selecting data from Pandas DataFrames!
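As a quick check that the two really select the same data, the sketch below compares a loc selection with its iloc counterpart using the DataFrame method equals(), and also reads a single cell both ways. The small countries DataFrame is rebuilt here so the example is self-contained.

```python
import pandas as pd

countries = pd.DataFrame(
    {'country': ['India', 'Myanmar', 'Thailand'],
     'capital': ['New Delhi', 'Yangon', 'Bangkok'],
     'population': [1393409030, 54806010, 69950840]},
    index=['IND', 'MMR', 'THA'],
)

# Same rows and columns, once by label and once by position
by_label = countries.loc[['MMR', 'IND'], ['country', 'capital']]
by_position = countries.iloc[[1, 0], [0, 1]]
same = by_label.equals(by_position)
print(same)

# A single row label and a single column label give one value
cell_by_label = countries.loc['MMR', 'capital']
cell_by_position = countries.iloc[1, 1]
print(cell_by_label, cell_by_position)
```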

Update data in a DataFrame

Updating data in a DataFrame is similar to selecting data from it. First we select the data we want to update, then we assign the new data to it. In the following, we will update the country name Myanmar to Myanmar(Burma). Note that we can do this using either loc or iloc.

# Before updating data
print(countries)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000

Change Myanmar to Myanmar(Burma)

# Update data using loc
countries.loc[['MMR'], ['country']] = 'Myanmar(Burma)'
print(countries)

output:
            country    capital  population
IND           India  New Delhi  1393409030
MMR  Myanmar(Burma)     Yangon    54806010
THA        Thailand    Bangkok    69950840
SGP       Singapore  Singapore     5453570
CHN           China    Beijing  1412360000

Change Myanmar(Burma) to Myanmar

# Update data using iloc
countries.iloc[[1], [0]] = 'Myanmar'
print(countries)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000
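When only one cell changes, the lists around the row and column can be dropped: a single row label and a single column label (or their integer positions) address exactly one cell. A minimal sketch on a two-row DataFrame, with after_loc as an illustrative name:

```python
import pandas as pd

countries = pd.DataFrame(
    {'country': ['India', 'Myanmar'],
     'capital': ['New Delhi', 'Yangon']},
    index=['IND', 'MMR'],
)

# Update one cell by row label and column label
countries.loc['MMR', 'country'] = 'Myanmar(Burma)'
after_loc = countries.loc['MMR', 'country']

# Update the same cell by row position and column position
countries.iloc[1, 0] = 'Myanmar'
print(after_loc, countries.iloc[1, 0])
```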

Delete data in DataFrame

While cleaning a dataset, we might want to remove some rows of data from a DataFrame. We can do this by using the drop method on the DataFrame. Let's try to remove the China row.

# Before delete/drop data
print(countries)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000

# We pass ['CHN'] to say which row/column labels we want to remove
# axis=0 means we want to drop row(s)
# inplace=True means the drop is applied to the original DataFrame
countries.drop(['CHN'], axis=0, inplace=True)
print(countries)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570

Printing countries shows that the data row we want to remove is no longer in the dataframe countries. Next let's try to remove a column population from dataframe.

# We pass ["population"] to say which row/column labels we want to remove
# axis=1 means we want to drop column(s)
# inplace=True means the drop is applied to the original DataFrame
countries.drop(["population"], axis=1, inplace=True)
print(countries)

output:
       country    capital
IND      India  New Delhi
MMR    Myanmar     Yangon
THA   Thailand    Bangkok
SGP  Singapore  Singapore

As we expected, the column population is dropped from the dataframe.
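If the original DataFrame should stay untouched, drop can also be called without inplace=True, in which case it returns a new DataFrame. The sketch below uses the index= and columns= keyword arguments of drop instead of axis; the names without_china and without_population are made up for this example.

```python
import pandas as pd

countries = pd.DataFrame(
    {'country': ['India', 'China'],
     'capital': ['New Delhi', 'Beijing'],
     'population': [1393409030, 1412360000]},
    index=['IND', 'CHN'],
)

# Without inplace=True, drop returns a new DataFrame
without_china = countries.drop(index=['CHN'])
without_population = countries.drop(columns=['population'])

# The original still has every row and column
print(countries.shape, without_china.shape, without_population.shape)
```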

Add data to DataFrame

What if we want to add data to a DataFrame? We can do it using square brackets []. Let's add back the population data we dropped in the previous section.

# before adding data
print(countries)

output:
       country    capital
IND      India  New Delhi
MMR    Myanmar     Yangon
THA   Thailand    Bangkok
SGP  Singapore  Singapore

# Add population column data
# The length of the column data needs to match the number of rows in the DataFrame
countries["population"] = [1393409030,54806010,69950840,5453570]
print(countries)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570

Great! Do note that pandas does not know which population belongs to which country when we pass a plain list; it adds the values in the order we give them. Now, let's add our China row back to the DataFrame countries. Since the row has the index label CHN, we add it using loc.

countries.loc['CHN'] = ['China', 'Beijing', 1412360000]
print(countries)

output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000
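Coming back to the note about column order: a plain list is added position by position, but if the new column is given as a Pandas Series with row labels, pandas matches the values to the rows by label. The sketch below deliberately lists the populations in a different order than the rows; the Series name population is chosen for this example.

```python
import pandas as pd

countries = pd.DataFrame(
    {'country': ['India', 'Myanmar'],
     'capital': ['New Delhi', 'Yangon']},
    index=['IND', 'MMR'],
)

# The Series is in a different order than the rows on purpose
population = pd.Series({'MMR': 54806010, 'IND': 1393409030})

# Assignment aligns on the row labels, not on position
countries['population'] = population
print(countries)
```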

Super! We have now seen how to create, select, add, update, and delete data in Python dictionaries and Pandas DataFrames.

#python #pandas #dictionary #dataframe #data #manipulation #datastructures

PyProDev

Mastering Python Dictionaries & Pandas DataFrames
