# Mastering Logical Comparison, Control Flow, and Filtering on NumPy Arrays and Pandas DataFrames

In this article, we will learn about different comparison operators, how to combine them with Boolean operators, and how to use Boolean outcomes in control structures. Boolean logic is the foundation of decision-making in Python programs. We'll also learn to filter data in pandas DataFrames using logic, a skill that a data scientist must have.

## Comparison Operators

Comparison operators tell how two values relate and return a boolean.

### Numeric comparisons

In the simplest case, we can use these operators on numbers. For example, to check whether 2 is smaller than 3, we type `2 < 3`.

```
print(2 < 3)
```

```
output:
True
```

Because 2 is less than 3, we get `True`. We can also check whether two values are equal with a double equals sign, `==`. Here, `5 == 6` gives us `False`.

```
print(5 == 6)
```

```
output:
False
```

This makes sense because 5 is not equal to 6. We can also combine equality with less than. The following command checks whether 5 is smaller than or equal to 6.

```
print(5 <= 6)
```

```
output:
True
```

It's `True`, and 6 smaller than or equal to 6 is `True` as well.

```
print(6 <= 6)
```

```
output:
True
```

Of course, we can also use comparison operators directly on variables that represent these integers.

```
x = 5
y = 6
print(x < y)
```

```
output:
True
```

### Comparison between strings

All these operators also work for strings. Let's check if "abc" is smaller than "acd".

```
print("abc" < "acd")
```

```
output:
True
```

Python compares strings character by character. In alphabetical order, "abc" comes before "acd", so the result is `True`.
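To make "alphabetical order" more concrete, here is a minimal sketch. Python compares strings character by character using Unicode code points, so uppercase letters sort before lowercase ones. The example words are made up for illustration.

```python
# Strings are compared character by character using Unicode code points.
print(ord("Z"), ord("a"))        # 90 97: uppercase letters come first

upper_first = "Zebra" < "apple"  # True, because ord("Z") < ord("a")
prefix_first = "abc" < "abcd"    # True, a prefix sorts before the longer string
case_insensitive = "Zebra".lower() < "apple".lower()  # False once case is ignored

print(upper_first, prefix_first, case_insensitive)
```

Lowercasing both sides first is one common way to get a case-insensitive ordering.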

### Comparison between integer and string

Let's find out whether comparing a string and an integer works. Here we check if the integer 2 is smaller than the string "abc".

```
print(2 < "abc")
```

```
output:
TypeError: '<' not supported between instances of 'int' and 'str'
```

We get an error: `TypeError: '<' not supported between instances of 'int' and 'str'`. Typically, Python can't tell how two objects of different types relate.

### Comparison between integer and float

Different numeric types, such as floats and integers, are an exception: they can be compared with each other.

```
print(3 < 4.12)
```

```
output:
True
```

No error this time. In general, though, make sure to compare objects of the same type.

### Comparison on NumPy arrays

Another exception arises when we compare a NumPy array, `lengths`, with an integer, `22`. This works perfectly.

```
import numpy as np
lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])
print(type(lengths))
print(lengths > 22)
```

```
output:
<class 'numpy.ndarray'>
[False False False  True False]
```

NumPy figures out that we want to compare every element in `lengths` with `22`, and returns the corresponding booleans. Conceptually, NumPy behaves as if it built an array of the same size filled with the number `22` and then performed an element-wise comparison. This mechanism is known as broadcasting. The result is concise, very efficient code, which data scientists love!
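The following sketch checks that comparing with a scalar gives the same result as comparing with an explicit array of `22`s. It uses `np.full_like` to build that array purely for illustration. NumPy itself does not need it.

```python
import numpy as np

lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

# Comparing with a scalar...
scalar_result = lengths > 22

# ...gives the same answer as comparing with an explicit array of 22s.
explicit_result = lengths > np.full_like(lengths, 22)

print(scalar_result)
print(np.array_equal(scalar_result, explicit_result))  # True
```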

We can also compare two NumPy arrays element-wise. `house1` and `house2` contain the areas for the kitchen, living room, bedroom and bathroom, in the same order. Which areas in `house1` are smaller than the ones in `house2`? We can find out like this.

```
house1 = np.array([18.0, 20.0, 10.75, 9.50])
house2 = np.array([14.0, 24.0, 14.25, 9.0])
print(house1 < house2)
```

```
output:
[False  True  True False]
```

It appears that the living room and bedroom in `house1` are smaller than the corresponding areas in `house2`.

## Comparators

Here is the table that summarizes all comparison operators.

| Comparator | Meaning |
|---|---|
| `<` | less than |
| `<=` | less than or equal to |
| `>` | greater than |
| `>=` | greater than or equal to |
| `==` | equal to |
| `!=` | not equal to |

We are already familiar with some of these. They're all pretty straightforward, except perhaps for not equal, `!=`. The exclamation mark followed by an equals sign stands for inequality. It's the opposite of equality.

### Equality

To check whether two Python values or variables are equal, you can use `==`. To check for inequality, you need `!=`. Have a look at the following examples, which all result in `True`.

```
print(2 == (1 + 1))
print("PYTHON" != "python")
print(True != False)
print("Python" != "python")
```

```
output:
True
True
True
True
```

Write code to see if `True` equals `False`.

```
# Comparison of booleans
print(True == False)
```

```
output:
False
```

Write Python code to check if `-3 * 15` is not equal to `45`.

```
# Comparison of integers
print(( -3 * 15 ) != 45)
```

```
output:
True
```

Ask Python whether the strings `"python"` and `"Python"` are equal.

```
# Comparison of strings
print("python" == "Python")
```

```
output:
False
```

Note that string comparison is case-sensitive. What happens if you compare booleans and integers? Write code to see if `True` and `1` are equal.

```
# Compare a boolean with an integer
print(True == 1)
print(True == 2)
```

```
output:
True
False
```

A boolean is a special kind of integer: `True` corresponds to `1`, and `False` corresponds to `0`.
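A short sketch of what this means in practice: because `bool` is a subclass of `int`, booleans can take part in arithmetic, and summing a list of booleans counts the `True` values. The `flags` list is a made-up example.

```python
flags = [True, False, True, True]  # hypothetical example data

print(isinstance(True, int))  # True: bool is a subclass of int
print(True + True)            # 2
count_true = sum(flags)       # counts the True values
print(count_true)             # 3
```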

### Greater and less than

We also talked about the less than and greater than signs, `<` and `>`. We can combine them with an equals sign to get `<=` and `>=`. Note that `=<` and `=>` are not valid. For example:

```
print(3 < 4)
print(3 <= 4)
print("alpha" <= "beta")
```

```
output:
True
True
True
```

Remember that for string comparison, Python determines the relationship based on alphabetical order.

Check if `x` is greater than or equal to `-13`.

```
# Comparison of integers
x = -4 * 3
print(x >= -13)
```

```
output:
True
```

Check if `True` is greater than `False`.

```
# Comparison of booleans
print(True > False)
```

```
output:
True
```

Remember that `True` has the value 1 and `False` has the value 0.

## Boolean Operators

We can produce booleans by performing comparison operations. The next step is combining these booleans. We can use boolean operators for this. The three most common ones are `and`, `or`, and `not`.

### and

The `and` operator works just as we would expect. It takes two booleans and returns `True` only if both booleans are `True`.

| Case1 | Case2 | Case1 `and` Case2 |
|---|---|---|
| True | True | True |
| True | False | False |
| False | True | False |
| False | False | False |

```
print(True and True)
print(True and False)
print(False and True)
print(False and False)
```

```
output:
True
False
False
False
```

Instead of using booleans directly, we can also use the results of comparisons. Suppose we have a variable `x` equal to `8`. To check if this variable is greater than 5 but less than 15, we can use `x > 5 and x < 15`.

```
x = 8
print(x > 5 and x < 15)
```

```
output:
True
```

As we already learned, the first part evaluates to `True`. The second part also evaluates to `True`. So the result of this expression, `True and True`, is `True`. This makes sense, because `8` lies between `5` and `15`.
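As a side note, Python also lets us chain comparisons, which expresses the same range check more compactly. A minimal sketch, reusing `x = 8` from above:

```python
x = 8

explicit = x > 5 and x < 15   # two comparisons joined with `and`
chained = 5 < x < 15          # equivalent chained form

print(explicit, chained)      # True True
```

Both forms give the same result here. The chained form reads closer to the mathematical notation.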

### or

The `or` operator works similarly, but the difference is that at least one of the booleans must be `True`.

| Case1 | Case2 | Case1 `or` Case2 |
|---|---|---|
| True | True | True |
| True | False | True |
| False | True | True |
| False | False | False |

```
print(True or True)
print(True or False)
print(False or True)
print(False or False)
```

```
output:
True
True
True
False
```

Here too we can make combinations with variables, like this example that checks if a variable `y`, which is equal to `3`, is less than `5` or above `10`.

```
y = 3
print(y < 5 or y > 10)
```

```
output:
True
```

`3 < 5` is `True` and `3 > 10` is `False`. The `or` operation thus returns `True`.

### not

Finally, let's look at the `not` operator. It simply negates the boolean value we use it on: `not True` is `False`, and `not False` is `True`. The `not` operation is typically useful when we combine different boolean operations and then want to negate the result.

```
print(not True)
print(not False)
```

```
output:
False
True
```

## Nested Boolean operators

Let's take the boolean operators to another level.

Note that `not` has a higher priority than `and` and `or`, so it is executed first.

```
x = 8
y = 9
print(not(not(x < 3) and not(y < 8 or y > 14)))
```

```
output:
False
```

`x < 3` is `False`, and `y < 8 or y > 14` is `False` as well. If you continue working like this, simplifying from the inside outward, you'll end up with `False`.
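The inside-out simplification can be sketched step by step. The intermediate names `left` and `right` are introduced here only for illustration.

```python
x = 8
y = 9

left = not (x < 3)              # not False -> True
right = not (y < 8 or y > 14)   # not (False or False) -> True
result = not (left and right)   # not (True and True) -> False

print(left, right, result)      # True True False
```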

## Filtering on NumPy arrays

Now, for NumPy arrays, things are different. Returning to the `lengths` example, we can try to find out which `lengths` are higher than `21` but lower than `22`. The output of `lengths > 21` is easily found, and so is the one for `lengths < 22`.

```
print(lengths)
print(lengths > 21)
print(lengths < 22)
```

```
output:
[21.85 20.97 21.75 24.74 21.44]
[ True False  True  True  True]
[ True  True  True False  True]
```

Let's now try to combine those with the `and` operator we just learned.

```
print(lengths > 21 and lengths < 22)
```

```
output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

Oops, Python raises `ValueError: The truth value of an array with more than one element is ambiguous`. The `and` operator expects a single boolean on each side, so it cannot work on an array of booleans.
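The error message suggests `a.any()` or `a.all()`. As a small sketch, these methods collapse a boolean array into a single boolean, which is what an `if` statement or the `and` operator needs. The variable name `above_21` is introduced here for illustration.

```python
import numpy as np

lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])
above_21 = lengths > 21

# Collapse the boolean array into a single True/False value.
print(above_21.any())   # True: at least one length is above 21
print(above_21.all())   # False: 20.97 is not above 21

if above_21.any():
    print("At least one length exceeds 21.")
```

Note that this answers a different question (is any or every element above 21?) than the element-wise combination we are after here.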

NumPy provides "array equivalents" of `and`, `or` and `not` as the functions `logical_and`, `logical_or` and `logical_not`.

To find out which `lengths` are between `21` and `22`, we will use these functions. Again, as we expect from NumPy, the `and` operation is performed element-wise.

```
print(np.logical_and(lengths > 21, lengths < 22))
```

```
output:
[ True False  True False  True]
```

To select only the `lengths` that are between `21` and `22`, we can use the resulting array of booleans in square brackets.

```
print(lengths[np.logical_and(lengths > 21, lengths < 22)])
```

```
output:
[21.85 21.75 21.44]
```
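As an alternative sketch, boolean NumPy arrays also support the `&` operator for element-wise "and" (likewise `|` for "or" and `~` for "not"). The parentheses matter, because `&` binds more tightly than the comparison operators.

```python
import numpy as np

lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

with_function = np.logical_and(lengths > 21, lengths < 22)
# Parentheses are required: & binds more tightly than > and <.
with_operator = (lengths > 21) & (lengths < 22)

print(np.array_equal(with_function, with_operator))  # True
print(lengths[with_operator])                        # [21.85 21.75 21.44]
```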

Again, NumPy wins when it comes to writing short yet very expressive Python code. How about this on Pandas DataFrames, the de facto standard for dataset manipulation?

## Boolean operators on NumPy Array

Before, the comparison operators like `<` and `>=` worked with NumPy arrays out of the box. Unfortunately, this is not true for the boolean operators `and`, `or`, and `not`.

To perform these operations with NumPy, we need `np.logical_and()`, `np.logical_or()` and `np.logical_not()`. Here's an example on the `house1` and `house2` arrays.

Generate boolean arrays that answer the following questions:

- Which areas in `house1` are greater than `18.5` or smaller than `10`?

```
# house1 greater than 18.5 or smaller than 10
print(np.logical_or(house1 > 18.5, house1 < 10))
```

```
output:
[False  True False  True]
```

- Which areas are smaller than `11` in both `house1` and `house2`?

```
# Both house1 and house2 smaller than 11
print(np.logical_and(house1 < 11, house2 < 11))
```

```
output:
[False False False  True]
```
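The examples above cover `np.logical_or()` and `np.logical_and()`. For completeness, here is a minimal sketch of `np.logical_not()` on the same `house1` data. The names `small` and `not_small` are introduced only for this illustration.

```python
import numpy as np

house1 = np.array([18.0, 20.0, 10.75, 9.50])

small = house1 < 11
not_small = np.logical_not(small)   # same result as ~small

print(not_small)            # [ True  True False False]
print(house1[not_small])    # [18. 20.]
```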

## Filtering on pandas DataFrames

NumPy arrays are useful for comparison and boolean operations on an element-wise basis. Let's now apply this knowledge to a Pandas DataFrame. The examples use a `countries.csv` file. First, let's import the `countries` dataset from the CSV file using pandas.

```
import pandas as pd
countries = pd.read_csv('countries.csv', index_col=0)
print(countries)
```

```
output:
country capital population
IND India New Delhi 1393409030
MMR Myanmar Yangon 54806010
THA Thailand Bangkok 69950840
SGP Singapore Singapore 5453570
CHN China Beijing 1412360000
```

Suppose you now want to keep the countries for which the population is greater than 100,000,000. There are three steps to this.

- First, we get the `population` column from `countries`.
- Next, we perform the comparison on this column and store its result.
- Finally, we use this result to do the appropriate selection on the DataFrame.

### Step 1: Get the column

The first step is getting the `population` column from `countries`. There are many different ways to do this. What's important here is that we get a Pandas Series, not a Pandas DataFrame. Let's do this with square brackets, like this.

```
print(type(countries['population']))
print(countries['population'])
```

```
output:
<class 'pandas.core.series.Series'>
IND 1393409030
MMR 54806010
THA 69950840
SGP 5453570
CHN 1412360000
Name: population, dtype: int64
```

The `loc` alternative and the `iloc` version below would also work perfectly fine.

```
print(countries.loc[:, 'population'])
```

```
output:
IND 1393409030
MMR 54806010
THA 69950840
SGP 5453570
CHN 1412360000
Name: population, dtype: int64
```

```
print(countries.iloc[:, 2])
```

```
output:
IND 1393409030
MMR 54806010
THA 69950840
SGP 5453570
CHN 1412360000
Name: population, dtype: int64
```

### Step 2: Compare

Next, we perform the comparison. To see which rows have a population greater than 100,000,000, we simply append `> 100000000` to the code from before, like this.

```
print(countries['population'] > 100000000)
```

```
output:
IND True
MMR False
THA False
SGP False
CHN True
Name: population, dtype: bool
```

Now we get a Series containing booleans. If you compare it to the population values, you can see that rows with a population over 100000000 correspond to `True`, and the ones with a population under 100000000 correspond to `False`. Let's store this boolean Series as `is_huge`.

```
is_huge = countries['population'] > 100000000
print(is_huge)
```

```
output:
IND True
MMR False
THA False
SGP False
CHN True
Name: population, dtype: bool
```

### Step 3: Subset the DataFrame

The final step is using this boolean Series `is_huge` to subset the Pandas DataFrame. To do this, we put `is_huge` inside square brackets.

```
print(countries[is_huge])
```

```
output:
country capital population
IND India New Delhi 1393409030
CHN China Beijing 1412360000
```

The result is exactly what we want: only the countries with a population greater than 100000000, namely India and China.

### Summary

So let's summarize this: we selected the `population` column, performed a comparison on it, and stored the result as `is_huge` so that we can use it to index the `countries` DataFrame. These separate commands do the trick. However, we can also write this in one line: simply put the code that defines `is_huge` directly in the square brackets.

```
print(countries[countries['population'] > 100000000])
```

```
output:
country capital population
IND India New Delhi 1393409030
CHN China Beijing 1412360000
```

Great! Pandas makes a data scientist's life much easier.
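For readers without the CSV file, the sketch below rebuilds the same five rows inline (an assumption made so the example runs on its own) and applies the three steps. It also shows `.loc`, which accepts the boolean Series for rows together with a list of column labels. The name `huge_capitals` is introduced for illustration.

```python
import pandas as pd

# Rebuild the sample data inline so the example runs without countries.csv.
countries = pd.DataFrame(
    {
        "country": ["India", "Myanmar", "Thailand", "Singapore", "China"],
        "capital": ["New Delhi", "Yangon", "Bangkok", "Singapore", "Beijing"],
        "population": [1393409030, 54806010, 69950840, 5453570, 1412360000],
    },
    index=["IND", "MMR", "THA", "SGP", "CHN"],
)

is_huge = countries["population"] > 100000000

# .loc takes the boolean Series for rows and a list of labels for columns.
huge_capitals = countries.loc[is_huge, ["country", "capital"]]
print(huge_capitals)
```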

## Boolean operators on Pandas DataFrame

We haven't used boolean operators on a DataFrame yet. Remember the `logical_and` function from the NumPy package, which does an element-wise boolean operation on NumPy arrays? Because Pandas is built on NumPy, we can also use that function here. Let's write the code that keeps the observations with a population between 10,000,000 and 90,000,000.

```
print(countries)
```

```
output:
country capital population
IND India New Delhi 1393409030
MMR Myanmar Yangon 54806010
THA Thailand Bangkok 69950840
SGP Singapore Singapore 5453570
CHN China Beijing 1412360000
```

```
print(np.logical_and(countries['population'] > 10000000, countries['population'] < 90000000))
```

```
output:
IND False
MMR True
THA True
SGP False
CHN False
Name: population, dtype: bool
```

The only thing left to do is placing this code inside square brackets to subset `countries` appropriately. This time, only Myanmar and Thailand are included. Look how easy it is to filter DataFrames to get interesting results.

```
print(countries[np.logical_and(countries['population'] > 10000000, countries['population'] < 90000000)])
```

```
output:
country capital population
MMR Myanmar Yangon 54806010
THA Thailand Bangkok 69950840
```
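Pandas Series also support the `&` operator for element-wise "and", which many people find easier to read than `np.logical_and`. A minimal sketch, with the population data rebuilt inline (an assumption so the block runs without the CSV file) and a hypothetical variable name `mid_sized`:

```python
import pandas as pd

countries = pd.DataFrame(
    {"population": [1393409030, 54806010, 69950840, 5453570, 1412360000]},
    index=["IND", "MMR", "THA", "SGP", "CHN"],
)
population = countries["population"]

# & is the element-wise "and" for Series; each comparison needs parentheses.
mid_sized = (population > 10000000) & (population < 90000000)

print(countries[mid_sized])
```

Without the parentheses, `&` would be evaluated before the comparisons and the expression would fail.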

Now we know about comparison operators such as `<`, `<=`, `>`, `>=`, `==` and `!=`, and we also know how to combine the boolean results using boolean operators such as `and`, `or` and `not`.

## Control Flow

Things get interesting when we can use these concepts to change how our program behaves. Depending on the outcome of our comparisons, we might want our Python code to behave differently. We can do this with conditional statements in Python: `if`, `else` and `elif`.

### if

Suppose we have a variable `x` equal to 4. If the value is even, we want to print out "x is even."

```
x = 4
if x % 2 == 0:
    print('x is even.')
```

```
output:
x is even.
```

The modulo operator `%` with `2` returns `0` if `x` is even. Python checks if the condition holds. It's true, so the corresponding code is executed: "x is even." gets printed out.

Let's compare this to the general recipe for an if statement. It reads as follows: if the condition is `True`, execute the indented code.

Notice the colon at the end of the `if` line, and the fact that we simply have to indent the Python code with four spaces (or a tab) to tell Python what to do in case the condition holds. To exit the if statement, simply continue with some Python code without indentation, and Python will know that it's not part of the if statement. It's perfectly possible to have more lines inside the if statement, like this for example.

```
x = 4
if x % 2 == 0:
    print('Checking if x (', x, ') is divisible by 2...')
    print('x is even.')
```

```
output:
Checking if x ( 4 ) is divisible by 2...
x is even.
```

The script now prints out two lines when we run it. If the condition does not hold, the indented code is not executed. You can see this if we change `x` to `3` and rerun the code.

```
x = 3
if x % 2 == 0:
    print('Checking if x (', x, ') is divisible by 2...')
    print('x is even.')
```

```
output:
```

There's no output. Suppose now that we want to print out "x is odd." in this case. How do we do this?

### else

Well, we can simply use an `else` statement, like this.

```
x = 3
if x % 2 == 0:
    print('x is even.')
else:
    print('x is odd.')
```

```
output:
x is odd.
```

If we run it with `x` equal to `3`, the condition is not true, so the code under the `else` statement is executed. The general recipe looks like this: for the `else` statement, we don't need to specify a condition. The code under `else` runs if the condition of the `if` statement does not hold.

### elif

We can think of cases where even more customized behavior is necessary. Say we want different printouts for numbers that are divisible by 2 and by 3. We can use `elif` to get the job done. Here is an example.

```
x = 3
if x % 2 == 0: # False
    print('x is divisible by 2.')
elif x % 3 == 0: # True
    print('x is divisible by 3.')
else:
    print('x is divisible by neither 2 nor 3.')
```

```
output:
x is divisible by 3.
```

If `x` equals 3, the first condition is `False`, so Python goes on to check the next condition. This condition is `True`, so the corresponding print statement is executed.

Suppose now that `x` equals 6. Both the `if` and `elif` conditions are `True` in this case. Will two printouts occur?

```
x = 6
if x % 2 == 0: # True
    print('x is divisible by 2.')
elif x % 3 == 0: # never reached
    print('x is divisible by 3.')
else:
    print('x is divisible by neither 2 nor 3.')
```

```
output:
x is divisible by 2.
```

Nope. As soon as Python finds a true condition, it executes the corresponding code and then leaves the whole control structure. This means the second condition, corresponding to the `elif`, is never reached, so there's no corresponding printout. Control flow can be extremely powerful when we're writing Python scripts.
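If we do want a distinct message when a number is divisible by both 2 and 3, one option is to test the most specific condition first. The sketch below wraps this in a hypothetical helper function, `describe`, which is not part of the original example.

```python
def describe(x):
    """Return a message about divisibility by 2 and 3."""
    if x % 2 == 0 and x % 3 == 0:   # most specific case first
        return "divisible by 2 and 3"
    elif x % 2 == 0:
        return "divisible by 2"
    elif x % 3 == 0:
        return "divisible by 3"
    else:
        return "divisible by neither 2 nor 3"

for value in (3, 4, 6, 7):
    print(value, "is", describe(value))
```

Because only the first true branch runs, the order of the conditions determines which message we get for a value like 6.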

## Conclusion

In this article, we learned about logical comparison, control flow, and filtering on NumPy arrays and Pandas DataFrames.
