What is an API? How to get data from the internet using APIs


In this article, we will gain insight into the importance of APIs, learn the basics of extracting data from APIs, and work through hands-on exercises extracting data from the OMDB API.

Introduction to APIs and JSONs

We'll explore pulling data from the web by learning how to interact with APIs (Application Programming Interfaces).

An API is a set of protocols and routines for building and interacting with software applications. We'll learn how to use APIs in the sections that follow.


JSON is short for JavaScript Object Notation. JSON is a standard format for transferring data through APIs, so we'll focus on it first and then move on to getting data from APIs. The format arose out of the need for real-time server-to-browser communication that wouldn't necessarily rely on Flash or Java, and it was first specified and popularized by Douglas Crockford. One of the nice things about JSON is that it's human-readable, unlike Python pickle files. Here is the JSON data about the movie "Thor" that we got from the OMDB (Open Movie Database) API:

{
  "Title": "Thor",
  "Year": "2011",
  "Rated": "PG-13",
  "Released": "06 May 2011",
  "Runtime": "115 min",
  "Genre": "Action, Adventure, Fantasy",
  "Director": "Kenneth Branagh",
  "Writer": "Ashley Miller, Zack Stentz, Don Payne",
  "Actors": "Chris Hemsworth, Anthony Hopkins, Natalie Portman",
  "Plot": "The powerful but arrogant god Thor is cast out of Asgard to live amongst humans in Midgard (Earth), where he soon becomes one of their finest defenders.",
  "Language": "English",
  "Country": "United States",
  "Awards": "5 wins & 30 nominations",
  "Poster": "https://m.media-amazon.com/images/M/MV5BOGE4NzU1YTAtNzA3Mi00ZTA2LTg2YmYtMDJmMThiMjlkYjg2XkEyXkFqcGdeQXVyNTgzMDMzMTg@._V1_SX300.jpg",
  "Ratings": [
    {
      "Source": "Internet Movie Database",
      "Value": "7.0/10"
    },
    {
      "Source": "Rotten Tomatoes",
      "Value": "77%"
    },
    {
      "Source": "Metacritic",
      "Value": "57/100"
    }
  ],
  "Metascore": "57",
  "imdbRating": "7.0",
  "imdbVotes": "851,505",
  "imdbID": "tt0800369",
  "Type": "movie",
  "DVD": "13 Sep 2011",
  "BoxOffice": "$181,030,624",
  "Production": "N/A",
  "Website": "N/A",
  "Response": "True"
}

First, notice that the JSON consists of name-value pairs separated by commas. This is reminiscent of the key-value pairs in a Python dictionary, and, as we'll see, it is natural to store JSON data in a dictionary when we load it into Python. The keys in JSON are always strings enclosed in double quotation marks. The values can be strings, numbers, booleans, null, arrays or even objects. Because a value can itself be an object, JSONs can be nested, but we won't go deep into nested JSONs in this article. In the "Thor" JSON, almost all the values are strings, as the quotation marks show; the exception is 'Ratings', whose value is an array of objects, each with a 'Source' and a 'Value'. The value corresponding to the key 'Title' is the title of the movie as a string: "Thor". The value corresponding to the key 'Year' is the year of release, also as a string: "2011", and so on. There's the rating, the runtime, the director, the writers, the plot, the language and much more. We'll soon learn how to use the OMDB API and Python to automate the retrieval of such data, but first, we'll see how to load a JSON file from a local directory.
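To make the type mapping concrete before we touch any files, here is a small sketch that parses a trimmed-down copy of the "Thor" JSON (retyped as a Python string for this example) with json.loads(), the string-based counterpart of the json.load() function used in the next section. It shows that a JSON object becomes a dict, a JSON array becomes a list, and that "True" in this data is a string rather than a boolean.

```python
import json

# A trimmed-down copy of the "Thor" JSON above, written as a Python string.
raw = '''
{
  "Title": "Thor",
  "Year": "2011",
  "Ratings": [
    {"Source": "Internet Movie Database", "Value": "7.0/10"},
    {"Source": "Rotten Tomatoes", "Value": "77%"}
  ],
  "Response": "True"
}
'''

movie = json.loads(raw)  # parse JSON text into Python objects

print(type(movie))                   # <class 'dict'>  (JSON object)
print(type(movie['Ratings']))        # <class 'list'>  (JSON array)
print(type(movie['Ratings'][0]))     # <class 'dict'>  (nested JSON object)
print(movie['Ratings'][1]['Value'])  # 77%

# "True" here is a JSON string, so it stays a Python str, not a bool.
print(movie['Response'] == 'True')   # True

# The remaining JSON value types map to Python like this:
print(json.loads('[1, 2.5, true, false, null]'))  # [1, 2.5, True, False, None]
```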

Loading a JSON file locally

Now that we know what JSON is, let's load a JSON file into our Python environment and explore it ourselves. Let's say that we had the JSON stored in our current working directory as a_movie.json. We'll load it into the variable json_data, which will be a dictionary. To do so, we first import the json package, then open the file in a with statement (a context manager), which binds the open file object to json_file, and finally pass json_file to json.load() to parse the JSON.

import json
print('Using json version', json.__version__)
Using json version 2.0.9
with open('a_movie.json', 'r') as json_file:
    json_data = json.load(json_file)
print(type(json_data))
<class 'dict'>
print(json_data)
{'Title': 'Avatar', 'Year': '2009', 'Rated': 'PG-13', 'Released': '18 Dec 2009', 'Runtime': '162 min', 'Genre': 'Action, Adventure, Fantasy', 'Director': 'James Cameron', 'Writer': 'James Cameron', 'Actors': 'Sam Worthington, Zoe Saldana, Sigourney Weaver', 'Plot': 'A paraplegic Marine dispatched to the moon Pandora on a unique mission becomes torn between following his orders and protecting the world he feels is his home.', 'Language': 'English, Spanish', 'Country': 'United States', 'Awards': 'Won 3 Oscars. 89 wins & 131 nominations total', 'Poster': 'https://m.media-amazon.com/images/M/MV5BNjA3NGExZDktNDlhZC00NjYyLTgwNmUtZWUzMDYwMTZjZWUyXkEyXkFqcGdeQXVyMTU1MDM3NDk0._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.9/10'}, {'Source': 'Rotten Tomatoes', 'Value': '82%'}, {'Source': 'Metacritic', 'Value': '83/100'}], 'Metascore': '83', 'imdbRating': '7.9', 'imdbVotes': '1,280,965', 'imdbID': 'tt0499549', 'Type': 'movie', 'DVD': '22 Apr 2010', 'BoxOffice': '$785,221,649', 'Production': 'N/A', 'Website': 'N/A', 'Response': 'True'}

Checking the data type with type(json_data) confirms that json.load() parsed the JSON into a dictionary. Printing json_data shows that the a_movie.json file contains the data for the movie "Avatar".
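If you don't have a_movie.json on hand, the following self-contained sketch lets you try the same round trip. It writes a small subset of the "Avatar" record shown above to a temporary file with json.dump() and reads it back with json.load(); the temporary directory is an assumption of this example, not part of the original walkthrough.

```python
import json
import os
import tempfile

# A few fields from the "Avatar" record printed above (illustrative subset).
movie = {'Title': 'Avatar', 'Year': '2009', 'Director': 'James Cameron'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'a_movie.json')

    # Serialize the dictionary to JSON text on disk.
    with open(path, 'w', encoding='utf-8') as json_file:
        json.dump(movie, json_file, indent=2)

    # Read it back; json.load() parses the file into a dict again.
    with open(path, 'r', encoding='utf-8') as json_file:
        json_data = json.load(json_file)

print(type(json_data))     # <class 'dict'>
print(json_data == movie)  # True
```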

Exploring JSON data in Python

We can explore the JSON contents by printing the key-value pairs of json_data by using a for loop.

for key, value in json_data.items():
    print(key, ':', value)
Title : Avatar
Year : 2009
Rated : PG-13
Released : 18 Dec 2009
Runtime : 162 min
Genre : Action, Adventure, Fantasy
Director : James Cameron
Writer : James Cameron
Actors : Sam Worthington, Zoe Saldana, Sigourney Weaver
Plot : A paraplegic Marine dispatched to the moon Pandora on a unique mission becomes torn between following his orders and protecting the world he feels is his home.
Language : English, Spanish
Country : United States
Awards : Won 3 Oscars. 89 wins & 131 nominations total
Poster : https://m.media-amazon.com/images/M/MV5BNjA3NGExZDktNDlhZC00NjYyLTgwNmUtZWUzMDYwMTZjZWUyXkEyXkFqcGdeQXVyMTU1MDM3NDk0._V1_SX300.jpg
Ratings : [{'Source': 'Internet Movie Database', 'Value': '7.9/10'}, {'Source': 'Rotten Tomatoes', 'Value': '82%'}, {'Source': 'Metacritic', 'Value': '83/100'}]
Metascore : 83
imdbRating : 7.9
imdbVotes : 1,280,965
imdbID : tt0499549
Type : movie
DVD : 22 Apr 2010
BoxOffice : $785,221,649
Production : N/A
Website : N/A
Response : True

Getting keys from JSON dictionary

As our JSON data is imported as a Python dictionary, we can use the dictionary's keys() method to get the keys from the data.

print(json_data.keys())
dict_keys(['Title', 'Year', 'Rated', 'Released', 'Runtime', 'Genre', 'Director', 'Writer', 'Actors', 'Plot', 'Language', 'Country', 'Awards', 'Poster', 'Ratings', 'Metascore', 'imdbRating', 'imdbVotes', 'imdbID', 'Type', 'DVD', 'BoxOffice', 'Production', 'Website', 'Response'])

Let's print out the values corresponding to the keys 'Title' and 'Year'.

print(json_data['Title'])
Avatar
print(json_data['Year'])
2009

Now we have printed only the pieces of the movie's data we wanted to see.
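Since most values in this record are strings and 'Ratings' holds a list of dictionaries, it is often worth reshaping a few fields before analysis. A minimal sketch, using values copied from the "Avatar" output above so the snippet runs on its own; the key 'Budget' is a hypothetical key chosen because it is not in the record.

```python
# A subset of json_data for "Avatar", copied from the output above.
json_data = {
    'imdbRating': '7.9',
    'imdbVotes': '1,280,965',
    'Ratings': [
        {'Source': 'Internet Movie Database', 'Value': '7.9/10'},
        {'Source': 'Rotten Tomatoes', 'Value': '82%'},
        {'Source': 'Metacritic', 'Value': '83/100'},
    ],
}

# Flatten the list of {'Source': ..., 'Value': ...} dicts into one dict.
ratings = {r['Source']: r['Value'] for r in json_data['Ratings']}
print(ratings['Rotten Tomatoes'])  # 82%

# Numbers arrive as strings, so convert them explicitly.
imdb_rating = float(json_data['imdbRating'])
imdb_votes = int(json_data['imdbVotes'].replace(',', ''))
print(imdb_rating, imdb_votes)  # 7.9 1280965

# .get() returns a default instead of raising KeyError for a missing key.
print(json_data.get('Budget', 'N/A'))  # N/A
```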

Interacting with the internet using APIs

JSON is everywhere, and one of the main reasons to learn how to work with it as a data scientist is that much of the data we get from APIs is packaged as JSON. Next, we'll learn what APIs are, why they are so important, and how to connect to APIs and pull and parse data from them.

What is an API?

So what is an API, and why are APIs so important? As we mentioned at the beginning of this article, an API is a set of protocols and routines for building and interacting with software applications. Another way to think of it is as a bunch of code that allows two software programs to communicate with each other. For example, if we wanted to stream Twitter data by writing some Python code, we would use the Twitter API. If we wanted to automate pulling and processing information from Wikipedia in our programming language of choice, we could do so using the Wikipedia API.

APIs are everywhere

Using APIs has become a standard way of interacting with applications. Twitter has an API that is used by marketing companies and by social scientists researching social networks. Uber, Facebook and Instagram all have their own APIs. Now let's see how to connect to an API and how to pull data from it.

Getting data from an API

Here, we'll pull movie data from the OMDB (Open Movie Database) API. We will

  • import the requests library,

  • assign the API request URL to the variable url, and

  • send the request with requests.get() and store the response in the variable r.

import requests
print('Using requests version', requests.__version__)
Using requests version 2.25.1
url = 'https://www.omdbapi.com/?apikey=YourAPIkey&t=Thor'
r = requests.get(url)
print(type(r))
<class 'requests.models.Response'>

Another convenient aspect of the requests package is that Response objects such as r have a json() method, a built-in JSON decoder for dealing with JSON data. For this response it returns a dictionary, and we can then print it to check out what we pulled from the OMDB API.

json_data = r.json()
print(type(json_data))
<class 'dict'>
print(json_data)
{'Title': 'Thor', 'Year': '2011', 'Rated': 'PG-13', 'Released': '06 May 2011', 'Runtime': '115 min', 'Genre': 'Action, Adventure, Fantasy', 'Director': 'Kenneth Branagh', 'Writer': 'Ashley Miller, Zack Stentz, Don Payne', 'Actors': 'Chris Hemsworth, Anthony Hopkins, Natalie Portman', 'Plot': 'The powerful but arrogant god Thor is cast out of Asgard to live amongst humans in Midgard (Earth), where he soon becomes one of their finest defenders.', 'Language': 'English', 'Country': 'United States', 'Awards': '5 wins & 30 nominations', 'Poster': 'https://m.media-amazon.com/images/M/MV5BOGE4NzU1YTAtNzA3Mi00ZTA2LTg2YmYtMDJmMThiMjlkYjg2XkEyXkFqcGdeQXVyNTgzMDMzMTg@._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.0/10'}, {'Source': 'Rotten Tomatoes', 'Value': '77%'}, {'Source': 'Metacritic', 'Value': '57/100'}], 'Metascore': '57', 'imdbRating': '7.0', 'imdbVotes': '851,505', 'imdbID': 'tt0800369', 'Type': 'movie', 'DVD': '13 Sep 2011', 'BoxOffice': '$181,030,624', 'Production': 'N/A', 'Website': 'N/A', 'Response': 'True'}
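The "Response" field at the end of this dictionary is how OMDB reports success. We assume here, based on how OMDB behaves in practice (please confirm against its documentation), that a failed lookup returns "Response": "False" together with an "Error" message instead of movie data. The helper movie_or_error below is a hypothetical name for this sketch; it works on the dictionary returned by r.json(), so it can be tried without network access.

```python
def movie_or_error(data):
    """Return the movie dict from an OMDB payload, or raise ValueError.

    Assumes OMDB's convention: "Response": "True" on success, and
    "Response": "False" plus an "Error" message on failure.
    """
    if data.get('Response') == 'True':
        return data
    raise ValueError(data.get('Error', 'Unknown OMDB error'))


ok = {'Title': 'Thor', 'Year': '2011', 'Response': 'True'}
print(movie_or_error(ok)['Title'])  # Thor

# Hypothetical failure payload, shaped the way we assume OMDB reports errors.
failed = {'Response': 'False', 'Error': 'Movie not found!'}
try:
    movie_or_error(failed)
except ValueError as err:
    print('OMDB error:', err)  # OMDB error: Movie not found!
```

With requests, a typical sequence would be r = requests.get(url, timeout=10), then r.raise_for_status() to turn HTTP error status codes into exceptions, and finally movie_or_error(r.json()).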

What was the API URL?

Now let's understand how the API URL we used, https://www.omdbapi.com/?apikey=YourAPIkey&t=Thor, actually pulled data from the API. The https at the start signifies that we're making an HTTPS request to www.omdbapi.com, the host of the OMDB API. Then there is the ?apikey=YourAPIkey&t=Thor part, which is a query string.


We know how to form such a request from the OMDB API's documentation. Its usage section explicitly states:

Send all data requests to: omdbapi.com/?apikey=[yourkey]&parameter..

Query Strings

The ?apikey=YourAPIkey&t=Thor string, which begins with a question mark in the URL, is called a query string. Our query string has two arguments:

  • authentication: apikey=YourAPIkey and

  • parameter: t=Thor.

The & separates these two arguments. What follows the question mark, ?, in the query string is the query we are making to the OMDB API.
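Python's standard library can split the URL we used into exactly these pieces. This sketch applies urllib.parse (not part of the original walkthrough) to the same URL, with YourAPIkey still a placeholder.

```python
from urllib.parse import urlsplit, parse_qs

url = 'https://www.omdbapi.com/?apikey=YourAPIkey&t=Thor'
parts = urlsplit(url)

print(parts.scheme)  # https
print(parts.netloc)  # www.omdbapi.com
print(parts.path)    # /
print(parts.query)   # apikey=YourAPIkey&t=Thor

# parse_qs splits the query string on '&' and '=' into a dict of lists,
# because the same parameter name may appear more than once.
print(parse_qs(parts.query))  # {'apikey': ['YourAPIkey'], 't': ['Thor']}
```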

API key

An application programming interface key, or API key, is a unique identifier used to authenticate a user or program connecting to an API. You can request your API key at the OMDB API key request page. The API key is passed as apikey=YourAPIkey for authentication.


The OMDB API documentation has a table of query string parameters that shows how to query a particular title or a particular movie ID. The t in the query stands for title, so querying t=Thor asked the API to return data about the movie titled Thor.
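Rather than typing query strings by hand, we can build them from a dictionary. A sketch using urllib.parse.urlencode; omdb_url is a hypothetical helper, YourAPIkey is a placeholder, and the i parameter for querying by IMDb ID is our reading of the OMDB parameters table, so double-check it there. The ID tt0800369 is the imdbID of "Thor" from the response above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.omdbapi.com/'
API_KEY = 'YourAPIkey'  # placeholder: substitute your own key


def omdb_url(**params):
    """Build an OMDB request URL from keyword parameters (hypothetical helper)."""
    query = urlencode({'apikey': API_KEY, **params})
    return BASE_URL + '?' + query


print(omdb_url(t='Thor'))       # https://www.omdbapi.com/?apikey=YourAPIkey&t=Thor
print(omdb_url(i='tt0800369'))  # https://www.omdbapi.com/?apikey=YourAPIkey&i=tt0800369

# urlencode also escapes characters that aren't allowed in a URL, e.g. spaces.
print(omdb_url(t='Thor Ragnarok'))  # https://www.omdbapi.com/?apikey=YourAPIkey&t=Thor+Ragnarok
```

With requests, passing params={'apikey': API_KEY, 't': 'Thor'} to requests.get() performs this encoding for us.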

The query we just made was simple. It is also worth mentioning that there is nothing special about this API URL: we can also open it in our browser of choice (with a real API key in place of YourAPIkey), and the browser will display the JSON response.


In this article, we learned

  • what APIs and JSON are,

  • how to import and explore a JSON file locally,

  • how to get data from the internet using APIs and

  • how the API URL works.
