January 19, 2020

Write a Data-Based Library to Summarize What's Around Here

How I improved my travel adventures with the Google Maps Nearby API

Every now and then during my travel adventures, I stop, look at the horizon, and find myself asking, “What kind of places are in this area?” Sure, you can look around and might spot a restaurant here, a cafe there, and a hotel that looks like each night costs 500+ USD. But what is really here? I want to know! And as a follow-up to this question comes the classic “What’s the average price range of this region?” (a backpacker’s favorite).

Anyway, the point I want to illustrate here is that I have questions, questions that require answers, and answers that come in the shape of data. As you might know, a tool that provides these answers is Google Maps. But do I want to open the Maps app every time the curiosity bug bites me? Nope. Do I want to summarize all that data in my head to deduce what kind of places are here? Also nope. So what’s the alternative? Well, let’s build a library.

And so I did. The library I built is a wrapper around the Google Maps Places Nearby API that retrieves locations within a particular area and summarizes certain attributes of them. The library’s name is Places Summarized, and in this article, I want to present it and show the summary of some locations I’ve visited in the last couple of months. Follow me.

Sunset in Odaiba, Tokyo. Photo by me (https://www.instagram.com/juandesr/)

Google Maps API Places Nearby

Before heading to the good part, I want to give a bit of background and say a few words about the Google Maps Places Nearby API, the tool behind Places Summarized. Places Nearby is a feature of the Google Maps API that searches for places near a given set of coordinates. For instance, suppose you want to know what’s around your hometown (you probably know this by heart, but hey, I said suppose!). To do so, you could either use Google Maps (but where’s the fun in that?) or get an API key and send a request to their service using the coordinates of your town and the desired radius.

A call to the Nearby API returns a JSON string with locations near the given coordinates and attributes of them. Some of these attributes are price_level (the place’s price level) and rating. By default, Google ranks the places near the location by prominence, so not all of them are actually returned. If you wish to narrow down the results or get the sites sorted by distance, then you must use some of the optional parameters like “rankby” (to specify the ranking mode) and “type,” which describes the location type, e.g., restaurant or shopping center.
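To make the request a bit more concrete, here is a minimal sketch of how such a call could be assembled by hand. It only builds the URL and doesn't contact Google. The endpoint and the parameter names (location, radius, rankby, type, key) come from the Places Nearby web service, while the helper build_nearby_url and its argument names are made up for this illustration and are not part of Places Summarized.

```python
from urllib.parse import urlencode

# Public endpoint of the Places Nearby Search web service.
NEARBY_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"


def build_nearby_url(key, location, radius=None, rankby=None, place_type=None):
    """Build a Nearby Search URL (hypothetical helper, not the library's code)."""
    if rankby == "distance" and radius is not None:
        # The service does not accept a radius when ranking by distance.
        raise ValueError("radius must be omitted when rankby='distance'")
    params = {"location": location, "key": key}
    if radius is not None:
        params["radius"] = radius
    if rankby is not None:
        params["rankby"] = rankby
    if place_type is not None:
        params["type"] = place_type
    return NEARBY_URL + "?" + urlencode(params)


# Restaurants within 1000 m of the Google offices in Sydney.
url = build_nearby_url("YOUR_API_KEY", "-33.8670522,151.1957362",
                       radius=1000, place_type="restaurant")
print(url)
```

Fetching that URL with any HTTP client should return the JSON described above, assuming the key is valid.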

Also, this service is not exactly free. However, Google Maps provides a credit of $200 every month, which is more than enough for projects such as this.

Places Summarized

The goal of Places Summarized is to summarize the attributes returned by Places Nearby. The library features a Client object and a method named places_summary that takes as arguments the location, as well as the radius, and returns a Summary with the summarized values. In the library’s first version, the summary consists of the following:

  • The number of locations.
  • A list of the locations’ ratings.
  • A list with the number of ratings each location has.
  • A list of the locations’ price levels. This value represents the location’s price level on a scale of 0 (free) to 4 (very expensive).
  • A dictionary with the count of each location type.
  • The average rating of all the locations.
  • The average number of ratings of all the locations.
  • The average price level.

Besides, Summary has two methods:

  • ratings_by_type: returns the ratings of the locations of the given type
  • average_rating_by_type: calculates the average rating of the locations of the given type.
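To give an idea of what these values look like, below is a rough sketch that computes a similar summary from a hand-made list shaped like the "results" entries of a Nearby response. This is not the library's actual implementation: the function names, the sample data, and the dictionary keys (other than ratings and location_types, which appear later in this article) are assumptions made for the example.

```python
from collections import Counter
from statistics import mean

# Hand-made sample shaped like entries of the "results" list in a Nearby response.
sample_results = [
    {"name": "A", "rating": 4.5, "user_ratings_total": 120, "price_level": 2,
     "types": ["restaurant", "food", "establishment"]},
    {"name": "B", "rating": 4.0, "user_ratings_total": 30,
     "types": ["lodging", "establishment"]},
    {"name": "C", "types": ["cafe", "food", "establishment"]},
]


def summarize(results):
    """Summarize ratings, rating counts, price levels, and types (illustrative only)."""
    ratings = [p["rating"] for p in results if "rating" in p]
    totals = [p["user_ratings_total"] for p in results if "user_ratings_total" in p]
    prices = [p["price_level"] for p in results if "price_level" in p]
    types = Counter(t for p in results for t in p.get("types", []))
    return {
        "number_of_locations": len(results),
        "ratings": ratings,
        "user_ratings_total": totals,
        "price_levels": prices,
        "location_types": dict(types),
        # Averages are None when no place carries the attribute.
        "average_rating": mean(ratings) if ratings else None,
        "average_user_ratings_total": mean(totals) if totals else None,
        "average_price_level": mean(prices) if prices else None,
    }


def ratings_by_type(results, place_type):
    """Return the ratings of the places tagged with the given type."""
    return [p["rating"] for p in results
            if place_type in p.get("types", []) and "rating" in p]


summary = summarize(sample_results)
print(summary["average_rating"], summary["location_types"])
print(ratings_by_type(sample_results, "food"))
```

Note that not every place carries every attribute (place "C" has no rating at all), which is why each list is built only from the places that have the field.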

By default, the Places Nearby call returns a maximum of 20 locations, which is a bit disappointing. However, the returned object contains a pagetoken value, a string that you can use as a parameter to places_summary and Places Nearby in general. When you pass it, the call returns the next batch of up to 20 results from the previous search. The Summary object keeps this pagetoken as an attribute. I designed it this way because Client has a method named get_more_results that takes a Summary and uses that pagetoken to retrieve the next locations. Below you will find an example:

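import time
from places_summarized import Client

# key, location, and radius are assumed to be defined beforehand.
client = Client(key=key)
summary = client.places_summary(location=location, radius=radius)
# Get more results!
for _ in range(4):
    print(summary.nearby_results)
    # Pause before requesting the next page; without it, the call fails.
    time.sleep(5)
    client.get_more_results(summary)
# Read the summarized values once every page has been fetched.
r = summary.result()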
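If you are curious about what happens underneath, the token-following idea can be sketched without the library or the network. In the raw API, the response field is called next_page_token and the request parameter is pagetoken; Google's documentation also mentions a short delay before a freshly issued token becomes valid, and a cap of 60 results (three pages) per search. Everything else below (collect_pages, fetch_page, FAKE_PAGES) is made up for the illustration and is not how Client is implemented.

```python
import time


def collect_pages(fetch_page, max_pages=3, delay=2.0):
    """Follow next_page_token links, pausing before each follow-up request."""
    results = []
    token = None
    for page_number in range(max_pages):
        if page_number > 0:
            # A freshly issued token needs a moment before it becomes valid.
            time.sleep(delay)
        response = fetch_page(token)
        results.extend(response.get("results", []))
        token = response.get("next_page_token")
        if token is None:
            # No token means there are no more pages to request.
            break
    return results


# Stand-in for the real API: pages keyed by the token that requests them.
FAKE_PAGES = {
    None: {"results": ["a", "b"], "next_page_token": "t1"},
    "t1": {"results": ["c", "d"], "next_page_token": "t2"},
    "t2": {"results": ["e"]},
}

places = collect_pages(lambda token: FAKE_PAGES[token], max_pages=4, delay=0)
print(places)
```

With a real client, fetch_page would be a function that sends the request with pagetoken set to the given token.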

To test the library, I created a FakeClient class that works just like Client, except that it takes no arguments and doesn’t actually call the API. Instead, it reads a response from the API that I saved as a JSON file and shipped as part of the library; the location is the area around the Google offices in Sydney. Below, you will find an example of FakeClient and the response.

import pprint
from places_summarized.fake_client import FakeClient

pp = pprint.PrettyPrinter(indent=4)

fclient = FakeClient()
summary = fclient.places_summary()
pp.pprint(summary.result())

But enough chitchat, let’s see some results.

Pigstone Beach, Bali. Photo by me (https://www.instagram.com/juandesr/)

A use case: summarizing some of my visited locations

To test the library (and well, this is also the reason why I created it), I wrote a small script that uses the library to summarize and visualize the result of some of the places I’ve visited during my travels. Specifically speaking, I’m plotting the histogram of the ratings, user_ratings_total, and price levels, as well as a bar plot that shows the type of locations percentage-wise.

The places in question are Singapore’s center (1.2871404,103.844437), Canggu, a region of Bali, Indonesia (-8.6465434,115.1367221), Shinjuku, Tokyo (35.6955425,139.7009607) and China Town, Kuala Lumpur, Malaysia (3.1453656,101.6986553). For each of these places, I’m using a radius of 1000 m and calling get_more_results four times. Below you will find all the graphs. But first, let’s see the first part of the code.

import argparse
import time
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from places_summarized import Client
parser = argparse.ArgumentParser()
parser.add_argument('--key', '-K',
                    help="Google Maps API key", type=str, default='')
parser.add_argument('--location', '-L',
                    help="Location Coordinates; latitude,longitude", type=str,
                    default='-33.8670522, 151.1957362')
parser.add_argument('--radius', '-R', type=int,
                    default=1000)
parser.add_argument('--get', '-G', type=int,
                    default=0)
# Set Seaborn's color palette.
sns.set_color_codes()
# Parse the arguments
args = parser.parse_args()
key = args.key
location = args.location
radius = args.radius
number_gets = args.get
client = Client(key=key)
summary = client.places_summary(location=location, radius=radius)
# Get more results!
for i in range(number_gets):
    print(summary.nearby_results)
    time.sleep(5)
    client.get_more_results(summary)
r = summary.result()

The script takes as arguments the API key, the location, the radius, and a parameter “get” (horrible name, I should change it) that specifies how many times you want to call get_more_results. The first thing the script does is parse the arguments. Then, we declare the client using the key and call client.places_summary to obtain the summary. Following this, we iterate “get” times to get more results.

Note that in the loop, I’m using a sleep statement to pause the script for 5 seconds because otherwise the call to the API fails. I first suspected my customer/payment tier, but the more likely cause is the one Google’s documentation describes: there is a short delay between when a page token is issued and when it becomes valid. Now let’s take a look at the graphs, starting with the ratings (each image’s caption specifies the location).

Ratings

Ratings from places near Canggu
Ratings from places near Singapore’s Core
Ratings from places near China Town, Kuala Lumpur
Ratings from places near Shinjuku, Tokyo

Of all these locations, the one with the highest average rating is Canggu. While this is cool to see, it doesn’t surprise me that much because Canggu is essentially a touristic area full of beaches, bars, restaurants, beaches, and more restaurants. And well, I don’t know many people who give bad ratings to beaches (send me a screenshot if you do).

Speaking of ratings, I also wanted to investigate how many reviews a location has. In the next section, you’ll find the values.

Total reviews per location

Total user ratings from places near Canggu
Total user ratings from places near Singapore’s Core
Total user ratings from places near China Town, Kuala Lumpur
Total user ratings from places near Shinjuku, Tokyo

According to the histograms, and their peaks early in the x-axis, all four locations have many places with a low number of reviews. In Singapore, the graph drops at around 2000 and then increases a bit before going down. Then, at the very end, there’s a place with an astonishing 30000 reviews; that area is Clarke Quay. Overall, the average value is 1630.62, but we need to consider that the value at 30000 is pulling the average towards that number. For the future, I should replace this with the median.

Regarding the other locations, the average is 371.32 for Shinjuku, 538.81 for KL, and 100.92 for Canggu. But what is this really telling us? Well, it certainly says that the people and visitors of Singapore like to rate their places.
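To show why the median would be a better fit here, this is a tiny sketch with made-up review counts (not the real data from Singapore): several small values plus one extremely popular place.

```python
from statistics import mean, median

# Made-up review counts: mostly small values plus one very popular place.
review_counts = [12, 40, 85, 150, 300, 30000]

average = mean(review_counts)
middle = median(review_counts)

# The single outlier drags the mean far above what a typical place has,
# while the median stays close to the bulk of the values.
print("mean:", average)
print("median:", middle)
```

In this toy sample, the mean ends up in the thousands even though five of the six places have 300 reviews or fewer, while the median stays at 117.5.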

To create the histograms, I used the following code:

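# Histogram of the ratings. Note: distplot is deprecated in newer
# Seaborn releases; histplot or displot is the suggested replacement.
sns.distplot(r['ratings']).set_title(
    'Ratings of locations from {}'.format(location))
plt.savefig('{}_{}.png'.format(location, 'ratings'),
            dpi=320, orientation='landscape')
# Clear the figure so the next plot starts from a blank canvas.
plt.clf()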

Locations

To answer the question, “what kind of places are around here?” I calculated the percentages of the location types per region. Below you will find the graphs.

Locations types of places near Canggu
Locations types of places near Singapore’s Core
Locations types of places near China Town, Kuala Lumpur
Locations types of places near Shinjuku, Tokyo

“Lodging” is the big winner here. In three of the four locations (all except Shinjuku), places of type “lodging” are the most abundant in the region, followed by food-related locations. In Canggu, some of the locations include a bit of shopping, doctors, spas, and car rentals (have you seen how many motorcycles there are?!?!?). In Singapore, the top places are similar, but then the bars and night clubs dance their way into the plot while the banks and financial sites also say hi.

Lastly, there’s the Shinjuku area, which in my opinion, is the most unique of the four. For starters, lodging is not the most common location — food is, with around 30% of the locations. There’s a reason why Japanese food is so well known and wanted!

Something I didn’t mention before is that one location can have more than one type. Usually, this occurs when the location in question has a generic type like “food.” In such a case, the generic label is accompanied by a more granular and precise one. For example, in the graph, you can also see several categories that are somehow related to food, e.g., “meal takeaway,” “cafe,” “meal delivery,” “supermarket,” and “bakery.” But Japan is not all about food! They are also into fashion and fancy hair. Take a look at the graph, and you’ll find “beauty salon” and “hair care.”

Below is the code used to produce the plot. Note that first I’m converting r['location_types'] (a dict) to a Pandas DataFrame.
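The multi-type behavior is easier to see with a small example. The sketch below uses made-up places (not data from the API) and shows one possible way to turn type counts into percentages of locations; the percentages in my graphs may have been computed differently, so take it only as an illustration.

```python
from collections import Counter

# Made-up places; each one carries every type assigned to it.
places = [
    {"name": "Ramen shop",
     "types": ["restaurant", "food", "point_of_interest", "establishment"]},
    {"name": "Bakery",
     "types": ["bakery", "food", "store", "point_of_interest", "establishment"]},
    {"name": "Hostel",
     "types": ["lodging", "point_of_interest", "establishment"]},
    {"name": "Salon",
     "types": ["beauty_salon", "hair_care", "point_of_interest", "establishment"]},
]

# Types that nearly every place has, so they say little about the area.
GENERIC = {"point_of_interest", "establishment"}

counts = Counter(t for p in places for t in p["types"] if t not in GENERIC)
# Share of places carrying each type. Because one place can have several
# types, these shares can add up to more than 100%.
percentages = {t: 100 * n / len(places) for t, n in counts.items()}

for place_type, share in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(place_type, share)
```

Here “food” covers half of the places because both the ramen shop and the bakery carry it next to their more specific label.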

# Plot location_types
df = pd.DataFrame.from_dict(r['location_types'], orient='index')
df.index.name = 'location'
df.reset_index(inplace=True)
df.rename(columns={0: 'val'}, inplace=True)
# Remove the row with point_of_interest and establishment location
df = df[(df.location != 'point_of_interest') & (df.location != 'establishment')]
plt.figure(figsize=(16, 11))
sns.barplot(x="location", y="val", data=df, order=df.sort_values(
    'val', ascending=False)['location']).set_title('Location types from {}'.format(location))
plt.xticks(rotation=45)
plt.subplots_adjust(bottom=0.15)

KL Tower. Photo by me (https://www.instagram.com/juandesr/)

Price levels

Now the last group of graphs, the price levels. During my experiments, I found out that not many places have a price level. It kind of makes sense, though. This attribute is most commonly seen in locations of type restaurant, the kind of place where people would assign a price level.

However, as we just saw, restaurants account for less than 10% of the locations, except in the Shinjuku data. So, as a result, the sample of price levels is small. For instance, there are only three price levels in the data from Singapore, two from Canggu, and none from KL. Shinjuku, on the other hand, is booming with eateries, and so it returned 30 prices. Unfortunately, because of the lack of data, I’ll only plot the price levels from Tokyo. In the future, I want to add the option to filter by “type” to the Client, so that we can pass it to the API call to retrieve locations of one specific type. Now, let’s see the data.

Price levels of locations from Shinjuku, Tokyo

Self-explanatory, right? According to this graph, the average price range of places in this region is “2.” How does this exactly translate into actual money? That’s a bit more complicated and would require extra data. But since we know that the maximum price level is “4,” we could interpret and approximate that the price should be in the OK range, which, in a place like Tokyo, has to be around $10 to $20 USD. Do you agree?
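As a small aid for reading that number, Google's documentation attaches a label to each price level (0 is free, 4 is very expensive). The sketch below maps an average level to its label; the sample list is made up, and describe_price is a hypothetical helper, not something the library offers.

```python
from statistics import mean

# Labels Google's documentation gives to each price_level value.
PRICE_LABELS = {
    0: "Free",
    1: "Inexpensive",
    2: "Moderate",
    3: "Expensive",
    4: "Very Expensive",
}


def describe_price(levels):
    """Return the label closest to the average price level, or None if empty."""
    if not levels:
        return None
    return PRICE_LABELS[round(mean(levels))]


# Made-up price levels, similar in spirit to what a food-heavy area returns.
price_levels = [1, 2, 2, 3, 2, 2]
print(describe_price(price_levels))
```

The labels still don't translate into actual money, but they make an average of “2” a bit easier to interpret.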

Shinjuku, Tokyo. Photo by Juan De Dios Santos (https://www.instagram.com/juandesr/)

Recap

There’s a small chance that at least once you have asked, “What sort of places are around here?” I know I have. To answer this question, I created a small Python library, named Places Summarized, that uses Google Maps Nearby API to summarize attributes of places near a given location. In this article, I’ve presented and summarized the library. I’ve also shown a small use case using, as an example, some areas I’ve visited during my travels. In this experiment, I found out that, according to the ratings, most places in Canggu are pretty good and if you ever need a nice hairstylist, then Shinjuku is the right place.

The library is not yet at the place I want it to be. For starters, the documentation is weak. Then there are the features I previously mentioned that I want to add, e.g., the filter-by-type mechanism. On top of this, I’d like to implement a method that writes the summary to a JSON file and another one that converts it to a Pandas DataFrame. Let’s see how it goes.

Now, if you’ll excuse me, I’ll go out and look for those cheap restaurants. Bye!

You can find the library here: https://github.com/juandes/places-summarized

And the complete script I’ve used in this article here: https://github.com/juandes/wanderdata-scripts/blob/master/places-summarized/main.py

Thanks for reading!