Bulk Geocode addresses using Google Maps and GeoPy
What is Geocoding?
Geocoding is the process of converting physical addresses, such as street addresses, into geographic coordinates (latitude and longitude). With Woosmap, you can search for nearby locations or display large numbers of geographic objects, such as stores or other points of interest, on a map. Bulk geocoding also simplifies data analysis, because it lets you process large datasets in a single pass.
To use these features, you first need to push geocoded locations into our system through the Woosmap data management API. Datasets often contain addresses without coordinates, and a batch geocoding pass fills them in.
The script described below, which is available on our Woosmap GitHub Organization, is a basic utility for geocoding CSV files that contain address data. It parses the file, adds coordinate information, and appends metadata about the geocoded results, such as the location type, so that you can judge the accuracy of each location.
We have provided open access to the source code, permitting modifications and encouraging contributions to enhance its functionality.
GeoPy
GeoPy is a Python client for several popular geocoding web services. It lets Python developers locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources. It includes geocoder classes for, among others, OpenStreetMap Nominatim, ESRI ArcGIS, and the Google Geocoding API (V3), as well as Baidu Maps, Mapzen Search, and IGN France.
Geocoding API
The Google Maps Geocoding API provides geocoding and reverse geocoding of addresses, which developers can use in applications such as logistics or spatial analysis. This article uses the GeoPy wrapper for the API in our Python script, but the Python Client for Google Maps Services published by the Google GitHub organization is a viable alternative. Switching is straightforward because the geocoding method accepts compatible input parameters and returns the same response data structure.
API key
Every request to the Google Maps Web Services requires an API key, which is freely available with a Google Account at the Google Developers Console. The type of API key you need is a Server key. Restrict your API key in the Google Cloud console so that it can only be used from trusted sources.
To acquire an API key (see the API keys guide):
- Navigate to Google Developers Console and sign in using a Google Account.
- Choose an existing project or create a new one.
- Enable the Geocoding API.
- Generate a Server key.
- Optionally, restrict requests to a specific IP address at this point.
Important: Keep this key secret on your server. If it is ever exposed, you can revoke it and generate a new one.
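One way to keep the key out of the script file and out of version control is to read it from an environment variable. The sketch below is only an illustration: the variable name GOOGLE_API_KEY and the helper `load_api_key` are assumptions made for this example, not something the script requires.

```python
import os


def load_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key stored in an environment variable.

    `var_name` is an assumed variable name chosen for this example.
    Raises RuntimeError with a readable message when the key is missing.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            "Missing API key: set the %s environment variable" % var_name)
    return key
```

The returned value can then be assigned to the script's GOOGLE_API_KEY parameter instead of a hard-coded string.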
Client ID + Digital Signature (Google Maps Platform Users)
When you sign up for the Google Maps Platform, you receive a Client ID. Requests that use the Client ID must also carry a Digital Signature, generated with a cryptographic key supplied by Google, which authenticates them.
To retrieve your Client ID and Cryptographic Key (review the API keys guide):
- Visit the Google Maps Platform Support Portal and log in with your administrative account.
- Navigate to the Maps: Manage Client ID menu section.
- Select your Client ID from the dropdown menu.
- Click the Show Crypto Key button.
Important: Keep the Crypto Key secret, because it cannot be revoked!
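GeoPy computes the Digital Signature itself when a client ID and secret key are passed to the geocoder, so the script never has to do this by hand. For readers curious about what happens under the hood, here is a minimal sketch of the usual signing scheme (HMAC-SHA1 over the path and query, with URL-safe base64 encoding). The key and client ID below are made up, and `sign_url` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlparse


def sign_url(url, url_safe_secret):
    """Append an HMAC-SHA1 signature of the path and query to `url`."""
    parsed = urlparse(url)
    # Only the path and query are signed, not the scheme or host.
    url_to_sign = parsed.path + "?" + parsed.query
    # The crypto key is distributed in URL-safe base64 and is decoded first.
    decoded_key = base64.urlsafe_b64decode(url_safe_secret)
    digest = hmac.new(decoded_key, url_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.urlsafe_b64encode(digest).decode("ascii")
    return url + "&signature=" + signature


# Made-up key and client ID, for illustration only.
example_secret = base64.urlsafe_b64encode(b"not-a-real-crypto-key").decode("ascii")
example_url = ("https://maps.googleapis.com/maps/api/geocode/json"
               "?address=Ipswich&client=gme-example")
signed_url = sign_url(example_url, example_secret)
```

Because the signature depends on the exact path and query, any change to the request parameters requires a new signature.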
Script Usage
The script takes an input CSV file containing addresses to geocode and produces an output CSV file with all the original values plus these appended fields:
- Latitude
- Longitude
- Location_Type (see below)
- Formatted_Address
- Error (if needed, for failed geocoded addresses)
Download the script locally. Afterwards, execute:
python google_batch_geocoder.py
To use it with your own CSV file and Google credentials, set the parameters at the top of the script.
Mandatory Parameters
- ADDRESS_COLUMNS_NAME - List - the values of these columns are merged into a single comma-separated string to build the Google geocoding query. The column names depend on your input CSV file. Choose them carefully so that the results are accurate and you don't use up your quota on poor queries.
- NEW_COLUMNS_NAME - List - names of the columns appended to the processed CSV data (you don't need to change this, but you can add columns depending on the Google geocoding results you want to keep).
- DELIMITER - String - delimiter of your input CSV file
- INPUT_CSV_FILE - String - name and path of the input CSV file
- OUTPUT_CSV_FILE - String - name and path of the output CSV file
Optional Parameters
- COMPONENT_RESTRICTIONS_COLUMNS_NAME - Dict - defines component restrictions for Google geocoding. Details can be found in the Google component Restrictions doc.
- GOOGLE_SECRET_KEY - String - Google Secret Key, used by GeoPy to generate the Digital Signature that lets Google Maps API users geocode.
- GOOGLE_CLIENT_ID - String - Google Client ID, used to track and analyze requests for Google Maps API users. When you set it, GOOGLE_SECRET_KEY must also be provided.
- GOOGLE_API_KEY - String - Google API Server Key. Treat it as required in practice, since it is expected to become a mandatory parameter.
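As an illustration, the parameter block at the top of the script might look like the sketch below. All values are assumptions: the column names ("name", "addressline1", "town", "IsoCode"), the delimiter, and the output path are hypothetical and must be adapted to your own file. The NEW_COLUMNS_NAME values mirror the keys of the result dict built by the script, and the restrictions parameter uses the spelling found in the script excerpt further down.

```python
# Mandatory parameters (example values, adapt them to your own CSV file)
ADDRESS_COLUMNS_NAME = ["name", "addressline1", "town"]  # hypothetical column names
NEW_COLUMNS_NAME = ["Lat", "Long", "Error", "formatted_address", "location_type"]
DELIMITER = ";"  # assumed delimiter
INPUT_CSV_FILE = "./hairdresser_sample_addresses.csv"
OUTPUT_CSV_FILE = "./processed.csv"  # hypothetical output path

# Optional parameters
# Maps a Google component name to the CSV column holding its value.
COMPONENT_RESTRICTIONS_COLUMNS_NAME = {"country": "IsoCode"}  # hypothetical column
GOOGLE_SECRET_KEY = ""  # only for Client ID users
GOOGLE_CLIENT_ID = ""   # only for Client ID users
GOOGLE_API_KEY = "YOUR_SERVER_API_KEY"  # placeholder, not a real key
```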
Input Data
The example data (hairdresser_sample_addresses.csv) is a CSV file listing hairdressers around the world, meant to resemble the kind of real-world data you would geocode in bulk. A short excerpt of the file:
Script Explanation
Prepare the Destination File
As described above, the destination file contains all the original fields plus the geocoded data (latitude, longitude, and an error message if one occurs) and the associated metadata (formatted_address and location_type). For our sample file, the header of the destination CSV file is prepared as follows:
with open(INPUT_CSV_FILE, 'r') as csvinput:
    with open(OUTPUT_CSV_FILE, 'w') as csvoutput:
        # new csv based on same dialect as input csv
        writer = csv.writer(csvoutput, dialect="ga")
        # create a proper header with stripped fieldnames for new CSV
        # (next() works on file objects in both Python 2 and Python 3)
        header = [h.strip() for h in next(csvinput).split(DELIMITER)]
        # read the input CSV as a sequence of dicts keyed by the header names
        reader = csv.DictReader(csvinput, dialect="ga", fieldnames=header)
        # 2-dimensional data variable used to write the new CSV
        processed_data = []
        # append new columns, to receive geocoded information
        # to the header of the new CSV
        header = list(reader.fieldnames)
        for column_name in NEW_COLUMNS_NAME:
            header.append(column_name.strip())
        processed_data.append(header)
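The same preparation step can be tried on a small in-memory sample. The following is a minimal Python 3 sketch, not the script's own code: it passes the delimiter directly instead of relying on the registered "ga" dialect, and both the helper name `read_rows_and_header` and the sample columns are assumptions for illustration.

```python
import csv
import io

NEW_COLUMNS_NAME = ["Lat", "Long", "Error", "formatted_address", "location_type"]


def read_rows_and_header(csv_text, delimiter, new_columns):
    """Parse CSV text and return (rows, output_header)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Reading fieldnames consumes the header line; strip stray whitespace.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    rows = list(reader)
    # Output header = input columns followed by the geocoding columns.
    header = list(reader.fieldnames) + [name.strip() for name in new_columns]
    return rows, header


# Hypothetical two-column sample with untidy header spacing.
sample = "name ; town\nHairhouse Warehouse;Ipswich\n"
rows, header = read_rows_and_header(sample, ";", NEW_COLUMNS_NAME)
```

Stripping the field names up front means later lookups such as `record["town"]` keep working even when the source file has spaces around its column names.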
Build an address line
For each row, the script builds an address line by merging the values of the columns listed in ADDRESS_COLUMNS_NAME into a single string, which is then passed to the Google geocoder. For example, the first line of our CSV results in the address line:
"Hairhouse Warehouse, Riverlink Shopping centre, Ipswich".
# iterate through each row of input CSV
for record in reader:
    # build a line address
    # based on the merge of multiple field values to pass to Google Geocoder
    line_address = ','.join(
        str(val) for val in (record[column_name] for column_name in ADDRESS_COLUMNS_NAME))
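A slightly more defensive variant of this step is sketched below. It differs from the script in two ways that are design choices of this example: empty cells are skipped so the query does not contain dangling commas, and values are joined with a comma followed by a space. The column names are hypothetical.

```python
def build_line_address(record, address_columns):
    """Join the non-empty values of the address columns with ', '."""
    values = (str(record.get(column) or "").strip() for column in address_columns)
    return ", ".join(value for value in values if value)


# Hypothetical column names; the values echo the example address line above.
record = {
    "name": "Hairhouse Warehouse",
    "addressline1": "Riverlink Shopping centre",
    "town": "Ipswich",
}
line_address = build_line_address(record, ["name", "addressline1", "town"])
```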
Apply the Component Restrictions
The Geocoding API can restrict address results to a specific area. The script implements this restriction through the dict COMPONENT_RESTRICTIONS_COLUMNS_NAME. Each filter is a component:value pair, and the dict can be left empty ({}) if you don't need any restriction.
# if you want to use componentRestrictions feature,
# build a matching dict {'googleComponentRestrictionField' : 'yourCSVFieldValue'}
# to pass to Google Geocoder
component_restrictions = {}
if COMPONENT_RESTRICTIONS_COLUMNS_NAME:
    for key, value in COMPONENT_RESTRICTIONS_COLUMNS_NAME.items():
        component_restrictions[key] = record[value]
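To make the mapping concrete, here is a small self-contained sketch of the same idea. The "IsoCode" column and its "AU" value are hypothetical, and skipping empty cells is a choice made for this example rather than something the script does.

```python
def build_component_restrictions(record, restrictions_columns):
    """Map Google component names to the values found in one CSV row."""
    restrictions = {}
    for component, column in (restrictions_columns or {}).items():
        value = str(record.get(column) or "").strip()
        if value:  # skip empty cells rather than sending an empty filter
            restrictions[component] = value
    return restrictions


# Hypothetical row: the "IsoCode" column holds the country code.
restrictions = build_component_restrictions(
    {"town": "Ipswich", "IsoCode": "AU"}, {"country": "IsoCode"})
```

With this row, the geocoder would be asked to restrict results to the country "AU", which helps disambiguate town names that exist in several countries.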
Geocode the address
Before calling the geocoding method of GeoPy, instantiate a new GoogleV3 geocoder with your Google credentials, providing at least the Server API Key. Create the geocoder once and reuse it for every row of the batch.
geo_locator = GoogleV3(api_key=GOOGLE_API_KEY,
                       client_id=GOOGLE_CLIENT_ID,
                       secret_key=GOOGLE_SECRET_KEY)
Next, call the geocoding function, passing the geocoder instance, the built address line, and the optional component restrictions.
# geocode the built line_address, passing the optional componentRestrictions
location = geocode_address(geo_locator, line_address, component_restrictions)

def geocode_address(geo_locator, line_address, component_restrictions=None, retry_counter=0):
    # the geopy GoogleV3 geocoding call
    location = geo_locator.geocode(line_address, components=component_restrictions)
    # build a dict to append to output CSV
    if location is not None:
        location_result = {"Lat": location.latitude, "Long": location.longitude, "Error": "",
                           "formatted_address": location.raw['formatted_address'],
                           "location_type": location.raw['geometry']['location_type']}
        return location_result
Retry on Failure
The script retries geocoding the given address line when intermittent failures occur, that is, when GeoPy raises a GeocoderQueryError exception, which the script treats as one of the retriable 5xx errors returned by the API. By default the retry counter (RETRY_COUNTER_CONST) is set to 5:
# To retry because intermittent failures sometimes occur
except (GeocoderQueryError) as error:
    if retry_counter < RETRY_COUNTER_CONST:
        return geocode_address(geo_locator, line_address, component_restrictions, retry_counter + 1)
    else:
        # str(error) works on both Python 2 and Python 3
        location_result = {"Lat": 0, "Long": 0, "Error": str(error), "formatted_address": "",
                           "location_type": ""}
Handle Generic and Geocoding Exceptions
Other exceptions can occur, for example when you exceed your daily quota or the requests-per-second limit. To handle them, import the GeoPy exceptions and handle each error after the geocode call. The script also raises an error when no geocoded address is found. The error message is written to the Error CSV field.
# import Exceptions from GeoPy
from geopy.exc import (
    GeocoderQueryError,
    GeocoderQuotaExceeded,
    ConfigurationError,
    GeocoderParseError,
)

# after geocode call, if no result found, raise a ValueError
if location is None:
    raise ValueError("None location found, please verify your address line")

# To catch generic and geocoder errors.
except (ValueError, GeocoderQuotaExceeded, ConfigurationError, GeocoderParseError) as error:
    # str(error) works on both Python 2 and Python 3
    location_result = {"Lat": 0, "Long": 0, "Error": str(error), "formatted_address": "", "location_type": ""}
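Putting the geocoding call, the retry logic, and the error handling together, the overall flow can be sketched as a single function. This is an illustrative Python 3 variant and not the script's code: it uses a loop instead of recursion, takes the geocoding callable and the retriable exception types as arguments so it can be exercised without network access, and catches any other exception generically. The name `geocode_row` and the stand-in data are assumptions.

```python
from types import SimpleNamespace

NO_RESULT_MESSAGE = "None location found, please verify your address line"


def geocode_row(geocode, line_address, retriable_errors=(), max_retries=5):
    """Geocode one address line and return the fields appended to the CSV.

    `geocode` is any callable returning an object with `latitude`,
    `longitude` and `raw` attributes, or None when nothing is found.
    Errors listed in `retriable_errors` are retried up to `max_retries`
    times; any remaining error is reported in the "Error" field.
    """
    row = {"Lat": 0, "Long": 0, "Error": "",
           "formatted_address": "", "location_type": ""}
    for attempt in range(max_retries + 1):
        try:
            location = geocode(line_address)
            if location is None:
                raise ValueError(NO_RESULT_MESSAGE)
            row.update({
                "Lat": location.latitude,
                "Long": location.longitude,
                "formatted_address": location.raw["formatted_address"],
                "location_type": location.raw["geometry"]["location_type"],
            })
            return row
        except retriable_errors as error:
            # Intermittent failure: try again unless the budget is spent.
            if attempt == max_retries:
                row["Error"] = str(error)
        except Exception as error:  # quota, configuration, parsing, no result
            row["Error"] = str(error)
            break
    return row


# Stand-in geocoder returning made-up data, so the sketch runs offline.
def fake_geocode(address):
    return SimpleNamespace(
        latitude=1.5, longitude=2.5,
        raw={"formatted_address": "Somewhere",
             "geometry": {"location_type": "ROOFTOP"}})


result = geocode_row(fake_geocode, "Hairhouse Warehouse, Ipswich")
```

With GeoPy, the callable could be a small lambda wrapping `geo_locator.geocode(address, components=component_restrictions)`, with `retriable_errors=(GeocoderQueryError,)` to mirror the script's retry rule.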
Query-per-Second
If you are not a Google Maps Platform customer, you have to wait between two API calls. Otherwise you will get an OVER_QUERY_LIMIT error because of the requests-per-second quota.
# for non-customers, sleep 500 ms between requests
if not GOOGLE_SECRET_KEY:
    time.sleep(0.5)
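A fixed sleep always waits the full delay, even when processing a row already took some time. As an alternative sketch, a small throttle can wait only for the remainder of the interval. The `Throttle` class below is a hypothetical helper written for this example, with an injectable clock and sleep function so it can be tested without real waiting. Recent GeoPy releases also ship a RateLimiter helper in geopy.extra.rate_limiter that serves a similar purpose.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous wait(), if any

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                # Sleep only for what is left of the interval.
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


# Usage: call throttle.wait() right before each geocoding request.
throttle = Throttle(0.5)
```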
Results
Apart from the Latitude and Longitude fields, here are the 3 main results appended to destination CSV file.
Formatted Address
The formatted_address is the human-readable address of the place matched for the original address line. Most of the time it is equivalent to the “postal address”.
Geocoding Accuracy Labels
For each successfully geocoded address, a geocoding accuracy label is returned in the location_type field. It comes from the Geocoding API service (see the Google geocoding Results doc for more information). The following values are currently supported:
- “ROOFTOP” indicates that the returned result is a precise geocode for which we have location information accuracy down to street address precision.
- “RANGE_INTERPOLATED” indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
- “GEOMETRIC_CENTER” indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
- “APPROXIMATE” indicates that the returned result is approximate.
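These labels are convenient for a quick quality check of the output. The sketch below counts the labels and lists the geocoded rows that are not rooftop-precise, so they can be reviewed by hand. Treating only ROOFTOP as precise is an assumption of this example, and `summarize_accuracy` is a hypothetical helper. The sample rows are made up.

```python
from collections import Counter


def summarize_accuracy(rows, precise_types=("ROOFTOP",)):
    """Count location_type values and list geocoded rows worth a manual check."""
    # Rows with a non-empty Error were not geocoded and are ignored here.
    geocoded = [row for row in rows if not row.get("Error")]
    counts = Counter(row.get("location_type", "") for row in geocoded)
    to_review = [row for row in geocoded
                 if row.get("location_type") not in precise_types]
    return counts, to_review


# Made-up output rows for illustration.
sample_rows = [
    {"name": "A", "location_type": "ROOFTOP", "Error": ""},
    {"name": "B", "location_type": "APPROXIMATE", "Error": ""},
    {"name": "C", "location_type": "", "Error": "some error"},
]
counts, to_review = summarize_accuracy(sample_rows)
```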
Error
Whenever the call to the Google Maps API returns no result, the script records the error message “None location found, please verify your address line”. For all other errors, the message raised by GeoPy is written to the field. See GeoPy Exceptions for more details.
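Since failures are recorded per row, the output file can be split into rows that succeeded and rows to fix and run again. A minimal sketch, assuming a semicolon-delimited output and the Error column name used above (`split_failed_rows` and the sample content are hypothetical):

```python
import csv
import io


def split_failed_rows(csv_text, delimiter=";"):
    """Return (succeeded, failed) rows based on the Error column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    succeeded, failed = [], []
    for row in reader:
        # A non-empty Error value marks a row that was not geocoded.
        (failed if row.get("Error") else succeeded).append(row)
    return succeeded, failed


# Made-up output content for illustration.
output_text = (
    "name;Lat;Long;Error\n"
    "A;1.0;2.0;\n"
    "B;0;0;None location found, please verify your address line\n"
)
succeeded, failed = split_failed_rows(output_text)
```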
Useful Links
- This script on the Woosmap GitHub Organization
- GeoPy
- Alternative to GeoPy: Python Client for Google Maps Services
- Google Geocoding Strategies
- Google geocoding Results doc
FAQ
What are the necessary parameters I need to set up to use the GeoPy script for bulk geocoding addresses using Google Maps?
To use the GeoPy script for bulk geocoding addresses using Google Maps, you need to set the following parameters:
- Google API Key: The Server API Key from Google Maps Geocoding API (`GOOGLE_API_KEY`).
- Input CSV File: The path and name of the input CSV file containing the addresses to be geocoded (`INPUT_CSV_FILE`).
- Output CSV File: The path and name of the output CSV file where the geocoded data will be saved (`OUTPUT_CSV_FILE`).
- Address Columns: The names of the columns in the CSV file that contain the address data (`ADDRESS_COLUMNS_NAME`).
- New Columns: The names of the new columns to be appended to the output CSV file (`NEW_COLUMNS_NAME`).
- Delimiter: The delimiter used in the input CSV file (`DELIMITER`).
- Optional Component Restrictions: If needed, define component restrictions for the geocoding query (`COMPONENT_RESTRICTIONS_COLUMNS_NAME`).
How do I handle errors and exceptions, such as exceeding the daily quota limit or failed geocoding, when using the GeoPy script?
To handle errors and exceptions when using GeoPy, you can use the specific exception classes provided by the library. Here are some steps:
- Catch Specific Exceptions: Use `try-except` blocks to catch exceptions like `GeocoderQuotaExceeded`, `GeocoderAuthenticationFailure`, `GeocoderInsufficientPrivileges`, and the generic `GeocoderServiceError`.
    from geopy.exc import GeocoderQuotaExceeded, GeocoderAuthenticationFailure, GeocoderServiceError

    try:
        location = geolocator.geocode("Nieuwerkerk AD ijssel")
    except GeocoderQuotaExceeded as e:
        print(f"Quota exceeded: {e}")
    except GeocoderAuthenticationFailure as e:
        print(f"Authentication failure: {e}")
    except GeocoderServiceError as e:
        print(f"Geocoder service error: {e}")
- Retry Failed Requests: Implement a retry mechanism to handle temporary failures. You can use a loop to retry the request a certain number of times before giving up.
    import time
    from geopy.exc import GeocoderServiceError

    max_retries = 4
    for _ in range(max_retries):
        try:
            location = geolocator.geocode("Nieuwerkerk AD ijssel")
            break
        except GeocoderServiceError as e:
            print(f"Retrying due to error: {e}")
            time.sleep(1)  # Wait before retrying
    else:
        location = None  # every attempt failed
- Adjust Timeout and User-Agent: Ensure you have a reasonable timeout and a valid `User-Agent` header to avoid unnecessary errors.
    import geopy.geocoders

    geopy.geocoders.options.default_timeout = 10
    geopy.geocoders.options.default_user_agent = 'my_app/1'
What additional fields are appended to the output CSV file after geocoding addresses using the GeoPy and Google Maps API?
After geocoding addresses using the GeoPy and Google Maps API, the following additional fields are appended to the output CSV file:
- Latitude
- Longitude
- Location_Type
- Formatted_Address
- Error (if needed, for failed geocoded addresses)
What are the different types of geocoding accuracy labels (e.g., ROOFTOP, RANGE_INTERPOLATED) returned by the Google Maps Geocoding API, and what do they indicate?
The Google Maps Geocoding API returns several types of geocoding accuracy labels:
- `ROOFTOP`: Indicates a precise geocode with location information accurate down to the street address level.
- `RANGE_INTERPOLATED`: Indicates an approximation, usually on a road, interpolated between two precise points (such as intersections).
- `GEOMETRIC_CENTER`: Indicates the geometric center of a result, such as a polyline (street) or polygon (region).
- `APPROXIMATE`: Indicates that the returned result is approximate.