Module Reference

class gtfslite.gtfs.GTFS(agency, stops, routes, trips, stop_times, calendar=None, calendar_dates=None, fare_attributes=None, fare_rules=None, shapes=None, frequencies=None, transfers=None, pathways=None, levels=None, translations=None, feed_info=None, attributions=None)

A representation of a single static GTFS feed and associated data.

All parameters should be valid Pandas DataFrames that follow the structure corresponding to the dataset as defined by the GTFS standard (http://gtfs.org/reference/static).

Parameters
  • agency (pandas.DataFrame) – Transit agencies with service represented in this dataset.

  • stops (pandas.DataFrame) – Stops where vehicles pick up or drop off riders. Also defines stations and station entrances.

  • routes (pandas.DataFrame) – Transit routes. A route is a group of trips that are displayed to riders as a single service.

  • trips (pandas.DataFrame) – Trips for each route. A trip is a sequence of two or more stops that occur during a specific time period.

  • stop_times (pandas.DataFrame) – Times that a vehicle arrives at and departs from stops for each trip.

  • calendar (pandas.DataFrame, conditionally required) – Service dates specified using a weekly schedule with start and end dates. This file is required unless all dates of service are defined in calendar_dates.

  • calendar_dates (pandas.DataFrame, conditionally required) – Exceptions for the services defined in calendar. If calendar is omitted, then calendar_dates is required and must contain all dates of service.

  • fare_attributes (pandas.DataFrame, default None) – Fare information for a transit agency’s routes.

  • fare_rules (pandas.DataFrame, default None) – Rules to apply fares for itineraries.

  • shapes (pandas.DataFrame, default None) – Rules for mapping vehicle travel paths, sometimes referred to as route alignments.

  • frequencies (pandas.DataFrame, default None) – Headway (time between trips) for headway-based service or a compressed representation of fixed-schedule service.

  • transfers (pandas.DataFrame, default None) – Rules for making connections at transfer points between routes.

  • pathways (pandas.DataFrame, default None) – Pathways linking together locations within stations.

  • levels (pandas.DataFrame, default None) – Levels within stations.

  • feed_info (pandas.DataFrame, default None) – Dataset metadata, including publisher, version, and expiration information.

  • translations (pandas.DataFrame, default None) – In regions that have multiple official languages, transit agencies/operators typically have language-specific names and web pages. To best serve riders in those regions, it is useful for the dataset to include these language-dependent values.

  • attributions (pandas.DataFrame, default None) – Dataset attributions.

Raises

FeedNotValidException – If the feed doesn’t contain the required files or is otherwise invalid.

date_trips(date: date) DataFrame

Finds all the trips that occur on a specified day. This method accounts for exceptions included in the calendar_dates dataset.

Parameters

date (datetime.date) – The service day to count trips on

Returns

A dataframe of trips which are run on the provided date.

Return type

DataFrame
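
The interaction between calendar and calendar_dates described above can be sketched with plain Python. This is only an illustration of the rule in the GTFS standard (exception_type 1 adds a service on a date, 2 removes it), not the library's implementation. The sample rows, the _weekly helper, and the function names are hypothetical, and dates are held as datetime.date objects rather than the YYYYMMDD strings found in a raw feed.

```python
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def _weekly(service_id, days):
    """Build a hypothetical calendar row active on the named weekdays in 2024."""
    row = {name: int(name in days) for name in WEEKDAYS}
    row.update(service_id=service_id,
               start_date=date(2024, 1, 1), end_date=date(2024, 12, 31))
    return row

# Hypothetical sample data: one weekday service and one weekend service.
calendar = [_weekly("WKDY", WEEKDAYS[:5]), _weekly("WKND", WEEKDAYS[5:])]
# On Thursday 2024-07-04 the weekday service is removed and the weekend one added.
calendar_dates = [
    {"service_id": "WKDY", "date": date(2024, 7, 4), "exception_type": 2},
    {"service_id": "WKND", "date": date(2024, 7, 4), "exception_type": 1},
]
trips = [
    {"trip_id": "t1", "service_id": "WKDY"},
    {"trip_id": "t2", "service_id": "WKND"},
]

def active_services(day):
    """Return the service_ids running on `day`, after applying exceptions."""
    weekday = WEEKDAYS[day.weekday()]
    active = {
        row["service_id"]
        for row in calendar
        if row[weekday] == 1 and row["start_date"] <= day <= row["end_date"]
    }
    for row in calendar_dates:
        if row["date"] != day:
            continue
        if row["exception_type"] == 1:    # service added for this date
            active.add(row["service_id"])
        elif row["exception_type"] == 2:  # service removed for this date
            active.discard(row["service_id"])
    return active

def trips_on(day):
    """Return the trip_ids whose service runs on `day`."""
    services = active_services(day)
    return [t["trip_id"] for t in trips if t["service_id"] in services]

print(trips_on(date(2024, 7, 4)))  # holiday: only the weekend service runs
print(trips_on(date(2024, 7, 5)))  # ordinary Friday
```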

delete_routes(route_ids: list[str], clean_stops=False)

Delete one or more routes along with their associated trips, stops, shapes, and other data.

This method removes the provided routes from the GTFS by removing all references to the specified route IDs and to the trips that those routes run.

Parameters
  • route_ids (list[str]) – A list of routes to remove. A single route_id can also be provided

  • clean_stops (bool, optional) – Whether or not to remove stops that are no longer served by the GTFS, by default False

classmethod load_zip(filepath, **pandas_kwargs)

Creates a GTFS object from a provided zip file.

For parsing feeds with different encodings, you can pass any Pandas read_csv keyword arguments along.

Parameters

filepath (str) – The path to the zip file

Returns

A GTFS object with loaded data.

Return type

GTFS

route_frequency_matrix(date: date, interval: int = 60, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') DataFrame

Generate a matrix of route headways throughout a given time period.

Produce a matrix of headways by a given interval (in minutes) for each route_id throughout the service period of a given day.

Parameters
  • date (datetime.date) – The service day to analyze

  • interval (int, optional) – The width of each time bin, in minutes, by default 60

  • start_time (str, optional) – The start time of the analysis. Only trips which start after this time will be included. Can be greater than 24:00:00. A None value will consider all trips from the start of the service day, by default None

  • end_time (str, optional) – A string representation (HH:MM:SS) of the end time of the analysis. Only trips which end before this time will be included. Can be greater than 24:00:00. A None value will consider all trips through the end of the service day, by default None

  • time_field (str, optional) – The name of the time column in stop_times to consider, either ‘arrival_time’ or ‘departure_time’. By default ‘arrival_time’

Returns

A dataframe containing the following columns:

  • route_id: The ID of the route

  • bin_start: The start of the time interval measured (HH:MM)

  • bin_end: The end of the time interval measured (HH:MM)

  • trips: The count of the number of trips on that route in that interval

  • frequency: The frequency of trips (in trips/hour) on the route

Return type

pd.DataFrame
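
As a rough illustration of the binning described above, the following sketch counts trips per interval for a single route and converts each count to trips per hour. It is not the library's implementation: the helper names (to_seconds, to_hhmm, frequency_bins), the sample times, and the choice of half-open bins are all assumptions made for the example.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds; hours may exceed 24."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def to_hhmm(total_seconds):
    """Format a number of seconds since midnight as HH:MM."""
    return f"{total_seconds // 3600:02d}:{total_seconds % 3600 // 60:02d}"

def frequency_bins(trip_times, interval=60,
                   start_time="00:00:00", end_time="24:00:00"):
    """Count trips per bin of `interval` minutes and derive trips/hour.

    Assumes the window divides evenly into bins and that a trip belongs
    to the bin where bin_start <= time < bin_end.
    """
    width = interval * 60
    times = [to_seconds(t) for t in trip_times]
    rows = []
    for bin_start in range(to_seconds(start_time), to_seconds(end_time), width):
        bin_end = bin_start + width
        count = sum(1 for t in times if bin_start <= t < bin_end)
        rows.append({
            "bin_start": to_hhmm(bin_start),
            "bin_end": to_hhmm(bin_end),
            "trips": count,
            "frequency": count * 60 / interval,  # trips per hour
        })
    return rows

# Hypothetical trip times for one route.
sample = ["06:00:00", "06:20:00", "06:40:00", "07:30:00"]
for row in frequency_bins(sample, interval=30,
                          start_time="06:00:00", end_time="08:00:00"):
    print(row)
```

Two trips in a 30-minute bin correspond to a frequency of 4 trips/hour, which is why the frequency column scales the count by 60 / interval.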

route_summary(date, route_id)

Assemble a series of attributes summarizing a route on a particular day.

The following columns are returned:

  • route_id: The ID of the route summarized

  • total_trips: The total number of trips made on the route that day

  • first_departure: The earliest departure of the bus for the day

  • last_arrival: The latest arrival of the bus for the day

  • service_time: The total service span of the route, in hours

  • average_headway: Average time in minutes between trips on the route

Parameters
  • date (datetime.date) – The calendar date to summarize.

  • route_id (str) – The ID of the route to summarize

Returns

A series with summary attributes for the date

Return type

pandas.Series

routes_summary(date)

Summarizes all routes in a given day. The columns of the resulting dataset match the columns of route_summary()

Parameters

date (datetime.date) – The day to summarize.

Returns

A pandas.DataFrame object containing the summarized data.

service_hours(date: date, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') float

Computes the total service hours delivered for a specified date within an optionally specified time slice.

This method measures this value by considering partial trips as having stopped at the end of the time slice. In other words, partial trips are included in the total service hours during the specified time slice.

Parameters
  • date (datetime.date) – The date of analysis

  • start_time (str, optional) – The start time in the format HH:MM:SS, by default None

  • end_time (str, optional) – The end time in the format HH:MM:SS, by default None

  • time_field ({'arrival_time', 'departure_time'}, optional) – The time field to use for the calculation, by default ‘arrival_time’

Returns

The total service hours in the specified period.

Return type

float

Raises

DateNotValidException – The date falls outside of the feed’s span.
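
The treatment of partial trips can be sketched as clipping each trip to the time slice before summing. This is a minimal illustration, not the library's code: clipped_service_hours and the sample trips are hypothetical, and clipping at the start of the slice (as well as the end) is an assumption made for symmetry.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds; hours may exceed 24."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def clipped_service_hours(trips, start_time=None, end_time=None):
    """Sum trip durations in hours, counting only the part inside the slice.

    `trips` is a list of (first_time, last_time) HH:MM:SS pairs.
    A None bound leaves that side of the slice open.
    """
    lower = to_seconds(start_time) if start_time is not None else float("-inf")
    upper = to_seconds(end_time) if end_time is not None else float("inf")
    total_seconds = 0
    for first, last in trips:
        begin = max(to_seconds(first), lower)
        finish = min(to_seconds(last), upper)
        if finish > begin:  # skip trips that fall entirely outside the slice
            total_seconds += finish - begin
    return total_seconds / 3600

# Hypothetical trips, including one that runs past midnight.
sample_trips = [
    ("06:00:00", "07:00:00"),
    ("07:30:00", "08:30:00"),
    ("23:30:00", "24:30:00"),
]
print(clipped_service_hours(sample_trips))
print(clipped_service_hours(sample_trips, "07:00:00", "08:00:00"))
```

In the second call only the 07:30 trip overlaps the slice, and it contributes the half hour that falls before 08:00.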

stop_summary(stop_id: str, start_time: str = None, end_time: str = None) Series

Assemble a series of attributes summarizing a stop on a particular day. The following columns are returned:

  • stop_id: The ID of the stop summarized

  • total_visits: The total number of times a stop is visited

  • first_arrival: The earliest arrival of the bus for the day

  • last_arrival: The latest arrival of the bus for the day

  • service_time: The total service span, in hours

  • average_headway: Average time in minutes between arrivals

Parameters

stop_id (str) – The ID of the stop to summarize

stop_times_at_stop(stop_id: str, date: date, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') DataFrame

Get the stop times that visit a particular stop over a day or a subset of the day.

Parameters
  • stop_id (str) – The stop ID to analyse

  • date (datetime.date) – The calendar date to analyse

  • start_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips from the start of the service day, by default None

  • end_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips through the end of the service day, by default None

  • time_field (str, optional) – The name of the time column in stop_times to consider, either ‘arrival_time’ or ‘departure_time’. By default ‘arrival_time’

Returns

A copy of the stop_times dataframe containing the filtered stop events.

Return type

pd.DataFrame

summary() Series

Assemble a series of attributes summarizing the GTFS feed with the following columns:

  • agencies: list of agencies in feed

  • total_stops: the total number of stops in the feed

  • total_routes: the total number of routes in the feed

  • total_trips: the total number of trips in the feed

  • total_stops_made: the total number of stop_times events

  • first_date: (datetime.date) the first date the feed is valid for

  • last_date: (datetime.date) the last date the feed is valid for

  • total_shapes (optional): the total number of shapes.

Returns

A Pandas series containing the required data.

Return type

pandas.Series

trip_distribution(start_date, end_date)

Summarize the distribution of service by day of week for a given date range. Repeated days of the week will be counted multiple times.

Parameters
  • start_date (datetime.date) – The start date for the summary (inclusive)

  • end_date (datetime.date) – The end date for the summary

Returns

A series containing as indices the days of the week and as values the total number of trips found in the time slice provided.

Return type

pandas.Series
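
The accumulation by day of week can be sketched as a loop over the date range. This is an illustration under stated assumptions rather than the library's implementation: trips_on is a hypothetical stand-in for a per-day trip count, the function returns a plain dict instead of a pandas.Series, and the end date is treated as inclusive, which the description above does not confirm.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_trip_totals(start_date, end_date, trips_on):
    """Total trips per day of week between start_date and end_date.

    `trips_on` is any callable that returns the number of trips for a date.
    Days of the week that occur more than once are counted each time.
    """
    totals = Counter()
    day = start_date
    while day <= end_date:  # assumption: end_date is inclusive
        totals[DAY_NAMES[day.weekday()]] += trips_on(day)
        day += timedelta(days=1)
    return dict(totals)

def example_trips_on(day):
    """Hypothetical schedule: 100 trips on weekdays, 40 on weekends."""
    return 40 if day.weekday() >= 5 else 100

# Nine days starting on a Monday, so Monday and Tuesday occur twice.
print(weekday_trip_totals(date(2024, 1, 1), date(2024, 1, 9), example_trips_on))
```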

unique_trip_count_at_stops(stop_ids: list, date: date, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') int

Get a count of unique trips that visit a given set of stops

This function counts the distinct trips that stop at _any_ of the stops provided within the provided times.

Parameters
  • stop_ids (list) – A list of stop_ids to check

  • date (datetime.date) – The service day to check

  • start_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips from the start of the service day, by default None

  • end_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips through the end of the service day, by default None

  • time_field (str, optional) – The name of the time column in stop_times to consider, either ‘arrival_time’ or ‘departure_time’. By default ‘arrival_time’

Returns

An integer specifying the total number of unique trips that visit the supplied set of stops.

Return type

int

Notes

Not all GTFS datasets include arrival and/or departure times for every stop. In cases where times are only set at time points, and no interpolation is provided, this function will not work.

valid_date(date_to_check: date)

Checks whether the provided date falls within the feed’s date range.

Note that this does not check whether any trips run on a given date, only whether or not the calendar and calendar dates files span or include the provided date in their service.

Parameters

date_to_check (datetime.date) – The date to validate against the feed

Returns

Whether the date is valid or not.

Return type

bool
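
The distinction drawn in the note above, between a date lying inside the feed's span and a date actually having service, can be sketched as follows. This is a hypothetical illustration and not the library's logic: feed_span and in_feed_span are invented names, and treating the span as one continuous range from the earliest to the latest date in either file is an assumption.

```python
from datetime import date

def feed_span(calendar_ranges, exception_dates):
    """Return (first, last) date covered by calendar ranges and exception dates."""
    starts = [start for start, _ in calendar_ranges] + list(exception_dates)
    ends = [end for _, end in calendar_ranges] + list(exception_dates)
    return min(starts), max(ends)

def in_feed_span(day, calendar_ranges, exception_dates):
    """True if `day` lies within the span, whether or not any trips run."""
    first, last = feed_span(calendar_ranges, exception_dates)
    return first <= day <= last

# Hypothetical feed: a weekly schedule through June plus one extra service date.
ranges = [(date(2024, 1, 1), date(2024, 6, 30))]
extras = [date(2024, 7, 4)]

print(in_feed_span(date(2024, 7, 2), ranges, extras))  # inside the span, no service
print(in_feed_span(date(2024, 7, 5), ranges, extras))  # after the last date
```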

write_zip(filepath, include_optional=True)

Write the current GTFS to a zip file.

Parameters
  • filepath (str) – The path to write the zip file to (should have a .zip extension)

  • include_optional (bool, optional) – Whether or not to include files marked optional by the GTFS spec, by default True