Module Reference
- class gtfslite.gtfs.GTFS(agency, stops, routes, trips, stop_times, calendar=None, calendar_dates=None, fare_attributes=None, fare_rules=None, shapes=None, frequencies=None, transfers=None, pathways=None, levels=None, translations=None, feed_info=None, attributions=None)
A representation of a single static GTFS feed and associated data.
All parameters should be valid Pandas DataFrames that follow the structure corresponding to the dataset as defined by the GTFS standard (http://gtfs.org/reference/static).
- Parameters
agency (pandas.DataFrame) – Transit agencies with service represented in this dataset.
stops (pandas.DataFrame) – Stops where vehicles pick up or drop off riders. Also defines stations and station entrances.
routes (pandas.DataFrame) – Transit routes. A route is a group of trips that are displayed to riders as a single service.
trips (pandas.DataFrame) – Trips for each route. A trip is a sequence of two or more stops that occur during a specific time period.
stop_times (pandas.DataFrame) – Times that a vehicle arrives at and departs from stops for each trip.
calendar (pandas.DataFrame, conditionally required) – Service dates specified using a weekly schedule with start and end dates. This file is required unless all dates of service are defined in calendar_dates.
calendar_dates (pandas.DataFrame, conditionally required) – Exceptions for the services defined in calendar. If calendar is omitted, then calendar_dates is required and must contain all dates of service.
fare_attributes (pandas.DataFrame, default None) – Fare information for a transit agency’s routes.
fare_rules (pandas.DataFrame, default None) – Rules to apply fares for itineraries.
shapes (pandas.DataFrame, default None) – Rules for mapping vehicle travel paths, sometimes referred to as route alignments.
frequencies (pandas.DataFrame, default None) – Headway (time between trips) for headway-based service or a compressed representation of fixed-schedule service.
transfers (pandas.DataFrame, default None) – Rules for making connections at transfer points between routes.
pathways (pandas.DataFrame, default None) – Pathways linking together locations within stations.
levels (pandas.DataFrame, default None) – Levels within stations.
feed_info (pandas.DataFrame, default None) – Dataset metadata, including publisher, version, and expiration information.
translations (pandas.DataFrame, default None) – In regions that have multiple official languages, transit agencies/operators typically have language-specific names and web pages. To best serve riders in those regions, it is useful for the dataset to include these language-dependent values.
attributions (pandas.DataFrame, default None) – Dataset attributions.
- Raises
FeedNotValidException – If the feed doesn’t contain the required files or is otherwise invalid.
- date_trips(date: date) DataFrame
Finds all the trips that occur on a specified day. This method accounts for exceptions included in the calendar_dates dataset.
- Parameters
date (datetime.date) – The service day to find trips for
- Returns
A dataframe of trips which are run on the provided date.
- Return type
DataFrame
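The way calendar and calendar_dates combine can be sketched in plain Python. This follows the GTFS rules (a weekly pattern bounded by start_date and end_date, with exception_type 1 adding service and 2 removing it); it is an illustration, not gtfslite's implementation. The helper active_service_ids and the dict-based rows are hypothetical stand-ins for the DataFrames.

```python
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def active_service_ids(calendar, calendar_dates, day):
    """Return the set of service_ids running on `day` (hypothetical helper)."""
    stamp = day.strftime("%Y%m%d")        # GTFS dates are YYYYMMDD strings
    weekday = WEEKDAYS[day.weekday()]     # date.weekday(): Monday == 0
    active = {
        row["service_id"]
        for row in calendar
        if row["start_date"] <= stamp <= row["end_date"] and row[weekday] == 1
    }
    for row in calendar_dates:
        if row["date"] != stamp:
            continue
        if row["exception_type"] == 1:    # service added for this date
            active.add(row["service_id"])
        elif row["exception_type"] == 2:  # service removed for this date
            active.discard(row["service_id"])
    return active


calendar = [{"service_id": "WKDY", "start_date": "20240101", "end_date": "20241231",
             "monday": 1, "tuesday": 1, "wednesday": 1, "thursday": 1,
             "friday": 1, "saturday": 0, "sunday": 0}]
calendar_dates = [
    {"service_id": "WKDY", "date": "20240704", "exception_type": 2},
    {"service_id": "HOLIDAY", "date": "20240704", "exception_type": 1},
]
trips = [{"trip_id": "t1", "service_id": "WKDY"},
         {"trip_id": "t2", "service_id": "HOLIDAY"}]

# On the holiday, the weekday service is removed and the holiday service added.
services = active_service_ids(calendar, calendar_dates, date(2024, 7, 4))
trips_on_day = [t["trip_id"] for t in trips if t["service_id"] in services]
```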
- delete_routes(route_ids: list[str], clean_stops=False)
Delete one or more routes along with their associated trips, stops, shapes, and other data.
This method removes the provided routes from the GTFS by removing all references to the given route IDs and the trips that run on those routes.
- Parameters
route_ids (list[str]) – A list of routes to remove. A single route_id can also be provided
clean_stops (bool, optional) – Whether or not to remove stops that are no longer served by the GTFS, by default False
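A rough sketch of the cascade this kind of deletion implies, written against plain lists of dicts rather than DataFrames. The function drop_routes is hypothetical and simplified: it ignores shapes, frequencies, fare rules, and parent stations, all of which a complete implementation would need to consider.

```python
def drop_routes(feed, route_ids, clean_stops=False):
    """Drop routes and their dependent rows from a dict of row lists."""
    if isinstance(route_ids, str):        # accept a single route_id as well
        route_ids = [route_ids]
    doomed = set(route_ids)

    # Trips on the removed routes are identified first; their IDs drive the rest.
    doomed_trips = {t["trip_id"] for t in feed["trips"] if t["route_id"] in doomed}
    feed["routes"] = [r for r in feed["routes"] if r["route_id"] not in doomed]
    feed["trips"] = [t for t in feed["trips"] if t["trip_id"] not in doomed_trips]
    feed["stop_times"] = [s for s in feed["stop_times"]
                          if s["trip_id"] not in doomed_trips]

    if clean_stops:
        # Keep only stops that a remaining stop_time still references.
        served = {s["stop_id"] for s in feed["stop_times"]}
        feed["stops"] = [s for s in feed["stops"] if s["stop_id"] in served]
    return feed


feed = {
    "routes": [{"route_id": "R1"}, {"route_id": "R2"}],
    "trips": [{"trip_id": "t1", "route_id": "R1"},
              {"trip_id": "t2", "route_id": "R2"}],
    "stop_times": [{"trip_id": "t1", "stop_id": "A"},
                   {"trip_id": "t1", "stop_id": "B"},
                   {"trip_id": "t2", "stop_id": "B"}],
    "stops": [{"stop_id": "A"}, {"stop_id": "B"}],
}
feed = drop_routes(feed, "R1", clean_stops=True)
```

With clean_stops left at False, stop A would remain in the stops table even though no trip serves it any longer.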
- classmethod load_zip(filepath, **pandas_kwargs)
Creates a GTFS object from a provided zip file.
For parsing feeds with different encodings, you can pass any Pandas read_csv keyword arguments along.
- Parameters
filepath (str) – The path to the zip file
- Returns
A GTFS object with loaded data.
- Return type
GTFS
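Because keyword arguments are forwarded to Pandas read_csv, a feed in a non-UTF-8 encoding can presumably be loaded by passing an encoding keyword to load_zip. The sketch below shows the underlying idea with only the standard library: it reads every .txt member of a zip into rows using a caller-chosen encoding. The helper read_tables and the in-memory sample archive are hypothetical and are not part of gtfslite.

```python
import csv
import io
import zipfile


def read_tables(zip_source, encoding="utf-8-sig"):
    """Read every .txt member of a GTFS zip into a list of dict rows."""
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            with archive.open(name) as raw:
                # Decode the member with the requested encoding before parsing.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                tables[name[:-4]] = list(csv.DictReader(text))
    return tables


# Build a tiny Latin-1 archive in memory so the sketch runs without a real feed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    sample = "agency_id,agency_name\nA1,Caf\u00e9 Transit\n"
    archive.writestr("agency.txt", sample.encode("latin-1"))
buffer.seek(0)

tables = read_tables(buffer, encoding="latin-1")
```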
- route_frequency_matrix(date: date, interval: int = 60, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') DataFrame
Generate a matrix of route headways throughout a given time period.
Produce a matrix of headways by a given interval (in minutes) for each route_id throughout the service period of a given day.
- Parameters
date (datetime.date) – The service day to analyze
interval (int, optional) – The length of each time bin, in minutes, by default 60
start_time (str, optional) – A string representation (HH:MM:SS) of the start time of the analysis. Only trips which start after this time will be included. Can be greater than 24:00:00. A None value will consider all trips from the start of the service day, by default None
end_time (str, optional) – A string representation (HH:MM:SS) of the end time of the analysis. Only trips which end before this time will be included. Can be greater than 24:00:00. A None value will consider all trips through the end of the service day, by default None
time_field (str, optional) – The name of the time column in stop_times to consider, either ‘arrival_time’ or ‘departure_time’. By default ‘arrival_time’
- Returns
A dataframe containing the following columns:
route_id: The ID of the route
bin_start: The start of the time interval measured (HH:MM)
bin_end: The end of the time interval measured (HH:MM)
trips: The count of the number of trips on that route in that interval
frequency: The frequency of trips (in trips/hour) on the route
- Return type
pd.DataFrame
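The binning arithmetic behind such a matrix can be sketched as follows. This is an illustration under stated assumptions, not the library's code: each trip is assigned to a bin by a single start time, and frequency is taken as trips × 60 / interval. The names to_minutes and frequency_rows are hypothetical.

```python
from collections import Counter


def to_minutes(hhmmss):
    """Convert 'HH:MM:SS' (hours may exceed 24) to whole minutes since midnight."""
    hours, minutes, _seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 60 + minutes


def frequency_rows(trip_starts, interval=60):
    """trip_starts: iterable of (route_id, 'HH:MM:SS') pairs, one per trip."""
    counts = Counter(
        (route_id, to_minutes(start) // interval) for route_id, start in trip_starts
    )
    rows = []
    for (route_id, bin_index), trips in sorted(counts.items()):
        start, end = bin_index * interval, (bin_index + 1) * interval
        rows.append({
            "route_id": route_id,
            "bin_start": f"{start // 60:02d}:{start % 60:02d}",
            "bin_end": f"{end // 60:02d}:{end % 60:02d}",
            "trips": trips,
            "frequency": trips * 60 / interval,   # trips per hour
        })
    return rows


rows = frequency_rows(
    [("10", "07:05:00"), ("10", "07:20:00"), ("10", "07:50:00"), ("10", "25:10:00")],
    interval=30,
)
```

Here two trips fall in the 07:00 to 07:30 bin, giving a frequency of 4 trips/hour, and the trip at 25:10:00 lands in a bin that starts at 25:00 on the same service day.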
- route_summary(date, route_id)
Assemble a series of attributes summarizing a route on a particular day.
The following columns are returned:
route_id: The ID of the route summarized
total_trips: The total number of trips made on the route that day
first_departure: The earliest departure of the bus for the day
last_arrival: The latest arrival of the bus for the day
service_time: The total service span of the route, in hours
average_headway: Average time in minutes between trips on the route
- Parameters
date (datetime.date) – The calendar date to summarize.
route_id (str) – The ID of the route to summarize
- Returns
A series with summary attributes for the date
- Return type
pandas.Series
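One plausible way to derive the headway figure is to average the gaps between consecutive departures. The sketch below does that for a list of first-stop departure times; it is an assumption about the calculation, and the library may define the average differently (for example, span divided by trip count). The function summarize_departures is hypothetical and expects at least one departure; its span_hours runs from first to last departure, whereas service_time above ends at the last arrival.

```python
def summarize_departures(departures):
    """departures: first-stop departure times of a route's trips, as 'HH:MM:SS'."""
    seconds = sorted(
        int(h) * 3600 + int(m) * 60 + int(s)
        for h, m, s in (t.split(":") for t in departures)
    )
    # Gaps between each departure and the next one, in seconds.
    gaps = [later - earlier for earlier, later in zip(seconds, seconds[1:])]
    return {
        "total_trips": len(seconds),
        "span_hours": (seconds[-1] - seconds[0]) / 3600,
        "average_headway": sum(gaps) / len(gaps) / 60 if gaps else None,
    }


summary = summarize_departures(["06:00:00", "06:30:00", "07:30:00"])
```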
- routes_summary(date)
Summarizes all routes in a given day. The columns of the resulting dataset match the columns of route_summary().
- Parameters
date (datetime.date) – The day to summarize.
- Returns
A pandas.DataFrame object containing the summarized data.
- service_hours(date: date, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') float
Computes the total service hours delivered for a specified date within an optionally specified time slice.
This method measures this value by considering partial trips as having stopped at the end of the time slice. In other words, partial trips are included in the total service hours during the specified time slice.
- Parameters
date (datetime.date) – The date of analysis
start_time (str, optional) – The start time in the format HH:MM:SS, by default None
end_time (str, optional) – The end time in the format HH:MM:SS, by default None
time_field ({'arrival_time', 'departure_time'}, optional) – The time field to use for the calculation, by default ‘arrival_time’
- Returns
The total service hours in the specified period.
- Return type
float
- Raises
DateNotValidException – The date falls outside of the feed’s span.
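The treatment of partial trips can be pictured as clipping each trip's time span to the slice before summing. The following is a minimal sketch of that idea, assuming each trip is reduced to its first and last stop time; clipped_hours is a hypothetical name and this is not the library's implementation.

```python
def clipped_hours(trips, start_time=None, end_time=None):
    """trips: list of (first_time, last_time) 'HH:MM:SS' pairs, one per trip."""
    def secs(t):
        h, m, s = (int(x) for x in t.split(":"))
        return h * 3600 + m * 60 + s

    lo = secs(start_time) if start_time else float("-inf")
    hi = secs(end_time) if end_time else float("inf")
    total = 0
    for first, last in trips:
        # Clip the trip to the slice; a trip outside the slice contributes nothing.
        begin, end = max(secs(first), lo), min(secs(last), hi)
        if end > begin:
            total += end - begin
    return total / 3600


trips = [("07:00:00", "08:00:00"), ("08:30:00", "09:30:00"), ("10:00:00", "11:00:00")]
whole_day = clipped_hours(trips)
morning_slice = clipped_hours(trips, "07:30:00", "09:00:00")
```

In the slice from 07:30 to 09:00, the first two trips each contribute half an hour and the third contributes nothing.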
- stop_summary(stop_id: str, start_time: str = None, end_time: str = None) Series
Assemble a series of attributes summarizing a stop on a particular day. The following columns are returned:
stop_id: The ID of the stop summarized
total_visits: The total number of times a stop is visited
first_arrival: The earliest arrival of the bus for the day
last_arrival: The latest arrival of the bus for the day
service_time: The total service span, in hours
average_headway: Average time in minutes between arrivals
- Parameters
stop_id (str) – The ID of the stop to summarize
- stop_times_at_stop(stop_id: str, date: date, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') DataFrame
Get the stop times that visit a particular stop over a day or a subset of the day.
- Parameters
stop_id (str) – The stop ID to analyse
date (datetime.date) – The calendar date to analyse
start_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips from the start of the service day, by default None
end_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips through the end of the service day, by default None
time_field (str, optional) – The name of the time column in stop_times to consider, either ‘arrival_time’ or ‘departure_time’. By default ‘arrival_time’
- Returns
A copy of the stop_times dataframe containing the filtered stop events.
- Return type
pd.DataFrame
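Since GTFS times may exceed 24:00:00 and may be written as H:MM:SS, comparing them as numbers is safer than comparing the raw strings. The sketch below filters plain dict rows that way. It leaves out the date filter, treats both bounds as inclusive, and skips rows without a time; all three choices are assumptions, and the helper names are hypothetical.

```python
def seconds_since_midnight(hhmmss):
    """Parse 'HH:MM:SS' or 'H:MM:SS'; hours may be 24 or more."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def filter_stop_times(stop_times, stop_id, start_time=None, end_time=None,
                      time_field="arrival_time"):
    selected = []
    for row in stop_times:
        if row["stop_id"] != stop_id or not row[time_field]:
            continue                      # wrong stop, or no time to compare
        t = seconds_since_midnight(row[time_field])
        if start_time is not None and t < seconds_since_midnight(start_time):
            continue
        if end_time is not None and t > seconds_since_midnight(end_time):
            continue
        selected.append(dict(row))        # copy, so the source rows stay untouched
    return selected


stop_times = [
    {"trip_id": "t1", "stop_id": "S1", "arrival_time": "9:15:00"},
    {"trip_id": "t2", "stop_id": "S1", "arrival_time": "23:50:00"},
    {"trip_id": "t3", "stop_id": "S1", "arrival_time": "25:10:00"},
    {"trip_id": "t4", "stop_id": "S2", "arrival_time": "24:30:00"},
]
late = filter_stop_times(stop_times, "S1", "23:00:00", "26:00:00")
```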
- summary() Series
Assemble a series of attributes summarizing the GTFS feed with the following columns:
agencies: list of agencies in feed
total_stops: the total number of stops in the feed
total_routes: the total number of routes in the feed
total_trips: the total number of trips in the feed
total_stops_made: the total number of stop_times events
first_date: (datetime.date) the first date the feed is valid for
last_date: (datetime.date) the last date the feed is valid for
total_shapes (optional): the total number of shapes.
- Returns
A Pandas series containing the required data.
- Return type
pandas.Series
- trip_distribution(start_date, end_date)
Summarize the distribution of service by day of week for a given date range. Repeated days of the week will be counted multiple times.
- Parameters
start_date (datetime.date) – The start date for the summary (inclusive)
end_date (datetime.date) – The end date for the summary
- Returns
A series containing as indices the days of the week and as values the total number of trips found in the time slice provided.
- Return type
pandas.Series
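The accumulation by day of week can be sketched as a loop over the date range. In this sketch the per-day trip count is supplied by a callable (with gtfslite it could presumably come from the length of date_trips), and the end date is treated as inclusive, which is an assumption. The name weekday_distribution is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_distribution(start_date, end_date, trips_on):
    """trips_on: callable mapping a date to the number of trips run that day."""
    totals = Counter()
    day = start_date
    while day <= end_date:                # end date inclusive: an assumption
        totals[DAY_NAMES[day.weekday()]] += trips_on(day)
        day += timedelta(days=1)
    return totals


# Toy schedule: 10 trips on weekdays, 4 on weekends.
totals = weekday_distribution(
    date(2024, 1, 1), date(2024, 1, 8),
    lambda d: 10 if d.weekday() < 5 else 4,
)
```

The range covers two Mondays, so Monday's total is double that of the other weekdays, which is the repeated-day behavior described above.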
- unique_trip_count_at_stops(stop_ids: list, date: date, start_time: str = None, end_time: str = None, time_field: str = 'arrival_time') int
Get a count of unique trips that visit a given set of stops.
This function counts the trips that stop at _any_ of the stops provided, within the provided times. A trip that visits several of the stops is counted once.
- Parameters
stop_ids (list) – A list of stop_ids to check
date (datetime.date) – The service day to check
start_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips from the start of the service day, by default None
end_time (str, optional) – A string representation (HH:MM:SS) of the number of hours since midnight on the analysis date. Can be greater than 24:00:00. A None value will consider all trips through the end of the service day, by default None
time_field (str, optional) – The name of the time column in stop_times to consider, either ‘arrival_time’ or ‘departure_time’. By default ‘arrival_time’
- Returns
An integer specifying the total number of unique trips that visit the supplied set of stops.
- Return type
int
Notes
Not all GTFS datasets include arrival and/or departure times for every stop. In cases where times are only set at time points, and no interpolation is provided, this function will not work.
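A small sketch of the de-duplication, which also shows why untimed stops cause trouble: a row with no time cannot be placed in a time window, so the trip it belongs to can be missed. The function unique_trip_count is a hypothetical stand-in operating on plain dict rows; the date and time-window filters are left out for brevity.

```python
def unique_trip_count(stop_times, stop_ids, time_field="arrival_time"):
    wanted = set(stop_ids)
    trips = set()
    for row in stop_times:
        if row["stop_id"] not in wanted:
            continue
        if not row[time_field]:
            continue                      # untimed stop: cannot be placed in a window
        trips.add(row["trip_id"])         # a set counts each trip once
    return len(trips)


stop_times = [
    {"trip_id": "t1", "stop_id": "A", "arrival_time": "08:00:00"},
    {"trip_id": "t1", "stop_id": "B", "arrival_time": "08:10:00"},
    {"trip_id": "t2", "stop_id": "B", "arrival_time": "09:00:00"},
    {"trip_id": "t3", "stop_id": "B", "arrival_time": ""},   # not a time point
    {"trip_id": "t4", "stop_id": "C", "arrival_time": "09:30:00"},
]
count = unique_trip_count(stop_times, ["A", "B"])
```

Trip t1 visits both stops but is counted once, and t3 is missed because its stop at B carries no time.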
- valid_date(date_to_check: date)
Checks whether the provided date falls within the feed’s date range.
Note that this does not check whether any trips run on a given date, only whether or not the calendar and calendar dates files span or include the provided date in their service.
- Parameters
date_to_check (datetime.date) – A date or object to be validated against the feed
- Returns
Whether the date is valid or not.
- Return type
bool
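The distinction between a date being inside the feed's span and a date having service can be illustrated as below. This is one plausible reading of the check, not the library's code: a date counts as valid if it falls inside any calendar range or appears in calendar_dates. The name in_feed_span is hypothetical.

```python
from datetime import date


def in_feed_span(calendar, calendar_dates, day):
    stamp = day.strftime("%Y%m%d")        # GTFS dates are YYYYMMDD strings
    in_calendar = any(r["start_date"] <= stamp <= r["end_date"] for r in calendar)
    in_exceptions = any(r["date"] == stamp for r in calendar_dates)
    return in_calendar or in_exceptions


# A weekday-only service: Saturdays are inside the span but have no trips.
calendar = [{"service_id": "WKDY", "start_date": "20240101", "end_date": "20241231",
             "monday": 1, "tuesday": 1, "wednesday": 1, "thursday": 1,
             "friday": 1, "saturday": 0, "sunday": 0}]

saturday_ok = in_feed_span(calendar, [], date(2024, 7, 6))
next_year_ok = in_feed_span(calendar, [], date(2025, 1, 1))
```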
- write_zip(filepath, include_optional=True)
Write the current GTFS into a zipfile.
- Parameters
filepath (str) – The filepath to write the zip to (should be a .zip extension)
include_optional (bool, optional) – Whether or not to include files marked optional by the GTFS spec, by default True
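The effect of include_optional can be sketched with the standard library. Which tables count as optional is an assumption here: the sketch treats the five tables that are required constructor arguments, plus the two calendar tables, as core, and everything else as optional. The names CORE_TABLES and write_tables are hypothetical.

```python
import csv
import io
import zipfile

# Assumption: required constructor arguments plus the calendar tables are "core".
CORE_TABLES = {"agency", "stops", "routes", "trips", "stop_times",
               "calendar", "calendar_dates"}


def write_tables(tables, target, include_optional=True):
    """Write each non-empty table (a list of dict rows) as <name>.txt in a zip."""
    with zipfile.ZipFile(target, "w") as archive:
        for name, rows in tables.items():
            if not rows:
                continue
            if not include_optional and name not in CORE_TABLES:
                continue
            text = io.StringIO()
            writer = csv.DictWriter(text, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
            archive.writestr(f"{name}.txt", text.getvalue())


tables = {
    "agency": [{"agency_id": "A1", "agency_name": "Demo Transit"}],
    "shapes": [{"shape_id": "s1", "shape_pt_lat": "0.0", "shape_pt_lon": "0.0",
                "shape_pt_sequence": "1"}],
}
buffer = io.BytesIO()
write_tables(tables, buffer, include_optional=False)
written = zipfile.ZipFile(buffer).namelist()
```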