GSP-341: Create ML Models with BigQuery ML
Overview
Please follow these commands:
Task 1: Create a dataset to store your machine learning models
In Cloud Shell
1bq mk austin
Navigation Menu -> BigQuery.
Task 2: Create a forecasting BigQuery machine learning model
In BigQuery Console Query Editor
1CREATE OR REPLACE MODEL austin.location_model
2OPTIONS
3 (model_type='linear_reg', labels=['duration_minutes']) AS
4SELECT
5 start_station_name,
6 EXTRACT(HOUR FROM start_time) AS start_hour,
7 EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
8 duration_minutes,
9 address as location
10FROM
11 `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
12JOIN
13 `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
14ON
15 trips.start_station_name = stations.name
16WHERE
17 EXTRACT(YEAR FROM start_time) = <replace year>
18 AND duration_minutes > 0
Task 3: Create the second machine learning model
In BigQuery Console Query Editor
Query - 1
1CREATE OR REPLACE MODEL austin.subscriber_model
2OPTIONS
3 (model_type='linear_reg', labels=['duration_minutes']) AS
4SELECT
5 start_station_name,
6 EXTRACT(HOUR FROM start_time) AS start_hour,
7 subscriber_type,
8 duration_minutes
9FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
10WHERE EXTRACT(YEAR FROM start_time) = <replace year>
Query - 2
1CREATE OR REPLACE MODEL austin.subscriber_model
2OPTIONS
3 (model_type='linear_reg', labels=['duration_minutes']) AS
4SELECT
5 start_station_name,
6 EXTRACT(HOUR FROM start_time) AS start_hour,
7 subscriber_type,
8 duration_minutes
9FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
10WHERE EXTRACT(YEAR FROM start_time) = <replace year>
Task 4: Evaluate the two machine learning models
In BigQuery Console Query Editor
Query - 1
1-- Evaluation metrics for location_model
2SELECT
3 SQRT(mean_squared_error) AS rmse,
4 mean_absolute_error
5FROM
6 ML.EVALUATE(MODEL austin.location_model, (
7 SELECT
8 start_station_name,
9 EXTRACT(HOUR FROM start_time) AS start_hour,
10 EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
11 duration_minutes,
12 address as location
13 FROM
14 `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
15 JOIN
16 `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
17 ON
18 trips.start_station_name = stations.name
19 WHERE EXTRACT(YEAR FROM start_time) = <replace year> )
20)
21
22
Query - 2
1-- Evaluation metrics for subscriber_model
2SELECT
3 SQRT(mean_squared_error) AS rmse,
4 mean_absolute_error
5FROM
6 ML.EVALUATE(MODEL austin.subscriber_model, (
7 SELECT
8 start_station_name,
9 EXTRACT(HOUR FROM start_time) AS start_hour,
10 subscriber_type,
11 duration_minutes
12 FROM
13 `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
14 WHERE
15 EXTRACT(YEAR FROM start_time) = <replace year>)
16)
Task 5: Use the subscriber type machine learning model to predict average trip durations
In BigQuery Console Query Editor
Query - 1
1SELECT
2 start_station_name,
3 COUNT(*) AS trips
4FROM
5 `bigquery-public-data.austin_bikeshare.bikeshare_trips`
6WHERE
7 EXTRACT(YEAR FROM start_time) = <replace year>
8GROUP BY
9 start_station_name
10ORDER BY
11 trips DESC
Query - 2
1SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length
2FROM ML.predict(MODEL austin.subscriber_model, (
3SELECT
4 start_station_name,
5 EXTRACT(HOUR FROM start_time) AS start_hour,
6 subscriber_type,
7 duration_minutes
8FROM
9 `bigquery-public-data.austin_bikeshare.bikeshare_trips`
10WHERE
11 EXTRACT(YEAR FROM start_time) = <replace year>
12 AND subscriber_type = 'Single Trip'
13 AND start_station_name = '21st & Speedway @PCL'))
Congratulations, you're all done with the lab 😄