GSP-341: Create ML Models with BigQuery ML

GSP-341: Create ML Models with BigQuery ML

Overview

Please follow these commands:

Task 1: Create a dataset to store your machine learning models

In Cloud Shell

1bq mk austin

Navigation Menu -> BigQuery.

Task 2: Create a forecasting BigQuery machine learning model

In BigQuery Console Query Editor

 1CREATE OR REPLACE MODEL austin.location_model
 2OPTIONS
 3  (model_type='linear_reg', labels=['duration_minutes']) AS
 4SELECT
 5    start_station_name,
 6    EXTRACT(HOUR FROM start_time) AS start_hour,
 7    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
 8    duration_minutes,
 9    address as location
10FROM
11    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
12JOIN
13    `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
14ON
15    trips.start_station_name = stations.name
16WHERE
17    EXTRACT(YEAR FROM start_time) = <replace year>
18    AND duration_minutes > 0

Task 3: Create the second machine learning model

In BigQuery Console Query Editor

Query - 1

 1CREATE OR REPLACE MODEL austin.subscriber_model
 2OPTIONS
 3  (model_type='linear_reg', labels=['duration_minutes']) AS
 4SELECT
 5    start_station_name,
 6    EXTRACT(HOUR FROM start_time) AS start_hour,
 7    subscriber_type,
 8    duration_minutes
 9FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
10WHERE EXTRACT(YEAR FROM start_time) = <replace year>

Query - 2

 1CREATE OR REPLACE MODEL austin.subscriber_model
 2OPTIONS
 3  (model_type='linear_reg', labels=['duration_minutes']) AS
 4SELECT
 5    start_station_name,
 6    EXTRACT(HOUR FROM start_time) AS start_hour,
 7    subscriber_type,
 8    duration_minutes
 9FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
10WHERE EXTRACT(YEAR FROM start_time) = <replace year>

Task 4: Evaluate the two machine learning models

In BigQuery Console Query Editor

Query - 1

 1-- Evaluation metrics for location_model
 2SELECT
 3  SQRT(mean_squared_error) AS rmse,
 4  mean_absolute_error
 5FROM
 6  ML.EVALUATE(MODEL austin.location_model, (
 7  SELECT
 8    start_station_name,
 9    EXTRACT(HOUR FROM start_time) AS start_hour,
10    EXTRACT(DAYOFWEEK FROM start_time) AS day_of_week,
11    duration_minutes,
12    address as location
13  FROM
14    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
15  JOIN
16   `bigquery-public-data.austin_bikeshare.bikeshare_stations` AS stations
17  ON
18    trips.start_station_name = stations.name
19  WHERE EXTRACT(YEAR FROM start_time) = <replace year> )
20)
21
22

Query - 2

 1-- Evaluation metrics for subscriber_model
 2SELECT
 3  SQRT(mean_squared_error) AS rmse,
 4  mean_absolute_error
 5FROM
 6  ML.EVALUATE(MODEL austin.subscriber_model, (
 7  SELECT
 8    start_station_name,
 9    EXTRACT(HOUR FROM start_time) AS start_hour,
10    subscriber_type,
11    duration_minutes
12  FROM
13    `bigquery-public-data.austin_bikeshare.bikeshare_trips` AS trips
14  WHERE
15    EXTRACT(YEAR FROM start_time) = <replace year>)
16)

Task 5: Use the subscriber type machine learning model to predict average trip durations

In BigQuery Console Query Editor

Query - 1

 1SELECT
 2  start_station_name,
 3  COUNT(*) AS trips
 4FROM
 5  `bigquery-public-data.austin_bikeshare.bikeshare_trips`
 6WHERE
 7  EXTRACT(YEAR FROM start_time) = <replace year>
 8GROUP BY
 9  start_station_name
10ORDER BY
11  trips DESC

Query - 2

 1SELECT AVG(predicted_duration_minutes) AS average_predicted_trip_length
 2FROM ML.predict(MODEL austin.subscriber_model, (
 3SELECT
 4    start_station_name,
 5    EXTRACT(HOUR FROM start_time) AS start_hour,
 6    subscriber_type,
 7    duration_minutes
 8FROM
 9  `bigquery-public-data.austin_bikeshare.bikeshare_trips`
10WHERE 
11  EXTRACT(YEAR FROM start_time) = <replace year>
12  AND subscriber_type = 'Single Trip'
13  AND start_station_name = '21st & Speedway @PCL'))

Congratulations, you're all done with the lab 😄