GSP-327: Engineer Data in Google Cloud
Overview
Please execute the following tasks:
Task - 1: Clean your training data
1CREATE OR REPLACE TABLE
2taxirides.<Table_Name_as_mention_in_lab> AS
3SELECT
4(tolls_amount + fare_amount) AS <Fare Amount_as_mention_in_lab>,
5pickup_datetime,
6pickup_longitude AS pickuplon,
7pickup_latitude AS pickuplat,
8dropoff_longitude AS dropofflon,
9dropoff_latitude AS dropofflat,
10passenger_count AS passengers,
11FROM
12taxirides.historical_taxi_rides_raw
13WHERE
14RAND() < 0.001
15AND trip_distance > 3 [Change_as_mention_in_lab]
16AND fare_amount >= 2.0 [Change_as_mention_in_lab]
17AND pickup_longitude > -78
18AND pickup_longitude < -70
19AND dropoff_longitude > -78
20AND dropoff_longitude < -70
21AND pickup_latitude > 37
22AND pickup_latitude < 45
23AND dropoff_latitude > 37
24AND dropoff_latitude < 45
25AND passenger_count > 3 [Change_as_mention_in_lab]
Task - 2: Create a BQML model called taxirides.fare_model
1CREATE OR REPLACE MODEL taxirides.<Model Name_as_mention_in_lab>
2TRANSFORM(
3* EXCEPT(pickup_datetime)
4
5, ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean
6, CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING) AS dayofweek
7, CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING) AS hourofday
8)
9OPTIONS(input_label_cols=['<Fare Amount_as_mention_in_lab>'], model_type='linear_reg')
10AS
11
12SELECT * FROM taxirides.<Table_Name_as_mention_in_lab>
Task - 3: Perform a batch prediction on new data
1CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions
2AS
3SELECT * FROM ML.PREDICT(MODEL taxirides.<Model Name_as_mention_in_lab>,(
4SELECT * FROM taxirides.report_prediction_data)
5)
Congratulations, you're all done with the lab 😄