
castform is a package used to download historic hourly weather station data from Environment Canada. This data is downloaded from:
https://climate.weather.gc.ca/historical_data/search_historic_data_e.html
Available Functions
This package has various functions that allow for the download, processing, and analysis of historical weather station data. This includes functions to:
- Download latest station inventory list
- Search for available stations by province and year range
- Download a station’s data files from:
  - A single month
  - Multiple months
  - A single province/territory
  - A certain year range
- Download all available hourly station files
- Create a database from downloaded files
- Create exploratory plots to identify:
  - Missing data
  - Data ranges
  - Yearly variable means
  - Missing data strings
  - Repeated data strings
- Detect extreme heat events in historical data
Installation
You can install the development version of castform from GitHub with:
# install.packages("pak")
pak::pak("shawn-0303/castform_package")
Usage
Loading Metadata
This needs to be the FIRST STEP of your analysis.
Download the latest station inventory list using get_metadata(). The function stores each new version as a .csv and .rda file and, each time it is run, loads the station list into the global environment as Hourly_station_info.
Searching for Station Information
All the download wrappers require specific information about the station(s) the user wants to download. This information can be pulled from the metadata using station_lookup().
station_lookup(province = "prince edward island",
               start_year = 1953,
               end_year = 2001)
You can search for stations by province, as well as by the start_year and end_year of hourly data collection.
Downloading files
The following are various download wrappers that will download historic weather station data as .csv files.
By default, all downloads are written to a new “station_data” folder in the working directory.
Download a Single Station File
Download a single .csv file from a specified station that stores a month of hourly weather data.
If your goal is a larger download, it is a good idea to verify your station information and output directories using this function.
get_single_station_file(station_name = "discovery island",
                        station_id = 27226,
                        year = 1997,
                        month = 1,
                        root_folder = "station_data")
Downloading Multiple Station Files
get_multiple_station_files(station_name = "discovery island",
                           station_id = 27226,
                           number_of_files = 10,
                           year = 1997,
                           month = 1,
                           parallel_threshold = 50,
                           root_folder = "station_data")
Downloading Files by Station
You can specify a year and month. If these are left empty, all data available for that station is downloaded.
Downloading Files by Province
You can specify a year and month. If these are left empty, all data available for that province is downloaded.
province_station_files(province = "prince edward island",
                       parallel_threshold = 50,
                       root_folder = "station_data")
Downloading Files by Year Range
year_range_station_files(station_name = "discovery island",
                         station_id = 27226,
                         start_year = 1997,
                         end_year = 1999,
                         parallel_threshold = 50,
                         root_folder = "station_data")
Making Databases
Creates a searchable database from a specified folder of hourly weather station data.
build_station_database(db_name = "BC_station_data",
                       output_dir = "castform_outputs",
                       root_folder = "downloaded_data/BC")
This builds a database with the expected schema:
- Weather: Stores weather conditions and their associated numeric codes
- Station: Stores weather station information using HLY_station_info
- Observation: Stores information from downloaded station data (.csv) files
Validate the Database
After creation, it is a good idea to validate the created database using validate_database().
validate_database(db_name = "BC_station_data",
                  db_dir = "castform_outputs")
From the expected schema, produced tables should have:
- Weather: 54 records
- Station: As many records as HLY_Station_Info
- Observation: As many records as stored in the downloaded data files
Exploratory Data Analysis
These functions query the database and produce .html outputs that visualize the structure and summary of the data. Each table provides buttons to copy the output or download it as a .csv or .pdf.
Create Map of Stations
Creates a map to visualize stations. If metadata_stations = TRUE, this function maps all stations in the metadata that have hourly data available.
station_map(db_name = "BC_station_data",
            db_dir = "castform_outputs",
            output_dir = "castform_outputs",
            output_name = "BC_station_map",
            metadata_stations = FALSE)
station_map(metadata_stations = TRUE)
Data Missingness
Creates a table outlining the expected and actual data counts, along with the percentage of missing data for each variable in each station.
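As a rough illustration of what such a table measures, the base R sketch below computes the expected count, the actual count, and the percentage missing for one variable at one station. The data frame `obs` and its column names are hypothetical and are not castform output, and the expected count rests on the assumption of one observation per hour between the first and last timestamp.

```r
# Hypothetical hourly observations for one station (not castform output).
obs <- data.frame(
  datetime = seq(as.POSIXct("1997-01-01 00:00", tz = "UTC"),
                 by = "hour", length.out = 48),
  temp_c   = c(rep(NA, 6), seq(-5, 5, length.out = 42))
)

# Expected count: one record per hour between the first and last timestamp
# (an assumption made for this sketch).
expected <- as.numeric(difftime(max(obs$datetime), min(obs$datetime),
                                units = "hours")) + 1

# Actual count: non-missing values of the variable.
actual <- sum(!is.na(obs$temp_c))

# Share of the expected records that are missing, in percent.
pct_missing <- 100 * (expected - actual) / expected

missingness <- data.frame(variable = "temp_c",
                          expected = expected,
                          actual = actual,
                          pct_missing = pct_missing)
print(missingness)
```

Repeating the same calculation for each variable and each station would give one row per combination.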
Data Ranges
Creates a table summarizing the average, minimum and maximum value of each variable in each station.
Yearly Means
Creates plots that summarize the means of every variable over time (where station data is available).
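The kind of summary behind such a plot can be sketched in base R with aggregate(). The data frame `obs` and its columns are hypothetical, and this is only one plausible way to compute yearly means, not necessarily how castform does it.

```r
# Hypothetical observations spanning three years (not castform output).
obs <- data.frame(
  year   = rep(1997:1999, each = 4),
  temp_c = c(10, 12, NA, 14,
             11, 13, 15, 17,
              9, NA, NA, 12)
)

# Mean per year. The formula interface drops rows with missing values
# (na.action = na.omit by default), so NAs do not affect the means.
yearly_means <- aggregate(temp_c ~ year, data = obs, FUN = mean)
print(yearly_means)
```

The resulting table has one row per year and could be passed to a plotting function to draw the mean over time.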
Missing Strings
Creates a table identifying when data is missing.
NOTE: This will take longer to run on larger datasets.
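One way to locate stretches of missing data is to run-length encode the missingness indicator. The sketch below uses base R's rle() on a hypothetical hourly series; the variable names are illustrative, and the package may use a different approach internally.

```r
# Hypothetical hourly series for one variable (not castform output).
datetime <- seq(as.POSIXct("1997-01-01 00:00", tz = "UTC"),
                by = "hour", length.out = 10)
temp_c <- c(1.2, NA, NA, NA, 0.8, 0.5, NA, NA, 1.1, 1.3)

# Run-length encode TRUE/FALSE for "value is missing".
runs   <- rle(is.na(temp_c))
ends   <- cumsum(runs$lengths)          # index of the last element of each run
starts <- ends - runs$lengths + 1       # index of the first element of each run

# Keep only the runs where the value is missing.
missing_strings <- data.frame(
  start        = datetime[starts[runs$values]],
  end          = datetime[ends[runs$values]],
  length_hours = runs$lengths[runs$values]
)
print(missing_strings)
```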
Repeated Strings
Creates a table identifying strings of repeated values that occur for three hours or more. The table stores the length (in hours) and start and end date/time of the repeated strings. Also creates a plot to visualize these strings.
NOTE: This will take longer to run on larger datasets. With large datasets, you will also need to zoom into the plots to see the output; otherwise the plot will look empty.
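The idea of a repeated string can be illustrated with base R's rle(), which collapses consecutive equal values into runs. This is a minimal sketch on a hypothetical hourly series, assuming one value per hour; it is not taken from the package source.

```r
# Hypothetical hourly series for one variable (not castform output).
datetime <- seq(as.POSIXct("1997-01-01 00:00", tz = "UTC"),
                by = "hour", length.out = 10)
temp_c <- c(5.0, 5.0, 5.0, 5.0, 6.1, 6.3, 6.3, 7.0, 7.0, 7.0)

# Collapse consecutive identical values into runs.
# Note: rle() treats each NA as its own run, so missing values never
# count as repeats here.
runs   <- rle(temp_c)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep runs that last three hours or more.
keep <- runs$lengths >= 3
repeated_strings <- data.frame(
  value        = runs$values[keep],
  start        = datetime[starts[keep]],
  end          = datetime[ends[keep]],
  length_hours = runs$lengths[keep]
)
print(repeated_strings)
```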
pull_repeated_strings(db_name = "BC_station_data",
                      db_dir = "castform_outputs",
                      output_dir = "castform_outputs",
                      output_name = "BC_repeated_strings")
Heat Wave Detector
Detects extreme heat events using user-supplied temperature thresholds (in Celsius) and creates a table and plot summarizing daily temperature averages.
Uses ECCC’s definition of extreme heat events, which defines them as “events during which daily temperatures have reached heat warning thresholds on 2 or more consecutive days with no relief overnight”.
heatwave_detector(db_name = "BC_station_data",
                  max_threshold = 28,
                  min_threshold = 13)
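To make the definition concrete, here is a minimal base R sketch of the detection logic. It assumes that a day qualifies when its daily maximum reaches max_threshold and its daily minimum stays at or above min_threshold (one reading of "no relief overnight"), and that an event is two or more such days in a row. The `daily` data frame is hypothetical, and the package's actual implementation may differ.

```r
# Hypothetical daily summaries derived from hourly data (not castform output).
daily <- data.frame(
  date     = as.Date("1997-07-01") + 0:6,
  max_temp = c(25, 29, 30, 31, 27, 29, 26),
  min_temp = c(12, 14, 15, 13, 12, 11, 12)
)
max_threshold <- 28
min_threshold <- 13

# Assumed rule: both the daytime high and the overnight low reach
# their thresholds on the same day.
hot <- daily$max_temp >= max_threshold & daily$min_temp >= min_threshold

# Group consecutive days and keep hot runs of two or more days.
runs   <- rle(hot)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
keep   <- runs$values & runs$lengths >= 2

heat_events <- data.frame(
  start = daily$date[starts[keep]],
  end   = daily$date[ends[keep]],
  days  = runs$lengths[keep]
)
print(heat_events)
```

In this example, 6 July exceeds the maximum threshold but cools below the minimum threshold, so it does not count toward an event.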