Documented Datasets: AFOS, Climodat, IEM Reanalysis (IEMRE), METAR, VTEC

NWS Text Product Archive

Summary

This archive consists of raw ASCII text products issued by the National Weather Service. Some places on the website refer to this as "AFOS", an archaic abbreviation associated with this dataset. The realtime source of this dataset is the processing of text products sent over the NOAAPort SBN, but archives have been backfilled based on what exists at NCEI, along with some archives provided by the University of Wisconsin.

  • Download Interface: IEM On-Demand
  • Spatial Domain: Generally US NWS Offices that issue text products.
  • Temporal Domain: Some data back to 1996, but archive quality and completeness greatly improve for dates after 1998.

Justification for processing

While the Internet provides many places to view current NWS Text Products, archives of these are much more difficult to find. One of the primary goals of the IEM website is to maintain stable URLs, so when links are generated to NWS Text Products, they need to work into the future! Many of these text products contain very useful information for researchers and others in the public.

Other Sources of Information

The NCEI site would be an authoritative source, but their archives of this data are very painful to work with. A number of other sites offer per-UTC-day files with some text products included, for example the Oklahoma Mesonet.

Processing and Quality Control

Generally, some quality control is done to ensure that the data is ASCII and not filled with control characters. There are also checks that product timestamps are sane, i.e. reasonably close to the current time. For example, the NOAAPort SBN feed carries about one product per day that is a misfire or some other error and is not allowed into the database.
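
To make the above concrete, here is a minimal sketch of such sanity checking; the function name and the 12 hour tolerance are illustrative assumptions, not the IEM's actual code.

    from datetime import datetime, timedelta

    def product_is_sane(text: str, prod_time: datetime, now: datetime) -> bool:
        """Illustrative checks: ASCII only, no stray control characters,
        and a product timestamp reasonably close to the current time."""
        if not text.isascii():
            return False
        # Allow common whitespace, reject other control characters
        if any(ord(ch) < 32 and ch not in "\r\n\t" for ch in text):
            return False
        # The tolerance window here is an assumption for illustration
        return abs(prod_time - now) < timedelta(hours=12)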

This database culls some of the more frequently issued text products, both to save space and because some of the text products are not appropriate for a long term archive. The most significant deletion is the SHEF products, which would overwhelm my storage system if I attempted to save the data! The script that does the database culling each day contains the exact AWIPS IDs used for this cleaning.
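
A hypothetical sketch of such a culling pass follows; the database, table, column names, and AWIPS ID list are placeholders, since the real script holds the authoritative list.

    import psycopg2  # assuming a PostgreSQL text product database

    CULLED_AWIPS_IDS = ["RR1", "RR2", "RR8"]  # hypothetical SHEF-heavy IDs

    conn = psycopg2.connect("dbname=afos")  # hypothetical connection
    with conn.cursor() as cur:
        # Table and column names are illustrative, not the IEM schema
        cur.execute(
            "DELETE from products where pil = ANY(%s)",
            (CULLED_AWIPS_IDS,),
        )
    conn.commit()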

Frequently Asked Questions

  1. How can I bulk download the data?

Sadly, this is not well supported at the moment. The WX AFOS app is about the best option, as it has a "Download Text" button.

  2. Are there any one-offs within the archive?

The RRM product is generally SHEF and thus was culled from the database to conserve space. An IEM user requested that this product be retained, so the culling stopped on 31 March 2023; this product's archive therefore only dates back to that point.

IEM Climodat Reports and Data

Summary

This document describes the once daily climate dataset that provides observations and estimates of high and low temperature, precipitation, snowfall, and snow depth. Parts of this dataset have been curated over the years by a number of Iowa State employees including Dr Shaw, Dr Carlson, and Dr Todey.

  • Download Interface: IEM On-Demand
  • Spatial Domain: United States
  • Temporal Domain: varies by station
  • Variables Provided: Once daily high and low temperature, precipitation, snowfall, snow depth

Justification for processing

The most basic and important long term climate records are the once daily reports of high and low temperature, along with precipitation and sometimes snow. The most commonly asked questions of the IEM datasets are climate related, so curating a long-term dataset of daily observations is necessary.

Other Sources of Information

A great source of much the same type of data is the Regional Climate Centers' ACIS. The complication when comparing IEM Climodat data to other sources is the difference in station identifiers used. The history of station identifiers is long and complicated. The National Centers for Environmental Information (NCEI) has made strides in attempting to straighten the identifiers out. This continues to be complicated, as the upstream data source uses a completely different set of identifiers, known as National Weather Service Location Identifiers (NWSLI), which differ from what NCEI or the IEM uses for climate datasets.

Processing and Quality Control

There is nothing easy or trivial about processing or quality control of this dataset. After decades of work, plenty of issues remain. Having human observers be the primary driver of this dataset is both a blessing and a curse. The good aspects include the archive dating back to the late 1800s for some locations and relatively high data quality. The bad aspects include lots of metadata issues due to observation timing, station moves, and equipment siting.

The primary data source for this dataset is the National Weather Service COOP observers. These reports come to the IEM over a variety of paths:

  • Realtime reports found in NWS SHEF Text Products, processed in realtime by the IEM
  • Via manually downloaded data archives provided by NCEI
  • Via web services provided by RCC ACIS

The merging of these datasets creates a bit of a nightmare to manage.

Snowfall and snow depth data are always problematic. First, let's clarify what the terms mean. "Snowfall" is the amount of snow that fell from the sky over the previous 24 hour period. "Snow depth" is the amount of snow measured on the ground from previous snowfalls. These numbers may sometimes appear contradictory, with snowfall amounts larger than snow depth due to melting and/or compaction. Care should be used when analyzing the snowfall and snow depth reports.

Frequently Asked Questions

  1. This data contains 'Statewide' and 'Climate District' observations, where do they come from?

    The IEM produces gridded analyses of observed variables. A GIS-style weighted spatial sampling is done from this grid to derive averaged values over geographies like states and climate districts. Of course, when you average something like precipitation over a large area, you end up with rates that are lower than peak station rates and also with more precipitation events than individual stations within the region.
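
    As a rough illustration of that weighted sampling, consider averaging a precipitation grid with per-cell weights for a geography; all names and values here are made up.

        import numpy as np

        precip = np.random.uniform(0, 1, (100, 100))  # gridded analysis (inches)
        # Fraction of each grid cell inside the state/district (illustrative)
        weights = np.zeros((100, 100))
        weights[40:60, 30:70] = 1.0
        areal_avg = (precip * weights).sum() / weights.sum()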

  2. The download provides only a date, what time period does this represent?

    Almost always, this date does not represent a local calendar date's worth of data. This date represents the date on which the observation was taken. Typically, the observation is taken at about 7 AM and so represents the 24 hour period prior to 7 AM. Explicitly providing the time of observation is a necessary future enhancement to this dataset, but tricky to do. Some observation locations have switched times over the years, and some even observe 24 hour precipitation totals at a different time than the temperature values. Nothing is ever easy with this dataset...

  3. Where does the radiation data come from?

    The NWS COOP Network does not provide observations of daily solar radiation, but this variable is extremely important to many folks that use this information for modeling. As a convenience, the IEM processes a number of reanalysis datasets and produces point samplings from the gridded information to provide "daily" radiation totals. A major complication is that the 'daily' COOP observations are typically at 7 AM, while the gridded solar radiation data is extracted on close to a local calendar day basis. In general, the 7 AM value is for the previous day.
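
    A small sketch of that date alignment, with made-up values:

        import pandas as pd

        # Calendar-day radiation sampled from a grid (MJ/m^2), made-up values
        rad = pd.Series([12.1, 14.3, 9.8],
                        index=pd.date_range("2024-06-01", periods=3))
        # A ~7 AM COOP "date" mostly covers the previous calendar day, so
        # attribute the prior day's radiation total to each COOP date.
        coop_rad = rad.shift(1, freq="D")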

  4. Where does the non-radiation data come from?

    This information is primarily driven by the realtime processing of NWS COOP observations done by the IEM. For data older than the past handful of years, it is taken from the NCEI databases and now the ACIS web services. Some manual work is done to meld the differences in site identifiers used between the various online resources.

  5. How do I know which variables have been estimated and which are observations?

    The download interface for this dataset provides an option to include a flag for which observations have estimated temperatures or precipitation. Presently, a general flag is provided for both high and low temperature and no flag is provided for the snowfall and snow depth information.

IEM Reanalysis

Summary

The IEM Reanalysis dataset is a daily gridded product combining a number of datasets into one product, hopefully void of missing values. In some cases data is interpolated, and in other cases the data is resampled from another grid. Keeping the workflow going is a daily challenge due to changes in various input datasets and quirks with the datasets over time.

  • Download Interface: IEM On-Demand
  • Spatial Domain: ...
  • Temporal Domain: ...

Justification for processing

A consistent and complete gridded analysis enables many downstream products and applications. Single point observations are of higher quality, but often have gaps, and their representativeness varies depending on many factors.

There are many alternative sources available today with similar data, but it is good to have a product under IEM workflow control that is not subject to outages and data service dropouts, for example during government shutdowns.

Other Sources of Information

There are many: NARR, ERA5, and the list goes on.

Processing and Quality Control

To be written...

Frequently Asked Questions

  1. When are daily fields updated?

    Well, there is a long story!

ASOS/AWOS Global METAR Archives

Summary

The primary format that worldwide airport weather station data is reported in is called METAR. This format is somewhat archaic, but well known and utilized in the community. The data is sourced from a number of places, including the Unidata IDD, NCEI ISD, and MADIS One Minute ASOS. The weather stations included are typically called "Automated Surface Observing System (ASOS)" stations; the term "Automated Weather Observing System (AWOS)" is often used interchangeably.

  • Download Interface: IEM On-Demand
  • Spatial Domain: Worldwide
  • Temporal Domain: 1928-present

Justification for processing

The highest quality weather information comes from the ASOS sites. These stations get routine maintenance and considerable quality control, and they form the baseline hourly-interval dataset used by all kinds of folks. The data stream processed by the IEM contains global stations, so extending the ingest to the entire stream was not a significant effort.

Other Sources of Information

NCEI Integrated Surface Database (ISD) is likely the most authoritative source of this information.

Processing and Quality Control

A Python based ingestor using the metar package processes this information into the IEM database.
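
For illustration, a minimal parse with that package might look like the following; the report string is a made-up example.

    from metar import Metar

    report = "KAMW 011253Z 18010KT 10SM OVC012 08/06 A2995 RMK AO2 T00830061"
    obs = Metar.Metar(report)
    print(obs.station_id, obs.temp.value("C"), obs.dewpt.value("C"))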

Frequently Asked Questions

  1. Why is precipitation data all missing / zero for non-US locations?

It is the IEM's understanding that precipitation is not included in the global data streams due to previous data distribution agreements. The precipitation data is considered of very high value, as it can be used to model and predict the status of agricultural crops in a given country, and such information could move commodity markets. For the present day, other satellite datasets likely negate some of these advantages, but alas.

  2. How are "daily" precipitation totals calculated?

In general, the ASOS stations operate on local standard time year round. This has implications for the computation of various daily totals: during daylight saving time, the calendar day total represents a 1 AM to 1 AM local daylight time period. In the context of this METAR dataset, not all reporting sites generate a total that can be used as a calendar day's total, so the IEM uses a number of approaches to arrive at one, as listed below.

  • A script manually totals up the hourly precipitation reports and computes a true local calendar day total for the station; this total may later be overwritten by either of the below.
  • A real-time ingest process gleans daily totals from the Daily Summary Message (DSM) issued by some ASOS sites.
  • A real-time ingest process gleans daily totals from the Climate Report (CLI) issued for some ASOS sites by their respective local NWS Forecast Office.

Not all stations have DSM and/or CLI products, so the manual totaling provides a minimum accounting. The complication is that this total does not cover the same period that a CLI/DSM product does. So complicated!
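
A minimal sketch of the manual totaling approach, using a fixed standard time offset as described above; the helper and data structure are illustrative.

    from datetime import date, timedelta, timezone

    CST = timezone(timedelta(hours=-6))  # local standard time, year round

    def calendar_day_total(hourly_precip: dict, day: date) -> float:
        """Sum hourly precip obs (UTC datetime -> inches) for a CST day."""
        return sum(
            amount
            for utc, amount in hourly_precip.items()
            if utc.astimezone(CST).date() == day
        )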

  3. Please explain the temperature units, storage and processing.

This is why we can not have nice things. The following discussion generally applies to the US observation sites. No matter what you see in various data feeds, the ASOS stations internally store their temperatures in whole degree Fahrenheit. The issues happen when the station transmits the data in whole degree Celsius, which does not carry enough precision to convert back to Fahrenheit. For example, if the station observed a 78F temperature and transmitted a 26C value, that 26C value converts back to 78.8F, which rounds to 79F. And down the rabbit-hole we go!
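
The arithmetic of that example:

    f_obs = 78                             # internal whole degree F value
    c_sent = round((f_obs - 32) * 5 / 9)   # transmitted whole degree C -> 26
    f_back = round(c_sent * 9 / 5 + 32)    # 26C -> 78.8F -> rounds to 79F
    print(f_obs, c_sent, f_back)           # 78 26 79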

The IEM's archive of ASOS/METAR data comes from three main sources and some minor auxiliary ones. The main source is the NOAA satellite feed, called NOAAPort. This feed provides data in METAR format, so the transmitted units are always whole degree Celsius, but sometimes the METAR T-group is included, which adds enough precision to reliably convert back to whole degree Fahrenheit. The IEM's processing attempts to prioritize METARs that include the T-group, so that reliable Fahrenheit storage can occur.
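
The T-group carries temperature and dew point in tenths of a degree Celsius, which is why the Fahrenheit value can be recovered; a small decoding sketch (the helper name is illustrative):

    def decode_tgroup(tgroup: str) -> tuple[float, float]:
        """Decode e.g. 'T00830061' -> (8.3C, 6.1C); a '1' sign digit is negative."""
        t = int(tgroup[2:5]) / 10 * (-1 if tgroup[1] == "1" else 1)
        td = int(tgroup[6:9]) / 10 * (-1 if tgroup[5] == "1" else 1)
        return t, td

    t_c, _ = decode_tgroup("T00830061")
    print(round(t_c * 9 / 5 + 32))  # 8.3C -> 46.9F -> 47F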

The next main source is the MADIS 5-minute ASOS dataset, previously called High Frequency METAR. This data feed has a significant issue whereby the data transmitted from the FAA to the NWS is only in whole degree Celsius. Such data cannot be reliably converted back to whole degree Fahrenheit, so the IEM database stores these values as missing and they are not included in the data download. BUT, for those that really want this information, these values are included in the IEM-encoded raw METAR string that you can download with the data. You can find further discussion in this IEM News Item.

The third main source is from the NCEI ISD. At this time, there are no known issues with the temperature data in this feed being reliable for whole degree Fahrenheit.

NWS Valid Time Extent Code (VTEC) Archives

Summary

The National Weather Service uses a rather complex and extensive suite of products and methodologies to issue watches, warnings, and advisories (WWA). Back in the 1990s and early 2000s, the automated processing of these products was extremely difficult and rife with errors. To help with automated parsing, the NWS implemented a system called Valid Time Extent Code (VTEC), which provides a more programmatic description of what an individual WWA product is doing. The implementation began in 2005 and was mostly wrapped up by 2008. The IEM attempts to do high fidelity processing of this data stream and has a number of Internet-unique archives and applications to view this information.

  • Download Interface: Shapefile/KML Download
  • Spatial Domain: United States, including Guam, Puerto Rico and some other islands
  • Temporal Domain: Most WWA types back to 2008 or 2005; an archive of Flash Flood Warnings goes back to 2002 or so, and Tornado / Severe Thunderstorm Warnings go back to 1986

Justification for processing

NWS issued WWA alerts are an important environmental hazard dataset and have broad interest in the research and insurance industries. Even in 2017, there are very few places where you can find long term archives of this information in usable formats.

Other Sources of Information

The National Centers for Environmental Information has raw text product archives, but these do not contain processed, atomic data on the actual WWA alerts, so the user is left to the adventure of text parsing the products. Otherwise, it is not clear that any other archive of this information exists on the Internet.

Processing and Quality Control

The pyIEM python package is the primary code that does the text parsing and databasing of the WWA products. A large number of unit tests cover the various variations and quirks found while processing the WWA data stream since the mid 2000s. New quirks and edge cases are still found today, with minor corrections made to the archive when necessary. The IEM continuously alerts and annoys the NWS when various issues are found, hoping to get the NWS to correct their products. While it has been a long and frustrating process, things do eventually get fixed, leading to more robust data archives.
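
As a sketch of what that looks like in practice, assuming pyIEM's generic product parser entry point and segment attributes:

    from pyiem.nws.products import parser

    # "warning.txt" is a placeholder for a raw NWS text product file
    prod = parser(open("warning.txt").read())
    for segment in prod.segments:
        print(segment.ugcs, segment.vtec)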

The pyIEM parsers send emails to the IEM developer when issues are found. The parser alerts when the following errors are encountered:

  • VTEC Event IDs (ETNs) being used that are out of sequential order.
  • Warning product segments are missing or have invalid Universal Geographic Code (UGC) encoding
  • Product segment has invalid VTEC encoding
  • Polygons included in the warning are invalid or counterclockwise
  • Timestamps are formatted incorrectly
  • The UGC / VTEC sequence of a particular product contains logical errors, for example a UGC zone silently drops out of or into a warning.
  • Products are expired outside of the acceptable temporal bounds
  • Any other type of error and/or code bug that caused a processing fault

Frequently Asked Questions

  1. Please fully describe the schema used within the downloaded shapefiles.

    Grab some coffee and headache medicine as I am going to try to explain how the IEM processes these events into the database. The first concept to understand is that when the NWS issues a Watch, Warning, or Advisory (WWA) event, this event undergoes a lifecycle. The NWS can issue updates that modify the start and end times of the event and its spatial extent. They can also upgrade the event, for example moving from a watch to a warning. The IEM database does not necessarily fully document the event's lifecycle, but provides the metadata for the last known state of the event.

    For the context of IEM provided shapefiles, here is a discussion of what each DBF column represents, followed by an example illustrating what the columns mean in practice.

    But first, the timestamps. The presented timestamps are always in the UTC timezone and are represented by a 12 character string of the form year, month, day, hour (24-hour clock), minute, i.e. YYYYMMDDHHMM. To my knowledge, there is no timestamp data type in DBF, so this is the pain we have to live with. A parsing sketch follows the table below.

    Each DBF column, its type, and what it represents:

    WFO (3 Char): The three character NWS Office/Center identifier. For CONUS locations, this is the 4 character ID with the leading K dropped. For non-CONUS sites, it is the identifier with the leading P dropped.
    ISSUED (12 Char): This timestamp represents the start time of the event. As an event's lifecycle proceeds, this value can be updated by NWS updates; the value presented represents the last known state of the event start time.
    EXPIRED (12 Char): Similar to the ISSUED column above, this represents the event's end time. Again, this value is updated as the event lifecycle happens with updates made by the NWS.
    INIT_ISS (12 Char): The timestamp of the NWS Text Product that started the event. This timestamp is important for products like the Winter Storm Watch, which has a begin time a number of days/hours into the future but is typically considered to be in effect at the time of text product issuance. Yeah, this is where the headaches start. This timestamp can also be used to form a canonical URL back to the IEM to fetch the raw NWS Text for this event. It is not updated during the event's lifecycle.
    INIT_EXP (12 Char): Similar to INIT_ISS above, this is the expiration of the event as denoted at the first issuance of the event. It is not updated during the event's lifecycle.
    PHENOM or TYPE (2 Char): The two character NWS identifier used to denote the VTEC event type, for example TO for Tornado and SV for Severe Thunderstorm. A lookup table of these codes exists here.
    SIG (1 Char): The one character NWS identifier used to denote the VTEC significance. The same link above for PHENOM has a lookup table for these.
    GTYPE (1 Char): Either P for polygon or C for county/zone/parish. The shapefiles you download could contain both so-called storm-based (polygon) events and traditional county/zone based events.
    ETN (Int): The VTEC event identifier, a tracking number that should be unique for the event but sometimes is not. Yes, more headaches. Note that uniqueness is not based on combination with a UGC code, but on the issuance center and a continuous spatial region for the event.
    STATUS (3 Char): The VTEC status code denoting the state of the event during its life cycle. This is purely based on any updates the event received, not on IEM-side logic about whether the event is in the past.
    NWS_UGC (6 Char): For county/zone/parish warnings (GTYPE=C), the Universal Geographic Code that the NWS uses. Sadly, this is not exactly FIPS.
    AREA_KM2 (Number): The IEM computed area of this event; the area computation is done in Albers (EPSG:2163).
    UPDATED (12 Char): The timestamp when this event's lifecycle was last updated by the NWS.
    HV_NWSLI (5 Char): For events that have H-VTEC (Hydro VTEC), the five character NWS Location Identifier.
    HV_SEV (1 Char): For events that have H-VTEC, the one character flood severity at issuance.
    HV_CAUSE (2 Char): For events that have H-VTEC, the two character cause of the flood.
    HV_REC (2 Char): For events that have H-VTEC, the code denoting whether a record crest is expected at issuance.
    EMERGENC (Boolean): Based on unofficial IEM logic, whether this event was an "Emergency" at any point during its life cycle.
    POLY_BEG (12 Char): For polygons (GTYPE=P), the UTC timestamp at which the polygon initially becomes valid.
    POLY_END (12 Char): For polygons (GTYPE=P), the UTC timestamp at which the polygon expires.
    HAILTAG (Number): The IBW hail size tag (inches). Only included with GTYPE=P entries, as there is a 1 to 1 association between the tags and the polygons. If you do not include SVS updates, this is just the issuance tag.
    WINDTAG (Number): The IBW wind gust tag (MPH). See HAILTAG.
    TORNTAG (16 Char): The IBW tornado tag. See HAILTAG.
    DAMAGTAG (16 Char): The IBW damage tag. See HAILTAG.
    PROD_ID (36 Char): The IEM identifier of the issuance text, used to uniquely (99% of the time) identify NWS Text Products. The value can be passed to https://mesonet.agron.iastate.edu/p.php?pid=PROD_ID for a website viewer or to the IEM API service https://mesonet.agron.iastate.edu/api/1/nwstext/PROD_ID.
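
    As promised above, the 12 character timestamps parse readily, and PROD_ID can be turned into a text retrieval URL; the values here are illustrative.

        from datetime import datetime, timezone

        issued = datetime.strptime("201903202300", "%Y%m%d%H%M").replace(
            tzinfo=timezone.utc
        )
        prod_id = "PROD_ID_FROM_DBF"  # placeholder for a real PROD_ID value
        url = f"https://mesonet.agron.iastate.edu/api/1/nwstext/{prod_id}"
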
  2. I notice entries with an expire timestamp before the issue timestamp. How can this be?

    Oh my, buckle up for some confusion. The first point in this space is that our database represents the most recent snapshot of the given VTEC event during its life cycle. The life cycle runs from issuance to the event's death via cancellation, expiration, or upgrade to a different VTEC event.

    To illustrate the evolution of the database fields with a VTEC event lifecycle, please consider this example. At noon on 19 March 2019, NWS Des Moines wfo=DMX issues a Winter Storm Watch phenom=WS sig=A for Story County (nws_ugc=IAZ048). This watch goes into effect at 6 PM (tomorrow, 20 March) until 6 AM 21 March. The storm is a day away yet... The database entry looks like so:

    STATUS  ISSUE         INIT_ISS      EXPIRE        INIT_EXP
    NEW     201903202300  201903191700  201903211100  201903211100

    Now tomorrow comes, and the NWS needs to decide what to do with the watch prior to 6 PM, since this type of watch cannot reach its event start time without either being cancelled or upgraded. So at 5 PM, the NWS decides to issue a Winter Storm Warning. Now the database entry for the watch looks like so:

    STATUS  ISSUE         INIT_ISS      EXPIRE        INIT_EXP
    UPG     201903202300  201903191700  201903202200  201903211100

    See how the EXPIRE column is now less than the ISSUE column, while the INIT_ISS and INIT_EXP columns are unchanged, hopefully helping the end user deal with this situation. You have life choices to make on how to handle it.

    In general, the watch is practically in effect once the NWS issues it, regardless of when the actual bad weather is going to start. So the recommendation is to use the INIT_ISS column as the watch start time and the EXPIRE column as the watch end time, but this logic is totally at your discretion.
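
    One way to encode that recommendation; the column handling is illustrative.

        from datetime import datetime, timezone

        def event_window(init_iss: str, expired: str):
            """Treat INIT_ISS as the practical start and EXPIRE as the end."""
            def parse(s):
                return datetime.strptime(s, "%Y%m%d%H%M").replace(
                    tzinfo=timezone.utc
                )
            return parse(init_iss), parse(expired)

        start, end = event_window("201903191700", "201903202200")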

  3. How do Severe Thunderstorm, Flash Flood, or Tornado warnings have VTEC codes for dates prior to implementation?

    Good question! A number of years ago, a kind NWS manager provided a database dump of their curated WWA archive for dates between 1986 and 2005. While not perfect, this archive was the best/only source known at the time. The IEM did some logic processing and attempted to back-compute VTEC ETNs for this archive of warnings. The database was atomic to a local county/parish, so some logic was done to merge multiple counties when they spatially touched and had similar issuance timestamps. As noted above, automated machine parsing of the raw text is next to impossible. The ETNs were assigned as a convenience so that various IEM apps and plots could present this data online.

  4. The database has Weather Forecast Offices (WFOs) issuing WWA products for dates prior to the office even existing. How can this be!?!?

    Yeah, this is somewhat poor, but was done to again provide some continuity with current day operations. The archive database provided to the IEM did not contain the issuing forecast office, so without a means to properly attribute these, the present day WFOs were used. This issue is rarely raised by IEM users, but it is good to document. Maybe someday a more authoritative archive will be made and these old warnings can be assigned to the various WSOs, etc. that existed at the time.

  5. What are the VTEC phenomena and significance codes?

    The phenomena code (two characters) and significance code (one character) denote the particular WWA hazard at play within the product. The NWS VTEC Site contains a one-page PDF that documents these codes. The NWS uses these codes to color encode the WWA map found on their homepage. You can find a lookup reference table of these codes and colors here.

  6. How do polygon warnings exist in the IEM archive prior to being official?

    The NWS offices started experimenting with polygons beginning in 2002. These polygons were included with the warnings, but sometimes were not geographically valid and/or leaked well outside of a local office's CWA bounds. On 1 October 2007, these polygons became the official warning for some VTEC types. In general, the IEM's data ingestor attempts to save these polygons whenever found.