A Blog To Remember
Environics Analytics is in the business of making data useful – whether it’s providing clients with our data in useful formats or helping our clients turn their own data into valuable insights. And we’re always on the lookout for more data that enhances our products and helps our clients. One potential source for additional data is the increasing availability of “open data” — that is, datasets that are freely accessible to the public. This is a topic still being covered in Ontario news media since the provincial government’s announcement this past October 21st of an Open Data Evaluation Panel. Of course, many cities, including Toronto, already have a wide variety of open datasets, and the federal government’s open data portal (www.data.gc.ca) has been in operation for a few years. However, because of our corporate orientation, local and provincial open data sources are not particularly useful; we really need national coverage.
But every once in a while, I visit the federal government’s open data website (www.data.gc.ca) to see if there’s anything interesting – and useful. Unfortunately, I haven’t found many datasets that are particularly useful – and here are some of the reasons why:
- Border wait times (Canada Border Services Agency) cover land crossings only and have no spatial element (only the name of the crossing).
- The list of National Parks and National Historic Sites (Parks Canada) has no spatial element; the “Map of National Parks” is just a graphic.
- Anything from Natural Resources Canada is problematic because the enormous volume of information makes it impossible to find anything: 190,181 files from that department compared with 5,396 from Statistics Canada, the next most prolific agency.
I suppose a lesson to be learned with open data is that the providers should first evaluate the usefulness of the information (i.e., whether it is complete, comprehensive and readable) and then consider how best to make the data available. A good data librarian is required!
However, while on a recent surf of the open data website, I did find one file that I wanted to share, especially with the Remembrance Day commemoration coming up on November 11th. It’s called the National Inventory of Canadian Military Memorials from National Defence, and you can link to it here. While the dataset isn’t particularly useful to the marketing needs of Environics Analytics and its clients, it does have spatial information (some records have GPS coordinates, others have street addresses or intersections) and, of particular interest to me, the text of the actual inscription on the memorial. I don’t quite know why I find this sort of thing fascinating; I suppose part of it is that I think about my grandfather who served as a radar technician in World War II. I also imagine those people in a town who, after the war, wanted to make sure their neighbours’ sacrifice was remembered, and came together to make something permanent.
If you do nothing else on the 11th, I hope you’ll read some of the inscriptions, visit a memorial near you and wear a poppy…to remember those who allow us to live in freedom and security today.
The producer of the database, National Defence — and more specifically the Directorate of History and Heritage — has done a good job of making the data useful as well. In the interest of creating a complete and comprehensive database, they have created a website where you can look for a memorial in your community (Search for a Memorial). And if you can’t find it, you can submit a form to have one included (Record a Memorial). I looked for a particular memorial that I pass on my commute to work through Toronto Union Station, and I didn’t find it. So I asked Joseph Ng Chow, an EA Information Technology Specialist who is also our in-house photographer (check out his blog posting A Picture Perfect Geek), if we could get a photo of the memorial. I plan to submit it to the database along with the inscription text:
This tablet commemorates those in the service
of the Canadian Pacific Railway Company who
at the call of King and Country left all that
was dear to them, endured hardship, faced danger
and finally passed out of sight of men by the
path of duty and self sacrifice, giving up their
own lives that others might live in freedom.
Let those who come after see to it
that their names be not forgotten.
Unfortunately, the XML data file that was supplied to the open data portal was not as useful as the website. But because I found the subject matter so interesting, I asked myself what I could do to this data file to make it useful. My idea was to write this blog and support it with an interactive interface that would show a map and allow for better searches of the inscription text.
First, I turned to Alteryx and put together a module that downloads the XML file, adds a date stamp to the file name, and parses the XML so that different data attributes appear in different columns in my file. You can’t do this in Excel because the inscriptions push a few of the text fields past 40,000 characters, which is more than the 32,767 characters a single Excel cell can hold. Next, I noticed that some records included a tag for GPS coordinates. I put together another Alteryx module that pulled out the GPS-tagged records, but the formatting was inconsistent – as you can see in the examples below:
GPS location: N 42° 01.965 W 082° 44.344
GPS: 45.287 -74.860
GPS location: 56° 09' 22" N 99° 11' 27" W
GPS location: 52°55'N 118°18'W
GPS Location; 49d40'19"N 124d55'41"W
across from Communauté Chrétienne Saint Pierre (GPS Location: 46°37'30"N 61°01'00"W)
Old Holy Trinity Anglican Church (cemetery / GPS Location: 44°56'04"N 65°05'10"W)
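For readers who want to try this kind of clean-up outside Alteryx, here is a minimal Python sketch that converts the formats listed above to signed decimal degrees. It is not the Alteryx module itself: the function names parse_gps and to_decimal are made up for illustration, and the logic assumes each record carries a "GPS" label followed by one latitude and one longitude, which may not hold for every record in the file.

```python
import re

# Label that introduces the coordinates, e.g. "GPS:", "GPS location:", "GPS Location;"
GPS_LABEL = re.compile(r"GPS(?:\s+location)?\s*[:;]?", re.IGNORECASE)
# A token is either a hemisphere letter or an unsigned number
TOKEN = re.compile(r"[NSEW]|\d+(?:\.\d+)?")
SIGNED_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def to_decimal(parts, hemisphere):
    """Convert [degrees, minutes, seconds] (last two optional) to signed decimal degrees."""
    value = sum(float(p) / 60 ** i for i, p in enumerate(parts))
    return -value if hemisphere in "SW" else value


def parse_gps(text):
    """Return (latitude, longitude) in decimal degrees, or None if nothing usable is found."""
    label = GPS_LABEL.search(text)
    if label is None:
        return None
    tail = text[label.end():]
    tokens = TOKEN.findall(tail)
    letters = [t for t in tokens if t in "NSEW"]

    # No hemisphere letters: assume two signed decimal numbers, latitude first
    if not letters:
        numbers = SIGNED_NUMBER.findall(tail)
        if len(numbers) < 2:
            return None
        return float(numbers[0]), float(numbers[1])

    # Hemisphere letters either lead each coordinate ("N 42 01.965")
    # or follow it ("52 55 N"); decide from the first token
    prefix_style = tokens[0] in "NSEW"
    groups = {}
    current, pending = None, []
    for tok in tokens:
        if tok in "NSEW":
            if prefix_style:
                current = tok
                groups[current] = []
            else:
                groups[tok] = pending
                pending = []
        elif prefix_style:
            groups[current].append(tok)
        else:
            pending.append(tok)

    lat = lon = None
    for hemisphere, parts in groups.items():
        if not 1 <= len(parts) <= 3:
            return None
        value = to_decimal(parts, hemisphere)
        if hemisphere in "NS":
            lat = value
        else:
            lon = value
    if lat is None or lon is None:
        return None
    return lat, lon
```

Calling parse_gps on each of the sample strings returns a latitude and longitude pair, with western longitudes negative. A production version would also need range checks (latitude within 90 degrees, longitude within 180) and a review of any records the function rejects.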
After quite a bit of parsing, I was able to clean up the records. Then, looking at records that didn’t have the GPS tag, I found that there was a whole other set of records with latitude and longitude tags, so I did a similar clean-up of those records. Additionally, I was able to use the address fields in the table to get coordinates for some additional records. I also did a bit of clean-up of the inscription text to remove the XML formatting elements and to join together the four fields of arbitrarily divided inscription text. Based on this work, it made sense to add a couple of fields to the database for consistently formatted coordinates.
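The inscription clean-up described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the Alteryx workflow: the helper names clean_fragment and join_inscription are hypothetical, the "XML formatting elements" are assumed to be inline tags and character entities, and the four fields are assumed to break between words (if a field breaks mid-word, pass sep="" instead).

```python
import html
import re

TAG = re.compile(r"<[^>]+>")       # any inline markup tag, e.g. <b> or <br/>
WHITESPACE = re.compile(r"\s+")    # runs of spaces, tabs and line breaks


def clean_fragment(fragment):
    """Strip tags, decode entities and collapse whitespace in one text fragment."""
    if not fragment:
        return ""
    text = TAG.sub(" ", fragment)   # remove tags first so decoded "<" survives
    text = html.unescape(text)      # turn &amp; and similar back into characters
    return WHITESPACE.sub(" ", text).strip()


def join_inscription(fragments, sep=" "):
    """Join the cleaned, non-empty fragments into a single inscription string."""
    cleaned = (clean_fragment(f) for f in fragments)
    return sep.join(part for part in cleaned if part)
```

For example, passing the four inscription fields of one record as a list to join_inscription yields one continuous string, with empty or missing fields skipped.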
Tony Bursey, an EA Software Developer and our Renaissance Geek, and I then discussed alternative approaches for creating an interactive interface. Along the way, we noticed that the link to the National Defence web page for each memorial was not included in the data file. With a bit of detective work, we found out how to construct the link ourselves and I embedded that in my Alteryx module. By the way, while working on this blog, Tony asked me to look at Clarenville, NL, because of a personal connection (which is exactly what I hope readers will do), and we found this listing (see Memorial #10001-006, left side listing).
Because records with coordinates are in the minority, Tony and I concluded that a map-based interface alone wouldn’t be ideal, so we called on Vito De Filippis, an EA Client Advocate, to take the cleaned data into Tableau, where we could have a map, a link to the National Defence web pages and a display of non-mappable records all in one place. Alteryx exports directly into Tableau format, so this was a particularly simple mechanism for making the data useful. Another benefit of the Tableau approach is that we could make the inscription text fully searchable – for example, if you search for “IN THE MORNING”, you will get every memorial inscribed with “AT THE GOING DOWN OF THE SUN AND IN THE MORNING WE WILL REMEMBER THEM”. Vito, Tony and I were able to quickly create a highly interactive Tableau dashboard, which you can access here.
Ultimately, our effort illustrates that, even when databases are open and in the public domain, they are often not very useful for a variety of reasons—content, organization and formatting inconsistency. Hopefully, however, the Ontario government initiative can overcome some of these issues and become a model for a revised—and improved—version of the federal open data portal. If you have some useful open data sources, please let me know.
And please take the time to look for memorials that you know about…Lest We Forget.
Tom Montpool is Director of Standard Data at Environics Analytics.