CSPS Data Demo Week: From Raw to Refined – Organizing Data with Intent (Video)

Description: Watch this walk-through of the life cycle of data projects, highlighting key components of data integration, the importance of data standards, and the value of the enormous amount of data that exists within government. This demonstration of Namara by ThinkData's Chief Executive Officer and Co-Founder Bryan Smith is followed by a discussion featuring Vik Pant, former Chief Scientist at Natural Resources Canada.

Date: April 28, 2021

Duration: 01:02:34

Resolution: 1080p


Transcript

[The animated white Canada School of Public Service logo appears on a purple background. Its pages turn, opening it like a book. A maple leaf appears in the middle of the book that also resembles a flag with curvy lines beneath. Text beside it reads: Webcast | Webdiffusion.]

[It fades out, replaced by two title screens side by side in English and French. At the top, it shows three green maple leaves, each made of different textures. Text on screen reads:

CSPS Data Demo Week

Organizing Data with Intent

GC data community]

[It fades out, replaced by a Zoom video call. The video window is filled with a man with glasses, wearing a blue button-down shirt. He sits in front of a purple Zoom background that features the CSPS logo, the Government of Canada logo and reads "CSPS Data Demo Week" in English and French.]

Neil Bouwer: Good morning. I'm Neil Bouwer with the Canada School of Public Service, and I'm really happy to welcome you to today's event. Today's event is part of our Data Demo Week series. All week we have been demonstrating new approaches to data analytics and artificial intelligence with partners within the government and outside of government. Today, we're really happy to bring you a demonstration and a presentation on a really interesting data demo. First, I just want to mention a couple of things. First of all, I'm in the National Capital Region, so that puts me on the traditional territory of the Anishinaabe people. It's a good time for me to reflect on what it means to be on the traditional, unceded territory of the Anishinaabe people. I encourage everyone to take a moment to reflect, wherever you are, on the traditional territory that you may be on. I also want to mention that we have simultaneous interpretation today, so the event is going to be mostly in English, but there is simultaneous interpretation in French, so please take advantage of that. It is available to you in the link that you got for the event and when you signed in initially, you can choose simultaneous interpretation. So, please feel comfortable participating in French by using the simultaneous interpretation we are providing.

We've got two great speakers today. We're going to have a chance to hear from each of them. We're also going to have a chance to have questions and answers from you. There is a "raise hand" function that you can use on the platform to pose your questions. So please, don't hesitate. Our guests are here to answer your questions. Let me introduce each of them just very briefly.

[Two people join the call. On the top right Vik Pant, a man in a crisp blue suit, sits in front of a white background. On the bottom panel, Bryan Smith, a blonde man sits in a home's basement.]

Neil Bouwer: Firstly, we have our very own Vik Pant, who's the Chief Science Officer at Natural Resources Canada. In that role, he has been bringing new levels of data analytics and artificial intelligence to the role in the science‑based department working in the community. We're really happy to have Vik with us today. He will be followed by Bryan Smith, who is a Co‑Founder and CEO of a company called ThinkData, a Toronto‑based company that uses data and data analytics to look at data that is of interest to government and other interested departments, brings those data together, and helps add public value for various users in the public interest, including departments. Bryan's not only going to explain to us what all that means and the approach that ThinkData has taken, but he's going to demo it for us. We're going to see it in practice and see how it works. So, really excited to have both of you here today. Thank you for joining us. Let's start off with Vik. If you could just start us off maybe with a few remarks on data in your context. Over to you.

Vik Pant: Absolutely, Neil. Thank you so much. I'd like to start by thanking Mr. Taki Sarantakis, the President of the Canada School, and the entire organizing team for this excellent event and for bringing us together in the spirit of collaborative learning and knowledge sharing. I'm delighted to be here also to speak with Bryan, who is an industry colleague. I'm very much looking forward to a live demo. That's always the highlight of any such show.

[Vik's panel fills the screen. A purple text box in the bottom left corner identifies him: Vik Pant, Natural Resources Canada.]

Vik Pant: Neil, most of all, I want to thank you. I've had the privilege of knowing you ever since I came to Ottawa a couple of years ago, and we've had some excellent conversations and enlightening discussions about the role of data, the role of digital in truly amplifying and magnifying what the impact and contributions of it can be in the world of government and public service. Neil, as our neuroscientist friends say, when I associate with you, it's sort of like Hebbian learning: neurons that wire together, fire together. I'm always very much excited by our chemistry and our energy. Thank you, Neil, for having me here today.

I think this topic of From Raw to Refined is very important for us at Natural Resources Canada. We live it first‑hand. I really like the title itself because it's very appropriate. We always say in the Natural Resources Canada Digital Accelerator, which I'll talk about in just a minute, that models live downstream of data. If you have poor quality data, you're going to have poor performing AI systems and in the Natural Resources Canada Digital Accelerator, we have teams of folks that work specifically on this kind of a problem around data engineering and ensuring that all of the upstream tasks are done properly so that the downstream data mining and all of the modelling can be really extremely high quality and high performance. In our department, we focus on forestry, energy, and mining. Our department is a very science‑based department. We want our scientists and our policy leaders to really take advantage of all of these amazing new technologies, all of these breakthroughs which are happening around us and are chronicled in the popular press and in the mainstream media to collectively elevate the performance and utility that we can extract from the various data sets we have.

Now, in the accelerator, we have data scientists. They work with our domain scientists, as well as policy leaders on creative applications of innovative technologies to advance science policy integration, which is to say that we build digital solutions which are data-centric, that help us to advance and accelerate forward on the mission and mandate, as well as the priorities and plans of our department. This is work we do, going from raw to refined on a daily basis, and I'd love to share in these opening remarks just a couple of quick insights that we have learned along the way and indeed, in partnership with many federal departments. We've all collectively learned certain things that I'd like to be able to share through these opening remarks. I think the first thing that comes to mind when we talk about raw to refined are some of the technical type of tasks which we generally associate with transactional or master data management, right? So, things around distillation, synchronization, standardization, systematization of our data sets, having a catalogue, making sure that data is complete, cohesive, coherent, and compatible. These are things for which we use our standard software packages, indeed, before tools such as the ones that Bryan will be showing you were available, we'd have to really roll up our sleeves and rely on R Libraries and Python packages, which we still do to some extent, but we are very happy to know that there are solutions like ThinkData out there that allow us to really just quickly take some of these repeatable tasks, repeatable activities, which go into data management and get our data sets from raw to refined, from original state to a machine‑consumable state to really be able to consume value from it downstream.

Speaking of value, this brings us to the second and, dare I say, more important aspect of going from raw to refined, which is essentially locking in on the value propositions, the inherent benefit propositions, of a data set and truly understanding not just the what, but the so what. It's great to say that we go from raw to refined, but it's not a costless exercise. There's people's time involved. There's opportunity cost. There's actual financial costs involved when you have software, and when you take time to do certain things to data to, if you will, massage it to get it ready for processing. The question then becomes, we want to ensure that we're not just doing it because everybody else does it or because we read somewhere that going from raw to refined is the right thing to do in some abstract sense, but we truly want to tie any data set, irrespective of where it is in that quality continuum, to the things that it enables, because certainly the things that it enables may be more or less sensitive to the quality of the data that go in. For instance, if you're training a machine learning model or a reinforcement learning agent or something like that.

Let me explain this a little bit more, because in my collaborating with many federal departments and many private sector organizations, I think everybody gets the first part that I talked about, that if you have raw data and it's corrupt, or it's incompatible, or it's incoherent, or it doesn't have proper statistical properties that are amenable to data science, then of course, you're going to have to get it up to a state where you can do that. But I think for many organizations, that's where it sort of stops—this thinking about going from raw to refined, Neil and Bryan. What I'd like to posit is certainly it's a necessary condition, but in the way that I've framed it, it's not a sufficient condition by itself. Now we get to this second piece of the pie, which is the conceptual going from the raw to the refined. How we do this is by using some conceptual modelling techniques, as distinct from machine learning or reinforcement or deep learning techniques, where we actually build human‑understandable models. There's some ground‑breaking work going on at the University of Toronto. We've actually partnered up with researchers from the University of Toronto. We've also partnered up with other departments, including Agriculture Canada, because many departments have the same kind of needs that I'm articulating that NRCan has.

What we do is we take a look at any given data set and, as I'll explain, we can go bottom up or top down. We build this interpretive scaffolding. We start with a data set at the very ground level. What we can say is that, "okay, what use case is this data set geared for?" If we are having difficulty mapping a data set to any given use case, then it's probably a question of whether we need to spend any amount of resources to take it from raw to refined in the technical sense. But I'd say that there's another element to this, which is that you may have a data set that was acquired or that was obtained, I don't know, maybe a couple of years ago for some very high‑priority initiative. But perhaps the priority of that initiative has waned or perhaps that project is over. Yet, there is some residual use in this data set where we can couple it with some other data set that is currently tied to some other very high‑priority initiative. By putting these two data sets together, instead of going from one plus one to two, we can go from one plus one to 11. And that's the beauty of what data gives us.

I've said this on other occasions as well: data has this very beautiful property where it's idiosyncratic in terms of its complementarity. When you have two different data sets and you put them together, there are certain latent compatibilities or latent, if you will, synergies that emerge that elevate the overall collective of both or multiple data sets so that the overall quality of whatever application you're utilizing those data sets in goes up quite a bit. One way we do this bottom up in my team is we take a look at any data set and we start to map out what programs or priorities, what mission and mandate of our department does this specific data set map to? Because, as I mentioned before, data management and especially data enrichment and taking it from raw to refined is not a costless enterprise. It's not a costless activity. We need to have that traceability. We need to have that rationale, especially as our CFOs in the Department are asking more questions about investments into data and digital solutions. We need to be able to justify why it is that one data set is being prioritized over another data set. That's one way to do it: we go bottom-up and, certainly, we don't go from a data set at one level all the way up to, hey, it's going to help us build better climate models. As you can imagine, it's going to lead to many intermediary sets of goals and sub‑goals and other kinds of conceptual entities as it goes through that mapping process, but it's one that has to be done to be able to strategically map the data set from the bottom to the ultimate enterprise objective in a systematic and a structured way. That is then amenable to analysis, and documentation, and communication, and can really be brought into the enterprise governance framework that indeed all enterprise assets and resources need to be brought to.

Similarly, we can do this process top-down. We can start with the mission and mandate of our department. We talk about net zero. We talk about climate change mitigation, adaptation, remediation. We talk about carbon sinks. We talk about greenhouse gas reduction, et cetera. That could be a high‑level strategic objective. OK, that's the mission, the mandate, the plan, the priority of our department. Now, we start an iterative and incremental process of refining and decomposing that high‑level goal into lower and lower elements in a conceptual hierarchy all the way until we get to a point where we say, "okay, to answer this kind of question coming out of this kind of machine learning model, we need this kind of a data set."

Going bottom up from a catalogue or an inventory of known data sets and top down from a mission and priority level of an organization, what we can do is build this very interesting conceptual map or graph, if you will, of all the data assets in our enterprise, mapped all the way through multiple levels of goals and objectives of our enterprise, using key performance indicators and critical success factors and metrics along the way to the highest level strategic imperatives of our department. What's very interesting here is that, conceptually, these kinds of frameworks are being developed. Neil, you talked about the power of partnerships. We've always talked about the importance of alliances. That's why I'm very happy to have ThinkData in this meeting as well. As I said before, the same way that solutions like ThinkData help us to go beyond some of the technical tasks we'd have to do by writing code in Python or writing code in R, although there's nothing wrong with that, now we're getting to that same place with these conceptual modelling frameworks. As I mentioned, our department is working closely with U of T researchers to really take the framework that they have built, which is still being built, certainly, but bring it to the public sector context and adapt it to our specific requirements and our specific situations as well. Collectively, what we're finding is, Neil, you have the technical piece and Bryan, you have that organizational strategic piece that really gives us a much more holistic and much more comprehensive roadmap, if you will, or guide map, if you will, of going from raw to refined. I know Bryan's got an amazing demo lined up. I know that I probably should end my prepared remarks right here. And happy to transition it back to you, Neil, and then to Bryan. I'm happy to join the conversation afterwards. But great to see you as always, Neil. Thank you for having me.
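The bottom-up and top-down mapping Vik describes can be pictured as a small goal graph. The following is a minimal sketch only, not the University of Toronto framework or NRCan's actual tooling: the data set names, goal names, and the `SUPPORTS` structure are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical "supports" edges: each data set or sub-goal points to the
# goal(s) it helps achieve. None of these names come from the talk.
SUPPORTS = {
    "forest_inventory_dataset": ["estimate_carbon_sinks"],
    "emissions_dataset": ["track_ghg_reduction"],
    "estimate_carbon_sinks": ["net_zero"],
    "track_ghg_reduction": ["net_zero"],
    "legacy_survey_dataset": [],  # not yet mapped to any use case
}


def trace_up(node, supports):
    """Bottom-up: list every goal reachable from a data set."""
    reached = []
    stack = list(supports.get(node, []))
    while stack:
        goal = stack.pop()
        if goal not in reached:
            reached.append(goal)
            stack.extend(supports.get(goal, []))
    return reached


def trace_down(goal, supports):
    """Top-down: list the leaf nodes that ultimately support a goal."""
    # Invert the edges so we can walk from a goal to its contributors.
    children = defaultdict(list)
    for child, parents in supports.items():
        for parent in parents:
            children[parent].append(child)
    leaves, visited, stack = [], set(), [goal]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        contributors = children.get(current, [])
        if not contributors and current != goal:
            leaves.append(current)
        stack.extend(contributors)
    return sorted(leaves)


print(trace_up("forest_inventory_dataset", SUPPORTS))
print(trace_down("net_zero", SUPPORTS))
print(trace_up("legacy_survey_dataset", SUPPORTS))
```

In this sketch, a data set whose bottom-up trace comes back empty (like the hypothetical `legacy_survey_dataset`) is one that has no mapped use case yet, which is the situation where the cost of refining it would need to be questioned.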

[Neil and Bryan's panels rejoin the frame with Neil's panel sitting on the bottom and Bryan on the top left.]

Neil Bouwer: Super. Thanks, Vik. Thanks for those remarks. I really like the way you describe it in the two directions, bottom up or top down, because I think for some of us public servants, sometimes it feels really top down in the sense that you've got a strategic objective, maybe you need a dashboard or maybe you need to report progress on some information. That naturally flows top down. But other times, if you're in a program or a service and if you understand the data that you collect or if you understand the data sets that are available to you, then this idea of developing use cases—in other words, what other kinds of questions you want answered, who's going to ask them, and how do they need to understand them—is a really practical way to get into data. Because it can be complicated, but actually the way you describe it, it's a pretty natural process. So, thank you for that, and we'll come back to that in questions, I'm sure. Bryan, over to you.

Bryan Smith: Awesome. Thanks, everyone and Vik. Thanks for that. It's really amazing to hear all the great work you're doing and I'm going to do my best to tie this demo and this presentation to what you're working on as well. I was noodling around a little bit to try and align some stuff. So, hopefully we can link the two conversations together. I think for the audience, there's going to be a lot of overlap between what Vik and I are saying, because he's hitting the nail on the head, which is great. I'll get into a demo, for sure.

[Bryan's panel fills the screen. A purple text box in the bottom left corner identifies him: Bryan Smith, ThinkData.]

Bryan Smith: I want to start with a bit of a slide deck just to set it up. I'm going to walk through two use cases. One's a private sector use case and then one is a Government of Canada use case that we ran with Treasury Board. Then I'll get into the demo itself and how it all works. Give me one second just to share my screen here.

[Bryan clicks, and his computer browser fills the majority of the screen. Bryan's video panel rests much smaller on the righthand side of the screen. He starts a presentation. A logo in the top left corner of the slide reads "ThinkData Works." A line separates the text from a droplet shape with a plus and equals sign in it. In bold letters, the slide reads "From Raw to Refined: Organizing your data with intent."

A new slide replaces it. The title reads "Our vision." Below it, text reads "Our Mission: Enable any organization to discover, govern and enrich the data that powers analysis and insight." Beside it, three icons are labelled "Discover, Govern," and "Enrich."

A timeline at the bottom of the screen adds details:

"2014 — Vision: Index world of open data to unlock value for enterprises

2016 — Reality: Needed cataloging software to manage +250K open datasets

2018 — Evolve: From Data Vendor to Data Platform

2021 — Refine: Unlocking analysis and enrichment with a data catalogue"]

Bryan Smith: All right. Before we kick into the demo, I guess I'll give you a bit of a background just on what we do at ThinkData Works. We really have a pretty clear vision on what we try and help organizations do, which is enable any organization to discover, govern, and enrich the data assets that are going to go and power their business and deliver insight. We break down this problem into three pretty big principles. One: people have to discover data, two: we have to govern it properly, and then three: we have to enrich it or start monetizing it, either internally or externally within our organization. We started the company back in 2014, around the launch of the open data movement globally. One of the issues that we saw was, it was amazing that governments were starting to publish data all over the place, but that also led to a big data problem, which was there was no standard way to access it.

The first thing we went out to do was index the world of public data. We ended up pulling together around 250 thousand open data sets from around the world. We then started evolving that out of just purely an open data focus and into indexing as much public data as possible, and building a tool that enabled the flow of that information to come from wherever it resided, collect all the meta information, collect all the source information, collect all the licensing, and serve it up to data science teams so that they could very easily plug into it and just get over that discovery hurdle as fast as possible. So, we could get into the insight generation, which is where we really see the big value. We built a lot of tools to make this happen. An evolution of that toolset was also how do we govern this data properly, and then how do we get it in the hands of data scientists as fast as possible so that they can start monetizing on it—in whatever that term means to the end user. That's ultimately what we have as a vision.
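As a rough picture of what indexing public data with its metadata, source, and licensing might look like, here is a minimal sketch. It is not ThinkData's or Namara's actual data model or API: the `CatalogueEntry` fields, the `discover` function, and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One indexed data set with the details a data team needs up front."""
    title: str
    source_url: str  # where the data resides (placeholder URLs below)
    licence: str     # terms under which the data may be used
    tags: list = field(default_factory=list)


def discover(entries, keyword):
    """Return entries whose title or tags mention the keyword."""
    keyword = keyword.lower()
    return [
        entry for entry in entries
        if keyword in entry.title.lower()
        or any(keyword in tag.lower() for tag in entry.tags)
    ]


# Hypothetical catalogue contents, for illustration only.
catalogue = [
    CatalogueEntry("Building permits", "https://example.org/permits",
                   "Open licence", ["construction", "municipal"]),
    CatalogueEntry("Road closures", "https://example.org/roads",
                   "Open licence", ["transport", "municipal"]),
]

for entry in discover(catalogue, "municipal"):
    print(entry.title, "|", entry.licence, "|", entry.source_url)
```

Keeping the licence and source alongside each entry is what lets a search result answer "can we use this, and where does it come from?" without a separate lookup.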

[The slide changes. The next slide is titled "Agenda." On it, there are bullet points:

  • "The data economy
  • The problems data teams face
  • Organizing data with intent
    • Geotab case study
    • TBS case study
  • Consolidating data into a Data Hub
  • Demo"]

Bryan Smith: I think where I'll start is just a walkthrough of what we're seeing in the data economy. I'll focus a lot on the problems that teams face and where data scientists are spending a lot of their time, because I think that's the crux of the issue that we can discuss today. I'll walk through those two examples and then we'll dive into the demo.

[The slide changes. The next one is titled "The data landscape is changing." The slide features three subtitled pieces of text:

"5.3X — Data volumes are exploding. IDC predicts the datasphere will grow from 33ZB to 175ZB by 2025.

40% - Privacy is becoming law. GDPR fines grew to $330M in 2020. CCPA and Canada Privacy Act will increase the impact to businesses.

$5.8T — AI is driving demand for data. McKinsey forecasts that AI will drive up to $5.8T in annual revenue globally."]

Bryan Smith: Three high‑level stats just to set the pace. We're seeing data volumes explode year over year. This is really that whole concept of data discovery being a big hurdle that companies are trying to get over. The second big trend we're seeing is privacy is becoming law. GDPR, as we all know, Canada having the Privacy Act in the House now, governance is becoming a very important thing. Not only do we need to discover data, we need to govern it properly and make sure that we're following regulation. The key here is we need to introduce a model where governance aids in the use of data and it doesn't shut it down. I think that's where we're really trying to move in the direction of proper data governance, is how do we put a data governance model in place that enables an organization to share data as much as possible, as securely as possible, and has confidence that they can share the data the way they can, rather than through traditional means of emailing it to each other, downloading it or putting on USBs, which is really what this legislation is trying to stamp out.

Finally, why data volumes are exploding and why companies need to focus on data governance. Ultimately, it comes down to the fact that AI is driving demand for data. We're seeing massive, massive investments in modern data analysis and training data required for AI and machine learning, which is really driving a lot of agendas at the high level with a lot of Fortune 500 or Fortune 1000 companies. That's where we're seeing a lot of investment. In order to get to that level of actually training models and running things through a machine environment—Vik spoke about this really clearly as well—you've got to make sure your underlying data assets are clean and you need as much data as possible in order to power those models and train them. This is where that whole volume of data is coming from and where it's going is to power this big movement and big shift in how companies are looking at the future of their tech stacks, ultimately.
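The 5.3X figure on the slide can be checked from the two volumes it quotes. The sketch below also derives an implied yearly growth rate, but the slide does not state the baseline year, so the seven-year span (2018 to 2025) is an assumption labeled as such in the code.

```python
def growth_factor(start, end):
    """Overall multiple between the starting and ending volume."""
    return end / start


def implied_annual_rate(start, end, years):
    """Compound annual growth rate that turns start into end over `years`."""
    return (end / start) ** (1 / years) - 1


START_ZB = 33      # from the slide
END_ZB = 175       # from the slide
ASSUMED_YEARS = 7  # assumption: 2018 baseline to 2025; not stated on the slide

factor = growth_factor(START_ZB, END_ZB)
rate = implied_annual_rate(START_ZB, END_ZB, ASSUMED_YEARS)
print(f"{factor:.1f}x overall, about {rate:.0%} per year")
```

Under that assumption, the 5.3X is the growth over the whole period, and the yearly rate is considerably lower than the overall multiple.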

[The slide changes. The new slide is split into two halves, a blue half and a white half. Two sections of a quote run across it, separated by an ellipsis. "By 2022, there will be an even greater demand for good data from large organizations...but less than 5% of companies know how to fill that supply with trusted data on their own."]

Bryan Smith: With that table set, we're seeing a massive amount of demand for good data within large companies. The issue we're facing is that not a lot of companies actually know how to fill that supply with trusted data. So the real question is why? If organizations, governments, the public domain is indexing more data than we ever have before and we're doing it at 5X year over year, why are companies not plugging into it? Why is there a bottleneck from supply to usage?

[The slide changes. The new slide is titled "Available data isn't accessible data." Three subtitles sit in arrows pointing in a line. Below each subtitle is text. They read:

Data is Hard to Find — Wasted Time Costing Millions — "A data scientist spends 80% of their time finding and preparing data. Good data is a rarity, and project-ready data is a unicorn.

Data is Hard to Use — No Common Standards — Even when teams find the data they need, it's seldom ready to use — the data needs transformation and enrichment.

Data is Hard to Sell — Addressing The Market — How do you bring your data to market safely? How do you get visibility and searchability? What's the delivery vehicle?"]

Bryan Smith: This really comes down to what factors data scientists and data teams face at any company, whether it's government working with their own data coming from different departments or a company like a bank looking at the broader public domain of data and trying to pull stuff in to help make decisions on whatever models they're running. It really comes down to three factors. One: data is hard to find. Even within the government realm, I would say not everyone knows who owns what data asset. We expand that to the entire universe of data, and no one is an expert on who owns what and what's being published in the public domain. There's a massive discovery hurdle that needs to be taken away so that people just know where to find data and can plug into it in a very efficient way.

The second hurdle we face is even when we do find data, what we're finding is data's hard to use. There are no common standards amongst data. We have to spend a lot of our time, or data scientists have to spend a lot of their time, stitching data together, driving standards across really similar or related data. But a lot of the effort up front to prep assets for use really comes down to getting it into a common format so that we can actually run it through a standard model that we're building, or a product that we have launched.
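To make "getting it into a common format" concrete, here is a minimal sketch of mapping two differently shaped sources onto one schema. The records, field names, and figures are made up for illustration and do not come from the talk.

```python
# Hypothetical records from two sources that describe the same kind of
# entity with different field names and value formats.
source_a = [{"Name": "Ottawa", "Pop": "1,000,000"}]
source_b = [{"city_name": "toronto ", "population": 2500000}]

# Per-source mapping from the original field names to a common schema.
FIELD_MAPS = {
    "a": {"Name": "city", "Pop": "population"},
    "b": {"city_name": "city", "population": "population"},
}


def standardize(record, field_map):
    """Rename fields to the common schema and normalize value formats."""
    out = {field_map[name]: value for name, value in record.items()}
    # Trim whitespace and use consistent capitalization for names.
    out["city"] = out["city"].strip().title()
    # Accept either "1,000,000" strings or plain integers.
    out["population"] = int(str(out["population"]).replace(",", ""))
    return out


combined = (
    [standardize(r, FIELD_MAPS["a"]) for r in source_a]
    + [standardize(r, FIELD_MAPS["b"]) for r in source_b]
)
print(combined)
```

Once both sources share field names and types, they can be fed to the same model or product without per-source special cases, which is the repeatable step the talk argues should be automated.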

Finally, on the supply side, it's actually pretty hard to get data into market for a handful of reasons. One: you can throw it into an open data portal. Government has, I think, spent the last five years thinking about why that's valuable and what the value is. For private companies, that metric isn't as defined as we see in the public realm. So, data still has an undisclosed price tag on it and people are trying to figure it out. Even when they do put it in market, there's a lot of questions around how much data do we need to put in, at what level of granularity? How do we make sure we're not offside on a security or regulatory perspective? All these unanswered questions that hold back really valuable data from entering the market. What we do at ThinkData Works is really work across this spectrum and help companies from the finding all the way to the using and, ultimately, to the monetizing—is the word that we use to collect all this stuff together.

[The slide changes. The new slide is titled "Data science is stuck in low gear." The slide shows a pie chart split into three sections. 76% of the chart is devoted to "Finding data, gaining access to data, cleansing data, aggregating and normalizing data, validating data accuracy, testing for "lift" in models," and "updating assets and monitoring feeds." 14% of the chart is devoted to "analyzing data" and 10% of the chart is devoted to "reporting on findings."]

Bryan Smith: From a data science perspective, we see data scientists at more or less every organization (public, private, small, large) all really stuck spending about 80 percent of their time working on the underlying issues of data prep and processing. That's everything from finding data to cleansing it to aggregating it, all the way to things like validating and testing. And then following the actual use of it, the updating the assets and monitoring them to ensure that the data is constantly flowing in the format that's required in order to power a model with confidence. It's like a pre‑usage and then post‑model, and that's where data scientists are getting bogged down significantly. If you're working with one to 10 to 50 data assets, it may be manageable for a person or a team to consistently manage the flow of that data as it comes in from places to consistently validate it, plug it into models, and monitor and update it—do that whole song and dance in a manual process. But in this new world, where there's trillions and trillions of data points coming from thousands and hundreds of thousands of sources across the public domain, it becomes a non‑human problem that we need to solve for, because it's just impossible to set up a governance model that can control the flow of that information into an organization to map all the usage and requirement regulations, to standardize those assets and then to deliver them consistently with confidence to whatever they're powering.

That's really where we're at as far as an inflection point in the world of working with public external data or just lots of big data. We need to set up systems in place that take that load off of the data scientists' shoulders and put it on to a system that enables them to migrate their time towards analyzing data and reporting on findings and generating value. And really, automating as much of this 80% as we can, using tools that we have available to us with smart companies like us, but also just advancements in technology that enable us to process data at a level that we otherwise wouldn't have seen in the past. This is really where a lot of organizations are spinning their gears. They're still investing on the human capital side of solving this problem, and not looking at the data science environment that's required to support valuable data science work when you hire people and bring them on to the organization.

We really like to focus here and say "What we're saying isn't we're coming in to replace your data scientists. We're coming in to make sure that they can focus on what they're best at and what they think is the sexy part of data science." Unfortunately, organizations believe that all of the 76% that I'm talking about is just part of the day‑to‑day of a data scientist, but this is really low value stuff that, although it's a requirement, isn't where really smart people should be spending their time and isn't where the business is going to get value. That's a recipe for automation and that's really what we focus on as a company.

[The slide changes. On the new slide is a quote. "Once a company's business leaders realize that one company's trash is another company's treasure, new data-driven value streams start to materialize." – Rajana Young, VP & Global Head of Data, Cardinal Health.]

Bryan Smith: Another really cool thing we're seeing, which I'll start transitioning into, is that companies are starting to realize that the data they're generating or potentially collecting by accident is actually really valuable to the market. They're making it available through either open data initiatives, through APIs or selling it into the market as an asset. This isn't personally identifiable information about how to target people. This is information about everything from high‑level geographic things to trends on flow. All these sorts of anonymized types of information that if you look at them as a single asset, there may not be a lot of value outside of its core competency, but to Vik's point, when you start layering this stuff on top of itself with other assets that are relatable and you standardize how you can access that information, you can now start pulling value out of seemingly unrelated data sets in a way that could provide insight at scales that we otherwise wouldn't have been able to do, just looking at this with manual eyes and Excel.

[The slide changes. The new slide is titled "ThinkData Catalogue" and shows screenshots of the ThinkData interface with long lists of information. Text reads "Discover, govern, and enrich data to increase value across all parts of your organization." Below the text, three bullet points read:

"1. Discover: Get organization-wide visibility into any data asset from any source.

2. Govern: Flexible platform complements existing workflows instead of replacing them.

3. Protect: Role-based access controls and a single source of truth ensures data protection."]

Bryan Smith: That's really a trend that we're seeing, and that's what we're ultimately cataloguing, indexing, and making available to the market through our data catalogue. I'll go through a demo shortly, but generally speaking, what we find is a big missing piece is just tying the data suppliers to the data users at the end of the day, and providing them a way to discover data, effectively govern data in a flexible way so that we can share it with confidence, and then also protect the flow of that information. So that we can actually enable more sharing and more usage across data assets with the confidence that we need in order to pull a bunch of data together and distribute it in ways that's required across an organization. I'll dive into the demo shortly on that.

[The slide changes. The new slide is titled "How we're creating value." The subtitle reads "We help build a catalogue that taps into the value of data, no matter the source or complexity." Below it, the majority of the slide is taken up by two logos: Geotab and the Government of Canada. The Geotab logo features streamlined letters in a block font and the Government of Canada logo features the word "Canada" with a small Canadian flag waving over the final "a." Below the Geotab logo, text reads "Massive-scale fleet telematics were an untapped goldmine." Below the Government of Canada logo, text reads "Aggregation of data from 70+ sources to drive a common standard and increase insight."]

Bryan Smith: So, quickly, just to tie this whole presentation to two real world examples. We're going to use Geotab as one example, which is a fleet telematics company. They actually have a little device that they put on fleets like FedEx trucks and Canada Post trucks. It collects a ton of data on how those fleets are moving across North America. I'm also going to use a Canadian example with the Treasury Board and aggregating procurement data across 70 departments.

[The slide changes. The new slide is titled "Building new insights products." Titles in arrows label a flow chart on screen reading from left to right.

Under the arrow reading "Partner Data Insights", boxes with data sources such as airport traffic analysis, service centre metrics, weather and road impediments all shoot lines into the next step of the flow chart at the same point.

Under the arrow reading "Data Products," the lines all connect at a ThinkData Works logo. Three new lines flow away from the logo into new boxes: route optimization, toll road analysis and regional economic index. The three lines converge into an arrow pointing to the final step of the flow chart.

Under the arrow reading "Clients Served" sit three bullet points:

  • "Global Management Consulting Firm
  • Toll Road Tracking Commercial Traffic
  • Government — COVID Impact Studies."]

Bryan Smith: With Geotab, I think we've got this little device that sits on the bottom of the fleet truck. The obvious use case is you know where your truck is at all points in time. The not so obvious part of this process is that little device is collecting a ton of data on a lot of different points in time. Yes, we know where the car is. We know how long they wait at lights. On a centimetre by centimetre basis across the country, we know where they're trying to park and when they're trying to get gas. We know how long they're waiting at the border. All these insights start coming out of a single data asset, which isn't really the core competency of the device, per se, but it is generating a ton of value by just existing and collecting this information. Working with Geotab, we actually started pulling in all of the data they had in their database, which was basically just a massive flat table of information, not really usable. But when we started breaking it down and turning it into data products, we could start looking at how that would be valuable for the market. Rather than selling road impediments and intersections and parking data, we would tie those together into a route optimization product and bring that to market with toll road tracking commercials or a global consulting firm that was working on efficiencies with FedEx, as an example. That's a really cool concept because we're getting one level above the data assets and turning it into a product that's a bit more digestible to the end users. It's also looking at data in a slightly different way. That wasn't really possible if we just looked at data as that gigantic flat table that was just sitting in the warehouse. As soon as we deployed our platform and we started building assets out of that master file, it became clearer and clearer how we could put the Lego pieces back in place to build these new products and bring them to market.
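The step from one flat table to a focused data product can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the column names, the event types, and the sample rows are hypothetical and are not Geotab's schema or ThinkData's implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows from a single flat telematics table (illustrative only).
flat_events = [
    {"vehicle": "T1", "event": "border_wait", "location": "Crossing A", "minutes": 30},
    {"vehicle": "T2", "event": "border_wait", "location": "Crossing A", "minutes": 50},
    {"vehicle": "T1", "event": "signal_wait", "location": "Intersection 9", "minutes": 2},
    {"vehicle": "T3", "event": "border_wait", "location": "Crossing B", "minutes": 20},
]


def build_product(events, event_type):
    """Derive a small product from the flat table: average wait per location."""
    by_location = defaultdict(list)
    for row in events:
        if row["event"] == event_type:
            by_location[row["location"]].append(row["minutes"])
    return {
        location: {"observations": len(waits), "avg_minutes": mean(waits)}
        for location, waits in by_location.items()
    }


border_product = build_product(flat_events, "border_wait")
```

The consumer of `border_product` receives only the aggregated view for one event type and never needs access to the full flat table.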

I think it's a really cool example of how Geotab may have had control over their data asset, but they didn't have the governance figured out in order to see the wider picture and how they could turn a data asset, to use fixed language, from a one plus one equals two to a one plus one equals 11 by turning it into data products. That's really the value we're trying to get at with the governance. It's the value we're trying to get at with the visibility of the asset, and the value we're getting at as delivering it to a client in an effective way. A global consulting firm wouldn't have been able to buy a route‑optimized data product from Geotab in their old model. Geotab would have had to give them access to their entire database and the consulting firm would have had to go through the process and build everything. What we are doing is adding that governance layer that can split up that process and deliver it to the users in a way where they can immediately plug it into something that's going to drive value for them.

[The slide changes. The new slide is titled "Working with TBS." The subheading reads "ThinkData partnered with Treasury Board to centralize and analyze department spend, increasing visibility into government procurement." Below it, five boxes reading "Health Canada, National Defence, Revenue Agency, Border Services," and "Real Estate" are gathered into a logo reading "Namara" by a squiggly bracket. An arrow points down from the Namara logo to a text block entitled "Financial Reporting Dashboard." It has four bullet points:

  • "Cleansed and normalized data
  • Disparate data merged for holistic view
  • Custom and configurable visualization
  • Single pane, single source of truth"

On the right-hand side of the slide, a laptop shows a screenshot of the Namara interface with multiple graphs.]

Bryan Smith: Another quick example is working with the Treasury Board. This is a project we did a few years ago, and I think another really good example that pulled together procurement data and everything above $50,000 from 70 or so different departments across the government. I think this is a really good example of how, again, using a proper governance platform and pulling like‑minded data from similar sources. Obviously, every department had the same type of procurement data that was being released. How that data was actually organized, collected, and released was all different. It was really difficult to find out how much does the Government of Canada spend on IBM on a yearly basis? That was actually a really difficult question because IBM is recorded in a bunch of different ways. They have about 36 different ways of recording their name in databases. We won't talk about whether they do that on purpose or not. Once we actually started tying all that stuff together, we were able to launch dashboards for Treasury Board to look at overall government spend. This project focussed really specifically on the IT portfolio and how we could consolidate licences across the government.
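The vendor-name problem described here (one supplier recorded under many spellings) can be sketched as follows. This is a minimal illustration, not the method used in the Treasury Board project: the alias table, the spellings, the departments, and the amounts are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical alias table: canonical vendor -> normalized spellings seen in source data.
VENDOR_ALIASES = {
    "IBM": {"ibm", "i b m", "ibm canada", "ibm canada ltd",
            "international business machines"},
}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def canonical_vendor(name):
    """Map a raw vendor string to its canonical name; unknown vendors pass through."""
    key = normalize(name)
    for canonical, aliases in VENDOR_ALIASES.items():
        if key in aliases:
            return canonical
    return name.strip()


def spend_by_vendor(records):
    """Total contract amounts per canonical vendor across all departments."""
    totals = defaultdict(int)
    for record in records:
        totals[canonical_vendor(record["vendor"])] += record["amount"]
    return dict(totals)


# Made-up sample records, for illustration only.
records = [
    {"department": "A", "vendor": "IBM Canada Ltd.", "amount": 120000},
    {"department": "B", "vendor": "I.B.M.", "amount": 75000},
    {"department": "C", "vendor": "International Business Machines", "amount": 60000},
    {"department": "C", "vendor": "OpenText", "amount": 90000},
]
totals = spend_by_vendor(records)
```

A hand-maintained alias table is the simplest possible choice; with dozens of spellings per vendor, fuzzy matching or a reviewed mapping workflow would likely be needed instead.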

Once we did this, companies like OpenText, Cisco, and HP started putting their hands up and saying, "hey, if you guys can actually consolidate your licences, it'll be easier for us and we're happy to cut a deal for the government." With proper governance, pulling in data, building that standardization, putting it into a format that's visual, I think led to an outcome where both parties were happy to come to the table and discuss how to better deliver a service for an organization. The data was always available to the government, but that whole process of turning it into insight is really unlocked through the process of properly governing the data and delivering it to an organization.

[The slide changes. The new one is titled "Organizing data with intent." Its subheading reads "Deploying a data catalogue lets organizations curate, collect, transform and normalize all their data in a single environment, increasing transparency and governance while reducing operational overhead and expediting insight and analysis." A graphic shows "Data Sources" reading: "Secure Internal Data, Transformed Internal Data, Public Data," and "Third-party Data" leading into a computer labelled "Data catalogue." Below the computer, bullets read:

  • "A collection of valuable data assets across the organization
  • Single place to categorize, query, monitor and maintain data
  • Operationalizes data that was previously inaccessible to a large percent of your organization"

Arrows separate and flow away from the computer into boxes under the label "Performance Gains." The boxes read "Single Source of Truth, Increased Data Visibility, Enable Data Quality," and "Better Data Governance".]

Bryan Smith: On that note, I think that's really what we do with the catalogue. It's really organizing data with intent, whether we're pulling internal data, external data, public data or third party data. Really, we want to pull that through a standard view that can be shared across the organization in whatever way makes sense so that we can drive performance gains across the data, creating a single source of truth, adding visibility, enabling a lot more data quality, and just all around better data governance, which really unlocks these opportunities.

[The slide changes. The new slide features a circle and an abstract scribble. The only word on the slide is "Demo." Bryan exits the presentation mode, and moves to another tab in his browser, landing on a page reading "Namara marketplace. Use Better Data." A search bar is prominently featured in the middle of the page. As Bryan speaks, he scrolls down the page, passing lists of topics, data providers and available data, all with clickable badges.]

On that note, I'll dive into the demo portion of it, which I think is what everyone's looking for. Before I get into the actual platform itself, I believe the CSPS team sent around a few links from our side. One of the links was to our public marketplace. If anyone on the conference wants to go click on that link, this is an open marketplace that anyone can come and browse. What we do publicly, just out of collective good, to provide a place where people can just start doing data discovery in a way that makes sense for their organization rather than spending a lot of their time scraping Google to figure out where and what data is available. We've put a lot of effort, since we started the company, into organizing data in a way that makes the most sense. How we've evolved over the past few years is really breaking the types of data down into a few different categories. There's open data providers, which I think everyone should be familiar with, especially the Government of Canada's great open‑data portal. What we find is there's also this concept of public data, that is data that is available through the public domain but isn't necessarily released through an open‑data portal. Something like the Toronto Police Force, they release data, but not necessarily through an open‑data portal, but it's still public information, so we call that public data.

Then finally, there's data providers. These are companies that are actually making data available through a handful of different means. That could be through an API. They could have a website where they make it available for download, or we've set up partnerships with organizations so that they can start releasing data into the public domain.

[Bryan clicks on a badge with a beehive icon, and is taken to a profile page for BuzzBuzzHome. The page is segmented into boxes with identifying info about the company, related sources, a left-hand sidebar with search filters, and a search bar above a list of dataset files.]

Bryan Smith: All of these are actually live data sets that if you do click through our marketplace, you'll see a description, some general information on the data set itself, and also look at the actual assets that are available. There's that way of click and browse.

[Bryan clicks the Namara marketplace logo, and is taken back to the home page. He scrolls to a badge list entitled "Demographics and Lifestyle."]

Bryan Smith: What we're trying to do is just tie all those different assets together into a common way, where we connect and get access. The other thing that we're pushing is an exercise to start tying together like‑minded data into products that we can start plugging people into. If we click into Market Development Index here, we're seeing that in the US, this data is being pulled in multiple different ways.

[Bryan clicks on the badge, and is taken to the dataset page for "Market Development Index (US)".]

Bryan Smith: In order to get a full census view of the US, we actually have to plug into about 50 or so different types of data and standardize it. What we've done here is pulled all the data, standardized it together, and we make it available through a standard product. These are things that we've built ourselves, but they're our side projects that are just helping to enable data access in a more efficient way, because at the end of the day, the value of stitching this stuff together is what everyone's looking for, and rather than offloading that onto data science teams, we're solving that through the platform and serving it up as a standard product.

[Bryan heads back to the home page and types into the search bar. He's taken to a page listing an array of datasets.]

Bryan Smith: The other thing we can do here is just general searches. If we want to search something like environment, it's really amazing to hear what Vik is working on with the Canadian assets ourselves, but let's put ourselves in a position, maybe a couple of years from now, where Vik's team has been so successful in Canada that they're looking at adding other jurisdictions around the world and becoming a true world leader in this concept of releasing geospatial and environment data. What Vik's team could start doing is looking at pulling in all of these other data assets from different regions around the world. We cover all of these areas now. What we can do through our platform is really start connecting to the stuff, pulling it in, ingesting it, and transforming it in a way that would match how Vik's team has mapped all of the assets that they own in Canada. So if they are pulling in a data set from California and the data is really relatable, then as we pull it through the platform, we can also transform it to match the Canadian data and we can layer it directly on top of whatever they're building internally.
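Transforming an external data set to match an internal one usually comes down to a declared mapping from source columns to target fields. The sketch below assumes a hypothetical target schema and a hypothetical California source layout; none of these field names come from the demo.

```python
# Hypothetical target schema that internal data sets already follow.
TARGET_FIELDS = ("site_id", "region", "temperature_c")

# Hypothetical mapping for one external source:
# target field -> (source column, converter applied to the source value).
CALIFORNIA_MAPPING = {
    "site_id": ("StationID", str),
    "region": ("County", lambda value: f"US-CA/{value.strip().title()}"),
    "temperature_c": ("TempF", lambda value: round((float(value) - 32) * 5 / 9, 1)),
}


def transform(row, mapping):
    """Rename and convert one source row so it matches the target schema."""
    return {
        target: convert(row[source])
        for target, (source, convert) in mapping.items()
    }


# Made-up external row, for illustration only.
external_rows = [{"StationID": 1042, "County": " alameda ", "TempF": "68"}]
standardized = [transform(row, CALIFORNIA_MAPPING) for row in external_rows]
```

Keeping the mapping as data rather than code means each new jurisdiction only needs a new mapping table, while `transform` stays unchanged.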

This is really the first step of data discovery. The goal here is just to cut down the amount of time companies are spending searching for data and also giving them a new view into what is available, especially with all the new companies that are releasing data. Following this step, after we identify a bunch of data assets that would be valuable, we really move out of that discovery phase into the data governance phase.

[Bryan moves to a different tab in his browser. It's another Namara page titled "organization data." A subheading reads "These are all the datasets that are uploaded and shared to the organization." Below the subheading, a list shows datasets and some metadata. The left-hand sidebar shows data viewing options reading "Data, Topics, Members, Groups" and "Imports."]

That really gets unlocked when we start looking at the data governance platform or the data catalogue that we have, and call Namara, here at ThinkData Works. This is just a small sample. I've pulled in some trade data samples. Ultimately, this would be the internal platform that an organization would use to organize data. They would use it to ingest data and actually load it into a centralized place. We would categorize it here. We would share it. All the governance concepts would reside in this sort of view. I've already uploaded a few data sets here just to make the demo a bit more streamlined, but if we were to click into one of these...

[Bryan selects a dataset named "US Importers — Macro Data set (Dec 2019)" and a table full of data segmented into metadata categories appears.]

Bryan Smith: ...we actually have the data set here that we can do a bunch of stuff related to and really validate the data. But the goal here is really to add sort of things like...Looking at this as a master file that we want to share across an organization, we want to be able to use this as a way to decide what views we want to share across the organization.

[Bryan selects a data box and three small icons appear in it. He selects the filter icon, and a menu drops down from his selection. He quickly picks an option and the data re-organizes in the table. He moves the cursor over to a dropdown menu on the top right of the table.]

Bryan Smith: I just did a filter for the Gap in this particular case, which is US importers. I can also limit the number of rows. If we look at the columns here, we've got a bunch of stuff related to this particular shipment. Let's say a supplier name has PII data in it and we don't actually want to share that with people. We also don't want to share an address because that's information that, from a governance perspective, we're actually not allowed to share across the organization.

[Bryan unchecks two corresponding boxes from the dropdown menu and presses a button marked "apply." The table re-organizes.]

Bryan Smith: Instead of opening up this file in Excel or downloading it, what we are really doing is working up this master file that we've created a connection to and we're pulling from this particular source. We're limiting the view to the Gap and we're limiting the columns by eliminating those two potential PII columns that we need to protect. As we can see here, we've got 16 of 18 columns available, and with the Gap, we've got, in this particular sample, about 1900 of 4300 rows that can be shared. Now, I've only got two people in this organization, so it's not going to let me share, but I can then share this particular view with anyone else in my organization for whatever they're doing. Rather than having to ship an entire data set from person to person or worse, export the data set, download it into Excel, create a version of it, and then ship it off, I can control the flow of this information from a centralized point, control the view that people have of this data set, and really just start managing data centrally. Now I've got one point of entry into my entire org for a single data set, and I can control who is plugging into what across the org, which makes data management really easy and really scalable.
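The idea of sharing a restricted view instead of a copy can be sketched as a row filter plus a set of hidden columns applied to the master data at read time. This is an illustrative model only, not Namara's implementation; the `View` class, the column names, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class View:
    """A shareable definition: which rows and which columns a reader may see."""
    name: str
    row_filter: Callable[[dict], bool]
    hidden_columns: frozenset = frozenset()

    def apply(self, master_rows):
        """Evaluate the view against the master data without copying or changing it."""
        return [
            {col: val for col, val in row.items() if col not in self.hidden_columns}
            for row in master_rows
            if self.row_filter(row)
        ]


# Made-up master file, for illustration only.
master_rows = [
    {"importer": "Gap", "supplier_name": "S1", "supplier_address": "A1", "product": "denim", "weight_kg": 900},
    {"importer": "Acme", "supplier_name": "S2", "supplier_address": "A2", "product": "tools", "weight_kg": 400},
    {"importer": "Gap", "supplier_name": "S3", "supplier_address": "A3", "product": "cotton", "weight_kg": 650},
    {"importer": "Zenith", "supplier_name": "S4", "supplier_address": "A4", "product": "glass", "weight_kg": 120},
]

gap_view = View(
    name="gap-shipments",
    row_filter=lambda row: row["importer"] == "Gap",
    hidden_columns=frozenset({"supplier_name", "supplier_address"}),
)
shared = gap_view.apply(master_rows)
```

Because the view is only a definition, updates to the master rows flow through to every reader, and the protected columns never leave the central store.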

[Above the table, Bryan selects a tab within the data set labelled "Dataspec." It shows large empty fields with data metrics.]

Bryan Smith: We also have a data spec feature built in. This is where we generate all the stats on the data and the runs. I just pulled this data in now, so it doesn't have any information, unfortunately. If the data is updating on a daily basis, we'll be able to see the general stats that manage, monitor, and maintain the data set. From a high level, rather than having to go into the data set itself, I can look at data spec to ensure that that data is flowing consistently and flowing the same way that I'm used to seeing it so that my models are always going to react to it in the same way and I don't have to dive in and fix anything or flag anything for the organization.
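A run-over-run check of this kind can be sketched by profiling each load and comparing it with the previous one. The statistics chosen here (row count, column set, null counts) and the 50% threshold are assumptions for illustration; the transcript does not say which statistics the data spec feature records.

```python
def profile(rows):
    """Summarize one run of a data set: size, columns, and empty values per column."""
    columns = sorted({col for row in rows for col in row})
    null_counts = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }
    return {"row_count": len(rows), "columns": columns, "null_counts": null_counts}


def compare_runs(previous, current, max_row_change=0.5):
    """Return human-readable issues when the new run departs from the previous one."""
    issues = []
    added = set(current["columns"]) - set(previous["columns"])
    removed = set(previous["columns"]) - set(current["columns"])
    if added:
        issues.append(f"new columns: {sorted(added)}")
    if removed:
        issues.append(f"missing columns: {sorted(removed)}")
    if previous["row_count"]:
        change = abs(current["row_count"] - previous["row_count"]) / previous["row_count"]
        if change > max_row_change:
            issues.append(f"row count changed by {change:.0%}")
    return issues


# Made-up runs, for illustration only: a renamed column and a sharp drop in rows.
yesterday = [{"importer": "Gap", "weight_kg": 10} for _ in range(4)]
today = [{"importer": "Gap", "weight": 10}]
issues = compare_runs(profile(yesterday), profile(today))
```

An empty list from `compare_runs` means the data arrived in the shape downstream models expect, so nobody has to open the data set itself.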

[Bryan selects "Topics" from the left-hand sidebar. On the "Topics" page, a single dataset sits in a searchable list.]

Bryan Smith: The other thing we can do, which, again I'll wrap on what Vik was saying, is how do we add business or ROI visibility to data sets? And how do we then have educated discussions about what data is valuable to the organization and what should be prioritized? Let's say we're using this catalogue as a way to collect all of our assets. Another thing we could do is start creating topics.

[Bryan selects a button in the corner labelled "Create topic." A dialogue box pops up and Bryan types into a field within it.]

Bryan Smith: I could put "Vik's strategic project" here. We'll call it number one, because I think Vik has a handful of strategic projects on the go. What I've done here is really created a space where I can start cataloguing data.

[As he speaks, Bryan navigates back to the dataset and clicks a button labelled "classify." A sidebar pops up on the right, showing the organization, and a topics list with a search function. Bryan clicks on a box beside "Vik's Strategic Project #1." He navigates back to the Topics page and selects "Vik's Strategic Project #1." Below the title, the dataset appears.]

Bryan Smith: If I go back to data, let's assume US importers is one that we actually want to collect in this particular case. I can now classify it, add it to Vik's strategic project, and now I'm organized from a business ROI perspective, where I can actually just start looking at this stuff from topics and say, hey, we're an organization that's working off of thousands and thousands of data sets, but now let's look at our assets from a particular business value or an ROI or project value, and let's make decisions on a high level of which ones we need to prioritize, which ones we need to focus on, and how we share those across the organization. Lots of different ways that we can do governance within this platform, but all really dynamic and really customizable, which is what we were trying to go for.
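Classifying data sets under topics is, at its core, a many-to-many index between topics and data sets. The sketch below is a hypothetical in-memory model of that idea; the `Catalogue` class and its method names are invented for illustration and do not describe the platform's API.

```python
class Catalogue:
    """Hypothetical topic index: each topic maps to the data sets classified under it."""

    def __init__(self):
        self._topics = {}

    def create_topic(self, topic):
        """Register a topic; creating an existing topic leaves it unchanged."""
        self._topics.setdefault(topic, set())

    def classify(self, dataset, topic):
        """Attach a data set to a topic that has already been created."""
        if topic not in self._topics:
            raise KeyError(f"unknown topic: {topic}")
        self._topics[topic].add(dataset)

    def datasets_for(self, topic):
        """List the data sets under one topic, sorted by name."""
        return sorted(self._topics.get(topic, ()))

    def topics_for(self, dataset):
        """List every topic a data set belongs to, sorted by name."""
        return sorted(t for t, members in self._topics.items() if dataset in members)


catalogue = Catalogue()
catalogue.create_topic("Vik's Strategic Project #1")
catalogue.classify("US Importers", "Vik's Strategic Project #1")
```

Looking up `catalogue.datasets_for(...)` then answers the business question of which assets support a given project, independent of where the data physically lives.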

[Neil and Vik's video panels pop onto the far right side of the screen below Bryan.]

Neil Bouwer: Bryan, I just have to say, this is so cool because I think you're showing us a single source of truth. There's one place for data that everybody's accessing. You're really tailoring that for individual teams. I would love to be a public service leader and know the data assets I have available to me in this way. So anyway, I just have to say that this is great stuff.

Bryan Smith: Yeah, thanks, Neil. I'll wrap up really quickly just with the last thing, and, again, Vik brought this up and I think it's a really important point, is getting data and organizing it is two of three pillars. The last one is really the enrichment, or the monetization, or the time to insight, I guess I would call it. Just giving people access and just giving people the ability to organize data, yeah, there's a lot of value there, but there's still a lot of ways that we have to then take this data and plug it into a model or plug it into a dashboard.

[Bryan clicks a button labelled "connect to API" and a sidebar pops up, showing different programming language options.]

Bryan Smith: What we're doing through this platform as well is making it really easy to connect directly via the API. We're introducing a bunch of different integrations where we're customizing the way that we can then pull this and plug it directly into an application or into a data science environment that an organization has. The really cool thing about this is because we can interact directly with the platform's API for a particular data set.

[Bryan closes the sidebar.]

Bryan Smith: We can also better understand the data set usage across the organization as everyone is connected to this master record and that single source of truth, as Neil mentioned. The problem with downloading and sharing data is you lose visibility into what that asset is being used for. That's really bad from a policy- or, sorry, government regulation perspective because we need control of all those assets. But, it's also bad for a use case perspective as well.

Let's imagine Vik's team is plugged into 100 data sets and the business question comes down: what data are we actually using to solve problems or what data is the most valuable for training this model we're building? When we plug directly into an API from a platform like this, we have all of the metrics on the back end as to how that data is being used, where it's being used, how often it's being used, and what it's being used for. That becomes really important from a business perspective to say like, "hey, if this data asset is valuable, let's invest in it, or if we're paying for this asset, let's validate the cost of it on its actual usage metrics, not just on the qualitative concept of it existing in our ecosystem." That's, I think, where we're working with a lot of business leaders to better understand. It's not proper form to not have a governance solution in place. Yes, there's the regulation and good governance, and good corporate citizen aspect of it, which is really important, but that's the bad bucket. You don't want to get sued. You don't want to do anything offside. You don't want to give away people's information.
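When every read goes through a central API, usage metrics fall out of the access log. The sketch below assumes a hypothetical log format (data set, user, purpose) and hypothetical annual costs; it only illustrates how such a log could answer "what are we actually using, and what does each use cost?"

```python
from collections import Counter, defaultdict

# Hypothetical access log entries recorded by a central data API (illustrative only).
access_log = [
    {"dataset": "us-importers", "user": "analyst-1", "purpose": "dashboard"},
    {"dataset": "us-importers", "user": "analyst-2", "purpose": "model-training"},
    {"dataset": "us-importers", "user": "analyst-1", "purpose": "dashboard"},
    {"dataset": "weather", "user": "analyst-2", "purpose": "model-training"},
]


def usage_summary(log):
    """Aggregate requests, distinct users, and purposes for each data set."""
    summary = defaultdict(lambda: {"requests": 0, "users": set(), "purposes": Counter()})
    for entry in log:
        stats = summary[entry["dataset"]]
        stats["requests"] += 1
        stats["users"].add(entry["user"])
        stats["purposes"][entry["purpose"]] += 1
    return dict(summary)


def cost_per_request(annual_cost, summary):
    """Divide what is paid for each data set by how often it was actually requested."""
    return {
        name: cost / summary[name]["requests"]
        for name, cost in annual_cost.items()
        if name in summary
    }


summary = usage_summary(access_log)
unit_costs = cost_per_request({"us-importers": 3000, "unused-feed": 5000}, summary)
```

A paid data set that never appears in the summary (here the hypothetical `unused-feed`) is exactly the kind of asset these metrics would flag for review.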

There's also a positive side to data governance, which is we can actually increase visibility and confidence of data usage across your organization, and we can very easily generate metrics that validate investment. I think when we look at ROI from a business perspective, not from a data science perspective, it's really important to look at budget confidence and good corporate governance. That, I think, is the other side of the equation that us, as engineers and data scientists, we often get stuck just talking about the value prop for the data scientist. For those on the business side, it's also important that the stuff makes life a lot easier when validating where to spend resources and what to buy and what to focus on. On that note, I'll stop there and I think we're going to do a question and answer period. So, happy to dive into anything in any more detail.

[Bryan's screenshare disappears, and the three video panels take over the screen.]

Neil Bouwer: For sure. First of all, thanks, Bryan. That was an absolutely amazing presentation, a great demo, really interesting solution. Kind of a dream come true, I think, for a lot of people who want to understand their data assets and they want to try to develop these use cases or questions about how to use the data and then actually create that value. I have a question for both of you just from the get go. You've each spoken a bit about sharing data outside of government, but of course, government's a big, complex organization and I think a lot of folks would also like to understand the extent to which this platform and things like it and the value proposition of sharing data between government departments or between parts of a government department. Let's face it, sometimes these data are hidden away. Maybe Vik and then Bryan, just asking you any thoughts you have on, first of all, the value of sharing data internally and also what you think about this solution as analogous to what departments need to do internally.

Vik Pant: Thank you. Thank you, Neil, and thank you, Bryan, for a very illuminating demo. It's always tricky to do a live demo. I really applaud you for having shown us the full spectrum of what your system can do. I suppose it's actually not the full spectrum of what the system can do, but great demo. Thank you. Neil, absolutely. I think both Bryan and I spoke about the importance of data being this very peculiar asset where it's got multiplicative and exponential‑type gains when you start putting it together with other data.

[Vik's panel fills the screen. A purple text box in the bottom left corner identifies him: Vik Pant, Natural Resources Canada.]

Vik Pant: This is not necessarily the case with many other assets or resources when you put them together. You may have some linear or even sublinear additive improvements, however you're measuring the outcome of the performance of what it is you're using that asset for. I think from a very intuitive perspective, sharing data is great simply because you put different kinds of data sets together. From an intentional perspective, if you know that data set B coupled with data set A would lead to better insights on the model which you train compared to perhaps just data set A or data set B alone. I think it's also the unknown unknowns. Very often you have these latent or hidden relationships within data sets that when you put them together, it's only when you actually start to train models and do different tuning, and calibration, and optimization that you start to unlock some of that hidden value. I think at an abstract level, there is certainly that value of data.

I think from a very practical perspective, data sharing is also important because I think there is now an awareness within the government. We know we have Chief Data Officers. We have very data savvy Chief Information Officers. Our ADMs and leaders throughout the government are very astutely aware of what the data can do for them to advance, even in a non‑technical sense, the policy priorities and objectives they deal with, Neil. I think there is this openness and there is a receptivity to want to have conversations about data sharing, and knowledge co‑creation, and collaborative learning, and transfer learning, and things like that. I think another part of looking at this is, I really like how Bryan touched on the importance of governance because as we know, data being a value-laden asset with amazing propositions in terms of benefit creation and gain creation, that it's not as simple as just folks, you know, Neil, you and I talking and saying, "Hey, Neil, you've got a data set A. I've got a data set B. Can we do a swap?" As Bryan touched on, there are legal things we have to think about it as well. There are rules. There are regulations. There are codes of conduct on how you can use data, what you can use it for, how you can transfer it, how you cannot transfer it. This is also where I think, within the government, we have very good legal expertise and in some cases some of these relationships might be the first time that a department is doing it, so they might have to involve their legal team. Other departments might already be quite advanced in terms of how they bring and share data from multiple different sources. They might be more comfortable in exchanging data and transferring data. The value prop is there. I think the awareness of the value prop of data sharing and knowledge exchange are there, and now, it's simply a matter of operationalizing it and making it practical.

Even there, as I mentioned, Neil, there are different departments at different degrees of, shall we say, maturity in terms of being able to manage multi-party data‑trust‑type relationships. I think this is a movement that's already taken off within the government, and we're only going to continue to see adoption of such tools as a very important pillar in advancing forward in that combined journey.

Neil Bouwer: For sure. Bryan, how about you? Have you seen use cases within organizations as well?

[The screen returns to three video panels, then Bryan's panel fills the screen.]

Bryan Smith: Yeah. I think from our perspective, it's interesting because you think the more you share, the less control you have, and what we're finding with proper governance is it's the exact opposite, actually. That's what we're working with a lot of organizations on, which is in order to get to where we all know we need to get to with data science, we know there's a ton of data out there. We know it can solve all our problems, and we know it's too big for us to handle it right now.

[A purple text box in the bottom left corner identifies him: Bryan Smith, ThinkData.]

A key to that is we have to share it, because the collective intelligence is always bigger than someone's singular intelligence. In order to unlock that, we need to share data, and we need to do that in a secure way, which isn't you've got A, I've got B, let's go do this on a local device and no one else knows that we're doing it. With a proper governance model in place, that sharing becomes a lot easier, a lot more universal, and a lot more secure. It's crazy to think that sharing it more makes it more secure, but it does, at the end of the day. It's a weird thing, but if you get the right model underneath, that's the real value of what we're pitching from a governance perspective. You need to get it to a place where you're controlling the tip‑to‑tail delivery of the data from source to end. You need to collect all the meta‑information throughout that process. That sets you up for success.

I just see a question coming up of why do this, rather than just query through Excel or Power BI? I would say that you do that stuff after this process. You set up the sharing properly, and then you integrate with our Excel, or you integrate with our Power BI API because you're still collecting all the information and you're setting yourself up for proper management of the data set. If a source changes their licensing terms, you're catching that up front rather than finding out three months later that you've been offside, delivering something on a data set that you're not allowed to use anymore. There's all of these things that need to be in place that really support that growth of sharing. Once you have that stuff and you're confident that you're collecting it all, you can share with anyone you want because if a red flag gets raised anywhere, you can close all of your channels down. That's a lot easier than trying to remember how many times you zipped this file, made copies of it, who you sent it to, and who you have to go send a really bad email to to say, "Please tell me you're not using that in a production environment." That's really the power of sharing from our perspective.

[The three video panels return.]

Neil Bouwer: Sounds great. Sounds like controlling a water pipe instead of bringing buckets of water. A few quick questions for you, Bryan, based on your demo. Are there data sets on the Namara marketplace that are free and accessible to the Canadian public?

Bryan Smith: Yeah, so all of the open data is free and available for everyone who uses the platform. We don't monetize on open data. As a public good service, we pull it together to make it easier to find for people. The data providers... so when we're working with companies, there's varying terms associated with it. Some people sell it, some people make it available through an API with terms of service, some is for commercial and some is not. We've collected all that information on the platform and you can find it as you click through. High level, there are large enough samples available for anyone to test any model they're trying to build, which we always push our providers to make available so that we can just enable people to move forward with using more data.

Neil Bouwer: Amazing. Are you using the Open Geospatial Consortium Standards for geospatial data? That's another question.

Bryan Smith: Good question. We are using a global standard. I don't want to say we're using this one specifically. But instead of making our own standards, we standardize to whatever the global consortium is. That's a cautious yes, because it's our general...

Neil Bouwer: Philosophy.

Bryan Smith: Philosophy, yeah.

Neil Bouwer: Got it. Are you using an SQL or NoSQL database for the platform?

Bryan Smith: It's SQL and the entire platform is on a query language that's SQL based. Basically, any query that's similar to SQL, you could do any of the same functions on the platform to access the data or pull it.

Neil Bouwer: Okay. Amazing. There are more questions, but we're going to have to wrap it up here. Maybe I could just ask you each to give a closing thought. We've got public servants on the line, about 1600 of them, who have checked out this demo, who are probably thinking about data in their organization. Have you got any advice as we enter this age of data explosion and data platforms and data and analytics? Have you got a piece of advice that you'd like to leave the participants with? Maybe, Vik, I'll start with you and end with Bryan.

[Neil's panel fills the screen as he nods along with Vik. Vik's panel replaces it.]

Vik Pant: Thank you. Thank you, Neil. That's a great question. I look at that last question about vehicles for procuring the great solution that Bryan showed us. One thing I'd like to say is certainly something we do in the Natural Resources Canada Digital Accelerator is think about new models of partnership. When you look at ThinkData Works, certainly there is software that they bring to the table and there is data they bring to the table but think about the expertise that Bryan brings to the table, his team of data scientists bring to the table. Think about all of the connections they bring to the table with them as well. When we at Natural Resources Canada partner with firms such as ThinkData Works, we don't want it to be transactional. We truly want it to be, as Bryan said, strategic. We want it to be relational. Of course, there is software in exchange for the compensation for that software, but what's important for us is not just the bits. What we're interested in is co‑creating collaboratively, co‑developing synergistic complementary solutions that lead to win‑win outcomes for us and for all of our partners. Thank you.

[All three panels return to the screen.]

Neil Bouwer: Amazing. Thanks. Bryan.

[Bryan's video panel fills the screen.]

Bryan Smith: Yes. I'll say the same thing that Vik is saying. We're not here to push product sales. We're here to push development in this movement. We're a Canadian company that started on the back of the Canadian Open Data Movement. I think we're one of the companies that invested really early in it. We're seeing the value of that. That was only possible because the Government of Canada made all that stuff available and led the way. We're on a bunch of standing offer lists and all that sort of stuff. All the typical procurement channels are open. But, what we're more interested in is finding these opportunities to test pulling data together, because at the end of the day, we'd rather someone buy a solution to a problem instead of buying a piece of technology to add to an environment. That sets all of us up for success.

What I would say is, if you have an opportunity, come talk to us. There's ways that we can get this spun up and there's ways that we'll figure it out. This isn't a show up and you have to pay before you can get anything. We have deployments already available in the Government of Canada that can be tested and validated. Hopefully we can work with Natural Resources and plug this into what they're working on as well, which I think would be a really great use case. Don't hesitate, reach out. We're here to collaborate, not be a typical vendor for sure.

[All three panels return to the screen.]

Neil Bouwer: Amazing. Thank you so much to both of you for your remarks and for the demo, Bryan, to really put it in practical terms for us. We're very lucky to have you and thanks to everyone for joining.

[Neil sets his Zoom background to a Government of Canada webpage entitled "The Trust Series: The Evolving Roles of Public Institutions." His video panel fills the screen. As he speaks, Neil navigates through various event pages.]

Neil Bouwer: Before I end, I just want to make a little commercial for a few upcoming events that you'll see at the Canada School of Public Service that are data related. One is this event on trust. It is on May 3rd and it features a discussion from Edelman Canada, on the trust index, very relevant to Canada. We're continuing with Data Demo Week. Tomorrow, we have a presentation on Regulatory Artificial Intelligence use cases. There's a few demos there, including one by Transport as well as from interdepartmental use cases. We also have MindBridge, which is a company specializing in financial data analytics coming in on Friday. That's going to be a great session. The same day, we have an event on Rules as Code, which is the idea of publishing regulatory and other requirements in the form of rules as code, starring our very own Pia Andrews from ESDC, as well as great speakers from Code for Canada and the Government of France. Thank you to everyone for joining us today. Thanks for sticking with us and for that really fascinating presentation. We hope to see you at the Data Demo Week events tomorrow and Friday. There's also a meetup for the data community on Friday after the data demo event. You can find details for that on GCconnex. We hope to see you at those events. Take care and have a great day.

[All three video panels fill the screen. Neil smiles, and Bryan and Vik wave. The Zoom call fades out. The animated white Canada School of Public Service logo appears on a purple background. Its pages turn, closing it like a book. A maple leaf appears in the middle of the book that also resembles a flag with curvy lines beneath. The government of Canada Wordmark appears: the word "Canada" with a small Canadian flag waving over the final "a." The screen fades to black.]

