The Future of Geospatial Analytics
The Project
While most of our attention is spent on the present or dwelling on the past, thinking about and predicting potential futures is a worthwhile exercise for making decisions that help assure future successes. While this is true of our daily lives, it is especially true for research, the sciences, and academics.
In this multi-part series, Rob Dunn hosts conversations with leaders in several fields, tasking them to see into the future – 10, 25, 50, 100 years – in an effort to help shape decisions that steer their fields of research to more fruitful scenarios for students, researchers, and stakeholders alike.
Geospatial Analytics
The following conversation was facilitated and edited by Megan Skrip, science communicator for NC State’s Center for Geospatial Analytics.
The conversation included:
- Aaron Hipp –– Associate Director, Center for Geospatial Analytics; Associate Professor, Dept. of Parks, Recreation and Tourism Management
- Josh Gray –– Faculty Fellow, Center for Geospatial Analytics; Associate Professor, Dept. of Forestry and Environmental Resources
- Ross Meentemeyer –– Director, Center for Geospatial Analytics; Goodnight Distinguished Professor of Geospatial Analytics, Dept. of Forestry and Environmental Resources
- Mirela Tulbure –– Faculty Fellow, Center for Geospatial Analytics; Associate Professor, Dept. of Forestry and Environmental Resources
- Chris Jones –– Research Scholar, Center for Geospatial Analytics
- Helena Mitasova –– Associate Director, Center for Geospatial Analytics; Professor, Dept. of Marine, Earth and Atmospheric Sciences
- Eric Money –– Associate Director, Center for Geospatial Analytics; Associate Teaching Professor of Geospatial Information Science & Technology
Rob Dunn (RRD): Let’s start with some background. Can you define “geospatial analytics”? What should people think about when they hear that term? I think people have a sense that “geo” means Earth, “spatial” means something about space and “analytics” must have to do with math or statistics, but walk us through how you think about the term and your field.
Aaron Hipp: Personally, I would say “world” rather than “Earth” for “geo,” because I usually think about how space and location matter for humans, human interactions and human well-being––how we interact with our world, our space, and how it interacts with us. “Analytics” is that interaction piece: What are the tools being used, what are the analyses, what are the statistics, software, hardware being used? And who’s involved, who’s providing the data, how the data is being communicated?
Josh Gray: I’d say that for us, geospatial analytics means using statistics and math to come up with information, which is one step beyond statistics in my opinion. There’s a process that’s happening in space we want to understand, or spatial analysis can give us information about another problem that maybe we didn’t even think was a spatial problem.
Ross Meentemeyer: This is something I have thought about a lot, and I talk about it with our new students when they arrive, because the term “analytics” often gets misused, or used in a range of ways. The way I like to think of it is how it’s different from analysis or stats. “Analysis” you can think about as examination of patterns of data. But “analytics” is the discovery and communication of meaningful patterns in data that can be acted on, or used to make decisions––which is information, as Josh said. And the communication part, I think, is really critical for analytics.
Mirela Tulbure: I’d add that in this day and age the term “geospatial analytics,” at least to me, implies coding and big data––the volume and diversity of data.
RRD: For a little background, what have the big moments in your field been in the last few decades? What is possible now that wasn’t possible when you were students?
Ross Meentemeyer: A few decades is a long time for geospatial analytics! I always say we were “the original big data,” though––we just didn’t brand ourselves that way––if you think about all the satellite imagery coming in since the 70s.
Josh Gray: Computation has changed dramatically over the years. Recently we’ve seen distributed computing and computing as a service, allowing you to scale up things that used to require investing in large, in-house servers. Now anybody with a laptop and an Internet connection can have the world’s largest supercomputer––all you have to do is pay for it. And that’s enabled us to ask questions at even bigger scales. In my field specifically, the amount of satellite data that we have has created revolutionary new opportunities for us, just because of its abundance. Even though there’s nothing radically different about the satellites that we have now versus 20 years ago, the fact that we have so many more of them means that we get to see places much more frequently, get to see things in different ways that give us complementary information. I think that’s been really revolutionary for us. And now we’re on the cusp of a commercial satellite revolution.
Chris Jones: To build on what Josh was saying, the sheer volume of data, and the increase both in spatial and temporal resolution of the satellite data, means there’s so much more you can do with it. And there are things like Google Earth Engine that allow you to process it really fast for free, which is fantastic and something that wasn’t available not that long ago.
Mirela Tulbure: Another tremendous new development is the free availability of satellite data. Landsat data has been freely available since 2008, which has allowed us to not only analyze one single image footprint as a snapshot in time but actually look at seasonal, annual and decadal changes. When I was a student, getting a Landsat image cost us two thousand dollars and it took months to get, with a lot of back and forth to make sure it wasn’t too cloudy. Now, we analyze five terabytes of data with 10,000 or more images, just at a regional scale. The free availability of Landsat data by the USGS has prompted other agencies to make their data free and open access, like the European Space Agency’s Sentinel missions, JAXA, etc. Newer and higher resolution data, like from PlanetScope, and the availability of time-series of radar data in addition to optical data allow different views of the Earth’s surface. (A paper we wrote last year has some of the key points that have happened in recent years: https://zslpublications.onlinelibrary.wiley.com/doi/full/10.1002/rse2.248)
Helena Mitasova: Another thing is the new types of sensors that are multi-dimensional and not just from satellites, but airborne sensors and on drones. That has dramatically changed things. For example, LiDAR [Light Detection And Ranging], as a routine mapping tool, has been available for about 20 years, and that completely changed mapping topography and mapping the three-dimensional structure of vegetation. Now, drones have made three-dimensional mapping available for anybody at very low cost. Another thing is citizen science––citizens being able to collect data through cell phones and also data about their own physiology by carrying the sensors themselves. You can measure not only what’s happening when you are sitting in the doctor’s office, but also when you are moving around. There have been many, many developments like this. The satellite revolution is probably, in terms of volume of data, the biggest, but there are many other ones on the ground. Remote collaboration, and remote interaction with data, has also changed quite a bit, and that was not possible a couple decades ago. The networks were just not fast enough. So, networks––that’s really another big thing.
Chris Jones: To feed off of Helena’s point, you have things now like Google Street View and Instagram, all these other sources of pictures, which you can analyze and pull out the geographic coordinates and determine what’s there. That’s just another massive source of imagery data that is relatively untapped.
Aaron Hipp: Definitely wearables and personal data have been big, so I’m just doubling down on what Helena said about the wearables. I’ll also double down on Google Maps, for the ease of doing things like street audits to understand things like crosswalks and sidewalks and street trees. Something else I don’t think has been explicitly mentioned yet is GPS and how anyone outside of our Center knows about Google Maps or GPS. Everyone uses GPS and the maps in their car or on their phone to navigate now, and I think that has just exploded in the past 20 years.
Ross Meentemeyer: To triple down on what Aaron said about GPS, I think that’s a really important point. I just think about when I taught a course on GPS 20 years ago using Trimble XT units, explaining the science behind the technology and then using the GPS units in class…and, well, just think about how easy it is now! You’re getting the data, and you don’t even think about it. So, that has been revolutionary, I agree. And it supports the citizen science that Helena was talking about.
Aaron Hipp: And I think in the social science and health fields, the adage that “zip code is more important than genetic code” is pretty commonly understood now. That’s bringing space into the equation as an important driver of outcomes and behaviors. Also, just the general amount of data that’s available and accessible and open source. And the sheer number of fields that are using geospatial data has expanded. I think about where our students are able to get jobs, and where they’re coming from. Computer vision, augmented reality, virtual reality are all coming as well.
Eric Money: I feel like a lot of what we’re talking about has happened in the last 10 years. GPS is a really good example of seminal things that have transitioned our field over the past 20, 30, 40 years. And related to that is the launching of satellites, the move to cloud computing, the move from paper maps to digital maps.
Ross Meentemeyer: Right. There’s just been this explosion in the past 10 years of these things that we’ve talked about. It has been gradual, but there has been a big punctuation recently for sure.
Helena Mitasova: I had been drawing maps with computers when I was a student, and that was a very long time ago. So, digital has been around for decades, and its progression has been gradual.
RRD: That is a lot of change in twenty or thirty years. I want to turn now to considering the future. In my experience, scholars are relatively uncomfortable talking about the far future. There is, of course, so much uncertainty. But I think these conversations are important because we are laying the brickwork for the future already. We are training students NOW who will hit mid-career in 2050. We are planning buildings that might still be in operation in 3050. We are making decisions about what to study or not study that shine light on some phenomena and ignore others. So it behooves us to spend some time thinking about potential future scenarios and which of those scenarios we prefer and which we want to avoid. So, with that preamble, tell me a little bit about what you think work in geospatial analytics will look like in 2030 (we’ll get to the farther future)?
Josh Gray: So, I looked at when the commercial satellite company Planet was founded, because I think that’s been the most recent really game-changing thing for us. They started in 2010. That’s about the time horizon then, say, from ideation to company creation to operating a swarm of nearly 300 satellites and selling those data around the world––that happened in about 12 years. I don’t know what’s going to happen in 10 years, but you know, the seeds are being planted right now. I wish I knew what those cool things were, because I’d invest!
Ross Meentemeyer: Well, I remember 10, even 15 years ago, the visionaries said by the early 2020s GIS would be dead, that we wouldn’t have the ArcGIS-type bench computer things. So in some ways they’re wrong, but in some ways they’re right, that we’re moving towards everything being distributed through the cloud.
Helena Mitasova: But it is still ArcGIS, you know. [Laughs] It’s online, but the operations and the functionality––it’s pretty close. It’s more that the interaction and the access to data has changed dramatically. They said that “GIS is dead” in the year 2000 at the GIScience Conference, and at that time it was supposed to be replaced by “location technology,” whatever that means. Nobody really knew what that meant. So, it is really hard to predict, but Josh is right, the seeds are probably being planted now.
Chris Jones: To feed off of Josh’s point about Planet, there are a bunch of other companies that are popping up that just sell spatial data. There’s a whole industry around that now, where their entire business model is selling spatial data. Just the fact that that’s a thing, and it wasn’t 10 years ago, has really changed a lot, in terms of opportunities people have when they go to work in geospatial analytics.
Helena Mitasova: And I think what we may see is better use of these data. Now we are using, say, 10% of collected data; so maybe if appropriate tools are developed and the computational capabilities are there, we may be using maybe 50% of the data. But again that is a big, big question.
Ross Meentemeyer: One thing that comes to my mind is something that could seem like the dark side to us, but maybe the bright side to some people in industry. Uber or Lyft, for example––they have custom platforms that are collecting huge amounts of data, not necessarily to make money from taking people from location A to B, but to understand why they’re going from location A to B and to profit on that. So I think that is a type of future for geospatial analytics, in that there’s a huge commercialization of it.
Josh Gray: These are things that happened in the last 10 years too––Uber and Opendoor and things like that. Those are all spatial analytics sort of things that didn’t exist 10 or 15 years ago, at least in the way they do now, that have just revolutionized whole industries. I expect there’ll be something like that in the future. And it’ll probably involve machine learning. I think we’re all going to be more comfortable 10 years from now with having machines do certain things for us. We’re going to be more comfortable even maybe in research with having machines point the direction that we should go, but I think that will be incremental.
Ross Meentemeyer: Maybe one of the best examples is that geospatial analytics will be driving cars. We don’t have anybody doing that work in our group, but self-driving cars––that is geospatial analytics.
Eric Money: I do think there are distinctions to be made between when we say “GIS” and when we say “geospatial analytics.” There are some differences if we look through the lens of GIS and how that’s maybe changed, and maybe it’s sort of transitioned into being more about geospatial analytics than at least how GIS has been perceived in the past, which was a software tool to make maps. It’s expanding beyond that definition. But I’m also thinking about, what does work in geospatial analytics look like in the future? Putting on my education hat from the master’s program perspective and the students that we’re graduating in that program [professional master’s in Geospatial Information Science and Technology; MGIST], they do quite different things than, say, someone in the Ph.D. program in Geospatial Analytics. And I see work for them continuing to be more IT, information technology relevant things. Obviously I think analytics and data science will flow into those types of work situations as well, but I think it’s important to make that distinction between GIS and geospatial analytics, because this is one thing that I talk about in my GIS 501 class––I actually have students read an article about the “death of GIS” and the future of GIS and have them write a reflection on how they perceive that sort of statement and how they view GIS from all these different perspectives. Is it just a software tool or has it become more of a field in itself in terms of the development of new technologies that integrate into it, the combination of people and the system and the IT? It’s going back to what we call our Center. It’s not the “Center for GIS,” and that was a decision that was consciously made, not to use “GIS.”
Aaron Hipp: I think the instruction piece definitely matters, and how the instruction is delivered, not just the content. The delivery will be crucial for the next 10 years and important to the Center and the message I think that we want out there.
Ross Meentemeyer: If the question is whether or not to really distinguish between “GIS” and “geospatial analytics,” I think that’s relevant to where have we come from, because we have come from kind of a GIS software, workbench type environment to what we call “geospatial analytics” now, which is potentially more flexible, open, but can be commercialized too. But noting, as Eric said, that GIS has not gone away.
Aaron Hipp: I think there’s going to be further reaction over the next 10 years to ownership and privacy of the amount of personal data that’s being used. I think Western Europe has started that, and I think others will just continue to seek that too. I am a user of that cell phone and wearable data, but I also expect that people will gradually understand more of what all they’re providing for free and there will be some political efforts and advocacy efforts to take some of that ownership back. I also hope there’s greater data literacy and that people better understand geospatial analytics but also their own geospatial data and how to use it. I had self-driving cars on my mind as well. And another thing I think that’s kind of out there is the metaverse. I think there’s already a geospatial piece to the metaverse, and I don’t fully understand it, but I think that’s 10 years out there, that as we’re creating these alternate realities, there are people interacting, there’s commerce, and space is going to be a part of that, and I think there’s going to be some geospatial analytics involved in that space.
Ross Meentemeyer: You got me thinking about some more things, Aaron. I’d love to think in 10 years we would be doing even more in geospatial analytics to expose social injustices. As you mentioned earlier, zip code is more important than genetic code, and through place, region, location, we can understand a lot of things better and make a difference.
Mirela Tulbure: I think big data will continue to be bigger and bigger and thus it will be impossible to look at all of it, so the importance of machine learning is only going to grow.
RRD: Are there technologies out there already at the interface of geospatial analytics (and its connections with big data and machine learning algorithms) that you think would surprise people?
Eric Money: Perhaps how social media is something that uses geospatial analytics behind the scenes all the time. There’s more out there now in general about how people are using your data, but the use of that data is widespread.
Ross Meentemeyer: I think it surprised all of us when we learned that Uber’s business model was not to profit on just moving people around; it was to understand why people are going from location A to B, and that the data the company is collecting was worth billions.
Eric Money: And this has been around for a while, it’s not new, but military intelligence uses geospatial analytics all the time. I think that does surprise a lot of people. I know it surprises students when we talk about it. They don’t necessarily realize the pervasiveness of it.
Josh Gray: I imagine there are applications that would surprise all of us, particularly around the way that our personal location data are being used.
Ross Meentemeyer: I know people know that companies have those data, but they don’t really know how they’re using them as a collective. You know, the company doesn’t necessarily care about an individual but rather the collective patterns from the data, and what they’re doing with them I don’t think people understand.
Josh Gray: It’s possible that they’ve not yet delivered on that promise, right? They could still be in the stage where it’s like, “These data are really valuable, we need to hoover them all up and then magic happens.” But there’s a lot of heavy lifting involved in connecting the dots to make the magic come out the other end of this increasing swamp of data. So, it wouldn’t surprise me if there weren’t a lot of those things yet. But we can expect those companies to try to make good on that in the next decade, having invested so much in harvesting the data. If they’re not to that stage yet, I anticipate we’ll see a lot more of this sort of thing. You know, maybe you’ll get advertisements that’ll pop up on your phone when you’re close to a particular store; that’s been talked about for a decade. It’s clearly possible. I don’t know why it hasn’t happened yet. I think that using those data to suggest things to people, to start getting suggestions at just the right time will start to happen more and more.
Aaron Hipp: I think all of our work would surprise most people, right? I think we’re in such an echo chamber within the Center and within the university. I think the average reader of even, say, The News and Observer, would be really surprised and amazed at the collective work that the Center, the staff, the students, the Faculty Fellows are doing, from Tangible Landscape to drones. Even knowing about Planet and its 300 satellites up there––I bet 85%, 90% of the US isn’t aware of that.
Josh Gray: It’s true. There are a lot of findings that are pretty interesting that are always surprising to the general population. I mean we had a storyline the past couple years that remote sensing is used to detect human enslavement, and it’s because you can remotely sense brick kilns, and in this region of the world brick kilns were almost 100% associated with enslaved peoples doing the labor. So it’s probably surprising that we could remotely sense modern slavery. Other findings like that, like ship locations––we can find illegal fishing and harvest with location data from ships. I guess eight years ago, trying to explain to my father-in-law what I did, he was surprised that we couldn’t dial up Google Earth and see if the people he’d hired to clear some trees on his property had done it. And now fast forward eight years, I can show him that now. That is possible, and I think he’ll be surprised to see that. Aaron, you’re right––it’s the water we swim in, so it’s a little hard to appreciate it all the time.
RRD: If computing power increases (costs decrease), say, tenfold, what might be possible that isn’t possible now?
Josh Gray: In my world, it’s either you can look at longer periods of time or much larger geographies or higher spatial resolution or something like that. So the stuff that we used to only be able to do with pixels that are 500 meters scale for the globe, now we can do it at 30 meters scale for the globe. And another tenfold increase on that will allow us to do it at three-meter resolution for the globe, and there’s a lot of questions you can answer with three-meter resolution data that you can’t with coarser resolution data. So that’s kind of just a nuts-and-bolts, easy sort of thing that would happen, but I think there’ll be more interesting things on data synergies. If we can store all these data in the same place, and it’s cheap and easy, and they’re all already plumbed together, and we don’t have to waste a lot of time manifolding the different formats together just to get the data streams in the same place––if we couple that computing power with things like Google Earth Engine, or what Microsoft has done with Planetary Computer for remote sensing, if we do that, more generally, I think that’ll be really powerful because you have all these data on hand beside the huge computers you need to do something cool with them. We’ve seen that in remote sensing by putting all this imagery up in the same place and then hooking up to big computers, but if we did that more broadly, start to bring in sensor network data and census data or whatever it might be, there will be a lot of opportunities there. And then just machine learning methodologies––they choke on horsepower; they need more, more instances and more iterations and then they’ll improve, so I think those will just continue to get bigger and better, more generally.
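As a rough illustration of the resolution arithmetic Josh describes, the sketch below counts how many square pixels it would take to cover the globe at 500-meter, 30-meter and three-meter resolution. The surface-area figure (roughly 510 million square kilometers) and the function name are assumptions added for illustration; they do not come from the conversation. The point it makes is simply that pixel counts grow with the square of the refinement, so a pixel 10 times smaller on a side means about 100 times more pixels per snapshot.

```python
# Approximate surface area of the Earth in square kilometers (assumed round figure).
EARTH_SURFACE_KM2 = 510_000_000


def pixels_to_cover_globe(pixel_size_m: float) -> float:
    """Return the approximate number of square pixels needed to tile the globe."""
    pixel_area_km2 = (pixel_size_m / 1000.0) ** 2
    return EARTH_SURFACE_KM2 / pixel_area_km2


for size_m in (500, 30, 3):
    count = pixels_to_cover_globe(size_m)
    print(f"{size_m:>3} m pixels: about {count:.2e} per global snapshot")

# A pixel 10 times smaller on a side means roughly 100 times more pixels.
ratio_30_to_3 = pixels_to_cover_globe(3) / pixels_to_cover_globe(30)
print(f"30 m -> 3 m multiplies the pixel count by about {ratio_30_to_3:.0f}")
```

This ignores everything that matters in practice (number of spectral bands, revisit frequency, cloud cover, compression), so it is only a back-of-the-envelope sense of scale.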
Helena Mitasova: What it will also allow is more real-time response, real-time predictions at higher resolution. So that means things like adaptive management, hazard management. Like when you have a hurricane, people will be able to respond much more accurately, so you won’t be evacuating the entire coast, but only those locations that will be affected. And that would be across many, many disciplines.
Eric Money: If costs decrease then it opens up the accessibility of doing geospatial analytics to more organizations, smaller organizations, nonprofits, as well.
Chris Jones: To feed off Helena’s point, you’d have a lot more power in your pocket––your phone––because it’s going to be 10 times faster. You’re talking about real-time analytics on your phone where you could upload something to a cloud database and get feedback straight away. It’s kind of already happening, but think bigger scales, like at an agricultural scale. And to feed off Josh’s point, USDA is trying to put all their data in one big thing on Azure and have all the compute go to the cloud, to do like a Google Earth Engine type thing, but with all the USDA data to make them available for the researchers. Because currently the data are all siloed and not available across units. For example, APHIS can’t get Forest Service data, etc. There are just a whole lot of hassles in terms of passing data across the agency, so thinking about that is interesting.
Megan Skrip: How does “edge computing,” when computing is done on the sensor itself, relate to this discussion question?
Aaron Hipp: One example would go with computer vision and privacy––the way it can be used is if you have cameras that are tracking, say, vehicular traffic downtown and the number who may be running a red light or turning right, to think about safety or just speed and infrastructure––the edge computing doesn’t have to store the images. It does the calculation within the CPU that’s there on the camera. So then, you’re not storing images that could be stolen or leaked or misappropriated in any way. So for our work with cameras to count pedestrians and cyclists, we like edge computing because we are avoiding having on our clouds and desktops any images that might cause some ethical or IRB challenges.
Ross Meentemeyer: It also avoids all the IO, or input-output, problems, so it speeds everything up.
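The camera-counting setup Aaron describes might look something like the minimal sketch below: the device runs a detector on each frame, keeps only the resulting tallies, and never writes the image anywhere. The function names and the stand-in detector are hypothetical, added only so the example runs without a camera; a real deployment would use an actual computer-vision model in place of `fake_detect`.

```python
from collections import Counter
from typing import Callable, Iterable, List


def count_on_device(frames: Iterable[object],
                    detect: Callable[[object], List[str]]) -> Counter:
    """Tally detected labels frame by frame without retaining any frame."""
    totals: Counter = Counter()
    for frame in frames:
        # Only the labels are kept; the frame itself is never stored or sent.
        totals.update(detect(frame))
    return totals


def fake_detect(frame):
    # Stand-in for a real detector (hypothetical): each "frame" in this
    # example is already a list of labels.
    return list(frame)


frames = [["pedestrian", "cyclist"], ["pedestrian"], []]
totals = count_on_device(iter(frames), fake_detect)
print(dict(totals))
```

Tallying per frame like this would count the same person again in every frame they appear in; a real counter would need some form of tracking, which is left out here.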
Josh Gray: I think edge computing will have a lot of applications in enabling sensor networks to work too, because they monitor extremes and most of the time, the data are exactly the same. You’re monitoring, waiting for a rare event. For example, we put these security cameras out on towers in the woods to see when spring happens, but we get a picture every 15 minutes. You only need a couple of those pictures, right? And at particular times of year. You just don’t know when you need them yet. But if there were an onboard computer that can just say––for instance, Roland Kays does wildlife camera trapping. The camera can send him a picture only when something changes, but what if it could send him pictures of only, say, a particular species? Like, “Oh, this is a possum; let’s not clog the server with another possum picture. We’ll just wait for the clouded leopard or something to come through.” If we could do that on the sensor, it frees up a lot of bandwidth in just moving data around and then processing to figure out what’s there. So people are talking a lot about doing this on satellite sensors too, because that’s less information you have to beam back to Earth.
Chris Jones: The first thing that comes to my mind is Anders Huseth’s [CGA Faculty Fellow; Assistant Professor and Extension Specialist, Dept. of Entomology and Plant Pathology] work with insect traps where they use a pheromone lure and a camera to count the number of moths that come in. It keeps track of the number and it just updates weekly or daily, however often they’re collecting it. Before, they had to go out and check that trap whenever they could do it. So, in this case, you’re taking out person-hours, and just getting the data that you need instantaneously and at much finer temporal resolutions.
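The on-sensor filtering Josh imagines could be sketched roughly as follows: a classifier runs on the camera, and a frame is transmitted only when the predicted species is on a watch list and the prediction is confident enough. Everything named here (the function, the confidence threshold, the stand-in classifier) is a hypothetical illustration, not a description of any existing camera-trap system.

```python
from typing import Callable, Iterable, Iterator, Set, Tuple


def frames_worth_sending(
    frames: Iterable[object],
    classify: Callable[[object], Tuple[str, float]],
    species_of_interest: Set[str],
    min_confidence: float = 0.8,  # assumed threshold, chosen for illustration
) -> Iterator[object]:
    """Yield only frames whose predicted species is on the watch list."""
    for frame in frames:
        label, confidence = classify(frame)
        if label in species_of_interest and confidence >= min_confidence:
            yield frame


def fake_classify(frame):
    # Stand-in for an on-board model (hypothetical): each "frame" in this
    # example is already a (label, confidence) pair.
    return frame


captured = [
    ("possum", 0.97),
    ("clouded leopard", 0.91),
    ("possum", 0.88),
    ("clouded leopard", 0.42),
]
to_transmit = list(
    frames_worth_sending(captured, fake_classify, {"clouded leopard"})
)
print(f"captured {len(captured)} frames, transmitting {len(to_transmit)}")
```

The trade-off is that a frame discarded on the sensor is gone for good, so a threshold set too high could drop the rare event the camera was deployed to catch.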
Megan Skrip: What do you wish could be possible with more powerful computing?
Chris Jones: Helena mentioned this, but doing completely interactive geovisualizations, like where you’re interacting with the data in real-time, getting feedback. Right now we’re doing it on fairly small scales and it takes a while to run, but if you had a 10X increase in compute, you could have much larger-scale decision support systems, potentially even with a virtual reality component where you’re actually immersed in it. I think that would change the way decision makers make decisions, when they can actually experience it a bit more immersively than looking at a 2D map.
Josh Gray: On my wishlist would be that computing develops in a way that makes it more accessible to savvy but not computer scientist people. I think we’re moving in a direction where we have a very exciting opportunity with GPUs [graphics processing units] and AWS [Amazon Web Services cloud computing], but we’re also getting to the point where we do need a full-time computer scientist to just make that happen for us. And we all code and are familiar with computing, but it is a full-time job almost just to stand those machines up. All these cool technologies make everything easier eventually, but somebody’s got to know how to do it––you know, virtualization and dockerization, and all this stuff. All these are great technologies, but they’re very specialized technologies. And we need personnel. I’d love to see it move in a direction where there’s a lot more of those people who want to join teams like ours, or we don’t need to rely on them as much and they can use their talents for other problems.
RRD: Now let’s look a little farther into the future. Let’s say 2050. Hopefully by then you are all retired, enjoying fruity drinks on an island somewhere, thinking emeritus thoughts. I’d like to hear each of you offer your thoughts on what role the technology used in geospatial analytics will play in societies in 2050. Often, it is useful to think both about your hopes and about what the status quo scenarios might look like. Feel free to frame your answers in the context of your own subdisciplines.
Josh Gray: I think geospatial technology will be much more pervasive, and speaking particularly from the satellite space, we’re going to have thousands and thousands more satellites. So, the night sky is going to look pretty different, for better or worse. But I think we’re going to have something basically like live video from space everywhere on Earth. So the possibilities are frightening and also exciting. I think that’s where our communications are going to go, like Starlink. Everything’s going to be something like Starlink. So, that’s going to become much more pervasive, and all the frightening possibilities are going to become much more real. But I think there are hopeful scenarios too for geospatial technologies. There are going to be a lot more sensor networks, both ad hoc like 500 cell phones in a certain area are now a sensor network, along with others that are established, for example, to monitor soil moisture and nitrogen status and crops. Those are going to be communicating directly with satellites and downstream synthesis with satellite-provided data and other sorts of things to, for example, move robots to apply pesticides at the right place and nitrogen at the right place. All those sorts of things are exciting possibilities. Also disaster response, and search and rescue, and navigation, and real-time traffic, and hyper-local weather warnings, and things like that––all those things are going to be real sooner than 2050, so those are exciting hopefully.
Helena Mitasova: Let me add a little bit on the human side. We just had a meeting about the wastewater monitoring we’re doing for the SARS-CoV-2 virus. That kind of monitoring will be widespread and everything will be monitored––all the diseases that are in the community, all the medications that people are taking, all of that will be monitored. Right now the areas where something is identified as coming from are rather large, but with a sensor network, there can be critical communities that will be monitored more than other communities. So I think health monitoring and behavioral monitoring will be on different levels and different scales. I mentioned monitoring physiology with wearables earlier, but this is another type of monitoring, where you’re not measuring individuals but groups of people without anybody knowing that their community is being monitored for different kinds of things, anything that can be found in wastewater. Even what you have been eating and things like that. But it won’t be personal; it will really be for groups of people, but then you can point out, this is the community where we have something troubling happening, for example, some virus or something that we don’t understand, so the resources can be put into that community to find out what’s going on.
Aaron Hipp: It could be personal though, and I’m sure you don’t disagree.
Helena Mitasova: Yes, definitely.
Aaron Hipp: I’m aware of some wastewater surveillance now, but there’s no reason there couldn’t be monitoring of individual houses or even toilets at some point, certainly by 2050. With wearables and personalization of a lot of movement and what people are exposed to, it’s hard to know what people will continue to accept, what they’ll become aware of, how privacy plays into it. It’s just more. To Josh’s point, the scale is getting smaller and smaller and closer to real time across our different fields.
Helena Mitasova: And the big question will be: we will have all those data, what are we going to do with those data?
Aaron Hipp: Yeah, and who owns them?
Josh Gray: I think the privacy point is obviously paramount here; it’s running through all of our minds as we think about these things. And that’s the big question mark. Even right now, what’s permissible and what’s not varies by jurisdiction and it’s going to change through time. So I wonder, can you ever put the genie back in the bottle? If something is currently permissible in a certain country, could someone develop technologies that enable them to do things that are just too tempting to ever forget? For example, you can’t unlearn how to make nuclear weapons; can you unlearn how to make facial tracking software? And link that with a location to have a frightening understanding of somebody and their motivations? That could be used for all sorts of ill. Is that just too tempting? Because it has good purposes too; once it’s made, can you ever unmake it?
Aaron Hipp: A lot of geospatial analytics and GPS technologies were formed in a military context, and of course satellites play into it too. I’m certain things from drones to all sorts of technologies could be utilized in the Defense space.
RRD: It seems likely that in the future all of our digital devices will be measuring an extraordinary diversity of features of the world around us. Sounds. Weather. The movement of people. Health metrics. The movement of animals. What is the best scenario with regard to how these data are integrated and who owns them?
Chris Jones: It seems like there has to be a sweet spot between individual privacy and open anonymized data for researchers to be able to use but not be able to have the data that people want to protect. There has to be some type of opt out feature for people who don’t want their data collected. Ideally, it’s some type of open source and open data initiative that allows the data to be open for anyone to use for those of us who opt in.
Helena Mitasova: And I think it’s not just the researchers who should have access to data; I think it’s also the public too. So that, essentially, you don’t have one entity that has full control over the data. But, again, how can you anonymize some data? That’s a really big question. A lot of geospatial data are completely open, open access data, and I think that works quite well. But then we also work with data where you just can’t have that openness because of the privacy issues. For example, health data are really, really protected.
Aaron Hipp: Yeah, it’ll be interesting to see who maintains data, because it won’t be just one body. How will those different bodies speak and work with each other, and over what area are the data aggregated? In my world, you end up with social science and health data aggregated at a census tract or zip code or Congressional district or school district or some type of policy-level districts, state, county––and it can just make trying to pull together different data sets super challenging. Because some data sets overlap, and some don’t. There are obviously time scale differences. This is a really good question. I don’t know what the best scenarios are. These are super challenging and super important questions.
Josh Gray: I think the best case scenario is benevolent leadership. [Laughs] Because I think a frightening amount of the data is going to be owned by people that create them, and the people that create them right now are Facebook, etc. Nobody likes Facebook and everybody opts in to give them all their data, right? So, even with a service that people are dissatisfied with, we are willing to give up everything. We care about privacy but then not in practice really. Maybe that’s because it doesn’t really matter? Because maybe we haven’t felt any ill effects? Maybe those are coming and people will care more, I don’t know. But I think the people who will own the data are the people who create them. And then I’m worried that people don’t have an understanding of what happens to that data later or verifiable ways to understand. So it’s like, I’ll opt in to giving Strava my location while I run, but how do I know that Strava doesn’t sell that off to somebody else downstream who wants to model my behavior so they can sell me the right pair of shoes or something like that? I didn’t opt into that. But I think, right now, you probably do. It would be neat if there was a technology that allowed you to sort of verify that, perhaps some sort of encrypted key or something so that you know when your data are used.
Aaron Hipp: Yeah, a similar thought was going through my head. It’s almost like Henrietta Lacks––her DNA and her cell line have been used in cancer research and across all sorts of different cell cultures for decades without her consent. I tracked my run this morning, and I’ve also tried to obtain Strava data for research, and I know they’ve never reached back out to me as a runner to say, “Hey, there are people who want to aggregate these data, or answer this question.” So, yeah, you are giving it up, but I wonder if there will be a reaction, like some of the reactions in health over the past 60, 70 years.
Megan Skrip: I’m very intrigued by the idea of an app alerting me, as you were saying, Josh, “Hey, your data is going to be used for something. You sure about this?” So you’re continually opting in or opting out. I see that as potentially being powerful.
Josh Gray: Well, hey, trademarked! [Laughs]
Megan Skrip: I’ll make sure you get the credit for that here! [Laughs]
Josh Gray: I’m going to go get an undergrad over in Computer Science to make that happen. I’ll be right back. [Laughs]
Megan Skrip: Okay. Keep us posted. [Laughs]
RRD: In a way, this is a kind of all-seeing technology, but as a biologist I’m also aware of how far we are from knowing everything about anything. What are the gaps in what we see through technology in the future?
Helena Mitasova: One thing that came up recently––there was a really good seminar about a national water model and hydrologic modeling, and we don’t know how runoff is generated because we don’t have any data about the subsurface. We have extremely detailed, spatially and temporally, data about the surface thanks to satellites and other technologies that are on the ground. But we don’t really know what’s happening under the surface. Those data are very sparse, because we can’t use remote sensing to collect them. So understanding the subsurface is like the next frontier. The inaccuracies really build up when you need to incorporate subsurface data. And it’s infrastructure, it’s soil properties, groundwater. The composition and properties of the subsurface environment influence a lot of disciplines.
Josh Gray: We can also think about what blind spots we have because of technology. In the case of geospatial technology, particularly satellite stuff, there’s an authority fallacy that can accompany those sorts of data. It’s a very highly technical, protected knowledge base to make and launch satellites, and so we trust the numbers that they produce. More so than we might, maybe more than we should. And I wonder if, as analysis and geospatial analytics become increasingly black-box, like machine learning, we’re tempted to even do that more. We’re very literally like, “Well, I don’t know how it works, but I trust what it tells me.” So that will probably have some blind spots. And some we already know about, like garbage-in-garbage-out. If you train a machine to diagnose liver cancer, it might end up detecting thumbprints on the slides––that is a real case. So all types of weird things like that I’m sure will be blind spots because of the technology, but I think the more technical it becomes, the more mysterious it becomes, the greater is, sometimes, our tendency to trust it.
Megan Skrip: How do you think this relates to education and how broadly versus deeply we need to train the next generation to handle all this technology?
Aaron Hipp: It just continues to be a challenge. It’s already that case in some ways, with more and more specialized fields and more and more specialized tools and analytical methods. And so being able to use those across groups, between groups, to understand, work with, hire, translate, etc. continues to be a challenge.
Chris Jones: It definitely seems like you need a combination of super highly specialized people that know the ins and outs of the individual technology, the individual model, whatever they’re doing, and translational people that have an inch-deep-but-mile-wide type of knowledge base to loop them all together and be able to translate between potentially.
Aaron Hipp: In terms of gaps that technology hasn’t been able to help with––there’s, what, 7 billion people on the planet? From a behavioral standpoint, a lot of stuff is always going to be challenging. Like why we do what we do as individuals. And trying to aggregate that in any way to say, for example, “This is why Josh and Aaron ran at this park, even though they live miles apart.” Those choices are always going to be hard to model appropriately.
Eric Money: We have all this great technology, but in the future are there still going to be things that we miss, despite the technology? I’m sure there are things that we will miss. But I don’t know what they are.
Megan Skrip: I think of some work that I was recently doing with another researcher at the Center, about choices people make in response to sea level rise. And not all of their choices are, as the literature says, “rational” choices. So this goes back to what you were saying, Aaron––can we get better at predicting human behavior?
Aaron Hipp: No.
Megan Skrip: [Laughs] Okay. So technology may not be able to help us understand why people make the decisions that they do?
Aaron Hipp: There’s just too many of us with too many things going on to be rational, and I think that’s just increasing with the amount of influences that we have access to. Just pure communication channels, and I don’t see that reducing. Back to Josh’s point, you’re not going to put social media back in a box. That’s going to influence what we’re doing.
Megan Skrip: Sounds like that does make it difficult to forecast environmental change or policy change, but we can’t pin our hopes on technology to solve that, can we?
Josh Gray: Not in 30 years.
RRD: What do we need to be doing in order to steer the use of these technologies toward the more hopeful scenarios?
Aaron Hipp: The two obvious things that come to my brain are open––open data, open source, as open as possible so that there’s as limited a black box as possible, so that people can understand what’s going on––and funding. I mean, I think a lot of what underlies some of what we’ve discussed is just capitalism and the ability to monetize some of these technologies or the data from them, or even the analytical tools. For DARPA [Defense Advanced Research Projects Agency] or DOE [Department of Energy] or whoever to have funding for research, for evaluation, for translation that isn’t responsive but is out front would mean it wouldn’t all just be reactive. I’m not terribly hopeful, but that would be my wish. Having that funding for real conversations like this one, having people thinking through how do we make sure these things aren’t happening in 30 years, the things that we’ve identified as potential negatives.
Eric Money: If the “we” in your question is the collective we of people in the field and doing the work, I think one thing that we need to do is not ignore the potential negative impacts of the things that we’re doing. Because, as the people working in it, we have some ideas of ways that things can be used in a negative way or how data can be interpreted in a negative way. We have an obligation to not only present our results in an objective way, but also to highlight their potential negative aspects and not just relegate it to, “Oh, we know this could potentially be used in this way” but it’s not expressed in any way in an outwardly fashion. I think the more transparent people are up front about positives and negatives, the easier it is to head off some of the negatives down the road.
Aaron Hipp: And, Eric, I think this is where your work with our students and our courses in the graduate certificate, master’s, and Ph.D. are so crucial. There are courses that include some of these difficult questions and considerations for ethics and the future. The Ph.D. and master’s students we’re graduating each year, many of these students who are in our courses and labs, they may be able to make some of these decisions in 10 or 20 years. And so making sure we’re having constructive conversations through our courses and lab work is crucial. It’s a great opportunity across each of our educational platforms.
Megan Skrip: Aaron was talking about the need to have these broader conversations to head off negative scenarios. Who should be at the table? Who else should be having these conversations and how often?
Aaron Hipp: Politicians and business owners. We’ve got the Center for Geospatial Analytics, we’ve got Centennial Campus, where there are corporations. We are in the seat of State government here in Raleigh. Location matters, and we have a physical proximity to these things. At least locally, for the state of North Carolina, maybe even the Southeast, there are some good opportunities there. One other thought, too. I think this is where the translation of our own work comes in. The translation of all the stuff that we and our students are doing is so critical––to not just be behind a paywall of a journal, but translating any of the work we’re collectively doing, and showing what that impact is or can be, is crucial so that we’re not just sharing within our own community but much more broadly. So that people are aware of what we’re finding, the tools we’re using, who’s involved. I think that translation piece is essential.
Josh Gray: I think we have an important role to play in what problems we articulate as important to our students. Because I think what we see is a lot of talent gets diverted to solve “not problems” because that’s where the money is. We have really smart, well intentioned students, and I think that if they got really turned on to solving problems for good, they’re more likely to use their talents to do that, rather than, you know, work to make it a microsecond quicker to share pictures of cats with a million people––things that aren’t problems. For example, one-third of people in the world don’t have access to clean water. We can shape a generational trajectory by the things that we talk about in our classes and reminding students that there are big problems to work on and articulate the role for geospatial technologies in solving those challenges. I think that’s really important.
Eric Money: Related to that, broadly speaking, teaching scientists to be better communicators is important. So being able to speak outside of their bubble and communicate to the public effectively. It goes back to the translational piece. If we can’t effectively communicate what we’re doing and the positives or negatives of it, then I think that’s a problem.
Helena Mitasova: And let me also add that we ourselves should be aware of and understand how the data, information, and knowledge that we are generating can be used. So very often, we are focused on the technical part and maybe some immediate application that was driving our research questions, but we may not be thinking beyond that. So I think even our own awareness as scientists may be important in understanding what can be done with the data that we are generating.
Megan Skrip: Helena, do you mean beyond what you were working on and might that apply to open science? Like someone might be able to do something further with the data?
Helena Mitasova: Yes. Again, as a scientist, I need to be aware, let’s say, if I’m deriving some data about how people move, for example, or how they behave or about, let’s say, distribution of some pollutants, I will be really interested in making sure the data that I’m producing are the most accurate, but I may not be thinking about who will be using these data and how they are going to use them. And that happens very often. You just don’t think about, say, how these data may influence the value of land and may make somebody bankrupt. When you have broader discussions and you interact with the stakeholders, that’s where you may learn more. We have a lot of research now focusing on participatory research and participatory modeling, and I think that can help us to understand what the consequences of our research could be. The coastal erosion work comes to mind; it can really alter people’s lives, based on what kind of information you give to people.
RRD: As the links between real-time satellite data, geospatial analytics and on the ground actions (where a tractor drives, where an irrigation system waters, where a drone strikes) become more automated and integrated, what does the role of humans become in these systems?
Josh Gray: Gotta be really good coders and, in particular, we’ve gotta write good objective functions. That is a very specific thing, but it makes all the difference. If we’re going to be trusting these machines that have been trained towards certain aims, we’ve got to be really, really, really thoughtful about what those aims are. The classic example is building a machine that just maximizes creating paperclips, and sooner or later it enslaves humanity in the pursuit of making paper clips because we didn’t put failsafes in. All our work, before we let it loose, we’ve got to have good built-in failsafes and checks. It’s going to be important having humans in the loop at some of these decisions. But of course you don’t realize these technologies’ potential if you put humans in the loop. So, being good coders.
Eric Money: I think too having humans that know what to do when the technology ultimately fails, so what’s the backup plan. Or the flip side of that, how do we fix it. Humans will still need to be able to fix technology issues but also theoretically understand how to do things if the technology fails. Not that everything can be mimicked in a non-technical way, but I think humans have that role. Maybe it becomes more of a translational thing too. More and more people will need to be able to interpret what the technology is doing and explain that to different audiences.
Aaron Hipp: And the “why” and the “who” will need to still be in there at some level. For example, folks in the military sitting somewhere in the US controlling drone attacks happening somewhere else; there’s such a separation between what they’re doing in their office versus the real-world consequences. And there are real-world consequences. If it’s more soybeans, that’s fantastic, but not completely separating that. I keep thinking of the book Ender’s Game, or something like that––you can’t completely separate the humanity.
Chris Jones: Coming back to Josh’s objective functions––I’m thinking about the irrigation system that monitors soil moisture, weather forecasts, but it also needs to be aware of policy restrictions on watering during drought conditions, right? How do you get that in there? The rest of it’s all technical and monitored. Where does that more qualitative information flow into the system?
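One way to picture where the policy information Chris describes could enter such a system is as a hard constraint checked before any sensor-driven calculation. The following is only a minimal sketch under stated assumptions, not a description of any real irrigation controller: the `FieldState` fields, the `irrigation_minutes` function, and all thresholds are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldState:
    soil_moisture: float       # volumetric fraction, 0.0 to 1.0 (hypothetical sensor reading)
    rain_forecast_mm: float    # forecast rainfall over the next 24 hours
    drought_restriction: bool  # True when a watering restriction is in effect (policy input)


def irrigation_minutes(state: FieldState,
                       target_moisture: float = 0.30,
                       minutes_per_unit_deficit: float = 200.0,
                       rain_skip_mm: float = 5.0) -> float:
    """Return how long to run irrigation, in minutes (all defaults are illustrative)."""
    # Hard constraint: the policy restriction overrides whatever the sensors say.
    if state.drought_restriction:
        return 0.0
    # Skip watering when enough rain is expected.
    if state.rain_forecast_mm >= rain_skip_mm:
        return 0.0
    # Otherwise water in proportion to the moisture deficit, never a negative amount.
    deficit = max(0.0, target_moisture - state.soil_moisture)
    return deficit * minutes_per_unit_deficit


dry_field = FieldState(soil_moisture=0.20, rain_forecast_mm=0.0, drought_restriction=False)
restricted_field = FieldState(soil_moisture=0.20, rain_forecast_mm=0.0, drought_restriction=True)
print(irrigation_minutes(dry_field))
print(irrigation_minutes(restricted_field))
```

In this sketch the qualitative rule sits outside the numeric objective, so it cannot be traded off against soil moisture. That is one possible design choice; a real system would also need a reliable way to keep the restriction flag up to date.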
Josh Gray: And people are doing this right now. The classic “trolley problem”––people have written code in the past 10 years that had to solve that problem.
Megan Skrip: Can you explain to me quickly what the trolley problem is?
Aaron Hipp: And I want to hear their answer! [Laughs]
Josh Gray: The problem is this: you’re operating a trolley and you can either run over the one old lady crossing the street right in front of you or switch tracks and run over the whole kindergarten class. So what choice do you make? And you can make any variation of the situation––maybe it’s actually your grandmother over here and a whole bunch of kids that you don’t particularly like over there. Whatever. But that sort of thing is something that these AIs [artificial intelligences] face and have to be able to make a decision about, or learn how to make a decision.
Megan Skrip: So it sounds like we don’t have a solution to the trolley problem?
Josh Gray: We all have our personal solution.
Megan Skrip: Aha, alright.
Aaron Hipp: Tesla’s, though, is going to be the most important answer there. [Laughs]
Chris Jones: That’s where the open data and open source code comes into play, knowing what those assumptions are in these systems that are becoming continually more and more important in the world. Because if you’re driving you know what you would do, but if your car suddenly makes a different decision, and you didn’t know what that decision was supposed to be…I don’t know.
Josh Gray: Well, yeah, particularly when that decision might be you’re the one who needs to be sacrificed. Like you’re the driver of the car, and the owner of the technology decides that you need to be eliminated. That might be the best ethical solution or something like that. These are things that have to be coded in.
Eric Money: But in that case, is it the owner that actually made that decision, or is it the person who wrote the code?
Aaron Hipp: Yeah, I agree. Who is it? Is it the code writer? The owner? Is it Elon Musk? The person in the car? Who’s culpable?
Josh Gray: Well it’s less the coder at that point as it is the weights that are in the neural net. But those were learned. And the coder’s like, “I didn’t do anything except make something that could learn! You fed it the data that made it learn these behaviors.” So, it’s kind of hard to figure out how to put your hands around who’s responsible in whatever outcome. This is the world we now live in. This is not 30 years from now we’ll have to solve this problem. That’s my point. It’s something that is being solved right now. And to Chris’s point, your ethics might be different than somebody else’s ethics. Can we have our personal AIs or automatons or whatever the representations or AIs that exist in sort of some other space, the metaverse––can they have our own ethics encoded in them? And are those the ones that drive our cars and make the decisions that you would?
Megan Skrip: That’s an intriguing idea. In that case, then, if your AI is just like you, is the advantage just that it has a better response time, or it can’t fall asleep at the wheel? Using the AI is just about getting around those kinds of human foibles?
Josh Gray: Yeah. I mean, someone’s ethic might be to never stop at stop signs, so certain rules of the road should probably be encoded. But when it comes down to like, “I don’t know, it’s a value space,” maybe it’s up to you.
Megan Skrip: Any other parting thoughts on this conversation?
Josh Gray: I think the overarching idea of just how pervasive geospatial analytics is. Like we say, everything has location; location is important. It is that spatial context that is so important in making decisions and understanding, if not predicting, human behavior in the future. Context is so, so important. And for all the frightening aspects that we’ve talked about, I think that there are more hopeful cases. And hopefully those win out, and some of that is up to us and universities and the way that we talk about the things we do. There are a lot of cool problems to be solved, and geospatial technologies will play a really big role, in so many different places.
RRD: Given what you have said about the future of your field, what could NC State be investing in to better anticipate that future and its opportunities?
Josh Gray: Investing in research computing personnel (appointment level similar to postdoc, research associate, etc.) would be a difference-maker for us. There are exciting opportunities to scale our data science, but they are increasingly sophisticated and harder and harder to implement. Thus, while the potential continues to grow, the techniques become increasingly inaccessible to domain science researchers. For example, we know of a myriad of computing strategies and technologies that would dramatically speed up our workflows, but it takes all of our creative energy to push the algorithms themselves forward, and we lack the time, expertise, and personnel to direct towards implementing the latest and greatest computing tech. I have seen the difference firsthand, both in my previous appointment at Boston University and in collaboration with NC State’s Department of Statistics. Most recently, a research computing employee in Statistics transformed our working, but inefficient, R code to a very fast and streamlined R package using some compiled C components. That made our algorithm run ~100x faster, which means we can interrogate areas that are 100x larger, or iterate much faster on algorithm improvements. Likewise, at BU, we had a research computing scientist produce a GPU-enabled version of our code. In both cases, these people did work that we were not able to (and my group has above average computing ability) and allowed us to answer scientific questions that were previously impossible. I want to note that we did try to hire someone like this in the past four years in the College of Natural Resources at NC State, but the position received few overall, and no competitive, applications over an extended application period. Perhaps part of the problem with that search was that the research computing portion of the position was only half funded, so the person also needed to be capable of sys admin work. 
Anecdotally, colleagues at BU said it took over 2 years to find and hire the right person for that type of job. The ideal candidate has a Ph.D. in a computationally demanding quantitative field (or extensive research computing experience instead), and the desire to focus on computational aspects of a wide variety of problems. For the right person, this is a dream job!
Chris Jones: I would like to echo Josh’s idea. Specifically, I would love to see a position for someone that is capable of taking existing code and making it more efficient (faster) and run at scale with interactive interfaces. Our lab has done a great deal of building out these technologies, but, like Josh’s group, we focus most of our time on the science and improving the models. We don’t have the time or expertise to master containers, optimize cloud computing environments through technology like Kubernetes [a system for managing software], or know the best way to manage multiple applications that need to interact with each other seamlessly and securely. This work is key to having truly interactive geospatial decision support systems but requires that there is someone that has this knowledge and experience. This type of work could be done by someone who has a bachelor’s in computer science and has worked in industry learning these types of workflows and technologies, as both Kubernetes and containers (e.g., Docker) are widely used as part of industry workflows but are much less common in research applications. A geospatial data manager could also be extremely useful; this is someone who would be in charge of managing all of the geospatial data and making sure it was open to all groups. For example, if I download a bunch of Planet data for a project, we could record the location of the tiles and metadata for others to use.
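The kind of shared record Chris describes for downloaded tiles could, in its simplest form, look something like the sketch below. This is an illustration only, assuming a plain in-memory list as the catalog; the `TileRecord` fields, the file paths, the dates, and the bounding boxes are all hypothetical and do not describe the Center's actual data management.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class TileRecord:
    tile_id: str
    source: str    # data provider, e.g. "Planet"
    path: str      # where the file lives on shared storage (hypothetical path)
    acquired: str  # acquisition date as an ISO string
    bbox: tuple    # (min_lon, min_lat, max_lon, max_lat)


def intersects(a, b):
    """True when two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_tiles(catalog, bbox):
    """Return every catalog record whose footprint overlaps the query box."""
    return [t for t in catalog if intersects(t.bbox, bbox)]


# Two made-up records standing in for tiles someone already downloaded.
catalog = [
    TileRecord("tile-001", "Planet", "/shared/planet/tile-001.tif",
               "2021-06-01", (-78.8, 35.7, -78.6, 35.9)),
    TileRecord("tile-002", "Planet", "/shared/planet/tile-002.tif",
               "2021-06-01", (-80.0, 35.0, -79.8, 35.2)),
]

# Another group asks: is there already imagery covering my study area?
hits = find_tiles(catalog, (-78.7, 35.75, -78.5, 35.85))
print(json.dumps([asdict(t) for t in hits], indent=2))
```

A lookup like this lets a second project discover existing tiles before downloading them again. A production catalog would more likely live in a spatial database or follow a community metadata standard rather than a Python list.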
Helena Mitasova: I agree with the need for investment in computational infrastructure in terms of personnel, cloud resources, and related expertise and capabilities. We have had opportunities to develop high-profile web applications, and we would like to provide access to our modeling and analytics tools and data through state-of-the-art online capabilities, but we do not currently have sufficient resources to make this happen. Eric may also provide some input on opportunities to host web applications for nonprofits and other smaller organizations that our MGIST students develop and may have potentially high impact.
Eric Money: I also agree with what everyone else has said––the need for dedicated computing resources and personnel capable of working with those environments across a wide range of applications is extremely important. I also think, beyond the research side, there is a similar need on the education side as well. If we want our students to come away from NC State with the most up-to-date knowledge and skill sets, they need to be able to work with and understand these high computing environments, and this goes for our undergraduates, master’s, and Ph.D. students. Our master’s students work with dozens of organizations every year on capstone projects, many of which turn into long-term solutions for these organizations; however, many of them (especially nonprofits) don’t have the geospatial infrastructure in place to keep these projects going after the student graduates. In a few cases, our Center has continued to host projects, but this takes a large amount of computing resources to make it happen in any scalable way. I also think funding personnel who help foster relationships with industry and other organizations doing geospatial work is extremely important. There are so many missed opportunities because our faculty and staff simply don’t have the time or expertise to nurture these relationships and turn them into actionable research and education opportunities.
This post was originally published in Department of Applied Ecology.