What Is Open Science? An Intro, Common Misconceptions and Advice
In January, the United States government declared 2023 the Year of Open Science. At NC State University’s Center for Geospatial Analytics, our researchers have been doing open science for years. Yet, misconceptions about the term remain among the scientific community. What exactly is meant by “open science”? And how can a scientist be sure their (and others’) research is as “open” as they think it is?
What is open science?
Open science is “what science should be,” says Vaclav Petras, a geospatial research software engineer and one of the Center for Geospatial Analytics’ open science advocates. “It’s sharing all different parts of research that are traditionally not shared, as far as possible in terms of privacy.” The term is a broad umbrella, he says, encompassing six main areas:
- open source –– software used in scientific research that is free to use and modify
- open data –– information either analyzed or produced (or both) by a research project
- open access –– research publications that are free to read
- open educational resources –– free and reusable teaching materials
- open methodology –– free, modifiable instructions for how to do or create something
- open peer review –– transparent critique, and revision, of research reports submitted for publication in scientific journals
Open peer review relates to the process by which scientists evaluate each other’s work and decide whether it is publishable, but the other five relate to products or information, which may or may not be “open” depending on how a scientist produces and shares them.
“Often open is used to refer to something that is available publicly, but actually the word ‘open’ has a strict definition,” Petras points out. “It’s not that you can just download something from the internet, but that something has a specific license associated with it, specifying rights and limitations for how it can be used. And that is often overlooked.”
Do scientists sometimes think a research product is “open”…when it’s not?
Yes. According to Petras, there are four main ways this happens:
No license = not open
Just because something is publicly available for download, and free, does not mean that it is open…if there is no information provided about how it can be used or reused.
“Different licenses exist for software code, data, images, text,” Petras says, and each license varies in how much it does and does not allow. “Everything is copyrighted; it’s just proprietary or open. Licenses use copyright to give permissions of what people can do or cannot do.”
For example, Creative Commons licenses for images and text range from a public domain dedication (CC0, no restrictions on reuse) to the most restrictive option (CC BY-NC-ND), which requires giving credit to the original source and permits neither adaptation nor commercial use. A public domain dedication is completely permissive and makes material fully open, while noncommercial licenses are restrictive and are not considered open. A spectrum of licenses exists in between, and, importantly, all open licenses allow commercial applications.
So, what if a geospatial data scientist uses a software development platform like GitHub to make their code widely available…but they don’t specify a license for that code? According to Petras, the scientist’s code is not open.
“If [putting code on GitHub] is just for showing the public or for review, it’s enough,” Petras says, “but it’s not enough if you want it to be reused. If something is available on GitHub, it can be viewed publicly, but it’s not truly open unless there is a license associated with it that defines the rights and permissions.”
Petras encourages data scientists and software developers to alert GitHub contributors when a license is missing. “If you don’t see a license for something on GitHub, and you want to be able to use that code or software, the correct thing to do is to click on ‘Issues’ while you’re logged into your GitHub account and request the developer to add a license. That way, if they add a license, it’s not just a bunch of code you can see but can’t use; you can actually run it.”
Open-source licenses––licenses specifically for open-source software––also vary in how restrictive or permissive they are, and software developers can use sites like choosealicense.com, Petras says, to find the one that best suits their work.
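As a rough illustration of the point above, a researcher could check a local copy of a repository for a license file before sharing it. The sketch below is only one possible approach: the function names and the list of file names it looks for are assumptions made for this example, and finding a file says nothing about whether its terms suit the project.

```python
from pathlib import Path

# Common license file names; an assumption for this sketch, not an exhaustive list.
LICENSE_STEMS = {"license", "licence", "copying", "unlicense"}


def find_license_files(repo_dir):
    """Return sorted names of top-level files that look like license files."""
    found = []
    for path in Path(repo_dir).iterdir():
        # Match LICENSE, LICENSE.md, LICENSE.txt, COPYING and so on, ignoring case.
        if path.is_file() and path.name.split(".")[0].lower() in LICENSE_STEMS:
            found.append(path.name)
    return sorted(found)


def describe_openness(repo_dir):
    """Summarize whether the directory states any reuse terms."""
    files = find_license_files(repo_dir)
    if files:
        return "License file found: " + ", ".join(files)
    return "No license file found: public, but not open for reuse"
```

Calling `describe_openness(".")` from the top of a project folder gives a quick reminder to add a license before publishing.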
“Open source” ≠ freeware and doesn’t automatically make output open
Open-source software is a key component of geospatial research (and open science in general, really). For example, geospatial tools like GRASS GIS and QGIS are open source: they are free to use, and anyone can access and modify their source code, even for commercial applications.
But, Petras says, the term “open source” is often misused. “Something may be called ‘open-source software’ when it is actually ‘freeware,’” he says. “Freeware can be downloaded for free, but it’s proprietary.” The licensing of proprietary software does not allow that software to be modified in any way for a new purpose, let alone a commercial one.
Open-source licenses, meanwhile, govern how open-source software can be used and typically have very generous permissions for modification and very specific instructions for giving credit. They also specify how your use of the software affects the “openness” of resulting creations or modifications of the code.
“Using open-source software does not automatically make something open,” Petras says. For example, if someone uses open-source image-editing software like GIMP or Inkscape to create an image, that image is not automatically open. If a geospatial scientist uses the open-source FUTURES model to forecast changes in urbanization across a landscape, their data or model outputs are also not automatically open.
Yet, Petras says, using open-source code to create new software may require the resulting product to be open source. “If you are using and modifying code,” he explains, “it depends on the license whether your product is then also open; some licenses require you to keep the original license, and others allow you to change it.”
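The distinction Petras describes is often summarized as "copyleft" versus "permissive" licensing. The sketch below is a deliberately simplified illustration, assuming a small hand-picked set of well-known licenses named by their SPDX identifiers; the dictionary and the function name are made up for this example, and the license text itself remains the authority on what is actually required.

```python
# Simplified categories for a few widely used licenses, keyed by SPDX identifier.
# This is an illustrative subset chosen for the sketch, not a complete reference.
LICENSE_KIND = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "GPL-3.0-or-later": "copyleft",
    "AGPL-3.0-or-later": "copyleft",
}


def modified_code_must_stay_open(spdx_id):
    """Return True if the license is categorized as copyleft in this sketch."""
    kind = LICENSE_KIND.get(spdx_id)
    if kind is None:
        raise KeyError(f"Unknown license in this sketch: {spdx_id}")
    # Copyleft licenses require distributed modifications to keep the open terms;
    # permissive licenses allow the modified product to be licensed differently.
    return kind == "copyleft"
```

For instance, `modified_code_must_stay_open("MIT")` returns `False`, while the same call with `"GPL-3.0-or-later"` returns `True`.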
A parting word on open source before we move on: Petras also notes it is incorrect to use the phrase “open-source science” or “open access software.” Software that is open is simply called open source. “People use terminology that they hear,” he explains, “but it’s not necessarily correct.”
“Open access” ≠ open research products
To some scientists, “open science” simply equates to “open access”––when scientific publications are available to view and download for free. Petras disagrees.
“The purpose of an open access paper is to have free access to reading that paper, rather than have the material in that paper be ‘open’ per se,” Petras says. “Open access publishing exists because people want access to the article for free. They don’t want to change and reuse it.”
Open access papers are published with an open license, however. “The publishers are doing it the proper open way,” Petras says, “with licenses that are actually permissive.” For example, the publishers MDPI and Springer Nature use a Creative Commons license that allows reusing all or part of a scientific article, including tables and figures, as long as the original article is cited.
Nevertheless, an open access license on a paper does not make any component of the study other than the written report itself open. The software, data and methodologies are not open unless the researchers specifically make them open.
Data format matters
Lastly, Petras points out that the way data and research products are created and shared determines how truly reusable (i.e., open) they are.
“If you publish data that only one software can read, is it really open?” he asks.
The problem is a complicated one, Petras points out, because the field of geospatial analytics generates and uses very specialized file and data formats suited to specific analytical tools. “It leads to proliferation of different file formats. The issue comes when they are not open. If there is some standard, that helps.”
A standard describes how a data format can be read and written by software. “Standards should be open access and created in an open way,” Petras says, so that standardization can be applied to data by many different geospatial scientists using different software.
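To make the idea concrete, here is a minimal sketch of saving point data in GeoJSON, an openly published standard (RFC 7946) that many geospatial tools can read. GeoJSON is used here only as one example of an open format, not as a recommendation from the article; the function name and the sample site are hypothetical.

```python
import json


def points_to_geojson(points):
    """Build a GeoJSON FeatureCollection from (name, longitude, latitude) tuples."""
    features = []
    for name, lon, lat in points:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample site; the name and coordinates are placeholders.
collection = points_to_geojson([("example site", -78.68, 35.77)])
geojson_text = json.dumps(collection, indent=2)
```

Because the result is plain text that follows a published standard, it does not depend on any single piece of software to be read.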
Want to do open science? Make sure you budget for it.
Typically, open science is more expensive for a research team than doing science that is not open. Open access publishing has higher upfront costs than traditional publishing, and more time and effort are required to make data and software open, ensuring that they are properly licensed, accessible and easily reusable by another research team. The advantages, though, are more transparent and shareable research and the potential for faster and more efficient innovation.
“While there are people who want to do open science but don’t know how, more and more people now know how but don’t have the time or money to do it,” Petras notes. His recommendation: “Budget money for open publications, open data, open software in grant proposals and projects. If you can budget $3,000 for open access, also budget for open data and open-source software. This could be the salary of a student or the salary of a software engineer. You have to have it in the budget.” The Center for Geospatial Analytics, for example, welcomes collaborators to include Center research staff in grant proposals to support open science efforts.
Given the U.S. federal government’s focus on open science this year, the Center for Geospatial Analytics is optimistic that federal funding agencies will look favorably on grant budgets that recognize the cost, and importance, of making science open.
Want to learn more about open science? Check out the following resources: