Ranga Vatsavai
Bio
Raju is a Chancellor’s Faculty Excellence Program Cluster Associate Professor in Geospatial Analytics in the Department of Computer Science. As the Center for Geospatial Analytics’ Associate Director of Spatial Computing and Technology, Raju plays a leadership role in our strategic vision for spatial computing research. He works at the intersection of big data management, data analytics, and high-performance computing, with applications in national security, geospatial intelligence, natural resources, climate change, location-based services, and human terrain mapping. Raju was previously the lead data scientist for the Computational Sciences and Engineering Division of Oak Ridge National Laboratory (ORNL). He holds MS and PhD degrees in computer science from the University of Minnesota and came to NC State with more than 20 years of research and development experience in large-scale spatiotemporal data management and geographic knowledge discovery. A leader in the field, Raju is passionate about understanding the world through high-resolution, high-dimensional, and temporal pixels by developing innovative and computationally efficient algorithms.
Publications
- Geospatial Foundation Models: Recent Advances and Applications, Proceedings of the 12th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial 2024 (2024)
- Multi-spectral Gradient Residual Network for Haze Removal in Multi-sensor Remote Sensing Imagery, Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track, Pt. X, ECML PKDD 2024 (2024)
- Cloud Imputation for Multi-sensor Remote Sensing Imagery with Style Transfer, Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track, ECML PKDD 2023, Pt. VII (2023)
- Context Retrieval via Normalized Contextual Latent Interaction for Conversational Agent, 2023 23rd IEEE International Conference on Data Mining Workshops, ICDMW 2023 (2023)
- Harmonization-guided Deep Residual Network for Imputing under Clouds with Multi-sensor Satellite Imagery, Proceedings of the 2023 18th International Symposium on Spatial and Temporal Data, SSTD 2023 (2023)
- Novel Deep Learning Framework for Imputing Holes in Orthorectified VHR Images, IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium (2023)
- Persona-Coded Poly-Encoder: Persona-Guided Multi-Stream Conversational Sentence Scoring, 2023 IEEE 35th International Conference on Tools with Artificial Intelligence, ICTAI (2023)
- Q-learning Based Simulation Tool for Studying Effectiveness of Dynamic Application of Fertilizer on Crop Productivity (2023)
- Remote Sensing Based Crop Type Classification via Deep Transfer Learning, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2023)
- Deep Residual Network with Multi-Image Attention for Imputing under Clouds in Satellite Imagery, 2022 26th International Conference on Pattern Recognition (ICPR) (2022)
Grants
The Science and Technologies for Phosphorus Sustainability (STEPS) Center is a convergence research hub for addressing the fundamental challenges associated with phosphorus sustainability. The vision of STEPS is to develop new scientific and technological solutions for regulating, recovering, and reusing phosphorus that can readily be adopted by society, through fundamental research conducted by a broad, highly interdisciplinary team. Key outcomes include new atomic-level knowledge of phosphorus interactions with engineered and natural materials, new understanding of phosphorus mobility at industrial, farm, and landscape scales, and prioritization of best management practices and strategies drawn from diverse stakeholder perspectives. Ultimately, STEPS will provide new scientific understanding, enabling technologies, and transformative improvements in phosphorus sustainability.
Plant disease outbreaks are increasing and threaten food security for vulnerable populations in many areas of the world, including the US. Climate change is exacerbating weather events that affect crop production and food access in vulnerable areas, and a global human pandemic is now threatening the health of millions. A stable, nutritious food supply will be needed to lift people out of poverty and improve health outcomes. Plant diseases, both endemic and recently emerging, are spreading, exacerbated by climate change, transmission through global food trade networks, pathogen spillover, and the evolution of new pathogen genetic lineages. Prediction of plant disease pandemics is unreliable due to the lack of real-time detection, surveillance, and data analytics to inform decisions and prevent spread. To tackle these grand challenges, a new set of predictive tools is needed. In the PIPP Phase I project, our multidisciplinary team will develop a pandemic prediction system called the “Plant Aid Database (PAdb)” that links pathogen transmission biology, disease detection by in-situ and remote sensing, genomics of emerging pathogen strains, and real-time spatial and temporal data analytics and predictive simulations to prevent pandemics. We plan to validate the PAdb using several model pathogens, including novel and host-resistance-breaking strains of lineages of two Phytophthora species (Phytophthora infestans and P. ramorum) and the cucurbit downy mildew pathogen Pseudoperonospora cubensis. Adoption of new technologies and mitigation interventions to stop pandemics requires acceptance by society. In our work, we will also characterize how human attitudes and social behavior impact disease transmission and the adoption of surveillance and sensor technologies by engaging a broad group of stakeholders, including growers, extension specialists, USDA APHIS, the Department of Homeland Security, and the National Plant Diagnostic Network, in a Biosecurity Preparedness workshop. This convergence science team will develop tools that help mitigate future plant disease pandemics using predictive intelligence. The tools and data can help stakeholders prevent spread from initial source populations before pandemics occur and are broadly applicable to animal and human pandemic research.
Develop novel approaches for petabyte-scale remote sensing image data management and analysis. Develop a spatiotemporal indexing scheme and a spatiotemporal datacube system with associated components such as interpolation, reprojection, and caching. Implement parallel and distributed algorithms to scale datacube operations.
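As an informal illustration of the indexing-plus-caching idea, the sketch below maps global (time, row, column) coordinates to fixed-size tiles and memoizes tile loads. The tile shape, the random stand-in loader, and all function names are hypothetical, not components of the actual system.

```python
from functools import lru_cache
import numpy as np

# Tile size in (time, y, x); a real datacube would tune these per dataset.
TILE = (8, 256, 256)

def tile_key(t, y, x):
    """Map a global (time, row, col) coordinate to its tile index."""
    return (t // TILE[0], y // TILE[1], x // TILE[2])

@lru_cache(maxsize=128)  # simple in-memory tile cache
def load_tile(key):
    """Stand-in loader: a real system would read the tile from disk or object storage."""
    rng = np.random.default_rng(abs(hash(key)) % (2**32))
    return rng.random(TILE, dtype=np.float32)

def read_pixel_series(y, x, t0, t1):
    """Assemble the time series for one pixel, crossing tile boundaries as needed."""
    series = []
    for t in range(t0, t1):
        tile = load_tile(tile_key(t, y, x))
        series.append(tile[t % TILE[0], y % TILE[1], x % TILE[2]])
    return np.array(series)
```

The chunk shape is the central design choice: long time chunks favor per-pixel time-series queries, while large spatial chunks favor map-style reads, so the two access patterns trade off against each other.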
In many real-world applications, data loses its value if it’s not analyzed in near real time. Examples include natural disasters, crop disease identification, bioterrorism, traffic monitoring, and monitoring of human activity in public places. Edge computing refers to pushing computing power to the edge of the network, bringing it closer to the sensors. We envision that embedded supercomputers (e.g., Jetson TX1 and TX2; 1 teraflop; ~10 watts) will allow computing at the edge (e.g., on UAVs). This framework would then allow near real-time analytics on streaming data, which is critical for first responders and national security agencies alike, and would compress or reduce data before it is transmitted to the cloud or data centers. In this project, we propose to develop novel machine learning algorithms that run on these embedded supercomputers while the data is still in device memory, and to demonstrate the technology in two real-world applications: crop monitoring and traffic monitoring. The proposed technical work involves the following three key stages: (i) generate a statistical model from historical data (e.g., spectral signatures of different crops) using a statistically principled mixture model (e.g., a Gaussian Mixture Model (GMM)); (ii) as data is acquired, compare the new (streaming) data against the GMM to identify anomalous patterns (e.g., weeds); and (iii) generate an event signal about the anomaly before the data is compressed and transferred out of device memory.
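A minimal sketch of the three stages, assuming scikit-learn’s GaussianMixture with random placeholder data in place of real spectral signatures; the band count, component count, and threshold are illustrative only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stage (i): fit a GMM to historical spectra (rows = pixels, cols = spectral bands).
historical = np.random.rand(5000, 6)  # placeholder for archived crop signatures
gmm = GaussianMixture(n_components=4, covariance_type="full").fit(historical)

# Calibrate an anomaly threshold from the training scores (bottom 1% of log-likelihoods).
threshold = np.percentile(gmm.score_samples(historical), 1)

def check_stream(batch):
    """Stages (ii)-(iii): score streaming pixels; emit an event for each anomaly
    before the batch is compressed and shipped off-device."""
    scores = gmm.score_samples(batch)
    for idx in np.where(scores < threshold)[0]:
        print(f"anomaly event: pixel {idx}, log-likelihood {scores[idx]:.2f}")

check_stream(np.random.rand(100, 6))  # simulated streaming batch
```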
NC State University, in partnership with the University of Michigan, Purdue University, the University of Illinois at Urbana-Champaign, Kansas State University, the Georgia Institute of Technology, NC A&T State University, Los Alamos National Laboratory, Oak Ridge National Laboratory, and Pacific Northwest National Laboratory, proposes to establish a Consortium for Nonproliferation Enabling Capabilities (CNEC). The vision of CNEC is to be a pre-eminent research and education hub dedicated to the development of enabling technologies and technical talent for meeting the grand challenges of nuclear nonproliferation in the next decade. CNEC research activities are divided into four thrust areas: 1) Signatures and Observables (S&O); 2) Simulation, Analysis, and Modeling (SAM); 3) Multi-source Data Fusion and Analytic Techniques (DFAT); and 4) Replacements for Potentially Dangerous Industrial and Medical Radiological Sources (RDRS). The goals are to: 1) identify and directly exploit signatures and observables associated with special nuclear material (SNM) production, storage, and movement; 2) develop simulation, analysis, and modeling methods to identify and characterize SNM and facilities processing SNM; 3) apply multi-source data fusion and analytic techniques to detect nuclear proliferation activities; and 4) develop viable replacements for potentially dangerous existing industrial and medical radiological sources. In addition to research and development activities, CNEC will implement educational activities with the goal of developing a pool of future nuclear nonproliferation and nuclear security professionals and researchers.
The purpose of the updated and extended project is to provide additional technical support, consultation and research related to developing and evaluating new methods in the application of geospatial analysis for the Rivers, Trails, and Conservation Assistance (RTCA) program within the Conservation and Outdoor Recreation (COR) Branch of the National Park Service (NPS). The additional technical support, consultation and research activities will include, but are not limited to: 1) developing training material for using the RTCA web mapping application; 2) incorporating additional COR Branch program data into the existing RTCA Enterprise database; and 3) enhancing the current RTCA web mapping application by incorporating existing themed GIS web services.
Massive amounts of remote sensing data are collected and archived daily from satellites and airborne platforms (including drones). This data supports a wide range of applications of national importance: crop type mapping, forest mapping, urban neighborhood mapping, assessment of damage from flooding, hailstorms, and forest fires, impacts of climate change on crops, unusual crop detection (e.g., poppy plantations), changes in biomass, and understanding the complex interactions between food, energy, and water. Classification of these high-resolution images requires object-based and arbitrary-patch-based classification to capture relevant spatial context. The advent of multiple instance learning and deep learning took the natural image processing community by storm; however, their application to satellite images has been slow due to training data and computational requirements. In this project, we develop deep learning algorithms for the classification of satellite images and scale these algorithms on Lenovo/Intel’s new architectures and software infrastructure (e.g., Neon, Caffe, Theano, and MXNet).
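For illustration only, here is a tiny patch-based classifier written in PyTorch (the project itself targets frameworks such as Neon, Caffe, Theano, and MXNet); the band count, patch size, and class count are assumptions, not project parameters.

```python
import torch
import torch.nn as nn

class PatchClassifier(nn.Module):
    """Tiny CNN that labels a fixed-size multispectral patch (here 32x32, 6 bands)."""
    def __init__(self, bands=6, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(bands, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

model = PatchClassifier()
patches = torch.randn(4, 6, 32, 32)  # mini-batch of 32x32 patches, 6 spectral bands
logits = model(patches)              # shape: (4, 10) class scores
```

Classifying arbitrary patches rather than single pixels is what lets the network exploit spatial context (texture, shape, neighborhood layout) in high-resolution imagery.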
Slums have become an inescapable feature of cities in the developing world, and the number of people living in slums has increased rapidly, approaching 1 billion and still rising (UN-Habitat 2010). Relatively little is known, however, about how slums develop over time and about the factors associated with progressive improvements. One objective of this research is to develop a prototype methodology for semi-automatic slum identification and categorization that can be speedily and reliably adapted for use in other cities.
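One common ingredient in semi-automatic slum mapping is texture analysis of very-high-resolution imagery; the sketch below is a hedged example using scikit-image’s gray-level co-occurrence features, not the project’s actual method, showing how built-up-texture cues could feed a downstream classifier.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(patch):
    """GLCM texture statistics for one grayscale image patch; dense, irregular
    settlement texture tends to score differently from planned urban blocks."""
    glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return [graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity", "energy")]

# A classifier (e.g., a random forest) over such features would do the labeling.
patch = (np.random.rand(64, 64) * 255).astype(np.uint8)  # placeholder image patch
print(texture_features(patch))
```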
Datasets generated by experiment and simulation today are increasingly large, and nations across the world, including China, the United States, Europe, and Japan, have all invested heavily in developing computers capable of processing or generating these datasets. These datasets come from applications in many areas and are driven by national security issues as well as industries of strategic value. Large-scale computing is seen as driving technological developments in biology and biomedicine, high-energy physics (a key to stockpile stewardship), and materials science, all areas where the United States has traditionally led the world. However, recent developments have placed China as the leader in building large-scale computers, and there is concern that this could cost the United States its leadership role in many of the related technologies. With this in mind, the United States has placed a renewed emphasis on developing exascale computing platforms and, strategically, on developing algorithms that can use large computers to support decisions in science and engineering, an area where the United States arguably still leads the world. In this Phase I project, we propose adapting Kitware’s Catalyst and Cinema platforms to perform new summarization tasks, including compression, scalably on these new architectures. To demonstrate the effectiveness of these summarizations, we will adapt a simulation program to use Catalyst for in-situ processing, saving only the dynamic summarization, in order to provide stakeholders with the information necessary to make a decision and be confident in the simulation process.
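Catalyst pipelines are typically written in C++ or ParaView Python; as a framework-neutral stand-in, the sketch below shows the in-situ idea of keeping only a dynamic per-timestep summary instead of the raw field. The update rule and summary contents are hypothetical.

```python
import numpy as np

def summarize(field):
    """Dynamic summary kept in lieu of the full field: a few scalars plus a histogram."""
    return {
        "min": float(field.min()), "max": float(field.max()),
        "mean": float(field.mean()),
        "hist": np.histogram(field, bins=16)[0].tolist(),
    }

summaries = []
state = np.random.rand(256, 256)          # stand-in simulation state
for step in range(100):
    state = 0.99 * state + 0.01 * np.random.rand(256, 256)  # stand-in update rule
    summaries.append(summarize(state))    # in-situ: raw state is never written to disk
```

The payoff is the write bandwidth: each 256x256 float field shrinks from tens of kilobytes to a handful of scalars and 16 histogram bins, which is what makes per-timestep output feasible at exascale.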
Scaling up scientific data analysis and machine learning algorithms for data-driven discovery is a challenging task. Despite the growing need for analysis from science domains that are generating “Big Data” from instruments and simulations, building high-performance analytical workflows of data-specific algorithms has been impeded by: (i) the evolving nature of the “Big Data” hardware and software architecture landscape, (ii) the new programming models imposed by newer architectures, and (iii) a lack of understanding of the data-parallel kernels of analysis algorithms and their performance on different architectures and programming environments. NCSU will conduct research on benchmarking core graph kernels and computing primitives.
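As an example of the kind of graph kernel such a benchmark would time, here is a minimal breadth-first search with a toy timing harness; the ring-graph input and vertex count are arbitrary placeholders.

```python
import time
from collections import deque

def bfs(adj, src):
    """Level-synchronous BFS: a canonical graph kernel in benchmark suites."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Toy harness: time the kernel on a ring graph of n vertices.
n = 100_000
adj = {i: [(i + 1) % n, (i - 1) % n] for i in range(n)}
t0 = time.perf_counter()
bfs(adj, 0)
print(f"BFS over {n} vertices: {time.perf_counter() - t0:.3f}s")
```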