AI on Forests: making sense of heaps of data
Introduction
As mentioned in the May newsletter, collecting field data from forests is difficult and expensive, so the more information one extracts from it, the better. Depending on the technology, the distance to the canopies and the job specifications, the field output can amount to 40 MB to 500 MB per hectare surveyed (roughly 16 MB to 200 MB per acre). As the field data is the input to all Artificial Intelligence systems, decision makers require any decision support system to have strong abstraction capabilities, in order to extract the critical variables on which to act.
The analysis of forests usually depends on the efficient computation of two kinds of three-dimensional elements, and on the ability to correctly separate them: the scenario, which is deemed immutable, and the time-varying vegetation.
The scenario is mostly composed of the ground surface but, depending on the context or application, it may contain landmarks or features such as overhead lines, railways, roads, canals and aqueducts, constructions or greenhouses, etc. In many applications, the coordinates of the scenario are roughly known from cartography, which does not waive the need for identifying the features in the scenario, as cartographic data could be outdated or lack sufficient accuracy and detail (the width of a road, for instance). Feature extraction from the scenario is thus often a primary requirement: for instance, when measuring forest encroachment along a road it is efficient to localise the road using geographical coordinates from cartography, then to measure distances from trees and branches to the actual envelope of the road, with its turns and variations in width, crossings, roadside facilities, etc. The scenario and its features may change across the years, yet the main difference is that the system does not track it as a changing asset but as a fixed asset that might need an update.
While vegetation changes slowly in time, especially in forestry, where growth takes years or decades as opposed to the intra-annual cycles of farming, such variation is one of the critical values in the business. Thus, the asset management system is designed to embrace change as its key variable, one that correlates observations and condenses information and value.
The emphasis on efficiency is dominant since, in the forestry context, accuracy and minimisation of computational resources are opposing goals: one is improved at the expense of the other. Thus, it is important to master the quality requirements from the outset, as they might lead to different data acquisition technologies and procedures.
Traditional & Modern Artificial Intelligence
At Albatroz, the term Artificial Intelligence [AI] is used in a broad sense. It embraces both “traditional” algorithms and signal processing and “modern” methods such as machine learning. Moreover, in some cases, approaches from both families were tested against the same problem and the comparison of results is illuminating [1]. We notice that whereas machine learning is expected to be the more adequate approach for tasks such as tree species classification, other tasks such as the identification of particular objects or characteristics may be tackled just as successfully with simpler, more direct techniques, such as applying some rules and data analysis to plain 2D images (e.g., the detection of pine processionary nests).
Since most machine learning methods depend on the availability of large sets of instances, this could be a barrier to exploring new ways to make sense of data. In contrast, signal processing is rule based and, while it requires more insight into the issue to be solved, it can be developed from a handful of examples from the field. Overall, competence in both toolkits allows choosing the best tool for each stage of R&D on each problem.
From LiDAR and image sensors data to information
Forest data may be acquired in a mission using LiDAR and/or image sensors and, possibly, infrared/thermography sensors. These are the necessary raw data, which will feed some algorithm(s), either simple (e.g. counting isolated trees), medium (e.g. measuring tree-to-wire distances in powerlines) or complex (e.g. detecting “exotic” species in a forest or estimating understory volume to assess wildfire risk). We then need to extract or infer the information we want to handle, at a higher abstraction level, in a human-readable and structured form. For that, we may use different tools. From a set of 3D points (from LiDAR) or 2D pixels (from a single image), we may first find our objects of interest, such as trees, but also (depending on the context and goals) rocks, overhead line [OHL] towers and wires, or other human-made structures (e.g. houses, bridges).
This information is naturally 3D in the case of LiDAR (which may include features beyond range distances, such as intensity). By combining 2D images (photogrammetry), which may include colour, we are also able to get 3D information (although, as mentioned in the previous section, for some goals that may be unnecessary or inadequate).
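As an illustration of the “medium” case, the clearance between a vegetation point and a wire can be approximated by treating the wire as a straight 3D segment between two attachment points. This is only a minimal sketch: the function name and the coordinates are hypothetical, and a real conductor sags along a catenary, which the straight-segment assumption ignores.

```python
import math

def point_to_segment_distance(p, a, b):
    """Shortest 3D distance from point p to the segment a-b.

    p, a and b are (x, y, z) tuples in metres. The wire is ASSUMED
    to be a straight segment; conductor sag is not modelled.
    """
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len2 = sum(c * c for c in ab)
    if ab_len2 == 0.0:
        # Degenerate segment: both ends coincide.
        return math.dist(p, a)
    # Projection parameter of p onto the line, clamped to the segment.
    t = sum(ap[i] * ab[i] for i in range(3)) / ab_len2
    t = max(0.0, min(1.0, t))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)

# Hypothetical span: two attachment points 100 m apart, 20 m high.
wire_start = (0.0, 0.0, 20.0)
wire_end = (100.0, 0.0, 20.0)
tree_top = (50.0, 3.0, 16.0)
clearance = point_to_segment_distance(tree_top, wire_start, wire_end)
```

Clamping the projection parameter keeps the closest point on the span itself, so a tree standing beyond a tower is measured against the attachment point rather than an imaginary extension of the wire.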
First of all, due to the huge amount of data and also for business purposes, it is useful to partition acquisition data into workable business units, such as spans (defined by two OHL towers) in the case of an OHL inspection, kilometre or mile sections for roads or railways (Figure 1) or fixed-size terrain squares in more general-purpose applications.
For each working data/business unit, together with object recognition or identification, we may model the terrain (DTM: Digital Terrain Model) and the overlying surface (DSM: Digital Surface Model). In addition, we may (and often do) want to calculate statistics and relations between objects, such as the distance from trees to OHL. Notice that we may apply different levels of detail/granularity to objects, depending on goals, business, quality of data, or available resources. We may be interested in the forest as a whole, possibly as vegetation blocks, or in identifying each single tree, possibly also with its main stem and some branches (or, in the case of an OHL tower, we may represent it with a single point, or have a more complete model, even including insulators).
In addition, when we have at least two observations of the same object separated by a time interval, we may compare them to see its evolution, translating it into measurable and meaningful information such as vegetation growth (height and volume).
Having collected and crunched much data over time, we may be able to infer more information. For instance, (many kinds of) risk analysis can be refined, reducing margins of error, thanks to greater data support. As another example, tree species classification can be performed, especially in conjunction with good (in quality, variety, and quantity) ground truth data.
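For the general-purpose case, partitioning into fixed-size terrain squares can be sketched as a simple binning of point coordinates. The function name, the 100 m tile size and the sample points below are assumptions made for illustration only, not a description of the production pipeline.

```python
from collections import defaultdict

def tile_points(points, tile_size):
    """Group (x, y, z) points into square tiles of side tile_size.

    The tile key is the pair of integer tile indices along x and y.
    Floor division keeps negative coordinates in the correct tile.
    """
    tiles = defaultdict(list)
    for x, y, z in points:
        key = (int(x // tile_size), int(y // tile_size))
        tiles[key].append((x, y, z))
    return dict(tiles)

# Hypothetical points in a local metric coordinate system.
points = [
    (5.0, 5.0, 1.0),
    (95.0, 20.0, 12.0),
    (105.0, 20.0, 3.0),
    (-1.0, 0.0, 2.0),
]
tiles = tile_points(points, tile_size=100.0)
```

Each tile can then be processed, stored and reported independently, which is what makes it a workable unit.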
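A very crude way to picture the two models is to bin the points of a unit into square cells and keep the lowest and highest elevation in each cell; the difference between the two approximates the height of whatever stands on the ground. This is a sketch under a strong assumption, namely that the lowest return in a cell is actually ground; real workflows normally classify ground points first. The function names and sample values are hypothetical.

```python
def grid_min_max(points, cell_size):
    """Return {cell: (min_z, max_z)} for (x, y, z) points binned into cells.

    min_z stands in for the DTM and max_z for the DSM, ASSUMING the
    lowest return in each cell reached the ground.
    """
    cells = {}
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        if key in cells:
            lo, hi = cells[key]
            cells[key] = (min(lo, z), max(hi, z))
        else:
            cells[key] = (z, z)
    return cells

def canopy_height(cells):
    """Height above ground per cell: DSM (max z) minus DTM (min z)."""
    return {key: hi - lo for key, (lo, hi) in cells.items()}

# Hypothetical returns, elevations in metres.
points = [
    (0.5, 0.5, 100.0),  # ground return
    (0.6, 0.4, 112.5),  # canopy return in the same cell
    (1.5, 0.5, 101.0),  # bare-ground cell with a single return
]
cells = grid_min_max(points, cell_size=1.0)
heights = canopy_height(cells)
```

The cell size is the granularity knob mentioned above: larger cells are cheaper to compute and store but blur individual trees into vegetation blocks.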
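The comparison itself can be as simple as a difference divided by the elapsed time. The sketch below assumes each observation of a tree has already been reduced to a height and a volume; the record layout, field names and sample values are hypothetical.

```python
def growth_per_year(obs_a, obs_b):
    """Yearly height and volume change between two observations of a tree.

    Each observation is a dict with 'year', 'height_m' and 'volume_m3'
    (a hypothetical layout chosen for this example).
    """
    years = obs_b["year"] - obs_a["year"]
    if years <= 0:
        raise ValueError("observations must be in chronological order")
    return {
        "height_m_per_year": (obs_b["height_m"] - obs_a["height_m"]) / years,
        "volume_m3_per_year": (obs_b["volume_m3"] - obs_a["volume_m3"]) / years,
    }

first = {"year": 2019.0, "height_m": 12.0, "volume_m3": 0.50}
second = {"year": 2021.0, "height_m": 13.0, "volume_m3": 0.75}
rates = growth_per_year(first, second)
```

A negative rate is as informative as a positive one: it may point to felling, storm damage or a mismatch between the two observations.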
From information to presentation and understanding
We may have stored huge amounts of raw data and inferred lots of new useful data, but all that may lose its purpose if we are not able to understand it, visualise it and present it to the stakeholders concerned. Notice that visualisation may also be important in the understanding process. There is not much use in having many different positive and negative numbers for, e.g., tree growth, distance to OHL, wood or biomass volume variation, number of bugs or defects, if we do not understand what is going on. Is the current condition bad or good? Better or worse than past observations? When and where? How is it evolving? How critical is it? What are the likely causes?
To help reasoning on data, we must analyse it, starting with a statistical treatment, both globally and locally (in space and time), and find correlations. Numbers can be visualised in a spreadsheet or tableau, with global and partial numbers aggregated and organised by business unit, together with charts and graphs, stressing the more relevant business facts (e.g. different colours for different levels of risk). Map visualisation with an interactive application, also with colours for regions of interest, is extremely useful both for a general, eye-catching view and for locating problems, delving into details if individual (e.g. tree) information is included and available with zoom tools.
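Aggregation by business unit with risk levels can be sketched as follows. The thresholds, the risk labels and the span identifiers are purely illustrative assumptions; actual limits depend on the client, the asset and the applicable regulations.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL clearance thresholds in metres, checked in order.
RISK_LEVELS = [(2.0, "high"), (5.0, "medium")]

def risk_level(clearance_m):
    """Map a clearance to a risk label using the thresholds above."""
    for limit, label in RISK_LEVELS:
        if clearance_m < limit:
            return label
    return "low"

def summarise(records):
    """Count trees per risk level for each business unit.

    records: iterable of (unit_id, clearance_m) pairs.
    """
    summary = defaultdict(Counter)
    for unit, clearance in records:
        summary[unit][risk_level(clearance)] += 1
    return {unit: dict(counts) for unit, counts in summary.items()}

records = [
    ("span-01", 1.5),
    ("span-01", 6.0),
    ("span-01", 4.0),
    ("span-02", 8.0),
]
table = summarise(records)
```

The resulting per-unit counts are what a spreadsheet, chart or coloured map layer would then display.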
From understanding to decision support
Understanding processed data, in the sense of the questions expressed in the previous sections (what is the situation, what is going on, why, where, what are the risks), is key to supporting management decisions. With risks and/or business benefits calculated, quantified and located, managers can justifiably decide on the type and amount of investment and maintenance to promote and undertake in time and space, optimising the distribution of available resources. The extra knowledge obtained also provides insight into possible new business directions. Figure 3 highlights how spatial and numerical data contribute in different ways to decision making. Together with ground sampling of trunk diameters and number of trees, an optimal schedule for tree cutting can be established along with a more accurate estimate of economic value.
Operational processed data, together with an executive report, are thus a precious input to the client’s own tools and private data, for their decision-making process.
Conclusion
Forest and vegetation data collected using LiDAR and image sensors over time may be an information treasure for the respective business and should thus not be discarded. Each acquisition is, however, costly and generates a huge amount of data. Hence, in addition to storing it efficiently for future use with any kind of algorithm, more abstract data should be calculated from it and also properly stored. We may then have the history of each individual tree, which will allow better informed management decisions, with the ability to understand both local cases and the global picture with its space-time distributions. For that, data analysis and aggregation, together with visual tools, are also essential for easier assessment and decision making, which are expected to contribute to the optimisation of resource allocation and planning, minimising costs and risks and maximising profit or other goals, in addition to extending our knowledge of the subject. Overall, we’ll be widening business and research horizons while addressing environmental concerns.
References
[1] J. Gomes-Mota, “Automated Visual Inspections – Comparing Computer Vision to Machine Learning” – 25th International Conference on Electricity Distribution, paper nº 52, CIRED, Madrid, Spain, 3-6 June 2019.