ABSTRACT:
Machine learning, satellites, and local sensors are key to a sustainable and resource-saving optimization of agriculture and have proven their value for the management of agricultural land. Until now, the main focus has been on enlarging the data sets evaluated with supervised learning methods. However, the need for labels is a limiting and time-consuming factor, whereas ongoing technological development is already providing an ever-increasing amount of unlabeled data.
Self-supervised learning (SSL) could overcome this limitation and incorporate existing unlabeled data. We therefore used a crop type data set to conduct experiments with SSL and compare it to supervised methods. A unique feature of our data set, which covers 2016 to 2018, was a divergent climatological condition in 2018 that reduced yields and affected the spectral fingerprint of the plants. Our experiments focused on predicting 2018 using SSL with no labels or only a few labels, to clarify whether new labels should be collected for an unknown year. Despite these challenging conditions, the results showed that SSL contributed to higher accuracies. We believe that the results will encourage further improvements in the field of precision farming, which is why the SSL framework and data will be published.
INTRODUCTION
Food sustainability is one of the grand challenges of the coming decades, and rigorous monitoring of the global food system is needed to allocate our resources (Fanzo et al., 2021). In particular, cropland use monitoring is essential to assess the supply chain, to evaluate the impact on natural ecosystems, and to keep track of the related hidden costs and subsidies (Rockström et al., 2020). Earth Observation (EO) is a global, recurring proxy for land use monitoring, which is why it is widely used, especially in areas where accessibility and infrastructure are a problem.
Today's opportunity is that much more image data is available for feature extraction, even if it is not labeled. In recent years, new methods such as self-supervision have emerged that allow solid representations to be extracted in an unsupervised manner and could provide a more reliable representation for crop type mapping or other precision farming applications. Contrastive learning is a powerful approach to self-supervision that aims to learn a representation in which similar pairs of samples, such as time series of one crop type, are close to each other in the embedding space and different time series are far apart. In addition, having only a few labels is not as restrictive as it is in supervised learning. Self-supervised learning (SSL) was successfully applied to detect changes (Leenstra et al., 2020), using techniques such as pretext tasks or augmentations to learn an invariant representation (Güldenring and Nalpantidis, 2021). Baevski et al. (2022) applied self-supervision to several tasks (NLP, speech, computer vision) with the objective of using one augmentation suitable for different domains. The core idea was to mask a part of the input instead of using augmentations such as rotation or color distortion, which are only suitable for certain use cases. The growing number of self-supervised learning methods differ mainly in terms of loss function, augmentation, or architecture, with the choice of the underlying encoder playing an important role. In this work, we used a transformer (TF) as an encoder, which was validated in previous studies in a supervised setting (Rußwurm et al., 2020). SimCLR was one of the first architectures proposed in which augmentation and the use of positive and negative pairs are important properties. A similar example is MoCo, which uses a memory bank in addition to negative and positive pairs. An overview and comparison of these siamese networks was presented by Chen and He (2021). We build on the recent SimSiam method, which combines a dual-stream siamese network with various data augmentations on positive pairs of input data (Chen and He, 2021). SimSiam also achieved promising results with a small batch size and a small number of epochs.
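As an illustration of the objective behind SimSiam-style training, the following minimal sketch computes the symmetric negative cosine similarity between the two streams of a siamese network for one positive pair. It is not the implementation used in this work: the function names and toy vectors are assumptions chosen for illustration, and the stop-gradient that SimSiam applies to the encoder outputs is only noted in a comment because plain Python has no automatic differentiation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def symmetric_neg_cosine_loss(p1, z1, p2, z2):
    """Symmetric negative cosine similarity for one positive pair.

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical names)
    z1, z2: encoder outputs for view 1 and view 2 (hypothetical names)
    In an actual training loop z1 and z2 would be treated as constants
    (stop-gradient); that detail cannot be expressed without autograd.
    """
    return -0.5 * cosine_similarity(p1, z2) - 0.5 * cosine_similarity(p2, z1)


# Toy embeddings with made-up numbers, not taken from any experiment.
aligned = symmetric_neg_cosine_loss([1.0, 2.0], [2.0, 4.0], [2.0, 4.0], [1.0, 2.0])
opposed = symmetric_neg_cosine_loss([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0])
```

The loss approaches its minimum of -1 when the two views of a sample point in the same direction in the embedding space, and its maximum of 1 when they point in opposite directions.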
Self-supervised learning has rarely been used in precision agriculture, although it could make an important contribution. For instance, for field-level yield prediction, there are typically very few labels available, which is a limitation for most supervised learning methods. In addition, SSL could reduce the effort required to record new labels when adapting to a new region or to a year with different climatic conditions. In fact, plant morphology varies across climates, and plant growth and spectral response may change from year to year due to climatic variations or agricultural practices. Models trained on specific imagery and crop types may therefore not transfer optimally to new regions or future years (Belgiu and Csillik, 2018). From a machine learning perspective, this is framed as domain adaptation. In this context, transfer learning has solved similar problems by reusing learned knowledge: other data sources such as ImageNet are used to pre-train a neural network, which is then transferred to a downstream task (Nowakowski et al., 2021; Lucas et al., 2021). Castillo-Navarro et al. (2022) experimented with out-of-distribution data and confirmed the power of self-supervised learning with very few labels. Nyborg et al. (2022) used Thermal Positional Encoding (TPE) for attention-based crop classifiers to classify crop types in different regions in Europe and, most importantly, to reduce the effects of climate on the spectral responses of plants. Orynbaikyzy et al. (2022) identified appropriate Sentinel-1 and Sentinel-2 features to fit a new target region. Reducing the features by eliminating the weather-dependent bands likewise improved the results in our experiments. Another initial application of SSL in the agricultural sector outperformed deep learning approaches with a limited number of labels (Güldenring and Nalpantidis, 2021). Agastya et al. (2021) successfully applied self-supervised learning for irrigation detection. These promising approaches can save considerable time and reduce the need for labels.
In this study, SimSiam was applied with and without augmentation. In one experiment, we omitted augmentation and instead used our existing labels to learn an invariant representation per crop type. Data from previous years already include several variations of time series for each crop type, covering not only small climatological but also soil-related differences. This provides an additional advantage because, unlike conventional augmentation of 1D time series, it reduces the risk of shifting a sample toward another crop type. Dwibedi et al. (2021) followed a similar hypothesis and added nearest neighbors from the data set as additional positive pairs, assuming that more similar variations would be found this way.
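The idea of replacing augmentation with label-based positive pairs can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the function name, the random choice of a partner, and the toy crop labels are all hypothetical.

```python
import random
from collections import defaultdict


def label_based_positive_pairs(labels, seed=0):
    """Pair each sample with a different sample of the same crop type.

    labels: list of crop type names, one per time series
    Returns a list of (anchor_index, positive_index) tuples.
    Samples whose crop type occurs only once are skipped.
    """
    rng = random.Random(seed)

    # Group sample indices by crop type.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    pairs = []
    for idx, label in enumerate(labels):
        candidates = [j for j in by_label[label] if j != idx]
        if candidates:
            pairs.append((idx, rng.choice(candidates)))
    return pairs


# Hypothetical toy labels for four fields from earlier years.
toy_labels = ["wheat", "wheat", "maize", "wheat"]
pairs = label_based_positive_pairs(toy_labels)
```

Each pair would then take the place of the two augmented views in the siamese network, so that the natural variation between fields and years serves as the source of invariance.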
The objectives of this research were as follows:
- A comparison of supervised learning with self-supervised learning.
- Experiments with and without augmentation to assess its impact on SSL performance.
- An analysis of the extent to which SSL is suitable for predicting unknown or deviating years. We assume that this also paves the way for the prediction of crop types in new regions.
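The third objective implies an evaluation in which one year is held out and only a few of its labels, or none, are made available. A possible way to build such a split is sketched below; the record layout (`year`, `crop`, `series`), the function name, and the toy values are assumptions for illustration and do not describe the actual data pipeline.

```python
import random


def split_by_year(records, target_year, labels_per_class=0, seed=0):
    """Hold out one year and keep at most a few of its labels per crop type.

    records: list of dicts with keys "year", "crop", "series" (hypothetical layout)
    Returns (source, few_shot, test):
      source:   all records from the other years
      few_shot: up to labels_per_class labeled records per crop from the target year
      test:     the remaining target-year records
    """
    rng = random.Random(seed)
    source = [r for r in records if r["year"] != target_year]
    target = [r for r in records if r["year"] == target_year]
    rng.shuffle(target)

    few_shot, test, taken = [], [], {}
    for r in target:
        if taken.get(r["crop"], 0) < labels_per_class:
            few_shot.append(r)
            taken[r["crop"]] = taken.get(r["crop"], 0) + 1
        else:
            test.append(r)
    return source, few_shot, test


# Toy records with made-up values.
records = [
    {"year": 2016, "crop": "wheat", "series": [0.2, 0.6]},
    {"year": 2017, "crop": "maize", "series": [0.1, 0.7]},
    {"year": 2018, "crop": "wheat", "series": [0.2, 0.4]},
    {"year": 2018, "crop": "wheat", "series": [0.3, 0.5]},
    {"year": 2018, "crop": "maize", "series": [0.1, 0.5]},
    {"year": 2018, "crop": "maize", "series": [0.2, 0.6]},
]
source, few_shot, test = split_by_year(records, 2018, labels_per_class=1)
```

Setting `labels_per_class=0` corresponds to the case without any labels from the held-out year, in which all target-year records end up in the test set.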