3  Copyright 2026 - Osmar Yupanqui & Marvin Quispe

Authors
Affiliation

Osmar Yupanqui

Conservación Amazónica - ACCA

Marvin Quispe

Conservación Amazónica - ACCA

Show code
# Conservación Amazónica - ACCA
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

4 Selective Logging Detection with Deep Learning and Very-High Resolution Imagery

4.0.0.1 Authors: Osmar Yupanqui and Marvin Quispe


4.1 Part 2: Modeling

This notebook shows the workflow used to train a deep learning model with the patches saved in the previous notebook (Notebook No. 1). We will load our TFRecords, train a model with them, and save it for later use (inference). The notebook was developed and configured specifically to run on Google Colab, so its implementation is optimized for that platform. In addition, it is necessary to enable GPU computing by changing the runtime type: navigate to Edit → Notebook Settings and select GPU from the Hardware Accelerator options.
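To confirm that the accelerator is active, you can ask TensorFlow which devices it sees (a quick sanity check; on a CPU runtime the list is simply empty):

Show code
```python
import tensorflow as tf

# List the accelerators visible to TensorFlow. On a correctly configured
# Colab GPU runtime this returns at least one PhysicalDevice; on a CPU
# runtime it returns an empty list.
gpus = tf.config.list_physical_devices('GPU')
print("GPUs available:", gpus)
```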

4.2 1. Data collection and initial settings

We are going to connect Google Drive with Google Colab to manage the exported data (patches) from the previous Notebook.

Show code
from google.colab import drive
drive.mount('/content/drive')

Download the dataset that has been staged on Zenodo: Yupanqui Carrasco, O., & Quispe Sedano, M. J. (2026). Selective Logging Detection with Deep Learning and Very-High Resolution Imagery_TF_records [Data set]. Zenodo. https://doi.org/10.5281/zenodo.19614389

Show code
import os
import zipfile
import requests

# =========================
# CONFIG
# =========================
use_drive = True  # True = save to Google Drive, False = save to /content

if use_drive:
    from google.colab import drive
    drive.mount('/content/drive')
    base_path = "/content/drive/MyDrive/DL_Book"
else:
    base_path = "/content/DL_Book"

# Zenodo file URL (direct download)
url = "https://zenodo.org/records/19614389/files/dataset.zip?download=1"

zip_path = os.path.join(base_path, "dataset.zip")

# =========================
# SETUP DIRECTORY
# =========================
os.makedirs(base_path, exist_ok=True)

# =========================
# DOWNLOAD FILE
# =========================
print("Downloading dataset...")
response = requests.get(url, stream=True)
response.raise_for_status()  # Fail fast on a bad HTTP status

with open(zip_path, "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)

print(f"Downloaded to: {zip_path}")

# =========================
# UNZIP FILE
# =========================
print("Extracting dataset...")
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(base_path)

print(f"Extracted to: {base_path}")

# =========================
# (OPTIONAL) DELETE ZIP
# =========================
os.remove(zip_path)
print("Zip file removed.")

Let’s see the contents of our Drive folder using the Linux command ls (prefixed with ! so it runs in the Colab shell). Note: this command only works on Google Colab.

Show code
!ls -F /content/drive/MyDrive/DL_Book

Our folder should contain several files whose names begin with:

- train
- testing
- validation

Those files are the patches exported from the previous notebook. If you do not wish to run the previous notebook, you can create a copy from the original folder. Click here to go to the Drive folder. Make sure to modify the paths to match the current notebook.

We will also import some libraries that will contribute to the training of the model. This notebook was built and tested on Google Colab, so the library versions correspond to the latest available on that platform at the time of writing (e.g. TensorFlow 2.19.0). Check the latest versions of TensorFlow here.

Show code
import tensorflow as tf
print(tf.__version__)

NumPy is another important library; it allows us to perform operations on tensors, matrices, and vectors.

Show code
import numpy as np

We will also import Matplotlib for data visualization.

Show code
import matplotlib.pyplot as plt

Finally, we will also import os, to read files and manipulate paths.

Show code
import os

4.3 2. Data visualization

We will start by viewing a single TFRecord file. If we explore the folder where the files were downloaded, we can see that they are compressed in the .gz format; TensorFlow can read this format directly, without the need to decompress the files first.
We will create a variable called record, wrapping the path of an exported file in a tf.data.TFRecordDataset() container. Please verify that the export path matches yours; in the previous example we exported our data to the DL_Book folder in Google Drive.

Show code
record = tf.data.TFRecordDataset('/content/drive/MyDrive/DL_Book/testing_0.tfrecord.gz', compression_type = 'GZIP') # This loads our TFRecord into a TFRecordDataset container
print(record)

We can see that the TFRecordDataset does not expose much information: we cannot know the shape of the tensors (even though we know from the previous notebook that they are 128 by 128), the dtype (data type), nor the number of elements our TFRecordDataset contains.
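Before defining the parsing dictionary, you can also peek inside a raw record to confirm which feature keys it actually contains, by decoding one serialized example with tf.train.Example. The sketch below writes a tiny synthetic record so it runs anywhere; to inspect the real patches, point the path at your own testing_0.tfrecord.gz instead.

Show code
```python
import os
import tempfile

import tensorflow as tf

# Write a tiny synthetic TFRecord so the sketch runs anywhere; replace
# `path` with e.g. /content/drive/MyDrive/DL_Book/testing_0.tfrecord.gz
# to inspect the real patches.
path = os.path.join(tempfile.mkdtemp(), 'example.tfrecord.gz')
feature = {'b1': tf.train.Feature(float_list=tf.train.FloatList(value=[0.5]))}
example = tf.train.Example(features=tf.train.Features(feature=feature))
options = tf.io.TFRecordOptions(compression_type='GZIP')
with tf.io.TFRecordWriter(path, options) as writer:
    writer.write(example.SerializeToString())

# Decode the first serialized example and list its feature keys; on the
# real files this prints the exported band names and the label band.
record = tf.data.TFRecordDataset(path, compression_type='GZIP')
for raw in record.take(1):
    decoded = tf.train.Example.FromString(raw.numpy())
    print(sorted(decoded.features.feature.keys()))  # ['b1']
```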

Because TFRecords are binary storage files, we must transform them into a readable structure. To do this, we will create a dictionary containing the bands we exported in the previous notebook, assigning each band a shape and a data type with tf.io.FixedLenFeature().

Show code
features = {
    'b1': tf.io.FixedLenFeature([128, 128], tf.float32),
    'b2': tf.io.FixedLenFeature([128, 128], tf.float32),
    'b3': tf.io.FixedLenFeature([128, 128], tf.float32),
    'b4': tf.io.FixedLenFeature([128, 128], tf.float32),
    'class': tf.io.FixedLenFeature([128, 128], tf.float32),
}

We will create a function to read a serialized example into the structure defined by the dictionary created above.

Show code
def parse_tfrecord(example_proto):
    return tf.io.parse_single_example(example_proto, features) # This parses each TFRecord to a defined structure

We will apply the newly created function to our TFRecordDataset using the .map() function, which allows iterating through each element of our TFRecordDataset.

Show code
serialized = record.map(parse_tfrecord) # With .map() we loop through each element of our TFRecordDataset
print(serialized)

We can see that the newly created object named serialized has a different form than the original TFRecordDataset: it now has the structure defined by our dictionary. To display the numerical values of each band, we will use the function get_single_element().

Show code
element = serialized.get_single_element() # With .get_single_element() we retrieve an individual element from our TFRecordDataset
print(element)

The result is a dictionary, from which we can extract each band using get() followed by the key, which in this case is the name of the exported band.

Show code
b1 = element.get('b1')
b2 = element.get('b2')
b3 = element.get('b3')
b4 = element.get('b4')
label = element.get('class')

We can now view the numerical values of any exported band, and convert each tensor to a NumPy array with the numpy() method.
Next we will create an “image” that we can visualize, by stacking the values of the extracted bands into a single tensor.

Show code
img = tf.constant([b3.numpy(), b2.numpy(), b1.numpy()]) # This appends each band to a constant tensor
print(img)

If we check the shape of the tensor, we see that it is (3, 128, 128); in order to plot it we must transpose it to (128, 128, 3) using tf.transpose(). In addition, since Matplotlib expects float values between 0 and 1 (or integers between 0 and 255), we will scale our image (tensor) with tf.divide().

Show code
imgRGB = tf.transpose(tf.divide(img, 12000), [1, 2, 0])

Now we will plot the results using the matplotlib library. We will be able to contrast how the logging activity is visualized with a SkySat (0.5 m) image (on the left), and the hand-digitized label (on the right).

Show code
fig, axs = plt.subplots(1, 2, figsize = (15, 15))

axs[0].set_title('RGB Image')
axs[0].imshow(imgRGB)

axs[1].set_title('Label')
axs[1].imshow(label.numpy())

plt.show()

This plot compares the exported SkySat satellite image patch with the hand-digitized label. Performing the manual digitization process takes many hours, especially when the logging activity is intense. Deep Learning models greatly speed up this manual process and allow the user to focus on evaluating the output.

4.4 3. Data processing

4.4.1 3.1 Definition of variables

Before processing the data, some variables will be defined for the training:

- kernel_size contains the size of the patch, from which we build its shape.
- patch_path refers to the folder containing the previously exported patches.
- buffer_size is the number of records held in memory while shuffling. For example, if our data contains 100 records and buffer_size is 10, shuffle will select a random element from the first 10 records, and its slot in the buffer will be filled by the 11th record, and so on. Since our dataset contains fewer than 250 files, and we want the whole dataset shuffled randomly, we will set this parameter to 1000 (it could be higher).
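The effect of buffer_size can be seen on a toy dataset (a standalone sketch, unrelated to our patches): with a buffer smaller than the dataset, the first element drawn can only come from the front of the stream, while a buffer at least as large as the dataset yields a fully random permutation.

Show code
```python
import tensorflow as tf

data = tf.data.Dataset.range(100)

# Buffer of 10: the first element drawn is sampled from the first
# 10 records only.
small = list(data.shuffle(10, seed=0).as_numpy_iterator())
print("first draw, buffer_size=10:", small[0])   # always one of 0..9

# Buffer >= dataset size: any of the 100 records can appear first.
full = list(data.shuffle(1000, seed=0).as_numpy_iterator())
print("first draw, buffer_size=1000:", full[0])
```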

The model also needs hyperparameters; choosing them can be time-consuming and experimental. Some of the most important are described below:

- batch_size: the number of examples used in one pass through the network. A large batch size can lead to faster training but may result in lower accuracy and overfitting, while a small batch size can provide better accuracy but is more time-consuming. We will set our batch size to 4 (because this is a small dataset).
- epochs: the number of training cycles through all of the samples in the training dataset, i.e. how many times the model will see the entire training data before training completes. We will set this value to 10.
- learning_rate: controls how much the model’s parameters are adjusted during each training step. This will be set to 0.1.

Show code
bands = ['b1', 'b2', 'b3', 'b4']
response = ['class']
features = bands + response

kernel_size = 128
kernel_shape = [kernel_size, kernel_size]

columns = [
  tf.io.FixedLenFeature(shape = kernel_shape, dtype = tf.float32) for k in features
]

# Path to the folder with patches (Benchmark dataset)
patch_path = '/content/drive/MyDrive/DL_Book'
train_path = f'{patch_path}/train*'
val_path   = f'{patch_path}/validation*'
test_path  = f'{patch_path}/testing*'

features_dict = dict(zip(features, columns))

train_size = len([f for f in os.listdir(patch_path) if f.startswith('train')])
val_size = len([f for f in os.listdir(patch_path) if f.startswith('validation')])
test_size = len([f for f in os.listdir(patch_path) if f.startswith('testing')])

# Hyperparameters
batch_size = 4
epochs = 10
buffer_size = 1000
learning_rate = 0.1

4.4.2 3.2 Create a TFRecordDataset

A TFRecordDataset is a collection of TFRecords; it is very useful for storing large amounts of data and is optimized to save memory.
First we will create a function called parse_tfrecord(), which performs the same role as in the data visualization above: transforming the structure of each TFRecord.

Show code
def parse_tfrecord(example_proto):
    return tf.io.parse_single_example(example_proto, features_dict)

Second, a function called to_tuple() will be created, which converts the tensor dictionary into a tuple with the scheme (inputs, outputs).

Show code
def to_tuple(inputs):
    inputsList = [inputs.get(key) for key in features]
    stacked = tf.stack(inputsList, axis = 0) # This stacks the input list to a single tensor
    stacked = tf.transpose(stacked, [1, 2, 0]) # Transposition of the stacked tensor to the shape 128x128x4
    return stacked[:, :, :len(bands)], stacked[:, :, len(bands):]

Third, we will create a function to read the tensors; it will also apply the functions created previously, parse_tfrecord() and to_tuple().

Show code
def get_dataset(pattern):
    glob = tf.io.gfile.glob(pattern) # This reads the files in our path
    dataset = tf.data.TFRecordDataset(glob, compression_type = 'GZIP') # Add the files to a TFRecordDataset container
    dataset = dataset.map(parse_tfrecord, num_parallel_calls = 5)
    dataset = dataset.map(to_tuple, num_parallel_calls = 5)
    return dataset

We will create three TFRecordDatasets containing our training, validation, and test data, respectively. To do this we will wrap the steps above in a helper function; inside it, an object called glob holds the file paths matching each dataset’s pattern.

Show code
def get_patch_from_path(path, batch_size, buffer_size = 1000, shuffle = False, repeat = False):
    dataset = get_dataset(path + '*')
    dataset = dataset.shuffle(buffer_size, seed = 42) if shuffle else dataset # We will shuffle only the training dataset
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat() if repeat else dataset
    return dataset

training_data = get_patch_from_path(train_path, batch_size = batch_size, buffer_size = buffer_size, shuffle = True, repeat = True)
validation_data = get_patch_from_path(val_path, batch_size = 1, repeat = True)
testing_data = get_patch_from_path(test_path, batch_size = 1)

If we print the three datasets, we will see that the training and validation data are of type RepeatDataset, that is, pipelines that yield batches indefinitely; based on the previous functions, the training data is also shuffled randomly, while the testing dataset is a regular batched dataset with a batch size of 1.

Show code
print(training_data)
print(validation_data)
print(testing_data)
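A useful check at this point is that the pipeline yields the shapes the model will expect: images of (128, 128, 4) and labels of (128, 128, 1). The sketch below mirrors the stack/transpose logic of to_tuple() on synthetic tensors, so it runs without the real files.

Show code
```python
import tensorflow as tf

bands = ['b1', 'b2', 'b3', 'b4']
features = bands + ['class']

def to_tuple(inputs):
    stacked = tf.stack([inputs[key] for key in features], axis=0)  # (5, 128, 128)
    stacked = tf.transpose(stacked, [1, 2, 0])                     # (128, 128, 5)
    return stacked[:, :, :len(bands)], stacked[:, :, len(bands):]

# Fake parsed example: five 128x128 float tensors, as in the real pipeline.
fake = {key: tf.zeros([128, 128]) for key in features}
x, y = to_tuple(fake)
print(x.shape, y.shape)  # (128, 128, 4) (128, 128, 1)
```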

4.5 4. Deep Learning

4.5.1 4.1 Model construction

In this notebook we will use a modification of the U-Net architecture developed by Ronneberger et al. (2015). The U-Net architecture was initially designed for biomedical imaging but proved very efficient in the segmentation of satellite and drone images, and is currently one of the most widely used architectures for that purpose. It has demonstrated remarkable effectiveness in various environmental monitoring applications, including agricultural field boundary detection (John & Zhang, 2022), urban land use classification, and coastal change detection. Ulmas and Liiv (2020) further validated U-Net’s superiority in crop type mapping from multi-spectral satellite imagery, showing that its ability to preserve spatial resolution through the decoder path results in more precise segmentation boundaries than traditional convolutional neural networks or patch-based classification approaches. The architecture’s robustness to varying image resolutions and its capacity to learn from limited training data through effective data augmentation make it particularly well suited for remote sensing applications where high-quality labeled data may be scarce (Manos et al., 2022).

To implement the architecture we will use layers from tensorflow.keras. This modified version of U-Net has more encoding and decoding blocks than the original architecture, which increases the depth of the model and its capacity to learn. Unlike the original design, which starts and ends with 64 filters, this implementation starts and ends with 32 filters, slightly reducing the initial complexity before continuing with the standard progression. Below is a diagram of the modified U-Net architecture used.

U-Net modified

This modified architecture allows superior multi-scale feature extraction (due to the additional encoder and decoder levels). However, this comes at a cost: the network has many more parameters than the standard U-Net, which negatively impacts training and inference times.

Show code
def conv_block(input_tensor, num_filters):
    encoder = tf.keras.layers.Conv2D(num_filters, (3, 3), padding='same')(input_tensor)
    encoder = tf.keras.layers.BatchNormalization()(encoder)
    encoder = tf.keras.layers.Activation('relu')(encoder)
    encoder = tf.keras.layers.Conv2D(num_filters, (3, 3), padding='same')(encoder)
    encoder = tf.keras.layers.BatchNormalization()(encoder)
    encoder = tf.keras.layers.Activation('relu')(encoder)
    return encoder

def encoder_block(input_tensor, num_filters):
    encoder = conv_block(input_tensor, num_filters)
    encoder_pool = tf.keras.layers.MaxPooling2D((2, 2), strides=(2, 2))(encoder)
    return encoder_pool, encoder

def decoder_block(input_tensor, concat_tensor, num_filters):
    decoder = tf.keras.layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)
    decoder = tf.keras.layers.concatenate([concat_tensor, decoder], axis=-1)
    decoder = tf.keras.layers.BatchNormalization()(decoder)
    decoder = tf.keras.layers.Activation('relu')(decoder)
    decoder = tf.keras.layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)
    decoder = tf.keras.layers.BatchNormalization()(decoder)
    decoder = tf.keras.layers.Activation('relu')(decoder)
    decoder = tf.keras.layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)
    decoder = tf.keras.layers.BatchNormalization()(decoder)
    decoder = tf.keras.layers.Activation('relu')(decoder)
    return decoder

def get_model():
    inputs = tf.keras.Input(shape=[kernel_size, kernel_size, len(bands)])
    encoder0_pool, encoder0 = encoder_block(inputs, 32)
    encoder1_pool, encoder1 = encoder_block(encoder0_pool, 64)
    encoder2_pool, encoder2 = encoder_block(encoder1_pool, 128)
    encoder3_pool, encoder3 = encoder_block(encoder2_pool, 256)
    encoder4_pool, encoder4 = encoder_block(encoder3_pool, 512)
    center = conv_block(encoder4_pool, 1024)
    decoder4 = decoder_block(center, encoder4, 512)
    decoder3 = decoder_block(decoder4, encoder3, 256)
    decoder2 = decoder_block(decoder3, encoder2, 128)
    decoder1 = decoder_block(decoder2, encoder1, 64)
    decoder0 = decoder_block(decoder1, encoder0, 32)
    outputs = tf.keras.layers.Conv2D(1, (1, 1), activation='sigmoid')(decoder0)

    model = tf.keras.models.Model(inputs=[inputs], outputs=[outputs])
    return model

Now we will define the model using the get_model() function created above, and compile it.

The output of our model is a single channel with values between 0 and 1, because we used a sigmoid activation function. Therefore, we will use the BinaryCrossentropy loss function, which is ideal for binary classification. In addition, we will use the Adam optimizer, which is robust on both large and noisy datasets and is considered computationally efficient. Finally, we will use IoU (intersection over union) as a metric, since it is well suited to binary segmentation: it evaluates the overlap between the predicted and the actual values.
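To build intuition for these choices, both the loss and the metric can be evaluated on small hand-made tensors (a toy illustration, unrelated to our data; note that here we score class 1, whereas the compiled model below tracks class 0):

Show code
```python
import tensorflow as tf

y_true = tf.constant([[1.0], [1.0], [0.0], [0.0]])
y_pred = tf.constant([[0.9], [0.6], [0.2], [0.1]])

# Binary cross-entropy penalizes confident wrong probabilities heavily.
bce = tf.keras.losses.BinaryCrossentropy()
print("BCE:", float(bce(y_true, y_pred)))

# IoU compares hard (thresholded) predictions against the labels.
iou = tf.keras.metrics.IoU(num_classes=2, target_class_ids=[1])
iou.update_state(y_true, tf.round(y_pred))
print("IoU (class 1):", float(iou.result()))  # 1.0: all four pixels match
```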

Show code
model = get_model()
model.compile(
    optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate),
    loss = tf.keras.losses.BinaryCrossentropy(),
    metrics = [tf.keras.metrics.IoU(num_classes = 2, target_class_ids = [0] , name = 'iou')])

Once the model is compiled, we can see the layers it has using the .summary() function

Show code
model.summary()

4.5.2 4.2 Model training

Once the model is defined, we will proceed to train it, adjusting it to the already processed data set, using the .fit() function. Feel free to change the parameters and use the GPU environment!

Show code
history = model.fit(
    x = training_data,
    epochs = epochs,
    batch_size = batch_size,
    verbose = 1,
    steps_per_epoch = int(train_size / batch_size),
    validation_data = validation_data,
    validation_steps = val_size)

We can plot the training results by epoch using matplotlib.

Show code
# Metrics graph
fig, ax = plt.subplots(nrows = 2, sharex = True, figsize = (15,10))

ax[0].plot(history.history['loss'], color = '#1f77b4', label = 'Training Loss')
ax[0].plot(history.history['val_loss'], linestyle = ':', marker = 'o', markersize = 3, color = '#1f77b4', label = 'Validation Loss')
ax[0].set_ylabel('Loss')
ax[0].set_ylim(0.0, 1)
ax[0].legend()

ax[1].plot(history.history['iou'], color = '#E5D31F', label = 'Training IoU')
ax[1].plot(history.history['val_iou'], linestyle = ':', marker = 'o', markersize = 3, color = '#E5D31F', label = 'Validation IoU')
ax[1].set_ylabel('IoU')
ax[1].legend(loc="lower right")

ax[1].set_xticks(history.epoch)
ax[1].set_xticklabels(range(1, len(history.epoch) + 1, 1))
ax[1].set_xlabel('Epoch')
ax[1].set_ylim(0.0, 1)

plt.legend();

The training loss exhibits a clear downward trend, decreasing from approximately 0.08 to around 0.04 over the 10 epochs, indicating that the model is effectively learning and reducing error on the training dataset. The training IoU remains stable at about 0.97 throughout the process, while the validation IoU is consistently slightly higher at approximately 0.975, suggesting good generalization performance.

Although the validation loss shows some instability, particularly a very high value during the first epoch followed by fluctuations in subsequent epochs, it quickly stabilizes to low values, which may be attributed to initialization effects or data scaling issues rather than persistent model misfit. Overall, the model demonstrates strong and consistent segmentation performance.

Given these results, extending the training beyond 10 epochs may yield only marginal improvements. Considering the relatively small dataset, additional training could increase the risk of overfitting without significantly enhancing model performance.
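If you do experiment with longer training, one common safeguard against overfitting is an early-stopping callback that halts once the validation loss stops improving. A minimal sketch (the patience value of 3 is a generic default, not something tuned for this dataset):

Show code
```python
import tensorflow as tf

# Stop when val_loss has not improved for 3 consecutive epochs, and
# restore the weights from the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True)

# This would be passed to model.fit(..., callbacks=[early_stop])
# alongside the arguments used above.
```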

4.5.3 4.3 Model evaluation

Now we will use the test dataset, reserved for this moment. We can get the total metrics for the entire test data set.

Show code
evaluation = model.evaluate(
    x = testing_data,
    verbose = 1,
    steps = test_size
)

We can also apply the model to the test dataset and plot the results.

Show code
prediction = model.predict(
    x = testing_data,
    verbose = 1,
    steps = test_size
)

By running the following code block we can visualize 3 random images from our test data set, and the model predictions.

Show code
np.random.seed(42)
n_images = 3

listTest = np.random.choice(test_size, size = n_images, replace = False)

print(listTest)

fig, axs = plt.subplots(n_images, 2, figsize = (15, 15))

for i in range(len(listTest)):
    imgTest = tf.math.divide(tf.squeeze(list(testing_data)[listTest[i]][0]), 12000)[:, :, 0:3].numpy()
    imgPred = prediction[listTest[i]].reshape(kernel_size, kernel_size)

    axs[i, 0].set_title('RGB Test Image ' + str(listTest[i]))
    axs[i, 0].imshow(imgTest)
    axs[i, 1].set_title('Prediction ' + str(listTest[i]))
    axs[i, 1].imshow(imgPred, cmap='coolwarm')

plt.subplots_adjust(wspace = -0.3, hspace = 0.4)
plt.show()

Despite being trained on a small dataset for only 10 epochs, the model produces high-quality segmentations with well-defined boundaries and contiguous regions, validating both the architecture choice and the training approach for this VHR satellite imagery application. These results show that effective selective logging monitoring is achievable without massive datasets or extensive computational power: this modified U-Net architecture offers a strong combination of precision, data efficiency, and real-world effectiveness.

4.5.4 4.4 Save the model

Once we have an adequate model based on the metrics used, we will proceed to save it.
To do this we will use the tf.keras.models.save_model() function, specifying the path where the model will be saved.

Show code
# If necessary
!mkdir -p /content/drive/MyDrive/DL_Book/model/
Show code
modelDir = '/content/drive/MyDrive/DL_Book/model/logging_model.keras'
tf.keras.models.save_model(model, modelDir)
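As an optional sanity check (the next chapter loads the model properly for inference), you can reload a saved .keras file and confirm the restored network reproduces the in-memory model’s output. The sketch below uses a tiny stand-in model and a temporary path so it runs anywhere:

Show code
```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny stand-in network; the real logging model is saved the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 4)),
    tf.keras.layers.Conv2D(1, (1, 1), activation='sigmoid')])

path = os.path.join(tempfile.mkdtemp(), 'model.keras')
tf.keras.models.save_model(model, path)
restored = tf.keras.models.load_model(path)

# The restored model should reproduce the original outputs exactly.
x = np.random.rand(1, 8, 8, 4).astype('float32')
print(np.allclose(model.predict(x, verbose=0),
                  restored.predict(x, verbose=0)))  # True
```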

In the next chapter, we will load our model again and apply it to the whole image!