# BigEarthNet Deep Learning Models
This repository contains code to use the BigEarthNet archive for deep learning applications.
If you use the BigEarthNet archive, please cite the paper below:
> G. Sumbul, M. Charfuelan, B. Demir, V. Markl, BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding, IEEE International Geoscience and Remote Sensing Symposium, pp. 5901-5904, Yokohama, Japan, 2019.
```
@inproceedings{BigEarthNet,
author = {Gencer Sumbul and Marcela Charfuelan and Begüm Demir and Volker Markl},
title = {BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding},
booktitle={IEEE International Geoscience and Remote Sensing Symposium},
year = {2019},
pages = {5901--5904},
doi={10.1109/IGARSS.2019.8900532},
month={July}
}
```
# Pre-trained Deep Learning Models on BigEarthNet
We provide code and model weights for the following deep learning models, pre-trained on BigEarthNet for scene classification with multi-labels associated with the Level-3 class nomenclature of CLC 2018:
| Model Names | Pre-Trained TensorFlow Models | F<sub>1</sub> Score |
| ------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
| K-Branch CNN | [http://bigearth.net/static/pretrained-models/original_labels/K-BranchCNN.zip](http://bigearth.net/static/pretrained-models/original_labels/K-BranchCNN.zip)| 74.35% |
| VGG16 | [http://bigearth.net/static/pretrained-models/original_labels/VGG16.zip](http://bigearth.net/static/pretrained-models/original_labels/VGG16.zip) | 78.18% |
| VGG19 | [http://bigearth.net/static/pretrained-models/original_labels/VGG19.zip](http://bigearth.net/static/pretrained-models/original_labels/VGG19.zip) | 77.30% |
| ResNet50 | [http://bigearth.net/static/pretrained-models/original_labels/ResNet50.zip](http://bigearth.net/static/pretrained-models/original_labels/ResNet50.zip) | 75.00% |
| ResNet101 | [http://bigearth.net/static/pretrained-models/original_labels/ResNet101.zip](http://bigearth.net/static/pretrained-models/original_labels/ResNet101.zip) | 72.76% |
| ResNet152 | [http://bigearth.net/static/pretrained-models/original_labels/ResNet152.zip](http://bigearth.net/static/pretrained-models/original_labels/ResNet152.zip) | 75.86% |
The results reported in the [BigEarthNet paper](http://bigearth.net/static/documents/BigEarthNet_IGARSS_2019.pdf) differ from those given above due to the selection of different train, validation and test sets.
The TensorFlow code for these models can be found [here](https://gitlab.tu-berlin.de/rsim/bigearthnet-models-tf).
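To use one of the pre-trained models, download and extract the corresponding archive from the table above. A minimal sketch in Python (the choice of ResNet50 and the target directory `pretrained/ResNet50` are only illustrative):
```
import os
import zipfile
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve           # Python 2

# URL taken from the table above; any of the listed archives works the same way.
url = 'http://bigearth.net/static/pretrained-models/original_labels/ResNet50.zip'
out_dir = 'pretrained/ResNet50'  # illustrative target directory

if not os.path.exists(out_dir):
    os.makedirs(out_dir)
urlretrieve(url, 'ResNet50.zip')
with zipfile.ZipFile('ResNet50.zip') as archive:
    archive.extractall(out_dir)
```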
# Generation of Training/Test/Validation Splits
After downloading the raw images from https://www.bigearth.net, you need to prepare them for your ML application. We provide the script `prep_splits.py` for this purpose. It generates consumable data files (i.e., TFRecord files) for the training, validation and test splits, ready to be used with TensorFlow. Suggested splits are provided as CSV files under the `splits` folder (an example invocation is given after the argument list below). The following command-line arguments can be passed to `prep_splits.py`:
* `-r` or `--root_folder`: The root folder containing the raw images you have previously downloaded.
* `-o` or `--out_folder`: The output folder where the resulting files will be created.
* `-n` or `--splits`: A list of CSV files, each of which contains the patch names of the corresponding split.
To run the script, either the GDAL or the rasterio package must be installed, together with TensorFlow. The script was tested with Python 2.7, TensorFlow 1.3 and Ubuntu 16.04.
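For example, assuming the raw patches were extracted to `./BigEarthNet-v1.0` and the TFRecord files should be written to `./tfrecords` (both paths, as well as the CSV file names, are only illustrative), the script could be invoked as:
```
python prep_splits.py -r ./BigEarthNet-v1.0 -o ./tfrecords -n splits/train.csv splits/val.csv splits/test.csv
```
Each resulting `.tfrecord` file stores one `tf.train.Example` per patch, holding the twelve band arrays, the original label names, the multi-hot label vector and the patch name as features (see `prep_splits.py`). A minimal sketch for inspecting a generated file with the TensorFlow 1.x API used by the script (the file name `train.tfrecord` is an assumption):
```
import tensorflow as tf

# Iterate over the serialized examples in a generated TFRecord file.
for record in tf.python_io.tf_record_iterator('train.tfrecord'):
    example = tf.train.Example.FromString(record)
    features = example.features.feature
    patch_name = features['patch_name'].bytes_list.value[0].decode('utf-8')
    multi_hot = list(features['original_labels_multi_hot'].int64_list.value)
    print(patch_name, multi_hot)
    break  # only inspect the first patch
```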
# Bugs and Requests
If you encounter a bug or have a feature request, please create an issue at:
https://gitlab.tubit.tu-berlin.de/rsim/bigearthnet-models/issues
# Authors
**Gencer Sümbül**
http://www.user.tu-berlin.de/gencersumbul/
**Tristan Kreuziger**
https://www.rsim.tu-berlin.de/menue/team/tristan_kreuziger/
# License
The BigEarthNet Archive is licensed under the **Community Data License Agreement – Permissive, Version 1.0** ([Text](https://cdla.io/permissive-1-0/)).
The code in this repository to facilitate the use of the archive is licensed under the **MIT License**:
```
MIT License
Copyright (c) 2019 The BigEarthNet Authors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```
{
"original_labels":{
"Continuous urban fabric": 0,
"Discontinuous urban fabric": 1,
"Industrial or commercial units": 2,
"Road and rail networks and associated land": 3,
"Port areas": 4,
"Airports": 5,
"Mineral extraction sites": 6,
"Dump sites": 7,
"Construction sites": 8,
"Green urban areas": 9,
"Sport and leisure facilities": 10,
"Non-irrigated arable land": 11,
"Permanently irrigated land": 12,
"Rice fields": 13,
"Vineyards": 14,
"Fruit trees and berry plantations": 15,
"Olive groves": 16,
"Pastures": 17,
"Annual crops associated with permanent crops": 18,
"Complex cultivation patterns": 19,
"Land principally occupied by agriculture, with significant areas of natural vegetation": 20,
"Agro-forestry areas": 21,
"Broad-leaved forest": 22,
"Coniferous forest": 23,
"Mixed forest": 24,
"Natural grassland": 25,
"Moors and heathland": 26,
"Sclerophyllous vegetation": 27,
"Transitional woodland/shrub": 28,
"Beaches, dunes, sands": 29,
"Bare rock": 30,
"Sparsely vegetated areas": 31,
"Burnt areas": 32,
"Inland marshes": 33,
"Peatbogs": 34,
"Salt marshes": 35,
"Salines": 36,
"Intertidal flats": 37,
"Water courses": 38,
"Water bodies": 39,
"Coastal lagoons": 40,
"Estuaries": 41,
"Sea and ocean": 42
}
}
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# This script creates splits with TFRecord files from BigEarthNet
# image patches based on csv files that contain patch names.
#
# prep_splits.py --help can be used to learn how to use this script.
#
# Author: Gencer Sumbul, http://www.user.tu-berlin.de/gencersumbul/
# Email: gencer.suembuel@tu-berlin.de
# Date: 16 Dec 2019
# Version: 1.0.1
# Usage: prep_splits.py [-h] [-r ROOT_FOLDER] [-o OUT_FOLDER]
# [-n PATCH_NAMES [PATCH_NAMES ...]]
from __future__ import print_function
import argparse
import os
import csv
import json
# Spectral band names to read related GeoTIFF files
band_names = ['B01', 'B02', 'B03', 'B04', 'B05',
              'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12']
GDAL_EXISTED = False
RASTERIO_EXISTED = False
with open('label_indices.json', 'rb') as f:
    label_indices = json.load(f)
def prep_example(bands, original_labels, original_labels_multi_hot, patch_name):
    return tf.train.Example(
        features=tf.train.Features(
            feature={
                'B01': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B01']))),
                'B02': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B02']))),
                'B03': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B03']))),
                'B04': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B04']))),
                'B05': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B05']))),
                'B06': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B06']))),
                'B07': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B07']))),
                'B08': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B08']))),
                'B8A': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B8A']))),
                'B09': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B09']))),
                'B11': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B11']))),
                'B12': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B12']))),
                'original_labels': tf.train.Feature(
                    bytes_list=tf.train.BytesList(
                        value=[i.encode('utf-8') for i in original_labels])),
                'original_labels_multi_hot': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=original_labels_multi_hot)),
                'patch_name': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[patch_name.encode('utf-8')]))
            }))
def create_split(root_folder, patch_names, TFRecord_writer):
    progress_bar = tf.contrib.keras.utils.Progbar(target=len(patch_names))
    for patch_idx, patch_name in enumerate(patch_names):
        patch_folder_path = os.path.join(root_folder, patch_name)
        bands = {}
        for band_name in band_names:
            # First finds the related GeoTIFF path and reads values as an array
            band_path = os.path.join(
                patch_folder_path, patch_name + '_' + band_name + '.tif')
            if GDAL_EXISTED:
                band_ds = gdal.Open(band_path, gdal.GA_ReadOnly)
                raster_band = band_ds.GetRasterBand(1)
                band_data = raster_band.ReadAsArray()
                bands[band_name] = np.array(band_data)
            elif RASTERIO_EXISTED:
                band_ds = rasterio.open(band_path)
                band_data = np.array(band_ds.read(1))
                bands[band_name] = np.array(band_data)
        original_labels_multi_hot = np.zeros(
            len(label_indices['original_labels'].keys()), dtype=int)
        patch_json_path = os.path.join(
            patch_folder_path, patch_name + '_labels_metadata.json')
        with open(patch_json_path, 'rb') as f:
            patch_json = json.load(f)
        original_labels = patch_json['labels']
        for label in original_labels:
            original_labels_multi_hot[label_indices['original_labels'][label]] = 1
        example = prep_example(
            bands,
            original_labels,
            original_labels_multi_hot,
            patch_name
        )
        TFRecord_writer.write(example.SerializeToString())
        progress_bar.update(patch_idx)
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='This script creates TFRecord files for the BigEarthNet train, validation and test splits')
    parser.add_argument('-r', '--root_folder', dest='root_folder',
                        help='root folder path containing multiple patch folders')
    parser.add_argument('-o', '--out_folder', dest='out_folder',
                        help='folder path where the resulting TFRecord files will be created')
    parser.add_argument('-n', '--splits', dest='splits', nargs='+',
                        help='csv files each of which contains a list of patch names')
    args = parser.parse_args()

    # Checks the existence of the root folder of patch folders
    if args.root_folder:
        if not os.path.exists(args.root_folder):
            print('ERROR: folder', args.root_folder, 'does not exist')
            exit()
    else:
        print('ERROR: the root folder argument (-r / --root_folder) is required')
        exit()

    # Checks the existence of required python packages
    try:
        import gdal
        GDAL_EXISTED = True
        print('INFO: GDAL package will be used to read GeoTIFF files')
    except ImportError:
        try:
            import rasterio
            RASTERIO_EXISTED = True
            print('INFO: rasterio package will be used to read GeoTIFF files')
        except ImportError:
            print('ERROR: please install either GDAL or rasterio package to read GeoTIFF files')
            exit()

    try:
        import tensorflow as tf
    except ImportError:
        print('ERROR: please install tensorflow package to create TFRecord files')
        exit()

    try:
        import numpy as np
    except ImportError:
        print('ERROR: please install numpy package')
        exit()

    if args.splits:
        # Reads the patch names of each split from the given csv files
        try:
            patch_names_list = []
            split_names = []
            for csv_file in args.splits:
                patch_names_list.append([])
                split_names.append(os.path.basename(csv_file).split('.')[0])
                with open(csv_file, 'r') as fp:
                    csv_reader = csv.reader(fp, delimiter=',')
                    for row in csv_reader:
                        patch_names_list[-1].append(row[0].strip())
        except:
            print('ERROR: some csv files either do not exist or have been corrupted')
            exit()

        # Creates one TFRecord writer per split
        try:
            writer_list = []
            for split_name in split_names:
                writer_list.append(
                    tf.python_io.TFRecordWriter(os.path.join(
                        args.out_folder, split_name + '.tfrecord'))
                )
        except:
            print('ERROR: TFRecord writer is not able to write files')
            exit()

        for split_idx in range(len(patch_names_list)):
            print('INFO: creating the split of', split_names[split_idx], 'is started')
            create_split(
                args.root_folder,
                patch_names_list[split_idx],
                writer_list[split_idx]
            )
            writer_list[split_idx].close()
This folder contains the suggested training, validation, and test splits. Each CSV file contains the patch names of the corresponding split. These files are passed to the `prep_splits.py` script via its `-n` (`--splits`) argument.