Commit 1e2d0208 by Gencer

# BigEarthNet Deep Models
This repository contains code to use the BigEarthNet archive for deep learning applications.
# Citations
For the original release and introduction of the BigEarthNet archive, please cite the paper below:
> G. Sumbul, M. Charfuelan, B. Demir, V. Markl, BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding, IEEE International Conference on Geoscience and Remote Sensing Symposium, pp. 5901-5904, Yokohama, Japan, 2019.
```
@inproceedings{BigEarthNet,
    author = {Gencer Sumbul and Marcela Charfuelan and Begüm Demir and Volker Markl},
    title = {BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding},
    booktitle = {IEEE International Geoscience and Remote Sensing Symposium},
    year = {2019},
    month = {July},
    pages = {5901--5904},
    doi = {10.1109/IGARSS.2019.8900532}
}
```
# Class Labels
There are two sets of class labels available:
* Multi-labels associated to the Level-3 class nomenclature of CLC 2018
* Compact multi-labels associated to the rearranged class nomenclature of CLC 2018 (Coming Soon!)

Please find an elaborate discussion of the characteristics of the class labels in the paper cited above.
# Pre-trained Deep Learning Models on BigEarthNet
We provide code and models for the following deep learning models that have been pre-trained on BigEarthNet for scene classification.
Deep learning models pre-trained on BigEarthNet with multi-labels associated to the Level-3 class nomenclature of CLC 2018 (i.e., original labels):
| Model Name | Pre-Trained TensorFlow Models with Original Labels |
| ------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| K-Branch CNN | [http://bigearth.net/static/pretrained-models/original_labels/K-BranchCNN.zip](http://bigearth.net/static/pretrained-models/original_labels/K-BranchCNN.zip)|
| VGG16 | [http://bigearth.net/static/pretrained-models/original_labels/VGG16.zip](http://bigearth.net/static/pretrained-models/original_labels/VGG16.zip) |
| VGG19 | [http://bigearth.net/static/pretrained-models/original_labels/VGG19.zip](http://bigearth.net/static/pretrained-models/original_labels/VGG19.zip) |
| ResNet50 | [http://bigearth.net/static/pretrained-models/original_labels/ResNet50.zip](http://bigearth.net/static/pretrained-models/original_labels/ResNet50.zip) |
| ResNet101 | [http://bigearth.net/static/pretrained-models/original_labels/ResNet101.zip](http://bigearth.net/static/pretrained-models/original_labels/ResNet101.zip) |
| ResNet152 | [http://bigearth.net/static/pretrained-models/original_labels/ResNet152.zip](http://bigearth.net/static/pretrained-models/original_labels/ResNet152.zip) |
Deep learning models pre-trained on BigEarthNet with compact multi-labels associated to the rearranged class nomenclature of CLC 2018 (i.e., compact labels):
Coming Soon!
The TensorFlow code for these models can be found [here](https://gitlab.tu-berlin.de/rsim/bigearthnet-models-tf).
# Generation of Training/Test/Validation Splits
After downloading the raw images from https://www.bigearth.net, they need to be prepared for your ML application. We provide the script `prep_splits.py` for this purpose. It generates consumable data files (i.e., TFRecord) for training, validation and test splits which are suitable to use with TensorFlow. Suggested splits can be found with corresponding csv files under `splits` folder. The following command line arguments for `prep_splits.py` can be specified:
* `-r` or `--root_folder`: The root folder containing the raw images you have previously downloaded.
* `-o` or `--out_folder`: The output folder where the resulting files will be created.
* `-n` or `--splits`: A list of CSV files, each of which contains the patch names of the corresponding split.
Run `python prep_splits.py -h` to see all available parameters.
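Each split file is a plain CSV with one patch name per row, which the script reads into per-split lists. A minimal sketch of that parsing step (the patch names below are hypothetical placeholders, not real BigEarthNet patch names):

```python
import csv
import io

# A hypothetical miniature split file: one patch name per row,
# mirroring the suggested CSVs under the `splits` folder.
csv_text = "patch_a\npatch_b\npatch_c\n"

patch_names = [row[0] for row in csv.reader(io.StringIO(csv_text))]
print(patch_names)  # -> ['patch_a', 'patch_b', 'patch_c']
```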
# Bugs and Requests
If you face a bug or have a feature request, please create an issue:
https://gitlab.tubit.tu-berlin.de/rsim/bigearthnet-models/issues
# Authors
**Gencer Sümbül**
http://www.user.tu-berlin.de/gencersumbul/
**Tristan Kreuziger**
https://www.rsim.tu-berlin.de/menue/team/tristan_kreuziger/
# Training Own Models From Scratch
Please check [`train.py`](https://gitlab.tu-berlin.de/rsim/bigearthnet-models-tf/blob/master/train.py) to see how the models can be trained from scratch with BigEarthNet.
# License
The BigEarthNet Archive is licensed under the **Community Data License Agreement – Permissive, Version 1.0** ([Text](https://cdla.io/permissive-1-0/)).
The label names and their indices are stored in `label_indices.json`, which `prep_splits.py` reads:
{
"original_labels":{
"Continuous urban fabric": 0,
"Discontinuous urban fabric": 1,
"Industrial or commercial units": 2,
...
"Coastal lagoons": 40,
"Estuaries": 41,
"Sea and ocean": 42
},
"label_conversion": [
[0, 1],
[2],
[11, 12, 13],
[14, 15, 16, 18],
[17],
[19],
[20],
[21],
[22],
[23],
[24],
[25, 31],
[26, 27],
[28],
[29],
[33, 34],
[35, 36],
[38, 39],
[40, 41, 42]
],
"compact_labels":{
"Urban fabric": 0,
"Industrial or commercial units": 1,
"Arable land": 2,
"Permanent crops": 3,
"Pastures": 4,
"Complex cultivation patterns": 5,
"Land principally occupied by agriculture, with significant areas of natural vegetation": 6,
"Agro-forestry areas": 7,
"Broad-leaved forest": 8,
"Coniferous forest": 9,
"Mixed forest": 10,
"Natural grassland and sparsely vegetated areas": 11,
"Moors, heathland and sclerophyllous vegetation": 12,
"Transitional woodland, shrub": 13,
"Beaches, dunes, sands": 14,
"Inland wetlands": 15,
"Coastal wetlands": 16,
"Inland waters": 17,
"Marine waters": 18
}
}
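The `label_conversion` table above defines how the original Level-3 classes collapse into the 19 compact classes: compact class `i` is present whenever any of the original class indices listed in `label_conversion[i]` is present. A minimal sketch of that conversion, using only the first three rows of the table:

```python
import numpy as np

# First three rows of the label_conversion table shown above.
label_conversion = [[0, 1], [2], [11, 12, 13]]

def to_compact(original_multi_hot, label_conversion):
    # A compact class is set if any of its original classes is set.
    return np.array([int(original_multi_hot[idx].sum() > 0)
                     for idx in label_conversion])

original = np.zeros(43, dtype=int)
original[1] = 1   # "Discontinuous urban fabric" -> compact "Urban fabric"
original[12] = 1  # one of the classes merged into compact class 2
print(to_compact(original, label_conversion))  # -> [1 0 1]
```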
Excerpt from `prep_splits.py`:
# This script creates splits with TFRecord files from BigEarthNet
# image patches based on csv files that contain patch names.
#
# prep_splits.py --help can be used to learn how to use this script.
#
# Author: Gencer Sumbul, http://www.user.tu-berlin.de/gencersumbul/
# Email: gencer.suembuel@tu-berlin.de
# Date: 16 Dec 2019
# Version: 1.0.1
# Usage: prep_splits.py [-h] [-r ROOT_FOLDER] [-o OUT_FOLDER]
# [-n PATCH_NAMES [PATCH_NAMES ...]]
from __future__ import print_function
# ...
band_names = ['B01', 'B02', 'B03', 'B04', 'B05',
              'B06', 'B07', 'B08', 'B8A', 'B09', 'B11', 'B12']
GDAL_EXISTED = False
RASTERIO_EXISTED = False

with open('label_indices.json', 'rb') as f:
    label_indices = json.load(f)

def prep_example(bands, original_labels, original_labels_multi_hot, patch_name):
    return tf.train.Example(
        features=tf.train.Features(
            feature={
                # ... (entries for B01-B10 follow the same pattern)
                'B11': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B11']))),
                'B12': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=np.ravel(bands['B12']))),
                'original_labels': tf.train.Feature(
                    bytes_list=tf.train.BytesList(
                        value=[i.encode('utf-8') for i in original_labels])),
                'original_labels_multi_hot': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=original_labels_multi_hot)),
                'patch_name': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[patch_name.encode('utf-8')]))
            }))
def create_split(root_folder, patch_names, TFRecord_writer):
    # ... (per-patch and per-band reading loops elided)
            band_data = np.array(band_ds.read(1))
            bands[band_name] = np.array(band_data)

        original_labels_multi_hot = np.zeros(
            len(label_indices['original_labels'].keys()), dtype=int)
        patch_json_path = os.path.join(
            patch_folder_path, patch_name + '_labels_metadata.json')

        with open(patch_json_path, 'rb') as f:
            patch_json = json.load(f)

        original_labels = patch_json['labels']
        for label in original_labels:
            original_labels_multi_hot[label_indices['original_labels'][label]] = 1

        example = prep_example(
            bands,
            original_labels,
            original_labels_multi_hot,
            patch_name
        )
        TFRecord_writer.write(example.SerializeToString())
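The multi-hot encoding in `create_split` simply sets the bit whose index the label-name lookup returns. A standalone sketch of that step, using a hypothetical three-class miniature of `label_indices['original_labels']`:

```python
import numpy as np

# Hypothetical miniature version of label_indices['original_labels'].
original_labels_idx = {
    'Continuous urban fabric': 0,
    'Discontinuous urban fabric': 1,
    'Sea and ocean': 2,
}

patch_labels = ['Continuous urban fabric', 'Sea and ocean']
multi_hot = np.zeros(len(original_labels_idx), dtype=int)
for label in patch_labels:
    multi_hot[original_labels_idx[label]] = 1
print(multi_hot)  # -> [1 0 1]
```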
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description=
        'This script creates TFRecord files for the BigEarthNet train, validation and test splits')
    parser.add_argument('-r', '--root_folder', dest = 'root_folder',
                        help = 'root folder path that contains multiple patch folders')
    parser.add_argument('-o', '--out_folder', dest = 'out_folder',
                        help = 'folder path containing the resulting TFRecord files')
    parser.add_argument('-n', '--splits', dest = 'splits', nargs = '+',
                        help = 'csv files each of which contains a list of patch names')

    args = parser.parse_args()
    # ... (GDAL/rasterio import checks elided)
        print('ERROR: please install numpy package')
        exit()

    if args.splits:
        try:
            patch_names_list = []
            split_names = []
            for csv_file in args.splits:
                patch_names_list.append([])
                split_names.append(os.path.basename(csv_file).split('.')[0])
                with open(csv_file, 'r') as fp:
            # ... (patch-name reading and TFRecord writer creation elided)
        except:
            print('ERROR: TFRecord writer is not able to write files')
            exit()

        # Reads spectral bands of all patches whose folder names are populated before
        for split_idx in range(len(patch_names_list)):
            print('INFO: creating the split of', split_names[split_idx], 'is started')
            create_split(
The `splits` folder contains the suggested training, validation, and test splits. Each csv file includes the patch names of the corresponding split. They must be passed to the `prep_splits.py` script.