Shashank Suhas / seminar-breakout · Commits

Commit 3476cb43, authored Aug 31, 2018 by Yuxin Wu
update docs
parent 45d63caf

Showing 12 changed files with 51 additions and 26 deletions (+51, -26)
.github/ISSUE_TEMPLATE/feature-requests.md   +4 -2
docs/tutorial/dataflow.md                    +5 -6
docs/tutorial/inference.md                   +5 -4
docs/tutorial/save-load.md                   +1 -1
docs/tutorial/symbolic.md                    +2 -1
examples/FasterRCNN/NOTES.md                 +5 -4
examples/FasterRCNN/coco.py                  +3 -1
examples/FasterRCNN/config.py                +4 -1
examples/FasterRCNN/data.py                  +1 -1
examples/FasterRCNN/train.py                 +9 -3
examples/GAN/GAN.py                          +1 -1
tensorpack/callbacks/misc.py                 +11 -1
.github/ISSUE_TEMPLATE/feature-requests.md

@@ -8,5 +8,7 @@ about: Suggest an idea for Tensorpack
   (See http://tensorpack.readthedocs.io/en/latest/tutorial/index.html#extend-tensorpack).
   It does not have to be added to Tensorpack unless you have a good reason.
+
 "Could you improve/implement an example/paper ?"
-- The answer is: we have no plans to do so. We don't take feature requests for examples or implement a paper for you. If you don't know how to do it, you may ask a usage question.
+- The answer is: we have no plans to do so. We don't consider feature requests for examples or implement a paper for you, unless it demonstrates some Tensorpack features not yet demonstrated in the existing examples. If you don't know how to do it, you may ask a usage question.
docs/tutorial/dataflow.md

@@ -14,7 +14,7 @@ that yields datapoints (lists) of two components:
 a numpy array of shape (64, 28, 28), and an array of shape (64,).
-As you saw, DataFlow is __independent__ of TensorFlow since it produces any python objects
+As you saw, DataFlow is __independent of TensorFlow__ since it produces any python objects
 (usually numpy arrays).
 To `import tensorpack.dataflow`, you don't even have to install TensorFlow.
 You can simply use DataFlow as a data processing pipeline and plug it into any other frameworks.

@@ -24,7 +24,7 @@ You can simply use DataFlow as a data processing pipeline and plug it into any other frameworks.
 One good thing about having a standard interface is to be able to provide
 the greatest code reusability.
 There are a lot of existing DataFlow utilities in tensorpack, which you can use to compose
-complex DataFlow with a long data pipeline. A common pipeline usually
+DataFlow with complex data pipeline. A common pipeline usually
 would __read from disk (or other sources), apply transformations, group into batches,
 prefetch data__, etc. A simple example is as the following:

@@ -38,13 +38,12 @@ df = BatchData(df, 128)
 # start 3 processes to run the dataflow in parallel
 df = PrefetchDataZMQ(df, 3)
 ```
-You can find more complicated DataFlow in the [ResNet training script](../examples/ResNet/imagenet_utils.py)
+You can find more complicated DataFlow in the [ImageNet training script](../examples/ImageNetModels/imagenet_utils.py)
 with all the data preprocessing.
 Unless you are working with standard data types (image folders, LMDB, etc),
 you would usually want to write the source DataFlow (`MyDataFlow` in the above example) for your data format.
 See [another tutorial](extend/dataflow.html) for simple instructions on writing a DataFlow.
 Once you have the source reader, all the [existing DataFlows](../modules/dataflow.html) are ready for you to complete
 the rest of the data pipeline.

@@ -62,7 +61,7 @@ Nevertheless, tensorpack supports data loading with native TF operators / TF datasets as well.
 ### Use DataFlow (outside Tensorpack)
 Normally, tensorpack `InputSource` interface links DataFlow to the graph for training.
-If you use DataFlow in some custom code, call `reset_state()` first to initialize it,
+If you use DataFlow in other places such as your custom code, call `reset_state()` first to initialize it,
 and then use the generator however you like:

 ```python
 df = SomeDataFlow()
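The pattern this hunk describes (compose a pipeline, then drive it outside tensorpack) looks roughly as follows. This is a minimal sketch, not code from this commit: `DataFromList` is used here only as a stand-in for the `SomeDataFlow` / `MyDataFlow` placeholders in the tutorial.

```python
from tensorpack.dataflow import DataFromList, BatchData, PrefetchDataZMQ

# A toy source DataFlow built from a python list, standing in for `MyDataFlow`.
df = DataFromList([[i] for i in range(1000)], shuffle=True)
df = BatchData(df, 128)        # group datapoints into batches of 128
df = PrefetchDataZMQ(df, 3)    # run the pipeline in 3 parallel processes

df.reset_state()               # must be called once before iterating
for datapoint in df.get_data():
    pass                       # `datapoint` is a list of python objects / numpy arrays
```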
docs/tutorial/inference.md

@@ -11,7 +11,7 @@ There are two ways to do inference during training.
    See [Write a Callback](extend/callback.html).
 2. If your inference follows the paradigm of:
-   "fetch some tensors for each input, and aggregate the results".
+   "evaluate some tensors for each input, and aggregate the results in the end".
    You can use the `InferenceRunner` interface with some `Inferencer`.
    This will further support prefetch & data-parallel inference.
    More details to come.
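To make the `InferenceRunner` + `Inferencer` pattern concrete, a hedged sketch: the validation dataflow `dataset_val` and the tensor names are placeholders you would define in your own model, not part of this commit.

```python
from tensorpack import InferenceRunner, ScalarStats, ClassificationError

callbacks = [
    InferenceRunner(
        dataset_val,                                # a DataFlow over the validation set (assumed defined)
        [ScalarStats('cross_entropy_loss'),         # averages this scalar tensor over all inputs
         ClassificationError('incorrect_vector')],  # aggregates a 0/1 error tensor into an error rate
    ),
]
```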
@@ -22,18 +22,19 @@ You can use this predicate to choose a different code path in inference mode.

 ## Inference After Training

 Tensorpack is a training interface -- __it doesn't care what happened after training__.
-You have everything needed for inference or model diagnosis after
+You have everything you need for inference or model diagnosis after
 training:

 1. The trained weights: tensorpack saves them in standard TF checkpoint format.
 2. The model: you've already written it yourself with TF symbolic functions.

-Therefore, you can build the graph for inference, load the checkpoint, and then use whatever deployment methods TensorFlow supports.
+Therefore, you can build the graph for inference, load the checkpoint, and apply
+any processing or deployment TensorFlow supports.
 And you'll need to read TF docs and __do it on your own__.

 ### Don't Use Training Metagraph for Inference

 Metagraph is the wrong abstraction for a "model".
-It stores the entire graph which contains not only the model, but also all the
+It stores the entire graph which contains not only the mathematical model, but also all the
 training settings (queues, iterators, summaries, evaluations, multi-gpu replications).
 Therefore it is usually wrong to import a training metagraph for inference.
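The "build the graph for inference, load the checkpoint" recipe above is commonly done with tensorpack's predictor classes. A minimal sketch; the model class, checkpoint path, and tensor names are assumptions, not from this commit.

```python
from tensorpack.predict import PredictConfig, OfflinePredictor
from tensorpack.tfutils.sessinit import get_model_loader

pred_config = PredictConfig(
    model=MyModel(),                                          # the ModelDesc you wrote for training (assumed)
    session_init=get_model_loader('train_log/model-360000'),  # restore the trained weights
    input_names=['input'],                                    # input tensor names in your graph
    output_names=['prob'])                                    # tensors to fetch

predictor = OfflinePredictor(pred_config)   # builds a fresh graph for inference
prob, = predictor(input_batch)              # `input_batch`: a numpy array (assumed)
```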
docs/tutorial/save-load.md

@@ -39,5 +39,5 @@ Variables that appear in only one side will be printed as warning.

 ## Transfer Learning

 Therefore, transfer learning is trivial.
-If you want to load some model, just use the same variable names.
+If you want to load a pre-trained model, just use the same variable names.
 If you want to re-train some layer, just rename it.
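A hedged sketch of that recipe: restore a pre-trained checkpoint through `session_init` and let variable-name matching do the rest. The model class, dataflow, and checkpoint file here are placeholders, not part of this commit.

```python
from tensorpack import TrainConfig, QueueInput, SimpleTrainer, launch_train_with_config
from tensorpack.tfutils.sessinit import get_model_loader

config = TrainConfig(
    model=MyModel(),               # reuses the pre-trained variable names; renamed layers train from scratch
    data=QueueInput(my_dataflow),  # your training DataFlow (assumed defined)
    session_init=get_model_loader('ImageNet-ResNet50.npz'),  # variables in only one side are merely warned about
    callbacks=[],
    max_epoch=10,
)
launch_train_with_config(config, SimpleTrainer())
```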
docs/tutorial/symbolic.md

@@ -2,7 +2,8 @@
 # Symbolic Layers

 Tensorpack contains a small collection of common model primitives,
-such as conv/deconv, fc, bn, pooling layers.
+such as conv/deconv, fc, bn, pooling layers.
+**You do not need to learn them.**
 These layers were written only because there were no alternatives when
 tensorpack was first developed.
 Nowadays, these implementation actually call `tf.layers` directly.
examples/FasterRCNN/NOTES.md

@@ -46,12 +46,13 @@ Model:

 Speed:

-1. The training will start very slow due to convolution warmup, until about 10k
-   steps to reach a maximum speed.
+1. The training will start very slowly due to convolution warmup, until about
+   10k steps (or more if scale augmentation is used) to reach a maximum speed.
+   As a result, the ETA is also inaccurate at the beginning.
    You can disable warmup by `export TF_CUDNN_USE_AUTOTUNE=0`, which makes the
    training faster at the beginning, but perhaps not in the end.

-1. After warmup the training speed will slowly decrease due to more accurate proposals.
+1. After warmup, the training speed will slowly decrease due to more accurate proposals.

 1. This implementation is about 10% slower than detectron,
    probably due to the lack of specialized ops (e.g. AffineChannel, ROIAlign) in TensorFlow.

@@ -62,7 +63,7 @@ Speed:

 Possible Future Enhancements:

-1. Define an interface to load custom dataset.
+1. Define a better interface to load custom dataset.

 1. Support batch>1 per GPU.
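If you prefer not to export the variable in your shell, the same switch can be set from Python; a small sketch, not part of this commit, assuming it runs before TensorFlow executes any convolution.

```python
import os

# Disable cuDNN autotune, i.e. the "convolution warmup" described in the note above.
# Set this at the very top of the training script, before any conv op runs.
os.environ['TF_CUDNN_USE_AUTOTUNE'] = '0'
```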
examples/FasterRCNN/coco.py

@@ -24,7 +24,9 @@ class _COCOMeta(object):
         'val2014': 'val2014',
         'valminusminival2014': 'val2014',
         'minival2014': 'val2014',
-        'test2014': 'test2014'
+        'test2014': 'test2014',
+        'train2017': 'train2017',
+        'val2017': 'val2017',
     }

     def valid(self):
examples/FasterRCNN/config.py

@@ -88,7 +88,9 @@ _C.BACKBONE.FREEZE_AT = 2  # options: 0, 1, 2
 # See https://github.com/tensorflow/tensorflow/issues/18213
 # In tensorpack model zoo, ResNet models with TF_PAD_MODE=False are marked with "-AlignPadding".
-# All other models under `ResNet/` in the model zoo are trained with TF_PAD_MODE=True.
+# All other models under `ResNet/` in the model zoo are using TF_PAD_MODE=True.
+# Using either one should probably give the same performance.
+# We use the "AlignPadding" one just to be consistent with caffe2.
 _C.BACKBONE.TF_PAD_MODE = False
 _C.BACKBONE.STRIDE_1X1 = False  # True for MSRA models

@@ -101,6 +103,7 @@ _C.TRAIN.STEPS_PER_EPOCH = 500
 # LR_SCHEDULE means "steps" only when total batch size is 8.
 # Otherwise the actual steps to decrease learning rate are computed from the schedule.
 # Therefore, there is *no need* to modify the config if you only change the number of GPUs.
+# LR_SCHEDULE = [120000, 160000, 180000]      # "1x" schedule in detectron
 _C.TRAIN.LR_SCHEDULE = [240000, 320000, 360000]      # "2x" schedule in detectron
 _C.TRAIN.NUM_EVALS = 20  # number of evaluations to run during training
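The comment about GPU count can be made concrete with the rescaling that train.py (later in this commit) applies. A worked sketch, assuming 4 GPUs for illustration:

```python
# LR_SCHEDULE is expressed in steps at total batch size 8 (1 image/GPU on 8 GPUs).
# With a different GPU count, train.py rescales it by factor = 8 / NUM_GPUS and
# converts steps into epochs of STEPS_PER_EPOCH steps each.
LR_SCHEDULE = [240000, 320000, 360000]   # the "2x" schedule above
STEPS_PER_EPOCH = 500
NUM_GPUS = 4                             # assumed, for illustration

factor = 8.0 / NUM_GPUS                  # = 2.0
decay_epochs = [int(s * factor // STEPS_PER_EPOCH) for s in LR_SCHEDULE[:-1]]
max_epoch = int(LR_SCHEDULE[-1] * factor // STEPS_PER_EPOCH)
print(decay_epochs, max_epoch)           # [960, 1280] 1440 -> same total images seen as the 8-GPU run
```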
examples/FasterRCNN/data.py

@@ -292,7 +292,7 @@ def get_train_dataflow():
         class: numpy array of k integers
         is_crowd: k booleans. Use k False if you don't know what it means.
         segmentation: k lists of numpy arrays (one for each box).
-            Each list of numpy array corresponds to the mask for one instance.
+            Each list of numpy arrays corresponds to the mask for one instance.
             Each numpy array in the list is a polygon of shape Nx2,
             because one mask can be represented by N polygons.
examples/FasterRCNN/train.py

@@ -417,6 +417,8 @@ class EvalCallback(Callback):
             self.dataflows = [get_eval_dataflow(shard=k, num_shards=self.num_predictor)
                               for k in range(self.num_predictor)]
         else:
+            if hvd.size() > hvd.local_size():
+                logger.warn("Distributed evaluation with horovod is unstable. Sometimes MPI hangs for unknown reasons.")
             self.predictor = self._build_coco_predictor(0)
             self.dataflow = get_eval_dataflow(shard=hvd.rank(), num_shards=hvd.size())

@@ -495,7 +497,7 @@ if __name__ == '__main__':
     if get_tf_version_tuple() < (1, 6):
         # https://github.com/tensorflow/tensorflow/issues/14657
-        logger.warn("TF<1.6 has a bug which may lead to crash in FasterRCNN training if you're unlucky.")
+        logger.warn("TF<1.6 has a bug which may lead to crash in FasterRCNN if you're unlucky.")

     args = parser.parse_args()
     if args.config:

@@ -540,7 +542,7 @@ if __name__ == '__main__':
         init_lr = cfg.TRAIN.BASE_LR * 0.33 * min(8. / cfg.TRAIN.NUM_GPUS, 1.)
         warmup_schedule = [(0, init_lr), (cfg.TRAIN.WARMUP, cfg.TRAIN.BASE_LR)]
         warmup_end_epoch = cfg.TRAIN.WARMUP * 1. / stepnum
-        lr_schedule = [(int(np.ceil(warmup_end_epoch)), cfg.TRAIN.BASE_LR)]
+        lr_schedule = [(int(warmup_end_epoch + 0.5), cfg.TRAIN.BASE_LR)]
         factor = 8. / cfg.TRAIN.NUM_GPUS
         for idx, steps in enumerate(cfg.TRAIN.LR_SCHEDULE[:-1]):

@@ -549,6 +551,10 @@ if __name__ == '__main__':
                 (steps * factor // stepnum, cfg.TRAIN.BASE_LR * mult))
         logger.info("Warm Up Schedule (steps, value): " + str(warmup_schedule))
         logger.info("LR Schedule (epochs, value): " + str(lr_schedule))
+        train_dataflow = get_train_dataflow()
+        # This is what's commonly referred to as "epochs"
+        total_passes = cfg.TRAIN.LR_SCHEDULE[-1] * 8 / train_dataflow.size()
+        logger.info("Total passes of the training set is: {}".format(total_passes))

         callbacks = [
             PeriodicCallback(

@@ -573,7 +579,7 @@ if __name__ == '__main__':
         traincfg = TrainConfig(
             model=MODEL,
-            data=QueueInput(get_train_dataflow()),
+            data=QueueInput(train_dataflow),
             callbacks=callbacks,
             steps_per_epoch=stepnum,
             max_epoch=cfg.TRAIN.LR_SCHEDULE[-1] * factor // stepnum,
examples/GAN/GAN.py

@@ -170,7 +170,7 @@ class MultiGPUGANTrainer(TowerTrainer):
             list(range(num_gpu)),
             lambda: self.tower_func(*input.get_input_tensors()),
             devices)
-        # Simply average the cost here. It might be faster to average the gradients
+        # For simplicity, average the cost here. It might be faster to average the gradients
         with tf.name_scope('optimize'):
             d_loss = tf.add_n([x[0] for x in cost_list]) * (1.0 / num_gpu)
             g_loss = tf.add_n([x[1] for x in cost_list]) * (1.0 / num_gpu)
tensorpack/callbacks/misc.py

@@ -35,10 +35,20 @@ class InjectShell(Callback):
     """
     Allow users to create a specific file as a signal to pause
     and iteratively debug the training.
-    Once triggered, it detects whether the file exists, and opens an
+    Once the :meth:`trigger` method is called, it detects whether the file exists, and opens an
     IPython/pdb shell if yes.
     In the shell, `self` is this callback, `self.trainer` is the trainer, and
     from that you can access everything else.
+
+    Example:
+
+    .. code-block:: python
+
+        callbacks=[InjectShell('/path/to/pause-training.tmp'), ...]
+
+        # the following command will pause the training when the epoch finishes:
+        $ touch /path/to/pause-training.tmp
     """

     def __init__(self, file='INJECT_SHELL.tmp', shell='ipython'):