PyText is a modeling framework that helps researchers and engineers build end-to-end pipelines for training or inference.
PyText, built on PyTorch 1.0, is designed to achieve the following:
components required for a training or inference task into a pipeline.
{
  "config": {
    "task": {
      "DocClassificationTask": {
        "data_handler": {
          "columns_to_read": [
            "doc_label",
            "text"
          ],
          "shuffle": true
        },
        "model": {
          "representation": {
            "BiLSTMPooling": {
              "pooling": {
                "SelfAttention": {
                  "attn_dimension": 128,
                  "dropout": 0.4
                }
              },
              "bidirectional": true,
              "dropout": 0.4,
              "lstm": {
                "lstm_dim": 200,
                "num_layers": 2
              }
            }
          },
          "output_config": {
            "loss": {
              "CrossEntropyLoss": {}
            }
          },
          "decoder": {
            "hidden_dims": [
              128
            ]
          }
        },
        "features": {
          "word_feat": {
            "embed_dim": 200,
            "pretrained_embeddings_path": "/tmp/embeds",
            "vocab_size": 250000,
            "vocab_from_train_data": true
          }
        },
        "trainer": {
          "random_seed": 0,
          "epochs": 15,
          "early_stop_after": 0,
          "log_interval": 1,
          "eval_interval": 1,
          "max_clip_norm": 5
        },
        "optimizer": {
          "type": "adam",
          "lr": 0.001,
          "weight_decay": 0.00001
        },
        "metric_reporter": {
          "output_path": "/tmp/test_out.txt"
        },
        "exporter": {}
      }
    }
  }
}
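One way to read a config of this shape: several nested objects hold a single key such as `DocClassificationTask`, `BiLSTMPooling` or `SelfAttention`, which appears to name the component to use, while the value under that key holds the component's parameters. The sketch below illustrates that reading in plain Python on a trimmed fragment of the config. It is not PyText's own config loader; the helper `select_component` is a hypothetical name introduced only for this example.

```python
import json

# A trimmed fragment of the task config; values are copied from the listing.
CONFIG_TEXT = """
{
  "config": {
    "task": {
      "DocClassificationTask": {
        "model": {
          "representation": {
            "BiLSTMPooling": {
              "pooling": {
                "SelfAttention": {"attn_dimension": 128, "dropout": 0.4}
              },
              "bidirectional": true,
              "dropout": 0.4,
              "lstm": {"lstm_dim": 200, "num_layers": 2}
            }
          }
        }
      }
    }
  }
}
"""


def select_component(node):
    """Split a single-key object into (component_name, params).

    Hypothetical helper: it assumes the sole key names the component type.
    """
    if not isinstance(node, dict) or len(node) != 1:
        raise ValueError("expected an object with exactly one key")
    ((name, params),) = node.items()
    return name, params


config = json.loads(CONFIG_TEXT)["config"]

# Walk down the nesting: task -> representation -> pooling.
task_name, task_params = select_component(config["task"])
rep_name, rep_params = select_component(task_params["model"]["representation"])
pool_name, pool_params = select_component(rep_params["pooling"])

print(task_name)
print(rep_name, rep_params["lstm"]["lstm_dim"])
print(pool_name, pool_params["attn_dimension"])
```

Under this reading, swapping the representation or pooling component would only require changing the key and its parameters in the config, leaving the rest of the task definition untouched.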
handling of raw data, reporting of metrics, training methodology and exporting of trained models.
This library, a featurization library, preprocesses the raw input by performing tasks like
- Training: raw data -> Data Handler (invokes a feature lib) -> Trainer -> Exporter
- Inference: raw data -> Data Processor (invokes the same feature lib) -> Predictor (holding a model exported by an exporter) -> Predictions
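The two flows above can be sketched with toy components to show why both sides invoke the same feature library: the predictor only sees inputs in the form the trainer saw. This is a minimal illustration under stated assumptions, not PyText code. The class names mirror the stages in the list, but their methods, the `featurize` function, and the token-counting "model" are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Featurizer = Callable[[str], List[str]]


def featurize(text: str) -> List[str]:
    """Hypothetical stand-in for the feature lib: lowercase + whitespace split."""
    return text.lower().split()


class DataHandler:
    """Training side: turns raw (label, text) rows into featurized examples."""

    def __init__(self, featurizer: Featurizer) -> None:
        self.featurizer = featurizer

    def process(self, rows: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
        return [(label, self.featurizer(text)) for label, text in rows]


class Trainer:
    """Toy trainer: counts how often each token appears under each label."""

    def train(self, examples: List[Tuple[str, List[str]]]) -> Dict[str, Counter]:
        counts: Dict[str, Counter] = defaultdict(Counter)
        for label, tokens in examples:
            counts[label].update(tokens)
        return dict(counts)


class Exporter:
    """Toy exporter: freezes the trained counts into plain dicts."""

    def export(self, model: Dict[str, Counter]) -> Dict[str, Dict[str, int]]:
        return {label: dict(counter) for label, counter in model.items()}


class DataProcessor:
    """Inference side: reuses the same featurizer as training."""

    def __init__(self, featurizer: Featurizer) -> None:
        self.featurizer = featurizer

    def process(self, text: str) -> List[str]:
        return self.featurizer(text)


class Predictor:
    """Holds an exported model and picks the label with the most token hits."""

    def __init__(self, exported: Dict[str, Dict[str, int]]) -> None:
        self.exported = exported

    def predict(self, tokens: List[str]) -> str:
        scores = {
            label: sum(counts.get(token, 0) for token in tokens)
            for label, counts in self.exported.items()
        }
        return max(scores, key=scores.get)


# Training: raw data -> Data Handler -> Trainer -> Exporter
train_rows = [("sports", "Great match today"), ("food", "Great pasta recipe")]
examples = DataHandler(featurize).process(train_rows)
exported = Exporter().export(Trainer().train(examples))

# Inference: raw data -> Data Processor -> Predictor -> Predictions
tokens = DataProcessor(featurize).process("PASTA recipe ideas")
prediction = Predictor(exported).predict(tokens)
print(prediction)  # food
```

Because `DataProcessor` lowercases the input exactly as `DataHandler` did, "PASTA" matches the "pasta" token seen in training; a different featurizer at inference time would silently miss it.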