
PyText Notes





PyText is a modeling framework that helps researchers and engineers build end-to-end pipelines for training or inference.


PyText, built on PyTorch 1.0, is designed to achieve the following:

  1. Make experimentation with new modeling ideas as easy and as fast as possible.
  2. Make it easy to use pre-built models on new data with minimal extra work.
  3. Define a clear workflow for both researchers and engineers to build, evaluate, and ship their models to production with minimal overhead.
  4. Ensure high performance (low latency and high throughput) on deployed models at inference.


  • Everything in PyText is a component.
  • Task: combines the various components required for a training or inference task into a pipeline.
    For example, a document classification task can be configured as a JSON file that defines the parameters of all its child components.




  • PyText provides ways to customize the handling of raw data, the reporting of metrics, the training methodology, and the exporting of trained models.
  • PyText users are free to implement one or more of these components and can expect the entire pipeline to work out of the box.
  • A number of default pipelines are implemented for popular tasks which can be used as-is.
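
The idea of a Task assembled from a JSON config can be illustrated with a minimal sketch. This is not PyText's actual API: the registry, the `build` helper, and the `DataHandler`, `KeywordModel`, and `Task` classes below are hypothetical names chosen only to show how a config might name each child component and its parameters.

```python
import json

# Hypothetical registry: maps a component name used in the config to a class.
REGISTRY = {}


def register(cls):
    """Make a component class constructible by name from a config."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class DataHandler:
    """Toy data handler: turns raw text into a list of tokens."""

    def __init__(self, lowercase=True):
        self.lowercase = lowercase

    def featurize(self, text):
        return (text.lower() if self.lowercase else text).split()


@register
class KeywordModel:
    """Toy stand-in for a real model: predicts a label from keyword hits."""

    def __init__(self, keywords, default_label):
        self.keywords = keywords
        self.default_label = default_label

    def predict(self, tokens):
        for token in tokens:
            if token in self.keywords:
                return self.keywords[token]
        return self.default_label


def build(spec):
    """Instantiate one component from a spec shaped like {"ClassName": {params}}."""
    (name, params), = spec.items()
    return REGISTRY[name](**params)


class Task:
    """Combines the child components named in the config into one pipeline."""

    def __init__(self, config):
        self.data_handler = build(config["data_handler"])
        self.model = build(config["model"])

    def predict(self, text):
        return self.model.predict(self.data_handler.featurize(text))


CONFIG_JSON = """
{
  "data_handler": {"DataHandler": {"lowercase": true}},
  "model": {
    "KeywordModel": {
      "keywords": {"refund": "billing", "crash": "bug"},
      "default_label": "other"
    }
  }
}
"""

task = Task(json.loads(CONFIG_JSON))
print(task.predict("The app will CRASH on startup"))  # bug
```

Swapping in a different component then only requires registering a new class and changing its name in the config, which is the kind of out-of-the-box extensibility the bullets above describe.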


  1. Implement the model in PyText, and make sure offline metrics on the test set look good.
  2. Publish the model to the bundled PyTorch-based inference service, and do a real-time small scale evaluation on a live traffic sample.
  3. Export it automatically to a Caffe2 net. In some cases, e.g. when using complex control flow logic and custom data-structures, this might not yet be supported via PyTorch 1.0.
  4. If the procedure in step 3 isn't supported, use the PyTorch C++ API to rewrite the model (only the torch.nn.Module subclass) and wrap it in a Caffe2 operator.
  5. Publish the model to the production-grade Caffe2 prediction service and start serving live traffic.


  • Data pre-processing

    A featurization library preprocesses the raw input by performing tasks such as:

    • Text tokenization and normalization
    • Mapping characters to IDs for character-based models
    • Performing token alignments for gazetteer features

    The same library is invoked during both training and inference:

    • Training: raw data -> Data Handler (invokes the featurization library) -> Trainer -> Exporter
    • Inference: raw data -> Data Processor (invokes the same featurization library) -> Predictor (holds a model exported by an Exporter) -> Predictions
  • Vocabulary management
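
To make the shared pre-processing step concrete, here is a minimal sketch of a featurizer that both the training and the inference path could call. The `Featurizer` class and its methods are hypothetical and are not taken from PyText; the sketch only covers tokenization, normalization, and character-to-ID mapping, with ID 0 assumed to be reserved for unknown characters.

```python
class Featurizer:
    """Toy featurization library shared by training and inference."""

    def __init__(self):
        # Assumption for this sketch: ID 0 is reserved for unknown characters.
        self.char_to_id = {"<unk>": 0}

    def tokenize(self, text):
        # Normalization (trim and lowercase) followed by whitespace tokenization.
        return text.strip().lower().split()

    def fit(self, texts):
        # Vocabulary management: assign a new ID to each unseen character.
        for text in texts:
            for token in self.tokenize(text):
                for ch in token:
                    self.char_to_id.setdefault(ch, len(self.char_to_id))

    def featurize(self, text):
        tokens = self.tokenize(text)
        char_ids = [[self.char_to_id.get(ch, 0) for ch in tok] for tok in tokens]
        return {"tokens": tokens, "char_ids": char_ids}


featurizer = Featurizer()
featurizer.fit(["Book a flight", "Play music"])

# Training path: the data handler calls the featurizer on raw training rows.
train_features = featurizer.featurize("Book a flight")

# Inference path: the data processor calls the same featurizer on live input.
infer_features = featurizer.featurize("  book a FLIGHT ")

print(train_features == infer_features)  # True
```

Because both paths go through one featurizer, differently formatted raw input yields identical features, which helps avoid a mismatch between what the model saw in training and what it sees in production.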

Future Work

  • Modeling Capabilities
  • Performance Benchmarks and Improvements
    • Training speed
    • Inference speed
  • Model Interpretability
  • Model Robustness
  • Mobile Deployment Support

Other Introductory Articles

  • https://www.jiqizhixin.com/articles/2018-12-15-3?from=synced&keyword=pytext
  • https://blog.csdn.net/sinat_33455447/article/details/85064284


