Reduce inference time for BERT models using neural architecture search and SageMaker Automated Model Tuning

By Admin 19/01/2024

AWS Machine Learning Blog: In this post, we demonstrate how to use neural architecture search (NAS) based structural pruning to compress a fine-tuned BERT model, improving model performance and reducing inference times. Pre-trained language models (PLMs) are undergoing rapid commercial and enterprise adoption in the areas of productivity tools, customer service, search and recommendations, […]