
Guide to Using Hugging Face Models Offline & Deploying with Flask & FastAPI

Table of Contents

  1. How to Use Hugging Face Models Offline
  2. Advanced Usage of Hugging Face Models
  3. Deploy Hugging Face Model with Flask
  4. Deploy Hugging Face Model with FastAPI

How to Use Hugging Face Models Offline

Hugging Face provides state-of-the-art machine learning models, but sometimes you need to run these models offline. Here’s how to do that using Python.

Step 1: Install Necessary Libraries

First, install the Hugging Face transformers library along with torch (or tensorflow depending on the backend).

pip install transformers torch
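If you prefer the TensorFlow backend mentioned above, the equivalent install (an alternative, not used in the rest of this guide) would be:

pip install transformers tensorflow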

Step 2: Load the Pre-trained Model

You can use the transformers library to load a pre-trained model and tokenizer. For offline usage, you’ll need to download the model files first and cache them locally.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model from Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')

# Save the model to a local directory
model.save_pretrained('./my_local_model')
tokenizer.save_pretrained('./my_local_model')

Step 3: Load the Model Locally for Offline Use

Once the model is saved locally, you can load it without an internet connection by specifying the path to the model.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load locally saved model
tokenizer = AutoTokenizer.from_pretrained('./my_local_model')
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model')

# Test the model with some input
inputs = tokenizer("Hello, Hugging Face!", return_tensors="pt")
outputs = model(**inputs)
print(outputs)
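The printed outputs contain raw logits rather than a label. As a minimal sketch (assuming the default two-label classification head), you can convert the logits into probabilities and a predicted class:

import torch

# Turn logits into probabilities and pick the highest-scoring class
probabilities = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()
print(predicted_class, probabilities.tolist())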

Step 4: Running Model Inference Offline

Now you can run model inference without an internet connection, which makes this setup useful in production environments (for example, air-gapped or restricted networks) where offline access is required.
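If you want to guarantee that nothing reaches out to the Hugging Face Hub at load time, you can pass local_files_only=True when loading, or set the TRANSFORMERS_OFFLINE environment variable before importing transformers. A minimal sketch:

import os
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # block any Hub requests

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# local_files_only=True ensures the files are read from disk only
tokenizer = AutoTokenizer.from_pretrained('./my_local_model', local_files_only=True)
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model', local_files_only=True)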


Advanced Usage of Hugging Face Models

For advanced use cases such as fine-tuning models, customizing pipelines, or integrating multiple models, Hugging Face provides powerful tools. Here’s how to take it a step further.

Fine-tuning a Model

Hugging Face supports fine-tuning models on custom datasets through the Trainer API. Here’s an example that fine-tunes a BERT model on the IMDB dataset; the same pattern applies to your own data.

from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load dataset, tokenizer, and model
dataset = load_dataset('imdb')
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Tokenize the raw text so the Trainer receives model-ready inputs
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True)

dataset = dataset.map(tokenize, batched=True)

# Fine-tuning arguments
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    evaluation_strategy="epoch",     # evaluate after every epoch
    learning_rate=2e-5,              # learning rate
    per_device_train_batch_size=8,   # batch size for training
    per_device_eval_batch_size=8,    # batch size for evaluation
    num_train_epochs=3,              # number of training epochs
    weight_decay=0.01,               # strength of weight decay
)

trainer = Trainer(
    model=model,                         # model to fine-tune
    args=training_args,                  # training arguments
    train_dataset=dataset['train'],      # training dataset
    eval_dataset=dataset['test'],        # evaluation dataset
)

trainer.train()
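After training finishes, you will usually want to persist the fine-tuned weights so they can be loaded offline just like the base model above. A short sketch (the directory name is only an example):

# Save the fine-tuned model and tokenizer for later offline use
trainer.save_model('./my_finetuned_model')
tokenizer.save_pretrained('./my_finetuned_model')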

Using Hugging Face Pipelines for Easy Model Usage

The pipeline API in Hugging Face simplifies common tasks like text classification, summarization, translation, etc.

from transformers import pipeline

# Create a sentiment-analysis pipeline (downloads a default model from the Hub if none is specified)
nlp = pipeline("sentiment-analysis")

# Run the model inference
result = nlp("I love using Hugging Face!")
print(result)

The same interface covers many other tasks, including named entity recognition, question answering, summarization, and translation.
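Since this guide focuses on offline use, note that pipeline() also accepts a path to a locally saved model, so the same API works without a network connection. A minimal sketch using the model directory saved earlier:

from transformers import pipeline

# Point the pipeline at the locally saved model instead of the Hub
nlp = pipeline("text-classification", model="./my_local_model", tokenizer="./my_local_model")
print(nlp("Running entirely from local files."))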


Deploy Hugging Face Model with Flask

Flask is a lightweight framework for building web applications. Here’s how to deploy a Hugging Face model with Flask.

Step 1: Install Flask

Install Flask using pip:

pip install flask

Step 2: Create a Simple Flask App

Create a new Python file app.py and set up the Flask app.

from flask import Flask, request, jsonify
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

app = Flask(__name__)

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('./my_local_model')
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    text = data['text']

    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt")

    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)

    # Extract predicted class
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()

    return jsonify({"prediction": predicted_class})

if __name__ == "__main__":
    app.run(debug=True)

Step 3: Run the Flask App

Run your Flask app using the following command:

python app.py

Now your model is accessible via a REST API at http://127.0.0.1:5000/predict.
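Flask’s built-in development server (started here with debug=True) is not intended for production. One common option, shown purely as a sketch and not part of the original setup, is to serve the app with a WSGI server such as gunicorn:

pip install gunicorn
gunicorn -w 2 -b 127.0.0.1:5000 app:app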

Step 4: Test the API

You can test the API using a tool like curl or Postman. Here’s an example using curl:

curl -X POST -H "Content-Type: application/json" -d '{"text": "Hugging Face models are amazing!"}' http://127.0.0.1:5000/predict
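If you prefer to test from Python instead of curl, a minimal client sketch using the requests library (an extra dependency, not installed above) looks like this:

import requests

# Send a prediction request to the local Flask API
response = requests.post(
    "http://127.0.0.1:5000/predict",
    json={"text": "Hugging Face models are amazing!"},
)
print(response.json())  # e.g. {"prediction": 1}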

Deploy Hugging Face Model with FastAPI

FastAPI is a modern, fast web framework for building APIs with Python. It’s similar to Flask but offers higher performance (thanks to async support) and automatic interactive OpenAPI documentation.

Step 1: Install FastAPI and Uvicorn

Install the necessary libraries:

pip install fastapi uvicorn

Step 2: Create the FastAPI App

Create a new file app.py for the FastAPI app.

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

app = FastAPI()

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('./my_local_model')
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model')

class TextInput(BaseModel):
    text: str

@app.post("/predict")
async def predict(input: TextInput):
    text = input.text
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt")

    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)

    # Extract predicted class
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()

    return {"prediction": predicted_class}

Step 3: Run the FastAPI App

Run the FastAPI app using Uvicorn:

uvicorn app:app --reload

Now your model is accessible via a REST API at http://127.0.0.1:8000/predict.
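By default Uvicorn binds to 127.0.0.1 on port 8000. If you need to expose the service on your network or use a different port, you can pass the host and port flags explicitly, for example:

uvicorn app:app --host 0.0.0.0 --port 8000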

Step 4: Test the API

You can test the API using a POST request with the text you want to classify. Here’s an example using curl:

curl -X 'POST' 'http://127.0.0.1:8000/predict' -H 'Content-Type: application/json' -d '{"text": "FastAPI is awesome!"}'
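FastAPI also generates interactive API documentation automatically: open http://127.0.0.1:8000/docs in a browser to send test requests from the Swagger UI, or fetch the raw OpenAPI schema directly:

curl http://127.0.0.1:8000/openapi.json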
