Table of Contents
- How to Use Hugging Face Models Offline
- Advanced Usage of Hugging Face Models
- Deploy Hugging Face Model with Flask
- Deploy Hugging Face Model with FastAPI
How to Use Hugging Face Models Offline
Hugging Face provides state-of-the-art machine learning models, but sometimes you need to run these models offline. Here’s how to do that using Python.
Step 1: Install Necessary Libraries
First, install the Hugging Face transformers library along with torch (or tensorflow, depending on the backend you want to use).
pip install transformers torch
Step 2: Load the Pre-trained Model
You can use the transformers library to load a pre-trained model and tokenizer. For offline usage, you'll need to download the model files first and cache them locally.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load the tokenizer and model from Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
# Save the model to a local directory
model.save_pretrained('./my_local_model')
tokenizer.save_pretrained('./my_local_model')
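Saving writes everything needed for offline loading into the my_local_model directory: typically the model configuration (config.json), the weights (model.safetensors or pytorch_model.bin, depending on your transformers version), and the tokenizer files (for example vocab.txt and tokenizer_config.json).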
Step 3: Load the Model Locally for Offline Use
Once the model is saved locally, you can load it without an internet connection by specifying the path to the model.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load locally saved model
tokenizer = AutoTokenizer.from_pretrained('./my_local_model')
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model')
# Test the model with some input
inputs = tokenizer("Hello, Hugging Face!", return_tensors="pt")
outputs = model(**inputs)
print(outputs)
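Note that outputs.logits holds raw scores, not probabilities, and bert-base-uncased ships without a trained classification head, so the scores are not meaningful until you fine-tune the model (see the next section). As a minimal sketch of how you would read off a prediction:
import torch
# Convert raw logits into class probabilities, then pick the top class
probs = torch.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
print(probs, predicted_class)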
Step 4: Running Model Inference Offline
Now you can run model inference without an internet connection, which makes this setup well suited to production or air-gapped environments where network access is restricted.
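To guarantee that nothing tries to reach the Hugging Face Hub at runtime, you can set the offline environment variable before importing transformers and pass local_files_only=True when loading; a minimal sketch:
import os
# Tell transformers to rely exclusively on the local cache
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('./my_local_model', local_files_only=True)
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model', local_files_only=True)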
Advanced Usage of Hugging Face Models
For advanced use cases such as fine-tuning models, customizing pipelines, or integrating multiple models, Hugging Face provides powerful tools. Here’s how to take it a step further.
Fine-tuning a Model
Hugging Face supports fine-tuning models on custom datasets via the Trainer API. Here's an example of fine-tuning a BERT model on the IMDB movie-review dataset.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

# Load dataset
dataset = load_dataset('imdb')

# Load the tokenizer and a fresh model with a 2-class classification head
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Tokenize the raw text so the Trainer receives model-ready inputs
def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True)

dataset = dataset.map(tokenize, batched=True)

# Fine-tuning the model
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    evaluation_strategy="epoch",     # evaluate after every epoch
    learning_rate=2e-5,              # learning rate
    per_device_train_batch_size=8,   # batch size for training
    per_device_eval_batch_size=8,    # batch size for evaluation
    num_train_epochs=3,              # number of training epochs
    weight_decay=0.01,               # strength of weight decay
)

trainer = Trainer(
    model=model,                     # model to fine-tune
    args=training_args,              # training arguments
    train_dataset=dataset['train'],  # training dataset
    eval_dataset=dataset['test'],    # evaluation dataset
)

trainer.train()
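Once training finishes, you can evaluate on the test split and save the fine-tuned weights so they plug into the offline workflow from the previous section; a short sketch:
# Evaluate the fine-tuned model and persist it for offline use
metrics = trainer.evaluate()
print(metrics)

trainer.save_model('./my_local_model')
tokenizer.save_pretrained('./my_local_model')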
Using Hugging Face Pipelines for Easy Model Usage
The pipeline API in Hugging Face simplifies common tasks like text classification, summarization, and translation.
from transformers import pipeline
# Create a pipeline for sentiment-analysis
nlp = pipeline("sentiment-analysis")
# Run the model inference
result = nlp("I love using Hugging Face!")
print(result)
This provides a simplified interface for various tasks such as text classification, named entity recognition, and more.
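Pipelines also accept a local path for the model and tokenizer, so they combine naturally with the offline setup from earlier; a minimal sketch assuming the my_local_model directory created above:
from transformers import pipeline
# Build the pipeline entirely from locally saved files
nlp = pipeline("sentiment-analysis", model='./my_local_model', tokenizer='./my_local_model')
print(nlp("Offline pipelines work too!"))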
Deploy Hugging Face Model with Flask
Flask is a lightweight framework for building web applications. Here’s how to deploy a Hugging Face model with Flask.
Step 1: Install Flask
Install Flask using pip:
pip install flask
Step 2: Create a Simple Flask App
Create a new Python file app.py and set up the Flask app.
from flask import Flask, request, jsonify
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

app = Flask(__name__)

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('./my_local_model')
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    text = data['text']
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt")
    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)
    # Extract predicted class
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()
    return jsonify({"prediction": predicted_class})

if __name__ == "__main__":
    app.run(debug=True)
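Note that debug=True is meant for development only. For production you would typically serve the app with a WSGI server instead; one common option is Gunicorn:
pip install gunicorn
gunicorn app:app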
Step 3: Run the Flask App
Run your Flask app using the following command:
python app.py
Now your model is accessible via a REST API at http://127.0.0.1:5000/predict.
Step 4: Test the API
You can test the API using a tool like curl or Postman. Here's an example using curl:
curl -X POST -H "Content-Type: application/json" -d '{"text": "Hugging Face models are amazing!"}' http://127.0.0.1:5000/predict
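Alternatively, you can test it from Python with the requests library (pip install requests); a small sketch:
import requests
# Send a prediction request to the local Flask server
response = requests.post(
    "http://127.0.0.1:5000/predict",
    json={"text": "Hugging Face models are amazing!"},
)
print(response.json())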
Deploy Hugging Face Model with FastAPI
FastAPI is a modern, fast web framework for building APIs with Python. It's similar to Flask but offers higher performance and automatic OpenAPI documentation.
Step 1: Install FastAPI and Uvicorn
Install the necessary libraries:
pip install fastapi uvicorn
Step 2: Create the FastAPI App
Create a new file app.py for the FastAPI app.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

app = FastAPI()

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('./my_local_model')
model = AutoModelForSequenceClassification.from_pretrained('./my_local_model')

class TextInput(BaseModel):
    text: str

@app.post("/predict")
async def predict(input: TextInput):
    text = input.text
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt")
    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)
    # Extract predicted class
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()
    return {"prediction": predicted_class}
Step 3: Run the FastAPI App
Run the FastAPI app using Uvicorn:
uvicorn app:app --reload
Now your model is accessible via a REST API at http://127.0.0.1:8000/predict.
Step 4: Test the API
You can test the API using a POST request with the text you want to classify. Here's an example using curl:
curl -X 'POST' 'http://127.0.0.1:8000/predict' -H 'Content-Type: application/json' -d '{"text": "FastAPI is awesome!"}'
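FastAPI also generates interactive API documentation automatically: with the server running, open http://127.0.0.1:8000/docs in a browser to try the /predict endpoint without writing any client code.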