
Exploring the HeyGen API
Introduction
AI tools for generating videos are becoming more common, and one of them is the HeyGen API. It lets you create talking avatar videos, translate speech into other languages, and build interactive avatars that respond to user input. In this post, we’ll walk through what the HeyGen API can and can’t do, how its features work in practice, and how to set it up in a basic Node.js project. If you’re interested in creating scripted videos or setting up an AI avatar, this guide covers the main steps.
HeyGen’s avatar selection interface, showing customizable digital presenters with varying appearances and styles.
Digital Avatars
The HeyGen API lets you browse a library of avatars and pick one. You then type up a script, and the API generates a video of that character speaking it.
The avatars also offer different options depending on your use case. For example, you can choose between male and female voices, and there are styles aimed at casual conversation, advertisements, social media content, educational videos, and storytelling. Voices come in different ages (child, young adult, middle-aged, and old), and you can also set emotions like excited, friendly, serious, and soothing.
HeyGen's customization panel for avatar settings, showing options for gender, use case, and voice age selection.
HeyGen's avatar library interface, showing a selection of available digital presenters with various style tags like ‘Cheerful’ and ‘Friendly.’
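If you'd rather browse avatars programmatically than through the UI, HeyGen's v2 API exposes a listing endpoint. The sketch below only builds the request configuration; the endpoint path (/v2/avatars) is an assumption based on HeyGen's v2 naming conventions, so verify it against the official docs before relying on it:

```javascript
// Sketch: building a request config to list available avatars.
// The /v2/avatars path is an assumption; check HeyGen's docs.
function avatarListRequest(apiKey) {
  return {
    method: 'get',
    url: 'https://api.heygen.com/v2/avatars',
    headers: { 'X-Api-Key': apiKey }, // same auth header as the generate call used later
  };
}

// Usage with axios (requires a real key):
//   axios(avatarListRequest(process.env.HEYGEN_API_KEY))
//     .then(res => console.log(res.data));
```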
We’ve spent some time experimenting with HeyGen avatars, and even though the lip movements match the words well, the speech still sounds noticeably artificial. The tone is a bit flat and doesn’t quite match the cadence of regular human speech. If you want a more natural-sounding voice, you can record your own voiceover or use a dedicated text-to-speech platform like ElevenLabs. Another limitation we observed is that the expressions look somewhat mechanical, with repetitive gestures that don't fully capture how real people move and react.
In general, this tool can work well for creating basic tutorial videos where you talk about the same information each time. When it comes to showing real, natural, human emotion, that is where it falls flat.
Video Translations API
Before diving into the Interactive Avatars API, let's briefly cover HeyGen’s video translation feature. It lets you take a video of someone speaking (either an avatar or a real person) and translate it into another language while preserving the general sound of the original voice.
For straightforward content, this works decently well. If you've got a clear speech explaining a product or giving basic instructions, the translation will usually get the main points across accurately. Technical terms and simple language tend to come through fine.
HeyGen’s video translation feature, showcasing different languages such as Spanish, English and Japanese.
But here's where things get tricky. The system doesn't handle casual language very well. Jokes often fall flat in translation, and slang terms might come out sounding strange. When I tested a pun ("Why did the scarecrow win an award? Because it was out standing in its field."), the avatar delivered the line in one breath, with zero pauses for comedic timing. It sounded like someone reading a grocery list, not telling a joke. Casual phrases can get translated too literally, losing their natural rhythm or meaning. And while the translated voice does sound similar to the original, there's still that AI quality to it: if the original video is enthusiastic, the translated version often comes out more monotone.
This feature is useful for businesses. Instead of reshooting commercials, they can use video translation to adapt their existing videos. Customer service is another use case: the callback messages that play while you're on hold could be translated into Spanish or other languages to help non-English speakers.
There are certain limitations to be aware of, though. Unless you download the translated videos right away, they won't stick around forever: depending on your subscription plan, the HeyGen API only stores them for about a week before they disappear. You'll need to save copies to your own machine if you want to keep using them.
HeyGen's tiered subscription model, ranging from a free plan (with watermarks) to paid 'Creator,' 'Team,' and 'Enterprise' tiers offering progressively more features.
HeyGen’s paid tiers (Creator/Team/Enterprise) offer a 22% discount for annual billing.
Also, if you are using the free plan, each video will have a watermark attached. For personal testing, this might not matter. But if you are planning to use these videos in some kind of professional setting, you’ll need to upgrade your plan to have the watermark removed.
Example of a HeyGen-generated video with the platform's watermark, which appears on all free-tier outputs.
Interactive Avatars API
Now moving onto the Interactive Avatars API. For this feature, the system lets you type questions, and the avatar responds with a video answer.
First, you need to understand that these avatars don't actually think or understand anything. They're more like video answer machines. They rely on a Large Language Model (LLM) to create replies. The quality of these replies depends on how capable and accurate the LLM is.
HeyGen's interactive api workflow: From session token generation to WebSocket-based avatar interactions and session termination.
Keep in mind that even when the responses are technically correct, there's always a slightly unnatural pause between when you ask your question and when the avatar responds. That tiny delay makes the whole interaction feel stiff, like you're talking to a robot.
HeyGen's interactive avatar interface showing real-time connection logs, voice/avatar selection, and LLM-powered text input.
Even with these limitations, this interactivity feature can still be applied to areas like customer service, answering questions such as: “What’s my current balance?”, “When’s my payment due?”, and “What are your business hours?”
Here’s how it works: behind the scenes, the LLM pulls from data sources like user account information, company databases or preset FAQs. Using that information, the LLM is then able to generate a response for the avatar to say and this process works well for questions that are fact-based, predictable and emotion-free.
In these situations, the mechanical tone won’t be as much of an issue because all anyone would want from these questions is quick information. When conversations need human judgment, this is where these interactions become more difficult.
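To make the FAQ-routing idea concrete, here's a minimal sketch: match an incoming question against preset answers and fall back to a human when nothing matches. The entries and keyword matching below are entirely illustrative and not part of HeyGen's API; in a real setup, the matched text would be sent to the avatar to speak:

```javascript
// Illustrative preset FAQs (not part of HeyGen's API).
const faqs = [
  { keywords: ['balance'], answer: 'Your current balance is shown on your account page.' },
  { keywords: ['payment', 'due'], answer: 'Payments are due on the 1st of each month.' },
  { keywords: ['hours'], answer: 'We are open Monday to Friday, 9am to 5pm.' },
];

// Pick the scripted answer whose keywords appear in the question,
// or hand off to a human when nothing matches.
function answerFor(question) {
  const q = question.toLowerCase();
  const match = faqs.find(f => f.keywords.some(k => q.includes(k)));
  return match ? match.answer : 'Let me connect you with a human agent.';
}
```

This keeps the avatar on the fact-based, predictable questions it handles well, and routes everything else away from it.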
Basic Implementation of HeyGen API
Now that we’ve explained the features and applications, let’s move on to setting up a basic video with the HeyGen API.
Before getting started, make sure you have Node.js and npm installed on your computer.
If you don’t, follow this video until 5:00 to get it installed.
https://www.youtube.com/embed/NqANV4wXhx4?si=1vynWqnLsZ9s0WU_?start=0&end=300
Getting Started with the HeyGen API
The HeyGen registration page showing multiple sign-up options: email, Google, or Apple account authentication.
- Sign Up and Get an API Key: To obtain HeyGen’s API key, follow these steps:
- Create a HeyGen account: make an account with HeyGen using this link: https://app.heygen.com/signup
- Navigate to the HeyGen API key: after successfully logging in, go to Settings -> Subscriptions -> HeyGen API -> API Token, hit copy, and save the API key somewhere safe.
HeyGen's subscription settings panel showing where to locate and copy the API key.
Setting up the Project
Here is a link to a completed version of the project which will be useful to refer to:
https://github.com/rubberart7/heygen-api-introduction
Now that you have the API key, you can use it to call the API and generate the AI video.
Create a new Node.js project
Go to GitHub, create a new repository, and check the box that adds a README file.
This initializes the project with a file you can fill in later to describe the project.
After that, clone that repository to your computer so you can work on the project locally.
GitHub’s home page interface, showing where to click to create a new repository for this project.
Create a .env file in your project root and define a variable holding the API key, for example HEYGEN_API_KEY=your_api_key
Then create a .gitignore file and add .env to it, so your API key is not exposed when you push the project to a remote repository.
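On macOS or Linux, both files can be created from the terminal like this (replace the placeholder with your actual key; the variable name HEYGEN_API_KEY matches the code later in this post):

```shell
# Store the API key in .env
echo "HEYGEN_API_KEY=your_api_key" > .env

# Tell git to ignore .env so the key never gets committed
echo ".env" >> .gitignore
```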
Follow the instructions in this video from 5:00 up to 10:11 to setup an index.js and package.json file.
https://www.youtube.com/embed/NqANV4wXhx4?si=1vynWqnLsZ9s0WU_?start=300&end=611
Next, run the command: npm install dotenv
This lets you load environment variables like your API key from the .env file.
Also run the command: npm install axios
This installs axios, the HTTP client library we'll use to make the API request.
By this point, your project should contain these files:
- node_modules: all npm package dependencies.
- .env: environment variables like your API key.
- .gitignore: files to exclude from version control.
- index.js: the main implementation for this project, which sets up the interaction with the API.
- package.json and package-lock.json: project metadata and dependency tracking.
In your index.js file, copy in this code:
require('dotenv').config();
const axios = require('axios');

const API_KEY = process.env.HEYGEN_API_KEY;

if (!API_KEY) {
  console.error("API Key is missing! Please check your .env file.");
  process.exit(1);
}

const videoData = {
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "Daisy-inskirt-20220818",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "input_text": "Welcome to the HeyGen API!",
        "voice_id": "2d5b0e6cf36f460aa7fc47e3eee4ba54"
      },
      "background": {
        "type": "color",
        "value": "#008000"
      }
    }
  ],
  "dimension": {
    "width": 1280,
    "height": 720
  }
};

const generateVideo = async () => {
  try {
    const response = await axios.post('https://api.heygen.com/v2/video/generate', videoData, {
      headers: {
        'X-Api-Key': API_KEY,
        'Content-Type': 'application/json'
      }
    });

    const videoId = response.data?.data?.video_id;

    if (videoId) {
      const videoLink = `https://app.heygen.com/videos/${videoId}?sid=video-preview`;
      console.log('Link:', videoLink);
    } else {
      console.error('Video ID not found in the response.');
    }
  } catch (error) {
    console.error('Error generating video:', error);
  }
};

generateVideo();
This index.js script uses the HeyGen API to generate a video with a custom avatar, voice, and background. It starts by loading the API key from a .env file. Then it defines the video settings like the avatar, speech text, and background color. It sends a POST request to HeyGen’s video generation endpoint using Axios. If successful, it logs a link to the generated video.
At this point, you can run this command to get a link to the video: node index.js
Creating video data
const videoData = {
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "Daisy-inskirt-20220818",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "input_text": "Welcome to the HeyGen API!",
        "voice_id": "2d5b0e6cf36f460aa7fc47e3eee4ba54"
      },
      "background": {
        "type": "color",
        "value": "#008000"
      }
    }
  ],
  "dimension": {
    "width": 1280,
    "height": 720
  }
};
This part of the code defines the data that is sent to the HeyGen API. It includes key components that will define how the video will look and sound. The video_inputs object contains information like what character will be used in the video, the type of avatar with avatar_id, the visual style of the avatar with avatar_style and the voice that will read out the text using input_text and voice_id. The background color is also defined, and the dimension defines the resolution of the video.
Sending a Video Generation Request
const response = await axios.post('https://api.heygen.com/v2/video/generate', videoData, {
  headers: {
    'X-Api-Key': API_KEY,
    'Content-Type': 'application/json'
  }
});
This part sends a POST request to the HeyGen API using the axios library. The request is made to the https://api.heygen.com/v2/video/generate endpoint, which is the URL responsible for generating videos. The videoData object, which was defined earlier, is passed as the request body in JSON format. The X-Api-Key header is used to authenticate the request by passing the API key, and the Content-Type: application/json header indicates that the body content is formatted in JSON.
Extracting the Video ID
const videoId = response.data?.data?.video_id;

if (videoId) {
  const videoLink = `https://app.heygen.com/videos/${videoId}?sid=video-preview`;
  console.log('Link:', videoLink);
} else {
  console.error('Video ID not found in the response.');
}
After sending the POST request, the response from the API is processed. The video_id is extracted from the response.data object, which contains the result of the API call. If a video_id is found, the script creates a URL that links to the generated video, and this link is shown on the terminal. If the video_id is not found in the response, the script shows an error message indicating the issue.
Handling Errors
catch (error) {
  console.error('Error generating video:', error);
}
This part of the code is designed to handle any errors that occur during the API request. If the request to HeyGen’s API fails for any reason, an error message will be shown on the terminal.
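Logging the whole error object can be noisy. Axios attaches a response property when the server replied with a non-2xx status, so you can print just the useful parts. The helper below is a sketch of that idea (the error shape follows axios's documented behavior; the message formatting is our own, not HeyGen's):

```javascript
// Turn an axios-style error into a short, readable message.
// error.response exists only when the server actually replied.
function describeApiError(error) {
  if (error.response) {
    return `API error ${error.response.status}: ${JSON.stringify(error.response.data)}`;
  }
  return `Request failed: ${error.message}`;
}

// In the catch block you could then write:
//   console.error(describeApiError(error));
```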
Final Output
If the script runs successfully, the output will be something like:
Link: https://app.heygen.com/videos/<video_id>?sid=video-preview
This message indicates that the video was successfully generated and provides a link to watch the video in the HeyGen platform. The <video_id> part of the URL is replaced with the actual ID of the generated video.
Clicking the link will redirect to a video that looks like this:
https://www.youtube.com/watch?v=IiAOtFtj2C0&ab_channel=MirajYafi
Common Errors and Solutions
- Missing API key: make sure your .env file is set up properly and that the API key is copied and pasted correctly.
- Invalid parameter values: visit this documentation to get valid values for the avatar, voice, background, etc.
- Rate limits: visit this documentation to see what your request limits are depending on your subscription plan.
HeyGen API: A Useful Tool With Limits
After testing HeyGen’s API, I’ve found that it is best suited for tasks like generating explanation videos or translating scripted content. The avatars allow you to have a basic talking head and the translation feature can save time for multilingual projects.
However, once you stray from these use cases, the limitations become more obvious. Jokes land awkwardly, emotional speech sounds hollow, and casual dialogue comes across as forced and unnatural. The interactive avatars, which handle FAQ responses well, still have that unnatural delay in responses that makes the conversation feel more like talking to an advanced answering machine than to a person.
With that being said, it’s exciting to see how tools like this are pushing the boundaries of what’s possible with AI-driven videos. They are unlocking new possibilities for how we create, share and experience content.