Resource guide

How Much Does a Private AI Assistant Cost?

A private AI assistant can cost as little as a normal AI subscription, or it can become a real software deployment with cloud hosting, model usage, integrations, security work, and ongoing maintenance.

That is why the most honest answer is not one number.

Four cost layers

The VM or hosting environment

The AI model

The tools and integrations

The setup and maintenance

The short version

You are not just paying for “AI.”

A basic AI subscription might cost $20–$30 per month. A self-hosted private agent might cost $35–$150 per month in hosting and AI usage. A serious business assistant that is connected to your email, files, calendar, CRM, databases, and recurring workflows may require several thousand dollars of setup work, plus ongoing cloud, model, and support costs.

The difference is simple: you are not just paying for “AI.” You are paying for where it runs, what it can access, how much work it does, and who is responsible for keeping it reliable.

The short answer

A practical way to think about the cost.

Type of assistant	Typical cost	What you are paying for
Consumer AI subscription	$20–$200+ per user/month	Access to a hosted AI product like ChatGPT, Grok, Claude, or Gemini
Lightweight assistant app	$20–$200+ per user/month	Prebuilt AI workflows, often with limited customization
DIY private AI agent	$35–$150+/month	A VM, model access, and your own setup time
Managed private AI assistant	Several thousand dollars upfront, plus ongoing support	Deployment, integrations, workflow design, security, and maintenance
Enterprise custom AI assistant	Custom pricing, often five figures+	Compliance, custom systems, SLAs, governance, and deeper integrations

For most business owners and executives, the real decision is not whether an AI assistant is “cheap” or “expensive.” The real decision is whether the assistant saves enough time, reduces enough manual work, or improves enough workflows to justify the cost.

Definitions

What counts as a private AI assistant?

The phrase “private AI assistant” can mean different things. At the low end, someone may simply mean a paid AI account with private chats. That can be useful, but it is not the same thing as a private agent working inside your business.

A more serious private AI assistant usually has several characteristics:

It runs in a persistent environment, often a virtual machine.

It can access business systems, such as email, files, calendars, databases, project management tools, or internal documents.

It has memory or context that persists across sessions.

It can use tools, not just answer questions.

It can perform multi-step tasks.

It can run scheduled jobs or recurring workflows.

It can communicate with you through chat, email, or messaging apps.

It can be configured around your business, your rules, and your priorities.

This is the difference between asking a chatbot, “Can you draft this email?” and having an assistant that can review the thread, check the relevant files, understand the customer, prepare a draft, and alert you when it is ready.

Two big cost categories

Infrastructure and inference.

The best way to understand private AI assistant cost is to start with the two biggest recurring categories:

1. Infrastructure

Where the assistant runs.

2. Inference

The AI model usage that powers the assistant’s thinking.

Many people focus only on the subscription price of the AI model, but that leaves out the system needed to run an actual assistant. Others focus only on the server cost, but that leaves out the model usage, which can be much more variable.

A useful formula

Monthly cost = hosting + model usage + tool usage + support/maintenance

If the assistant is professionally implemented, there may also be a setup fee for deployment, integration, workflow design, and training.
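As a rough illustration, the formula above can be written as a few lines of Python. This is a sketch for planning purposes only. The class name, function name, and dollar amounts are hypothetical and are chosen from the ranges in this guide, not from any provider's quote.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """Recurring cost layers in US dollars per month (all values are assumptions)."""
    hosting: float
    model_usage: float
    tool_usage: float
    support: float

    def total(self) -> float:
        # Monthly cost = hosting + model usage + tool usage + support/maintenance
        return self.hosting + self.model_usage + self.tool_usage + self.support


def first_year_cost(monthly: MonthlyCosts, setup_fee: float = 0.0) -> float:
    """One-time setup fee plus twelve months of recurring cost."""
    return setup_fee + 12 * monthly.total()


# Hypothetical example: a small VPS, moderate API usage, one paid search tool,
# no paid support, and a one-time professional setup fee of $3,000.
example = MonthlyCosts(hosting=40, model_usage=75, tool_usage=20, support=0)
print(example.total())                 # 135
print(first_year_cost(example, 3000))  # 4620
```

Separating the one-time setup fee from the recurring total makes it easier to compare a DIY setup with a professionally implemented one over the same period.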

1. Infrastructure cost

The host.

A private AI assistant usually needs a place to live. For a basic chatbot, you can open a browser tab. For a private agent, you usually want a persistent server or virtual machine that can keep running whether your laptop is open or not.

That server may handle:

The agent process itself.

Scheduled jobs.

Messaging gateways.

Webhooks.

File indexing.

Local databases.

Memory stores.

Logs.

Authentication tokens.

Browser automation sessions.

Terminal or code execution tools.

Background processes.

This does not mean the VM is necessarily running the AI model itself. In most setups, the VM runs the assistant software, connectors, memory, tools, and workflows. The AI model is usually called through a hosted provider, a subscription-backed endpoint, or an API. If you want to run strong local models yourself, the hardware requirements can become much higher, often involving GPUs or a separate model server.

A practical baseline

2 CPUs and 6 GB of memory

For many business deployments, 8 GB of memory is safer. The difference between 6 GB and 8 GB is not usually where the budget is won or lost. Running too close to the limit can cause more trouble than the small savings are worth.

Infrastructure cost by hosting environment

Hosting environment	Rough monthly cost for a practical small deployment	Best for
Generic VPS	$20–$50/month	Budget-conscious private deployments, simple 24/7 agents
Google Cloud VM	Often around $50+/month before disk, bandwidth, and discounts	Businesses already using Google Workspace or Google Cloud
AWS EC2 VM	Often around $55–$65+/month before disk, bandwidth, and discounts	Businesses already using AWS infrastructure, IAM, backups, and monitoring

These are rough planning numbers, not guaranteed quotes. Cloud pricing varies by region, operating system, discounts, storage, bandwidth, backups, and support options.

Generic VPS

A generic VPS can be the cheapest way to run a private assistant. It is often enough for a personal assistant, a developer setup, or a lightweight business assistant.

The upside is predictable cost. Many VPS plans include a fixed amount of compute, memory, storage, and bandwidth for a flat monthly price.

The downside is that you are responsible for more of the operational work. You need to handle server security, updates, backups, monitoring, firewall rules, SSH access, and recovery if something breaks.

For a business assistant with sensitive data, the cheapest possible VPS is usually not the right benchmark. You want reliable uptime, reasonable backups, secure credential handling, and enough headroom for the assistant to run without becoming fragile.

Google Cloud

Google Cloud can make sense when the assistant needs to work closely with Google Workspace, Google Drive, Gmail, service accounts, logging, IAM, or other Google Cloud resources.

The base VM is only part of the cost. You may also pay for persistent disk, snapshots, logs, network egress, and other services. If the assistant is going to run 24/7, committed-use discounts can reduce the effective compute cost, but they also reduce flexibility.

Google Cloud is usually not the cheapest option for a small private assistant, but it can be a strong choice when the surrounding business systems already live in Google’s ecosystem.

AWS

AWS can make sense when the business already uses AWS, or when the deployment needs to fit into existing AWS security, networking, IAM, monitoring, and backup practices.

A small EC2 instance with 2 vCPUs and 8 GB of memory is a common starting point. Storage is usually billed separately through EBS, and snapshots, logging, bandwidth, and other services can add more cost.

AWS is often more complex than a simple VPS, but that complexity can be worth it for companies that already have AWS infrastructure and operational practices in place.

2. Inference cost

The brain.

The AI model is often the biggest variable. A private assistant can use the same VM and the same software but have a very different monthly cost depending on the model setup. There are three main approaches.

Subscription-backed access

Per-token API pricing

Self-hosted (open-weights) models

Subscription-backed access

A subscription can make costs feel simple and predictable.

For example, a user might pay for a plan from OpenAI, xAI, Google, Nous Portal, or another AI provider, then use that account within supported limits. Some agent frameworks can authenticate through subscription-style or OAuth-based providers instead of requiring a separate API key for every task.

This can be dramatically cheaper for some users than paying per token through an API, especially if the subscription includes generous usage.

But there are tradeoffs:

Subscriptions can have rate limits or usage caps.

Some subscriptions are intended for individual interactive use, not unattended production automation.

API usage may be billed separately from the consumer subscription.

Provider terms and limits can change.

OAuth-based access can be less predictable than direct API billing.

It may be harder to monitor exact cost per workflow.

A subscription can be a good way to get started, but it should not be confused with a guaranteed unlimited production API.

Per-token API pricing

API pricing is usually cleaner for production. You pay for what the assistant actually uses.

The downside is that agentic workflows can use many more tokens than a normal chat.

A simple chatbot may make one model call. A private AI agent may go through a loop:

Understand the task.

Plan the steps.

Search files or the web.

Read retrieved content.

Use a tool.

Observe the result.

Re-plan.

Call another tool.

Draft an answer.

Check its work.

Revise the answer.

This creates what you might call the agent context tax.

The agent may need to send tool descriptions, memory summaries, retrieved documents, prior steps, system instructions, and partial results back into the model again and again. A task that looks short to the user can quietly become a large amount of model input and output.
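To see how quickly this adds up, here is a minimal sketch of the loop above. It assumes the simplest possible behavior: every model call resends a fixed base context plus everything produced by earlier steps, with no caching or summarization. The function name and token counts are hypothetical, and real agents will differ.

```python
def agent_loop_tokens(base_context: int, step_outputs: list[int]) -> tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) for one task.

    Assumes each call resends the base context (system instructions, tool
    descriptions, memory summary) plus the output of every earlier step.
    """
    total_input = 0
    history = 0
    for out in step_outputs:
        total_input += base_context + history  # what this call has to read
        history += out                         # this step's result joins the history
    return total_input, sum(step_outputs)


# One chat-style call: 3,000 tokens of context in, 500 tokens out.
print(agent_loop_tokens(3000, [500]))       # (3000, 500)

# The same request handled as an 11-step agent loop, 500 tokens per step.
print(agent_loop_tokens(3000, [500] * 11))  # (60500, 5500)
```

Under these assumptions the agent reads about twenty times as many input tokens as the single call, even though the final answer the user sees may be the same length.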

The number of skills and tools you enable feeds into this as well. Each enabled skill or tool adds a short description to the agent’s context on every model call, so the more you install and leave switched on, the larger that baseline becomes. Well-designed agents limit the damage by keeping only brief skill descriptions loaded up front and pulling in a skill’s full instructions only when it is actually used. Hermes Agent works this way, so a long list of available skills costs little on its own. However, each skill that actually fires loads its full content into context, and a task that chains several skills can use far more tokens than a single answer. Enabling only the skills you need keeps the baseline lean.
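A small sketch makes the tradeoff concrete. The token sizes below (50 tokens for a short skill description, 2,000 for full instructions) are illustrative assumptions, not measurements of Hermes Agent or any other framework.

```python
def skill_tokens_per_call(enabled: int, active: int,
                          summary_tokens: int = 50,
                          full_tokens: int = 2000) -> int:
    """Tokens spent on skills in one model call when full instructions load on demand.

    Every enabled skill contributes a short description. Only skills that are
    actually in use also contribute their full instructions.
    """
    if active > enabled:
        raise ValueError("active skills cannot exceed enabled skills")
    return enabled * summary_tokens + active * full_tokens


print(skill_tokens_per_call(enabled=40, active=0))  # 2000
print(skill_tokens_per_call(enabled=40, active=3))  # 8000
print(skill_tokens_per_call(enabled=10, active=3))  # 6500
```

In this sketch, trimming the enabled list from 40 skills to 10 saves 1,500 tokens on every call, while each skill that fires costs far more than the whole list of descriptions.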

Self-hosted models

A third option is to run an open-weights model yourself, either on rented cloud GPUs or on hardware you own. This is the most genuinely private option, because the model runs inside your own environment and your prompts and data do not have to leave it.

The economics work differently from a subscription or a per-token API. You are not paying for what you use. You are paying for the capacity to run the model at all.

Renting cloud GPUs avoids a large up-front purchase, but it creates a large fixed monthly cost. A GPU big enough to serve a capable model bills continuously, whether the assistant is busy or idle.

Buying your own hardware turns that into a large up-front cost, plus ongoing power, cooling, maintenance, and eventual replacement.

In exchange, you get effectively unlimited usage within the limits of your hardware. There is no per-token meter, which can look appealing for heavy workloads.

In practice, though, self-hosting is often more expensive, not less. Because the cost is fixed, you pay for the GPU even when nothing is running. Unless your usage is consistently high and steady, a managed model billed per token or through a subscription is usually cheaper, simpler, and easier to keep current as stronger models are released.

Self-hosting makes the most sense when strict data-privacy or residency rules prevent sending data to an outside provider, or when usage is heavy and steady enough to justify paying for the capacity full time.
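One way to test whether usage is heavy enough is a simple break-even calculation. The sketch below compares a fixed monthly GPU cost with per-token API spend. The $1,500 GPU rental and the blended $5 per million tokens are hypothetical figures, and the comparison ignores operations time, power, and differences in model quality.

```python
def break_even_million_tokens(gpu_monthly_cost: float,
                              api_price_per_million: float) -> float:
    """Monthly volume (millions of tokens) where fixed GPU cost equals API spend."""
    if api_price_per_million <= 0:
        raise ValueError("API price must be positive")
    return gpu_monthly_cost / api_price_per_million


def cheaper_option(tokens_millions: float, gpu_monthly_cost: float,
                   api_price_per_million: float) -> str:
    """Compare the fixed GPU cost with API spend at a given monthly volume."""
    api_cost = tokens_millions * api_price_per_million
    return "self-hosted" if gpu_monthly_cost < api_cost else "api"


print(break_even_million_tokens(1500, 5))  # 300.0
print(cheaper_option(12, 1500, 5))         # api
print(cheaper_option(500, 1500, 5))        # self-hosted
```

With these assumed numbers, an assistant using 12 million tokens a month is far below the 300 million tokens needed to justify the GPU on cost alone.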

Example model-cost difference

Suppose an assistant uses 10 million input tokens and 2 million output tokens in a month. Depending on the model, that could be roughly:

Model type	Example pricing pattern	Approximate monthly inference cost for 10M input + 2M output
Lower-cost mini model	Low input and output cost	~$5–$25
Mid-range model	Moderate input and output cost	~$50–$75
Premium reasoning/frontier model	Higher input and output cost	~$100–$200+
Heavy autonomous use with premium models	Many more calls, long context, retries, tool loops	Hundreds or thousands of dollars per month

This is why the cheapest server is not necessarily the cheapest assistant. The VM might cost $20–$60 per month, while the model usage could be $10, $100, $500, or more depending on how the assistant is used.
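The arithmetic behind the table can be sketched as follows. The per-million-token prices are hypothetical tiers chosen to land inside the ranges above. They are not any provider's actual pricing, so check current price lists before budgeting.

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars, given prices per one million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


# Hypothetical (input price, output price) per million tokens for each tier.
PRICE_TIERS = {
    "mini": (0.50, 2.00),
    "mid-range": (3.00, 15.00),
    "premium": (10.00, 40.00),
}

for name, (in_price, out_price) in PRICE_TIERS.items():
    cost = inference_cost(10_000_000, 2_000_000, in_price, out_price)
    print(f"{name}: ${cost:,.2f}")
# mini: $9.00
# mid-range: $60.00
# premium: $180.00
```

Output tokens are usually priced higher than input tokens, which is why 2 million output tokens can cost nearly as much as 10 million input tokens in these examples.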

3. Tool and integration costs

What the assistant can do.

A private AI assistant becomes more valuable when it can do things. That usually means tools. Some AI platforms bundle these capabilities. Others require separate accounts and separate API keys.

Possible tool costs include:

Web search.

Full-page web extraction.

Browser automation.

File parsing.

OCR.

Image generation.

Text-to-speech.

Speech-to-text.

Vector databases.

Document search.

Email and calendar integrations.

CRM integrations.

Project management integrations.

Database access.

Monitoring and logging.

Error tracking.

Backups and snapshots.

This matters because an assistant that only chats is relatively simple. An assistant that researches, reads your files, updates systems, monitors inboxes, runs scheduled jobs, and sends you finished work is more useful, but also more expensive to configure and maintain.

4. Setup and implementation cost

The largest upfront cost.

For a serious business assistant, setup is often the largest upfront cost. This is the difference between having access to AI and having an assistant that actually works inside a business.

Setup may include:

Choosing the agent framework.

Provisioning the VM.

Securing the server.

Installing and configuring the assistant.

Connecting model providers.

Setting up messaging access.

Connecting email, calendar, files, CRM, or internal systems.

Configuring memory.

Defining what the assistant is allowed to access.

Creating reusable workflows.

Testing the assistant on real tasks.

Setting up monitoring and backups.

Training the user or team.

Documenting how the assistant should be used.

A private assistant is not valuable because it can answer generic questions. It is valuable because it understands the right business context, can reach the right systems, and can reliably complete useful work.

5. Maintenance and support cost

Not “set it and forget it.”

Private AI assistants are not “set it and forget it” systems, and the ongoing maintenance cost is easy to underestimate. The more useful the assistant becomes, the more likely you are to depend on it. Once it is handling important business workflows, reliability matters.

Ongoing maintenance may include:

Updating the server.

Updating the agent software.

Rotating API keys and OAuth credentials.

Fixing broken integrations.

Adjusting workflows as the business changes.

Improving prompts and instructions.

Reviewing failures.

Monitoring token usage.

Switching models when pricing or quality changes.

Adding new tools.

Improving security boundaries.

Checking logs and scheduled jobs.
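Monitoring token usage does not have to be elaborate. The sketch below is one possible approach, assuming you can log input and output token counts for each workflow run. The class name, workflow names, and prices are hypothetical.

```python
from collections import defaultdict


class UsageLedger:
    """Tracks estimated model spend per workflow (prices are assumptions)."""

    def __init__(self, input_price_per_m: float, output_price_per_m: float):
        self.input_price_per_m = input_price_per_m
        self.output_price_per_m = output_price_per_m
        self.spend = defaultdict(float)  # workflow name -> dollars so far

    def record(self, workflow: str, input_tokens: int, output_tokens: int) -> None:
        cost = (input_tokens * self.input_price_per_m
                + output_tokens * self.output_price_per_m) / 1_000_000
        self.spend[workflow] += cost

    def total(self) -> float:
        return sum(self.spend.values())

    def projected_month(self, day_of_month: int, days_in_month: int = 30) -> float:
        """Linear projection of month-end spend from spend so far."""
        return self.total() / day_of_month * days_in_month


ledger = UsageLedger(input_price_per_m=3.0, output_price_per_m=15.0)
ledger.record("inbox-triage", 400_000, 50_000)
ledger.record("weekly-report", 1_000_000, 200_000)
print(f"${ledger.total():.2f} so far")                       # $7.95 so far
print(f"${ledger.projected_month(10):.2f} projected")        # $23.85 projected
```

Tracking spend by workflow shows which tasks are worth moving to a cheaper model and which ones justify a premium model.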

Hermes Agent

What does it cost to run Hermes Agent?

Hermes Agent is an open-source autonomous agent framework built by Nous Research. It is designed to run outside a single browser tab and can be used through the terminal or through messaging platforms. It supports memory, skills, tools, scheduled jobs, messaging gateways, model provider configuration, and integrations.

A practical minimum

2 CPUs and 6 GB of memory

For a durable business deployment, plan on running Hermes Agent on a persistent VM.

The official Hermes Agent documentation lists a lower floor of around 4 GB of memory for API-backed setups, but 6 GB gives more comfortable headroom for a durable business deployment.

For more headroom, especially if you are using Docker, browser automation, file indexing, scheduled jobs, or multiple integrations, 8 GB of memory is a better target.

Hermes Agent on a generic VPS

A VPS is usually the lowest-cost way to run Hermes Agent.

This can work well for:

Personal assistants.

Developer assistants.

Lightweight business workflows.

Simple scheduled jobs.

Telegram or Discord access.

Basic research and drafting workflows.

A reasonable budget might look like:

Cost item	Approximate range
VPS	$20–$50/month
Model access	$20–$100+/month
Tool usage	$0–$100+/month
Total lightweight range	~$40–$250+/month

The main tradeoff is operational responsibility. A VPS can be inexpensive, but someone still needs to secure it, update it, monitor it, and fix it.
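The total in the table is simply the sum of the low ends and the sum of the high ends of each line item. A short sketch using the figures from the table:

```python
def total_range(items: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high ends of each (low, high) monthly cost range."""
    low = sum(lo for lo, _ in items.values())
    high = sum(hi for _, hi in items.values())
    return low, high


# Monthly ranges in dollars, taken from the table above.
budget = {
    "VPS": (20, 50),
    "Model access": (20, 100),
    "Tool usage": (0, 100),
}
print(total_range(budget))  # (40, 250)
```

The upper figures for model access and tool usage are open-ended in the table, so treat the high end as a planning figure rather than a cap.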

Hermes Agent on Google Cloud

Google Cloud is a good fit when the assistant needs to operate near Google Workspace or Google Cloud systems.

A Google Cloud deployment may make sense if the assistant needs to work with:

Gmail.

Google Calendar.

Google Drive.

Google Cloud IAM.

Google Cloud databases.

Google Cloud logging.

Existing GCP networking or security policies.

A reasonable planning budget for the VM alone is often around $50/month or more, before storage, network, backups, and support. Discounts may reduce this if the VM runs continuously and the organization is willing to commit.

The advantage is not usually the lowest raw price. The advantage is fitting into a more mature cloud environment.

Hermes Agent on AWS

AWS is a good fit when the business already uses AWS or expects the assistant to interact with AWS-hosted systems.

An AWS deployment may make sense if the assistant needs to fit into:

AWS IAM.

VPC networking.

Existing EC2 practices.

CloudWatch logging.

EBS snapshots.

S3 storage.

Internal APIs or databases hosted on AWS.

A small EC2 instance with 2 vCPUs and 8 GB of memory is a reasonable starting point. The VM may cost roughly $55–$65/month on demand depending on the instance family and region, before storage and extras. EBS storage for a small root volume may only add a few dollars per month, but backups, logs, data transfer, and support can increase the total.

AWS gives you strong infrastructure flexibility, but it is usually more complex than a simple VPS.

Hermes Agent model costs

The model decides the economics.

Hermes Agent itself is not usually the expensive part. The expensive part is which model you use and how much work you ask it to do. Hermes can work with different model providers and endpoints. That flexibility is important because the right model strategy may vary by workflow.

For example:

Use a cheaper model for routine summarization, classification, and notifications.

Use a stronger model for complex reasoning, coding, research, or high-stakes drafting.

Use a subscription-backed provider when predictable cost matters.

Use per-token API billing when production monitoring and precise usage control matter.

Use fallback providers so one provider outage does not break the assistant.

Use caching and model routing to reduce repeated context costs.

The model decision can change the economics dramatically.

A light personal Hermes setup might stay within a subscription and modest hosting budget. A heavy business assistant doing long research tasks, browser automation, code work, and large document analysis can become much more expensive.
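The routing and fallback ideas above can be sketched in a few lines. This is a generic illustration, not Hermes Agent's configuration format or API. The task categories, the model names, and the call_model function are hypothetical placeholders that you would replace with your own provider client.

```python
from typing import Callable

# Hypothetical task categories mapped to placeholder model tiers.
ROUTES = {
    "summarize": "cheap-model",
    "classify": "cheap-model",
    "notify": "cheap-model",
    "research": "strong-model",
    "code": "strong-model",
}

# Placeholder fallbacks to try, in order, if the primary model fails.
FALLBACKS = {
    "cheap-model": ["backup-cheap-model"],
    "strong-model": ["backup-strong-model", "cheap-model"],
}


def candidates(task_type: str) -> list[str]:
    """Primary model for the task, followed by its fallbacks."""
    primary = ROUTES.get(task_type, "strong-model")  # unknown tasks use the stronger tier
    return [primary, *FALLBACKS.get(primary, [])]


def run_with_fallback(task_type: str, prompt: str,
                      call_model: Callable[[str, str], str]) -> str:
    """Try each candidate model in order and return the first successful reply."""
    last_error = None
    for model in candidates(task_type):
        try:
            return call_model(model, prompt)
        except Exception as exc:  # in practice, catch your provider's specific errors
            last_error = exc
    raise RuntimeError(f"all models failed for task type {task_type!r}") from last_error


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real provider client that simulates one outage."""
    if model == "cheap-model":
        raise TimeoutError("simulated outage")
    return f"{model}: ok"


print(run_with_fallback("summarize", "Summarize this thread", fake_call))
# backup-cheap-model: ok
```

Sending routine work to the cheaper tier by default and reserving the stronger tier for named task types keeps premium spend tied to the work that needs it.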

Subscription vs. API

The most important cost distinction.

A lot of confusion comes from mixing up AI subscriptions and API pricing. An AI subscription might cost a fixed amount per month. That is attractive because it makes spending predictable.

But API pricing is different. API usage is usually billed per token, meaning you pay for the amount of text and data sent into and generated by the model.

For agentic systems, API usage can grow quickly because every step in the agent loop may be another model call.

The practical takeaway:

If you are experimenting or building a personal assistant, a subscription-backed setup may be enough.

If you are building a business-critical assistant, API billing may give better visibility, reliability, and control.

If you are cost-sensitive, use cheaper models for routine work and reserve premium models for harder tasks.

If you are using a subscription, confirm what is allowed, what limits apply, and whether API usage is separate.

People vs. software

Is a private AI assistant cheaper than a human assistant?

A private AI assistant can be much cheaper than a human executive assistant, but it is not the same thing.

AI assistants are strong at:

Drafting.

Summarizing.

Searching.

Reporting.

Organizing information.

Monitoring recurring tasks.

Pulling data from systems.

Preparing first drafts.

Handling repetitive workflows.

Humans are still much better at:

Relationship management.

Judgment-heavy communication.

Sensitive interpersonal situations.

Ambiguous prioritization.

Physical-world coordination.

Representing you in social or business contexts.

For many business owners, the best use case is not replacing a human assistant. It is taking the recurring digital work off the owner’s plate before hiring, or making an existing team much more effective.

Example monthly budgets

What a setup might cost in practice.

Lean personal assistant

~$35–$100/month

This might include:

Generic VPS.

One AI subscription or low-cost model access.

Basic messaging access.

Light scheduled jobs.

Limited integrations.

Good for experimentation, personal productivity, and technical users who can manage their own server.

Serious solo/business-owner assistant

~$100–$500/month

This might include:

A more reliable VM.

Better model access.

Web search or browser tools.

Email/calendar/file integrations.

Basic monitoring and backups.

Occasional maintenance.

Good for owners, executives, and operators who want an assistant that can handle recurring workflows.

Heavy agentic workflow setup

$500–$2,000+/month

This might include:

Premium models.

Large document analysis.

Long-context reasoning.

Browser automation.

Frequent scheduled jobs.

Multiple integrations.

Higher usage volume.

Ongoing workflow improvement.

Good for teams that rely on the assistant heavily and want it involved in research, reporting, coding, operations, or customer-facing workflows.

Managed implementation

Upfront setup plus monthly support

If someone implements the private assistant for you, the monthly cloud and model cost is only part of the picture.

You are also paying for:

System design.

Deployment.

Security.

Integrations.

Workflow design.

Testing.

Training.

Support.

Maintenance.

This is the category Norse Computer focuses on: not just giving you access to an AI model, but launching a private AI assistant that works inside your business.

Norse Computer

How Norse Computer approaches private AI assistant cost.

Norse Computer builds private AI assistants for business owners and executives. The goal is not to give you the cheapest possible chatbot. The goal is to deploy a useful assistant that can work across your actual business systems.

A typical private assistant by Norse Computer may involve:

Hermes Agent as the core agent framework.

A VM in an appropriate cloud environment.

A practical baseline of at least 2 CPUs and 6 GB of memory.

Google Cloud, AWS, or a VPS depending on the customer’s needs.

Model access through the right provider setup.

Configuration for email, files, calendar, internal tools, or other business systems.

Initial workflow design.

Ongoing maintenance or support if needed.

The model cost can vary drastically depending on whether the assistant uses subscription-backed access or per-token API billing.

For some customers, a subscription-style setup may keep monthly AI costs predictable. For others, direct API billing may be better because it gives clearer usage tracking, stronger production controls, and more flexible model routing.

The right answer depends on:

How much the assistant will be used.

Which systems it needs to access.

How sensitive the data is.

Whether the assistant needs to run 24/7.

Whether it needs scheduled jobs.

Whether it will use browser automation.

Whether it needs premium reasoning models.

Whether someone internal can maintain it.

Hidden costs to watch for

Questions that matter more than the sticker price.

Before choosing a private AI assistant setup, ask about these hidden costs.

Does the assistant need a persistent VM?

Are storage, backups, logs, and snapshots included?

Is model usage included or billed separately?

Is API usage separate from subscription usage?

Are web search and browser automation included?

Are there rate limits?

How many skills and tools are enabled, and how much do they add to each model call?

What happens if the model provider is down?

Who maintains OAuth credentials and API keys?

Who updates the server?

Who reviews failed tasks?

Who improves workflows after launch?

Can the assistant be moved to another cloud later?

What data is sent to external AI providers?

Is the model self-hosted, or is the assistant calling a third-party model?

Are there human review steps for sensitive actions?

These questions matter more than the sticker price.

A cheap assistant that cannot safely access your systems is not very useful. An expensive assistant that is overbuilt for your needs is also wasteful.

The best setup is the one that matches the value of the workflows you want to delegate.

Bottom line

Four cost layers, not one price.

The biggest mistake is comparing a private AI assistant directly to a $20/month chatbot subscription. A private AI assistant usually has four cost layers:

The VM or hosting environment

The AI model

The tools and integrations

The setup and maintenance

For a lightweight setup, the cost may be under $100/month. For a serious business assistant, the monthly cloud and model cost may be a few hundred dollars, and professional setup may cost several thousand dollars.

A chatbot gives you access to an AI model. A private AI assistant gives that model a place to run, access to your business context, tools to take action, memory across sessions, and workflows designed around your work.

That is what you are paying for.
