Build Vs Buy: Choosing the Right Approach for Your AI Knowledge Assistant

Table of Contents

Volume and variety of data sources

Volume

Variety

Answer quality

Maintenance and upkeep

Privacy and permissions

Accessibility

TL;DR: What’s the best option for your business?


Combine economic tightening with the advent of AI, and it’s no surprise that companies are increasingly interested in getting their existing knowledge to work harder. When you see ChatGPT effortlessly answer complex LSAT questions, the natural next step is to wonder whether an AI assistant can do the same thing with company data.

Given privacy concerns and the accessibility of LLMs and plug-and-play vector DBs, you may want to take a stab at building an AI assistant yourself.

Say you run a hackathon, and after a day or two of hacking away, you're already seeing amazing results. By adding some key company docs to a vector DB, fetching the nearest neighbors to a user's query, and sending them off to an OpenAI API, you appear to be 90% of the way there.
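The hackathon prototype described above really can be sketched in a few dozen lines. The version below is a minimal illustration, not a production design: it assumes a toy bag-of-words vector in place of a real embedding model and a plain in-memory list in place of a vector DB, and it stops at building the prompt, because the actual LLM call depends on your provider’s SDK. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_neighbors(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Brute-force similarity search; vector DBs typically use an approximate index instead.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    numbered = "\n".join(f"[{i}] {passage}" for i, passage in enumerate(context, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )

docs = [
    "Employees accrue 20 days of paid time off per year.",
    "The expense policy requires receipts for purchases over $50.",
    "The office is closed on public holidays.",
]
question = "How many days of paid time off do I get?"
prompt = build_prompt(question, nearest_neighbors(question, docs, k=1))
print(prompt)
# Next step (not shown): send `prompt` to an LLM API and return its reply.
```

Everything this section goes on to discuss (scale, freshness, permissions, identity) is exactly what this sketch leaves out.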

But before you decide to productionize the entire system, there are several factors to consider.

Under certain conditions, building your own assistant is probably the right choice (more on that later). But for most companies, buying an off-the-shelf solution is better. It will likely be more accurate, cheaper, and much more time-efficient in the long run.

Ultimately, the question isn’t whether you can build a workable company knowledge assistant. It’s whether building one in-house will allow you to fully harness AI’s potential impact on your business.

(Disclaimer: The guidance in this post assumes you are considering a vector-DB-backed retrieval-augmented generation (RAG) reference architecture. Dashworks’ solution differs: it makes live API calls at query time to avoid indexing any data. Relatedly, since I’m an engineer at Dashworks, it would be natural to assume that I’m biased, but I’ve tried to be as objective as possible in the analysis below.)

Volume and variety of data sources

Volume

The more information you have in your knowledge base, the more sophisticated your tools need to be to quickly surface the most relevant content. If your knowledge assistant only has to search through 100 or so static documents, it probably won’t be that hard to find matches, especially for highly specific keywords.

As the number of documents in your repository increases, so does the ratio of searchable content to what you can fit in a single prompt. You’ll therefore need a knowledge assistant capable of finding the proverbial needle in the haystack: the information most relevant to your query. And because new content is being created all the time, you’ll also want a solution smart enough to recognize what’s outdated and to resolve conflicts that arise as documents are updated over time.
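Both problems in this paragraph tend to show up in a context-selection step that runs after retrieval: drop stale versions of a document, then fit the best remaining passages into a limited prompt budget. The sketch below is one simple way to do that, assuming each retrieved passage carries a relevance score and a last-updated timestamp, that “newest version wins” is an acceptable conflict rule, and that word count is a good-enough stand-in for a real tokenizer. `Passage` and `select_context` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Passage:
    doc_id: str           # which document the passage came from
    updated_at: datetime  # last-modified time of that document version
    text: str
    score: float          # relevance score from the retriever

def approx_tokens(text: str) -> int:
    # Crude proxy; a real system would count tokens with the model's tokenizer.
    return len(text.split())

def latest_versions_only(passages: list[Passage]) -> list[Passage]:
    # If several versions of a document were retrieved, keep only passages from the newest one.
    newest: dict[str, datetime] = {}
    for p in passages:
        if p.doc_id not in newest or p.updated_at > newest[p.doc_id]:
            newest[p.doc_id] = p.updated_at
    return [p for p in passages if p.updated_at == newest[p.doc_id]]

def select_context(passages: list[Passage], budget: int) -> list[Passage]:
    # Greedily add the highest-scoring passages that still fit in the prompt budget.
    selected, used = [], 0
    for p in sorted(latest_versions_only(passages), key=lambda p: p.score, reverse=True):
        cost = approx_tokens(p.text)
        if used + cost <= budget:
            selected.append(p)
            used += cost
    return selected

passages = [
    Passage("handbook", datetime(2023, 1, 1), "PTO is 15 days per year", 0.9),
    Passage("handbook", datetime(2024, 1, 1), "PTO is 20 days per year", 0.8),
    Passage("wiki", datetime(2024, 3, 1), "Holidays are listed on the shared team calendar", 0.5),
]
print([p.text for p in select_context(passages, budget=10)])
```

Note that the older handbook passage is discarded even though it scored higher; real conflict resolution is rarely this clean, since conflicting facts often live in different documents rather than different versions of the same one.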

Variety

If you’re using a lot of common business tools, like Asana, HubSpot, and the Google or Microsoft suites, a pre-built knowledge assistant that readily integrates with them will likely meet your needs.

Conversely, if your organization uses lots of niche, custom-built applications, you may require a purpose-built solution. That said, don’t rule out a provider just because they don’t appear to integrate with all your tools. At Dashworks, we work closely with companies to meet their needs. If you don’t see a specific integration, just get in touch, and we’ll see what we can do.
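If you build instead, each niche or custom-built application typically needs its own connector that pulls its content and normalizes it into a common shape the rest of the pipeline understands. The sketch below shows one minimal form such an interface could take, assuming a simple full fetch; `Document`, `Connector`, and `CustomAppConnector` are hypothetical, and the record fields are made up. Real connectors also have to handle authentication, pagination, rate limits, and permissions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol

@dataclass
class Document:
    source: str
    doc_id: str
    title: str
    text: str

class Connector(Protocol):
    # Any object with a `fetch` method that yields Documents satisfies this interface.
    def fetch(self) -> Iterator[Document]: ...

class CustomAppConnector:
    """Hypothetical connector for an in-house app; field names are illustrative."""

    def __init__(self, records: list[dict]):
        # In practice these records would come from the app's own API or database.
        self.records = records

    def fetch(self) -> Iterator[Document]:
        for r in self.records:
            yield Document(source="custom-app", doc_id=str(r["id"]),
                           title=r["name"], text=r["body"])

def ingest(connectors: Iterable[Connector]) -> list[Document]:
    # Every source is normalized to the same Document shape before chunking and indexing.
    return [doc for connector in connectors for doc in connector.fetch()]

docs = ingest([CustomAppConnector([{"id": 7, "name": "Runbook", "body": "Restart the worker."}])])
print(docs[0])
```

The work scales with the number of tools: every new source means another connector to write, test, and keep working as its API changes.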

Answer quality

Response accuracy is the metric your users will care about most when interacting with your assistant. There is only so far you can go with prompt tuning, so it’s important to understand the long tail of challenges in achieving consistently high response accuracy:

Parsing queries: For natural language questions, you’ll need to separate a single query into its component parts, understand user intent, and understand domain-specific language.
Surfacing the latest data: As content changes over time, it’s crucial to build intermediate components that can feed an LLM the latest information.
Content parsing: This requires chunking text and parsing different content types such as text, PDFs, and spreadsheets.
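Chunking is the easiest part of content parsing to illustrate. The sketch below assumes the content has already been extracted to plain text (PDFs and spreadsheets need format-specific extraction first, which is not shown) and measures chunk size in words rather than model tokens; the default sizes are arbitrary. Overlapping windows are one common choice, so that text near a boundary appears in two adjacent chunks and is less likely to lose its surrounding context. `chunk_words` is a hypothetical name.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks

print(chunk_words(" ".join(str(i) for i in range(10)), size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Fixed-size windows ignore document structure; splitting on headings, paragraphs, or spreadsheet rows usually gives more coherent chunks, at the cost of per-format logic.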

Even after you build the above capabilities, your assistant won’t get it right 100% of the time, so it’s important to let your users steer future results. Building the feedback mechanisms that improve your system over time, commonly referred to as “human-in-the-loop” training, can have a similarly long tail. Some examples of features you may want to build (which you get out of the box with a pre-built system):

Prioritizing certain content for certain types of queries, e.g. answering finance-related questions from a specific source.
Highlighting factual inaccuracies.
Guiding the assistant to avoid answering on sensitive internal topics.
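The first and third features above often start out as admin-configured rules applied after retrieval. The sketch below assumes simple keyword matching, which is brittle; a production system would more likely use a query classifier. `Rule`, `RULES`, `apply_rules`, and the source names are all hypothetical. Highlighting factual inaccuracies needs a different mechanism (storing corrections against specific answers or sources) and is not shown.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    keywords: set[str]
    preferred_source: Optional[str] = None  # boost results from this source
    block: bool = False                     # decline queries that match

# Hypothetical rules an admin might configure in response to user feedback.
RULES = [
    Rule(keywords={"budget", "expense", "invoice"}, preferred_source="finance-wiki"),
    Rule(keywords={"layoffs", "salaries"}, block=True),
]

def apply_rules(query, results, rules=RULES, boost=0.5):
    """results: list of (source, text, score). Returns re-ranked results, or None to decline."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    matching = [r for r in rules if r.keywords & words]
    if any(r.block for r in matching):
        return None  # caller should reply that it can't answer on this topic
    preferred = {r.preferred_source for r in matching if r.preferred_source}
    rescored = [(src, text, score + boost if src in preferred else score)
                for src, text, score in results]
    return sorted(rescored, key=lambda r: r[2], reverse=True)

results = [("drive", "Q3 planning doc", 0.7), ("finance-wiki", "Expense policy", 0.4)]
print(apply_rules("What is the expense limit?", results)[0])
```

Each rule like this is easy on its own; the long tail comes from maintaining hundreds of them and from queries that match none.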

Eventually, you will need to set up data pipelines and regular ML training jobs to make sure the system does not degrade over time.
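Whatever the training setup looks like, you also need a way to notice degradation. One common pattern is a scheduled job that replays a fixed “golden” set of questions and compares accuracy against a baseline. The sketch below assumes that substring matching is an acceptable grader (it is a crude one; teams often use human review or model-based grading) and uses a fake assistant so it runs on its own. All names are hypothetical.

```python
from typing import Callable

def accuracy(answer: Callable[[str], str], golden: list[tuple[str, str]]) -> float:
    # Crude grading: an answer counts as correct if it contains the expected phrase.
    hits = sum(1 for question, expected in golden if expected.lower() in answer(question).lower())
    return hits / len(golden)

def passes_regression_check(answer, golden, baseline: float,
                            tolerance: float = 0.02) -> tuple[bool, float]:
    # Fail the job if accuracy drops more than `tolerance` below the last accepted baseline.
    score = accuracy(answer, golden)
    return score >= baseline - tolerance, score

golden = [
    ("How many PTO days do I get?", "20 days"),
    ("Who approves expense reports?", "your manager"),
]

def fake_assistant(question: str) -> str:
    # Stand-in for the real assistant under test.
    return "You get 20 days per year." if "PTO" in question else "Ask the finance team."

ok, score = passes_regression_check(fake_assistant, golden, baseline=0.9)
print(ok, score)
```

The golden set itself has to be maintained as your content changes, which is part of the ongoing cost.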

If you do have the in-house machine learning talent to support the above operations, you can probably build your own assistant. In all other cases, it’s best to go with a pre-built one.

Maintenance and upkeep

Your knowledge base is a living and ever-growing repository, and your solution needs to be able to keep up.

For example, as you create new content or onboard new apps, you’ll need a way to keep that data in sync. If you’re using an in-house vector database solution, you’ll want a plan in place for migrating content over, updating metadata, and so on. The general trade-off is that the fresher your data needs to be, the more effort it will take to build your solution.
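With a vector database, staying in sync usually comes down to working out what was added, changed, or deleted at the source since the last run. The sketch below assumes you store a content hash alongside each document’s vectors and can list every document’s current text from the source, which gets expensive for large workspaces; some source APIs offer change feeds or modified-since filters that avoid a full scan. How often you run a job like this is effectively the freshness/effort dial described above. `plan_sync` is a hypothetical name.

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_sync(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """indexed: doc_id -> hash stored with the vectors at last sync.
    current: doc_id -> latest text fetched from the source app."""
    current_hashes = {doc_id: content_hash(text) for doc_id, text in current.items()}
    return {
        # New documents: chunk, embed, and insert.
        "add": sorted(current_hashes.keys() - indexed.keys()),
        # Edited documents: delete their old chunks, then re-embed.
        "update": sorted(d for d in current_hashes.keys() & indexed.keys()
                         if current_hashes[d] != indexed[d]),
        # Documents removed at the source: delete their vectors so stale answers disappear.
        "delete": sorted(indexed.keys() - current_hashes.keys()),
    }

indexed = {"a": content_hash("old text"), "b": content_hash("unchanged")}
current = {"a": "new text", "b": "unchanged", "c": "brand new"}
print(plan_sync(indexed, current))
```

Metadata changes (renamed files, moved folders, changed sharing settings) would not change the text hash, so they need their own tracking.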

On the other hand, if your database is static or changes infrequently, you’ll spend less effort keeping it fresh. In that case, a build-your-own assistant could work just fine.

Another point to consider when building your own solution is the ongoing engineering effort required to support it, including tasks like deploying security updates and fixing bugs. (If you go with an outsourced solution, this work is handled by your provider.)

Privacy and permissions

Whether it’s a super-secret product launch or sensitive employee data, a lot of company information isn’t meant to be seen by all employees. Most enterprise applications have their own custom permissioning model, and you’ve likely invested time into making that work for your requirements.

Depending on what you’re using your knowledge assistant for, you’ll need to figure out the right level of information access for it. For example, if the purpose is to answer questions about company HR policy, then access to all widely shared HR docs works just fine.

On the other hand, if you want your assistant to produce team- or person-specific answers, permissioning will be more complicated, and likely harder to handle in-house. It is a deceptively difficult problem that you should think carefully about before building any custom solutions (trust me, I spent a year of my life thinking through a similar problem, culminating in this talk).
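At minimum, a permission-aware assistant has to filter retrieved documents against the asking user’s access before anything reaches the LLM prompt; once restricted text is in the prompt, it can leak into the answer. The sketch below deliberately assumes away the hard part: it pretends each app’s permission model has already been flattened into simple user and group allow lists, whereas real models include nested groups, inherited folder permissions, link sharing, and frequent changes. `Doc`, `can_view`, and `filter_for_user` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_users: frozenset[str] = frozenset()
    allowed_groups: frozenset[str] = frozenset()

def can_view(doc: Doc, user: str, user_groups: set[str]) -> bool:
    # Visible if the user is named directly or belongs to any permitted group.
    return user in doc.allowed_users or bool(doc.allowed_groups & user_groups)

def filter_for_user(docs: list[Doc], user: str, user_groups: set[str]) -> list[Doc]:
    # Apply to retrieved results *before* they are placed in the LLM prompt.
    return [d for d in docs if can_view(d, user, user_groups)]

docs = [
    Doc("hr-handbook", "PTO policy...", allowed_groups=frozenset({"all-staff"})),
    Doc("launch-plan", "Codename details...", allowed_users=frozenset({"alice@example.com"})),
]
print([d.doc_id for d in filter_for_user(docs, "bob@example.com", {"all-staff"})])
```

Filtering after retrieval also interacts with ranking: if most top results are filtered out, the user may get a weak answer unless you retrieve more candidates or filter inside the search itself.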

Accessibility

Finally, the last thing to consider is where your employees will be accessing their assistant. No one wants to have another tool added to their already-crowded tech stacks. Ideally, your assistant will be accessible wherever work is already happening – either in your company’s instant messaging tool or within a browser tab.

The core of your product will be an API with which various frontends can communicate. One challenge here is identifying who is asking a query, given that the same person can have different identities across platforms. Our solution involves plugging into existing company directories and mapping each user’s identity on every platform to their directory identity.
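A minimal sketch of that kind of identity mapping is below. It is not Dashworks’ implementation; it assumes each platform can report the email address on a user’s profile and that email is a reliable join key with the directory (shared mailboxes, aliases, and guest accounts break that assumption). `DirectoryUser`, `IdentityResolver`, and the platform names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DirectoryUser:
    employee_id: str
    email: str

class IdentityResolver:
    """Maps (platform, platform user ID) pairs to a single directory identity."""

    def __init__(self, directory: list[DirectoryUser]):
        self._by_email = {u.email.lower(): u for u in directory}
        self._resolved: dict[tuple[str, str], DirectoryUser] = {}

    def resolve(self, platform: str, platform_user_id: str,
                email: Optional[str] = None) -> Optional[DirectoryUser]:
        key = (platform, platform_user_id)
        if key in self._resolved:
            return self._resolved[key]
        # Assumption: the platform exposes the user's email, and it matches the directory.
        user = self._by_email.get(email.lower()) if email else None
        if user is not None:
            self._resolved[key] = user  # remember the link for later requests
        return user

resolver = IdentityResolver([DirectoryUser("emp-001", "Alice@Example.com")])
from_chat = resolver.resolve("chat", "U123", "alice@example.com")
from_web = resolver.resolve("web", "session-9", "ALICE@example.com")
print(from_chat == from_web)
```

Once every request resolves to one directory identity, the permission checks and personalization discussed earlier can key off that single ID regardless of which frontend the question came from.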

If you want to build your own, you will need a team capable of building across different frontend platforms (web, mobile, IM) and an existing HR directory with which you can integrate.

TL;DR: What’s the best option for your business?

Ultimately, it’s not that hard to build a simple knowledge assistant in-house that is capable of quickly searching within company documents.

But, as you can see from the above analysis, an assistant that spans multiple use cases, produces consistently accurate results, and is personalized, private, and accessible is a much more complex endeavor.

(PS: If you’re in the market for an off-the-shelf knowledge assistant, check out Dash AI. Built using secure real-time API integrations to your tech stack, Dash AI is instantly accessible, accurate, and affordable.)

