How we managed to build our open-source content library crazy fast
#nlp #machinelearning #datascience #opensource
Our newest project here at Kern AI is bricks, a library of modules with which you can enrich your NLP text data. The content library integrates seamlessly into our main tool, Kern AI refinery, but it also exposes the source code of every module, which gives you maximum control. All modules can also be tested by calling an endpoint via an API.
We managed to build this tool in just under two months, thanks in large part to the amazing team at Kern, but also in part to the capabilities of DigitalOcean's App Platform. We also sped up development by automating large parts of our content management system.
Let's first talk about the general structure of bricks and then dive into a little bit more detail!
How bricks is structured
bricks is built using four components:
- Frontend using Next.js, built with Tailwind UI
- Backend with Strapi and a managed PostgreSQL database
- Service to serve the live endpoints on bricks
- A separate search module to easily find modules
The overall design of bricks is meant to match the one used in refinery. The bricks UI is built with React and served through Next.js. We also used Tailwind for the UI elements.
For the backend, we use Strapi, which is an awesome open-source content management system. Strapi is connected to a PostgreSQL database to store all the content that is displayed on bricks. The frontend connects to the backend via an API to then display all the content.
Managing content with Strapi itself is super easy, but to make things even easier for us, we wrote an automation script that fetches new modules created for bricks and automatically adds them to Strapi. That's why the source code of a bricks module needs to follow a specific format: only then can the script pick it up and add it to Strapi.
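To give a rough idea of what such an automation step could look like, here is a minimal sketch. It is not our actual script: the expected format (a module docstring plus at least one top-level function), the field names, and the `module_to_payload` helper are all assumptions made for illustration. The `{"data": {...}}` wrapper follows the convention of the Strapi v4 REST API, which is also assumed here.

```python
import ast
import json


def module_to_payload(name: str, source: str) -> dict:
    """Turn one module's source code into a CMS-style payload.

    Hypothetical format: the module starts with a docstring that describes it
    and defines at least one top-level function.
    """
    tree = ast.parse(source)
    description = ast.get_docstring(tree)
    functions = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
    if description is None or not functions:
        # Modules that do not follow the format are rejected instead of guessed at.
        raise ValueError(f"module {name!r} is not in the expected format")
    # Field names are illustrative; the wrapper shape is assumed (Strapi v4 style).
    return {
        "data": {
            "name": name,
            "description": description.strip(),
            "entrypoint": functions[0],
            "sourceCode": source,
        }
    }


example_source = '''"""Detects the language of a text."""

def language_detection(text: str) -> str:
    return "en"
'''

payload = module_to_payload("language_detection", example_source)
print(json.dumps(payload["data"], indent=2))
```

A script like this would then send the payload to the CMS. The sketch stops before that step because the route and authentication depend on the Strapi setup.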
Every module can be tested right from bricks itself. On the right side of every module, you'll see a window that allows you to try out the module without the need to install anything.
Providing this was very important for us, as we want users to find out exactly what they get with every module and to test the module with some of their own data. The default input is usually some text, sometimes together with additional parameters for the endpoint.
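Calling a module from your own code could look roughly like the sketch below. The base URL, the route, and the `text` field name are placeholders and not the real bricks API, so please treat them as assumptions. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical base URL; the real bricks endpoints may look different.
BASE_URL = "https://example.com/api"


def build_request(module_path: str, text: str, **params) -> urllib.request.Request:
    """Build a JSON POST request for one module (payload shape is assumed)."""
    body = {"text": text, **params}
    return urllib.request.Request(
        url=f"{BASE_URL}/{module_path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("classifiers/language_detection", "Bonjour tout le monde")
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Additional parameters for a module would simply be passed as keyword arguments and end up next to `text` in the JSON body.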
To quickly find the right modules, we also built a custom search module. The search module uses a small transformer model to embed the names of all modules, so they can be searched very quickly.
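The idea behind embedding-based search can be sketched in a few lines. Note that this is only an illustration: to keep it dependency-free, the `embed` function below counts character trigrams as a stand-in for the transformer model, and the module names and the `ModuleSearch` class are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: counts of character trigrams (NOT a transformer)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ModuleSearch:
    def __init__(self, names: list[str]) -> None:
        # Embed every module name once up front, so a query only needs comparisons.
        self.index = [(name, embed(name)) for name in names]

    def search(self, query: str, top_k: int = 3) -> list[str]:
        query_vector = embed(query)
        ranked = sorted(self.index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [name for name, _ in ranked[:top_k]]


# Illustrative module names, not the actual bricks catalogue.
search = ModuleSearch(["language detection", "sentence complexity", "profanity detection"])
results = search.search("detect language", top_k=1)
print(results)
```

Because the names are embedded once when the index is built, each search only has to embed the query and compare it against the stored vectors.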
Let's now take a closer look at the technologies we used to quickly get bricks live.
Leveraging DigitalOcean's App Platform
The App Platform is a convenient and cheap way to deploy your web apps. Instead of deploying an app on a virtual machine that you'll have to manage yourself, the app will run in a Docker container. That way you don't have to think about the underlying infrastructure and also get the benefit of easy scalability. It's also a bit cheaper than hosting your app on a single VM. In the case of DigitalOcean, you also get the option to auto-deploy from a GitHub repository, which is super handy.
There are many cloud platforms out there offering such a service, but for our purposes, we chose to use DigitalOcean. This post is not sponsored by them; we just like their service a lot.
To get started with bricks, we used this tutorial on how to deploy Strapi to DigitalOcean. If you would like to use Strapi on DigitalOcean, we highly recommend checking it out as well, as it really helped us get started with the project.
Auto-deploy from a GitHub repository
To deploy on DigitalOcean, you can simply attach a GitHub repository, from which the app will automatically get deployed. In our case, we use the auto-deploy function for our endpoint service, so that new modules added to bricks will automatically get integrated.
But before we can do that, we first need to deploy our backend and frontend components. To keep things clear, we deploy them separately and also store the backend and frontend in separate repositories. DigitalOcean also allows you to connect your app to a managed database, which is super convenient.
Setting up a managed database
Before we can deploy the backend, we need a managed PostgreSQL database. DigitalOcean offers many different database types, but PostgreSQL is just fine for our needs. When deploying Strapi on DigitalOcean, you can also choose a cheaper dev database for your app. However, we had a lot of trouble getting that dev database to run, so we went straight for the managed database, which is meant to be used in production anyway.
Creating an App on DigitalOcean
Next up, we are going to create our first App on DigitalOcean. The app will host the Strapi backend of the site and will be connected to the managed PostgreSQL database we created in the previous step. Deploying the backend is fairly easy: you simply select the GitHub repo and the directory you want to deploy, and DigitalOcean handles the rest for you. You can also opt in to auto-deploy, and your app will be redeployed whenever there is a new change to your repository.
Creating a second App for the frontend
While it is technically possible to host the backend and the frontend on the same app, we chose not to do that. Setting up the frontend was much easier in a separate app, and apps are very cheap in general, so we would only have saved a few dollars by deploying both on the same app. We decided it was not worth the hassle. The frontend gets all its information from the backend via a simple API call, so the two don't need to be connected in any other way.
Building the second app for the frontend is essentially the same procedure as for the backend. You simply select the repository and the directory and let DigitalOcean do the work for you.
Deploying the endpoint app
Once backend and frontend are up and running, we need to deploy the service that is running our endpoints. Otherwise, a user would still be able to access bricks and check out the modules, but they couldn't directly try them out on the site itself.
The procedure is the same as before: connect your GitHub repository and deploy a containerized application via DigitalOcean. The endpoint service uses FastAPI to deliver the results of each endpoint to bricks. So far, a single service is enough to serve all of the 50+ endpoints we have available on bricks.
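One way a single service can serve many modules is a simple registry that maps a route-like name to the function implementing the module. The sketch below shows only that dispatch idea in plain Python: it leaves out the web layer (FastAPI in our case), and the module names, the `register` decorator, and the `handle` function are hypothetical and not taken from our code base.

```python
from typing import Callable, Dict

# Registry mapping a route-like name to the function that implements the module.
REGISTRY: Dict[str, Callable[[str], object]] = {}


def register(path: str):
    """Decorator that adds a module function to the registry under `path`."""
    def decorator(func):
        REGISTRY[path] = func
        return func
    return decorator


@register("generators/word_count")
def word_count(text: str) -> int:
    return len(text.split())


@register("classifiers/is_question")
def is_question(text: str) -> bool:
    return text.strip().endswith("?")


def handle(path: str, text: str) -> dict:
    """What a single web route could call for every module."""
    if path not in REGISTRY:
        return {"status": 404, "error": f"unknown module: {path}"}
    return {"status": 200, "result": REGISTRY[path](text)}


print(handle("generators/word_count", "bricks is fun"))
```

With a layout like this, adding a new module means adding one decorated function, which fits well with auto-deploying the service whenever the repository changes.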
Using bricks to quickly enrich datasets for NLP
We hope that you liked this insight into the structure behind bricks. You can try out bricks here to inspect the result for yourself.
If you have any questions or feedback you would like to share, feel free to reach out to us any time. Have fun using bricks!