If you haven’t tried to ship or integrate a custom chatbot into your product, you could be forgiven for believing it’s a fully solved problem. By now you’ve probably traded one or two or hundreds of thousands of messages with these new artificial parrots, and you’ve undoubtedly noticed: they’re pretty damn good. So… build your own? Why?
Well, for one, it’s a remarkably honest litmus test of both AI capabilities and your ability to compose them into a real product that users touch. The architecture you end up with is, fundamentally, the architecture of any modern application enabled by intelligence: agents, copilots, recommendation systems, autonomous workflows. Build a real chatbot and you’ve roughly built the skeleton of all of them.
For me, besides it being my day job to dogfood the Databricks platform, I happen to have a perfect repeatable use case every year: the official chat assistant for our annual Data + AI Summit.
This chat assistant is called Brickbot. It lives inside our conference mobile app, and in a few weeks about 30,000 attendees will use it to find sessions, plan their day, confess their deepest conference anxieties, and of course ask many small but important questions like: what is the Wi-Fi password? Where is the help desk? And, why is my lunch cold? A chatbot like this is about as close as you can get to a Hello World for Databricks’ suite of features. To build a real one well you’ll have to traverse most of our product surface area: data ingestion, governance, context retrieval, model serving, inference tracing, prompt management, deployment, etc.
But leveraging this nice list of Databricks’ capabilities for a chat assistant is only the way it looks this year. Brickbot has actually been shipping every year since 2024, and the path to “fully native on Databricks” has been less elegant than I’d like, mostly because of how fast this field moves. The upside though is that because our chat assistant is fairly stable in its feature set, every summer I get a fresh perspective on how much things have changed over the last 12 months.
In 2024, when we first shipped Brickbot as a last-minute idea, it went out as a pretty standard Python web app with a chat interface that used Llama 3 to build its responses. It was still pretty early as far as today’s tool calling capabilities go, so all our context (“RAG”) building happened manually via a dedicated intent router. The classification itself was a call to a much smaller LLM that identified whether the user was interested in sessions, exhibitors, FAQs, etc. The routing logic was just basic hand-written Python control flow because the custom agent frameworks were too complicated to bother with. The underlying event data was quite small and slow moving, so we just pulled it from our external vendor (RainFocus) each time (~daily) we redeployed the application in our Heroku pipeline. Our biggest takeaway at the time was something that many teams were independently discovering: “vector” search is cool, but mostly just in theory. The best results came from good old-fashioned BM25 search, which meant no fancy embedding stores… just DuckDB and some duct tape.
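For readers who haven’t met BM25, here is a minimal sketch of the scoring idea in plain Python. It is not Brickbot’s actual code (that ran on DuckDB); the whitespace tokenizer, the sample session titles, and the parameter defaults k1 and b are illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical session titles, for illustration only.
SESSIONS = [
    "Intro to Delta Lake and table formats",
    "Building agents with tool calling",
    "Keynote: the state of data and AI",
]
```

Calling bm25_rank("tool calling agents", SESSIONS) puts the second title first, because it is the only one that contains any of the query terms. No embeddings are involved, which is why this kind of search fits in a small embedded database.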
The following year, 2025, short on time as usual, we opted to redeploy more or less the exact same application, but, this time, with a truly rare win: more features with less code. In fact, our PRs in 2025 were mostly just deleting code from 2024. This was pretty much entirely thanks to how much the models improved in that 12-month span. We removed all the manual intent routing and integrated our already existing context retrieval functions directly into Llama 4’s tool calling. While we were at it, we added personalization, which was also quite trivial now that we had reliable tool calling. All authentication was resolved outside the LLM so, luckily, we didn’t struggle with any of the common (and dangerous!) auth on-behalf-of issues.
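The pattern of tool calling with authentication resolved outside the LLM can be sketched roughly like this. Everything here is hypothetical (the tool names, the data, the shape of the tool call); the point is only that the model picks the tool and its arguments, while the application injects the already-authenticated user.

```python
import json

# Hypothetical data standing in for the real retrieval functions.
SESSIONS = {"s1": "Intro to Delta Lake", "s2": "Building agents"}
SCHEDULES = {"alice": ["s2"]}


def search_sessions(query):
    """Return session titles containing the query (case-insensitive)."""
    return [t for t in SESSIONS.values() if query.lower() in t.lower()]


def my_schedule(user_id):
    """Return the session titles saved by one attendee."""
    return [SESSIONS[s] for s in SCHEDULES.get(user_id, [])]


# Tool descriptions handed to the model. my_schedule exposes no user_id
# parameter, so the model has no way to ask for someone else's data.
TOOL_SPECS = [
    {"name": "search_sessions", "parameters": {"query": "string"}},
    {"name": "my_schedule", "parameters": {}},
]

TOOLS = {"search_sessions": search_sessions, "my_schedule": my_schedule}
NEEDS_USER = {"my_schedule"}


def dispatch(tool_call, authenticated_user_id):
    """Run one model-requested tool call on behalf of the logged-in user."""
    name = tool_call["name"]
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    args = json.loads(tool_call.get("arguments") or "{}")
    # Never trust an identity supplied by the model.
    args.pop("user_id", None)
    if name in NEEDS_USER:
        # Identity comes from the app session, resolved before the LLM runs.
        args["user_id"] = authenticated_user_id
    return {"result": TOOLS[name](**args)}
```

Even if the model emits {"user_id": "bob"} as an argument, dispatch drops it and uses the session’s user instead, which is the whole trick: the intent router disappears, but identity never passes through the model.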
The lesson from two years of Brickbot was that LLMs were steadily subsuming more and more functionality that had previously been the domain of manual code and infrastructure. So, where exactly was Databricks in this picture? Well, besides some conversation tracing using MLflow (Databricks Experiments) and then the actual LLM inference… nowhere! At the end of 2024 I wrote the following in our internal recap, as the gap I wanted to close in 2025:
More Databricks Native: We envision next year’s Brickbot built and deployed fully on Databricks. This will enable us to dogfood more of our AI and data offerings, showcase the power of our platform in a DAIS talk (and in other content), and involve other teams in using Databricks to develop an end-to-end production application.
And, predictably, 2025 came and went without closing that gap. Databricks had shipped a number of amazing features, but they were too new, Brickbot’s scale too small, and our dev time too short to give us any hope of meaningfully experimenting, integrating, and then actually demoing live.
Luckily, this year all of that has changed, and the new and improved Brickbot is now fully native on Databricks. I’ll be talking about the reasons why in depth in my presentation at this year’s Data + AI Summit, but, for now, here’s a brief breakdown of how our little bot maps across the Databricks ecosystem:
Ingestion is facilitated by Databricks Jobs which pull data from the RainFocus REST APIs every 15 minutes and write the raw responses into Delta tables. There’s no need for real time because the data changes very infrequently, and we simply overwrite the existing data (with some circuit breakers in case the API fails) instead of mucking around with intricate change data capture patterns (it’s small!). In previous years this was just Python code as part of our deployment pipeline.
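The overwrite-with-circuit-breaker idea can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the actual job: the fetch callable, the min_ratio threshold, and the specific sanity checks are all hypothetical, and the real version writes to Delta tables rather than returning lists.

```python
def safe_overwrite(current_rows, fetch, min_ratio=0.5):
    """Return (rows_to_keep, status).

    Overwrite the existing snapshot only if the new one looks sane;
    otherwise keep serving the old data.
    """
    try:
        new_rows = fetch()
    except Exception as exc:
        # Breaker 1: the API call itself failed.
        return current_rows, f"kept old data: fetch failed ({exc})"
    if not new_rows:
        # Breaker 2: an empty response would wipe the table.
        return current_rows, "kept old data: empty response"
    if current_rows and len(new_rows) < min_ratio * len(current_rows):
        # Breaker 3: a sharp drop in row count suggests a partial response.
        return current_rows, "kept old data: row count dropped sharply"
    return new_rows, "overwrote"
```

With small, slow-moving data, a full overwrite guarded by a few cheap checks like these is much easier to reason about than incremental change capture, and a bad API response simply leaves yesterday’s data in place.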
Transformation is done using Lakeflow Spark Declarative Pipelines. Since the event data includes unapproved sessions, stale speaker entries, test data etc., this is where we filter everything out and produce our final production tables ready for querying. This was fully coupled to ingestion in previous years. The benefit might look elusive for such small data, but, just a week ago we had a team ask for the event data for integration into their products, and it took 2 minutes to set up a Delta Share to enable them. This is the kind of stuff that could hold up a large enterprise for months.
Governance is entirely done by Unity Catalog, which is an infinite improvement over previous years which had a bus factor of 1 (my laptop had all the data and deployment keys). Unity Catalog manages all of our RainFocus connections, our data catalog, our function calling tools (free MCP endpoints), our prompts, and even our monitoring data. In my view, Unity Catalog is the central nervous system of the Databricks platform.
Search & Retrieval leverages Databricks’ AI Search (formerly Vector Search), which has improved massively over previous years. The key feature for us was hybrid search, which combines vector-based search and classic BM25 in the same query, so it was a direct drop-in for our existing retrieval code.
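To give a feel for what “hybrid” means, here is a minimal sketch of reciprocal rank fusion, one common way to merge a keyword ranking with a vector ranking. It is offered as an illustration of the general technique, not as a description of how AI Search fuses results internally; the session IDs and the constant k=60 are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list.

    Each list contributes 1 / (k + rank) for every ID it contains, so
    items ranked well by more than one retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from the two retrievers for the same query.
keyword_hits = ["s2", "s1", "s3"]   # e.g. from BM25
vector_hits = ["s3", "s2", "s4"]    # e.g. from embedding similarity

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Here s2 and s3 appear in both lists and end up ahead of s1 and s4, which each appear in only one. The fusion only needs ranks, not comparable scores, which is why it works across two very different retrievers.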
Agents are a net new addition to Brickbot this year. We configured a Genie Space directly on our production tables and exposed it via tool calling and the REST API, so our main chat could hand off an entirely new class of queries that we simply couldn’t support in previous years with basic retrieval. This year Brickbot is effectively fully capable of natural language to SQL.
There’s a lot more to every one of these areas… what each looked like in practice, where it broke, what we’d do differently next year. I’m hoping to write some of that up in more detail soon. For now, the long version is the talk: Duct Tape to Databricks: 3 Years of Brickbot, Wednesday June 17, 2:00 PM.