Friday, August 29, 2025

Packaging ML Pipelines from Experiment to Deployment

As ML Engineers, we are usually tasked with solving some business problem with technology. Often it involves leveraging data assets that your organization already owns or can acquire. Typically, unless it is a very simple problem, there will be more than one ML model involved, maybe different types of models depending on the sub-task, maybe other supporting tools such as a Search Index, Bloom Filter, or third-party API. In such cases, these different models and tools would be organized into an ML Pipeline, where they would cooperate to produce the desired solution.

My general (very high level, very hand-wavy) process is to first convince myself that my proposed solution will work, then convince my project owners / peers, and finally to deploy the pipeline as an API to convince the application team that the solution solves the business problem. Of course, producing the initial proposed solution is a task in itself, and may need to be composed of multiple sub-solutions, each of which needs to be tested individually as well. So very likely the initial "proposed solution" is a partial bare-bones pipeline to begin with, which improves through successive iterations of feedback from the project and application teams.

In the past, I have treated these phases as largely disjoint, with each phase built (mostly) from scratch with a lot of copy-pasting of code from the previous phase. That is, I would start with notebooks (in Visual Studio Code, of course) for the "convince myself" phase, copy-paste much of the functionality into a Streamlit application for the "convince project owners / peers" phase, and finally do another round of copy-pasting to build the backend for a FastAPI application for the "convince application team" phase. While this generally works, folding iterative improvements into each phase gets messy, time-consuming, and potentially error-prone.

Inspired by some of my fellow ML Engineers who are more steeped in Software Engineering best practices than I am, I decided to streamline the process by making it DRY (Don't Repeat Yourself). My modified process is as follows:

Convince Yourself — continue using a combination of notebooks and short code snippets to try out sub-task functionality and compose sub-tasks into candidate pipelines. The focus is on exploring different options in terms of pre-trained third-party models and supporting tools, fine-tuning candidate models, understanding the behavior of the individual components and the pipeline on small subsets of data, etc. There is no change here; the process can be as organized or chaotic as you like. If it works for you, it works for you.

Convince Project Owners — in this phase, your audience is a set of people who understand the domain very well, are generally interested in how you are solving the problem, and in how your solution will behave in weird edge cases (which they have seen in the past and which you may not have imagined). They could run your notebooks in a pinch, but they would prefer an application-like interface with lots of debug information to show them how your pipeline is doing what it is doing.

Here the first step is to extract and parameterize functionality from my notebook(s) into functions. Functions represent individual steps in the multi-step pipeline, and should be able to return additional debug information when given a debug parameter. There should also be a function representing the entire pipeline, composed of calls to the individual steps. This is also the function that deals with optional / new functionality across multiple iterations through feature flags. These functions should live in a central model.py file that can be called from all subsequent clients. Functions should have associated unit tests (unittest or pytest).
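As an illustration, here is a minimal sketch of what such a model.py might look like. The step names (clean_text, classify), the use_classifier feature flag, and the return shapes are all hypothetical stand-ins for real pipeline components:

```python
# model.py -- a minimal sketch of the central pipeline module.
# The steps and feature flag below are placeholders, not a real pipeline.

def clean_text(text: str, debug: bool = False):
    """Step 1: normalize the input. Returns (result, debug_info)."""
    cleaned = " ".join(text.strip().lower().split())
    info = {"original_length": len(text), "cleaned_length": len(cleaned)} if debug else None
    return cleaned, info

def classify(text: str, debug: bool = False):
    """Step 2: a stand-in for a real model call."""
    label = "long" if len(text) > 20 else "short"
    info = {"input_length": len(text)} if debug else None
    return label, info

def run_pipeline(text: str, debug: bool = False, use_classifier: bool = True):
    """The single entry point that every client calls.

    use_classifier is an example of a feature flag gating optional / new
    functionality added across iterations."""
    debug_info = {}
    cleaned, step_info = clean_text(text, debug=debug)
    if debug:
        debug_info["clean_text"] = step_info
    label = None
    if use_classifier:
        label, step_info = classify(cleaned, debug=debug)
        if debug:
            debug_info["classify"] = step_info
    return {"text": cleaned, "label": label, "debug": debug_info if debug else None}
```

Every later client (Streamlit, batch report, FastAPI) then calls run_pipeline and nothing else, which is what keeps the process DRY.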

The Streamlit application should call the function representing the entire pipeline with debug enabled. This ensures that as the pipeline evolves, no changes need to be made to the Streamlit client. Streamlit provides its own unit testing functionality in the form of the AppTest class, which can be used to run a few inputs through the app. The focus is on ensuring that the app does not fail when run non-interactively, so it can be run on a schedule (perhaps by a GitHub Action).

Convince Project Team — while this is similar to the previous step, I think of it as having the pipeline evaluated by domain experts on the project team against a larger dataset than was feasible in the Streamlit application. We do not need as much intermediate / debugging information to illustrate how the process works. The focus here is on establishing that the solution generalizes to a sufficiently large and diverse set of data. This phase should be able to leverage the functions in the model.py file we built in the previous phase. The output expected at this stage is a batch report, where you call the function representing the pipeline (with debug set to False this time) and format the returned value(s) into a file.
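The batch report can then be a short script along these lines. run_pipeline is stubbed inline to keep the sketch self-contained; in practice it would be imported from model.py, and the inputs would come from the evaluation dataset rather than a hard-coded list:

```python
# batch_report.py -- run a dataset through the pipeline and write a TSV report.
import csv

def run_pipeline(text: str, debug: bool = False):  # stub for model.run_pipeline
    return {"text": text, "label": "long" if len(text) > 20 else "short"}

def write_batch_report(inputs, out_path):
    """Run every input through the pipeline (debug=False) and write one row each."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["input", "label"])
        for text in inputs:
            result = run_pipeline(text, debug=False)
            writer.writerow([text, result["label"]])

# In practice the inputs would be loaded from the evaluation dataset.
write_batch_report(["short one", "a considerably longer input string"], "report.tsv")
```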

Convince Application Team — this would expose a self-describing API that the application team can call to integrate your work into the application solving the business problem. This is again just a wrapper around your function call to the pipeline with debug set to False. Having this up as early as possible allows the application team to start working, as well as give you valuable feedback around inputs and outputs, and point out edge cases where your pipeline might produce incorrect or inconsistent results.

I also used the requests library to build unit tests for the API; the objective is just to be able to verify from the command line that it does not fail.

There is likely to be a feedback loop back to the Convince Yourself phase from each of these phases as inconsistencies are observed and edge cases are uncovered. These may result in additional components being added to or removed from the pipeline, or their functionality being modified. Such changes should ideally only affect the model.py file, unless we need to add additional inputs, in which case they would also affect the Streamlit app.py and the FastAPI api.py.

Finally, I orchestrated all of this using Snakemake, which I learned about at the recent PyData Global conference I attended. This way I do not have to remember all the commands associated with running the Streamlit and FastAPI clients, running the different kinds of unit tests, etc, if I have to come back to the application after a while.
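For illustration, the Snakefile could contain rules along these lines (rule names, file names, and commands are all assumptions about the project layout):

```snakemake
# Snakefile -- one named rule per command I would otherwise have to remember.

rule streamlit_app:        # launch the Convince Project Owners demo
    shell: "streamlit run app.py"

rule api:                  # launch the FastAPI backend for the application team
    shell: "uvicorn api:app --reload"

rule unit_tests:           # model.py, Streamlit, and API unit tests
    shell: "python -m pytest test_model.py test_app.py test_api.py"

rule batch_report:         # the Convince Project Team deliverable
    input: "eval_data.tsv"
    output: "report.tsv"
    shell: "python batch_report.py {input} {output}"
```

Months later, something like snakemake unit_tests is much easier to recall than each individual command line.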

I tried this approach on a small project recently, and the process was not as clear-cut as I described; there was a fair amount of refactoring as I moved from the "Convince Project Owners" to the "Convince Application Team" phase. However, it feels much less like a chore than it did when I had to fold in iterative improvements using the copy-paste approach. I think it is a step in the right direction, at least for me. What do you think?
