Picking a Node.js Backend Framework in 2023

Deciding between express.js, fastify, nest.js, tRPC, and GraphQL/Apollo for a modern, fully type-safe, extensible node.js backend

I love express.js.

It's what introduced me, with open arms, to the wonderful land of node.js when I migrated from LAMP (Linux, Apache, MySQL, PHP) + jQuery, a long, long time ago.

Its API was so simple, yet so expressive - (req, res, next) => { ... } was all you needed. JavaScript's weak typing played really nicely with express's small, dynamic, and easily-understandable API. It was so extensible and sensible - if you plugged a middleware before routes, the middleware ran before routes; if you plugged a middleware before a controller, the middleware ran only for that controller. And its vast ecosystem, supported by its status as the node.js backend framework, made you feel really productive - just pull in anything if you need something; chances are, there's a library for it!

And I've seen people use that very simplicity to build abstraction layers and structures on top of it to make it more "scalable" (in the sense that you could use express for far larger and far more complicated projects, not just the typical 10-route microservice).

It truly scaled up to infinity, and scaled down to zero. It was perfect.

But it's really showing its age.

It's the year 2023. TypeScript is far more normalized than ever before (and its tooling far more mature than it has ever been), new, “established” methods of building node backends have popped up (and died) over the years, we’re supposed to package things in a “new” way now (require vs. import), we now have an actual choice in how the frontend talks to the backend (REST vs. GraphQL), and just… things have aged.

Things that were “state-of-the-art” are rotting. Bits and pieces of the ecosystem have slowly died away, or have been downright hijacked. People have moved to new and different stuff in an effort to replace the old (and the unmaintained).

And it’s really showing.

In my application (Blink, a link shortener), I am already struggling to dig my way out of the tech debt hole without completely rewriting things from scratch (which, at this point, seems like an inevitability).

Here are just some of the “hard-to-fix-without-a-rewrite” problems:

  • CJS to ESM transition (hint: it's not as simple as replacing require with import) - many libraries (cough Sindre Sorhus cough) now require ESM throughout your entire codebase (you can't have part ESM and part CJS - it's all or nothing), so you can't even update your libraries if your codebase is CJS (see the sketch after this list).
  • TypeScript migration (hint: it's not as simple as renaming .js files to .ts) - you must first build the tooling and the build system around it, and ensure that whatever you're writing "plays well" with typescript (so that you don't have to make up types 5000 times for a single object) and that the types get passed around nicely.
  • Adding a public schema (hint: it's not as simple as writing a schema and calling it a day) - to avoid having to write what are essentially duplicates of the schema over and over again, you need to make sure that all of the schema's consumers have integrations to consume it "as-is", without needing to maintain a separate "form" of the schema that the consumers require.
  • Getting rid of old, deprecated, and/or unmaintained libraries (that are core to how the framework functions) - without an ecosystem that is actively being developed, it's really hard to "replace" one part without breaking everything, because of the convention that holds them together. And these libraries lie at the core of express's ecosystem, to boot!
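
To make the first point concrete, here's a minimal sketch of the wall you hit in a CJS codebase (the package name is made up; ERR_REQUIRE_ESM is the error Node throws when a CommonJS file tries to require() an ESM-only module):

```typescript
// index.cjs - a CommonJS file in a CJS codebase
// ("some-esm-only-lib" is a hypothetical stand-in for any dependency that went ESM-only)
const { shorten } = require("some-esm-only-lib");
// => Error [ERR_REQUIRE_ESM]: require() of ES Module .../some-esm-only-lib/index.js not supported

// The "fix" isn't local: either stay pinned to the last CJS release forever, or convert
// this file - and, in practice, everything that requires it - to ESM ("type": "module"
// in package.json, import syntax, explicit file extensions), i.e. the all-or-nothing problem.
```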

It's clearly not working out.

Looking for a Replacement

If I'm going to have to rewrite the thing, I might as well make it count, so that I don't come back to this a few years down the road and face the same "oh shit we gotta replace the core of the system because it's just aged really badly" problem.

So, what do I need in order to a. solve the problems of today (outlined above), and b. make sure that even as the node.js landscape continually changes, it will still be adaptable enough to not require a full rewrite (or at least, "lean" enough so that I can replace it without ditching everything else as well)?

  1. Think of how the backend is going to be consumed, end-to-end. Having a holistic approach prevents the problem of having a great backend that is a huge pain in the ass (and requires duplicative work) to integrate into our frontend, into our API consumers, etc.
  2. Choose something that is composable - key to having something that can be "swapped out" is making sure that the thing doesn't spread its tentacles everywhere; obviously, with enough architecting, you could technically get around this. But some designs just naturally lead to a more "pluggable" approach than others (i.e. the whole "framework vs. library" debate).
  3. Figure out which projects are not only actively developed right now, but will continue to be actively developed. As outlined in my post on why the node.js ecosystem is such a shitshow (I'm paraphrasing here), there are only two real solutions to this: a consolidated, grassroots community backing, or commercial support (so that the authors have the incentive and the means to keep maintaining things).
  4. Figure out what will drive the whole application. Do we write the frontend first and then write the backend to easily support whatever the frontend demands (the GraphQL approach), or force the frontend to be nothing more than a simple consumer of the backend (the REST approach)? Do we use the schema as the source of truth and generate the types from it ("schema-first"), or use the types as the source of truth and generate the schema from them ("code-first")?

We'll get back to the last point later, but first, let's get to our contestants:

  • express.js. Yes, it shows up again. Yes, continuing to use it is an option. Yes, we can renovate and fix things up to the best of our ability and call it a day. And, frankly, its strengths still shine to this day - it's still very extensible, it's still very "light" in abstractions (allowing you to build over it), it's still "the standard". There's a reason it's still widely used in greenfield projects, today.
  • fastify. Rather than koa.js (which has many router libraries(!), one of which actually got hijacked), fastify is turning out to be the heir apparent to express's throne. Not only are there multiple active core maintainers (who, by the way, are also actively involved in the development of node.js itself), but its actually actively-maintained ecosystem - built partly by its core authors, partly by the thriving community around it - stands out as its biggest strength. Still, as someone who "grew up" with express, fastify's "plugin" model doesn't really "flow" as well as express's middlewares.
  • nest.js (not to be confused with next.js). Its tagline should honestly just be "Spring, but in node" (let's just hope it's the Spring Boot kind, not the Spring MVC kind). It is very structured, which, while it might be overkill starting out, should lend itself very well to building nicely structured and architected apps down the line, especially if you have more than one person working on the codebase. Just like Spring, its core strength is in Dependency Injection (DI) and type safety, which is very much welcome in a field of many, many dead typescript DI libraries and frameworks. And just like Spring, it takes a "corporate" approach to long-term sustainability.
  • and last but certainly not least, tRPC. This upcoming "hot new thing" is all about one thing - end-to-end type safety - and it's really good at it. Just write a backend in typescript, and you can just... use it directly on the frontend, with full type safety (see the sketch right after this list). No compilation or schema generation required - It Just Works(TM)! And you can plug in a backend of your choice (express/fastify/node-http) to run the actual server. It's currently very actively maintained, with a bus factor of >1 (though there is a "main guy" whose contributions trump everyone else's, by far).
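
Here's a rough sketch of what that end-to-end type safety looks like with tRPC's v10 API (the route and its shapes are made up; the HTTP adapter that actually serves the router is omitted):

```typescript
import { initTRPC } from "@trpc/server";
import { createTRPCProxyClient, httpBatchLink } from "@trpc/client";
import { z } from "zod";

const t = initTRPC.create();

// Server side: the router itself is the type definition - no schema file, no codegen
const appRouter = t.router({
  getLink: t.procedure
    .input(z.object({ slug: z.string() }))
    .query(({ input }) => ({ slug: input.slug, url: "https://example.com" })),
});
export type AppRouter = typeof appRouter;

// Client side (normally in the frontend package): import only the router's *type*
const client = createTRPCProxyClient<AppRouter>({
  links: [httpBatchLink({ url: "http://localhost:3000/trpc" })],
});

// `link` is inferred as { slug: string; url: string }; a typo in the input is a compile error
const link = await client.getLink.query({ slug: "abc" });
```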

All very solid options, but let's look at what our needs are to see which ones suit them the best.

First off, express fails on account of its type safety (or lack thereof), and a community which, while it has built impressive things, has largely disappeared, leaving its projects to visibly rot.

I'm sorry, little one.

fastify, nest.js, and tRPC all seem to fit the bill - at least, if you squint at them from a distance.

All of them are actively maintained. All of them have enthusiasts advocating for them. All of them are basically products of having learned from the mistakes of the past (each focusing on different mistakes). All of them are type-safe (at least, on their own), and all of them solve the "ecosystem slowly dying" problem of express.js in different ways (nest.js and tRPC simply piggyback off of express's and fastify's own ecosystems; fastify has teams dedicated to building supplemental libraries, and is winning mindshare to help bolster community development).

To help differentiate between the three, we need to go back to our requirements, and see what we really need.

Application Development Process

So, this is the time to answer the questions we left unanswered, in order to help narrow down our exact requirements:

  • "backend first" or "frontend first"?
  • "code first" or "schema first"?

And let's start by clarifying what we're building.

My apps follow the standard 3-tier architecture, with a frontend, a backend, and a database. The frontend talks to the backend via a standardized API (i.e. the frontend is an SPA), and the API should also directly be consumed by API clients.

Side note: this type of app is surprisingly common. In particular, you'll see it littered across the corporate sector in the form of an API (which is consumed internally by other apps and APIs), with an admin page (usually CRUD) attached to it, to basically get a glimpse of the problem domain the API was designed to solve.

Therefore, since the API serves both the frontend and API clients, solutions that tightly bind the frontend with the backend are not a good fit.

This is the reason why I didn't even list the GraphQL-based frameworks (Apollo/Nexus/TypeGraphQL/whatever) as an option - while you could theoretically get your API clients to consume GraphQL and the like, it's... generally not common, and is a much less "established" way of publishing an API (not to mention the fact that it complicates caching to the point where it's just not worth the hassle for most teams).

Which is also why tRPC is out.

In fact, the tRPC team itself specifically points out that it's probably not the best fit when you expect to have 3rd-party consumers.

Another knock against tRPC is that since the types are derived from the routes, it's hard to test the things that make up a route (i.e. "end-to-end" testing of a route vs. unit testing of the components that make it up). This is because it's hard to "crack open" the routes and extract the types from them for use in things like controllers and models without causing circular dependencies (controllers/models/services/etc. need types from the routes, but the routes need to import the former for their functionality).
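
To make that concrete, here's a two-file sketch (the file names and the getLink helper are made up) showing how the modules end up pointing at each other:

```typescript
// router.ts - the router imports the service for its actual behaviour...
import { initTRPC } from "@trpc/server";
import { getLink } from "./links.service";

const t = initTRPC.create();
export const appRouter = t.router({
  getLink: t.procedure.query(() => getLink("abc")),
});
export type AppRouter = typeof appRouter;

// links.service.ts - ...while anything that wants the "official", router-derived types
// has to import them back out of the router, so the two modules reference each other.
import type { inferRouterOutputs } from "@trpc/server";
import type { AppRouter } from "./router";

export type LinkOutput = inferRouterOutputs<AppRouter>["getLink"];

export function getLink(slug: string) {
  return { slug, url: "https://example.com" };
}
```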

This is also why extensibility is important - because it allows you to test the individual pieces, rather than having to bring out the blunt (and flaky) instrument of live endpoint testing for everything - something that tRPC's architecture naturally leads to.

Note that I'm architecting not for technical "scalability" or anything, but rather for organizational fit (i.e. Conway's law): I want people to be able to individually develop APIs and apps for a specific problem domain, and 1. to be able to take a peek into said problem domain using a nice, simple UI, while also 2. being able to just "consume" others' work via well-defined APIs, without tight coupling, and without the complicated inner workings of that problem domain.

Schema Driven Development(?)

As we need to design for the APIs to be consumed by third parties, documentation of the API endpoints themselves will be an important part of the development process.

Which brings us to schema vs. code: if we want documentation of the API endpoints, the only “real” solution here is OpenAPI/Swagger (we’ve already ruled out GraphQL), meaning the ultimate source of the documentation has to be the OpenAPI yaml files. And while there are ways to generate the OpenAPI document from code/types/etc., they are all… rather clunky, and involve either copious amounts of "magic" or lots of duplication.

In particular, typing the individual routes' body types, params, and queries using annotations/decorators and "joining" them just... doesn't work well for describing the entire API, which is more than the sum of the routes' payload shapes. If we're going down the annotation/decorator approach, we need to be able to annotate the route and the server itself, not just the payloads.

So then, do we just write a singular OpenAPI document and just create everything from it, backend included?

No.

Remember that we rejected certain solutions above due to the fact that the same API had to serve both the frontend and various clients. That means development has to be "backend first", and "backend first" means developing the backend, route by route, and having the OpenAPI file document the routes, not define them. This is for several reasons:

  1. As you change the schema, the backend routes have to be generated from it over and over again. This is a tricky process, is often messy, and can lead to potential issues with accidentally erasing or adding routes because you made changes to the schema for a route that already exists.
  2. When you generate the routes from the schema, the generator has no context of what the routes should look like, how they should be set up and configured, what properties to expect, which middlewares to use, etc. The net effect is that you constantly have to modify your routes as you re-generate them from the schema.
  3. And if you don't generate the routes from the OpenAPI document, but instead simply "import" it or its generated types into the routes, then the OpenAPI document isn't really the "source of truth" anymore, and you can have routes that are defined in the document but haven't been implemented in the backend. Talk about false advertising!

In other words, the schema contains a subset of the backend's information.

This is also why tRPC derives its types from the routers, not the schemas.

And when generating stuff (whether it be types, schema, or code), it's always more natural to go from something that has the most information and then "distill" it down into other forms, rather than to start from something that has a subset of the information that you need, and then provide the rest of the information and "join" them during the generation process (eww).

Does using the backend as the "source of truth" mean we're giving up on our schema-based approach? No. It only means the individual routes will be based on the schema (and OpenAPI documentation); the overall document will then be generated from those routes, "concatenating" it all into a singular OpenAPI document (this also circumvents problem #3 outlined above).

And that singular document can be used to generate types (getting around tRPC's problems with testing), API documentation, and even frontend clients...!

In fact, generating frontend clients from the OpenAPI document is very natural due to the fact that the frontend is a mere "consumer" of the API, not to mention the fact that the frontend's interaction with the API requires only a subset of the information that the OpenAPI document contains.
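
As one example, here's roughly what a generated client looks like with the openapi-typescript / openapi-fetch pair (the endpoint and field names are made up; any OpenAPI-based generator would do):

```typescript
// 1. Generate types from the single OpenAPI document (run as a build step):
//    npx openapi-typescript ./openapi.yaml -o ./src/api-schema.d.ts

// 2. The frontend client is then just a thin, fully-typed wrapper over fetch:
import createClient from "openapi-fetch";
import type { paths } from "./api-schema"; // generated above

const api = createClient<paths>({ baseUrl: "https://api.example.com" });

// Paths, params, and response bodies are all checked against the document;
// a typo in "/links/{slug}" or a missing path param is a compile-time error.
const { data, error } = await api.GET("/links/{slug}", {
  params: { path: { slug: "abc123" } },
});
```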

Furthermore, as we generate the singular document across commits, we can easily see which sections have been added/removed from the diff on that YAML file, and thus how each change affects the API at a glance.

Determining the Fit

So let's now go back to our candidates. At this point, the only ones remaining are fastify and nest.js, and both of them do allow - via different methods - the schema-based approach I described above.

In fastify, you define the JSON Schema (plus descriptions) for each route, and using type providers, it can directly feed the route's input validation and typing needs (though you can also generate the types from the combined schema, which, while cumbersome, gets us around the tRPC problem of having the types "locked" to the routes). Then, the routes form the server, the server can generate the OpenAPI document, and the document can do everything else.
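
A minimal sketch of that flow, assuming @fastify/swagger, TypeBox for the schemas, and its type provider (the route itself is made up):

```typescript
import Fastify from "fastify";
import swagger from "@fastify/swagger";
import { Type } from "@sinclair/typebox";
import type { TypeBoxTypeProvider } from "@fastify/type-provider-typebox";

const app = Fastify().withTypeProvider<TypeBoxTypeProvider>();

// The same per-route schemas feed validation, typing, and the generated OpenAPI document
await app.register(swagger, {
  openapi: { info: { title: "Blink API", version: "2.0.0" } },
});

app.post(
  "/links",
  {
    schema: {
      description: "Create a short link",
      body: Type.Object({ url: Type.String({ format: "uri" }) }),
      response: { 201: Type.Object({ slug: Type.String() }) },
    },
  },
  async (req, reply) => {
    // req.body is inferred as { url: string } by the type provider
    return reply.code(201).send({ slug: "abc123" });
  },
);

await app.ready();
const document = app.swagger(); // the combined OpenAPI document, distilled from the routes
```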

In nest.js, you do things slightly differently. Basically, instead of the JSON -> route -> server -> schema -> whatever workflow of fastify, you are expected to define the schema using DTOs (basically just typescript classes that are used to define the 'shape' of objects), and then use the framework-provided decorators for literally everything - validation, documentation, type definition, etc.
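
For comparison, a minimal sketch of the nest.js flavour (the DTO and controller are made up; note how everything hangs off decorators):

```typescript
import { Body, Controller, Post } from "@nestjs/common";
import { ApiProperty, ApiPropertyOptional } from "@nestjs/swagger";
import { IsOptional, IsUrl, MaxLength } from "class-validator";

// The DTO is the schema: one class drives validation, typing, and documentation
export class CreateLinkDto {
  @ApiProperty({ description: "The URL to shorten" })
  @IsUrl()
  url!: string;

  @ApiPropertyOptional({ description: "Custom slug (generated if omitted)" })
  @IsOptional()
  @MaxLength(32)
  slug?: string;
}

@Controller("links")
export class LinksController {
  @Post()
  create(@Body() dto: CreateLinkDto) {
    // A global ValidationPipe (not shown) validates the body against the class-validator
    // decorators; @nestjs/swagger reads the same class to document the endpoint.
    return { slug: dto.slug ?? "abc123" };
  }
}
```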

Thankfully, there's a CLI plugin that basically infers most of the stuff you need by inspecting the AST (which is really cool, but also, at the same time, a comically big gun to bring out just to make the OpenAPI schema reflect what I've already defined in my typescript classes).

Still, it works well enough for 99% of cases, and you always have decorators to fall back on. Plus, the approach really makes sense in the context of nest.js, because in nest-land, literally everything is decorator-based, due to the fact that they take a very class-centric design, and rely heavily on things like class-transformer and class-validator.

So if you're into that class-centric design, you'll get a very good typescript experience, with the "magic" of the CLI plugin basically automatically handling most things to generate the document with minimal effort on your end. Plus, the DTO-based design means you get an "on rails" experience for typing the websocket subscriptions as well (OpenAPI doesn't handle websockets), which is really nice!

However, I don't really plan on using websockets that much (and even then, you can always just have a "shared package" of types/DTOs that both the frontend and the backend consume), and I just don't feel comfortable going balls-deep into a very framework-specific approach with its magic CLI and proprietary decorators... I just feel that it would be very hard not only to migrate away from, but also to keep the framework-specific stuff away from the rest of the codebase.

Honestly, at this point, I should do a PoC with both frameworks, as both fit the bill. Both are capable of doing what I ask for, and at this point, it's really just a matter of personal preferences, not technical differentiators (yes, there are small differences you can pick out, but squint hard enough and they look the same, provided you are experienced enough - like me - to be able to build structures and architecture on top of fastify).

I'll definitely be trying both when I eventually (I've been saying "eventually" for 2 years now, dear god) get to a v2 rewrite of Blink, and see which method I like better. Who knows, maybe I might end up liking the class-based approach? Though throughout my career, I have always, always ended up hating "on rails" solutions and ended up building hacky stuff to get around the guardrails, so maybe I'd prefer building my own structure on top of fastify. Who knows!

Conclusion

For me, the conclusion is simple - just try building a backend with both fastify and nest.js, and see which I hate less.

For you, the takeaway isn't that one framework is "better" than the others, or that GraphQL is trash. Instead, it's that you need to evaluate your own needs - not just what you're building, but also how you're building it.

Different frameworks/libraries fit different approaches, have different quirks, and favour different designs - designs that may or may not run counter to your desired workflow.