Understanding gRPC architecture in simple terms
Most tutorials explain how to write a .proto file and call a gRPC method. But very few explain what actually happens after that call.
I recently implemented gRPC in one of my projects and while learning it, I tried to understand its architecture from first principles — not just how to use it, but why it is designed this way and why it is fast.
This post is a simple breakdown of that.
RPC: Calling remote code like it’s local
Let's first drop the 'g' from gRPC and understand what RPC means. Suppose you need to execute some code that doesn't live on your machine but on a remote server. What would you do? The usual answer is to write an HTTP API for it.
But here is the catch: doing this with HTTP means writing extra boilerplate, designing and maintaining API endpoints, and accepting worse performance than RPC typically offers.
What if you could execute code on a remote server as if it were a local function call? That is exactly what RPC (Remote Procedure Call) is about: it abstracts all the networking complexity away from you.
But how does RPC actually achieve this?
Here is a simple example:
result := Add(2, 3)

This function, Add, may look local, but it is actually executed on a remote server by the following components:
- Client: The software that initiates the RPC by calling the remote server.
- Server: The software that receives the request from the client and executes the code.
- Stub: The local stand-in for the remote server. The client calls it as if it were the real implementation.
- Skeleton: The server-side interface to the client, responsible for receiving requests and dispatching them to the actual code.
- Protocol: The underlying communication protocol, typically TCP, though it can be anything else.
The client calls the Add function through the stub. The stub prepares an RPC request containing the function to execute and its parameters, then marshals the data for transport over the network. The server's skeleton receives the request, unmarshals it, and executes the code. Finally, the server marshals the response and returns it to the client.
What is gRPC?
RPC is just a concept, not a specific technology. It only says: "call a function on another machine as if it were local." How that actually happens is up to the implementation.
gRPC is a specific implementation of RPC, developed by Google. It's a modern, high-performance RPC framework with strong conventions. Several key features make it stand out:
- Uses HTTP/2
- Uses Protocol Buffers
- Supports streaming, deadlines, cancellation, metadata, authentication, and load balancing (at least in some libraries)
- Supports several languages
- Automatic code generation
- And, most important: it is very fast
gRPC Architecture
At a high level, a gRPC call looks like this:
Client → Stub → Protobuf → HTTP/2 → Server Stub → Service
Everything else in gRPC is built around this pipeline.
HTTP/2
HTTP/1.0 was released in 1996 and HTTP/1.1 in 1999. HTTP/1.1 dominated the web for a long time (and still largely does) until HTTP/2 arrived in 2015, and gRPC is built on HTTP/2. Under HTTP/1.1, a connection can be reused, but requests on it are handled one at a time, in order. To issue requests concurrently, clients had to open multiple connections, which wastes bandwidth and degrades performance.
HTTP/2 solved this with request/response multiplexing: many requests and responses can be in flight on a single connection at once. This removed HTTP-level head-of-line blocking, where one slow response holds up everything queued behind it.
Header Optimization
HTTP/2 compresses headers with HPACK. Client and server each maintain a table of headers they have already exchanged; a header whose value hasn't changed is sent as a small table reference instead of being retransmitted in full, and literal values are Huffman-encoded. This shrinks the payload and improves performance.
Protocol Buffers
Traditional RPC implementations often sent data as JSON or XML, which is not very performant. gRPC instead uses Protocol Buffers ("Protobuf"), Google's serialization format, which sends data as binary rather than plain text. Unlike JSON, a text-based format, Protobuf's binary encoding is inherently more efficient for machines to process.
JSON parsers must tokenize strings, handle escape characters, and convert text representations of numbers (like "12345") into binary values. Protobuf skips these expensive string operations by storing data in a more direct, machine-ready format.
Here is what a simple .proto file looks like:
syntax = "proto3";
service UserService {
rpc GetUser (GetUserRequest) returns (GetUserResponse);
}
message GetUserRequest {
int32 id = 1;
}
message GetUserResponse {
int32 id = 1;
string name = 2;
string email = 3;
}

Both the client and the server must have the same .proto file, since the code is generated from it.
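For Go, that generation step typically looks something like the command below (assuming the file is saved as user.proto; the filename and output paths here are just examples):

```shell
# Requires protoc plus the protoc-gen-go and protoc-gen-go-grpc plugins;
# the .proto file also needs a go_package option for the Go generator.
protoc --go_out=. --go-grpc_out=. user.proto
```

The generated files contain the message types, the client stub, and the server interface you implement.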
Streaming
As HTTP/2 allows us to multiplex requests and responses, gRPC took advantage of this to allow streaming requests and responses. There are 3 types of streaming in gRPC.
- Server Streaming: The client sends a single request, and the server responds with a stream of multiple responses.
- Client Streaming: The client sends a stream of requests, and the server responds with a single response.
- Bi-Directional Streaming: Both client and server send streams of messages to each other.
Load Balancing
gRPC supports load balancing to distribute client requests across multiple backend servers, crucial for high-performance microservices.
Request Cancellation
gRPC clients have the ability to cancel a request if they no longer care about the response. This is useful because once a client decides to drop a request, there’s no point in continuing the work on the server side.
Before You Go
If you made it this far, thank you.
I usually write about backend engineering, distributed systems, and things I learn while working on real problems. Not theory — mostly practical stuff that I wish someone had explained to me earlier.
I run a free newsletter where I share these kinds of write-ups. No spam. Just occasional backend engineering notes.