Java is faster, even on AWS Serverless Lambda
In the last two years, serverless technologies have emerged as the next big thing in cloud computing. As we move to serverless architectures, here is an analysis of some programming principles that apply to them.
I have heard many people claim that Nodejs is faster than Java on AWS Lambda. That is not true, and I will explain why below.
Threads in Serverless
When you run an application on a server, you are usually accepting requests, each spawning its own thread (or thread-like entity), or you are running an application that is multithreaded by nature, whether it is polling for work or doing some continuous work. The key thing to note here is that when it is time to choose a language for your application, you might decide based on the type of work the application does.
Nodejs performs really well for applications that fetch and return data, since it handles async I/O really well, better than Java in some cases. Python's asyncio performs about as well as Nodejs. Both of these languages are easier to write than Java.
On the other hand, for data manipulation and complex logic, Java outperforms the two, though Python has its place, especially in GPU applications.
The point is, in server land, how fast you execute and run multiple threads dictates your performance and the trade-offs you make.
In serverless land, things are different. Each request now triggers a single serverless instance. The paradigm most people use to write serverless applications (we will call them functions from now on), and this might change, is that you spin up a function that handles a single request. You multi-thread by running multiple serverless functions.
Basically, you are usually running only one thread per function in serverless land. You trigger a Lambda, it does some work and returns a result. The function is not waiting for requests; it is triggered by the serverless framework itself. We will first look at why multi-threaded serverless functions are usually not needed, and then move on to the language-choice implications of single-threaded serverless functions.
Multi-threaded serverless functions not needed
This is not always true. For example, your AWS Lambda function might be responsible for triggering 20,000 other Lambda functions, and it does need to multi-thread there. This is usually needed if you are polling for work (from SQS, maybe) and then need to distribute that work. AWS provides native support for this, so you often do not have to poll yourself. For example, you can use SNS or Kinesis to trigger Lambda functions directly.
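For the case where one function really does have to fan work out itself, here is a minimal sketch of what that multi-threading could look like in plain Java. The `invoke` parameter is a hypothetical stand-in for whatever call actually triggers the downstream function, and the class and method names are mine, not part of any AWS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class FanOut {
    // Dispatches every payload on a bounded thread pool and returns the results in input order.
    // `invoke` is a hypothetical placeholder for the call that triggers a downstream function.
    static <T, R> List<R> dispatchAll(List<T> payloads, Function<T, R> invoke, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (T payload : payloads) {
                pending.add(pool.submit(() -> invoke.apply(payload)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // blocks until that dispatch has finished
            }
            return results;
        } finally {
            pool.shutdown(); // stop accepting work so the threads can exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            ids.add(i);
        }
        // The lambda below only builds a string; a real dispatcher would make a network call.
        List<String> out = dispatchAll(ids, id -> "triggered-" + id, 4);
        System.out.println(out.size() + " invocations dispatched");
    }
}
```

A fixed-size pool keeps the number of concurrent dispatches bounded, which matters when the function itself has limited CPU. If SNS or Kinesis triggers the functions for you, none of this is needed.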
In a map-reduce application, SNS or Kinesis can take care of the map part, but you would still have to implement the reduce functionality yourself. For that you might not need multithreading inside your serverless function, though you might. The complexity could still be there, but I am sure that over the next year we will come up with better map-reduce functionality.
Startup Time in Serverless
Because the time to spin up threads doesn’t matter in serverless applications, the only time that matters is the time for the application to start up! Don’t forget this! The entire performance of your application could depend on it.
In regular server applications, especially in the cloud, we never have to worry about startup time too much. The entire process of spinning up hosts is automated for most people (through CodeDeploy, Elastic Beanstalk or other products like Heroku). You are always thinking about performance in terms of how the language of choice handles your workload more than anything else.
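To make the split between start-up cost and per-request work concrete, here is a small sketch in plain Java. It assumes the platform keeps an already-started process around for later requests, so one-time setup is paid only on the first call. The `Handler` class and its `handle` method are hypothetical names, not the AWS Lambda interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ColdStartSketch {
    // Counts how many times the one-time setup actually ran.
    static final AtomicInteger setupRuns = new AtomicInteger();

    // Hypothetical handler. The JVM initialises this nested class lazily, on first use,
    // so its static setup plays the role of the function's start-up cost.
    static final class Handler {
        static final String CONFIG = loadConfig();

        private static String loadConfig() {
            setupRuns.incrementAndGet();
            return "ready"; // placeholder for building clients, parsing config, and so on
        }

        // The per-request work, run on every invocation.
        static String handle(String request) {
            return CONFIG + ":" + request;
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        Handler.handle("first");  // triggers class initialisation, then does the work
        long t1 = System.nanoTime();
        Handler.handle("second"); // reuses the already-initialised state
        long t2 = System.nanoTime();
        System.out.printf("first call: %d us, second call: %d us, setup ran %d time(s)%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000, setupRuns.get());
    }
}
```

The absolute timings depend on the machine, so I am not quoting any here. The point is only that the setup counter stays at one no matter how many requests are handled.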
Why Java is Faster than Nodejs on AWS Lambda
For start-up time, Java is always faster. This is true today on your machine and in a VM, so why wouldn’t it be true in whatever virtualization AWS Lambda runs in?
A Cloud Guru has a test here proving that. Specifically, for just starting up and doing minimal work (returning a simple JSON response), Java executes in 0.11ms on average, compared to 0.28-0.32ms for Nodejs and Python.
Package size doesn’t [ever] matter
Theoretically, package size shouldn’t matter. The application has a pointer to a memory location it must start execution from, and that’s it. Package size could be problematic if the code can’t be brought fully into memory and thus causes a lot of memory thrashing during execution. This is rarely, if ever, the case.
I have heard too many times that Nodejs is faster, in person and online. For example, here. The reason that is not true is that those people are running benchmark tests with limited memory, which translates into limited CPU. The power of Java comes from the fact that it is multithreaded even while loading up (read about the Java Concurrency Model here), so when you restrict Java to one CPU or less during startup, its startup time suffers.
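If you want to see what a given memory setting actually gives the JVM, one option is to log it from inside the function. This is a minimal sketch using only the standard `Runtime` API. The class name is mine, and I am not claiming any particular values for Lambda here.

```java
final class RuntimeResources {
    // Summarises what the JVM believes it has been given.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        int cpus = rt.availableProcessors();             // processors visible to the JVM
        long maxHeapMb = rt.maxMemory() / (1024 * 1024); // upper bound on heap size, in MB
        return "cpus=" + cpus + " maxHeapMb=" + maxHeapMb;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line in each benchmark run makes it easier to tell whether two results were measured under comparable resources.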
Language Choice implications for single-threaded serverless functions
It is important that as we move towards serverless architectures, we stop to think about computer science fundamentals that would dictate how we use the architecture. This is a good opportunity for the community to reassess assumptions and invest in new technologies.
Because most serverless functions are single-threaded, the performance of a function depends only on the time to execute the application’s main thread and initial setup, plus the time to do the work (regardless of whether it’s CPU- or IO-intensive).
If you pick Java make sure to give it ample CPU and memory
If I have convinced you to pick Java, then I recommend you give your Lambda function adequate resources to run fast, which means more CPU and memory. This could save you money, since your application can run faster (even if on a more expensive resource), thus cutting cost overall. Also, your customers will be happy.
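Here is a rough sketch of the arithmetic behind that claim. It assumes the bill is proportional to configured memory multiplied by execution time (GB-seconds), and it ignores per-request charges and billing-increment rounding. The two configurations and their durations are hypothetical numbers chosen for illustration, not measurements.

```java
final class LambdaCostSketch {
    // GB-seconds consumed by one invocation: memory in GB times duration in seconds.
    static double gbSeconds(int memoryMb, double durationMs) {
        return (memoryMb / 1024.0) * (durationMs / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical figures: doubling memory (and with it CPU) cuts the run time
        // by more than half in this made-up example.
        double small = gbSeconds(512, 400.0);
        double large = gbSeconds(1024, 150.0);
        System.out.printf("512 MB for 400 ms:  %.3f GB-s%n", small);
        System.out.printf("1024 MB for 150 ms: %.3f GB-s%n", large);
        System.out.println(large < small
                ? "the larger configuration is cheaper here"
                : "the smaller configuration is cheaper here");
    }
}
```

The larger configuration only wins when the speed-up outweighs the extra memory you pay for, so it is worth measuring your own function at a few memory settings before deciding.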
Considerations for picking a language
If performance/speed is all you care about, then go with Java.
If you care about cost over speed, then you might have to test Java against Nodejs and pick the one with the better trade-off between speed and cost, since Lambda functions are billed by how long they run.
If you care about code maintainability and cleanliness, go with the language of your choosing. Both are powerful, and because the performance improvement of Java over Nodejs turns out to be a few tens of ms, the choice of language should be based on developer preference and maintainability.
Time for a new serverless-optimized language, or at least a compiler
I think it is time for a new language that takes advantage of serverless architecture. Maybe Go would be an ideal language for this? What we need is compiled code that is able to run on a single threaded serverless infrastructure with great efficiency. Computer architecture trends are favoring architecture optimizations per domain and I would think a new paradigm for thinking about computer architecture would help speed things up a great deal.
*I am going to try to produce results against Mark West’s results with more CPU for Java. Results should