Off-heap Hash Map in Java, part 2

I spent some time trying to reproduce the issue we had in production that justified the use of an off-heap hash map. And it was a total failure! In theory I knew it was related to cases when the app needs huge maps consuming almost all heap memory, and those maps are not static – they change constantly, triggering Full GC cycles. Anyway, I got some interesting results just comparing the time and memory usage of map population.

So I had this simple code to create a map. OHM uses the same interface as HashMap, so it is pretty simple to test them both.

public class HashMapTest {
    public static void main(String[] args) {
        //final Map<String, String> map = new HashMap<>(15_000_000, 0.9f);
        final Map<String, String> map = new OHMap<>(15_000_000);
 
        for (int i = 0; i < 10; i++) {
            map.clear();
 
            System.out.print("Loading map...");
            long start = System.currentTimeMillis();
            populateMap(map);
            long end = System.currentTimeMillis();
            System.out.println("Done in " + (end - start) + "ms.");
        }
    }
 
    private static void populateMap(Map<String, String> map) {
        for (int i = 0; i < 10 * 1000_000; i++) {
            map.put(String.valueOf(i), UUID.randomUUID().toString());
        }
    }
}

I started with OpenJDK 8, and as expected OHM was slower than HashMap: 33ms vs 23ms. But memory consumption was quite the opposite! I had to pump -Xmx to 3Gb to make the HashMap test work, and the total memory used by the Java process was 3181Mb. OHM worked even with -Xmx1G, though its total memory consumption was also close to 3Gb.

Now the most interesting results (which prompted me to post this) came from OpenJDK 11! The performance difference between HashMap and OHM was shocking: 17ms vs. 34ms!!! And memory consumption for the HashMap test with -Xmx3G was lower than 3Gb!

Undoubtedly the Java engineers did a good job on the core of JDK 11. With such results I may have no need for OHM in production once we switch to Java 11. But I still wasn’t able to reproduce the state of continuous GC cycles with huge hash maps. My next try will be adding multi-threading to get closer to the production use case.
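
Just to outline the idea, here is a rough sketch of what that multi-threaded test could look like. It is only an assumption at this point: I haven’t checked whether OHMap is thread-safe, so the sketch populates a ConcurrentHashMap from several threads, and the map construction would have to be swapped (and possibly synchronized) for the OHM run.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentMapTest {
    private static final int THREADS = 4;
    private static final int ENTRIES_PER_THREAD = 2_500_000;

    public static void main(String[] args) throws InterruptedException {
        // ConcurrentHashMap stands in for the on-heap case; swap in OHMap
        // (with external synchronization if it is not thread-safe) for the off-heap run.
        final Map<String, String> map = new ConcurrentHashMap<>(15_000_000);

        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        long start = System.currentTimeMillis();
        for (int t = 0; t < THREADS; t++) {
            final int offset = t * ENTRIES_PER_THREAD;
            pool.submit(() -> {
                // each thread fills its own key range so there are no duplicate keys
                for (int i = offset; i < offset + ENTRIES_PER_THREAD; i++) {
                    map.put(String.valueOf(i), UUID.randomUUID().toString());
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        long end = System.currentTimeMillis();
        System.out.println("Loaded " + map.size() + " entries in " + (end - start) + "ms.");
    }
}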

Off-heap Hash Map in Java, part 1

A month ago I had to deal with an interesting case: huge hash maps in Java! When the input started to grow, we switched to a larger AWS instance. But that didn’t help, because the input kept growing. And I observed huge heap consumption and very long GC pauses. Essentially the application was partly doing its job and partly doing GC. When it started to hit the max heap, I had to do something. So I started investigating off-heap solutions.

My “googling” quickly led me to two Java solutions: Java Large Off Heap Cache and Binary Off Heap Hash Map. Both treat keys and values as blobs of bytes. I chose BinaryOffheapHashMap because it is a small codebase which I can understand. Even though that code is experimental, it solved my task: creating a hash map outside of the GC world. You can read more about that project here. OHC looks more “professional” and is something I will try next time.

So my experiments allowed me to look at Java from a different angle: “greedy” memory consumption and really nasty GC cycles. I will publish my test results in my next post.

Few Optimizations for my Blog

It’s been a while since I touched any of my configurations. But a few days ago I was reading a DZone article about web application performance and tried one of the tools described there – Google’s PageSpeed Insights. And I was slightly disappointed to see that my WordPress is not 100% optimal! While there is nothing I can really do about the recommendations on JavaScript, CSS or even images (unless I hack into WordPress), I found that enabling compression is doable. StackOverflow is the winner again. So I created /etc/httpd/conf.d/deflate.conf with this content:

SetOutputFilter DEFLATE
# mod_deflate configuration
<IfModule mod_deflate.c>
# Restrict compression to these MIME types
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xml+rss
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE text/css
<IfModule mod_headers.c>
# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</IfModule>
</IfModule>

After restarting Apache and running PageSpeed again I got 93/100 for Desktop Optimization!

After updating the Sucuri plugin I also noticed one new security recommendation: Disable Server Banners. Essentially they recommend turning off any information exposing your server version and modules. For that I just added two lines to /etc/httpd/conf/httpd.conf:

ServerSignature Off
ServerTokens Prod

And a last minor note: I had no issues upgrading my AWS instance to Amazon Linux 2018.03. Amazon actually tells you how to do that in the motd:

sudo yum clean all
sudo yum update

You will get the latest Linux 4.14 kernel and a bunch of updates. I encountered no issues with my WordPress after restarting the box.

Let’s Encrypt on Amazon Linux

More than a year ago I set up SSL/TLS support for this blog using Amazon’s guide. Now that certificate has expired and I need a new one. This time I decided to use Let’s Encrypt because I have successfully used it for my other projects. And it was actually very easy:

wget https://dl.eff.org/certbot-auto
chmod +x certbot-auto
./certbot-auto run --apache -d blog.apalagin.net

This tool will complain that Amazon Linux is experimental. But I had no issues with that and it did all the work for me! The only caveat is that Let’s Encrypt certificates expire in 90 days, so you should add a cron job to renew them regularly. For example, something like this in your /etc/crontab:

39 1,13 * * * root /home/ec2-user/certbot-auto renew

I should also mention that there is a next version of Amazon Linux – Amazon Linux 2 – where you can install Certbot from the EPEL repository.


QConSF 2017

About

QCon has been hosted by InfoQ for 11 years in a row. It attracts a lot of engineers from all over the world. All the famous brands like IBM, Oracle, Google, Netflix, LinkedIn, etc. tend to have at least a few talks there. QCon describes itself as a conference for senior engineers with an emphasis on practical approaches. As an InfoQ reader I decided to give its San Francisco edition a try.

This year QCon hosted 175 speakers across 18 tracks! I attended three days of presentations and three workshops. To be honest, I wasn’t very comfortable with how crowded this conference was – more than 1600 people registered! I had only been to one US conference before – AWS Community Summit 2017 – so I am not qualified to give grades, but in my opinion the organization and IT infrastructure were well above my understanding of “standard”. Even breakfasts and lunches were thought through so people didn’t have to stand in long lines.

The material quality and its diversity were pretty good and sometimes surprising. I was also impressed by the number of “big” companies participating as sponsors and speakers. I attended sessions hosted by IBM, LinkedIn, Oracle, Reddit, Docker, and AWS. There were two exhibition rooms where you could talk to engineers and managers from Microsoft, Pivotal, MySQL, Vaadin, AppDynamics, Azul, RedisLab, and MongoDB! There were pretty long breaks between sessions (25 minutes) where you could share your thoughts with either presenters or fellow engineers. I personally met engineers from Canada, Norway, Poland, the Netherlands, the US (Cincinnati and Texas), Russia and Ukraine!

Hypes and Buzzwords

Some say that QCon is an indicator of new “big things” and that QCon spread the “Microservices” hype first. Based on the 2017 track titles and their popularity (by votes), Microservices is still the #1 buzzword: I counted more than 20 sessions mentioning the word “microservice” or “service”! The next big thing this time was Chaos Engineering: Chaos Architecture got the Best Attended mark! And I would give #3 to Serverless and Containers, because those themes were closely connected to Microservices.

IMHO

I have never attended a large conference before, and it was essential for me to try one of the best and understand the importance and role conferences play in the day-to-day life of an engineer. And… I am confused. I am not disappointed, no. I just proved to myself that a conference brings little usefulness and minimal feedback if you treat it wrong. Let me elaborate on that statement. Basically, in the Internet era everything can be found and learned from online resources. Literally everything! Services like Coursera or Udemy will even force you to learn stuff because you paid your hard-earned money for it 🙂 So if you were looking for “new” material and “secret” knowledge, you would be very disappointed. The true purpose is to share knowledge and give or receive feedback! So the real pearls of any conference are Open Spaces or “Ask Me Anything” sessions, where you can get “secret” or even “sacred” knowledge from authors, maintainers or early adopters! Though it doesn’t mean a conference is useless if you only attend presentations and workshops. It can still be very useful for anyone who can’t or doesn’t want to track all the changes in the IT world via the Internet. Or if you want to hear or try something completely different, something outside your daily duties. I think there is also another benefit: making sure that you (and your company) are not insane and are doing the right thing, e.g. using the right frameworks, tools, databases, etc.

Java Tutorial: JNDI Trail Tips for OpenLDAP

With every major release of the JDK I quickly review Oracle’s Java Tutorial for any updates. I did that for JDK 8 and will do it for JDK 9 soon. Usually I skip trails like JNDI or JavaFX because I don’t use them at my job. But a few months ago I decided to read the JNDI trail, and I want to share some tips I had to work out.

Server Setup

So you will need an LDAP server, and the tutorial refers the reader to a few vendors. I try to use Linux implementations, and sure enough there is one – OpenLDAP. Given that my desktop is Windows, I have to run it in a virtual machine. And for that I use VirtualBox + Vagrant. Here is my Vagrantfile (the configuration for Vagrant):

Vagrant.configure("2") do |config| 
  config.vm.box = "ubuntu/trusty64" 
  config.vm.network "forwarded_port", guest: 389, host: 1389
  config.vm.provision "shell", inline: <<-SHELL
    export DEBIAN_FRONTEND=noninteractive 
    apt-get update 
    apt-get install -y slapd ldap-utils gnutls-bin ssl-cert
  SHELL 
end

It deploys Ubuntu and installs all the necessary tools. After the VM is up and running, you need to log in (vagrant ssh) and re-configure slapd for the tutorial’s needs. This official guide helped me a lot.

So the first thing is to re-configure OpenLDAP:

sudo dpkg-reconfigure slapd

It will ask for a domain name. Use something easy, like example.com. Then it will ask for an Organization Name. Enter “JNDITutorial”. Then it will ask for an administrator password. Don’t forget it 🙂 For any further questions you can safely use the default values.

The next step is to update LDAP with the schemas used by the tutorial:

sudo ldapadd -Q -Y EXTERNAL -H ldapi:/// -f /etc/ldap/schema/java.ldif
sudo ldapadd -Q -Y EXTERNAL -H ldapi:/// -f /etc/ldap/schema/corba.ldif

The next step is populating the DB with test data. The JNDI trail has a link to tutorial.ldif. You need to download it and update its DN names to match our installed server: if we used example.com as the domain name, then our full DN will be o=JNDITutorial,dc=example,dc=com, and we have to ensure that in the file:

sed -i 's/o=JNDITutorial/o=JNDITutorial,dc=example,dc=com/' tutorial.ldif

Now you can upload the test data (this is where you need the admin password):

ldapadd -x -D cn=admin,dc=example,dc=com -W -f tutorial.ldif

There is a big chance you will get something like this:

ldap_add: Object class violation (65)
 additional info: invalid structural object class chain (alias/organizationalUnit)

Ignore that – it doesn’t affect the tutorial.

Connection and Authentication

The connection string in the JNDI examples must be slightly modified – you have to specify the full DN and the correct port. Given our configuration and the example.com domain, env initialization should look like this:

Hashtable<String, Object> env = new Hashtable<>();
env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
env.put(Context.PROVIDER_URL, "ldap://localhost:1389/o=JNDITutorial,dc=example,dc=com");

Examples where something is updated or created require authentication. By default OpenLDAP accepts simple authentication. In this case you have to add additional settings to env:

env.put(Context.SECURITY_PRINCIPAL, "cn=admin,dc=example,dc=com");
env.put(Context.SECURITY_CREDENTIALS, "password");
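
To show how those settings are actually used, here is a minimal, self-contained sketch. The admin DN and password are the ones set during dpkg-reconfigure above, and “cn=Rosanna Lee, ou=People” is assumed to be one of the sample entries from tutorial.ldif:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class SimpleAuthLookup {
    public static void main(String[] args) throws NamingException {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:1389/o=JNDITutorial,dc=example,dc=com");
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "cn=admin,dc=example,dc=com");
        env.put(Context.SECURITY_CREDENTIALS, "password"); // the admin password from dpkg-reconfigure

        DirContext ctx = new InitialDirContext(env);
        try {
            // look up one of the entries loaded from tutorial.ldif
            Object obj = ctx.lookup("cn=Rosanna Lee, ou=People");
            System.out.println(obj);
        } finally {
            ctx.close();
        }
    }
}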

Digest-MD5

The example with Digest-MD5 will not work without additional modifications. This is what I did to make it functional (thanks, StackOverflow). First of all, sasldb must be accessible by slapd:

sudo adduser openldap sasl

Then OpenLDAP has to be configured to use sasldb directly. Create a sasldb.ldif file:

dn: cn=config
changetype: modify
replace: olcSaslAuxprops
olcSaslAuxprops: sasldb

And update the OpenLDAP configuration with it:

sudo ldapmodify -Q -Y EXTERNAL -H ldapi:/// -f sasldb.ldif

The last thing is to create a user in sasldb. For example, user “test”:

sudo saslpasswd2 -c test

That’s it! Now you will be able to connect to OpenLDAP using this environment configuration:

Hashtable<String, Object> env = new Hashtable<>();
env.put(Context.INITIAL_CONTEXT_FACTORY, 
  "com.sun.jndi.ldap.LdapCtxFactory");
env.put(Context.PROVIDER_URL, 
  "ldap://localhost:1389/o=JNDITutorial,dc=example,dc=com");
env.put(Context.SECURITY_AUTHENTICATION, "DIGEST-MD5");
env.put(Context.SECURITY_PRINCIPAL, "test");
env.put(Context.SECURITY_CREDENTIALS, "*****");

SSL and Custom Sockets

OpenLDAP does not support SSL/LDAPS out of the box. Instead, the server guide shows how to configure TLS, which negotiates an encrypted connection on the same server port. The TLS case is slightly different from the plain SSL protocol – it requires the JSSE extension. There is a detailed trail here. In short: the environment settings are the same as for an unencrypted connection, but all your work has to be done inside an encrypted TLS session:

StartTlsResponse tls = (StartTlsResponse) ctx.extendedOperation(
  new StartTlsRequest());
tls.negotiate();
// Do your work with LDAP context
tls.close();
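
For completeness, here is a minimal sketch of that whole flow. Note that extendedOperation() is defined on LdapContext, so the context has to be created as an InitialLdapContext; the entry name “cn=Rosanna Lee, ou=People” is again assumed to come from tutorial.ldif, and negotiate() will only succeed once the server certificate is trusted (see the keytool step below):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.StartTlsRequest;
import javax.naming.ldap.StartTlsResponse;

public class StartTlsLookup {
    public static void main(String[] args) throws Exception {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://localhost:1389/o=JNDITutorial,dc=example,dc=com");

        LdapContext ctx = new InitialLdapContext(env, null);
        StartTlsResponse tls = (StartTlsResponse) ctx.extendedOperation(new StartTlsRequest());
        tls.negotiate(); // fails unless the server certificate is in the JRE keystore
        try {
            // Do your work with the LDAP context inside the TLS session
            Attributes attrs = ctx.getAttributes("cn=Rosanna Lee, ou=People");
            System.out.println(attrs);
        } finally {
            tls.close();
            ctx.close();
        }
    }
}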

The important step to make that work is adding the server certificate to the JRE keystore. Otherwise your connection will fail. So if you followed the OpenLDAP guide, copy /etc/ssl/certs/ldap01_slapd_cert.pem to your local machine (or to /vagrant for Vagrant). Then use keytool to import it:

keytool -importcert -alias jnditutorial ^
-file ldap01_slapd_cert.pem ^
-keystore "C:\Program Files\Java\jre1.8.0_151\lib\security\cacerts"

Although this is a Windows example, Linux or Unix would be very similar. Note that the keystore is called cacerts (not jssecacerts). Also note a little caveat: if you have both a JDK and a JRE installed, there is a big chance that calling “java” runs the JRE’s JVM, not the JDK’s one.

CMake and MinGW

I used to build my C/C++ toy projects in Code::Blocks, but now I have moved to CLion. That IDE uses CMake under the hood. It automates the whole process (same as Code::Blocks, btw), but I was interested in how to build my project without the IDE. CMake has a nice and short tutorial, but it was missing the main point: how to start the build!!! I had to surf the Internet for other tutorials. One of them gave me some clues. But if you follow it, you may run into some interesting troubles. First of all, never run cmake in the source folder! Create a separate folder like “build” and run cmake there. Secondly, if you have the MS Visual C++ compiler, then cmake will detect it and use it. That wasn’t my goal. So I had to read another tutorial which gave more insight. And then I realized I should have just read cmake --help more carefully 🙂

Anyway, here is a short note on how to run cmake with MinGW:

mkdir build
cd build
cmake -G "MinGW Makefiles" ..
mingw32-make

Explanation:

First of all, don’t forget to install CMake from the official website (or use choco). Secondly, add it to the user’s PATH variable. Then you can open a command line and go to your project source. CMake generates tons of files, and that’s why we’d better run it in a separate folder.

Note that running cmake on Linux/macOS is similar: just use -G "Unix Makefiles" and then make!

Upgrading WordPress to PHP7 on Amazon Linux

Most of the instructions were taken from StackOverflow. Though I didn’t follow all the steps, plus I also had to deal with the SSL module. Anyway, the migration was fast and flawless. Just don’t forget to back up your website 🙂

Here are my instructions if you followed the AWS tutorial to set up WordPress on Apache with SSL. Note that following these instructions is relatively safe and doesn’t corrupt any WordPress files (if you are on Amazon Linux).

  1. Stop Apache and remove httpd 2.2 and PHP 5:
     sudo service httpd stop
     sudo yum remove httpd* php*
  2. Install Apache 2.4 and mod_ssl
    sudo yum install httpd24
    sudo yum install mod24_ssl
    
  3. Install PHP 7 and required modules
    sudo yum install php70
    sudo yum install php70-mysqlnd
    sudo yum install php70-gd
    
  4. Update the Apache configuration to recognize index.php files:
    sudo nano /etc/httpd/conf/httpd.conf

    Find dir_module section and update it to:

    <IfModule dir_module>
      DirectoryIndex index.html index.php
    </IfModule>
    

    Find <Directory "/var/www/html"> and update it to:

    <Directory "/var/www/html">
      Options Indexes FollowSymLinks
      AllowOverride All
      Require all granted
    </Directory>
    
  5. Now it’s time to copy back your SSL configuration:
    sudo mv /etc/httpd/conf.d/ssl.conf.rpmsave /etc/httpd/conf.d/ssl.conf
    
  6. Final steps: adding httpd to boot sequence and launching it:
    sudo chkconfig httpd on
    sudo service httpd start
    

Voila! Your WordPress should be back online running on PHP 7! Many thanks to WordPress, PHP, Apache and Amazon people who surely worked hard to make such transitions so simple and burden-free.

TensorFlow on Amazon Linux

This time I had to install Google’s TensorFlow for my wife’s study projects. Unfortunately, TensorFlow officially supports only Ubuntu Linux, and I didn’t find any tutorial for Amazon Linux. But I was able to find something for CentOS, which is very close – thanks to Tim Hoolihan!

First of all I had to install the prerequisites. Note that I didn’t use Python’s virtualenv. We use the Jupyter Notebook front-end for study projects, and I don’t know if virtualenv would be handy there.

sudo yum -y install epel-release
sudo yum -y install gcc gcc-c++ python-pip python-devel atlas atlas-devel gcc-gfortran openssl-devel libffi-devel
pip install --upgrade numpy scipy wheel cryptography

Then we have to install the TensorFlow package from a URL we can find on TensorFlow.org. I chose the Python 2.7 package with CPU-only support. GPU support requires much more “dancing” and is not recommended for newbies.

sudo pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.1-cp27-none-linux_x86_64.whl

Secured Jupyter Notebook on Amazon Linux

My last post was about running Jupyter remotely and using an SSH tunnel for connections. That turned out to be inconvenient: too many steps to launch Jupyter and then connect to it. I read through the documentation on the Notebook website, and it has pretty detailed instructions on how to run a public server.

I’m using an AWS EC2 c4 instance with Amazon Linux. In general there are two steps: making the server public and then securing it.

First you have to generate a configuration file:

jupyter notebook --generate-config

Then generate a SHA1 password hash for your login by running the Python command prompt:

python
>>> from notebook.auth import passwd
>>> passwd()

Then update your configuration file /home/user/.jupyter/jupyter_notebook_config.py by adding these settings to the end:

c.NotebookApp.ip='*'
c.NotebookApp.password=u'sha1:<your hashed password here>'
c.NotebookApp.open_browser=False
c.NotebookApp.port=9999

Now you can run jupyter notebook and access your server using its public IP or DNS name. But it’s better to secure your connection with SSL/TLS. For that you have to generate an SSL certificate and key. I will describe my case, where I registered a DNS A record for my hostname and then used Let’s Encrypt to generate a valid HTTPS certificate.

The first step is obviously registering your DNS hostname, which is out of scope here. (With AWS Route 53 it is super easy though.)

Then you have to configure your firewall to accept connections on port 443 (you can remove that later). In AWS you need to update the security group for your instance and create a rule for HTTPS.

The next step is downloading a tool from Let’s Encrypt:

wget https://dl.eff.org/certbot-auto
chmod a+x certbot-auto

That tool does all the work of creating the key and certificate and getting it signed. That’s why it requires port 443 to be open: it’s going to check that you actually own the domain by connecting to it from an outside server. Don’t be scared by the number of packages it installs during the first run.
So the command is:

sudo ./certbot-auto certonly --standalone --debug -d <your domain>

When it finishes, you will get a bunch of files in the /etc/letsencrypt directory. But you need the files from the /etc/letsencrypt/live/<your domain> folder. My problem was that these files are symlinks into ../archive, and ec2-user can’t read them. So I had to change permissions:

sudo chmod +x /etc/letsencrypt/archive/
sudo chmod +r /etc/letsencrypt/archive/*

After that we can specify our key and certificate in the Notebook config file:

c.NotebookApp.certfile=u'/etc/letsencrypt/live/<domain>/fullchain.pem'
c.NotebookApp.keyfile=u'/etc/letsencrypt/live/<domain>/privkey.pem'

Now your Notebook can be restarted, and you must use the HTTPS protocol for your connection: https://<domain>:9999/

My Jupyter Notebook also starts during the boot sequence. On Amazon Linux you can use the /etc/rc.d/rc.local file for that by adding this command there:

jupyter notebook --config path_to_your_config > /var/log/jupyter-notebook.log 2>&1 &