← Back to team overview

ubuntu-docker-images team mailing list archive

Re: Review request: Cassandra

 

On Thursday, June 10 2021, I wrote:

> On Thursday, June 10 2021, Athos Ribeiro wrote:
>
>> Hi,
>>
>> I have been working on a new Cassandra OCI image and it is finally ready
>> for a first review.
>>
>> I created the OCI repository for the new image at
>>
>> https://code.launchpad.net/~canonical-server/ubuntu-docker-images/+git/cassandra
>>
>> The code to be review is in the "main" branch.
>
> Yay!
>
> Thank you, Athos.  I will review this today or tomorrow, depending on
> how soon I can finish what I'm doing right now.

OK, here's the review for the cassandra OCI image.

| # Slightly based on github.com/docker-library/cassandra and on cassandra snap setup
| # mount the following files for overriding configurations:
| # cassandra.yaml logback.xml cassandra-env.sh jvm.options
| FROM ubuntu:focal AS snap-installer

ubuntu:impish here.

| 
| RUN apt-get update && \
|     DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y && \

We're using "apt-get full-upgrade -y" now.

|     DEBIAN_FRONTEND=noninteractive apt-get install -y \
|     curl ca-certificates squashfs-tools

We should use the 'set -eux;...' idiom here.  Also, the next two RUN
statements can be joined to this one.  For example, this excerpt from
the prometheus 21.04 Dockerfile:

RUN set -eux; \
	apt-get update; \
	DEBIAN_FRONTEND=noninteractive apt-get full-upgrade -y; \
	DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
		jq curl ca-certificates squashfs-tools; \
# taken from https://snapcraft.io/docs/build-on-docker
# Alternatively, we can install snapd, and issue `snap download prometheus`
	curl -L $(curl -H 'X-Ubuntu-Series: 16' -H "X-Ubuntu-Architecture: $(dpkg --print-architecture)" 'https://api.snapcraft.io/api/v1/snaps/details/prometheus?channel=21.04/edge' | jq '.download_url' -r) --output prometheus.snap; \
	mkdir -p /snap; \
	unsquashfs -d /snap/prometheus prometheus.snap


| 
| # taken from https://snapcraft.io/docs/build-on-docker
| # Alternatively, we can install snapd, and issue `snap download cassandra`
| RUN curl -L 'https://launchpad.net/~athos-ribeiro/+snap/cassandra-test/+build/1430794/+files/cassandra_4.0-rc1_amd64.snap' --output cassandra.snap
| 
| RUN mkdir -p /snap && unsquashfs -d /snap/cassandra cassandra.snap
| 
| 
| FROM ubuntu:focal

ubuntu:impish here as well.

| LABEL maintainer="Ubuntu Server team <ubuntu-server@xxxxxxxxxxxxxxxx>"

We don't use this on all images, but it may be a good idea to adopt this
practice.

| 
| ENV TZ UTC
| 
| ENV CASSANDRA_HOME /usr/share/cassandra
| ENV STATIC_CONF /etc/cassandra
| ENV CASSANDRA_CONF /etc/cassandra
| ENV CASSANDRA_LOG_DIR /var/log/cassandra
| ENV CASSANDRA_DATA /var/lib/cassandra
| ENV CLASSPATH $CASSANDRA_CONF
| ENV CASSANDRA_INCLUDE ""
| ENV JAVA_VERSION 8
| ENV cassandra_storagedir "$CASSANDRA_DATA"
| 
| RUN mkdir -p $CASSANDRA_CONF $CASSANDRA_LOG_DIR $CASSANDRA_DATA

I'm wondering if we can somehow join this RUN with the following one.
See below.

| 
| COPY --from=snap-installer /snap/cassandra/usr/sbin/cassandra /usr/sbin/cassandra
| COPY --from=snap-installer /snap/cassandra/usr/share/cassandra /usr/share/cassandra
| COPY --from=snap-installer /snap/cassandra/etc/cassandra /etc/cassandra
| 
| # Copy the manifest files from the snap
| COPY --from=snap-installer /snap/cassandra/snap/snapcraft.yaml /usr/share/rocks/
| COPY --from=snap-installer /snap/cassandra/snap/manifest.yaml /usr/share/rocks/
| 
| RUN set -eux; \
| 	apt-get update; \
| 	DEBIAN_FRONTEND=noninteractive apt-get full-upgrade -y; \
| 	DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
|   openjdk-8-jre-headless ca-certificates tzdata \

Indentation here.

| # solves warning: "jemalloc shared library could not be preloaded to speed up memory allocations"
| 		libjemalloc2 \
| # "free" is used by cassandra-env.sh
| 		procps \
| # "ip" is not required by Cassandra itself, but is commonly used in scripting Cassandra's configuration (since it is so fixated on explicit IP addresses)
| 		iproute2 \
| # Cassandra will automatically use numactl if available
| #   https://github.com/apache/cassandra/blob/18bcda2d4c2eba7370a0b21f33eed37cb730bbb3/bin/cassandra#L90-L100
| #   https://github.com/apache/cassandra/commit/604c0e87dc67fa65f6904ef9a98a029c9f2f865a
| 		numactl \
| 	; \
| 	DEBIAN_FRONTEND=noninteractive apt-get remove --purge --auto-remove -y; \
| 	rm -rf /var/lib/apt/lists/*; \
| # https://issues.apache.org/jira/browse/CASSANDRA-15767 ("bin/cassandra" only looks for "libjemalloc.so" or "libjemalloc.so.1" which doesn't match our "libjemalloc.so.2")
| 	libjemalloc="$(readlink -e /usr/lib/*/libjemalloc.so.2)"; \
| 	ln -sT "$libjemalloc" /usr/local/lib/libjemalloc.so; \
| 	ldconfig; \

Ugh :-(.  I looked at the upstream issue and it seems it's stuck since
last year; maybe it'd be interesting to ping it.

|   chmod +x /usr/sbin/cassandra

Ouch.  Does it really install the binary without the executable bit set?
If we could remove this part, we'd be able to run the "mkdir -p"
commands from the previous RUN statement here, and move all the COPY
statements below.

On a side note, I found https://github.com/moby/moby/issues/34819, which
may come to our rescue when/if it's released.

| ENV JVM_OPTS "$JVM_OPTS -Dcassandra.config=file://$CASSANDRA_CONF/cassandra.yaml"
| 
| COPY entrypoint.sh /usr/local/bin/

A couple of comments about the entrypoint script.  There's this part
here:

  case $(arch) in
      x86_64)
          export ARCH=amd64
          ;;
      aarch64)
          export ARCH=arm64
          ;;
      ppc64le)
          export ARCH=ppc64el
          ;;
      s390x)
          export ARCH=s390x
          ;;
      *)
          echo "Unsupported architecture $(arch)"
          exit 1
          ;;
  esac

I think you can simplify this by just setting ARCH to $(dpkg
--print-architecture).

For this part:

  export JAVA=/usr/lib/jvm/java-8-openjdk-"$ARCH"/jre/bin/java

Maybe it's better to match any Java version (i.e., java-*-openjdk...)?

And I noticed this part:

  for jar in $SNAP/usr/share/cassandra/lib/*.jar; do
      CLASSPATH="$CLASSPATH:$jar"
  done

You could remove the $SNAP variable.


| 
| VOLUME /var/lib/cassandra
| VOLUME /var/log/cassandra
| 
| ENTRYPOINT ["entrypoint.sh"]
| # 7000: intra-node communication
| # 7001: TLS intra-node communication
| # 7199: JMX
| # 9042: CQL
| # 9160: thrift service
| EXPOSE 7000 7001 7199 9042 9160
| CMD ["cassandra", "-R", "-f"]



Some comments about the data/cassandra.yaml file:

- I think the description can be improved a bit.  It currently reads:

    description: >
    Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key.

  What do you think about taking what's on their web page and adapting a
  little bit?

    Apache Cassandra is an open source NoSQL distributed database trusted
    by thousands of companies for scalability and high availability
    without compromising performance. Linear scalability and proven
    fault-tolerance on commodity hardware or cloud infrastructure make it
    the perfect platform for mission-critical data.

  I'm just concerned that "highly-scalable partitioned row store" sounds
  a bit cryptic for me.



Generic comments about the image in general:

- It will need a unit test :-).

- I followed the Quick Start guide on Cassandra's home page
  (https://cassandra.apache.org/quickstart/) to check if everything is
  working.  It took me some time to realize that I needed to set the
  CQLVERSION variable when invoking the nuvo/docker-cqlsh container.
  This may be a good thing to include in the docs.

- I haven't checked the stuff under examples/.

>> Note that the Cassandra snap is being fetched from 
>>
>> https://launchpad.net/~athos-ribeiro/+snap/cassandra-test
>>
>> for now. It would also be nice to get some feedback on the overall state
>> of the snap itself.
>
> Great.  I'll take a look at the snap as well, but you might want to send
> a broader RFC to server-crew as well, given that there are other people
> knowleadgeable about snaps there.

Here's the review of the snap.

| name: cassandra
| # Set the version during the build based on a tag name
| adopt-info: cassandra
| base: core18

We're striving to use core20 for our snaps.

| summary: Manage massive amounts of data, fast, without losing sleep
| description: |
|   The Apache Cassandra database is the right choice when you need
|   scalability and high availability without compromising performance. Linear
|   scalability and proven fault-tolerance on commodity hardware or cloud
|   infrastructure make it the perfect platform for mission-critical
|   data.Cassandra's support for replicating across multiple datacenters is
|   best-in-class, providing lower latency for your users and the peace of
|   mind of knowing that you can survive regional outages.

Heh, this is yet another different description for the project :-P.  We
should probably stick with just one.

| 
| confinement: devmode

Out of curiosity, did you test the snap with "strict" confinement?

| grade: devel
| 
| apps:
|     cassandra:
|         command: wrapper-cassandra
|         daemon: forking
|         # mount-observe needs to be manually connected post-install:
|         # `sudo snap connect cassandra:mount-observe ubuntu-core:mount-observe`
|         plugs:
|           - network
|           - network-bind
|           # Needed for checking available space in the data directories
|           # (java.io.File.getTotalSpace).
|           - mount-observe
|     nodetool:
|         command: wrapper-nodetool
|         plugs:
|           - network
| 
|     # Helpers. These will go away when snapd provides a config interface.
|     # Deployment tools can use these to avoid hardcoding paths.
|     config-get:
|         # cassandra.config-get: fetch a configuration file from CASSANDRA_CONF.
|         command: config-get
|     config-set:
|         # cassandra.config-set: place a configuration file in CASSANDRA_CONF.
|         command: config-set
|     env-get:
|         # cassandra.env-get: print the value of an environment variable,
|         # e.g. CASSANDRA_DATA.
|         command: env-get
| parts:
|     cassandra:
|         plugin: ant
|         ant-properties:
|             dist.dir: $SNAPCRAFT_PART_INSTALL
|         ant-build-targets:
|             - artifacts
|         source: https://github.com/apache/cassandra
|         source-type: git
|         ant-openjdk-version: "8"
|         override-build: |
|             # Hacky trick to handle proxy settings for resolver-ant-tasks
|             # https://bugs.launchpad.net/launchpad-buildd/+bug/1702130
|             # https://maven.apache.org/resolver-ant-tasks/
|             if [ -n ${http_proxy} -a -n ${https_proxy} ]; then
|                 mkdir -p $HOME/.m2
|                 http_port=$(echo $http_proxy | rev | cut -d':' -f 1 | rev)
|                 http_host=$(echo $http_proxy | sed "s;:${http_port}$;;")
|                 http_port=$(echo $http_port | sed 's;/;;g')
|                 https_port=$(echo $https_proxy | rev | cut -d':' -f 1 | rev)
|                 https_host=$(echo $https_proxy | sed "s;:${https_port}$;;")
|                 https_port=$(echo $https_port | sed 's;/;;g')
|                 # remove http(s):// from hosts
|                 http_host=$(echo $http_host | sed 's;^https\?://;;')
|                 https_host=$(echo $https_host | sed 's;^https\?://;;')
|                 cat << EOF > $HOME/.m2/settings.xml
|             <?xml version="1.0" encoding="UTF-8"?>
|             <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0";
|                       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
|                       xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd";>
|               <proxies>
|                <proxy>
|                   <id>launchpad-proxy-http</id>
|                   <active>true</active>
|                   <protocol>http</protocol>
|                   <host>${http_host}</host>
|                   <port>${http_port}</port>
|                 </proxy>
|                <proxy>
|                   <id>launchpad-proxy-https</id>
|                   <active>true</active>
|                   <protocol>https</protocol>
|                   <host>${https_host}</host>
|                   <port>${https_port}</port>
|                 </proxy>
|               </proxies>
|             </settings>
|             EOF
|                 echo "Installed settings.xml:"
|                 cat $HOME/.m2/settings.xml
|             fi

Ah, so here's the infamous proxy problem you're facing!  Interesting.
BTW, I know that LP also sets $no_proxy sometimes; do you think it's
better to cover it as well?

|             # make sure we will use openjdk 8 for the build
|             update-alternatives --set java $(update-alternatives --list java | grep 8-openjdk)
|             candidate_version="latest"
|             last_committed_tag="$(git describe --tags --abbrev=0)"
|             last_committed_tag_ver="$(echo $last_committed_tag | sed 's/cassandra-//')"
|             last_released_tag="$(snap info wethr | awk '$1 == "beta:" { print $2 }')"
|             # If the latest tag from the upstream project has not been released to
|             # beta, build that tag instead of master.
|             if [ "${last_committed_tag_ver}" != "${last_released_tag}" ]; then
|               git fetch
|               git checkout "${last_committed_tag}"
|               candidate_version="${last_committed_tag_ver}"
|             fi
|             snapcraftctl set-version "${candidate_version}"
|             snapcraftctl build
|         build-packages:
|             - ant-optional
|             - build-essential
|             - python
|             - sed
|         stage-packages:
|             # Copy some packages into the stage directory to steal their
|             # pre-built binaries.
|             - mawk
|             - grep
|             # For free(1)
|             - procps
|         organize:
|             # Rename directories into their expected locations. Analogous to
|             # Debian's .install files.
|             conf: etc/cassandra
|             bin/cassandra: usr/sbin/cassandra
|             lib: usr/share/cassandra/lib
|             # Rename mawk to awk instead of shipping a symlink.
|             usr/bin/mawk: usr/bin/awk
|         prime:
|             # Files from the stage directory to include in the snap.
|             - bin/nodetool
|             - etc/cassandra
|             - usr/sbin/cassandra
|             - usr/share/cassandra/lib/*.jar
|             - usr/lib/jvm
|             # awk, grep, and free are needed by cassandra-env.sh
|             - usr/bin/awk
|             - bin/grep
|             - usr/bin/free
|     glue:
|         plugin: dump
|         source: .
|         organize:
|             # Snaps are confined within:
|             # - /snap/$name/$revision (RO mounted squashfs)
|             # - /var/snap/$name/$revision (RW area for persisting data,
|             #                              copied between upgrades)
|             #
|             # The following wrapper scripts set some environment variables to
|             # point Cassandra at the right locations for jars, the data
|             # directory, logs, etc.
|             wrapper-cassandra: bin/wrapper-cassandra
|             wrapper-nodetool: bin/wrapper-nodetool
|             # Helpers. Again, these will go away when snapd provides a config
|             # interface.
|             config-get: bin/config-get
|             config-set: bin/config-set
|             env-get: bin/env-get
|             common: bin/common
|         prime:
|             - bin/*

I'm wondering: can you perform the "chmod +x /usr/sbin/cassandra" in the
snap?  This way you won't need to do it in the OCI, and we can get rid
of that extra RUN.


I think that's all I have for now.  I skimmed throught the wrapper-* and
config-* scripts in the snap but didn't find anything else to comment.

Again, thanks a lot for working on this.

-- 
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0  EB2F 106D A1C8 C3CB BF14


Follow ups

References