ubuntu-docker-images team mailing list archive
-
ubuntu-docker-images team
-
Mailing list archive
-
Message #00031
Re: Review request: Cassandra
On Thursday, June 10 2021, I wrote:
> On Thursday, June 10 2021, Athos Ribeiro wrote:
>
>> Hi,
>>
>> I have been working on a new Cassandra OCI image and it is finally ready
>> for a first review.
>>
>> I created the OCI repository for the new image at
>>
>> https://code.launchpad.net/~canonical-server/ubuntu-docker-images/+git/cassandra
>>
>> The code to be review is in the "main" branch.
>
> Yay!
>
> Thank you, Athos. I will review this today or tomorrow, depending on
> how soon I can finish what I'm doing right now.
OK, here's the review for the cassandra OCI image.
| # Slightly based on github.com/docker-library/cassandra and on cassandra snap setup
| # mount the following files for overriding configurations:
| # cassandra.yaml logback.xml cassandra-env.sh jvm.options
| FROM ubuntu:focal AS snap-installer
ubuntu:impish here.
|
| RUN apt-get update && \
| DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade -y && \
We're using "apt-get full-upgrade -y" now.
| DEBIAN_FRONTEND=noninteractive apt-get install -y \
| curl ca-certificates squashfs-tools
We should use the 'set -eux;...' idiom here. Also, the next two RUN
statements can be joined to this one. For example, this excerpt from
the prometheus 21.04 Dockerfile:
RUN set -eux; \
apt-get update; \
DEBIAN_FRONTEND=noninteractive apt-get full-upgrade -y; \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
jq curl ca-certificates squashfs-tools; \
# taken from https://snapcraft.io/docs/build-on-docker
# Alternatively, we can install snapd, and issue `snap download prometheus`
curl -L $(curl -H 'X-Ubuntu-Series: 16' -H "X-Ubuntu-Architecture: $(dpkg --print-architecture)" 'https://api.snapcraft.io/api/v1/snaps/details/prometheus?channel=21.04/edge' | jq '.download_url' -r) --output prometheus.snap; \
mkdir -p /snap; \
unsquashfs -d /snap/prometheus prometheus.snap
|
| # taken from https://snapcraft.io/docs/build-on-docker
| # Alternatively, we can install snapd, and issue `snap download cassandra`
| RUN curl -L 'https://launchpad.net/~athos-ribeiro/+snap/cassandra-test/+build/1430794/+files/cassandra_4.0-rc1_amd64.snap' --output cassandra.snap
|
| RUN mkdir -p /snap && unsquashfs -d /snap/cassandra cassandra.snap
|
|
| FROM ubuntu:focal
ubuntu:impish here as well.
| LABEL maintainer="Ubuntu Server team <ubuntu-server@xxxxxxxxxxxxxxxx>"
We don't use this on all images, but it may be a good idea to adopt this
practice.
|
| ENV TZ UTC
|
| ENV CASSANDRA_HOME /usr/share/cassandra
| ENV STATIC_CONF /etc/cassandra
| ENV CASSANDRA_CONF /etc/cassandra
| ENV CASSANDRA_LOG_DIR /var/log/cassandra
| ENV CASSANDRA_DATA /var/lib/cassandra
| ENV CLASSPATH $CASSANDRA_CONF
| ENV CASSANDRA_INCLUDE ""
| ENV JAVA_VERSION 8
| ENV cassandra_storagedir "$CASSANDRA_DATA"
|
| RUN mkdir -p $CASSANDRA_CONF $CASSANDRA_LOG_DIR $CASSANDRA_DATA
I'm wondering if we can somehow join this RUN with the following one.
See below.
|
| COPY --from=snap-installer /snap/cassandra/usr/sbin/cassandra /usr/sbin/cassandra
| COPY --from=snap-installer /snap/cassandra/usr/share/cassandra /usr/share/cassandra
| COPY --from=snap-installer /snap/cassandra/etc/cassandra /etc/cassandra
|
| # Copy the manifest files from the snap
| COPY --from=snap-installer /snap/cassandra/snap/snapcraft.yaml /usr/share/rocks/
| COPY --from=snap-installer /snap/cassandra/snap/manifest.yaml /usr/share/rocks/
|
| RUN set -eux; \
| apt-get update; \
| DEBIAN_FRONTEND=noninteractive apt-get full-upgrade -y; \
| DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
| openjdk-8-jre-headless ca-certificates tzdata \
Indentation here.
| # solves warning: "jemalloc shared library could not be preloaded to speed up memory allocations"
| libjemalloc2 \
| # "free" is used by cassandra-env.sh
| procps \
| # "ip" is not required by Cassandra itself, but is commonly used in scripting Cassandra's configuration (since it is so fixated on explicit IP addresses)
| iproute2 \
| # Cassandra will automatically use numactl if available
| # https://github.com/apache/cassandra/blob/18bcda2d4c2eba7370a0b21f33eed37cb730bbb3/bin/cassandra#L90-L100
| # https://github.com/apache/cassandra/commit/604c0e87dc67fa65f6904ef9a98a029c9f2f865a
| numactl \
| ; \
| DEBIAN_FRONTEND=noninteractive apt-get remove --purge --auto-remove -y; \
| rm -rf /var/lib/apt/lists/*; \
| # https://issues.apache.org/jira/browse/CASSANDRA-15767 ("bin/cassandra" only looks for "libjemalloc.so" or "libjemalloc.so.1" which doesn't match our "libjemalloc.so.2")
| libjemalloc="$(readlink -e /usr/lib/*/libjemalloc.so.2)"; \
| ln -sT "$libjemalloc" /usr/local/lib/libjemalloc.so; \
| ldconfig; \
Ugh :-(. I looked at the upstream issue and it seems it's stuck since
last year; maybe it'd be interesting to ping it.
| chmod +x /usr/sbin/cassandra
Ouch. Does it really install the binary without the executable bit set?
If we could remove this part, we'd be able to run the "mkdir -p"
commands from the previous RUN statement here, and move all the COPY
statements below.
On a side note, I found https://github.com/moby/moby/issues/34819, which
may come to our rescue when/if it's released.
| ENV JVM_OPTS "$JVM_OPTS -Dcassandra.config=file://$CASSANDRA_CONF/cassandra.yaml"
|
| COPY entrypoint.sh /usr/local/bin/
A couple of comments about the entrypoint script. There's this part
here:
case $(arch) in
x86_64)
export ARCH=amd64
;;
aarch64)
export ARCH=arm64
;;
ppc64le)
export ARCH=ppc64el
;;
s390x)
export ARCH=s390x
;;
*)
echo "Unsupported architecture $(arch)"
exit 1
;;
esac
I think you can simplify this by just setting ARCH to $(dpkg
--print-architecture).
For this part:
export JAVA=/usr/lib/jvm/java-8-openjdk-"$ARCH"/jre/bin/java
Maybe it's better to match any Java version (i.e., java-*-openjdk...)?
And I noticed this part:
for jar in $SNAP/usr/share/cassandra/lib/*.jar; do
CLASSPATH="$CLASSPATH:$jar"
done
You could remove the $SNAP variable.
|
| VOLUME /var/lib/cassandra
| VOLUME /var/log/cassandra
|
| ENTRYPOINT ["entrypoint.sh"]
| # 7000: intra-node communication
| # 7001: TLS intra-node communication
| # 7199: JMX
| # 9042: CQL
| # 9160: thrift service
| EXPOSE 7000 7001 7199 9042 9160
| CMD ["cassandra", "-R", "-f"]
Some comments about the data/cassandra.yaml file:
- I think the description can be improved a bit. It currently reads:
description: >
Apache Cassandra is a highly-scalable partitioned row store. Rows are organized into tables with a required primary key.
What do you think about taking what's on their web page and adapting a
little bit?
Apache Cassandra is an open source NoSQL distributed database trusted
by thousands of companies for scalability and high availability
without compromising performance. Linear scalability and proven
fault-tolerance on commodity hardware or cloud infrastructure make it
the perfect platform for mission-critical data.
I'm just concerned that "highly-scalable partitioned row store" sounds
a bit cryptic for me.
Generic comments about the image in general:
- It will need a unit test :-).
- I followed the Quick Start guide on Cassandra's home page
(https://cassandra.apache.org/quickstart/) to check if everything is
working. It took me some time to realize that I needed to set the
CQLVERSION variable when invoking the nuvo/docker-cqlsh container.
This may be a good thing to include in the docs.
- I haven't checked the stuff under examples/.
>> Note that the Cassandra snap is being fetched from
>>
>> https://launchpad.net/~athos-ribeiro/+snap/cassandra-test
>>
>> for now. It would also be nice to get some feedback on the overall state
>> of the snap itself.
>
> Great. I'll take a look at the snap as well, but you might want to send
> a broader RFC to server-crew as well, given that there are other people
> knowleadgeable about snaps there.
Here's the review of the snap.
| name: cassandra
| # Set the version during the build based on a tag name
| adopt-info: cassandra
| base: core18
We're striving to use core20 for our snaps.
| summary: Manage massive amounts of data, fast, without losing sleep
| description: |
| The Apache Cassandra database is the right choice when you need
| scalability and high availability without compromising performance. Linear
| scalability and proven fault-tolerance on commodity hardware or cloud
| infrastructure make it the perfect platform for mission-critical
| data.Cassandra's support for replicating across multiple datacenters is
| best-in-class, providing lower latency for your users and the peace of
| mind of knowing that you can survive regional outages.
Heh, this is yet another different description for the project :-P. We
should probably stick with just one.
|
| confinement: devmode
Out of curiosity, did you test the snap with "strict" confinement?
| grade: devel
|
| apps:
| cassandra:
| command: wrapper-cassandra
| daemon: forking
| # mount-observe needs to be manually connected post-install:
| # `sudo snap connect cassandra:mount-observe ubuntu-core:mount-observe`
| plugs:
| - network
| - network-bind
| # Needed for checking available space in the data directories
| # (java.io.File.getTotalSpace).
| - mount-observe
| nodetool:
| command: wrapper-nodetool
| plugs:
| - network
|
| # Helpers. These will go away when snapd provides a config interface.
| # Deployment tools can use these to avoid hardcoding paths.
| config-get:
| # cassandra.config-get: fetch a configuration file from CASSANDRA_CONF.
| command: config-get
| config-set:
| # cassandra.config-set: place a configuration file in CASSANDRA_CONF.
| command: config-set
| env-get:
| # cassandra.env-get: print the value of an environment variable,
| # e.g. CASSANDRA_DATA.
| command: env-get
| parts:
| cassandra:
| plugin: ant
| ant-properties:
| dist.dir: $SNAPCRAFT_PART_INSTALL
| ant-build-targets:
| - artifacts
| source: https://github.com/apache/cassandra
| source-type: git
| ant-openjdk-version: "8"
| override-build: |
| # Hacky trick to handle proxy settings for resolver-ant-tasks
| # https://bugs.launchpad.net/launchpad-buildd/+bug/1702130
| # https://maven.apache.org/resolver-ant-tasks/
| if [ -n ${http_proxy} -a -n ${https_proxy} ]; then
| mkdir -p $HOME/.m2
| http_port=$(echo $http_proxy | rev | cut -d':' -f 1 | rev)
| http_host=$(echo $http_proxy | sed "s;:${http_port}$;;")
| http_port=$(echo $http_port | sed 's;/;;g')
| https_port=$(echo $https_proxy | rev | cut -d':' -f 1 | rev)
| https_host=$(echo $https_proxy | sed "s;:${https_port}$;;")
| https_port=$(echo $https_port | sed 's;/;;g')
| # remove http(s):// from hosts
| http_host=$(echo $http_host | sed 's;^https\?://;;')
| https_host=$(echo $https_host | sed 's;^https\?://;;')
| cat << EOF > $HOME/.m2/settings.xml
| <?xml version="1.0" encoding="UTF-8"?>
| <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
| xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
| xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
| <proxies>
| <proxy>
| <id>launchpad-proxy-http</id>
| <active>true</active>
| <protocol>http</protocol>
| <host>${http_host}</host>
| <port>${http_port}</port>
| </proxy>
| <proxy>
| <id>launchpad-proxy-https</id>
| <active>true</active>
| <protocol>https</protocol>
| <host>${https_host}</host>
| <port>${https_port}</port>
| </proxy>
| </proxies>
| </settings>
| EOF
| echo "Installed settings.xml:"
| cat $HOME/.m2/settings.xml
| fi
Ah, so here's the infamous proxy problem you're facing! Interesting.
BTW, I know that LP also sets $no_proxy sometimes; do you think it's
better to cover it as well?
| # make sure we will use openjdk 8 for the build
| update-alternatives --set java $(update-alternatives --list java | grep 8-openjdk)
| candidate_version="latest"
| last_committed_tag="$(git describe --tags --abbrev=0)"
| last_committed_tag_ver="$(echo $last_committed_tag | sed 's/cassandra-//')"
| last_released_tag="$(snap info wethr | awk '$1 == "beta:" { print $2 }')"
| # If the latest tag from the upstream project has not been released to
| # beta, build that tag instead of master.
| if [ "${last_committed_tag_ver}" != "${last_released_tag}" ]; then
| git fetch
| git checkout "${last_committed_tag}"
| candidate_version="${last_committed_tag_ver}"
| fi
| snapcraftctl set-version "${candidate_version}"
| snapcraftctl build
| build-packages:
| - ant-optional
| - build-essential
| - python
| - sed
| stage-packages:
| # Copy some packages into the stage directory to steal their
| # pre-built binaries.
| - mawk
| - grep
| # For free(1)
| - procps
| organize:
| # Rename directories into their expected locations. Analogous to
| # Debian's .install files.
| conf: etc/cassandra
| bin/cassandra: usr/sbin/cassandra
| lib: usr/share/cassandra/lib
| # Rename mawk to awk instead of shipping a symlink.
| usr/bin/mawk: usr/bin/awk
| prime:
| # Files from the stage directory to include in the snap.
| - bin/nodetool
| - etc/cassandra
| - usr/sbin/cassandra
| - usr/share/cassandra/lib/*.jar
| - usr/lib/jvm
| # awk, grep, and free are needed by cassandra-env.sh
| - usr/bin/awk
| - bin/grep
| - usr/bin/free
| glue:
| plugin: dump
| source: .
| organize:
| # Snaps are confined within:
| # - /snap/$name/$revision (RO mounted squashfs)
| # - /var/snap/$name/$revision (RW area for persisting data,
| # copied between upgrades)
| #
| # The following wrapper scripts set some environment variables to
| # point Cassandra at the right locations for jars, the data
| # directory, logs, etc.
| wrapper-cassandra: bin/wrapper-cassandra
| wrapper-nodetool: bin/wrapper-nodetool
| # Helpers. Again, these will go away when snapd provides a config
| # interface.
| config-get: bin/config-get
| config-set: bin/config-set
| env-get: bin/env-get
| common: bin/common
| prime:
| - bin/*
I'm wondering: can you perform the "chmod +x /usr/sbin/cassandra" in the
snap? This way you won't need to do it in the OCI, and we can get rid
of that extra RUN.
I think that's all I have for now. I skimmed throught the wrapper-* and
config-* scripts in the snap but didn't find anything else to comment.
Again, thanks a lot for working on this.
--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14
Follow ups
References