debcrafters-packages team mailing list archive

[Bug 2117104] Re: Bash read and mapfile intermittently corrupt/truncate piped input on Ubuntu 24.04.

 

** Changed in: bash (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of
Debcrafters packages, which is subscribed to bash in Ubuntu.
https://bugs.launchpad.net/bugs/2117104

Title:
  Bash read and mapfile intermittently corrupt/truncate piped input on
  Ubuntu 24.04.

Status in bash package in Ubuntu:
  Incomplete

Bug description:
  Bug Title: Bash read/mapfile/xargs intermittently corrupts/truncates
  piped absolute paths from find on Ubuntu 24.04 LTS

  [Gemini AI assisted report.]

  Ubuntu Version: Ubuntu 24.04.2 LTS (Noble Numbat)
  Bash Version: GNU bash, version 5.2.21(1)-release
  System Environment: Standard Ubuntu 24.04 installation, no unusual configurations (e.g., not WSL, chroot, or custom kernels). Occurs with standard user login and also when run with sudo.
  Locale: LANG=en_GB.UTF-8

  Description:
  When a Bash script processes a list of absolute file paths (generated by find -print0 or find -print) using standard methods such as while IFS= read -r -d $'\0', mapfile -d $'\0' -t, or xargs -d '\n' -I {} bash -c 'func "$@"', the filepath variable (or the argument passed to the function) intermittently holds a truncated or corrupted path. Typically the leading / or /home/ portion of the absolute path is missing, which causes "No such file or directory" errors from subsequent commands such as ffmpeg or realpath. The behaviour is inconsistent: within a single run, some paths in the list are affected while others are processed correctly.

  Crucially, isolated diagnostic commands show that find itself
  correctly produces the full paths, and that read -d $'\0' and xargs
  can process them correctly when run in simplified, isolated tests. The
  problem appears to arise specifically within more complex script
  structures involving pipes and variable assignments/subshell
  execution.

  Steps to Reproduce:

  Create a test directory and sample files:

  Bash

  mkdir -p /tmp/test_mp3s
  cd /tmp/test_mp3s
  # Create dummy MP3 files with varied names, including spaces and special chars
  touch "music one.mp3"
  touch "might be music one.mp3"
  touch "suspect music one with different tags.mp3"
  touch "Something Different.mp3"
  touch "Something Different (copy).mp3"
  touch "MUSIC ONE (another copy).mp3"
  touch "MUSIC ONE (copy).mp3"
  touch "MUSIC ONE.mp3"
  # Create dummy files that are NOT mp3s to ensure find filtering works
  touch "document.txt"
  touch "image.jpg"
  ls -l # Verify files

  Verify find's raw output (Expected: Perfect):

  Bash

  find "$(pwd)" -type f -iname "*.mp3" -print0 | hexdump -C

  Expected Output: Full absolute paths, correctly terminated by 00 (null byte), e.g., /tmp/test_mp3s/music one.mp3\0.

  Observed (on user's system): This step consistently showed perfect
  output, confirming find is not the source of corruption.
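
  As a further check on find's output, the NUL terminators can be counted and compared with the number of matching files. The sketch below is illustrative only: it uses a scratch directory from mktemp rather than /tmp/test_mp3s, and the variable names (dir, nul_count) are made up for this example.

```shell
#!/bin/bash
# Hypothetical scratch directory, so the check does not depend on /tmp/test_mp3s.
dir=$(mktemp -d)
touch "$dir/music one.mp3" "$dir/Something Different (copy).mp3" "$dir/document.txt"

# Delete every byte except NUL from find's output, then count what is left:
# one NUL terminator per matching path.
nul_count=$(find "$dir" -type f -iname "*.mp3" -print0 | tr -cd '\000' | wc -c)
echo "NUL-terminated records: $nul_count"

rm -rf "$dir"
```

  If the count matches the number of .mp3 files, find emitted one complete record per file.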

  Test read -d $'\0' in isolation (Expected: Perfect, but fails in full script):
  Create a script diag_read.sh:

  Bash

  #!/bin/bash
  TEST_DIR="/tmp/test_mp3s"
  echo "--- Testing raw read from find -print0 ---"
  find "$TEST_DIR" -type f -iname "*.mp3" -print0 | while IFS= read -r -d $'\0' line; do
      echo "--- Start of line ---"
      echo "Raw line received: '$line'" # Added for clarity
      echo -n "$line" | hexdump -C  # Hexdump the line to see exact bytes
      echo "--- End of line ---"
      # Try to resolve it just to see if realpath gives an error here
      realpath "$line" 2>&1 || echo "realpath error on '$line'"
  done
  echo "--- Test complete ---"

  Run: bash diag_read.sh

  Expected Output: All paths should be fully printed and hexdumped
  correctly. realpath should succeed for all.

  Observed (on user's system): This diagnostic consistently showed
  perfect output, confirming read -d $'\0' in isolation works.
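
  Two details of this read idiom may be worth keeping in mind when comparing the isolated test with a larger script. In Bash, $'\0' expands to an empty string, so -d $'\0' is the same option as -d ''. Also, a while loop on the right-hand side of a pipe runs in a subshell, so variables set inside it are lost afterwards; process substitution avoids that. A minimal sketch, assuming a scratch directory from mktemp and made-up variable names:

```shell
#!/bin/bash
# Hypothetical scratch directory with two sample files.
dir=$(mktemp -d)
touch "$dir/music one.mp3" "$dir/MUSIC ONE (copy).mp3"

# $'\0' cannot carry a NUL byte inside a shell string, so it collapses to ''.
if [[ $'\0' == '' ]]; then same_delim=yes; else same_delim=no; fi
echo "delimiter argument is empty: $same_delim"

count=0
paths=()
# Process substitution keeps the loop in the current shell,
# so count and paths are still set after the loop ends.
while IFS= read -r -d '' line; do
    paths+=("$line")
    count=$((count + 1))
done < <(find "$dir" -type f -iname "*.mp3" -print0)

echo "read $count paths"
printf '%s\n' "${paths[@]}"

rm -rf "$dir"
```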

  Test xargs function execution in isolation (Expected: Perfect, but fails in full script):
  Create a script diag_xargs.sh:

  Bash

  #!/bin/bash
  test_xargs_func() {
      local arg="$1"
      echo "DEBUG: xargs received argument: '$arg'"
      echo -n "$arg" | hexdump -C
      echo "DEBUG: --- End argument hexdump ---"
  }
  export -f test_xargs_func

  echo "--- Testing xargs function execution with full paths ---"
  find "/tmp/test_mp3s" -type f -iname "*.mp3" -print | xargs -d '\n' -I {} bash -c 'test_xargs_func "$@"' _ {}
  echo "--- Xargs test complete ---"

  Run: bash diag_xargs.sh

  Expected Output: All paths should be fully printed and hexdumped
  correctly via xargs.

  Observed (on user's system): This diagnostic also consistently showed
  perfect output.

  Run the "Problematic" script (initial mapfile version) - (Expected: All processed, Actual: Intermittent corruption):
  Create a script reproduce_mapfile.sh (simplified for just the loop part):

  Bash

  #!/bin/bash
  set -eEuo pipefail # Added for strict error handling and debugging
  TEST_DIR="/tmp/test_mp3s"
  echo "--- Running problematic mapfile script ---"
  declare -a all_mp3_files

  # Use mapfile to read null-terminated output
  # This is where the observed intermittent corruption/truncation occurred
  mapfile -d $'\0' -t all_mp3_files < <(find "$TEST_DIR" -type f -iname "*.mp3" -print0)

  for filepath in "${all_mp3_files[@]}"; do
      if [ -z "$filepath" ]; then
          echo "Processing empty filepath (SKIPPING)."
          continue
      fi
      echo "Processing file: '$filepath'"
      echo -n "$filepath" | hexdump -C # Observe the path content here
      echo "---"
      # Dummy command, replace with ffmpeg if available for full replication
      realpath "$filepath" 2>&1 || echo "ERROR: realpath failed for '$filepath'"
  done
  echo "--- Script finished ---"

  Run: bash reproduce_mapfile.sh

  Expected Output: All files processed with full, correct paths.
  realpath should succeed.

  Observed (on user's system): Intermittent path corruption, e.g.,
  Processing file: 'ome/...' instead of /home/..., or Processing file:
  '' leading to realpath errors.
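
  Because the corruption is intermittent, a single run may not show it. One possible way to look for it is to repeat the mapfile read many times and flag any element that is empty, not absolute, or not an existing file. This is only a sketch: check_paths is a hypothetical helper, the run count of 50 is arbitrary, and the scratch directory stands in for the real music directory.

```shell
#!/bin/bash
# check_paths (hypothetical helper): read find's NUL-terminated output with
# mapfile and print how many elements are empty, relative, or not a regular file.
check_paths() {
    local dir=$1 bad=0 p
    local -a files=()
    mapfile -d '' -t files < <(find "$dir" -type f -iname "*.mp3" -print0)
    for p in "${files[@]}"; do
        if [[ -z $p || $p != /* || ! -f $p ]]; then
            printf 'BAD: %q\n' "$p" >&2   # %q makes stray bytes visible
            bad=$((bad + 1))
        fi
    done
    echo "$bad"
}

dir=$(mktemp -d)
touch "$dir/music one.mp3" "$dir/Something Different (copy).mp3"

failures=0
for run in $(seq 1 50); do
    bad=$(check_paths "$dir")
    failures=$((failures + bad))
done
echo "bad paths across 50 runs: $failures"

rm -rf "$dir"
```

  A non-zero total, together with the BAD lines on stderr, would give a concrete record of which paths were damaged and in what way.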

  Run the "Problematic" script (initial xargs with variable capture) - (Expected: All processed, Actual: Silent failure):
  Create a script reproduce_xargs_silent.sh (simplified for just the loop part):

  Bash

  #!/bin/bash
  set -eEuo pipefail # Added for strict error handling and debugging
  TEST_DIR="/tmp/test_mp3s"
  echo "--- Running problematic xargs (silent failure) script ---"

  process_file_for_test() {
      local filepath="$1"
      if [ -z "$filepath" ]; then
          echo "Processing empty filepath in function (SKIPPING)." >&2
          return 0
      fi
      echo "Processing file (in func): '$filepath'" >&2 # Sent to stderr
      echo -n "$filepath" | hexdump -C >&2 # Sent to stderr
      echo "---" >&2
      echo "dummy_hash::${filepath}" # Simulate output
  }
  export -f process_file_for_test

  # This capture was observed to yield no output, leading to silent failure in the main script
  COLLECTED_OUTPUT=$(find "$TEST_DIR" -type f -iname "*.mp3" -print | xargs -d '\n' -I {} bash -c 'process_file_for_test "$@"' _ {})

  echo "--- Collected Output (length: ${#COLLECTED_OUTPUT}) ---"
  echo "$COLLECTED_OUTPUT"
  echo "--- End Collected Output ---"

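  # If COLLECTED_OUTPUT is empty, this loop only sees a single empty line
  # IFS is a set of single characters and cannot split on "::", so split with parameter expansion
  echo "$COLLECTED_OUTPUT" | while IFS= read -r record; do
      hash=${record%%::*}   # text before the first "::"
      path=${record#*::}    # text after the first "::"
      echo "Final processing: Hash=$hash, Path=$path"
  done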
  echo "--- Script finished ---"

  Run: bash reproduce_xargs_silent.sh

  Expected Output: COLLECTED_OUTPUT should contain multiple
  dummy_hash::/tmp/... lines, and the final loop should process them.

  Observed (on user's system): COLLECTED_OUTPUT was often empty, leading
  to no files being processed, despite diag_xargs.sh showing that xargs
  can execute the function.
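
  Records of the form dummy_hash::/path use a two-character separator, which IFS cannot express: IFS is treated as a set of single characters, so a read that splits on ":" leaves a leading colon on the path. The sketch below compares that with splitting by parameter expansion. split_record is a hypothetical helper, and the record is a sample value rather than captured output.

```shell
#!/bin/bash
# split_record (hypothetical helper): split "hash::path" on the first "::".
split_record() {
    local record=$1
    hash=${record%%::*}   # everything before the first "::"
    path=${record#*::}    # everything after the first "::"
}

record='dummy_hash::/tmp/test_mp3s/music one.mp3'

# Splitting on the single character ":" consumes only one of the two colons.
IFS=: read -r ifs_hash ifs_path <<< "$record"
echo "IFS split:       hash=$ifs_hash path=$ifs_path"

split_record "$record"
echo "expansion split: hash=$hash path=$path"
```

  A path that gains a stray leading colon in this way no longer resolves, which can look like corruption even though the input bytes were intact.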

  Summary of the Problem:

  The consistent observation across multiple attempts is that Bash's
  mechanisms for reading piped data into variables or arrays, or
  capturing command output into variables, appear to be intermittently
  faulty, leading to byte-level corruption (often truncation of leading
  characters) or complete loss of data, but only within the context of a
  multi-command script. Isolated tests of the individual components
  (find, read, xargs, hexdump, realpath) show them functioning
  correctly. This suggests a subtle bug in Bash's internal pipe
  handling, process substitution, or variable assignment in specific
  execution scenarios.

  Additional Debugging Information that May Help Devs:

  Adding set -x to the problematic scripts should help trace execution and
  variable assignments.
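
  One way to collect such a trace without mixing it into the script's own stderr is to point BASH_XTRACEFD at a file and include line numbers in PS4. A minimal sketch, with a made-up trace file and a sample assignment standing in for the real script body:

```shell
#!/bin/bash
# Hypothetical trace file; any writable path would do.
trace_file=$(mktemp)

# Open a new descriptor on the trace file and send xtrace output there.
exec {trace_fd}>"$trace_file"
BASH_XTRACEFD=$trace_fd

# Prefix each traced command with the script name and line number.
PS4='+ ${BASH_SOURCE##*/}:${LINENO}: '

set -x
sample=$(printf '%s' "/tmp/test_mp3s/music one.mp3")
set +x

# Unsetting BASH_XTRACEFD sends any later trace output back to stderr.
unset BASH_XTRACEFD

cat "$trace_file"
```

  The trace shows each assignment with its expanded value, so a truncated path would be visible at the exact line where it first appears.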

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: bash 5.2.21-2ubuntu4
  ProcVersionSignature: Ubuntu 6.11.0-29.29~24.04.1-generic 6.11.11
  Uname: Linux 6.11.0-29-generic x86_64
  NonfreeKernelModules: zfs
  ApportVersion: 2.28.1-0ubuntu3.8
  Architecture: amd64
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Date: Wed Jul 16 20:44:45 2025
  InstallationDate: Installed on 2024-07-07 (375 days ago)
  InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
  RebootRequiredPkgs: Error: path contained symlinks.
  SourcePackage: bash
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/bash/+bug/2117104/+subscriptions