How can I split the single file produced by zstd -r folder -o output.zst back into the original files?

I didn’t read the manual carefully enough and ran the following command:

$ zstd -r folder -o output.zst

Decompressing it with the following command gave me back a single file called output:

$ unzstd output.zst 

The output file has all the contents of the files under the folder concatenated.

Are there any tools or programs that can un-concatenate the single file into the multiple original files?

This is the only backup file I have and I need the backup.

EDIT: what I really should have run (according to this thread) is

# for tar version 1.31 and above
$ tar --zstd -cf output.tar.zst folder

# for tar version < 1.31
$ tar --use-compress-program zstd -cf output.tar.zst folder
Asked By: Tun

||

I also posted this question as an issue on the zstd GitHub repository, and learnt the following from Cyan4973:

all compressed frames are just stored back to back in the same file output.zst.

While there would be a way, at least in theory, to separate each frame, and therefore find the boundaries of each file, another problem is that none of these frames contain the file name, nor the position in directory tree. So you would end up with a bunch of nameless files.

The proper way to archive is to combine zstd with tar, which is in charge of preserving file metadata.

Currently there are no tools or programs to separate the frames, but someone could write one using zstd.h.

By default, the CLI will just decompress all frames back-to-back into the same decompressed file …

… program it yourself, … use the ZSTD_decompressStream() API.

Answered By: Tun

This GitHub issue comment suggests code to recover the files (without filenames or folder hierarchy):

#undef NDEBUG
#define ZSTD_STATIC_LINKING_ONLY

#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <zstd.h>

static uint64_t get_file_size(char const* filename) {
  FILE* f = fopen(filename, "rb");
  assert(f != NULL);
  int ret = fseek(f, 0L, SEEK_END);
  assert(ret == 0);
  long int const size = ftell(f);
  assert(size >= 0);
  fclose(f);
  return (uint64_t)size;
}

static void read_file(char const* filename, void* buffer, size_t size) {
  FILE* f = fopen(filename, "rb");
  assert(f != NULL);
  size_t const read = fread(buffer, 1, size, f);
  assert(read == size);
  char tmp;
  assert(fread(&tmp, 1, 1, f) == 0);
  fclose(f);
}

static size_t decompress_one_frame(char const* inputPtr, char const* inputEnd, char const* outputPrefix, int idx) {
  size_t const inputSize = (size_t)(inputEnd - inputPtr);
  size_t const compressedSize = ZSTD_findFrameCompressedSize(inputPtr, inputSize);
  assert(!ZSTD_isError(compressedSize));

  size_t const decompressBound = ZSTD_decompressBound(inputPtr, compressedSize);
  assert(decompressBound != ZSTD_CONTENTSIZE_ERROR);
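  /* Allocate at least 1 byte: malloc(0) may return NULL for an empty file. */
  void* const decompressed = malloc(decompressBound > 0 ? decompressBound : 1);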
  assert(decompressed != NULL);

  size_t const decompressedSize = ZSTD_decompress(decompressed, decompressBound, inputPtr, compressedSize);
  assert(!ZSTD_isError(decompressedSize));


  size_t const outputFileSize = strlen(outputPrefix) + 11;
  char* const outputFile = malloc(outputFileSize);
  assert(outputFile != NULL);
  {
    size_t const written = snprintf(outputFile, outputFileSize, "%s%d", outputPrefix, idx);
    assert(written < outputFileSize);
  }
  {
    FILE* f = fopen(outputFile, "wb");
    assert(f != NULL);
    size_t const written = fwrite(decompressed, 1, decompressedSize, f);
    assert(written == decompressedSize);
    fclose(f);
  }

  free(outputFile);
  free(decompressed);
  return compressedSize;
}

int main(int argc, char** argv) {
  if (argc != 3) {
    fprintf(stderr, "USAGE: %s FILE.zst OUT-PREFIX\n", argv[0]);
    fprintf(stderr, "Decompresses a zstd file containing more than one frame to ${OUT-PREFIX}0, ${OUT-PREFIX}1, ...\n");
    return 1;
  }
  char const* const inputFile = argv[1];
  char const* const outputPrefix = argv[2];

  size_t const inputSize = get_file_size(inputFile);
  char* const input = malloc(inputSize);
  assert(input != NULL);
  read_file(inputFile, input, inputSize);

  char const* inputPtr = input;
  char const* const inputEnd = input + inputSize;
  int idx = 0;
  while (inputPtr < inputEnd) {
    size_t const compressedSize = decompress_one_frame(inputPtr, inputEnd, outputPrefix, idx);
    inputPtr += compressedSize;
    ++idx;
  }
  assert(inputPtr == inputEnd);
  free(input);
  return 0;
}

This program will work for you. If you write it into a file called unzstd.c, and have libzstd installed, you can compile it with cc unzstd.c -lzstd -o unzstd. Then, if the file you want to decompress is input.zst you can run:

mkdir output
./unzstd input.zst output/
ls output/

It will create one output file per input file that you compressed, in the order that they were compressed, named output/0, output/1, etc. So you’ll lose the filenames, and the directory structure, but you will get all your files back.

Answered By: Tun