The Wayback Machine - https://web.archive.org/web/20250312181712/https://github.com/localstack/localstack/issues/4087

S3 files bigger than 21 MB are deleted after container restart #4087

Closed
dolevhadad opened this issue May 31, 2021 · 7 comments
Assignees
Labels
area: persistence Retain state between LocalStack runs aws:s3 Amazon Simple Storage Service status: resolved/stale Closed due to staleness type: bug Bug report

Comments

@dolevhadad

Type of request: This is a ...

[X] bug report
[ ] feature request

Detailed description

Files bigger than 21 MB disappear after container restart.

Expected behavior

...

Actual behavior

...

Steps to reproduce

Command used to start LocalStack

...

Client code (AWS SDK code snippet, or sequence of "awslocal" commands)

...

@mgagliardo
Contributor

Hello @dolevhadad, could you please share your docker-compose file, or describe how you are running LocalStack?
Also, are you setting DATA_DIR for persistence?

DATA_DIR: Local directory for saving persistent data (currently only supported for these services: Kinesis, DynamoDB, Elasticsearch, S3, Secretsmanager, SSM, SQS, SNS). Set it to /tmp/localstack/data to enable persistence (/tmp/localstack is mounted into the Docker container), leave blank to disable persistence (default).

@dolevhadad
Author

dolevhadad commented May 31, 2021

Hi, @mgagliardo
The problem occurs only if you upload large files (20 MB and up) to S3; during the upload you can see several PUT requests in the log.

The file I tried to upload is called AWSCLIV2.msi

Waiting for all LocalStack services to be ready
Ready.
2021-05-31 08:14:09,503:API: 127.0.0.1 - - [31/May/2021 08:14:09] "GET / HTTP/1.1" 200 -
2021-05-31 08:14:09,536:API: 127.0.0.1 - - [31/May/2021 08:14:09] "POST / HTTP/1.1" 200 -
2021-05-31T08:14:09:INFO:localstack.utils.analytics.profiler: Execution of "start_api_services" took 8599.83515739441ms
2021-05-31 08:14:59,856:API: 127.0.0.1 - - [31/May/2021 08:14:59] "POST /my-bucket/AWSCLIV2.msi?uploads HTTP/1.1" 404 -
2021-05-31 08:15:27,871:API: 127.0.0.1 - - [31/May/2021 08:15:27] "PUT /my-bucket HTTP/1.1" 200 -
2021-05-31 08:15:40,271:API: 127.0.0.1 - - [31/May/2021 08:15:40] "POST /my-bucket/AWSCLIV2.msi?uploads HTTP/1.1" 200 -
2021-05-31 08:15:40,422:API: 127.0.0.1 - - [31/May/2021 08:15:40] "PUT /my-bucket/AWSCLIV2.msi?partNumber=4&uploadId=Ax7m7AUzt14ALWNBQG7kbCwD4RD1stjiVrht6VeKfMjSPRBTCrp6nBQ HTTP/1.1" 200 -
2021-05-31 08:15:41,162:API: 127.0.0.1 - - [31/May/2021 08:15:41] "PUT /my-bucket/AWSCLIV2.msi?partNumber=1&uploadId=Ax7m7AUzt14ALWNBQG7kbCwD4RD1stjiVrht6VeKfMjSPRBTCrp6nBQ HTTP/1.1" 200 -
2021-05-31 08:15:41,163:API: 127.0.0.1 - - [31/May/2021 08:15:41] "PUT /my-bucket/AWSCLIV2.msi?partNumber=2&uploadId=Ax7m7AUzt14ALWNBQG7kbCwD4RD1stjiVrht6VeKfMjSPRBTCrp6nBQ HTTP/1.1" 200 -
2021-05-31 08:15:41,180:API: 127.0.0.1 - - [31/May/2021 08:15:41] "PUT /my-bucket/AWSCLIV2.msi?partNumber=3&uploadId=Ax7m7AUzt14ALWNBQG7kbCwD4RD1stjiVrht6VeKfMjSPRBTCrp6nBQ HTTP/1.1" 200 -
2021-05-31 08:15:41,447:API: 127.0.0.1 - - [31/May/2021 08:15:41] "POST /my-bucket/AWSCLIV2.msi?uploadId=Ax7m7AUzt14ALWNBQG7kbCwD4RD1stjiVrht6VeKfMjSPRBTCrp6nBQ HTTP/1.1" 200 -
2021-05-31 08:16:08,003:API: 127.0.0.1 - - [31/May/2021 08:16:08] "PUT /my-bucket/anaconda-ks.cfg HTTP/1.1" 200 -
2021-05-31 08:16:26,384:API: 127.0.0.1 - - [31/May/2021 08:16:26] "GET / HTTP/1.1" 200 -
2021-05-31 08:16:43,912:API: 127.0.0.1 - - [31/May/2021 08:16:43] "GET /my-bucket?list-type=2&delimiter=%2F&prefix=&encoding-type=url HTTP/1.1" 200 -

If you don't restart the container everything is OK, because the file is in memory, and you can download it or delete it.

When you restart the container, it tries to load the AWSCLIV2.msi file from the data directory outside the container, but you get as many errors as there were PUT requests in the upload.

Example of one of the errors:

2021-05-31 08:17:25,671:API: Error on request:
Traceback (most recent call last):
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/werkzeug/serving.py", line 319, in run_wsgi
    execute(self.server.app)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/werkzeug/serving.py", line 308, in execute
    application_iter = app(environ, start_response)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/moto/server.py", line 178, in __call__
    return backend_app(environ, start_response)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask/app.py", line 2088, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.handle_exception(e)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask/app.py", line 2070, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask/app.py", line 1515, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask/app.py", line 1513, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/flask/app.py", line 1499, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/moto/core/utils.py", line 156, in __call__
    result = self.callback(request, request.url, {})
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/moto/s3/responses.py", line 1004, in key_or_control_response
    response = self._key_response(request, full_url, headers)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/moto/s3/responses.py", line 1165, in _key_response
    return self._key_response_put(
  File "/opt/code/localstack/localstack/services/s3/s3_starter.py", line 154, in s3_key_response_put
    result = s3_key_response_put_orig(request, body, bucket_name, query, key_name, headers, *args, **kwargs)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/moto/s3/responses.py", line 1287, in _key_response_put
    key = self.backend.set_part(bucket_name, upload_id, part_number, body)
  File "/opt/code/localstack/.venv/lib/python3.8/site-packages/moto/s3/models.py", line 1622, in set_part
    multipart = bucket.multiparts[multipart_id]
KeyError: 'Ax7m7AUzt14ALWNBQG7kbCwD4RD1stjiVrht6VeKfMjSPRBTCrp6nBQ'

My docker compose file :

services:
  localstack:
    container_name: "localstack_main"
    image: localstack/localstack
    privileged: true
    ports:
      - "443:4566"
      - "80:4566"
      - "4571:4571"
      - "8080-8081:8080-8081"
    environment:
      - TEST_AWS_ACCOUNT_ID="000000000000"
      - DEFAULT_REGION=us-east-1
      - HOSTNAME=localstack.qalab.vmm
      - LOCALSTACK_HOSTNAME=localstack
      - SERVICES=secretsmanager,s3
      - DATA_DIR=/opt/code/localstack/data
      - DOCKER_HOST=unix:///var/run/docker.sock
      - HOST_TMP_FOLDER=/tmp/localstack
    volumes:
      - "/root/data/localstack_data:/opt/code/localstack/data"
      - "/root/data/localstack:/tmp/localstack"
      - "/var/run/docker.sock:/var/run/docker.sock"

To work around this issue you need to change the multipart_threshold parameter in the .aws/config file to a value larger than the file you want to upload.
That means if the file is 21 MB, you need to set the parameter to at least 25 MB. Only then does the container work properly with big files, and the file doesn't disappear after a restart.

Example of .aws/config file

[profile us]
region = eu-west-1
output = json
s3 =
  multipart_threshold = 64MB

@tai2

tai2 commented Aug 2, 2021

I confirmed the same issue happened in my environment.
Here are the steps to reproduce.

# docker-compose.yml
version: "3"

services:
  localstack:
    image: localstack/localstack:0.12.16
    ports:
      - "4566:4566"
    environment:
      SERVICES: s3
      DATA_DIR: /tmp/localstack/data
    volumes:
      - "./tmp/localstack:/tmp/localstack"
# Run docker-compose up in another terminal

$ dd if=/dev/random of=7MB.img bs=1048576 count=7
7+0 records in
7+0 records out
7340032 bytes transferred in 0.026214 secs (280002961 bytes/sec)
$ dd if=/dev/random of=8MB.img bs=1048576 count=8
8+0 records in
8+0 records out
8388608 bytes transferred in 0.030799 secs (272364915 bytes/sec)
$ awslocal s3 cp 7MB.img s3://test-bucket/
upload: ./7MB.img to s3://test-bucket/7MB.img
$ awslocal s3 cp 8MB.img s3://test-bucket/
upload: ./8MB.img to s3://test-bucket/8MB.img
$ awslocal s3 ls s3://test-bucket
2021-08-02 10:35:09    7340032 7MB.img
2021-08-02 10:35:14    8388608 8MB.img

# Stop and start the s3 container

$ awslocal s3 ls s3://test-bucket
2021-08-02 10:36:22    7340032 7MB.img

It restores the 8MB.img properly if you set multipart_threshold to more than 8 MB. (The AWS CLI's default multipart threshold is 8 MB, which is why the 7 MB file survives the restart but the 8 MB file does not.)

@tai2

tai2 commented Aug 2, 2021

@benediktbrandt
Contributor

The root cause of this issue is in the constructor of the FakeMultipart object in the moto library, which is used by LocalStack. The problem is the call to os.urandom:

class FakeMultipart(BaseModel):
    def __init__(self, key_name, metadata):
        self.key_name = key_name
        self.metadata = metadata
        self.parts = {}
        self.partlist = []  # ordered list of part ID's
        rand_b64 = base64.b64encode(os.urandom(UPLOAD_ID_BYTES))
        self.id = (
            rand_b64.decode("utf-8").replace("=", "").replace("+", "").replace("/", "")
        )

The call to os.urandom ensures that a fresh random ID is generated every time a FakeMultipart upload is initiated.

A multipart upload is a series of API calls in which the later calls reference the ID returned by the first call. When LocalStack replays the recorded API calls, the first call gets a new ID (because of os.urandom), but the replayed remaining calls still reference the old ID. This is why multipart uploads aren't correctly restored.

One possible fix is to make the id generation pseudo random (based on the key). For instance one can do something like:

self.id = (
    base64.b64encode(
        key_name.ljust(UPLOAD_ID_BYTES, "A")[:UPLOAD_ID_BYTES].encode("utf-8")
    )
    .decode("utf-8")
    .replace("=", "")
    .replace("+", "")
    .replace("/", "")
)

I tried that a little while back and it worked fine for my use case. However, every multipart upload to the same key will get the same upload ID, which is not ideal (it deviates from how AWS behaves).

Ultimately the correct solution for this problem would be to modify the localstack replay code so that it supports the change in IDs.
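As a rough illustration of that direction, here is a hypothetical sketch (not LocalStack's actual replay code; the call-record shape is an assumption) of remapping stale upload IDs during replay: when the recorded initiate call is replayed and the backend mints a new ID, all later recorded calls are rewritten to use it before they are replayed.

```python
import re

def remap_upload_ids(recorded_calls):
    """Rewrite stale ?uploadId=... query params with IDs minted during replay.

    Each call is a dict with at least "path"; an initiate call additionally
    carries "old_upload_id" (from the recording) and "new_upload_id" (the ID
    the backend returned when the initiate call was replayed).
    """
    id_map = {}  # old uploadId -> new uploadId
    rewritten = []
    for call in recorded_calls:
        path = call["path"]
        if "new_upload_id" in call:
            # Initiate call: remember the freshly generated ID under the old one.
            id_map[call["old_upload_id"]] = call["new_upload_id"]
        match = re.search(r"uploadId=([^&]+)", path)
        if match and match.group(1) in id_map:
            path = path.replace(match.group(1), id_map[match.group(1)])
        rewritten.append({**call, "path": path})
    return rewritten
```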

@dominikschubert dominikschubert added upstream-issue aws:s3 Amazon Simple Storage Service status: triage needed Requires evaluation by maintainers type: bug Bug report area: persistence Retain state between LocalStack runs labels Oct 18, 2021
@knmueller

knmueller commented Oct 28, 2021

Hello. Is this issue (or the similar #2527) planned for a fix? Because of this KeyError, we are unable to use LocalStack for multipart upload testing with the Java AWS SDK, as any restart of the container requires re-uploading the files. I thought I could work around the KeyError by patching the FakeMultipart class as @benediktbrandt suggested in the last comment, and while this does fix the KeyError after a LocalStack restart, it doesn't actually restore the data. I see a message that the replay was successful, but the files it says were replayed do not exist (verified with the aws s3 ls <bucket> command). Example of the replay success log:

INFO:localstack.utils.persistence: Restored 5 API calls from persistent file: /tmp/localstack/data/recorded_api_calls.json

This solution would work fine for us if the file data were replayed correctly, since I have logic to change the key of an upload if a file with the same key already exists.

The patch I used, based on the suggestion from the previous comment, against 0.12.19.1; it prevented the KeyError:

[/usr/local/src/localstack] $ git diff
diff --git a/localstack/services/s3/s3_starter.py b/localstack/services/s3/s3_starter.py
index 25553840..bec5699c 100644
--- a/localstack/services/s3/s3_starter.py
+++ b/localstack/services/s3/s3_starter.py
@@ -1,3 +1,4 @@
+import base64
 import logging
 import os
 import traceback
@@ -7,6 +8,7 @@ from urllib.parse import urlparse
 from moto.s3 import models as s3_models
 from moto.s3 import responses as s3_responses
 from moto.s3.exceptions import S3ClientError
+from moto.s3.models import UPLOAD_ID_BYTES
 from moto.s3.responses import S3_ALL_MULTIPARTS, MalformedXML, is_delete_keys, minidom
 from moto.s3.utils import undo_clean_key_name
 from moto.s3bucket_path import utils as s3bucket_path_utils
@@ -441,3 +443,15 @@ def apply_patches():
         key._etag = None

     s3_models.s3_backend.copy_object = types.MethodType(copy_object, s3_models.s3_backend)
+
+    # patch the FakeMultipart.__init__ method for multipart IDs in moto
+    def FakeMultipart_init_replace(self, key_name, metadata):
+        fake_multipart_init_orig(
+            self, key_name, metadata
+        )
+        # Replace ID to work with multipart upload replays
+        keyname_b64 = base64.b64encode(str.encode(key_name.ljust(UPLOAD_ID_BYTES,"A")[:UPLOAD_ID_BYTES]))
+        self.id = keyname_b64.decode("utf-8").replace("=", "").replace("+", "").replace("/", "")
+
+    fake_multipart_init_orig = s3_models.FakeMultipart.__init__
+    s3_models.FakeMultipart.__init__ = FakeMultipart_init_replace
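For reference, the deterministic derivation used in the patch above can be checked in isolation with the following sketch. The UPLOAD_ID_BYTES value is hard-coded here under the assumption that it matches moto's constant in the versions discussed.

```python
import base64

UPLOAD_ID_BYTES = 43  # assumption: mirrors moto.s3.models.UPLOAD_ID_BYTES

def deterministic_upload_id(key_name: str) -> str:
    """Derive a stable upload ID from the key name, as the patch above does."""
    # Pad/truncate the key name to a fixed length before encoding.
    padded = key_name.ljust(UPLOAD_ID_BYTES, "A")[:UPLOAD_ID_BYTES]
    b64 = base64.b64encode(padded.encode("utf-8")).decode("utf-8")
    # Strip the same characters the original moto code strips.
    return b64.replace("=", "").replace("+", "").replace("/", "")
```

Because the same key always yields the same ID, replayed part-upload calls reference a multipart entry that actually exists after a restart, which is what avoids the KeyError.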

@giograno
Member

Hi @dolevhadad, does this problem still occur? With today's latest image it seems large files are persisted correctly. Would you please pull the latest Docker image and give it another try? Thanks for your patience 🙏

@giograno giograno added status: stale To be closed soon due to staleness and removed status: triage needed Requires evaluation by maintainers labels Aug 23, 2022
@thrau thrau removed their assignment Sep 3, 2022
@localstack-bot localstack-bot added status: resolved/stale Closed due to staleness and removed status: stale To be closed soon due to staleness labels Sep 6, 2022

9 participants