Pipe a stream to s3.upload()


95

I'm currently using a Node.js plugin called s3-upload-stream to stream very large files to Amazon S3. It uses the multipart API and for the most part it works very well.

However, this module is showing its age and I've already had to make modifications to it (the author has deprecated it as well). Today I ran into another issue with Amazon, and I'd really like to take the author's recommendation and start using the official aws-sdk to accomplish my uploads.

BUT.

The official SDK does not seem to support piping to s3.upload(). The nature of s3.upload is that you have to pass the readable stream as an argument to the S3 constructor.

I have roughly 120+ user code modules that do various file processing, and they are agnostic to the final destination of their output. The engine hands them a pipeable writable output stream, and they pipe to it. I cannot hand them an AWS.S3 object and ask them to call upload() on it without adding code to all the modules. The reason I used s3-upload-stream was because it supported piping.

Is there a way to make aws-sdk s3.upload() something I can pipe the stream to?

Answers:


137

Wrap the S3 upload() function with the Node.js stream.PassThrough() stream.

यहाँ एक उदाहरण है:

var stream = require('stream');

inputStream
  .pipe(uploadFromStream(s3));

function uploadFromStream(s3) {
  // Whatever is piped into `pass` becomes the object body read by s3.upload()
  var pass = new stream.PassThrough();

  var params = {Bucket: BUCKET, Key: KEY, Body: pass};
  s3.upload(params, function(err, data) {
    console.log(err, data);
  });

  return pass;
}

2
Great, this solved my very ugly hack =-) Can you explain what the stream.PassThrough() actually does?
mraxus

6
Does your PassThrough stream close when you do this? I'm having a heck of a time propagating the close in s3.upload to hit my PassThrough stream.
four43

7
The size of the uploaded file is 0 bytes. If I pipe the same data from the source stream to the file system, it all works fine. Any idea?
Radar155

3
A passthrough stream will take bytes written to it and output them. This lets you return a writable stream that aws-sdk will read from as you write to it. I'd also return the response object from s3.upload() because otherwise you can't ensure the upload completes.
reconbot

1
Isn't this just the same as passing the readable stream to Body but with more code? The AWS SDK is still going to call read() on the PassThrough stream so there's no true piping all the way to S3. The only difference is there's an extra stream in the middle.
ShadowChaser

96

A bit of a late answer, but it might hopefully help someone else. You can return both the writable stream and the promise, so you can get the response data when the upload finishes.

const AWS = require('aws-sdk');
const stream = require('stream');

const uploadStream = ({ Bucket, Key }) => {
  const s3 = new AWS.S3();
  const pass = new stream.PassThrough();
  return {
    writeStream: pass,
    promise: s3.upload({ Bucket, Key, Body: pass }).promise(),
  };
}

And you can use the function as follows:

const { writeStream, promise } = uploadStream({Bucket: 'yourbucket', Key: 'yourfile.mp4'});
const readStream = fs.createReadStream('/path/to/yourfile.mp4');

const pipeline = readStream.pipe(writeStream);

Now you can either check promise:

promise.then(() => {
  console.log('upload completed successfully');
}).catch((err) => {
  console.log('upload failed.', err.message);
});

Or, since stream.pipe() returns the destination stream (the writeStream variable above), which allows chaining pipes, we can also use its events:

 pipeline.on('close', () => {
   console.log('upload successful');
 });
 pipeline.on('error', (err) => {
   console.log('upload failed', err.message)
 });

It looks great, but on my side I am getting this error stackoverflow.com/questions/62330721/…
Arco Voltaico

Just replied to your question. Hope it helps.
Ahmet Cetin

49

In the accepted answer, the function ends before the upload is complete, and thus it's incorrect. The code below pipes correctly from a readable stream.

Upload reference

async function uploadReadableStream(stream) {
  const params = {Bucket: bucket, Key: key, Body: stream};
  return s3.upload(params).promise();
}

async function upload() {
  const readable = getSomeReadableStream();
  const results = await uploadReadableStream(readable);
  console.log('upload complete', results);
}

You can also go a step further and output progress info using ManagedUpload as such:

const manager = s3.upload(params);
manager.on('httpUploadProgress', (progress) => {
  console.log('progress', progress) // { loaded: 4915, total: 192915, part: 1, key: 'foo.jpg' }
});

ManagedUpload reference

A list of available events


1
aws-sdk now offers promises built in as of 2.3.0+, so you don't have to hand-roll them anymore. s3.upload(params).promise().then(data => data).catch(error => error);
DBrown

1
@DBrown Thanks for the pointer! I've updated the answer, accordingly.
tsuz

1
@tsuz, trying to implement your solution gives me an error: TypeError: dest.on is not a function. Any idea why?
FireBrand

What is dest.on? Can you show an example? @FireBrand
tsuz

9
This says the accepted answer is incomplete, but it doesn't work with piping to s3.upload, as indicated in @Womp's updated post. It would be very helpful if this answer were updated to take the piped output of something else!
MattW

6

None of the answers worked for me because I wanted to:

  • Pipe into s3.upload()
  • Pipe the result of s3.upload() into another stream

The accepted answer doesn't do the latter. The others rely on the promise API, which is cumbersome when working with stream pipes.

This is my modification of the accepted answer.

const s3 = new S3();

function writeToS3({Key, Bucket}) {
  const Body = new stream.PassThrough();

  s3.upload({
    Body,
    Key,
    Bucket
  })
    .on('httpUploadProgress', progress => {
      console.log('progress', progress);
    })
    .send((err, data) => {
      if (err) {
        Body.destroy(err);
      } else {
        console.log(`File uploaded and available at ${data.Location}`);
        Body.destroy();
      }
    });

  return Body;
}

const pipeline = myReadableStream.pipe(writeToS3({Key, Bucket}));

pipeline.on('close', () => {
  // upload finished, do something else
})
pipeline.on('error', () => {
  // upload wasn't successful. Handle it
})


It looks great, but on my side I am getting this error stackoverflow.com/questions/62330721/…
Arco Voltaico

5

TypeScript solution:
This example uses:

import * as AWS from "aws-sdk";
import * as fsExtra from "fs-extra";
import * as zlib from "zlib";
import * as stream from "stream";

And async function:

public async saveFile(filePath: string, s3Bucket: AWS.S3, key: string, bucketName: string): Promise<boolean> {
  const uploadStream = (S3: AWS.S3, Bucket: string, Key: string) => {
    const passT = new stream.PassThrough();
    return {
      writeStream: passT,
      promise: S3.upload({ Bucket, Key, Body: passT }).promise(),
    };
  };
  const { writeStream, promise } = uploadStream(s3Bucket, bucketName, key);
  // NOTE: you can compress to gzip by .pipe(zlib.createGzip()).pipe(writeStream)
  fsExtra.createReadStream(filePath).pipe(writeStream);
  let output = true;
  await promise.catch((reason) => { output = false; console.log(reason); });
  return output;
}

Call this method somewhere like:

let result = await saveFile(testFilePath, someS3Bucket, someKey, someBucketName);

4

The thing to note in the accepted answer above is that you need to return the pass stream from the function if you are piping into it, like:

fs.createReadStream(<filePath>).pipe(anyUploadFunction())

function anyUploadFunction() {
  let pass = new stream.PassThrough();
  // call s3.upload({ ..., Body: pass }) here
  return pass; // <- returning this pass is important: pipe() needs a stream to write to
}

Otherwise it will silently move on to the next step without throwing an error, or it will throw TypeError: dest.on is not a function, depending on how you have written the function.


3

In case it helps anyone, I was able to stream from the client to S3 successfully:

https://gist.github.com/mattlockyer/532291b6194f6d9ca40cb82564db9d2a

The serverside code assumes req is a stream object, in my case it was sent from the client with file info set in the headers.

const fileUploadStream = (req, res) => {
  //get "body" args from header
  const { id, fn } = JSON.parse(req.get('body'));
  const Key = id + '/' + fn; //upload to s3 folder "id" with filename === fn
  const params = {
    Key,
    Bucket: bucketName, //set somewhere
    Body: req, //req is a stream
  };
  s3.upload(params, (err, data) => {
    if (err) {
      res.send('Error Uploading Data: ' + JSON.stringify(err) + '\n' + JSON.stringify(err.stack));
    } else {
      res.send(Key);
    }
  });
};

Yes, it breaks convention, but if you look at the gist it's much cleaner than anything else I found using multer, busboy, etc.

+1 for pragmatism and thanks to @SalehenRahman for his help.


multer and busboy handle multipart/form-data uploads. req as a stream works when the client sends a buffer as the body from XMLHttpRequest.
André Werlang

To clarify, the upload is being performed from the back end not the client right?
numX

Yes, it's "piping" the stream on the backend, but it came from a frontend.
mattdlockyer

3

For those complaining that when they use the S3 API upload function a zero-byte file ends up on S3 (@Radar155 and @gabo): I also had this problem.

Create a second PassThrough stream, pipe all data from the first to the second, and pass the reference to that second stream to S3. You can do this in a couple of different ways; possibly a dirty way is to listen for the "data" event on the first stream and then write that same data to the second stream, and similarly for the "end" event: just call the end function on the second stream. I've no idea whether this is a bug in the AWS API, the version of Node, or some other issue, but it worked around the problem for me.

Here is how it might look:

var PassThroughStream = require('stream').PassThrough;
var srcStream = new PassThroughStream();

var rstream = fs.createReadStream('Learning/stocktest.json');
var sameStream = rstream.pipe(srcStream);
// interesting note: (srcStream == sameStream) at this point
var destStream = new PassThroughStream();
// call your s3.upload function here - passing in destStream as the Body parameter
srcStream.on('data', function (chunk) {
    destStream.write(chunk);
});

srcStream.on('end', function () {
    destStream.end();
});

This actually worked for me as well. The S3 upload function would just "die" silently whenever a multipart upload was used, but with your solution it worked fine (!). Thanks! :)
jhdrn

Can you give some info on why the second stream is needed?
noob7

2

Following the other answers, and using the latest AWS SDK for Node.js, there's a much cleaner and simpler solution, since the s3 upload() function accepts a stream, using await syntax and S3's promise:

var model = await s3Client.upload({
    Bucket : bucket,
    Key : key,
    ContentType : yourContentType,
    Body : fs.createReadStream('path-to-file')
}).promise();

This works for the specific use-case of "reading a very large file" the author mentioned, but the other answers are still valid if you are using streams outside the context of a file (for example, trying to write a Mongo cursor stream to S3, where you still need to use a PassThrough stream plus pipe).
Ken Colton

0

I'm using KnexJS and had a problem using their streaming API. I finally fixed it; hopefully the following will help someone.

const knexStream = knex.select('*').from('my_table').stream();
const passThroughStream = new stream.PassThrough();

knexStream.on('data', (chunk) => passThroughStream.write(JSON.stringify(chunk) + '\n'));
knexStream.on('end', () => passThroughStream.end());

const uploadResult = await s3
  .upload({
    Bucket: 'my-bucket',
    Key: 'stream-test.txt',
    Body: passThroughStream
  })
  .promise();

-3

If you know the size of the stream you can use minio-js to upload the stream like this:

s3Client.putObject('my-bucketname', 'my-objectname.ogg', stream, size, 'audio/ogg', function(e) {
  if (e) {
    return console.log(e)
  }
  console.log("Successfully uploaded the stream")
})
Licensed under cc by-sa 3.0 with attribution required.