अवलोकन
ठीक अब तुम शुरु करो। यह एक स्क्रिप्ट के रूप में प्रोग्रामेटिक सॉल्यूशन है:
#!/bin/bash
# NAME: pdflinkextractor
# AUTHOR: Glutanimate (http://askubuntu.com/users/81372/), 2013
# LICENSE: GNU GPL v2
# DEPENDENCIES: wget lynx
# DESCRIPTION: extracts PDF links from websites and dumps them to the stdout and as a textfile
# only works for links pointing to files with the ".pdf" extension
#
# USAGE: pdflinkextractor "www.website.com"
WEBSITE="$1"
echo "Getting link list..."
lynx -cache=0 -dump -listonly "$WEBSITE" | grep ".*\.pdf$" | awk '{print $2}' | tee pdflinks.txt
# OPTIONAL
#
# DOWNLOAD PDF FILES
#
#echo "Downloading..."
#wget -P pdflinkextractor_files/ -i pdflinks.txt
स्थापना
आपको स्थापित wget
और lynx
स्थापित करने की आवश्यकता होगी :
sudo apt-get install wget lynx
प्रयोग
स्क्रिप्ट को .pdf
वेबसाइट पर सभी फाइलों की एक सूची मिलेगी और इसे कमांड लाइन आउटपुट और वर्किंग डायरेक्टरी में टेक्स्टफाइल में डंप कर देगा। यदि आप "वैकल्पिक" wget
कमांड लिखते हैं, तो स्क्रिप्ट सभी फाइलों को एक नई निर्देशिका में डाउनलोड करने के लिए आगे बढ़ेगी।
उदाहरण
$ ./pdflinkextractor http://www.pdfscripting.com/public/Free-Sample-PDF-Files-with-scripts.cfm
Getting link list...
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/JSPopupCalendar.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/ModifySubmit_Example.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/DynamicEmail_XFAForm_V2.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcquireMenuItemNames.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/BouncingButton.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/JavaScriptClock.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/Matrix2DOperations.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/RobotArm_3Ddemo2.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/SimpleFormCalculations.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/TheFlyv3_EN4Rdr.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/ImExportAttachSample.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcroForm_BasicToggle.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcroForm_ToggleButton_Sample.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/AcorXFA_BasicToggle.pdf
http://www.pdfscripting.com/public/FreeStuff/PDFSamples/ConditionalCalcScripts.pdf
Downloading...
--2013-12-24 13:31:25-- http://www.pdfscripting.com/public/FreeStuff/PDFSamples/JSPopupCalendar.pdf
Resolving www.pdfscripting.com (www.pdfscripting.com)... 74.200.211.194
Connecting to www.pdfscripting.com (www.pdfscripting.com)|74.200.211.194|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 176008 (172K) [application/pdf]
Saving to: `/Downloads/pdflinkextractor_files/JSPopupCalendar.pdf'
100%[===========================================================================================================================================================================>] 176.008 120K/s in 1,4s
2013-12-24 13:31:29 (120 KB/s) - `/Downloads/pdflinkextractor_files/JSPopupCalendar.pdf' saved [176008/176008]
...