Download using wget #27

Boruch-Baum · 2019-03-13T07:21:19Z

See the commentary of the new file w3m-download.el, and mailing list posts [13304], [13305], [13306].

+ Use wget for downloads + Save temporary and partial downloads to disk, not memory + Track progress of downloads in individual log buffers, each deleted upon successful completion. + Resume partial downloads that were interrupted due to aborts or other failures. + Optionally, add meta-data tag Exif.Image.ImageDescription to png and jpeg files.

yamaoka · 2019-03-13T08:09:53Z

@Boruch-Baum
As for the new w3m-download function, the Lisp form is quite strange, i.e., the thing is all done within (interactive ...).　　Are the arguments, url, filename, no-cache, etc., really passed to w3m-download-using-* ?

Boruch-Baum · 2019-03-13T08:34:52Z

@yamaoka : No, that's a cosmetic indentation bug. The code evaluates the if after the interactive.

…b_download

Boruch-Baum · 2019-03-13T09:07:34Z

@yamaoka : A few things just occurred to me now about this PR:

w3mhack.el would need to be modified to recognize file w3m-download.el and its contents?
w3m-download.el should (provide 'w3m-download)? And w3m.el should somewhere (require 'w3m-download)?
With a new file would maybe come changes to the make recipe?

+ The way external-view seems to be currently handled is that it is different from download, in that it is a temporary action, in the sense that after the downloaded file is sent to the viewer program, it is deleted from disk. The user can save the file from the external viewer, if that external program has that option. + external-view probably doesn't need a progress buffer, although it technically wouldn't hurt. + It does need to wait for the download to complete in order to proceed to execute the view command for the downloaded file. This could potentially be added to the wget code, using the 'handler' argument and the existing 'w3m-process' architecture.

yamaoka · 2019-03-14T01:00:51Z

Sorry for my mistake above.　　This is not closed yet.
I tried the latest one and confirmed it works.　　But I added some tweaks (a diff attached).　　A problem, that I couldn't resolve so far, is downloading often ends with "exited abnormally with code 1".

My changes:

* Remove lexical-binding cookie because lexical-let is used here and there.
* In an environment of other than yours, `w3m--message' will not be provided
  at the run time, so replace it with a compiler macro.
* Use `expand-file-name' instead of `concat' `in w3m-download-using-wget'
  to generate the destination name because `w3m-default-save-directory'
  might not end with "/".

* Perform `indent-region' on the whole contents in the condition where
  cl is loaded (for `lexical-let') and `indent-tabs-mode' is set.
* Fold long lines so as to be filled in within 80-column.

DIFF.gz

Boruch-Baum · 2019-03-14T01:51:31Z

@yamaoka

But I added some tweaks (a diff attached).

There has got to be a better way, a more git or github way, to collaborate instead of using attached diffs. In the short term, could you post a diff that excludes the whitespace changes (diff options -w -b -B)?

A problem, that I couldn't resolve so far, is downloading often ends with "exited abnormally with code 1".

This is an issue in how wget behaves; it often succeeds, but reports an exit code 1. My "solution", as you probably noticed, was to leave the progress buffer open, but with a note that the download probably succeeded. I didn't like the alternative of treating an exit code 1 as a success, but if that's what the project wants, it would be simple to do. Doing so would eliminate confusion and buffer clutter most of the time, but might occasionally result in a false positive.

My changes:

Remove lexical-binding cookie because lexical-let is used here and there.

Ahhh. I missed that some of the functions that I moved into the file w3m-download had that feature. Had I noticed them, I would have removed them. How about removing the lexical-let calls instead? Doesn't the project (and emacs in general) want to become totally lexically-scoped? This could be an opportunity to slowly continue that process, if you agree.

In an environment of other than yours, `w3m--message' will not be provided
at the run time, so replace it with a compiler macro.

Oh, I thought that the conditional defun would be sufficient. Why is a defmacro required.

Use expand-file-name' instead of concat' in w3m-download-using-wget' to generate the destination name because w3m-default-save-directory'
might not end with "/".

OK.

Perform indent-region' on the whole contents in the condition where cl is loaded (for lexical-let') and `indent-tabs-mode' is set.

I don't understand, but it might be obvious if could send me a diff excluding all the whitespace changes. If we had a more git or github collaboration method, it would probably be simple for me to perform the diff on your commit, but with the current DIFF.gz I'm at a loss.

Fold long lines so as to be filled in within 80-column.

Always good.

I'll post a support ticket to github asking whether there's a better way than using attached diffs.

Boruch-Baum · 2019-03-14T01:59:46Z

Before I post a support request, there is one thing that I see might be helpful in other cases. At the top of this web page is a tab bar with tabs titled " Conversation", "Commits", "Checks", "Files changed". Under "Files changed", you can manually edit the files and if you block on a line, you can insert comments or start a discussion about the line. So that's a partial alternative.

yamaoka · 2019-03-14T06:21:18Z

I think I resolved the "exited abnormally with code 1" issue.　　The cause of it is that the first argument passed to wget will be a zero length string if it is the first time to download a remote file.　　I.e.,

wget "" -O destination source

Because of this "", wget complaints it as "http://: invalid host name" and counts it as an error.
My solution:

--- w3m-download.el~	2019-03-14 06:15:13.849277800 +0000
+++ w3m-download.el	2019-03-14 06:16:42.254984000 +0000
@@ -306,7 +306,7 @@
        (setq buffer-read-only t)
        (setq w3m--download-local-proc
          (apply 'start-process "w3m-download" buf
-           (list "wget" (if resume "-c" "") "-O" save-path url)))
+           (delq nil (list "wget" (if resume "-c") "-O" save-path url))))
        (set-process-filter w3m--download-local-proc 'w3m--download-process-filter)
        (setq w3m--download-metadata-operation metadata)
        (push (cons w3m--download-local-proc buf) w3m--download-processes-list)

Boruch-Baum · 2019-03-14T11:57:36Z

@yamaoka : That looks great. Thanks for the catch! It's probably a good idea to retain the current user message, since the 'exit code 1' is defined as generic, so it could pop up again for other issues.

+ Katsumi-san found that I was unintentionally sending an empty string arg to wget.

Boruch-Baum · 2019-03-14T15:30:40Z

Possible bug-fix pending for the messaging compatibility, lines ~107-110:

(eval-when-compile
  (when (not (fboundp 'w3m--message))
    (defun w3m--message (timeout face message &rest args)
      (w3m-message message args))))

Note though that those lines don't need to be part of the merge at all. It's there for compatibility with another pull request I have pending (#14). If the project isn't ever going to accept that other PR, there's no sense in it, and I can just replace the six calls to w3m--message with w3m-message. If #14 is accepted, then the snippet also becomes unnecessary.

yamaoka · 2019-03-14T23:15:24Z

Off topic:
The ones surrounded with (eval-when-compile ...) will not be coded in the elc file, in other words, the function definition in it will be populated only when byte-compiling.　　So, those who don't have a version that provides the function will get lost.

Boruch-Baum · 2019-03-14T23:32:44Z

Off topic:
The ones surrounded with (eval-when-compile ...) will not be coded in the elc file, in other words, the function definition in it will be populated only when byte-compiling.　　So, those who don't have a version that provides the function will get lost.

Ah. If I'm understanding this correctly then, what would be ok would the same clause, just without the eval-when-compile? A fix is needed anyway, because my original version wasn't handling the arguments correctly.

yamaoka · 2019-03-14T23:36:27Z

In addition,
I am not familiar with the world of lexical-binding=t.　　Probably so would be the other emacs-w3m developers.　　As for me, it is hard to run edebug in the lexical world, so I am afraid if it makes the code black box.

Boruch-Baum · 2019-03-15T00:36:08Z

In addition,
I am not familiar with the world of lexical-binding=t.　　Probably so would be the other emacs-w3m developers.　　As for me, it is hard to run edebug in the lexical world, so I am afraid if it makes the code black box.

That seems to have been fixed. What I mean is that I just looked it up and found that the current on-line documentation says in footnote 1 that "a lexical binding can also be accessed from the Lisp debugger". Do you have a simple test?

+ Partial downloads are tagged with extension .PART + Don't accumulate download progress messages in kill-ring + Fix definition spec of functions w3m-message and w3m--message, and standardize use of w3m--message + Truncate line to <80 columns + Decode %xx codes in file basenames + Use function expand-file-name + Add function: w3m-download-delete-all-download-buffers and alias w3m-download-kill-all-wget-processes + remove lexical-let (entire file is lexical-binding: t) + Remove cruft: unused local variables + check for valid w3m-cache-buffer + add 'require's for w3m

+ New feature: Function w3m-download-select bulk downloads links from a buffer or region by a regex and/or selection table. Popular regex'es are maintained in w3m-download-select-filter-list. + Downloads accumulate in a queue and are performed subject to dynamically adjustable w3m-download-max-simultaneous. + Option to use wget's '--xattr' feature, which saves URL and referer HTTP header to a file extended-attributes (only available if the target filesystem supports extended-attributes, doh). + Major documentation and layout changes.

…b_download

+ disable undo in download buffer + increase scope of mutex + allow all queues to be re-arranged + allow re-arrangement by prefix-arg positions

+ eg. "downloadhandler.ashx"

+ search items of ambiguous filenames can be regexes, not just string literals + handler functions can accept any number of args + the default handler function accepts and additional arg: the filename extension to be used.

+ The type definition did not match the default data structure

+ w3m-download-select-deep-search: Feature to search for links embedded in <script> and other places in the HTML source. Also, now check for links appearing as plain text in HTML documents.

+ Improve operation of kill-ring-save + delete trailing whitespace + improve handling of default refresh interval when changing the default + don't log keybinding help to *Messages* buffer

1) Expects the 'face property to be a value of type list, and on that basis pops off the symbol and later creates a list for it. However, 'face could be a simple symbol. 2) Performs its comparison test using 'eq instead of 'equal, which wouldn't catch cases such as '(:weight bold). 3) Doesn't perform `remove-text-properties', so properties not caught by the 'eq test accumulate. 4) Accepts an OBJECT argument, and uses it to retrieve properties, but not to restore the properties when it calls w3m-add-text-properties.

+ Not just cloudflare, but also similars such as: infuria.io, and ipfs.io. They set the filename to a useless hash, and hide the filename as a url query?filename=

+ (eg '.*$')

+ remove url from split-string operation + convert spaces to %20 in url argument.

+ This is a case that shouldn't happen, but I see it happening in practice with cloudflare downloads. They use a url query "filename=" and the filename can have embedded ampersands!

Boruch-Baum added 5 commits March 12, 2019 00:18

w3m-download-using-wget: use cache when available

15510a3

Integrate the wget downloader functions into the mainline code

1eddf4a

Merge branch 'master' into bb_download

928b622

Merge branch 'bb-upstream' into bb_download

45df71f

Boruch-Baum added 2 commits March 13, 2019 04:36

w3m-download: fix deceptive indentation error

5a683bb

Merge branch 'bb_download' of github.com:boruch-baum/emacs-w3m into b…

4e49609

…b_download

Boruch-Baum added 2 commits March 13, 2019 08:14

w3m-download: add requires and provides

aba4172

yamaoka closed this Mar 14, 2019

yamaoka reopened this Mar 14, 2019

BUGFIX: 'exit code 1' was due to argparse error

9c10827

+ Katsumi-san found that I was unintentionally sending an empty string arg to wget.

Boruch-Baum force-pushed the bb_download branch 2 times, most recently from f7467f8 to eb641a2 Compare March 18, 2019 03:58

Boruch-Baum force-pushed the bb_download branch from eb641a2 to 213c35d Compare March 19, 2019 14:56

Boruch-Baum added 30 commits December 18, 2019 13:56

Merge branch 'bb_download' of github.com:Boruch-Baum/emacs-w3m into b…

cdc4c74

…b_download

w3m-download: enable reordering for all queues, and n positions

7a74efa

+ disable undo in download buffer + increase scope of mutex + allow all queues to be re-arranged + allow re-arrangement by prefix-arg positions

w3m-download: canonicalize urls befire use

5bbe2a2

w3m-download: handler for repetitive filenames

aff2d9f

+ eg. "downloadhandler.ashx"

w3m-download-ambiguous-basename-alist: fix M-x customize-variable

ea38aa1

+ The type definition did not match the default data structure

w3m-download-select: find hidden links, update documentation

3d064aa

+ w3m-download-select-deep-search: Feature to search for links embedded in <script> and other places in the HTML source. Also, now check for links appearing as plain text in HTML documents.

w3m-download-select: Improve regexp for deep search option

f0e4c0a

w3m-download-queue: improve keybinding hints, bind '?" to describe mode

ceb6a89

w3m-download Fix regexes, response to blank entry, text search

b192eb3

w3m-download-select: Remove dupes, single-quotes. Use thing-at-point

ab3f0c6

w3m-download: Add keybinding help, bound to '?'

b8f5eca

w3m-download: better copy-region, messages, default refresh interval

4c51c07

+ Improve operation of kill-ring-save + delete trailing whitespace + improve handling of default refresh interval when changing the default + don't log keybinding help to *Messages* buffer

Update for new hash table handling

af8f1d1

w3m-download: pkg key-assist for help; prettify buffer; display empty

69dadac

w3m-download: use key-assist for mode-help

2f3696c

w3m-download: Replace statistics line with static header line.

102b804

w3m-download: New feature: toggle line wrap

66cd795

New funciton: w3m-download-help; fix header-line for session buffers

4dd42b3

w3m-download-delete-line: Use prefix arg to delete multiple lines

a6fdf01

w3m-download: start search at beginning of buffer

452ca39

Handle cloudflare download filenames

47d50b0

+ Not just cloudflare, but also similars such as: infuria.io, and ipfs.io. They set the filename to a useless hash, and hide the filename as a url query?filename=

w3m-download-select: bugfix for certain regexes

7db4141

+ (eg '.*$')

w3m-download: Improve handling of SPC in urls

abb0140

+ remove url from split-string operation + convert spaces to %20 in url argument.

w3m--url-get-queries: Handle weird case of '&' embedded in query

f1b2192

+ This is a case that shouldn't happen, but I see it happening in practice with cloudflare downloads. They use a url query "filename=" and the filename can have embedded ampersands!

Remove artifact from stash merge

6b1173c

w3m-download: new feature: w3m-download-retry

9fae7b6

Questionable commit: tabs v spaces

f64f722

w3m-download: youtu.be support

6f9a7fc

Download using wget #27

Are you sure you want to change the base?

Download using wget #27

Uh oh!

Conversation

Boruch-Baum commented Mar 13, 2019

Uh oh!

yamaoka commented Mar 13, 2019

Uh oh!

Boruch-Baum commented Mar 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boruch-Baum commented Mar 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yamaoka commented Mar 14, 2019

Uh oh!

Boruch-Baum commented Mar 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boruch-Baum commented Mar 14, 2019

Uh oh!

yamaoka commented Mar 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boruch-Baum commented Mar 14, 2019

Uh oh!

Boruch-Baum commented Mar 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yamaoka commented Mar 14, 2019

Uh oh!

Boruch-Baum commented Mar 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yamaoka commented Mar 14, 2019

Uh oh!

Boruch-Baum commented Mar 15, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Boruch-Baum commented Mar 13, 2019 •

edited

Loading

Boruch-Baum commented Mar 13, 2019 •

edited

Loading

Boruch-Baum commented Mar 14, 2019 •

edited

Loading

yamaoka commented Mar 14, 2019 •

edited

Loading

Boruch-Baum commented Mar 14, 2019 •

edited

Loading

Boruch-Baum commented Mar 14, 2019 •

edited

Loading