Skip to content

Conversation

@Boruch-Baum
Copy link
Contributor

See the commentary of the new file w3m-download.el, and mailing list posts [13304], [13305], [13306].

+ Use wget for downloads

+ Save temporary and partial downloads to disk, not memory

+ Track progress of downloads in individual log buffers, each deleted
  upon successful completion.

+ Resume partial downloads that were interrupted due to aborts or
  other failures.

+ Optionally, add meta-data tag Exif.Image.ImageDescription to png and
  jpeg files.
@yamaoka
Copy link
Contributor

yamaoka commented Mar 13, 2019

@Boruch-Baum
As for the new w3m-download function, the Lisp form is quite strange, i.e., the thing is all done within (interactive ...).  Are the arguments, url, filename, no-cache, etc., really passed to w3m-download-using-* ?

@Boruch-Baum
Copy link
Contributor Author

Boruch-Baum commented Mar 13, 2019

@yamaoka : No, that's a cosmetic indentation bug. The code evaluates the if after the interactive.

@Boruch-Baum
Copy link
Contributor Author

Boruch-Baum commented Mar 13, 2019

@yamaoka : A few things just occurred to me now about this PR:

  1. w3mhack.el would need to be modified to recognize file w3m-download.el and its contents?

  2. w3m-download.el should (provide 'w3m-download)? And w3m.el should somewhere (require 'w3m-download)?

  3. With a new file would maybe come changes to the make recipe?

+ The way external-view seems to be currently handled is that it is
  different from download, in that it is a temporary action, in the
  sense that after the downloaded file is sent to the viewer program,
  it is deleted from disk. The user can save the file from the
  external viewer, if that external program has that option.

+ external-view probably doesn't need a progress buffer, although it
  technically wouldn't hurt.

+ It does need to wait for the download to complete in order to
  proceed to execute the view command for the downloaded file. This
  could potentially be added to the wget code, using the 'handler'
  argument and the existing 'w3m-process' architecture.
@yamaoka yamaoka closed this Mar 14, 2019
@yamaoka
Copy link
Contributor

yamaoka commented Mar 14, 2019

Sorry for my mistake above.  This is not closed yet.
I tried the latest one and confirmed it works.  But I added some tweaks (a diff attached).  A problem, that I couldn't resolve so far, is downloading often ends with "exited abnormally with code 1".

My changes:

* Remove lexical-binding cookie because lexical-let is used here and there.
* In an environment of other than yours, `w3m--message' will not be provided
  at the run time, so replace it with a compiler macro.
* Use `expand-file-name' instead of `concat' `in w3m-download-using-wget'
  to generate the destination name because `w3m-default-save-directory'
  might not end with "/".

* Perform `indent-region' on the whole contents in the condition where
  cl is loaded (for `lexical-let') and `indent-tabs-mode' is set.
* Fold long lines so as to be filled in within 80-column.

DIFF.gz

@yamaoka yamaoka reopened this Mar 14, 2019
@Boruch-Baum
Copy link
Contributor Author

Boruch-Baum commented Mar 14, 2019

@yamaoka

But I added some tweaks (a diff attached).

There has got to be a better way, a more git or github way, to collaborate instead of using attached diffs. In the short term, could you post a diff that excludes the whitespace changes (diff options -w -b -B)?

A problem, that I couldn't resolve so far, is downloading often ends with "exited abnormally with code 1".

This is an issue in how wget behaves; it often succeeds, but reports an exit code 1. My "solution", as you probably noticed, was to leave the progress buffer open, but with a note that the download probably succeeded. I didn't like the alternative of treating an exit code 1 as a success, but if that's what the project wants, it would be simple to do. Doing so would eliminate confusion and buffer clutter most of the time, but might occasionally result in a false positive.

My changes:

  • Remove lexical-binding cookie because lexical-let is used here and there.

Ahhh. I missed that some of the functions that I moved into the file w3m-download had that feature. Had I noticed them, I would have removed them. How about removing the lexical-let calls instead? Doesn't the project (and emacs in general) want to become totally lexically-scoped? This could be an opportunity to slowly continue that process, if you agree.

  • In an environment of other than yours, `w3m--message' will not be provided
    at the run time, so replace it with a compiler macro.

Oh, I thought that the conditional defun would be sufficient. Why is a defmacro required.

  • Use expand-file-name' instead of concat' in w3m-download-using-wget' to generate the destination name because w3m-default-save-directory'
    might not end with "/".

OK.

  • Perform indent-region' on the whole contents in the condition where cl is loaded (for lexical-let') and `indent-tabs-mode' is set.

I don't understand, but it might be obvious if could send me a diff excluding all the whitespace changes. If we had a more git or github collaboration method, it would probably be simple for me to perform the diff on your commit, but with the current DIFF.gz I'm at a loss.

  • Fold long lines so as to be filled in within 80-column.

Always good.

I'll post a support ticket to github asking whether there's a better way than using attached diffs.

@Boruch-Baum
Copy link
Contributor Author

Before I post a support request, there is one thing that I see might be helpful in other cases. At the top of this web page is a tab bar with tabs titled " Conversation", "Commits", "Checks", "Files changed". Under "Files changed", you can manually edit the files and if you block on a line, you can insert comments or start a discussion about the line. So that's a partial alternative.

@yamaoka
Copy link
Contributor

yamaoka commented Mar 14, 2019

I think I resolved the "exited abnormally with code 1" issue.  The cause of it is that the first argument passed to wget will be a zero length string if it is the first time to download a remote file.  I.e.,

wget "" -O destination source

Because of this "", wget complaints it as "http://: invalid host name" and counts it as an error.
My solution:

--- w3m-download.el~	2019-03-14 06:15:13.849277800 +0000
+++ w3m-download.el	2019-03-14 06:16:42.254984000 +0000
@@ -306,7 +306,7 @@
        (setq buffer-read-only t)
        (setq w3m--download-local-proc
          (apply 'start-process "w3m-download" buf
-           (list "wget" (if resume "-c" "") "-O" save-path url)))
+           (delq nil (list "wget" (if resume "-c") "-O" save-path url))))
        (set-process-filter w3m--download-local-proc 'w3m--download-process-filter)
        (setq w3m--download-metadata-operation metadata)
        (push (cons w3m--download-local-proc buf) w3m--download-processes-list)

@Boruch-Baum
Copy link
Contributor Author

@yamaoka : That looks great. Thanks for the catch! It's probably a good idea to retain the current user message, since the 'exit code 1' is defined as generic, so it could pop up again for other issues.

+ Katsumi-san found that I was unintentionally sending an empty string
  arg to wget.
@Boruch-Baum
Copy link
Contributor Author

Boruch-Baum commented Mar 14, 2019

Possible bug-fix pending for the messaging compatibility, lines ~107-110:

(eval-when-compile
  (when (not (fboundp 'w3m--message))
    (defun w3m--message (timeout face message &rest args)
      (w3m-message message args))))

Note though that those lines don't need to be part of the merge at all. It's there for compatibility with another pull request I have pending (#14). If the project isn't ever going to accept that other PR, there's no sense in it, and I can just replace the six calls to w3m--message with w3m-message. If #14 is accepted, then the snippet also becomes unnecessary.

@yamaoka
Copy link
Contributor

yamaoka commented Mar 14, 2019

Off topic:
The ones surrounded with (eval-when-compile ...) will not be coded in the elc file, in other words, the function definition in it will be populated only when byte-compiling.  So, those who don't have a version that provides the function will get lost.

@Boruch-Baum
Copy link
Contributor Author

Boruch-Baum commented Mar 14, 2019

Off topic:
The ones surrounded with (eval-when-compile ...) will not be coded in the elc file, in other words, the function definition in it will be populated only when byte-compiling.  So, those who don't have a version that provides the function will get lost.

Ah. If I'm understanding this correctly then, what would be ok would the same clause, just without the eval-when-compile? A fix is needed anyway, because my original version wasn't handling the arguments correctly.

@yamaoka
Copy link
Contributor

yamaoka commented Mar 14, 2019

In addition,
I am not familiar with the world of lexical-binding=t.  Probably so would be the other emacs-w3m developers.  As for me, it is hard to run edebug in the lexical world, so I am afraid if it makes the code black box.

@Boruch-Baum
Copy link
Contributor Author

In addition,
I am not familiar with the world of lexical-binding=t.  Probably so would be the other emacs-w3m developers.  As for me, it is hard to run edebug in the lexical world, so I am afraid if it makes the code black box.

That seems to have been fixed. What I mean is that I just looked it up and found that the current on-line documentation says in footnote 1 that "a lexical binding can also be accessed from the Lisp debugger". Do you have a simple test?

@Boruch-Baum Boruch-Baum force-pushed the bb_download branch 2 times, most recently from f7467f8 to eb641a2 Compare March 18, 2019 03:58
+ Partial downloads are tagged with extension .PART

+ Don't accumulate download progress messages in kill-ring

+ Fix definition spec of functions w3m-message and w3m--message, and
  standardize use of w3m--message

+ Truncate line to <80 columns

+ Decode %xx codes in file basenames

+ Use function expand-file-name

+ Add function: w3m-download-delete-all-download-buffers and alias
  w3m-download-kill-all-wget-processes

+ remove lexical-let (entire file is lexical-binding: t)

+ Remove cruft: unused local variables

+ check for valid w3m-cache-buffer

+ add 'require's for w3m
+ New feature: Function w3m-download-select bulk downloads links from
  a buffer or region by a regex and/or selection table. Popular
  regex'es are maintained in w3m-download-select-filter-list.

+ Downloads accumulate in a queue and are performed subject to
  dynamically adjustable w3m-download-max-simultaneous.

+ Option to use wget's '--xattr' feature, which saves URL and referer
  HTTP header to a file extended-attributes (only available if the
  target filesystem supports extended-attributes, doh).

+ Major documentation and layout changes.
+ disable undo in download buffer

+ increase scope of mutex

+ allow all queues to be re-arranged

+ allow re-arrangement by prefix-arg positions
+ search items of ambiguous filenames can be regexes, not just string
  literals

+ handler functions can accept any number of args

+ the default handler function accepts and additional arg: the
  filename extension to be used.
+ The type definition did not match the default data structure
+ w3m-download-select-deep-search: Feature to search for links
  embedded in <script> and other places in the HTML source. Also, now
  check for links appearing as plain text in HTML documents.
+ Improve operation of kill-ring-save

+ delete trailing whitespace

+ improve handling of default refresh interval when changing the default

+ don't log keybinding help to *Messages* buffer
1) Expects the 'face property to be a value of type list, and on that
   basis pops off the symbol and later creates a list for it. However,
  'face could be a simple symbol.

2) Performs its comparison test using 'eq instead of 'equal, which
   wouldn't catch cases such as '(:weight bold).

3) Doesn't perform `remove-text-properties', so properties not
   caught by the 'eq test accumulate.

4) Accepts an OBJECT argument, and uses it to retrieve properties, but
   not to restore the properties when it calls w3m-add-text-properties.
+ Not just cloudflare, but also similars such as: infuria.io, and
  ipfs.io. They set the filename to a useless hash, and hide the
  filename as a url query?filename=
+ remove url from split-string operation

+ convert spaces to %20 in url argument.
+ This is a case that shouldn't happen, but I see it happening in
  practice with cloudflare downloads. They use a url query "filename="
  and the filename can have embedded ampersands!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants