Newspipe

Introduction

Newspipe is an RSS/Atom aggregator with a difference: it lets you keep track of your feeds through e-mail. You create an OPML file listing your feeds, and Newspipe collects them, converts them to e-mail messages and sends them to your mailbox.

This means you can read, organize and archive news feeds using your current mail client (or even webmail), without needing a separate program. Newspipe can send you news items as plaintext or HTML mail, either as single items or grouped in a digest.

Newspipe is the creation of Ricardo M. Reyes, and started out as an RSS-to-POP3 converter called rss2pop3. Over time, it grew to the point where real-time collection via POP3 wasn't flexible enough, and it was retooled to work via SMTP.

Features

Requirements

To use Newspipe, you need:

All the required Python modules and libraries are included in the distribution archive.

Newspipe has so far been successfully tested on:

Download

You can download the latest stable version from the files section.

If you would like to try the latest version and you have a CVS client, you are welcome to download it from the CVS repository (module newspipe at cvs.sourceforge.net:/cvsroot/newspipe). You can also browse the latest source code through the SourceForge viewcvs interface.

Installation

To install Newspipe, unzip the files you downloaded into a new folder and edit the file Newspipe.ini. The only parameters you absolutely need to edit are smtp_server and opml.
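As a rough illustration, the two required settings could be written with Python's standard configparser module. This is only a sketch: the section name NewsPipe, the file name feeds.opml and the host smtp.example.com are assumptions made for the example, so check the Newspipe.ini shipped in the archive for the real layout.

```python
import configparser
import io


def build_minimal_ini(opml, smtp_server, section="NewsPipe"):
    """Return the text of a minimal .ini file holding the two required keys.

    The section name is an assumption of this sketch, not a documented value.
    """
    config = configparser.ConfigParser()
    config[section] = {"opml": opml, "smtp_server": smtp_server}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values; replace them with your own OPML file and SMTP server.
ini_text = build_minimal_ini("feeds.opml", "smtp.example.com")
print(ini_text)
```

Every other option can be left out, in which case the defaults listed under ".INI file Options" apply.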

Make sure that the ownerName and ownerEmail options at the top of the OPML file are correct. They should show your name and email address, and Newspipe will use them to form the recipient of the emails. You can address the emails to multiple recipients by separating them with commas. (For example: <ownerEmail>first@example.com, second@example.com</ownerEmail>)

You can also set the optional tags fromName and fromEmail to control the originating address of the emails. If you don't, Newspipe will try to form the originating address with the feed's name and the author's email.

Once the configuration files are set, just execute the script Newspipe.py. On Windows, it's usually enough to double-click on it, or you can create a shortcut to it. On Unix/Linux, run it through Python's interpreter, with the command: python Newspipe.py (assuming the Python binaries are in the search PATH).

Configuration

There are two main sets of configuration options: program-wide options set via a .INI file, and per-feed options set inside the OPML file.

By default, Newspipe will look for the .ini file in the same folder where it's installed, but you can control the location of the ini file using the command line parameter: -i

Every parameter set in the .ini file can be overridden using command-line parameters. Execute Newspipe.py --help to see the list of available parameters.
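The precedence described here (command line over .ini file over built-in defaults) can be pictured with a small lookup chain. This is a sketch of the idea, not Newspipe's own code: resolve_options is a hypothetical helper, and the sample values are made up. Only the default values come from the options table in this document.

```python
from collections import ChainMap

# Defaults taken from the options table in this document.
DEFAULTS = {"sleep_time": "5", "workers": "10", "debug": "0"}


def resolve_options(ini_values, cli_overrides):
    """Return a lookup where command-line values win over .ini values,
    and .ini values win over the built-in defaults."""
    return ChainMap(cli_overrides, ini_values, DEFAULTS)


# Hypothetical inputs: one value from the .ini file, one from the command line.
options = resolve_options(
    ini_values={"workers": "3", "smtp_server": "smtp.example.com"},
    cli_overrides={"sleep_time": "0"},
)
print(options["sleep_time"], options["workers"], options["debug"])  # prints: 0 3 0
```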

.INI file Options

Key           Default  Notes
opml          None     REQUIRED - the filename or URL of the OPML file with the list of feeds to check.
smtp_server   None     REQUIRED - the fully qualified domain name or IP address of the SMTP server to send messages through.
sender        None     Optional e-mail address to use as From:. Overrides the OPML ownerEmail field.
textonly      0        If set to 1, all the messages sent by Newspipe will be in plaintext format, without any HTML. The plaintext version is formatted in Markdown and includes references to all links, images, etc.
log_console   0        If set to 1, logging output is sent to the console as well as to the log file.
check_online  None     If present, should be the URL of a web page that the program will try to fetch to determine whether a network connection is available. Useful for dialup users who may not always be online. A good value for this is http://www.google.com.
sleep_time    5        Number of minutes to wait before re-checking feeds. If the value is 0, the program exits after checking all feeds instead of waiting to start again (batch mode).
offline       0        If set to 1, the program won't try to fetch any data from the internet, using cached versions instead. Useful for debugging.
debug         0        If set to 1, logging becomes more verbose. Useful for debugging.
workers       10       Number of threads to use simultaneously. Bigger numbers should mean better performance, but with increased memory and CPU use. Dialup users may want to change this to 3, since more threads may also mean more timeout errors for them.
can_pipe      0        Enables pipe:// URI support. When set to "1", the program accepts feed URLs that start with pipe://, an arbitrary protocol name that tells Newspipe to run the program or command after the pipe:// prefix and use its standard output as the feed data.

                       Example:

                           <outline ... xmlUrl="pipe://python.exe scrape_site.py" />

                       tells Newspipe to run

                           python.exe scrape_site.py

                       and capture its standard output as feed data. This allows you to use screen-scraping scripts to generate feed data from sites that have no RSS feeds. Since it makes Newspipe invoke external programs, it is disabled by default as a security measure.
encoding      utf-8    Determines the character encoding to use when composing the emails (you can find the list of possible values at this page).
proxy         None     Determines the address and port of the proxy server to go through, separated by a colon. For example: http://192.168.0.1:8080
threading     0        If set to "1", Newspipe will add the References and In-Reply-To headers to emails, to allow threading of the messages.
subject       None     The value of this parameter will be added at the beginning of the subject of every email, so that the emails can be identified and filtered by mail programs that can't access the mail headers.
smtp_auth     0        If set to "1", Newspipe will use SMTP authentication when connecting to the SMTP server.
smtp_user     None     If smtp_auth is set to "1", the value of this setting will be used as the username when connecting to the SMTP server.
smtp_pass     None     If smtp_auth is set to "1", the value of this setting will be used as the password when connecting to the SMTP server.
multipart     on       If set to on, Newspipe will include a plaintext version of item contents as well as an HTML version.

                       Setting this to off sends only HTML and images. This might be desirable if you want to receive slightly smaller messages, or if you use a mailer/webmail that has trouble with alternative versions of the same message (Gmail, for instance, defaults to showing plaintext versions of messages).
reverse       0        If set to "1", Newspipe sends the emails in reverse order.
send_method   SMTP     Determines the method that Newspipe will use to send the emails. Possible values:

                         • SMTP: Send the emails via an SMTP server
                         • PROCMAIL: Invoke a procmail script or program for each email, sending the email's text through its standard input
                         • BOTH: Combines the two previous options, sending each email through the SMTP server and the procmail script
procmail      None     Full path to a procmail script that will be invoked for each email when send_method is PROCMAIL or BOTH.
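
The can_pipe option above expects the command after pipe:// to print a feed on its standard output. A script such as the scrape_site.py named in that example might look roughly like the following sketch. The contents of scrape_site.py are not documented here, so everything below is an assumption: build_rss is a hypothetical helper and the items are placeholder data standing in for whatever a real script would scrape.

```python
import sys
import xml.etree.ElementTree as ET


def build_rss(title, link, items):
    """Build a minimal RSS 2.0 document from (title, link, description) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item_title, item_link, item_text in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "description").text = item_text
    return ET.tostring(rss, encoding="unicode")


# Placeholder data; a real script would extract these from the target site.
SCRAPED_ITEMS = [
    ("First headline", "http://example.com/1", "Summary of the first story."),
    ("Second headline", "http://example.com/2", "Summary of the second story."),
]

feed_xml = build_rss("Example site", "http://example.com/", SCRAPED_ITEMS)
# The feed goes to standard output, which is where a pipe:// feed is read from.
sys.stdout.write(feed_xml + "\n")
```

Anything else the script prints to standard output would end up mixed into the feed data, so diagnostics are better sent to standard error.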

OPML Options

Since OPML is a hierarchical format, you can have multiple outline items nested inside each other. Some outline items will denote single feeds, and some will group related sets of feeds together.

Newspipe allows you to use any sort of layout you like, and lets you set feed options on a group basis or for a single feed:

Example OPML file:

   <?xml version="1.0" encoding="ISO-8859-1"?>
   <opml version="1.1">
      <head>
         <title>test.opml</title>
         <ownerName>Your name goes here</ownerName>
         <ownerEmail>your.name@example.com</ownerEmail>
         <ownerMobile>your.phone@example.com</ownerMobile>
      </head>
      <body>
         <outline text="General">
            <outline text="dive into mark"
               description="A lot of effort went into making this effortless."
               htmlUrl="http://diveintomark.org"
               xmlUrl="http://diveintomark.org/xml/atom.xml" />

            <outline text="The Tao of Mac"
               description="mac.against.org"
               htmlUrl="http://the.taoofmac.com/"
               xmlUrl="http://the.taoofmac.com/space/RecentChanges?format=rss" />

         </outline>
         <outline text="1 Email per feed" digest="1">
            <outline text="The Scobleizer Weblog"
               description="Human Aggregator of geek life"
               htmlUrl="http://radio.weblogs.com/"
               xmlUrl="http://radio.weblogs.com/0001011/rss.xml" />

            <outline text="Boing Boing Blog"
               description="The Blog of Wonderful things"
               htmlUrl="http://boingboing.net"
               xmlUrl="http://boingboing.net/rss.xml" />

            <outline text="1 email per feed, but without titles" titles="0">
               <outline text="Scripting News"
                  description="All scripting, all the time, forever."
                  htmlUrl="http://www.scripting.com/"
                  xmlUrl="http://www.scripting.com/rss.xml" />

            </outline>
         </outline>
         <outline text="Download the linked page" download_link="1">
            <outline text="ongoing"
               description="Ongoing fragmented essay by Tim Bray."
               htmlUrl="http://www.tbray.org/ongoing/"
               xmlUrl="http://www.tbray.org/ongoing/ongoing.rss" />

         </outline>
         <outline text="Plain text emails" textonly="1">
            <outline text="Wired News"
               description="Wired News"
               htmlUrl="http://www.wired.com"
               xmlUrl="http://www.wired.com/news/feeds/rss2/0,2610,,00.xml" />

         </outline>
      </body>
   </opml>

Attribute        Default  Notes
text             None     REQUIRED - Title of the feed, to be used in the log files.
htmlUrl          None     REQUIRED - URL of the site related to this feed. It's used in the "Home" link in the footer of each email.
xmlUrl           None     REQUIRED - URL of the RSS/Atom feed.
digest           0        If set to "1", all the new or modified items in the feed will be grouped into a single email.
titles           1        If set to "0", the digest email will have a more compact format, without the title, separator, author name and date-time of each item. This setting only works when digest="1" and is useful for feeds that don't have titles in their items, like scripting.com.
download_link    0        If set to "1", Newspipe will download the web page linked by each item, and use the HTML code of that page instead of the text of the item. This is useful to get full-text entries from feeds that only have excerpts. Use with caution: the resulting emails can get really big when the linked web page has many images or long text (like Slashdot's).
diff             1        If set to "1", Newspipe will compare the text of each item with the previous version (if any) stored in the cache, and will highlight any additions and deletions in the text. You can disable this feature by setting this parameter to "0" if it gets too distracting.
check_text       1        When this parameter is set to "0", Newspipe will not re-send items previously seen, even if the text of the item has changed. When set to "1", old items that have any change in their text will be treated as new items, and will be sent by email.
delay            60       Number of minutes to wait between checks of this feed.
textonly         0        If set to "1", the emails of this feed will be in plain-text format, instead of HTML.
mobile           0        If set to "1" and the OPML file has an ownerMobile field, a text-only copy of the feed items will be sent to ownerMobile (useful for PDAs or MMS-enabled mobile phones).
auth             None     Contains an optional username:password string for basic HTTP authentication (known to work with Gmail Atom feeds).
download_images  1        If set to "0", Newspipe won't download any images, and the emails will link to them externally.
check_time       None     The value of this parameter should be a time range in the form "10:00 to 21:00". If present, Newspipe will only download and check this feed when the current time is inside the indicated time range.
mobile_time      None     The value of this parameter should be a time range in the form "10:00 to 21:00", and only has effect when the mobile option is set to "1". If present, Newspipe will send the text email to the secondary address only when the current time is inside the indicated time range. For example, you can set this option to "8:00 to 21:00" to avoid receiving alerts on your phone during the night.
remove           None     The value of this parameter should be a regular expression. Any text in the feed that matches this regular expression will be removed (especially useful for removing ads). For example, you can use this:

                              remove="&lt;a .+? http://feeds.feedburner.com/ .+? iBag\?a= .+? &lt;/a&gt;"

                          to remove ads from Boing Boing (and probably from other "feedburner enhanced" feeds).
owneremail       None     With this attribute you can override the OPML's ownerEmail value.
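
To make the check_time and mobile_time format concrete, here is one way a range such as "10:00 to 21:00" could be parsed and tested. It is a sketch under stated assumptions rather than Newspipe's own parser: the function names are hypothetical, both end points are treated as inclusive, and a range whose start is later than its end is assumed to wrap past midnight.

```python
from datetime import time


def parse_time_range(value):
    """Split a string like "10:00 to 21:00" into (start, end) time objects."""
    start_text, end_text = (part.strip() for part in value.split(" to "))

    def to_time(text):
        hours, minutes = text.split(":")
        return time(int(hours), int(minutes))

    return to_time(start_text), to_time(end_text)


def in_time_range(value, now):
    """Return True if `now` (a datetime.time) falls inside the range."""
    start, end = parse_time_range(value)
    if start <= end:
        return start <= now <= end
    # Assumption: a range such as "22:00 to 6:00" wraps past midnight.
    return now >= start or now <= end


print(in_time_range("10:00 to 21:00", time(12, 30)))  # prints: True
print(in_time_range("10:00 to 21:00", time(22, 30)))  # prints: False
```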

Any attribute that Newspipe does not recognize is turned into a mail header of the form X-Custom-NAME: VALUE, which you can use to filter or classify the emails in your mail client. For example, if you put something like this in your OPML file:

    <outline text="Name of the blog" importance="High" xmlUrl="http://example.com" />

All the emails produced for that blog will carry a mail header of the form:

    X-Custom-importance: High

Also, you can have the same effect with something like this:

    <outline text="Important Blogs" importance="High">
        <outline text="First one" xmlUrl="http://example.com" ... />
        <outline text="Second one" xmlUrl="http://example.com" ... />
        <outline text="Third one" xmlUrl="http://example.com" ... />
    </outline>

In this latter case, the three items "inherit" the parent's attribute, and all of them will have the same extra mail header. This is a useful way to group feeds together when you want to filter or process them in some way with your mail client.
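
The inheritance rule and the X-Custom-NAME headers can be sketched together in a few lines of Python. This illustrates the behaviour described above and is not taken from Newspipe's source: iter_feeds and custom_headers are hypothetical helpers, the sample URLs are placeholders, and treating exactly the attributes listed in this document (plus description) as "known" is an assumption.

```python
import xml.etree.ElementTree as ET

# Attribute names documented above, plus "description" from the sample file.
# Treating exactly this set as "known" is an assumption of this sketch.
KNOWN_ATTRIBUTES = {
    "text", "description", "htmlUrl", "xmlUrl", "digest", "titles",
    "download_link", "diff", "check_text", "delay", "textonly", "mobile",
    "auth", "download_images", "check_time", "mobile_time", "remove",
    "owneremail",
}


def iter_feeds(outline, inherited=None):
    """Yield the effective options of every feed below `outline`.

    A child starts from its parent's options and overrides them with its own.
    """
    options = dict(inherited or {})
    options.update(outline.attrib)
    if "xmlUrl" in outline.attrib:  # only outlines with a feed URL are feeds
        yield options
    for child in outline.findall("outline"):
        yield from iter_feeds(child, options)


def custom_headers(options):
    """Map every unrecognized attribute to an X-Custom-NAME header."""
    return {
        "X-Custom-" + name: value
        for name, value in options.items()
        if name not in KNOWN_ATTRIBUTES
    }


OPML_TEXT = """<opml version="1.1">
  <body>
    <outline text="Important Blogs" importance="High" digest="1">
      <outline text="First one" xmlUrl="http://example.com/first.xml" />
      <outline text="Second one" xmlUrl="http://example.com/second.xml" digest="0" />
    </outline>
  </body>
</opml>"""

body = ET.fromstring(OPML_TEXT).find("body")
feeds = [feed for outline in body.findall("outline") for feed in iter_feeds(outline)]
for feed in feeds:
    print(feed["text"], feed["digest"], custom_headers(feed))
```

Both feeds pick up importance="High" from the group, while the second one overrides the inherited digest value with its own.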

Mail Headers

Newspipe adds a number of mail headers to messages that you can use to filter or classify them into folders, and which also provide some useful information as to what it has done to parse the feed:

Header                 Values & Meaning
X-Channel-Feed         Feed URL (same as the xmlUrl attribute in the OPML file)
X-NewsPipe-Version     Version information about Newspipe, Python and the platform
X-Channel-Title        The feed title
X-Channel-Description  The feed description
List-Id                Title and XML URL of the feed
Content-Location       URL of the item

The following headers will only be present if the .ini parameter threading is present and set to "1".

Header       Values & Meaning
In-Reply-To  Last Message-ID generated from this feed (to simulate e-mail threading)
References   List of Message-IDs generated from this feed (to simulate e-mail threading)
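
The way these two headers relate to earlier messages can be shown with Python's standard email package. This is a minimal sketch, not Newspipe's implementation: build_threaded_message is a hypothetical helper, the example.com domain and the sample Message-IDs are placeholders, and how Newspipe stores previous Message-IDs is not covered here.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_threaded_message(subject, body, previous_ids):
    """Create a message that threads under earlier messages of the same feed.

    `previous_ids` is the list of Message-IDs already sent for this feed,
    oldest first (how that list is stored is outside this sketch).
    """
    message = EmailMessage()
    message["Subject"] = subject
    message["Message-ID"] = make_msgid(domain="example.com")
    if previous_ids:
        # In-Reply-To carries the most recent ID, References the whole chain.
        message["In-Reply-To"] = previous_ids[-1]
        message["References"] = " ".join(previous_ids)
    message.set_content(body)
    return message


earlier = ["<1@example.com>", "<2@example.com>"]
reply = build_threaded_message("New item", "Item text", earlier)
print(reply["In-Reply-To"])
print(reply["References"])
```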

The following headers will only be present if the program is running in DEBUG mode.

Header                    Values & Meaning
X-Item-Attributes         Attributes parsed by Newspipe while building the message
X-Item-Text-Key           Feed attribute that contained the item text
X-Newspipe-Version        Program version number and revision
X-Item-Modified           Whether Newspipe detected changes to the feed item (set to "Unknown" for digest feeds)
X-Item-Hash-Link          MD5 hash of the item link
X-Item-Hash-Feed          MD5 hash of the feed URL (xmlUrl)
X-Item-Hash-Subject       MD5 hash of the Subject
X-Item-Hash               MD5 hash of the item URL
X-Channel-X-Cache-Result  Whether the data was downloaded ("Downloaded") or obtained from a pipe:// feed ("From process")
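
If you want to reproduce one of the hash headers by hand, for example to match a message against a known feed URL, Python's hashlib can compute an MD5 digest. This is only a sketch: the exact input Newspipe hashes and its text encoding are not documented here, so the UTF-8 encoding, the hexadecimal output and the sample values are assumptions, and md5_hex is a hypothetical helper.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string (UTF-8 is assumed)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Placeholder values standing in for a real feed item.
debug_headers = {
    "X-Item-Hash-Link": md5_hex("http://example.com/item/1"),
    "X-Item-Hash-Feed": md5_hex("http://example.com/rss.xml"),
    "X-Item-Hash-Subject": md5_hex("Example subject"),
}
for name, value in debug_headers.items():
    print(name + ": " + value)
```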

Acknowledgements

Contact the Author

You can contact the author of Newspipe by e-mail at chiquito@gmail.com

Any suggestions, bug reports or comments are most welcome.

References

Changelog