Introduction
Newspipe is an RSS/Atom aggregator with a difference: It allows you to keep track of your feeds through e-mail - you create an OPML file listing your feeds and Newspipe will collect them, convert them to e-mail messages and send them to your mailbox.
This means you can read, organize and archive news feeds using your current mail client (or even webmail), without needing to use a separate program. Newspipe can send you news items as plaintext or HTML mail, both as single items or grouped in a digest.
Newspipe is the creation of Ricardo M. Reyes, and started out as an RSS-to-POP3 converter called rss2pop3. Over time, it grew to the extent where real-time collection via POP3 wasn't flexible enough, and it was retooled to work via SMTP.
Features
- Supports RSS and Atom feeds through Mark Pilgrim's Universal Feed Parser
- feeds are listed in an OPML file:
- feed options can be set individually for each feed or for a group of feeds
- supports screen-scraping scripts via an internal pipe:// URI schema
- the OPML file can reside locally (on your hard disk) or remotely (on a web server).
- sends news items via SMTP to a designated e-mail address
- messages can be in HTML/multipart MIME format or plaintext
- multiple items from a feed can be grouped and sent in a digest message
- updated news items are detected and re-sent with additions and deletions highlighted
- images linked from the feed items are downloaded and included as inline images inside the mail message (great for archiving purposes).
- a mobile (text-only) view of a feed can be sent to a secondary e-mail address (to read on a PDA or an MMS-enabled mobile phone)
- full support for HTTP optimizations like gzip compression, If-Modified-Since and If-None-Match headers. Feeds and image files will only be downloaded when they have changed.
- E-mail "threading" based on previously sent RSS items.
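The HTTP optimizations listed above can be illustrated with the standard library. This is a minimal Python 3 sketch, not Newspipe's own code: `build_conditional_request` and its parameters are hypothetical names, and the cached `etag` and `last_modified` values are assumed to have been saved from an earlier response.

```python
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that lets the server answer 304 Not Modified.

    etag and last_modified are the ETag and Last-Modified response headers
    remembered from the previous download (hypothetical cache values).
    """
    request = urllib.request.Request(url)
    # Ask for a gzip-compressed body to save bandwidth.
    request.add_header("Accept-Encoding", "gzip")
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


request = build_conditional_request(
    "http://example.com/rss.xml",
    etag='"abc123"',
    last_modified="Sat, 16 Jul 2005 10:00:00 GMT",
)
```

When the feed has not changed, a server that honours these headers replies with status 304 and no body; `urllib.request.urlopen` reports that as an `HTTPError` with code 304, which the caller can treat as "use the cached copy".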
Requirements
To use Newspipe, you need:
- Python 2.3 or above
- access to an SMTP server
- a mail client (preferably one capable of displaying HTML mail)
All the required Python modules and libraries are included in the distribution archive.
Newspipe has so far been successfully tested on:
- Windows (plain and Cygwin)
- Linux (several distros)
- Mac OS X (10.3.x)
Download
You can download the latest stable version from the files section.
If you would like to try the latest version and you have a CVS client, you are welcome to download it from the CVS repository (module newspipe at cvs.sourceforge.net:/cvsroot/newspipe). You can also browse the latest source code through the SourceForge viewcvs interface.
Installation
To install Newspipe, just unzip the files you downloaded to a new folder, and edit the file Newspipe.ini. The only parameters you absolutely need to edit are smtp_server and opml:
- smtp_server should have the address of the mail server you'll use to send the emails. Here you can use the network name of the server, or the IP address. If you intend to use a local server that runs on your machine, use the address 127.0.0.1
- opml should point to the OPML file that lists the feeds you are subscribed to. If you are currently using another aggregator, it probably has an option or command to export your subscription list. If not, you can start a new file by editing the test.opml included in the .zip file you downloaded.
Make sure that the ownerName and ownerEmail options at the top of the OPML file are correct. They should show your name and email address, and Newspipe will use them to form the recipient of the emails. You can address the emails to multiple recipients by separating them with commas. (For example: <ownerEmail>first@example.com, second@example.com</ownerEmail>)
You can also set the optional tags fromName and fromEmail to control the originating address of the emails. If you don't, Newspipe will try to form the originating address with the feed's name and the author's email.
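As an illustration of how these head fields could be read, here is a minimal Python 3 sketch. It is not Newspipe's own parsing code; `read_addresses` is a hypothetical helper, and the sample addresses are placeholders.

```python
import xml.etree.ElementTree as ET

OPML_HEAD = """<opml version="1.1">
  <head>
    <ownerName>Your name goes here</ownerName>
    <ownerEmail>first@example.com, second@example.com</ownerEmail>
    <fromName>Newspipe</fromName>
    <fromEmail>newspipe@example.com</fromEmail>
  </head>
  <body/>
</opml>"""


def read_addresses(opml_text):
    """Return (recipients, sender) taken from the OPML <head> element."""
    head = ET.fromstring(opml_text).find("head")
    owner_email = head.findtext("ownerEmail", default="")
    # Several recipients may be listed, separated by commas.
    recipients = [a.strip() for a in owner_email.split(",") if a.strip()]
    from_email = head.findtext("fromEmail")  # optional tag, may be None
    return recipients, from_email


recipients, sender = read_addresses(OPML_HEAD)
```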
Once the configuration files are set, just execute the script Newspipe.py. In Windows, it's usually enough to double-click on it, or you can create a shortcut to it. In Unix/Linux, run it through Python's interpreter, with the command: python Newspipe.py (assuming the Python binaries are in the search PATH).
Configuration
There are two main sets of configuration options: program-wide options set via a .INI file, and per-feed options set inside the OPML file.
By default, Newspipe will look for the .ini file in the same folder where it's installed, but you can control the location of the ini file using the command line parameter: -i
Every parameter set in the .ini file can be overridden using command-line parameters. Execute Newspipe.py --help to see the list of available parameters.
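The precedence described above (built-in defaults, then the .ini file, then the command line) can be sketched as follows. This is a Python 3 illustration under stated assumptions rather than Newspipe's actual code: the `[NewsPipe]` section name and the `--workers` and `--sleep_time` flag spellings are guesses made for the example, so check `Newspipe.py --help` for the real ones.

```python
import argparse
import configparser

DEFAULTS = {"sleep_time": "5", "workers": "10", "textonly": "0"}

# The section name below is an assumption made for this sketch.
INI_TEXT = """
[NewsPipe]
smtp_server = 127.0.0.1
opml = test.opml
workers = 3
"""


def load_settings(ini_text, argv):
    """Merge defaults, .ini values and command-line overrides, in that order."""
    settings = dict(DEFAULTS)

    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    for section in parser.sections():
        settings.update(parser.items(section))

    cli = argparse.ArgumentParser()
    cli.add_argument("--workers")      # hypothetical flag name
    cli.add_argument("--sleep_time")   # hypothetical flag name
    overrides = vars(cli.parse_args(argv))
    # Only options actually given on the command line replace .ini values.
    settings.update({k: v for k, v in overrides.items() if v is not None})
    return settings


settings = load_settings(INI_TEXT, ["--sleep_time", "0"])
```

Here `workers` comes from the .ini text, `sleep_time` from the command line, and `textonly` keeps its default.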
.INI file Options
Key | Default | Notes |
---|---|---|
opml | None | REQUIRED - the filename or URL of the OPML file with the list of feeds to check |
smtp_server | None | REQUIRED - fully qualified domain name or IP address of the SMTP server to send messages through |
sender | None | optional e-mail address to use as From: - overrides the OPML ownerEmail field. |
textonly | 0 | if set to 1, all the messages sent by Newspipe will be sent in plaintext format, without any HTML. The plaintext version is formatted in Markdown and includes references to all links, images, etc. |
log_console | 0 | If set to 1, this will send logging output to the console and to the log file. |
check_online | None | If present, should be the URL of a webpage that the program will try to fetch to determine if there is a network connection available - useful for dialup users that may not be always online. A good value for this is http://www.google.com. |
sleep_time | 5 | Number of minutes to wait before re-checking feeds. If the value is 0, then the program will exit after checking all feeds, instead of waiting to start again (batch mode). |
offline | 0 | If set to 1, the program won't try to fetch any data from the internet, using cached versions instead. Useful for debugging. |
debug | 0 | If set to 1, logging becomes more verbose. Useful for debugging. |
workers | 10 | Number of threads to use simultaneously. Bigger numbers should mean better performance, but with increased memory and CPU use. Dialup users may want to change this to 3, since more threads may also mean more timeout errors for them. |
can_pipe | 0 | Enables pipe:// URI support. When set to "1", the program will accept feed URLs that start with pipe:// - an arbitrary protocol name used to tell Newspipe that to get the feed data, it should run the program or command after the pipe:// and use its standard output as feed data. Example: <outline ... xmlURL="pipe://python.exe scrape_site.py" /> tells Newspipe to run python.exe scrape_site.py and capture its standard output as feed data. This allows you to use screen-scraping scripts to generate feed data from sites that have no RSS feeds. Since it makes Newspipe invoke external programs, it is disabled by default as a security measure. |
encoding | utf-8 | Determines the unicode encoding to use when composing the emails (you can find the list of possible values at this page) |
proxy | None | Determines the address and port of the proxy server to go through, separated by a colon. For example: http://192.168.0.1:8080 |
threading | 0 | If set to "1", Newspipe will add the References and In-Reply-To headers to emails, to allow threading of the messages |
subject | None | The value of this parameter will be added at the beginning of the subject of every email, so that they can be identified and filtered by mail programs that can't access the mail headers |
smtp_auth | 0 | If set to "1", Newspipe will use SMTP authentication when connecting to the SMTP server |
smtp_user | None | If smtp_auth is set to "1", the value of this setting will be used as the username when connecting to the smtp server |
smtp_pass | None | If smtp_auth is set to "1", the value of this setting will be used as the password when connecting to the smtp server |
multipart | on | If set to on, Newspipe will include a plaintext version of item contents as well as an HTML version. Setting this to off sends only HTML and images, and might be desirable if you want to receive slightly smaller messages - or if you use a mailer/webmail that has trouble with alternative versions of the same message (Gmail, for instance, defaults to showing plaintext versions of messages). |
reverse | 0 | If set to "1", Newspipe sends the emails in reversed order |
send_method | SMTP | Determines the method that Newspipe will use to send the emails. Possible values: SMTP, PROCMAIL, BOTH. |
procmail | None | Full path to a procmail script that will be invoked for each email when SEND_METHOD is PROCMAIL or BOTH |
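To make the can_pipe option more concrete, the sketch below shows one way a pipe:// URL could be handled. It is a Python 3.7+ illustration, not the program's actual implementation; `fetch_pipe_feed` is a hypothetical name, and the demo command simply prints a tiny RSS document in place of a real scraping script.

```python
import shlex
import subprocess
import sys

PIPE_PREFIX = "pipe://"


def fetch_pipe_feed(url, can_pipe=False):
    """Run the command that follows pipe:// and return its standard output."""
    if not url.startswith(PIPE_PREFIX):
        raise ValueError("not a pipe:// URL: %r" % url)
    if not can_pipe:
        # Mirrors the can_pipe = 0 default: never run external commands.
        raise PermissionError("pipe:// support is disabled (can_pipe = 0)")
    command = shlex.split(url[len(PIPE_PREFIX):])
    result = subprocess.run(command, capture_output=True, check=True)
    return result.stdout


# Stand-in for a real scraping script: prints a tiny RSS document.
demo_url = "pipe://%s -c %s" % (
    shlex.quote(sys.executable),
    shlex.quote("print('<rss version=\"2.0\"></rss>')"),
)
feed_data = fetch_pipe_feed(demo_url, can_pipe=True)
```

Because the OPML file can also be loaded from a remote URL, leaving this switch off unless you control the OPML file keeps a remote file from running commands on your machine.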
OPML Options
Since OPML is a hierarchical format, you can have multiple outline items nested inside each other. Some outline items will denote single feeds, and some will group related sets of feeds together.
Newspipe allows you to use any sort of layout you like, and lets you set feed options on a group basis or for a single feed:
Example opml file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<opml version="1.1">
<head>
<title>test.opml</title>
<ownerName>Your name goes here</ownerName>
<ownerEmail>your.name@example.com</ownerEmail>
<ownerMobile>your.phone@example.com</ownerMobile>
</head>
<body>
<outline text="General">
<outline text="dive into mark"
description="A lot of effort went into making this effortless."
htmlUrl="http://diveintomark.org"
xmlUrl="http://diveintomark.org/xml/atom.xml" />
<outline text="The Tao of Mac"
description="mac.against.org"
htmlUrl="http://the.taoofmac.com/"
xmlUrl="http://the.taoofmac.com/space/RecentChanges?format=rss" />
</outline>
<outline text="1 Email per feed" digest="1">
<outline text="The Scobleizer Weblog"
description="Human Aggregator of geek life"
htmlUrl="http://radio.weblogs.com/"
xmlUrl="http://radio.weblogs.com/0001011/rss.xml" />
<outline text="Boing Boing Blog"
description="The Blog of Wonderful things"
htmlUrl="http://boingboing.net"
xmlUrl="http://boingboing.net/rss.xml" />
<outline text="1 email per feed, but without titles" titles="0">
<outline text="Scripting News"
description="All scripting, all the time, forever."
htmlUrl="http://www.scripting.com/"
xmlUrl="http://www.scripting.com/rss.xml" />
</outline>
</outline>
<outline text="Download the linked page" download_link="1">
<outline text="ongoing"
description="Ongoing fragmented essay by Tim Bray."
htmlUrl="http://www.tbray.org/ongoing/"
xmlUrl="http://www.tbray.org/ongoing/ongoing.rss" />
</outline>
<outline text="Plain text emails" textonly="1">
<outline text="Wired News"
description="Wired News"
htmlUrl="http://www.wired.com"
xmlUrl="http://www.wired.com/news/feeds/rss2/0,2610,,00.xml" />
</outline>
</body>
</opml>
Attribute | Default | Notes |
---|---|---|
text | None | REQUIRED - Title of the feed, to be used in the log files. |
htmlUrl | None | REQUIRED - URL of the site related to this feed. It's used in the "Home" link in the footer of each email. |
xmlUrl | None | REQUIRED - URL of the RSS/Atom feed |
digest | 0 | If set to "1", all the new or modified items in the feed will be grouped to a single email. |
titles | 1 | If set to "0", the digest email will have a more compact format, without titles, separator, author name and date-time of each item. This setting only works when digest="1" and is useful in feeds that don't have titles in their items, like scripting.com. |
download_link | 0 | If set to "1", Newspipe will download the webpage linked by each item, and use the HTML code of that page instead of the text of the item. This is useful to get full-text entries from feeds that only have excerpts. Use with caution: the resulting emails can get really big when the linked webpage has many images or long text (like Slashdot's). |
diff | 1 | If set to "1", Newspipe will compare the text of each item with the previous version (if any) stored in the cache, and will highlight any additions and deletions. |
check_text | 1 | When this parameter is set to "0", Newspipe will not re-send items previously seen, even if the text of the item has changed. When set to "1", old items that have any change in their text will be treated as new items, and will be sent by email. |
delay | 60 | Number of minutes to wait between checks to this feed. |
textonly | 0 | If set to "1", the emails of this feed will be in plain-text format, instead of HTML. |
mobile | 0 | If set to "1" and the OPML file has an ownerMobile field, a text-only copy of the feed items will be sent to ownerMobile (useful for PDAs or MMS-enabled mobile phones) |
auth | None | Contains an optional username:password string for basic HTTP authentication (known to work with Gmail Atom feeds.) |
download_images | 1 | If set to "0", Newspipe won't download any images, and the emails will link to them externally. |
check_time | None | The value of this parameter should be a time range in the form "10:00 to 21:00". If present, Newspipe will only download and check this feed when the current time is inside the indicated time range. |
mobile_time | None | The value of this parameter should be a time range in the form "10:00 to 21:00", and only has effect when the mobile option is set to "1". If present, Newspipe will send the text email to the secondary address only when the current time is inside the indicated time range. For example, you can set this option to "8:00 - 21:00" to avoid receiving alerts on your phone during the night. |
remove | None | The value in this parameter should be a regular expression. Any text that matches this regular expression in the feed will be removed (especially useful to remove ads). For example, you can use this:
remove="<a .+? http://feeds.feedburner.com/ .+? iBag\?a= .+? </a>" to remove ads from Boing Boing (and probably from other "feedburner enhanced" feeds) |
owneremail | None | With this attribute you can override the OPML's OwnerEmail value. |
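The check_time and mobile_time ranges in the table above could be evaluated along these lines. This is a minimal Python 3 sketch that only understands the "HH:MM to HH:MM" form; the function names are hypothetical, and the handling of ranges that cross midnight is an assumption rather than documented behaviour.

```python
from datetime import time


def parse_time_range(value):
    """Parse a string such as "10:00 to 21:00" into two datetime.time values."""
    start_text, end_text = value.split(" to ")
    start_hour, start_minute = map(int, start_text.split(":"))
    end_hour, end_minute = map(int, end_text.split(":"))
    return time(start_hour, start_minute), time(end_hour, end_minute)


def in_time_range(value, now):
    """Return True when now (a datetime.time) falls inside the range."""
    start, end = parse_time_range(value)
    if start <= end:
        return start <= now <= end
    # Assumed behaviour: a range such as "22:00 to 06:00" wraps past midnight.
    return now >= start or now <= end
```

For example, `in_time_range("10:00 to 21:00", datetime.now().time())` tells you whether a feed with that check_time would be checked right now.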
Any attribute that is unknown by Newspipe will be turned into a mail header of the form: X-Custom-NAME: VALUE, that you can use to filter or classify the emails in your mail client. For example, if you put something like this in your opml file:
<outline text="Name of the blog" importance="High" xmlUrl="http://example.com" />
All the emails produced for that blog will carry a mail header of the form:
X-Custom-importance: High
Also, you can have the same effect with something like this:
<outline text="Important Blogs" importance="High">
<outline text="First one" xmlUrl="http://example.com" ... />
<outline text="Second one" xmlUrl="http://example.com" ... />
<outline text="Third one" xmlUrl="http://example.com" ... />
</outline>
In this latter case, the three items "inherit" the parent's attribute, and all of them will have the same extra mail header. This is a useful way to group feeds together when you want to filter or process them in some way with your mail client.
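The inheritance of group attributes and the X-Custom headers can be sketched as below. This is a Python 3 illustration, not Newspipe's own code: `collect_feeds`, `custom_headers` and the `KNOWN` set are hypothetical, and treating `description` as a known attribute is an assumption based on the example file.

```python
import xml.etree.ElementTree as ET

# Attributes the manual documents; anything else becomes an X-Custom header.
KNOWN = {
    "text", "description", "htmlUrl", "xmlUrl", "digest", "titles",
    "download_link", "diff", "check_text", "delay", "textonly", "mobile",
    "auth", "download_images", "check_time", "mobile_time", "remove",
    "owneremail",
}

OPML_BODY = """<body>
  <outline text="Important Blogs" importance="High" digest="1">
    <outline text="First one" xmlUrl="http://example.com/1.xml" />
    <outline text="Second one" xmlUrl="http://example.com/2.xml" digest="0" />
  </outline>
</body>"""


def collect_feeds(node, inherited=None):
    """Yield one attribute dict per feed, merged with its parents' attributes."""
    inherited = inherited or {}
    for outline in node.findall("outline"):
        options = {**inherited, **outline.attrib}
        if "xmlUrl" in outline.attrib:
            yield options
        else:
            # A group: pass its options down, except its own title.
            group = {k: v for k, v in options.items() if k != "text"}
            yield from collect_feeds(outline, group)


def custom_headers(options):
    """Turn every attribute outside KNOWN into an X-Custom-NAME header."""
    return {"X-Custom-" + k: v for k, v in options.items() if k not in KNOWN}


feeds = list(collect_feeds(ET.fromstring(OPML_BODY)))
```

In this sketch a value set on the feed itself wins over the group's value, so the second feed ends up with digest="0" while both feeds carry the importance header.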
Mail Headers
Newspipe adds a number of mail headers to messages that you can use to filter or classify them into folders, and which also provide some useful information as to what it has done to parse the feed:
Header | Values & Meaning |
---|---|
X-Channel-Feed | feed URL (same as the xmlURL attribute in the OPML file) |
X-NewsPipe-Version | version information about Newspipe, Python and the platform |
X-Channel-Title | the feed title |
X-Channel-Description | the feed description |
List-Id | Title and xml url of the feed |
Content-Location | Url of the item |
The following headers will only be present if the .ini parameter threading is present and set to "1".
Header | Values & Meaning |
---|---|
In-Reply-To | Last Message-ID generated from this feed (to simulate e-mail threading) |
References | List of Message-IDs generated from this feed (to simulate e-mail threading) |
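A minimal Python 3 sketch of how such threading headers could be attached to a message follows. It is an illustration only: `build_threaded_message`, the `newspipe.example` domain and the per-feed `history` list are hypothetical, not part of Newspipe.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_threaded_message(subject, body, previous_ids):
    """Create a message that mail clients will thread under earlier items.

    previous_ids is the list of Message-IDs already sent for this feed,
    oldest first (hypothetical per-feed history kept by the caller).
    """
    message = EmailMessage()
    message["Subject"] = subject
    message["Message-ID"] = make_msgid(domain="newspipe.example")
    if previous_ids:
        # Point at the most recent message and list the whole chain.
        message["In-Reply-To"] = previous_ids[-1]
        message["References"] = " ".join(previous_ids)
    message.set_content(body)
    return message


history = ["<1@newspipe.example>", "<2@newspipe.example>"]
message = build_threaded_message("New item", "Item text", history)
history.append(str(message["Message-ID"]))
```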
The following headers will only be present if the program is running in DEBUG mode.
Header | Values & Meaning |
---|---|
X-Item-Attributes | attributes parsed by Newspipe while building the message |
X-Item-Text-Key | feed attribute that contained the item text |
X-Newspipe-Version | program version number and revision |
X-Item-Modified | Whether Newspipe detected changes to the feed item (set to "Unknown" for digest feeds) |
X-Item-Hash-Link | md5-encoded item link |
X-Item-Hash-Feed | md5-encoded feed URL (xmlUrl) |
X-Item-Hash-Subject | md5-encoded Subject |
X-Item-Hash | md5-encoded item URL |
X-Channel-X-Cache-Result | whether the data was downloaded ("Downloaded") or obtained from a pipe:// feed ("From process") |
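For reference, hash values like those in the X-Item-Hash-* headers can be computed with Python's hashlib. This is a sketch under assumptions: the manual does not say how the strings are encoded or whether the digest is shown in hexadecimal, so UTF-8 and hex output are guesses, and the URLs are placeholders.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string, encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


item_link = "http://example.com/2005/07/17/post.html"
feed_url = "http://example.com/rss.xml"
debug_headers = {
    "X-Item-Hash-Link": md5_hex(item_link),
    "X-Item-Hash-Feed": md5_hex(feed_url),
}
```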
Acknowledgements
- Rui Carmo, for the initial logo, writing this documentation, for his Linux and MacOS testing and a lot of suggestions and bug reports.
- Bruno David Rodrigues, for a number of MIME and UTF-8 fixes, e-mail threading headers and memory usage debugging.
- Mark Pilgrim, author of feedparser.py (download feedparser.py)
- Aaron Swartz, author of html2text.py (download html2text.py). Aaron has a program similar to Newspipe called rss2email. I've never tried it myself, but you should give it a try if you are interested in this "RSS to Email thing". Oddly enough, I found out about it AFTER using html2text.py for several months in Rss2pop3 (the predecessor of Newspipe).
- Rupa Schomaker, for a couple of patches and comments.
- Lwiechec (real name unknown) for the SMTP authentication patch
- David Malkovsky for the Proxy support code
- Newspaper image from Free Tubes, used with permission.
Contact the Author
You can contact the author of Newspipe through email at chiquito@gmail.com
Any suggestions, bug reports, or comments are most welcome.
Changelog
- v1.1.9 17-July-2005
- - Eliminated the option --send_immediate because of concurrency problems
- + New .ini parameter SEND_METHOD: possible values SMTP, PROCMAIL, BOTH
- + New .ini parameter PROCMAIL: path to a procmail script that will be invoked for each email when SEND_METHOD is PROCMAIL or BOTH
- + New .ini parameter REVERSE: possible values 0 or 1. When set to 1, Newspipe sends the emails in reverse order.
- + Applied two patches from Rupa Schomaker:
- + Support overriding ownerEmail on a per-outline basis (with a new OPML parameter OWNEREMAIL)
- + Support including the link for an enclosure tag on each email
- * Fixed a bug related to accented chars in the HomeDir path
- * Fixed bug on feeds without description
- * Fixed bug on feeds with &lt; and &gt; entities in the item text
- v1.1.8 06-April-2005
- New opml attribute "remove". The value should be a regular expression, and Newspipe will eliminate any text that matches that regular expression from the feed's text. Useful to remove ads from feeds
- SMTP authentication support (Thanks to lwiechec for this)
- New OPML head tags: fromEmail and fromName (Thanks to lwiechec for this too)
- Newspipe can now send the emails to multiple recipients, separating the email addresses with ","
- v1.1.7 20-March-2005
- Now all the unknown attributes in the OPML file will be turned into X-Custom-... headers in the emails, for sorting and filtering purposes.
- v1.1.6 20-February-2005
- New opml parameter check_time that controls the time range when the feed will be checked (for example, "7am - 22:00")
- New opml parameter mobile_time that controls the time range when Newspipe will send the mobile versions of emails (for example, "16:30 - 10:30pm")
- New .ini parameter subject, can be used to prepend a string to the subject of every email
- v1.1.5 05-February-2005
- Added List-Id and Content-Location headers, as suggested in this feature request
- The References and In-Reply-To will only be present when the new threading parameter is set to 1 in the .ini file. It's been reported that they annoy some users :)
- Removed some debugging headers. They will only be included in the mails when the program is run in DEBUG mode.
- Try to use the feed author email as From address, when possible
- Fixed typo in parameters: immediate
- v1.1.4 26-December-2004
- Support for proxy servers, controlled with the new .ini file parameter: proxy (Thanks to David Malkovsky for this)
- New command line parameter: --inifile to control the location and name of the .ini file
- Now all the parameters in the .ini file can be overridden with command line parameters. (use the --help parameter to see a list of available options)
- v1.1.3 21-December-2004
- Fixed a bug resolving relative urls, introduced in v1.1.2
- v1.1.2 13-December-2004
- Fixed the stupid error when Newspipe.py wasn't run from the same folder it's installed in
- New OPML parameter: download_images. It defaults to "1". If set to "0", Newspipe won't download any images, and the emails will link to them externally
- v1.1.1 05-December-2004
- Added an option to set the text encoding of the emails, through the .ini parameter "encoding" (default: utf-8)
- Added support for use of the module dummy_threading, for those installations where the "real" threading module is not available
- v1.1 17-October-2004
- Better handling of socket exceptions when downloading images
- Indicate platform name and python version in the UserAgent header
- Added Python version and platform information to X-Newspipe-Version header
- Added basic authentication support, based on http://jonasgalvez.com/blog/2004-10/gmail-atom-proxy
- Inline images now display correctly in Thunderbird (a single image only, non-digest messages)
- Fixed Thunderbird rendering of both single and digest feeds
- Added "multipart" setting to disable plaintext section (should help with webmails)
- Full encoding support - exhaustively tested with iso-8859-1, -15 and UTF-8 blogs
- Full Multipart support - works perfectly with Apple Mail, Thunderbird, Outlook Express, IMP (Horde) and Evolution. Now you can click on "choose alternative part" in your email client, and you can save the pictures with the original names
- Quick and dirty patch to relate posts to each other with the Message-ID header - allows you to see each feed in a separate thread
- Fix for smtp timeouts - now it retries it later
- Fix for CTRL-C behaviour
- v1.0.3 05-September-2004
- Fixed exception when the remote server doesn't include a Date http header.
- Better handling of file expirations and purging.
- html2text.py: New version 2.21
- Fixed overlapping items in digest feeds (added br clear=all tag). (by Rui)
- Fixed image/121212121 MIME types (such as those from the Engadget feed) by using the Content-Type provided by the cache object (which in turn is the one provided by the web server we got the images from).(by Rui)
- Fixed the exception "TypeError: string indices must be integers" when sending text-only messages.
- Better handling of exceptions when purging the cache, due to mysterious exceptions in the pickle module.
- When an item has multiple instances of a single image (like TaoofMac's) download only one file and use it as many times as needed.
- v1.0.1b 02-August-2004
- manual.html updated to reflect the changes in v1.0.1.
- v1.0.1 01-August-2004
- HTML to Text conversion generates only one footnote for each link/image
- When downloading images, retry up to 3 times on timeouts
- Store all log, cache and data files at $HOME/.newspipe when running on UNIX, or $APPDATA/.newspipe when running on Windows
- Bugfix 2002: Parsing error in http://food.against.org/index.rdf
- Added "sender" override
- Changed mail processing to allow for any sender/destination
- Added support for sending a "mobile" version to an alternate e-mail address:
- "mobile" attribute in OPML determines which feeds are copied
- ownerMobile value at top of OPML determines alternate e-mail address (point it to your MMS or mobile phone address)
- Refactoring of error handling in image download
- v1.0 21-July-2004
- Initial release.