How to append items to a CSV file without the header row?

Scrapy provides a few item exporters by default to export items in commonly used file formats like CSV, JSON, and XML. I usually use CSV to export items. It is pretty convenient, and it comes in two modes: appending mode, for example scrapy crawl foo -o test.csv, and overwriting mode with the -O option, like scrapy crawl foo -O test.csv. But in appending mode, it's a bit annoying that it always writes the header row again before the newly scraped items, which leaves a header line in the middle of the file and is not correct in terms of CSV format.
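The post's fix is Scrapy-specific. As a hedged illustration of the same idea using only Python's standard csv module (not Scrapy's exporter API), here is a minimal sketch that writes the header only when the target file is missing or empty. The function name append_rows and the field names in the usage are hypothetical.

```python
import csv
import os


def append_rows(path, fieldnames, rows):
    """Append dict rows to a CSV file, writing the header only once.

    The header is written only when the file does not exist yet or is
    empty, so repeated calls never insert a header row mid-file.
    """
    # Decide before opening: opening in "a" mode creates an empty file.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows twice on the same path produces one header line followed by the rows from both calls. Within Scrapy itself, CsvItemExporter accepts an include_headers_line argument that could serve the same purpose; how the post wires that into a crawl isn't shown in this summary.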

Writing a Python Script in Emacs in 45 Minutes!

Note: watch my live coding session of this article. If you've heard rumors that Emacs has a very steep learning curve (or that Emacs makes a computer slow), you may be too scared to look at it. It does have a learning curve (learning anything does), but it isn't very steep. I learned this after getting my hands dirty with Emacs a few years ago.

Beautiful Soup 4 Cheatsheet

Detailed docs: the Beautiful Soup 4 Docs. Assume t is a Tag object. Core concepts (classes): Tag: a Tag object corresponds to an XML or HTML tag. BeautifulSoup: the BeautifulSoup object represents the parsed document as a whole; you can treat it like a special Tag. It needs a parser to parse the document; a built-in parser is "html.parser", e.g. soup = BeautifulSoup("<html>a web page</html>", 'html.parser'). NavigableString: a NavigableString corresponds to a bit of text (as you see it in the browser) within a tag.

String Title Case in Clojure

These days I like to write scripts for some tasks in Python instead of shell. One important reason, I think, is that Python is powerful at string manipulation. I've recently been learning Clojure, and I'm trying to find similar ways to do things in Clojure. One of them is s.title(), which returns a title-cased version of a string. For example, ' Hello world'.title() returns ' Hello World'. How do you do that in Clojure? To keep the problem simple, let's assume the input string only contains letters and spaces, that is, [a-zA-Z ] as a regex pattern.
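The post works this out in Clojure. Purely for comparison, here is a minimal Python sketch of the underlying rule under the same [a-zA-Z ] assumption, written without str.title() so the logic is explicit. title_case is a hypothetical name, and this is not the post's Clojure solution.

```python
def title_case(s: str) -> str:
    """Title-case a string made only of letters and spaces ([a-zA-Z ]).

    A letter is upper-cased when it starts a word (it is the first
    character, or the character before it is a space); every other
    letter is lower-cased, which mirrors str.title() for this input.
    """
    out = []
    prev = " "  # treat the string as if it were preceded by a space
    for ch in s:
        if prev == " ":
            out.append(ch.upper())
        else:
            out.append(ch.lower())
        prev = ch
    return "".join(out)
```

For example, title_case(' Hello world') gives ' Hello World', the same as ' Hello world'.title(). Note that this rule only matches str.title() for the restricted input; str.title() also treats digits and punctuation as word boundaries.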

String Manipulation in Clojure

Python's string APIs (join, split, and strip, to name a few) are powerful and concise, which is an important reason I use Python for a lot of scripting these days. Since I've recently been learning Clojure, I'm wondering what string manipulation looks like in Clojure and how to implement equivalent functions. I think it's an excellent opportunity to get familiar with Clojure. Before diving into the implementation: how do you declare a multi-line string?

An Online Python re.findall Service

As a programmer, I know that grep, sed, and awk are powerful for processing text, but they sometimes aren't that straightforward for specific tasks, as I need to think about how to filter out the lines and columns I want. So I wondered whether there is a handier way to do these tasks. After using regex directly for a while, I think it can help, so I launched a re.
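To illustrate the idea (this is not the service from the post, whose interface isn't described here), a minimal sketch: with capture groups, Python's re.findall filters lines and extracts columns in one step. The sample text and pattern are hypothetical.

```python
import re

# Hypothetical sample input: a few lines of access-log-like text.
text = """\
2021-03-01 GET /index.html 200
2021-03-01 POST /login 302
2021-03-02 GET /about.html 404
"""

# With groups, re.findall returns one tuple per match, so a single
# pattern both filters the lines (only GET requests) and picks the
# columns (date and path). re.MULTILINE makes ^ and $ match per line.
pattern = re.compile(r"^(\S+) GET (\S+) \d+$", re.MULTILINE)
matches = pattern.findall(text)
for date, path in matches:
    print(date, path)
```

This is roughly what grep ' GET ' piped into awk '{print $1, $3}' would do, but the filter and the column selection live in one pattern.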

How To Run Bleeding-edge Qtile Within a Virtualenv

Having used GNOME for quite a long time, a few weeks ago I was considering trying some tiling window managers to see what they're like. Along the way, I found a nice window manager written in Python: Qtile. What interests me most is that it's a hackable window manager, which makes it easy to extend or change its behavior. Well, switching to a tiling window manager is far simpler than I thought.

Generating org-mode Outlines for wikiHow Articles

Recently I found some great articles on wikiHow and wanted to keep notes on them in org-mode files. At first, I manually copied the ToCs of the articles, but I soon found it tedious and time-consuming. Today I wrote a requests-based Python script to help me extract the ToCs (tables of contents) into org-mode outlines. It takes two arguments: the first is the URL, and the second is the level of the containing heading for the generated ToC in org-mode.
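The post's script fetches the page with requests; fetching and HTML parsing are omitted here because wikiHow's markup isn't described in this summary. Below is a minimal sketch of just the formatting step, assuming the ToC has already been extracted into hypothetical (depth, title) pairs and reading the second argument as the level of the containing heading, so top-level entries sit one level below it. to_org_outline is a hypothetical name.

```python
def to_org_outline(headings, base_level):
    """Render (depth, title) pairs as org-mode headings.

    headings: list of (depth, title) pairs, where depth 1 is a top-level
    ToC entry, 2 is a sub-entry, and so on (hypothetical input format).
    base_level: level of the org heading that will contain the ToC, so
    depth-1 entries become level base_level + 1 (an assumption).
    """
    lines = []
    for depth, title in headings:
        # An org-mode heading is N asterisks followed by a space.
        stars = "*" * (base_level + depth)
        lines.append(f"{stars} {title}")
    return "\n".join(lines)
```

For instance, with base_level 2, a top-level entry becomes a level-3 heading (***) and its sub-entries become level-4 headings (****), ready to paste under an existing level-2 heading.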

ER Diagrams in Plain Text

If you've ever wondered how to draw ER diagrams in plain text, you may have already heard of erd. It's a cool command-line program written in Haskell by Andrew Gallant that "compiles" plain-text files into nice-looking images, leveraging the power of GraphViz. I've used erd for some time; it's cool and the syntax is quite simple. It's also easy to install on Linux: just install GraphViz and erd itself by following the instructions in the README.