About unihan-etl
unihan-etl provides configurable, self-serve data exports of the UNIHAN database.
Retrieval
unihan-etl will download and cache the raw database files for the user.
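The download-and-cache behaviour can be pictured with a minimal Python 3 sketch. This is not unihan-etl's own implementation: fetch_cached, its downloader parameter, and the UNIHAN_URL constant are hypothetical names introduced here for illustration, and the URL is an assumption that should be checked against unicode.org.

```python
import os
from urllib.request import urlretrieve

# Assumed location of the raw archive; verify against unicode.org before use.
UNIHAN_URL = "https://www.unicode.org/Public/UNIDATA/Unihan.zip"


def fetch_cached(url, cache_dir, downloader=urlretrieve):
    """Return a local path for url, downloading only if no cached copy exists."""
    os.makedirs(cache_dir, exist_ok=True)
    dest = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.exists(dest):
        # downloader(url, dest) is expected to write the payload to dest.
        downloader(url, dest)
    return dest
```

Calling fetch_cached(UNIHAN_URL, "some/cache/dir") twice would only hit the network the first time; the injectable downloader makes the caching logic testable offline.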
No encoding headaches
Dealing with Unicode encodings can be cumbersome across platforms. unihan-etl handles the output encoding issues that could come up if you were to export the data yourself.
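To give a sense of the kind of issue involved, here is a small Python 3 sketch of exporting CJK data by hand. It is an illustration only, not how unihan-etl writes its output; the record contents and file name are made up for the example.

```python
import json
import os
import tempfile

# Illustrative record; not taken from an actual unihan-etl export.
record = {"char": "丘", "kDefinition": "hillock or mound"}

path = os.path.join(tempfile.mkdtemp(), "unihan_sample.json")

# Name the encoding explicitly instead of relying on the platform default,
# and keep non-ASCII characters readable rather than \u-escaped.
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps([record], ensure_ascii=False))

with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)
```

Without the explicit encoding argument, the same code can fail or produce different bytes depending on the platform's default locale encoding.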
Python 2 and 3
unihan-etl is designed and tested to work across Python 2 and 3. See the Travis CI test matrix for the exact versions it is tested against.
Customizable output
“Structured” output
JSON, YAML, and Python dict only.
Support for structured output of information in fields. unihan-etl refers to this as expansion.
Users can opt out via --no-expand. This will preserve the values in each field as they are in the raw database.
Filters out empty values by default; opt out via --no-prune.
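The ideas of expansion and pruning can be sketched roughly as follows. Both helpers are hypothetical and written for illustration: expand_hanyu_pinyin assumes a raw value shaped like a kHanyuPinyin entry (comma-separated locations, a colon, then comma-separated readings), and the output schema shown is an assumption, not necessarily the structure unihan-etl emits.

```python
def expand_hanyu_pinyin(raw):
    """Split 'locations:readings' entries into dicts (illustrative schema)."""
    entries = []
    for chunk in raw.split():
        locations, readings = chunk.split(":", 1)
        entries.append({
            "locations": locations.split(","),
            "readings": readings.split(","),
        })
    return entries


def prune_empty(row):
    """Drop keys whose value is None, an empty string, or an empty container."""
    return {k: v for k, v in row.items() if v not in (None, "", [], {})}


# A raw value in the shape of a kHanyuPinyin entry, plus an empty field.
row = {
    "kHanyuPinyin": expand_hanyu_pinyin("10297.260:qīn,qìn,qǐn"),
    "kCantonese": "",
}
pruned = prune_empty(row)
```

Skipping the expansion step would leave the string "10297.260:qīn,qìn,qǐn" untouched, which is the spirit of --no-expand; skipping prune_empty would keep the empty kCantonese key, which is the spirit of --no-prune.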
Filtering
Support for filtering by fields and files.
To specify which fields to output, use -f / --fields and separate them with spaces. Example: -f kDefinition kCantonese kHanyuPinyin.
For files, use -i / --input-files. Example: -i Unihan_DictionaryLikeData.txt Unihan_Readings.txt.
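Conceptually, field filtering amounts to keeping only the rows whose field name was requested. The raw Unihan text files are tab-separated (code point, field, value) with # comment lines, so a rough sketch, independent of unihan-etl's own code, might look like this. filter_rows is a hypothetical helper and the sample rows are for illustration.

```python
def filter_rows(lines, fields):
    """Yield (ucn, field, value) for rows whose field is in fields."""
    wanted = set(fields)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines in the raw files.
        if not line or line.startswith("#"):
            continue
        ucn, field, value = line.split("\t", 2)
        if field in wanted:
            yield ucn, field, value


# Sample rows in the raw tab-separated layout.
sample = [
    "# comment line",
    "U+3400\tkCantonese\tjau1",
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound",
    "U+3400\tkMandarin\tqiū",
]
selected = list(filter_rows(sample, ["kDefinition", "kCantonese"]))
```

Here the kMandarin row is dropped because it was not requested, much as passing -f kDefinition kCantonese limits the exported fields.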