How to use the command 'trdsql' (with examples)
trdsql
is a powerful command-line tool that allows users to execute SQL queries on structured data files like CSV, LTSV, JSON, YAML, and TBLN. With its capability to directly manipulate and query data within these file types, it eliminates the need for a separate database management system, thereby streamlining data processing tasks. Users can easily perform complex data transformations, aggregations, and exports using standard SQL syntax.
Convert object data from multiple JSON files to a CSV file with header and double quotes
Code:
trdsql -ocsv -oh "SELECT * FROM path/to/file/*.json" | sed 's/\([^,]*\)/"&"/g' > path/to/file.csv
Motivation:
Converting JSON files into CSV format is a common task when dealing with data analytics and reporting. CSV files are more manageable in applications like spreadsheets or BI tools, making it easier for users to visualize or further analyze the data. This command helps automate this conversion, especially when dealing with multiple JSON files at once.
Explanation:
-ocsv
: Specifies the output format as CSV. When working with data, CSV is widely supported and easily integrable with various software tools.-oh
: Indicates that the column headers from the original JSON should be included in the CSV output.SELECT * FROM path/to/file/*.json
: Applies SQL to select all data from every JSON file in the specified directory.| sed 's/\([^,]*\)/"&"/g'
: Piped command to include double quotes around each field in the output, ensuring that all data is encapsulated properly within the CSV format.> path/to/file.csv
: Directs the processed output to a new CSV file.
Example output:
Suppose we have multiple JSON files with key-value pairs like {"name": "Alice", "age": "25"}
, the resultant CSV would look like this:
"name","age"
"Alice","25"
"Bob","30"
...
Interpret JSON list as a table and put objects inside as columns
Code:
trdsql "SELECT * FROM path/to/file.json::.list"
Motivation:
Often, JSON data contains nested arrays or lists, making it challenging to visualize and analyze directly. By interpreting lists as tables, users can easily extract and work with nested content, flattening the data for analysis.
Explanation:
"SELECT * FROM path/to/file.json::.list"
: Here,::
is used to navigate to the specific JSON path inside the file to the keylist
and treat its contents as tabular data.
Example output:
If file.json
contains [{"age":"26","name":"Tanaka"}]
, the output will resemble:
age | name |
---|---|
26 | Tanaka |
Manipulate complex SQL query with data from multiple CSV files
Code:
trdsql -icsv -ih "SELECT column1, column2 FROM path/to/file*.csv WHERE column2 != '' ORDER BY column1 GROUP BY column1"
Motivation:
Analyzing data from multiple CSV files together can lead to more comprehensive insights, especially when cleaning up data by removing unwanted rows or aggregating specific columns.
Explanation:
-icsv
: Designates that the input data format is CSV.-ih
: Assumes the first line of each CSV file contains header information.SELECT column1, column2 ...
: Employs an SQL query to retrieve data from specified CSV columns, filter out emptycolumn2
entries, and organize the results bycolumn1
usingORDER BY
andGROUP BY
.
Example output:
Given CSV files containing sales records, this command might produce:
column1 | column2 |
---|---|
Product A | 100 |
Product B | 200 |
Merge content of 2 CSV files to one CSV file
Code:
trdsql "SELECT column1, column2 FROM path/to/file1.csv UNION SELECT column1, column2 FROM path/to/file2.csv"
Motivation:
When handling data from different sources, it’s often useful to merge them into a single dataset for unified processing or reporting purposes. This example illustrates how to concatenate two CSV files vertically based on selected columns.
Explanation:
SELECT column1, column2 FROM path/to/file1.csv UNION SELECT column1, column2 FROM path/to/file2.csv
: This unified query retrieves columns from both files and merges their data, de-duplicating entries.
Example output:
Combining two CSV files with unique records might result in:
column1 | column2 |
---|---|
Alpha | 123 |
Beta | 456 |
Connect to PostgreSQL database
Code:
trdsql -driver postgres -dsn "host=hostname port=5433 dbname=database_name" "SELECT 1"
Motivation:
This command demonstrates how to connect trdsql
to a PostgreSQL database. This is useful for executing trdsql
operations on live database data as opposed to static files. Users in data engineering or development can benefit from this capability by integrating and comparing file-based data directly with database tables.
Explanation:
-driver postgres
: Specifies PostgreSQL as the database driver for connection.-dsn "host=hostname port=5433 dbname=database_name"
: Provides connection details to connecttrdsql
to the PostgreSQL database."SELECT 1"
: A simple query often used to test database connection; expecting a result of1
.
Example output:
On successful connection, the output of the command will be 1
, indicating connectivity:
1
Create table data to MySQL database from CSV file
Code:
trdsql -driver mysql -dsn "username:password@hostname/database" -ih "CREATE TABLE table (column1 int, column2 varchar(20)) AS SELECT column3 AS column1, column2 FROM path/to/header_file.csv"
Motivation:
Automating table creation in a database from a CSV file is beneficial for data migration processes where raw data from files needs to be uploaded into database systems quickly.
Explanation:
-driver mysql
: Specifies MySQL as the target database system.-dsn "username:password@hostname/database"
: Contains the necessary credentials and connection details for MySQL.-ih
: Indicates that the input file’s first row contains headers.CREATE TABLE table ...
: An SQL command to construct a new table with the selected columns and types in the specified database, populated by data extracted from the CSV.
Example output:
Execution results in a new table within the MySQL database, possibly with rows like:
column1 | column2 |
---|---|
101 | Sample Text |
Show data from compressed log files
Code:
trdsql -iltsv "SELECT * FROM path/to/access.log.gz"
Motivation:
Compressed log files are common in maintaining space efficiency. By leveraging trdsql
, users can query and analyze LTSV-formatted log files without having to decompress them first, aiding in rapid data diagnostics or monitoring tasks.
Explanation:
-iltsv
: Specifies input format as LTSV (Labeled Tab-separated Values)."SELECT * FROM path/to/access.log.gz"
: Queries and displays all contents of the specified compressed log file.
Example output:
If access.log.gz
includes logs like path=/index, code=200, duration=0.1ms
, it might produce:
path | code | duration |
---|---|---|
/index | 200 | 0.1ms |
Conclusion:
The trdsql
command is a flexible utility for SQL operations on various data types and formats. It bridges the gap between data stored in flat files and sophisticated SQL manipulations while also enabling transformations between data environments.