How to use the command 'trdsql' (with examples)

How to use the command 'trdsql' (with examples)

trdsql is a powerful command-line tool that allows users to execute SQL queries on structured data files like CSV, LTSV, JSON, YAML, and TBLN. With its capability to directly manipulate and query data within these file types, it eliminates the need for a separate database management system, thereby streamlining data processing tasks. Users can easily perform complex data transformations, aggregations, and exports using standard SQL syntax.

Convert object data from multiple JSON files to a CSV file with header and double quotes

Code:

trdsql -ocsv -oh "SELECT * FROM path/to/file/*.json" | sed 's/\([^,]*\)/"&"/g' > path/to/file.csv

Motivation:
Converting JSON files into CSV format is a common task when dealing with data analytics and reporting. CSV files are more manageable in applications like spreadsheets or BI tools, making it easier for users to visualize or further analyze the data. This command helps automate this conversion, especially when dealing with multiple JSON files at once.

Explanation:

  • -ocsv: Specifies the output format as CSV. When working with data, CSV is widely supported and easily integrable with various software tools.
  • -oh: Indicates that the column headers from the original JSON should be included in the CSV output.
  • SELECT * FROM path/to/file/*.json: Applies SQL to select all data from every JSON file in the specified directory.
  • | sed 's/\([^,]*\)/"&"/g': Piped command to include double quotes around each field in the output, ensuring that all data is encapsulated properly within the CSV format.
  • > path/to/file.csv: Directs the processed output to a new CSV file.

Example output:
Suppose we have multiple JSON files with key-value pairs like {"name": "Alice", "age": "25"}, the resultant CSV would look like this:

"name","age"
"Alice","25"
"Bob","30"
...

Interpret JSON list as a table and put objects inside as columns

Code:

trdsql "SELECT * FROM path/to/file.json::.list"

Motivation:
Often, JSON data contains nested arrays or lists, making it challenging to visualize and analyze directly. By interpreting lists as tables, users can easily extract and work with nested content, flattening the data for analysis.

Explanation:

  • "SELECT * FROM path/to/file.json::.list": Here, :: is used to navigate to the specific JSON path inside the file to the key list and treat its contents as tabular data.

Example output:
If file.json contains [{"age":"26","name":"Tanaka"}], the output will resemble:

agename
26Tanaka

Manipulate complex SQL query with data from multiple CSV files

Code:

trdsql -icsv -ih "SELECT column1, column2 FROM path/to/file*.csv WHERE column2 != '' ORDER BY column1 GROUP BY column1"

Motivation:
Analyzing data from multiple CSV files together can lead to more comprehensive insights, especially when cleaning up data by removing unwanted rows or aggregating specific columns.

Explanation:

  • -icsv: Designates that the input data format is CSV.
  • -ih: Assumes the first line of each CSV file contains header information.
  • SELECT column1, column2 ...: Employs an SQL query to retrieve data from specified CSV columns, filter out empty column2 entries, and organize the results by column1 using ORDER BY and GROUP BY.

Example output:
Given CSV files containing sales records, this command might produce:

column1column2
Product A100
Product B200

Merge content of 2 CSV files to one CSV file

Code:

trdsql "SELECT column1, column2 FROM path/to/file1.csv UNION SELECT column1, column2 FROM path/to/file2.csv"

Motivation:
When handling data from different sources, it’s often useful to merge them into a single dataset for unified processing or reporting purposes. This example illustrates how to concatenate two CSV files vertically based on selected columns.

Explanation:

  • SELECT column1, column2 FROM path/to/file1.csv UNION SELECT column1, column2 FROM path/to/file2.csv: This unified query retrieves columns from both files and merges their data, de-duplicating entries.

Example output:
Combining two CSV files with unique records might result in:

column1column2
Alpha123
Beta456

Connect to PostgreSQL database

Code:

trdsql -driver postgres -dsn "host=hostname port=5433 dbname=database_name" "SELECT 1"

Motivation:
This command demonstrates how to connect trdsql to a PostgreSQL database. This is useful for executing trdsql operations on live database data as opposed to static files. Users in data engineering or development can benefit from this capability by integrating and comparing file-based data directly with database tables.

Explanation:

  • -driver postgres: Specifies PostgreSQL as the database driver for connection.
  • -dsn "host=hostname port=5433 dbname=database_name": Provides connection details to connect trdsql to the PostgreSQL database.
  • "SELECT 1": A simple query often used to test database connection; expecting a result of 1.

Example output:
On successful connection, the output of the command will be 1, indicating connectivity:

1

Create table data to MySQL database from CSV file

Code:

trdsql -driver mysql -dsn "username:password@hostname/database" -ih "CREATE TABLE table (column1 int, column2 varchar(20)) AS SELECT column3 AS column1, column2 FROM path/to/header_file.csv"

Motivation:
Automating table creation in a database from a CSV file is beneficial for data migration processes where raw data from files needs to be uploaded into database systems quickly.

Explanation:

  • -driver mysql: Specifies MySQL as the target database system.
  • -dsn "username:password@hostname/database": Contains the necessary credentials and connection details for MySQL.
  • -ih: Indicates that the input file’s first row contains headers.
  • CREATE TABLE table ...: An SQL command to construct a new table with the selected columns and types in the specified database, populated by data extracted from the CSV.

Example output:
Execution results in a new table within the MySQL database, possibly with rows like:

column1column2
101Sample Text

Show data from compressed log files

Code:

trdsql -iltsv "SELECT * FROM path/to/access.log.gz"

Motivation:
Compressed log files are common in maintaining space efficiency. By leveraging trdsql, users can query and analyze LTSV-formatted log files without having to decompress them first, aiding in rapid data diagnostics or monitoring tasks.

Explanation:

  • -iltsv: Specifies input format as LTSV (Labeled Tab-separated Values).
  • "SELECT * FROM path/to/access.log.gz": Queries and displays all contents of the specified compressed log file.

Example output:
If access.log.gz includes logs like path=/index, code=200, duration=0.1ms, it might produce:

pathcodeduration
/index2000.1ms

Conclusion:

The trdsql command is a flexible utility for SQL operations on various data types and formats. It bridges the gap between data stored in flat files and sophisticated SQL manipulations while also enabling transformations between data environments.

Related Posts

How to Use the Command 'tlmgr install' (with Examples)

How to Use the Command 'tlmgr install' (with Examples)

The tlmgr install command is an essential tool for managing packages in TeX Live, a distribution of TeX software.

Read More
Using the Command 'pstopnm' to Convert PostScript to PNM Images (with examples)

Using the Command 'pstopnm' to Convert PostScript to PNM Images (with examples)

The pstopnm command is a utility from the Netpbm library used to convert PostScript (.

Read More
How to Use the Command `pkgutil` (with Examples)

How to Use the Command `pkgutil` (with Examples)

pkgutil is a versatile command-line tool available in macOS that allows users to query and manipulate installer packages and receipts.

Read More