Tax files are indeed compressed files (with the Unix compress calssique) containing records of type text whose decomposition is given in their first line. For example:
# TicketVersion, 1.5 L, CalledNumber, 6.31, L, ChargedNumber, 32.51, L ,....
- ’#’ Begins a line description format - the first field is called "Ticket version" - it starts in column 1 and ends in column 5 - the type is L (the types are ’L’ text field , ’N’ text field, and ’R’, integer) - the second field is "CalledNumber" - etc. ...
This definition can change from one software version to another version that is also consistently reiterated in the "TicketVersion. In the version I used for my tests (4.8, 4.9 and 5.1), the fields are:
TABLE MISSING A note from one version to another type and size of fields may change!
Our import procedure will operate as follows:
1 / We will create a blank database that will serve as a container for data
2 / Then we’ll create in the database tables necessary for the proper functioning of the application:
- A temporary table "spooledfiles" which will contain the list of available files (which will load only the new files from the spool directory);
- A meta table (meta_columndesc describing the data table (from one version to another format is succeptible to change, so we will keep a list of documents to determine if the format already exists and if it does not If we dynamically adjust to a new table);
- An early taxdata table containing the minimum information necessary to operate the system: a unique identifier of the ticket and the identifier of the file that contained it.
Nb: possible changes in the system, it should manage the concepts of profile (which can do what), type of analysis with a predefined performance parameters, etc. ... For now our goal is simply to load these files from taxation in an RDBMS, then provide some examples of treatments in pure SQL. The interface clickodromesque come later ;-). Thus be created:
- Tables describing their treatment and trigger events;
- Related tables storing results of said treatment.
3 / The import procedure will calculate the list of files to import and then load them if necessary, adapting the storage table (addition / conversion of columns).
Note: to enable a more efficient data will be reworked as follows:
- The "leading spaces" and "trailing spaces" are deleted;
- Dates (StartDateTime and endDateTime) will be converted to timestamp.