Decoding principle of taxation files - Paul Courbis



Please read this...

You'll fin below this text a semi-automated translation of the original website. Texts will be gradually reviewed to make them as accurate as possible but it's strongly recommanded to read the original page (preview on the right side of this text) :

Link to the original article

(if this link is broken, please go to the original website's root page and look for wanted data. Sorry about that).

Some parts of this website will never be translated to English. Part of them are user's comments that are not transfered from the French (original) version to this version. Obvously, you can leave your own comments here but they wont be transfered to the French version.

You can send comments or suggestions to the webmaster.

Paul Courbis

Decoding principle of taxation files

Thursday 4 January 2007, by Paul Courbis

We come the files to recover is stored in a binary format. You will find below how to transform them into a readable version and explotation ...

Tax files are indeed compressed files (with the Unix compress calssique) containing records of type text whose decomposition is given in their first line. For example:

 # TicketVersion, 1.5 L, CalledNumber, 6.31, L, ChargedNumber, 32.51, L ,....


- ’#’ Begins a line description format - the first field is called "Ticket version" - it starts in column 1 and ends in column 5 - the type is L (the types are ’L’ text field , ’N’ text field, and ’R’, integer) - the second field is "CalledNumber" - etc. ...

This definition can change from one software version to another version that is also consistently reiterated in the "TicketVersion. In the version I used for my tests (4.8, 4.9 and 5.1), the fields are:

TABLE MISSING A note from one version to another type and size of fields may change!

Our import procedure will operate as follows:

1 / We will create a blank database that will serve as a container for data

2 / Then we’ll create in the database tables necessary for the proper functioning of the application:

- A temporary table "spooledfiles" which will contain the list of available files (which will load only the new files from the spool directory);

- A meta table (meta_columndesc describing the data table (from one version to another format is succeptible to change, so we will keep a list of documents to determine if the format already exists and if it does not If we dynamically adjust to a new table);

- An early taxdata table containing the minimum information necessary to operate the system: a unique identifier of the ticket and the identifier of the file that contained it.

Nb: possible changes in the system, it should manage the concepts of profile (which can do what), type of analysis with a predefined performance parameters, etc. ... For now our goal is simply to load these files from taxation in an RDBMS, then provide some examples of treatments in pure SQL. The interface clickodromesque come later ;-). Thus be created:

- Tables describing their treatment and trigger events;

- Related tables storing results of said treatment.

3 / The import procedure will calculate the list of files to import and then load them if necessary, adapting the storage table (addition / conversion of columns).

Note: to enable a more efficient data will be reworked as follows:

- The "leading spaces" and "trailing spaces" are deleted;

- Dates (StartDateTime and endDateTime) will be converted to timestamp.

Read more ...

Comment on this article