# AIP - Specification

After passing the ingest workflow of the ContentBroker, a SIP is converted
to a new structure which contains the original data from the SIP as well as
data automatically generated by the system.
The whole content, meaning the user-generated content as well as the content
generated by the system, is repacked into a single container and then written
to long-term archival storage media. Such a container, which corresponds to the AIP
in OAIS terms, could look like this, for example:

    [oid].pack_1.tar
        [oid].pack_1/
        	bag-info.txt
        	bagit.txt
        	manifest-md5.txt
        	tagmanifest-md5.txt
        	data/
        		2014_01_03+12_15_12+a/premis.xml
        		2014_01_03+12_15_12+a/abc.jpg
        		2014_01_03+12_15_12+a/cde.tif
        		2014_01_03+12_15_12+a/subfolder/fgh.jpg
        		2014_01_03+12_15_12+b/premis.xml
        		2014_01_03+12_15_12+b/abc.tif
        		2014_01_03+12_15_12+b/subfolder/fgh.tif
        		
Several details are worth noting here:

#### oid, package name, bagit, tar

#### Representations -- Restructuring contents with representations

Each package contains exactly two representations. This also applies to delta packages, which logically belong
to the same object but are physically separated for now. The more complex case in which different packages are
consolidated is covered in another section of this document (Merge Deltas). The name of each representation starts
with an encoding of the current date and time, so that the representations can be ordered alphabetically, which
makes them easier to work with (see also the Packages in WorkArea section, TODO). In addition, the first
representation is always suffixed with +a, while the second one is always suffixed with +b. Hence they are also
called the a-representation and b-representation, or a-rep and b-rep for short.
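
The naming scheme can be illustrated with a minimal Python sketch. It is not part of the ContentBroker; the function `parse_rep_name` and the type `RepName` are hypothetical names chosen for this example, and the timestamp pattern is inferred from the folder names shown in the listings of this document.

```python
from datetime import datetime
from typing import NamedTuple


class RepName(NamedTuple):
    """Hypothetical helper type: timestamp and kind ("a" or "b") of a representation."""
    created: datetime
    kind: str


def parse_rep_name(name: str) -> RepName:
    """Split a folder name such as '2014_01_03+12_15_12+a' into its parts."""
    stamp, sep, kind = name.rpartition("+")
    if not sep or kind not in ("a", "b"):
        raise ValueError(f"not a representation name: {name!r}")
    # Pattern inferred from the example listings: year_month_day+hour_minute_second
    return RepName(datetime.strptime(stamp, "%Y_%m_%d+%H_%M_%S"), kind)


names = [
    "2073_10_10+15_20_37+b",
    "2014_01_03+12_15_12+b",
    "2073_10_10+15_20_37+a",
    "2014_01_03+12_15_12+a",
]

# Plain alphabetical sorting of the folder names ...
ordered = sorted(names)
# ... yields the same order as sorting by parsed timestamp and a/b kind.
by_time = sorted(names, key=parse_rep_name)
```

Because every field of the timestamp is zero-padded and ordered from year down to second, sorting the names as plain strings is sufficient here; the parsing step is only needed when the date itself is of interest.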

The a-rep always contains the original contents delivered by the user. This means that all contents of the data
folder of the original SIP are simply moved to the a-rep, preserving their original hierarchy of subfolders
if the original data set has one. No modifications are made to any of the original files. This ensures that you can
always go back to the unaltered contents in case any of the modifications made while processing the data turn out
to be destructive in any way.

The b-rep then contains modified versions of the original files. In certain cases (TODO - rdf) it can even contain
new, entirely system-generated files.

Here is an explanation of each of the newly generated files:

        		2014_01_03+12_15_12+b/premis.xml
        		
A PREMIS file is regenerated every time a package passes one of the ContentBroker workflows which lead
to a new AIP, that is, every time an alteration (more precisely: an addition) is made to the material destined for
long-term preservation. It contains the events explaining the changes and the object history, which the system
collects from different sources (file system, database, original PREMIS files).
        		
        		2014_01_03+12_15_12+b/abc.tif
        		2014_01_03+12_15_12+b/subfolder/fgh.tif 
        		
These two files are created by the format module of the ContentBroker. For each of the JPEG files of the a-rep, the
ContentBroker has run an ImageMagick conversion to create a copy in the TIFF format. Note that the hierarchy
of the files is preserved.
        		
Note, however, that for one original file there is no new version in the b-rep, namely
        		
        		2014_01_03+12_15_12+a/cde.tif
        		
Here the ContentBroker decided not to run any conversion to create a new datastream from the original file. This
follows from the rules and policies the system is configured with. Since TIFF is considered ready for long-term
preservation, a conversion is not necessary.

This leads to an important point about representations: a representation does not necessarily have to be complete
in the sense that every file has a successor in the b-rep. To understand the effects of this fact, see the
[DIP Specification](./specification_dip.md). For the effects this might have on your already existing metadata,
see the [Metadata Specification](./specification_metadata.de.md).
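
The following Python sketch shows one way to find the files of an a-rep that have no successor in the b-rep. It is an illustration only, not ContentBroker code: the function name `without_successor` is hypothetical, and it assumes that a successor keeps the relative path of its original and differs only in the file extension, as in the example listing above.

```python
from pathlib import PurePosixPath


def without_successor(data_paths):
    """Return the a-rep files that have no counterpart in the b-rep.

    Assumption (not confirmed by the specification): a successor has the same
    relative path as its original, apart from the file extension.
    """
    reps = {"a": {}, "b": {}}
    for p in data_paths:
        path = PurePosixPath(p)
        rep_dir = path.parts[0]
        rel = PurePosixPath(*path.parts[1:])
        if rel.name == "premis.xml":
            continue  # PREMIS metadata is present in both representations
        kind = rep_dir.rsplit("+", 1)[-1]
        if kind not in reps:
            continue  # ignore folders that are not a- or b-representations
        reps[kind][str(rel.with_suffix(""))] = p
    return sorted(path for key, path in reps["a"].items() if key not in reps["b"])


# Paths relative to data/, taken from the example container above.
example = [
    "2014_01_03+12_15_12+a/premis.xml",
    "2014_01_03+12_15_12+a/abc.jpg",
    "2014_01_03+12_15_12+a/cde.tif",
    "2014_01_03+12_15_12+a/subfolder/fgh.jpg",
    "2014_01_03+12_15_12+b/premis.xml",
    "2014_01_03+12_15_12+b/abc.tif",
    "2014_01_03+12_15_12+b/subfolder/fgh.tif",
]

missing = without_successor(example)
```

For the example container, `missing` contains only `2014_01_03+12_15_12+a/cde.tif`, the TIFF file that was not converted.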


#### premis.xml

#### preserved folder structure        		
        		


### AIP with additions (Deltas)

    [oid].pack_2.tar
        [oid].pack_2/
        	bag-info.txt
        	bagit.txt
        	manifest-md5.txt
        	tagmanifest-md5.txt
        	data/
        		2073_10_10+15_20_37+a/premis.xml
        		2073_10_10+15_20_37+a/abc.[fmt_a]
        		2073_10_10+15_20_37+a/cde.[fmt_b]
        		2073_10_10+15_20_37+a/subfolder/fgh.[fmt_c]
        		2073_10_10+15_20_37+b/premis.xml
        		2073_10_10+15_20_37+b/abc.[fmt_c]
        		2073_10_10+15_20_37+b/subfolder/fgh.[fmt_c]

### Packages in WorkArea



### Merge Deltas

Though not yet implemented, there are plans to let the system automatically
consolidate package contents into a new package at certain intervals, so that
all the data constituting an object is stored in a single place on
the storage media.

    [oid].pack_3.tar
        [oid].pack_3/
        	bag-info.txt
        	bagit.txt
        	manifest-md5.txt
        	tagmanifest-md5.txt
        	data/
        		2014_01_03+12_15_13+a/premis.xml
        		2014_01_03+12_15_13+a/abc.[fmt_a]
        		2014_01_03+12_15_13+a/cde.[fmt_b]
        		2014_01_03+12_15_13+a/subfolder/fgh.[fmt_c]
        		2014_01_03+12_15_13+b/premis.xml
        		2014_01_03+12_15_13+b/abc.[fmt_c]
        		2014_01_03+12_15_13+b/subfolder/fgh.[fmt_c]
        		2073_10_10+15_20_13+a/premis.xml
        		2073_10_10+15_20_13+a/abc.[fmt_a]
        		2073_10_10+15_20_13+a/cde.[fmt_b]
        		2073_10_10+15_20_13+a/subfolder/fgh.[fmt_c]
        		2073_10_10+15_20_13+b/premis.xml
        		2073_10_10+15_20_13+b/abc.[fmt_c]
        		2073_10_10+15_20_13+b/subfolder/fgh.[fmt_c]
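
Since the merge is not implemented yet, the following Python sketch only illustrates the idea of consolidating the `data/` listings of several packages into one. The function `merge_data_listings` is hypothetical, and the sketch assumes that representation folders are carried over unchanged and that the same path must not occur in two packages; the actual merge may work differently.

```python
def merge_data_listings(packages):
    """Combine the data/ paths of several packages into one sorted listing.

    `packages` maps a package name to the list of its paths relative to data/.
    Assumption: a path occurring in more than one package is an error.
    """
    origin = {}
    for pack_name, paths in packages.items():
        for path in paths:
            if path in origin:
                raise ValueError(
                    f"{path} occurs in both {origin[path]} and {pack_name}"
                )
            origin[path] = pack_name
    # Alphabetical order groups the files by representation folder.
    return sorted(origin)


merged = merge_data_listings({
    "pack_2": ["2073_10_10+15_20_37+a/premis.xml", "2073_10_10+15_20_37+b/premis.xml"],
    "pack_1": ["2014_01_03+12_15_12+a/premis.xml", "2014_01_03+12_15_12+b/premis.xml"],
})
```

Because the representation names start with the timestamp, sorting the merged listing places the contents of the older package before those of the delta.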
        		
### State of AIP

DNSCore introduces several object states, which are represented by numeric codes and shown visually in the DA-WEB user interface.

1. archived / valid
2. invalid (during initial creation)
3. working state (receiving a delta or undergoing an integrity check)


        		


        	
