# Simantics XML-Schema Conversion version 0.1

The bundles that implement this functionality are:
* org.simantics.xml.sax.base
* org.simantics.xml.sax
* org.simantics.xml.sax.feature
* org.simantics.xml.sax.ontology
* org.simantics.xml.sax.ui

They are located in the [simantics/interop](https://gitlab.simantics.org/simantics/interop) project.

## Summary

Simantics XML-Schema conversion creates:
* A Simantics ontology as a .pgraph file
* Java classes for a SAX-based parser

Schema conversion does not support:
* XML file export
* Many of the XML Schema definitions
* Group definitions
* Attribute constraining facets
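
To make the "SAX-based parser" idea concrete, the following is a minimal sketch of event-driven parsing with the standard Java SAX API, keeping a stack of parent elements while the document is streamed. It is not the code that the conversion generates: the class `SaxParentStackSketch` and its `parse` helper are hypothetical names invented for this illustration, and element names are recorded as plain strings instead of being written to a Simantics database.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not generated code: records "parent/child" pairs
// in document order while SAX streams the XML.
final class SaxParentStackSketch extends DefaultHandler {
    private final Deque<String> parents = new ArrayDeque<>();
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // The element on top of the stack is the parent of the element being opened.
        String parent = parents.peek();
        paths.add(parent == null ? qName : parent + "/" + qName);
        parents.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        parents.pop();
    }

    static List<String> parse(String xml) {
        SaxParentStackSketch handler = new SaxParentStackSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return handler.paths;
    }

    public static void main(String[] args) {
        System.out.println(parse("<PlantModel><Equipment ID=\"E1\"><Extent/></Equipment></PlantModel>"));
    }
}
```

A stack of parents is a natural fit for SAX, because the API reports only start and end events and keeps no tree of its own.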

## Notes

This work was done in the PDS Integration project in co-operation with VTT and Fortum. Schema conversion was used for converting the Proteus 3.6.0 XML Schema to a Simantics ontology. Because of the limited scope of that schema, the converter supports only a limited part of the XML Schema definitions.

# Ontology definitions based on XML schema concepts

XML Schema conversion creates the following types and relations, which are based on the XML Schema concepts used in the conversion:

| Hard-coded ontology definition               | Notes |
|----------------------------------------------|-------|
| hasAttribute <R L0.HasPropertyBase           | Relation for all element/attribute relations |
| hasID <R hasAttribute                        | Base relation for IDs (Attributes with xsd:ID type) |
| ComplexType <T L0.Entity                     | Base type for ComplexTypes |
| hasComplexType <R L0.IsComposedOf            | Base relation for containing elements that inherit the specified ComplexType |
| AttributeGroup <T L0.Entity                  | Base type for AttributeGroups |
| Element <T L0.Entity                         | Base type for Elements |
| hasElement <R L0.IsComposedOf                | Base relation for containing elements |
| ElementList <T L0.List                       | Base type for Lists containing Elements (storing the order of the elements) |
| hasElementList <R L0.IsComposedOf            | Base relation for elements that contain element lists. Used for creating Element type specific lists. |
| hasOriginalElementList <R hasElementList     | Relation for elements that contain element lists. Stores the order of all the child elements. |
| hasReference <R L0.IsRelatedTo               | Base relation for object references (converted ID references) |
| hasExternalReference <R L0.IsWeaklyRelatedTo | Relation for references between data imported from different files. Note: external references must be created with post-process functions, since schema conversion itself is not able to resolve references between different imports. |

# Datatypes

XML Schema defines three varieties of attribute datatypes: atomic, list, and union. The current XML Schema conversion supports only atomic datatypes.

| XML datatype | Simantics     | Notes |
|--------------|---------------|-------|
| Atomic       | Supported     |       |
| List         | Not supported |       |
| Union        | Not supported |       |

Primitive attributes are converted to Layer0 literals. The primitive datatypes and their respective literal types are:

| XML datatype | Simantics  | Notes |
|--------------|------------|-------|
| string       | L0.String  |       |
| boolean      | L0.Boolean |       |
| decimal      | L0.Double  |       |
| float        | L0.Float   |       |
| double       | L0.Double  |       |
| duration     |            |       |
| dateTime     |            |       |
| time         | L0.String  |       |
| date         | L0.String  |       |
| gYearMonth   |            |       |
| gYear        |            |       |
| gMonthDay    |            |       |
| gDay         |            |       |
| gMonth       |            |       |
| hexBinary    |            |       |
| base64Binary |            |       |
| anyURI       | L0.Uri     |       |
| QName        |            |       |
| NOTATION     |            |       |

Other built-in datatypes are converted to Layer0 literal types as well:

| XML datatype       | Simantics  | Notes |
|--------------------|------------|-------|
| normalizedString   | L0.String  |       |
| token              | L0.String  |       |
| language           |            |       |
| NMTOKEN            | L0.String  |       |
| Name               |            |       |
| NCName             |            |       |
| ID                 | L0.String  | ID attributes use XML.hasID property relation. An element is expected to have only one attribute with ID type. |
| IDREF              | L0.String  |       |
| IDREFS             |            |       |
| ENTITY             |            |       |
| ENTITIES           |            |       |
| integer            | L0.Integer |       |
| nonPositiveInteger | L0.Integer |       |
| negativeInteger    | L0.Integer |       |
| long               | L0.Long    |       |
| int                | L0.Integer |       |
| short              | L0.Integer |       |
| byte               | L0.Byte    |       |
| nonNegativeInteger | L0.Integer |       |
| unsignedLong       | L0.Long    |       |
| unsignedShort      | L0.Integer |       |
| unsignedByte       | L0.Byte    |       |
| positiveInteger    | L0.Integer |       |
| yearMonthDuration  |            |       |
| dayTimeDuration    |            |       |
| dateTimeStamp      |            |       |
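
The two datatype tables above can be read as a plain lookup from an XSD datatype name to a Layer0 literal type. The Java sketch below illustrates that reading under stated assumptions: `DatatypeMappingSketch` and `literalTypeFor` are hypothetical names that do not come from the org.simantics.xml.sax bundles, only a subset of the rows is copied, and datatypes whose Simantics cell is empty in the tables are simply left out of the map.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the tables above; not the converter's own code.
final class DatatypeMappingSketch {
    // A subset of the rows that have a Layer0 literal type in the tables.
    private static final Map<String, String> LITERALS = Map.ofEntries(
            Map.entry("string", "L0.String"),
            Map.entry("boolean", "L0.Boolean"),
            Map.entry("decimal", "L0.Double"),
            Map.entry("float", "L0.Float"),
            Map.entry("double", "L0.Double"),
            Map.entry("NMTOKEN", "L0.String"),
            Map.entry("ID", "L0.String"),
            Map.entry("integer", "L0.Integer"),
            Map.entry("long", "L0.Long"),
            Map.entry("byte", "L0.Byte"));

    // Accepts names with or without the "xsd:" prefix used in the schema examples.
    static Optional<String> literalTypeFor(String xsdType) {
        String local = xsdType.startsWith("xsd:") ? xsdType.substring(4) : xsdType;
        return Optional.ofNullable(LITERALS.get(local));
    }

    public static void main(String[] args) {
        System.out.println(literalTypeFor("xsd:decimal"));
        System.out.println(literalTypeFor("duration"));
    }
}
```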

XML Schema allows defining new attribute types with constraining facets. Constraining facets are not currently supported.

| XML constraining facet | Simantics | Notes |
|------------------------|-----------|-------|
| length                 |           |       |
| minLength              |           |       |
| maxLength              |           |       |
| pattern                |           |       |
| enumeration            |           |       |
| whiteSpace             |           |       |
| maxInclusive           |           |       |
| maxExclusive           |           |       |
| minInclusive           |           |       |
| minExclusive           |           |       |
| totalDigits            |           |       |
| fractionDigits         |           |       |
| assertions             |           |       |
| explicitTimeZone       |           |       |

In addition, individual attributes can be combined into a single array with the Attribute Composition rule. The supported array datatypes are:

| Conversion configuration  | Simantics      |
|---------------------------|----------------|
| doubleArray               | L0.DoubleArray |
| stringArray               | L0.StringArray |

# Structures

## Type definitions

### SimpleType

XML Schema allows SimpleTypes to be used as Element types for elements without child elements, or as attribute types.

When a simpleType is used as an attribute type, the type is converted to a functional property relation:

```xml
<xsd:simpleType name="LengthUnitsType">
  <xsd:restriction base="xsd:NMTOKEN">
    <xsd:enumeration value="mm"/>
    …
  </xsd:restriction>
</xsd:simpleType>

<xsd:element name="UnitsOfMeasure">
  <xsd:annotation>
    <xsd:documentation>These are from …</xsd:documentation>
  </xsd:annotation>
  <xsd:complexType>
    <xsd:attribute name="Distance" type="LengthUnitsType" default="Millimetre">
    </xsd:attribute>
    …
  </xsd:complexType>
</xsd:element>
```

```
PRO.hasLengthUnitsType <R PRO.XML.hasAttribute : L0.FunctionalRelation
   --> L0.String

PRO.hasUnitsOfMeasure <R PRO.XML.hasElement
PRO.hasUnitsOfMeasureList <R PRO.XML.hasElementList
PRO.UnitsOfMeasure <T PRO.XML.Element
PRO.UnitsOfMeasure.hasDistance <R PRO.XML.hasAttribute: L0.FunctionalRelation
   <R PRO.hasLengthUnitsType
```

When a simpleType is used as the definition of an Element, the definition is converted to inheritance from the base literal type. In the following example, the xsd:double base of the Knot element is converted to inheritance from L0.Double:

```xml
<xsd:element name="Knot" maxOccurs="unbounded">
  <xsd:simpleType>
    <xsd:restriction base="xsd:double">
      <xsd:minInclusive value="0.0"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
```

```
PRO.ComplexTypes.hasKnots.Knot <R PRO.XML.hasElement
PRO.ComplexTypes.hasKnots.KnotList <R PRO.XML.hasElementList
PRO.ComplexTypes.Knots.Knot <T PRO.XML.Element <T L0.Double
```

### ComplexType

Schema conversion creates a hard-coded ComplexType entity as a base type for ComplexTypes.

```
PRO.XML.ComplexType <T L0.Entity
```

ComplexTypes that are defined in the input schema are converted to L0.Entities, which inherit the hard-coded ComplexType and are put into the “ComplexTypes” library. The conversion also generates a ComplexType-specific generic relation for composition, and another relation for lists.

Particles of a ComplexType are converted to ComplexType- and particle-specific relations that inherit the relation of the particle type. Likewise, Attributes of the ComplexType are converted to ComplexType- and Attribute-specific relations.

For example, ComplexType “PlantItem” is converted to the “ComplexTypes.PlantItem” entity, which has a “ComplexTypes.hasPlantItem” composition relation and a “ComplexTypes.hasPlantItemList” relation for lists. The “ID” attribute is converted to the “ComplexTypes.PlantItem.hasID” functional relation, and the choice particle “Presentation” is converted to the “ComplexTypes.PlantItem.hasPresentation” relation.

```xml
<xsd:complexType name="PlantItem">
  <xsd:choice minOccurs="0" maxOccurs="unbounded">
    <xsd:element ref="Presentation"/>
    <xsd:element ref="Extent"/>
    …
    <xsd:element name="ModelNumber" type="xsd:string"/>
    …
  </xsd:choice>
  <xsd:attribute name="ID" type="xsd:ID" use="required"/>
  <xsd:attribute name="TagName" type="xsd:string"/>
  …
  <xsd:attribute name="ComponentType">
    <xsd:simpleType>
      <xsd:restriction base="xsd:NMTOKEN">
        <xsd:enumeration value="Normal"/>
        <xsd:enumeration value="Explicit"/>
        <xsd:enumeration value="Parametric"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:attribute>
  …
</xsd:complexType>
```

```
PRO.ComplexTypes.PlantItem <T PRO.XML.ComplexType
PRO.ComplexTypes.hasPlantItem <R PRO.XML.hasComplexType
PRO.ComplexTypes.hasPlantItemList <R PRO.XML.hasElementList
   --> PRO.ComplexTypes.PlantItem
PRO.ComplexTypes.PlantItem.hasPresentation <R PRO.hasPresentation
   --> PRO.Presentation
PRO.ComplexTypes.PlantItem.hasExtent <R PRO.hasExtent
   --> PRO.Extent
…
PRO.ComplexTypes.PlantItem.hasID <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.String
…
PRO.ComplexTypes.PlantItem.hasComponentType <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.String
```
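
The naming pattern in the PlantItem example can be summarised as a handful of string rules. The Java sketch below only restates that pattern as observed in the example; `ComplexTypeNamesSketch` and its methods are hypothetical names for illustration, and the real converter may apply further rules (for instance for name clashes) that are not described here.

```java
// Hypothetical illustration of the naming pattern seen in the PlantItem example.
final class ComplexTypeNamesSketch {
    // Entity created for a ComplexType, placed in the "ComplexTypes" library.
    static String entity(String type) {
        return "ComplexTypes." + type;
    }

    // Generic composition relation for the ComplexType.
    static String compositionRelation(String type) {
        return "ComplexTypes.has" + type;
    }

    // Relation for lists of the ComplexType.
    static String listRelation(String type) {
        return "ComplexTypes.has" + type + "List";
    }

    // Relation generated for an attribute or particle of the ComplexType.
    static String memberRelation(String type, String member) {
        return entity(type) + ".has" + member;
    }

    public static void main(String[] args) {
        System.out.println(entity("PlantItem"));
        System.out.println(memberRelation("PlantItem", "Presentation"));
    }
}
```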

## Element

Element definitions are processed similarly to ComplexTypes, but the converted types are put directly into the ontology without any library. Hence, Element “PlantModel” is converted to the “PlantModel” entity.

```xml
<xsd:element name="PlantModel">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="PlantInformation"/>
      <xsd:element ref="Extent"/>
      <xsd:any namespace="##targetNamespace" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

```
PRO.hasPlantModel <R PRO.XML.hasElement
PRO.hasPlantModelList <R PRO.XML.hasElementList
PRO.PlantModel <T PRO.XML.Element
PRO.PlantModel.hasPlantInformation <R PRO.hasPlantInformation
   --> PRO.PlantInformation
PRO.PlantModel.hasExtent <R PRO.hasExtent
   --> PRO.Extent
```

When an Element is defined with ComplexContent, the base of the ComplexContent’s extension is converted to an L0.Inheritance relation between the types. For example, the “Equipment” Element has “PlantItem” as its extension base, so the “Equipment” entity inherits from the “PlantItem” entity.

```xml
<xsd:element name="Equipment">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="PlantItem">
        <xsd:choice minOccurs="0" maxOccurs="unbounded">
          <xsd:element ref="Discipline" minOccurs="0"/>
          <xsd:element ref="MinimumDesignPressure"/>
          …
          <xsd:element ref="Equipment"/>
          …
        </xsd:choice>
        <xsd:attribute name="ProcessArea" type="xsd:string"/>
        <xsd:attribute name="Purpose" type="xsd:string"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
```

```
PRO.hasEquipment <R PRO.XML.hasElement
PRO.hasEquipmentList <R PRO.XML.hasElementList
PRO.Equipment <T PRO.XML.Element <T PRO.PlantItem
PRO.Equipment.hasProcessArea <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.String
PRO.Equipment.hasPurpose <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.String
PRO.Equipment.hasDiscipline <R PRO.hasDiscipline
   --> PRO.Discipline
PRO.Equipment.hasMinimumDesignPressure <R PRO.hasMinimumDesignPressure
   --> PRO.MinimumDesignPressure
…
PRO.Equipment.hasEquipment <R PRO.hasEquipment
   --> PRO.Equipment
…
```

## Indicators (choice, sequence, all)

When an indicator has maxOccurs larger than 1, the relations generated from its particles have no multiplicity restrictions (as in all the previous examples). When the indicator is a choice with maxOccurs=1 (the default value of maxOccurs), the particle relation is expected to refer to only one object, which conforms to one of the specified types.

For example, Element “TrimmedCurve” has a choice indicator with four elements (“Circle”, “PCircle”, “Ellipse”, “PEllipse”), and that choice is converted to the “TrimmedCurve.hasCircleOrPCircleOrEllipseOrPEllipse” relation.

```xml
<xsd:element name="TrimmedCurve" substitutionGroup="Curve">
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="Curve">
        <xsd:sequence>
          <xsd:choice>
            <xsd:element ref="Circle"/>
            <xsd:element ref="PCircle"/>
            <xsd:element ref="Ellipse"/>
            <xsd:element ref="PEllipse"/>
          </xsd:choice>
          <xsd:element ref="GenericAttributes" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="StartAngle" type="xsd:double" use="required"/>
        <xsd:attribute name="EndAngle" type="xsd:double" use="required"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>
```

```
PRO.hasTrimmedCurve <R PRO.XML.hasElement
PRO.hasTrimmedCurveList <R PRO.XML.hasElementList
PRO.TrimmedCurve <T PRO.XML.Element <T PRO.Curve
PRO.TrimmedCurve.hasStartAngle <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.Double
PRO.TrimmedCurve.hasEndAngle <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.Double
PRO.TrimmedCurve.hasCircleOrPCircleOrEllipseOrPEllipse <R PRO.hasCircle <R PRO.hasPCircle <R PRO.hasEllipse <R PRO.hasPEllipse
   --> PRO.Circle
   --> PRO.PCircle
   --> PRO.Ellipse
   --> PRO.PEllipse
PRO.TrimmedCurve.hasGenericAttributes <R PRO.hasGenericAttributes
   --> PRO.GenericAttributes
```
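
The name of the combined choice relation in the TrimmedCurve example appears to be built by joining the alternative element names with “Or”. The sketch below reproduces only that observed pattern; `ChoiceRelationNameSketch` and `relationName` are hypothetical names, and how the converter handles other cases (such as nested choices) is not covered.

```java
import java.util.List;

// Hypothetical illustration of the "hasAOrBOrC" naming seen in the TrimmedCurve example.
final class ChoiceRelationNameSketch {
    static String relationName(String owner, List<String> alternatives) {
        // Alternatives are joined in schema order, separated by "Or".
        return owner + ".has" + String.join("Or", alternatives);
    }

    public static void main(String[] args) {
        System.out.println(relationName("TrimmedCurve",
                List.of("Circle", "PCircle", "Ellipse", "PEllipse")));
    }
}
```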

Note that Model Group definitions are not currently supported!

# Customization via configuration

## Attribute composition

The Attribute composition rule allows converting separate attributes into one array. For example, the following rule:

```xml
<AttributeComposition Name="XYZ" Type="doubleArray">
  <Attribute Name="X" Type="double"/>
  <Attribute Name="Y" Type="double"/>
  <Attribute Name="Z" Type="double"/>
</AttributeComposition>
```

causes the “X”, “Y” and “Z” double attributes in the “Coordinate” Element definition

```xml
<xsd:element name="Coordinate">
  <xsd:complexType>
    <xsd:attribute name="X" type="xsd:double" use="required"/>
    <xsd:attribute name="Y" type="xsd:double" use="required"/>
    <xsd:attribute name="Z" type="xsd:double"/>
  </xsd:complexType>
</xsd:element>
```

to be converted to the “XYZ” double array:

```
PRO.Coordinate <T PRO.XML.Element
PRO.Coordinate.hasXYZ <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.DoubleArray
```
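
At import time, a composition like this amounts to reading the listed attributes in order and packing them into one array. The Java sketch below shows that idea with plain maps. `AttributeCompositionSketch` and `compose` are hypothetical names, and the handling of a missing optional attribute (such as “Z”, which is not required in the schema above) is purely an assumption of this sketch: it stores NaN, whereas the actual behaviour of the generated parser is not documented here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of composing separate attributes into one double array.
final class AttributeCompositionSketch {
    static double[] compose(Map<String, String> attributes, List<String> names) {
        double[] result = new double[names.size()];
        for (int i = 0; i < result.length; i++) {
            String raw = attributes.get(names.get(i));
            // ASSUMPTION of this sketch: a missing attribute becomes NaN.
            // The real converter may behave differently.
            result[i] = (raw == null) ? Double.NaN : Double.parseDouble(raw);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] xyz = compose(Map.of("X", "1.0", "Y", "2.5", "Z", "-3"), List.of("X", "Y", "Z"));
        System.out.println(Arrays.toString(xyz));
    }
}
```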

## ID references

Referencing other XML elements is usually done using element IDs. The ID Provider and ID Reference rules allow converting these references to statements in the Simantics DB.
The ID Provider rule is used for retrieving the ID from referred objects. The rule does not affect the generated ontology.

```xml
<IDProvider>
  <ComplexType Name="PlantItem"/>
  <Attribute Name="ID" Type="string"/>
</IDProvider>
```

The ID Reference rule is used for objects that use ID references. IDSource tells which attribute is used to refer to another Element, and Reference defines the name of the relation. With the following rule:

```xml
<IDReference>
  <Element Name="Connection"/>
  <IDSource Name="ToID" Type="string"/>
  <Reference Name="ToIDRef" Type="ref"/>
</IDReference>
```

the “Connection” element definition’s “ToID” reference is converted to the ToIDRef relation.

```xml
<xsd:element name="Connection">
  <xsd:complexType>
    <xsd:attribute name="ToID" type="xsd:string"/>
    …
  </xsd:complexType>
</xsd:element>
```

The original attribute is kept, so that if the ID reference cannot be resolved, the information about the reference still exists.

```
PRO.Connection <T PRO.XML.Element
PRO.Connection.hasToID <R PRO.XML.hasAttribute: L0.FunctionalRelation
   --> L0.String
PRO.Connection.ToIDRef <R PRO.XML.hasReference
```

In imported data, the reference statement points to the referred Element, if the parser is able to locate an Element with the given ID.

| Predicate             | Object                           | Graph |
|-----------------------|----------------------------------|-------|
| Basic information     |                                  |       |
| InstanceOf            | Connection                       | DB    |
| Is Related To         |                                  |       |
| hasToID               | V_02_N6                          | DB    |
| ToIDRef               | $412442 : (Nozzle)               | DB    |
| Other statements      |                                  |       |
| hasConnection/Inverse | $416145 : (PipingNetworkSegment) | DB    |
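
Conceptually, the two rules work as an index and a lookup: the ID Provider side yields an ID for each referable object, and the ID Reference side looks the referring attribute value up in that index. The Java sketch below shows this with plain maps. It is an assumption-laden illustration, not the generated parser: `IdReferenceSketch`, `resolve`, and the object labels used in the example are hypothetical. Unresolved references are simply left out of the result, which mirrors the fact that the original attribute value is kept either way.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of resolving ID references after all IDs are known.
final class IdReferenceSketch {
    /**
     * @param idsByObject        object label -> value of its ID attribute (ID Provider side)
     * @param referencesByObject object label -> value of its referring attribute, e.g. ToID
     * @return referring object label -> referred object label, for resolvable references only
     */
    static Map<String, String> resolve(Map<String, String> idsByObject,
                                       Map<String, String> referencesByObject) {
        // Build the index: ID value -> object that provides it.
        Map<String, String> objectById = new HashMap<>();
        for (Map.Entry<String, String> e : idsByObject.entrySet()) {
            objectById.put(e.getValue(), e.getKey());
        }
        // Look up every reference; skip the ones that cannot be located.
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> ref : referencesByObject.entrySet()) {
            String target = objectById.get(ref.getValue());
            if (target != null) {
                resolved.put(ref.getKey(), target);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolve(
                Map.of("Nozzle-1", "V_02_N6"),
                Map.of("Connection-1", "V_02_N6")));
    }
}
```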

## Ordered child

The Ordered child rule allows storing the original order of the elements into lists. The rule either forces the creation of the lists (used when the schema is interpreted to be indifferent to the order) or disables the list generation.

Currently the rule has two types, original and child. An original rule sets whether all the child elements are put into “OriginalElementList”. A child rule sets whether the child elements are added to type-specific lists.

```xml
<OrderedChild Type="original" Value="disable">
  <ComplexType Name="PlantItem"/>
</OrderedChild>

<OrderedChild Type="child" Value="disable">
  <ComplexType Name="PlantItem"/>
</OrderedChild>
```
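
The difference between the two list kinds can be shown with a small, self-contained Java sketch: one list keeps every child in document order, while the type-specific lists keep the order only within each element type. The class `OrderedChildSketch`, its nested `Child` class, and the identifiers in the example are hypothetical and serve only as an illustration of the two orderings, not of the generated code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "original" versus type-specific child lists.
final class OrderedChildSketch {
    /** Minimal stand-in for a parsed child element: its element type and an identifier. */
    static final class Child {
        final String type;
        final String id;

        Child(String type, String id) {
            this.type = type;
            this.id = id;
        }
    }

    // All children in document order, regardless of type.
    static List<String> originalElementList(List<Child> children) {
        List<String> ids = new ArrayList<>();
        for (Child c : children) {
            ids.add(c.id);
        }
        return ids;
    }

    // One list per element type; order is preserved within each type only.
    static Map<String, List<String>> typeSpecificLists(List<Child> children) {
        Map<String, List<String>> lists = new LinkedHashMap<>();
        for (Child c : children) {
            lists.computeIfAbsent(c.type, t -> new ArrayList<>()).add(c.id);
        }
        return lists;
    }

    public static void main(String[] args) {
        List<Child> children = List.of(
                new Child("Presentation", "p1"),
                new Child("Extent", "e1"),
                new Child("Presentation", "p2"));
        System.out.println(originalElementList(children));
        System.out.println(typeSpecificLists(children));
    }
}
```

Note that the relative order of “e1” and “p2” can be recovered only from the original list, which is why the two rule types are configured separately.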

## Unrecognized child elements

The Unrecognized child element rule allows processing XML files that do not conform to the given schema and use different element names. In practice, the rule allows injecting Java code into the generated parser classes. The code is put into a method whose signature is:

```java
public void configureChild(WriteGraph graph, Deque<Element> parents, Element element, Element child) throws DatabaseException
```

The method is called with “element” as the element that conforms to the type given in the rule’s configuration, and “child” as the child element that could not be recognized. The following example is used for handling incorrect files that have replaced the element name with the contents of the “Name” attribute.

```xml
<UnrecognizedChildElement>
  <Element Name="GenericAttributes"/>
  <JavaMethod>
  // Some commercial software does not handle GenericAttribute elements properly:
  // it uses the "Name" attribute's value as the element's name.
  GenericAttribute ga = new GenericAttribute();
  java.util.List&lt;Attribute&gt; attributes = new java.util.ArrayList&lt;Attribute&gt;();
  attributes.addAll(child.getAttributes());
  attributes.add(new Attribute("Name", "Name", "", child.getQName()));
  Element newChild = new Element("", "", "GenericAttribute", attributes);
  newChild.setParser(ga);
  Resource res = ga.create(graph, newChild);
  if (res != null) {
    newChild.setData(res);
    parents.push(element);
    ga.configure(graph, parents, newChild);
    connectChild(graph, element, newChild);
    parents.pop();
  }
  </JavaMethod>
</UnrecognizedChildElement>
```

An example of an incorrect file:

```xml
<GenericAttributes Number="28" Set="Values">
  <Assembly Format="string" Value="5. Auxiliary Steam System" />
  <Bendingradiusrtube Format="double" Value="0" Units="mm" ComosUnits="mm M01.15" />
  <CostCode Format="double" Value="607" />
  …
</GenericAttributes>
```

whereas the content should be:

```xml
<GenericAttributes Number="28" Set="Values">
  <GenericAttribute Name="Assembly" Format="string" Value="5. Auxiliary Steam System" />
  <GenericAttribute Name="Bendingradiusrtube" Format="double" Value="0" Units="mm" />
  <GenericAttribute Name="CostCode" Format="double" Value="607" />
  …
</GenericAttributes>
```
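
Stripped of the Simantics-specific calls, the repair performed by the injected method is a small transformation: keep the attributes of the unrecognized child and add its element name as a “Name” attribute of a “GenericAttribute” element. The sketch below shows only that transformation on plain maps; `GenericAttributeRewriteSketch` and `rewrite` are hypothetical names, and none of the graph-writing steps of the real rule are modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the attribute rewrite done for unrecognized children.
final class GenericAttributeRewriteSketch {
    // Returns the attributes that the replacement GenericAttribute element would carry.
    static Map<String, String> rewrite(String unrecognizedElementName, Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>(attributes);
        // The misplaced element name becomes the value of the "Name" attribute.
        result.put("Name", unrecognizedElementName);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("CostCode", Map.of("Format", "double", "Value", "607")));
    }
}
```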

# References

* W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures, http://www.w3.org/TR/xmlschema11-1/
* W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes, http://www.w3.org/TR/xmlschema11-2/
* [Layer0 specification](Layer0.pdf)