Quick Tips: XML Schemas and Validation
May 18, 2015
A quick side note before the post: I’ve finally started working for the summer, so I’m hoping that I will be able to write posts more frequently than I did during the semester.
XML Schemas
What the heck is a schema anyway?
You can think of a schema as almost like writing a class in Java. It gives us a blueprint that we want our XML files to conform to: which elements are allowed, in what order, and with what types.
Why should I validate my XML files?
Well, it’s plain and simple. If your program parses XML files and needs them to follow a certain format, you want to make sure that every file you process is in the correct format. If you don’t validate your files, a badly formed one could break your program. Think of it like dumping pasta into your toaster oven because you want to cook it. You can’t cook everything just by putting it in the toaster. You have to make sure it’s in the form of bread first.
All right all right, so I get the point of schemas and validation. How do I validate my files? How do I even write a schema?
Well, first we’re going to need to write a schema to validate some kind of XML file.
Writing a Schema
I’m going to keep this very simple. If you want to learn more about writing schemas, check out W3Schools’ tutorial on it. We’re going to write a schema for a person. The person element should contain only three simple elements: name, age, and height (in cm). See below for the schema (Person.xsd).
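As a minimal sketch, a schema along these lines might look like the string below, wrapped in a little Java so we can check that it compiles. The element types (xs:string, xs:integer, xs:decimal), the class name PersonSchemaSketch, and the compiles() helper are assumptions made for illustration, not necessarily what the real Person.xsd uses.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PersonSchemaSketch {
    // Assumed contents of Person.xsd. The three element types are guesses.
    static final String PERSON_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
        + "  <xs:element name=\"person\">\n"
        + "    <xs:complexType>\n"
        + "      <xs:sequence>\n"
        + "        <xs:element name=\"name\" type=\"xs:string\"/>\n"
        + "        <xs:element name=\"age\" type=\"xs:integer\"/>\n"
        + "        <xs:element name=\"height\" type=\"xs:decimal\"/>\n"
        + "      </xs:sequence>\n"
        + "    </xs:complexType>\n"
        + "  </xs:element>\n"
        + "</xs:schema>\n";

    // Returns true if the schema text itself is a well-formed, valid XSD.
    static boolean compiles(String xsd) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            System.out.println("Schema problem: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Schema compiles: " + compiles(PERSON_XSD));
    }
}
```

The xs:sequence means the three children have to appear in exactly this order, each exactly once.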
A Quick Note on Namespaces
If you look at the above schema, you’ll notice that I have prefixed each element with the xs namespace. This is done to avoid confusion in large XML documents that might otherwise have conflicting element names. The above document can similarly be written without the xs prefix, like the following:
It doesn’t matter which way you write your schemas. If you do have conflicting element names, though, it’s worth looking further into XML namespaces. That is outside the scope of this post.
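One way to drop the prefix, sketched here under my own assumptions, is to declare the XML Schema namespace as the default namespace of the schema document. The snippet below compares a prefixed and an unprefixed version of a person schema and checks that both accept and reject the same documents. The element types, the class name DefaultNamespaceSketch, and the accepts() helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class DefaultNamespaceSketch {
    // Prefixed form: every schema element carries the xs prefix.
    static final String PREFIXED_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"age\" type=\"xs:integer\"/>"
        + "<xs:element name=\"height\" type=\"xs:decimal\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Unprefixed form: the XML Schema namespace is the default namespace,
    // so element names and type names need no prefix.
    static final String UNPREFIXED_XSD =
          "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
        + "<element name=\"person\"><complexType><sequence>"
        + "<element name=\"name\" type=\"string\"/>"
        + "<element name=\"age\" type=\"integer\"/>"
        + "<element name=\"height\" type=\"decimal\"/>"
        + "</sequence></complexType></element>"
        + "</schema>";

    // Returns true only if the schema compiles and the document conforms to it.
    static boolean accepts(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String good = "<person><name>Ann</name><age>30</age><height>170</height></person>";
        String bad = "<person><name>Ann</name><age>thirty</age><height>170</height></person>";
        System.out.println("prefixed, good doc:   " + accepts(PREFIXED_XSD, good));
        System.out.println("unprefixed, good doc: " + accepts(UNPREFIXED_XSD, good));
        System.out.println("prefixed, bad doc:    " + accepts(PREFIXED_XSD, bad));
        System.out.println("unprefixed, bad doc:  " + accepts(UNPREFIXED_XSD, bad));
    }
}
```

Strictly speaking, the unprefixed version still uses a namespace. It just doesn’t spell the prefix out on every element.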
Writing the XML Files
I use Eclipse as my IDE, and my project has the following folder structure:
/src/xmlValidator
/schemas
    /Person.xsd
/xml
    /Bob.xml
    /Penelope.xml
Now that you can see my folder structure, let’s write some more files. You should have already created Person.xsd from the above snippet.
Let’s make a person. We’ll call the file Bob.xml. Bob.xml will contain the following:
Now here’s Penelope.xml:
Okay, so now take a look at each of the XML files. Which one do you think is valid, and which is not? Pay close attention to the types we defined in our schema.
Main Method to Check our Files
Now let’s actually write the code that validates our XML files to ensure they follow the schema.
The heart of the validation is done within the validateXML() method. It’s quite simple, and it also tells us which errors we receive.
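A validateXML() method along these lines could be sketched with the standard javax.xml.validation API. Everything specific here is an assumption: the method signature, the class name ValidateXmlSketch, the schema types, and the two in-memory documents, which are stand-ins and not the actual contents of Bob.xml and Penelope.xml. To read real files instead, you could pass new StreamSource(new File(path)) for each argument.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ValidateXmlSketch {
    // Assumed person schema; the element types are guesses.
    static final String PERSON_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"age\" type=\"xs:integer\"/>"
        + "<xs:element name=\"height\" type=\"xs:decimal\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Hypothetical stand-in documents, one valid and one with two bad values.
    static final String VALID_PERSON =
        "<person><name>Bob</name><age>30</age><height>180.5</height></person>";
    static final String INVALID_PERSON =
        "<person><name>Penelope</name><age>twenty</age><height>tall</height></person>";

    // Validates xmlSource against schemaSource and prints the first error, if any.
    static boolean validateXML(Source schemaSource, Source xmlSource) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(schemaSource);
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first validation error is thrown.
            validator.validate(xmlSource);
            return true;
        } catch (SAXException e) {
            System.out.println("Validation error: " + e.getMessage());
            return false;
        } catch (IOException e) {
            System.out.println("Could not read input: " + e.getMessage());
            return false;
        }
    }

    // Wraps a string in a Source so the sketch needs no files on disk.
    static Source fromString(String text) {
        return new StreamSource(new StringReader(text));
    }

    public static void main(String[] args) {
        System.out.println("valid document:   "
            + validateXML(fromString(PERSON_XSD), fromString(VALID_PERSON)));
        System.out.println("invalid document: "
            + validateXML(fromString(PERSON_XSD), fromString(INVALID_PERSON)));
    }
}
```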
The problem is that once we receive an error, it quits. We don’t find all of the errors, which may or may not be a problem depending on the application. If you look at the Penelope.xml file, there are two errors, but the program returns as soon as it finds the first.
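If your application does need every error, one option is to give the Validator an org.xml.sax.ErrorHandler that records each problem and does not throw. This is a different approach from the one described above, and the sketch below is only an illustration: the class name CollectAllErrorsSketch, the collectErrors() helper, the schema types, and the sample document are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class CollectAllErrorsSketch {
    // Assumed person schema; the element types are guesses.
    static final String PERSON_XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"age\" type=\"xs:integer\"/>"
        + "<xs:element name=\"height\" type=\"xs:decimal\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns every validation message for the document; empty means valid.
    static List<String> collectErrors(String xsd, String xml) {
        final List<String> messages = new ArrayList<>();
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }

                @Override
                public void error(SAXParseException e) {
                    // Record the error and return normally so validation continues.
                    messages.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    // Malformed XML cannot be recovered from, so stop here.
                    throw e;
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            messages.add("fatal: " + e.getMessage());
        }
        return messages;
    }

    public static void main(String[] args) {
        String twoMistakes =
            "<person><name>Penelope</name><age>twenty</age><height>tall</height></person>";
        for (String message : collectErrors(PERSON_XSD, twoMistakes)) {
            System.out.println(message);
        }
    }
}
```

Keep in mind that the validator may report more than one message for a single bad value, so the list can be longer than the number of mistakes in the file.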
And that’s it: XML Schema validation in a nutshell.