Select & Query Chain
The select() method is the default interpreter: it takes Select Strings and, under the hood, builds a Query Chain of Selectors. Just to be clear, the previous scraper example can also be built as follows:
So far, Select Strings work quite well for extracting any property of a DOM element. Now we want to go a little deeper and understand the process behind them.
Each Query is composed of Selectors concatenated in a chain, and each Selector has a specific responsibility. Next, we will look at the basic selectors, and then we will see how to build the scraper from the last section using only Query Chains.
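To make the idea of a chain concrete, here is a minimal sketch in plain Python. It is a model of the concept only, not the library's API: the names `chain`, `by_tag`, and `prop`, and the dictionary-based "DOM", are all assumptions made for illustration.

```python
from functools import reduce
from typing import Any, Callable, List

# Assumption: a Selector is modeled as a function from a list of values
# to a list of values, so selectors can be applied one after another.
Selector = Callable[[List[Any]], List[Any]]


def chain(*selectors: Selector) -> Selector:
    """Concatenate selectors so each receives the previous one's output."""
    def run(values: List[Any]) -> List[Any]:
        return reduce(lambda acc, selector: selector(acc), selectors, values)
    return run


def by_tag(tag: str) -> Selector:
    """Hypothetical stand-in for an extractor that matches elements."""
    return lambda elements: [el for el in elements if el["tag"] == tag]


def prop(name: str) -> Selector:
    """Hypothetical stand-in for an extractor that reads a property."""
    return lambda elements: [el[name] for el in elements if name in el]


# Toy "DOM": each element is a dict with a tag name and some properties.
dom = [
    {"tag": "h1", "innerText": "Title"},
    {"tag": "p", "innerText": "Body"},
]

query = chain(by_tag("h1"), prop("innerText"))
result = query(dom)
print(result)  # ['Title']
```

Each step in the chain has one job, which is what lets validators be appended to the end of the same chain without changing the extractors.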
Selectors
While select() is the most commonly used form, additional selector types are available as well.
Extractors
css
The 'css' selector, as the name implies, uses a CSS selector to extract all matching elements in the DOM. Additionally, you can set an alternative CSS selector using the .alt() method.
xpath
The 'xpath' selector uses an XPath expression to extract all matching elements in the DOM. Additionally, you can set an alternative XPath expression using the .alt() method.
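The fallback idea behind .alt() can be sketched with Python's standard library, which supports a limited subset of XPath. The helper `xpath_with_alt` is hypothetical and only models the described behavior (try the primary expression, then the alternative). It is not the library's implementation.

```python
import xml.etree.ElementTree as ET

# A small, well-formed document standing in for a real page.
html = "<html><body><h1 class='title'>Hello</h1><p>World</p></body></html>"
root = ET.fromstring(html)


def xpath_with_alt(root, expression, alt=None):
    """Return all matches for `expression`; if there are none, try `alt`."""
    matches = root.findall(expression)
    if not matches and alt is not None:
        matches = root.findall(alt)
    return matches


# The primary expression matches, so the alternative is never used.
primary = xpath_with_alt(root, ".//h1", alt=".//p")

# The primary expression matches nothing, so the alternative is used.
fallback = xpath_with_alt(root, ".//h2", alt=".//h1[@class='title']")

primary_text = [el.text for el in primary]
fallback_text = [el.text for el in fallback]
```

The same pattern applies to a CSS-based extractor: only the matching function changes, not the fallback logic.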
property
The 'property' selector extracts a specific property from a list of DOM elements. Additionally, you can set an alternative property using the .alt() method.
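As a rough illustration, property extraction with an alternative might be modeled like this. The sketch assumes the alternative is tried per element when the primary property is missing. The source does not spell out that detail, and `extract_property` is a hypothetical helper.

```python
def extract_property(elements, name, alt=None):
    """Read `name` from each element, falling back to `alt` when missing.

    Assumption: the fallback is evaluated element by element.
    """
    values = []
    for element in elements:
        if name in element:
            values.append(element[name])
        elif alt is not None and alt in element:
            values.append(element[alt])
    return values


# Toy elements modeled as dicts of property name -> value.
elements = [
    {"innerText": "Title", "href": "/a"},
    {"textContent": "Other"},
]

texts = extract_property(elements, "innerText", alt="textContent")
hrefs = extract_property(elements, "href")
```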
Validators
single
Ensures that only one element is matched and returned. By default, the 'single()' selector is applied to every Query Chain.
Now suppose there is more than one h1 element in the DOM. In this case, single() guards against strange behavior or unwanted values by throwing an error that tells us more than one element matches the specified selector. This gives the developer the opportunity to refine the selector so that it matches a single element.
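The check can be sketched in a few lines of plain Python. The function and exception names are hypothetical, and the error message is illustrative rather than the one the library produces.

```python
class MultipleMatchesError(Exception):
    """Raised when a query expected one match but found several."""


def single(values):
    """Return the only value, or raise if more than one was matched."""
    if len(values) > 1:
        raise MultipleMatchesError(
            f"expected a single match but found {len(values)}"
        )
    # An empty list is left for a separate 'require'-style check to handle.
    return values[0] if values else None


one_h1 = ["Main title"]
two_h1 = ["Main title", "Sidebar title"]

title = single(one_h1)

try:
    single(two_h1)
    error_message = None
except MultipleMatchesError as exc:
    error_message = str(exc)
```

Failing loudly here is the design choice: silently picking the first of several matches would hide the fact that the selector is too broad.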
all
Returns all the values. This selector prevents 'single()' from being applied by default.
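One way to picture how all() switches off the default single-value check is the toy class below. The `Query` class and its `get` method are assumptions made for this sketch and do not mirror the library's internals.

```python
class Query:
    """Toy query: the single-value check is on unless all() is called."""

    def __init__(self, values):
        self._values = list(values)
        self._return_all = False

    def all(self):
        # Opting in to every value switches off the default single check.
        self._return_all = True
        return self

    def get(self):
        if self._return_all:
            return self._values
        if len(self._values) > 1:
            raise ValueError(f"expected one value, found {len(self._values)}")
        return self._values[0] if self._values else None


links = ["/home", "/about", "/contact"]
every_link = Query(links).all().get()
```

Without the call to all(), the same query would raise, because three values were matched.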
require
Throws an error if there are no values. This happens when the selector does not match any DOM element or the property doesn't exist. By default, the 'require()' selector is applied to every Query Chain.
In the following example, given that the selector and the property are valid, we obtain the expected result:
Here the situation is different: the selector does not match any element, so even though the property would be valid, we obtain an error generated by 'require()'.
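Both cases can be modeled in plain Python. This is a sketch of the described behavior under assumed names (`require`, `NoMatchError`, and a toy `query` helper), not the library's own code or error text.

```python
class NoMatchError(Exception):
    """Raised when a query produced no values at all."""


def require(values):
    """Pass the values through unchanged, or raise if there are none."""
    if not values:
        raise NoMatchError("no values matched the query")
    return values


# Toy "DOM" with a single h1 element.
dom = [{"tag": "h1", "innerText": "Title"}]


def query(tag, prop):
    """Toy extractor: match by tag name, then read a property."""
    return [el[prop] for el in dom if el["tag"] == tag and prop in el]


# Valid selector and valid property: the value comes back.
found = require(query("h1", "innerText"))

# The selector matches nothing, so require raises.
try:
    require(query("h2", "innerText"))
    missing_error = None
except NoMatchError as exc:
    missing_error = str(exc)
```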
default
Returns the specified value if, for example, the selector didn't match any element. This selector prevents 'require()' from being applied by default.
Here you get the expected result:
Suppose that, for some reason, there is no h1 element on the website. Although the selector and the property are valid, HTML layouts are often inconsistent from page to page, so a missing element is to be expected in many cases. The default() method gives the developer the flexibility to get a default value instead of the error generated by require().
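The interaction between a default value and the require check can be sketched as follows. The `Query` class, the `_UNSET` sentinel, and the use of `LookupError` are assumptions made for this illustration only.

```python
# Sentinel so that None can itself be used as a legitimate default value.
_UNSET = object()


class Query:
    """Toy query: raises on no match unless a default was provided."""

    def __init__(self, values):
        self._values = list(values)
        self._default = _UNSET

    def default(self, value):
        # Providing a default replaces the require-style error.
        self._default = value
        return self

    def get(self):
        if self._values:
            return self._values[0]
        if self._default is not _UNSET:
            return self._default
        raise LookupError("no values matched the query")


# The page has an h1: the real value wins over the default.
with_h1 = Query(["Title"]).default("Untitled").get()

# The page has no h1: the default is returned instead of an error.
without_h1 = Query([]).default("Untitled").get()
```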
Use of Query Chains
If we take our previous scraper:
We can use Query chains to obtain the same results:
If we remove the default methods, our scraper looks like this:
At this point you may be wondering when to use a Select String, a select() selector, or a Query Chain of Selectors. The answer is: if you need something more configurable, use a Query Chain of Selectors. For example, Select Strings are very easy to use, but they don't allow us to set alternatives or get elements using XPath expressions. Also, as we will see in the next topic, Select Strings can't use Advanced Selectors, because those advanced concepts only exist in a Query Chain of Selectors.