latest (0.5.1-beta)
API

Classes

Environment
Manages environment variables by providing methods for checking and retrieving them.
Process
Provides Puppeteer initialization by creating Browser and Page instances to provide the browser context necessary for the execution of a custom function.

Functions

Process.([browserOptions]) ⇒ Promise.<object>
Creates the Browser and Page instances.
Process.(page)
Exposes the logger function to be used in the browser.

Environment

Manages environment variables by providing methods for checking and retrieving them.
Kind: global class

environment.PRODUCTION

Value of the variable defined by the Node.js process to identify the production execution environment.
Kind: instance property of Environment

environment.IMPRESSIONIST_VERSION

The current version of the library. It can be used, for example, for bug reporting plugins.
Kind: instance property of Environment

Environment.is(environment) ⇒ boolean

Checks whether a specific environment is running, e.g. prod or dev.
Kind: static method of Environment Returns: boolean - Result true or false.
Param
Type
Description
environment
string
Identifier of the ENV environment variable. For example, prod, dev, etc.
Example
if(Environment.is(Environment.PRODUCTION)) {
...
}

Environment.has(variable) ⇒ boolean

Check if a specific environment variable exists.
Kind: static method of Environment Returns: boolean - Result true or false.
Param
Type
Description
variable
string
The variable to check.
Example
if(Environment.has('SENTRY_TAGS')) {
...
}

Environment.get(variable) ⇒ string | Array | object | null

Returns a specific environment variable, or null if it does not exist.
Kind: static method of Environment
Param
Type
Description
variable
string
The variable to extract.
Example
for(const [name, value] of Object.entries(Environment.get('SENTRY_TAGS'))) {
sentry.setTag(name, value);
}
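
The behaviour of the three static helpers can be approximated over `process.env`. The sketch below is not the library's implementation: the class name `EnvironmentSketch`, the use of `NODE_ENV` in `is`, the injectable `env` parameter, and the JSON parsing in `get` are all assumptions, chosen to illustrate the documented `string | Array | object | null` return type.

```javascript
// Hypothetical stand-in for illustration; not the library's Environment class.
class EnvironmentSketch {
    // Assumption: the running environment is read from NODE_ENV.
    static is(environment, env = process.env) {
        return env.NODE_ENV === environment;
    }

    // True when the variable is defined, even if it is an empty string.
    static has(variable, env = process.env) {
        return Object.prototype.hasOwnProperty.call(env, variable);
    }

    // Assumption: values holding a JSON object or array are parsed,
    // any other value is returned as the plain string.
    static get(variable, env = process.env) {
        if (!EnvironmentSketch.has(variable, env)) return null;
        const raw = env[variable];
        try {
            const parsed = JSON.parse(raw);
            return typeof parsed === 'object' && parsed !== null ? parsed : raw;
        } catch {
            return raw; // Not JSON: return the string unchanged.
        }
    }
}

// Fake environment so the example does not depend on the real process.env.
const fakeEnv = { NODE_ENV: 'production', SENTRY_TAGS: '{"team":"scraping"}' };

// Fall back to an empty object, since get() may return null.
const tags = EnvironmentSketch.get('SENTRY_TAGS', fakeEnv) ?? {};
for (const [name, value] of Object.entries(tags)) {
    console.log(name, value);
}
```

Guarding the result with `?? {}` avoids the error that `Object.entries(null)` would raise when the variable is missing.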

Process

Provides Puppeteer initialization by creating Browser and Page instances to provide the browser context necessary for the execution of a custom function.
Kind: global class Summary: Initialize Puppeteer.

Process.execute(url, customFunction) ⇒ Promise

Provides the context needed to run a function in the Puppeteer or browser context. It initializes a Browser instance and applies any extra configurations passed as an input parameter, then initializes a Page instance and applies default configurations to it so that the Impressionist library can be used in the browser context of that instance.
Kind: static method of Process Returns: Promise - Promise object that represents the result of the custom function.
Param
Type
Description
url
string
Target URL for scraping process.
customFunction
function
Custom function to be executed in the Puppeteer context. Like customFunction(browser, page, ...args) { ... }.
Properties
Name
Type
Default
Description
[browserOptions]
object
{}
Please read the documentation about the Launch Options.
[args]
Array
[]
Any parameter necessary for the custom function.
Example (Basic Usage)
(async () => {
const data = await Impressionist.Process.execute(url, scrape);
console.log(JSON.stringify(data));
})();
async function scrape(browser, page) {
...
}
Example (Enabling Browser User Interface)
(async () => {
const data = await Impressionist.Process.execute(url, scrape, { browserOptions: { headless: false } } );
console.log(JSON.stringify(data));
})();
async function scrape(browser, page) {
...
}
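
The sequence of steps described for `Process.execute` can be outlined as follows. This is a rough sketch rather than the library's source: `executeSketch`, `makeFakeLaunch`, the injected `launch` option, and closing the browser in a `finally` block are assumptions. Only `newPage`, `goto`, and `close` are used, which exist on Puppeteer's Browser and Page objects.

```javascript
// Hypothetical outline of the documented steps; not the library's source.
// `launch` is injected so the flow can be exercised without a real browser.
async function executeSketch(url, customFunction, { browserOptions = {}, args = [], launch } = {}) {
    const browser = await launch(browserOptions);              // 1. Browser instance with extra options
    try {
        const page = await browser.newPage();                  // 2. Page instance
        await page.goto(url, { waitUntil: 'networkidle2' });   // 3. Default navigation settings
        return await customFunction(browser, page, ...args);   // 4. Run the custom function
    } finally {
        await browser.close();                                 // Assumption: always release the browser
    }
}

// Fake launcher that records calls instead of starting a browser.
const makeFakeLaunch = (log) => async (browserOptions) => {
    log.push('launch');
    return {
        newPage: async () => ({
            goto: async (target) => { log.push(`goto ${target}`); },
        }),
        close: async () => { log.push('close'); },
    };
};

const demoLog = [];
executeSketch('https://example.com', async (browser, page, greeting) => greeting, {
    args: ['hello'],
    launch: makeFakeLaunch(demoLog),
}).then((result) => console.log(result, demoLog));
```

With a real browser, the injected function could be `(options) => puppeteer.launch(options)`.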

Process.openConnection(url, [browserOptions]) ⇒ Promise.<object>

Open a connection to a browser instance.
Kind: static method of Process Returns: Promise.<object> - Promise object that represents an object that stores the browser, page instances.
Param
Type
Default
Description
url
string
Target URL for scraping process.
[browserOptions]
object
{}
Please read the documentation about the Launch Options.

Process.createPage(browser) ⇒ Promise.<object>

Creates a new instance of Page.
Kind: static method of Process Returns: Promise.<object> - Promise object that represents a Page instance.
Param
Type
Description
browser
object
Example (Create a second Page instance)
(async () => {
const data = await Impressionist.Process.execute(url, scrape);
console.log(JSON.stringify(data));
})();
async function scrape(browser, page) {
const resultMainPage = await page.evaluate(...);
// Need for a second page instance
const secondPage = await Impressionist.Process.createPage(browser);
...
}

Process.setPageConfigurations(page, url) ⇒ Promise.<void>

Applies configurations to a Page instance, such as the one started internally within the class, and can modify the behavior of that instance. Please read the documentation about the Page instance.
Kind: static method of Process Returns: Promise.<void> - Promise object that represents the method execution completion.
Param
Type
Description
page
page
url
string
Target URL.
Properties
Name
Type
Default
Description
[defaultTimeout]
number
60000
Maximum time. Please read page.setDefaultTimeout documentation.
[viewport]
object
{ width: 1366, height: 768, deviceScaleFactor: 1 }
Viewport. Please read page.setViewport documentation.
[navigation]
object
{ waitUntil: 'networkidle2' }
Navigation parameters. Please read page.goto documentation.
Example (Create a second Page instance and apply default configurations)
(async () => {
const data = await Impressionist.Process.execute(url, scrape);
console.log(JSON.stringify(data));
})();
async function scrape(browser, page) {
const resultMainPage = await page.evaluate(...);
// Need for a second page instance
const secondPage = await Impressionist.Process.createPage(browser);
// Apply default configurations
await Impressionist.Process.setPageConfigurations(secondPage, 'https://...');
// Using the second Page instance
const resultSecondPage = await secondPage.evaluate(...);
...
}
Example (Create a second Page instance and set a different viewport)
(async () => {
const data = await Impressionist.Process.execute(url, scrape);
console.log(JSON.stringify(data));
})();
async function scrape(browser, page) {
const resultMainPage = await page.evaluate(...);
// Need for a second page instance
const secondPage = await Impressionist.Process.createPage(browser);
// Apply different configurations
await Impressionist.Process.setPageConfigurations(secondPage, 'https://...', {
viewport: {
width: 1920,
height: 1080,
deviceScaleFactor: 1
}
});
// Using the second Page instance
const resultSecondPage = await secondPage.evaluate(...);
...
}
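
One way to read the Properties list is that caller-supplied options are layered over the documented defaults. The sketch below illustrates that idea only. `resolvePageOptions` and `applyPageConfigurations` are hypothetical names, and the per-key merge of `viewport` and `navigation` is an assumption, since the library may replace the whole object instead.

```javascript
// Defaults copied from the Properties list above.
const DEFAULT_PAGE_OPTIONS = {
    defaultTimeout: 60000,
    viewport: { width: 1366, height: 768, deviceScaleFactor: 1 },
    navigation: { waitUntil: 'networkidle2' },
};

// Hypothetical helper: resolves the effective options for one call.
// Assumption: nested objects are merged key by key with the defaults.
function resolvePageOptions(overrides = {}) {
    return {
        defaultTimeout: overrides.defaultTimeout ?? DEFAULT_PAGE_OPTIONS.defaultTimeout,
        viewport: { ...DEFAULT_PAGE_OPTIONS.viewport, ...overrides.viewport },
        navigation: { ...DEFAULT_PAGE_OPTIONS.navigation, ...overrides.navigation },
    };
}

// Hypothetical outline: applies the options through Puppeteer's Page methods.
async function applyPageConfigurations(page, url, overrides = {}) {
    const options = resolvePageOptions(overrides);
    page.setDefaultTimeout(options.defaultTimeout);
    await page.setViewport(options.viewport);
    await page.goto(url, options.navigation);
}

console.log(resolvePageOptions({ viewport: { width: 1920, height: 1080 } }));
```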

Process.connect(browserWSEndpoint, customFunction, [...args]) ⇒ Promise.<any>

Execute a function in a specific browser endpoint.
Kind: static method of Process Returns: Promise.<any> - Promise object that represents the result of the custom function.
Param
Type
Default
Description
browserWSEndpoint
string
customFunction
function
Custom function to be executed in the Puppeteer context. Like customFunction(browser, page, ...args) { ... }.
[...args]
Array.<any>
[]
Any parameter necessary for the custom function.

Process.([browserOptions]) ⇒ Promise.<object>

Creates the Browser and Page instances.
Kind: global function Returns: Promise.<object> - Promise object that represents an object that stores the browser, page instances.
Param
Type
Default
Description
[browserOptions]
object
{}
Please read the documentation about the Launch Options.

Process.(page)

Exposes the logger function to be used in the browser.
Kind: global function
Param
Type
Description
page
object

Classes

MonitorManager
Log on each of the subscribed monitoring tools.
Pino
Initialize Pino and expose its functionality for logging.
Sentry
Provides an interface for Sentry integration with Puppeteerist.

Functions

Sentry.()
Perform the necessary steps to configure Sentry.

MonitorManager

Log on each of the subscribed monitoring tools.
Kind: global class

MonitorManager.log(report)

Register a report.
Kind: static method of MonitorManager
Param
Type
Description
report
object
Information to be logged.

MonitorManager.subscribe(logger)

Subscribe to a monitoring or logging tool.
Kind: static method of MonitorManager
Param
Type
Description
logger
object
Monitoring or logging tool.

MonitorManager.unsubscribe(logger)

Unsubscribe from a monitoring or logging tool.
Kind: static method of MonitorManager
Param
Type
Description
logger
object
Monitoring or logging tool.

MonitorManager.clear()

Delete all loggers. It can be used to discard the default loggers.
Kind: static method of MonitorManager
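
The four methods above follow a publish/subscribe shape, which the sketch below illustrates. It is an assumption-laden stand-in, not the library's class: `MonitorManagerSketch` and `memoryLogger` are hypothetical, and the only contract assumed of a logger is a `log(report)` method, as documented for Pino and Sentry.

```javascript
// Hypothetical stand-in; the real MonitorManager may differ.
class MonitorManagerSketch {
    static loggers = new Set();

    static subscribe(logger) { this.loggers.add(logger); }
    static unsubscribe(logger) { this.loggers.delete(logger); }
    static clear() { this.loggers.clear(); }

    // Forward the report to every subscribed tool via its log(report) method.
    static log(report) {
        for (const logger of this.loggers) logger.log(report);
    }
}

// In-memory logger used only for this example.
const received = [];
const memoryLogger = { log: (report) => received.push(report) };

MonitorManagerSketch.subscribe(memoryLogger);
MonitorManagerSketch.log({ level: 'info', message: 'started' });
MonitorManagerSketch.unsubscribe(memoryLogger);
MonitorManagerSketch.log({ level: 'info', message: 'ignored' });

console.log(received.length); // 1
```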

Pino

Initialize Pino and expose its functionality for logging.
Kind: global class

Pino.log(report)

Log a report.
Kind: static method of Pino
Param
Type
Description
report
object
Information that will be used to compose the report.

Sentry

Provides an interface for Sentry integration with Puppeteerist.
Kind: global class

Sentry.log(report)

Log a report.
Kind: static method of Sentry
Param
Type
Description
report
object
Information that will be used to compose the report.

Sentry.sendException(error) ⇒ Promise.<void>

Sends an error generated during Puppeteer execution to Sentry.
Kind: static method of Sentry Returns: Promise.<void> - Promise object that represents end of the Sentry actions.
Param
Type
Description
error
Error
Object that represents the error generated during the execution of the scraper.

Sentry.()

Perform the necessary steps to configure Sentry.
Kind: global function
Example
Sentry.setConfigurations();

Classes

Collector
Collect items from a Collection by being iterated through a context element.
Context
It provides a data structure to control the context, which is an object with two properties. The first is document, which refers to the document object present in the browser context. If the context is not running in Browser context then document is set to null. The other property is element, which stores an element that will be used by some instance during the execution of the collectable tree.
SelectorError
Custom error to be used by the QuerySelector class in case an element or series of elements does not exist in the DOM.
CustomError
Custom error to be used by the Collectable class to identify if the error was created by the error or require methods of the same class.
CollectableError
Custom error to be used by the Collectable class to return any error caused in the execution of the Collectable tree and that does not have a Collectable default method specified.
Logger
Log useful information and register errors.
StrategyManager
Provides a common method to share across the different strategy managers.
TypeValidator
Provides a static method that checks the data type of an incoming value.
LazyLoadHandler
Loads all DOM elements that are handled by a LazyLoad.
Pagination
Handles the pagination of an HTML section.

Functions

Logger.(elements) ⇒ Array.<object>
Extract useful information from elements and give them a specific format inside a list.

Collector

Collect items from a Collection by being iterated through a context element.
Kind: global class

new Collector(contextAccessor, collection)

Param
Type
Description
contextAccessor
object
Object that returns an iterable object as a result of its execution.
collection
Collection
A collection instance.

collector.iterationQuery

Queries to append in iterate method.
Kind: instance property of Collector

collector.call(context) ⇒ Promise.<Array>

Execute the collector.
Kind: instance method of Collector Returns: Promise.<Array> - An object that represents a promise for the collected items.
Param
Type
Description
context
Context
Object that represents the context or the element that is being passed on nested queries or instances executions.
Example
await page.evaluate(async () => {
const data = ( function () {
const css = SelectorDirectory.get('css');
return new Collector(
new Collection({
name: css('h1').property('innerText').single()
})
);
} )();
const context = new Context();
console.log(await data.call(context)); // [{ name: 'Plato Plugin' }]
});

collector.iterate(queries) ⇒ Collector

Creates a new Collector that has a custom accessor and a collection.
Kind: instance method of Collector Returns: Collector - A Collector instance.
Param
Type
Description
queries
object
A set of queries or callable objects.
Example
await page.evaluate(async () => {
const data = ( function () {
const css = SelectorDirectory.get('css');
return new ElementCollectorFactory('{#reviews > ul > li}*').iterate({
author: '#review-author',
title: '#review-title',
rating: '#review-rating',
body: '#review-body',
date: '#review-date'
});
} )();
const context = new Context();
console.log(await data.call(context)); // [{ author: 'John Doe', title: 'It is okay', rating: '4', body: 'Nice product. I would recommend the version X.', date: '01-12-2021' }, { author: 'Richard Roe', title: 'Amazing!', rating: '5', body: 'Really good product.', date: '10-12-2021' }]
});

Context

It provides a data structure to control the context, which is an object with two properties. The first is document, which refers to the document object present in the browser context. If the context is not running in Browser context then document is set to null. The other property is element, which stores an element that will be used by some instance during the execution of the collectable tree.
Kind: global class Summary: Create an object to give context to instance executions.

context.clone() ⇒ Context

Creates a new instance of the same class and assigns it the values of the object from which it was created. Because the new instance is completely independent of the original, the value of its element property can be updated or modified without affecting the original context object, which is not the case with a simple copy of the object.
Kind: instance method of Context Summary: Clones a context instance. Returns: Context - Object that represents the context or the element that is being passed on nested instances executions.
Example (Clone an existing Context instance)
const context = new Context();
const newContext = context.clone();

context.update(element) ⇒ Context

Updates the context object by setting a new value for the element property. It first clones the existing Context object using the clone() method and then updates the value of element on the clone.
Kind: instance method of Context Summary: Update the Context object. Returns: Context - Object that represents the context or the element that is being passed on nested instances executions.
Param
Type
Description
element
string | Array | object
Stores an element that will be used by some instance during the execution of the collectable tree.
Example (Update an existing Context instance)
const context = new Context();
const newContext = context.update('new element');

context.getElement() ⇒ string | Array | Object

Gets the value of the element property. If element is equal to null then it will return the value of the document property.
Kind: instance method of Context Returns: string | Array | Object - An element that will be used by some instance during the execution of the collectable tree.
Example (Getting the node to be used for nested executions)
const context = new Context();
...
const element = context.getElement();
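
The clone, update, and getElement semantics described above can be re-created in a few lines. The class below is a hypothetical `ContextSketch`, not the library's Context. In particular, the way `document` is detected is an assumption.

```javascript
// Hypothetical re-creation of the documented behaviour; not the library's source.
class ContextSketch {
    constructor() {
        // Assumption: `document` only exists when running in a browser context.
        this.document = typeof document !== 'undefined' ? document : null;
        this.element = null;
    }

    // Returns an independent instance carrying the same property values.
    clone() {
        const copy = new ContextSketch();
        copy.document = this.document;
        copy.element = this.element;
        return copy;
    }

    // Clones first, then sets element on the clone; the original is untouched.
    update(element) {
        const copy = this.clone();
        copy.element = element;
        return copy;
    }

    // Falls back to document when no element has been set.
    getElement() {
        return this.element === null ? this.document : this.element;
    }
}

const original = new ContextSketch();
const updated = original.update('new element');

console.log(original.getElement()); // null under Node.js, the document in a browser
console.log(updated.getElement());  // 'new element'
```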

SelectorError

Custom error to be used by the QuerySelector class in case an element or series of elements does not exist in the DOM.
Kind: global class

new SelectorError(selector)

Param
Type
Description
selector
string

CustomError

Custom error to be used by the Collectable class to identify if the error was created by the error or require methods of the same class.
Kind: global class

new CustomError(message)

Param
Type
Description
message
string
Custom message.

CollectableError

Custom error to be used by the Collectable class to return any error caused in the execution of the Collectable tree and that does not have a Collectable default method specified.
Kind: global class

new CollectableError(message, history)

Param
Type
Description
message
string
Custom message
history
Array.<string>
An array of messages of previous executions in the collectable chain.
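
Custom errors like the three above are typically built by extending `Error`. The sketch below shows two of them under that assumption. The `Sketch` class names, the message text of the selector error, and the `selector` and `history` property names are illustrative and not confirmed by the library.

```javascript
// Hypothetical equivalents of the documented constructors.
class SelectorErrorSketch extends Error {
    constructor(selector) {
        super(`No elements found for selector: ${selector}`); // message text is an assumption
        this.name = 'SelectorError';
        this.selector = selector;
    }
}

class CollectableErrorSketch extends Error {
    constructor(message, history = []) {
        super(message);
        this.name = 'CollectableError';
        this.history = history; // messages from previous steps in the chain
    }
}

try {
    throw new CollectableErrorSketch('Query failed', ['css(h1)', 'property(innerText)']);
} catch (error) {
    // instanceof lets callers tell library errors apart from unexpected ones.
    if (error instanceof CollectableErrorSketch) {
        console.log(error.message, error.history);
    }
}
```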

Logger

Log useful information and register errors.
Kind: global class

Logger.error(origin, message)

Error level.
Kind: static method of Logger
Param
Type
Description
origin
string
Name of the class or function where the log originates.
message
string
Message.

Logger.warn(origin, elements, message)

Warn level.
Kind: static method of Logger
Param
Type
Description
origin
string
Name of the class or function where the log originates.
elements
object
Any element to record its value, data type and instance.
message
string
Message

Logger.info(origin, elements, message)

Info level.
Kind: static method of Logger
Param
Type
Description
origin
string
Name of the class or function where the log originates.
elements
object
Any element to record its value, data type and instance.
message
string
Message

Logger.debug(origin, elements, message)

Debug level.
Kind: static method of Logger
Param
Type
Description
origin
string
Name of the class or function where the log originates.
elements
object
Any element to record its value, data type and instance.
message
string
Message

StrategyManager

Provides a common method to share across the different strategy managers.
Kind: global class

StrategyManager.lookUp(element, strategies) ⇒ Promise.<object>

Search or look up the best strategy.
Kind: static method of StrategyManager Returns: Promise.<object> - An object that represents a promise for a specific strategy.
Param
Type
Description
element
Any
A criterion to be evaluated.
strategies
Array.<object>
Available strategies.
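
The reference does not say how a strategy is judged to be the best one. A common approach, shown here purely as an assumption, is to ask each strategy whether it can handle the element and return the first that accepts. The `match` method, the strategy objects, and the function name `lookUpSketch` are all hypothetical.

```javascript
// Hypothetical look-up: returns the first strategy whose match() accepts the element.
async function lookUpSketch(element, strategies) {
    for (const strategy of strategies) {
        // match() may be sync or async; await handles both.
        if (await strategy.match(element)) return strategy;
    }
    throw new Error('No strategy matched the given element.');
}

// Two illustrative strategies keyed on the element's type.
const stringStrategy = { name: 'string', match: (element) => typeof element === 'string' };
const arrayStrategy = { name: 'array', match: async (element) => Array.isArray(element) };

lookUpSketch(['a', 'b'], [stringStrategy, arrayStrategy])
    .then((strategy) => console.log(strategy.name));
```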

TypeValidator

Provides a static method that checks the data type of an incoming value.
Kind: global class

TypeValidator.check(value, [type]) ⇒ Error | void

Checks whether the data type or instance of the input value matches the type argument.
Kind: static method of TypeValidator Returns: Error | void - Throws an error if the type does not match the value's data type.
Param
Type
Default
Description
value
any
Any value whose data type should be checked.
[type]
string | object
"string"
Data Type: string, number, array, object, boolean, function, CollectionCollector, Collector, NodeCollector.
Example (Checking if value is String)
TypeValidator.check('name');
Example (Checking if value is Number)
TypeValidator.check(5, 'number');
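
A check of this kind could be written as below. This is a sketch under stated assumptions, not the library's validator: `checkType` is a hypothetical function, class types are assumed to be passed as constructors and tested with `instanceof`, and `'array'` and `'object'` are treated as mutually exclusive.

```javascript
// Hypothetical checker illustrating the documented behaviour.
function checkType(value, type = 'string') {
    let matches;
    if (typeof type === 'function') {
        matches = value instanceof type;   // class or constructor
    } else if (type === 'array') {
        matches = Array.isArray(value);
    } else if (type === 'object') {
        // Assumption: arrays and null do not count as plain objects.
        matches = typeof value === 'object' && value !== null && !Array.isArray(value);
    } else {
        matches = typeof value === type;   // string, number, boolean, function
    }
    if (!matches) {
        const expected = typeof type === 'function' ? type.name : type;
        throw new TypeError(`Expected a value of type ${expected}.`);
    }
}

checkType('name');          // passes: the default type is "string"
checkType(5, 'number');     // passes
checkType([1, 2], 'array'); // passes

try {
    checkType(5);           // throws: 5 is not a string
} catch (error) {
    console.log(error.message);
}
```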

TypeValidator.deepCheck(value, type)

Recursively checks whether the data type or instance of the input value matches the type argument.
Kind: static method of TypeValidator
Param
Type
Description
value
any
Any value whose data type should be checked.
type
object | string
Data Type: string, number, array, object, boolean, function, CollectionCollector, Collector, NodeCollector.

LazyLoadHandler

Loads all DOM elements that are handled by a LazyLoad.
Kind: global class

LazyLoadHandler.execute(buttonSelector)

Executes the loading of all the elements by providing a clickable element selector, for example, a Next button.
Kind: static method of LazyLoadHandler
Param
Type
Description
buttonSelector
string
CSS Selector.

Pagination

Handles the pagination of an HTML section.
Kind: global class

Pagination.execute(buttonSelector, [time])

Create a generator object with the new rendered document.
Kind: static method of Pagination
Param
Type
Default
Description
buttonSelector
string
CSS selector of the button that triggers the action to go to the next pagination.
[time]
number
300
Delay time for rendering the document object.
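
A generator that yields the document after each click on the pagination button might look like the sketch below. The stop condition (the button no longer being found), the injectable `root` parameter, and the names `paginateSketch` and `delay` are assumptions. The library's generator may behave differently.

```javascript
// Resolves after the given number of milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical async generator. `root` defaults to the global document in a
// browser and can be replaced with a stand-in object elsewhere.
async function* paginateSketch(buttonSelector, time = 300, root = globalThis.document) {
    while (true) {
        yield root;                                    // current rendered document
        const button = root.querySelector(buttonSelector);
        if (!button) return;                           // assumption: no button means last page
        button.click();
        await delay(time);                             // give the page time to render
    }
}

// Browser usage (commented out because it needs a real DOM):
// for await (const doc of paginateSketch('#next-page')) {
//     console.log(doc.querySelectorAll('li').length);
// }
```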

Logger.(elements) ⇒ Array.<object>

Extract useful information from elements and give them a specific format inside a list.
Kind: global function Returns: Array.<object> - List of objects that represent an element.
Param
Type
Description
elements
object
Any element to record its value, data type and instance.
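
As an illustration of what recording the value, data type, and instance of each element could produce, the sketch below maps an object of named elements to a list. The function name `formatElements` and the output keys `name`, `value`, `type`, and `instance` are assumptions, because the reference does not specify the format.

```javascript
// Hypothetical formatter: one entry per element with its value, type and constructor name.
function formatElements(elements) {
    return Object.entries(elements).map(([name, value]) => ({
        name,
        value,
        type: typeof value,
        // null and undefined have no constructor, so fall back to null.
        instance: value?.constructor?.name ?? null,
    }));
}

console.log(formatElements({ selector: '#reviews', items: [1, 2], missing: null }));
```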

Classes

Collection
Execute a set of queries.
IterableAccessor
Iterates over each item in context created by a collection, returning it to be collected upon.
ProxyAccessor
Default ContextAccessor, used when there is no contextProcessor in Collector.
CollectionCollectorFactory
Shortcut to create Collector instances for collecting a collection for each iterable item in the context.
ElementCollectorFactory
Shortcut to create a Collector that returns a NodeList of DOM elements.
Shortcut to create a Collector that returns a list of combinations generated by Options instances.
Option
Creates an iterable object from a series of options and at the same time, the iterable object has a function that is executed in each iterative cycle (next method call) to select a specific option value.
Manage the OptionStrategies.

Functions

Execute normalization actions for each of the options. For example, adding default values in queries.
Add queries to the Collector instance to extract each of the options in the Collector run.

Collection

Execute a set of queries.
Kind: global class

new Collection(queries)

Param
Type
Description
queries
Object | function | string
Set of queries.

collection.call(context) ⇒ Promise.<object>

Execute a set of queries.
Kind: instance method of Collection Returns: Promise.<object> - An object that represents a promise for an object with the results of the queries.
Param
Type
Description
context
Context
Object that represents the context or the element that is being passed on nested queries or instances executions.
Example
await page.evaluate( async () => {
const data = new Collection({
name: css('h1').property('innerText').single()
});
const context = new Context();
console.log(await data.call(context)); // { name: 'Plato Plugin' }
});

collection.postProcessor(customProcessor) ⇒ Collection

Add a new postProcessor to the list.
Kind: instance method of Collection Returns: Collection - Returns the current Collection instance.
Param
Type
Description
customProcessor
function
A custom function that receives the query result and processes it to return transformed data.
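
The interplay between `call` and `postProcessor` can be pictured with the simplified model below. It is not the library's Collection: `CollectionSketch` is hypothetical, queries are assumed to be plain functions of the context, and post-processors are assumed to run in registration order on the whole result object.

```javascript
// Hypothetical, simplified model of a collection with post-processing.
class CollectionSketch {
    constructor(queries) {
        this.queries = queries;        // { key: (context) => value }
        this.postProcessors = [];
    }

    // Registers a processor and returns the instance to allow chaining.
    postProcessor(customProcessor) {
        this.postProcessors.push(customProcessor);
        return this;
    }

    async call(context) {
        const result = {};
        for (const [key, query] of Object.entries(this.queries)) {
            result[key] = await query(context);
        }
        // Each processor receives the previous result and returns the next one.
        let processed = result;
        for (const processor of this.postProcessors) {
            processed = await processor(processed);
        }
        return processed;
    }
}

const collection = new CollectionSketch({
    name: async (context) => context.title,
}).postProcessor((result) => ({ ...result, name: result.name.trim() }));

collection.call({ title: '  Plato Plugin ' }).then((data) => console.log(data));
```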

IterableAccessor

Iterates over each item in context created by a collection, returning it to be collected upon.
Kind: global class

new IterableAccessor(collection)

Param
Type
Description
collection
Collection instance that provides the new context.

iterableAccessor.call(context) ⇒ Promise.<Generator>

Create a generator to pass the new context.
Kind: instance method of IterableAccessor Returns: Promise.<Generator> - An object that represents a promise for a generator of context objects.
Param
Type
Description
context
Context
Actual context object.
Example (Receives a Collection and returns an iterable of Context objects)
await page.evaluate( async () => {
const data = new IterableAccessor(
new Collection({
reviews: () => Array.from(document.querySelectorAll('#reviews > ul > li'))
}).postProcessor(CollectionElementProcessor)
);
const context = new Context();
for await(let newContext of data.call(context)) {
console.log(newContext); // Returns the li elements inside of a Context object.
}
});

ProxyAccessor

Default ContextAccessor, used when there is no contextProcessor in Collector.
Kind: global class

proxyAccessor.call(context) ⇒ Generator.<Context>

Returns a generator with the incoming context.
Kind: instance method of ProxyAccessor Returns: Generator.<Context> - Returns the incoming context as part of a generator.
Param
Type
Description
context
Context
Object that represents the context or the element that is being passed on nested queries or instances executions.
Example
await page.evaluate( async () => {
const context = new Context().update('Custom Context');
const data = new ProxyAccessor().call(context);
let contextContainer = [];
for await(let newContext of data) {
contextContainer.push(newContext);
}
console.log(contextContainer[0].getElement()); // 'Custom Context'
});

CollectionCollectorFactory

Shortcut to create Collector instances for collecting a collection for each iterable item in the context.
Kind: global class

new CollectionCollectorFactory(collector, queries)

Returns: Collector - A Collector instance.
Param
Type
Description
collector
Collector
Collector instance.
queries
object
Set of queries.

ElementCollectorFactory

Shortcut to create Collector that returns a NodeList of DOM elements.
Kind: global class

new ElementCollectorFactory(query)

Returns: Collector - A new instance of Collector.
Param
Type
Description