public class OrcFlowFileWriter
extends java.lang.Object
implements org.apache.hadoop.hive.ql.io.orc.Writer, org.apache.hadoop.hive.ql.io.orc.MemoryManager.Callback
This class is synchronized so that multi-threaded access is ok. In particular, because the MemoryManager is shared between writers, this class assumes that checkMemory may be called from a separate thread.
| Constructor and Description |
|---|
| OrcFlowFileWriter(java.io.OutputStream flowFileOutputStream, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector inspector, long stripeSize, org.apache.hadoop.hive.ql.io.orc.CompressionKind compress, int bufferSize, int rowIndexStride, org.apache.hadoop.hive.ql.io.orc.MemoryManager memoryManager, boolean addBlockPadding, org.apache.hadoop.hive.ql.io.orc.OrcFile.Version version, org.apache.hadoop.hive.ql.io.orc.OrcFile.WriterCallback callback, org.apache.hadoop.hive.ql.io.orc.OrcFile.EncodingStrategy encodingStrategy, org.apache.hadoop.hive.ql.io.orc.OrcFile.CompressionStrategy compressionStrategy, float paddingTolerance, long blockSizeValue, java.lang.String bloomFilterColumnNames, double bloomFilterFpp) |
| Modifier and Type | Method and Description |
|---|---|
| void | addRow(java.lang.Object row) |
| void | addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) |
| void | addUserMetadata(java.lang.String name, java.nio.ByteBuffer value) |
| void | appendStripe(byte[] stripe, int offset, int length, org.apache.hadoop.hive.ql.io.orc.StripeInformation stripeInfo, org.apache.hadoop.hive.ql.io.orc.OrcProto.StripeStatistics stripeStatistics) |
| void | appendUserMetadata(java.util.List<org.apache.hadoop.hive.ql.io.orc.OrcProto.UserMetadataItem> userMetadata) |
| boolean | checkMemory(double newScale) |
| void | close() |
| static org.apache.hadoop.hive.ql.io.orc.CompressionCodec | createCodec(org.apache.hadoop.hive.ql.io.orc.CompressionKind kind) |
| long | getNumberOfRows() The row count is updated when stripes are flushed. |
| long | getRawDataSize() The raw data size is computed when the file footer is written. |
| java.io.OutputStream | getStream() |
| long | writeIntermediateFooter() |
public OrcFlowFileWriter(java.io.OutputStream flowFileOutputStream, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector inspector, long stripeSize, org.apache.hadoop.hive.ql.io.orc.CompressionKind compress, int bufferSize, int rowIndexStride, org.apache.hadoop.hive.ql.io.orc.MemoryManager memoryManager, boolean addBlockPadding, org.apache.hadoop.hive.ql.io.orc.OrcFile.Version version, org.apache.hadoop.hive.ql.io.orc.OrcFile.WriterCallback callback, org.apache.hadoop.hive.ql.io.orc.OrcFile.EncodingStrategy encodingStrategy, org.apache.hadoop.hive.ql.io.orc.OrcFile.CompressionStrategy compressionStrategy, float paddingTolerance, long blockSizeValue, java.lang.String bloomFilterColumnNames, double bloomFilterFpp) throws java.io.IOException
Throws:
java.io.IOException
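As a point of reference, the sketch below shows one way a caller might invoke this constructor. The schema, compression, and sizing values are illustrative assumptions rather than documented defaults, and the shared MemoryManager and ObjectInspector are taken as parameters because how they are obtained is environment-specific. The import for OrcFlowFileWriter itself is omitted since its package depends on the bundle that ships it.

```java
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.MemoryManager;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
// plus an import for OrcFlowFileWriter from its defining package

public class OrcFlowFileWriterExample {

    /**
     * Opens a writer over an already-open FlowFile OutputStream. The memoryManager is the
     * instance shared between writers (see the class description); creating or locating it
     * is left to the caller in this sketch.
     */
    static OrcFlowFileWriter openWriter(OutputStream flowFileOut,
                                        Path path,
                                        Configuration conf,
                                        ObjectInspector inspector,
                                        MemoryManager memoryManager) throws IOException {
        return new OrcFlowFileWriter(
                flowFileOut,
                path,
                conf,
                inspector,
                64L * 1024 * 1024,                    // stripeSize: illustrative 64 MB
                CompressionKind.ZLIB,                 // compress
                256 * 1024,                           // bufferSize: illustrative 256 KB
                10_000,                               // rowIndexStride
                memoryManager,                        // shared MemoryManager
                true,                                 // addBlockPadding
                OrcFile.Version.V_0_12,               // version
                null,                                 // callback: no WriterCallback
                OrcFile.EncodingStrategy.SPEED,       // encodingStrategy
                OrcFile.CompressionStrategy.SPEED,    // compressionStrategy
                0.05f,                                // paddingTolerance
                256L * 1024 * 1024,                   // blockSizeValue: illustrative 256 MB
                null,                                 // bloomFilterColumnNames: none
                0.05);                                // bloomFilterFpp
    }
}
```

Unlike OrcFile.WriterOptions-based factory methods, this constructor applies no defaults, so every parameter must be supplied explicitly as above.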
public static org.apache.hadoop.hive.ql.io.orc.CompressionCodec createCodec(org.apache.hadoop.hive.ql.io.orc.CompressionKind kind)
public boolean checkMemory(double newScale) throws java.io.IOException
Specified by:
checkMemory in interface org.apache.hadoop.hive.ql.io.orc.MemoryManager.Callback
Throws:
java.io.IOException
public java.io.OutputStream getStream() throws java.io.IOException
Throws:
java.io.IOException
public void addUserMetadata(java.lang.String name, java.nio.ByteBuffer value)
Specified by:
addUserMetadata in interface org.apache.hadoop.hive.ql.io.orc.Writer
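A minimal sketch of attaching user metadata to the file footer; the key "writer.origin" and its UTF-8 encoded value are purely illustrative choices, not names defined by this API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class MetadataSketch {
    // Attach an application-defined key/value pair; the pair is carried in the ORC file footer.
    static void tagWriter(OrcFlowFileWriter writer) {
        writer.addUserMetadata("writer.origin",
                ByteBuffer.wrap("nifi-flowfile".getBytes(StandardCharsets.UTF_8)));
    }
}
```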
public void addRow(java.lang.Object row) throws java.io.IOException
Specified by:
addRow in interface org.apache.hadoop.hive.ql.io.orc.Writer
Throws:
java.io.IOException
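A sketch of the row-writing loop, assuming the writer was constructed with a reflection-based ObjectInspector over the hypothetical Sensor class shown here; whatever is passed to addRow must match the inspector the writer was built with.

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

// Hypothetical row type used only for this sketch.
class Sensor {
    String id;
    double reading;
    Sensor(String id, double reading) { this.id = id; this.reading = reading; }
}

class RowWritingSketch {
    // Inspector matching Sensor; pass this as the inspector argument of the constructor.
    static ObjectInspector sensorInspector() {
        return ObjectInspectorFactory.getReflectionObjectInspector(
                Sensor.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
    }

    static void writeAll(OrcFlowFileWriter writer, List<Sensor> rows) throws IOException {
        for (Sensor row : rows) {
            writer.addRow(row);   // each row must match the writer's ObjectInspector
        }
        writer.close();           // flushes remaining stripes and writes the footer
    }
}
```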
public void addRowBatch(org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws java.io.IOException
Throws:
java.io.IOException
public void close() throws java.io.IOException
Specified by:
close in interface org.apache.hadoop.hive.ql.io.orc.Writer
Throws:
java.io.IOException
public long getRawDataSize()
Specified by:
getRawDataSize in interface org.apache.hadoop.hive.ql.io.orc.Writer
public long getNumberOfRows()
Specified by:
getNumberOfRows in interface org.apache.hadoop.hive.ql.io.orc.Writer
public long writeIntermediateFooter() throws java.io.IOException
Specified by:
writeIntermediateFooter in interface org.apache.hadoop.hive.ql.io.orc.Writer
Throws:
java.io.IOException
public void appendStripe(byte[] stripe, int offset, int length, org.apache.hadoop.hive.ql.io.orc.StripeInformation stripeInfo, org.apache.hadoop.hive.ql.io.orc.OrcProto.StripeStatistics stripeStatistics) throws java.io.IOException
Specified by:
appendStripe in interface org.apache.hadoop.hive.ql.io.orc.Writer
Throws:
java.io.IOException
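For appendStripe, a rough sketch of copying one raw stripe from an existing ORC file into this writer. How the matching OrcProto.StripeStatistics is obtained varies by Hive version, so it is passed in as a caller-supplied assumption rather than derived here.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcProto;
import org.apache.hadoop.hive.ql.io.orc.StripeInformation;

class StripeCopySketch {
    /**
     * Copies one stripe's raw bytes from sourceFile into the writer. stripeStats must be the
     * statistics that correspond to this stripe in the source file's metadata; obtaining them
     * is left to the caller in this sketch.
     */
    static void copyStripe(OrcFlowFileWriter writer,
                           FileSystem fs,
                           Path sourceFile,
                           StripeInformation stripe,
                           OrcProto.StripeStatistics stripeStats) throws IOException {
        int length = (int) stripe.getLength();
        byte[] buffer = new byte[length];
        try (FSDataInputStream in = fs.open(sourceFile)) {
            in.readFully(stripe.getOffset(), buffer, 0, length);  // raw stripe bytes (index + data + footer)
        }
        writer.appendStripe(buffer, 0, length, stripe, stripeStats);
    }
}
```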
public void appendUserMetadata(java.util.List<org.apache.hadoop.hive.ql.io.orc.OrcProto.UserMetadataItem> userMetadata)
Specified by:
appendUserMetadata in interface org.apache.hadoop.hive.ql.io.orc.Writer